
Number   : 9101
Date     : 2004-11-01
Author   : Kan Yabumoto
Subject  : Re: Too many Files in a directory?
Size(KB) : 2
Ernie Clark wrote:
>
> I am attempting to do a backup of some data for a Dentist.
> I am trying to use XXCopy but I keep getting a crash (Error 40).
> The machine is Windows XP Pro. I noticed that the most common
> place for the crash to occur is when trying to copy files
> from a directory with 12.5G of data in it (as well as 7
> directories) contained in 92,000 files.

I would say that any time you create a directory that contains more than 30,000 - 40,000 files, you are pushing the file system very hard, and it is better to avoid such a situation at all costs.

The Win32 environment still carries the DOS legacy: it assigns a short filename (SFN) to every file and directory, and each SFN has to conform to the old DOS 8.3 format convention (and must be unique). As a result, the SFN-synthesis scheme creates a lot of strain (and a penalty to system performance in the search for unique filenames). Even though an NTFS volume can retrieve file info in a directory pretty quickly thanks to the hash function (the SFN is based upon a 4-digit hex number beyond the first 4 names), I believe even an NTFS volume will have a really hard time if the first two letters of the file names in the gigantic directory are common.

In an NTFS volume, the first two letters of the SFN are derived from the original long filename (LFN). Then a 4-digit hex number follows (representing a 16-bit hash value that is presumably based upon the full LFN), with a trailing "~1" (typically, but I guess it could become different).

If the first two letters are spread completely evenly from AA to ZZ (a total of 26 x 26 = 676 combinations), then each group that starts with the same first two letters will have about 92,000 / 676 = 136 files. That 136 is a small fraction of the theoretical limit (65,536) on the number of variations that the hash value can provide. But this is an ideal case.
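The arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers quoted in this message (92,000 files, two-letter prefixes AA to ZZ, a 16-bit hash); the variable and function names are made up for illustration, and nothing here reflects how NTFS actually computes the hash.

```python
# Figures taken from the message; names are illustrative only.
FILE_COUNT = 92_000          # files in the problem directory
PREFIXES = 26 * 26           # two-letter combinations AA..ZZ
HASH_SPACE = 2 ** 16         # values a 4-digit hex (16-bit) hash can take


def average_group_size(file_count: int, prefixes: int) -> float:
    """Files per two-letter prefix if names were spread perfectly evenly."""
    return file_count / prefixes


avg = average_group_size(FILE_COUNT, PREFIXES)
fraction_used = avg / HASH_SPACE   # share of the hash space one group needs
headroom = HASH_SPACE / avg        # how many times the average fits in 65,536

print(f"prefix combinations : {PREFIXES}")
print(f"average group size  : {avg:.0f} files")
print(f"share of hash space : {fraction_used:.2%}")
print(f"headroom factor     : {headroom:.0f}x the even-spread average")
```

The headroom factor is the point at which a single two-letter group would, on paper, need every available hash value.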
I assume ZZ or QQ are typically non-existent, and other very common combinations (such as CH or SH) may easily be 50-100 times more common than the mathematical average. But as long as a group is not close to 1000 times the pure average, the hash function has some room to accommodate more names. On the other hand, if the first two letters have only one combination, or only a handful of combinations, the scheme is already in trouble.

One way to find out how bad it is is to run a DIR /X command and see how similar the SFNs in the directory really are. Or, you may use XXCOPY for a more specific answer:

    xxcopy \your-dir\aa* /n
    xxcopy \your-dir\ch* /n
    ...

If the number of files in some of these listings is very large (say, 10,000 or more), you should seriously look for an alternative scheme to organize your files.

From the SFN-management point of view, it may be time for Microsoft to stop supporting the SFN, especially for NTFS volumes whose contents are not accessible to DOS without tricks.

Kan Yabumoto
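For anyone who would rather get all the per-prefix counts in one pass, the same check can be roughed out in Python. This is a minimal sketch, not part of XXCOPY: the function names and the 10,000 default threshold (borrowed from the rule of thumb above) are assumptions for illustration. It groups by the first two characters of the long filename, which only approximates the SFN prefix, since the real SFN generation may treat some characters differently.

```python
import os
from collections import Counter


def count_by_prefix(directory, prefix_len=2):
    """Count files directly inside `directory`, grouped by the first
    `prefix_len` characters of each name (case-insensitive).
    Subdirectories are skipped and not descended into."""
    counts = Counter()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                counts[entry.name[:prefix_len].upper()] += 1
    return counts


def crowded_prefixes(counts, threshold=10_000):
    """Return only the prefixes whose file count reaches `threshold`."""
    return {prefix: n for prefix, n in counts.items() if n >= threshold}


# Example use (the path is a placeholder):
#   counts = count_by_prefix(r"\your-dir")
#   for prefix, n in counts.most_common(10):
#       print(prefix, n)
#   print(crowded_prefixes(counts))
```

If `crowded_prefixes` comes back non-empty, that is the same warning sign as a very long listing from the commands above.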
This message is part of XXCOPY's message archive. The archive contains all the messages posted at Yahoo!Groups: XXCOPY.