XXCOPY TECHNICAL BULLETIN #45
From: Kan Yabumoto tech@xxcopy.com
To: XXCOPY user
Subject: Comparing File Data
Date: 2007-11-02
====================================================================
Introduction:
XXCOPY performs a file comparison operation on a byte-by-byte
basis on various occasions. One such case is immediately
after a file is written in the destination directory to verify
the file data. With the /V2 switch, XXCOPY will re-open the
file in the destination and compare the contents with the
source file.
While the feature (/V) has been available with Microsoft's
XCOPY since the early days of DOS and the floppy disk, and
the operation is still widely practiced, questions are being
raised about its efficacy. Today's common storage systems
(such as the hard disk) are equipped with multiple levels of
buffering and layers of caching. Recent versions of Windows
employ a write-behind cache, without which performance would
be unacceptable; the sheer volume of data necessitates
streamlining the data flow. As a result, when XXCOPY verifies
the file contents, the data read from the destination file
will likely come from buffered memory rather than from the
disk surface.
--------------------------------------------------------
XXCOPY provides the /CA switch for rudimentary control
of file data buffering. However, its use is practical
only for a limited amount of data volume, especially
in a network environment.
--------------------------------------------------------
A more practical way to verify the file contents after
a backup operation is to run the file copy operation at
full speed (without /V2, whose value has become questionable),
and then to perform a separate run of XXCOPY for a data compare
operation on a byte-by-byte basis.
Another motivation for the data compare functions is the
fact that the traditional fast incremental backup operation
relies on the favorable assumption that matching file size
and timestamp values are sufficient to determine that a pair
of files is indeed identical.
Data Comparison Functions:
Starting with Ver 2.96.3, a new set of data comparison
operations is available.
/CDM // Compares data and selects the file if matched.
/CDU // Compares data and selects the file if unmatched.
/CDX // Selects the file if matched, and also brand new files.
/CD0 // Does not compare file data (default).
(You may add a colon for readability: /CD:M, /CD:U, /CD:X or /CD:0.)
The four switches are mutually exclusive (if two or more
such switches are specified, the last one will prevail).
They perform a file-selection operation on their own, or in
combination with other file-selection switches. They are
especially useful when combined with various backup (/Bxx)
switches.
The data compare operation is carried out between the
file in the source directory and its counterpart in the
destination directory on a byte-by-byte basis. Prior to
the data comparison, the file size is always compared
first, and a mismatch in the size will always be treated
as a mismatch without accessing the file contents.
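The comparison order described above (size first, then bytes) can be
sketched in a few lines of Python. This is only an illustration of the
technique, not XXCOPY's actual code; the function name files_match and
the 64 KB chunk size are assumptions made for the example.

```python
import os

def files_match(src_path, dst_path, chunk_size=64 * 1024):
    """Return True only if both files have the same size and the same bytes.

    Illustrative sketch; the name and chunk size are hypothetical.
    """
    # Size check first: a size mismatch counts as a data mismatch
    # without reading the file contents at all.
    if os.path.getsize(src_path) != os.path.getsize(dst_path):
        return False
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        while True:
            a = src.read(chunk_size)
            b = dst.read(chunk_size)
            if a != b:
                return False   # first differing chunk ends the comparison
            if not a:          # both files exhausted at the same point
                return True
```

Reading in chunks keeps memory use constant regardless of file size,
and the early size test avoids any data I/O for the most obvious
mismatches.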
The /CDM and /CDU functions are exact opposites of one another;
/CDM excludes any file whose contents do not match exactly
those of its counterpart in the destination, and
/CDU excludes any file whose contents do match those of its
counterpart exactly.
Note that the /CDM, /CDX, and /CDU operations ignore the
timestamps, unlike the /BS, /BE or /BI operations. This fact
makes combinations such as /CDU/BS and /CDU/BI slightly
different from (and more stringent than) the mere /CDU switch.
The /CDX function is a more generalized form of /CDM in that
/CDX does not exclude brand new files. That is, /CDX excludes
files strictly by the result of the file-data comparison; it
does not exclude files that are not subjected to the data
comparison. In other words, the /CDX/U combination is
equivalent to /CDM.
/CDX is rarely useful but is provided mostly for symmetry's sake.
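The selection rules for the four switches can be summarized as a small
truth function. The Python sketch below is a model built from the
descriptions in this bulletin, not XXCOPY's implementation; the name
is_selected and its parameters are hypothetical. A "brand new" file is
one that has no counterpart in the destination and is therefore never
compared.

```python
def is_selected(mode, has_counterpart, data_matches=False):
    """Model of the /CDM, /CDU, /CDX and /CD0 selection rules.

    mode            -- "M", "U", "X" or "0"
    has_counterpart -- True if a same-named file exists in the destination
    data_matches    -- result of the byte-by-byte compare (unused for
                       brand new files, which are never compared)
    """
    if mode not in ("M", "U", "X", "0"):
        raise ValueError("unknown mode: " + repr(mode))
    if mode == "0":
        return True                  # no data comparison, nothing excluded
    if not has_counterpart:
        # Brand new file: /CDM drops it (implied /U); /CDU and /CDX keep it.
        return mode in ("U", "X")
    if mode == "U":
        return not data_matches      # keep only files that differ
    return data_matches              # "M" and "X" keep only matched files
```

In this model /CDM and /CDU never select the same file, and /CDX
differs from /CDM only in keeping the brand new files.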
Command examples:
xxcopy \src\ \dst\ /bi /cdu /y // skip files that match exactly
xxcopy \src\ \dst\ /clone /cdu // combine with the popular switch
In the above examples, the addition of the /CDU switch makes
sure the incremental backup operation does not take chances in
determining which files are truly identical. A file in the
source directory that matches its counterpart in the destination
directory in timestamp and size will be further scrutinized by
a byte-by-byte comparison before being treated as truly
identical. Without the /CDU switch, files with matching
timestamp and file size will be skipped even though there is a
small possibility that the file contents differ.
Although the favorable assumption (without the use of /CDU)
is reasonable and necessary for speedy operation, the new
feature satisfies demanding users' needs.
xxcopy \src\ \dst\ /bs /cdu /l // select files that differ in data
This command makes a list of files that would be erroneously
treated as identical to their backup copies even though their
contents are different. If the operation generates an empty list,
that confirms that the operation without the time-consuming
data comparison is sound.
xxcopy \src\ \dst\ /rs /bs /cdm // deletes truly identical files
This command deletes files from the source directory only when
there is a perfect backup copy in the destination directory.
xxcopy \src\ \dst\ /cdu // incremental backup, ignore time
This command is similar to the first example above (/bi /cdu)
except that it completely ignores the timestamp. Files with
different contents are updated, and brand new files are
copied to the destination. If the contents of a file are
the same, it will be skipped even when the timestamp is
different.
xxcopy \src\ \dst\ /cdx /l // list identical and brand new files
The /cdx switch excludes files that go through the file-data comparison
and result in a data mismatch. It includes brand new files.
xxcopy \src\ \dst\ /cdm /l // list files with the same contents
The /cdm switch implicitly adds the /u (common files only) function.
Special rules on switch combinations:
Like most other file-selection switches, the /CDM, /CDX and /CDU
operations work as an exclusion mechanism, with one exception
(see below). Each file-selection switch adds to the types of files
excluded (reduces the number of files selected). For example,
the combination of /A and /CDU excludes files without the Archive
attribute (by /A) and files with matching contents (by /CDU).
The common rule is that adding a switch narrows the file selection.
As a special exception, when /CDU is specified with /BI or /BX,
the combined file-selection mechanism is biased towards
reducing the types of files excluded (it increases the number of
files selected). This concept can be better understood
by recalling the relationship between the timestamp and the file
size in /BI, where the two aspects (time and size) are simultaneously
applied (logically ANDed) to specify the excluded files (the skipped
files being more narrowly defined).
Having /CDU (to skip matched files) on top of /BI adds a third
element (in addition to the timestamp and the file size) that
specifies excluded files. As a result, /BI/CDU means the exclusion
of truly identical files whose timestamp, size and data all match
those of the files in the destination. By applying the more stringent
requirement for skipping files with /CDU, the total number of files
selected will increase (contrary to the common observation that
adding a file-selection switch usually reduces the number of
selected files).
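The ANDed skip condition can be written out as a short Python sketch.
It models only the rule as described in this bulletin for files that
already have a counterpart in the destination; the function name
is_skipped and its parameters are hypothetical, not part of XXCOPY.

```python
def is_skipped(time_same, size_same, data_same, use_cdu):
    """Model of the skip test for /BI, optionally combined with /CDU.

    Applies only to files that have a counterpart in the destination.
    """
    skipped = time_same and size_same       # /BI alone: time AND size match
    if use_cdu:
        skipped = skipped and data_same     # /BI/CDU: data must match as well
    return skipped

# A file whose timestamp and size match but whose data differs:
print(is_skipped(True, True, False, use_cdu=False))  # skipped by /BI alone
print(is_skipped(True, True, False, use_cdu=True))   # not skipped with /BI/CDU
```

Because every added condition is ANDed into the skip test, fewer files
qualify for skipping, so more files end up selected for copying.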
The /CDM and /CDU switches are exact complements of one another.
The files selected by /CDM contain none of those selected by /CDU
and vice versa.
Drawback:
The user should be aware that comparing files on a
byte-by-byte basis is time consuming. The popularity of the /BI
operation in incremental backup is due to its efficiency in
determining the need for updating the backup copy.
Tips:
Confused? Yes. We all are. In typical file management operations,
the most common usages of the file-data comparison function are
/BI/CDU which is for the most stringent incremental backup, and
/BS/CDM which is for selecting truly identical files in list-only
or file removal operations.
/CDU implicitly excludes files with different size. Therefore,
when you specify /CDU/BZE, /BZE is superfluous in the
combination (/CDU alone covers the case for /BZE).
© Copyright 2008 Pixelab, Inc. All rights reserved.