
XXCOPY TECHNICAL BULLETIN #45


From:    Kan Yabumoto           tech@xxcopy.com
To:      XXCOPY user
Subject: Comparing File Data
Date:    2007-11-02
===============================================================================

Introduction:

    XXCOPY performs a file comparison operation on a byte-by-byte
    basis on various occasions.  One such case is immediately
    after a file is written in the destination directory to verify
    the file data.  With the /V2 switch, XXCOPY will re-open the
    file in the destination and compare the contents with the
    source file.
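    As a rough analogue of a /V2-style verify (not XXCOPY's own code),
    the sketch below copies a file and then re-opens the destination to
    compare it with the source, using Python's standard shutil.copyfile
    and filecmp.cmp(shallow=False).  The function name copy_and_verify
    is hypothetical.  As with /V2, the read-back may be served from the
    operating system's file cache rather than from the disk surface.

```python
import filecmp
import os
import shutil
import tempfile


def copy_and_verify(src: str, dst: str) -> bool:
    """Copy src to dst, then compare dst with src byte by byte.

    Hypothetical sketch of a copy-then-verify step. The read-back of dst
    may come from the OS file cache, not from the physical disk.
    """
    shutil.copyfile(src, dst)
    # shallow=False makes filecmp compare file contents instead of
    # trusting matching os.stat() signatures (size, mtime).
    return filecmp.cmp(src, dst, shallow=False)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "a.bin")
        dst = os.path.join(d, "b.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(100_000))
        print(copy_and_verify(src, dst))  # True
```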

    While the feature (/V) has been available in Microsoft's
    XCOPY since the early days of DOS and the floppy disk, and
    the practice is still widespread, questions have been raised
    about its efficacy.  Today's common storage systems (such as
    the hard disk) are equipped with multiple levels of buffering
    and layers of cache.  Recent versions of Windows employ a
    write-behind cache, without which performance would be
    unacceptable; the sheer volume of data makes streamlining the
    data flow a necessity.  As a result, when XXCOPY verifies the
    file contents, the data read back from the destination file
    will likely come from buffered memory rather than from the
    disk surface.

       --------------------------------------------------------
        XXCOPY provides the /CA switch for rudimentary control
        of file data buffering.  However, its use is practical
        only for a limited volume of data, especially in a
        network environment.
       --------------------------------------------------------

    A more practical way to verify the file contents after a
    backup operation is to run the file copy operation at full
    speed (without /V2, whose value has become questionable),
    and then to perform a separate XXCOPY run that compares the
    data on a byte-by-byte basis.


    Another motivation for the data compare functions is the
    fact that the traditional fast incremental backup operation
    relies on the favorable assumption that matching file sizes
    and timestamp values are sufficient to determine that a pair
    of files is indeed identical.


Data Comparison Functions:

    Starting with Ver 2.96.3, a new set of data comparison
    operations is available.

      /CDM    // Compares data and selects the file if matched.
      /CDU    // Compares data and selects the file if unmatched.
      /CDX    // Selects the file if matched, and brand-new files.
      /CD0    // Does not compare file data (default).

      (You may add a colon for readability: /CD:M, /CD:U, /CD:X or /CD:0.)


      The four switches are mutually exclusive (if two or more
      such switches are specified, the last one will prevail).
      They perform a file-selection operation on their own, or in
      combination with other file-selection switches.  They are
      especially useful when combined with various backup (/Bxx)
      switches.

      The data compare operation is carried out between the
      file in the source directory and its counterpart in the
      destination directory on a byte-by-byte basis.  Prior to
      the data comparison, the file size is always compared
      first, and a mismatch in the size will always be treated
      as a mismatch without accessing the file contents.
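      The size-first ordering described above can be sketched in
      Python as follows.  This is an illustration under assumptions
      (a hypothetical function name files_match, a 64 KiB chunk size,
      sequential reads of both files), not XXCOPY's actual code.

```python
import os


def files_match(src: str, dst: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if src and dst have identical contents.

    Sizes are compared first; a size mismatch is reported as a
    mismatch without reading either file.
    """
    if os.path.getsize(src) != os.path.getsize(dst):
        return False  # size mismatch: file contents never accessed
    with open(src, "rb") as a, open(dst, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                return False  # first differing chunk ends the comparison
            if not block_a:
                return True  # both files exhausted with no difference
```

      Reading in fixed-size chunks keeps memory use constant and stops
      at the first differing chunk, so a mismatch near the start of a
      large file is detected quickly.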

      The /CDM and /CDU functions are exact opposites of one another:
      /CDM excludes any file whose contents do not exactly match its
      counterpart in the destination, and /CDU excludes any file
      whose contents exactly match its counterpart.

      Note that the /CDM, /CDX, and /CDU operations ignore the
      timestamps, unlike the /BS, /BE or /BI operations.  This makes
      combinations such as /CDU/BS and /CDU/BI slightly different
      from (and more stringent than) the /CDU switch alone.

      The /CDX function is a more generalized form of /CDM: /CDX
      does not exclude brand-new files.  That is, /CDX excludes files
      strictly on the result of the file-data comparison, and it does
      not exclude files that are not subjected to a data comparison.
      In other words, the /CDX/U combination is equivalent to /CDM.
      
      /CDX is rarely useful but is provided mostly for symmetry's sake.
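      The selection rules of the four switches can be summarized in a
      small Python model.  It is a hypothetical sketch, not XXCOPY
      code: data_match is None for a brand-new file (no counterpart,
      so no comparison is made), otherwise it holds the result of the
      byte-by-byte comparison.  The helper with_u is an assumed model
      of /U (keep only files that exist in the destination).

```python
from typing import Optional


def cd_selects(mode: str, data_match: Optional[bool]) -> bool:
    """Hypothetical model: does the /CD<mode> switch keep this file?"""
    if mode == "0":
        return True  # /CD0: no data comparison, nothing excluded
    if data_match is None:
        # Brand-new file: /CDU and /CDX keep it; /CDM implies /U and drops it.
        return mode in ("U", "X")
    if mode in ("M", "X"):
        return data_match  # keep files whose data matched
    if mode == "U":
        return not data_match  # keep files whose data differ
    raise ValueError(f"unknown /CD mode: {mode!r}")


def with_u(selected: bool, data_match: Optional[bool]) -> bool:
    """Assumed model of /U: keep only files that have a counterpart."""
    return selected and data_match is not None


CASES = (None, True, False)  # brand new, data matched, data differ

# /CDX combined with /U behaves like /CDM.
assert all(with_u(cd_selects("X", c), c) == cd_selects("M", c) for c in CASES)
# /CDM and /CDU are exact complements of one another.
assert all(cd_selects("M", c) != cd_selects("U", c) for c in CASES)
```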


Command examples:

      xxcopy \src\ \dst\ /bi /cdu /y  // skip files that match exactly
      xxcopy \src\ \dst\ /clone /cdu  // combine with the popular switch

        In the above examples, the addition of the /CDU switch makes
        sure the incremental backup operation does not take chances in
        determining which files are truly identical.  A file in the
        source directory that matches its counterpart in the
        destination directory in timestamp and size is further
        scrutinized by a byte-by-byte comparison before being treated
        as truly identical.  Without the /CDU switch, files with
        matching timestamp and size are skipped even though there is a
        small possibility that their contents differ.  Although the
        favorable assumption (without /CDU) is reasonable and
        necessary for speedy operation, the new feature satisfies the
        needs of demanding users.
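        The skip decision described above can be modeled as follows.
        This is a hypothetical Python sketch, not XXCOPY's logic: it
        assumes /BI skips a file when both timestamp and size match,
        and that /CDU adds identical data as a third condition.  The
        data comparison is passed as a callable so that the costly
        byte-by-byte read happens only when time and size already
        match.

```python
from typing import Callable


def bi_skips(same_time: bool, same_size: bool) -> bool:
    """/BI alone (modeled): skip when timestamp and size both match."""
    return same_time and same_size


def bi_cdu_skips(same_time: bool, same_size: bool,
                 data_matches: Callable[[], bool]) -> bool:
    """/BI/CDU (modeled): skip only when time, size and data all match.

    `and` short-circuits, so data_matches() is called only when the
    timestamp and size already match.
    """
    return same_time and same_size and data_matches()


# Same time and size, different data: /BI skips it, /BI/CDU copies it.
print(bi_skips(True, True))                    # True  (skipped)
print(bi_cdu_skips(True, True, lambda: False)) # False (copied)
```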


      xxcopy \src\ \dst\ /bs /cdu /l  // select files that differ in data

        This command lists the files that would be erroneously
        treated as identical to their backup copies even though their
        contents differ.  If the operation produces an empty list,
        that confirms that the operation without the time-consuming
        data comparison is sound.
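        To cross-check the same idea outside XXCOPY, the hypothetical
        Python sketch below walks a source tree and reports files whose
        size and modification time match the destination copy but whose
        data differ.  It assumes whole-second timestamp comparison; any
        timestamp tolerance XXCOPY itself may apply is not modeled.

```python
import filecmp
import os
from typing import Iterator


def same_stamp_different_data(src_root: str, dst_root: str) -> Iterator[str]:
    """Yield relative paths that match in size and mtime but not in data."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                continue  # brand-new file: not a "same" file, never compared
            s, d = os.stat(src), os.stat(dst)
            # Assumption: timestamps are compared at whole-second resolution.
            if s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime):
                continue  # differs in size or time, so not treated as identical
            if not filecmp.cmp(src, dst, shallow=False):
                yield rel  # looks identical by time/size, but data differ
```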


      xxcopy \src\ \dst\ /rs /bs /cdm // deletes truly identical files

        This command deletes files from the source directory only when
        there is a perfect backup copy in the destination directory.


      xxcopy \src\ \dst\ /cdu         // incremental backup, ignore time

        This command is similar to the first example above (/bi /cdu)
        except that it completely ignores the timestamp.  Files with
        different contents are updated, and brand-new files are
        copied to the destination.  If the contents of a file are
        the same, it is skipped even when the timestamp is
        different.
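        The behavior of this command can be approximated in Python as a
        content-based sync.  This is a hypothetical sketch, not XXCOPY:
        the function name sync_by_content is made up, and shutil.copy2
        is used so the copied file keeps the source timestamp.

```python
import filecmp
import os
import shutil
from typing import List


def sync_by_content(src_root: str, dst_root: str) -> List[str]:
    """Copy brand-new files and files whose data differ; skip files with
    identical data even if their timestamps differ. Return copied paths."""
    copied: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if os.path.isfile(dst) and filecmp.cmp(src, dst, shallow=False):
                continue  # identical data: skipped regardless of timestamp
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # new file, or same name with different data
            copied.append(rel)
    return copied
```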


      xxcopy \src\ \dst\ /cdx  /l     // list identical and brand new files

        The /cdx switch excludes files that go through file-data
        comparison and result in a data mismatch.  It includes brand-new
        files.


      xxcopy \src\ \dst\ /cdm  /l     // list files with the same contents

        A /cdm switch implicitly adds the /u (common files only) function.


Special rules on switch combinations:

    Like most other file-selection switches, the /CDM, /CDX and /CDU
    operations work as an exclusion mechanism, with one exception
    (see below).  Each file-selection switch adds to the types of
    files excluded (reducing the number of files selected).  For
    example, the combination of /A and /CDU excludes files without
    the Archive attribute (by /A) and files with unmatched contents
    (by /CDU).  The common rule is that adding a switch narrows the
    file selection.
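    The common rule can be modeled by treating each switch as an
    exclusion predicate: a file survives only if no switch excludes
    it.  This is a hypothetical Python sketch; the dictionary fields
    archive and data_match (None for a brand-new file) are made up
    for illustration and do not reflect XXCOPY's internals.

```python
from typing import Callable, Dict, List, Optional, Union

FileInfo = Dict[str, Optional[bool]]  # hypothetical per-file facts
Excluder = Callable[[FileInfo], bool]


def exclude_not_archive(f: FileInfo) -> bool:
    """Models /A: exclude files without the Archive attribute."""
    return not f["archive"]


def exclude_data_match(f: FileInfo) -> bool:
    """Models /CDU: exclude files whose data matched the destination."""
    return f["data_match"] is True


def selected(f: FileInfo, excluders: List[Excluder]) -> bool:
    # Each switch adds another reason to exclude, so adding a switch
    # can only shrink (never grow) the set of selected files.
    return not any(ex(f) for ex in excluders)


switches = [exclude_not_archive, exclude_data_match]
print(selected({"archive": True, "data_match": False}, switches))  # True
print(selected({"archive": True, "data_match": True}, switches))   # False
```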

    As a special exception, when /CDU is specified with /BI or /BX,
    the combined file-selection mechanism is biased toward excluding
    fewer types of files (increasing the number of files selected).
    This concept is easier to understand by recalling the
    relationship between the timestamp and the file size in /BI,
    where the two aspects (time and size) are applied simultaneously
    (logically ANDed) to specify the excluded files (so the skipped
    files are more narrowly defined).

    Having /CDU (to skip matched files) on top of /BI adds a third
    element (in addition to the timestamp and the file size) that
    specifies the excluded files.  As a result, /BI/CDU excludes
    only truly identical files, whose timestamp, size and data all
    match those of their counterparts in the destination.  Because
    /CDU imposes a more stringent requirement for skipping files,
    the total number of files selected increases (contrary to the
    common observation that adding a file-selection switch usually
    reduces the number of selected files).

    The /CDM and /CDU switches are exact complements of one another.
    The files selected by /CDM include none of those selected by /CDU,
    and vice versa.


Drawback:

    The user should be aware that comparing files on a byte-by-byte
    basis is time-consuming.  The popularity of the /BI operation in
    incremental backup is due to its efficiency in determining
    whether the backup copy needs updating.


Tips:

    Confused?  Yes.  We all are.  In typical file management operations,
    the most common uses of the file-data comparison function are
    /BI/CDU, for the most stringent incremental backup, and /BS/CDM,
    for selecting truly identical files in list-only or file removal
    operations.

    /CDU implicitly excludes files with different size; therefore,
    when you specify /CDU/BZE, /BZE is superfluous in the
    combination (/CDU alone covers the case for /BZE).



© Copyright 2016 Pixelab All rights reserved.
