From:    Kan Yabumoto           tech@xxcopy.com
To:      XXCOPY user
Subject: Unicode Support in XXCOPY
Date:    2009-12-23


    Starting with Version 2.97.0, XXCOPY supports Unicode (16-bit
    characters).  Earlier versions of XXCOPY could process only 8-bit
    characters in practically all aspects of their operation.  Being a
    console application, XXCOPY still receives its command line
    (keyboard) input as 8-bit characters and writes its console output
    as 8-bit strings.
       Even though the code page of the CMD.EXE console can be set
       to UTF-8 (code page:65001), the console is unable to process
       the full range of Unicode characters.

    Thus, earlier versions of XXCOPY had to rely on the console's code
    page, which determined the character encoding within the console
    environment and was often inadequate for accessing some of the files
    and directories present on a disk volume.

    With the support of Unicode, XXCOPY can process any file and directory
    name since all pathnames are internally represented by a Unicode string.

Input to XXCOPY using Unicode strings:

    Even with the Unicode support, XXCOPY will continue to operate in the
    console (CMD.EXE) window whose input and output streams are in 8-bit
    characters.

    There are times when you need to specify a directory or path name
    that contains Unicode characters that are not mapped in the current
    code page.  In such a case, the only way to specify the Unicode
    string is to put it in an external file and pass that file with the
    /CF (command file) or /EX (exclusion list) switch.  These files can
    be either in the traditional ANSI text format, or in the UTF-8
    encoded Unicode text format with the Byte-Order Mark (BOM) at the
    beginning of the file.  Note that there are no explicit switches
    that control the type of input string to XXCOPY.  The UTF-8 BOM
    sequence at the beginning of the file implicitly tells XXCOPY that
    the file is formatted in UTF-8.
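
    The BOM-based detection described above can be illustrated with a
    short Python sketch.  This is not XXCOPY's own code; the function
    name decode_list_file and the choice of cp1252 as the "ANSI" code
    page are assumptions made for the illustration (the actual ANSI code
    page depends on the Windows locale).

```python
import codecs

def decode_list_file(raw: bytes, ansi_codec: str = "cp1252") -> str:
    """Decode file contents the way the text describes:
    UTF-8 when the UTF-8 BOM (EF BB BF) leads the file, ANSI otherwise.
    ansi_codec is an assumed stand-in for the system ANSI code page."""
    if raw.startswith(codecs.BOM_UTF8):
        # Strip the three BOM bytes, then decode the rest as UTF-8.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: treat the bytes as traditional 8-bit ANSI text.
    return raw.decode(ansi_codec)

# Hypothetical sample contents (the paths are made up for the example).
utf8_bytes = codecs.BOM_UTF8 + "C:\\data\\日本語\\".encode("utf-8")
ansi_bytes = "C:\\data\\café\\".encode("cp1252")

from_utf8 = decode_list_file(utf8_bytes)   # decoded via the UTF-8 branch
from_ansi = decode_list_file(ansi_bytes)   # decoded via the ANSI branch
```

    Without the BOM, the same UTF-8 bytes would be read as ANSI text and
    the non-ASCII characters would come out garbled, which is why the
    BOM matters here.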

    You may create such a file using the ubiquitous Notepad utility and
    its Save As command with the UTF-8 encoding selected.  Most text
    editors available today should provide a user option to create a file
    in the UTF-8 format with the BOM header byte sequence.
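
    If you prefer to generate such a file from a script rather than a
    text editor, the following Python sketch writes a UTF-8 file that
    starts with the BOM.  The file name exclude.txt, the sample
    directory names, and the one-entry-per-line layout are all
    assumptions for the illustration; consult the /CF and /EX
    documentation for the exact file syntax.

```python
import codecs
import os
import tempfile

# Hypothetical entries; names and layout are assumptions for this example.
entries = ["C:\\data\\日本語\\", "C:\\data\\Ελληνικά\\"]

# Write into a temporary directory so the example has no side effects.
path = os.path.join(tempfile.mkdtemp(), "exclude.txt")

# The "utf-8-sig" codec emits the UTF-8 BOM (EF BB BF) before the text.
# newline="\r\n" produces Windows-style line endings.
with open(path, "w", encoding="utf-8-sig", newline="\r\n") as f:
    for entry in entries:
        f.write(entry + "\n")

# Read the raw bytes back to confirm that the BOM is present.
with open(path, "rb") as f:
    raw = f.read()
has_bom = raw.startswith(codecs.BOM_UTF8)
```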

Output text by XXCOPY that contains Unicode strings:

    Another problem associated with Unicode text is XXCOPY's output.
    Since the console output by XXCOPY is an 8-bit character stream,
    characters that have no 8-bit equivalent may be displayed with a
    question mark (?) as a placeholder.  This is a limitation of the
    console display.
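
    The question-mark effect is easy to reproduce.  The Python sketch
    below converts a file name to an 8-bit code page and back, replacing
    every character that has no 8-bit equivalent.  The code page cp1252
    and the sample file names are assumptions for the illustration; a
    real console typically uses an OEM code page, and this is not
    XXCOPY's own conversion routine.

```python
def as_8bit_display(name: str, codepage: str = "cp1252") -> str:
    """Round-trip a name through an 8-bit code page.
    Characters the code page cannot represent become '?'."""
    return name.encode(codepage, errors="replace").decode(codepage)

# Hypothetical file names for the example.
lossy = as_8bit_display("report_日本語.txt")   # three unmappable characters
intact = as_8bit_display("café.txt")           # fully mappable in cp1252

print(lossy)   # prints "report_???.txt"
```

    Note that the conversion is one-way: once the name has been reduced
    to question marks, the original characters cannot be recovered from
    the console text.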

    On the other hand, you should specify the /UT switch if you anticipate
    Unicode characters that cannot be represented by an 8-bit character.
    With the /UT switch, all XXCOPY output files (for /oA, /oN and /Fo)
    will be encoded in the UTF-8 format.  By default (/UT0), output files
    are written in Windows ANSI (8-bit) encoding.

The Special Dialog Window for User Prompts:

    From time to time, XXCOPY halts its operation with a user prompt,
    for example, to confirm a file overwrite (/Po).  Since the console
    display is usually limited to 8-bit character strings, the filename
    displayed on the console may not be recognizable.  With the /PW
    switch, XXCOPY will pop up a dialog window that displays the full
    pathname in Unicode even when the display on the console window
    fails to show the proper characters.


    Unicode      http://en.wikipedia.org/wiki/Unicode
    UTF-8        http://en.wikipedia.org/wiki/UTF-8
    Code page    http://en.wikipedia.org/wiki/Code_page

© Copyright 2016 Pixelab All rights reserved.