From: Kan Yabumoto tech@xxcopy.com
To: XXCOPY user
Subject: Unicode Support in XXCOPY
Date: 2009-12-23
===============================================================================
Background:
Starting with Version 2.97.0, XXCOPY supports Unicode (16-bit
characters). Earlier versions of XXCOPY could process only 8-bit
characters in practically all aspects of their operation. Being a
console application, XXCOPY still receives its command line
(keyboard) input as 8-bit characters and still writes its console
output as 8-bit strings.
--------------------------------------------------------------
Even though the code page of the CMD.EXE console can be set
to UTF-8 (code page 65001), the console is unable to process
the full range of Unicode characters.
--------------------------------------------------------------
Thus, earlier versions of XXCOPY had to rely on the console's code
page, which determines the character encoding within the console
environment. That encoding was often inadequate for accessing some
of the files and directories present on a disk volume.
With the support of Unicode, XXCOPY can process any file and directory
name since all pathnames are internally represented by a Unicode string.
Input to XXCOPY using Unicode strings:
Even with the Unicode support, XXCOPY will continue to operate in the
console (CMD.EXE) window whose input and output streams are in 8-bit
characters.
There are times when you need to specify a directory or path name
that contains Unicode characters that are not mapped in the current
code page. In such a case, the only way to specify such Unicode
strings is to use an external file with the /CF switch (command
file) or the /EX switch (exclusion list). These files can be either
in the traditional ANSI text format or in the UTF-8 encoded Unicode
text format with the Byte-Order Mark (BOM) at the beginning of the
file. Note that there are no explicit switches that control the type
of input string to XXCOPY. The UTF-8 BOM sequence at the beginning
of the file implicitly tells XXCOPY that the file is formatted in
UTF-8.
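
The BOM-based detection described above can be sketched in Python as
follows. This only illustrates the general technique and is not
XXCOPY's actual code. The function names are hypothetical, and "ANSI"
is assumed here to mean code page 1252; the real ANSI code page
depends on the system locale.

```python
import codecs

# Assumption: the "ANSI" code page is Windows-1252. On a real system it
# depends on the locale (for example cp932 on Japanese Windows).
ANSI_CODE_PAGE = "cp1252"

def detect_list_file_encoding(raw: bytes) -> str:
    """Pick a codec for a /CF or /EX style text file from its first bytes."""
    # The UTF-8 BOM is the three-byte sequence EF BB BF.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec also strips the BOM when decoding
    return ANSI_CODE_PAGE

def decode_list_file(raw: bytes) -> str:
    """Decode the file contents with the detected codec."""
    return raw.decode(detect_list_file_encoding(raw), errors="replace")
```

A file without the BOM is treated as ANSI even if its bytes happen to
be valid UTF-8, which is why the BOM matters for these files.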
You may create such a file with the ubiquitous Notepad utility by
using its Save As command and selecting the UTF-8 encoding. Most text
editors available today provide a user option to create a file in
the UTF-8 format with the BOM header byte sequence.
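
If you prefer to generate such a file from a script rather than an
editor, the following Python sketch writes a UTF-8 text file that
begins with the BOM. The function name, the file name and the two
path entries are made up for illustration; the actual syntax of the
entries in a /CF or /EX file is defined by the XXCOPY documentation,
not by this example.

```python
import os
import tempfile

def write_utf8_bom_file(path, lines):
    """Write lines as UTF-8 text with a leading BOM and CR/LF line ends."""
    # "utf-8-sig" makes Python emit the BOM bytes EF BB BF before the text.
    with open(path, "w", encoding="utf-8-sig", newline="\r\n") as f:
        for line in lines:
            f.write(line + "\n")

# Hypothetical entries containing characters outside a typical code page.
demo_path = os.path.join(tempfile.mkdtemp(), "exclude.txt")
write_utf8_bom_file(demo_path, ["C:\\data\\報告書\\", "C:\\data\\Ελληνικά\\"])

with open(demo_path, "rb") as f:
    raw = f.read()
print(raw[:3] == b"\xef\xbb\xbf")  # True: the file starts with the BOM
```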
Output text by XXCOPY that contains Unicode strings:
Another problem associated with Unicode text is XXCOPY's output.
Since XXCOPY's console output is an 8-bit character stream, any
character that cannot be represented in the console's code page may
be displayed as a question mark (?) placeholder. This is a
limitation of the console display.
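
The effect can be imitated in Python as a rough sketch. The function
name and the sample file name are hypothetical, and code page 437 is
only an assumed console code page; the Windows console's own
conversion may differ in detail (for example, it may substitute a
similar-looking character instead of a question mark).

```python
def as_shown_on_console(name: str, code_page: str = "cp437") -> str:
    """Approximate how a Unicode name looks after 8-bit conversion."""
    # errors="replace" substitutes "?" for every character that the
    # code page cannot represent, so the original name is not recoverable.
    return name.encode(code_page, errors="replace").decode(code_page)

print(as_shown_on_console("報告書_2009.txt"))  # ???_2009.txt
```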
For output files, however, you should specify the /UT switch if you
anticipate Unicode characters that cannot be represented by an 8-bit
character. With the /UT switch, all XXCOPY output files (for /oA,
/oN and /Fo) will be encoded in the UTF-8 format. By default (/UT0),
output files are written in Windows ANSI (8-bit) encoding.
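
A script that post-processes such an output file has to decode it
with the matching encoding. The Python sketch below assumes one entry
per line and, for the default case, assumes that Windows ANSI means
code page 1252. The function name is hypothetical, and the demo file
is a stand-in written by the script itself, not real XXCOPY output.
The "utf-8-sig" codec is used because it accepts UTF-8 text with or
without a leading BOM.

```python
import os
import tempfile

def read_output_file(path, written_with_ut):
    """Return the lines of an output file, decoded per the /UT setting."""
    # Assumption: the default (/UT0) ANSI code page is cp1252.
    encoding = "utf-8-sig" if written_with_ut else "cp1252"
    with open(path, encoding=encoding, errors="replace") as f:
        return [line.rstrip("\n") for line in f]

# Stand-in for a file produced with /UT (contents are made up).
demo_log = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(demo_log, "w", encoding="utf-8", newline="\r\n") as f:
    f.write("C:\\data\\報告書.txt\n")

entries = read_output_file(demo_log, written_with_ut=True)
```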
The Special Dialog Window for User Prompts:
From time to time, XXCOPY halts its operation with a user prompt,
for example to confirm a file overwrite (/Po). Since the console
display is usually limited to 8-bit character strings, the filename
displayed on the console may not be recognizable.
With the /PW switch, XXCOPY will pop up a dialog window that displays
the full pathname in Unicode even when the console window fails to
show the proper characters.
References:
Unicode http://en.wikipedia.org/wiki/Unicode
UTF-8 http://en.wikipedia.org/wiki/UTF-8
Code page http://en.wikipedia.org/wiki/Code_page