Unicode Support in File Names: Windows, Mac, Emacs, Unison, Rsync, USB, Zip
This page is a report of my experience with Unicode support in file names across tools on Windows and Mac. It is part of the article Mac and Windows File Conversion.
Chinese and Non-ASCII Chars in File Name
If a file name contains Chinese characters, or other Unicode characters such as curly quotes “ ”, the name may get messed up when you move the file between Mac OS X and Windows, because the particular application, storage medium, or file transfer protocol may not understand non-ASCII chars.
For example:
- If you transfer the file by using the unix tool Unison or rsync, the file name may be messed up.
- If you transfer the file by a USB drive, the file names may be messed up, because most USB drives are formatted with the FAT file system by default, and FAT does not support Unicode well.
- If you zip the file, email it to yourself, and retrieve it on the other machine, the file names may be messed up, because that version of the zip utility may not support Unicode file names.
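Before moving a folder between machines, it can help to list the names that are at risk. The following is a minimal Python sketch, not a tool used in the tests on this page. The function name find_non_ascii_names is made up for illustration, and it assumes Python 3.7 or later (for str.isascii). It walks a directory tree and collects every file or folder name that contains a non-ASCII character.

```python
import os

def find_non_ascii_names(root):
    """Return paths under root whose file or folder name has non-ASCII chars.

    Hypothetical helper for illustration only.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # str.isascii() is True only if every char is in U+0000..U+007F
            if not name.isascii():
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling find_non_ascii_names on a folder and printing the result gives a checklist of names to verify after the transfer.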
Apps That Do Not Support File Names with Chinese Characters
• Windows Vista (64-bit, SP2) zip utility does not handle Chinese (right click, Send to, Compressed (zipped) Folder). If your folder or file name has Chinese chars, Windows will complain and refuse to compress.
• Windows Console does not support Unicode well. It prints Chinese chars as gibberish. This applies to any app using Windows Console, such as cmd.exe, PowerShell, Cygwin bash.
• Unison file sync tool does not handle Chinese names. (unison version 2.27.57) [see Complexity of Software Engineering; Emacs, Unicode, Unison]
• Dired in GNU Emacs for Windows does not handle files with Chinese names. They show up as gibberish. (GNU Emacs 23.1.1 (i386-mingw-nt6.0.6002) of 2009-07-29 on SOFT-MJASON)
• Not sure if rsync supports Chinese fully. I think that when using rsync on OS X to copy files from Mac to Windows (rsync version 2.6.9, protocol version 29), it works fine, but when using rsync on Windows (through cygwin; rsync version 3.0.4, protocol version 30) to copy files from Windows to Mac, it has problems. Here's an example of its error message:
building file list ...
file has vanished: "/cygdrive/c/Users/xah/Documents/kacma pixra/prenu/200403_tony_relative/????.JPG"
file has vanished: "/cygdrive/c/Users/xah/Documents/kacma pixra/prenu/200403_tony_relative/???.JPG"
file has vanished: "/cygdrive/c/Users/xah/Documents/kacma pixra/prenu/200403_tony_relative/????.JPG"
Here are sample file names that trigger this error:
- [インターネットを使った数学勉強法.html]
- [E5 efterår 2006.htm]
- [名古屋城.jpg]
- [長鬍子.jpg]
- [鴿子與萱.jpg]
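The runs of ? in the rsync error message have the same lengths (4, 3, 4) as the three Chinese names in this list, which suggests that somewhere each unsupported character was replaced by a single ?. As an illustration only, here is a Python sketch of that kind of lossy conversion. It assumes the legacy code page is cp1252; I do not know which code page cygwin actually used, and lossy_name is a made-up helper.

```python
def lossy_name(name, codepage="cp1252"):
    """Round-trip a file name through a legacy code page.

    Characters the code page cannot represent become '?'.
    cp1252 is an assumption chosen for illustration.
    """
    return name.encode(codepage, errors="replace").decode(codepage)

samples = ["名古屋城.jpg", "長鬍子.jpg", "鴿子與萱.jpg"]
for s in samples:
    print(s, "->", lossy_name(s))
# 名古屋城.jpg -> ????.jpg
# 長鬍子.jpg -> ???.jpg
# 鴿子與萱.jpg -> ????.jpg
```

Once a name has been flattened this way, the program looks for a file literally named ????.jpg, which does not exist. That would be consistent with the "file has vanished" message, though I have not confirmed this is what rsync does.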
Apps That Support Chinese
OS X 10.5's Terminal app supports Chinese fully.
(OS X 10.4.x does not. Detail: OS X 10.4.11's Terminal app can display Chinese chars encoded with utf-8, for example when running cat text_file where the text file is utf-8 encoded. However, if a file name has Chinese, it does not show up correctly when doing ls. (I believe this is because file names on HFS+, the OS X file system, are stored as utf-16.) The Terminal app has an option under menu [Terminal ▸ Window Settings… ▸ Display ▸ Character Set Encoding]. However, the menu doesn't have utf-16 as a choice.)
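To make the utf-8 versus utf-16 distinction concrete, here is a small Python sketch. It only shows that the same name is a different byte sequence in each encoding, so a program that expects one and receives the other cannot recover the name. It is not a reproduction of what ls or Terminal does internally.

```python
name = "名古屋城"

utf8_bytes = name.encode("utf-8")       # 3 bytes per character for these chars
utf16_bytes = name.encode("utf-16-be")  # 2 bytes per character for these chars

print(len(utf8_bytes), utf8_bytes.hex())
print(len(utf16_bytes), utf16_bytes.hex())

# Decoding bytes with the wrong encoding does not give the name back.
wrong = utf16_bytes.decode("utf-8", errors="replace")
print(wrong == name)  # False
```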
Mac OS X 10.5's zip tool supports Chinese. (untested by me)
OS X 10.4.11, when connecting to a Windows share, can transfer files whose names have Chinese characters, in both directions.
Windows Vista (SP2), when connecting to a Mac share, can transfer files whose names have Chinese characters, in both directions.
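For the zip cases mentioned on this page, whether a tool handles Chinese names depends in part on whether it writes and honors the utf-8 file name flag of the zip format (bit 11 of the general purpose flags). Here is a hedged Python sketch using the standard zipfile module. It shows Python's behavior only and says nothing about the Vista or OS X 10.5 zip tools, which I have not inspected. The names zip_name_flags and UTF8_NAME_FLAG are made up for illustration.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # bit 11 of the zip general purpose flags

def zip_name_flags(names):
    """Write empty entries with the given names to an in-memory zip,
    read the archive back, and report whether each entry is marked utf-8."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for n in names:
            zf.writestr(n, b"")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return {i.filename: bool(i.flag_bits & UTF8_NAME_FLAG)
                for i in zf.infolist()}

print(zip_name_flags(["plain.txt", "名古屋城.jpg"]))
# {'plain.txt': False, '名古屋城.jpg': True}
```

A zip tool that ignores this flag will typically decode the name bytes with some legacy code page instead, which is one way a Chinese name can turn into gibberish after unzipping.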