HomeMathComputingArtsWordsLiteratureMusictwitter facebook webfeed

Convert File Encoding with Python for All Files in a Dir

Advertise Here For Profit

Xah Lee, 2005-03-08

Here's a Python program that convert character encoding for all files in a directory.

# python

import os
mydir= '/Users/t/web/p/monkey_king'

def changeEncoding(filePath):
    '''take a full path to a file as input, and change its encoding from gb18030 to utf-16'''
    print filePath

    tempName=filePath+'~-~'

    input = open(filePath,'rb')
    content=unicode(input.read(),'gb18030')
    input.close()

    output = open(tempName,'wb')
    output.write(content.encode('utf-16'))
    output.close()

    os.rename(tempName,filePath)


def myfun(dummy, dirr, filess):
    for child in filess:
        if '.html' == os.path.splitext(child)[1] and os.path.isfile(dirr+'/'+child):
            changeEncoding(dirr+'/'+child)
os.path.walk(mydir, myfun, 'dumb')

For sample files (Chinese) in unicode you can experiment with encoding, see: 西游记 (Journey To The West).

blog comments powered by Disqus