Pinyin Letter Frequency 拼音字母頻率

Pinyin letter frequency chart. [see Character Frequency Plot]

The text used is a Chinese translation of “The Masque of the Red Death” by Edgar Allan Poe.

[see The Masque of the Red Death]

[see 紅死病的面具 The Masque of the Red Death By Edgar Allan Poe]

Here's the first paragraph, in Chinese characters and in pinyin.


hua shuo “ hong si ” zai guo nei si nue yi jiu , xiang zhe ban zhi ming , zhe ban ke pa de wen yi wei shi wei ceng you guo 。 zhe bing de ju ti biao xian he te zheng jiu shi chu xie —— yi pian yin hong , ling ren fa zhi 。 huan zhe chu shi gan dao ju tong , tu ran yi zhen tou hun yan hua , yu shi quan shen mao kong da liang chu xie sang ming 。 zhi yao huan zhe de shen shang , te bie shi lian shang yi chu xian xing hong se ban dian jiu shi ran shang zhe wen yi de yu zhao , zhe shi zhu qin hao you shui ye bu gan jin shen qu jiu hu ta he wei wen ta 。 huan zhe cong de bing dao fa bing , yi zhi dao song ming , huan bu xiao ban xiao shi gong fu 。

Full text: masque_of_red_death_chinese_pinyin.txt
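Counting letters in toneless pinyin is straightforward. Here's a minimal Python sketch, assuming the pinyin is plain ASCII letters separated by spaces and punctuation, as in the sample above. The function name letter_frequency is just for illustration; it is not the script that produced the chart.

```python
from collections import Counter

def letter_frequency(pinyin_text):
    """Return (letter, count, share) tuples for a-z letters, most frequent first.

    Spaces, punctuation and any non-ASCII characters (including ü) are skipped.
    """
    counts = Counter(ch for ch in pinyin_text.lower() if "a" <= ch <= "z")
    total = sum(counts.values())
    if total == 0:
        return []
    # Highest count first; ties broken alphabetically so the output is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(letter, n, n / total) for letter, n in ranked]

sample = "hua shuo hong si zai guo nei si nue yi jiu"
for letter, n, share in letter_frequency(sample):
    print(f"{letter} {n:3d} {share:6.1%}")
```

To run it on the whole story, read masque_of_red_death_chinese_pinyin.txt (as UTF-8) into a string and pass that in.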

The conversion from Chinese characters to pinyin is done by

Pinyin and Keyboard Layout

Here we try to find out which keyboard layout is best for typing Chinese with a pinyin input method.
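A crude way to compare layouts is the share of pinyin letters that land on the home row. This metric is an assumption of the sketch, not necessarily how the heatmaps below were judged, and it ignores finger travel, same-finger sequences, and hand alternation. The home-row strings are the standard letter keys of each layout; home_row_share is a hypothetical helper name.

```python
from collections import Counter

# Home-row letter keys of each layout. QWERTY's ; key is left out,
# since toneless pinyin contains only letters.
HOME_ROWS = {
    "QWERTY": "asdfghjkl",
    "Dvorak": "aoeuidhtns",
    "Colemak": "arstdhneio",
}

def home_row_share(pinyin_text, home_keys):
    """Fraction of a-z letters in the text that are typed on the given home row."""
    letters = [ch for ch in pinyin_text.lower() if "a" <= ch <= "z"]
    if not letters:
        return 0.0
    counts = Counter(letters)
    on_home = sum(counts[key] for key in home_keys)
    return on_home / len(letters)

sample = "hua shuo hong si zai guo nei si nue yi jiu"
for name, keys in HOME_ROWS.items():
    print(f"{name:8s} {home_row_share(sample, keys):.1%}")
```

On that one-line sample this gives 11/32 for QWERTY, 27/32 for Dvorak, and 22/32 for Colemak. A single sentence is far too small to conclude anything; run it on the full text for a real comparison.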

Pinyin heatmap on QWERTY layout
Pinyin heatmap on Dvorak layout
Pinyin heatmap on Colemak layout

[see Dvorak Keyboard Layout]

Pinyin Letter Frequency Problem, the Removal of V

There is an interesting issue with v and ü in Chinese pinyin. Pinyin does not use the letter v, but it does have ü. Pinyin input methods, however, use the hack of typing v for ü, because ü is otherwise hard to type.

On Microsoft Windows's pinyin input method, typing u also produces ü, but this does not work on macOS.

So there is an interesting question when you compile pinyin letter frequency statistics. Given a piece of Chinese text, you can convert it to pinyin, then compute the letter frequency. Done this way, v never shows up. However, this is not the right statistic for keyboard layout design, because people do type v, while your statistics show no use of the v key.

To fix it, one needs to convert ü to v, then compute the statistics. But this may not be easy, because the software that converts Chinese into pinyin would need to output full pinyin with ü and tone marks, not plain toneless letters. For example, 肆虐 (sì nüè) in the first paragraph above comes out as “si nue”.
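If the converter does output tone-marked pinyin, the ü-to-v step is easy with Unicode normalization. A minimal sketch, assuming the input spells ü explicitly (e.g. lǜ, nüè); pinyin_to_keystrokes is a hypothetical helper name. NFD decomposition splits a character like ǚ into u, a combining diaeresis (U+0308), and a combining tone mark, so "u followed by diaeresis" can be rewritten as v and the leftover tone marks dropped.

```python
import unicodedata

def pinyin_to_keystrokes(pinyin):
    """Turn tone-marked pinyin into the letters typed with the "v for ü" convention."""
    # NFD splits e.g. "ǚ" into "u" + U+0308 (diaeresis) + U+030C (caron).
    decomposed = unicodedata.normalize("NFD", pinyin.lower())
    # A "u" directly followed by a diaeresis is ü; it is typed as "v".
    decomposed = decomposed.replace("u\u0308", "v")
    # Remove the remaining combining marks, which are the tone marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(pinyin_to_keystrokes("nǚ lǜ sì nüè"))  # → nv lv si nve
```

The result can then be counted like any other toneless pinyin text.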

But this “error” isn't too bad, because ü is rare in pinyin. It is written only after n and l (nü, nüe, lü, lüe), since after j, q, x, and y the ü sound is spelled u. I think it mostly shows up in common characters such as 女 (nǚ) and 綠 (lǜ).
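Even with only toneless pinyin, part of the missing v can be recovered. Standard pinyin has no syllables nue or lue, so those can only stand for nüe and lüe, while nu and lu are ambiguous (奴 nú vs. 女 nǚ) and need the original characters to resolve. A minimal sketch, assuming space-separated syllables as in the sample paragraph; recover_v is a hypothetical helper name.

```python
# Toneless syllables that can only mean nüe / lüe in standard pinyin.
UNAMBIGUOUS = {"nue": "nve", "lue": "lve"}
# Toneless syllables that may hide a ü (nu/nü, lu/lü); these are only counted.
AMBIGUOUS = {"nu", "lu"}

def recover_v(toneless_pinyin):
    """Rewrite unambiguous ü syllables with v; also return how many stay ambiguous.

    Tokens are split on whitespace and re-joined with single spaces.
    """
    out = []
    ambiguous = 0
    for token in toneless_pinyin.split():
        syllable = token.lower()
        if syllable in UNAMBIGUOUS:
            out.append(UNAMBIGUOUS[syllable])
        else:
            if syllable in AMBIGUOUS:
                ambiguous += 1
            out.append(token)
    return " ".join(out), ambiguous

text, unresolved = recover_v("hua shuo hong si zai guo nei si nue yi jiu")
print(text)        # → hua shuo hong si zai guo nei si nve yi jiu
print(unresolved)  # → 0
```

The ambiguous count gives an upper bound on how many v keystrokes the toneless statistics might still be missing.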

Layout Efficiency

