I just read carefully through the article that 启明 (Qiming) reposted, and I was lost almost from the start.
Anonymous (907 bytes), Friday 2005-1-14, 11:39 AM (455 reads)
Author: Anonymous, posted in 罕见奇谈, from http://www.hjclub.org
----------------------------------
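As the inventory of Chinese characters grows, the increase in information entropy slows; once the inventory passes 12,370 characters, the entropy no longer increases noticeably. Calculated with Zipf's law, well known in mathematical linguistics, the capacity limit of Chinese characters is 12,366 characters, and the static average information entropy of Chinese characters is 9.65 bits; in other words, the average information content of a Chinese character is 9.65 bits (see the "Law of the Capacity Limit of Chinese Characters" proposed by 冯志伟 (Feng Zhiwei)). This is the writing system with the largest information content in the world today. Below is a comparison of the information entropy of the five working languages of the United Nations: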
-------------------------------------------
Zipf's law
(definition)
Definition: The probability of occurrence of words or other items starts high and tapers off. Thus, a few occur very often while many others occur rarely.
Formal Definition: $P_n \propto 1/n^a$, where $P_n$ is the frequency of occurrence of the $n$th ranked item and $a$ is close to 1.
See also Zipfian distribution, Lotka's law, Benford's law, Bradford's law.
Note: In the English language words like "and," "the," "to," and "of" occur often while words like "undeniable" are rare. This law applies to words in human or computer languages, operating system calls, colors in images, etc., and is the basis of many (if not all!) compression approaches.
Named for George Kingsley Zipf.
Zipf's law is an experimental law, not a theoretical one. Zipfian distributions are commonly observed in many kinds of phenomena. The causes of Zipfian distributions in real life are a matter of some controversy, however.
---------------------------------------------------------
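To make the two quoted ideas concrete, here is a rough Python sketch that builds an idealized Zipf distribution over a character inventory of a given size and computes its Shannon entropy in bits. It rests on assumptions that are not in the quoted article: the exponent is taken to be exactly a = 1, frequencies are assumed to follow the rank law perfectly, and the function names `zipf_probabilities` and `entropy_bits` are made up for illustration. It is not Feng Zhiwei's actual calculation, which the quoted text does not spell out.

```python
import math


def zipf_probabilities(n_items, a=1.0):
    """Normalized Zipf probabilities P_n proportional to 1 / n**a.

    Assumption: a perfectly Zipfian inventory; real character
    frequencies only approximate this.
    """
    weights = [1.0 / rank ** a for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Inventory sizes chosen only for illustration; 12366 is the
    # figure quoted in the article above.
    for size in (1000, 3000, 6000, 12366, 50000):
        h = entropy_bits(zipf_probabilities(size))
        print(f"{size:>6} characters: {h:.2f} bits")
```

Under this idealized model the entropy keeps rising as the inventory grows, only more and more slowly, so the sketch on its own does not single out a limit such as 12,366; any such cutoff would have to come from an extra criterion that the quoted passage does not state.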
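Given the definition of Zipf's law, how could anyone work out that the capacity limit of Chinese characters is 12,366 characters? What does "capacity limit" mean? What is the capacity limit of Chinese characters?
Aren't there only a few thousand commonly used Chinese characters in total?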