Size dependent word frequencies and translational invariance of books
2010 (English) In: Physica A: Statistical Mechanics and its Applications, ISSN 0378-4371, E-ISSN 1873-2119, Vol. 389, no 2, 330-341 p. Article in journal (Refereed) Published
It is shown that a real novel shares many characteristic features with a null model in which the words are randomly distributed throughout the text. One such common feature is a certain translational invariance of the text. Another is that the functional form of the word-frequency distribution of a novel depends on the length of the text in the same way as in the null model. This means that an approximate power-law tail ascribed to the data will have an exponent that changes with the size of the text section analyzed. A further consequence is that a novel cannot be described by text-evolution models like the Simon model. The size transformation of a novel is found to be well described by a specific Random Book Transformation. In addition, this size transformation enables a more precise determination of the functional form of the word-frequency distribution. The implications of the results are discussed.
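The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's Random Book Transformation; it only builds the null model (the same words, randomly redistributed) and tabulates the word-frequency distribution of a text section for both versions. The function names, the whitespace tokenization, and the toy sentence are all assumptions made for illustration.

```python
import random
from collections import Counter


def frequency_distribution(words):
    """Return a Counter mapping k -> number of distinct words occurring exactly k times."""
    occurrences = Counter(words)          # word -> number of occurrences
    return Counter(occurrences.values())  # occurrence count -> number of such words


def random_book(words, seed=0):
    """Null model (illustrative): the same words, randomly redistributed through the text."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def section_distributions(words, fraction):
    """Distributions for the first `fraction` of the text: original order vs. null model."""
    n = int(len(words) * fraction)
    real = frequency_distribution(words[:n])
    null = frequency_distribution(random_book(words)[:n])
    return real, null


# Toy input; a real comparison would use the tokenized text of a full novel.
text = "the cat sat on the mat and the dog sat on the log".split()
real, null = section_distributions(text, 0.5)
print("original section:", dict(real))
print("shuffled section:", dict(null))
```

Repeating the call for several values of `fraction` shows how the distribution of a section changes with its size, which is the dependence the abstract refers to. A toy sentence is far too short to reproduce the paper's findings.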
Place, publisher, year, edition, pages
Elsevier, 2010. Vol. 389, no 2, 330-341 p.
Word frequencies, Zipf's law, Simon model
Specific Languages, Other Physics Topics
Identifiers
URN: urn:nbn:se:umu:diva-27635
DOI: 10.1016/j.physa.2009.09.022
ISI: 000271844100015
OAI: oai:DiVA.org:umu-27635
DiVA: diva2:276860