Introduction and background. The outcome of a random process is often well described by a bell-shaped curve, the normal distribution. About a hundred years ago, it was noticed that quantities such as personal wealth, town sizes, surnames, and word frequencies follow different, much broader distributions. The figure shows the probability of finding a word that occurs k times in a novel. If word frequencies followed normal expectations, the data would fall on the full curve in the figure. Various more or less system-specific explanations for this deviation from normality have been proposed under names such as 'rich gets richer', 'principle of least effort', 'preferential attachment' and 'independent proportional growth'. Here, it is argued that the phenomenon is connected to a more ubiquitous process: random group formation. A group is like a soccer team with positions to fill; you want the right player in the right position. Thus, unlike the normal distribution, where one simply picks a player for the team, one now tries to pick a player for a specific position in the team.
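The quantity plotted in the figure can be made concrete with a short sketch. Under one natural reading, P(k) is the fraction of distinct words that occur exactly k times. The function below computes it for a piece of text, assuming naive lowercase-and-split-on-whitespace tokenization (a simplification; a real analysis would treat punctuation more carefully). The name `word_frequency_distribution` is hypothetical and chosen for illustration only.

```python
from collections import Counter


def word_frequency_distribution(text: str) -> dict[int, float]:
    """Return P(k): the fraction of distinct words that occur exactly k times."""
    # Assumed tokenization: lowercase and split on whitespace (no punctuation handling).
    words = text.lower().split()
    counts = Counter(words)            # word -> number of occurrences
    n_distinct = len(counts)           # number of distinct words
    hist = Counter(counts.values())    # k -> number of distinct words occurring k times
    return {k: n / n_distinct for k, n in sorted(hist.items())}


sample = "the cat saw the dog and the dog saw the cat run"
print(word_frequency_distribution(sample))
```

In the toy sentence, six distinct words occur: 'the' four times, three words twice and two words once, so P(1) = 1/3, P(2) = 1/2 and P(4) = 1/6. In the group language used here, each distinct word is a group and each occurrence of it is an object in that group.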
Main results. Information theory is used to find the most likely distribution of group sizes, given the total number of objects, the number of groups, and the number of objects in the largest group. The result is the dashed curve in the figure, which agrees strikingly with the data. The same agreement is found for all data sets investigated.
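As reported in the paper, the predicted group-size distribution has the form P(k) = A exp(-bk)/k^γ: a power law with an exponential cutoff. The paper fixes A, b and γ from the number of objects M, the number of groups N and the largest group size k_max through its own information-theoretic argument, which is not reproduced here. The sketch below is a simplified stand-in under stated assumptions. γ is treated as a given, hypothetical input. The support is truncated at k = k_max. The only fitted quantity is b, found by bisection so that the mean group size equals M/N. All function names are hypothetical.

```python
import math


def rgf_like_pk(b: float, gamma: float, k_max: int) -> list[float]:
    """Normalized P(k) proportional to exp(-b*k) / k**gamma for k = 1..k_max."""
    weights = [math.exp(-b * k) / k ** gamma for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_size(pk: list[float]) -> float:
    """Mean group size sum_k k * P(k); pk[0] corresponds to k = 1."""
    return sum(k * p for k, p in enumerate(pk, start=1))


def fit_b(M: int, N: int, k_max: int, gamma: float,
          b_lo: float = 0.0, b_hi: float = 10.0) -> float:
    """Bisect on b so that the mean group size equals M/N (gamma held fixed).

    Simplifying assumptions (not the paper's procedure): gamma is supplied by
    the caller, and the distribution is truncated at k_max.
    """
    target = M / N

    def mean_at(b: float) -> float:
        return mean_size(rgf_like_pk(b, gamma, k_max))

    # The mean decreases monotonically in b, so the target must be bracketed.
    if not (mean_at(b_hi) < target < mean_at(b_lo)):
        raise ValueError("M/N is not reachable with b in [b_lo, b_hi]")

    lo, hi = b_lo, b_hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_at(mid) > target:
            lo = mid   # mean too large: a stronger cutoff (larger b) is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


b = fit_b(M=10_000, N=2_000, k_max=1_000, gamma=1.5)
print(f"b = {b:.4f}, mean = {mean_size(rgf_like_pk(b, 1.5, 1_000)):.4f}")
```

Bisection is a safe choice here because the mean of this family is strictly decreasing in b: its derivative with respect to b is minus the variance of k. Determining γ as well would need a second condition, such as the paper's constraint involving the largest group.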
Wider implications. This paper offers a new starting point for understanding Zipf-type phenomena.
Zipf's law unzipped
Seung Ki Baek et al 2011 New J. Phys. 13 043004 http://dx.doi.org/10.1088/1367-2630/13/4/043004
Via Complexity Digest, Bernard Ryefield