Following a discussion with René Zandbergen on the similarity between [l] and [r], I thought it best to share some statistics I’ve had hanging around for a while. I made them when looking into another hypothesis, which I really should share as well some day. The statistics relate to words which are ‘weak strings’ followed by [l, r].
(Weak strings are my name for the very common combination of a bench [ch, sh], followed by zero to two [e], and ending with [o, y/a].)
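The definition above can be written as a regular expression. What follows is a minimal sketch, assuming words are given as EVA strings. The names `WEAK_STRING` and `parse_weak_string` are hypothetical, and this is not necessarily how the statistics below were gathered. [a] is accepted as the variant of [y].

```python
import re

# Hypothetical pattern for a weak string: a bench [ch] or [sh], zero to two [e],
# then [o] or [y] (with [a] accepted as the variant of [y]),
# optionally followed by a final [r] or [l].
WEAK_STRING = re.compile(r"(?P<bench>ch|sh)(?P<e>e{0,2})(?P<end>[oya])(?P<suffix>[rl]?)")


def parse_weak_string(word):
    """Return (bench, number of e, ending, suffix) if the whole word is a weak string, else None."""
    m = WEAK_STRING.fullmatch(word)
    if m is None:
        return None
    return m["bench"], len(m["e"]), m["end"], m["suffix"]


print(parse_weak_string("chol"))    # ('ch', 0, 'o', 'l')
print(parse_weak_string("sheeal"))  # ('sh', 2, 'a', 'l')
print(parse_weak_string("chy"))     # ('ch', 0, 'y', '')
print(parse_weak_string("cheeey"))  # None (three [e] is too many)
```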
The statistics show every possible combination of a weak string with an additional final [r] or [l], together with the number of tokens for each. Here are the two tables:
The statistics are interesting because they show that the two sets of words share a common pattern. There are three factors by which the weak string can vary: [ch] can be replaced with [sh], the number of [e] can range from none to two, and [o] can be exchanged for [y] (here [y] is written as [a], as is to be expected before [r] and [l]).
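The three factors give 2 × 3 × 2 = 12 forms for each ending. A short sketch that lists them, assuming (as above) that [y] is written [a] before a final [r] or [l]; `weak_forms` is a hypothetical helper, not part of any existing tool.

```python
from itertools import product

BENCHES = ["ch", "sh"]
E_COUNTS = [0, 1, 2]
VOWELS = ["o", "y"]


def weak_forms(suffix=""):
    """List every weak string with the given suffix: '', 'r' or 'l'.

    Assumption: before a final [r] or [l], [y] is written [a].
    """
    forms = []
    for bench, n_e, vowel in product(BENCHES, E_COUNTS, VOWELS):
        if vowel == "y" and suffix:
            vowel = "a"
        forms.append(bench + "e" * n_e + vowel + suffix)
    return forms


print(weak_forms("r"))
# ['chor', 'char', 'cheor', 'chear', 'cheeor', 'cheear',
#  'shor', 'shar', 'sheor', 'shear', 'sheeor', 'sheear']
```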
A change in any one of these factors produces the same kind of frequency change, regardless of whether the word ends in [r] or [l]. In every case, the [o] form of a word is more common than the [y] form. Likewise, in all but two cases the [ch] form is more common than the [sh] form (and both exceptions are minor). Lastly, with one exception, increasing the number of [e] makes a word less common.
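The comparison can be made mechanical: pair up forms that differ in a single factor, record which member of each pair is more frequent, and check whether the result is the same for [r] and [l]. This is a sketch only. `form`, `directions` and `TOY_COUNTS` are hypothetical, the counts are invented for illustration rather than taken from the manuscript, and ties count as "not more frequent".

```python
from itertools import product

BENCHES = ["ch", "sh"]
E_COUNTS = [0, 1, 2]


def form(bench, n_e, vowel, suffix):
    """Build a weak string; [y] is written [a] before a final [r] or [l] (assumption)."""
    if vowel == "y" and suffix:
        vowel = "a"
    return bench + "e" * n_e + vowel + suffix


def directions(counts, suffix):
    """For each factor, compare pairs of forms that differ only in that factor.

    Each list holds True when the first member of a pair ([o], [ch], or fewer [e])
    is strictly more frequent. Words missing from `counts` count as zero.
    """

    def count(bench, n_e, vowel):
        return counts.get(form(bench, n_e, vowel, suffix), 0)

    result = {"o_vs_y": [], "ch_vs_sh": [], "fewer_e": []}
    for bench, n_e in product(BENCHES, E_COUNTS):
        result["o_vs_y"].append(count(bench, n_e, "o") > count(bench, n_e, "y"))
    for n_e, vowel in product(E_COUNTS, ["o", "y"]):
        result["ch_vs_sh"].append(count("ch", n_e, vowel) > count("sh", n_e, vowel))
    for bench, vowel in product(BENCHES, ["o", "y"]):
        for n_e in E_COUNTS[:-1]:
            result["fewer_e"].append(count(bench, n_e, vowel) > count(bench, n_e + 1, vowel))
    return result


# Invented counts for illustration only; real figures would come from a transcription.
TOY_COUNTS = {"chor": 9, "char": 3, "shor": 4, "shar": 1,
              "chol": 8, "chal": 2, "shol": 5, "shal": 1}

print(directions(TOY_COUNTS, "r") == directions(TOY_COUNTS, "l"))  # True
```

Passing an empty suffix, as in `directions(counts, "")`, applies the same comparison to the bare words discussed next.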
The similarity in frequency patterns suggests some underlying cause. One possibility is that [r, l] are suffixes added to a shared set of words which already has that pattern. I investigated this hypothesis (in fact, it was the reason I made the statistics in the first place), but the link appears to be weak.
Here are the frequencies for the same twelve words without any suffix:
The differences should be clear: the [y] forms are more common than the [o] forms, one [sh] form is significantly more common than its [ch] counterpart, and increasing the number of [e] does not make a word less common.
We must therefore find another explanation for the similarity of the frequency patterns in [r] and [l]. It may be that [r] and [l] share some underlying link, such as a similar sound, which makes them behave in similar ways. They do generally have a similar distribution within words, which supports this suggestion.
One final observation is that words with [o] more often end in [l] than in [r], while the opposite is true for words with [y]. This can be seen in the statistics above, and also in the text as a whole, as shown in the table below:
| All | With [o] | % | With [y] | % |
|---|---|---|---|---|
(Percentages do not sum to 100%, because some instances of word-final [r, l] may be preceded by other characters, particularly [i] in the case of [r].)
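Percentages of this kind could be computed along the following lines, assuming the transcription is available as a list of word tokens. `final_context` is a hypothetical helper and the token list is a toy example, not real data. [y] and [a] are pooled, and any other preceding character counts only toward the total, which is why the two shares need not sum to 100%.

```python
from collections import Counter


def final_context(words, final):
    """Summarise what precedes a word-final `final` ('r' or 'l').

    Returns (total, pct_o, pct_y): the number of word tokens ending in `final`,
    and the percentages of those preceded by [o] and by [y]/[a] (pooled).
    Any other preceding character, such as [i] before [r], counts only toward
    the total, so the two percentages need not sum to 100.
    """
    # w[-2:-1] is the character before the last one, or "" for one-letter words.
    before = Counter(w[-2:-1] for w in words if w.endswith(final))
    total = sum(before.values())
    if total == 0:
        return 0, 0.0, 0.0
    pct_o = 100 * before["o"] / total
    pct_y = 100 * (before["y"] + before["a"]) / total
    return total, pct_o, pct_y


# Toy token list for illustration; not counts from the manuscript.
tokens = "chol chor chal char daiin chedy shol aiir ar l".split()
print(final_context(tokens, "r"))  # (4, 25.0, 50.0)
print(final_context(tokens, "l"))  # (4, 50.0, 25.0)
```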
If the change between [y, o] can cause a shift between [r, l], then maybe they are closely linked? All thoughts are welcome.