My last two posts were about the relative frequencies of words prefixed with [o] and [qo]. Although there is some interesting information in those posts I don’t feel that it is well presented. After having examined and sorted the statistics again I want to propose a neater way of thinking about these words.
The core idea is that [o] is a prefix which can be added to the start of a word and that [q] is another prefix which can be added to words starting with [o]. Thus every word is a member of a potential ‘triplet’ with a plain form, an [o] form, and a [qo] form. For example, [dol] would be a plain form with [odol] as its [o] form and [qodol] as its [qo] form. It is the relative frequencies of these three forms which interest us in this post.
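To make the idea of a triplet concrete, here is a minimal sketch in Python of how the three counts could be gathered. It assumes the text is already available as a list of transliterated word strings. The function name and the toy token list are invented for illustration and are not real corpus data.

```python
from collections import Counter

def triplet_counts(tokens, plain):
    """Return the (plain, [o], [qo]) token counts for one plain form."""
    counts = Counter(tokens)
    # A Counter returns 0 for a missing word, so unattested forms are fine.
    return counts[plain], counts["o" + plain], counts["qo" + plain]

# Toy token list, invented purely for illustration.
tokens = ["dol", "dol", "odol", "qodol", "dol", "chedy"]
print(triplet_counts(tokens, "dol"))  # (3, 1, 1)
```

Note that under this scheme a word such as [odol] is both the [o] form of [dol] and a potential plain form in its own right.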
Not all triplets have the same relative frequencies. Some have [qo] forms which are common, others have [qo] forms which are rare. The same goes for plain and [o] forms. Thus there are six possible combinations, depending on which form is higher (or lower) than the others:
Plain > [o] > [qo]
Plain > [qo] > [o]
[o] > Plain > [qo]
[o] > [qo] > Plain
[qo] > Plain > [o]
[qo] > [o] > Plain
I sorted all triplets with at least 30 tokens across the three forms combined (about 140 triplets) into these six groups. It should be noted that one group, [qo] > Plain > [o], had no members. Simply put, it is not possible to have a high frequency of [qo] forms without a relatively high frequency of [o] forms. Only one triplet had a [qo] form more than three times as common as its [o] form.
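The sorting step can be sketched roughly as follows. This is an illustration under assumptions rather than the exact procedure used: the function names are hypothetical, ties are broken arbitrarily in favour of the earlier form, and the second triplet in the example is invented (only the [dol] counts come from the corpus).

```python
def ordering(plain, o, qo):
    """Name the three forms from most to least frequent."""
    forms = [("Plain", plain), ("[o]", o), ("[qo]", qo)]
    # sorted() is stable, so tied forms keep the Plain, [o], [qo] order.
    ranked = sorted(forms, key=lambda form: form[1], reverse=True)
    return " > ".join(name for name, _ in ranked)

def group_triplets(triplets, min_tokens=30):
    """Sort triplets into ordering groups, skipping those with too few tokens."""
    groups = {}
    for word, (plain, o, qo) in triplets.items():
        if plain + o + qo < min_tokens:
            continue  # below the 30-token threshold
        groups.setdefault(ordering(plain, o, qo), []).append(word)
    return groups

triplets = {
    "dol": (117, 2, 1),  # counts for [dol], [odol], [qodol]
    "xxx": (5, 3, 1),    # invented low-count triplet, filtered out
}
print(group_triplets(triplets))  # {'Plain > [o] > [qo]': ['dol']}
```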
Although these groups were a good starting point for classification of the triplets, they were too blunt in their discrimination between relative levels. For example, the triplet [raiin] had an [o] form with one more token than the plain form, while [lkedy] had a plain form with two more tokens than the [o] form. In reality, both of these had plain and [o] forms which were more or less equal.
Thus I further sorted these groups into four types which could be described fairly easily to show the relative frequencies. I will present the four types below.
Type I: 55 triplets. They all have high counts for the plain forms but low counts for both [o] and [qo] forms, which are mostly less than 10% of the plain form. The plain forms start with [a, o, y], [ch, sh], [d, s], and [cth] (but not [ckh]).
Type II: 42 triplets. The [o] form is the highest frequency form (or, for a few, nearly the highest). The plain and [qo] forms are less frequent, but with [qo] higher than plain. About half the plain forms begin with [t], thirteen with [k], three with [p], and a couple with [l] and [r].
Type III: 17 triplets. The [qo] form is the highest frequency form, followed strictly by the [o] form and then by the plain form, which is usually much lower. Fifteen of the plain forms start with [k], with one each for [e] and [t].
Type IV: 20 triplets. Both the plain and [o] forms have at least ten tokens, but the [qo] form is always less than 40% as frequent as either, and usually much lower. Fourteen of the plain forms start with [l], and the rest with [r], except for [m].
There were also six triplets which didn’t fit easily into any of these types. Three began with [ckh] in the plain form and one was a Grove word.
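For readers who want to experiment, the four types can be approximated with hard cutoffs, as in the sketch below. This is only an approximation under my own assumptions: the actual sorting was done by eye, "mostly less than 10%" becomes a strict limit, the "nearly the highest" cases of Type II are ignored, and the order of the checks decides any triplet that could fit two types. Apart from the [dol] counts, the example numbers are invented.

```python
def classify(plain, o, qo):
    """Assign a (plain, [o], [qo]) count triple to Type I-IV, or 'unsorted'."""
    # Type I: [o] and [qo] both under 10% of the plain form.
    if o < 0.10 * plain and qo < 0.10 * plain:
        return "I"
    # Type III: strictly [qo] > [o] > plain.
    if qo > o > plain:
        return "III"
    # Type II: [o] is the most frequent form and [qo] beats plain.
    if o >= qo > plain:
        return "II"
    # Type IV: plain and [o] both have 10+ tokens, [qo] under 40% of either.
    if plain >= 10 and o >= 10 and qo < 0.40 * min(plain, o):
        return "IV"
    return "unsorted"

print(classify(117, 2, 1))   # I   ([dol], [odol], [qodol])
print(classify(10, 50, 30))  # II  (invented counts)
print(classify(5, 20, 60))   # III (invented counts)
print(classify(40, 50, 5))   # IV  (invented counts)
```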
I hope it is clear that the first character of the plain form is a key indicator of how common the [o] and [qo] forms are, both in total and relative to each other. Only Type II overlaps significantly with other types, namely Types III and IV. Otherwise, it is possible to broadly predict, for any given word, how often it will be prefixed with [o] and [qo].
For example, think of the plain form [dol] which I mentioned earlier. We can see that the plain form begins with [d] and so it is part of a Type I triplet. Thus the [o] and [qo] forms should occur much less often than the plain form. Indeed, [dol] has 117 tokens, whereas [odol] has 2 and [qodol] only 1.
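The prediction from the first character can be written as a simple lookup. The sketch below lists the initial glyphs given for each type above. The names are hypothetical, and the only assumption added is that multi-character glyphs such as [cth] and [ch] are matched before single letters.

```python
# Initial glyphs listed for each type; [t], [k], [l], [r] appear under two types.
INITIALS = {
    "I": ["a", "o", "y", "ch", "sh", "d", "s", "cth"],
    "II": ["t", "p", "k", "l", "r"],
    "III": ["k", "e", "t"],
    "IV": ["l", "r", "m"],
}

def initial_glyph(word):
    """Return the first glyph, treating cth, ckh, ch, sh as single glyphs."""
    for glyph in ("cth", "ckh", "ch", "sh"):
        if word.startswith(glyph):
            return glyph
    return word[0]

def candidate_types(plain):
    """List the types whose plain forms can start with this word's first glyph."""
    glyph = initial_glyph(plain)
    return [name for name, initials in INITIALS.items() if glyph in initials]

print(candidate_types("dol"))  # ['I']

# Check the Type I expectation against the [dol] counts.
plain, o, qo = 117, 2, 1
print(f"{o / plain:.1%} {qo / plain:.1%}")  # 1.7% 0.9%
```

Both ratios fall well under the 10% mark that characterises Type I. A word beginning with [k] or [t], by contrast, returns two candidate types, which reflects the overlap between Types II and III.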