[qo], [o], and the Root Word

In my last post I spoke about the relationship between [q] and [o], given that one character is almost invariably followed by the other. In this post I want to talk about the root word, the one which begins after [qo] and [o], and how it affects the frequency of those prefixes.

For the sake of this post I used the same word triplets which I used in the last: three words which are the same except that one form is prefixed with [o], one is prefixed with [qo], and one has no prefix at all (a null prefix). So, for example, [key, okey, qokey] would be one triplet. Each triplet has at least one valid form (which I take to mean having five or more tokens), though some forms of a triplet may not occur at all.
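The triplet and the validity threshold can be sketched as follows. This is only an illustration, not the script behind the figures: the names `triplet`, `valid_forms`, and `VALID_MIN_TOKENS` are hypothetical, and the toy table holds just the [aiin] counts quoted further down in this post.

```python
VALID_MIN_TOKENS = 5  # the post's validity threshold: five or more tokens


def triplet(counts, root):
    """Return token counts for the null, [o], and [qo] forms of one root."""
    return {
        "null": counts.get(root, 0),
        "o": counts.get("o" + root, 0),
        "qo": counts.get("qo" + root, 0),
    }


def valid_forms(forms):
    """Names of the forms that reach the validity threshold."""
    return [name for name, n in forms.items() if n >= VALID_MIN_TOKENS]


# Toy table: only the [aiin] counts quoted in this post.
counts = {"aiin": 470, "oaiin": 26, "qoaiin": 23}

aiin = triplet(counts, "aiin")
print(aiin)
print(valid_forms(aiin))
```

A form missing from the count table is treated as having zero tokens, which matches the case where a form of a triplet does not occur at all.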

I’ll present a narrative description of the results, sorted by the first character of the root word. Some characters had too few valid words to be included.

[ch, sh]

The bench characters both showed very low frequencies of [o] forms compared to null forms, with none much above 3%. The highest ratio was [chy] with 155 tokens and its [o] form [ochy] with 5 tokens. The highest count was [chedy] with 501 tokens and its counterpart [ochedy] with 8 tokens. The [qo] forms were even rarer.

Even though some individual [qo] and [o] forms rose above the validity threshold, most did not, and the overall pattern seems to be invalid. That is, words beginning [ch, sh] do not generally have [o] and [qo] forms.

[a, y]

The picture for [a, y] is much like the above for [ch, sh]. However, the frequency ratio peaked at about 6% [o] and [qo] forms. Some of the counterparts, such as [oaiin] and [qoaiin], with 26 and 23 tokens respectively, are clearly valid words, but still relatively rare compared to the null form [aiin] with 470 tokens.
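To make the percentages concrete, here is a small sketch using only counts quoted in this post. It assumes that relative frequency means the prefixed form's token count divided by the null form's token count, which is how the figures here appear to be calculated. The function name `relative_frequency` is made up for the example.

```python
def relative_frequency(prefixed_tokens, null_tokens):
    """Prefixed-form count as a percentage of the null-form count."""
    if null_tokens == 0:
        raise ValueError("null form has no tokens; ratio is undefined")
    return 100.0 * prefixed_tokens / null_tokens


# (prefixed form, its count, null form, its count), all quoted in the post
pairs = [
    ("ochy", 5, "chy", 155),
    ("ochedy", 8, "chedy", 501),
    ("oaiin", 26, "aiin", 470),
    ("qoaiin", 23, "aiin", 470),
]

for prefixed, n_prefixed, null, n_null in pairs:
    pct = relative_frequency(n_prefixed, n_null)
    print(f"[{prefixed}] / [{null}]: {pct:.1f}%")
```

On these counts the four ratios come out at roughly 3.2%, 1.6%, 5.5%, and 4.9%.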

These words can be considered to marginally take [qo] and [o] prefixes.


[r]

Most of these words had moderate levels of [o] forms, ranging from 30% to 130% relative frequency. The main exception was [r], for which the [o] form, [or], was obviously extremely common. However, no [r] word had a valid [qo] form except [qor].

It seems as though [o] is a valid prefix for these words, but [qo] is not.


[d]

The relative frequencies for [o] and [qo] forms went up to about 17%, with the [o] forms always being a little more common than the [qo] forms. The high frequencies of the null forms mean that many of the [o] and [qo] forms have moderate to high counts, with [odaiin] having 61 tokens.


[l]

The [o] forms had strong relative frequencies, ranging from 32% to over 400% for [oly] ([ol] was even higher but, as with [or], is really an exception). All valid null forms beginning [l] produced valid [o] forms.

However, the rates for [qo] forms were much lower, with all being under 50% of null frequency and more than half under 10%. Only seven [l] words (of 35) produced valid [qo] forms. Ten produced no [qo] counterparts at all.

It is clear that [o] forms are valid for [l], but [qo] forms are mostly not.

[ckh, cth]

There were too few examples of these words to get good results. However, it seems as though they mostly take [o] and [qo] prefixes in small but valid amounts. The exceptions were [ckhol, cthol], which only had one [o] counterpart each.


[k]

All but four (of 37) null forms had [o] forms at parity or higher, with the highest being [okal] at 600% of [kal]. Similarly, all but eight had [qo] forms at parity or above. In nearly half the cases the [qo] form was more frequent than the [o] form, and several others were nearly at parity.

The [o] and [qo] forms for [k] words have strong frequencies, with the exceptional phenomenon of [qo] often being stronger.


[t]

All but four (of 28) null forms had [o] forms at parity or higher, with the highest being [otam] with 47 tokens compared to 5 tokens for [tam]. Nine null forms had [qo] forms below parity, with the rest higher. For both [o] and [qo] forms the lowest relative frequencies were still healthy at 75% and 33% respectively.

However, unlike [k], the [qo] forms were typically less common than the [o] counterparts. Only three [qo] forms were at or above parity with [o]. Many were less than half as common, with three dropping below the validity threshold.

It seems that while both [o] and [qo] are valid prefixes, [o] is much more common.


[p]

Most words had a moderate to strong frequency of [o] forms, though with a lower level of [qo] forms.

Overall, much like [t].


A number of different patterns were found regarding the influence of the root word’s initial character. I think they can be broadly put into three groups.

The characters [a, y, ch, sh] typically did not show a high frequency of [o] or [qo] forms. It is arguable whether these prefixes are really productive in the same way as for other words. It should be noted that, according to the strong–weak split, all these are weak characters, as is [o].

The characters [r, l] showed a good number of [o] forms, but with much lower, or non–existent, [qo] forms.

The characters [k, t, p] all showed a strong number of [o] forms and also high levels of [qo] forms. However, [k] had even stronger [qo] forms while [t, p] had somewhat lower numbers for [qo].

The characters [ckh, cth] were too few to typify, and [d] is difficult due to the high number of null forms.


What do these patterns mean? I’m not sure. That’s hardly what you want to hear, but it’s the truth. The low frequencies of [o] and [qo] forms for root words beginning with weak characters could show that these prefixes have some connection to last–first combinations. This would certainly be my preferred reading.

Yet how do we explain the good presence of [o] forms, but not [qo] forms, for [l, r]? I note that, when we discussed last–first combinations and Transformation Theory, the words beginning [l] were those most likely to show a strong preference against the sequence [y.o]. It could be that [l] words avoid [y.o] by removing the [o], whereas gallows characters avoid the sequence by adding [q] to give [y.qo], but this is nothing more than a suggestion to be investigated.

The difference between [t] and [k], with one having a lower [qo] than [o] and the other a higher, reminds me of something we discussed a while ago. We talked about the distribution of [k] and [t], and noted that the strings [qoke, qoka] were much more common than [qote, qota]. Though it’s hard to know whether we’re seeing the same thing from two different angles, or if one is causing the other.

I feel as though this is another part of an emerging puzzle, and it’s not clear how the pieces should be fitted together.


4 thoughts on “[qo], [o], and the Root Word”

  1. Hello Emma, it’s great to read all this new information! Thank you very much for sharing it!
    You wrote:
    “The characters [a, y, ch, sh] typically did not show a high frequency of [o] or [qo] forms. It is arguable whether these prefixes are really productive in the same way as for other words. It should be noted that, according to the strong–weak split, all these are weak characters, as is [o].”

    Could you please explain what you mean by “productive” here? Does this sentence suggest that [a, y, ch, sh] might have a similar function to that of o/qo (whatever it is)?


      • Hi Emma,
        I was wondering if the parallel couldn’t be even stronger, i.e. if the other weak characters might sometimes occur instead of o- and be dropped on other occasions.
        For instance:
        there are 1863 unique o- words; 37% of these (705) correspond to existing words if one removes the o- prefix

        there are 300 unique a- words; 42% of these (128) correspond to existing words if one removes the a- prefix

        there are 619 unique y- words; 58% of these (365) correspond to existing words if one removes the y- prefix

        there are 1149 unique ch- words; 42% of these (484) correspond to existing words if one removes the ch- prefix

        there are 562 unique sh- words; 46% of these (263) correspond to existing words if one removes the sh- prefix

        If I haven’t miscounted things, it seems that (under this specific measure) the other characters could also function as “optional” prefixes?
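The measure used in this comment can be sketched as below. The comment does not say how it was computed, so this is a guess at the procedure: take the unique words beginning with a prefix, strip the prefix, and check whether the remainder is itself a word in the vocabulary. The function name `strip_match_rate` and the tiny vocabulary are hypothetical, and excluding a word that consists of the prefix alone is an assumption.

```python
def strip_match_rate(vocabulary, prefix):
    """Count unique prefixed words whose remainder is also a word.

    Returns (number of prefixed words, number that match, percentage).
    Assumption: a word equal to the bare prefix is not counted.
    """
    vocab = set(vocabulary)
    prefixed = [w for w in vocab if w.startswith(prefix) and len(w) > len(prefix)]
    matched = [w for w in prefixed if w[len(prefix):] in vocab]
    share = 100.0 * len(matched) / len(prefixed) if prefixed else 0.0
    return len(prefixed), len(matched), share


# Hypothetical mini vocabulary built from word forms mentioned in the post;
# it is far too small to say anything about the real text.
vocab = ["key", "okey", "qokey", "aiin", "oaiin", "qoaiin", "ol", "or", "r", "otam"]

total, matched, share = strip_match_rate(vocab, "o")
print(total, matched, share)
```

Note that this counts word types, not tokens, so a prefixed word seen once weighs as much as one seen hundreds of times.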


        • I think there are two explanations:
          1) that the language has a number of prefixes; or
          2) the word structure means that some words naturally start with these characters.

          It could be a mix of the two, but I would guess that [q], [y], and some cases of [ch, sh] are definitely prefixes, while some [o] must be essential parts of the word.

          The line I’m seeking to pursue at the moment is that [o] is optional for some words and responds to last-first combinations by presence or removal, while for other words it is essential and [q] is used as a response. Of course, the figures we had for [o] and not [o] words didn’t show a strong and consistent response after [y] as I expected. The response seemed to differ according to [dy] and not [dy] endings.

          I wonder, and I suppose this is a request for more stats, how [qo] and [o] word pairs respond to different environments. I want to see [q] after words ending with weak characters, and [o] after words ending with strong characters.

          (Though, of course, I could be quite wrong with my strong-weak split. But it does provide a way of looking at some questions in a new way.)

