The Relationship between [q] and [o]

The character [q] is the most stereotyped character in the Voynich script. It almost always occurs 1) at the beginning of a word, and 2) before the character [o]. It also mostly occurs after a word ending in [y]. Thus its immediate context can be predicted in most cases.

Some have proposed, due to the character’s position at the beginning of a word, and its lack of occurrence in isolated labels, that it stands for a whole word (a morphogram), maybe a grammatical word like the ampersand: &. I think this is highly doubtful as we would expect it to occur before all kinds of characters and not only [o].

The morphogram proposal can be salvaged by suggesting that the two characters [qo] are an inseparable digraph, or some such. This would be much more reasonable, as [qo] occurs before all kinds of characters. It still has a small weakness, in that [q] does occasionally occur before a few other characters, such as [e]. This, however, is not a fatal objection.

I wish here to present some statistics to show that [qo] may well be separable and thus the morphogram proposal has a poor foundation.

No [q] without [o]

We know that words beginning [q] must nearly always have [o] as the next character. However, the removal of the [q] will almost always result in a valid word. There are few exceptions to this and most are marginal.

I took every word beginning with [q] which had five or more tokens (I will treat five as the threshold for validity in this post), and counted the number of tokens for the same words without the initial [q]. So, for example [qokey] had the counterpart word [okey]. The list contained 123 word pairs.
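A minimal sketch of this kind of count in Python, assuming the transcription has already been split into a flat list of word tokens. The function name, the threshold constant and the toy data are illustrative only and are not taken from any particular transcription file.

```python
from collections import Counter

VALID_THRESHOLD = 5  # five tokens is treated as the threshold for validity


def q_word_pairs(tokens, threshold=VALID_THRESHOLD):
    """Pair each valid word beginning with q with the same word minus the q.

    Returns a list of (q_word, q_count, o_word, o_count) tuples,
    most frequent q word first.
    """
    counts = Counter(tokens)
    pairs = []
    for word, n in counts.items():
        if word.startswith("q") and n >= threshold:
            counterpart = word[1:]  # drop the initial q only
            pairs.append((word, n, counterpart, counts.get(counterpart, 0)))
    return sorted(pairs, key=lambda pair: -pair[1])


# Toy data (hypothetical counts, for illustration only).
toy_tokens = ["qokey"] * 6 + ["okey"] * 8 + ["qool"] * 5 + ["qor"] * 2 + ["or"] * 7
pairs = q_word_pairs(toy_tokens)
for q_word, q_count, o_word, o_count in pairs:
    print(q_word, q_count, o_word, o_count)
```

In the toy data [qor] falls below the threshold and is dropped, while [qool] is kept with a counterpart count of zero.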

Here’s what I found:

  • every [q] word with more than 15 tokens (48 of 123) had a valid counterpart beginning [o];
  • all but two of the [q] words with more than 10 tokens (73 of 123) had a valid [o] counterpart;
  • only 15 valid [q] words lacked a valid [o] counterpart, and 13 of those words had fewer than ten tokens;
  • only 4 had no instances at all of an [o] counterpart: one had nine tokens, the other three just five tokens.

Here is the list of [q] words without valid counterparts, and the count for each:

With o     Count   With qo     Count
okl        0       qokl        9
otched     0       qotched     5
oeol       0       qoeol       5
ool        0       qool        5
ockhol     1       qockhol     7
okeechy    1       qokeechy    6
oked       2       qoked       7
okeedar    2       qokeedar    6
okeed      3       qokeed      15
okshedy    3       qokshedy    11
oor        3       qoor        8
okod       3       qokod       7
oeeey      4       qoeeey      7
opol       4       qopol       6
otshy      4       qotshy      5

The most interesting words on this list are [qoor] and [qool]. The word [oor] is rare and [ool] simply doesn’t exist, yet [or] and [ol] are very common. The string [oo] is not common itself, so we’re looking at a marginal set of words. Indeed, the counts for most of the 15 above are low and really only two or three stand out as exceptions.

Ratios of [q] to [o]

I further wish to show that the number of [q] words is limited by the number of its [o] counterparts.

Here are a few key stats, based on the same list of word pairs as before:

  • only four valid [q] words exist without any [o] counterparts (so their ratio cannot be calculated);
  • 72 of the 123 valid [q] words are less common than their [o] counterparts;
  • of the remaining 47 valid [q] words, only six occur more than three times as often as their [o] counterparts, and all six are words we have already listed above as exceptions without valid [o] counterparts.

So, in sum, all the words not given earlier as exceptions are either less common than their [o] counterparts or no more than three times as common. It should be noted that there is no absolute floor provided by [o] words: some words which occur commonly, such as [or] with 366 tokens, have relatively few [q] counterparts, in this case [qor] with only 23 tokens. But the opposite is not and cannot be true: a common [q] word does not vastly outnumber its [o] counterpart. The most extreme case is [qokeedy] with 305 tokens, whereas [okeedy] has 105 tokens.
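The ratios quoted here are simple divisions of one token count by the other. A short sketch, using only counts given in this post; the function name is illustrative, and returning None for a counterpart with no tokens is an assumption about how to handle the undefined case.

```python
def q_to_o_ratio(q_count, o_count):
    """Return q_count / o_count, or None when the counterpart has no tokens."""
    if o_count == 0:
        return None  # the ratio cannot be calculated
    return q_count / o_count


# Token counts quoted in the post: (count with q, count without q).
examples = {
    "qokeedy / okeedy": (305, 105),
    "qor / or": (23, 366),
    "qool / ool": (5, 0),
}

ratios = {name: q_to_o_ratio(q, o) for name, (q, o) in examples.items()}
for name, ratio in ratios.items():
    print(name, "undefined" if ratio is None else f"{ratio:.2f}")
# qokeedy / okeedy 2.90
# qor / or 0.06
# qool / ool undefined
```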

The Null Hypothesis

The proposal that [qo] is the prefix and not [q] alone suggests that words beginning [qo] should be linked with words with no (or null) prefix. So, for example, a word such as [qokey] is formed from [qo] plus [key] instead of [q] plus [okey].
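The two competing segmentations differ only in how much of the word is stripped. A small sketch with hypothetical helper names:

```python
def o_counterpart(word):
    """[q] as the prefix: strip q only, so qokey -> okey."""
    return word[1:] if word.startswith("q") else None


def null_counterpart(word):
    """[qo] as the prefix: strip qo, so qokey -> key."""
    return word[2:] if word.startswith("qo") else None


for word in ["qokey", "qokal", "qol"]:
    print(word, o_counterpart(word), null_counterpart(word))
# qokey okey key
# qokal okal kal
# qol ol l
```

A word beginning [q] but not [qo] has an [o]-style counterpart under the first function but no null counterpart under the second, which returns None.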

Let’s look at the statistics for [q] words and their null counterparts:

  • every valid [q] word with 19 or more tokens (43 of 123) has a valid null counterpart;
  • all but five valid [q] words with 10 or more tokens (73 of 123) had valid null counterparts;
  • 29 valid [q] words lacked a valid null counterpart, and 24 of those words had fewer than ten tokens;
  • five had no instances at all of a null counterpart: two had eight tokens and three had six tokens.

I won’t bother providing a table, as you can see that the stats for the null counterparts are somewhat worse: the number of valid [q] words lacking a valid counterpart nearly doubles, from 15 words lacking valid [o] counterparts to 29 lacking valid null counterparts.

The ratio stats are even worse, with a much wider spread:

  • five valid [q] words have no instances of null counterparts (so their ratio cannot be calculated);
  • only 40 of the 123 valid [q] words are less common than their null counterparts;
  • of the remaining 78 valid [q] words, 26 are three or more times as common as their null counterparts, and 12 of these have valid null counterparts.
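The breakdown in these bullet points amounts to sorting each word pair into one of four buckets. A rough sketch follows; the bucket names and the treatment of exactly equal counts (placed in the "up to three times" bucket) are assumptions, and the sample pairs reuse counts quoted in this post, mixing [o] and null counterparts purely for illustration.

```python
from collections import Counter


def ratio_bucket(q_count, counterpart_count):
    """Classify a word pair by how the q word compares with its counterpart."""
    if counterpart_count == 0:
        return "no counterpart"
    if q_count < counterpart_count:
        return "less common"
    if q_count >= 3 * counterpart_count:
        return "three or more times as common"
    return "up to three times as common"


# (q word count, counterpart count), all taken from figures in the post.
sample_pairs = [
    (191, 23),   # qokal vs kal
    (305, 105),  # qokeedy vs okeedy
    (23, 366),   # qor vs or
    (5, 0),      # qool vs ool
]

tally = Counter(ratio_bucket(q, c) for q, c in sample_pairs)
for bucket, n in tally.items():
    print(bucket, n)
```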

In short, the frequency of the null counterparts bears little relation to the frequency of the [q] words. The word [qokal], with 191 tokens, is more than eight times as common as [kal], with only 23 tokens. Yet both are clearly valid words with relatively high counts.


The idea that [qo] is a prefix is the only way to preserve the hypothesis that [q] is grammatical in nature. However, the competing hypothesis, that [q] is a prefix which adds onto words already beginning with [o], performs better on a simple set of statistics: [o] counterparts to [q] words are more often valid words than null counterparts are, and their frequency ratios are tighter.

None of the numbers in this post are proof, but they at least give us cause for caution: existing ideas aren’t well supported.

We ought to be more agnostic about the nature of [q] and seek other hypotheses. The possibility of a sound value for [q] should be explored as a potentially better fit. There are a few sounds which could work, and there are explanations for the non-occurrence of the character in labels.

I think that a radical new solution to the problem of [q] should be sought.

4 thoughts on “The Relationship between [q] and [o]”

  1. Wonderful post, Emma.
    One complicating factor is the fact that q or qo is rare in labels. This would indicate to me that it is either grammatical or some kind of proclitic. Though there are surely other (better) options.
    While reading your post there was one alternative which crossed my mind. What if words are just commonly prefixed and “o” and “qo” are both (different) prefixes?


    • Hi Koen, I appreciate that the situation with labels is the cause of speculating that [q] or [qo] might be grammatical. But I just don’t think there’s any real evidence for it.

      I have another post coming which will show that the relative levels of [qo] and [o] depend upon the following letters. So that [ok, qok] and [ot, qot] show different patterns. This should reinforce the phonological possibility of [q].


  2. This is of course something I’ve also struggled with. The statistics are tricky to interpret because of semantics, e.g. if qo encodes ‘the’, then there may well be semantic linkage between it and the attached word. That is, your conclusions assume that the two halves operate independently, even though they are immediately adjacent.

    My own proof (I can’t recall if I wrote this up) that qo is a token was simply that the free-standing word qol occurs overwhelmingly in places where l- words appear (e.g. Q13) but rarely elsewhere.

    This was more of a causal proof than a statistical (probabilistic) correlation: and as such tried to sidestep the confounding problems of semantic linkage.


    • Hi Nick, there are definitely links between the [q] prefix and the word it attaches to. I intend to address this soon.

      Although I don’t know what [qol] is, I think that were it connected to words beginning [l] we would see it a lot more in Quire 20. I would guess it is linked to [ol] as that word also appears most commonly in Quire 13.


