Low Level Word Structure

In two earlier articles I proposed that the characters [y] and [a] are equivalent, and that [y] is not expressed in a middle position after [e]. (If you have not read those articles, it is best to read them before this one.) Taken together, these findings mean that [y] and its different expressions occur in most of the same environments as [o], and that when one is swapped for the other the result is often a valid word. No other character can be swapped with either of them and reliably result in valid words. Because of this, [y] and [o] must have some strong relationship.

Unlike the relationship between [y] and [a], where the two characters occur in different environments which never overlap, the characters [y] and [o] occur in environments which overlap significantly. The writer of the Voynich text did not have to choose between [y] and [a], because the environment dictates which one is needed and the other is not possible. But they must have had a reason for choosing [y] or [o], because either one would have made a valid word. So these two characters must have contrasted in the mind of the writer (they made the difference between one word and another), but they were obviously still similar enough to go in the same environments.

Thus [y] and [o] can be considered the same class of character: they work in the same way but do not have the same value. And we must immediately note that almost all words contain at least one character, or reflex of a character, from this class. Given that most words contain one or more examples of these two characters, and that they must therefore be essential to the structure of words, I would like to label this class of characters primes.

In this article I would like to investigate the low level word structure using this newfound class of primes as a starting point. In this view the simplest word consists of one prime, and all the other characters are structured around it. More complex words have more than one prime, and the characters must relate to one prime or the other. Words can thus be broken up into sections, each of which contains a single prime to which all the other characters in that section relate.

It is the goal of this article to discover if this kind of analysis can properly model or describe the low level structure of Voynich words. If so, the typical section should be regular and predictable. The high level structure of a word — or how characters relate over the whole word — will be a later article.


Let us look at a few words so that the kind of analysis I am proposing will be clearer.

The word [daiin] contains only one prime [a], which is a reflex of [y], and thus the whole word is a single section. The body of the section is made up of the prime and everything before it, so in this case it is [dy] (writing the prime in its underlying form [y]). The section’s tail is everything which comes after the prime, so here [iin]. For another simple word, [chol], the [o] is the prime, so the body is [cho] and the tail is [l].

A more complex word is [qokaiin]. It has two primes [o] and [y] (here expressed as [a]), and thus two sections. For the purposes of breaking words into more than one section I will take everything before a prime as part of that prime’s section, until another prime is reached, and everything after a prime if it is the final one in a word. Thus all non-final sections consist of bodies, and only final sections have tails. So the word [qokaiin] has the sections [qo]—which is only a body—and [kaiin], which has a body of [ky] and a tail of [iin].

A yet more complex word would be [otedy]. Because of [y] deletion in a middle position, we should consider this word to have a prime between the [e] and [d]. We could mark this as [y], but for the purpose of this post I will mark it as a null with the symbol [Ø]. The word then has three primes and three sections. The first is [o], the second is [teØ] and the third is [dy]. All three are body only, as the last section has no tail.

We can quickly recap the structure of these words as laid out below. The + indicates the link between two sections, and — as the link between the body and tail of a single section.

[daiin] = [dy] — [iin]

[chol] = [cho] — [l]

[qokaiin] = [qo] + [ky] — [iin]

[otedy] = [o] + [teØ] + [dy]

With this breakdown of words we can already see that the bodies and tails of sections in different words have repeating structures. Both [daiin] and [otedy] contain the section body [dy], while [daiin] and [qokaiin] contain the section tail [iin]. This is encouraging, and it is hoped that the majority of words will be composed of a finite set of such parts which can be reduced to a simple pattern.
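The segmentation used in these examples can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a finished tool: it works on plain EVA strings, writes the reflex [a] as [y] in the body (as in the breakdowns above), and inserts [Ø] after any [e] run that is directly followed by a non-prime glyph. How a word-final [e] should be treated is not settled in the text, so the sketch leaves it unmarked. The names mark_null_primes and split_sections are illustrative only.

```python
NULL = "Ø"
PRIMES = {"o", "y", "a", NULL}


def mark_null_primes(word):
    """Insert Ø after an [e] run that is followed by a non-prime glyph.

    Assumption: a word-final [e] is left unmarked.
    """
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch == "e" and nxt is not None and nxt != "e" and nxt not in PRIMES:
            out.append(NULL)
    return "".join(out)


def split_sections(word):
    """Return (bodies, tail) for one word.

    Each body ends in a prime; the tail is whatever follows the last prime.
    A word with no prime comes back as ([], word).
    """
    bodies, current = [], ""
    for ch in mark_null_primes(word):
        if ch in PRIMES:
            # Write the reflex [a] in its underlying form [y].
            bodies.append(current + ("y" if ch == "a" else ch))
            current = ""
        else:
            current += ch
    return bodies, current


for w in ["daiin", "chol", "qokaiin", "otedy"]:
    print(w, split_sections(w))
```

Because every prime is a single character and none of the multi-character glyphs ([ch], [sh], [ckh] and so on) contains one, the split can be done character by character without tokenising the glyphs first.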

The rest of this article is concerned with showing which section bodies and tails are common and how they are structured. First comes the greater part, dealing with section bodies, then the smaller part, dealing with section tails. The following is based upon a list of all words in the Voynich text with ten or more occurrences. Although it will not be exhaustive, it should provide insight into the typical word structure.


By definition, the rightmost glyph in the body of a section is a prime, and must always be present: either [o], [y], or one of the two reflexes of [y], [a] and [Ø]. Both [o] and [y] may occur alone as the complete section, either as single glyph words or as the first section in a word. The occurrence of [y] or [o] as the only glyph in the first section of a word is very common.

All other glyphs in the body are optional. Words which do not meet the basic criterion of having at least one prime are short, typically one or two glyphs long. These words make up only a small number of the total words, and are not included in this analysis.

To the immediate left of the prime there may occur an [e] sequence: a string of [e] from one to three characters long. However, only [e] and [ee] are common. When the prime is [Ø] an [e] sequence must by definition occur, and it is the rightmost glyph in the section to be expressed. It is thus possible for it to constitute the whole body of a section, with neither prime nor any other glyph. If an [e] sequence occurs in a section it must be to the immediate left of the prime position and nowhere else, by definition.

To the left of an [e] sequence, or of the prime should an [e] sequence not occur, there may be either of the glyphs [ch, sh]. The glyph [ch] is more common than [sh], but it would seem that both are equally valid. Their occurrence can depend on the surrounding characters, and one of them must be present in some situations. More will be said about this below.

To the left of [ch, sh], any [e] sequence, and the prime, comes the widest selection of glyphs which may occur in the body of a section: [k, t, p, f, ckh, cth, cph, cfh, d, s, l, r]. These characters work in different ways, and can be grouped according to how they interact with the two possible glyphs to the right of them, which have already been discussed.

The first and simplest group is [ckh, cth, cph, cfh] which will not take [ch, sh] to their right, but will take an [e] sequence. The next is [k, t, d, s, l, r] which will all take any combination of [ch, sh] and an [e] sequence. The third is [p, f] which will take [ch, sh] without an [e] sequence, but will not take an [e] sequence without [ch, sh]. All glyphs readily take only a prime to their right with no other glyphs between.

The only glyph which can occur further to the left in most cases is either of [ch, sh], and only if one of the glyphs just listed is also present. However, the occurrence of this further [ch, sh] is complicated. The glyphs [ckh, cth, cfh, cph] take the glyphs [ch, sh] readily. The glyphs [k, t, f, p] will take them only in low numbers, and almost not at all if they have a [ch, sh] to their right. For the glyphs [d, s, l, r] the situation is quite complex. The glyph [d] takes [ch, sh] readily before it (though a further pattern noted below may account for this), and the same goes to a lesser extent for [s]. However, [l, r] only rarely take [ch, sh] before them, if at all.

The patterns set out above account for a great many permissible section bodies, at least among the common words. Sequences such as [keo, tchy, lo, dshe, ckheey] are all found with frequencies which suggest they are perfectly valid according to the rules governing Voynich word formation. Sequences such as [kdchy, tlo, cthcho, chrey, dfy] should not occur, and do not, but for the rare exception.
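The general body pattern described so far can be written down as a regular expression and checked against the example sequences. The following is a rough sketch rather than a definitive statement of the rules, and it makes several simplifying assumptions: a leading [ch, sh] is allowed only before [ckh, cth, cph, cfh] and [d, s] (so "low numbers" and "rarely" are treated as "not allowed"), [p, f] are allowed an [e] sequence only when [ch, sh] intervenes, and a body with no written prime is accepted only if it ends in an [e] sequence (the null prime [Ø]). The name is_model_body is illustrative.

```python
import re

CH = r"(?:ch|sh)"
E = r"e{1,3}"  # an [e] sequence of one to three glyphs

# Everything that may stand before the prime, one alternative per glyph group.
STEM = "|".join([
    rf"{CH}?(?:ckh|cth|cph|cfh)(?:{E})?",  # no [ch, sh] to the right
    rf"{CH}?[ds]{CH}?(?:{E})?",            # leading [ch, sh] assumed allowed
    rf"[ktlr]{CH}?(?:{E})?",               # leading [ch, sh] assumed excluded
    rf"[pf](?:{CH}(?:{E})?)?",             # [e] only after [ch, sh]
    rf"{CH}?(?:{E})?",                     # no glyph from the wide selection
])
BODY_RE = re.compile(rf"(?P<stem>{STEM})(?P<prime>[oyaØ]?)")


def is_model_body(body):
    """True if the body fits the simplified general pattern."""
    m = BODY_RE.fullmatch(body)
    if m is None:
        return False
    stem, prime = m.group("stem"), m.group("prime")
    if prime in ("", "Ø"):
        # A null prime is only possible straight after an [e] sequence.
        return stem.endswith("e")
    return True


for body in ["keo", "tchy", "lo", "dshe", "ckheey"]:
    print(body, is_model_body(body))   # all True
for body in ["kdchy", "tlo", "cthcho", "chrey", "dfy"]:
    print(body, is_model_body(body))   # all False
```

The sketch covers only the general pattern. Bodies that depend on glyph-specific behaviour fall outside it and would need extra alternatives.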

However, there are three more parts to the general pattern of section bodies which apply only to particular glyphs.

  1. The glyph [d] may occur in the place of an [e] sequence so long as [ch, sh] is present to the left ([ldy] also occurs but that may relate to yet another pattern below). Section bodies such as [chdy, shdy, kchdy, tchdy, pchdy, lchdy] all occur, though none with the prime [o]. The fact that [dy] regularly appears as the final section in a word while [do] does not, makes this pattern rather curious. It seems as though a different explanation should be sought.
  2. The next particular pattern is that [l] may occur before some characters at the beginning of a section body. The most common is [k], though [t] and [d] occur in smaller numbers. Indeed, the sequence [lk] is so common, and with no other explanation or clear generalization, that it might be considered a specific case. I believe it could be evidence that [lk] has a particular value as a digraph.
  3. The third pattern is well known and needs no real explanation. Any body which contains only the prime [o] and is the first section of a word can add [q] to the left and no further glyphs. This makes it always the first glyph of any word it occurs in, and accounts for almost all occurrences of [q].
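Two of these particular patterns are simple enough to check mechanically. The sketch below is only illustrative and rests on assumptions: the first pattern is restricted to the section bodies actually listed ([chdy, shdy, kchdy, tchdy, pchdy, lchdy]), and the [q] check treats a word as regular when its only [q] is word-initial and directly followed by [o]. The function names are hypothetical, and the failing inputs in the usage lines are constructed counter-examples, not attested words.

```python
import re

# Pattern 1: [d] in place of an [e] sequence, after [ch, sh], with prime [y].
# Assumption: the optional first glyph is limited to those in the listed examples.
CH_D_BODY = re.compile(r"[ktpl]?(?:ch|sh)dy")


def is_ch_d_body(body):
    """True for bodies such as [chdy] or [kchdy]."""
    return CH_D_BODY.fullmatch(body) is not None


def q_is_regular(word):
    """True if the word has no [q], or a single word-initial [q] followed by [o]."""
    if "q" not in word:
        return True
    return word.startswith("qo") and word.count("q") == 1


print(is_ch_d_body("lchdy"), is_ch_d_body("chdo"))
print(q_is_regular("qokaiin"), q_is_regular("oqo"))
```

Run over a word list, q_is_regular would pick out the occurrences of [q] that the third pattern does not account for.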

The section tails are much easier to describe and generally much shorter. Unlike section bodies, however, which are not determined in any way by their prime, tails are significantly constrained by the prime which comes before them. For this reason the prime must be considered, even though it is not counted as part of the tail. Because [y] is seldom found before a tail, and [a] is conditioned by the following glyph, three primes must be distinguished: [a], [o], and [Ø].

Many section tails are simple and consist of a single glyph. The prime [Ø] may only take [d, s] and nothing else. For [a] and [o] the tail is often one glyph: for [o] it can be [d, s, l, r, m] and for [a] it can be [l, r, m, n]. However, both can also take [i] sequences, which are one to three occurrences of the glyph [i], always followed by one of [l, r, m, n] (although [o] is more limited in which combinations it can take).

The only rarer or atypical part of the tail pattern is the occasional occurrence of two glyphs (that is, apart from [i]). The only one which occurs with any frequency, however, is [ls].
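The tail rules can be summarised as one small pattern per prime. The following is a minimal sketch under stated assumptions: the prime is passed as written (so [a] rather than [y]), an empty tail is accepted after any prime, [y] is assumed to take no tail at all, the further limits on [i] sequences after [o] are not encoded because they are not spelled out, and the two-glyph tail [ls] is left out because it is not stated which prime it follows. The name is_model_tail is illustrative.

```python
import re

# One pattern per prime; each also matches the empty tail.
TAIL_PATTERNS = {
    "Ø": re.compile(r"[ds]?"),
    "o": re.compile(r"(?:[dslrm]|i{1,3}[lrmn])?"),
    "a": re.compile(r"(?:[lrmn]|i{1,3}[lrmn])?"),
    "y": re.compile(r""),  # assumption: [y] takes no tail
}


def is_model_tail(prime, tail):
    """True if the tail fits the simplified pattern for the given prime."""
    pattern = TAIL_PATTERNS.get(prime)
    return pattern is not None and pattern.fullmatch(tail) is not None


print(is_model_tail("a", "iin"))  # tail of [daiin]
print(is_model_tail("o", "l"))    # tail of [chol]
print(is_model_tail("Ø", "l"))    # [Ø] takes only [d, s]
```

Keeping the patterns in a table keyed by prime makes it easy to tighten one prime's rules, for instance the [i] sequences after [o], without touching the others.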


I have sought to show in the table below what has been said in the text. It should make the overall structure of a Voynich word section clearer. Bear in mind that this is a generalized model: it does not explain all possible sections, nor are all the sections it suggests common. Think of it as a fence around the likeliest section structures, with those outside being unlikely or impossible.
Generalized Low Level Structure

The model pattern of Voynich words I have laid out is tentative and subject to change. I present it here as a first attempt, with the hope that it provides a base to build upon. I will soon look at the high level structure of Voynich words, and I expect that afterwards I will want to revisit the model.

There is also the question of how well the model pattern fits less common words. Exceptions may prove useful in refining the model, or even disproving its validity.


