Last–First Combinations and Transformation Theory

In May I proposed the Transformation Theory. The theory stated that words are transformed from their original shape through the influence of neighbouring characters and other aspects of the text. I presented a small piece of research regarding words beginning with [o], which provided very basic support for the theory. I now wish to present a more substantial look at the same words.

The statistics used in this post were provided by Marco Ponzi, whom I must thank. He also provided a great deal of discussion, and some suggestions, which find their way into this post. However, I do not wish to suggest he agrees with this post, in whole or in part (unless he otherwise states).

Note: a transcription with a ‘.’ denotes a space between words. So a transcription such as [n.o] means a sequence where a word ends [n] and the following word begins [o].

The following research is built on the idea of last–first combinations. Characters found at the end and beginning of words can be broadly classified into two groups, strong [k, t, r, n, s] and weak [y, o, a, ch, sh], according to the characters they prefer to neighbour. A word ending with a strong character prefers to be followed by a word beginning with a weak character, and vice versa. Strong–strong and weak–weak combinations are typically less common than strong–weak and weak–strong.
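As a rough illustration, the strong and weak grouping described above could be sketched in Python. The function names are hypothetical, and only the characters listed in the paragraph above are classified; anything else is returned as unclassified.

```python
# Character groups as listed in the text above. [ch] and [sh] are
# two-letter EVA characters, so they are checked before single letters.
STRONG = {"k", "t", "r", "n", "s"}
WEAK = {"y", "o", "a", "ch", "sh"}


def last_char(word):
    """Return the final character of a word, treating ch/sh as one unit."""
    return word[-2:] if word[-2:] in ("ch", "sh") else word[-1:]


def first_char(word):
    """Return the initial character of a word, treating ch/sh as one unit."""
    return word[:2] if word[:2] in ("ch", "sh") else word[:1]


def strength(char):
    """Classify a character as 'strong', 'weak', or None if unlisted."""
    if char in STRONG:
        return "strong"
    if char in WEAK:
        return "weak"
    return None


def combination(previous_word, next_word):
    """Describe the last-first combination across a word boundary."""
    return (strength(last_char(previous_word)), strength(first_char(next_word)))


print(combination("daiin", "okaiin"))  # a strong-weak boundary, [n.o]
print(combination("dar", "kedy"))      # a strong-strong boundary, [r.k]
```

Under the preferences described above, the first boundary would be a favoured combination and the second a disfavoured one.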

When last–first combinations show that certain sequences, such as [n.k] and [y.t], are common or uncommon, this is based on an aggregate of thousands of different words sharing only the last or first letters. In order for last–first combinations to become part of the Transformation Theory we need to show that the combinations are true with regard to individual words. We can do this by altering the shape of a word and measuring the changes in the last–first sequences it is part of.

For the research we took pairs of words which differed only by the presence or absence of an initial [o]. This was chosen because it is a very common variation and many word pairs exist with and without initial [o], and because [o] is a weak character usually followed by typical strong characters such as [k, t, r] (and [l], which acts strong at the beginning of a word). Thus, we should find numerous word pairs which begin [ot, ok, ol, or] in one form and [t, k, l, r] in the other. The theory states that by removing the initial [o] and altering the first character from weak to strong, but keeping the rest of the word the same, we should see a significant difference in the last character of the preceding words.

Twenty–one pairs which met the criteria were chosen on the basis of frequency. All pairs had at least 77 tokens in all, and each word of a pair had at least 31 tokens. The twenty–one pairs were: [or, r], [okaiin, kaiin], [okeedy, keedy], [okar, kar], [otol, tol], [okain, kain], [okedy, kedy], [okeey, keey], [otar, tar], [otedy, tedy], [otaiin, taiin], [olkeedy, lkeedy], [olkeey, lkeey], [oraiin, raiin], [olchedy, lchedy], [okol, kol], [opchedy, pchedy], [olkain, lkain], [otchedy, tchedy], [olor, lor], and [olkaiin, lkaiin]. It should be noted that several features, such as different word endings, the presence of [e] or [i] within the words, and overall length, were represented multiple times in different combinations. This allowed us to check if the last character of the preceding word had any influence beyond the variable initial [o].
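The selection criteria can be sketched in Python as follows. This is only an illustration and not the procedure actually used: the function name and the token counts are hypothetical, and the sketch does not restrict which character follows the initial [o].

```python
from collections import Counter

MIN_PAIR_TOTAL = 77   # tokens across both forms, per the criteria above
MIN_EACH_FORM = 31    # tokens for each form of the pair


def find_o_pairs(counts):
    """Return (with_o, without_o) pairs that meet both thresholds."""
    pairs = []
    for word, n_with in counts.items():
        if not word.startswith("o") or len(word) < 2:
            continue
        bare = word[1:]
        n_without = counts.get(bare, 0)
        if (min(n_with, n_without) >= MIN_EACH_FORM
                and n_with + n_without >= MIN_PAIR_TOTAL):
            pairs.append((word, bare))
    # Most frequent pairs first.
    pairs.sort(key=lambda p: counts[p[0]] + counts[p[1]], reverse=True)
    return pairs


# Hypothetical token counts, for illustration only.
example_counts = Counter(
    {"otar": 140, "tar": 40, "okol": 80, "kol": 20, "or": 360, "r": 70}
)
print(find_o_pairs(example_counts))
```

With these made-up counts the pair [okol, kol] is rejected because one of its forms falls below the threshold of 31 tokens.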


The most extreme change was observed following words ending with [n]. In all twenty–one word pairs the word beginning [o] was more common after [n] than that without the initial [o]. In many pairs there were no examples of the latter, despite words ending [n] occurring commonly before the [o] version of the pair. For example, [otar] has 41 tokens (30% of all its occurrences) in the sequence [n.otar], yet there are zero for [tar]. Likewise, [okain] has 43 tokens (30%) of [n.okain] and zero for [kain]. A full 13 of the 21 pairs showed this situation.
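A minimal sketch of how such percentages might be computed is given below. It assumes the text is available as a simple list of words in reading order; the function name and the sample sequence are hypothetical, and line breaks are ignored, which the real statistics may well have handled differently.

```python
from collections import Counter


def preceding_final_shares(tokens, target):
    """Percentage of `target` occurrences preceded by each final character.

    `tokens` is a list of words in reading order. Occurrences with no
    preceding word are skipped, so the percentages are of occurrences
    that have a preceding word.
    """
    finals = Counter()
    for previous, current in zip(tokens, tokens[1:]):
        if current == target:
            finals[previous[-1]] += 1
    total = sum(finals.values())
    if total == 0:
        return {}
    return {char: 100 * n / total for char, n in finals.items()}


# Hypothetical word sequence, for illustration only.
sample = "daiin otar chedy tar aiin otar shey otar ol tar".split()
print(preceding_final_shares(sample, "otar"))
print(preceding_final_shares(sample, "tar"))
```

Comparing the two dictionaries for each pair gives the kind of contrast reported here, such as a large share after [n] for one form and none for the other.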

Here are the percentages after [n] for each word pair:

Without [o]   %    With [o]   %
kain          0    okain      30
tar           0    otar       30
kaiin         0    okaiin     29
taiin         0    otaiin     27
lkeedy        0    olkeedy    26
lkain         0    olkain     25
keedy         0    okeedy     23
raiin         0    oraiin     21
tedy          0    otedy      20
pchedy        0    opchedy    20
keey          0    okeey      19
tol           0    otol       12
kedy          0    okedy      11
lchedy        1    olchedy    5
lkeey         3    olkeey     25
kol           3    okol       12
kar           4    okar       27
lkaiin        4    olkaiin    23
lor           5    olor       19
tchedy        6    otchedy    21
r             7    or         16

Words following [r] showed the same bias toward the version with initial [o], but without such extreme statistics. The sequence [r.o] was always the more common: sometimes more than twice as common, though sometimes the difference was not significant.

Here are the percentages after [r] for each word pair:

Without [o]   %    With [o]   %
lkaiin        2    olkaiin    39
r             6    or         28
kedy          2    okedy      19
lchedy        1    olchedy    18
kain          4    okain      20
taiin         0    otaiin     15
keedy         2    okeedy     17
pchedy        3    opchedy    18
lkeedy        0    olkeedy    14
raiin         5    oraiin     18
tol           4    otol       17
lkeey         5    olkeey     18
tedy          0    otedy      12
tchedy        0    otchedy    12
lkain         11   olkain     19
kol           11   okol       18
kar           10   okar       17
tar           14   otar       20
kaiin         8    okaiin     14
lor           7    olor       10
keey          7    okeey      9

Where the word pairs followed words ending [o] with any frequency (which was only in about 8 of 21 pairs) there was a clear preference for the form without initial [o]. Some of these could be misreadings. For example, 28% of [r] occurred after words ending [o], but the sequence [o.r] might simply be [or] with a misplaced space. However, it is telling that only 3% of [or] occurred in the sequence [o.or].

The most difficult to understand result is that after words ending [y], the most common word ending in the text as a whole. With [y] and [o] both being weak characters we would expect a bias against the sequence [y.o]. However, even though many pairs did show a small avoidance of this, the differences were small and some pairs showed the opposite preference.

The only set of words to consistently show a strong preference one way or the other after [y] were those where the characters following [o] were either [lk] or [lch]. Of these five pairs, four had a 2:1 preference for avoiding [y.o].

At the suggestion of Marco we broke the words ending [y] down into two groups: those ending [dy] (a very common ending) and those ending [y] but not [dy]. This breakdown proved to be interesting. Two things were observed, though only one of them directly relates to Transformation Theory.

Firstly, the word pairs ending [dy] themselves were much more likely (across both versions) to come after other words ending [dy]. I won’t comment more on this, as it really deserves its own place.

Secondly, many pairs showed a preference for the form with initial [o] after words ending [dy], even where they did not show such a preference after words ending [y] but not [dy]. An example will make this clear. The pair [kaiin] and [okaiin] occur about equally after all words ending [y] (32% and 29% respectively). However, nine out of ten times [kaiin] comes after a word ending [y] but not [dy], whereas for [okaiin] the split is 54% for not [dy] and 46% for [dy].


It would seem that last–first combinations have some reality at the level of individual words. The sequences [n.o] and [r.o] are preferred much like the aggregate statistics would suggest, and sequences such as [n.k] and [r.t] are often avoided. So strong–strong combinations are clearly not preferred.

The results for weak–weak combinations are mixed. The phrase [o.o] is not preferred where it exists. The [y.o] sequence has a tendency to be avoided only where the preceding word doesn’t end [dy] or the following word begins [olk, olch]. Where the preceding word does end [dy], then the sequence [dy.o] may be just as common, if not more so. It seems as though [dy] is, for some reason, ignored by the rules of last–first combinations.

Are the word pairs real?

It might be objected that the pairs of words used in this piece of research aren’t real pairs. That’s certainly a reasonable objection, and until we can read the text we cannot, sadly, prove it positively one way or the other. We can only definitely say that they are orthographic pairs, being spelt the same way other than the initial [o] character.

However, we can show that many pairs have similar distributions within the text. This is what we would expect were they true pairs, appearing in the same topics but in varying forms. Using Pearson’s correlation coefficient, where a value of +1 is total positive correlation, -1 total negative correlation, and 0 no correlation, 11 of the 21 pairs score .50 or better. (The worst performing pair is [olkeedy, lkeedy] with a value of .05. However, they are found almost exclusively in quires 13 and 20, just not in the same folios.)
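For readers unfamiliar with the measure, here is a small Python sketch of Pearson’s correlation coefficient applied to the counts of two word forms across sections of text. The counts are hypothetical, and the unit over which the real distributions were compared (page, folio, or something else) is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical counts of the two forms in the same five sections.
with_o = [4, 0, 7, 1, 9]
without_o = [2, 1, 5, 0, 6]
print(round(pearson(with_o, without_o), 2))
```

Two forms which rise and fall together across sections score close to +1, while forms found in the same quires but on different folios can score near 0 if the sections are counted by folio.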


I’m wary of putting too much weight on a single set of figures for proving a theory. But it certainly looks reasonable to suggest that the last–first combinations seen in aggregate do operate at the level of individual words. There’s nothing obviously against the last–first combinations, and even the worst performing, the weak–weak sequence [y.o], seems to be complicated by the treatment of the word ending [dy].

Naturally, further experiments, taking the same question from different angles, would add to the debate. However, the stark preference for the [n.o] sequence shows that, at least in some places, there’s little more that can be said. It would seem that, if a variant with initial [o] exists, then it will occur after words ending [n].

Taking this one phrase, [n.o], which is the strongest individual last–first combination we have seen, what is the consequence for Transformation Theory? The theory would state that one or both of the words in such a phrase have been altered under the influence of their neighbour. The variant beginning [o] occurs after a word ending [n] specifically because the [n] causes the [o] to be added (or the [o] causes the [n], but there are reasons this is unlikely). The context of one word thus transforms the other.

Were we to take this individual transformation as definitely true then we could use the knowledge to ‘undo’ the transformation and restore the original, underlying, text. For example, in the phrase [aiin otedy] the second word might have originally been spelt [tedy], only taking the initial [o] due to the influence of [aiin]. The original text might have been [aiin tedy].
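Purely as an illustration of what ‘undoing’ might look like, and taking the [n.o] transformation as true for the sake of argument, a sketch in Python could be as follows. The function name and the list of known [o]-less forms are hypothetical.

```python
def undo_initial_o(tokens, bare_forms):
    """Strip initial [o] from a word that follows a word ending [n].

    Only words whose [o]-less form is listed in `bare_forms` are changed,
    so words with no known variant are left alone.
    """
    restored = []
    previous = None
    for word in tokens:
        bare = word[1:]
        if (previous is not None and previous.endswith("n")
                and word.startswith("o") and bare in bare_forms):
            restored.append(bare)
        else:
            restored.append(word)
        previous = word  # compare against the text as written
    return restored


print(undo_initial_o(["aiin", "otedy"], {"tedy"}))  # ['aiin', 'tedy']
```

The restriction to known [o]-less forms reflects the point that a word beginning [o] may also be a word in its own right.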

The undoing of word transformations would, if repeated throughout the manuscript, have the likely effect of making it more regular: a smaller vocabulary, with more repeated phrases, and potentially more obvious grammar. The research goal for Transformation Theory is to gather further evidence for transformations and refine our understanding of them. Last–first combinations are one part of this goal.

I realize that this post doesn’t include the full statistics, only my impressionistic assessment of them, so I’m going to ask Marco’s permission to post them as a separate file.


Some Observations on Bench Gallows (2)

Further to my last post, I wanted to add more observations about bench gallows and the way they work. I also have an embryonic hypothesis to share at the end.

Although the position and function of bench gallows within words is very similar to other gallows, the relative proportions are often very different. So while a string such as [qoke] is common, occurring nearly 1,400 times, the string [qockh] occurs 70 times, or only one twentieth as often. Given that [k] is about ten times more common than [ckh], [qockh] is half as common as we might expect.
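The arithmetic behind ‘half as common as we might expect’ can be laid out in a few lines of Python, using the rounded figures quoted above. The variable names are only for illustration.

```python
qoke_count = 1400      # approximate count of [qoke], from the text
qockh_count = 70       # count of [qockh], from the text
k_to_ckh_ratio = 10    # [k] is about ten times as common as [ckh]

# If [ckh] followed [qo] in proportion to its overall frequency,
# we would expect about a tenth as many [qockh] as [qoke].
expected_qockh = qoke_count / k_to_ckh_ratio

print(expected_qockh)                # 140.0
print(qockh_count / expected_qockh)  # 0.5
print(qoke_count / qockh_count)      # 20.0
```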

We should thus not expect that what is true for gallows is always true for bench gallows. Investigation into the differences could be insightful, and the ‘neighbourhood’ characters of bench gallows will be the theme of this post.

What Comes Before

I used to make some basic stats for the bench gallows [ckh, cth] and their gallows counterparts [k, t]. In each case different environments were put before the gallows and the counts converted into percentages for the total occurrences of these characters.

               [k]   [t]   [ckh]   [cth]
Word Start     13    17    22      53
Start [oG]     28    42    4       5
Start [qoG]    35    19    8       3
Start [yG]     7     9     1       0
Start [WG]     15    12    57      31

(In the above table, G stands for any kind of gallows, and W for a weak string.)

Some differences are obvious, others less so. We can see straight away that most words with a bench gallows either begin with that character or with a weak string. For both bench gallows about 80% of all occurrences are from these two environments, which account for just under 30% of [k, t]. However, [ckh] and [cth] themselves act differently in these two positions, which suggests something deeper is occurring.
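The combined figures quoted above can be checked directly from the table. The following Python sketch simply adds the ‘Word Start’ and ‘Start [WG]’ rows for each character; the dictionary layout and function name are my own and not part of the original statistics.

```python
# Percentages from the table above, by environment before the gallows.
before = {
    "Word Start":  {"k": 13, "t": 17, "ckh": 22, "cth": 53},
    "Start [oG]":  {"k": 28, "t": 42, "ckh": 4,  "cth": 5},
    "Start [qoG]": {"k": 35, "t": 19, "ckh": 8,  "cth": 3},
    "Start [yG]":  {"k": 7,  "t": 9,  "ckh": 1,  "cth": 0},
    "Start [WG]":  {"k": 15, "t": 12, "ckh": 57, "cth": 31},
}


def combined_share(char, environments):
    """Sum the percentages for one character over several environments."""
    return sum(before[env][char] for env in environments)


for char in ("k", "t", "ckh", "cth"):
    print(char, combined_share(char, ("Word Start", "Start [WG]")))
```

This gives 28 and 29 for [k, t], and 79 and 84 for [ckh, cth], matching the ‘just under 30%’ and ‘about 80%’ in the text.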

The bench gallows are both less common after a word–initial [o] or [qo], though with enough occurrences to show that such a string is valid. There’s also a difference between [ckh] and [cth] which might mirror that between [k] and [t].

Most interesting is the almost complete lack of bench gallows after a word–initial [y]. The actual counts for both are in single figures. We might have expected somewhere between 50 and 100 for each were they as common as for [k, t].

What Comes After

Here is the same table, with percentages, for different environments occurring after the gallows.

                   [k]   [t]   [ckh]   [cth]
Before [o]         9     14    11      24
Before [y]         8     8     50      39
Before [a]         30    27    4       8
Before [ch, sh]    13    20    0       0
Before [e]         38    30    27      21

Again, there are some stark differences. Firstly, as is well known, benches simply don’t come after bench gallows. The percentages are zero, and the actual counts are three each.

More interesting is the crossover between [y] and [a]. Gallows are followed by a moderate amount of [y] but plenty of [a], whereas bench gallows are followed by lots of [y] and only a moderate to low amount of [a]. The cause must be a lack of words ending [l, r, m, n]. Though I need to do more research on this point, it would seem that the string [iin] is very uncommon in bench gallows words.

Gallows before [o] is ambiguous with no clear pattern. It seems as though [k, ckh] and [t, cth] is a more natural split, so something else is happening and may be unrelated to the nature of bench gallows. The same may be true for gallows before [e], though splitting the figures down for [e] and [ee] shows an interesting pattern I’ll save for another post.


There is good preliminary evidence that bench gallows don’t have the same environments as plain gallows. There are several environments which are utterly different in frequency.

The question is whether such environments are caused by bench gallows or are the cause of bench gallows. It is not easy to answer, though I think the latter is more likely.

Further investigation into the more interesting environmental differences may yield further clues concerning bench gallows. Certainly looking at weak strings before bench gallows, and the lack of [a] after them, seem the best bets.

I think there’s a potential hypothesis which could explain much of what we’re seeing, though it’s not without its flaws. It could be that a bench gallows is where a plain gallows has ‘captured’ adjacent characters in some way.

If the [h] of [ckh] were a ‘captured’ [e], that would explain 1) the lack of [ch, sh] following as [kech, kesh] are relatively rare, and 2) the lack of [a] as [eai] is also rare.

Quite why this might happen is a whole other question.

Some Observations on Bench Gallows (1)

Bench gallows are the characters transcribed in EVA as [ckh, cth, cfh, cph]. They derive their name from their appearance: they look like a gallows character [k, t, f, p] with a bench [ch] drawn through it.

Within the text bench gallows characters tend to act most similar to their gallows counterparts, occurring in the same ‘slot’ in the word structure. They are only 10% to 20% as common as their counterparts, however, and are more common in Currier A than B. A few pages have no occurrences of bench gallows, and many have substantial portions of text without one.

What are they?

One of the key questions about bench gallows is their nature. Although they look as though they are two characters combined, is that the case? We know that some characters share the same strokes. The character [i] is a single stroke also found in [l, r, n, m], and [e] is found in [o, y, d, s, ch, sh], while [a] has them both. It may be that the graphical similarity is superficial with no deeper link.

However, we can be fairly sure that the gallows part of the character is actually related to gallows. Firstly, as already mentioned, when we look at the word structure of Voynich words, we can see that bench gallows work in a similar way to gallows. Next, the characters [cfh, cph] often appear in the first lines of paragraphs just like [f, p]. Last, the four different bench gallows mirror exactly the four different gallows: not only strokes have been borrowed but a whole set of characters.

The bench part of the character is more uncertain. The strokes are simpler and there’s only a single character which has been borrowed, [ch], even though there are two benches, [ch, sh]. There’s also the fact that a bench gallows doesn’t act like a bench in the text. Yet there’s some evidence that this part of the character really is a bench.

Gallows characters are often followed by benches. Anywhere from 15% to 50% of gallows are followed by [ch, sh]. Yet for bench gallows that figure is effectively 0%. There are only 3 each of [ckhch] and [cthch], and 1 [cphch]. None of the other possible combinations exist, and it might as well be considered an invalid combination.

This exclusion of bench characters following a bench gallows is a positive sign that there is some relationship between the two. It should be noted, however, that the occurrence of benches before bench gallows, in strings such as [chckh], is perfectly valid. Indeed, it seems to be more common than for regular gallows, which will be discussed in another post.

Bench Gallows Variants

Although most bench gallows are of a simple type, several variants exist. One common variation is the extended bench gallows with an extra [e] stroke joined to the crossbar, transcribed as an additional [h]. About 30 [ckhh] are recorded in the transcription, along with 23 [cthh], 7 [cfhh], and 13 [cphh].

However, visual inspection of these reveals a large number of ambiguous readings. Although some are definitely correct, others are not and many are hard to judge. It is not easy to know whether a string should be [ckhh] or [ckhe], for example.

Despite this, they remain an interesting insight into the character, and question whether the bench part is a bench at all. For there are no, or almost no, examples of [cch] or [chh] in the text. Why can a bench gallows take an extra [e] stroke—even if rarely—when benches cannot? Surely we would expect to see extended benches in the same ratios as extended bench gallows?

Another common variant is the replacement of the initial [c] stroke with an [i] stroke. These variants are as common as the extended bench gallows: 33 [ikh], 26 [ith], 6 [ifh], and 8 [iph]. Once again, visual inspection reveals the possibility that at least some are misreadings or miswritings.

However, we can be sure that not only are many of these ‘i–bench gallows’ real, but that the [i] stroke really is what it looks like. The [i] stroke is the conditioning environment which causes the variation of [y] into [a], so we should expect that to happen before these characters. This is exactly what we see.

There are 13 [aikh], 14 [aith], 4 [aifh], and 6 [aiph]. Though the numbers are small they are significant ratios. We can also be sure that normal bench gallows don’t cause [a]: there is 1 [ackh], 2 [acth], and 2 [acph]. These are not only small numbers but tiny ratios at <1% of all bench gallows.
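To make the ratios concrete, here is a short Python calculation from the counts quoted above. It assumes that the [a]-initial strings are counted within the totals for the i–bench gallows given earlier, which seems likely but is an assumption on my part.

```python
# Counts quoted in the text: all i-bench gallows, and those preceded by [a].
i_bench = {"ikh": 33, "ith": 26, "ifh": 6, "iph": 8}
a_before = {"ikh": 13, "ith": 14, "ifh": 4, "iph": 6}

# Share of each i-bench gallows that follows [a], as a percentage.
share_after_a = {g: 100 * a_before[g] / i_bench[g] for g in i_bench}
overall = 100 * sum(a_before.values()) / sum(i_bench.values())

for gallows, pct in share_after_a.items():
    print(f"[a{gallows}] / [{gallows}]: {pct:.0f}%")
print(f"overall: {overall:.0f}%")
```

On these figures roughly half of all i–bench gallows follow [a], against under 1% for normal bench gallows.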

Moreover, like above with other variant gallows, these combinations don’t occur in benches alone. The character pair [ih] occurs only thrice, and [ci] not at all. So once again we see that bench gallows can be formed with something which is distinctly not a bench.


I am unsure of what this all means. I think that bench gallows are not simply gallows with extra strokes which are purely graphical. Those strokes are likely to be related to other characters in the script and it is thus some kind of ligature.

However, I don’t think bench gallows are strictly a ligature between a bench and a gallows. It would seem that characters are being linked together with a crossbar, like a bench, though maybe for a different reason. We can be sure that one of the characters in the ligature is a gallows, but the identity of the others is harder to understand.

I have some further thoughts about the way bench gallows are used in words, and I’ll put these in another post.

Initial [o] Transformation

In my last post I spoke about my Transformation Theory, and how I believe that the shape of words may be influenced by their surroundings. I speculated that, in light of First–Last Combinations, certain characters may be used to break up unwanted combinations. My example was that, in a phrase such as [dar kedy], [d k] are both ‘strong’ characters and a ‘weak’ character is inserted between them. One such character could be [o], which indeed comes at the beginning of many words.

In this post I would like to look further into this suggestion. We already know that certain word–end characters prefer to match up with certain word–start characters, but these statistics are very general. Though we can say that [r o] is more common than [r t], we can’t be sure that these two facts are related by a transformation. It could simply be that the phrase [or oraiin] is really common and [or tchedy] isn’t. We need to compare them to [or raiin] and [or otchedy] to get a better insight.

I took six pairs of words, one beginning with a strong character (in this case two [t], two [k], one [r], and one [l]) and the same word with an [o] added to the beginning. (I must be clear that by ‘same word’ I simply mean the same string of characters and I make no claim as to actual relatedness here.) So, for example, one pair was [tchey] and [otchey].

I then went through the manuscript and recorded the word which came before each instance of such words, dismissing those which had no word before them and those at the beginning of a line (we know that word statistics are different here). I noted the last character of the word which came before, and counted it as strong if it was [n, r, s], and also counted [d] as strong for the following words beginning [k, t].

So, for instance, [ar, otain, cheos] would always be strong, and [otey, sho, qokal] always weak. A word such as [qoked] would be strong for the [t, k] word pairs, but weak for the [l, r] pairs.
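The counting rule can be written out as a small Python function. This is a sketch of the rule as described, not the procedure actually followed (the counting was done by going through the manuscript), and the function name is hypothetical. It assumes that a word with initial [o] belongs to the [k, t] group whenever its [o]-less form begins [k] or [t].

```python
ALWAYS_STRONG = {"n", "r", "s"}


def counts_as_strong(previous_word, pair_word):
    """Decide whether the word before `pair_word` ends in a strong character.

    [n, r, s] are always strong. [d] is strong only when the word pair
    is one of the [k, t] pairs, judged on the form without initial [o].
    """
    last = previous_word[-1]
    if last in ALWAYS_STRONG:
        return True
    stem = pair_word[1:] if pair_word.startswith("o") else pair_word
    return last == "d" and stem[:1] in ("k", "t")


print(counts_as_strong("ar", "tchey"))       # True
print(counts_as_strong("otey", "kchey"))     # False
print(counts_as_strong("qoked", "otchedy"))  # True
print(counts_as_strong("qoked", "olol"))     # False
```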

Here are the results:

Word       Total   Strong No   Strong %age
tchey      19      2           10.5
otchey     27      9           33.3
tchedy     10      2           20.0
otchedy    30      13          43.3
kchey      17      4           23.5
okchey     26      14          53.8
kchedy     20      2           10.0
okchedy    23      12          52.2
lol        35      3           8.6
olol       15      10          66.7
raiin      73      3           4.1
oraiin     32      18          56.3

The strong percentages for words not beginning [o] range from 4% to 24%, for the words beginning [o] from 33% to 67%. These are quite wide spreads, but the two ranges do not overlap. Also, in each pair the word with [o] has at least double the strong percentage of the one without [o] at the start.
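These observations can be re-derived from the raw counts in the table. The Python below is just a check of the arithmetic; the data structure and names are mine.

```python
# (total, number after a strong ending) for each word, from the table above.
results = {
    "tchey": (19, 2), "otchey": (27, 9),
    "tchedy": (10, 2), "otchedy": (30, 13),
    "kchey": (17, 4), "okchey": (26, 14),
    "kchedy": (20, 2), "okchedy": (23, 12),
    "lol": (35, 3), "olol": (15, 10),
    "raiin": (73, 3), "oraiin": (32, 18),
}


def strong_pct(word):
    """Percentage of a word's occurrences that follow a strong ending."""
    total, strong = results[word]
    return 100 * strong / total


# The forms without [o] are those whose [o]-initial partner is also listed.
bare_words = [w for w in results if "o" + w in results]
without_o = [strong_pct(w) for w in bare_words]
with_o = [strong_pct("o" + w) for w in bare_words]

print(max(without_o) < min(with_o))                       # ranges do not overlap
print(all(b >= 2 * a for a, b in zip(without_o, with_o)))  # at least double
```

Both checks come out true for the six pairs in the table.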

Some of the words occur only a few times, which may make the statistics unreliable in places. But this is a necessary problem as the total number of any word is uncontrollable. Despite this, the pattern is consistent. Of course, running such statistics for more word pairs would provide greater evidence.

I hope that it is, however, enough for us to consider the hypothesis that in some instances a word–start [o] is used to break up a strong–strong sequence. Many instances of words beginning [o] obviously don’t do this but that needn’t worry us. A word such as [okedy] can be a word in its own right as well as a version of [kedy].

Of course, this raises the question as to which of the two versions is original. There are two pieces of evidence, but they contradict one another. The first is that words beginning [o] are more common as labels than in the text, and it is in labels we ought to find the lowest level of influence from surrounding words (obviously). The other evidence is that the first character of Voynich words contains less information than in a natural language, suggesting that it is less integral to the word.

No doubt the ideas of this post will be controversial, so thoughts are very welcome.

The Transformation Theory

I’m not a smart woman, and the distinction between hypothesis and theory often defeats me. It seems as though the dividing line is acceptance though I’m not sure exactly where that should be drawn. I often say ‘hypothesis’ when I’m writing to mean a little explanation of an observation, something which only matters for a small part of the text. Now I would like to use the word ‘theory’ to mean a big explanation. I want to put forward an explanation for the whole text. Not a solution, as I can’t read a single word, but an explanation of why Voynich text is like it is.

My theory brings together a lot of the things I’ve been researching over the last months. I have certainly hinted at it here and there and many of the details will already be known. I must admit that I’ve been using this theory to guide my views of the text for some time yet I’ve selfishly kept it to myself.

The Transformation Theory is that the words of the Voynich text have undergone transformations which altered their shape and the characteristics of the text, resulting in the ‘transformed text’ of the manuscript which is somewhat different from the ‘normal text’ as mentally composed by the author. This transformation was not a deliberate ploy to obscure or deceive, but part of a linguistic process as the composition moved from individual words to integrated text.

In the Voynich text, words which might have an ‘ideal’ spelling in isolation are altered in relation to their neighbouring words and their place in a line. This results in a word having multiple different spellings, but which are regular and predictable in a known environment. The cause of this is the interaction of different sounds across word boundaries and prosodic (broadly, speaking patterns) effects in a sentence.

This kind of transformation can be seen in multiple languages. French has liaison, where certain words gain a consonant when followed by a vowel: the ‘s’ in mes is ideally silent, but pronounced when followed by a vowel, such as in mes amis. Welsh has mutation, where the first sound of a word changes according to the last sound of the word before: diod is pronounced with an initial /d/ sound in isolation, but in the phrase fy niod it has an /n/ sound. Greek had movable nu where an /n/ sound was inserted to prevent a word ending in a vowel being adjacent to one beginning with a vowel. Sanskrit has a whole heap of such rules which affect words throughout the text, just as in the Voynich manuscript.

Evidence of transformation can be seen in several places. The most important, though little studied, is in first–last combinations: pairs of characters at the end and beginning of adjacent words have preferences. Transformation Theory states that one (or even both) of these characters may have been altered from the normal text according to a rule governing their interaction. So, for example, if a word ends [r] and the next word begins [k] in the normal text, a character such as [o] or [y] may be inserted to give the transformed text. Similar processes could affect a large portion of the text.

A process more limited in scope, but one where the evidence of transformation is much stronger, is line start transformations. The characters at the start of lines have different statistics to those elsewhere in a line. The word beginnings [sa, so, ych, ysh, dch, dsh] are much more common here. Even though these transformations have not been fully explored, it seems likely that words beginning [sa] are the result of [s] being added to the beginning of words starting [a]. The line end, with its preference for words ending [m], is similar though with an even narrower scope.

I would also like to add that words ending with [iin] potentially fit into the Transformation Theory. When we look at words containing [i] we can see some specific patterns: they occur mainly in one and two syllable words and in words which have fewer tokens. Given that [a] is a variant of [y] before certain characters, we have pairs, such as [dy] and [daiin], which effectively differ by the presence of an [i] sequence. There’s some evidence (I’ve never presented it, however) to suggest that words ending [y] and [aiin] have different distributions within the line. If so, such variant pairs may have [y] as their normal spelling, with [aiin] the result of a transformation to show some linguistic property (most likely prosodic).

Transformation Theory has the potential to explain multiple aspects of the Voynich text which are so far unexplained. The first and most important is the lack of word order. If words are subject to transformation from an underlying normal text, then they might be expressed in different ways depending on their environment. A phrase such as [oty qokeedy] might be the same as [taiin okeedy] in a different environment. Or a phrase like [okedy ar qotaiin], if it breaks over a line, could become [okedy sar qotaiin]. (Note that these are examples for exposition, and not necessarily true.)

The theory explains labelese—the different word statistics associated with label words—by putting such words effectively outside the transformation the text undergoes. If transformation works by altering words according to their environment then labels, which have no environment because they are usually isolated, should not be transformed. Labels are then the normal text. Of course, some labelese words are found in the main text, but there is no reason why every word in the transformed text must be altered, if the environment does not cause it.

Currier A and B, the different languages or dialects present in different parts of the manuscript, are partly the result of different transformation rules. If the writer changed the way words interact—or rather, changed how such interaction was shown—then the text would look significantly different. Though it is doubtful that this can explain the whole origin of the Currier languages, it at least takes the difference away from a substantial change in language and puts it more in control of the author.

Lastly, it is sometimes said that the Voynich contains an overabundance of unique words, more than would be expected. (I do not know if this is true, but I take it as a suggestion.) Transformation Theory would deal with such a problem very easily. If every word has the potential to alter in a number of ways depending on its environment, and the number of possible environments was large, then the outcome would be lots of unique words. There must be some limit on the number of variants each word could have, but it would not have to be great.

If the Transformation Theory is accepted then the path for future research is immediately obvious. We should interrogate the text to discover all the transformations which are present and their scope and scale. Every transformation discovered and properly understood would give us the power to undo it and reveal the underlying normal text. This normal text should show greater evidence of word order and grammatical constructions, and even more and longer repeated phrases.

Thus, it is hoped that the normal text would be amenable to ordinary efforts at decipherment.

The Transformation Theory is most useful to those researchers who favour or study the possibility of a linguistic solution to the Voynich text. It answers some common objections and gives a clear way forward. It is certainly, as I said above, the way in which I have viewed the text for some months now. Yet if any researcher can take something from the theory and build on it, that is to be welcomed.

The Distribution of [k] and [t]

The characters [k] and [t] are the two most common ‘gallows’ characters in the script. They occur about 10800 and 6900 times respectively, and are both spread widely throughout the manuscript. They act in similar (though not identical) ways within words, and it is typically possible to replace one with the other to result in a valid word.

It is reasonable to work with the assumption that [k] and [t], which look alike and work alike, have connected sound values. We should, however, always keep in mind that this may not be true. It is also quite normal for similar sounds in natural languages to occur at very different frequencies. For example, in English the phoneme /t/ is about 50% more common than /d/, and /p/ is over twice as common as /g/, yet all are stop consonants. We should therefore also assume that the difference in the number of [k] and [t] is no problem for a linguistic solution.

What I would like to do with this post is make the difference a problem. Although it sounds counter–intuitive, I want to start from the position or belief that [k] and [t] should occur equally throughout the text and then question why this is not so. It doesn’t matter if this position is wrong, only if it gives us some insight into these characters.

The most obvious place to look for answers is in the distinction between Currier A and Currier B languages, and it isn’t hard to find them. The table below shows the percentage split between [k] and [t] in the different parts of the manuscript and in the whole:

              [k]   [t]
Currier A      55    45
Currier B      65    35
Whole Text     62    38

It’s clear that the difference in frequency between [k] and [t] has more to do with Currier B than A, though even in the latter [k] is still more common. But it tells us nothing about the exact environments in which [k] is more common, and that requires a much more thorough search.

I went through a list of environments where gallows characters occur and in each case compared the frequency of [k] and [t] in those environments. In a wide range of environments the distribution of [k] and [t] was within a 55/45 percentage split. These include: at the start of a word; inside a word at the start of a line; second position in a word beginning [o] or [y]; at the start of a word and immediately followed by [e]; and before [o]. I doubt this list is exhaustive.

Other environments saw the frequency edge toward what is found in the manuscript as a whole. Through trying numerous variations on these I found a small number of environments where one gallows was heavily favoured over another. There are three that I want to note.

1. After [l]: this environment is already well known to favour [k]. The combination [lk] is ten times more common than [lt] with a 91/9 split. However, the two combinations occur no more than 1200 times combined, and so cannot be the source of the preponderance of [k].

2. Word and Line Start: when a gallows occurs at the beginning of a word at the start of a line it is eight times more likely to be [t], with an 89/11 split. Note that this excludes the first words of paragraphs and only counts the second and following lines. The numbers for this environment are very small, however, at 252 occurrences combined.

3. In the string [qo*e]: it amazed me to discover how specific this environment is. The string [qoke] is more common than the string [qote] by a 78/22 split. It would seem that both parts of this environment are needed: gallows in words beginning [o*e] have a 53/47 split, words beginning [qo*ch] have a 56/44 split, and before [e] still only 66/34.
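
The comparison of environments described above can be sketched in Python. This is only an illustration, not the procedure actually used for the figures in this post: the function name gallows_split, the regular expressions, and the tiny word list are all assumptions, and a real count would run over a full transcription.

```python
import re
from collections import Counter

def gallows_split(words, pattern):
    """Count [k] and [t] in the environment described by `pattern`.

    `pattern` is a regular expression with exactly one capturing group,
    which must match the gallows character ([k] or [t]).
    Returns (k_count, t_count, k_percent); k_percent is None if nothing matched.
    """
    regex = re.compile(pattern)
    counts = Counter()
    for word in words:
        for match in regex.finditer(word):
            counts[match.group(1)] += 1
    k, t = counts["k"], counts["t"]
    total = k + t
    k_percent = round(100 * k / total) if total else None
    return k, t, k_percent

# Hypothetical environments, written as regular expressions over transcribed words.
ENVIRONMENTS = {
    "after [l]": r"l([kt])",
    "in [qo*e]": r"^qo([kt])e",
    "word start": r"^([kt])",
}

# Toy sample for illustration only; these are not real manuscript counts.
sample = ["qokeey", "qokedy", "qotedy", "olkeedy", "kedy", "tedy", "qokain"]

for name, pattern in ENVIRONMENTS.items():
    print(name, gallows_split(sample, pattern))
```

Environment 2 cannot be tested this way from a bare word list, because it depends on where each word sits in its line; that would need a transcription which keeps line positions.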

Now, here’s the interesting bit: environments 1 and 3 are much more common in Currier B than Currier A. They are similarly biased in their [k/t] split in both languages, but because they occur more often in Currier B, that language shows a greater occurrence of [k]. We don’t, however, know why they occur more in one language than the other.

Yet environment 2 has its own interest: why should [k] at line start be so much less common than [t]? It is another line pattern like so many we have seen before, such as [so, sa, dch, dsh, ych, ysh]. In those cases we could guess that the original words have had a letter added to the beginning. While adding an initial character does tend to explain many Grove words, environment 2 explicitly excludes the first words in paragraphs.

We could propose that words beginning [k] have a character added, preventing the gallows from being initial, yet none of the words beginning [ok], [qok], or [yk] are significantly more common at line start than their ratio throughout the text would suggest. We could also propose that words beginning [k] are moved away from the line start, but as with Grove words such an explanation is unbelievable.

Another, more radical, option is that [k] can transform into [t] when it occurs word initially at the line start. Such a suggestion obviously needs a lot more evidence to back it up, and raises some startling questions about the relationships of the gallows characters. But there’s a neat link between this speculative answer for environment 2 and the problem of environments 1 and 3.

If we propose that some kind of transformation between [k] and [t] is possible, and we see that [k] is much more common in environments such as [lk] and [qoke], might these, then, be conditioning environments for such a transformation in the reverse direction? The numbers seem to work.

After [l] there are 981 [k] and 88 [t]. Over 95% of these occur in Currier B. In the string [qo*e] there are 1,387 [k] and 354 [t]. Nearly 90% of these occur in Currier B. Together, then, we can say that these two account for maybe 2,000 occurrences of [k] which might not otherwise happen. Thus if 2,000 occurrences of [k] began as [k] and [t] in the ratio of 55/45, there were originally 1,100 [k] and 900 [t].

If we subtract 900 [k] in Currier B and add 900 [t] we end up with 6,556 [k] and 4,871 [t]. This is a ratio of 57/43, which is pretty near to the ratio for Currier A. Given that we may have missed some less obvious environments where [k] outnumbers [t], we can guess the ratios for [k] and [t] in Currier A and B can be brought into alignment.
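
The arithmetic of the last two paragraphs can be checked with a short sketch. The Currier B totals used here (7,456 [k] and 3,971 [t]) are not quoted anywhere above; they are back-calculated from the adjusted figures by reversing the 900-token shift, so they should be treated as an assumption. The variable names are only illustrative.

```python
# Tokens of [k] attributed to the two biased environments (rounded estimate).
affected = 2000

# Assumed underlying split of 55/45, taken from the unbiased environments.
original_k = affected * 55 // 100      # tokens that were [k] all along
original_t = affected - original_k     # tokens assumed to have started as [t]

# Currier B totals, back-calculated from the adjusted figures in the text
# (6,556 + 900 and 4,871 - 900); an assumption, not a quoted count.
currier_b_k = 7456
currier_b_t = 3971

# Undo the proposed transformation: move the shifted tokens back to [t].
adjusted_k = currier_b_k - original_t
adjusted_t = currier_b_t + original_t

def percent_split(k, t):
    """Return the [k]/[t] split as whole percentages."""
    k_pct = round(100 * k / (k + t))
    return k_pct, 100 - k_pct

print(percent_split(currier_b_k, currier_b_t))  # observed Currier B split
print(percent_split(adjusted_k, adjusted_t))    # split after undoing the shift
```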

We can also state an hypothesis: the changes in the relative frequency of [k] and [t] between Currier A and B are linked to word environments which are mainly found in Currier B. The actual, underlying distribution of [k] and [t] is therefore consistent throughout the text.

(I apologize for the general roughness of this post. I think there is a lot more to be said on this than I can explain.)

(Some parts of this post were developed after discussion with Marco Ponzi, whom I should thank for his insights and thoughtfulness on all things Voynich. Though I make no claim as to whether he thinks my hypothesis is brilliant or bonkers.)

Weak Strings and [l, r]

Following a discussion with René Zandbergen on the similarity between [l] and [r], I thought it best to share some statistics I’ve had hanging around for a while. I made them when looking into another hypothesis, which I really should share as well some day. The statistics relate to words which are ‘weak strings’ followed by [l, r].

(Weak strings are my name for the very common combination of a bench [ch, sh], followed by zero to two [e], and ending with [o, y/a].)

The statistics show every possible combination of weak strings, with an additional ending of [r] or [l], and the number of tokens. Here are the two tables, with words ending [r] on the left and words ending [l] on the right:

chor    220   shor    97      chol    397   shol    187
cheor   100   sheor   51      cheol   173   sheol   114
cheeor   14   sheeor   9      cheeol    9   sheeol   14
char     72   shar    34      chal     48   shal     15
chear    51   shear   21      cheal    30   sheal    21
cheear    1   sheear   2      cheeal    2   sheeal    1

The statistics are interesting because they show that the two sets of words share a common pattern. There are three possible factors by which to alter the weak string: replace [ch] with [sh], vary the number of [e] from none to two, and change [o] for [y] (here [y] is expressed as [a], as is to be expected).

A change in each one of these factors produces the same kind of frequency change, regardless of whether the word ends [r] or [l]. So, in all cases the [o] form of the word is more common than the [y] form. Likewise, in all but two cases the [ch] form is more common than the [sh] form (both exceptions are minor). Lastly, increasing the number of [e] makes the word less common, with one exception.
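
These three claims can be checked mechanically against the tables above. The sketch below is only one way to do it: the helper names are hypothetical, and "less common" is read strictly, comparing each word with the form that has one more [e].

```python
# Token counts copied from the tables above.
COUNTS = {
    "chor": 220, "shor": 97, "chol": 397, "shol": 187,
    "cheor": 100, "sheor": 51, "cheol": 173, "sheol": 114,
    "cheeor": 14, "sheeor": 9, "cheeol": 9, "sheeol": 14,
    "char": 72, "shar": 34, "chal": 48, "shal": 15,
    "chear": 51, "shear": 21, "cheal": 30, "sheal": 21,
    "cheear": 1, "sheear": 2, "cheeal": 2, "sheeal": 1,
}

def word(bench, e_count, vowel, ending):
    """Build a word from its four parts, e.g. ('ch', 1, 'o', 'r') -> 'cheor'."""
    return bench + "e" * e_count + vowel + ending

def exceptions():
    """List the word pairs that break each of the three frequency patterns."""
    found = {"o > a": [], "ch > sh": [], "fewer e > more e": []}
    for ending in "rl":
        for e_count in range(3):
            for bench in ("ch", "sh"):
                o_form = word(bench, e_count, "o", ending)
                a_form = word(bench, e_count, "a", ending)
                if COUNTS[o_form] <= COUNTS[a_form]:
                    found["o > a"].append((o_form, a_form))
            for vowel in "oa":
                ch_form = word("ch", e_count, vowel, ending)
                sh_form = word("sh", e_count, vowel, ending)
                if COUNTS[ch_form] <= COUNTS[sh_form]:
                    found["ch > sh"].append((ch_form, sh_form))
        for bench in ("ch", "sh"):
            for vowel in "oa":
                for e_count in range(2):
                    fewer = word(bench, e_count, vowel, ending)
                    more = word(bench, e_count + 1, vowel, ending)
                    if COUNTS[fewer] <= COUNTS[more]:
                        found["fewer e > more e"].append((fewer, more))
    return found

print(exceptions())
```

On these counts the exceptions are [cheear]/[sheear] and [cheeol]/[sheeol] for the bench pattern, and [shal]/[sheal] for the [e] pattern.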

The similarity in frequency patterns suggests some underlying cause. One possibility is that [r, l] are suffixes to a shared set of words which already has that pattern. I investigated this hypothesis (in fact, it was the reason I made the statistics in the first place), but the link appears to be weak.

Here are the frequencies for the same twelve words without any suffix:

cho      69   sho    130
cheo     65   sheo    47
cheeo    17   sheeo    8
chy     155   shy    104
chey    344   shey   283
cheey   174   sheey  144

The differences should be clear: the [y] forms are more common than the [o] forms (with the exception of [sho] against [shy]); one [sh] form, [sho], is significantly more common than its [ch] form; and increasing the number of [e] does not make the word less common.

We must therefore find another explanation for the similarity of the frequency patterns in [r] and [l]. It may be that [r] and [l] have some underlying link, such as sound, which means they behave in similar ways. They do, generally, have a similar distribution in words, which reinforces this suggestion.

One final observation is that words with [o] are more common ending [l] than [r], and the opposite is true for words with [y]. This can be seen in the statistics above, but also in the text as a whole, as shown in the table below:

             All    With [o]   %    With [y]   %
Ending [l]   5885   3590       61   2061       35
Ending [r]   5595   2256       40   2646       47

(Percentages do not sum to 100%, because some instances of word final [r, l] may be preceded by other characters, particularly [i] in the case of [r].)
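
The percentages in the table, and the size of the remainder mentioned in the note, can be recomputed from the raw counts. This is a minimal sketch; the names are illustrative, and the "other" figure is simply what is left after subtracting the [o] and [y] counts.

```python
# Token counts copied from the table above: (all, with [o], with [y]).
ENDINGS = {
    "l": (5885, 3590, 2061),
    "r": (5595, 2256, 2646),
}

def shares(total, with_o, with_y):
    """Return the [o] and [y] shares of `total` as whole percentages."""
    return round(100 * with_o / total), round(100 * with_y / total)

for ending, row in ENDINGS.items():
    o_pct, y_pct = shares(*row)
    other = row[0] - row[1] - row[2]  # tokens preceded by neither [o] nor [y]
    print(ending, o_pct, y_pct, other)
```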

If the change between [y, o] can cause a shift between [r, l], then maybe they are closely linked? All thoughts are welcome.