Some Observations on Bench Gallows (2)

Further to my last post, I wanted to add more observations about bench gallows and the way they work. I also have an embryonic hypothesis to share at the end.

Although the position and function of bench gallows within a word are very similar to those of other gallows, the relative proportions are often very different. So while a string such as [qoke] is common, occurring nearly 1,400 times, the string [qockh] occurs only 70 times, one twentieth as often. Given that [k] is about ten times more common than [ckh], [qockh] is only half as common as we might expect.
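The reasoning above is simple arithmetic. Here it is as a small Python sketch, so it can be repeated with other strings. The counts are the approximate figures quoted above, and the function name is my own, used only for illustration.

```python
def observed_vs_expected(plain_count, bench_count, plain_to_bench_ratio):
    """Compare an observed bench-gallows string count with the count we
    would expect if it scaled with the overall gallows:bench-gallows ratio."""
    expected = plain_count / plain_to_bench_ratio
    return bench_count / expected

# Approximate figures from the text: [qoke] ~1,400, [qockh] 70,
# and [k] about ten times as common as [ckh].
ratio = observed_vs_expected(1400, 70, 10)
print(f"[qockh] occurs at {ratio:.0%} of the expected rate")
```

With these figures the expected count is about 140, so the observed 70 is half of it.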

We should thus not expect that what is true for gallows is always true for bench gallows. Investigation into the differences could be insightful, and the ‘neighbourhood’ characters of bench gallows will be the theme of this post.

What Comes Before

I used voynichese.com to compile some basic statistics for the bench gallows [ckh, cth] and their gallows counterparts [k, t]. In each case, different environments were placed before the gallows, and the counts were converted into percentages of the total occurrences of these characters.

Environment (%) [k] [t] [ckh] [cth]
Word Start 13 17 22 53
Start [oG] 28 42 4 5
Start [qoG] 35 19 8 3
Start [yG] 7 9 1 0
Start [WG] 15 12 57 31

(In the above table, G stands for any kind of gallows, and W for a weak string.)

Some differences are obvious, others less so. We can see straight away that most words with a bench gallows either begin with that character or with a weak string. For both bench gallows about 80% of all occurrences are from these two environments, which account for just under 30% of [k, t]. However, [ckh] and [cth] themselves act differently in these two positions, which suggests something deeper is occurring.

The bench gallows are both less common after a word–initial [o] or [qo], though with enough occurrences to show that such strings are valid. There’s also a difference between [ckh] and [cth] here which might mirror that between [k] and [t].

Most interesting is the almost complete lack of bench gallows after a word–initial [y]. The actual counts for both are in single figures, whereas we might have expected somewhere between 50 and 100 of each were they as common here as [k, t].
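Some readers may want to build this kind of table from a transcription rather than through voynichese.com. The Python sketch below is one minimal way to do it, under several assumptions of mine:

- the input is a plain list of EVA words;
- a simple tokenizer treats [ckh, cth, cfh, cph, ch, sh] as single glyphs (this is not voynichese.com's method);
- a weak string follows the definition used elsewhere on this blog;
- only the first matching gallows in each word is classified.

The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed EVA tokenizer: bench gallows and benches are matched first,
# so [ckh] counts as one glyph rather than [c] + [k] + [h].
GLYPH = re.compile(r"c[ktfp]h|ch|sh|.")
# Weak string: a bench, zero to two [e], then [o] or [y] ([a] standing for [y]).
WEAK = re.compile(r"(ch|sh)e{0,2}[oya]")

def before_gallows(word, gallows):
    """Classify the environment before the first `gallows` glyph in `word`."""
    glyphs = GLYPH.findall(word)
    if gallows not in glyphs:
        return None
    prefix = "".join(glyphs[:glyphs.index(gallows)])
    if prefix == "":
        return "word start"
    if prefix in ("o", "qo", "y"):
        return f"start [{prefix}G]"
    if WEAK.fullmatch(prefix):
        return "start [WG]"
    return "other"

def environment_percentages(words, gallows):
    """Percentage of a gallows' occurrences falling in each environment."""
    envs = [before_gallows(w, gallows) for w in words]
    counts = Counter(e for e in envs if e is not None)
    total = sum(counts.values())
    return {env: round(100 * n / total) for env, n in counts.items()}
```

Because the tokenizer matches [ckh] as a whole glyph, a word such as [ckhy] is never counted as an occurrence of [k].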

What Comes After

Here is the same table, with percentages, for different environments occurring after the gallows.

Environment (%) [k] [t] [ckh] [cth]
Before [o] 9 14 11 24
Before [y] 8 8 50 39
Before [a] 30 27 4 8
Before [ch, sh] 13 20 0 0
Before [e] 38 30 27 21

Again, there are some stark differences. Firstly, as is well known, benches simply don’t come after bench gallows. The percentages are zero, and the actual counts are three each.

More interesting is the crossover between [y] and [a]. Gallows are followed by a moderate amount of [y] but plenty of [a], whereas bench gallows are followed by lots of [y] and only a moderate to low amount of [a]. This must be due to a lack of bench gallows words ending [l, r, m, n]. Though I need to do more research on this point, it would seem that the string [iin] is very uncommon in bench gallows words.

Gallows before [o] is ambiguous, with no clear pattern. Here [k, ckh] and [t, cth] seem to be the more natural split, so something else may be happening which is unrelated to the nature of bench gallows. The same may be true for gallows before [e], though splitting the figures into [e] and [ee] shows an interesting pattern I’ll save for another post.
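The following-character counts can be sketched in a similar way. This Python example is hypothetical. It assumes a list of EVA words and a simple tokenizer that treats bench gallows and benches as single glyphs, and it only looks at the first matching gallows in each word.

```python
import re
from collections import Counter

# Assumed EVA tokenizer: bench gallows and benches are matched before single
# letters, so [ckh] is one glyph rather than [c] + [k] + [h].
GLYPH = re.compile(r"c[ktfp]h|ch|sh|.")

def after_gallows(word, gallows):
    """Categorise the glyph after the first `gallows` glyph in `word`."""
    glyphs = GLYPH.findall(word)
    if gallows not in glyphs:
        return None
    i = glyphs.index(gallows)
    if i + 1 == len(glyphs):
        return "word end"
    nxt = glyphs[i + 1]
    if nxt in ("ch", "sh"):
        return "[ch, sh]"
    if nxt in ("o", "y", "a", "e"):
        return f"[{nxt}]"
    return "other"

def following_counts(words, gallows):
    """Tally the following-glyph categories for every word containing `gallows`."""
    categories = (after_gallows(w, gallows) for w in words)
    return Counter(c for c in categories if c is not None)
```

Here [e] and [ee] fall into the same category, as in the table. Splitting them would need one more branch.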

Thoughts

There is good preliminary evidence that bench gallows don’t have the same environments as plain gallows. Several environments differ utterly in frequency between the two.

The question is whether such environments are caused by bench gallows or are the cause of bench gallows. It is not easy to answer, though I think the latter is more likely.

Further investigation into the more interesting environmental differences may yield further clues concerning bench gallows. The weak strings before bench gallows, and the lack of [a] after them, certainly seem the best bets.

I think there’s a potential hypothesis which could explain much of what we’re seeing, though it’s not without its flaws. It could be that a bench gallows is where a plain gallows has ‘captured’ adjacent characters in some way.

If the [h] of [ckh] were a ‘captured’ [e], that would explain 1) the lack of a following [ch, sh], as [kech, kesh] are relatively rare, and 2) the lack of a following [a], as [eai] is also rare.
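One way to explore the ‘captured [e]’ idea is to rewrite each bench gallows as a plain gallows followed by [e], then look at the strings that result. The Python sketch below does only that. The mapping is my own assumption for illustration, not an established reading, and so is its extension from [ckh] to the other three bench gallows.

```python
# Hypothetical rewrite: read the [h] of a bench gallows as an [e] following
# the plain gallows. Extending this beyond [ckh] is an assumption.
BENCH_GALLOWS = {"ckh": "ke", "cth": "te", "cfh": "fe", "cph": "pe"}

def as_captured_e(word):
    """Rewrite every bench gallows in an EVA word as gallows + [e]."""
    for bench, plain in BENCH_GALLOWS.items():
        word = word.replace(bench, plain)
    return word

# Under this reading, [ckhch] and [ckha] would stand for [kech] and [kea],
# both of which are uncommon strings.
for w in ["ckhy", "ckhchy", "ckhaiin"]:
    print(w, "->", as_captured_e(w))
```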

Quite why this might happen is a whole other question.

Some Observations on Bench Gallows (1)

Bench gallows are the characters transcribed in EVA as [ckh, cth, cfh, cph]. They derive their name from their appearance: they look like a gallows character [k, t, f, p] with a bench [ch] drawn through it.

Within the text, bench gallows characters tend to behave most like their gallows counterparts, occurring in the same ‘slot’ in the word structure. They are only 10% to 20% as common as their counterparts, however, and are more common in Currier A than B. A few pages have no occurrences of bench gallows, and many have substantial portions of text without one.

What are they?

One of the key questions about bench gallows is their nature. Although they look as though they are two characters combined, is that the case? We know that some characters share the same strokes. The character [i] is a single stroke also found in [l, r, n, m], and [e] is found in [o, y, d, s, ch, sh], while [a] has them both. It may be that the graphical similarity is superficial with no deeper link.

However, we can be fairly sure that the gallows part of the character is actually related to gallows. Firstly, as already mentioned, when we look at the word structure of Voynich words, we can see that bench gallows work in a similar way to gallows. Next, the characters [cfh, cph] often appear in the first lines of paragraphs, just like [f, p]. Last, the four different bench gallows exactly mirror the four different gallows: not only have strokes been borrowed, but a whole set of characters.

The bench part of the character is more uncertain. The strokes are simpler and there’s only a single character which has been borrowed, [ch], even though there are two benches, [ch, sh]. There’s also the fact that a bench gallows doesn’t act like a bench in the text. Yet there’s some evidence that this part of the character really is a bench.

Gallows characters are often followed by benches. Anywhere from 15% to 50% of gallows are followed by [ch, sh]. Yet for bench gallows that figure is effectively 0%. There are only three each of [ckhch] and [cthch], and one [cphch]. None of the other possible combinations exist, and the combination might as well be considered invalid.

This exclusion of bench characters following a bench gallows is a positive sign that there is some relationship between the two. It should be noted, however, that the occurrence of benches before bench gallows, in strings such as [chckh], is perfectly valid. Indeed, it seems to be more common than for regular gallows, which will be discussed in another post.

Bench Gallows Variants

Although most bench gallows are of a simple type, several variants exist. One common variation is the extended bench gallows with an extra [e] stroke joined to the crossbar, transcribed as an additional [h]. About 30 [ckhh] are recorded in the transcription, along with 23 [cthh], 7 [cfhh], and 13 [cphh].

However, visual inspection of these reveals a large number of ambiguous readings. Although some are definitely correct, others are not and many are hard to judge. It is not easy to know whether a string should be [ckhh] or [ckhe], for example.

Despite this, they remain an interesting insight into the character, and raise the question of whether the bench part is a bench at all. For there are no, or almost no, examples of [cch] or [chh] in the text. Why can a bench gallows take an extra [e] stroke, even if rarely, when benches cannot? Surely we would expect to see extended benches in the same ratios as extended bench gallows?

Another common variant is the replacement of the initial [c] stroke with an [i] stroke. These variants are as common as the extended bench gallows: 33 [ikh], 26 [ith], 6 [ifh], and 8 [iph]. Once again, visual inspection reveals the possibility that at least some are misreadings or miswritings.

However, we can be sure not only that many of these ‘i–bench gallows’ are real, but also that the [i] stroke really is what it looks like. The [i] stroke is the conditioning environment that causes [y] to vary into [a], so we should expect [a] rather than [y] before these characters. This is exactly what we see.

There are 13 [aikh], 14 [aith], 4 [aifh], and 6 [aiph]. Though the numbers are small, the ratios are significant. We can also be sure that normal bench gallows don’t cause [a]: there is 1 [ackh], 2 [acth], and 2 [acph]. These are not only small numbers but tiny ratios, at under 1% of all bench gallows.
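The ratios can be checked with a few lines of Python. The sketch uses the counts quoted above and assumes that each [ai*h] count is a subset of the corresponding i–bench gallows count.

```python
# Counts quoted in the text. Assumption: each [ai*h] count is a subset of
# the corresponding i-bench gallows count.
I_BENCH = {"ikh": 33, "ith": 26, "ifh": 6, "iph": 8}
A_BEFORE = {"ikh": 13, "ith": 14, "ifh": 4, "iph": 6}

def a_shares(a_counts, totals):
    """Share of each i-bench gallows that is preceded by [a]."""
    return {glyph: a_counts[glyph] / totals[glyph] for glyph in totals}

for glyph, share in a_shares(A_BEFORE, I_BENCH).items():
    print(f"[a{glyph}]: {share:.0%} of [{glyph}]")

pooled = sum(A_BEFORE.values()) / sum(I_BENCH.values())
print(f"all four together: {pooled:.0%}")
```

On these assumptions, roughly half of all i–bench gallows are preceded by [a]. Plain bench gallows are preceded by [a] less than 1% of the time.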

Moreover, as with the extended bench gallows above, these combinations don’t occur in benches alone. The character pair [ih] occurs only thrice, and [ci] not at all. So once again we see that bench gallows can be formed with something which is distinctly not a bench.

Thoughts

I am unsure of what this all means. I think that bench gallows are not simply gallows with extra strokes which are purely graphical. Those strokes are likely to be related to other characters in the script, and each bench gallows is thus some kind of ligature.

However, I don’t think bench gallows are strictly a ligature between a bench and a gallows. It would seem that characters are being linked together with a crossbar, like a bench, though maybe for a different reason. We can be sure that one of the characters in the ligature is a gallows, but the identity of the others is harder to understand.

I have some further thoughts about the way bench gallows are used in words, and I’ll put these in another post.

Initial [o] Transformation

In my last post I spoke about my Transformation Theory, and how I believe that the shape of words may be influenced by their surroundings. I speculated that, in light of First–Last Combinations, certain characters may be used to break up unwanted combinations. My example was that, in a phrase such as [dar kedy], [r k] are both ‘strong’ characters and a ‘weak’ character is inserted between them. One such character could be [o], which indeed comes at the beginning of many words.

In this post I would like to look further into this suggestion. We already know that certain word–end characters prefer to match up with certain word–start characters, but these statistics are very general. Though we can say that [r o] is more common than [r t], we can’t be sure that these two facts are related by a transformation. It could simply be that the phrase [or oraiin] is really common and [or tchedy] isn’t. We need to compare them to [or raiin] and [or otchedy] to get a better insight.

I took six pairs of words, one beginning with a strong character (in this case two [t], two [k], one [r], and one [l]) and the same word with an [o] added to the beginning. (I must be clear that by ‘same word’ I simply mean the same string of characters and I make no claim as to actual relatedness here.) So, for example, one pair was [tchey] and [otchey].

I then went through the manuscript and recorded the word which came before each instance of such words, dismissing those which had no word before them and those at the beginning of a line (we know that word statistics are different here). I noted the last character of the preceding word and counted it as strong if it was [n, r, s]. I also counted [d] as strong when the following word belonged to one of the [k] or [t] pairs.

So, for instance, [ar, otain, cheos] would always be strong, and [otey, sho, qokal] always weak. A word such as [qoked] would be strong for the [t, k] word pairs, but weak for the [l, r] pairs.
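The classification rule can be written as a small Python sketch. The function names are hypothetical. `pair_initial` stands for the strong character that begins a word pair, so [tchey] and [otchey] are both judged as members of a [t] pair.

```python
def is_strong(previous_word, pair_initial):
    """Is the last character of the preceding word 'strong' for this pair?"""
    last = previous_word[-1]
    if last in "nrs":
        return True
    # [d] only counts as strong before the [k] and [t] word pairs.
    return last == "d" and pair_initial in "kt"

def strong_percentage(previous_words, pair_initial):
    """Percentage of preceding words that end in a strong character."""
    if not previous_words:
        return None
    strong = sum(is_strong(w, pair_initial) for w in previous_words)
    return round(100 * strong / len(previous_words), 1)
```

For example, `is_strong("qoked", "k")` is `True` but `is_strong("qoked", "l")` is `False`, matching the [qoked] case above.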

Here are the results:

Word Total Strong (no.) Strong (%)
tchey 19 2 10.5
otchey 27 9 33.3
tchedy 10 2 20.0
otchedy 30 13 43.3
kchey 17 4 23.5
okchey 26 14 53.8
kchedy 20 2 10.0
okchedy 23 12 52.2
lol 35 3 8.6
olol 15 10 66.7
raiin 73 3 4.1
oraiin 32 18 56.3

The strong percentages for words not beginning [o] range from 4% to 24%, for the words beginning [o] from 33% to 67%. These are quite wide spreads, but the two ranges do not overlap. Also, in each pair the word with [o] has at least double the strong percentage of the one without [o] at the start.
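The two claims in the paragraph above can be checked directly from the table with a short Python sketch. The first is that the ranges do not overlap. The second is that every pair shows at least double the percentage. The data are exactly the counts given in the table.

```python
# (total, strong) counts from the table, as (without [o], with [o]) pairs.
PAIRS = {
    "tchey":  ((19, 2), (27, 9)),
    "tchedy": ((10, 2), (30, 13)),
    "kchey":  ((17, 4), (26, 14)),
    "kchedy": ((20, 2), (23, 12)),
    "lol":    ((35, 3), (15, 10)),
    "raiin":  ((73, 3), (32, 18)),
}

def pct(total, strong):
    return 100 * strong / total

plain = [pct(*p) for p, _ in PAIRS.values()]
with_o = [pct(*o) for _, o in PAIRS.values()]

print(f"without [o]: {min(plain):.1f}-{max(plain):.1f}%")
print(f"with [o]:    {min(with_o):.1f}-{max(with_o):.1f}%")
# The two ranges do not overlap...
assert max(plain) < min(with_o)
# ...and each [o] word has at least double the strong percentage.
assert all(o >= 2 * p for p, o in zip(plain, with_o))
```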

Some of the words occur only a few times, which may make the statistics unreliable in places. But this is an unavoidable problem, as we cannot control how often any word occurs. Despite this, the pattern is consistent. Of course, running such statistics for more word pairs would provide greater evidence.

I hope that it is, however, enough for us to consider the hypothesis that in some instances a word–start [o] is used to break up a strong–strong sequence. Many instances of words beginning [o] obviously don’t do this but that needn’t worry us. A word such as [okedy] can be a word in its own right as well as a version of [kedy].

Of course, this raises the question of which of the two versions is original. There are two pieces of evidence, but they contradict one another. The first is that words beginning [o] are more common as labels than in the text, and it is in labels that we ought to find the lowest level of influence from surrounding words (obviously). The other is that the first character of Voynich words contains less information than in a natural language, suggesting that it is less integral to the word.

No doubt the ideas of this post will be controversial, so thoughts are very welcome.

The Transformation Theory

I’m not a smart woman, and the distinction between hypothesis and theory often defeats me. It seems as though the dividing line is acceptance, though I’m not sure exactly where that should be drawn. I often say ‘hypothesis’ when I’m writing to mean a little explanation of an observation, something which only matters for a small part of the text. Now I would like to use the word ‘theory’ to mean a big explanation. I want to put forward an explanation for the whole text. Not a solution, as I can’t read a single word, but an explanation of why Voynich text is like it is.

My theory brings together a lot of the things I’ve been researching over the last months. I have certainly hinted at it here and there and many of the details will already be known. I must admit that I’ve been using this theory to guide my views of the text for some time yet I’ve selfishly kept it to myself.

The Transformation Theory is that the words of the Voynich text have undergone transformations which altered their shape and the characteristics of the text, resulting in the ‘transformed text’ of the manuscript which is somewhat different from the ‘normal text’ as mentally composed by the author. This transformation was not a deliberate ploy to obscure or deceive, but part of a linguistic process as the composition moved from individual words to integrated text.

In the Voynich text, words which might have an ‘ideal’ spelling in isolation are altered in relation to their neighbouring words and their place in a line. This results in a word having multiple different spellings, which are nonetheless regular and predictable in a known environment. The cause of this is the interaction of different sounds across word boundaries and prosodic (broadly, speaking patterns) effects in a sentence.

This kind of transformation can be seen in multiple languages. French has liaison, where certain words gain a consonant when followed by a vowel: the ‘s’ in mes is ideally silent, but pronounced when followed by a vowel, such as in mes amis. Welsh has mutation, where the first sound of a word changes according to the preceding word: diod is pronounced with an initial /d/ sound in isolation, but in the phrase fy niod it has an /n/ sound. Greek had movable nu, where an /n/ sound was inserted to prevent a word ending in a vowel being adjacent to one beginning with a vowel. Sanskrit has a whole heap of such rules which affect words throughout the text, just as in the Voynich manuscript.

Evidence of transformation can be seen in several places. The most important, though little studied, is in first–last combinations: pairs of characters at the end and beginning of adjacent words have preferences. Transformation Theory states that one (or even both) of these characters may have been altered from the normal text according to a rule governing their interaction. So, for example, if a word ends [r] and the next word begins [k] in the normal text, a character such as [o] or [y] may be inserted to give the transformed text. Similar processes could affect a large portion of the text.

A process more limited in scope, but one where the evidence of transformation is much stronger, is line start transformation. The characters at the start of lines have different statistics to those elsewhere in a line. The word beginnings [sa, so, ych, ysh, dch, dsh] are much more common here. Even though these transformations have not been fully explored, it seems likely that words beginning [sa] are the result of [s] being added to the beginning of words starting [a]. The line end, with its preference for words ending [m], is similar, though with an even narrower scope.
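To make the idea more concrete, here is a toy Python sketch of two transformations applied to a line of ‘normal text’:

- an [o] inserted to break up a strong–strong sequence;
- an [s] added to a line–initial word beginning [a].

The character sets are assumptions borrowed from the strong and weak characters used in my [o] word–pair counts. The rules are illustrative only, not a claim about the actual transformations.

```python
# Assumed character sets, for illustration only.
STRONG_END = set("nrs")      # word endings counted as strong
STRONG_START = set("ktrl")   # word starts tested in the [o] word pairs

def boundary_is_strong(prev_word, next_word):
    """Hypothetical test for a strong-strong sequence across a word boundary."""
    last, first = prev_word[-1], next_word[0]
    return first in STRONG_START and (
        last in STRONG_END or (last == "d" and first in "kt"))

def transform_line(normal_words):
    """Apply a hypothetical [o]-insertion rule and a line-start [s] rule."""
    out = []
    for i, word in enumerate(normal_words):
        if i > 0 and boundary_is_strong(normal_words[i - 1], word):
            word = "o" + word        # break up a strong-strong sequence
        if i == 0 and word.startswith("a"):
            word = "s" + word        # e.g. [ar] -> [sar] at line start
        out.append(word)
    return out

print(transform_line(["ar", "kedy", "chol"]))
```

Under these toy rules, [ar kedy chol] at the start of a line becomes [sar okedy chol]. Undoing such rules would simply mean running them in reverse.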

I would also like to add that words ending with [iin] potentially fit into the Transformation Theory. When we look at words containing [i] we can see some specific patterns: they occur mainly in one and two syllable words and in words which have fewer tokens. Given that [a] is a variant of [y] before certain characters, we have pairs, such as [dy] and [daiin], which effectively differ by the presence of an [i] sequence. There’s some evidence (I’ve never presented it, however) to suggest that words ending [y] and [aiin] have different distributions within the line. If so, such variant pairs may have [y] as their normal spelling, with [aiin] the result of a transformation to show some linguistic property (most likely prosodic).

Transformation Theory has the potential to explain multiple aspects of the Voynich text which are so far unexplained. The first and most important is the lack of word order. If words are subject to transformation from an underlying normal text, then they might be expressed in different ways depending on their environment. A phrase such as [oty qokeedy] might be the same as [taiin okeedy] in a different environment. Or a phrase like [okedy ar qotaiin], if it breaks over a line, could become [okedy sar qotaiin]. (Note that these are examples for exposition, and not necessarily true.)

The theory explains labelese—the different word statistics associated with label words—by putting such words effectively outside the transformation the text undergoes. If transformation works by altering words according to their environment then labels, which have no environment because they are usually isolated, should not be transformed. Labels are then the normal text. Of course, some labelese words are found in the main text, but there is no reason why every word in the transformed text must be altered, if the environment does not cause it.

Currier A and B, the different languages or dialects present in different parts of the manuscript, are partly the result of different transformation rules. If the writer changed the way words interact (or rather, changed how such interaction was shown), then the text would look significantly different. Though it is doubtful that this can explain the whole origin of the Currier languages, it at least moves the difference away from a substantial change in language and puts it more under the control of the author.

Lastly, it is sometimes said that the Voynich contains an overabundance of unique words, more than would be expected. (I do not know if this is true, but I take it as a suggestion.) Transformation Theory would deal with such a problem very easily. If every word has the potential to alter in a number of ways depending on its environment, and the number of possible environments were large, then the outcome would be lots of unique words. There must be some limit on the number of variants each word could have, but it would not have to be great.

If the Transformation Theory is accepted then the path for future research is immediately obvious. We should interrogate the text to discover all the transformations which are present and their scope and scale. Every transformation discovered and properly understood would give us the power to undo it and reveal the underlying normal text. This normal text should show greater evidence of word order and grammatical constructions, and even more and longer repeated phrases.

Thus, it is hoped that the normal text would be amenable to ordinary efforts at decipherment.

The Transformation Theory is most useful to those researchers who favour or study the possibility of a linguistic solution to the Voynich text. It answers some common objections and gives a clear way forward. It is certainly, as I said above, the way in which I have viewed the text for some months now. Yet if any researcher can take something from the theory and build on it, that is to be welcomed.

The Distribution of [k] and [t]

The characters [k] and [t] are the two most common ‘gallows’ characters in the script. They occur about 10,800 and 6,900 times respectively, and are both spread widely throughout the manuscript. They act in similar (though not identical) ways within words, and it is typically possible to replace one with the other to result in a valid word.

It is reasonable to work with the assumption that [k] and [t], which look alike and work alike, have connected sound values. We should, however, always keep in mind that this may not be true. It is also quite normal for similar sounds in natural languages to occur at very different frequencies. For example, in English the phoneme /t/ is about 50% more common than /d/, and /p/ is over twice as common as /g/, yet all are stop consonants. We should therefore also assume that the difference in the number of [k] and [t] is no problem for a linguistic solution.

What I would like to do with this post is make the difference a problem. Although it sounds counter–intuitive, I want to start from the position or belief that [k] and [t] should occur equally throughout the text and then question why this is not so. It doesn’t matter if this position is wrong, only if it gives us some insight into these characters.

The most obvious place to look for answers is in the distinction between Currier A and Currier B languages, and it isn’t hard to find them. The table below shows the percentage split between [k] and [t] in the different parts of the manuscript and in the whole:

Section [k] (%) [t] (%)
Currier A 55 45
Currier B 65 35
Whole Text 62 38

It’s clear that the difference between [k] and [t] in frequency has more to do with Currier B than A, though even in the latter [k] is still more common. But this tells us nothing about the exact environments in which [k] is more common, and finding those requires a much more thorough search.

I went through a list of environments where gallows characters occur and in each case compared the frequency of [k] and [t] in those environments. In a wide range of environments the distribution of [k] and [t] was within a 55/45 percentage split. These include: at the start of a word; inside a word at the start of a line; second position in a word beginning [o] or [y]; at the start of a word and immediately followed by [e]; and before [o]. I doubt this list is exhaustive.

Other environments saw the frequency edge toward what is found in the manuscript as a whole. Through trying numerous variations on these, I found a small number of environments where one gallows was heavily favoured over the other. There are three that I want to note.

1. After [l]: this environment is already well known to favour [k]. The combination [lk] is ten times more common than [lt], with a 91/9 split. However, the two occur no more than 1,200 times combined, and so cannot be the source of the preponderance of [k].

2. Word and Line Start: when a gallows occurs at the beginning of a word at the start of a line, it is about eight times more likely to be [t] than [k], an 89/11 split in favour of [t]. Note that this excludes the first words of paragraphs and only counts the second and following lines. The numbers for this environment are very small, however, at 252 occurrences combined.

3. In the string [qo*e]: it amazed me to discover how specific this environment is. The string [qoke] is more common than the string [qote] by a 78/22 split. It would seem that both parts of this environment are needed: gallows in words beginning [o*e] have a 53/47 split, words beginning [qo*ch] have a 56/44 split, and before [e] still only 66/34.
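Comparing [k] and [t] within a given environment can be sketched in Python, using a regular expression that captures the gallows. The function and the patterns are illustrative assumptions. A looser pattern, such as a bare `([kt])`, would also match the [k] of [ckh] and the [t] of [cth], so patterns need choosing with care.

```python
import re

def kt_split(words, pattern):
    """Percentage split of [k] vs [t] in an environment.

    `pattern` is a regex with one group capturing the gallows, e.g.
    r"^qo([kt])e" for the [qo*e] environment or r"l([kt])" for [l*].
    Returns None if the environment never occurs.
    """
    counts = {"k": 0, "t": 0}
    for word in words:
        for match in re.finditer(pattern, word):
            counts[match.group(1)] += 1
    total = counts["k"] + counts["t"]
    if total == 0:
        return None
    return round(100 * counts["k"] / total), round(100 * counts["t"] / total)
```

Run over a full transcription, the [qo*e] pattern above should approximate environment 3. The same function with a line-start filter applied beforehand would cover environment 2.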

Now, here’s the interesting bit: environments 1 and 3 are much more common in Currier B than Currier A. They are similarly biased in their [k/t] split in both languages, but because they occur more often in Currier B, that language shows a greater occurrence of [k]. We don’t, however, know why they occur more in one language than the other.

Yet environment 2 has its own interest: why should [k] at line start be so much less common than [t]? It is another line pattern like so many we have seen before, such as [so, sa, dch, dsh, ych, ysh]. In those cases we could guess that the original words had a letter added to the beginning. While adding an initial character does tend to explain many Grove words, environment 2 explicitly excludes the first words in paragraphs.

We could propose that words beginning [k] have a character added, preventing the gallows from being initial, yet words beginning [ok], [qok], or [yk] are not significantly more common at line start than their ratio throughout the text would predict. We could also propose that words beginning [k] are moved away from the line start, but, as with Grove words, such an explanation is unbelievable.

Another, more radical, option is that [k] can transform into [t] when it occurs word initially at the line start. Such a suggestion obviously needs a lot more evidence to back it up, and raises some startling questions about the relationships of the gallows characters. But there’s a neat link between this speculative answer for environment 2 and the problem of environments 1 and 3.

If we propose that some kind of transformation between [k] and [t] is possible, and we see that [k] is much more common in environments such as [lk] and [qoke], might these, then, be conditioning environments for such a transformation in the reverse direction? The numbers seem to work.

After [l] there are 981 [k] and 88 [t]. Over 95% of these occur in Currier B. In the string [qo*e] there are 1,387 [k] and 354 [t]. Nearly 90% of these occur in Currier B. Together, then, we can say that these two account for maybe 2,000 occurrences of [k] which might not otherwise happen. Thus, if these 2,000 occurrences of [k] began as [k] and [t] in a ratio of 55/45, there were originally 1,100 [k] and 900 [t].

If we subtract 900 [k] in Currier B and add 900 [t] we end up with 6,556 [k] and 4,871 [t]. This is a ratio of 57/43, which is pretty near to the ratio for Currier A. Given that we may have missed some less obvious environments where [k] outnumbers [t], we can guess the ratios for [k] and [t] in Currier A and B can be brought into alignment.
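The arithmetic above can be laid out as a short Python sketch. Currier B's starting counts are not given directly. I have inferred them by undoing the adjustment, adding the 900 back to [k] and removing it from [t]. The 2,000 figure is the rough estimate from the previous paragraph.

```python
def split(k, t):
    """Percentage split of [k] and [t], rounded to whole numbers."""
    total = k + t
    return round(100 * k / total), round(100 * t / total)

# Currier B counts inferred by undoing the adjustment described above.
b_k, b_t = 6556 + 900, 4871 - 900          # 7,456 [k] and 3,971 [t]
conditioned_k = 2000                        # rough [k] count in [l*] and [qo*e]
underlying_t = conditioned_k * 45 // 100    # 900, assuming a 55/45 split

print("Currier B as transcribed:", split(b_k, b_t))
print("Currier B adjusted:      ", split(b_k - underlying_t, b_t + underlying_t))
```

The inferred starting counts reproduce the 65/35 split given for Currier B, and the adjusted counts give 57/43.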

We can also state an hypothesis: the changes in the relative frequency of [k] and [t] between Currier A and B are linked to word environments which are mainly found in Currier B. The actual, underlying distribution of [k] and [t] is therefore consistent throughout the text.

(I apologize for the general roughness of this post. I think there is a lot more to be said on this than I can explain.)

(Some parts of this post were developed after discussion with Marco Ponzi, whom I should thank for his insights and thoughtfulness on all things Voynich. Though I make no claim as to whether he thinks my hypothesis is brilliant or bonkers.)

Weak Strings and [l, r]

Following a discussion with René Zandbergen on the similarity between [l] and [r], I thought it best to share some statistics I’ve had hanging around for a while. I made them when looking into another hypothesis, which I really should share as well some day. The statistics relate to words which are ‘weak strings’ followed by [l, r].

(Weak strings are my name for the very common combination of a bench [ch, sh], followed by zero to two [e], and ending with [o, y/a].)

The statistics show every possible combination of weak strings with an additional ending of [r] or [l], and the number of tokens for each. Here are the two tables:

Ending [r]
chor 220 shor 97
cheor 100 sheor 51
cheeor 14 sheeor 9
char 72 shar 34
chear 51 shear 21
cheear 1 sheear 2

Ending [l]
chol 397 shol 187
cheol 173 sheol 114
cheeol 9 sheeol 14
chal 48 shal 15
cheal 30 sheal 21
cheeal 2 sheeal 1

The statistics are interesting because they show that the two sets of words share a common pattern. There are three possible factors by which to alter the weak string: replace [ch] with [sh], vary the number of [e] from none to two, and change [o] for [y] (here [y] is expressed as [a], as is to be expected).

Changes in each one of these factors produces the same kind of frequency change, regardless of whether the word ends [r] or [l]. So, in all cases the [o] form of the word is more common than the [y] form. Likewise, in all but two cases the [ch] form is more common than the [sh] form (both exceptions are minor). Lastly, increasing the number of [e] makes the word less common, with one exception.
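The three patterns, and their exceptions, can be checked mechanically. In this Python sketch the counts are those in the tables above, and [a] stands for [y] as it does in the tables. The helper names are my own.

```python
from itertools import product

# Token counts from the tables above; [a] stands for [y], as in the tables.
COUNTS = {
    "chor": 220, "shor": 97, "cheor": 100, "sheor": 51,
    "cheeor": 14, "sheeor": 9, "char": 72, "shar": 34,
    "chear": 51, "shear": 21, "cheear": 1, "sheear": 2,
    "chol": 397, "shol": 187, "cheol": 173, "sheol": 114,
    "cheeol": 9, "sheeol": 14, "chal": 48, "shal": 15,
    "cheal": 30, "sheal": 21, "cheeal": 2, "sheeal": 1,
}

def word(bench, es, vowel, final):
    """Build a weak string plus ending, e.g. word("ch", 1, "o", "r") == "cheor"."""
    return bench + "e" * es + vowel + final

def exceptions():
    """Return the words that break each of the three frequency patterns."""
    o_exc, ch_exc, e_exc = [], [], []
    # Pattern 1: the [o] form beats the [y]/[a] form.
    for bench, es, final in product(("ch", "sh"), range(3), "rl"):
        o, a = word(bench, es, "o", final), word(bench, es, "a", final)
        if COUNTS[o] <= COUNTS[a]:
            o_exc.append(a)
    # Pattern 2: the [ch] form beats the [sh] form.
    for es, vowel, final in product(range(3), "oa", "rl"):
        ch, sh = word("ch", es, vowel, final), word("sh", es, vowel, final)
        if COUNTS[ch] <= COUNTS[sh]:
            ch_exc.append(sh)
    # Pattern 3: adding an [e] makes the word less common.
    for bench, es, vowel, final in product(("ch", "sh"), range(2), "oa", "rl"):
        fewer, more = word(bench, es, vowel, final), word(bench, es + 1, vowel, final)
        if COUNTS[more] >= COUNTS[fewer]:
            e_exc.append(more)
    return o_exc, ch_exc, e_exc

print(exceptions())
```

The output lists no exceptions to the [o] pattern, two to the [ch] pattern ([sheeol, sheear]) and one to the [e] pattern ([sheal]). That matches the counts given in the text.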

The similarity in frequency patterns suggests some underlying cause. One possibility is that [r, l] are suffixes to a shared set of words which already has that pattern. I investigated this hypothesis (in fact, it was the reason I made the statistics in the first place), but the link appears to be weak.

Here are the frequencies for the same twelve words without any suffix:

cho 69 sho 130
cheo 65 sheo 47
cheeo 17 sheeo 8
chy 155 shy 104
chey 344 shey 283
cheey 174 sheey 144

The differences should be clear. The [y] forms are generally more common than the [o] forms, the one exception being [sho] against [shy]. One [sh] form is significantly more common than the [ch] form. And increasing the number of [e] does not consistently make the word less common.

We must therefore find another explanation for the similarity of the frequency patterns in [r] and [l]. It may be that [r] and [l] have some underlying link, such as sound, which means they behave in similar ways. They do, generally, have a similar distribution in words, which reinforces this suggestion.

One final observation is that words with [o] are more common ending [l] than [r], and the opposite is true for words with [y]. This can be seen in the statistics above, but also in the text as a whole, as shown in the table below:

Words Total With [o] % With [y] %
Ending [l] 5885 3590 61 2061 35
Ending [r] 5595 2256 40 2646 47

(Percentages do not sum to 100%, because some instances of word final [r, l] may be preceded by other characters, particularly [i] in the case of [r].)
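Here is a quick Python check of the table's percentages. It also shows how much of each total is left over for other preceding characters.

```python
# Word-final [l] and [r] counts from the table: (all, with [o], with [y]).
FINALS = {"l": (5885, 3590, 2061), "r": (5595, 2256, 2646)}

def shares(total, with_o, with_y):
    """Rounded percentages for [o], [y], and everything else."""
    other = total - with_o - with_y
    return tuple(round(100 * n / total) for n in (with_o, with_y, other))

for final, counts in FINALS.items():
    o, y, other = shares(*counts)
    print(f"ending [{final}]: [o] {o}%, [y] {y}%, other {other}%")
```

The remainder is about 4% for [l] and about 12% for [r]. This fits the note that [i] often precedes a final [r].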

If the change between [y, o] can cause a shift between [r, l], then maybe they are closely linked? All thoughts are welcome.

Observation on Double Dealers

Stolfi put the characters [d, l, r, s] together in a loose grouping he called ‘dealers’. The characters do not all act alike, nor do they look alike. They have some similarities, however, including the ability to occur at both the beginning and the end of words.

In this post I want to mention briefly an observation on these dealer characters. Unlike gallows, which almost never occur next to one another, the dealers do, and the patterns by which they do so are interesting.

Below is a table for all dealers bigrams in the Voynich text:

1st \ 2nd l r d s
l 28 40 452 162
r 18 2 43 6
d 82 14 23 21
s 5 4 32 6

(The rows show the first characters in a pair and the column the second character.)

Note that the two most common bigrams [ld, ls] begin with [l], and the third most common, [dl], ends with an [l]. Of course, bigrams with [d] are also common, so we should not read too much into these numbers.

However, because these bigrams may occur anywhere in the text we cannot be sure they are not split over syllable boundaries. This is an important consideration if we wish to judge which combinations are valid and which are not. Consider that the English word ‘weightlifting’ does not show that the combination /tl/ is valid: the /t/ is the end of one syllable while the /l/ is the beginning of another.

We can dodge this problem by only counting those double dealers which occur at the beginnings and ends of words. In this way we can be reasonably sure that the bigram is not split over a syllable boundary.
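Counting only word–initial and word–final pairs is easy to sketch in Python. This hypothetical helper assumes a list of EVA words. It skips an [s] that is really the first half of the bench [sh]. A two–character word made of two dealers would be counted at both the start and the end.

```python
from collections import Counter

DEALERS = "lrds"

def is_dealer(word, i):
    """True if position i holds a dealer, ignoring the [s] of the bench [sh]."""
    return word[i] in DEALERS and not (word[i] == "s" and word[i + 1:i + 2] == "h")

def edge_dealer_bigrams(words):
    """Count dealer bigrams at the very start and very end of words."""
    start, end = Counter(), Counter()
    for w in words:
        n = len(w)
        if n < 2:
            continue
        if is_dealer(w, 0) and is_dealer(w, 1):
            start[w[:2]] += 1
        if is_dealer(w, n - 2) and is_dealer(w, n - 1):
            end[w[n - 2:]] += 1
    return start, end
```

For example, [ldaiin] contributes an initial [ld] and [chols] a final [ls], while [dsheey] contributes nothing because its [s] belongs to [sh].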

Below is a table for double dealers at the beginning of words:

1st \ 2nd l r d s
l 12 5 44 9
r 1 0 0 0
d 18 2 6 7
s 1 0 1 2

As we can see, many combinations simply don’t occur, and none beginning [r, s] can be considered valid. We again see that [d, l] are the characters with more frequent combinations, though the numbers overall are very, very low. Even the most common, [ld], occurs just 44 times, and more than half of these are at the end of lines, a position which means they may be atypical.

The next table is for double dealers at the end of words:

1st \ 2nd l r d s
l 6 17 34 106
r 12 2 5 2
d 29 9 1 11
s 3 3 12 2

These are somewhat better results. The most common bigram, [ls], occurs often enough that it is seen a few times (though just a few!) in all sections of the manuscript. Even so, words ending [ls] still occur markedly often at the end of lines.

This last table also shows that [l] is clearly the most common combining character, far more than [d, r, s]. Again, I want to stress that these numbers are low, and the bigrams can only be marginal to the text as a whole. But when we consider that [l] also combines well with other characters [k, t, ch, sh]—at least as the first character of the pair—we can see that it is exceptional in some way.

Naturally, my thoughts turn to considering what sound might be able to combine in this way, especially in a word structure which is often quite rigid. There is one, but that will have to wait for another day.