The Distribution of [k] and [t]

The characters [k] and [t] are the two most common ‘gallows’ characters in the script. They occur about 10,800 and 6,900 times respectively, and are both spread widely throughout the manuscript. They act in similar (though not identical) ways within words, and it is typically possible to replace one with the other and still get a valid word.

It is reasonable to work with the assumption that [k] and [t], which look alike and work alike, have connected sound values. We should, however, always keep in mind that this may not be true. It is also quite normal for similar sounds in natural languages to occur at very different frequencies. For example, in English the phoneme /t/ is about 50% more common than /d/, and /p/ is over twice as common as /g/, yet all are stop consonants. We should therefore also assume that the difference in the number of [k] and [t] is no problem for a linguistic solution.

What I would like to do with this post is make the difference a problem. Although it sounds counterintuitive, I want to start from the position that [k] and [t] should occur equally throughout the text and then question why this is not so. It doesn’t matter if this position is wrong, only if it gives us some insight into these characters.

The most obvious place to look for answers is in the distinction between Currier A and Currier B languages, and it isn’t hard to find them. The table below shows the percentage split between [k] and [t] in the different parts of the manuscript and in the whole:

              [k]   [t]
Currier A      55    45
Currier B      65    35
Whole Text     62    38

It’s clear that the difference between [k] and [t] in frequency has more to do with Currier B than A, though even in the latter [k] is still more common. But it tells us nothing about the exact environments in which [k] is more common, and that requires a much more thorough search.
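A percentage split like the one in the table could be computed along the following lines. This is only a minimal sketch: the function name `kt_split` and the toy word lists are invented for illustration, and it assumes an EVA-style transliteration already split into words and tagged by Currier language.

```python
from collections import Counter

def kt_split(words):
    """Return (k_percent, t_percent) over all [k] and [t] characters in words.

    Assumption: every 'k' and 't' is counted, including those inside
    bench gallows such as 'ckh' and 'cth'. Filter those out first if
    they should be excluded.
    """
    counts = Counter()
    for word in words:
        counts["k"] += word.count("k")
        counts["t"] += word.count("t")
    total = counts["k"] + counts["t"]
    if total == 0:
        return (0.0, 0.0)
    return (100 * counts["k"] / total, 100 * counts["t"] / total)

# Hypothetical toy sample, NOT real manuscript data.
sample = {
    "A": ["chol", "kor", "otol", "daiin"],
    "B": ["qokeedy", "olkain", "qotedy", "okaiin"],
}

for language, words in sample.items():
    k_pct, t_pct = kt_split(words)
    print(f"Currier {language}: {k_pct:.0f}/{t_pct:.0f}")
```

Run over the real text, the same function applied to each language and to the whole word list would give the three rows of the table.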

I went through a list of environments where gallows characters occur and in each case compared the frequency of [k] and [t] in those environments. In a wide range of environments the distribution of [k] and [t] was within a 55/45 percentage split. These include: at the start of a word; inside a word at the start of a line; second position in a word beginning [o] or [y]; at the start of a word and immediately followed by [e]; and before [o]. I doubt this list is exhaustive.

Other environments saw the frequency edge toward what is found in the manuscript as a whole. Through trying numerous variations on these I found a small number of environments where one gallows was heavily favoured over another. There are three that I want to note.

1. After [l]: this environment is already well known to favour [k]. The combination [lk] is ten times more common than [lt] with a 91/9 split. However, the two combinations occur no more than 1,200 times combined, and so cannot be the source of the preponderance of [k].

2. Word and Line Start: when a gallows occurs at the beginning of a word at the start of a line it is eight times more likely to be [t], with an 89/11 split. Note that this excludes the first words of paragraphs and only counts the second and following lines. The numbers for this environment are very small, however, at 252 occurrences combined.

3. In the string [qo*e]: it amazed me to discover how specific this environment is. The string [qoke] is more common than the string [qote] by a 78/22 split. It would seem that both parts of this environment are needed: gallows in words beginning [o*e] have a 53/47 split, gallows in words beginning [qo*ch] have a 56/44 split, and gallows before [e] in general still only 66/34.
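The three environments above can be expressed as simple pattern counts. The sketch below is illustrative only: the names `ENVIRONMENTS`, `environment_split` and `line_start_split` are invented here, the input is assumed to be EVA-style words, and the patterns are one possible reading of each environment (for instance, [qo*e] is matched anywhere in a word, which may not be how the original counts were made).

```python
import re

# Each pattern captures the gallows ([k] or [t]) in group 1.
ENVIRONMENTS = {
    "after [l]": re.compile(r"l([kt])"),
    "string [qo*e]": re.compile(r"qo([kt])e"),
    "word-initial [o*e]": re.compile(r"^o([kt])e"),
    "word start": re.compile(r"^([kt])"),
}

def environment_split(words, pattern):
    """Count (k, t) occurrences of the captured gallows across words."""
    k = t = 0
    for word in words:
        for match in pattern.finditer(word):
            if match.group(1) == "k":
                k += 1
            else:
                t += 1
    return k, t

def line_start_split(paragraphs):
    """Count (k, t) for gallows-initial words that begin a line.

    paragraphs: list of paragraphs, each a list of lines, each a list
    of words. The first line of every paragraph is skipped, matching
    the exclusion described for environment 2.
    """
    k = t = 0
    for lines in paragraphs:
        for line in lines[1:]:
            if not line:
                continue
            if line[0].startswith("k"):
                k += 1
            elif line[0].startswith("t"):
                t += 1
    return k, t

# Hypothetical toy data, NOT real manuscript counts.
words = ["olkeedy", "qokeey", "qokedy", "qotedy", "olkain", "tar"]
paragraphs = [
    [["kor", "dal"], ["tol", "kar"], ["tar", "ol"]],
    [["tchy"], ["kain", "ol"]],
]

for name, pattern in ENVIRONMENTS.items():
    print(name, environment_split(words, pattern))
print("line start", line_start_split(paragraphs))
```

Keeping the gallows in a capture group means a new environment can be tested by adding one more pattern, which suits the trial-and-error search described above.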

Now, here’s the interesting bit: environments 1 and 3 are much more common in Currier B than Currier A. They are similarly biased in their [k/t] split in both languages, but because they occur more often in Currier B, that language shows a greater occurrence of [k]. We don’t, however, know why they occur more in one language than the other.

Yet environment 2 has its own interest: why should [k] at line start be so much less common than [t]? It is another line pattern like so many we have seen before, such as [so, sa, dch, dsh, ych, ysh]. In those cases we could guess that the original words have had a letter added to the beginning. While adding an initial character does tend to explain many Grove words, environment 2 explicitly excludes the first words in paragraphs.

We could propose that words beginning [k] have a character added, preventing the gallows from being initial, yet words beginning [ok], [qok], or [yk] are not significantly more common at line start than their ratio throughout the text would predict. We could also propose that words beginning [k] are moved away from the line start, but as with Grove words such an explanation is unbelievable.

Another, more radical, option is that [k] can transform into [t] when it occurs word initially at the line start. Such a suggestion obviously needs a lot more evidence to back it up, and raises some startling questions about the relationships of the gallows characters. But there’s a neat link between this speculative answer for environment 2 and the problem of environments 1 and 3.

If we propose that some kind of transformation between [k] and [t] is possible, and we see that [k] is much more common in environments such as [lk] and [qoke], might these, then, be conditioning environments for such a transformation in the reverse direction? The numbers seem to work.

After [l] there are 981 [k] and 88 [t]. Over 95% of these occur in Currier B. In the string [qo*e] there are 1,387 [k] and 354 [t]. Nearly 90% of these occur in Currier B. Together, then, we can say that these two account for maybe 2,000 occurrences of [k] which might not otherwise happen. Thus if 2,000 occurrences of [k] began as [k] and [t] in the ratio of 55/45, there were originally 1,100 [k] and 900 [t].

If we subtract 900 [k] in Currier B and add 900 [t] we end up with 6,556 [k] and 4,871 [t]. This is a ratio of 57/43, which is pretty near to the ratio for Currier A. Given that we may have missed some less obvious environments where [k] outnumbers [t], we can guess the ratios for [k] and [t] in Currier A and B can be brought into alignment.
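The arithmetic can be checked with a few lines of code. This is a sketch using only figures from the post, except for the unadjusted Currier B counts, which are not stated above and are back-calculated here by reversing the 900-character adjustment. All variable names are illustrative.

```python
affected_k = 2000        # rough estimate of [k] in environments 1 and 3
baseline_k_share = 0.55  # the 55/45 split seen in neutral environments

# How the 2,000 would divide if they had followed the baseline split.
original_k = round(affected_k * baseline_k_share)  # 1100
original_t = affected_k - original_k               # 900

# Adjusted Currier B counts as given in the post.
adjusted_k, adjusted_t = 6556, 4871

# Assumption: unadjusted counts recovered by undoing the adjustment.
observed_k = adjusted_k + original_t  # 7456
observed_t = adjusted_t - original_t  # 3971

def k_percent(k, t):
    """Share of [k] among all [k] and [t], as a whole-number percentage."""
    return round(100 * k / (k + t))

print(k_percent(observed_k, observed_t))  # 65, the Currier B figure in the table
print(k_percent(adjusted_k, adjusted_t))  # 57, close to Currier A's 55
```

The total of [k] and [t] is unchanged by the adjustment, so only the split between them moves.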

We can also state an hypothesis: the changes in the relative frequency of [k] and [t] between Currier A and B are linked to word environments which are mainly found in Currier B. The actual, underlying distribution of [k] and [t] is therefore consistent throughout the text.

(I apologize for the general roughness of this post. I think there is a lot more to be said on this than I can explain.)

(Some parts of this post were developed after discussion with Marco Ponzi, whom I should thank for his insights and thoughtfulness on all things Voynich. Though I make no claim as to whether he thinks my hypothesis is brilliant or bonkers.)

6 thoughts on “The Distribution of [k] and [t]”

      • Very useful, I’m pleased to say. This is one of the very few posts I’ve seen that suggests any practical way of reconciling Currier A and Currier B to any significant degree, so well done to you. 🙂

        Incidentally, the series of Voynich-related posts I’m in the middle of writing is all about the idea that people too readily put up artificial barriers between linguistic and cryptological attacks on Voynichese: when in fact there is a ton of work that needs to be done to understand how Voynichese works before we’ll be in a position to start trying to properly determine what kind of a thing it is, i.e. whether or not it has phonemes. 😉

        • Oh, that’s very positive then!

          I do agree that there is a lot of joint work to be done regardless of individual beliefs in the solution. I hope that some of the ideas I come up with are useful even if a researcher doesn’t agree with a linguistic solution.

  1. Hello Emma, thank you for this great post!
    I would like to suggest that there could be other qo* environments that behave similarly to what you discussed for qo*e.
    E.g. qoka | qota seem to me to occur a total of about 1300 times: 76% and 24% respectively. Also in this case, almost all occurrences are in Currier B.
