The Existence of [y] Deletion

After writing the article in which I proposed that [y] and [a] are equivalent in some way, I realized that the outcome posed a difficult question. The reason I first considered that the characters [y] and [a] could be related, though I did not state it in that article, is that they both occur in some of the same contexts as [o]. In some places [o] can be swapped for [y], in others for [a], but nowhere for [y] and [a] alike. Each character alone matches part of the distribution of [o] but not the whole. Together they match a greater part of that distribution, but curiously still not the whole.

If we accept that [y] and [a] are equivalent we end up with a combined character which comes near to matching [o] in the way it is used in the text, but falls short. Why is this so, and how can we explain it? This is the question which bothered me and which I seek to answer in this article.

The character [o] occurs practically in any position in a word: beginning, end, or middle. It has few restrictions on which characters it can occur before or after, though it does not occur before [g] or [n]. Yet [y/a] occurs at the beginning and end of words in either of its two forms, but only in the middle of words as [a], in positions before [l, r, m, n, i]. In other middle positions — where [o] freely occurs — [a] does not and [y] only sporadically. It is this part of the distribution of [o] which is thus unaccounted for by [y/a] and which I would like to explain.
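The kind of positional comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `position_profile` is hypothetical, the word list is a handful of words quoted in this article rather than a real transcription, and each transliteration letter is assumed to be one character.

```python
from collections import Counter

def position_profile(tokens, glyph):
    """Count occurrences of a one-letter glyph by position in the word.

    A glyph in a one-letter word is counted as initial (an arbitrary choice).
    """
    profile = Counter()
    for word in tokens:
        last = len(word) - 1
        for i, ch in enumerate(word):
            if ch != glyph:
                continue
            if i == 0:
                profile["initial"] += 1
            elif i == last:
                profile["final"] += 1
            else:
                profile["medial"] += 1
    return profile

# Illustrative sample only: words quoted in this article, not a corpus.
tokens = ["okeody", "okeey", "otedy", "oteo", "cheol", "chear"]

for glyph in "oya":
    print(glyph, dict(position_profile(tokens, glyph)))
```

Run over a full transcription, a profile like this would show [o] in all three positions, while [y] and [a] between them leave most medial contexts empty.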

Although it could be argued that there is no reason why [y/a] should match [o] in distribution and so there is nothing to explain, I believe it is far more productive to assume that there was regularity in whatever process made the text of the Voynich manuscript. So if we observe that [y/a] is similar to [o] in some ways we are right to question why it is not in all ways. The alternative is to assume that a fairly regular text was made by an irregular process, which is the harder position to defend. The lack of [y/a] in these middle positions may teach us something about the Voynich text and the underlying language.

To solve the problem of the missing [y] we need some way of switching a word between the two environments: the initial and final positions, where both [y/a] and [o] occur, and the middle position, where only [o] occurs. Such an alternation will let us compare the environments in which [y/a] does and does not occur and see what differs.

The most obvious choice is to use the characters [dy], as they are a very common ending to words in the text. Many instances of [dy] occur after the character [o], and by removing those characters we can change the environment of the [o] from middle to final. When we do so, we find that the resulting words ending in [o] all occur, although they are mostly less common than the words ending in [dy]. Thus, taking a series of words and their frequencies: [okeody] 37, [okeeody] 16, [oteody] 39, and [oteeody] 11; but also [okeo] 14, [okeeo] 15, [oteo] 13, and [oteeo] 12. None of the resulting words is overly common, but taken together they represent a valid pattern. The environment of a middle [o] before [dy] is similar to that of a final [o] once [dy] is removed.
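The transformation just described, removing [dy] and looking up what is left, can be sketched as follows. The function name `strip_suffix_pairs` and the dictionary layout are assumptions made for illustration; the counts are the eight figures quoted above.

```python
# Word counts quoted in the paragraph above.
counts = {
    "okeody": 37, "okeeody": 16, "oteody": 39, "oteeody": 11,
    "okeo": 14, "okeeo": 15, "oteo": 13, "oteeo": 12,
}

def strip_suffix_pairs(counts, suffix):
    """Pair every word ending in `suffix` with the count of its bare stem."""
    pairs = {}
    for word, n in counts.items():
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            # A stem missing from the table is treated as unattested (0).
            pairs[word] = (n, stem, counts.get(stem, 0))
    return pairs

for word, (n, stem, stem_n) in strip_suffix_pairs(counts, "dy").items():
    print(f"{word} {n} -> {stem} {stem_n}")
```

Every stem in this sample comes back with a non-zero count, which is the pattern the paragraph relies on.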

But few examples of [dy] after [y/a] exist, as we expected. Taking the same series of words as above, but with [y] in the place of [o], we get the following: [okeydy] 0, [okeeydy] 0, [oteydy] 1, and [oteeydy] 0. And, of course, the striking part comes when we remove [dy] and count the resulting words: [okey] 64, [okeey] 177, [otey] 57, and [oteey] 140. Alternating between middle and final environments for [y] gives the expected outcome, namely that [y/a] does not occur in one but does in the other. Although only four words are given as examples, the same pattern holds for words ending with and without [dy] throughout the Voynich text.

So here we’ve been able to find an environment where a middle [o] occurs but where matching words with a middle [y] do not. Yet when we transform that environment to produce a final [o] we not only find valid words, but also ones which match words with a final [y]. Now we must ask ourselves what differs between the two environments that can help explain the lack of [y].

The obvious difference is that, in the text as a whole, the most common character before [dy] is not [o], but [e]. Again, using the same example words as above, we get the following counts: [okedy] 118, [okeedy] 105, [otedy] 155, and [oteedy] 100. It is tempting to think that [e] is somehow linked to [y/a], being maybe a third reflex of that character in the middle of words. But the many occurrences of [e] before [y] make this highly unlikely, as the character does not double up elsewhere: there are thousands of [ey] at the ends of words, but very few [ay, ya, yy, aa] at all.
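To make the comparison concrete, the letter standing before [dy] can be tallied, weighted by word frequency. This is a sketch over the twelve example words quoted so far, not over the whole text, and the function name `preceding_letter_totals` is an assumption.

```python
from collections import Counter

# Counts quoted in this article for words ending in [dy].
dy_counts = {
    "okeody": 37, "okeeody": 16, "oteody": 39, "oteeody": 11,
    "okeydy": 0, "okeeydy": 0, "oteydy": 1, "oteeydy": 0,
    "okedy": 118, "okeedy": 105, "otedy": 155, "oteedy": 100,
}

def preceding_letter_totals(counts, suffix):
    """Sum word counts by the single letter that precedes `suffix`."""
    totals = Counter()
    for word, n in counts.items():
        if word.endswith(suffix) and len(word) > len(suffix):
            totals[word[-len(suffix) - 1]] += n
    return totals

totals = preceding_letter_totals(dy_counts, "dy")
print(totals.most_common())
```

Within this small sample [e] accounts for 478 tokens before [dy], [o] for 103, and [y] for 1.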

But what happens when we transform the environment of the [e] to a final position by removing the [dy]? Well, this: [oke] 1, [okee] 0, [ote] 1, and [otee] 0. Indeed, final [e] occurs hardly anywhere in the whole text, the only exceptions being [she] and [shee]. It would seem that [e] can no more be final than [y] can be middle. As in the earlier article, where we saw that [y] and [a] do not occur in the same environments, we seem to have here another case of complementary distribution and the same suggestion of a link: [e] in the middle of words without a following [o] must have some kind of relationship with [y/a].

(Before anybody thinks I have based this on four words alone, the same figures repeat for any word undergoing the same transformation. Wherever we find an [e] not followed by [y/a, o], removing the rest of the word and adding [y] always results in a valid word. Yet removing the rest of the word and leaving [e] as the final letter mostly results in invalid words. This is true whether there is a single [e] or double [ee].)
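The general test described in this parenthesis can be sketched as below. The function name `truncation_test` is hypothetical, and the count table holds only the figures quoted in this article, so any word outside it is treated as unattested.

```python
# Counts quoted earlier in this article.
counts = {
    "okedy": 118, "okeedy": 105, "otedy": 155, "oteedy": 100,
    "okey": 64, "okeey": 177, "otey": 57, "oteey": 140,
    "oke": 1, "okee": 0, "ote": 1, "otee": 0,
}

def truncation_test(word, counts):
    """Cut `word` after each run of [e] that is not followed by [y], [a] or [o].

    Returns (stem, count of bare stem, count of stem + 'y') for every cut.
    """
    results = []
    i = 0
    while i < len(word):
        if word[i] != "e":
            i += 1
            continue
        j = i
        while j < len(word) and word[j] == "e":
            j += 1  # step over a single [e] or a double [ee]
        # Skip runs at the very end of the word: nothing is left to remove.
        if j < len(word) and word[j] not in "yao":
            stem = word[:j]
            results.append((stem, counts.get(stem, 0), counts.get(stem + "y", 0)))
        i = j
    return results

for word in ["okedy", "okeedy", "otedy", "oteedy"]:
    print(word, truncation_test(word, counts))
```

For the four example words the bare stem is attested once or not at all, while the stem with [y] added is attested dozens of times.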

Thus we come to the conclusion that [ey] is a normal and regular sequence which occurs at the end of words, but that, for some reason, when the environment is changed to the middle of a word the [y] is deleted, leaving [e] alone. The only exceptions are where the environment produces an [a] instead. So just as we consider [cheo] to be the same string as occurs in the words [cheor], [cheol], [cheody], and [cheoky], so [chey] must give rise not only to [chear] and [cheal], but also [chedy] and [cheky].

With this we have found our ‘missing’ [y]: it is deleted in certain environments. This [y]-deletion, along with the equivalence of [y] and [a], makes an almost exact counterpart for [o], occurring in all the same environments. Although seemingly complex with three different expressions, the rules governing the character [y] are very regular:

1. [y] at the end of words, and at the beginning if not in the context for [a].

2. [a] before [l, r, m, n, i].

3. [Ø] when after [e] and before any character not giving rise to [a].
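The three rules can be written out as a small function to check that they reproduce the [che-] examples given earlier. This is a sketch, not a claim about how the text was produced: the name `realise` and the split of a word into `before` and `after` around one underlying [y] are assumptions. The rules do not say what happens in the middle of a word when the preceding letter is not [e], so that case is simply left as [y] here.

```python
A_TRIGGERS = set("lrmni")  # rule 2: letters that call for [a]

def realise(before, after):
    """Spell out one underlying [y] standing between `before` and `after`."""
    if after and after[0] in A_TRIGGERS:
        return before + "a" + after      # rule 2
    if not after or not before:
        return before + "y" + after      # rule 1: word-final or word-initial
    if before.endswith("e"):
        return before + after            # rule 3: deleted after [e]
    # Not covered by the three rules; kept as [y] as a placeholder assumption.
    return before + "y" + after

for after in ["", "r", "l", "dy", "ky"]:
    print(repr(after), "->", realise("che", after))
```

With `before` set to `"che"`, the five contexts give [chey], [chear], [cheal], [chedy], and [cheky], matching the forms listed above.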

If true, the implications are significant for the structure of words. The knowledge that [o] and [y/a/Ø] have, in the end, the same distribution lets us put them together in the same class of character: they are not the same, nor do they have the same value, but they occur in the same positions. Further, most words in the Voynich text contain at least one instance of [o] or [y/a/Ø]. Most of those which do not are short, often single characters (indeed, it seems that the longer a word is, the more occurrences of [o, y/a/Ø] it will have). These two characters are thus essential for typical words, and give us a potential opening into a new way of understanding the structure of Voynich words.

I hope in a future article to explore word structure using this class of characters as a starting point.


2 thoughts on “The Existence of [y] Deletion”

  1. Emma: rather than concluding that word terminal ‘y’ can extremely often be replaced by a valid word end block, isn’t it just as straightforward to conclude that any valid word end block can be replaced by ‘y’? That is, that word-terminal ‘y’ seems to function as a shorthand token marking the truncation of a longer word? (‘Truncatio’).

    Word-initial ‘y’ seems to have a quite different function, in particular when it pairs with gallows characters. But that’s another story entirely.


  2. Hi Nick, the problem is with the persistence of [o]. Although your solution may, in truth, be right, it does not answer the question of why [y/a] and [o] are similar in distribution but not in the middle of words. While we could simply say that their similarity is an illusion, or that they are similar but not wholly so, the problem of the article is an attempt to make them as similar as can be. I begin with the assumption of regularity and follow that through to some kind of theory about how the script (and/or language) works.

    It may not be right but it does give us a workable theory which can be taken forward for further research.

