The Equivalence of [a] and [y]

The Voynich script constitutes just one part of the overall puzzle which surrounds the text of the Voynich manuscript, along with the contents of the text and the way in which it is encoded. There is no clear relationship between the Voynich script and any other known writing system. Despite the similarity of some characters to letters in other scripts there is no accepted proposal linking the Voynich script with any other. All existing knowledge about the script therefore comes from evidence internal to the manuscript itself.

The script consists of an unknown number of characters, the total depending on how they are counted. Typical counts range in the twenties, but a higher total is possible if rare characters are included, and a lower one if combinations and modifications are not counted as characters in themselves. For example, the characters [cfh, ckh, cph, cth] appear to be combinations of the characters [f, k, p, t] with the character [ch]. Whether these characters should be counted separately or only their constituents is unknown, and thus the total number of characters in the Voynich script is variable.

However, given the general size of the Voynich script it has been considered, at least by those who favour a linguistic solution, as most likely an alphabet. That is to say, each character represents a single sound (or maybe combination of sounds) in the underlying language and the structure of characters within a word in the text represents the structure of sounds within a spoken word of the language. Each error in our understanding of the script is not only a single incorrect sound but a flaw in our understanding of how sounds are structured and relate in the language. There is a great deal unproven in the theory that the Voynich script is an alphabet, but it is enough here to show that the status of different characters has significant implications for any analysis and solution. The less we understand the characters which make up the Voynich script, the worse our understanding of the sounds and structure of words in the text, and of the language itself.

To this end I wish to provide evidence and an argument that two of the commonly distinguished characters, namely [a] and [y], are in fact related and may simply be graphical variants with the same value. Although I have not seen a case for this equivalence argued elsewhere, I accept that I may not be the first to propose or argue it.

Complementary Distribution

The characters [a] and [y] are both common in the Voynich text, each occurring thousands of times and in all parts of the manuscript. They are part of the core script. However, they occur in clearly different environments: they take different positions within words and occur next to different characters. Their distribution within words is particularly easy to notice by casual observation, but simple tabulation of that distribution is still revealing.

I generated a list of the most common words in the Voynich text using a widely available transliteration. Each word on the list occurred at least 10 times within the text, and so cannot be considered a writing or reading mistake, nor unusual in the underlying language. The list contained 508 entries in all. Using the list I generated the table below showing whether [a] or [y] occurred as the first, a middle, or the last character of common words. Thus the table gives the typical (though not exhaustive) distribution of the characters within the text.

From the table above it is clear that while both [a] and [y] occur at the beginning of words in the list, only [a] occurs in the middle and only [y] at the end. Even among less common words this distribution holds well, with only around 3% of instances of [y] occurring as a middle character, and <1% of [a] occurring at the end of a word. The distributions of [a] and [y] only seem to overlap at the beginning of words and not typically elsewhere.
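The tabulation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used to build the table: the function names `common_words` and `position_counts` are hypothetical, the token list is a toy sample rather than real manuscript counts, and each transliteration letter is treated as one character.

```python
from collections import Counter


def common_words(tokens, min_count=10):
    """Return the word types that occur at least `min_count` times."""
    freq = Counter(tokens)
    return sorted(word for word, n in freq.items() if n >= min_count)


def position_counts(words, chars=("a", "y")):
    """Count how often each character is first, middle, or last in a word.

    A one-letter word is counted as "first" (an arbitrary choice here).
    """
    table = {c: Counter() for c in chars}
    for word in words:
        last = len(word) - 1
        for i, ch in enumerate(word):
            if ch not in table:
                continue
            if i == 0:
                slot = "first"
            elif i == last:
                slot = "last"
            else:
                slot = "middle"
            table[ch][slot] += 1
    return table


# Toy sample only: these counts are invented for illustration.
toy_tokens = ["daiin"] * 12 + ["oky"] * 11 + ["ar"] * 10 + ["ykal"] * 3
toy_table = position_counts(common_words(toy_tokens))
for char, slots in toy_table.items():
    print(char, dict(slots))
```

Counting over word types, as here, matches the idea of a list of common words; counting over every token in the running text would instead weight each word by its frequency.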

Yet if we take a closer look at the beginning of words we can see that the overlap between [a] and [y] is apparent rather than real. The characters following [a] and [y] in the text fall into two separate groups, with no characters in both groups. Below is a table showing what characters come after [a] or [y] as the second character in a word.


After <a> | After <y>

As this table shows, even at the beginning of words [a] and [y] do not occur in the same environments. Each one occurs only before certain characters and not before others. Add this to the already established lack of overlap in the middle and at the end of words, and we can see that the distributions of [a] and [y] do not overlap at all in typical words.
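The check for overlap at the beginning of words can be expressed as a test that two sets of following characters are disjoint. The sketch below is a hedged illustration: `second_characters` and `is_complementary` are hypothetical names, the word list is a toy sample, and multi-letter glyphs such as [ch] are not grouped, so each transliteration letter counts as one character.

```python
from itertools import combinations


def second_characters(words, initials=("a", "y")):
    """Collect the set of second characters seen after each word-initial character."""
    followers = {c: set() for c in initials}
    for word in words:
        if len(word) >= 2 and word[0] in followers:
            followers[word[0]].add(word[1])
    return followers


def is_complementary(followers):
    """True when no following character is shared between any two initials."""
    return all(a.isdisjoint(b) for a, b in combinations(followers.values(), 2))


# Toy sample only, chosen for illustration.
toy_words = ["aiin", "al", "am", "ar", "ykeey", "yteey", "ydaiin"]
toy_followers = second_characters(toy_words)
print({c: sorted(s) for c, s in toy_followers.items()})
print(is_complementary(toy_followers))
```

A strict disjointness test like this is sensitive to a single stray word, so on real data it may be more useful to run it on the common-word list only, as the text does.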

This kind of distribution between two elements is known as complementary distribution in linguistics. It is usually taken to indicate that the two elements (here characters) have some relationship to each other. The two characters may have the same or similar value, with their occurrence in the text conditioned by the position in the word or by nearby characters. In English we can think of the example of “a” and “an”, which are the same word but change shape depending upon whether the following word begins with a consonant or a vowel: a zebra, an aardvark. In this example the difference is a sound change (an /n/ is inserted when the next word begins with a vowel), but the difference could simply be graphical. The characters in the Arabic script change shape depending on where they are in a word, or sometimes if they follow or are followed by certain letters.

The exact relationship between [a] and [y] in the Voynich script, whether it is a graphical difference, a sound change, or something else, is unknown. Not enough is currently known about the script or the way in which it works to make a firm argument. However, we can examine the graphical aspect of the difference between [a] and [y] to make a tentative argument.

Different Strokes

The characters [a] and [y] are naturally easy to differentiate, and all transcriptions I know have classed them as separate characters. Even so, they bear some graphical similarity. Both consist of two strokes, the first in both cases being a semicircular stroke open to the right, roughly the same as the character [e]. The second strokes of both characters lie directly to the right of the first, but differ in shape. For [a] the second stroke is a short stroke beginning from the mean line and running down and right toward the baseline. It is identical to the character [i]. For [y] the second stroke begins from the mean line, runs down and right toward the baseline, but then curves leftward and continues for a significant length below the baseline.

The most obvious difference between [a] and [y], therefore, is the downward curved reach of the second stroke; the characters are otherwise rather similar. The likeness supports the idea that the characters are related, though this is only impressionistic.

A far more significant relationship lies between the shape of the second stroke and that of the following characters. As mentioned above, the character [a] occurs at the beginning of a word only before the characters [i, l, m, r]. Along with [n], these characters make up the bulk of all those which follow [a] anywhere in the text. That [a] is followed only by a limited set of characters was noted as long ago as Currier, who also noted that those characters included a stroke identical to [i], which is the second stroke of [a]. There is thus good reason to believe that the choice of [a] rather than [y] is conditioned by the presence of an [i] stroke in the character directly following.
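The claim that [i, l, m, n, r] make up the bulk of the characters following [a] is the kind of statement that can be measured directly. The following is a minimal sketch under stated assumptions: `followers_of` and `share_in_set` are hypothetical helper names, the set `I_STROKE_FOLLOWERS` simply encodes the five characters named in the text, and the words are a toy sample with no real counts behind them.

```python
from collections import Counter

# The characters named in the text as containing an [i] stroke.
I_STROKE_FOLLOWERS = frozenset("ilmnr")


def followers_of(words, char="a"):
    """Count which character comes directly after `char`, anywhere in a word."""
    counts = Counter()
    for word in words:
        for current, following in zip(word, word[1:]):
            if current == char:
                counts[following] += 1
    return counts


def share_in_set(counts, allowed=I_STROKE_FOLLOWERS):
    """Fraction of the counted followers that belong to `allowed`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for ch, n in counts.items() if ch in allowed) / total


# Toy sample only, chosen for illustration.
toy_words = ["daiin", "dal", "okar", "dam", "qokain"]
toy_counts = followers_of(toy_words, "a")
print(dict(toy_counts), share_in_set(toy_counts))
```

Running the same two functions with `char="y"` would give the comparison figure: under the hypothesis, the share for [y] should be close to zero.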

This seems to be good evidence that [a] is a graphical variant of [y]: it occurs in a very specific environment conditioned on a graphical basis. However, it is not possible to say that the conditioning of [a] is simply graphical and otherwise without meaning. There could be an underlying relationship between the [i] stroke and another feature which links the observed relationship as an explaining factor. We can only state that there is a relationship while remaining agnostic about its features or meaning.

In the Wild

If the characters [a] and [y] are related to each other there should be evidence for this within the text. The hypothesis is that the characters are conditioned by different environments, and so by controlling for the environment within the text we should be able to control the appearance of either [a] or [y]. Two examples of evidence will be given: the lack of [a] as a standalone character, and the possibility of [y] becoming [a] with the addition of a suffix.

Characters in the Voynich script sometimes stand alone within the text, clearly set apart from other characters. In such circumstances we would expect [y] to appear and not [a], due to the lack of following characters to condition its use. Within the body of running text (that is, text made up of whole words separated by spaces) a single character can appear alone as a word in itself. The character [y] appears alone in such circumstances perhaps a hundred times, whereas [a] does so no more than once or twice.
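Counting standalone characters needs nothing more than a frequency count over the space-separated tokens. As a hedged sketch (the function name `standalone_counts` is hypothetical and the tokens are a toy sample, not manuscript data):

```python
from collections import Counter


def standalone_counts(tokens, chars=("a", "y")):
    """Count how often each character occurs as a whole word by itself."""
    freq = Counter(tokens)
    # A Counter returns 0 for tokens that never occur.
    return {c: freq[c] for c in chars}


# Toy sample only, chosen for illustration.
toy_tokens = ["y", "daiin", "y", "ol", "a", "y"]
print(standalone_counts(toy_tokens))
```

How tokens are split matters here: transliterations differ in how they mark uncertain spaces, so the result on real data depends on that choice.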

Another similar environment is the so-called “key-like sequences”, where a series of individual characters is written in a row or column. The meaning of these sequences is unknown, but they focus on the characters alone rather than as part of a word or text. There are four such sequences considered to be original to the manuscript: 49v shows multiple [y] but no [a]; the repeating sequence on 57v contains [y] but not [a]; 66r shows multiple [y] but no [a]; and 76r contains neither [y] nor [a].

Although the evidence is not strong, the lack of [a] as a standalone character outside of the conditioning environment where it is usually found reinforces the idea that it is a variant of [y]. Where characters are isolated from the following characters which we suspect condition its presence, [a] is almost always absent.

The second way of controlling the environment within which the characters appear is by the use of an affix. The Voynich language is well known for the apparent ‘modularity’ of words, with many longer words seemingly built up with affixes. A useful fact for the study of [a] and [y] is that while [y] is a common word ending, [a] is often the first character of a suffix. This gives us an opportunity for testing the relationship. We would expect a word ending in [y] to occur with some fairly fixed frequency relative to the same root ending with a suffix with an initial [a]. This is because according to the hypothesis [y] and [a] are the same character in a different environment, and rather than being part of the suffix, [a] appears because the final [y] is transformed by the following characters of the suffix.

To make this clearer, let us formulate a test based on the most common suffix beginning with [a]: [–aiin]. If [a] and [y] are the same character then a word such as [oky] is in fact the root of [okaiin], rather than the two words sharing the common root [ok–]. The suffix is thus [–iin] and not [–aiin], with the first character [i] of the suffix causing the final [y] of [oky] to transform into [a].
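The proposed transformation can be written down as a small rule. The sketch below is only an illustration of the hypothesis as stated here, not an established rule of the script: `attach_suffix` is a hypothetical function, and the conditioning set [i, l, m, n, r] is taken from the characters named earlier as containing an [i] stroke.

```python
# Characters assumed (per the hypothesis) to turn a preceding [y] into [a].
I_STROKE_INITIALS = frozenset("ilmnr")


def attach_suffix(root, suffix):
    """Join root and suffix, turning a final [y] into [a] before an [i] stroke."""
    if root.endswith("y") and suffix and suffix[0] in I_STROKE_INITIALS:
        return root[:-1] + "a" + suffix
    return root + suffix


print(attach_suffix("oky", "iin"))
print(attach_suffix("y", "iin"))
print(attach_suffix("oky", ""))
```

Under this reading the suffix is [–iin], and the [a] in the written word is what remains of the root's final [y].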

Using the same list of the most common words in the Voynich text as above it is possible to make a list of the twenty most common words ending in [–aiin] and their counterparts ending in [–y]. Below is the table of such a list along with the token counts for each word and the frequency ratio between each pair:

Although the above table does not present a clear and unambiguous relationship between the two sets of words, it does allow for the possibility of a relationship. In all cases the most common words ending in [–aiin] have counterparts ending in [–y] which are themselves common (that is, with more than ten occurrences). Likewise, in 15 of the 20 cases the word ending in [–aiin] is more common than that ending in [–y], and within a fairly narrow range.

In comparison, the counterpart words with no ending at all (as though [ok] were the root of [oky] and [okaiin]) are mostly rare or non-existent. Indeed, in the case of [y] and [aiin] there is no possible word without an ending. These two words have sometimes been considered curious because they seem to be endings without a root, but the equivalence of [a] and [y] shows that they have a single character root which transforms with the addition of the [–iin] suffix. The redefining of that suffix as [–iin] also brings into line the less common series of words which end with [–oiin]. Rather than being a different suffix, it is the same suffix attached to a word ending in a different character. Words ending [–o] are not as common as [–y] but occur nonetheless.
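The pairing of words ending in [–aiin] with counterparts ending in [–y] can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used for the table: `suffix_pairs` is a hypothetical function, the ratio is taken as the count of the suffixed word divided by the count of its counterpart, and the token counts are invented toy values.

```python
from collections import Counter


def suffix_pairs(tokens, suffix="aiin", final="y", top=20):
    """Pair the most common words ending in `suffix` with counterparts ending in `final`.

    Returns rows of (word, count, counterpart, counterpart_count, ratio),
    where ratio is None if the counterpart never occurs.
    """
    freq = Counter(tokens)
    rows = []
    for word, count in freq.most_common():
        if not word.endswith(suffix):
            continue
        counterpart = word[: -len(suffix)] + final
        other = freq.get(counterpart, 0)
        ratio = count / other if other else None
        rows.append((word, count, counterpart, other, ratio))
        if len(rows) == top:
            break
    return rows


# Toy sample only: these counts are invented for illustration.
toy_tokens = (
    ["okaiin"] * 6 + ["oky"] * 4 + ["daiin"] * 8 + ["dy"] * 2
    + ["aiin"] * 5 + ["y"] * 5 + ["chol"] * 3
)
for row in suffix_pairs(toy_tokens):
    print(row)
```

The same function covers the [–oiin] series by calling it with `suffix="oiin"` and `final="o"`, and the bare pair [aiin] and [y] falls out naturally because the root left after removing the suffix is empty.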

Conclusion & Implications

The equivalence of [a] and [y] in the Voynich script seems possible or even likely. The two characters have been shown to have a complementary distribution, their appearance dependent on the following character. A graphical link has been highlighted between all the characters which follow [a] but not [y], which [a] itself also shares. Two sets of words where [y] is transformed into [a] by the addition of a suffix have been shown to be possibly related.

Although the evidence and argument set out here are not exhaustive, they are suggestive. More work is needed on those words which might show a transformation between [y] and [a], as the treatment here is only preliminary. Hopefully the outcome will strengthen the evidence available to make this equivalence. Undoubtedly there will be objections to the hypothesis I have not dealt with here, which may yet prove it wrong.

Should the equivalence be accepted there are a number of implications that it could have upon research into the Voynich script and language:

1. Any theory which proposes radically different values for [a] and [y] cannot be correct. While the two characters need not have the same value, they must have values which are conditionally linked.

2. Any theory which proposes non-interaction between characters and their context cannot be correct. The Voynich text is not a string of single characters, but must include a plausible system of interaction.

3. Characters may have more than one shape depending on their context.

4. The strokes which form characters are important in some way. At the very least the seeming identity of the [i] stroke in different characters is not simply appearance, but must be an actual fact of the script.

Implications 1 and 2 are the most significant and may prove damaging to a number of other theories. Implications 3 and 4 have long been proposed but receive confirmation from this hypothesis.


11 thoughts on “The Equivalence of [a] and [y]”

  1. I want to note here that Jacques Guy and Robert Firth may have proposed this equivalence at some time in the 90s. Even though I cannot find a full exposition of their thoughts, mention of this idea can be found in a few places of the Voynich Mailing List archive.


  2. Emma, I just came across this post of yours, and now I think it may seem like I’m stealing your ideas.
    To me, the Voynichese script looks like a very limited set of glyphs, of which many (not all) have an “ornate” and a “normal” form. This, together with “a” generally not turning up at word-final position, led me to believe that was just an with a flourish. I’ll link to this post and mention that you were the first to suggest that and may just have the same sound value.


  3. Hmm, something went very wrong in that post, it took my EVA notations as some sort of programming code and made links of it. So I meant that “y” is an “a” with a flourish 🙂


  4. Hmm. Your point about the complementary distribution of [a] and [y] is persuasive, and the idea of [a] being selected by a following line-letter ties in nicely with Bruce Cham’s curve-line hypothesis.

    This leads me to a follow-on question. Cham suggests that [l] is the line-letter counterpart to [y], which is a curve letter. Let ( represent an [e] stroke, \ an [i] stroke, and J a downward tail. Then Cham suggests that [y] is (J, while [l] is \J.

    So here’s my question. As you’ve demonstrated that a following \ seems to transform (J [y] into (\ [a], do you know if anything similar can be observed about [l]? One might expect that before \, \J [l] might transform into \\ [ii]; do you know if the data back this up?

    If not, I’ll run some frequency counts and report back!


    • The idea that [a] and [y] are related comes from a questioning of their distribution within words and not from the curve-line hypothesis. The two characters don’t seem to readily occur in the same environments, which is what led me to the conclusion. That the environments happen to coincide with a difference in graphical stroke is intriguing, but I don’t believe in the curve-line hypothesis.


      • The idea that [a] and [y] are related comes from a questioning of their distribution within words

        I’m aware of that, of course. All the same, it’s interesting that the environments that seem to select [a] instead of [y] are precisely those where an [i]-stroke follows. This would tend to corroborate the curve-line hypothesis, if rather weakly.

        Regardless of the CL hypothesis, though, if one tail can turn into an [i]-stroke, perhaps another one can too, which is why I plan to investigate complementary distribution of [l] and [ii]. It’s not so much about the CL hypothesis as about seeing if there’s anything generalizable in your findings here.

        The CL hypothesis looks somewhat persuasive to me, but of course I haven’t done the level of research that you have. What do you see as problematic for the curve-line hypothesis, if you don’t mind my asking? (I haven’t seen it addressed on your blog, unless I missed something.)


        • The main problem is that it doesn’t describe word structure all that well. It has numerous exceptions, some of which are substantial. The character [o], for example, doesn’t behave like a curve and the author is forced to ‘patch’ the hypothesis to dismiss this. Yet the patch is inadequate and still fails to explain why [o] behaves how it does. Given that [o] is the most common character, this is a problem. There’s also a significant problem with [l].

          It seems to me that the [i] or [e] stroke within some characters may be important. But I don’t think it is a key organizing principle of the text, rather than of the script.


          • Thanks for the explanation! I will certainly have another look at the behavior of [o] in light of that.

            Meanwhile, I’ll post back here when I look at [ii] and [l] in more detail. There seems to be *something* to the idea, but I’m not sure how much yet.

