Last year I laid out my understanding of the low level and high level structure of Voynich words. As I consider the Voynich manuscript to be linguistic, I am happy to believe that the two structures relate to syllables. Specifically, the low level structure shows how a syllable itself is to be constructed, and the high level structure shows how syllables come together in words.

Now, after some delay, I have taken this ideal syllable and word structure and sought to apply it to actual words in the Voynich manuscript. For the purposes of what follows, a word type is a word with a specific spelling, such as [chor] or [opaiin], and a word token is an individual occurrence of a word type, so [chor] has 218 tokens and [opaiin] has 13 tokens.

I took the text of the whole Voynich manuscript and filtered out all those words with fewer than five tokens or with uncertain readings. The threshold of at least five tokens was chosen to provide 1) a wordlist short enough to sort by hand, and 2) a reasonable likelihood that the words are valid and not the result of writing or reading mistakes. My wordlist thus held 913 word types totalling 26,372 tokens, roughly two thirds of the total tokens in the manuscript.
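As a rough illustration of this filtering step, here is a minimal Python sketch. The function name `build_wordlist`, the input format (a flat list of transliterated word tokens) and the use of `?` to mark an uncertain reading are all assumptions made for the example, not a description of the tools actually used.

```python
from collections import Counter


def build_wordlist(tokens, min_tokens=5, uncertain_marker="?"):
    """Return {word type: token count} for types with enough clean tokens.

    Assumption: a token with an uncertain reading contains `uncertain_marker`.
    """
    # Count only tokens whose reading is not flagged as uncertain.
    counts = Counter(t for t in tokens if uncertain_marker not in t)
    # Keep word types that occur at least `min_tokens` times.
    return {word: n for word, n in counts.items() if n >= min_tokens}


# Toy input, invented purely for illustration.
sample = ["chor"] * 6 + ["opaiin"] * 5 + ["dal"] * 4 + ["ch?r"] * 7
wordlist = build_wordlist(sample)
```

In this toy input only [chor] and [opaiin] survive: [dal] falls below the threshold and the flagged reading is discarded.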

I split every word type on the list to show the syllables it contains, and then sorted them into lists by number of syllables. Syllables were discovered using a fairly simple process: [a, y, o] are vowels and every instance of those indicates a syllable; [e] sequences are vowels if not immediately followed by [a, y, o]; and [ch, sh] count as vowels if not immediately followed by an [e] sequence or [a, y, o]. Then, working from left to right, every character is assigned to the syllable of the next vowel to its right. The exception is at the end of a word: characters with no vowel to their right belong to the syllable of the last vowel to their left.
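The process above can be sketched in Python as follows. This is only my reading of the rules, under some assumptions: words are given in EVA transliteration, [ch] and [sh] are each treated as a single unit, a run of [e] is one unit, and every other character stands alone. The function names are hypothetical.

```python
import re

# Assumed unit split: [ch], [sh], a run of [e], or any single character.
UNIT = re.compile(r"ch|sh|e+|.")
CORE_VOWELS = {"a", "y", "o"}


def split_units(word):
    return UNIT.findall(word)


def vowel_flags(units):
    """Mark each unit as vowel (True) or not, following the stated rules."""
    flags = []
    for i, unit in enumerate(units):
        nxt = units[i + 1] if i + 1 < len(units) else ""
        if unit in CORE_VOWELS:
            flags.append(True)
        elif unit.startswith("e"):
            # [e] sequence: a vowel unless [a, y, o] follows immediately.
            flags.append(nxt not in CORE_VOWELS)
        elif unit in ("ch", "sh"):
            # [ch, sh]: a vowel unless an [e] sequence or [a, y, o] follows.
            flags.append(nxt not in CORE_VOWELS and not nxt.startswith("e"))
        else:
            flags.append(False)
    return flags


def syllabify(word):
    """Split a word into syllables; a word with no vowel gives an empty list."""
    units = split_units(word)
    syllables, current = [], ""
    for unit, is_vowel in zip(units, vowel_flags(units)):
        current += unit
        if is_vowel:
            # Each vowel closes the syllable holding the characters before it.
            syllables.append(current)
            current = ""
    if current and syllables:
        # Trailing characters join the last vowel to their left.
        syllables[-1] += current
    return syllables
```

Under these assumptions, `syllabify("chor")` gives one syllable, `syllabify("opaiin")` gives `["o", "paiin"]`, and `syllabify("qokeedy")` gives `["qo", "kee", "dy"]`. A word such as [s] has no vowel and returns an empty list, which corresponds to the 0 syllable group.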

The whole of the wordlist was thus broken down into five smaller lists for words of 0 to 4 syllables. The statistics for each list are as follows:

0 syllables: 22 types, 634 tokens

1 syllable: 280 types, 11640 tokens

2 syllables: 500 types, 11504 tokens

3 syllables: 110 types, 2589 tokens

4 syllables: 1 type, 5 tokens

The list for two syllable words held the greatest number of word types, but one and two syllable words had roughly the same number of word tokens. Thus the number of tokens per type is highest for one syllable words (about 42), with tokens per type for two and three syllable words about joint lowest at roughly 23 each (four syllable words are technically lower, but with only one example).
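For anyone who wants to check the arithmetic, the short Python sketch below recomputes the totals, the tokens per type, and the share of one and two syllable words from the figures listed above. The variable names are mine and purely illustrative.

```python
# syllable count -> (word types, word tokens), copied from the list above
stats = {0: (22, 634), 1: (280, 11640), 2: (500, 11504), 3: (110, 2589), 4: (1, 5)}

total_types = sum(types for types, _ in stats.values())
total_tokens = sum(tokens for _, tokens in stats.values())

# Average number of tokens per word type in each syllable group.
tokens_per_type = {k: tokens / types for k, (types, tokens) in stats.items()}

# Share of one and two syllable words among all types and all tokens.
type_share = (stats[1][0] + stats[2][0]) / total_types
token_share = (stats[1][1] + stats[2][1]) / total_tokens

for k in sorted(stats):
    print(f"{k} syllables: {tokens_per_type[k]:.1f} tokens per type")
print(f"1-2 syllables: {type_share:.1%} of types, {token_share:.1%} of tokens")
```

The totals come out at 913 types and 26,372 tokens, matching the size of the wordlist.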

The number of one syllable words is likely limited by the total number of possible syllables in the Voynich language. Although some additional syllables appear in two and three syllable words which do not appear alone, there is a finite ceiling to how many one syllable words there can be, and this is relatively low due to the rigid syllable structure.

The most interesting aspects occur at either end of the distribution in those words of no syllables and four syllables. The possibility of words without vowels should not be shocking, but it does prompt us to give some explanation. It could be that other vowel characters exist, or that not all words are fully written, or that characters are not always used for a sound. However, the small percentage of tokens which have no syllables suggests that it is not a great problem for my syllabification.

Yet the almost complete lack of words longer than three syllables is rather unexpected. It is often repeated that the Voynich text lacks the short words common to many languages, but the truth is that it lacks long words. Over 85% of both word types and word tokens are one or two syllable words.

It is noteworthy that most of the multi-syllable words follow the breakdown rules which I put forward in my article on high level structure. One syllable of a word can be anything (the ‘Free’ syllable), but the other one or two must select from a much narrower pool. Moreover, the number of possible choices narrows further depending on whether the additional syllable is to be put before or after the Free syllable (and there may be only one before and one after).

I believe the results of the syllabification were fairly successful, and that my method is at least as sound as any other. The outcome is a fairly regular set of syllables put together to form words in a fairly regular way. If further examination of the results gives us more insight into the structure of Voynich words then we can be sure that there is some basis for regarding the syllabification as at least partly right. Each of the four wordlists from none to three syllables will be examined in more detail in future posts.


4 thoughts on “Syllabification”

  1. Hi Emma, and thanks for this. How exactly did you decide that the ‘vowel’ characters in your analysis are in fact vowels? You say that:

    “Syllables were discovered using a fairly simple process: [a, y, o] are vowels and every instance of those indicates a syllable; [e] sequences are vowels if not immediately followed by [a, y, o]; and [ch, sh] count as vowels if not immediately followed by an [e] sequence or [a, y, o]”

    but that seems to make assumptions about the status of several Voynich letters. I wonder how you got to those decisions? For example that “[ch, sh] count as vowels if not immediately followed by an [e] sequence or [a, y, o]”.



    • Hi Stephen, I found that [a, y] and [o] have similar distributions and so form a class of characters. This class is very common and occurs in almost every word. Moreover, the longer a word is the more often they occur. So I identified it as the most likely candidate for a class of vowels. When I applied these as vowels to words I found that the breakdown into syllables was regular and, further, that those syllables were also structured within words.

      The question of whether [ch, sh] and [e, ee] act like vowels in certain positions is a difficult one. Certainly for [e, ee] I believe that a process of [y] deletion is taking place when [y] is in a medial environment after [e, ee]. We can see this by comparing words ending [edy] and those ending [eody]. When the [dy] is removed the former have few counterparts ending [e] while the latter have plenty ending [eo]. Conversely, while there are lots of words ending [ey] there are few ending [eydy].

      I generalized this idea of [y] deletion to the environment medially after [ch, sh], where word structure suggests it also occurs. However, while Marco Ponzi has provided some statistics which support [y] deletion after [e, ee], the same statistics don’t support it after [ch, sh]. I’m uncertain whether the latter is a real process or not.

      I am also, of course, unsure of what [y] deletion really signifies. It could be that the vowel is present but unwritten; that it changes and the changed vowel is unwritten; or that [e, ee, ch, sh] are able to act like vowels and [y] is elided.


      • Thanks for the useful clarification, very helpful. So are you now *sure* that these are vowels, or is your work on syllabification to some extent based on assumptions, and therefore possibly subject to modification as you get more insights?


        • I’m not sure, though I believe that such a thing is ultimately provable. I think that my assignment of these characters as vowels is not an assumption but a ‘best fit’. There are, however, some assumptions underlying it, such as the presence of a natural language and no enciphering. I expect that some elements of [y] deletion will be modified, but I feel quite confident otherwise.

