Last year I laid out my understanding of the low level and high level structure of Voynich words. As I consider the Voynich manuscript to be linguistic, I am happy to believe that the two structures relate to syllables. Specifically, the low level structure shows how a syllable itself is to be constructed, and the high level structure shows how syllables come together in words.
Now, after some delay, I have taken this ideal syllable and word structure and sought to apply it to actual words in the Voynich manuscript. For the purposes of what follows, a word type is a word with a specific spelling, such as [chor] or [opaiin], and a word token is an individual occurrence of a word type, so [chor] has 218 tokens and [opaiin] has 13 tokens.
I took the text of the whole Voynich manuscript and filtered out all words with fewer than five tokens or with uncertain readings. The threshold of at least five tokens was chosen to provide 1) a wordlist short enough to sort by hand, and 2) a reasonable likelihood that the words are valid and not the result of writing or reading mistakes. My wordlist thus held 913 word types totalling 26,372 tokens, roughly two thirds of the total tokens in the manuscript.
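As a rough illustration of this filtering step, here is a minimal Python sketch. It assumes the text has already been split into a flat list of word tokens, and that uncertain readings are flagged with a marker character such as `?` or `*`. Both the function name `build_wordlist` and the marker characters are assumptions made for the example, not details of the transcription actually used.

```python
from collections import Counter

def build_wordlist(tokens, min_tokens=5, uncertain_marks="?*"):
    """Return {word type: token count} for types with at least min_tokens tokens.

    Tokens containing any character in uncertain_marks are discarded first.
    The marker characters are an assumption; adjust them to the transcription used.
    """
    certain = [t for t in tokens if not any(mark in t for mark in uncertain_marks)]
    counts = Counter(certain)
    return {word: n for word, n in counts.items() if n >= min_tokens}

# Toy input (counts invented for the example, not the manuscript's real figures).
sample = ["chor"] * 6 + ["opaiin"] * 5 + ["dar"] * 4 + ["ch?r"] * 7
wordlist = build_wordlist(sample)
print(wordlist)                # {'chor': 6, 'opaiin': 5}
print(len(wordlist))           # 2 word types
print(sum(wordlist.values()))  # 11 tokens
```

Here [dar] is dropped for having only four tokens, and the uncertain reading is dropped regardless of how often it occurs.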
I split every word type on the list to show the syllables it contains, and then sorted the words into lists by number of syllables. Syllables were identified using a fairly simple process. First, [a, y, o] are vowels, and every instance of them indicates a syllable. Second, [e] sequences are vowels if not immediately followed by [a, y, o]. Third, [ch, sh] count as vowels if not immediately followed by an [e] sequence or [a, y, o]. Then, working from left to right, every character is part of the syllable belonging to the next vowel on the right. The exception is at the end of a word: where there are no more vowels to the right, the remaining characters belong to the syllable of the last vowel on the left.
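The rules above can be sketched in Python as follows. This is an illustrative reading of the rules rather than a record of how the lists were actually produced. It assumes words are written in an EVA-style transliteration where [ch] and [sh] are spelled as two letters. The names `syllabify` and `is_vowel` are invented for the example, and a word with no vowel is reported as an empty list (zero syllables).

```python
import re

# Split a word into units: [ch], [sh], a run of [e], or any single character.
UNIT = re.compile(r"ch|sh|e+|.")
PLAIN_VOWELS = {"a", "y", "o"}

def is_vowel(unit, next_unit):
    """Apply the three vowel rules to one unit, given the unit that follows it."""
    if unit in PLAIN_VOWELS:
        return True
    if unit.startswith("e"):
        # An [e] sequence is a vowel unless [a, y, o] follows immediately.
        return next_unit not in PLAIN_VOWELS
    if unit in ("ch", "sh"):
        # [ch, sh] are vowels unless an [e] sequence or [a, y, o] follows.
        return not (next_unit in PLAIN_VOWELS or next_unit.startswith("e"))
    return False

def syllabify(word):
    """Return the list of syllables in word (empty if the word has no vowel)."""
    units = UNIT.findall(word)
    syllables, pending = [], ""
    for i, unit in enumerate(units):
        next_unit = units[i + 1] if i + 1 < len(units) else ""
        pending += unit  # characters attach to the next vowel on the right
        if is_vowel(unit, next_unit):
            syllables.append(pending)
            pending = ""
    if pending and syllables:
        syllables[-1] += pending  # trailing characters join the last vowel
    return syllables

print(syllabify("chor"))     # ['chor']
print(syllabify("opaiin"))   # ['o', 'paiin']
print(syllabify("qokeedy"))  # ['qo', 'kee', 'dy']
```

Under this reading [chor] is one syllable and [opaiin] is two. Edge cases not covered by the stated rules, such as unusual character combinations, may well be handled differently in the hand-sorted lists.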
The whole of the wordlist was thus broken down into five smaller lists for words of 0 to 4 syllables. The statistics for each list are as follows:
0 syllables: 22 types, 634 tokens
1 syllable: 280 types, 11,640 tokens
2 syllables: 500 types, 11,504 tokens
3 syllables: 110 types, 2,589 tokens
4 syllables: 1 type, 5 tokens
The list for two syllable words held the greatest number of word types, but one and two syllable words had roughly the same number of word tokens. Thus the number of tokens per type is highest for one syllable words (about 42), with two and three syllable words about joint lowest (about 23 each). Four syllable words are technically lower, but with only one example.
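The tokens per type figures can be checked directly from the counts listed above. The short Python sketch below does only that arithmetic. The names `STATS` and `tokens_per_type` are made up for the example.

```python
# Syllable count -> (word types, word tokens), copied from the list above.
STATS = {
    0: (22, 634),
    1: (280, 11640),
    2: (500, 11504),
    3: (110, 2589),
    4: (1, 5),
}

def tokens_per_type(stats):
    """Return {syllable count: average number of tokens per word type}."""
    return {n: tokens / types for n, (types, tokens) in stats.items()}

for n, ratio in tokens_per_type(STATS).items():
    print(f"{n} syllables: {ratio:.1f} tokens per type")
# 0 syllables: 28.8 tokens per type
# 1 syllables: 41.6 tokens per type
# 2 syllables: 23.0 tokens per type
# 3 syllables: 23.5 tokens per type
# 4 syllables: 5.0 tokens per type
```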
The number of one syllable words is likely limited by the total number of possible syllables in the Voynich language. Although some syllables appear in two and three syllable words but never alone, there is a finite ceiling to how many one syllable words there can be, and this is relatively low due to the rigid syllable structure.
The most interesting aspects occur at either end of the distribution, in those words of no syllables and four syllables. The possibility of words without vowels should not be shocking, but it does call for some explanation. It could be that other vowel characters exist, or that not all words are fully written, or that characters are not always used for a sound. However, the small percentage of tokens which have no syllables (about 2.4%) suggests that it is not a great problem for my syllabification.
Yet the almost complete lack of words longer than three syllables is rather unexpected. It is often repeated that the Voynich text lacks the short words common to many languages, but the truth is that it lacks long words. Over 85% of both word types and word tokens are one or two syllable words.
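The 85% figure follows from the counts given earlier. As a quick check, this Python sketch computes the share of one and two syllable words among types and among tokens. The name `short_word_share` is invented for the example.

```python
# Syllable count -> (word types, word tokens), as listed earlier in this post.
COUNTS = {0: (22, 634), 1: (280, 11640), 2: (500, 11504), 3: (110, 2589), 4: (1, 5)}

def short_word_share(counts, short=(1, 2)):
    """Return (percent of types, percent of tokens) with a syllable count in short."""
    total_types = sum(types for types, _ in counts.values())
    total_tokens = sum(tokens for _, tokens in counts.values())
    short_types = sum(counts[n][0] for n in short)
    short_tokens = sum(counts[n][1] for n in short)
    return 100 * short_types / total_types, 100 * short_tokens / total_tokens

type_share, token_share = short_word_share(COUNTS)
print(f"types:  {type_share:.1f}%")   # types:  85.4%
print(f"tokens: {token_share:.1f}%")  # tokens: 87.8%
```

By the same counts, words of three or more syllables make up about 12% of types and just under 10% of tokens.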
It is noteworthy that most of the multi-syllable words follow the breakdown rules which I put forward in my article on high level structure. One syllable of a word can be anything (the ‘Free’ syllable), but the other one or two must be selected from a much narrower pool. Moreover, the number of possible choices narrows further depending on whether the additional syllable is to be put before or after the Free syllable (and there may be only one before and one after).
I believe the results of the syllabification were fairly successful, and that my method is at least as sound as any other. The outcome is a fairly regular set of syllables put together to form words in a fairly regular way. If further examination of the results gives us more insight into the structure of Voynich words, then we can be sure that there is some basis for regarding the syllabification as at least partly right. Each of the four wordlists from none to three syllables will be examined in more detail in future posts.