Following on from my earlier post about the low level word structure, I now wish to discuss high level word structure. The low level word structure dealt with splitting words into sections based on the presence of Primes (<y, o> and their reflexes), and attempted to model how such sections were constructed from separate characters. The high level structure deals with how sections come together to make words. This is important because many words in the Voynich manuscript are made up of more than one section and they show definite rules on composition.
It is best to think of every word as having one section body which can consist of any valid combination of characters, so long as it abides by the low level structure. So it could be a single Prime, such as <y>, or a complex body, such as <kcheo>. A single section word could consist of this and nothing more, with the option of adding a tail.
A word with multiple sections could have another section before, after, or both before and after (relatively few words are more than three sections long). I would like to name these sections as: Free, the section which is composed of a ‘free’ selection of characters; Fore, the section which is placed before the Free section; and After, the section which is placed after the Free section.
However, Fore and After sections are constrained in their composition, unlike Free sections. To reveal more about the high level structure of words we must look at these constraints. We will look first at After sections, which are the same whether or not a Fore section is present, and then look at Fore sections, which differ depending on whether an After is present.
After sections are conspicuous in their most common form. Anybody who has engaged with the text of the Voynich manuscript knows that many words end with <–dy>. This is by far the most common After section. Others are <ly, ry, ldy>, though these are much less common. It seems that <chy> is also possible, but it is less common still. Nothing else seems to occur in great numbers.
There are two important things to note about After sections. First, they all take the Prime <y>. Words ending in <o> do occur in the text of the Voynich manuscript (though only about one fifteenth as often as those ending in <y>), but it seems that they are all words without an After section, ending instead in a Free section. Second, After sections seldom take tails. The most common example I can find is <otedar> (11 tokens). Others occur, but in small numbers.
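As a rough illustration of the After sections listed above, here is a minimal Python sketch that peels a candidate After section off the end of a transliterated word. The names AFTER_SECTIONS and split_after are my own inventions for this example, and the match is purely on spelling: a word such as <chody> ends in the letters <dy> without necessarily containing an After section, so the result should be treated as a candidate only.

```python
# Candidate After sections named in the text, longest first so that
# <ldy> is tried before <dy>. The ordering is an assumption of this sketch.
AFTER_SECTIONS = ("ldy", "chy", "dy", "ly", "ry")


def split_after(word):
    """Return (remainder, after) for a transliterated word.

    `after` is the empty string when no candidate After section is found,
    or when the whole word would be consumed by the match.
    """
    for suffix in AFTER_SECTIONS:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


if __name__ == "__main__":
    for w in ("okeedy", "chekeedy", "otedar"):
        print(w, split_after(w))
```

The remainder may lack a final Prime, since a <y> can go unexpressed where sections meet, so it is not necessarily a well-formed section on its own.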
Fore sections are a little more interesting, especially in the presence of After sections. A Fore section for a word without an After section may be either Prime, <y> or <o>, with the option of adding an <e> sequence and/or <ch, sh> to the left. For example, <o, y, sho, chy, cheo, sheey> are all valid Fore sections, among others. (Bear in mind, however, that a <y> at the end of a Fore section could transform into <a> or not be expressed when put next to the Free section, depending on its environment.) Additionally, if the Fore section consists of <o> alone it may take <q> to the left.
However, when a word has an After section the Fore section should normally be <y, o> without an <e> sequence or <ch, sh> to the left. Words which violate this rule are possible but much less common: compare <okeedy> (105 tokens) and <ykeedy> (30 tokens) with <chokeedy> (3 tokens) and <chekeedy> (4 tokens). But, again, a <q> may be added to the left of an <o>.
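The two sets of Fore section rules can be written down as a pair of regular expressions. This is only a sketch of the rules as I have stated them: the names FORE_NO_AFTER, FORE_WITH_AFTER and fore_is_typical are made up for the example, the patterns describe the underlying form of the section (before any final <y> turns into <a> or goes unexpressed), and a False result means "less common", not "impossible".

```python
import re

# Without an After section: optional <ch>/<sh>, optional <e> sequence,
# then a Prime; or <q> before a bare <o>.
FORE_NO_AFTER = re.compile(r"(?:ch|sh)?e*[yo]|qo")

# With an After section: normally a bare Prime, with optional <q> before <o>.
FORE_WITH_AFTER = re.compile(r"q?o|y")


def fore_is_typical(fore, has_after):
    """True if `fore` fits the usual pattern for its context."""
    pattern = FORE_WITH_AFTER if has_after else FORE_NO_AFTER
    return pattern.fullmatch(fore) is not None


if __name__ == "__main__":
    for fore in ("o", "y", "sho", "chy", "cheo", "sheey", "qo"):
        print(fore, fore_is_typical(fore, False), fore_is_typical(fore, True))
```

Under these patterns the <o> of <okeedy> and the <y> of <ykeedy> count as typical before an After section, while the <cho> of <chokeedy> does not, which matches the token counts given above.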
The typical Voynich word has a very definite structure. Although such structure has been pointed out for many years, the use of Primes to split up words sheds new light on its workings. Classifying word sections according to their position in a word seems to work well, with the rules on constraints being relatively clear and rather simple. I believe this clarity and simplicity is a good sign, showing that we have, at last, some real insight into word structure.
A few points regarding high level word structure should feed back into low level word structure, further simplifying it. Sections such as <shckhy> and <pchdy>, although accepted as single sections for that analysis, may in fact be fusions of a Fore and Free section (<shy> + <ckhy>) and a Free and After section (<pchy> + <dy>) from which a <y> has been deleted or left unexpressed.
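For the Free plus After case, the proposed reading can be sketched mechanically: if a section ends in a candidate After section and what precedes it does not end in a Prime, restore a <y>. The function name restore_free_prime is hypothetical, and the sketch assumes the missing Prime is always <y>, as in the <pchdy> example. It does not attempt the Fore plus Free case (<shckhy>), where the join is harder to locate from spelling alone.

```python
PRIMES = ("y", "o")
# Candidate After sections from the text, longest first (an assumption here).
AFTER_SECTIONS = ("ldy", "chy", "dy", "ly", "ry")


def restore_free_prime(section):
    """Propose a (free, after) reading for a possibly fused section.

    Returns None when no candidate After section is found, or when the
    part before it already ends in a Prime (nothing to restore).
    """
    for after in AFTER_SECTIONS:
        if not section.endswith(after):
            continue
        body = section[: -len(after)]
        if body and not body.endswith(PRIMES):
            return body + "y", after
    return None


if __name__ == "__main__":
    print(restore_free_prime("pchdy"))
    print(restore_free_prime("chody"))
```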
It should also be noted that what other analyses have seen as words composed of beginnings and endings without a middle, such as <oly> and <chody>, are in fact a Free section plus a Fore or After section. Although it might be ambiguous as to which section is which, the presence of the same words but with tails, such as <olar> and <chodaiin>, suggests that the second section in each word is a Free section and not an After section.