Length of [e] Sequences 1

The character [e] is one of only two characters in the Voynich script which is regularly found repeated. Both [ee] and [eee] occur enough times to be valid sequences. As all three typically occur in the same position within words, they can be generically thought of as ‘[e] sequences’ filling a specific structural slot.

Although we can guess from their similar appearance and position that [e] sequences are a valid ‘family’ within the wider script, why one length is chosen over another is unknown. It may be that they encode different sounds or meanings. It may also be that they are conditioned by their environment. They have also sometimes been linked to benches [ch, sh], which have a related appearance and often neighbour [e] sequences in Voynich words.

I don’t expect in this post to answer these questions, but I want to set out some statistics and thoughts which seem pertinent.

Jorge Stolfi, in his page outlining his Grammar for Voynichese Words, provides some interesting tables for [e] sequences. In words which don’t contain a gallows the following counts for the different lengths are found:

[e]: 68
[ee]: 185
[eee]: 90

We can clearly see that [ee] is the most common, followed by [eee] at about half as common, then [e], which is only a little over a third as common as [ee]. However, these numbers don’t include those [e] sequences following a bench character, which have quite different counts (let [B] stand for any bench, [ch, sh]):

[Be]: 3851
[Bee]: 917
[Beee]: 24

The counts are generally much higher, but the ratios are totally different. The most common is [e]; [ee] is only about a quarter as common, and [eee] is less than 1% as common.

The difference in the ratios is also found in [e] sequences which follow gallows (let [G] stand for any non–bench gallows, [t, k, p, f]):

[Ge]: 2160
[Gee]: 2339
[Geee]: 189

[GBe]: 1102
[GBee]: 101
[GBeee]: 2

Although neither of these two sets of token counts matches those above for environments without gallows, their general direction is the same. Those [e] sequences without a bench before them have high [ee] and significant [eee]. Those [e] sequences following a bench have low [ee] and insignificant [eee].
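To make the comparison concrete, the counts quoted above can be turned into proportions. The following is only a minimal sketch in Python: the environment labels ("none", "B", "G", "GB") and the function names are illustrative choices, not anything taken from Stolfi's tables.

```python
# Token counts as quoted in the tables above, keyed by the environment
# preceding the [e] sequence. The labels are illustrative shorthand:
# "none" = no gallows or bench, "B" = bench, "G" = gallows, "GB" = gallows + bench.
COUNTS = {
    "none": {"e": 68, "ee": 185, "eee": 90},
    "B": {"e": 3851, "ee": 917, "eee": 24},
    "G": {"e": 2160, "ee": 2339, "eee": 189},
    "GB": {"e": 1102, "ee": 101, "eee": 2},
}


def shares(counts):
    """Return each length's share of all [e] sequences in one environment."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def ratio_to_single(counts):
    """Return each length's count relative to the count of single [e]."""
    return {seq: n / counts["e"] for seq, n in counts.items()}


for env, counts in COUNTS.items():
    rounded = {seq: round(share, 3) for seq, share in shares(counts).items()}
    print(env, rounded)
```

Comparing `ratio_to_single(COUNTS["G"])` with `ratio_to_single(COUNTS["GB"])` shows the same split as the first two tables: [ee] outnumbers [e] when no bench precedes, and falls well below it when one does.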

I performed some follow–up tests, breaking the figures down with specific gallows characters and with following characters such as [y, o, d]. In each case the same pattern emerged: all other things being equal, [e] sequences are shorter after [ch, sh].
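A breakdown of this kind could be sketched as follows, assuming a list of words in EVA transcription is available. This is not the procedure used for the tests above, only an illustration. It classifies each [e] sequence by the glyph immediately before it, which is a simplification: a bench after a gallows is simply counted as "bench". The names `environment` and `count_e_runs` are hypothetical, the sample words are for demonstration only, and including [cph, cfh] among the bench gallows is an assumption.

```python
import re
from collections import Counter

E_RUN = re.compile(r"e+")  # a maximal run of EVA [e]

# Checked in this order, so bench gallows are recognised before plain benches.
BENCH_GALLOWS = ("cth", "ckh", "cph", "cfh")  # [cph, cfh] are an assumption
BENCHES = ("ch", "sh")
GALLOWS = ("t", "k", "p", "f")


def environment(prefix):
    """Classify the part of the word preceding an [e] run by its final glyph."""
    if prefix.endswith(BENCH_GALLOWS):
        return "bench gallows"
    if prefix.endswith(BENCHES):
        return "bench"
    if prefix.endswith(GALLOWS):
        return "gallows"
    return "other"


def count_e_runs(words):
    """Tally (environment, run length) pairs for every [e] run in the words."""
    tally = Counter()
    for word in words:
        for match in E_RUN.finditer(word):
            env = environment(word[: match.start()])
            tally[(env, len(match.group()))] += 1
    return tally


# Illustrative sample only; a real test would read a full transcription.
sample = ["otedy", "oteedy", "otchdy", "otechdy", "chedy", "sheey", "ckhey"]
for (env, length), n in sorted(count_e_runs(sample).items()):
    print(env, length, n)
```

Extending the key with the glyph that follows the run (for example [y, o, d]) would give the finer breakdown by following character.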

The same pattern holds after bench gallows [cth, ckh], suggesting that they create the same environment as [ch, sh]. This chimes with earlier observations on bench gallows and their possible relationship with [ch, sh], namely that they aren’t followed by [ch, sh].

Lastly, I want to mention an old observation by Currier, that [p, f] are never followed immediately by [e] sequences. They are, however, followed by [ch, sh], which can then be followed by [e] sequences. The [e] sequences in these cases adhere to the same pattern as for gallows as a whole: [ee] is much less common than [e].

I don’t know how to explain all the above observations. There is likely much more to add before we have the full picture. For example, most words won’t begin or end with [e], but what can we say about the few which do? What observations can we make when a word contains two separate [e] sequences?

Bench characters are clearly an important environment for [e] sequences. Why they seem to govern the length of [e] sequences is unknown, but worth taking up as an idea for future investigations.


10 thoughts on “Length of [e] Sequences 1”

  1. I created my own transcript, so I don’t depend on interpretations by others like Stolfi and Takahashi… I’ve noticed quite a few quadruple EVA-e sequences (I use “c” for this character).

    I observed cccc on folio 7r:8. The first unit is two EVA-e characters joined, as these shapes were often rendered in Carolingian text; the next two are farther apart. I suspect the first is intended to be read as a unit (I think that cc is probably intended to be interpreted separately from two cees that are not joined).

    On folio 21v:3 we also see quadruple-c, but this time the last two are closely joined and the first two are farther apart. Once again, I believe the more tightly coupled pair might be intended as a unit.

    On f57r:5 one finds quad-c at the beginning of a Vword. Note how they are carefully spaced so there’s no possibility of mistaking them as being coupled with their neighbors.

    On 68r2 outer ring, just below 4:00 o’clock, there is another instance of four in a row, but it’s hard to see. At first it might look like 4occco-gallows but there is a faint sign of a bench and the “o” doesn’t appear completely closed, so it might be 4occc and then a c-shape that is the left part of a bench. I’m not certain of this one, but I’m not convinced the glyph just before EVA-t is an “o” either.

    There are a dozen more quads on cosmo and plant pages, making the total approximately 15 to 17, depending how one reads the ones that are hard to see. The count also changes if one takes into consideration pairs that are tightly coupled versus glyphs that are spaced farther apart.

    I thought you might want to know about quads that are not recorded in other transcripts, and that it was a common scribal convention, in the early Middle Ages, to couple two c-shapes to create a single letter (one that may or may not be reflected in the VMS).


    • That’s very interesting, thanks. I certainly do worry that we’re hampered by the transcription in this particular case. It would be so easy for the transcriber to mistake [ch] and [ee].

      I think the idea that [e] sequences are single letters is something we need to consider very strongly. I’m trying to think of ways in which it could be proven, however.


  2. Speaking of transcription: I think it a great pity that we still use EVA in these analytical discussions. With the best will in the world, it can’t help newcomers unless they learn the flawed transcription, and there’s no way in the world that calling a glyph ‘e’ won’t impress the notion that the thing represents a vowel.

    There’s a digital script at voynich.ninja, which makes me wonder (a) whether it has problems known to the cryptographers and linguists but of which I’m ignorant and (b) whether it can’t be downloaded for use on other sites.

    Why not write Voynichese as she looks and perhaps get a new angle on it?


    • Yes, Voynichese fonts do exist and could be used on websites just as voynich.ninja does, and overall I like the idea. However, the ones that I know of still rely on the character distinctions being made roughly in the way that EVA does them, so *to some extent* we’re back at square one even if we do use them (that is, we still have the problem of assuming we’ve done the glyph identification correctly, even if we no longer call a particular character by the name of a Roman one). Of course, we could just refer to scans of the original now that those are widely available, but that has its own problems…


  3. Hi Emma,
    I don’t think it’s possible to assign specific meanings to the various ot*dy labels in Quire13, but they could suggest that sequences of EVA:ch and e are related.

    1. f77r has the EVA:otedy label next to one of the openings in the horizontal tube at the top
    2. The nymph at center left is labeled EVA:otchdy;
    3. next to this nymph there is a paragraph starting with EVA:otedy

    4. f82v begins with a nymph labeled EVA:otechdy
    5. Next to her there is another nymph labeled EVA:otedy
    6. The rainbow at the bottom is labeled EVA:oteedy

    7. EVA:otchdy appears again in a prominent position below the rainbow tube in f83r

    These seven occurrences of otedy, oteedy, otchdy, otechdy seem to be related both spatially and semantically. Label otedy is applied to both a tube and a nymph, so there must be a semantic overlap between the two kinds of subjects: it is unlikely that these words are totally unrelated. Another explanation could be that these are all accidentally different spellings of a single word, but this also seems strange: the two nymphs otechdy and otedy in f82v appear next to each other, so they must be different entities. A third possibility is that the words are both phonetically and semantically close, but not identical. At the moment, I cannot think of any parallel for this eventuality, but my superficial impression is that the third option is more likely than the alternatives.

    I understand that we can only speculate, but do you have any thoughts about this group of words and images?


    • It’s clear that these words are similar. The [e] and [ch] fit into the same ‘slot’ of the word and are structurally interchangeable. Yet, as you point out, the writer must have often seen examples of such similar words when writing but still chosen a different form. So [otedy] and [otchdy] are nearby (and one must have been written first!) and thus must have been different in the mind of the writer.

      That doesn’t mean, however, that the difference is great. We don’t know what these characters might encode, or the difference between them, so we could be dealing with something trivial.

      The question is what causes one to occur rather than the other? If it is something phoneme-related then the difference between the words has the potential to be really significant. If the cause is line position, or even dialect, then the difference could be much smaller.

      Here are two points that are worth considering:
      both [ch] and [sh] are truly universal characters in the text, on all pages (though [sh] has a significant line start distribution);
      [e], and more strongly [ee], are much more common in Currier B than A, and there are pages where they are rare or even missing.

      What this initially suggests to me is that [e] is not as core a part of the script as [ch, sh], at least throughout the text as a whole. Whatever drives the writer to write [e] must be based partly on something above the individual word (such as Currier language, if not something else).


  4. If you don’t mind indulging the hypothesis of a rank amateur, could the h in a bench be replacing an e, or otherwise causing it to be deleted? If it did, then you would have a correlation in the frequencies ee/he > eee/hee > e and this would hold true for Be > Bee also. Then heee and Beee would equate to eeee and be either errors or just really rare, I don’t have the standing to say which is more likely. (Of course the frequencies for h alone and e alone aren’t going to match up because how would you distinguish between “h replacing e” and “h that just isn’t followed by e”? I guess if h was more prevalent than e overall, then that might be an indirect bit of proof. But you’re very clever at coming up with ways to test these things, so you’ll probably see a minimal pair or something to look for that I wouldn’t.)

    • I think your hypothesis is very good and well worthy of further investigation. (I’ve seen many researchers not come up with a hypothesis half as good in all their time.)

      I have to admit that the possibility has crossed my mind too, but coming from a totally different angle: namely, that the first part of [ch] (let’s forget [sh] for the moment, as anything we say for one goes for both) seems to sometimes occur alone and sometimes attached to other glyphs, such as [cy] and [co]. This would make sense if [c] was a separate glyph which usually came before [e] and simply attached to it. It would also make sense of bench gallows: [ckh] is actually [cke], the crossbar simply extended to give it something to attach to.

      There are problems with this hypothesis. The first is that we have to show that [h], as a separate glyph, doesn’t really exist. This might be possible. The second is that although [chch] is rare with only 18 tokens, [chc] is common as part of the phrase [chckh] or [chcth]. It’s not insurmountable, but it lessens the explanatory power of the hypothesis if bench gallows have to be excluded.

