What is [m]?

The character [m] is a key part of the LAAFU puzzle. It consistently appears at the end of a word (95% of all occurrences) and regularly as the last character of a line (67% of all occurrences). Because it has a restricted distribution conditioned by the line of text, it is a LAAFU feature.
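As a minimal sketch of how figures like these could be counted, assume the transcription is loaded as a list of lines, each a list of words. The function name `position_shares` and the sample lines are made up for illustration; they are not real counts from the manuscript.

```python
def position_shares(lines, ch="m"):
    """Return (word_final_share, line_final_share) for character ch.

    lines: list of lines, each a list of word strings (hypothetical format).
    Shares are fractions of all occurrences of ch in the text.
    """
    total = word_final = line_final = 0
    for words in lines:
        for i, word in enumerate(words):
            total += word.count(ch)
            if word.endswith(ch):
                word_final += 1
                # Last character of the line = last character of the last word.
                if i == len(words) - 1:
                    line_final += 1
    if total == 0:
        return 0.0, 0.0
    return word_final / total, line_final / total


# Invented sample, for illustration only.
sample = [["daiin", "chol", "dam"], ["otam", "shey", "qokar"], ["chor", "amy", "dar"]]
shares = position_shares(sample)
```

Run against a full transcription, the two returned fractions would correspond to the 95% and 67% quoted above.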

Though we do not know the ultimate cause of the distribution of [m] (and therefore of LAAFU), we can still investigate the character as a problem. What could make a character appear so often at the end of a line and less often away from that position?

Our starting point should be, as always, that any process which can generate a text the length of the Voynich manuscript, and with the apparent structure of words and lines, must work by a set of regular rules. The appearance of [m] at the end of lines is not simply random but happens for a reason inherent in the creation of the text. The best way to discover the reason (or at least how the creation process worked) is to make the text more homogeneous. That is, how do we get the ends of the lines to look like the rest of the lines?

Some time ago I discussed Grove words, which present a similar problem. It was clear that the semantic content of the text could not or would not cause words to take specific characters. There’s also no reason here why words with a particular meaning would sit at the end of a line.

Likewise, it is hard to believe that the word order would be so free as to let such words be moved to the end of a line. Indeed, lines ending [m] are most common in Quire 20 where the lines are longest and would demand the greatest freedom of word order.

As with Grove words, and with other LAAFU effects, it is easiest to imagine that transformations are made to words already in a given position. So, for example, Grove words and line-first words have characters added to the beginning of an existing word. The same could apply here: [m] might be added to the end of line-final words which are valid without that character.

Yet in some words [m] is preceded by [i], an unlikely word ending which would be the outcome were the [m] removed. It is thus more reasonable to propose that the final character of a word is transformed into [m] when that word occurs in certain environments, one of which is the end of a line.
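One way to check this point is to strip the final [m] from each word and see whether the remainder is attested elsewhere. The sketch below assumes a plain list of word tokens; `strip_final` and the sample words are hypothetical and only show the shape of the test.

```python
def strip_final(words, ch="m"):
    """Map each word type ending in ch to whether the word minus ch is attested."""
    vocabulary = set(words)
    return {
        w: w[:-1] in vocabulary
        for w in vocabulary
        if len(w) > 1 and w.endswith(ch)
    }


# Invented sample, for illustration only.
sample = ["dam", "da", "daim", "otam", "ota", "dar"]
result = strip_final(sample)
```

Words which map to False, as the [i]-final remainder does here, are the ones which argue against simple addition of [m].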

So we are left with the question: if another character is transformed into [m], which character is that? Even if we are only accepting this transformation as a working hypothesis, we should seek to identify the best fit character.

Below are some considerations.

What Comes Last

Because [m] occurs mostly at the end of words, it can only be replacing a character which also appears at the end of words. The character doesn’t have to only appear in that position, however, as the word–final occurrence could be a condition of what causes [m].

The most common word–final characters are, in descending order (with percentages): [y] 40%, [n] 16%, [l] 16%, [r] 14%, [s] and [o] with 3% each, and [d] with 2%. Given that [m] itself makes up 3% of all word–final characters, and that it is unlikely to affect a majority of all instances of a character word–finally as it occurs mostly at the end of a line, we can guess that [y, n, l, r] are the most likely candidates.
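A ranking like this can be sketched with a simple counter over the last character of each word. The function name and the sample word list are assumptions for illustration and do not reproduce the percentages above.

```python
from collections import Counter


def word_final_distribution(words):
    """Return {character: share of word-final positions}, most common first."""
    counts = Counter(w[-1] for w in words if w)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.most_common()}


# Invented sample, for illustration only.
sample = ["daiin", "chedy", "shedy", "qokal", "dar", "dam", "chol", "okeey"]
dist = word_final_distribution(sample)
```

Note that this treats each transcription character as one unit, so the result depends on the transcription alphabet used.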

What Comes Next Last

We mentioned above that sometimes the character [i] comes before [m], which it does in about 70 tokens. It is one of just a few characters which come before [m], the percentages of which are: [a] 71%, [o] 18%, and [i] 6%. All other characters total less than 5%.

Given that [y] doesn’t occur much after any of these three characters we can rule it out as a candidate. The percentages for [n, l, r] are as follows:

Before [n]: [a] 2%, [o] <1%, [i] 97%

Before [l]: [a] 31%, [o] 56%, [i] <1%

Before [r]: [a] 45%, [o] 39%, [i] 10%

We can see two things instantly: [n] is a bad candidate because it occurs almost exclusively after [i], and neither [l] nor [r] is a perfect fit. From these statistics it would seem [r] is the best fit, though not a convincingly close one.
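To put a rough number on "best fit", the percentages quoted above can be compared with a simple distance. This is only a sketch: the sum of absolute differences is my choice of measure for illustration, and "<1%" is entered as 0.5, which is an assumption.

```python
# Percentages of [a], [o], [i] before each character, as quoted above.
# "<1%" is entered as 0.5, which is an assumption.
profiles = {
    "m": {"a": 71, "o": 18, "i": 6},
    "n": {"a": 2, "o": 0.5, "i": 97},
    "l": {"a": 31, "o": 56, "i": 0.5},
    "r": {"a": 45, "o": 39, "i": 10},
}


def l1_distance(p, q):
    """Sum of absolute differences between two percentage profiles."""
    return sum(abs(p[k] - q[k]) for k in p)


distances = {c: l1_distance(profiles["m"], profiles[c]) for c in "nlr"}
best = min(distances, key=distances.get)
```

With these inputs the distances come out as 177.5 for [n], 83.5 for [l], and 51 for [r], so [r] is nearest to [m] but still some way off.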

Where Does It Come?

Earlier we said that two thirds of all [m] occur at the end of lines. The total number of tokens is about 780. Any character which is transformed into [m] word and line finally should show a distinct drop in occurrences in that context.

The curious truth is that all three of [n, l, r] show lower occurrences word–finally at the end of a line: [n] is 2.5 percentage points lower, [l] is 4.5 points lower, and [r] is 6.5 points lower. The character [r], with the greatest drop, would seem once again to be the best candidate.

However, [m] is more than 13 percentage points higher at the end of the line, twice the amount by which [r] drops. Indeed, it is about equal to the total drop of all three characters.
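The three drops quoted above sum to 2.5 + 4.5 + 6.5 = 13.5 points, which is why the rise in [m] roughly matches them taken together. A sketch of how such shifts could be measured follows, assuming lines given as lists of words; `final_char_shift` and the sample are hypothetical.

```python
from collections import Counter


def final_char_shift(lines):
    """Percentage-point change in each word-final character's share:
    line-final words compared with all other words."""
    end, rest = Counter(), Counter()
    for words in lines:
        for i, word in enumerate(words):
            if not word:
                continue
            bucket = end if i == len(words) - 1 else rest
            bucket[word[-1]] += 1
    n_end, n_rest = sum(end.values()), sum(rest.values())
    if n_end == 0 or n_rest == 0:
        return {}
    # A Counter returns 0 for characters missing from one of the two buckets.
    return {c: 100 * (end[c] / n_end - rest[c] / n_rest) for c in set(end) | set(rest)}


# Invented sample, for illustration only.
sample = [
    ["dar", "chol", "dam"],
    ["shor", "daiin", "dam"],
    ["qokar", "chedy", "dar"],
    ["okal", "dar", "chedy"],
]
shift = final_char_shift(sample)
```

A positive value means the character is more common word–finally at the end of a line, and a negative value means it is less common there.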

The character [y], which we have already dismissed as a potential candidate, does not change in frequency at the end of the line.

What Does It Look Like?

Although we cannot be sure that the strokes from which Voynich characters are composed are meaningful, we have seen some indication of that. The conditioning environment for [y] becoming [a] is the following character containing a short stroke like [i]. Likewise, we have seen patterns in the occurrences of gallows and their composition. It is reasonable to consider whether [m] bears any similarity to our three candidates.

All three characters, [n, l, r], contain the same short [i] stroke which [m] contains. Also, all three have an additional stroke emerging rightward from the character, much as [m] does. However, [n] has the rightward stroke emerging from the bottom of the [i], whereas [l, r, m] have it emerging from the top.

The rightward stroke of [m] follows a more similar course to [r] than [l]. Whereas the rightward stroke of [l] quickly turns down and leftward, crossing the [i], in [r] and [m] the rightward stroke continues right before turning up and leftward. For [r] the stroke continues leftward, while in [m] it dives down and rightward through its earlier path.

Once again, the best match for [m] in graphical terms is [r].

What Words Do We End Up With?

Our goal, mentioned at the start of this post, is to attempt to ‘restore’ the text to its state before [m] was present. We meet success if the text becomes more regular and thus one step nearer to its creation process.

If we take [r] as the best fit for [m], and change all instances of [m] to [r], what words do we end up with? And do they resemble existing words for [r]?

The short answer is: somewhat.

It is certainly true that a common word ending with [r] tends to be paralleled by a common word ending with [m]. So, for example, [dar] accounts for nearly 6% of all words ending with [r], and [dam] for about 9% of all words ending with [m].

The match is not perfect, however: for most words where [r/m] is preceded by [a] the percentage for [m] is higher, while for those preceded by [o/i] it is lower. Also, there are a number of instances of words ending [ram] which are barely paralleled by words ending [rar].
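The comparison can be sketched by taking each word ending in [r] or [m], removing the final character, and comparing how large a share each remaining stem has within its own group. The function names and the sample list are assumptions for illustration, not real counts.

```python
from collections import Counter


def stem_shares(words, final):
    """Share of each stem among the words ending in the given character."""
    stems = Counter(w[:-1] for w in words if len(w) >= 2 and w.endswith(final))
    total = sum(stems.values())
    return {s: n / total for s, n in stems.items()}


def compare_endings(words, a="r", b="m"):
    """Return {stem: (share among a-final words, share among b-final words)}."""
    shares_a, shares_b = stem_shares(words, a), stem_shares(words, b)
    return {
        s: (shares_a.get(s, 0.0), shares_b.get(s, 0.0))
        for s in set(shares_a) | set(shares_b)
    }


# Invented sample, for illustration only.
sample = ["dar", "dar", "chor", "otar", "dam", "dam", "otam", "aram"]
pairs = compare_endings(sample)
```

Stems with a large share on one side and a zero on the other, like the [ram] words mentioned above, are the ones which fit the [r] hypothesis least well.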


The character [m] may well be a word and line final variant of another character. If this is so then the best character fit is [r]. However, the fit is not perfect.

The underlying reason for the occurrence of [m] is unknown, but the hypothesis that [m] is a variant of [r] gives us a point from which we can explore further. One specific question is why [r] still occurs word and line finally, and why only a portion of that character is transformed into [m]. Research into the specific environments of these two characters at the end of lines may reveal differences which take us further toward the ultimate cause.


10 thoughts on “What is [m]?”

  1. I fully agree with the reasoning in this post, but I would also like to point out an additional complication.
    There is evidence that suggests to me that Eva-m is most probably quite distinct from Eva-r.

    This is found (among others) in the labels in the page showing the zodiac sign Pisces.
    In the innermost ring, the figure near 12 o’clock has “otalar” and the next one (near 1 o’clock) has “otalam”, strongly suggesting that these are distinct. Almost the equivalent of a “minimal pair”, which, if I understand correctly, is only used for spoken language, not written language.

    I have not yet been able to think of a good explanation for this.


  2. That’s a great bit of evidence, Rene. It certainly needs accounting for, even if it destroys the hypothesis.

    I will freely admit that I can’t easily account for it. The best explanation I can think of is that the labels are names of items, some or all of which are borrowed from another language which has such minimal pairs. So that the body text mostly uses [m] as a variant of [r], whereas borrowed words may use [m] as a unique sound.

    This, of course, is special pleading.

    (My belief, as I’m sure you know, is that the text of the manuscript much more closely resembles spoken language rather than an established orthography.)


    • Your description reminds me of the letter Pe in Hebrew.
      In Hebrew, letter Pe has two forms: non-final פ and final ף. The letter also has two sounds: p and f.
      However, native Hebrew words never end in -p, so final ף always represents f.
      Nowadays, when people write loanwords that end in -p in Hebrew, they use only non-final פ, but never final ף. Loanwords that end in -f are expectedly written with ף. For example: כיף kef, but פיליפ filip.
      I do not understand Hebrew. I saw this on Wikipedia: https://en.wikipedia.org/wiki/Pe_(letter)
      Hope this helps.


  3. The best I can think of is that the labels are “words picked up from elsewhere in the MS”.
    Basically a reference to another section. Label words ending in Eva-m then would have been picked up from a line end.
    While that would explain what we see, it also (unfortunately) requires a large part of the MS to be lost, i.e. a much larger part than what is known to be lost.


  4. Here are two other points that I think should be taken into consideration:

    1) It seems that [m] is to [r] as [g] is to [s] (since [g] is also found primarily at the ends of lines).

    2) The letter combinations [ry] and [sy] are also much more common at the ends of lines than elsewhere.

    I think it’s quite possible that [m] and [g] are ligatures, i.e. that they represent combinations of characters. It’s possible that [m] = [ry] and [g] = [sy]. The shapes of the letters [m] and [g] could also be interpreted as combining the features of [y] with those of [r] and [s] respectively.

    Another possibility that fits the above considerations equally well would be [m] = [rl] and [g] = [sl]. Here we would account for the fact that [rl] and [sl] never occur in the manuscript by assuming that the scribe always replaced them with [m] and [g] respectively.

    Either of these possibilities could potentially account for the fact that [m] and [g] occur mainly at the ends of lines but also sometimes in other places, and also account for the point made by Rene above that the difference between [m] and [r] appears to be meaningfully contrastive at least in some cases. The idea would be that the letter combinations represented by [m] and [g] are perfectly valid in their own right, but happen to occur most often when spurious letters are added to the ends of lines (for whatever unknown reason).


    • I think your theory about [m] representing a ligature of [ry] is good. It certainly accounts for the contrast and also somewhat for the shape. But is there any way we can test this?

      Also, I had in mind that [r] became [m] because of some kind of sound change occasioned by the end of a line (or other factors elsewhere). Thus they would be two sounds which were closely linked. This would allow us to explain why [r] still occurred at the end of lines: the conditions for the sound change were not met. But what condition would let [ry] and [m] both occur if one is a simple ligature of the other?


      • I don’t know how it could be tested. I’ve had this idea for years and I’m still not certain about it. Actually I tend to favor the idea that [m] = [rl], since the idea that [m] always replaces that particular combination is appealing (although I just checked, and actually [rl] does occur… but it is rare). One other possible point in favor of [m] being [rl] would be that there are occasional words ending in [mo], and it seems more likely that [r] would be combining with [lo] rather than [yo] since we often see [lo] showing up at ends of lines as well.

        Also, I tend to think that the extra letters attached to the beginnings of lines, and less consistently at the ends of lines, should not be interpreted linguistically. I don’t know why they are there, but whatever purpose they serve, if any, is probably secondary to the meaning of the text. In other words, we should probably just remove these letters when trying to make sense of the words to which they are attached.


        • I do think that the extra characters which seem to be added to the beginning of lines ([s], [y], [d]) can be ignored and simply removed to give the ‘real’ word. However, I think they must tell us something linguistic about the underlying text. The fact that there are line patterns must have some meaning, and by studying them we might learn more.

          For example, we can suppose that [a] and [o] are a similar kind of sound based on how they work within words. That [s] often occurs before both at the beginning of a line supports some link (though it may weaken the link between [a] and [y]).

          Likewise, the similarity between [ch] and [sh] is obvious, so the fact that they both appear after [d] and [y] at the beginning of lines is no surprise. But why sometimes [y] and sometimes [d]? What can we learn about their alternation?


  5. It seems the specific letter added to the beginning of each line was chosen in part based on the first letter of the first word of the line, and in part based on some other consideration.

    I call these letters “annotations” but I’m not sure what they’re annotating. Perhaps it’s a clue that the first letter of each paragraph is usually a gallows. I wonder if these letters tell us something like whether the line begins a new sentence, or whether it’s a continuation of the same sentence from the previous line, or something like this.


    • I agree that these ‘annotations’ must be letter specific, or dependent. And most of them must have other considerations (except [s] before [a], which seems to be near universal).

      I think these extra characters must be telling us something about new or continuing lines, but that could be done unawares through some linguistic effect. I still think they’re ‘linking sounds’ which exist in the spoken language but are only sporadically made explicit in the written language, the line break being one such environment.

