# Word Position in Quire 20

Marco Ponzi and I have been discussing for the last couple of weeks a curious observation regarding word position in Quire 20. Neither of us can think of a good explanation for what we can see, so I think it is worth simply publishing the observation and letting others comment.

It is well known that line patterns exist which cause words beginning with certain characters, [d, y, s], to appear at the start of lines. Such patterns are particularly strong in Quire 20. What Marco and I have discovered is that not only does a pattern exist regarding the second position in a line, but that it works in conjunction with the patterns of the first position.

Firstly, let’s go over what we know about first-position words. Using statistics provided by Marco (as all stats in this post are!), below are the words (with over twenty occurrences) most bound to the first position of a line in Quire 20. Given that there are nine or so words in each line, we should expect each word to occur in the first position roughly 11% of the time*, but many occur there at twice that rate or more:

[sain] 81%
[saiin] 66%
[sar] 65%
[dair] 41%
[dain] 40%
[daiin] 39%
[y] 29%
[dar] 27%
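A minimal sketch of how percentages like these could be computed, assuming the transcription has already been split into lines of words. This is not necessarily how Marco produced his figures; the function name `position_share` and the toy lines are hypothetical.

```python
from collections import Counter

def position_share(lines, position=0, min_count=1):
    """Return {word: share of that word's occurrences found at `position`}."""
    total = Counter()        # all occurrences of each word
    at_position = Counter()  # occurrences at the requested line position
    for words in lines:
        total.update(words)
        if len(words) > position:
            at_position[words[position]] += 1
    return {w: at_position[w] / n for w, n in total.items() if n >= min_count}

# Toy lines, invented for illustration (not Quire 20 data).
toy = [
    ["sain", "shey", "qokeedy", "ol"],
    ["daiin", "chey", "sain", "ol"],
    ["sain", "ol", "qokeedy"],
]
shares = position_share(toy)
# [sain] occurs three times, twice as the first word of a line.
print(round(shares["sain"], 2))
```

With real data, `min_count=21` would mirror the "over twenty occurrences" cut-off, and `position=1` would give the figures for the second position.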

These are the only words on the list with above-average occurrences; the next lowest, [qol], has an occurrence of 11%. The eight words above all fit into the known preference for [d, y, s] at the beginning of lines. Only one other word beginning with one of these three characters and having twenty or more occurrences, [dal], appears on the list.

Now let’s look at the second position in a line. Given that the line is about nine words long and the first position is often taken by an exceptional word, we should expect the other words to appear in each of the other positions about 12.5% of the time, that is, one in eight (again, see the note at the bottom).

Below are the words which appear in the second position at more than twice the rate expected, and have over twenty occurrences:

[shey] 37%
[sheol] 32%
[ain] 30%
[shol] 29%
[cheey] 29%
[chey] 27%
[cheo] 27%
[sheedy] 26%
[lkeedy] 26%
[cheol] 25%

The numbers are less drastic than for first position, but still clear. The key point is, of course, the first character. Just as we saw that three characters dominated the most common words in first position, here 8 of the 10 words begin [ch, sh].
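The "8 of the 10" tally can be reproduced directly from the list above. A small sketch; treating [ch] and [sh] as single initial characters is an assumption about how the glyphs are counted, and the variable names are illustrative.

```python
# The ten second-position words listed above, in the same order.
second_position = [
    "shey", "sheol", "ain", "shol", "cheey",
    "chey", "cheo", "sheedy", "lkeedy", "cheol",
]

# Words whose first character is [ch] or [sh].
ch_sh_initial = [w for w in second_position if w.startswith(("ch", "sh"))]
# The remaining words, kept in list order.
others = [w for w in second_position if w not in ch_sh_initial]

print(len(ch_sh_initial), "of", len(second_position))  # 8 of 10
print(others)  # ['ain', 'lkeedy']
```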

I don’t believe that this would happen by chance. The only reason I can think of is that, as discussed in the post about first-last combinations, the first character of second words is influenced by the last character of the foregoing word. Certainly [ch, sh] are ‘weak’ (as is [a]), and all but one of the common first position words end in ‘strong’ characters [r, n].
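One way the first-last idea might be examined is to count which final characters meet which initial characters at the line start, and compare that with the same boundary further into the line. The sketch below is only an illustration: the function names and the toy lines are hypothetical, and treating [ch] and [sh] as single characters is an assumption.

```python
from collections import Counter

def initial_glyph(word):
    """Treat [ch] and [sh] as single characters (an assumption)."""
    return word[:2] if word.startswith(("ch", "sh")) else word[:1]

def boundary_pairs(lines, index=0):
    """Count (last char of the word at `index`, first char of the next word)."""
    pairs = Counter()
    for words in lines:
        if len(words) > index + 1:
            pairs[(words[index][-1], initial_glyph(words[index + 1]))] += 1
    return pairs

# Toy lines, invented for illustration (not Quire 20 data).
toy = [
    ["sain", "shey", "ol"],
    ["dar", "chey", "qokeedy"],
    ["daiin", "cheol", "ar"],
    ["y", "qokeedy", "shey"],
]
line_start = boundary_pairs(toy, index=0)  # first word -> second word
mid_line = boundary_pairs(toy, index=1)    # second word -> third word
print(line_start)
print(mid_line)
```

Comparing how often a [ch, sh] word follows an [r, n]-final word at the line start and mid-line would be one possible check on whether the combination, rather than the position, is doing the work.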

Yet is this a good explanation? Many combinations of the common first and second words exist in the manuscript, which is reassuring. But I’m doubtful this is the cause of the observation rather than a fact which happens alongside. As mentioned at the beginning, I’m at a loss.

*It’s really a bit higher, as all short lines bring the average line length down while still having identifiable first, second, third positions. That is, there are more first positions than second positions, more second than third, more third than fourth, and so on, regardless of how long the average line is.
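The footnote can be made concrete with a little arithmetic. If words were placed without regard to position, the expected share for a given position would be the number of lines long enough to have that position divided by the total number of words. A sketch with hypothetical line lengths (not measured from Quire 20); `baseline_shares` is an illustrative name.

```python
def baseline_shares(line_lengths, max_position=3):
    """Expected share of a word's occurrences at each position (0-based),
    assuming words are placed without regard to position."""
    total_words = sum(line_lengths)
    return [
        sum(1 for n in line_lengths if n > k) / total_words
        for k in range(max_position)
    ]

# Hypothetical paragraph: four full lines of nine words and a short last line.
lengths = [9, 9, 9, 9, 4]
shares = baseline_shares(lengths, max_position=5)

# Every line has a first word: 5 lines / 40 words, above the nominal 1/9.
print(shares[0])
# Only the four full lines have a fifth word: 4 lines / 40 words.
print(shares[4])
```

The short line pulls the first-position baseline above one in nine while leaving later positions with a lower baseline, which is the effect the note describes.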

## 16 thoughts on “Word Position in Quire 20”

1. This reasoning only works if we are absolutely 100% sure that lines don’t correspond with sentences, phrases or other syntactical units somehow. This needn’t always be the case; it would be enough if some effort had been made to make the line correspond to a syntactical unit.

What I mean is that in English, for example, words like why, what, who… will be found at the beginning of sentences or phrases. This means that there is no equal chance for each word to be at the beginning. Even if there is only some correspondence between line and syntax, you will get effects like these.

So I guess my question would first be: are we absolutely sure that *only* the line itself causes these effects?

• EMSmith says:

I think that the shape of a paragraph leads us to believe that a line is not a sentence. Most lines in a paragraph are roughly the same length on the page, which is normal for writing which wraps over the line end: the writer simply stopped when the gap left was too small for the next word and began a new line.

The exceptions to this are the last lines of paragraphs which are almost always shorter. Unless the shortest phrases were kept til last we can only explain this by saying that the writer simply finished the sentence at this point. The sentence itself could stretch over several lines and only ends on the last line.

We can be sure that the short lines of paragraphs are the last lines because they are often followed by significant vertical gaps. This is often seen in the ‘Large Plant’ Herbal section, though it is much less obvious in Quire 20.

• Emma May: yes, I see. Lines of the same length are not necessarily problematic – it could be some kind of poetic form (verse) where lines often start the same and syntactic units are made to fit a line on the page. But I guess this is unlikely since the “text wrapping” also appears when not all lines are of the same length.

The only thing I can think of which (maybe?) hasn’t been suggested yet, is that the scribe wanted to make absolutely sure that the reader knew the new line still belonged with the rest. Perhaps these specific line-first words are abbreviated forms of words that imply [continued]?

• EMSmith says:

My belief is that the first words of lines are altered according to some phonological principle. A working hypothesis is that the writer is showing sounds which were inserted when the language was spoken. He’s not doing it on a normal line because the sound change should be obvious to the reader. But maybe there was an element of uncertainty because of the line break and he wanted to make it clear to the reader.

I admit that it is unproven as yet.

• That’s a beautiful idea, and I don’t see why not, given the close link between the written and the spoken, the former being clearly subordinate to the latter in many cultures.

Another idea, which may have been considered already: could it be that he often writes his equivalent of capitals at the start of a line independent of sentence structure? Kind of reminds me of when old people type on a computer and press enter whenever they reach the end of a line 🙂

2. Emma: generally speaking, one alternative scenario is that the first character is an artificially inserted construct which doesn’t add to the meaning of the first word (i.e. a vertical Neal Key).

However, the metalinguistic twist with this might be that if the inserted letter is a null, the first (real) word’s initial letter-shape could easily have certain preferred adjacency forms for that inserted null. For example, ‘s’ might be preferentially inserted before a-initial line-initial words, but not before d-initial line-initial words.

• EMSmith says:

Nick, can you explain what a vertical Neal Key is? It seems relevant to my interests, and I’ve certainly heard it mentioned before, but I’m not sure I understand it.

• Hi Emma May Smith,
AFAIK, based on a 2013 post on ciphermysteries:

“(2) Deceptive first letters / vertical Neal keys

At the Voynich pub meet, Philip Neal announced an extremely neat result that I hadn’t previously noticed or heard of: that Voynichese words where the second letter is EVA ‘y’ (i.e. ‘9’) predominantly appear as the first word of a line. EVA ‘y’ occurs very often word-final, reasonably often word-initial (most notably in labels), but only rarely in the middle of a word, which makes this a troublesome result to account for in terms of straightforward ciphers.

And yet it sits extremely comfortably with the idea that the first letter of a line may be serving some other purpose – perhaps a null character, or (as both Philip and I have speculated, though admittedly he remains far less convinced than I am) a ‘vertical key’, i.e. a set of letters transposed from elsewhere in the line, paragraph or page, and moved there to remove “tells” from inside the main flow of the text.”

I’ve searched for more on Philip Neal’s site but can’t find anything.

• Emma, voynichviews: I believe that the different statistical profile presented by line-initial letters was known at least by the 1970s (and probably by Friedman’s various study groups in the 1940s). The wrinkle that Philip Neal added was that many word-initial letter properties reappear as the second-letter properties of line-initial words, which would (when taken together) seem to argue against interpreting the very first letter of lines as literally part of the first word of lines.

A vertical Neal Key, then, is this kind of vertical column of abutted letters added to a series of lines. Perhaps its function is cryptographic, perhaps not: either way, the conclusion is that it would seem to have a non-linguistic or confounding function.

As an aside, I would argue that the (occasionally large number of) line-initial s- words is an indication that something particularly odd is going on there. My current work-in-progress deduction relating to line-initial s- is that ‘s’ functions as a null there, and that each element of such a Neal Key is therefore probably an optional thing (perhaps relating to the contents of that line), as opposed to being part of a message arrayed vertically. Certainly, trying to read off the first letters of each line as if it were a string or message seems to get you precisely nowhere.

• EMSmith says:

Hmm, so the ‘Neal Key’ is basically the same kind of observation about the statistical properties of first letters. Simply he’s taken them as cryptological in origin, and I’ve taken them as linguistic.

I’m now curious if there’s a way of proving this one way or the other.

• Emma: as far as I know, Philip Neal doesn’t take a position either way – it was me who dubbed the phenomenon a “Neal Key”: and that wasn’t because it’s necessarily cryptological, but simply to give it a name so that people could debate it sensibly. But apparently I failed in that regard. 😦

For me, the reason I find Neal Keys (both vertical and horizontal) so interesting is that they’re neither linguistic nor cryptological, but rather things that confound straightforward theories on both sides of that (non-existent) fence (along with all the other “LAAFU” phenomena).

In fact, I’d go so far as to say that these phenomena confound all simple-minded hypotheses about Voynichese, in that they present a kind of internal order or structure to Voynichese that is also incompatible with hoax hypotheses, glossolalia, etc. In many ways, then, there’s a very strong argument that they ought to be the primary places anyone should start attacking Voynichese.

• EMSmith says:

Well, at least I’m doing something right by looking at the phenomenon.

Though I must say I disagree that the second characters of first position words are typical of word-initial characters.

• Emma: as I’m sure you know, that wasn’t what Philip Neal proposed (as quoted in the comment above). Once again (as happens so often), such Voynichese statistical phenomena resist reduction to a simple formula or account. 😮

3. MarcoP says:

Thank you for another interesting post, Emma!
The words that appear frequently in the first position seem to me to belong to two categories:

1) y, which is typically prepended to a word in the first position. There are 7 occurrences of just ‘y’ as the first word, and in 4 cases the sequence y+secondWord also occurs as a single word in the first position. For instance, the single occurrence of:
f112v_par12,,y.cheol. …
is matched by seven occurrences of ycheol in the first position:
f104v_par04,,ycheol.cheody.qoeechdy.qokeol.qotaiin.chedar.cheo.lkaiin.cheetar.aiin.chataiin-
f106r_par11,,ycheol.chokaiin.sheody.chody.qokaiin.ar.akair.aiir.okaly=
f108r_par10,,ycheol.chckhy.qokedy.okain=
f112v_par06,,ycheol.keeor.olkeeey.chedain.ol.cheedaiin.sheedy.qokeedy.qotain-
f113v_par13,,ycheol.keey.lkeees.or.aiin.otaiin.chkain.olar.olchedy.qok.aiin.os-
f113v_par15,,ycheol.cheey.qol.lsheedy.qokaiin.chedy.kain.qokeeedy.lkaiin.okal.dy-
f114v_par09,,ycheol.oleeey.cheoaiin.chetaiin.sheeodain=

2) all the other words
sain,saiin,sar,dair,dain,daiin,dar
form a family conforming to the pattern (s|d)a(i|ii)?(n|r)

I guess that case (1) might be a variant spelling in which the y prefix is separated from the word it applies to.
Case (2) seems to be something different.
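Both cases can be checked mechanically. The sketch below treats the family pattern as a regular expression with the (i|ii) group optional, which [sar] and [dar] require, and parses two of the lines quoted above. The line format (locus, two commas, words separated by dots, a trailing `-` or `=`) is inferred from those quoted lines only, and `parse_line` is a hypothetical helper.

```python
import re

# Case (2): the family pattern, with the i-group optional so that
# [sar] and [dar] are covered as well.
FAMILY = re.compile(r"(s|d)a(i|ii)?(n|r)")
family_words = ["sain", "saiin", "sar", "dair", "dain", "daiin", "dar"]
print(all(FAMILY.fullmatch(w) for w in family_words))  # True
print(bool(FAMILY.fullmatch("dal")))                   # False

def parse_line(raw):
    """Split 'locus,,w1.w2...' into (locus, [words]); format is assumed."""
    locus, _, text = raw.partition(",,")
    words = [w for w in text.rstrip("-=").split(".") if w]
    return locus, words

# Case (1): two of the quoted lines that begin with the joined form.
quoted = [
    "f108r_par10,,ycheol.chckhy.qokedy.okain=",
    "f114v_par09,,ycheol.oleeey.cheoaiin.chetaiin.sheeodain=",
]
first_words = [parse_line(raw)[1][0] for raw in quoted]

# Joining a detached 'y' to the following word gives the same first word.
print("y" + "cheol" in first_words)  # True
```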

• EMSmith says:

Hi Marco,

I think that many first position words which have prefixed characters such as [y, d, s] often seem like two words. Many times the first character—the one I assume has been added—seems oddly unattached. So I’m not surprised that both [y cheol] and [ycheol] appear in the transcription. There’s a [y cheeo] on f111r and a [y sheol] on f113r.

I’m not sure that first position words beginning [y] are that much different from those beginning [s, d]. For example, [ysh*, ych*] and [dsh*, dch*] are both strongly first position. So here [y, d] attach to words beginning [ch, sh] in the same way. There must be some distinction to why some words take [y] and others [d]. I suspect that if we could figure out the reason it would prove insightful.

Lastly, have you noticed that words such as [sal] and [dal] fail to appear strongly in the first position? I assume that [s] is added to words beginning [a] in the first position, and there are many occurrences of [al], yet relatively few of [sal] in the first position. Likewise, [dal] positively stays away from first position. Just another piece in the puzzle of why [l] is a weird character.

4. Thomas Sauvaget says:

Hello,

I’ve just discovered this post now. Your phonological hypothesis (Jan 18, 10:29pm) is interesting.

In another direction, I’d simply wish to mention (you probably know it) that something looking like 9 was in Latin a very common abbreviation for “cum” or “cun” (a recurring phoneme); have a look on Google Books at Dictionnaire des abréviations: latines et françaises usitées by Alphonse Chassant at page xlvi (and also 117). That type of abbreviation was nearly always at the beginning or end of words, hence its peculiar positional statistics. There’s also a larger scholarly website about Latin abbreviations which looks gorgeous but I don’t have the money to access it, see http://www.ruhr-uni-bochum.de/philosophy/projects/abbreviationes/contents.html
