In May I proposed the Transformation Theory. The theory stated that words are transformed from their original shape through the influence of neighbouring characters and other aspects of the text. I presented a small piece of research regarding words beginning with [o], which provided very basic support for the theory. I now wish to present a more substantial look at the same words.
The statistics used in this post were provided by Marco Ponzi, whom I must thank. He also provided a great deal of discussion, and some suggestions, which find their way into this post. However, I do not wish to suggest he agrees with this post, in whole or in part (unless he otherwise states).
Note: a transcription with a ‘.’ denotes a space between words. So a transcription such as [n.o] means a sequence where a word ends [n] and the following word begins [o].
The following research is built on the idea of last–first combinations. Characters found at the end and beginning of words can be broadly classified into two groups, strong [k, t, r, n, s] and weak [y, o, a, ch, sh], according to the characters they prefer to neighbour. A word ending with a strong character prefers to be followed by a word beginning with a weak character, and vice versa. Strong–strong and weak–weak combinations are typically less common than strong–weak and weak–strong.
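The classification above can be sketched in a few lines of Python. This is only an illustration, not the code used for the statistics in this post: the function names are my own, the input is assumed to be a single transcribed line with ‘.’ between words, and any character outside the two lists given above is simply labelled "other".

```python
from collections import Counter

# Character classes as listed in the paragraph above.
STRONG = {"k", "t", "r", "n", "s"}
WEAK = {"y", "o", "a", "ch", "sh"}


def edge_glyph(word, at_end):
    """Return the glyph at the end (or start) of a word, treating ch/sh as one glyph."""
    two = word[-2:] if at_end else word[:2]
    if two in ("ch", "sh"):
        return two
    return word[-1] if at_end else word[0]


def strength(glyph):
    """Label a glyph as strong, weak, or other (not in either list)."""
    if glyph in STRONG:
        return "strong"
    if glyph in WEAK:
        return "weak"
    return "other"


def last_first_counts(line):
    """Count (last, first) strength classes for each adjacent word pair in a line."""
    words = line.split(".")
    counts = Counter()
    for prev, nxt in zip(words, words[1:]):
        if prev and nxt:  # skip empty strings from doubled separators
            key = (strength(edge_glyph(prev, True)), strength(edge_glyph(nxt, False)))
            counts[key] += 1
    return counts
```

Run over a whole transcription, the totals for strong–weak and weak–strong would be expected to outweigh strong–strong and weak–weak if the aggregate pattern holds.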
When last–first combinations show that certain sequences, such as [n.k] and [y.t], are common or uncommon, this is based on an aggregate of thousands of different words sharing only the last or first letters. In order for last–first combinations to become part of the Transformation Theory we need to show that the combinations are true with regard to individual words. We can do this by altering the shape of a word and measuring the changes in the last–first sequences it is part of.
For the research we took pairs of words which differed only by the presence or absence of an initial [o]. This was chosen because it is a very common variation and many word pairs exist with and without initial [o], and because [o] is a weak character usually followed by typical strong characters such as [k, t, r] (and [l], which acts strong at the beginning of a word). Thus, we should find numerous word pairs which begin [ot, ok, ol, or] in one form and [t, k, l, r] in the other. The theory states that by removing the initial [o] and altering the first character from weak to strong, but keeping the rest of the word the same, we should see a significant difference in the last character of the preceding words.
Twenty–one pairs which met the criteria were chosen on the basis of frequency. All pairs had at least 77 tokens in all, and each word of a pair had at least 31 tokens. The twenty–one pairs were: [or, r], [okaiin, kaiin], [okeedy, keedy], [okar, kar], [otol, tol], [okain, kain], [okedy, kedy], [okeey, keey], [otar, tar], [otedy, tedy], [otaiin, taiin], [olkeedy, lkeedy], [olkeey, lkeey], [oraiin, raiin], [olchedy, lchedy], [okol, kol], [opchedy, pchedy], [olkain, lkain], [otchedy, tchedy], [olor, lor], and [olkaiin, lkaiin]. It should be noted that several features, such as different word endings, the presence of [e] or [i] within the words, and overall length, were represented multiple times in different combinations. This allowed us to check if the last character of the preceding word had any influence beyond the variable initial [o].
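A rough sketch of how candidate pairs could be pulled from a word list follows. The function name and the use of 77 and 31 as cut-offs are assumptions on my part: those figures are the minimums reported for the chosen pairs, and the actual selection may have been made differently. The sketch also does not restrict which character follows the initial [o].

```python
from collections import Counter


def find_o_pairs(tokens, min_total=77, min_each=31):
    """Find (o-form, bare form) pairs that differ only by an initial 'o'.

    tokens: every word token of the text, in any order.
    Returns tuples (with_o, without_o, count_with, count_without),
    sorted by combined frequency, highest first.
    """
    freq = Counter(tokens)
    pairs = []
    for word, n_with in freq.items():
        if not word.startswith("o") or len(word) < 2:
            continue
        bare = word[1:]
        n_without = freq.get(bare, 0)
        if min(n_with, n_without) >= min_each and n_with + n_without >= min_total:
            pairs.append((word, bare, n_with, n_without))
    return sorted(pairs, key=lambda p: p[2] + p[3], reverse=True)
```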
The most extreme change was observed following words ending with [n]. In all twenty–one word pairs the word beginning [o] was more common after [n] than that without the initial [o]. In many pairs there were no examples of the latter, despite words ending [n] occurring commonly before the [o] version of the pair. For example, [otar] has 41 tokens (30% of all its occurrences) in the sequence [n.otar], yet there are zero for [tar]. Likewise, [okain] has 43 tokens (30%) of [n.okain] and zero for [kain]. A full 13 of the 21 pairs showed this situation.
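The kind of count behind these figures can be sketched as below. The function name and the input layout (a list of lines, each a list of words) are assumptions for illustration. In particular, the sketch treats a line-initial word as having no preceding word, and I have not confirmed that the statistics in this post were computed the same way.

```python
def share_after(lines, target, ending):
    """Count how often `target` directly follows a word ending in `ending`.

    lines: list of lines, each a list of word strings.
    Returns (hits, total), where total is every occurrence of target.
    """
    hits = total = 0
    for words in lines:
        for i, word in enumerate(words):
            if word != target:
                continue
            total += 1
            # A line-initial word has no predecessor within the line.
            if i > 0 and words[i - 1].endswith(ending):
                hits += 1
    return hits, total
```

Dividing hits by total gives a percentage comparable to those quoted above, and running the function on both members of a pair gives the comparison.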
Here are the percentages after [n] for each word pair:
|Without [o]||%||With [o]||%|
Words following [r] showed the same bias toward the version with initial [o], but without such extreme statistics. The sequence [r.o] was always the more common: sometimes more than twice as common, though sometimes the difference was not significant.
Here are the percentages after [r] for each word pair:
|Without [o]||%||With [o]||%|
Where the word pairs followed words ending [o] with any frequency (which was only in about 8 of 21 pairs) there was a clear preference for the form without initial [o]. Some of these could be misreadings. For example, 28% of [r] occurred after words ending [o], but the sequence [o.r] might simply be [or] with a misplaced space. However, it is telling that only 3% of [or] occurred in the sequence [o.or].
The result that is hardest to understand is that for words ending [y], the most common word ending in the text as a whole. With [y] and [o] both being weak characters we would expect a bias against the sequence [y.o]. However, although many pairs did show some avoidance of this sequence, the differences were small and some pairs showed the opposite preference.
The only set of words to consistently show a strong preference one way or the other after [y] were those where the characters following [o] were either [lk] or [lch]. Of these five pairs, four had a 2:1 preference for avoiding [y.o].
At the suggestion of Marco we broke the words ending [y] down into two groups: those ending [dy] (a very common ending) and those ending [y] but not [dy]. This breakdown proved to be interesting. Two things were observed, though only one of them directly relates to Transformation Theory.
Firstly, the word pairs ending [dy] themselves were much more likely (across both versions) to come after other words ending [dy]. I won’t comment more on this, as it really deserves its own place.
Secondly, many pairs showed a preference for the form with initial [o] after words ending [dy], even where they showed no such preference after words ending [y] but not [dy]. An example will make this clear. The words [kaiin] and [okaiin] occur about equally often after words ending [y] (32% and 29% respectively). However, nine times out of ten [kaiin] comes after a word ending [y] but not [dy], whereas for [okaiin] the split is 54% for not [dy] and 46% for [dy].
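The [dy] versus not-[dy] breakdown could be counted along these lines. As before, the function name and the line-by-line input format are my own assumptions, and this is a sketch rather than the method actually used.

```python
def split_after_y(lines, target):
    """Split occurrences of `target` after a [y]-final word into two groups.

    lines: list of lines, each a list of word strings.
    Returns (after_dy, after_other_y).
    """
    after_dy = after_other_y = 0
    for words in lines:
        for prev, cur in zip(words, words[1:]):
            if cur != target or not prev.endswith("y"):
                continue
            if prev.endswith("dy"):
                after_dy += 1
            else:
                after_other_y += 1
    return after_dy, after_other_y
```

Comparing the two counts for each member of a pair shows whether the form with initial [o] is relatively more at home after [dy].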
It would seem that last–first combinations have some reality at the level of individual words. The sequences [n.o] and [r.o] are preferred much like the aggregate statistics would suggest, and sequences such as [n.k] and [r.t] are often avoided. So strong–strong combinations are clearly not preferred.
The results for weak–weak combinations are mixed. The phrase [o.o] is not preferred where it exists. The [y.o] sequence has a tendency to be avoided only where the preceding word doesn’t end [dy] or the following word begins [olk, olch]. Where the preceding word does end [dy], then the sequence [dy.o] may be just as common, if not more so. It seems as though [dy] is, for some reason, ignored by the rules of last–first combinations.
Are the word pairs real?
It might be objected that the pairs of words used in this piece of research aren’t real pairs. That’s certainly a reasonable objection, and until we can read the text we cannot, sadly, prove it positively one way or the other. We can only definitely say that they are orthographic pairs, being spelt the same way other than the initial [o] character.
However, we can show that many pairs have similar distributions within the text. This is what we would expect were they true pairs, appearing in the same topics but in varying forms. Using Pearson’s correlation coefficient, where a value of +1 is total positive correlation, -1 total negative correlation, and 0 no correlation, 11 of the 21 pairs score .50 or better. (The worst performing pair is [olkeedy, lkeedy] with a value of .05. However, they are found almost exclusively in quires 13 and 20, just not in the same folios.)
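For readers unfamiliar with the measure, here is a minimal sketch of Pearson’s correlation coefficient applied to two lists of counts. I am assuming for illustration that each list holds one word’s token count per folio, in the same folio order; other units of text would work the same way.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Two words whose counts rise and fall together across the text score close to +1, while words that never share a section score near zero or below.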
I’m wary of putting too much weight on a single set of figures for proving a theory. But it certainly looks reasonable to suggest that the last–first combinations seen in aggregate do operate at the level of individual words. There’s nothing obviously against the last–first combinations, and even the worst performing, the weak–weak sequence [y.o], seems to be complicated by the treatment of the word ending [dy].
Naturally, further experiments, taking the same question from different angles, would add to the debate. However, the stark preference for the [n.o] sequence shows that, at least in some places, there’s little more that can be said. It would seem that, if a variant with initial [o] exists, then it will occur after words ending [n].
Taking this one phrase, [n.o], which is the strongest individual last–first combination we have seen, what is the consequence for Transformation Theory? The theory would state that one or both of the words in such a phrase have been altered under the influence of their neighbour. The variant beginning [o] occurs after a word ending [n] specifically because the [n] causes the [o] to be added (or the [o] causes the [n], but there are reasons this is unlikely). The context of one word thus transforms the other.
Were we to take this individual transformation as definitely true then we could use the knowledge to ‘undo’ the transformation and restore the original, underlying, text. For example, in the phrase [aiin otedy] the second word might have originally been spelt [tedy], only taking the initial [o] due to the influence of [aiin]. The original text might have been [aiin tedy].
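Purely as a thought experiment, the ‘undo’ step might look like the sketch below. It assumes, hypothetically, that we already had a trusted set of words known to take initial [o] only under the influence of a preceding [n]. No such set has been established, and the function name and input format are my own.

```python
def undo_initial_o(words, transformed):
    """Strip initial 'o' from listed words when the preceding word ends in 'n'.

    words: one line of text as a list of word strings.
    transformed: hypothetical set of o-forms assumed to be transformations.
    """
    restored = []
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else ""
        if word in transformed and word.startswith("o") and prev.endswith("n"):
            restored.append(word[1:])  # restore the assumed original form
        else:
            restored.append(word)
    return restored
```

With the word [otedy] in the hypothetical set, the phrase [aiin otedy] would be restored to [aiin tedy], while [otedy] in any other position would be left alone.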
The undoing of word transformations would, if repeated throughout the manuscript, have the likely effect of making it more regular: a smaller vocabulary, with more repeated phrases, and potentially more obvious grammar. The research goal for Transformation Theory is to gather further evidence for transformations and refine our understanding of them. Last–first combinations are one part of this goal.
I realize that this post doesn’t include the full statistics, only my impressionistic assessment of them, so I’m going to ask Marco’s permission to post them as a separate file.