The New Word Structure I published last week relies heavily on older research, specifically the idea that [y] and [o] are equivalent, and that [y] can be deleted in some environments. Indeed, since formulating these ideas most of my text research has relied on them, either openly or implicitly. Much of what I believe about the text would become invalid, or at least seriously undermined, were these ideas to be proven wrong.
The worry I have is that the separate ideas have been pieced together and extended over time (for example, [y] deletion after [ch, sh] as well as [e]) despite being used for a single goal. The goal is to show that [y] and its various forms and expressions are a match for [o], and therefore that [o] and [y] are in the same “class” of glyph.
Although we expect all glyphs to differ in the detail of their frequencies, certain glyphs, such as [ch, sh] or [k,t], share the same distributions. That is, even though one may occur more or less often than the other in some positions, they appear to be valid in the same positions. My expectation is that [y] is valid in the same positions as [o] once all its forms have been included, but I don’t believe I have ever properly tested this.
Mapping and Matching
To test if the distribution of [y] matches that of [o] I first made a table of possible trigrams containing [o] and their token counts. Each trigram consisted of [o] with one of the 22 common glyphs or a space before and after it. This gave 529 combinations, ranging from the unlikely [noq] (0 tokens) to the very common [.ok] (2481 tokens). (The full stop/period here denotes a space or word break.)
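A minimal sketch of how such a table could be built is given here. It assumes the text is available as a list of transliterated words and that each transliteration character counts as one glyph, which may not be how the table above was actually produced. The function name `trigram_table` and the `glyphs` inventory passed to it are hypothetical. With 22 glyphs plus the word break there are 23 possible neighbours on each side, hence 23 × 23 = 529 rows.

```python
from itertools import product


def trigram_table(words, glyphs, centre="o", boundary="."):
    """Count every (before, centre, after) trigram, keeping zero counts."""
    symbols = list(glyphs) + [boundary]
    # Start every possible context at zero so that absent trigrams
    # (such as an unattested one) still appear in the table.
    table = {f"{b}{centre}{a}": 0 for b, a in product(symbols, repeat=2)}
    for word in words:
        padded = f"{boundary}{word}{boundary}"  # mark the word breaks
        for i in range(1, len(padded) - 1):
            if padded[i] == centre:
                trigram = padded[i - 1:i + 2]
                # Contexts with glyphs outside the inventory are skipped.
                if trigram in table:
                    table[trigram] += 1
    return table
```

Keeping the zero rows matters for the comparison that follows: an environment where [o] never occurs is as much a part of its distribution as one where it is common.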
The reason for the trigram is that various expressions of [y] depend on the glyphs which come before or after. So [a] depends on the following glyph while a null expression/deletion relies on the preceding glyph.
The trigram table shows the actual environments in which [o] occurs and where it does not. The expressions and forms of [y] can then be mapped to the table, slowly building up the coverage and showing us if it doesn’t match any key parts of the distribution of [o].
After [q] ~ 21% of occurrences
We can start with the approximately 21% of [o] occurrences which definitely can’t be matched by [y]. Nothing can replace [o] after [q] and it’s universally acknowledged that this particular bigram (and all the trigrams which contain it) is its own thing. Some would go as far as to say that [qo] is a digraph.
I feel there’s evidence that [q] is specifically added to words starting [o]. This would make this particular distribution more about how [q] works rather than [o]. It is enough to say that nobody expects [o] to be replaced by another glyph in this environment so we don’t have to worry about it.
Before [r, l, i, n, m] ~ 34%
The glyphs [r, l, i, n, m] are important as they have been identified as the main glyphs which [a] occurs before. Thus for all the occurrences of [o] in this environment we would expect them to be replaced not by [y] but by [a].
If we look at some of the most common trigrams for [o] before these five glyphs we can see that [a] validly replaces [o] in all cases.
We can clearly see how the difference in frequency changes across the trigrams. The trigrams with [a] are sometimes more, and sometimes less, common than those with [o]. But in all cases they are no less valid, which is an excellent sign.
(I invite the reader to stop at this point and try to replace [o] with any other glyph and get results half as good.)
At the end of a word ~ 5%
At the end of a word [o] should always be replaced by [y]. The glyph [a] barely occurs in this position due to the lack of a conditioning environment.
The most common trigrams at the end of a word are shown below with their matches containing [y].
The glyph [y] is clearly much more common in the final position than [o]. We might have expected the large token count for [dy.] given its prevalence as a word ending. Yet [ey.] is clearly much more common also. It’s not a problem for our purposes, however, as we’ve shown that [y] can validly replace [o] in all these positions. Indeed, the greater issue is that some of the token counts for [o] are so low that they barely feel valid.
At the start of words before glyphs which don’t cause [a] ~ 25%
The start of a word is the other place where we would expect to see [y] replace [o]. We’ve already looked at [o] before [r, l, i, n, m] and we take that as including occurrences at the start of words. But [o] is also common before a few other glyphs in this position, as seen in the table below.
We’ve clearly run into a problem here. The trigrams [.ya, .ye, .ys] have far too low token counts to be valid. We might be able to forgive [.ya], as the two glyphs are technically two forms of [y], but the other trigrams are still wrong.
In neither case does there appear to be a solution. The trigrams [.ae, .as] are not only outside the expected rules for [a], but they’re also less common still. The trigrams together account for about 1% of [o]’s distribution: a small amount but still a gap.
In the middle of words after [e] but not before [r, l, i, n, m] ~ 6%
I think that this is the first really difficult part of my ideas for some to understand. We know that [y] is relatively uncommon in the middle of words, and yet [a] only occurs before some glyphs. What is the form of [y] used before other glyphs?
The answer I came up with is that [y] is deleted, or simply not expressed, when it should follow [e]. So if we imagine [okeody] to be made from the words [okeo] and [dy], then [okedy] is made from the words [okey] and [dy]: neither [okeydy] nor [okeady] exists.
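The joining behaviour described above can be sketched in a few lines. This is only an illustration of the hypothesis, not a tested model: the function `join_words` and its `deleters` parameter are hypothetical names, and it assumes that the [a] form takes priority over deletion, as the "but not before [r, l, i, n, m]" wording suggests. Where neither rule applies the sketch simply leaves [y] as it is.

```python
A_TRIGGERS = set("rlinm")  # glyphs which [a] is said to occur before


def join_words(first, second, deleters=("e",)):
    """Join two words, re-expressing a [y] that becomes word-medial."""
    if not first.endswith("y") or not second:
        return first + second  # nothing to re-express
    stem = first[:-1]
    before = stem[-1] if stem else ""
    after = second[0]
    if after in A_TRIGGERS:
        form = "a"   # [y] is expressed as [a] before these glyphs
    elif before in deleters:
        form = ""    # [y] is not expressed after [e]
    else:
        form = "y"   # no rule applies; left unchanged in this sketch
    return stem + form + second
```

Under these assumptions `join_words("okey", "dy")` gives `okedy`, while `join_words("okeo", "dy")` gives `okeody`, since [o] is never altered.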
The four most common trigrams demonstrate the match well, with “null” simply meaning that [y] is not expressed.
Again, there are a few rarer trigrams with [e] which don’t work so well. The two most common are [eoy, eoa], which I wouldn’t expect to be common anyway due to the double [y].
In the middle of words after [h] but not before [r, l, i, n, m] ~ 6%
This environment is an extension of the one immediately above. There are many places where trigrams such as [chok] and [chod] occur and no equivalent with [y] or [a] does. So I extended the argument used for [e] to cover benches as well.
Interestingly, this extension predated the discovery that benches help determine the length of [e] sequences. The outcome of that discovery is the hypothesis that benches could contain a “captured” [e], meaning that the environment following a bench is the same as following [e]. The use of [h] is to highlight the fact that the same is possibly true of bench gallows.
Below are some common examples.
Despite this environment being an extension of the argument for [o] after [e], it works perfectly well. The same issue arises with trigrams with [y] and [a], which is once again expected.
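The environments covered so far can be summarised as a small classifier which assigns each [o] trigram to the form of [y] expected to match it. This is a sketch under assumptions: the labels and function names are hypothetical, the rules are checked in the order the environments were presented above (so a trigram which fits two environments goes to the earlier one), and the input is assumed to be a mapping from three-character trigrams to token counts, with "." as the word break.

```python
from collections import Counter

A_TRIGGERS = set("rlinm")  # glyphs which [a] is said to occur before


def classify(trigram, boundary="."):
    """Return the expected match for the [o] in a (before, o, after) trigram."""
    before, _, after = trigram
    if before == "q":
        return "q"          # no replacement expected after [q]
    if after in A_TRIGGERS:
        return "a"          # matched by [a]
    if after == boundary:
        return "y-final"    # matched by [y] at the end of a word
    if before == boundary:
        return "y-initial"  # matched by [y] at the start of a word
    if before in ("e", "h"):
        return "null"       # [y] not expressed after [e] or a bench
    return "residue"        # everything not yet explained


def environment_shares(table):
    """Share of all [o] tokens falling into each environment."""
    totals = Counter()
    for trigram, count in table.items():
        totals[classify(trigram)] += count
    grand = sum(totals.values())
    return {env: n / grand for env, n in totals.items()} if grand else {}
```

Whatever ends up under "residue" is exactly the part of the distribution of [o] which none of the arguments so far accounts for.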
Everything else ~ 4%
This is the nub of the problem outlined at the start: my attempt to match [y] with [o] has not been systematic. I’ve been unsure of how well [y], [a], and null cover the whole of [o]. This 4% represents everything I’ve overlooked and it needs to be addressed.
To start with, about 0.5% of “everything else” is composed of really rare trigrams. The trigram [doa] has 8 occurrences. It might not be at all normal for [o] and I shan’t try to explain how [y] covers it.
However, some trigrams are common enough that they deserve addressing. Below are the nine most common, all with twenty tokens or more. I’ve added columns for their matches with [y], [a], and null, just so we can see if any of the existing explanations work.
Well, this is a bit of a mixed bag! It’s clear that there is no single pattern. That’s okay, as this is basically a residue category for [o]. There’s nothing particular which brought all these trigrams together other than the fact that they weren’t already mapped to another form of [y].
We’ll have to take these trigrams as groups according to what looks like the best answer.
[tok] and [sos]: There doesn’t seem to be a great answer for these two. Neither is very common itself, but all the possible matches are poor. While [tok] isn’t a huge problem as it’s quite unusual in itself (two gallows in a single word), the trigram [sos] seems more normal and should have a similar solution to [sod] or [los].
[kod], [tod], and [dod]: These three seem as though two or even three of the possibilities might exist. Why is that? Could it be that the writer didn’t know what to do with [y] in these environments?
[sod], [rod], [lod], and [los]: All these clearly prefer the “null” version. This is a very interesting result as the deletion of [y] was built upon the idea that [e] (or a “captured” [e]) might somehow stand in for the missing [y].
Yet there is something more worth noting. Three of the trigrams with [o], [sod], [rod], and [lod], are more common than expected in words at the start or end of a line. Similarly, [sd], [rd], [ld], and [ls] are also more common than expected in words in these positions.
This could be a sign that, at least for these four trigrams, the match between the [o] and null versions is the right one.
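One way the line-position observation could be checked is sketched below. It assumes the text is held as a list of lines, each a list of words, and it takes "expected" to mean the share of all words which stand first or last in a line. That baseline is an assumption of this sketch, not necessarily how the figures above were arrived at, and the function name `edge_enrichment` is hypothetical.

```python
def edge_enrichment(lines, sequence):
    """Ratio of observed to expected line-edge share for words with `sequence`.

    A value above 1 means words containing the sequence stand first or
    last in a line more often than words do in general.
    """
    edge_hits = all_hits = edge_words = all_words = 0
    for line in lines:
        for i, word in enumerate(line):
            at_edge = i == 0 or i == len(line) - 1
            hit = sequence in word
            all_words += 1
            edge_words += at_edge
            all_hits += hit
            edge_hits += hit and at_edge
    if all_hits == 0 or edge_words == 0:
        return None  # nothing to compare
    observed = edge_hits / all_hits    # share of matching words at an edge
    expected = edge_words / all_words  # share of all words at an edge
    return observed / expected
```

Running this for both an [o] sequence such as [sod] and its null counterpart [sd] would show whether the two lean towards the line edges to a similar degree.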
I’m still not convinced that I have my idea about [y] matching [o] completely nailed down. There seems to be more that needs to be said, or something which I’m missing. Yet the majority of the distribution of [o] does seem to be covered within existing arguments.
Even if in a few places, such as with [.oe] and [.os], it does seem to fail, and the residue is quite confusing, there’s still much more good about the idea than bad. Maybe there is something which will tie all the pieces, including the gaps, into a whole. There needs to be an underlying reason why [y] becomes [a] in some places or is simply not expressed in others.
I definitely haven’t found that reason yet, but I’m willing to stand by my hypothesis while I look for it.