It might appear that we have at least four obvious mappings, and perhaps we do, but

consider that the ciphertext h and

the plaintext e are sitting out there all by themselves.

Those are probably going to pull apart one or more of our obvious pairings, but

that's okay.

The idea here is that it gives us a starting point.

We can try these mappings and see if they let us recognize any word fragments.

If we can't, then we can adjust and

perhaps map the plaintext e to the ciphertext h or perhaps i.

And if that doesn't seem to work, look at using the ciphertext o and

pairing the plaintext t to the ciphertext h.
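
To make that trial-and-adjust loop a little more concrete, here is a minimal Python sketch of how we might apply a partial set of guesses and look for word fragments. The function name `apply_partial_key`, the sample string, and the trial pairings are all made up for illustration; they are not the intercept or the mappings from this lesson.

```python
def apply_partial_key(ciphertext, key, unknown="_"):
    """Substitute the letters we have guessed; show every other letter as `unknown`.

    `key` maps a ciphertext letter to the plaintext letter we are guessing.
    Spaces, digits, and punctuation pass through unchanged.
    """
    out = []
    for ch in ciphertext.lower():
        if ch.isalpha():
            out.append(key.get(ch, unknown))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical trial: guess that ciphertext h is plaintext e and ciphertext o is plaintext t.
trial_key = {"h": "e", "o": "t"}
print(apply_partial_key("ohx oh", trial_key))  # prints "te_ te"
```

If the fragments that come out don't look like pieces of English words, we change an entry in the trial key and run it again.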

We aren't going to work through a bunch of permutations in this lesson,

we just can't justify that kind of time.

The key observation at this juncture is that particularly with a limited amount of

ciphertext, frequency analysis is not a magic bullet.

While it gives clues and hints and guides our search so

that we can hopefully identify a few character mappings fairly quickly,

it is still going to be a tedious and error-prone process taking hours or

perhaps even days, weeks, or months.

And it is not guaranteed to succeed in any acceptable time frame.

Fortunately, most of the time we would expect to have multiple intercepts

using the same cipher alphabet.

Let's say that we intercepted ten enciphered messages.

In this case, they happen to be the one we've already looked at

plus another nine ciphertexts that are also excerpts from Leo Tolstoy novels.

Not necessarily from the same novel as the one we're primarily interested in, though.

The frequency analysis for

this set of 2500 characters shows significantly finer grain in the results.

For instance, we see that now we have at least one occurrence of every letter.

In fact, our least frequent ciphertext letter, m, occurs 23 times.
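
As a rough sketch of how we could pool several intercepts into one frequency count, assuming they all use the same cipher alphabet, something like the following would do. The helper names `letter_counts` and `least_frequent` and the sample strings are hypothetical, not the actual intercepts.

```python
from collections import Counter
import string


def letter_counts(intercepts):
    """Count each letter a-z across all of the intercepts taken together."""
    counts = Counter()
    for text in intercepts:
        counts.update(ch for ch in text.lower() if ch in string.ascii_lowercase)
    return counts


def least_frequent(counts):
    """Return the rarest letter, treating letters that never appear as count 0.

    Ties go to the letter that comes first in the alphabet.
    """
    return min(string.ascii_lowercase, key=lambda c: counts[c])


# Made-up stand-ins for two intercepted messages.
sample = ["abca", "bcc"]
counts = letter_counts(sample)
print(counts.most_common(3))
print(least_frequent(counts))
```

With more ciphertext in the pool, fewer letters sit at zero and the counts at the low end of the table start to mean something.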

We also see that there are only two repeated-letter digraphs that don't appear at

least once.

Here's where we can probably make use of the tail end of our frequency

distributions.

We know that qq virtually never occurs, and that dd and

jj are also extremely rare.

Therefore, it's probably a good guess that ciphertext characters b, m, and

t map to these three, at least partially.
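
One way we might pull out the tail end of the repeated-letter distribution is sketched below. The names `doubled_letter_counts` and `rarest_doubles` are made up for this example, and the sketch only treats two letters as a double when they are directly adjacent in the text as given.

```python
from collections import Counter
import string


def doubled_letter_counts(text):
    """Count how often each letter is immediately followed by itself.

    Every letter a-z gets an entry, so doubles that never occur show up as 0.
    Overlapping runs are counted pair by pair, so "bbb" counts as two "bb".
    """
    text = text.lower()
    counts = Counter({c: 0 for c in string.ascii_lowercase})
    for a, b in zip(text, text[1:]):
        if a == b and a in string.ascii_lowercase:
            counts[a] += 1
    return counts


def rarest_doubles(text, n=3):
    """Return the n letters with the fewest doubles, ties broken alphabetically."""
    counts = doubled_letter_counts(text)
    return sorted(counts, key=lambda c: (counts[c], c))[:n]


# Made-up sample text, not one of the intercepts.
print(rarest_doubles("aabbbc"))
```

The ciphertext letters this returns are our candidates for the plaintext letters whose doubles are rare.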

Another good guess would be that either b or m represents plaintext q.

And if it does then it would be an easy matter to scan the ciphertext and see

if one of these two letters is virtually always followed by the same letter.

That would give us high confidence that the first letter is the letter q and

the letter that follows it represents u.
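
That scan is easy to sketch in Python. The function name `follower_profile` and the sample text below are hypothetical, and what counts as "virtually always" is a judgment call that we leave to the analyst.

```python
from collections import Counter


def follower_profile(text, letter):
    """Find the letter that most often follows `letter`, and how dominant it is.

    Returns (most common follower, its share of all followers),
    or (None, 0.0) if `letter` is never followed by another letter.
    """
    text = text.lower()
    followers = Counter(
        b for a, b in zip(text, text[1:]) if a == letter and b.isalpha()
    )
    total = sum(followers.values())
    if total == 0:
        return None, 0.0
    top, n = followers.most_common(1)[0]
    return top, n / total


# Made-up sample: here b is followed by x three times out of four.
print(follower_profile("bxabxbxby", "b"))  # prints ('x', 0.75)
```

We would run this for both b and m. A share close to 1.0 for one of them would support reading that letter as q and its follower as u.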