0:00

In previous lessons we've seen how frequency analysis can be used to break

a monoalphabetic cipher by looking for patterns of non-randomness

in the cypher text that reflect underlying non-randomness in the plaintext.

Not surprisingly, cryptographers are thus motivated to eliminate or

at least dilute these patterns in the ciphertext.

There were several techniques that were developed with monoalphabetic

ciphers to attempt this, such as using ciphertext letter pairs for

each character and using multiple possible pairs chosen randomly for

the most common plaintext letters and fewer possibilities for the rarer letters.

While this makes cryptanalysis more difficult,

the usage patterns are still present and can still be teased out

given a sufficient amount of ciphertext material to work with.

Another approach is to use multiple monoalphabetic ciphers,

switching from one to another as the plaintext is enciphered.

This is known as a polyalphabetic cipher.

In this lesson, we'll look briefly at the history of these ciphers and

learn just what they are.

We'll also develop a powerful tool for defeating them.

In the next lesson we'll walk through an example of applying this tool.

1:07

Early attempts at crafting polyalphabetic ciphers date to the 9th century Arab

cryptographer al-Kindi, and

much later by Leon Batista Alberti in the mid-15th century.

While both had correctly surmised that a polyalphabetic cipher should be much

harder to break, their implementations were so weak that breaking them was quite

easy, and as a result little widespread interest was generated.

The problem that plagued these early attempts was to implement a practical

polyalphabetic cipher algorithm in a simple manner that didn't require

the distribution or memorization of significant amounts of key material.

Namely, several monoalphabetic ciphers and the order in which to use them.

Al-Kindi and Alberti both had no key.

The order in which the different alphabets were used was part of the algorithm.

In other words, security through obscurity.

In the early 16th century, Johans Trithemius published the Tabula Recta,

which we'll see in a moment, and

which is nothing more than a tabulation of all the possible Caesar shift ciphers

one after another, in his case, using the 24 characters of the Latin alphabet.

But we only work with the 26 characters of the English alphabet.

2:17

Since the structure is so simple, there's no need to memorize anything.

It can be created at need by anyone.

While Trimethius' work does not seem to imply mixing different ciphers

within a given ciphertext, a few decades later,

Giovan Battista Bellaso devised an elegant implementation of a polyalphabetic cipher

based on the Tabula Recta that today is known as the Vigenère cipher.

That's right, Blaise de Vigenère did not invent the cipher that bears his name.

This is due to a mis-attribution in the early 19th century that stuck.

However, about the same time as Bellaso, Vigenere developed a similar but

stronger cipher known as the auto key cipher.

Both defied successful cryptanalysis for the better part

of the next three centuries, and both fell to the same basic cryptanalytic techniques

developed independently by Charles Babbage and Frederick Kasiski.

For our purposes,

we'll stick to the classic Vigenere cipher that works as follows.

A previously agreed upon key phrase, such as secret, is written out.

We'll use five letter groups above the plaintext message,

which we'll use retreat to river.

Since this key phrase is typically short compared to the message, it is simply

repeated as many times as needed to match the length of the plaintext.

We then go through the message one character at a time and

use the character in the key to choose one of the Caesar shift ciphers

by simply selecting the row that starts with that character.

Next we select the column that is headed by the plaintext character.

The ciphertext character is the character at the intersection of this

row and column.

3:51

The result is the ciphertext shown here.

When received, this process is simply reversed.

The plaintext character is the character at the top of the column

containing the ciphertext character in the row selected by the key.

Notice that the plaintext contains four Rs, but

two of them are enciphered as I, one as J, and the other as V.

Similarly, there are two Vs in the ciphertext,

but one enciphers the letter T while the other enciphers R.

We also mask double letter occurrences.

Note that that TT enciphers as LX.

It turns out that there is a very simple mathematical relationship between

the corresponding characters of the plaintext, the key, and the ciphertext.

If each of the N characters in the alphabet were assigned a number starting

with zero, then the letters are numbered from zero through n- 1.

For encryption, ciphertext character is the sum of the plaintext and

the key characters reduced modulo N.

While for decryption, the plaintext

character is the difference between the ciphertext character and the key.

Again, reduced modulo N.

In both cases, the modulo operation takes care of any necessary wraparound.

5:01

The practical result of a polyalphabetic cipher is that the ciphertext

quickly dilutes the letter frequencies.

In fact, if the key were to contain all 26 characters exactly once,

the ciphertext letters would appear on average with uniform frequency.

Taking this even further, if a key is chosen that is as long as the message,

and if the letters in the key are chosen randomly, and

if the key is never reused, then the result is a one-time pad.

In fact, this is how most pen and

paper one-time pad ciphers were actually implemented.

But most of the time, the key was a word or

short phrase that was considerably shorter than the message.

But this was sufficient to significantly soften the ciphertext

letter frequencies and render simple frequency analysis ineffective.

So how might we crack this cipher?

5:51

The answer was arrived at independently, and by slightly different methods,

by both Charles Babbage and Friedrich Kasiski in the mid-19th century.

Both men observed that if the length of the key could be determined,

then the ciphertext could be broken up into sets of ciphertext characters

that had been encrypted with the same key.

While successful encryption of each set would obviously not result

in meaningful text, the fact that they all had the plaintext characters in the set

had been encrypted using the same key meant they were encrypted with the same

monoalphabetic cipher.

And in the case of the Vigenere cipher, this was a simple Caesar shift cipher.

This meant that they would exhibit the same frequency distribution as

the underlying plaintext.

This insight by itself would have been sufficient to afford a considerable break

into the cipher, but both men went a major step forward and

found a way to determine the length of the key in a very efficient way.

Both methods look at coincidences, which are letters or

sequences of letters that appear at regular intervals in the ciphertext.

We'll look at one of these methods, Babbage's Index of Coincidence, and

leave the other, the Kasiski Examination, for your own exploration.

7:05

The Index of Coincidence is a measure of how frequently you expect to find

coincidental matches between two unrelated strings of text written

in the same language, using the same alphabet laid one on top of the other.

Here's an example with the beginning of two unrelated stories.

We can see that in these 24 character strings that there are three coincidences.

This is a rate of 12.5%,

which we'll see is actually about twice what we would expect.

But remember that statistical measures tend to vary greatly from the average

in small samples.

7:38

To see how this can help us, let's see what we would expect the rate to be.

For illustration purposes, we'll start with a simple language that has just two

characters, A and B, in which A has a frequency of 75%, and B, 25%.

Here are two 32-character plaintext messages

written in this language with the coincidences indicated by an overbar.

We see that we have 16 coincidences with A, and

three with B, giving a total coincidence rate of 59%.

But is this close to what we expect?

Let's look at what the math tells us.

8:13

Let's say that we have two 2-character alphabets, F and G, where they

have the same frequency distribution, but possibly different character mappings.

Perhaps one is the plaintext alphabet and

the other is one of the possible monoalphabetic cipher alphabets.

How often would we expect to see a coincidence involving the letter A?

8:33

In the first string, there's an f sub A,

the frequency of A in language f, percent chance of seeing an A.

In the second string, there's a g sub A chance of seeing an A.

The probability of both happening at the same time

is simply the product of these two frequencies.

The same is true for the letter B.

We then sum up these probabilities for

each character in the alphabet, and get the overall expected coincidence rate.

For an alphabet with n characters,

this can be written succinctly with this equation.

9:07

Going back to our Toy language, if both alphabets are the same,

we get an expected coincidence rate of 62.5%.

If the alphabets are different, and with just two characters we only have one

possible alternative, we get an expected rate of 37.5%.

9:24

Recall that we are interested in detecting when our two sets

of characters are drawn from the same alphabet, or when f and g are the same.

In this case, our equation reduces to simply the sum of the squares of

the single letter frequencies.

For English, this rate is generally accepted as being 6.65%.

For the frequencies we previously established from that huge set of works,

we got 6.85%, which is in pretty good agreement.

9:50

If our two sets of letters were drawn from different alphabets,

the coincidence rate would be lower.

It would be lowest if the two alphabets used letters randomly, which would

also be the case for a language that has a uniform letter frequency distribution.

This serves as a good reference because it gives an idea of how non-uniform

our frequency distribution is.

For the 26 character English alphabet, the random coincidence rate is 3.85%.

Coincidence rates are very useful, and in practice this is what we generally use.

But in order to compare rates for different languages, having alphabets of

different sizes, it is useful to normalize the expected coincidence rate by dividing

it by the coincidence rate for a uniformly used alphabet of that same size.

This is what the actual index of coincidence refers to.

For English, this works out to about 1.73.

Although not readily apparent,

it is not too difficult to reason out that the expected coincidence rate will be

greatest when the letters in each pair are drawn from the same alphabet.

This is the critical observation that lets us find the key length.

If we count the coincidences between the ciphertext and a copy of the ciphertext

that has been shifted by k places, then when k is a multiple of the key length,

each pair of letters are drawn from the same alphabet.

And we expect to see a spike in the coincidence rate.

Because of statistical variation, it will not be too surprising so

see spikes at other shift amounts, but

we would not expect to see these spikes at every multiple of that shift.

In the next lesson, we'll look at an example of applying coincidence analysis

to a Vigenere enciphered message, and get a feel for how it works and

how the amount of ciphertext material available affects its effectiveness.