[SOUND] In the previous lecture we talked
about defining what we mean by secure encryption.
And we reached an informal description of the security we were after.
Recall that what we said is that we should consider an encryption scheme secure
if regardless of any prior information the attacker has about the plaintext,
observing a ciphertext should reveal no additional information to
the attacker about the plaintext.
Remember also that we say we were going to be considering the threat model
of a ciphertext only attack, meaning that the attacker only observes ciphertext,
passively, but doesn't actively interfere with either the sender or receiver.
And that we would be considering the simplest threat model which is that
the attacker observes only a single ciphertext.
We can adapt our definition to more complicated cases, but
this is the one we're going to be considering for now.
To do that we're going to need to start talking about probability and so
I just want to go over some very basic concepts in probability that I assume you
have encountered before in one form or another before.
Again we are not going to need a very formal view of probability.
That formal view is, there it's present.
We can define everything completely rigorously if we wanted to but
I just want to assume here that you seen these concepts before, that you
are familiar with them, whether or not you have seen them in their complete rigor.
So we're going to use the, terminology of a random variable.
Now again, this has a formal definition in probability theory.
For our purposes, it's enough to just think of a random variable as a variable
that takes on certain values with certain specified probabilities.
For our purposes we're only going to be concerned with discrete random variables.
That means that there's some finite set of values that the variable can
possibly take.
And so we don't have to worry actually about taking integrals or
about considering continuous parameters or
continuous distributions on those random variables.
So again, a random variable is just going to be a variable that takes on
certain values with some probabilities that we're going to specify.
And in particular, a probability distribution for
a random variable is going to be a specification of the probabilities
with which that variable takes on each possible value.
We know that these probabilities must be in the range between 0 and 1 inclusive.
And, furthermore, the sum of the probabilities with
which the variable takes on all these different values should equal 1, right?
The various probabilities should all sum to 1.
again, we'll also use the terminology of an event, which is something that has
a formal definition within probability theory, but for our purposes,
we can just think of, as a particular occurrence, in some randomized experiment.
And we'll write probability bracket E, to denote the probability of
some particular event E, the probability that some occurrence actually happens.
So a typical example of an event that we'll be concerned with
is the probability that some random variable takes on some value.
We then have the notion of conditional probability which is the probability
that one event occurs, assuming that some other event has occurred.
And we'll write this as the probability of A Bar B.
That is the probability of event A, conditioned on the occurrence of event B.
And by definition the probability of A conditioned on
B is equal to the probability that both events A and
B occur, divided by the probability that event B occurs.
We're assuming here that event B occurs with non zero probability so
we don't have any division by zero, okay?
So this is both an informal definition of what we mean by conditional probability as
well as a formal definition of what the notation probability of
A conditioned on B actually means.
With this definition of conditional probability we can now define formally
what it means for two random variables X and Y to be independent.
And actually I'll point out here some notation that I'll generally be
consistent with which is to use capital letters for random variables and
lower case letters for values that those random variables can take.
Okay so here we have random variables capital X and capital Y,
and we'll say that they're independent, if for all possible values X and
Y, the probability that X, takes on the value little x,
conditioned on the event, that random variable Y, takes on value lowercase y.
Is exactly equal to the probability that random variable X takes on the value X in
the first place, i.e.
The fact that the random variable Y takes on any particular value is irrelevant as
far as determining the probability with which X takes on some particular value.
So let's let E1 through EN,
be events, that partition the space of all possibilities.
This means, that the Ei, are all, pairwise impossible.
That means we can't have it be the case that both Ei, and Ej occur for
any i and j.
So that is if E1 occurs, then it's impossible for E2 to occur also, and
if En minus 1 occurs, then it can't be the case that E3 occurs.
Okay, so these events are, are pairwise impossible and
moreover they partition the space of all possibilities in the sense that at least
exactly one of E1 through En have to occur, so some event Ei has to occur.
Then for any other event A we can express the probability of event A as
the sum over all i of the probabilities that A occurs and also Ei occurs.
Right, this completely partitions the possibilities for
when A occurs because some Ei has to occur.
Exactly one of the Ei's has to occur.
And so by summing over the probabilities of A and each of those events Ei,
we recover the total probability that event A occurred.
And then we can break this up further using our definition of
conditional property to be equal to the sum over all i of the probability of
A conditioned on event Ei times the probability of event Ei.
Think, now again, I do assume that you've seen or
encountered all of these concepts before.
This was just meant to be a brief review.
If you haven't, then I would recommend you look at some
introductory probability text.
The first chapter or the first chapter and a half perhaps.
For more, thorough coverage of these different term, terms and definitions.
So let's get back now to cryptography.
So remember the formal definition of a private key encryption scheme, okay?
A private key encryption scheme is defined by a message base, bold M, all right,
that's the set of all possible messages that are allowed to be or that can be
encrypted by the scheme, as well as three algorithms, Gen, Enc and Dec.
Gen was the key generation algorithm.
This is a probabilistic algorithm that generates a key,
that I’ll denote by lower case k.
The encryption algorithm takes as input a key k, and a message M,
in the message base and outputs some ciphertext E.
And just remember that we're allowing the encryption algorithm to
be randomized meaning that if you run it several times, even using the same inputs,
k and m, you can potentially get different ciphertext out.
'Kay, we haven't seen an example of such a scheme yet, but we're,
we're allowing that in our definition.
Finally, the decryption algorithm takes as input a key k and
a ciphertext c and outputs a message m.
And we have the correctness condition that decryption of a ciphertext encrypted,
that the encryption is the message, where you encrypt and
decrypt using the same key, should recover the original message.
Again, I haven't put that on this slide.
Now by way of some more notation, I want to define in parallel to
the notion of using bold face m for the message space.
I want to define the notation of a bold face k to denote the key space.
That’s just the set of all possible keys, that is the set of
all possible things that can be output by the key generation algorithm.
We use also a bold face C, for the ciphertext space, that is the set of
all possible ciphertexts that can be output by the encryption algorithm,
taken over all possible keys that can be output by the key generation algorithm and
all possible messages that are in the message space.
So now what we want to do is look at probability distributions
over the messages that can be sent.
So we'll let capital M without the bold face, okay,
capital M be a random variable denoting the value of
the message that the sender wants to communicate to the receiver.
So this is a random variable that can potentially take on values in
the entire message space.
Right, so a priori, the set of possible values for
what the message can be are exactly those values that are in the message space.
We're not allowed to encrypt anything outside the message space but
we are potentially allowed to encrypt anything we like in the mess,
message space.
Now, the random variable M is meant to reflect the likelihood of different
messages being sent by the parties, and this takes into
account anything that the attacker might know about what those messages might be.
So this distribution, this random variable, M,
over the messages that can be sent is meant to reflect, not only the different
probabilities with which the sender might want to transmit a certain message.
But also something about, perhaps, the attacker's external knowledge that may
have some influence or some bearing over what the sender is sending.
So just as a, as a, as one particular example,
we might have the following distribution over random variable M,
that the probability that the message is equal to attack today Is 0.7 and
the probability that the message is don't attack is 0.3.
And everything else in the message space has a probability of zero.
So the only possibilities are either attack today or
don't attack and those are going to occur with respective probability 70% and 30%.
Okay, now in other situations of course, it might be different.
Maybe again, those are the only two possible messages, but
maybe the attacker has no idea whether the sender wants to attack or not.
And therefore the probabilities would each be 0.5 or maybe there are some other
messages that are possible that can occur with different probability what have you.
You're right, you have different possibilities of what those can be, but
here we're just fixing some particular distribution for
this random variable M as specified, okay?
Now we're going to let capital K be a random variable denoting the key.
The values that this random variable can take are exactly those in
the key space, right?
Only those values in the key space are possibilities for the key.
And actually any prob, any value in that space is a possible value for the key and
so is a possible value that this random variable K can take.
Now one important thing is that if we fix the encryption scheme, or
fix some encryption scheme, Gen, Enc and Dec, then the key generation algorithm
itself defines for you the probability distribution for K, right.
The probability that the random variable K, takes on some particular value,
lowercase k or
the probability that the key is equal to some particular value K is, by definition,
equal to the probability that the key generation algorithm outputs that key.
The probability that the key generation algorithm outputs the particular key,
given by this value lowercase k.
So the encryption scheme defines your distribution over the random variable k.
Now we're going to make the very crucial and
important assumption that the random variables M and K are independent.
Right, this means that the probabilities with which the message takes on some value
are independent of the probabilities that the key takes on some particular value.
Now this is actually a very reasonable assumption,
because the messages that a party wants to send to another party should not
depend on the key that's used to encrypt that message, right?
We imagine that we externally to the encryption scheme,
have some message that they want to communicate.
And then we're sampling a key and
using some encryption scheme to encrypt that message.
But the details of the encryption scheme and the particular key we ended up
choosing shouldn't influence the message that we're sending.
This is a crucial assumption, and one that we're going to make throughout the class.
It is possible to concoct scenarios where this assumption doesn't hold.
But in that case, the definitions we're giving do not apply, and
much more complex definitions need to be considered.
But traditionally we, the assumption is made that M and K are independent, and
this is the assumption we are going to make here.
Moreover, this is the reasonable assumption that should hold except in
some very unusual circumstances.
So, let's fix some encryption scheme, given by Gen, Enc and
Dec, and fix some distribution for the message.
Some distribution on capital M on the message that the sender is going to
send to the receiver.
So we have now fixed by the encryption scheme the distribution over capital K,
and we've fixed by assumption a distribution over
the message space given by capital M.
What we'll do, is first, chose a message little m,
choose a particular message m, according to the given distribution, right?
So, going to back to the example before,
that means we might pick the message attack now with probability 0.7 and
the message don't attack, with probability 0.3.
So we sample it according to those probabilities and
now we've chosen our message and let's call that little m.
We then generate a key, K,
using the key generation algorithm, i.e we sample the key k
according to probability distribution on the random variable capital K.
And then what we're going to do is kind of couple those together
by encrypting the message we've chosen using the key that we've chosen, and
the encryption algorithm that we're given having fixed the encryption scheme.
This gives us a ciphertext that we'll denote by little c.
The key point here is that this defines a distribution on the ciphertext, right?
This process of sampling a random message, sampling a key and
then encrypting the message using that key defines some distribution on
the ciphertext, and we can denote the distribution on the random variable
that denotes the ciphertext by the variable capital C.
'Kay, so we'll let capital C be the random variable denoting the value
that the ciphertext takes on after this experiment.
I think it's helpful here to walk through an example of how this works out and
do an explicit calculation.
So let's consider the shift cipher.
We said that the the fixing an encryption scheme defines the distribution over
capital K.
And indeed in the shift cipher, the key is chosen uniformly from the set of
all characters, or from the set of all numbers between 0 and 25, i.e.,
the distribution on capital K is the one where the probability that the key
takes on, any particular value between 0 and 25, is exactly 1 over 26.
Right, every key in the range of 0 to, 0 to 25 is chosen with equal probability.
Now let's assume we'll take a simple example of
a distribution over the message space.
Assume that the probability that the message is the single character a is 0.7,
and the probability that the message is the single character z is 0.3.
Well, let's calculate now
the probability that the ciphertext is the single character b.
Either the message is equal to a and the key is equal to 1.
Or, in that case we shift our message by one position and indeed get
the ciphertext b, or the message is equal to z and the key is equal to 2.
Those are the only two ways that we can end up with a ciphertext being equal to
the character b.
And now we just have to compute these probabilities.
So the probability that the ciphertext is equal to the character b is equal to
the probability that the message is equal to a and the key is equal to 1,
plus the probability that the message is equal to z and the key is equal to 2.
Because the message and
the key are independent, those become the, just the products of the two terms.
So we have the product sorry the, we have the probability that M equals a times
the probability of K equalling 1, plus the probability that M is equal to z,
times the probability that K is equal to 2.
And now we just plug in the known values for those different probabilities.
Right, the probability that M, is equal to a, we said was 0.7.
The probability that k is equal to 1 is 1 over 26 and
the probability that m is equal to z is 0.3.
The probability that the key is equal to 2 is 1 over 26.
Adding that all together, we see that the total probability with which
the ciphertext is equal to the character b is exactly 1 over 26.
'Kay, nothing magical, but just going through the steps,
I think clarifies a little what we mean, what we meant on the previous slide.
Taking another example, a slightly more complicated one,
we'll again consider the shift cipher, but
now let's look at a different distribution over the message space.
So now we'll assume that we have two possible messages that can be sent,
like before, but they're now longer than one character each.
So we have that with probability one half, the message is equal to the string O-N-E.
And with probability one half, the message is equal to the string T-E-N.
Right? One or ten.
Well, if we break it up as before, what we have is that this is equal to,
using the law of total probability, the probability that C is equal to rqh
conditioned on M being equal to one, times the probability that M is equal to one,
plus the probability that C is equal to rqh conditioned on M being equal to ten,
times the probability that M is equal to ten.
And we can compute that as follows, so conditioned on the message,
being equal to one, the cipher text turns out to be equal to rqh if and
only if we have a shift of three i.e.,.
the key was equal to three.
Right in that case we have o going to r,
we have n going to q and we have e mapping to h.
So now the probability that the ciphertext is equal to
rqh conditioned on the fact that the message is equal to
one is exactly the probability that the key is equal to three, which is 1 over 26.
We multiply that now by the a priori probability with which
the message was equal to one, which we're given as one half, and
now we come to a more interesting calculation, right?
What's the probability that the ciphertext is rqh conditioned on
the message being equal to ten?
Well, if you think about it a little bit, or if you try to exhaust over all
possible keys, you can see actually that there's no possible way
the ciphertext can be equal to rqh, given that the message is equal to ten.
And the reason for that is simply that there is no shift that will
simultaneously shift t to r, e to q, and n to h.
There's just no way for that to happen.
So the probability that the ciphertext is equal to rqh
conditioned on M being equal to ten is 0.
We can write one half for the probability that M equals ten.
It's not going to matter in the end anyway because it's going to go to zero,
but there you have it.
And so by summing these probabilities we get that the probability that C,
the ciphertext is equal to the string rqh, is exactly 1 over 52.
Now, recall the definition of perfect secrecy, from the earlier in the lecture.
All right, we said that regardless of any prior information the attacker has
about the plaintext, the ciphertext that the attacker observes should leak
no additional information about the plaintext.
So, what this means is that perfect secrecy will require that observing
the ciphertext should not change the attacker's knowledge about
the distribution of the plaintext at all.
Right, so the attacker has some knowledge about the distribution of the plaintext
before observing the ciphertext.
But that's just given by the distribution capital M,
the distribution over the possible messages that might be sent.
After observing the ciphertext the attacker can update its knowledge about
the distribution, and what we require is that a scheme is, is defined as,
to be perfectly secret, if, and only if those two distributions, right,
the a priori distribution before observing a ciphertext, and the a posteriori
distribution, after observing a ciphertext, should be exactly equal.
So more formally, what this means, is that an encryption scheme, defined by
the algorithms Gen, Enc, and Dec, and the message space bold face M is perfectly
secret, if for every possible distribution over the message, over the message space,
right, any possible distribution over the possible messages that might be sent.
it holds that the probability that the message was equal to m
conditioned on observing a ciphertext equal to lowercase c is exactly
equal to the a priori probability that the message was equal to m in the first place.
So we have here an expression, on the right hand side we have the a priori
probability with which the message takes on some value lowercase m.
Right, this is the attacker's knowledge about the distribution about the plaintext
before observing any ciphertext.
And we're requiring that to be equal to the left hand side,
which is the probability that the message was equal to some value M, conditioned on
observing that the ciphertext took on some particular value lowercase c.
Okay, so again this is our definition of perfect secrecy.
This is just a formalization, a mathematical description of our informal,
intuitive notion of security that we had on the previous slide.
And let's take as one particular message the string ten and
as one particular ciphertext the string rqh.
Well, we can then ask, what's the probability that the message was equal
to ten conditioned on the fact that we observed a ciphertext equal to rqh?
Well, the reason, we, we can actually say quickly that this is exactly equal to 0.
Sorry, I didn't go through the calculation.
I'll do that in the next example.
Why is that?
Okay, well remember what we said before is that the condition on
the message being equal to ten.
It's simply not possible for the ciphertext to be equal to rqh.
There's no key that will map the message M onto the cipher text rqh.
What that means is that if we've observed the ciphertext rqh,
then it cannot possibly the case, be the case that the message was ten, right?
Because again, there's no possible way for
the message to be equal to ten and the ciphertext to be equal to rqh.
So therefore, the probability that the message is equal to ten,
conditioned on the ciphertext being equal to rqh is precisely 0.
It's simply not possible that the message was ten given that
we've observed the ciphertext rqh going across the channel.
I'm going to turn next to a more complicated example, and
in analyzing that we're going to use Bayes's theorem.
This is something else that I'm assuming that you've seen before but
I just wanted to review it very quickly.
So Bayes' theorem is just a way of switching the order of two events in
a conditional probability statement.
And it simply says that the probability of A conditioned on B
is equal to the probability of B conditioned on A times the probability of
A divided by the probability of B.
It's quite easy to derive this on your own from what we had on the previous slides,
but in any case, this is the statement that we're going to be using.
So, for the next example, we'll make it a little more complicated yet,
we'll take again the shift cipher as our example, but
now we'll consider a distribution over three possible messages.
Right, so there are three different messages that can potentially be sent.
We'll have that the message is equal to hi with probability 0.3,
the message is equal to no with probability 0.2,
and the message is equal to in with probability 0.5.
I just made these numbers up.
Well, we could rewrite this using Bayes' theorem in the following way.
It's just a probability that the ciphertext is equal to xy,
conditioned on the message being equal to hi, times the probability that the message
is equal to hi, divided by the probability that the ciphertext is equal to xy.
And now we just need to calculate each of those probabilities individually.
The prod, get, conditioned on the message being equal to hi,
there's exactly a single key that will result in a ciphertext xy.
I forget exactly what the shift is but
you can see just by exhausting over all 26 possibilities that there's
exactly one value of the key that will map the message M onto the ciphertext.
Sorry, that will map the message hi onto the ciphertext xy.
More interesting or more complicated,
is the computation of the probability that the ciphertext is equal to XY.
Here, we'll just go back to the law of total probability that we used before.
So the probability that the ciphertext is equal to xy,
we can break that up as the sum of three different terms.
The first term is the probability that C equals xy conditioned on the message being
equal to hi times the probability of hi,
which is 0.3, plus the probability that c equals xy conditioned on M equals no,
times the probability that M equals no, which was 0.2.
Plus the probability that C equals xy conditioned on the event that
the message was equal to in times the probability of in which was 0.5.
Now the probability that the cipher text is equal to xy,
conditioned on the message being equal to hi is 1 over 26 as we said earlier.
The probability that the ciphertext is xy conditioned on
the message being equal to no is also 1 over 26.
There's again exactly one key that will map the message M onto
the ciphertext xy and the probability that that key is chosen is exactly 1 over 26.
And so when we add these up we get a total probability of 1 over 52.
Finally, the probability that the message is equal to hi conditioned on C equals xy.
We can go back to our previous expression in terms of Bayes' Law,
plug in the values that we computed, and compute the value of 0.6.
So the probability that the message was equal to hi,
conditioned on the ciphertext being equal to xy, is exactly 0.6.
And the thing to note again, is that this is not equal to the a priori probability,
with which the message was equal to hi, which was 0.3.
So this demonstrates again that the shift cipher is not perfectly secret because we
found a particular distribution over the message space for
which the a priori probability that the message took on some
particular value given that the observed ciphertext took on some other value,
was not equal to the a priori probability with which the message took on that value.
So just to summarize, we see here two examples demonstrating that
the shift cipher is not perfectly secret, and the question that
remains is how can we construct a perfectly secret scheme, right?
We have this nice definition of what perfect secrecy means,
this nice definition that encapsulates our intuitive understanding or
our intuitive desire for what we want to achieve by an encryption scheme.
How can we build an encryption scheme and prove that it satisfies that definition?
And this is the problem we'll turn to in the next lecture.