0:00

In the last segment we talked about the CBC-MAC and the NMAC, but throughout

the segment we always assumed that the message length was a multiple of the block

size. In this segment, we're going to see what to do when the message length is not

a multiple of the block size. So recall that the encrypted CBC mac or ECBC-MAC for

short uses pseudorandom permutation F to actually compute the CBC function as we

discussed in the last segment. But in the last segment, we always assumed that the

message itself could be broken into an integer number of blocks for the block

cipher. And the question is what to do when the message length is not a multiple

of the block size. So here we have a message where the last block actually is

shorter than the full block and the question is how to compute the ECBC-MAC in

that case. So the solution of course is to pad the message and the first pad that

comes to mind is to simply pad the message with all zeros. In other words we take the

last block and just add zeros to it until the last block becomes as long as one full

block size. And so my question to you is whether the resulting MAC is secure. So

the answer is no, the MAC is not secure. And let me explain why, basically the

problem is that it's possible now to come up with messages so that message m and the

message m concatenated zero happen to have exactly the same Pad. And as a result once

we plug in both m and m||0 into ECBC we'll get the same tag out, which means that

both m and m||0 have the same tag and therefore the attacker can

mount an existential forgery. He would ask for the tag on the message m. And then he

would output as its forgery the tag and the message m||0. And it's

easy to say why that's the case. Basically, to be absolutely clear here, we have our

message m. Which after padding becomes m000. So we had to add three

0's to it. And here we have the message m zero, an m that ends with zero. And after

padding, we basically now have to add two 0's to it. And lo and behold, they become

the same pad, so that they're gonna have exactly the same tag which allows the

adversary to mount the existential forgery attack. So this is not a good idea. In

fact appending all 0s is a terrible idea. And if you think about concrete case where

this comes up imagine the automatic clearing house system used for clearing

checks. I might have a check for a $100 and the tag on that check. Well, now, the

attacker basically could append a zero to my check and make it a check for $1000,

and that wouldn't actually change the tag. So this ability to extend the message

without changing the tag actually could have pretty disastrous consequences. So I

hope this example convinces you that the padding function itself must be a one to

one function. In other words, it should be the case that two distinct messages always

map to two distinct padded messages. We shouldn't actually have a collision on the

padding function. Another way of saying it is that the padding function must be

invertible. That guarantees that the padding function is one to one. So a

standard way to do this was proposed by the International Standards Organization

ISO. What they suggested is basically, let's append the string 100000 to the end

of the message to make the message be a multiple of the block length. Now to see

that this padding is invertible, all we do is describe the inversion algorithm

which simply is gonna scan the message from right to left, until it hits the

first one and then it's gonna remove all the bits to the right of this one,

including the one. And you see that once we've removed the pattern this way, we

obtain the original message. So here's an example, so here we have a message where

the last block happens to be shorter than the block length, and then we

append the 1,0,0 string to it. It's very easy to see what the pad is,

simply look for the first one from the right, we can remove this pad and recover

the original message back. Now there's one corner case that's actually quite

important, and that is what do we do if the original message length is already the

multiple of a block size? In that case it's really very, very important that we

add an extra dummy block. That contains the pad 1000. And again, I can't tell you

how many products and standards have actually made this mistake where they

didn't add a dummy block and as a result, the MAC is insecure because there's an

easy existential forgery attack. And let me show you why. So suppose in case the

message is a multiple of a block length, suppose we didn't add a dummy block and we

literally MAC-ed this message here. Well, the result now is that if you look at

the message which is a multiple of the block size and a message which is not a

multiple of the block size but is padded to the block size, and imagine it so

happens that this message m prime one happens to end with 1-0-0. At this point,

you realize that here, the original message. Here, let me draw it this way.

You realize that the original message after padding. Would become identical to

the second message that was not padded at all. And as a result, if I ask for the tag

on this message over here, I would obtain also the tag on the second message that

happened to end in 1-0-0. Okay, so if we didn't add the dummy block, basically,

again, the pad would be not invertible, because two different messages, two

distinct messages, happen to map to the same padded result. Again, as a result,

the MAC becomes insecure. So to summarize, this ISO standard is a perfectly fine way

to pad, except you have to remember also to add a dummy block in case message is a

multiple of the block length to begin with. Now some of you might be wondering

if there is a padding scheme that never needs to add a dummy block, and the answer

is that if you look at a deterministic padding function, then it's pretty easy to

argue that there will always be cases where we need to pad, and the reason is

just literally the number of messages that are multiples of the block length is much

smaller than the total number of messages that need not be a multiple of the block

length. And as a result we can't have a one to one function from this bigger

set of all messages to the smaller set of messages which are a multiple of

the block length. There will always be cases where we have to extend the original

message and in this case that would correspond to adding this dummy padding

block. However, there is a very clever idea called CMAC which shows that using a

randomized padding function we can avoid having to ever add a dummy block. And so

let me explain how CMAC works. So CMAC actually uses three keys. And, in fact,

sometimes this is called a three key construction. So this first key, K, is

used in the CBC, the standard CBC MAC algorithm. And then the keys, K1 and K2,

are used just for the padding scheme at the very, very last block. And in fact in

the CMAC standard, the keys K1, K2 are derived from the key K by some sort of a

pseudo random generator. So the way CMAC works is as follows. Well, if the message

happens to not be a multiple of a block length, then we append the ISO padding to

it. But then we also XOR this last block with a secret key, K1, that the

adversary doesn't know. However, if the message is a multiple of the block length,

then of course, we don't append anything to it. But we XOR it with a different

key, K2, that, again, the adversary doesn't actually know. So it turns out,

just by doing that, it's now impossible to apply the extension attacks that we could

do on the cascade function, and on raw CBC. Because the poor

adversary actually doesn't know what is the last block that went into the

function. He doesn't know K1, and therefore, he doesn't know the value at this

particular point and as a result, he can't do an extension attack. In fact, this is a

provable statement. And so that this construction here is simply by XOR-ing

K1 or XOR-ing K2 is really a PRF. Despite not having to do a final

encryption step after the raw CBC function is computed. So, that's one

benefit, that there's no final encryption step. And the second benefit is that we resolve

this ambiguity between whether padding did happen or padding didn't happen by using

two different keys to distinguish the case that the message is a multiple of the block

length versus the case where it's not but we have a pad appended to the message. So

the two distinct keys resolve this ambiguity between the two cases, and as a

result, this padding actually is sufficiently secure. And as I said,

there's actually a nice security theorem that goes with CMAC that shows that the

CMAC construction really is a pseudo-random function, with the same security

properties as CBC-MAC. So I wanted to mention that CMAC is a federal standard

standardized by NIST and if you now, these days, wanted to use a CBC-MAC for

anything, you would actually be using CMAC as the standard way to do it. But

particularly in CMAC, the underlying block cypher is AES and that gives us a secure

CBC-MAC derived from AES. So that's the end on the segment and in the next segment

we'll talk about a parallel MAC.