In the last segment we talked about the CBC-MAC and the NMAC, but throughout the segment we always assumed that the message length was a multiple of the block size. In this segment, we're going to see what to do when the message length is not a multiple of the block size. So recall that the encrypted CBC mac or ECBC-MAC for short uses pseudorandom permutation F to actually compute the CBC function as we discussed in the last segment. But in the last segment, we always assumed that the message itself could be broken into an integer number of blocks for the block cipher. And the question is what to do when the message length is not a multiple of the block size. So here we have a message where the last block actually is shorter than the full block and the question is how to compute the ECBC-MAC in that case. So the solution of course is to pad the message and the first pad that comes to mind is to simply pad the message with all zeros. In other words we take the last block and just add zeros to it until the last block becomes as long as one full block size. And so my question to you is whether the resulting MAC is secure. So the answer is no, the MAC is not secure. And let me explain why, basically the problem is that it's possible now to come up with messages so that message m and the message m concatenated zero happen to have exactly the same Pad. And as a result once we plug in both m and m||0 into ECBC we'll get the same tag out, which means that both m and m||0 have the same tag and therefore the attacker can mount an existential forgery. He would ask for the tag on the message m. And then he would output as its forgery the tag and the message m||0. And it's easy to say why that's the case. Basically, to be absolutely clear here, we have our message m. Which after padding becomes m000. So we had to add three 0's to it. And here we have the message m zero, an m that ends with zero. And after padding, we basically now have to add two 0's to it. And lo and behold, they become the same pad, so that they're gonna have exactly the same tag which allows the adversary to mount the existential forgery attack. So this is not a good idea. In fact appending all 0s is a terrible idea. And if you think about concrete case where this comes up imagine the automatic clearing house system used for clearing checks. I might have a check for a $100 and the tag on that check. Well, now, the attacker basically could append a zero to my check and make it a check for $1000, and that wouldn't actually change the tag. So this ability to extend the message without changing the tag actually could have pretty disastrous consequences. So I hope this example convinces you that the padding function itself must be a one to one function. In other words, it should be the case that two distinct messages always map to two distinct padded messages. We shouldn't actually have a collision on the padding function. Another way of saying it is that the padding function must be invertible. That guarantees that the padding function is one to one. So a standard way to do this was proposed by the International Standards Organization ISO. What they suggested is basically, let's append the string 100000 to the end of the message to make the message be a multiple of the block length. Now to see that this padding is invertible, all we do is describe the inversion algorithm which simply is gonna scan the message from right to left, until it hits the first one and then it's gonna remove all the bits to the right of this one, including the one. And you see that once we've removed the pattern this way, we obtain the original message. So here's an example, so here we have a message where the last block happens to be shorter than the block length, and then we append the 1,0,0 string to it. It's very easy to see what the pad is, simply look for the first one from the right, we can remove this pad and recover the original message back. Now there's one corner case that's actually quite important, and that is what do we do if the original message length is already the multiple of a block size? In that case it's really very, very important that we add an extra dummy block. That contains the pad 1000. And again, I can't tell you how many products and standards have actually made this mistake where they didn't add a dummy block and as a result, the MAC is insecure because there's an easy existential forgery attack. And let me show you why. So suppose in case the message is a multiple of a block length, suppose we didn't add a dummy block and we literally MAC-ed this message here. Well, the result now is that if you look at the message which is a multiple of the block size and a message which is not a multiple of the block size but is padded to the block size, and imagine it so happens that this message m prime one happens to end with 1-0-0. At this point, you realize that here, the original message. Here, let me draw it this way. You realize that the original message after padding. Would become identical to the second message that was not padded at all. And as a result, if I ask for the tag on this message over here, I would obtain also the tag on the second message that happened to end in 1-0-0. Okay, so if we didn't add the dummy block, basically, again, the pad would be not invertible, because two different messages, two distinct messages, happen to map to the same padded result. Again, as a result, the MAC becomes insecure. So to summarize, this ISO standard is a perfectly fine way to pad, except you have to remember also to add a dummy block in case message is a multiple of the block length to begin with. Now some of you might be wondering if there is a padding scheme that never needs to add a dummy block, and the answer is that if you look at a deterministic padding function, then it's pretty easy to argue that there will always be cases where we need to pad, and the reason is just literally the number of messages that are multiples of the block length is much smaller than the total number of messages that need not be a multiple of the block length. And as a result we can't have a one to one function from this bigger set of all messages to the smaller set of messages which are a multiple of the block length. There will always be cases where we have to extend the original message and in this case that would correspond to adding this dummy padding block. However, there is a very clever idea called CMAC which shows that using a randomized padding function we can avoid having to ever add a dummy block. And so let me explain how CMAC works. So CMAC actually uses three keys. And, in fact, sometimes this is called a three key construction. So this first key, K, is used in the CBC, the standard CBC MAC algorithm. And then the keys, K1 and K2, are used just for the padding scheme at the very, very last block. And in fact in the CMAC standard, the keys K1, K2 are derived from the key K by some sort of a pseudo random generator. So the way CMAC works is as follows. Well, if the message happens to not be a multiple of a block length, then we append the ISO padding to it. But then we also XOR this last block with a secret key, K1, that the adversary doesn't know. However, if the message is a multiple of the block length, then of course, we don't append anything to it. But we XOR it with a different key, K2, that, again, the adversary doesn't actually know. So it turns out, just by doing that, it's now impossible to apply the extension attacks that we could do on the cascade function, and on raw CBC. Because the poor adversary actually doesn't know what is the last block that went into the function. He doesn't know K1, and therefore, he doesn't know the value at this particular point and as a result, he can't do an extension attack. In fact, this is a provable statement. And so that this construction here is simply by XOR-ing K1 or XOR-ing K2 is really a PRF. Despite not having to do a final encryption step after the raw CBC function is computed. So, that's one benefit, that there's no final encryption step. And the second benefit is that we resolve this ambiguity between whether padding did happen or padding didn't happen by using two different keys to distinguish the case that the message is a multiple of the block length versus the case where it's not but we have a pad appended to the message. So the two distinct keys resolve this ambiguity between the two cases, and as a result, this padding actually is sufficiently secure. And as I said, there's actually a nice security theorem that goes with CMAC that shows that the CMAC construction really is a pseudo-random function, with the same security properties as CBC-MAC. So I wanted to mention that CMAC is a federal standard standardized by NIST and if you now, these days, wanted to use a CBC-MAC for anything, you would actually be using CMAC as the standard way to do it. But particularly in CMAC, the underlying block cypher is AES and that gives us a secure CBC-MAC derived from AES. So that's the end on the segment and in the next segment we'll talk about a parallel MAC.