Welcome back. You have previously learned about N-gram language models, and specifically, you used them to compute the probability of a sequence of words. In this video, you're going to identify the limitations of that model, and you'll see that it requires a lot of space and a lot of memory. So with that said, let's dive in.

Suppose you have to translate the sentence "j'ai vu le match de foot" from French to English. You might have several similar candidate sentences, like "I saw the game of soccer," "I saw the soccer game," "I saw the soccer match," and "saw I the game of soccer." For an accurate translation, you could compute the probability of each sentence using a language model like the N-gram, and select the sequence of words with the highest probability. In this case, that would be "I saw the soccer game."

You may have encountered the N-gram model before, but let's go ahead and review it quickly. Recall that in order to build an N-gram language model, you have to compute conditional probabilities. For bigrams, you compute conditional probabilities using one previous word. For trigrams, you compute them using two previous words. So for an N-gram model, you compute conditional probabilities using the previous n minus one words. At the end, you get the probability of a whole sentence by multiplying the probabilities of each word in the sentence given its previous n minus one words. So in the case of a bigram model, to get the probability of a three-word sentence, you would multiply the probability of the first word, by the probability of the second word given the first word, and then by the probability of the third word given the second one.

These models have many limitations. One of them is that in order to capture dependencies between words far away from each other, your model would have to account for conditional probabilities over very long sequences of words. These could be difficult to estimate without correspondingly large corpora.
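To make the bigram computation concrete, here is a minimal sketch in Python. The toy corpus is a hypothetical stand-in (the lecture doesn't specify training data), and the probabilities are simple count ratios with no smoothing:

```python
from collections import Counter

# Hypothetical toy corpus; "</s>" marks sentence boundaries.
corpus = ("i saw the soccer game </s> "
          "i saw the soccer match </s> "
          "i saw the game of soccer </s>").split()

# Count single words and adjacent word pairs.
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    """P(word | prev), estimated as count(prev, word) / count(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def sentence_prob(words):
    """P(w1) * P(w2 | w1) * ... * P(wn | wn-1) under the bigram model."""
    p = unigram_counts[words[0]] / len(corpus)
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p

print(sentence_prob("i saw the soccer game".split()))
```

Under this model, the candidate with the highest probability wins; an ungrammatical order like "saw I the game of soccer" contains bigrams that never occur in the corpus, so its probability is zero.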
Even in the case of a large corpus, your model would need a lot of space and RAM to store the probabilities of all the possible word combinations. You can see how quickly this approach becomes impractical.

Up next, I'll introduce you to recurrent neural networks and gated recurrent units, two models that are much more efficient than N-grams for NLP tasks like machine translation. You have now seen that traditional N-gram language models require a lot of space and a lot of memory. So if a user is downloading an app on their phone, for example, they might not have enough space for it. As a result, the developers might not want to use traditional N-gram language models there, and they might want to use a different approach, known as recurrent neural networks. In the next video, I'll introduce you to RNNs, or recurrent neural networks.
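A back-of-the-envelope calculation shows why the storage requirement blows up: with a vocabulary of size V, there are up to V to the power n possible N-grams. The vocabulary size below is a hypothetical example, not a figure from the lecture:

```python
# Number of possible N-grams grows as V**n, so the table of
# probabilities becomes impractically large even for small n.
V = 50_000  # hypothetical vocabulary size

for n in (1, 2, 3):
    print(f"{n}-grams: up to {V**n:,} possible entries")
```

Even though most of those combinations never occur in real text, the combinatorial growth is the reason long-range dependencies are out of reach for N-gram models, and it motivates the move to recurrent neural networks.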