Welcome to the last unit of the first part of our two-part compiler treatment, the module in which we discussed syntax analysis and parsing. And as usual, we will answer some frequently asked questions about this module. So let's see.

Compilers don't only translate programs, they also find and report errors. Are we going to handle errors in our Jack compiler? Well, error diagnostics is something that we decided to completely sidestep in this module. In other words, we assumed, quite naively I must say, that the source code that we have to compile is error-free. So what should we do if we were to remove this simplifying assumption?

Well, in some cases, catching an error is a simple matter. For example, suppose that you parse a let statement. According to the Jack grammar, the keyword let must be followed by an identifier, as in, say, let x equals something, where x is an identifier. So when your parser handles a let token, it should expect the next token to be an identifier. And if the next token is anything but an identifier, you can generate an error message and terminate the compilation right there.

However, compilers normally try to be friendlier, and they don't stop dead in their tracks whenever they find an error. Instead, a clever compiler attempts to somehow contain the damage caused by the current error and proceeds to catch and report additional possible errors in your code. Also, a clever compiler makes a best effort to pinpoint the exact location in the source code where the error was conceived, so to speak. And this location is not necessarily the same place where the error raised its head, or was detected by the compiler. So, as you see, error handling is an intricate art that requires sophisticated diagnostics methods and intricate code, none of which were discussed in NAND to Tetris.

And as I say this, I feel obliged to qualify something that I said when we discussed, I think, tokenizers. If you recall, I said that once we construct a tokenizer, an object that implements the tokenizing API, we no longer need the input file that contains the original source code, because the tokenizer is going to dispense the next token for us whenever we need it. So why would we still need the source code? Well, if we wish to handle errors, then we must preserve the original source code in order to annotate it with all sorts of error messages, warning messages, and so on.

All these details are very important when you set out to develop a full-service compiler. And yet in NAND to Tetris, we decided to build a minimal compiler, and we decided to treat error diagnostics as an optional feature that can certainly be added to the basic compiler whenever you want to do it.
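To make the let-statement example a bit more concrete, here is a minimal sketch in Python of how a recursive-descent parsing routine might catch and report this kind of error. The token representation (a list of (type, value, line) tuples) and the routine's name are illustrative assumptions for this sketch, not the official project API.

```python
class JackSyntaxError(Exception):
    """Raised when the token stream violates the Jack grammar."""

def compile_let(tokens, pos, output):
    # We reach this routine when tokens[pos] is the 'let' keyword.
    output.append("<letStatement>")
    output.append("<keyword> let </keyword>")
    pos += 1

    # The grammar demands an identifier (a variable name) right after 'let'.
    kind, value, line = tokens[pos]
    if kind != "identifier":
        # A naive compiler can simply abort here; a friendlier one would
        # record the message and try to resume at the next statement.
        raise JackSyntaxError(
            "line %d: expected a variable name after 'let', found '%s'"
            % (line, value))
    output.append("<identifier> %s </identifier>" % value)
    # ... and so on for ('[' expression ']')?, '=', expression, ';'
    return pos + 1

# A token stream with an error: "let = x;" is missing the variable name.
try:
    compile_let([("keyword", "let", 7), ("symbol", "=", 7),
                 ("identifier", "x", 7), ("symbol", ";", 7)], 0, [])
except JackSyntaxError as e:
    print(e)   # line 7: expected a variable name after 'let', found '='
```

Note that reporting a meaningful line number is exactly why a friendlier compiler keeps the original source text, or at least source-location information, around instead of discarding it after tokenizing.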
All right, so with that, let us move to the next question: can we use the techniques that we learned in this module to develop parsers for other programming languages?

Let me start by saying that parsers are quite useful, not only for parsing programs, but also for parsing any syntax-based text, which happens a lot in the careers of many application programmers. There are numerous applications out there, ranging from bioinformatics to email to financial services, in which you have to handle, analyze, and render structured text. And the techniques that we learned in this module are directly applicable to any of these applications. Specifically, throughout the module, we covered and developed the two most important elements of syntax analysis: tokenizing and parsing. So in that respect, you now have a solid, hands-on understanding of the essence of what it takes to do syntax analysis.

However, compared to other programming languages that you are familiar with, the language that we handled, Jack, is deliberately simple. It has neither operator priority nor inheritance nor many other features that can be found in industrial-strength programming languages. Now, this limited design was done on purpose, because our goal was to be able to develop a Jack compiler using simple and elegant parsing algorithms. In particular, the parsing algorithm that we discussed in this module builds the parse tree from the top down. It reads as few tokens as possible, and it tries to determine which language construct we are parsing as soon as it can make this determination. Parsers that employ this top-down, greedy parsing strategy are quite suitable for languages that have a simple syntax.

And yet, when compiler developers have to parse languages that have a more elaborate syntax, they typically resort to something known as bottom-up parsing, which is quite different. This parsing strategy starts bottom-up by first building the terminal leaves of the parse tree, and it defers the decision of which language construct we are dealing with to a later stage in the parsing process. This way, the parser can entertain several parsing possibilities simultaneously. But in order to do this, these algorithms have to use backtracking, and the result is parsing logic which is harder to develop than the simple top-down parsing that we used in this module. That simple approach, once again, is applicable to numerous applications and to simple languages, but not necessarily to languages like Java and C++.

As usual, there are numerous topics, in this case in compilation and formal languages, which are simply outside the highly focused scope of NAND to Tetris. Now, if you find compilation and computational linguistics interesting, then by now, I think, we have given you a solid foundation to build on. And you're obviously welcome to consider either taking a compilation course or picking up a decent book on the subject.

So with that, let's move to the next question: why didn't we use lex and yacc? For those of you who are not familiar with these buzzwords, lex and yacc are two software tools that come from the world of Unix. By now, there are numerous versions of these two tools that have different names and flavors, and they are implemented in many different programming languages. Lex stands for Lexical Analyzer, and it refers to a tool which is capable of generating tokenizing code automatically. And yacc stands for the whimsical Yet Another Compiler Compiler, and it refers to a tool which is capable of generating parsing code automatically. Typically, lex and yacc go together and are quite inseparable.

So where do these tools come from? Well, many good programmers are lazy, lazy in the positive sense of the word, and they seek to automate whatever can be automated. Now, as you may have noticed, and I'm sure you did, the parsing logic that we outlined throughout this module, which, by the way, was given as a recipe for developing your parser, was highly structured. In particular, we advised that for each production rule in the Jack grammar, you should develop a corresponding parsing method in your syntax analyzer. And taken together, all these parsing methods comprise the analyzer.
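And to illustrate the one-parsing-method-per-production-rule recipe just mentioned, here is a small, self-contained Python sketch. The grammar fragment (a simplified expression rule with no operator priority, just like Jack) and the token representation (here without line numbers, for brevity) are illustrative assumptions, not the official project API.

```python
# Simplified grammar fragment, one compile_xxx method per rule:
#   expression: term (op term)*
#   term:       integerConstant | identifier

OPS = {"+", "-", "*", "/"}

def compile_expression(tokens, pos, output):
    output.append("<expression>")
    pos = compile_term(tokens, pos, output)           # rule body: term ...
    while pos < len(tokens) and tokens[pos][1] in OPS:
        output.append("<symbol> %s </symbol>" % tokens[pos][1])
        pos = compile_term(tokens, pos + 1, output)   # ... (op term)*
    output.append("</expression>")
    return pos

def compile_term(tokens, pos, output):
    kind, value = tokens[pos]                         # one-token lookahead
    output.append("<%s> %s </%s>" % (kind, value, kind))
    return pos + 1

# Parsing the token stream of "x + 2":
out = []
compile_expression([("identifier", "x"), ("symbol", "+"),
                    ("integerConstant", "2")], 0, out)
print("\n".join(out))
```

Each method mirrors exactly one grammar rule, peeks at a single token, and immediately decides how to proceed, which is the top-down, greedy strategy that this module relies on.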
Now, it turns out that this software development strategy that we described in this module is so structured that indeed it can be generalized to other grammars and automated. In other words, if you have a well-defined grammar of some language, or some file format or whatever, and you can specify this grammar using some agreed-upon, machine-readable format, then indeed, you can write a program that takes this grammar specification as input and produces from it, automatically, a syntax analyzer written in some language like C or Java or whatnot. Now, of course, you will have to get into the generated code and make all sorts of changes here and there. But most of the logic can be generated automatically by lex-and-yacc kind of tools. And so that's what these tools are all about: they generate syntax analysis code that can be customized and developed into a full-scale syntax analysis tool.

So let's go back to the question: why did we insist on developing the tokenizer and the parser from scratch, when in fact we could have used these automatic tools and made our lives much simpler? Well, I suspect that you know the answer. NAND to Tetris is about doing everything from scratch, from the ground up, with your bare hands, in order to explore and understand. Using black-box tools like lex and yacc goes against the NAND to Tetris spirit, and that's why we decided not to use them.

So that's the end of our perspective unit on parsing. We now turn to the next module in the course, module five, in which we will continue to develop our compiler, and in particular, we're going to deal with code generation.