Welcome to Unit 4.9, in which we will finally get our hands dirty and set out to develop the syntax analyzer that we described all along. Now, let me remind you of the overall picture of what we're doing. We are developing a Jack compiler, and in this module we are focusing on the syntax analyzer part of the compiler. And we are faced with the challenge of unit-testing this module separately from the rest of the system. We decided that in order to do so, in project 10, we will have the syntax analyzer generate XML code. And by looking at this XML code, either by eyeball analysis or using some other techniques, we'll be able to tell that the syntax analyzer actually delivers, that it understands the source code, so to speak. And so that's the program that we have to develop, this syntax analyzer. All right, so we are doing it, obviously, in the context of the Jack language. And the contract is such that we have to develop a syntax analyzer that is capable of translating all of the supplied Jack programs that you will find in the project 10 directory. And for each such Jack file, your analyzer should produce a separate XML output file that reflects the grammatical structure of the Jack file. And this XML should be identical to the compare files that are also provided by us. Now, how do you compare them? Well, first of all, the test programs and the compare files are available in the project directory on your computer. And you compare your outputs with our compare files using a utility that we also supply, which we call TextComparer. TextComparer is a tool that compares two text files up to white-space differences. So it is quite possible that your syntax analyzer will emit some extra spaces or some empty lines and so on, and these will not get into the comparison with the supplied compare files. So the TextComparer is quite a handy tool.
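To give a feel for what "up to white-space differences" means, here is a minimal sketch of such a comparison in Python. This is not the supplied TextComparer tool, just an illustration of the idea: blank lines are ignored and runs of white space are collapsed before comparing.

```python
# Sketch of a white-space-insensitive file comparison, in the spirit of
# the supplied TextComparer (this is NOT the official tool).
def same_up_to_whitespace(path_a: str, path_b: str) -> bool:
    def significant_lines(path):
        with open(path) as f:
            # Drop blank lines and collapse runs of white space.
            return [" ".join(line.split()) for line in f if line.strip()]
    return significant_lines(path_a) == significant_lines(path_b)
```

So two XML files that differ only in indentation or empty lines would still compare as equal.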
Now, another way to look at your XML output and figure out if it makes sense is to load it into either a browser, like Chrome or Firefox, or into a text editor that knows how to handle programming languages. Any one of these tools, which are freely available on the web, will enable you to, once again, look at the generated XML, see how it's structured, and decide if it's well structured. It's a little bit difficult to explain in words; you can just try it and see for yourself. You also need a programming language to create your syntax analyzer, and in this course, so far, we accept programs in Java and Python. And finally, it's recommended to also take a look at the book: we have a whole chapter dedicated to syntax analysis, and you can find some useful information there, even though most of what I have to say we said already in these units. All right. So, here's the implementation plan. First of all, we're going to develop a tokenizer, which is a very well-defined and relatively simple undertaking. Then, once we've developed the tokenizer, we're going to move on and develop a compilation engine. And we use the term compilation engine to describe a syntax analyzer that uses the services of the tokenizer that we developed before. And we are going to develop this compilation engine in two stages. First, we'll build a basic version, which is relatively simple, and then we are going to extend it into a full-blown syntax analyzer, as we now go on to explain. All right, let's begin with the Jack tokenizer. To remind you, we start with some source code and we want to end up with something that looks like this. Now, what we see here on the right-hand side is slightly different from the XML example that we saw in the previous units. And here are the differences. First of all, we decided that we are going to wrap up the whole XML file with a begin and end tag, which we call tokens. Now, why do we do this?
Because it turns out that that's what browsers expect. If you're going to load this file into a browser in order to see it nicely structured, the browser expects to see a well-defined XML file that has a beginning and an end. That's why we added this little detail. What else? You will also notice that string constants, like the word negative in the source code, appear in the XML without double quotes. That's just fine; that's how we want you to handle it, for technical reasons only. So your syntax analyzer should not output the double quotes; it should output the string constant as is, without double quotes. And finally, you will note that we have certain special characters which are handled in a special way. In particular, less than, greater than, double quote, and ampersand have special meaning in HTML, and therefore, when you load this file into a browser in order to inspect it, the browser will kind of get lost: it will not know how to handle these special characters, or it will handle them in a way which is not good for us. And therefore, instead of using these special characters as is, we use an alternative set of characters, as you see here, which kind of bypasses this special meaning in HTML. And once you use this convention, you will actually see less than, greater than, and so on, as they are, when you load this file into your favorite browser. Once again, I said browser several times, and you have to understand we're not talking about the Internet at all. We use the browser as a viewer program. Browsers have this ability to look at an XML file and lay it out nicely on the screen, in kind of a telescopic way, so that you can tell that everything was done well by your syntax analyzer. So, how should you build this tokenizer? Well, we propose, actually we request, that you follow the API described in unit 4.8.
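The two conventions just described, stripping the double quotes from string constants and escaping the special characters, can be sketched in a few lines of Python. The function name here is illustrative, not part of the official unit 4.8 API:

```python
# Sketch: emitting one XML line per token, following the conventions
# described above (names are illustrative, not the official API).
XML_ESCAPES = {"<": "&lt;", ">": "&gt;", '"': "&quot;", "&": "&amp;"}

def token_to_xml(token_type: str, value: str) -> str:
    if token_type == "stringConstant":
        value = value.strip('"')               # string constants lose their double quotes
    else:
        value = XML_ESCAPES.get(value, value)  # escape <, >, ", & so the browser can render them
    return f"<{token_type}> {value} </{token_type}>"
```

For example, the symbol `<` would come out as `<symbol> &lt; </symbol>`, which a browser displays as a less-than sign.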
And you have to remember that everything in this module, both the tokenizer and the parser, is essentially a text-processing challenge. And therefore, you are welcome to use everything your language offers in the way of string processing, handling regular expressions, and what not. It's up to you to look at the libraries and decide what to use. You don't have to use these things, but you can if you wish. All right, moving along. Once we are done with the tokenizer, we have to develop the parser, or the overall Jack analyzer. So, here's an example of some test program, some Jack code, and we have to analyze it, or translate it, into the XML that you see on the right-hand side. How should we do it? Well, we talked about it all along in this module. But basically, we recommend that you begin by writing a basic compilation engine that handles everything except for expressions. Why? Because expressions are a little bit of a headache, so we prefer to handle them separately. Only then are we going to add the handling of expressions, and in order to support this staged development strategy, we're going to provide you with test files that will enable you to unit-test each one of these stages separately. So, here's how we do it. Here's an example of one of the test files that your syntax analyzer should handle in project 10, and I'm going to highlight the expressions in this Jack file. As you see, every program has lots of expressions in it, and so we're going to supply you with two versions of this file. First of all, we have this version, which is sort of the complete test. But we also provide another version of this file, which is called the expressionless version. And in this version, we have substituted every expression with a variable name which is in the scope of this method. Okay, so for example, instead of saying y plus size less than 254, we simply say x.
Now, x makes sense because x happens to be a field in this class, and therefore it is globally recognizable by all the methods; it's in the scope of all the methods. Likewise, instead of saying size = size + 2, which is a complex expression, so to speak, we replace it with size = size. Now, it was our job to create these files; it shouldn't interest you how we did it. But the result is a file which is simpler than the original one, because it has no complex expressions. So it suits the purpose of testing a syntax analyzer that handles everything except for expressions. That's the purpose of these files. Now, when you look at the expressionless versions of these programs, you will notice that they make no sense: they have kind of weird semantics. And yet, syntactically speaking, they are perfectly valid. Therefore, they serve the purpose, and we use them happily in order to test the basic version of our syntax analyzer. Once your syntax analyzer can handle these expressionless files, you can move on and complete the overall version that handles expressions as well. So, that's the grand plan of building the syntax analyzer. Now, I'm going to end this unit with a few words about handling expressions, going back to the Jack grammar. The problematic clause of the grammar, in terms of parsing, is what is highlighted here: it's the handling of the term rule. And the problem is that when the current token is a variable name, as I explained in one of the previous units, or some identifier, it can be either a variable name like x, or it can be an array entry, like x[18] for example, or it can be a subroutine call, like x.doSomething(). So, the only way to resolve which possibility we are in is to look ahead into the next token. We have to save the current token because we haven't yet used it for code generation.
We have to look ahead at the next token, and then, once we have these two tokens in hand, we have all the information that we need in order to resolve how to handle it and which XML to generate from it, okay? So, that details what you have to worry about when you develop the full-scale version of your syntax analyzer. All right, now what about subroutine call? It looks like subroutine call is a separate rule in the grammar, and indeed it is. But for various reasons, we decided that the subroutine call will not be handled by a separate compilation method, like other rules in the grammar. Rather, we're going to handle the subroutine call logic, the right-hand side of the rule, as part of handling the term. When you develop the syntax analyzer, you will realize that this little advice will result in code which is easier to write. So, once again, there will be no compileSubroutineCall; the compile-subroutine-call logic will be handled as part of handling the term. So, to recap, here is our roadmap. We are developing a Jack compiler, and in this module, we developed the syntax analyzer of the compiler. In the next module, we are going to develop a code generator; taken together, they will deliver the full functionality of a compiler for the Jack language. So, this has been the unit in which we discussed how to handle project 10. And what comes next is the concluding perspective unit of Module 4.
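The look-ahead logic for the term rule can be sketched as follows. We assume a tokenizer object with `current_token()` and `peek_next()` methods; these names are illustrative, not the official unit 4.8 API, and a real compilation engine would emit XML rather than a description string:

```python
# Sketch of the term look-ahead described above. We assume a tokenizer
# with current_token() and peek_next() -- illustrative names, not the
# official API. peek_next() looks at the next token without consuming it.
def classify_term(tokenizer) -> str:
    current = tokenizer.current_token()   # e.g. an identifier such as "x"
    nxt = tokenizer.peek_next()           # the look-ahead token
    if nxt == "[":
        kind = "array entry"              # x[expression]
    elif nxt in (".", "("):
        kind = "subroutine call"          # x.doSomething(...) or doSomething(...)
    else:
        kind = "variable"                 # plain variable name x
    return f"{current}: {kind}"
```

Once the two tokens are in hand, the engine knows which of the three cases it is in and can generate the corresponding XML.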