Now, let me talk about the code base that I will use for this class. I named it yTextMiner; the current version is 1.0, and it is written in Java. To use yTextMiner, you need Java 8, because its dependent libraries, such as Stanford CoreNLP, are compiled with Java 8. So before you use the code, make sure Java 8 is installed on your computer. yTextMiner uses the same architecture as the Simple API of Stanford CoreNLP. As you see on the slide, this is called a pipeline-based architecture, and I'm going to explain it in more depth in the next slides. As I said, the original architecture is pipeline-based, which makes it relatively easy for beginners to use. yTextMiner extends the Stanford CoreNLP Simple API so that it can also train models such as topic models and classification models. Here, topic modeling is unsupervised learning, and classification is supervised learning.

yTextMiner 1.0 contains various methods from Stanford CoreNLP; in addition, I combined libraries from different sources to add several other techniques that can be used for text mining. One outstanding feature of yTextMiner 1.0 is its topic modeling toolkit. For that I use Mallet, which was developed at UMass Amherst and is mainly used for topic modeling and document classification; in this class, I use Mallet for the topic modeling exercises. In addition, I included Alias-i's LingPipe, a Java-based text mining library, to train logistic regression models for document classification and sentiment analysis. I also included LibLinear from National Taiwan University (NTU) to use a linear SVM for document classification. Lastly, I included ISTI's SentiWordNet, which is used for sentiment analysis. To make a long story short, you are going to have the opportunity to learn three different approaches to sentiment analysis.
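To make the idea of a pipeline-based architecture concrete, here is a minimal sketch in plain Java. It assumes nothing about the real yTextMiner or CoreNLP classes: `Annotation`, `Annotator`, `Tokenizer`, `Normalizer`, `StopwordFlagger`, and `Pipeline` are hypothetical names. Each stage reads what the earlier stages produced and adds one more layer, which is the same idea CoreNLP expresses when you list annotators such as `tokenize,ssplit,pos,lemma` in its `annotators` property. The code sticks to Java 8 APIs to match the Java 8 requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical container that every pipeline stage reads from and writes to.
class Annotation {
    final String text;
    final List<String> tokens = new ArrayList<>();
    final List<String> normalized = new ArrayList<>();
    final List<Boolean> stopword = new ArrayList<>();

    Annotation(String text) { this.text = text; }
}

// Hypothetical stage interface: each annotator adds one layer of information.
interface Annotator {
    void annotate(Annotation doc);
}

// Stage 1: naive tokenizer that splits on anything that is not a letter, digit, or apostrophe.
class Tokenizer implements Annotator {
    public void annotate(Annotation doc) {
        for (String t : doc.text.split("[^A-Za-z0-9']+")) {
            if (!t.isEmpty()) doc.tokens.add(t);
        }
    }
}

// Stage 2: lowercasing, standing in for a real lemmatizer.
class Normalizer implements Annotator {
    public void annotate(Annotation doc) {
        for (String t : doc.tokens) doc.normalized.add(t.toLowerCase(Locale.ROOT));
    }
}

// Stage 3: stop-word flags; relies on Stage 2 having filled doc.normalized.
class StopwordFlagger implements Annotator {
    private final Set<String> stop =
        new HashSet<>(Arrays.asList("the", "a", "an", "is", "of", "and"));

    public void annotate(Annotation doc) {
        for (String t : doc.normalized) doc.stopword.add(stop.contains(t));
    }
}

class Pipeline {
    private final List<Annotator> stages;

    Pipeline(Annotator... stages) { this.stages = Arrays.asList(stages); }

    // Runs every stage, in order, on one shared Annotation.
    Annotation run(String text) {
        Annotation doc = new Annotation(text);
        for (Annotator a : stages) a.annotate(doc);
        return doc;
    }

    public static void main(String[] args) {
        Pipeline p = new Pipeline(new Tokenizer(), new Normalizer(), new StopwordFlagger());
        Annotation doc = p.run("The movie is a delight.");
        // prints [the, movie, is, a, delight] [true, false, true, true, false]
        System.out.println(doc.normalized + " " + doc.stopword);
    }
}
```

Because all stages share one `Annotation`, adding a step (for example, a named entity tagger) only means writing one more `Annotator` and passing it to the `Pipeline` constructor, which is what keeps this style approachable for beginners.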
The three sentiment analysis approaches are based on Stanford CoreNLP, Alias-i's LingPipe, and SentiWordNet.

Let me explain a little more about the internal architecture of yTextMiner. There are four main data structures in yTextMiner. The smallest unit of data is the token, which stores the token itself, its lemmatized and stemmed forms, and its part of speech. It may or may not have a named entity tag, depending on the token. Finally, it has a Boolean variable that indicates whether it is a stop word. A group of tokens makes up a sentence. A sentence contains its list of tokens and its sentiment scores from the three techniques I just mentioned: CoreNLP's recursive neural network-based sentiment analysis, LingPipe's logistic regression, and the SentiWordNet dictionary-based technique. A group of sentences makes up a document. The document data structure contains the document itself, its classification label, and its topic label. The classification label is used to train a classification model or simply to store the document's class. The topic can be extracted with Mallet's LDA or DMR; I'm going to talk more about LDA and DMR later on. The last data structure is the collection, which consists of a list of documents. This is one of the major extensions I made to the CoreNLP framework: a collection holds the documents and the models created from them. The models that can be created are Mallet's LDA or DMR, LingPipe's logistic regression, and LibLinear's linear SVM.

In terms of features or algorithms, yTextMiner supports four major features. The first is the preprocessing stage, which uses Stanford CoreNLP's preprocessing pipeline.
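The four data structures described above (token, sentence, document, collection) can be pictured as plain Java classes. This is an illustration under assumptions, not yTextMiner's source: the class names `Token`, `Sentence`, `Document`, and `TextCollection`, their fields, and the map keyed by technique name for sentiment scores are all hypothetical. The collection is called `TextCollection` here only to avoid confusion with `java.util.Collection`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical token: surface form plus the annotations listed in the lecture.
class Token {
    final String text;
    final String lemma;
    final String stem;
    final String pos;          // part-of-speech tag, e.g. "NN"
    final String ner;          // named-entity tag, or null when the token is not an entity
    final boolean isStopword;

    Token(String text, String lemma, String stem, String pos, String ner, boolean isStopword) {
        this.text = text;
        this.lemma = lemma;
        this.stem = stem;
        this.pos = pos;
        this.ner = ner;
        this.isStopword = isStopword;
    }
}

// A sentence: its tokens plus one sentiment score per technique name.
class Sentence {
    final List<Token> tokens = new ArrayList<>();
    final Map<String, Double> sentiment = new LinkedHashMap<>();
}

// A document: its sentences plus a classification label and a topic label.
class Document {
    final List<Sentence> sentences = new ArrayList<>();
    String classLabel;   // training label, or just a stored label
    String topicLabel;   // e.g. assigned from an LDA or DMR model
}

// A collection: the documents plus any models trained on them (typed loosely as Object).
class TextCollection {
    final List<Document> documents = new ArrayList<>();
    final Map<String, Object> models = new LinkedHashMap<>();

    int tokenCount() {
        int n = 0;
        for (Document d : documents)
            for (Sentence s : d.sentences) n += s.tokens.size();
        return n;
    }

    int contentTokenCount() {
        int n = 0;
        for (Document d : documents)
            for (Sentence s : d.sentences)
                for (Token t : s.tokens) if (!t.isStopword) n++;
        return n;
    }

    public static void main(String[] args) {
        Sentence s = new Sentence();
        s.tokens.add(new Token("Bob", "Bob", "bob", "NNP", "PERSON", false));
        s.tokens.add(new Token("runs", "run", "run", "VBZ", null, false));
        s.tokens.add(new Token("the", "the", "the", "DT", null, true));
        s.tokens.add(new Token("race", "race", "race", "NN", null, false));
        s.sentiment.put("sentiwordnet", 0.0);   // placeholder, not a computed score

        Document d = new Document();
        d.sentences.add(s);
        d.classLabel = "sports";                // hypothetical label

        TextCollection c = new TextCollection();
        c.documents.add(d);
        // prints 4 tokens, 3 non-stop-word tokens
        System.out.println(c.tokenCount() + " tokens, " + c.contentTokenCount() + " non-stop-word tokens");
    }
}
```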
The second feature is sentiment analysis, for which there are three techniques, as I have mentioned a couple of times in the previous slides. The first two are supervised learning techniques: the Stanford CoreNLP sentiment analysis tool uses a recursive neural network, while LingPipe's sentiment analysis tool uses a logistic regression model. The last is a dictionary-based technique that uses SentiWordNet as the dictionary. SentiWordNet assigns positive, negative, and objective (neutral) scores to every WordNet synset, so the same word can be positive, negative, or neutral depending on its sense. The third feature of yTextMiner is topic modeling. yTextMiner uses two of Mallet's topic modeling techniques: the first is LDA, which stands for Latent Dirichlet Allocation, and the second is DMR, which stands for Dirichlet-multinomial Regression. The last feature is document classification, for which we can use three kinds of classifiers: LibLinear's linear SVM, Mallet's Naive Bayes classifier, and LingPipe's logistic regression.
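The supervised side of sentiment analysis can be illustrated with a tiny bag-of-words logistic regression trained by full-batch gradient ascent on the log-likelihood. This shows the technique only; it is not LingPipe's API or its training procedure, and the class name `LogisticSentiment`, the four training phrases, the learning rate, and the epoch count are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LogisticSentiment {
    private final Map<String, Double> weights = new HashMap<>();
    private double bias = 0.0;

    private static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // P(positive | tokens): sigmoid of the bias plus one weight per token occurrence.
    double probPositive(List<String> tokens) {
        double z = bias;
        for (String t : tokens) z += weights.getOrDefault(t, 0.0);
        return sigmoid(z);
    }

    // Full-batch gradient ascent on the log-likelihood; labels are 1 (positive) or 0 (negative).
    void train(List<List<String>> docs, int[] labels, double learningRate, int epochs) {
        for (int e = 0; e < epochs; e++) {
            Map<String, Double> grad = new HashMap<>();
            double gradBias = 0.0;
            for (int i = 0; i < docs.size(); i++) {
                double err = labels[i] - probPositive(docs.get(i));   // y - p
                gradBias += err;
                for (String t : docs.get(i)) grad.merge(t, err, Double::sum);
            }
            // Apply the whole gradient only after every document has been scored.
            bias += learningRate * gradBias;
            for (Map.Entry<String, Double> g : grad.entrySet()) {
                weights.merge(g.getKey(), learningRate * g.getValue(), Double::sum);
            }
        }
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("great", "fun"), Arrays.asList("loved", "great"),
            Arrays.asList("awful", "dull"), Arrays.asList("hated", "awful"));
        int[] labels = {1, 1, 0, 0};
        LogisticSentiment model = new LogisticSentiment();
        model.train(docs, labels, 0.5, 100);
        System.out.printf("P(positive | great) = %.3f%n", model.probPositive(Arrays.asList("great")));
        System.out.printf("P(positive | awful) = %.3f%n", model.probPositive(Arrays.asList("awful")));
    }
}
```

The toy phrases are symmetric (each positive word has a negative counterpart), so the learned bias stays essentially at zero and a word never seen in training gets a probability close to 0.5.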
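The dictionary-based technique amounts to a lookup. In SentiWordNet each synset has a positive score and a negative score, and its objective score is 1 minus their sum. The sketch below uses a made-up three-word lexicon (the numbers are not real SentiWordNet values), averages `pos - neg` over a word's senses with weight 1/rank so that the most common sense counts most (one common heuristic, assumed here), and sums the word scores over a sentence. `DictionarySentiment` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class DictionarySentiment {
    // Toy lexicon: word -> one {posScore, negScore} pair per sense, most frequent sense first.
    // The numbers are invented for illustration; they are not SentiWordNet entries.
    private final Map<String, double[][]> lexicon = new HashMap<>();

    DictionarySentiment() {
        lexicon.put("good", new double[][] {{0.75, 0.0}, {0.375, 0.0}});
        lexicon.put("terrible", new double[][] {{0.0, 0.625}});
        lexicon.put("cold", new double[][] {{0.0, 0.375}, {0.0, 0.0}});
    }

    // Word polarity in [-1, 1]: (pos - neg) per sense, averaged with weight 1/rank.
    // Words missing from the lexicon count as neutral (0).
    double wordScore(String word) {
        double[][] senses = lexicon.get(word.toLowerCase(Locale.ROOT));
        if (senses == null) return 0.0;
        double sum = 0.0;
        double weightSum = 0.0;
        for (int rank = 1; rank <= senses.length; rank++) {
            double weight = 1.0 / rank;
            sum += weight * (senses[rank - 1][0] - senses[rank - 1][1]);
            weightSum += weight;
        }
        return sum / weightSum;
    }

    // Sentence polarity: the sum of word scores; > 0 positive, < 0 negative, 0 neutral.
    double sentenceScore(List<String> tokens) {
        double total = 0.0;
        for (String t : tokens) total += wordScore(t);
        return total;
    }

    public static void main(String[] args) {
        DictionarySentiment ds = new DictionarySentiment();
        // prints 0.625
        System.out.printf(Locale.ROOT, "%.3f%n",
            ds.sentenceScore(Arrays.asList("The", "food", "was", "good")));
        // prints -0.875
        System.out.printf(Locale.ROOT, "%.3f%n",
            ds.sentenceScore(Arrays.asList("The", "soup", "was", "cold", "and", "terrible")));
    }
}
```

Unlike the supervised approaches, nothing is trained here: the quality of the result depends entirely on the dictionary and on how a word's senses are combined.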
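Mallet estimates LDA with Gibbs sampling. The sketch below is a plain collapsed Gibbs sampler written from the standard update rule, not from Mallet's optimized implementation; `GibbsLda`, the toy corpus of word ids, and the hyperparameters (`alpha = 0.1`, `beta = 0.01`, 200 sweeps, seed 42) are assumptions for illustration. DMR, which additionally lets document metadata shape the document-topic prior, is not shown.

```java
import java.util.Random;

class GibbsLda {
    final int numTopics, vocabSize;
    final double alpha, beta;
    final int[][] docs;   // docs[d][i] = word id of token i in document d
    final int[][] z;      // z[d][i] = current topic of that token
    final int[][] ndk;    // tokens in document d assigned to topic k
    final int[][] nkw;    // times word w is assigned to topic k
    final int[] nk;       // tokens assigned to topic k
    private final Random rng;

    GibbsLda(int[][] docs, int vocabSize, int numTopics, double alpha, double beta, long seed) {
        this.docs = docs;
        this.vocabSize = vocabSize;
        this.numTopics = numTopics;
        this.alpha = alpha;
        this.beta = beta;
        this.rng = new Random(seed);
        z = new int[docs.length][];
        ndk = new int[docs.length][numTopics];
        nkw = new int[numTopics][vocabSize];
        nk = new int[numTopics];
        // Start from a random topic for every token.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                int k = rng.nextInt(numTopics);
                z[d][i] = k;
                ndk[d][k]++; nkw[k][docs[d][i]]++; nk[k]++;
            }
        }
    }

    // One sweep of collapsed Gibbs sampling over every token:
    // p(z = k | rest) is proportional to (ndk + alpha) * (nkw + beta) / (nk + V * beta).
    void sweep() {
        double[] p = new double[numTopics];
        for (int d = 0; d < docs.length; d++) {
            for (int i = 0; i < docs[d].length; i++) {
                int w = docs[d][i];
                int old = z[d][i];
                ndk[d][old]--; nkw[old][w]--; nk[old]--;   // take the token out of the counts
                double total = 0.0;
                for (int k = 0; k < numTopics; k++) {
                    p[k] = (ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + vocabSize * beta);
                    total += p[k];
                }
                // Draw a topic in proportion to p; default to the last topic to absorb rounding.
                double u = rng.nextDouble() * total;
                int chosen = numTopics - 1;
                for (int k = 0; k < numTopics; k++) {
                    u -= p[k];
                    if (u <= 0) { chosen = k; break; }
                }
                z[d][i] = chosen;                          // put the token back under its new topic
                ndk[d][chosen]++; nkw[chosen][w]++; nk[chosen]++;
            }
        }
    }

    // Document-topic proportions theta_d.
    double[] theta(int d) {
        double[] t = new double[numTopics];
        for (int k = 0; k < numTopics; k++) {
            t[k] = (ndk[d][k] + alpha) / (docs[d].length + numTopics * alpha);
        }
        return t;
    }

    // Topic-word distribution phi_k.
    double[] phi(int k) {
        double[] f = new double[vocabSize];
        for (int w = 0; w < vocabSize; w++) {
            f[w] = (nkw[k][w] + beta) / (nk[k] + vocabSize * beta);
        }
        return f;
    }

    public static void main(String[] args) {
        String[] vocab = {"ball", "goal", "team", "vote", "party", "election"};
        int[][] docs = {{0, 1, 2, 0}, {3, 4, 5, 3}, {0, 2, 1}, {4, 5, 3}};
        GibbsLda lda = new GibbsLda(docs, vocab.length, 2, 0.1, 0.01, 42L);
        for (int s = 0; s < 200; s++) lda.sweep();
        for (int k = 0; k < 2; k++) {
            double[] f = lda.phi(k);
            int best = 0;
            for (int w = 1; w < f.length; w++) if (f[w] > f[best]) best = w;
            System.out.println("topic " + k + " top word: " + vocab[best]);
        }
    }
}
```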
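For the linear SVM, a compact way to see what "training a linear SVM" means is Pegasos, a stochastic sub-gradient method for the hinge-loss objective with L2 regularization. This is not the solver LibLinear implements, and the sketch has no bias term, so the separating line passes through the origin. `PegasosSvm`, `lambda = 0.1`, and the toy points are assumptions; examples are visited in a fixed cycle instead of being sampled at random, which keeps the run deterministic.

```java
class PegasosSvm {
    private final double[] w;
    private final double lambda;

    PegasosSvm(int dimension, double lambda) {
        this.w = new double[dimension];
        this.lambda = lambda;
    }

    // Linear decision value w . x.
    double decision(double[] x) {
        double s = 0.0;
        for (int j = 0; j < w.length; j++) s += w[j] * x[j];
        return s;
    }

    int predict(double[] x) {
        return decision(x) >= 0 ? 1 : -1;
    }

    // Pegasos: at step t use eta = 1/(lambda*t); shrink w (the L2 term) and,
    // if the example's hinge loss is active (y * w.x < 1), also add eta * y * x.
    void train(double[][] xs, int[] ys, int iterations) {
        for (int t = 1; t <= iterations; t++) {
            int i = (t - 1) % xs.length;
            double eta = 1.0 / (lambda * t);
            boolean hingeActive = ys[i] * decision(xs[i]) < 1.0;   // checked before the update
            for (int j = 0; j < w.length; j++) {
                w[j] *= 1.0 - eta * lambda;
                if (hingeActive) w[j] += eta * ys[i] * xs[i][j];
            }
        }
    }

    public static void main(String[] args) {
        double[][] xs = {{2, 1}, {1, 2}, {-2, -1}, {-1, -2}};
        int[] ys = {1, 1, -1, -1};
        PegasosSvm svm = new PegasosSvm(2, 0.1);
        svm.train(xs, ys, 1000);
        // prints 1 -1
        System.out.println(svm.predict(new double[] {3, 3}) + " " + svm.predict(new double[] {-1, -3}));
    }
}
```

In practice, document vectors are sparse and high-dimensional (one dimension per vocabulary word), which is the setting LibLinear is designed for; the dense two-dimensional points here only keep the example readable.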
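Naive Bayes document classification can be illustrated with a multinomial Naive Bayes model using add-one (Laplace) smoothing. This is a sketch of the technique, not Mallet's API; `NaiveBayes`, its methods, and the four-document toy corpus are hypothetical. Training is nothing more than counting, and classification picks the label with the highest log prior plus summed log word likelihoods.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NaiveBayes {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Training: count documents per label, words per label, and the vocabulary.
    void add(String label, List<String> tokens) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    // log P(label) + sum of log P(token | label) with add-one smoothing; unseen words are skipped.
    double logScore(String label, List<String> tokens) {
        double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
        Map<String, Integer> counts = wordCounts.get(label);
        double denominator = tokensPerLabel.getOrDefault(label, 0) + vocabulary.size();
        for (String t : tokens) {
            if (!vocabulary.contains(t)) continue;
            score += Math.log((counts.getOrDefault(t, 0) + 1) / denominator);
        }
        return score;
    }

    // Pick the label with the highest log score.
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double s = logScore(label, tokens);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayes nb = new NaiveBayes();
        nb.add("sports", Arrays.asList("goal", "match", "team"));
        nb.add("sports", Arrays.asList("team", "won", "match"));
        nb.add("politics", Arrays.asList("vote", "election", "party"));
        nb.add("politics", Arrays.asList("party", "won", "vote"));
        // prints sports
        System.out.println(nb.classify(Arrays.asList("team", "goal")));
    }
}
```

Working in log space avoids numerical underflow when a document has many tokens, and the add-one smoothing keeps a single unseen word-label pair from zeroing out a whole label.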