So far in this unit, we have worked with large samples, where the success/failure condition was met. But what if it's not met? Then what comes to our rescue is inference via simulation. We did a little bit of this earlier in the class, when we worked through the gender discrimination example. So in this video, we're going to review how we set up a simulation assuming that the null hypothesis is true. Remember that if we're doing any sort of hypothesis test where the ultimate goal is a p-value, the definition of the p-value stays the same regardless of what type of method you're using. It's always the probability of an observed or more extreme outcome, given that the null hypothesis is true. So we want to make sure that throughout our hypothesis test we act as if the null hypothesis is true, and what that means here is that we set up a simulation scheme which assumes the null hypothesis is true. Let's give a quick example. Remember Paul the Octopus, who became famous for correctly predicting the outcomes of soccer games during the 2010 World Cup. The setup was that he was given two boxes, each with a little bit of food in it, and the boxes carried the flags of the countries playing against each other in the World Cup that day. His prediction was taken to be whichever box he chose to get the food out of. He became famous because he predicted all eight World Cup games correctly. We want to see: does this provide convincing evidence that Paul actually has psychic powers, in other words, that he does better than just randomly guessing? Because he had only two countries to choose from, if he were randomly guessing, he would be expected to guess right 50% of the time. So the null hypothesis, which claims that he does not have psychic powers and is simply randomly guessing, sets the true proportion of success to 0.5.
If he's doing better than randomly guessing, then the alternative hypothesis says that p is greater than 0.5. We know that the sample size, or number of trials, is eight, and Paul the Octopus guessed all of them correctly, so p-hat is one, or 100%. Let's check to see if the conditions for inference are met here. In terms of independence, it seems reasonable to assume that his guesses are independent of each other from one game to the next. In terms of sample size and skew, we need to check the success/failure condition: eight trials times 0.5, our null value, gives us four expected successes (and likewise four expected failures), which is less than ten. So the success/failure condition is not met, meaning that the distribution of sample proportions cannot be assumed to be nearly normal. That means we cannot use methods that rely on the central limit theorem and the normality of the sampling distribution to find our p-value. And this is where, once again, simulation-based inference comes to our rescue. So how do we do simulation-based inference? Let's remind ourselves. The ultimate goal of the hypothesis test is a p-value, and the p-value is the probability of an observed or more extreme outcome, given that the null hypothesis is true. So we want to devise a simulation scheme that assumes the null hypothesis is true, repeat the simulation many times, and record the relevant sample statistic at each iteration. Finally, we calculate the p-value as the proportion of simulations that yield a result at least as favorable to the alternative hypothesis as the observed data. For those of you who remember the examples we did earlier on inference via simulation, these steps should make sense. For those who do not remember them, please revisit these steps once again after we go through the calculations for this particular example. So given that our null value is 0.5, how do we set up a simulation scheme?
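As a quick sanity check, the success/failure condition above can be verified in a couple of lines. This is a minimal sketch; the course itself uses R, but the arithmetic is the same in any language:

```python
# Success/failure condition: we need both n*p0 >= 10 and n*(1 - p0) >= 10,
# where p0 is the null value (the hypothesized proportion of success).
n = 8      # number of trials: Paul's eight predictions
p0 = 0.5   # null value: success rate under random guessing

expected_successes = n * p0        # 8 * 0.5 = 4
expected_failures = n * (1 - p0)   # 8 * 0.5 = 4

condition_met = expected_successes >= 10 and expected_failures >= 10
print(expected_successes, expected_failures, condition_met)  # 4.0 4.0 False
```

With only four expected successes and four expected failures, the condition fails, which is exactly why we turn to simulation.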
We can use a fair coin and label heads as successes; these are our correct guesses. (We could also have used tails; in this case we're choosing heads.) One simulation consists of flipping the coin eight times and recording the proportion of heads, that is, of correct guesses. Remember, we're trying to simulate what Paul did as many times as possible, and we need to think of his eight trials as one batch. At each simulation we want to recreate that batch of eight trials and calculate his rate of success. Remember, his observed rate of success was one. We're going to see, if we leave things up to chance, what the rate of success comes out to be. We then repeat the simulation many times, recording the proportion of heads at each iteration. Finally, we calculate the percentage of simulations where the simulated proportion of heads is at least as extreme as the observed proportion. So let's take a look at how we can actually do this. In our first simulation, we flip a coin eight times. The first flip comes up heads, so that's a success. We flip the coin one more time: another head. Another flip of the coin, another head. Another flip, a tail. Another flip, another head. We have three more to go to get to our eight tosses: one more head, another head, and lastly yet another head. So in this case our sample proportion, the proportion of successes, is seven over eight, or 0.875. We record this number, and we're going to collect these on the dot plot at the bottom of the screen. For the second simulation, we once again have eight slots; we toss the coin eight times and record the outcomes. This time we get three heads out of eight, so our proportion of success is 3/8, or 0.375, and we record that number on our dot plot as well.
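One such batch of eight coin flips can be sketched in a few lines of Python (the course uses R, but the logic is identical; here `random.random() < 0.5` plays the role of a fair coin, with `True` standing in for heads):

```python
import random

random.seed(1)  # seed only so this illustration is reproducible

# One simulation: flip a fair coin 8 times, count heads (correct guesses),
# and record the simulated proportion of success.
flips = [random.random() < 0.5 for _ in range(8)]  # True = heads
p_hat_sim = sum(flips) / 8
print(p_hat_sim)  # some multiple of 1/8 between 0 and 1
```

Each run of this batch produces one dot for the dot plot described above.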
In another simulation, yet another set of eight flips, we count how many of those were heads. That looks like five out of eight, and we record that number as well. We could keep doing this for a long time, and we would want to do it as many times as possible, but for illustrative purposes we're only going to do ten simulations. So at each iteration we're collecting these data, the simulated p-hats, and on the last simulation we flip the coin eight more times and get six out of eight heads, for 0.75. So this is what our simulated distribution of p-hat looks like. Obviously, if we had actually done a lot of simulations, as we should, the distribution would look somewhat different: we would have many more observations, and the shape would probably be similar to this, but ten simulations is definitely not sufficient to make a call. However, based on this, and based on the definition of the p-value as the probability of an observed or more extreme outcome: our observed outcome was 100% success, so the p-value is the probability of 100% success or more, which sounds strange given that under the null hypothesis the true rate of success is only 50%. We don't have any simulated sample proportions that actually fit the bill, so based on this simulation our p-value is zero. It's usually a good idea to say that it's nearly zero: chances are, if we had done this properly with about 10,000 or so simulations, we would get a number that's small, which would probably still yield a rejection of the null hypothesis, but may not be exactly zero. Of course, with 10,000 simulations we would never think of doing this by hand; we would use R for it.
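The full procedure with 10,000 simulations can be sketched as follows. This is a minimal, hedged Python version of what the course's R inference function does under the hood; the variable names are illustrative:

```python
import random

random.seed(2025)  # illustrative seed for reproducibility

n_sims = 10_000
p_hat_obs = 1.0  # Paul's observed success rate: 8 correct out of 8

# Simulate 10,000 batches of 8 fair coin flips, recording each p-hat.
sim_p_hats = []
for _ in range(n_sims):
    heads = sum(random.random() < 0.5 for _ in range(8))
    sim_p_hats.append(heads / 8)

# p-value: proportion of simulations at least as extreme as the observation.
p_value = sum(p >= p_hat_obs for p in sim_p_hats) / n_sims
print(p_value)  # small; roughly (1/2)^8 = 0.0039 in expectation
```

With this many simulations the estimate stabilizes near the true probability of eight heads in a row, rather than the zero we got from only ten hand simulations.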
So, the first thing we want to do is load the inference function that you have been using in the labs; we're going to use it to do the simulation-based test. We then define what the data from Paul the Octopus look like: eight yeses and zero noes. Finally, we call the inference function, saying that we're estimating a proportion, we're doing a hypothesis test using a simulation method, and we're calling the outcome "yes" a success. Our null value is 0.5, and our alternative hypothesis is that the parameter is greater than 0.5. In this case, the p-value with 10,000 simulations, which is the default for this function, comes out to be 0.0037, meaning that once again we would reject the null hypothesis. Let's think, though: what does rejecting the null hypothesis mean here? Does it mean we found evidence that Paul is psychic? Probably not, and chances are we've made some sort of an error, where the null hypothesis should not have been rejected. We had a pretty small sample that happened to show a trend in a certain direction, and those particular data yielded a small p-value, based on which, yes, we would definitely reject the null hypothesis. But we might be making a Type 1 error: rejecting a null hypothesis that says this octopus simply picks at random, when we shouldn't have. One possibility would be to collect more data from Paul the Octopus, but unfortunately he passed away shortly after he became a sensation.
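As a sanity check on that simulated p-value: because each guess is a fair coin flip under the null hypothesis, the exact probability of eight correct guesses in a row can be computed directly, and the simulated 0.0037 is right in line with it:

```python
# Exact p-value under H0: P(8 successes in 8 trials with p = 0.5) = (1/2)^8
exact_p = 0.5 ** 8
print(exact_p)  # 0.00390625; the simulated 0.0037 is a close estimate
```

So the simulation is doing exactly what we expect: it approximates a small but nonzero probability that random guessing alone produces a perfect record.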