In this segment we will review some basic probability distributions, both discrete and continuous. The first one is the Bernoulli distribution. It's used when we have two possible outcomes, such as flipping a coin, where it could be heads and tails, or the cases where we have a success or a failure. Well, to denote this, let's say a random variable x follows a Bernoulli distribution with probability p, where p is probability of success, or probability of heads. The little squiggle refers to, is distributed as, saying it follows this distribution. In this case, the probability of a success or heads, we'll say is p. And we'll denote that sometimes as x = 1. The failure or tails, x = 0 has probability 1- p. We can write this as a function for all the different possible outcomes. And say what's the probability that the random variable x takes a value of little x given a specific value of p? So the notation here is that the capital letters refer to the random variable. The lower case letters refer to a possible value it might take. And then over here we have a probability specified for it. Later when these properties are unknown, we'll represent that with a Greek letter rather than a Roman letter. We may write this in short hand as just probability f(x/p),, and dropping the big x from the notation. In the case of a Bernoulli, mathematically this works out to be p to the x (1-=p) to the 1- x, for the case that x's are either 0 or 1. One way to write this is with an indicator function. The x is either 0 or 1. This indicator function we could write as a function of x and its a step function, sometimes referred to as a heavy side function. It takes value 1 when its argument is true, it takes value 0 when its argument is false. This will be really useful notation for us in the rest of this course. The indicator function takes precedence in the order of operations so we always evaluate it first, this is a way we can avoid doing things such as taking the log or the square root of a negative number. In basic textbooks, this is referred to as the probability mass function. It gives the probability of different outcomes of the random variable. In some textbooks, they make a strong distinction between discrete variables where these were probably mass functions, and continues variables where these are probability density functions. Turns out if you get far enough along in math and you get up to the measure theory level, you can view everything as a density. So I'm going to refer to this as a density function in a measured theoretic sense. One more concept is that of an expected value. This is the theoretical average or the theoretical mean. We write it with a capital E, and the expected value of x as we sum over all possible outcomes. Little x, we sum up x times the probability random variable takes up variable x. In this case it's really simple. One possible outcome is one, it takes that with the probability of p. Another possible outcome is 0, it takes that with the probability 1- p. So the expected value for Bernoulli is just the probability p. Similarly we can talk about the variance which is the square of the standard deviation. For Bernolli, the variance works out to be p x 1- p. The generalization of the Bernoulli when we have N repeated trials is a binomial. Binomial is just the sum of the N independent Bernoullis. We can say X follows a binomial distribution with parameters n and p. In this case, the probability function, probability that X takes some value little x is given p. Is (n choose x) p to the x (1- p) to the n -x. N choose x is the common term, n factorial over x factorial (n-x) factorial. And this is all for X taking values 0, 1 up to N. The expected value for binomial is np and the variance for binomial is np(1- p). Other distributions that we may encounter include the multinomial, further generalization and the Poisson distribution.