Hello. This lesson will introduce you to the calculation of probabilities
and the application of Bayes' theorem using Python.
These are very important concepts, and there's
a very long notebook that I'll introduce you to in just a second,
but I've also provided links to two web pages that provide
a visual introduction to both basic probability concepts
and conditional probability concepts.
So first, let's take a look at these websites.
First, the basic probability.
This has three different parts.
The first talks about likelihood,
the second about expectation and the third about estimation.
And the easiest thing to do is to just show you.
Likelihood is about measuring the probability that an event occurs,
and the site does this through simulation.
So you can see I can flip coins,
I can also flip them a bunch of times,
and notice how it generates random data.
Here's our theoretical expectation: for a fair coin
we should get half heads and half tails,
but every time we flip,
we get random results.
This is a key concept:
when you're sampling from nature,
you're getting random results,
not an exact match to the prediction.
That's one of the fundamental concepts in probability and
something you really need to work with to make sure you understand it properly.
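To see this sampling variability outside the website, here is a minimal sketch using only Python's standard `random` module (an assumption; the site itself doesn't show code). It flips a fair coin 10 times per batch, repeats that many times, and tallies how many heads each batch produced. The function name `heads_per_batch` and the batch sizes are illustrative choices, not taken from the course materials.

```python
import random
from collections import Counter


def heads_per_batch(n_batches=10_000, flips_per_batch=10, seed=0):
    """Return the number of heads in each batch of fair-coin flips."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    # rng.random() < 0.5 is True (counted as 1) with probability 0.5
    return [
        sum(rng.random() < 0.5 for _ in range(flips_per_batch))
        for _ in range(n_batches)
    ]


counts = Counter(heads_per_batch())
for k in sorted(counts):
    print(f"{k:2d} heads: {counts[k]} batches")

# Share of batches that landed on exactly 5 heads out of 10
share_exact_half = counts[5] / sum(counts.values())
print(round(share_exact_half, 3))
```

Even though 5 heads is the single most likely outcome, theory says it happens in only 252/1024 (about 24.6%) of batches, so most batches miss the "expected" even split.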
Next is the idea of expectation, where
we have a specific idea of what the result should be on average.
So, for instance, if we roll a die,
the average should be the average of one, two,
three, four, five, and six,
or 3.5.
So if I roll the die once,
we get a single value;
if I roll it again, we get another value, and you
can see how this line is appearing on the left.
It shows the running average of all of our rolls.
If I roll it a hundred times,
you'll see this long-term frequency emerge.
And as the number of rolls increases, we should
get closer to our theoretical expected average.
But there will be deviations,
because again, each roll is a random variable,
and so the process itself is random.
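The running average the site plots can be reproduced in a few lines. This is a sketch under the assumption of a fair six-sided die simulated with Python's `random.randint`; `running_average_of_rolls` is a hypothetical helper name.

```python
import random


def running_average_of_rolls(n_rolls, seed=None):
    """Roll a fair six-sided die n_rolls times; return the running average after each roll."""
    rng = random.Random(seed)
    total = 0
    averages = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # randint includes both endpoints
        averages.append(total / i)
    return averages


expected = sum(range(1, 7)) / 6  # (1 + 2 + ... + 6) / 6 = 3.5
averages = running_average_of_rolls(10_000, seed=42)
for n in (1, 10, 100, 1_000, 10_000):
    print(f"after {n:>6} rolls: average = {averages[n - 1]:.3f} (expected {expected})")
```

Early averages can sit far from 3.5, while later ones typically wander within a few hundredths of it, which is the long-term frequency behavior described above.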
Next is the idea of estimation.
Here we can actually draw samples from a data set,
use them to estimate a value, and measure how good that estimate is
with quantities such as its bias, its variance, or its mean squared error.
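As a concrete illustration of bias, variance, and mean squared error, the sketch below estimates the variance of a fair die roll (exactly 35/12) from samples of five rolls, comparing the divide-by-n estimator (`statistics.pvariance`) with the divide-by-(n - 1) estimator (`statistics.variance`). The choice of estimators, sample size, and the helper `evaluate_estimator` are assumptions made for illustration; the website may use different estimators.

```python
import random
import statistics

TRUE_VAR = 35 / 12  # variance of a single roll of a fair six-sided die


def evaluate_estimator(estimator, sample_size=5, n_trials=20_000, seed=0):
    """Apply `estimator` to many random samples; return (bias, variance, mse)."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.randint(1, 6) for _ in range(sample_size)])
        for _ in range(n_trials)
    ]
    mean_estimate = statistics.fmean(estimates)
    bias = mean_estimate - TRUE_VAR                # systematic error
    variance = statistics.pvariance(estimates)     # spread of the estimates
    mse = statistics.fmean((e - TRUE_VAR) ** 2 for e in estimates)  # average squared error
    return bias, variance, mse


# Divide-by-n estimator vs divide-by-(n - 1) estimator of the variance
biased = evaluate_estimator(statistics.pvariance)
unbiased = evaluate_estimator(statistics.variance)
print("divide by n:     bias=%.3f var=%.3f mse=%.3f" % biased)
print("divide by n - 1: bias=%.3f var=%.3f mse=%.3f" % unbiased)
```

Both estimators see the same simulated samples (same seed), so the comparison is fair. Dividing by n underestimates the true variance on average (a negative bias), and for each estimator the MSE equals bias squared plus variance.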
You should definitely play with this site and get a feel for what it's actually showing.
The next site covers
compound probability, and I won't step through it like I did the last one.
I'll just show you the basic ideas. You can see that we have different sets, where
this one may represent one particular event occurring,
such as "it was sunny today."
This might be a different event,
such as "it rained today."
And this might be a third event, that it was cloudy.
Sometimes events overlap.
Sometimes they're independent, or even mutually exclusive,
meaning they can't occur together.
And sometimes we'll be able to see the intersections of these,
or the unions of these,
and make interpretations based on that.
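Intersections, unions, independence, and mutual exclusivity can all be checked with Python's built-in sets. This sketch swaps the weather events for die-roll events (an assumption, so every outcome is equally likely and the probabilities are exact); `prob` is a hypothetical helper.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # outcomes of one roll of a fair die
A = {2, 4, 6}  # event: the roll is even
B = {5, 6}     # event: the roll is greater than 4
C = {1, 3}     # event: the roll is 1 or 3


def prob(event):
    """P(event) when all outcomes in sample_space are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


print(prob(A & B))  # intersection {6}: 1/6
print(prob(A | B))  # union {2, 4, 5, 6}: 2/3

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Independent: P(A and B) equals P(A) * P(B) (1/6 == 1/2 * 1/3)
print(prob(A & B) == prob(A) * prob(B))  # True

# Mutually exclusive: A and C share no outcomes, so P(A and C) = 0
print(A.isdisjoint(C), prob(A & C))  # True 0
```

Note that independence and mutual exclusivity are different things: A and B overlap yet are independent, while A and C cannot occur together.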
The next step is understanding combinatorics,
which is how we count permutations and combinations.
You'll be able to see and test both of these ideas,
and you can change the number of marbles
involved in each case
and see how that changes the results.
Notice that as I do this, you'll see this forming:
first, we have three different ways of placing the marble.
When we add our second marble,
we get these different combinations, and so on.
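The counts behind these marble diagrams follow standard formulas. A minimal sketch, assuming four distinguishable marbles of which two are placed (numbers chosen for illustration), using `math.factorial`, `math.perm`, and `math.comb` (the last two need Python 3.8 or later):

```python
import math

n, k = 4, 2  # e.g. 4 distinguishable marbles, of which we place 2

print(math.factorial(n))  # 24: ways to order all 4 marbles
print(math.perm(n, k))    # 12: ordered arrangements of 2 of the 4 (4 * 3)
print(math.comb(n, k))    # 6: unordered selections of 2 of the 4

# Every unordered selection of k marbles can be ordered in k! ways
assert math.perm(n, k) == math.comb(n, k) * math.factorial(k)
```

The final assertion captures the relationship between the two ideas: a permutation is a combination plus a choice of order.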
The last idea is conditional probability.
This is a really neat idea: we drop balls uniformly over the display, and
we see that as we move A, B, and C around,
we get different results.
So the idea here is that our data have
a specific probability of A occurring,
a specific probability of B occurring, and a specific probability of C occurring.
And we can calculate a conditional probability:
given that B occurred, what's the probability that C occurred?
That's shown by the colors of the balls down here:
if they're this light blue,
then they went through both green and blue,
in other words through both B and C,
and we can see that probability.
You can also change the perspective to see what happens.
And this then represents the conditional probability.
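The ball-dropping idea translates directly into a simulation. This sketch assumes (as a simplification of the site) that balls land uniformly on [0, 1) and that each event covers an interval, so a ball "passes through" B and C when its position lies in both intervals. The intervals, `simulate_conditional`, and `bayes` are hypothetical. The second function applies Bayes' theorem to reverse the conditioning.

```python
import random


def simulate_conditional(b, c, n=100_000, seed=1):
    """Estimate P(C | B) by dropping n balls uniformly on [0, 1).

    b and c are (start, end) intervals; a ball "passes through" an event
    when its position falls inside that event's interval.
    """
    rng = random.Random(seed)
    in_b = in_b_and_c = 0
    for _ in range(n):
        x = rng.random()
        hit_b = b[0] <= x < b[1]
        hit_c = c[0] <= x < c[1]
        in_b += hit_b
        in_b_and_c += hit_b and hit_c
    return in_b_and_c / in_b  # P(C | B) = P(B and C) / P(B)


def bayes(p_c_given_b, p_b, p_c):
    """Bayes' theorem: P(B | C) = P(C | B) * P(B) / P(C)."""
    return p_c_given_b * p_b / p_c


B = (0.2, 0.6)  # hypothetical interval: P(B) = 0.4
C = (0.4, 0.9)  # hypothetical interval: P(C) = 0.5
p_c_given_b = simulate_conditional(B, C)
print(round(p_c_given_b, 3))  # close to overlap 0.2 / P(B) 0.4 = 0.5
print(round(bayes(0.5, 0.4, 0.5), 3))  # P(B | C) = overlap 0.2 / P(C) 0.5 = 0.4
```

The simulated P(C | B) comes out near 0.5, while Bayes' theorem gives P(B | C) = 0.4, a reminder that conditioning is not symmetric.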
So, play with these websites and get a better feel for these fundamental concepts.
The last thing I want to show you is the introduction to probability notebook,
which walks you through many of these ideas.
We look at combinatorial analysis,
where we work with permutations and can actually simulate things,
such as permutations of different data sets.
You'll be able to see how these work.
Python has built-in tools that make some of these things easier,
such as the calculation of permutations,
we can do permutations without replacement,
which is slightly different.
That's demonstrated here.
We then can also do combinations,
and combinations without replacement.
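The notebook's own code isn't reproduced here, but as an assumption, Python's standard `itertools` module is one common way to enumerate these arrangements. The sketch lists every arrangement of two items drawn from three, with and without replacement, and with and without regard to order.

```python
from itertools import combinations, combinations_with_replacement, permutations, product

items = ["A", "B", "C"]
k = 2

arrangements = {
    # Order matters, no item reused
    "permutations": list(permutations(items, k)),
    # Order matters, items may repeat (itertools has no separate
    # "permutations with replacement"; product covers that case)
    "product (with replacement)": list(product(items, repeat=k)),
    # Order ignored, no item reused
    "combinations": list(combinations(items, k)),
    # Order ignored, items may repeat
    "combinations_with_replacement": list(combinations_with_replacement(items, k)),
}

for name, result in arrangements.items():
    print(f"{name}: {len(result)} -> {result}")
```

The counts (6, 9, 3, and 6) show how allowing replacement adds arrangements such as ("A", "A"), and how ignoring order merges ("A", "B") with ("B", "A").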
I also want to show a bit more about probability itself, which depends on what we're doing.
For flipping a coin,
the probability of flipping heads may be 0.5.
You can change this value to
a different probability, and that will change the result.
What we then do is randomly choose either heads or tails
based on the probability we put in, and we generate N of those.
So this effectively simulates flipping a coin N times,
where the probability is what we expect for a fair coin.
So here you go: we had heads,
heads, tails, tails, et cetera.
When we count the number of heads in this particular random sample, it's 11 out of 25 flips,
which means the observed proportion of heads was only 0.44,
close to, but not exactly, what we expect theoretically.
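The "choose heads or tails based on a probability, N times" step can be sketched with `random.choices`, which accepts per-item weights. This is an assumption about how to express the idea with the standard library, not the notebook's exact code; `simulate_flips` is a hypothetical name.

```python
import random


def simulate_flips(n, p_heads=0.5, seed=None):
    """Simulate n flips of a coin that lands heads with probability p_heads."""
    rng = random.Random(seed)
    # Pick "H" or "T" n times, weighted by p_heads and 1 - p_heads
    return rng.choices(["H", "T"], weights=[p_heads, 1 - p_heads], k=n)


flips = simulate_flips(25)
n_heads = flips.count("H")
print(flips[:5], n_heads, n_heads / len(flips))

# The same function with a larger N and a biased coin
many = simulate_flips(100_000, p_heads=0.7, seed=1)
print(many.count("H") / len(many))  # should land near 0.7
```

With only 25 flips the observed proportion varies noticeably from run to run; with 100,000 flips it stays close to the probability passed in.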
Now, if we simulated more flips,
that proportion would likely approach 0.5, which is what we expect.
We also do the same example with rolling a die,
and we get a similar result.
Next we talk a little bit about Bernoulli trials and
the binomial distribution, and then we see
how this actually plays out when we flip
five coins and count how many heads we get.
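For five coins, the binomial formula P(k heads) = C(n, k) p^k (1 - p)^(n - k) can be compared directly with a simulation. A minimal sketch using `math.comb` and the standard `random` module; the trial count and seed are arbitrary choices.

```python
import math
import random
from collections import Counter


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 5, 0.5  # flip five fair coins
trials = 20_000
rng = random.Random(3)
# Each trial: flip n coins and count the heads
simulated = Counter(
    sum(rng.random() < p for _ in range(n)) for _ in range(trials)
)

for k in range(n + 1):
    print(f"{k} heads: exact {binomial_pmf(k, n, p):.4f}, simulated {simulated[k] / trials:.4f}")
```

The exact column is symmetric (1/32, 5/32, 10/32, 10/32, 5/32, 1/32), and the simulated frequencies should track it closely.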
We can also look at the long-term frequency,
and this is similar to what we saw before.
Here is a bunch of flips;
what's the proportion of heads?
It was 0.4 in this particular example,
even though the probability was set to 0.5.
If I go farther down, we can plot this and see
the long-term frequency, just like you saw on that previous website.
And you see that over time,
as we get increasingly large numbers of samples, in this case 50,000 flips,
we get very close.
Notice that this is 0.5025,
so we're very close to the theoretical expectation, but not exact.
The rest of this notebook covers other concepts that are
important in probability theory,
such as how we can take data and normalize it to get probabilities.
Again, here we're seeing a hundred rolls:
we are fairly close to the theoretical expectation,
and as we increase the number of rolls, we get closer.
We can estimate probabilities from data by using histograms:
we just normalize the histogram.
We can also create a cumulative distribution function, or cumulative mass function.
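Normalizing a histogram and accumulating it takes only a few lines. This sketch uses simulated die rolls (not the notebook's data) and the standard-library `Counter` and `itertools.accumulate`; dividing each count by the total turns the histogram into an estimated probability mass function, and the running sum gives the cumulative version.

```python
import random
from collections import Counter
from itertools import accumulate

rng = random.Random(7)
rolls = [rng.randint(1, 6) for _ in range(1_000)]  # simulated data, not the notebook's

counts = Counter(rolls)                         # histogram: face -> count
faces = sorted(counts)
pmf = [counts[f] / len(rolls) for f in faces]   # normalized histogram (sums to 1)
cdf = list(accumulate(pmf))                     # running total of the PMF

for face, prob, cum in zip(faces, pmf, cdf):
    print(f"{face}: P = {prob:.3f}, cumulative = {cum:.3f}")
```

Each estimated probability should hover near 1/6, and the cumulative column climbs to 1 at the largest face.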
This is nice, because we can read values right off it.
Say we ask: what total bill do we expect not to exceed 50% of the time?
You can read it right off the plot and say,
well, that's around $18.
That's what the CDF does.
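The "read it off the CDF" step can also be done numerically. A minimal sketch, assuming a small list of made-up bill amounts (not the notebook's data) and the convention that we want the smallest observed value whose empirical CDF reaches the target level; `value_at_cdf` is a hypothetical helper name.

```python
import math


def value_at_cdf(data, q):
    """Smallest observed value x whose empirical CDF, P(X <= x), is at least q."""
    ordered = sorted(data)
    # At least (i + 1) of the n values are <= ordered[i],
    # so take the first index i with (i + 1) / n >= q
    index = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[index]


# Made-up total bills in dollars, for illustration only
bills = [12.50, 16.00, 17.50, 18.00, 21.00, 24.50, 30.00, 9.75, 14.25, 19.50]
print(value_at_cdf(bills, 0.5))  # 17.5: half of these bills are at or below this
print(value_at_cdf(bills, 0.9))  # 24.5
```

Reading the 50% level this way gives the median; other levels give other quantiles.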
So, with that I'm going to go ahead and stop.
There are a few other things in here,
but be sure to play with this notebook,
test these different concepts out yourself, and get a very good feel for probability,
since we will be using it repeatedly throughout this course
and in your career as a data analyst.