0:00

Hi, my name is Brian Caffo and this is a lecture on Machine Learning.

So I'm gonna define machine learning as a set of algorithms

that take a set of inputs and return a prediction.

And I would classify the way in which it returns a prediction at least in the two

ways that are most useful for Data Science, as two broad categories.

0:34

In an Unsupervised case you're trying to build a prediction where you don't

actually have the outcome to train the algorithm.

So I would define unsupervised learning as trying to uncover unobserved factors.

And some examples of this would be clustering, mixture models and

principal components.

To give you an example I will go back to one of the first examples of clustering.

And that is the famous g factor in psychometrics.

So people like Spearman, a famous psychometrician and statistician,

used factor analysis to combine collections of questionnaire data to find

that people who took these tests, these psychometric tests, tended to cluster.

They hypothesized that these clusters represented some outcome, some unmeasured

outcome that represents some kind of intrinsic intellectual ability.

So this was one of the first examples of unsupervised clustering

done well before the advent of computers I might add.
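
To make the idea of clustering concrete, here is a minimal sketch of grouping people by a single test score without any labeled outcome. Everything in it is an assumption for illustration: the scores are made up, the function name two_means is hypothetical, and a simple two-cluster k-means is swapped in as a stand-in, since Spearman's own work used factor analysis rather than this algorithm.

```python
# Hypothetical test scores (made up for illustration; not Spearman's data).
scores = [52.0, 55.0, 58.0, 60.0, 61.0, 85.0, 88.0, 90.0, 93.0]


def two_means(values, iterations=20):
    """One-dimensional k-means with k=2; returns (centers, labels)."""
    # Start the two centers at the extremes so the result is deterministic.
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each score joins the nearer center.
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels


centers, labels = two_means(scores)
print("cluster centers:", centers)
print("cluster labels:", labels)
```

No outcome is supplied anywhere in this sketch; the two groups come only from the structure of the scores, which is the sense in which the learning is unsupervised.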

1:41

If that's unsupervised clustering, let's talk about what supervised learning is.

So supervised learning is using a collection of predictors and some observed

outcomes to build an algorithm to predict this outcome when it's not observed.

So some examples of supervised learning algorithms include random forests,

boosting, support vector machines.

I'll try and give you a similarly old,

maybe in fact, older version of supervised learning.

Regression.

And I give a picture of my regression book

which is free on Leanpub, which you're more than welcome to download.

But the reason I actually put the book up here is because of the picture

on the cover.

And I like this picture quite a bit because it was taken from Francis Galton's

original paper where he developed regression.

In this paper he was trying to predict the height

of sons from the height of the parents.

In some cases, some midpoint between the father and the mother's height.

In other cases just from the father's height.

But this is an example where we have an observed outcome, the son's height, and

then we have the predictor,

the father's height, and Francis Galton wanted to build up an algorithm.

So that, when you just knew the father's height and

say the mother was still pregnant.

Then you could try to predict what the son's height was.

So, this led to the development of what we think of now as linear regression.
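
As a rough sketch of that idea, the following fits a least squares line for a son's height given the father's height and then predicts for a new father. The heights are invented for illustration and are not Galton's data, and the helper names fit_line and predict are hypothetical.

```python
# Hypothetical heights in inches; made up for illustration, not Galton's data.
father_heights = [65.0, 66.5, 68.0, 69.0, 70.5, 72.0, 73.5]
son_heights = [66.8, 67.1, 68.4, 68.9, 70.2, 70.6, 72.1]


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted outcome for a predictor value whose outcome is not observed."""
    return intercept + slope * x_new


intercept, slope = fit_line(father_heights, son_heights)
predicted_son = predict(intercept, slope, 71.0)
print(f"slope = {slope:.3f}")
print(f"predicted son's height for a 71 inch father: {predicted_son:.1f}")
```

The observed sons' heights are used only to train the line; the prediction step needs just the father's height, which matches the supervised setup described above.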

3:08

However, modern prediction algorithms can take tens of thousands

of potential predictors to predict outcomes.

Now you need a lot of data to train up your algorithm, but

that's been some of the real advances in this area.

So in these cases, you would use a collection of outcomes,

and a large collection of predictors.

You would build up this algorithm.

3:35

And then you would be able to predict the outcome

in instances where you didn't have it.

So you might wanna predict stock prices in the future.

But doing that, you're gonna use historic stock pricing data

with a lot of predictors to try and build up your algorithm.

Okay? So that's machine learning in a nutshell.

I'd like to contrast it because it seems very different.

Many people are familiar with traditional statistics, but

they're maybe a little less familiar with machine learning.

So I'd like to contrast traditional statistics with machine learning.

So in my mind, in machine learning, the main emphasis,

at least let's focus on supervised learning.

It emphasizes predictions and

then it tries to evaluate performance via the prediction performance.

So, unlike traditional statistics, and we'll talk a little bit about

how traditional statistics evaluates performance.

4:33

So there's a lot of concern for overfitting in machine learning, but

there's not as much concern for model complexity.

So if you have a highly complex model that's not overfitting, and

yielding good predictions, then there's more of a tolerance for that in the field

of machine learning than there is in the field of traditional statistics.

And so, there's an emphasis in machine learning on performance.

And less of an emphasis on superpopulation models and

generalizability that occurs a lot in statistics.

So generalizability in machine learning tends to be obtained

by applying the algorithm on novel data sets where you know the outcome and

checking to see how good your predictions are.

Rather than on the modeling and

sampling assumptions that often occur in traditional statistics.
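
Here is a minimal sketch of checking generalizability on held-out data where the outcome is known. The data are synthetic (a linear relationship plus noise, chosen only for illustration), the 150/50 split is arbitrary, and the helper names are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic data (an assumption for illustration): y is linear in x plus noise.
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 2.0 * x + random.gauss(0, 1) for x in xs]

# Hold out the last 50 observations; the algorithm never sees them in training.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


intercept, slope = fit_line(train_x, train_y)
model_mse = mean_squared_error(test_y, [intercept + slope * x for x in test_x])

# Baseline that ignores the predictor: always guess the training mean.
train_mean = sum(train_y) / len(train_y)
baseline_mse = mean_squared_error(test_y, [train_mean] * len(test_y))

print(f"held-out MSE, fitted line: {model_mse:.2f}")
print(f"held-out MSE, mean-only baseline: {baseline_mse:.2f}")
```

The algorithm is judged only by how well it predicts outcomes it was not trained on, with no appeal to a sampling model for the data.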

And there's of course in machine learning a concern over performance and robustness.

So in traditional statistical analysis, let's contrast that now, this tends

to emphasize not so much predictions, even if it's doing predictions, but

emphasizes predictions or models as they relate to some superpopulation.

You have a sample and you wanna generalize it to some superpopulation

5:44

that the sample was drawn from.

So, there's less of an emphasis on sampling assumptions in

machine learning.

Traditional statistics tends to focus on a-priori hypotheses,

where things like unsupervised learning tend to try to generate the hypotheses.

Right?

The G factor generated this idea that there was intrinsic

variability in intelligence.

6:07

Traditional statistics tends to focus on simpler models over complex ones,

and tends to put a higher penalty on complexity than a machine learning

algorithm does.

In fact, the idea of a model seems already simpler than the idea of an algorithm.

Right?

Just the words themselves, when I give you the word algorithm, it

conjures up an image of something that's far more complex than the idea of a model,

the idea of a model is a simplified version of something that's complicated.

So there's a lot of emphasis in traditional statistics

on parameter interpretability.

And then an emphasis on the modeling and assumptions that go in to connect your

data to the population you're trying to draw inferences on.

And just like machine learning, there's concern over assumptions and robustness.

So those are some broad distinctions between machine learning and statistics,

though of course there's a lot of overlap.

7:02

Let's just give you some examples of problems that occur

where you could approach them from both a statistics perspective and

a machine learning perspective, and talk about them.

One of the most famous, recent machine learning exercises was the Netflix prize.

And here the goal was to predict movie choices from a large

collection of instances where users rated movies.

So you had the outcome data and you had a lot of data on their viewing history and

other things that might help you perform that prediction.

So machine learning would build an automated movie recommender system.

And success would be defined as anything that produces reliable predictions.

7:46

Statistical analysis on the other hand, would try to build a parsimonious and

interpretable model to better understand why people choose the movies that they do,

so you'd want something that was interpretable.

You'd want to understand uh-huh, this is the reason why this prediction works,

it's because of the psychology.

People have a tendency to like this kind of movie if they like this kind of movie,

whereas an algorithm can tend to have a lot more complexity built in and

may sacrifice some interpretability.

8:16

Another example that I was engaged in was the Heritage Health Prize.

In the Heritage Health Prize, we wanted to identify

the number of days that patients would spend in the hospital in subsequent years.

Given their prior year's hospitalization rates and a large collection

of their insurance claims data that led to their hospitalization and

whatever other insurance claims they have.

And in this case if you are doing a machine learning exercise,

which is how we approach the problem, we wanted to build an automated system for

predicting hospital stays from previous claims.

And all we want,

success is anything that yields reliable predictions for the next year.

So when we predict, for

a person in the next year, how long we think they're gonna be in the hospital,

If that's a number greater than zero,

we might want to do some sort of intervention.

In statistical analysis, the goal would be to build a parsimonious and

interpretable model to better understand why people stay in the hospital longer.

So success would be anything true that's learned about hospital stays,

whether or not it gives good predictions.

Okay, in statistical analysis, you can have, for example, a great example of a

9:29

statistical effect that would yield no significant prediction is, take for

example if a drug is shown to have a very small but positive effect for

reducing the symptoms of Alzheimer's disease.

That would be actually a huge success for the medical field.

Say the effect is very minor, but

statistically significant.

That would be a huge effect,

that would be a landmark study in the field of Alzheimer's disease.

But if it was a really minor effect knowledge of whether or

not someone was taking that drug wouldn't lead to a good prediction of

their Alzheimer's disease symptoms.

Okay?

It may be that something like their age,

their family history of Alzheimer's disease, and other factors may be a better

predictor of the severity, of the likely severity of their disease than whether or

not they're taking this drug.

So that is an instance where a predictor that is statistically significant and

important in a statistical model may not be

something that would be important in a machine learning algorithm.
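
A small simulation can illustrate the gap between significance and predictive value. Everything here is assumed for illustration: the symptom scores are simulated, the effect size of 0.1 standard deviations and the group size of 10,000 are arbitrary choices, and no real drug or study is represented.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

n_per_group = 10_000
true_effect = -0.1  # hypothetical: drug lowers a symptom score by 0.1 SD

# Simulated symptom scores; lower is better.
control = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
treated = [random.gauss(true_effect, 1.0) for _ in range(n_per_group)]

mean_control = statistics.mean(control)
mean_treated = statistics.mean(treated)

# Two-sample t statistic for the difference in group means.
std_error = math.sqrt(
    statistics.variance(treated) / n_per_group
    + statistics.variance(control) / n_per_group
)
t_statistic = (mean_treated - mean_control) / std_error

# R^2 when each person's score is predicted by their group mean:
# the share of variation in symptoms explained by knowing who took the drug.
all_scores = control + treated
grand_mean = statistics.mean(all_scores)
total_ss = sum((s - grand_mean) ** 2 for s in all_scores)
residual_ss = sum((s - mean_control) ** 2 for s in control) + sum(
    (s - mean_treated) ** 2 for s in treated
)
r_squared = 1.0 - residual_ss / total_ss

print(f"t statistic: {t_statistic:.2f}")
print(f"R^2 from treatment status alone: {r_squared:.4f}")
```

With groups this large, the t statistic is far from zero, so the effect is clearly detectable, yet treatment status explains well under one percent of the variation in scores and so is nearly useless for predicting an individual's symptoms.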

So I just wanna emphasize that there's a big difference between these two

approaches even though there's a lot of overlap.

And I think the biggest difference is just in how you're thinking about the problem

and what you're concerned with.

10:52

The last example I'd like to give is kind of a relatively

famous one which is Google Flu Trends.

In this, the very clever people at Google tried to come up with a way to predict

flu cases based on people's search history.

And try to predict outbreaks.

So in a particular area where a lot of an ISP's traffic is

relating to searches on Tamiflu that might suggest an outbreak in that area.

So success for an algorithm in this case would be anything that produces reliable

predictions.

And they had, for example the CDC data, the historical CDC data to

build up the predictions to try and predict flu outbreaks in the future.

I'm not so sure how this has held up, but

nonetheless that's how you would approach this as a machine learning algorithm.

This is a very clever idea I think.

Statistical analysis on the other hand, would instead try to approach the problem

of trying to learn what predicts flu outbreaks,

and anything true that was learned about that would count.

Regardless of whether or

not it dramatically improved our ability to predict.

So, the goal would be to build a parsimonious and interpretable model to

better understand the outbreaks rather than to just get prediction performance.

So if you built a model that was simpler and led to better

understanding of what was going on but didn't lead to any good predictions,

that would be a beneficial outcome in statistical analysis.

12:20

So some lessons learned are that both approaches are extremely valuable and

they have their place.

And the amount of tolerable model and

algorithm complexity changes dramatically between the approaches.

And their goals are often very different.

However, I would add this caveat that there's a fair amount of work

in making machine learning more interpretable.

And a fair amount of work in making traditional statistical approaches

have better prediction.

So it does seem like both fields are working towards some common areas

in the middle.

In the next lecture,

I'm just gonna give you some examples of further reading that you can go into for

contrasting traditional statistics versus machine learning.

So, thank you for listening, and I'll see you in the next lecture.