0:00

One of the most common applications of Bayesian networks, or rather one of the earliest that is still very much in use today, is diagnosis. And by diagnosis I mean both medical as well as fault diagnosis. This dates back to the early 90s and the PhD thesis of David Heckerman, which won the ACM Doctoral Dissertation Award, on a system called Pathfinder, which looked at a range of different pieces of evidence in order to help a doctor diagnose a set of diseases. Specifically, it was focused, initially at least, on lymph node pathology: about 60 different diseases and all sorts of different symptoms.

And they tried out a bunch of different methods for solving this problem. The first one they actually tried, way back in the early days of artificial intelligence before Bayesian networks were in common use, was a rule-based system, and it didn't work very well. The second version of Pathfinder used the naive Bayes model, which assumes that all of the symptoms are independent given the disease, and even that really simple model got superior performance to the rule-based system they had initially tried. Pathfinder 3 still used naive Bayes, but naive Bayes with better knowledge engineering: that is, they actually understood some of the issues behind what makes a system like this work well, and they fixed them.
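The naive Bayes computation just described can be sketched in a few lines: the posterior over diseases is proportional to the prior times the product of the per-finding likelihoods. All disease names, finding names, and numbers below are made up for illustration; none of them come from Pathfinder.

```python
# Naive Bayes diagnosis: P(disease | findings) is proportional to
# P(disease) * product over observed findings f of P(f | disease).
# All names and probabilities here are invented for illustration.

priors = {"disease_A": 0.3, "disease_B": 0.7}

# P(finding = present | disease)
likelihoods = {
    "disease_A": {"fever": 0.8, "rash": 0.1},
    "disease_B": {"fever": 0.4, "rash": 0.5},
}

def diagnose(observed):
    """Posterior over diseases given a dict {finding: True/False}."""
    scores = {}
    for d, prior in priors.items():
        p = prior
        for f, present in observed.items():
            pf = likelihoods[d][f]
            p *= pf if present else (1.0 - pf)
        scores[d] = p
    total = sum(scores.values())          # normalize
    return {d: s / total for d, s in scores.items()}

posterior = diagnose({"fever": True, "rash": False})
```

Note how the "symptoms independent given the disease" assumption shows up directly as a single product over findings.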

So specifically, one of the things that turns out to be really fundamental for the performance of any probabilistic modeling system is never to put in zero probabilities, except for things that are definitionally impossible, because once you put in a zero, no matter how much evidence to the contrary you have, you will never be able to get rid of it: anything times zero is still zero. In the initial Pathfinder tool they put in some incorrect zero probabilities for things that were very unlikely, but not impossible, and it turns out that gave rise to about 10% of the system's incorrect diagnoses.

They also did better calibration of the conditional probabilities, which turns out to be important for the knowledge engineering of a Bayesian network. For example, it turns out that it's a lot easier for a physician to compare the probability of a finding, a piece of evidence, between two diseases than to compare the probabilities of two different findings within a single disease. It's much easier to say, oh, this is much more likely in this context than in that one. And it turns out that when they asked the physicians to calibrate this way, they got much better estimates of the probabilities.

Mind you, this was way before they had learning, so it was all hand-constructed.
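The zero-probability point can be seen directly in the arithmetic of Bayes' rule: one hard zero in a CPD makes the posterior for that disease exactly zero, no matter how strongly every other finding points to it. A toy illustration (invented numbers, not Pathfinder's):

```python
# A disease whose CPD wrongly assigns zero probability to one finding
# can never be diagnosed once that finding is observed, no matter how
# strongly every other finding supports it.

priors = {"disease_A": 0.5, "disease_B": 0.5}
likelihoods = {
    # disease_A explains findings f1..f3 almost perfectly, but someone
    # entered a hard zero for f4 ("very unlikely" is not "impossible").
    "disease_A": {"f1": 0.99, "f2": 0.99, "f3": 0.99, "f4": 0.0},
    "disease_B": {"f1": 0.01, "f2": 0.01, "f3": 0.01, "f4": 0.2},
}

def posterior(observed_present):
    scores = {}
    for d, prior in priors.items():
        p = prior
        for f in observed_present:
            p *= likelihoods[d][f]
        scores[d] = p
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}

# All four findings present: A's posterior is exactly 0 because of the
# single zero, even though f1..f3 each favour A by a factor of 99.
post = posterior(["f1", "f2", "f3", "f4"])

# Replace the zero with a small but nonzero value and A dominates again.
likelihoods["disease_A"]["f4"] = 0.01
post2 = posterior(["f1", "f2", "f3", "f4"])
```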

2:46

And then finally, Pathfinder 4 was the full Bayesian network in all of its full glory. It no longer made incorrect assumptions about independencies between, say, different symptoms given the disease. That both allowed them to make the model more correct and, it turns out, had an unexpected side effect: by allowing, say, a symptom variable to have more parents than just a single disease variable, it actually gave rise to considerably more accurate estimation of the probabilities, because the doctor could think about the different cases separately and didn't have to average them all out in his or her head.

3:28

And this is one of the, I think, really compelling aspects of Bayesian network models: the Bayesian network model actually turned out to agree with the experts, an expert panel of physicians, in 50 out of the 53 cases. And these were hard cases, ones where you really needed the experts' opinion; they weren't cases that just an average doctor could necessarily diagnose correctly. This is as compared to 47 out of 53 for the naive Bayes model, and significantly less for the rule-based system.

Mind you, and this is an interesting and important aspect, the Bayesian network actually outperformed the physician who designed the model. And I mean, it didn't just outperform the average expert by a little bit; it outperformed the physician who designed it, because it was better at putting together all these different numbers: a doctor just can't fit all of these different findings into his or her brain at the same time.

4:34

So we talked about the CPCS network; it's one of my favorite networks because it's kind of big and hairy and sort of scary to look at. But anyway, the actual number of variables in this network is about 500, and each of them has, on average, about four values. So the total number of parameters, if you were to specify a full joint distribution, is 4 to the 500, which is 2 to the power of 1,000, which is more than the number of atoms in the universe. So obviously one couldn't specify this as a complete joint distribution. Not to mention that the probability of each and every one of these assignments is as close to zero as makes no difference, because it's the probability of an event involving 500 variables. If you were to actually construct a table CPD for each of these variables, the number of parameters would be about 133 million, which is considerably better than 2 to the 1,000, but still much too large. And so it turns out that they made additional simplifying assumptions, which we'll talk about later on, that allowed them to avoid a complete table representation of the CPDs and use a more compact one instead, and that gave rise to about 1,000 parameters.
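The counting behind these numbers can be checked with a few lines of arithmetic. The sizes are the round figures from above (500 variables, four values each); the 133 million figure depends on the actual network's in-degrees, so the per-variable formula is only sketched here under an assumed parent count:

```python
# Parameter counting for a network of n variables, v values each.
n, v = 500, 4

# Full joint distribution: one parameter per joint assignment.
full_joint = v ** n                 # 4^500, which equals 2^1000
atoms_in_universe = 10 ** 80        # rough conventional estimate

# Table CPDs: a variable with k parents needs v^(k+1) table entries,
# so the total grows exponentially in the largest in-degree.
def table_cpd_params(num_parents, values=v):
    return values ** (num_parents + 1)

# e.g. one variable with 9 four-valued parents already needs
# 4^10 = 1,048,576 entries.
one_big_cpd = table_cpd_params(9)

# Compact CPDs (e.g. the structured forms covered later) need roughly
# one parameter per parent, so the total is about linear in the edges.
def compact_cpd_params(num_parents):
    return num_parents + 1
```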

6:15

So we already talked about the fact that these medical diagnosis systems emerged from research. Microsoft built a medical diagnosis system, and various other people have built them as well. This has been a little bit slow on the uptake in the medical field, because it doesn't fit naturally into a physician's pipeline. Maybe now, with the advent of electronic health records, there will be more data entered into the computers, and these systems will come into more common use. But until very recently, most doctors just wrote stuff down on paper, and so it was very difficult to put this into the standard production pipeline for diagnosis.

And then, finally, fault diagnosis has been a much more direct application of these systems, because here we do not have the issue of how doctors fit statistical tools into their diagnostic pipeline. Within the Windows operating system there are thousands of these little troubleshooters that help you diagnose problems with your printer, with Excel, with your email, and each of these has a little Bayesian network inside that answers probability questions given observations about the model involved, for example the printer. And there's also a big website out there that does car repair: you put in the make, model, and year of the car and what the main problems with it are, and it figures it out and tells you what to look at and what the most likely complaint is.

And the reason behind the benefits of this: people don't use Bayesian networks here just because Bayesian networks are cool, even though they are. They use them because they provide a very flexible user interface for the user. You instantiate the evidence in the Bayesian network, and out comes a probability. You don't want to answer a question right now? That's okay, you can answer it later; it just means that's an observation you didn't get to condition on.
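That "answer it later" flexibility is just conditioning on whatever subset of variables happens to be observed, summing out the rest. A tiny, entirely hypothetical printer troubleshooter (not the actual Windows model) shows the idea by brute-force enumeration of the joint:

```python
from itertools import product

# Tiny invented network: Fault -> PrintsBlank, Fault -> ErrorLight.
# All probabilities are made up for illustration.
p_fault = 0.1
p_blank = {True: 0.9, False: 0.05}   # P(prints_blank | fault)
p_light = {True: 0.7, False: 0.02}   # P(error_light | fault)

def joint(fault, blank, light):
    p = p_fault if fault else 1 - p_fault
    p *= p_blank[fault] if blank else 1 - p_blank[fault]
    p *= p_light[fault] if light else 1 - p_light[fault]
    return p

def p_fault_given(evidence):
    """P(fault | evidence), where evidence maps a subset of
    {'blank', 'light'} to True/False. Questions the user skipped are
    simply left out of the dict and summed over."""
    num = den = 0.0
    for fault, blank, light in product([True, False], repeat=3):
        assign = {"blank": blank, "light": light}
        if any(assign[k] != want for k, want in evidence.items()):
            continue                      # inconsistent with evidence
        p = joint(fault, blank, light)
        den += p
        if fault:
            num += p
    return num / den

# Answering both questions gives a sharper posterior than answering one;
# skipping a question just means conditioning on less evidence.
both = p_fault_given({"blank": True, "light": True})
one = p_fault_given({"blank": True})
```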

And then, for the designer, this type of system is really easy to design and maintain. If, for example, something changes a little bit in your printer's structure, and you had designed a standard menu-based system, you would have to go and rebuild the entire tree that decides what the first question to ask is, what the second question is, and what the most likely diagnosis is. Here, in the Bayesian network, you change one probability, maybe add an edge, and everything just emerges from that in a very straightforward way. So it's much more modular and more maintainable than a hard-wired menu-based system. And that's what the people who use these systems will tell you: that's why they chose this path as opposed to the hard-wired methodology.