
Course from University of California San Diego

Biology Meets Programming: Bioinformatics for Beginners

From the lesson:

Week 4

Which DNA Patterns Play the Role of Molecular Clocks? (Part 2)

- Pavel Pevzner, Professor, Department of Computer Science and Engineering
- Phillip Compeau, Visiting Researcher, Department of Computer Science & Engineering

I now introduce Gibbs sampling, another randomized algorithm for motif finding.

Gibbs sampling works somewhat similarly to randomized motif search.

The biggest difference is that randomized

motif search is a rather reckless algorithm.

It can actually change all of the chosen motifs at every iteration, depending on which k-mers are Profile-most probable.

This allows it to move very quickly through the space of solutions, but there is a danger: you may have already come close to the correct motif, and then jump away from it, because randomized motif search moves so rapidly through the space of all possible motifs.

Gibbs Sampler is more cautious.

It actually changes only one of the chosen motifs at every step.

This slide shows the key difference between

Randomized Motif Search and Gibbs Sampling.

Instead of showing you the pseudocode for Gibbs Sampler,

let me first show you how it works.

Let's first randomly select motifs shown

in bold in this slide. In the set of Dna strings, as before,

blue elements show the implanted motifs, and bold elements show the motifs that we randomly selected.

After we've done it, let's do something strange.

Let's actually remove one of the found motifs

along with the sequence it belongs to.

In our example, we remove the third motif,

and remove the third Dna sequence from consideration.

We'll consider

the resulting set of just four motifs, and build the Count matrix based on this set.

And then afterwards, we construct the Profile matrix based on this Count matrix. After we have constructed the Profile matrix, we calculate the probability of every k-mer in the deleted string. Afterwards, we will do the following,

again relying heavily on rolling a die. There are seven probabilities we computed, and we will roll a seven-sided die according to them. It will be a heavily loaded die, with the probability of each side proportional to the corresponding probability we computed.

After we toss this die, we choose a new starting position for the motif in the deleted sequence, substitute the corresponding k-mer back into the set of motifs, and proceed in this way.

So that's how it works. And then, of course, it iterates as long as the motifs keep improving, or for a certain number of iterations (let's say 100,000 iterations).
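The single step just walked through (building a profile from the remaining motifs, computing every k-mer's probability in the deleted sequence, and rolling a loaded die over those probabilities) can be sketched in Python. The function names here are my own, a minimal sketch rather than the course's actual code:

```python
import random

def profile_from_motifs(motifs):
    """Profile matrix: column-wise nucleotide frequencies of the motifs."""
    k = len(motifs[0])
    profile = {nt: [0.0] * k for nt in "ACGT"}
    for motif in motifs:
        for j, nt in enumerate(motif):
            profile[nt][j] += 1 / len(motifs)
    return profile

def kmer_probability(kmer, profile):
    """Probability that the profile generates this k-mer."""
    p = 1.0
    for j, nt in enumerate(kmer):
        p *= profile[nt][j]
    return p

def gibbs_step(deleted_seq, other_motifs, k):
    """One Gibbs step: profile from the remaining motifs, then a loaded-die
    roll over all k-mers of the deleted sequence."""
    profile = profile_from_motifs(other_motifs)
    kmers = [deleted_seq[i:i + k] for i in range(len(deleted_seq) - k + 1)]
    weights = [kmer_probability(km, profile) for km in kmers]
    # random.choices normalizes the weights, i.e. it rolls the loaded die.
    # Note: if every weight were zero, this call would fail -- the
    # zero-probability problem discussed later in the lecture.
    return random.choices(kmers, weights=weights, k=1)[0]
```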

So, here is the outline of Gibbs Sampler. We randomly select a

k-mer in each sequence, as before.

We randomly choose one of the selected k-mers, the one belonging to a sequence we will call Sequence*, and remove it from the set of motifs.

We create a profile from the remaining motifs.

For each k-mer starting at position i in Sequence*, we compute its probability of being generated by the profile, resulting in n-k+1 probabilities.

Afterwards, we generate a die with n-k+1 sides whose probabilities are proportional to the probabilities we computed.

We roll this die and, depending on the outcome, we return the k-mer starting at the chosen position i to the removed sequence as its new motif. And we iterate.
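The outline above can be assembled into a minimal runnable sketch. The names (`score`, `gibbs_sampler`) and the uniform fallback for an all-zero die are my own choices, not the course's code; the course addresses the all-zero case differently, as discussed later:

```python
import random

def score(motifs):
    """Number of mismatches against the column-wise consensus (lower is better)."""
    k = len(motifs[0])
    total = 0
    for j in range(k):
        column = [motif[j] for motif in motifs]
        total += len(column) - max(column.count(nt) for nt in "ACGT")
    return total

def profile(motifs):
    """Profile matrix: column-wise nucleotide frequencies of the motifs."""
    k = len(motifs[0])
    prof = {nt: [0.0] * k for nt in "ACGT"}
    for motif in motifs:
        for j, nt in enumerate(motif):
            prof[nt][j] += 1 / len(motifs)
    return prof

def gibbs_sampler(dna, k, n_iterations):
    """Gibbs Sampler: start from random k-mers, then repeatedly re-sample
    one motif at a time from a loaded die."""
    t = len(dna)
    motifs = [seq[s:s + k] for seq, s in
              ((seq, random.randrange(len(seq) - k + 1)) for seq in dna)]
    best = list(motifs)
    for _ in range(n_iterations):
        star = random.randrange(t)                  # pick Sequence* at random
        others = motifs[:star] + motifs[star + 1:]  # remove its motif
        prof = profile(others)
        seq = dna[star]
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        weights = []
        for km in kmers:
            p = 1.0
            for j, nt in enumerate(km):
                p *= prof[nt][j]
            weights.append(p)
        if sum(weights) == 0:                       # all-zero die: fall back to
            weights = [1.0] * len(kmers)            # a uniform roll (my choice)
        motifs[star] = random.choices(kmers, weights=weights, k=1)[0]
        if score(motifs) < score(best):
            best = list(motifs)
    return best
```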

So let's once again look at how Gibbs Sampler works.

Let's assume that we've done the first stages and we're about to compute the probabilities of every k-mer in the sequence.

I am still considering the same example I considered before.

We generated

our seven probabilities, but now let's pay attention to what these probabilities are. And it turns out that six out of the seven probabilities are actually zero.

Should we roll a one-sided die?

Four centuries ago, Oliver Cromwell warned us against rolling

one-sided dice.

He said it in his famous appeal to the Church of Scotland, trying to convince them that their alliance with the king was an error.

He said: "I beseech you, in the bowels of

Christ, think it possible that you may be mistaken."

Cromwell's rule in statistics implies that the use of probabilities 0 and 1 should be avoided, unless we are talking about purely logical statements.

Thus, we should actually assign a small probability even to extremely unlikely events.

For example, we should assign a small probability to the event that the sun won't rise tomorrow.
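In the motif-finding setting, Cromwell's rule is usually honored with pseudocounts (Laplace's rule of succession): add 1 to every entry of the Count matrix before forming the profile, so that no k-mer ever receives probability exactly zero. A sketch, with a function name of my own choosing:

```python
def profile_with_pseudocounts(motifs):
    """Profile matrix built from counts with 1 added to every entry
    (Laplace pseudocounts), so no nucleotide ever has probability zero."""
    k = len(motifs[0])
    counts = {nt: [1] * k for nt in "ACGT"}   # every count starts at 1
    for motif in motifs:
        for j, nt in enumerate(motif):
            counts[nt][j] += 1
    total = len(motifs) + 4                   # each column now sums to t + 4
    return {nt: [c / total for c in counts[nt]] for nt in "ACGT"}
```

With this profile, the loaded die never becomes one-sided: every k-mer keeps a small but nonzero chance of being chosen.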