我们将学会对 DNA 测序数据进行分析的计算方法——算法和数据结构。我们将学习一点DNA、基因组、以及 DNA 测序的应用的相关知识。我们将使用 Python 实现关键算法和数据结构，并分析实际的基因组和 DNA 测序数据集。

Loading...

来自 Johns Hopkins University 的课程

DNA 测序算法

391 个评分

我们将学会对 DNA 测序数据进行分析的计算方法——算法和数据结构。我们将学习一点DNA、基因组、以及 DNA 测序的应用的相关知识。我们将使用 Python 实现关键算法和数据结构，并分析实际的基因组和 DNA 测序数据集。

从本节课中

Algorithms for assembly

In the last module we began our discussion of the assembly problem and we saw a couple basic principles behind it. In this module, we'll learn a few ways to solve the alignment problem.

- Ben Langmead, PhDAssistant Professor

Computer Science - Jacob Pritt

Department of Computer Science

In this practical, we're going to construct a de Bruijn graph from a string.

So, I'm going to create a function here, de_bruijn_ize,

which will take a string from which I construct a graph and

a k-mer K that we want to use.

And so, if you recall from lecture what we want to do is look at each k-mer in

this string, take the two k minus 1-mers in that k-mer, and create an edge

between those two k minus 1-mers, and create nodes for them if they don't exist.

So I'm going to create my list of edges and

a set of nodes that will represent this graph, and

I'm going to loop through every k-mer in the stream.

So i is going to range from zero up to the last position where a k-mer could start

and so for this k-mer now, I want to add to edges.

I'm going to represent this edge as a tuple,

so it will be the k minus 1-mer starting at i and

going up to i + k -1.

And then the other k-mer will be starting at i + 1 and going up to i + k.

>> So if the left and the right k- 1 works.

>> Mm-hm.

And now we just have to add both of these to our set of nodes.

So I add the first k minus 1-mer.

And the second k minus 1-mer.

>> Because you're using a Python set.

If you call add more than one time with the same string, it'll only add it once.

>> Right.

And then we can just return our set of nodes and edges.

So this is our de_bruijn_ize function.

So let's run this on a simple DNA sequence.

I'll use a k-mer length of 3.

There you go.

Now let's print out our set of nodes.

>> Okay, we've got AC.

We've got CG.

We've got GC.

You got GT.

Looks like we got all of them.

>> Yeah, and so since our kmer length is 3, every k minus 1-mer in this string

should appear in our set of nodes, and it looks like they're all there.

And now we can print out our set of edges, and

so we have an edge from AC to CG, that's from the first kmer.

CG to GC is the second k-mer, and so on for each k-mer in that.

>> Yep. In fact those edges in that

order are the Eulerian walk for this de Bruijn graph.

>> Right.

So, I'm going to paste in here a function that we'll use to visualize this graph.

All this function does is you can see the first line, it calls our function

de_bruijn_ize, and then it prints out the nodes and edges in a special string

that we can pass the dot, so that we can visualize this with the dot.

>> So now I'm going to load a gvmagic.

>> So, this is a special IPython plugin that lets us draw graphs basically.

>> And then I'll call dot with our function,

visualize_de_bruijn, and

I'll just paste in the stringing k-mer from above.

And here's our graph.

So you can see, we start with AC, and this goes to CG for the first k-mer.

Second k-mer, we start with CG and then go to GC, and back to CG,

and then to GT and then TC, and then CG and then we're done.

>> Yep. So you can easily see

the Eulerian walk when you print out this nice small graph.

>> Yep