0:00

Hi, in this set of lectures we are talking about networks. In particular, we are

focusing on three things, we're talking about the logic of networks, which is how

they form, going to talk about the structure of networks, which is how

connected are nodes from one another, how far is it from one another to another node

and finally we're going to talk about the functions of network. What functionality

does a network have? Maybe one built-in, like sort of emerged from the structure.

Now in this lecture, we're gonna talk about the structure of the network. So

we're gonna define a bunch of terms, find where the network is in terms of nodes and

edges and we'll just define and describe a bunch of measures in network that help us.

For a better understanding of how networks differ from one another. So when you think

of a network, you can think of a bunch of things that we're gonna call nodes, which

are points, and then lines connecting those, and those are gonna be edges. So a

network is really just a set of nodes and a set of edges. Now those edges can be

either directed, and that means there's like a little arrow, or they can be

undirected where it's just a straight line. So an undirected node. The network

would be something like. [sound]. So an undirected network would be something like

the 50 United States. So each node would be a state, and then an edge would just be

whether those two states share a boundary with a common border. So there's no

direction there. If Ohio's connected to Michigan, then Michigan is connected to

Ohio. In a directed network, then that little arrow that you see, that means

that. One node points to another node, but it might not necessarily go in the other

direction. So if you take a college campus, and you look, you ask whether one

student looks to another student for fashion advice. It may very well be that

person A looks to person B, but person B doesn't look to A. So in a directed

network, you could have an arrow going in one direction, or you could have arrows

going in both directions. So a lot of social networks are directed. But many

networks, the networks were gonna focus on are gonna be undirected networks, just

because they're easier to work with. So then you've got structure we're gonna

focus on four measures: degree, which is how many edges each node has on edges,

path length, which is how far it is from each node to another node. Connectedness,

which is whether the entire graph is connected to itself. And then, finally,

clustering coefficient, which will give us some understanding of, like, how tightly

clustered are the, are the edges? So if A is connected to B, and B is connected to

C, is A connected to C? Okay, so let's get started. Degree of the easiest one. If you

take a node, the degree is just the number of edges connected to that node. So if a

node looks like this, its degree is one. If a node looks like this, it's degree is

four. So it's just how many, if you're a social network, it's how many friends do

you have? If you're a business, and you're looking at, and each edge represents

working [inaudible]. Another firm and it's a case sort of how many other firms do you

work with, so degree is just really a sense of how connected you are to other

parts of the node. The degree of a network is the average degree across all nodes. So

if I take the entire network and I add, count up the degree of every single node

and than average by the number of nodes, that's what we'll think of as the degree

of the network. So a high degree network will be one where lots of people are

connected to lots of other people. A low degree network will be one where there's

very few connections between people. So here's a network, right here. And if I

look at this green node, I can ask what is it's degree. And you see that it's

connected to one, two, three people. So the degree of the green node. Is equal to

three. Now, if I wanna figure out the degree of the entire graph you can

[inaudible] I'm gonna go through and count up each person. So, this person has a

degree of one. This person has a degree of one. This person has a degree of three.

And I can go through and count all the way up. Now, that seems like that takes a lot

of time. There should be a better way to. Figure out the average degree for a

network, and in fact there is. Cause all I have to do is count the nodes. So, in this

case, there's one, two, three, four, five, six, seven, eight, nine, ten nodes, and

then I can just count the number of edges. There's one, two, three, four, five, six,

seven, eight, nine edges. So, ten nodes, nine edges. Now each edge connects two

nodes. So, what I've gotta do is just multiply those edges by two. So if I have

two, two, two times the number of edges divided by the number of nodes, that'll

give me my average degree. So, in this case two times nine is eighteen, divided

by ten gives me an average degree of. 1.8, now when you think of moving along those

edges, you can call the people you're connected to, your neighbors. So if this

is a node here, and he's connected to these three people, then these three

people here are the neighbors of the node. And what we can think about then is.

People who are connected have more neighbors, and later on, we're gonna

extend this concept to neighbors from sort of your immediate neighbors, to neighbors

who are further away on the graph. Well, here's an interesting result. So we only

introduced one measure, which is degree. And now we're gonna get a really, sort of

fascinating result. And that is, the average degree of neighbors of nodes will

be at least as large as the average degree of the network. So, typically, it'll be

larger, but it has to be at least as large. What does this mean in less

technical terms? It mean, most people's friends are more popular than they are. So

literally, it means that. But if you look at your friends, they, on average, have

more friends than you have. But what does C have [inaudible]. So let's look at this

network. A. Has a degree of two. B has a degree of three. C has a degree of two.

And D has a degree of one. So if we add all that up, we get eight, divided by four

gives us two. Now we also could have done this by just counting up the edges which

are four, multiplying that by two, and dividing by the number of nodes, and that

would also give us two. So what we get is we get on average, people have two

friends. Well now we can ask what about friends of friends? Well, let's start with

D. D has one friend. And that's B, and B has three friends. Well let's, let's walk

through the whole thing and see how this works. Let's start, again we start with D,

they have one friend, which is B, who has three friends. C has two friends, A and B.

A has two friends, B has three friends, that's an average of 2.5. If we look at B,

B has three friends. A, which has two, C, which has two, D has one, that's 1.66. And

if we look at A, A has two friends, B and C and they also have an average of 2.5.

So, if I add all this up, what I end up getting is that this is five, eight, 9.66.

Divided by four, which is, you know, approximately 2.4. So, people's friends of

friends have 2.4 friends, whereas people on them, by, on themselves only have two

friends. What you see is your friends have more friends than you do. That's just a

fact about networks. Next measurement we want to get to. Path length. The path

lengths from A to B is just the minimal number of edges you have to traverse to

get from A, person A to person B, from node A to node B. So, I've gotta node here

and a node here, but there's no direct path. Get to those two, you have to take

one, two edges, and their path length is two. So, in this particular graph, here's

this green node. To get to this green node, I've gotta take one, two paths, two,

two edges. So the path length is two. We can also then compute the average path

length for an entire graph. So now you take all pairs of nodes, and figure out

the path length. So let's take this graph. Let's think of, there's, what are there?

There's A and B, there's A to C. A to D, B to C. B to D, and C to D. Those are the

different we could get. So, A and B have a distance of one, A and C have a distance

of one, A and D have a distance of one too. B and C have a distance of one. B and

D have a distance of one. And C and D have a distance of two. So when I add all this

up, I get one, two, four, five, six, eight, and there's six total. So that

means the average path length is one and one-third. That's a very short average

path length. In that particular graph, it was connected, which is our next measure.

A connected is just a graph where you can get from any node to any other by walking

along edges. So this would be, again, a picture of a connected graph, 'cause you

can get from any node to any other node just by walking across the edges. There's

also disconnected graphs. So here's a, a graph of, where there's six nodes and you

can't get from this set of nodes to this set of nodes. So it's disconnected. So in,

in a graph like this, it's interesting because if it's not connected, any

information that's over here with A, D, B, and E, might not get over to C and F. So

disconnected nodes, information's not gonna flow between them, which could be a

bad thing. Now if you think about disease, this could be a good thing because if

there's a new disease in this population, it's not gonna cross this barrier and get

over to this population. So depending on what you're concerned with, connectedness

could be good or bad. Here's an interesting assignment. We've talked about

mark-off processes a couple times. We can think of a mark-off process like a

network. In fact our pictures sort of look like networks. So you can think of there

being here's four states of a mark-off process and then I can just draw little

arrows showing the probability of movement from one state to another. Now this is a

directed graph, and you could have, you could be that it's close as you can get

from B to A with property 0.3 and we can just throw that in. So, one way to

represent a mark-off process is with a network because you can just write down

the states and then draw little directed arrows showing the probability from moving

from one. One state to the next. So again, you start seeing all of our models get

related to one another in interesting ways. Last measure, clustering

coefficient. This is gonna be the percentage of triples of nodes that have

edges between all three nodes. So if I think of three nodes A, B, and C, it could

be that they're connected like this, and then we don't have the full triple. Or it

could be, that A is connected to B. B's connected to C and C's connected to A. You

get the full triple so we can fill in the triangle. You can ask, what percentage in

the possible triangles can you fill in? Let's go back and think of a simple graph.

So here's a graph, and you can ask, as I start drawing lines, what happens to the

clustering coefficient? So if I draw these three lines, I can ask, how many triangles

could there possibly be? And there could be four. There could be this triangle one,

this triangle two, this triangle three, and this triangle four. But what we see

when we look at this graph is that none of those triangles are filled in. So this is

gonna have a clustering coefficient of zero. What about this graph? Same number

of edges but now we've got one triangle to fill in so this is going to have a

clustering coefficient of one over four. What about this graph? I've got one

triangle here, and one triangle here, so this is going to have a clustering

coefficient of two over four. And then finally, if I have a completely connected

graph, I've got, this triangle's filled in, this triangle's filled in, this

triangle's filled in, this triangle's filled in, so I have four over four, which

is a clustering coefficient of one. So, if you. Think about that, alright? You can

ask all sorts of really interesting questions. So here's a simple quiz

question you might ask. Draw two Connected graphs each that has six nodes and six

edges, but have this different cluster coefficients. Now the reason we ask an

exercise like this is to get across the idea that, number of edges sort of tell

you how connected the graph is but it doesn't tell you how clustered it is. So

it's possible to have, two different graphs with the same numbers of nodes same

number of edges but different clustering coefficients. So here is graph one, and if

we look at it there's only one triangle filled in. So this clustering coefficient

is gonna be one. Over the number of triangles. Well how many triangles are

that we get. One, two, three, four, six nodes and we ask a triangle consists of

three. That means there's six nodes that we can pick first, five we can pick

second, four we can pick third, but the order doesn't matter. Three times two

times one doesn't matter. So that means we get twenty triangles. So this [inaudible]

one over twenty. Well here's another graph with six nodes and six edges but none of

the triangles [inaudible]. So this is going to have a coefficient of zero over

twenty. So two graphs. Same number of nodes, same number of edges, different

clustering coefficient. So what have we learned? We've learned that there's

structure to graphs. We could measure degree which is on average how many nodes

is another node connected to. We could talk about path length, which is how far

is it from one node to another node. We could talk about connectedness, is the

whole graph connected. And we can talk about clustering coefficient, which is how

many triangles, of the possible triangles, how many of those are filled in. Now these

are statistical measures. So let's talk about, why would we care about these? Why

might these be important? Well, think about degree, what is it telling you? It's

telling you the density of connections, in a way. It's telling you just how connected

is this graph? So you can think of this as some sort of proxy, possibly, for social

capital. So if you looked at a community, and there were very few connections

between people, then not many people know other people. So remember we talked about

Bob Solo saying social capital can't be just this abstract thing. We need some

measures. One measure could be, how connected are people to one another?

What's the average degree of the social network? You can also think about speed of

diffusion. If there's a piece of information up there, how fast is it going

to spread in this network? Well, one thing that'll determine that is how connected

people are. How many nodes people are connected to. So, somebody's connected to

100 people and you tell them something and they tell all 100, it's gonna spread

really fast. If I'm only connected to two people and you tell me something, then

that information is gonna spread fairly slowly. What about path length? Why do we

care about path length graph? Well think of an airline graph. It's gonna tell you

the number of flights your gonna need on average. So if you wanted to fly from

Miami to L.A. And the average path length was 1.1 for the airline network graph,

you'd know that you?re probably gonna get a direct flight. If the average path

length was seven then you'd know oh my goodness this is gonna take me a lot of

flights to get from point a to point b. It also tells you social distance. So suppose

you run an organization, and you've got a network. And you know that the average

path link between employees is six. That means randomly picking two employees, it's

gonna take six other employees, or five other employees to get from employee A to

employee B. That means that information knowledge, ideas, are not gonna spread

very fast within your organization. And you may wanna try and figure out ways to

reduce that path link. And last, the likelihood of information spreading. If

the path length is really long, it's just not that likely that information is gonna

make it nine steps, ten steps, eleven steps, if people are that disconnected.

What about connectedness, our next thing? Well, connectedness for Markov processes,

it matters. [inaudible] Markov process, we had to be able to get from any state to

any other state. If the nodes are the states, and it's not connected, then we

can't apply the Markov convergence theorem, because that was one of our

assumptions. You can get from any state to any other. If you think about, the

abilities of a terrorist group. You might still think all these people are really

upset, with, the world in some way. And, and that sort of a terrorist organization

is people who wanna do, curiosum tasks that [inaudible] we don't want them to

carry out. If their network is not connected. Then it's going to be difficult

for them to carry out these activities. If it is connected, then they can pass

information and can possibly carry things out. So things like path-link degree in

connectedness are going to determine the functionalities of organizations that do

good as well as organizations that do bad. Now, also connecting [inaudible] measures

if you think about things like the internet or power failures, the electric

grid. If the electric [inaudible] becomes disconnected, you disconnect from the grid

and you don't have any power. And finally, living in terms of information, if people

are disconnected from other people there gonna be isolated in terms of information.

So they may not learn things, they may not figure things out. And then finally the

clustering coefficient. Class stream coefficient. Can be used to figure out how

redundant or how robust a network is. So if you pulled one person out. So think

about it, if you've got a little triangle here between A, B, and C. If A gets wiped

out for some reason, B and C are still connected. So, it gives you a sense of how

robust or how redundant the network is. This is also sometimes used as a measure

of social capital, cause it means there's these, these tight links between people.

And one last really interesting thing about The custom co-efficient can be used

to capture how likely an innovation is to be adopted, in particular possibly a new

word. So if I've got three people that are all connected, A, B and C. And A uses some

new word. And B hears it and then tells it to C. And then C uses the same new word,

and A hears it. You get this sort of autocatalytic set where the information

get passed through the loop, and it now seems part of the new normal. A says,

well, I said this, now I heard C say it, so this must be something that people say.

Whereas, if A wasn't connected to C, when A said something new, it could just sort

of fan out, and it may never come back to A. And so A may be less likely to maintain

the innovation. So these clusters [inaudible] creating feedbacks can allow

things to be maintained within a network. One last thing. [laugh] And I've used all

these measures there's still a sense that networks that a picture is worth a

thousand words. And one reason I think for the rise of networks so much is because of

the fact that we now can graph these networks in really interesting ways. So

let me show you the power of pictures. Here are three graphs. The first one is

the 50 US states, and there's an edge, those are the nodes. And there's an edge

if they're adjacent. The second one is a group of 48 universities. And then it

shows, there's a, an edge between them if they played football against each other,

[inaudible] American universities. And then the last one is an airline, so each

node is an airport. And [inaudible], a network edge is just gonna be whether the

airline flies from that one network to another one. Here are the statistics on

those three graphs. So graph one has a degree of 10.8, a path length of four.

Graph two has a degree of nine, a path length of six. Graph three has a degree of

ten and a pass length of four. So they're all about the same. So look at these

statistics. There's not a huge difference between the three networks. Let me show

you the pictures. Here's graph one. What do you think this is? Well, this is

clearly an airline network. There's a couple hubs here, and there's all these

cities that the airline flies into. Here's graph two, what is this? Well, this is the

50 United States. Up here are Alaska and Hawaii, who are disconnected. And it

looks, in fact, a lot like the United States. And then here is graph three.

These are the football conferences, one, two, and three. And what you can see is

the teams all playing each other within conference, with an occasional game

outside the conference. So what we see is we see very different structures between

the airline network, the states. And the University's playing football, even though

the statistics, the first two statistics, degree and path link were about the same.

Now, where we would see a difference statistically is in clustering

coefficient. So, if we look at this graph, the clustering coefficient is going to be

very low, in fact it's going to be zero almost. There's very few triangles in this

graph. In graph two, there's a little bit of a clustering coefficient, but in graph

three the clustering coefficient is really high. So the sense in which, maybe instead

of saying a picture's worth a thousand words, we might want to say a picture's

worth at least three statistics, or maybe four statistics. Cuz you need quite a few

statistics. Statistics before you can really start distinguishing between

graphs. So degree won't be enough, path link won't be enough, but if you throw in

clustering coefficient, then maybe you start making more sense of the graph.

Okay, so that's structure. We now understand, we can think of networks as

having nodes, edges, degrees, path length, and connected in a [inaudible] cluster

coefficient. So this is a language for describing different network structures.

What we're gonna do next, is start with the logic which these networks form.

Alright? Thanks.