0:00

Welcome to Lecture eight of Networks, Friends, Money, and Bytes.

Today's lecture we'll try to formulate and answer the question of, how do I influence

people on Facebook and Twitter? Now, network consists of both the topology

represented by the graph and functionalities which are the tasks

carried out on top of the graph. In this chapter, we talk about topology

dependent influence model in contrast to the population dependent and topology

independent influence models in the last lecture.

As most of us know very well that Facebook and Twitter have already become among the

dominate communication modes, especially among younger people.

Facebook started in late 2003, and by the time of spring 2012 of its IPO, it's got

over 900 million users worldwide. And Twitter started much later and by

2012, got over 500 million users worldwide.

And one estimate says that within a day, it processes over 250 million tweets.

1:22

Now, Twitter, in particular, provides what we call a direction node link. Because you

may subscribe to my tweets but I don't subscribe to yours.

And indeed, Twitter's success builds on top of micro-blogging and group texting

and one-sided or one direction social networking.

1:52

Now, given the popularity of these services, there is no shortage of

interesting questions people have raised. One question is, how do I quantify the

statistical properties of opinions on these platforms?

For example, a recent study were carried about Tweets of Oscar nominated movies

2:50

And, statistical properties of the tweets alone did not provide insightful

prediction of either the box office success, or the critical claim However, if

it is combined with some of these other online opinion venues,

Then it does enhance the predicted the prediction accuracy.

3:49

And other kind of blocks into a numerical score.

Now, how do they measure that? There are variety of possibilities. For example, you

can measure on Twitter the number of re-tweets of your tweets, okay? As we

identify through a party of via. We can also look at the number of

re-posting of URL's that you posted. If you have access to the topology of who

follows whom, For example, I follow your tweets, and you

follow Anna's tweets, and so on. Then, this typology can also give us some

information about influential power of different nodes which are different

Twitter users. Now, what kind of numerical scores is

meaningful enough to be used? And that author remains a question mark.

And that relates to the third question, How do I leverage the above knowledge of

the either aggregate statistical properties of Facebook and Twitter users

properties, or individual influential power source and turn these into a viral

marketing campaigns. For example, maybe I'll be able to seed

some users and ask them to post or tweet certain information about my product or

service. And the question is, which nodes to seed?

5:38

If I know the topology, would it help me to answer this question more precisely?

So, these three questions are indeed important.

And, as we will see, there is a significant gap between the theory and

practice. Just like last lecture, this lecture

offers the biggest gap between theory and practice in networking.

6:01

And these questions are not entirely new either, the question of influential power

of individual users that many interesting historical anecdotes.

For example, the famous Revere-Dawes Ride that happened over the night of April

eighteenth to nineteenth in 1775. This was the ride that helped alert the

American forces to fight against the British military forces.

Two individuals, Paul Revere and William Dawes, they both liked to alert the local

American forces about the imminent British action.

7:13

And it turns out that the path took by Revere went through a lot more influential

people in the neighborhood and was much more effective in alerting the local

militia leaders to prepare for the British military action which led to the success

of the first battle of the American Revolutionary War.

So, how can we quantify that a certain path pass through more influential nodes

than other paths? Another very famous anecdote is this

graph. This graph is so called the Padgett's

Florentine family graph which shows fifteen prominent families in Renaissance

period of Florence, Italy. A few of these fifteen families aren't

shown here with their names. The Medici family was widely viewed as the

most influential among the fifteen families.

If you look at this graph where each node is a family, and a link is a bidirectional

link, representing some kind of kinship or marriage relationship.

Then, how can we make intuitive understanding of this belief that the

Medici family was by far the most influential?

Well, if you look at a naive view, for example, how many links does each family

have? Then, the Medici family had six links.

8:56

That's indeed the largest number. It's got the largest degree in this graph

of family relations. But there are also other families such as

the Strozzi family and the Guadagni families.

They also got each four neighbors, Okay? So, the degree count is not that

much of a difference. And this is 50% higher than four, but was

widely believed that the Medici family was a much more influential than a mere 50%

differential. So, how can we quantify this intuitive

understanding of Medici's family, being more important than just 50%?.

There's many other stories, For example, Granovetter famous social

network theorist and empiricist. He did a 1973 survey in Amherst,

Massachusetts. And he discovered what he called the

strength of weak ties. Turns out that, if you look at the graph

of people where each node is a person and a link is some kind of family or

friendship, relationship. Then, he showed that certain links which

might be viewed as weak, For example, it goes all the way to

another node. But there are very few of the past to

connect this node to that other node as opposed to these nodes which are very

closely knit together. Later, we will quantify the notion of a

group in the advanced material part of the lecture,

How do we quantify closely knit together idea.

But, intuitively and visually, you can see that these nodes are very close together,

okay? And these links reach out to some far out

nodes that are not tightly knit. And these are sometimes called the weak

links. And yet, they present a high strength in

disseminating information such as, employment opportunities.

11:27

In order to answer all these questions, historical ones, the current ones, we have

to, represent our important quantities with some symbols.

So, we start with, of course, graphs. This is already Lecture eight.

So, in the past seven lectures, we have seen quite a bit of graphs.

You may still remember the graph of interference channel.

You may remember the graph that represents the auction, the bipartite graph of

matching. And now, we're going to talk about graphs

representing the topology of a network in general terms.

And then, we will use matrices to algebraically represent these graphs, so

that we can start running some mathematical operations.

Well, in general, we say a graph, G, is a data structure that consists of two

topples. First, is a set of nodes or vertices.

It's the set V. And then, it's the set of links or edges,

set E. Each element in this set say, indexed by

I, is simply one node. And then, each element in this set is

basically a two toe, i, j, represents a link from node i to node j.

13:01

If we say that, i, j is not equal to j, I, then we call this graph a directed

graph, okay? Because each link has a direction.

Node i pointing to j doesn't mean that j points to i, okay?

Whereas, if every link is actually both i to j and j to i, then we just skip writing

these arrows on the line. And with implicit understanding that this

is a by directional link and therefore is a graph without directional links.

13:46

In all the graphs that we'll be looking at, we look at the so called simple and

connected graphs. Simple, meaning that each link just

connects to two nodes. And connected means that every node is

connected to every other node somehow, someway.

Maybe through a very long path, But nonetheless, they're all connected.

We don't have a disconnected node. Like this.

Although, in homework problem, we'll be, be looking at generalization into what's

called the hyper graphs. Now, we're going to use some matrices to

describe these graph's connectivity patterns.

We'll first look at what's called the adjacency matrix represented as a bold

face A. This is a matrix of dimension N by N,

where N is the number of nodes. And this matrix is very easily defined.

We say the ij-th entry of this matrix is one if there is a link from node i to node

j, and zero otherwise. For example, if I have the following

15:17

graph, okay? So, very simple graph, Nodes one, two, three, four.

Then, the four by four matrix, A, is the following entries,

Okay? One is connected to three. So the first row, we'll say zero, cuz

there's no self loop. And then one, one, and then zero cuz one

is not connected to the four. And then, the second row, we'll say, is

connected to node one and three, not four. Three is connect to one, two and four,

And four is only connected to three, Okay? So, this is the matrix A for this

graph. As you can see, the diagonal is all zero

because there's no self loops, A node does not point to itself.

And it's a symmetric matrix meaning, Aij = Aji which is really an algebraic way to

say that this is a graph with directional, bidirectional links.

16:27

Now, if it is a direct to graph, the links have directions, then how do we define Aij

then? It's defined the same way, however, the

ij's may be one but ji's might be zero. For example, if we say one goes to two,

two goes to three, three goes to one and four goes to three,

Okay? Then, the new matrix would be the following,

Okay? One goes to two, Okay? And two goes to three,

17:05

Three goes to one, And four goes to three.

Now, as you can see that Aij is not equal to Aji, for example,

This is one but this entry is zero. A21 is not equal to A12.

17:28

The second matrix that we actually use much less in this course is called

incidence matrix. This is a matrix we represent by A hat of

dimension N by L. N, as before, is the number of nodes

whereas L is now the number of links here. So, how do we define the entries of this

matrix We say A hat ij-th entry equals one if node i is on link j, and zero

otherwise. So, it's a binary matrix where it's one if

there's an incidence of node i and if there's incidence of link j on node i,

Okay? For example, in the last slide we look at this example,

One, two, three four. Now the matrix A hat will look like the

following. We first have to write down also the

labels for these links, okay? Let's write down these links as A, B, C

and D, Okay? So, these are the one, two, three,

four, these are A, B, C, D. So happens that still four by four biggest

number of links equals number of nodes in this particular example.

So, link A is incident on both nodes one and two.

Write down one, one here and zero, zero. And B's incident on both two and three.

C is incident on one and three, And D is incident on three and four.

19:24

Now, what if it is a directed graph, okay? Saying the links are like this. Then, we

say, the entry is minus one, if nodes i, node i stops, terminates j link.

And it's one if node i starts link j, and zero if node i is not related, is not on

link j, Okay? In this particular example, we see

that node on link A goes from one to two. So, start at one ends at node two.

And link B starts from two ends at three. C from three to one it's actually from

20:15

here, from three to one should be positive.

And, D is from four to three, okay? The starting point is positive one, the ending

point is negative one, and zero everywhere else.

As you can see, every column of incidence matrix of the rectagraph must have exactly

one, one, one minus one, and the rest are zero,

Precisely because all the graphs we're looking at are simple graphs.

But, for each row, we could have multiple plus ones and multiple minus ones because

each node might start or terminate multiple links.