0:00

[MUSIC]

So what we're going to do now is looking an example project that hopefully will

inspire you and get you really excited about this capstone.

You worked really hard in the courses and specialization up until now and

you're ready really to put all of the skills that you've acquired to use in

answering a question that care about.

So by the end of this video, you'll have seen one example of a project that

could be done by someone in this capstone.

But hopefully, you'll also get your creative juices flowing

about other projects that you might want to think about and

implement a design as your own capstone project.

So as we mentioned in the introduction,

this capstone really is thinking about social networks and

the connections that we have with other people in our life and in the world.

At this day and age, we have the amazing phenomenon of social networks online and

there are all of these different platforms out there like Twitter and

Facebook, and Google+ that facilitate this interactions that we in

other people through our computers and online.

And what this also gives us is a lot of data

that exist about how people connected to one another.

So in this capstone project what we are going to do is dive in to

some of the data.

And we think about how we can use graph algorithms and

data structures to answer questions about what that data might look like on a local,

but also on a global scale.

Now these networks are huge many, many millions and sometimes even billions of

nodes or people represented in these networks and relationships.

And so its really hard to grasp what the structure and

the global behavior of this network might look like.

But that global behavior is really important.

Think about how information spreads in this community.

And the worldwide community thinking about videos that become viral online or

thinking even about how news of an epidemic might spread.

Or even just how the next hot tech item gets buzzed about from one place

to another, and people hear about it in various places around the world.

And so we'd like to think about how the personal connections that we might

have on a pair by pair basis translate to some global trends.

And we might think about how to visualize those trends, and see some patterns.

So for example, if we're looking at a really big visualization of a network,

we might notice that some of the nodes are somehow they seem closer to one another,

they seem like they might be clustered more than nodes that are not part of that

community.

And so we might ask is there a way do detect communities

inside a big social network and maybe if we have a lot of informations.

So maybe if this network represented people on the world then we knew something

about they're geography where they were located.

Maybe we'd be able to deduce communities based on people nearby one another or

more likely to be friends.

And we can try to color their graph based on their locations.

But maybe we don't have that data people

sometimes can be a little private about sharing their data.

And so maybe all we know is whether two people in our network

are connected or not.

Can we still deduce information about these clusters, these communities?

And so let's ask specifically about UC San Diego here on campus

thinking about can we find the communities amongst our students?

Now what's nice about restricting our attention to just UC San Diego

students is that there is a really great data set from 2005.

Which is a little bit old but it still has a lot of very interesting data in it that

really took a snapshot of the Facebook friend network on campus at that day.

Now friendships of course evolve over time and so all of these social networks that

we're thinking about will change from one time to another.

But with this data we just have a snapshot of

how the relationship look like in September 2005.

Now Christine and the next video, will tell you a little bit more about Facebook

and the details of what those edges look like and what the graph looks like for

Facebook.

But for now let's think about this data is a proxy for

relationships between our students on campus here at some moment in time.

And there's a lot of students on campus here, so

this particular data set that we have.

We've got quite a few vertices and quite a few edges and all we know about

these people represented by vertices is whether they're friends with one another.

We don't know what high school they went to, we don't know what clubs they belong

to, we don't know which residence they live in.

But all of these things might contribute something to whether they're in one of

these clusters, these communities or not.

So now we have to think, now is there some way to try to discover community.

So one way to answer this question would be to look for connected components.

So if we have a graph structure, then we know that we can partition that graph into

components where a component will include a whole bunch of

notes that are reachable from one another through puzzles in the graph.

And it might makes sense to say,

well maybe a community is all one that are friends of friends of friends of friends,

and so they are reachable from one another through these puzzles of friendship.

5:12

The problem with social network data or actually pretty amazing, it's not really

a problem at all, is that when you think about the world that we live in especially

if we restrict to people who are active in these online social networks.

We're all not so far apart from one another in terms of this friends of

friends of friends of friends behavior.

Turns out that for most of these social networks you'll have one giant component

that most of the vertices in the graph belongs to.

And so most of the vertices in the graph will be

just some path away from another vertex in the graph.

And so if we just look at the connected components of the graph,

we won't really get a very fine description of that inner structure.

We'll just have a whole bunch of advertise's grouped in one of these giant

components, it's actually a technical term which is kind of funny.

And then a few outliers who aren't really connected to the other people.

Which is interesting and if you want to think more about drawing components it

might be the basis of an interesting project.

But for our example right now, we'd like to find those inner communities and for

the drawing components not really going to help us out.

But maybe if we think about the meaning of community as not just I'm a friend of

a friend of a friend of a friend.

But somehow capturing this fact that within a community we should have more

links to one another than to people who are outside the community.

Maybe if we can formalize that notion of being closely connected to other people in

my community, then we can write that down as a graph property that

then we could algorithmically look for in our social network.

And this was done and in fact, there's different ways of describing what it means

to have more links within a particular community than outside to it.

And so different definitions of this connectedness and

clustering behavior might lead to slightly different algorithms.

So we have a reference for you that might be really useful

as you're exploring these questions that you want to ask about social networks and

look for algorithms that might give you some of these answers.

And this book by Easley and Kleinberg has a very nice overview of

the research that's really been done in social networks and

algorithmic questions about social networks up until this point.

And one example that they present in the context of community finding

is this study that was done quite a ways back by Wayne Zachary.

And in this study, he was analyzing the relationships amongst a karate club

on a campus not UC San Diego, but on a different campus.

And noticing that for this particular karate club,

you could find a difference between the people who are represented by vertices in

white from those who are represented by vertices shaded as red.

And it turned out that there were more edges interconnecting the ones that were

shaded in white from those that were shaded in red.

And it turned out that this club really ended up having a schism.

It separated into two groups, two separate karate clubs some time after

this network was analyzed and it turned out that at the heart of the white

shaded vertices was one person who was the original founder of the club.

And at the heart of the red shaded vertices was actually the instructor in

the club, the karate instructor.

And it turned out that these two people had somewhat different visions for how

the club would carry out its activities and what the mission of the club was.

And so they each ended up splitting up into two separate clubs.

And when the researchers followed up with this community of people,

they noticed that all of the white shaded vertices actually ended up going with

the founder of the club.

And all of the red shaded vertices indeed went with the instructor with

the exception of one person.

And there was a psychological reason for that person as well.

So it's really amazing how just the graph structure of the relationships between

individual and people modeled in this

abstract way can give us insight into how the network will evolve over time.

And so what we like to do is have a description of what does it mean for

those white vertices to be grouped more and the red vertices to be grouped more.

And that definition will lead to algorithms that can figure

out those clustering.

We'll talk more about those next week.

But for now, let's think about applying that algorithm

to that UC San Diego data in 2005.

And what we see is that using that algorithm,

we can find sub communities within that big giant component

of people who seemed to have more of a clustering behavior to one another.

And then we can ask ourselves a question, what brought these people together?

And we can go back to the people on campus and ask them.

So did you go to the same high school?

So are you both engineering students?

And from the graph structure give us insight about how these people interact?

So how do we do this?

How do we answer these questions?

And at the heart of these interesting questions,

I really foundational algorithmic computer science mathematical notions.

So we're modeling the social network piece of graph.

We, as part of the algorithm, we'll do a breadth first search

which should hopefully be familiar to you from our earlier courses.

We'll be using really interesting data structures like priority queues and

queues to implement the algorithms that are necessary for

separating out a big graph into its subcommunities.

And so hopefully, you're as excited as I am about the kind of questions you can

answer about social networks using the tools that we've developed in this course.

Sorry, developed in this specialization for

courses that you can work through up until now to get to this point.

And in the next video, what I'm going to do is outline the steps that you'll take

in developing your own project throughout this capstone course.