0:22

And talking about sampling networks from a couple of perspectives.

We're going to do two illustrations here.

We're going to do two different kinds of networks that arise

in the survey literature.

These are not the only kinds and

it doesn't mean that this is the only way to think about doing network sampling.

Certainly, people who think about networks, social networks,

and other kinds of things have similar

kinds of approaches to what we're going to be talking about here.

But, I'm going to talk about two simpler examples,

just the basics of what happens when you deal with networks.

0:57

Now, these networks arise in all sorts of circumstances.

You may have something that is a network system that is a social network.

There are groups of people who are socially connected to one another.

Many times those social connections are quite complex.

They're connected not only to one or

two other people but lots of other people, sometimes only a couple of other people.

And we can get very dense kinds of systems, but

also different groupings of these networks, different combinations.

Now the basic diagram has those dots there, nodes.

We'll think about those as persons.

Now the lines are edges.

They are the connectivity among the units, and we're going to think about,

as I say, the nodes as people and the edges as relationships.

But it could be computers and connectivity among them.

It could be any number of networking kinds of circumstances.

This network diagram represents one that has cases that are isolated,

standalone in these kinds of networks.

The networks can be quite distributed.

They may be centralized around a single individual or even something

in which there is a central location that is a node that is unlike the other nodes.

Maybe they're connected through a common organization.

But not everybody's connected to that organization, but

they're connected to it indirectly.

And so, the networks can be generalized to also include not only people,

but people and collections of people, organizational units as well.

We're going to look at two examples, as I say, of this kind of thing.

But before we do that,

we should recognize that there are different patterns to these networks.

And what we're going to do is look at

something that might be labelled a star pattern.

Something where there's a central collection,

where that unit may be a person or it could be an organization.

And then we are also going to look at things that are

more fully connected networks, like the one that’s shown in the upper right,

even though there are other kinds of networks here.

As you pose the problem of drawing samples of the nodes

in these kinds of things think about how individual elements could be selected.

The shape of that network, the pattern of that network

will dictate certain kinds of things that influence our estimation.

3:34

And they happened to be connected to one another.

Well, not in all cases.

Sometimes, there's a single circle there.

It's a person that is unconnected to anyone else in our network framing.

In other cases, they're connected to one other person or two other persons.

But these are those kinds of mesh networks.

They're connected to everybody in their local network, but

they're disconnected from the others.

Now how could something like this arise?

Well, one of the ways that this arises in the survey literature is something that

was referred to as multiplicity sampling and multiplicity weighting.

So the idea was that you have a collection of those people available.

And you sample those people, and as you do it,

our red dots are the sampled people, you identify their network.

What is their network that they're connected to?

So now you see what we've done.

We've actually taken a sample of people, and

we've asked them what their network is in a particular way.

Now, this particular network has to do with siblings.

What is your network with respect to siblings?

These could be siblings that are related to you by blood, or

by adoption, or by marriage.

We can define it in a variety of ways involving half siblings and

full siblings and a variety of others.

But let's just deal with the siblings now that we think about ordinarily in

terms of brothers and sisters, whether it's by birth or by adoption.

And so now what we've done is draw a sample of people, and

define their network for them.

The networks are unconnected.

In this particular case we didn't draw anybody who was from the same network,

we just happen to get individual networks.

And so we've drawn a sample of nodes and

now we're looking at the networks that they comprise.

5:38

our network, as defined by our networks.

That would allow us to take the sample that we have, and

expand its reach to the sample plus all of those who are in the existing networks.

So we can now talk about the individuals and their siblings.

And that would allow us to possibly collect data about the individuals and

the siblings through the single interview, and

thereby have a larger base sample and potentially a more precise estimate,

especially when we're dealing with rare characteristics.

So suppose that in this sibling network what we're interested in

is a disease condition, diabetes.

And we know that the frequency of diabetes is low enough that if we just

stuck with the red individuals, the sampled individuals, we're going to have

a fairly small sample and a fairly small number of cases of diabetes in our sample.

But if we expand to include the entire network, now we're going to have not

only those that we selected, but also their siblings involved in the sample.

And we will try a data collection in which we attempt to collect data

about not only the sample person, but ask them about their siblings.

6:47

Now there's a measurement problem that arises here.

How do we measure that characteristic for those that we haven't selected?

But there's also a sampling problem here.

Suppose you happen to be part of a two-sibling arrangement:

you were selected and you have one sibling who was not selected.

And we collect data about you and ask you about information about your sibling.

That sibling has two ways of coming in the sample.

So do you.

Because you came in because you were selected but

you could also come in because your sibling was selected, and vice versa.

Your sibling could come in because they were selected, or

because you were selected and provided information about them.

We have then multiple chances of selection, a duplication problem or

triplication problem, or so on.

We've got more than one opportunity for people to come into the sample.

And that means that we're going to overrepresent people in this arrangement

who come from larger networks,

relative to all of the persons across the networks.

We've got a potential for bias,

an overrepresentation and

oversampling of persons coming from certain kinds of larger networks.

In circumstances like that we would use a way to compensate for it.

But we are going to have to figure out what the sibling network is

so that we know how many different ways they could come into the sample, and

then we factor that into a weighting factor for our particular case.
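
As a minimal sketch of why the number of ways in matters, assume every person has the same selection probability and every selected person reports on the whole sibling network. The function name and the 1-in-100 rate are illustrative assumptions, not something taken from the lecture's table.

```python
def expected_times_reported(network_size, selection_prob):
    """Expected number of times one person shows up in the data.

    Assumes (for illustration) that each of the network_size siblings,
    the person included, is selected with the same selection_prob and
    reports on the whole network when selected.
    """
    if network_size < 1:
        raise ValueError("network_size counts the person, so it is at least 1")
    return network_size * selection_prob


# Hypothetical 1-in-100 selection rate for networks of size 1 to 4.
for size in (1, 2, 3, 4):
    print(size, expected_times_reported(size, 0.01))
```

Under these assumptions the expected count grows in proportion to the network size, which is the imbalance the weighting factor has to undo.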

Well here's a tabular display of what happened for those 10 cases.

There were actually 10 sample persons there.

Sample person number one, I didn't number them, but

sample person number one happens to have no living siblings.

8:32

So they were all by themselves, they're one of those singleton dots.

And we asked them, have you ever been told by a doctor that you have diabetes,

by a medical doctor?

And they say, no.

Okay. Now, in the network,

there are no cases of diabetes.

There's only one person represented there.

Let's suppose that each person was selected with a probability

that we can identify and

that person's weight corresponding to that probability was 100.

Their network size was one, so their chance of being in the sample is just

based on that 1 in 100 probability that's expressed in terms of the weight of 100.

They have a network adjusted person weight of 100.

9:16

But now, let's look at the second person going across the second line.

They have two siblings, that's one of those triangles.

They have two living siblings.

Now, none of them, not the person who was selected nor

any of the siblings have diabetes.

Now we're going to collect the information about the siblings from the informant,

the sample person.

But again, as we look across there, the base probability for

that person is 1 in 100, a weight of 100.

There are three people in the network but their weight now should be smaller.

They have three times the chance of being selected because they could

have been selected alone or through either one of their other siblings.

So we're going to give them a weight that is one-third as large

since their probability is three times larger.

10:04

Let's try one with person number three who comes from a quad, comes from that quad.

There are four siblings all together.

We selected one and there are three others we haven't selected.

The person we've interviewed does not have diabetes,

or has never been told that they have diabetes.

But one of their siblings does.

Now in our network we've got a contribution of one case of

diabetes to what we're doing among the total number of cases.

The probability of selection in that particular case, 1 in 200.

And the network size is 4, so the weight has to be adjusted.

That weight of 200 has to be adjusted by a factor of 1 in 4.

Now we're assuming that all the elements in the network

have the same personal weight.

There are some assumptions going on here but

you can see how what we're doing is thinking through how does that network

which we're now accessing as a way to increase our sample size,

how does that influence the chance of our individual being selected and the members

of the network being selected and how do we compensate for that in our weighting?

And so we now have network adjusted person weights that account for that.

So the network sampling here is one in which we use the network to expand, through

the network connections, the number of persons who are in our sample.
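
The adjustment described for the first three sample persons can be sketched in a few lines. This assumes, as the lecture does, that everyone in a network shares the same base weight; the function name is made up for illustration.

```python
def network_adjusted_weight(base_weight, network_size):
    """Divide the base weight by the number of ways into the sample.

    network_size counts the sample person plus their living siblings.
    """
    if network_size < 1:
        raise ValueError("network_size must be at least 1")
    return base_weight / network_size


# (sample person, base weight, network size) as described in the lecture.
described_cases = [
    (1, 100, 1),  # no living siblings
    (2, 100, 3),  # two living siblings
    (3, 200, 4),  # three living siblings
]

for person, base_weight, network_size in described_cases:
    adjusted = network_adjusted_weight(base_weight, network_size)
    print(f"person {person}: adjusted weight {adjusted:.2f}")
```

Person two ends up with a third of the base weight and person three with a quarter of it, matching the reasoning above.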

11:22

Let's look at how this adjustment works.

So we have, really, two cases of diabetes among the sample persons.

Among the living siblings there are four more.

So we've tripled the number of cases of diabetes.

There are 10 sample persons but

27 persons in the networks that I've identified including the sample persons.

If we just took the unweighted prevalence among the sample persons, it would be 20%.

And that's perfectly valid, but it's only based on the sample of size 10.

If we take the unweighted prevalence among all the networks,

the 6 divided by the 27, we get about 22%, but that ignores the multiple chances of selection,

so we're going to need to do a weighted prevalence in that particular case.

But our network adjusted weighted prevalence comes out to be 0.256.

And it's based now on 27 cases, properly weighted to account for

the multiple chances of selection that are occurring, because of the networks and

people being in those networks and

reporting about other members through those relationships in the network system.
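
One plausible form of the weighted calculation is sketched here, assuming each person in a network carries the network adjusted weight of the sample person who reported on them. The full ten-row table is not reproduced in this transcript, so the data below are only the three sample persons described earlier, and the result will not match the 0.256 figure.

```python
def network_weighted_prevalence(rows):
    """Weighted prevalence over everyone in the reported networks.

    rows: iterable of (base_weight, network_size, cases_in_network),
    one tuple per sample person. Each network member is assumed to
    carry the adjusted weight base_weight / network_size.
    """
    weighted_cases = 0.0
    weighted_persons = 0.0
    for base_weight, network_size, cases_in_network in rows:
        adjusted = base_weight / network_size
        weighted_cases += adjusted * cases_in_network
        weighted_persons += adjusted * network_size
    if weighted_persons == 0:
        raise ValueError("no persons to estimate from")
    return weighted_cases / weighted_persons


# Only the three sample persons described in the lecture, not all ten.
partial_rows = [
    (100, 1, 0),  # person 1: alone, no diabetes
    (100, 3, 0),  # person 2: three in the network, no diabetes
    (200, 4, 1),  # person 3: four in the network, one sibling with diabetes
]

print(network_weighted_prevalence(partial_rows))
```

With the complete table in place of `partial_rows`, the same function would give the network adjusted estimate under the stated assumption.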

12:24

Okay, multiplicity sampling.

This is not intended to give you full tools on how to do this but

to give you the idea about what happens when we sample persons in networks.

There are other kinds of network sampling that occur in the survey realm.

For example, this is one in which we have two central nodes, star patterns, two

star patterns, which are also connected through one person in each of them.

The outer circles around this pattern are clients of insurance companies.

They're people who receive health insurance through a company.

And the central circle in each is the insurance company.

Now our goal here was to draw a sample of persons and

identify characteristics about them.

So we've drawn a sample of six persons and we're going to collect data about them.

But we also want to know about their health insurance companies.

And so what we're going to do is identify the insurance carrier for

each sample person.

In this case there are two insurance carriers across the six people.

Five of the people are uniquely associated with only one carrier.

One person's associated with both.

Now, we're going to have to figure out,

what is the chance of the insurance carrier coming into the sample?

Well, we have to count all of the people who are in the network.

We're going to need a count on the size of the insurance carrier,

as well as any interconnectivity that goes on.

And we would go through and systematically calculate for

each of the insurance carriers, what their chance of selection was

based on the sampling rates we have for the individual persons.

Now, I won't go through that kind of calculation.

But you can see now how the network is factored into a further expansion of our

data collection to include, not just more people, as in the previous example, but

here, people as well as an organizational unit, that we're very interested in.

And we're going to combine both in our single data collection through one sample

of people, and then extract from that information about insurance carriers and

their probabilities of selection so

that we can properly account for their size contributions to the industry.
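
The lecture does not walk through the carrier calculation. As a rough sketch of one way it might go, assume each client is selected independently with a known probability and a carrier comes into the sample whenever at least one of its clients does. The function name, the carrier labels, and the probabilities below are all hypothetical.

```python
import math


def carrier_inclusion_probability(client_selection_probs):
    """Chance that at least one of a carrier's clients is selected.

    Assumes independent selection of clients, which is an illustrative
    simplification and not necessarily the design used in the lecture.
    """
    chance_none_selected = math.prod(1.0 - p for p in client_selection_probs)
    return 1.0 - chance_none_selected


# Hypothetical client lists; a person insured by both carriers appears in both.
shared_client_prob = 0.01
carrier_a_clients = [0.01, 0.01, 0.005, shared_client_prob]
carrier_b_clients = [0.005, 0.005, shared_client_prob]

print(carrier_inclusion_probability(carrier_a_clients))
print(carrier_inclusion_probability(carrier_b_clients))
```

A carrier with more clients, or with clients sampled at higher rates, gets a larger chance of selection, so its weight would be correspondingly smaller.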

All right, multiplicity network sampling.

The principles that we've been dealing with for

probabilities, the randomized selection, all of that kind of thing can be

factored in through a weighting system to allow us to utilize these networks

as ways to increase our sample sizes, or increase the diversity of our units.

Persons as well as insurance companies.

We have one last topic to talk about,

our final topic, lecture six on non-probability sampling.

Just some thoughts about non-probability sampling

to wrap up what we've been doing as we look at the last lecture in unit six next.

Thank you.