In this module, what we're interested in discussing is,

how do we encode large amounts of data?

We've now talked about parallel

coordinate plots and scatter plots,

but we still want to think about,

what if we have a really large set of multivariate data?

What way can we show this on a graphical image?

So first, let's just start

thinking about some problems with the scatter plot.

So I've got my dataset.

Let's think about country,

GDP, and life expectancy. All right.

Instead of country, maybe I do this just

by state or county level in the US,

or the equivalent county level across the world.

So I might have a million entries here.

So my scatter plot is going to have a million dots.

As I keep adding these dots,

I get this overplotting and I can't see anything,

I can't determine patterns anymore.
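
To make the overplotting problem concrete, here is a minimal sketch. It assumes the points land uniformly at random on a small, made-up 100 by 100 pixel plot, which real data rarely does, and the function name is only illustrative. It counts how many points end up hidden behind a point that already occupies the same pixel.

```python
import random

def count_overplotting(n_points, width, height, seed=0):
    """Drop n_points uniformly at random on a width x height pixel grid.

    Returns (occupied_pixels, hidden_points), where hidden_points are the
    points that share a pixel with an earlier point. The uniform placement
    is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_points):
        x = rng.randrange(width)
        y = rng.randrange(height)
        occupied.add((x, y))
    hidden = n_points - len(occupied)
    return len(occupied), hidden

# 20,000 points on a plot that only has 10,000 pixels.
occupied, hidden = count_overplotting(n_points=20_000, width=100, height=100)
print(f"{occupied} pixels drawn, {hidden} points hidden behind other points")
```

Whatever the random draw, at least half of the points here cannot be seen, because there are only 10,000 pixels to hold 20,000 points.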

The same if I do my parallel coordinate plots,

the more lines I add,

the fewer patterns I can see.

To some extent, we're completely

limited by our screen real estate.

I can only fit a scatter plot as big as my screen.

So if my screen is 1024 by 768,

I can only get about 786,000 points, one on each spot,

where each pixel represents one element of my data.

That's the concept behind the pixel-based display:

the most data we can represent

is bounded by our screen real estate.

So if every pixel on the display

represents its own data element,

we can start having these pixel-based images.
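
As a quick check of the screen budget, the arithmetic can be sketched like this. The 1024 by 768 screen and the million-entry dataset come from the example above; the function name is just an illustration.

```python
# Screen size taken from the example in the lecture.
WIDTH, HEIGHT = 1024, 768

def pixel_capacity(width: int, height: int) -> int:
    """Upper bound on data elements when each pixel shows exactly one element."""
    return width * height

capacity = pixel_capacity(WIDTH, HEIGHT)
n_records = 1_000_000  # the million-entry dataset from the scatter plot example
overflow = max(0, n_records - capacity)  # records that would not get their own pixel
print(capacity, overflow)  # prints: 786432 213568
```

So even with one pixel per record, a million records do not fit on this screen without some reduction or paging.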

So a pixel-based display takes

a pixel as a little box on the screen,

and tries to do things like give it a color to

represent typically a univariate data measure.

But again, we could do data reduction,

we could do some different elements of

a bivariate color map to

show multiple variables for pixels.
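
One simple way to picture a bivariate color map is to put one variable in the red channel and a second in the blue channel. This is only a sketch of one possible scheme, not the scheme used in any particular system. The variable pairing (GDP and life expectancy) and the value ranges are made up for illustration.

```python
def normalize(value, vmin, vmax):
    """Scale value into [0, 1], clamping anything outside [vmin, vmax]."""
    if vmax == vmin:
        return 0.0
    t = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, t))

def bivariate_color(a, b, a_range, b_range):
    """Encode variable a in the red channel and variable b in the blue channel.

    Returns an (R, G, B) tuple with components in 0..255.
    """
    red = round(255 * normalize(a, *a_range))
    blue = round(255 * normalize(b, *b_range))
    return (red, 0, blue)

# Hypothetical ranges: GDP per capita 0..50,000 and life expectancy 60..80.
high_gdp_low_life = bivariate_color(50_000, 60, (0, 50_000), (60, 80))
low_gdp_high_life = bivariate_color(0, 80, (0, 50_000), (60, 80))
print(high_gdp_low_life, low_gdp_high_life)
```

A pixel that is pure red is high on the first variable only, pure blue is high on the second only, and purple is high on both.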

We can enhance these pixel-based displays to

incorporate components that will

draw attention to the data.

So we can do halos

around data elements that are important,

we can do color, distortion,

hatching, and so we can

augment these pixel-based displays.

This is a very popular method from

Daniel Keim's group, where you can look at how

he does all these different pixel-based displays,

and we could even think about

representing multiple variables.

So each row of

pixels might be all of

our elements representing different things,

each little chunk could represent different items.

So we can start thinking about,

if I have tons and tons of data,

how can I break this down into

what's the most data I can show on

the screen without any overlap,

without any issues there?

How can I show all of these

for at least one single data element?

So if I have a really large database,

and I want to show all of the values

for a particular variable,

each little pixel here could be

a row in my database

and the color can be a particular value.

What happens is, I may find that there's chunks in

my data that don't have any values.

So I might need to look into those.
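
The row-to-pixel idea can be sketched as follows, assuming a simple left-to-right, top-to-bottom layout and a gray scale for the value. Other orderings and color maps are equally possible. The values are made up, the function names are illustrative, and missing values are kept as None so that gaps stay visible.

```python
def value_to_gray(value, vmin, vmax):
    """Linearly map a value to a gray level 0..255; a missing value stays None."""
    if value is None:
        return None
    if vmax == vmin:
        return 0
    t = (value - vmin) / (vmax - vmin)
    return round(255 * t)

def rows_to_pixels(values, width):
    """Place database row i at pixel (x, y) = (i % width, i // width).

    Assumes at least one value is not missing.
    """
    present = [v for v in values if v is not None]
    vmin, vmax = min(present), max(present)
    height = -(-len(values) // width)  # ceiling division
    image = [[None] * width for _ in range(height)]
    for i, v in enumerate(values):
        image[i // width][i % width] = value_to_gray(v, vmin, vmax)
    return image

# Made-up values for one variable; the third row has no value.
gdp = [10.0, 20.0, None, 40.0, 50.0, 30.0]
image = rows_to_pixels(gdp, width=3)
for row in image:
    print(row)
```

A run of None cells in the output is exactly the kind of empty chunk that would need a closer look.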

I may find that there's chunks in my data

that don't seem to match

the things that surround them.

I can reorder the rows and

columns in my pixel-based data,

to try to find patterns within these as well.
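
Reordering can be as simple as sorting the rows so that similar rows sit next to each other. The sketch below sorts by row total, which is only one of many possible orderings. The small matrix is made up, and the function name is illustrative.

```python
def reorder_rows_by_total(matrix):
    """Return (order, reordered_matrix), with rows sorted by their sum.

    order[k] is the original index of the row now shown in position k.
    Ties keep their original relative order because sorted() is stable.
    """
    order = sorted(range(len(matrix)), key=lambda i: sum(matrix[i]))
    return order, [matrix[i] for i in order]

# Made-up pixel values: high and low rows are interleaved in the original order.
pixels = [
    [9, 8, 9],
    [1, 2, 1],
    [8, 9, 9],
    [2, 1, 1],
]
order, reordered = reorder_rows_by_total(pixels)
print(order)      # prints: [1, 3, 0, 2]
for row in reordered:
    print(row)
```

After sorting, the low rows form one block and the high rows another, which is much easier to see as a pattern than the alternating stripes in the original order.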

So I can start manipulating this.

I can start analyzing my data to

detect what's important and discover what's unimportant.

Pixel-based displays are again

relatively easy to start with because basically

you color one pixel for each element in your dataset.

So if we've got country and we have GDP,

we've got our screen,

so maybe this is my 1024 by 768 and

each pixel just represents a row in my database.

So this pixel can be colored by the GDP in row 1,

this one by row 2,

the next by row 3, and so forth.

So all I'm doing is coloring pixels on

the screen to represent some elements in the data.

Then there's a ton of different ways we can

think about ordering these,

there's tons of ways we can think about showing

information or relationships here.

People have used pixel-based displays even

for things like network graphs.

So for example, we can think about a story like Les Misérables,

and we have character 1, character 2,

character 3, and so forth,

and we might know which chapter in

the book character 1 interacts with character 2.

So this is chapter 1,

this is chapter 2, and so forth.

I want to know when character

1 interacts with character 2.

We can create some dataset like this,

we get a network relationship,

and we can draw those network relationships as

adjacency matrices, essentially like friendship matrices.

So if I'm character 1, character 2,

character 3; character 1,

character 2, character 3.

I know if character 1 is friends with character 2,

but not friends with character 3,

but character 2 might be friends

with character 3 and character 1,

and character 3 is friends with character 2.

So we have the connectivity matrices and we

can draw these as pixel-based displays.
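
The three-character example can be sketched as an adjacency matrix built from a list of interactions. The chapter numbers are made up, the function names are illustrative, and the text rendering with '#' and '.' simply stands in for filled and empty pixels.

```python
characters = ["character 1", "character 2", "character 3"]

# (character, character, chapter) triples; the chapter numbers are hypothetical.
interactions = [
    ("character 1", "character 2", 1),
    ("character 2", "character 3", 2),
]

def adjacency_matrix(names, edges):
    """Build a symmetric 0/1 matrix: cell [i][j] is 1 if i and j ever interact."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for a, b, _chapter in edges:
        i, j = index[a], index[b]
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix

def render(matrix):
    """Draw each cell as one character: '#' for a connection, '.' for none."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in matrix)

matrix = adjacency_matrix(characters, interactions)
print(render(matrix))
```

Character 1 connects to character 2 but not to character 3, and character 2 connects to both, which matches the friendship pattern described above. Coloring each cell by chapter instead of by 0 or 1 would be one way to also show when the interaction happens.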

So there's lots of different ways we can

think about how to display data.

So this is why I want to show this.

We're not going to go too much more

into depth on pixel-based displays.

But again, think about

the relationship between different designs,

different ways we can display the data,

different things to try out to see what might be

most effective for your dataset.

It all goes back to: who is your audience?

What message are you trying to convey?

What data do you have?

What questions are people asking

about the data? Thank you.