This course answers the questions, What is data visualization and What is the power of visualization? It also introduces core concepts such as dataset elements, data warehouses and exploratory querying, and combinations of visual variables for graphic usefulness, as well as the types of statistical graphs, tools that are essential to exploratory data analysis.

Associate Professor at Arizona State University in the School of Computing, Informatics & Decision Systems Engineering and Director of the Center for Accelerating Operational Efficiency School of Computing, Informatics & Decision Systems Engineering

K. Selcuk Candan

Professor of Computer Science and Engineering Director of ASU’s Center for Assured and Scalable Data Engineering (CASCADE)

When we have an application,

we need to not only choose a vector space,

but we also have to choose a distance function in the vector space.

So the question essentially becomes,

what does a good distance measure?

How do I select this distance measure.

As I said earlier the distance measure is application dependent.

Depending on the application L1 distance might be appropriate,

L2 distance might be appropriate or L infinity distance might be appropriate.

It's hard to tell it ahead of the time

without actually looking at the properties of the application.

However, there are certain properties of distance functions that if they are satisfied,

they make our job easier are life easier.

Some of these properties are called metric distance properties.

And they are essentially four properties together make this metric distance properties.

self-minimality, minimality, symmetry, and triangular inequality.

Let's go down one by one.

Let's start from self-minimality.

Self-minimality means, give me a distance function,

and give me a vector in the vector space.

If I measure the distance in the on

the vector space from the object to itself from one vector to itself,

the distance should be zero.

Now, this makes sense.

Right? Essentially this means defining and comparing an object to itself,

there shouldn't be any difference.

The system should tell me this object is equal to itself.

Now it turns out that,

there are some situations in which this is how to achieve.

Again we are not going to get into that right now.

We might talk about that later.

But essentially, self-minimality is

a desired situation which should ensure that given object matches itself perfectly.

But again, there might be situations in which that it is violated.

The second requirement from metric distances is minimality.

Minimality means, give me two vectors that are not the same,

vector S1 and vector S2.

If I measure the distance between them,

that distance should be greater than or equal to the distance of the S1 vector to itself.

If self-minimality holds, we know that this is equal to zero.

This means essentially is that if I'm comparing two distinct objects,

they will have some zero or larger value.

Essentially what this means is that,

if the minimality condition holds,

an object matches itself much better than it matches in a draw object in the database,

a second property that we like to ensure if possible.

The third property that we often seek from distance functions is called symmetry.

Symmetry means the following.

If I measure the distance from vector S1 to S2,

or if I measure the distance from vector S2 to S1,

I should get the same value.

Now again this sounds intuitive.

If I compare one object to the others,

versus this object to the to the others,

I should get the same value.

I have two objects, I measure the distances,

It shouldn't matter which one I'm comparing to what.

If this can be ensured,

then I get some kind of consistency from the system.

If I am basically using object S1 is my query,

object S2 is returned,

and if I use object S2 as the query,

I will get object S1 from the system as well.

I get consistency. Now again,

it turns out that the symmetry is often violated.

So there are many cases in which

the distance functions that basically we use do not satisfy the symmetry properties.

So we basically lose consistency.

But we gain some other things.

Again we will discuss them when the time comes.

But keep it in mind symmetry is a strong desired property of distance functions.

The fourth requirement from distance functions is called triangular inequality.

So, as we discussed,

symmetry gives us consistency,

triangular inequality gives us efficiency.

Let's see why that is the case.

First of all, what does triangular inequality say?

The triangular inequality says the following.

Let's assume that we are given three vectors in the space.

In this case we are given S1, S2 and S3.

Three vectors in the space.

Let us basically measure the distance between S1 and S2.

Let us then measure the distance between S2 and S3.

At this point, if the triangular inequality is satisfied,

the distance between S1 and S3 should be less than or

equal to the summation of distance S1 S2 plus distance S2 S3.

In this example, we have distance between S1 and S2 is equal to 10,

distance between S2 and S3 is equal to 15.

I already measured them. So, if the Triangular inequality conditional is satisfied,

what I can tell you without actually having to measure distance between S1 and S2

directly is that this should be less than or equal to 25.

10 plus 15 gives me 25,

so distance between S1 and S3 should be less than or equal to 25.

This is the Triangular inequality,

and its advantage as I mentioned is efficiency.

So note what happened. I compared the object S1 and S2.

I find that there are 10 apart from each other.

I compared objects S2 and S3,

I measured their distance.

They are 15 units apart from each other.

At that point without having to compare object S1 and object S3,

I know that they are less than 25 units apart from each others.

Say I have a search condition and I'm trying to find all objects that are near S1,

and all objects say less than 30 units away from object S1.

At that point without having to compare object S1 to object S3,

I can tell without having to compare them that object S3 should be in the result.

Because I know at that point that the distance between

object S1 and object S3 is less than 25 units.

Triangular Inequality saves me time.

Triangular Inequality eliminates in certain conditions

the need to compute distances explicitly.

So, once again, four conditions.

Self-minimality, minimality, symmetry and triangular inequality.

If all of these conditions are satisfied,

we call them metric distances.

And these metric distances gives us several advantages such as consistency in retrieval,

such as efficiency in retrieval.

And it turns out that,

the P-norms functions that we discussed we covered in the earlier slides.

1-norm, 2-norm, 3-norm or infinite norm,

they are all metric.

So that's one reason why we tend to use this 1-norm, 2-norm,

or infinite norm measures often because they satisfy

these conditions that ensure consistency in

exploration and that ensure efficiency in exploration.

So the next questions, are they the only distance functions we have?

Are there any other distance functions or smarter functions that we like to use?

It turns out that there are many.

Again, it kind of depends on the application. There are many.

In the next few slides,

I'm going to introduce briefly some of these that we will be using

throughout the course in the latest units of the course.

One commonly used function to measure

similarity in vector spaces is called Intersection Similarity.

Note that unlike the P-norm or Metrics,

which were distance functions they were measuring distances,

Intersection Similarity measures similarity,

So, it's a Similarity function.

We'll see what that means.

Now lets consider two vectors.

In this example, I have vector A and vector B.

And vector A has eight components or eight dimensions or eight bases vectors.

Vector B is in the same space.

So, it also has eight components,

eight dimensions or eight bases vectors.

And for a second let's basically visualize,

let's assume that all of these are positive.

And let's basically visualize them in the form of a bar chart.

So, in this example,

this is the component one,

component two, component three,

component four, component five,

component six, component seven,

component eight, each one corresponds to a different basis vectors.

So, given this, how do we define Intersection Similarity?

Well, Intersection Similarity is defined,

given this bar chart representation in terms of

minimum values of the two vectors for each component,

and maximum values for each vector, for each component.

Let's see an example. So, what do we do?

We start from component one.

We find the minimum of both of them.

We then go to vector two,

we find the minimum of both of them.

We repeat this for each component of the vectors.

So, in this example the blue bars

essentially correspond to minimum value for each component separately.

Okay. What about Max?

Well, we can do the same thing and compare

the two bars for each component of the vectors.

We can start from basis one or component one.

We find the Maximum of both.

We can then go to a basis vector two and we can find the maxim of both.

Then we can go to component three or

basis vector three and we can find the maximum of both,

and then we can go over each of the eight bases vectors like this.

In that case we

get the purple bars or the orange bars in this case,

which gives us the maximum values.

What Intersection Similarity does, is the following.

It states that if I add

these minimum bars to each other and if I add the maximum bars to each other,

so I sum up the minimum values,I sum up the maximum values for each of the components.

If these vectors are similar,

then the minimum values and the maximum values should be similar to each other.

Which means that the summation of these should also be similar to each other.

So, the result should

be similar to one.

That's what would happen if basically I have A1 and B1 close to each other.

If A2, B2 close to each other if B3,

A3 close to each other,

then the blue area and the orange area will be close to each other.

So if I sum up the blue area,

and sum up the orange area,

and if I take the ratio,

I will get something close to one.

If however, the minimum values are very small close to zero.

Maximum values are very large.

If that's the case then,

this division would be close to zero.

Why? Well because, what I'm summing up,

the minimum values are all small values,

the maximum value on the other hand are all large values and I summed them up.

The result would be a value close to zero.

So, essentially, what the Intersection Similarity measures,

is how large the maximum contributions

are relative to the minimal contributions of these components.

If the two vectors are similar they have

similar components then I'm likely to get a value close to one,

so, if I get the value closer one,

I will call the vectors to be similar to each other.

If however, these vectors are very different from each other,

that I'm likely to get a value close to zero.

And if that is the case,

I'm going to refer to these vectors as being dissimilar.

So essentially, in this case not that I'm not measuring how much I

displace in the vector space to go from one Vector point to the other.

Instead, I am looking at

the individual contributions of the different compounds of the vectors,

I sum them up, and then I look at

the ratio of the minimum components to the maximum components,

and that gives me its net value.

So this is called Intersection Similarity,

and this is used very often in cases where the vectors that we are dealing with

have positive values and they can be sort of considered as accounts of things,

because essentially, if you take a look at that,

what I have here is a histogram for an object A,

and another histogram for an Object B.

And we will basically discuss what histogram means,

and we will basically see that we are using

vector spaces to explore data that represented in the form of histograms,

Intersection Similarity is often very effective.

Now, is this the only two measures?

Because so far, we learned about the one norm, two norm,

infinity norm as distance functions and Intersection Similarity is similar to function.

Are there others? Well, there is another measure that is very often

used and it is known as angle-based similarity measures.

So, let's see what this means,

what an angle-based similarity measure means.

In this example, we have a vector space consisting of two basis vectors,

let's call them X, Y and Z.

And in this vector space,

we are given three vectors,

one of these vectors is 5555,

all three components have value five,

the second vector is 2222,

all of the components have values two,

and the third vector in the space has three for the first component,

has two for the second component and has five for the third component.

Now the question is, if I am given A as my query,

A is my starting point,

which of the other objects,

B or C is more similar to my query object A?

So, that's the question. Now, if we go with the distance function measure,

you can kind of see that C looks closer to A because in the vector space,

C is nearer to A.

On the other hand, if you look at the example closely,

you will see that the composition of A is very similar to the composition of B,

A has 555 for all of the dimensions and B has 222 for all three dimensions.

They had very different values,

one of has five, the other one has two,

but if I look at the relative contribution of each of the components,

I can see that A and B are actually quite similar to each other.

So essentially, what angle-based similarity measures does,

is instead of evaluating the distance in the vector space between the two vectors,

they try to measure the composition,

they try to compare the composition of the input vectors.

So in this example,

if I was using an angle-based similarity measure,

I would say that, well, you know what?

A and B are actually very similar to each other because

the composition of A and B are actually very similar to each other,

C on the other hand has a different composition,

so C is not similar to A and not similar to B.

So, how do I measure them?

So, one example is to measure the angle between the two vectors.

In this case, if I basically look at the angle between A and B and if I take the cosine,

I will see that A and B are on the same direction,

so the angle between A and B is zero,

so cosine A and B is cosine zero which is equal to one,

which is the largest cosine value that I can get.

So, essentially, one common used angle-based measure

is the cosine of the angle between the two vectors.

In fact, there are two angle-based measures that are often used.

One of them is, I just mentioned,

it's called cosine similarity and the other one is the dot product similarity.

Mathematically, they are very similar to each other.

In the dot product similarity,

what you do is you take the vector and then you take the product

of each of the components and you add them up,

and that essentially gives you the dot products in that.

If that value is high,

you say that the two vectors are similar to each other and there is a reason for that,

that's essentially the projection of vector A on vector B but we

will not discuss that for now,

but in the later units,

this is going to be an important concept.

And cosine similarity is once again similar

to the dot product similarity because you compute the product between vectors A and B,

you do that as the first step,

but then you also

scale the dot product with the length of vectors A and B.

Why do you scale the lengths between them?

Essentially, what you are doing, if you look at here,

I have vector A,

I have vector B, they of different length but the length doesn't matter.

They have the same composition.

If they have the same composition,

the angle between them is zero,

irrespective of what is the length of A, what the length of B.

So, to eliminate the effect of the length,

we basically divide the dot product

with the length of the vector A and the length of the victor B.

Note that if the vectors are of unit length,

that is vectors have the same length,

the cosine similarity and dot products are the same, because,

remember the cosine similarity is nothing

but dot product scaled with the vector's length.

If the length of vector A is equal to one,

if the length of vector B is equal to one,

it is easy to see that the cosine and dot product are the same.

Okay. So far we have learned three classes of distance and similarity functions.

The first one were the P-norms,

norm one, norm two, norm infinity.

The second one was intersection similarity,

the third one is the angle-based measures

which is cosine and dot product. Are there others?

Well there are others,

and again I'm not going to go over each one of them in this lecture.

However, I would like to highlight three of them

that are used quite often in different applications,

one of them is a similarity measure called Pearson's correlation.

Essentially, Pearson's correlation measures

the strength of the linear association among the corresponding components of two vectors.

So, when you are trying to measure your application,

required to measure the strength of the linear association between

the components of the vectors views what's known as Pearson's correlation.

KL-Divergence is a distance measure and it is

used often if they provided input vectors are probability distributions.

And we will see cases in which the vectors are represented as probability distributions,

but KL-Divergence essentially is

used to measure the divergence between two probability distributions.

A third measure that is used fairly

frequently especially in multimedia applications is called Earth-movers distance,

and once again, Earth-movers distance is used when the vectors that we

are exporting are interpreted again as probability distributions.

Once again, I'm not going to give

the definition of these similarity and distance functions,

but it's important to note that they exist and they

are used in applications where either we are looking for

strength of linear associations or we are

measuring the degree of divergence among probability distributions.

So, to recap, to be able to use vector space for data exploration,

we first need to define a vector space,

that is need to select a set of basis vectors that we will use to represent our data,

and those basis vectors need to be non-redundant and they need to be complete.

Secondly, we need to select a distance or similarity function,

the distance or similarity function that we will use depends on the application,

we can use P-norms,

one norm, two norm,

infinity norm, we can use intersection-based similarity measure,

we can use an angle-based similarity measure or we can use

correlation KL-Divergence or Earth-mover distance

if what we are dealing with are probability distributions.

However, in any case,

what we need to do is we need to understand our application,

we need to understand what our vectors represent,

we need to understand what our vector space represent and we need to

select the appropriate basis vectors and appropriate distance or similar functions.

And I will stop here, thank you for your attention.