This course answers the questions "What is data visualization?" and "What is the power of visualization?" It also introduces core concepts such as dataset elements, data warehouses and exploratory querying, and combinations of visual variables for graphic usefulness, as well as the types of statistical graphs, tools that are essential to exploratory data analysis.

Associate Professor at Arizona State University in the School of Computing, Informatics & Decision Systems Engineering and Director of the Center for Accelerating Operational Efficiency

K. Selcuk Candan

Professor of Computer Science and Engineering Director of ASU’s Center for Assured and Scalable Data Engineering (CASCADE)

Welcome to the next video in the Introduction to Data Exploration unit.

In this video, I will focus on Vector Spaces.

We will learn what vector spaces are,

we will learn what a distance measure in a vector space is,

and we'll also discuss what we call similarity measures.

If you remember from the last videos,

we have seen that one way to represent data,

one way to represent multidimensional data,

is to map it into a vector space.

For example, in this slide we see that we have a set of images and what we do is

we take all of the images and now we somehow map them into what we call a vector space,

and within this vector space we can then go and

measure similarities or distances between these images.

And this can provide a way to

design data retrieval systems as well as data exploration systems.

So, this vector space becomes a representation

in which we can actually explore data, explore complex data.
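
As a minimal sketch of the idea above, assume each image has already been reduced to a short numeric feature vector. The image names, the three features, and their values below are hypothetical, and Euclidean distance and cosine similarity are used only as two common examples of distance and similarity functions, not necessarily the ones used in this course.

```python
import math

# Hypothetical 3-dimensional feature vectors for three images
# (for example, average red, green, and blue intensity); illustrative values only.
images = {
    "sunset": [0.9, 0.4, 0.1],
    "beach": [0.8, 0.5, 0.2],
    "forest": [0.1, 0.7, 0.2],
}

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_name):
    """Return the name of the other image closest to the query image."""
    query = images[query_name]
    others = [name for name in images if name != query_name]
    return min(others, key=lambda name: euclidean_distance(query, images[name]))

print(nearest("sunset"))
```

A retrieval system built this way would answer "find images like this one" by ranking the stored vectors by distance to the query vector.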

An important question, of course, when we are

designing a vector space for exploring data,

is how to define the vector space itself.

And as we see in this slide,

to define a vector space,

we need to identify what we call basis vectors.

So, these are the vectors in the vector space that you use to represent

other objects, and you also define, in

the same space, what we call distance and similarity functions.

So, in this lecture,

we will first focus on the definition of basis vectors,

we will define what a basis vector is.

We will also discuss some of the good features of basis vectors.

So, how do we select these basis vectors?

How do we decide the vectors that we use to represent our complex data?

And in the upcoming units of the data exploration course,

we will also focus on the second question:

how many features do we need to represent our vector space?

How many basis vectors do we need?

How do we select them?

And so on.

In this lecture, we are primarily focusing on the definition of

the basis vectors and also some of the critical properties.

So, what's a vector space?

A vector space is actually nothing but a set of objects.

A set of objects makes up a vector space.

So, this set of objects could be anything,

could be images, could be audio,

could be video, could be social media data,

could be records in a database, could be anything.

However, for us to call this set of objects a vector space,

they need to satisfy certain conditions.

These conditions essentially mean that the vector space needs to have

a set of properties that enable us to operate on it.

One of these is what we call the addition operation.

That is if I have two objects that I represent as vectors,

I should be able to add them and the result should also be an object in the same set.

So, for example,

if I have two objects,

say two images, and I combine them,

the result should be another image object.

A second thing that is critical to define

a vector space is what we call the scaling operation.

That is, I should be able to take a vector,

multiply it by a real number,

positive or negative, and the result should also be a vector in the same space.

That is, I can take any image and I can scale the image.

For example, I can scale the number of

pixels in the image and the result is still an image.

So, scaling is another required operation in the vector space.

The next requirement is that you should have a specific vector,

a specific object in the vector space, that we call the zero object.

This object essentially is

a special object where if we add the zero object to another object,

we get the same object itself.

So, the zero object essentially

has no effect under the addition operation.

In the same way, if we scale an object by the number one,

the real value one, we get the same object itself.

So, that is essentially it.

So, if we define any set of objects, and if we can define addition

and scaling on those objects, and we can identify one of the objects,

actually exactly one of the objects, as the zero object,

then we have a vector space; that's it.

This is essentially the definition.
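
The three ingredients described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: objects are modeled as tuples of four real numbers (think of hypothetical 2x2 grayscale images flattened into pixel intensities), and intensities are treated as unbounded real numbers so that sums and scaled versions always stay in the same set.

```python
# Objects are modeled as tuples of real numbers, e.g. hypothetical
# 2x2 grayscale images flattened into 4 pixel intensities.

def add(u, v):
    """Addition: combine two vectors component by component."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    """Scaling: multiply every component by the real number c."""
    return tuple(c * a for a in v)

ZERO = (0.0, 0.0, 0.0, 0.0)  # the zero object for 4-component vectors

image_a = (0.5, 0.25, 0.0, 1.0)
image_b = (0.25, 0.25, 0.5, 0.0)

combined = add(image_a, image_b)   # still a 4-component vector
dimmed = scale(0.5, image_a)       # still a 4-component vector

print(combined)
print(dimmed)
print(add(image_a, ZERO) == image_a)   # adding the zero object changes nothing
print(scale(1.0, image_a) == image_a)  # scaling by one changes nothing
```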

So, the question is how do we use these definitions,

addition, scalar multiplication, and the zero object, to work with a vector space?

Because this still doesn't say

how we use this vector space for representing

our objects or how to define distance or similarity measures in the vector space.

So, we'll discuss that next.

One of the key concepts in the vector space is linear independence.

We call a set of vectors linearly independent

if the only way to obtain the zero object by adding

properly scaled versions of the vectors,

the only way to combine them to obtain the zero object,

is if all of the scaling factors that we use are zero.

Now, why is this important?

So, it looks like just a requirement;

how is this useful?

So, this is useful because

it gives us a way to define non-redundant vector sets.

Let me give an example. Let's assume that we have

three vectors such that

when I properly scale them, with scaling factors that are not all zero, and I add them together,

I can obtain the zero vector.

Let's assume that I can do that.

Note that, if I can do that and C3 is not zero,

one thing that I can also do is to move

this term to the right-hand side and rewrite the entire equation, C1 V1 + C2 V2 + C3 V3 = 0, as

-(C1 V1 + C2 V2) / C3 = V3.

What does this mean? This means that vector

three essentially is a redundant vector relative to

vector one and vector two, because I can somehow

combine vector one and vector two and obtain vector three.
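
A small worked example may make the rearrangement concrete. The three vectors and the scaling factors below are hypothetical, chosen so that V3 equals V1 plus two times V2; exact fractions are used so the division by C3 introduces no rounding.

```python
from fractions import Fraction

def add(u, v):
    """Component-wise sum of two vectors."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    """Multiply every component of v by the number c."""
    return tuple(c * a for a in v)

# Hypothetical vectors chosen so that v3 = v1 + 2 * v2.
v1 = (Fraction(1), Fraction(0), Fraction(2))
v2 = (Fraction(0), Fraction(1), Fraction(1))
v3 = (Fraction(1), Fraction(2), Fraction(4))

# Scaling factors, not all zero, that combine the three vectors into zero.
c1, c2, c3 = Fraction(1), Fraction(2), Fraction(-1)

total = add(add(scale(c1, v1), scale(c2, v2)), scale(c3, v3))
print(total == (0, 0, 0))  # the combination is the zero vector

# Rearranged form: v3 = -(c1*v1 + c2*v2) / c3, valid because c3 is not zero.
rebuilt_v3 = scale(Fraction(-1) / c3, add(scale(c1, v1), scale(c2, v2)))
print(rebuilt_v3 == v3)
```

Because V3 can be rebuilt from V1 and V2 alone, storing it adds no new information to the set.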

So, what this constraint essentially tells us is that if I

have a set of vectors which satisfies the linear independence condition, then they're non-redundant.

So, this is going to be an important requirement for us.

When we select the basis vectors,

we want them to be non-redundant and we will discuss why that is the case.

But please do remember that for basis vectors,

it is important that they are non-redundant.
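
One possible way to test a candidate set for redundancy is to compute its rank with Gaussian elimination: the vectors are linearly independent exactly when the rank equals the number of vectors. The sketch below is an illustration, not the method used in the course; the function names `rank` and `is_linearly_independent` and the sample vectors are assumptions made for the example.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors via Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank_count = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Find a row at or below rank_count with a nonzero entry in this column.
        pivot = next(
            (r for r in range(rank_count, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank_count + 1, len(rows)):
            factor = rows[r][col] / rows[rank_count][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count

def is_linearly_independent(vectors):
    """True when no vector in the list is a combination of the others."""
    return rank(vectors) == len(vectors)

print(is_linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))
print(is_linearly_independent([(1, 0, 2), (0, 1, 1), (1, 2, 4)]))
```

The second set fails the test because its third vector is the first plus twice the second.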

The second requirement for basis vectors is that,

in addition to being linearly independent, we also want to

be able to describe any vector in our vector space

in terms of some combination of the vectors in the basis vector set.

Now, this is again very important because if you

remember this representation, any vector,

any object that we have in the vector space,

we should be able to describe it in terms of a combination of the basis vectors.

So, what this requirement basically says is that if

we are selecting a set of vectors to be our basis vectors,

we should be able to represent any vector,

any object in that vector space, in terms of a combination of the basis vectors.

So, what does this mean? Let's assume that we have

a vector space and let's assume that,

in the vector space, we have identified V1,

V2 and V3 as linearly independent vectors.

What we also know is that if you give us

any other vector in the vector space, let's say vector A,

we can represent it by adding properly

scaled versions of the vectors in the basis vector set.

This means that given any vector A, I can represent

it as a sequence of scaling values or scaling factors of the basis vectors.
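
To illustrate, the scaling factors for a vector A can be found by solving a small linear system. The sketch below uses Gauss-Jordan elimination with exact fractions; the basis vectors, the vector A, and the helper name `coordinates` are hypothetical, and the code assumes the basis really is linearly independent with as many vectors as components.

```python
from fractions import Fraction

def coordinates(basis, target):
    """Solve c1*b1 + ... + cn*bn = target for the scaling factors c1..cn.

    Assumes `basis` holds n linearly independent vectors with n components each.
    """
    n = len(basis)
    # Augmented matrix: column j holds basis vector j, last column holds the target.
    m = [
        [Fraction(basis[j][i]) for j in range(n)] + [Fraction(target[i])]
        for i in range(n)
    ]
    for col in range(n):
        # Pick a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Normalize the pivot row, then clear this column in every other row.
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

# Hypothetical basis vectors V1, V2, V3 and a vector A to be represented.
V1, V2, V3 = (1, 0, 0), (1, 1, 0), (1, 1, 1)
A = (6, 5, 3)

factors = coordinates([V1, V2, V3], A)
print(factors)
```

Here A equals 1 times V1 plus 2 times V2 plus 3 times V3, so the sequence of scaling factors (1, 2, 3) is the representation of A in this basis. If the supplied vectors were not linearly independent, the pivot search would fail and raise `StopIteration`.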

So, this is essentially it.

This is basically the definition of a vector space.

And this is essentially how we use the basis set of

a vector space to represent the other objects in the vector space.

So, there are two key requirements for the basis vector set.

First, it must be complete;

that is, any object that I want,

I should be able to represent

as a set of scaling factors.

And second, our basis vectors should be non-redundant. Two key requirements.

Now, an important question, of course, arises:

given a data object that I want to explore,

how many features do I extract from it, or how many basis vectors do I need?

And obviously we said that these basis vectors must be non-redundant,

but there might be a small number of non-redundant vectors or.