So given those distances, dij, between data points i and j,
between city i and city j, can we recover the positions of those cities?
So I'm not taking the coordinates of those cities,
I'm just taking the distances between those cities.
Basically taking a graph where I'm specifying the length of the edges between
the nodes.
And then I want to find the positions of the nodes that satisfies
those edge lengths.
So we can do that by minimizing this function.
We basically take the distance between the two data points,
and then we want to subtract their actual distance, and we want to minimize that.
So if the distance between two data points equals the desired distance
between those two data points, then this should be close to zero.
And we square things here so that we keep everything positive so
that negative values aren't messing up our minimization.
So we can solve this with any number of non-linear optimization methods.
In the example I'm going to describe, I solved it just using
the Excel spreadsheet program with an optimizer plug-in.
So here's the results that I got,
I minimized that distance relationship between the actual
distance from my optimized variables and the desired distance.
And as you can see, things are obviously in the wrong place.
I've got the East Coast on the left and West Coast on the east, but
that should still preserve distance.
And there's a few other differences,
but basically I've got this general east to west trend.
Seattle is above SF and LA, Denver is here, Chicago is where it's at.
So I've preserved the distance of my
points even though I haven't preserved the exact geometry of the points.
And so this can be a useful layout method, especially for high dimensional data when
you're trying to preserve the distances between the points without necessarily
representing accurately their spatial configuration, their original coordinates.
So we've thrown away the original coordinates of these points.
And just using their distance relationship,
tried to preserve that in a reprojection, in this case in two dimensions.
So there's all sorts of ways you can use MDS, multidimensional scaling.
You can visualize affinities between data points,
areas of collaboration based on co-authorship of papers.
If you're visualizing papers or the number of any other attributes two data
points might have in common, anything you can describe as a distance.
It doesn't have to be coordinate distance, it can be other similarities.
It's used in human-computer interaction for
user interface design to basically lay out buttons.
If you have a dashboard with a lot of buttons, you can lay out the buttons by
organizing them based on how similar the buttons are to a given task.
And then,
let multidimensional scaling compute the coordinates of where the buttons should be
located so that buttons that are used together are close to each other.
And it's also used in marketing to create what are called perceptual maps
that are based on survey data of what people think of different products,
what attributes people assign to different products.
And then you can figure out what products are similar to other products and
create a map of products that way.
So, multidimensional scaling is enabled by optimization.
You don't have to write your own optimization program,
you can use pre-existing optimization packages.
You just need something that's going to minimize a function, and so
you need some form of non-linear minimization in an optimization package.
I found one that was enabled in Excel in order to do
the example I described in the slides.
[MUSIC]