1:04
So a row of the data matrix now includes all the statistics of interest to us,
the dimensionality here is simply defined by R, and
each row then corresponds to a different microstructure.
So here, in this notation, it is assumed that there are J microstructures.
So, based on what we learned in the previous class,
the simple application of PCA to this data matrix
tells us that we can expect a transformation of the form shown here, and
essentially the alpha_j are now going to be the PC scores.
These are going to describe or quantify the microstructure.
The reduced representation, as we learned in the previous lesson,
simply comes from truncating the principal components, the PC scores or PC weights.
And that decision is based on the eigenvalues you find in the PCA.
In general, we note that the dimensionality reduction
is such that the new dimensionality of the microstructure is
significantly smaller than either J or R; either way, it is much lower.
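As a rough illustration of this step, here is a minimal sketch in Python using scikit-learn; the array X and its dimensions are placeholders standing in for the real data matrix of spatial statistics, not the code used in the lesson:

```python
import numpy as np
from sklearn.decomposition import PCA

# Data matrix: one row per microstructure, one column per spatial statistic.
# Shape is (J, R): J microstructures, R statistics per microstructure.
J, R = 150, 10_000              # arbitrary placeholder sizes
X = np.random.rand(J, R)        # placeholder values, for illustration only

# Fit PCA and keep only the first few components (the truncation step).
pca = PCA(n_components=3)
alpha = pca.fit_transform(X)    # PC scores, shape (J, 3): the reduced
                                # representation of each microstructure

print(alpha.shape)                    # (150, 3)
print(pca.explained_variance_ratio_)  # fraction of variance per component
```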
Let's look at one simple example first.
In this example,
we have five different heat treatments in an alpha beta titanium alloy.
These are simply labeled as HT1, HT2, HT3, HT4, and HT5.
Moreover, we have 20 images in HT1, for example.
These twenty images are obtained from 20 different samples
subjected to heat treatment one.
Likewise, in HT2 we happen to have 28 images,
32 images in HT3, and again 32 in HT4 and in HT5.
So the total number of images is about 150 images.
Now what you are seeing on the right side here is a visualization
of all 150 images, where each point here is one micrograph.
3:17
But we have reduced the dimensionality of each micrograph.
So in this micrograph, there are typically about
2,000 pixels times 1,500 pixels.
So, in other words, each micrograph has three million variables.
So, in the representation on the right, we have only 3 variables.
So, we went all the way from 3 million to 3, and
we represented the microstructure with three important measures for
each case, and there are 150 data points on the plot on the right.
What you see right away is that there is natural clustering
because the measures that came out of one heat treatment tend to
be closer to each other than the measures from different heat treatments.
And this has happened very naturally.
In other words, while doing PCA, we did not tag each image,
saying that this image belongs to HT1 and this image belongs to HT5.
That information was removed from the data matrix, so
the data matrix didn't have that information.
Although the data matrix did not have that information, PCA,
based on the local patterns found in the microstructures,
automatically classified this data set into five groups.
And the 5 groups are shown colored, for convenience of visualization, so
that comes out naturally from PCA.
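A sketch of how such a plot could be produced, assuming `alpha` holds the first three PC scores (one row per micrograph) and `labels` holds the known heat-treatment tag of each micrograph; the labels are used only to color the points afterwards and are never given to PCA:

```python
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# One point per micrograph, placed by its three PC scores and colored by
# its (hypothetical) heat-treatment label purely for visualization.
ax.scatter(alpha[:, 0], alpha[:, 1], alpha[:, 2], c=labels, cmap="tab10")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_zlabel("PC3")
plt.show()
```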
Another feature that comes out quite naturally from PCA is that this
class of microstructures, belonging to one of the heat treatments on the left,
shows much less variance compared to, say, this class of microstructures.
So what it is telling us is that the process responsible for
this set of microstructures actually produces a lot more variance
than the process that produces this set of microstructures.
And that is one of the important low-hanging fruits in terms of benefits
that we get from doing a PCA on the spatial correlations.
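One simple way to put a number on this, continuing the hypothetical `alpha` and `labels` arrays from the sketch above, is to total the variance of the PC scores within each heat-treatment group:

```python
import numpy as np

# A larger total variance in PC space suggests the process that produced
# that group of microstructures is less repeatable.
for ht in np.unique(labels):
    scores = alpha[labels == ht]          # PC scores for one heat treatment
    total_var = scores.var(axis=0).sum()  # sum of per-component variances
    print(f"HT{ht + 1}: total variance in PC space = {total_var:.4f}")
```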
In the previous slide, we only showed three principal components.
Of course, when you do the analysis, you get a lot more than three principal
components.
5:37
The reason we cut it off at three principal components is, again,
remember that the eigenvalues tell us the variance in the dataset.
By keeping three principal components,
we have already captured over 97% of the variance in the dataset.
So we decided that that was good enough.
Of course, if that's not good enough,
you can include higher-order principal component scores.
So if you go to four principal components instead of three, you would
keep about 97.8% of the variance, and so on and so forth.
So one gets this additional information from PCA, which is usually
called a scree plot, and this plot
gives us objective guidance on where to truncate the principal components.
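A minimal sketch of how such a cumulative-variance (scree-style) plot and the truncation decision might look in code, again using scikit-learn on the placeholder data matrix X from the earlier sketch:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA().fit(X)                                  # keep all components
cumvar = np.cumsum(pca.explained_variance_ratio_)   # cumulative variance captured

plt.plot(np.arange(1, len(cumvar) + 1), cumvar, marker="o")
plt.xlabel("Number of principal components")
plt.ylabel("Cumulative fraction of variance explained")
plt.show()

# Smallest truncation that exceeds a chosen threshold, e.g. 97%.
n_keep = int(np.searchsorted(cumvar, 0.97) + 1)
print(n_keep)
```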
Now let's look at another set of examples.
In the previous set of examples,
the microstructures were obtained from actual experiments.
But one can also think of generating a very large set of synthetic
microstructures where you are just digitally making them up.
So for example, one can think of a matrix-precipitate system, two phases; in
this case, the matrix is shown in black and the precipitates are shown in white.
One can think of making many, many classes of distributions.
In this particular one, we're only focused on four classes.
And one can also think of many shapes of inclusions, and
one can think of many volume fractions of interest.
In this particular case study, we generated about 900 structures.
Of course you can generate a lot more but this was a particular example, a case
study, and that case study is described in this paper for further information.
So, nevertheless, in these 900 microstructures there's already
a rich distribution of inclusion shapes,
placement of inclusions, as well as volume fractions.
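One possible way to digitally make up such a two-phase ensemble is sketched below; the choices of circular inclusions, random placement, and 80 x 80 pixels are illustrative assumptions, not the exact recipe used in the case study:

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_microstructure(size=80, n_inclusions=40, radius=3):
    """Place circular white precipitates at random positions in a black matrix."""
    img = np.zeros((size, size), dtype=np.uint8)
    yy, xx = np.mgrid[:size, :size]
    for _ in range(n_inclusions):
        cy, cx = rng.integers(0, size, size=2)
        img[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 1
    return img

ensemble = np.stack([synthetic_microstructure() for _ in range(900)])
print(ensemble.shape)  # (900, 80, 80)
```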
Now take all these 900 microstructures and
throw them into the protocol that we have been learning in this class:
that is, first compute the two-point statistics and
then do the principal component analysis.
So the principal component analysis we discussed in the previous lessons,
applied to these 900 microstructures, produces these plots.
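A compact sketch of that two-step protocol, assuming periodic microstructures and using only the autocorrelation of the white phase; `ensemble` is the hypothetical stack of 900 binary images from the sketch above:

```python
import numpy as np
from sklearn.decomposition import PCA

def autocorrelation(micro):
    """Periodic two-point autocorrelation of a binary microstructure via FFT."""
    F = np.fft.fftn(micro)
    corr = np.fft.ifftn(F * np.conj(F)).real / micro.size
    return np.fft.fftshift(corr)  # put the zero-shift vector at the center

# Step 1: two-point statistics for every microstructure in the ensemble.
stats = np.stack([autocorrelation(m) for m in ensemble])  # (900, 80, 80)

# Step 2: flatten each statistics map into a row and run PCA.
X = stats.reshape(len(stats), -1)   # (900, 6400)
pca = PCA(n_components=3)
scores = pca.fit_transform(X)       # (900, 3) -> PC1, PC2, PC3 per structure
```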
The plots in the top row are projections of
the principal component analysis onto two sets of axes at a time.
So the first plot is showing you PC1 versus PC2,
the second plot is showing you PC1 versus PC3,
and the third plot is showing you PC2 versus PC3.
So, it is actually the same plot.
The original plot is a 3D plot that contains all three scores, PC1, PC2, and PC3, but
what you're seeing are projections, selected projections, of this three-dimensional plot.
And right away, you can see that the five classes of placement of
the precipitates naturally lead to clustering:
five different clusters in the principal component plots.
Again, this comes out naturally.
In doing the PC analysis, we did not tag the microstructures, indicating that
some of these were random, horizontal, vertical, or clustered, or whatever it is.
This information was not provided to the principal component analysis.
In spite of not having that information, the data gets automatically clustered.
One of the benefits of doing the principal component analysis is simply that we get three
principal components,
whereas the original dataset, any one microstructure in this dataset, has 80 x 80 pixels, which
means the original dimensionality of the microstructure is 6,400.
So from 6,400 you went down to 3 principal components, and yet the microstructures
have clustered as expected, even though we did not provide that information.
Now, what are we actually capturing in the principal components?
Here are some plots of what the average looks like.
This is the average of the statistics over all 900 microstructures.
This is a map of the first principal component, the second principal component,
and the third principal component.
So each one of these maps is capturing a particular spatial pattern,
and it is some sort of a signature pattern.
And the PC score associated with each component
then tells you how strong this feature is in the given microstructure.
As an example, if one looks at the autocorrelation of
one of the 900 microstructures, this is what you get.
This is that autocorrelation, and the symbol for it would be this one.
In the truncated principal component representation, we are approximating this
using these terms.
Of course, there are other terms in the principal component analysis,
but we're ignoring those.
What this says is that the pattern represented
by phi_1(r) has this much strength in this particular micrograph.
And likewise, the pattern represented by phi_2(r) has
this much strength in this micrograph, and so on and so forth.
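Continuing the sketch above, the truncated representation described here can be written as the ensemble mean plus the PC-score-weighted basis patterns; in scikit-learn terms (illustrative, not the lesson's own code):

```python
# Approximate the statistics of microstructure j using only the mean and the
# first three weighted principal components (pca and X are from the fit above).
j = 0
alpha_j = pca.transform(X[j:j + 1])[0]          # the three PC scores for this one
approx = pca.mean_ + alpha_j @ pca.components_  # mean + sum_k alpha_k * phi_k(r)
approx_map = approx.reshape(80, 80)             # back to a 2D statistics map
```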
So the advantage of this principal component representation
is that the microstructure is represented by these three numbers.
These three numbers are the weights of the different principal components.
A different microstructure in the same ensemble would have three other numbers,
but every microstructure in that ensemble of 900 microstructures
now has a distinct set of three numbers that points to it.
So the hypothesis, and our hope, is that this
representation is what we need to make connections to the properties and the process.
In summary, we have learned in this lesson that application of PCA on spatial
statistics offers unsupervised classification of material structure.
Although we didn't explicitly state it, all the algorithms that are used in
the analysis of the examples in this lesson are very broadly available.
And, as a specific example, they're easily accessible through
the pymks.org code repository.
There are also other open-access,
open-source repositories that provide similar functionality, of course.
The PCA analysis also allows objective quantification of variance within
the microstructure ensembles.
Because the calculations are very cheap and computationally very efficient,
they can be attached to almost any in-line analytics; this is especially
useful in expensive experiments as well as in expensive simulations.
Thank you.
[MUSIC]