1:04

So a row of the data matrix now includes all the statistics of interest to us, and

the dimensionality here is simply denoted by R, and

each row then corresponds to a different microstructure.

So here, in this notation, it is assumed that there are J microstructures.

So based on what we learned in the previous class,

the simple application of PCA to this data matrix

tells us that we can expect the transformation shown here, and

essentially the alpha_j are now going to be the PC scores.

These are going to describe or quantify the microstructure

The reduced representation, as we learned in the previous lesson,

simply comes by truncating the principal components, that is, the PC scores or PC weights.

And that decision is based on the eigenvalues you find in the PCA.

In general, we note that the dimensionality reduction

is such that the new dimensionality of the microstructure is

significantly smaller than either J or R, in either case much lower.
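As a concrete sketch of this truncation, here is a minimal PCA of a J x R data matrix using the singular value decomposition. This is an illustration only; the function name, shapes, and random data are made up, not taken from the lecture.

```python
import numpy as np

def pca_truncate(X, n_keep):
    """PCA of a (J x R) data matrix; keep the first n_keep components.

    Each row of X holds the R statistics of one microstructure.
    Returns the PC scores (the alpha_j), the principal components,
    the ensemble mean, and the fraction of variance per component.
    """
    x_mean = X.mean(axis=0)                # ensemble-averaged statistics
    xc = X - x_mean                        # center the data matrix
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :n_keep] * s[:n_keep]    # alpha_j, shape (J, n_keep)
    components = vt[:n_keep]               # PC basis, shape (n_keep, R)
    explained = s**2 / np.sum(s**2)        # variance fraction per PC
    return scores, components, x_mean, explained

# made-up ensemble: J = 150 microstructures, R = 500 statistics each
rng = np.random.default_rng(0)
X = rng.standard_normal((150, 500))
scores, components, x_mean, explained = pca_truncate(X, 3)
```

Keeping only the first few columns of `scores` is exactly the dimensionality reduction described above: each microstructure is then represented by just a handful of numbers.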

Let's look at 1 simple example first.

In this example,

we have five different heat treatments in an alpha beta titanium alloy.

These are simply labeled as HT1, HT2, HT3, HT4, and HT5.

Moreover, we have 20 images in HT1, for example.

These twenty images are obtained from 20 different samples

subjected to heat treatment one.

Likewise, in HT2 we happen to have 28 images,

32 images in HT3, and again 32 in HT4 and in HT5.

So the total number of images is about 150.

Now what you are seeing on the right side here is a visualization

of all 150 images, where each point here is one micrograph.

3:17

But we have reduced the dimensionality of each micrograph.

So in this micrograph, there are typically about

2,000 pixels times 1,500 pixels.

So, in other words, each micrograph has three million variables.

So, in the representation on the right, we have only 3 variables.

So, we went all the way from 3 million to 3 and

we represented the microstructure with three important measures for

each case, and there are 150 data points on the plot on the right.

What you see right away is that there is natural clustering

because the measures that came out of one heat treatment tend to

be closer to each other than the measures from different heat treatments.

And this has happened very naturally.

In other words, while doing PCA, we did not tag each image,

saying that this image belongs to HT1 and this image belongs to HT5.

That information was removed from the data matrix, so

the data matrix didn't have that information.

Although the data matrix did not have that information, PCA,

based on the local patterns found in the microstructures,

automatically classified this dataset into five groups.

And the five groups are shown in color, for convenience of visualization; so

that grouping comes out naturally from PCA.

Another feature that comes out quite naturally from PCA is that this

class of microstructures, belonging to one of the heat treatments on the left,

shows much less variance compared to, say, this class of microstructures.

So what it is telling us is that the process responsible for

this set of microstructures actually produces a lot more variance

than the process that produces this set of microstructures.

And that is one of the important low-hanging fruits in terms of the benefits

that we get from doing a PCA on the spatial correlations.
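That variance comparison can be made quantitative in PC space, for example as the trace of each class's covariance matrix. The PC scores below are randomly generated stand-ins for illustration, not the actual heat-treatment data.

```python
import numpy as np

def cluster_spread(scores):
    """Total variance of one class in PC space: the trace of the
    covariance matrix of that class's PC scores."""
    return float(np.trace(np.cov(scores, rowvar=False)))

# stand-in PC scores for two classes: one tight, one widely spread
rng = np.random.default_rng(1)
tight = rng.normal(loc=0.0, scale=0.2, size=(20, 3))
spread = rng.normal(loc=2.0, scale=1.0, size=(32, 3))

# the process behind `spread` produces far more variance than `tight`
low_variance = cluster_spread(tight)
high_variance = cluster_spread(spread)
```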

In the previous slide, we only showed three principal components.

Of course, when you do the analysis, you get a lot more than three principal

components.

5:37

The reason we cut it off at three principal components is, again,

remember that the eigenvalues tell us the variance in the dataset.

By keeping three principal components,

we have already captured over 97% of the variance in the dataset.

So we decided that that was good enough.

Of course, if that's not good enough,

you can include higher-order principal component scores.

So, if you go to four principal components instead of three, you would

keep about 97.8% of the variance, and so on and so forth.

So one gets this additional piece of information from PCA, which is usually

called a scree plot, and this plot

gives us objective guidance on where to truncate the principal components.
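That decision rule can be written down directly: keep the smallest number of components whose cumulative explained variance reaches the target. The eigenvalues below are made-up illustrative values; 97% is the threshold mentioned above.

```python
import numpy as np

def n_components_for(eigenvalues, target=0.97):
    """Smallest number of principal components whose cumulative
    explained variance reaches `target` (a fraction in (0, 1])."""
    frac = np.asarray(eigenvalues, dtype=float)
    frac = frac / frac.sum()              # variance fraction per PC
    cumulative = np.cumsum(frac)          # running total, as in a scree plot
    return int(np.searchsorted(cumulative, target) + 1)

eigenvalues = [60.0, 25.0, 13.0, 0.8, 0.7, 0.5]   # illustrative values
n_components_for(eigenvalues, target=0.97)         # -> 3 (first three reach 98%)
```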

Now let's look at another set of examples.

In the previous set of examples,

the microstructures were obtained from actual experiments.

But one can also think of generating a very large set of synthetic

microstructures where you are just digitally making them up.

So, for example, if one were to think of a matrix-precipitate system, two phases, so

in this case, the matrix is shown in black and the precipitates are shown in white.

One can think of making many, many classes of distributions.

In this particular one, we're only focused on four classes.

And one can also think of many shapes of inclusions, and

one can think of many volume fractions of interest.

In this particular case study, we generated about 900 structures.

Of course you can generate a lot more but this was a particular example, a case

study, and that case study is described in this paper for further information.

So, nevertheless, in these 900 microstructures there is already

a rich distribution of inclusion shapes,

placement of inclusions, as well as volume fractions.

If you take all these 900 microstructures and

throw them into this protocol that we have been learning in this class,

that is, first compute the two-point statistics and

then do the principal component analysis.
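The first step of that protocol, computing two-point statistics, can be sketched with an FFT-based autocorrelation for a periodic binary microstructure. This is a minimal illustration of the idea, not pymks's actual API, and the 80x80 random structure is made up.

```python
import numpy as np

def two_point_autocorrelation(m):
    """Periodic two-point autocorrelation of a 0/1 phase indicator array.

    Uses the convolution theorem: correlate m with itself via FFTs and
    normalize by the number of pixels, so each entry is the probability
    that both ends of the vector r land in the phase marked 1.
    """
    f = np.fft.fft2(m)
    corr = np.fft.ifft2(f * np.conj(f)).real / m.size
    return np.fft.fftshift(corr)          # move r = 0 to the center

# made-up 80x80 two-phase structure, roughly 30% white precipitate
rng = np.random.default_rng(2)
micro = (rng.random((80, 80)) < 0.3).astype(float)
stats = two_point_autocorrelation(micro)

# at r = 0 the autocorrelation equals the phase's volume fraction
center = stats[40, 40]
```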

The principal component analysis we discussed in the previous lessons,

applied to these 900 microstructures, yields these plots.

The plots in the top row are projections of

the principal component analysis onto two sets of axes at a time.

So the first plot is showing you PC1 versus PC2.

The second plot is showing you PC1 versus PC3.

And the third plot is showing you PC2 versus PC3.

So, it is actually the same plot.

The original plot is a 3D plot that contains all three scores, PC1, PC2, and PC3, but

what you are seeing are selected projections of this 3-dimensional plot.

And right away, you can see that the five classes of placement of

the precipitates naturally lead to clustering:

five different clusters in the principal component plots.

Again, this comes out naturally.

In the PC analysis, we did not tag the microstructures, indicating that

some of these were random, horizontal, vertical, clustered, or whatever it is.

This information was not provided to the principle component analysis.

In spite of not having that information, the data gets automatically clustered.

One of the benefits of doing the principal component analysis is simply that we get three

principal components.

Whereas the original dataset, any one of these microstructures, has 80x80 pixels, which

means the original dimensionality of the microstructure is 6400.

So from 6400 we went down to 3 principal components, and yet the microstructures

have clustered as expected, even though we did not provide that information.

Now what are we actually capturing in the principal components?

Here are some plots of what the average looks like.

This is the average over all the 900 microstructures, that is,

the average of the statistics of all the 900 microstructures.

This is a map of the first principal component, the second principal component,

and the third principal component.

So each one of these maps is capturing a particular spatial pattern.

And this is some sort of a signature pattern.

And the PC score associated with each component

then tells you how strong this feature is in the given microstructure.

As an example, if one looks at the autocorrelation of

one of the 900 microstructures, this is what you get.

This is the autocorrelation, and the symbol for it would be this one.

In the truncated principal component representation, we are approximating this

using these terms.

Of course, there are other terms in the principal component analysis,

but we are ignoring those.

What this says is that the pattern represented

by phi_1(r) has this much strength in this particular micrograph.

And likewise, the pattern represented by phi_2(r) has

this much strength in this micrograph, and so on, so forth.
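The approximation being described, statistics roughly equal to the ensemble mean plus a weighted sum of the PC patterns phi_j(r) with the PC scores alpha_j as weights, can be sketched as follows. All arrays here are random stand-ins for illustration, not the lecture's data.

```python
import numpy as np

rng = np.random.default_rng(3)
R = 6400                                  # 80x80 statistics, flattened

mean_stats = rng.random(R)                # stand-in for the ensemble average
# three orthonormal stand-in PC patterns phi_j(r), one per row
phi = np.linalg.qr(rng.standard_normal((R, 3)))[0].T
alpha = np.array([1.5, -0.7, 0.2])        # PC scores of one microstructure

# truncated representation: f(r) ~= mean(r) + sum_j alpha_j * phi_j(r)
f_approx = mean_stats + alpha @ phi

# the scores are recovered by projecting the deviation onto the patterns
recovered = (f_approx - mean_stats) @ phi.T
```

Because the patterns are orthonormal, projecting any microstructure's statistics onto them recovers its three weights, which is why each microstructure maps to a unique small set of numbers.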

So the advantage of this principal component representation

is that the microstructure is represented by these three numbers.

These three numbers are the weights of the different principal components.

A different microstructure in the same ensemble would have three other numbers,

but every microstructure in that ensemble of 900 microstructures

now has a distinct set of three numbers that points to it.

So the hypothesis, and our hope, is that this

representation is what we need to make connections to the properties and the process.

In summary, we have learned in this lesson that application of PCA on spatial

statistics offers unsupervised classification of material structure.

Although we didn't explicitly state it, all the algorithms that are used in

the analysis of the examples in this lesson are very broadly available.

And as a specific example, they are easily accessible through

the pymks.org code repository.

There are also other open-access,

open-source repositories that provide similar functionality, of course.

The PCA analysis also allows objective quantification of variance within

the microstructure ensembles.

Because the calculations are very cheap and computationally very efficient,

they can be attached to almost any in-line analytics; this is especially

useful with expensive experiments as well as with expensive simulations.

Thank you.

[MUSIC]