0:07

This lecture is about using a time series

as context to potentially discover causal topics in text.

In this lecture, we're going to continue discussing Contextual Text Mining.

In particular, we're going to look at the time series as a context for

analyzing text, to potentially discover causal topics.

As usual, let's start with the motivation.

In this case, we hope to use text mining to understand a time series.

Here, what you are seeing is the Dow Jones Industrial Average stock price curve.

And you'll see a sudden drop here.

Right.

So one would be interested in knowing what might have caused the stock market to crash.

0:48

Well, if you know the background, you might be able to figure it out by looking at the time stamp, or there are other data that can help us think about it.

But the question here is can we get some clues about this

from the companion news stream?

And we have a lot of news data that was generated during that period.

1:08

So if we do that, we might actually discover that the crash happened at the time of the September 11 attacks, and that's the time when there was a sudden rise, in news articles, of the topic about September 11.

1:26

Here's another scenario, where we want to analyze a presidential election. This is a time series from a presidential prediction market, for example the Iowa Electronic Markets, which has a stock for each candidate. If you believe one candidate will win, then you tend to buy the stock for that candidate, causing the price for that candidate to increase. So that's a nice way to actually survey people's opinions about these candidates.

2:10

Or in a social science study, you might be interested in knowing what mattered in this election, what issues really mattered to people.

Now again in this case,

we can look at the companion news stream and ask the question.

Are there any clues in the news stream that might provide insight about this?

So for example, we might discover that mentions of tax cut

2:35

have been increasing since a certain point. So maybe that's related to the drop in the price.

So all these cases are special cases of the general problem of joint analysis of text and time series data to discover causal topics.

The input in this case is time series plus

text data that are produced in the same time period, the companion text stream.

3:28

Now we call these topics Causal Topics.

Of course, they're not, strictly speaking, causal topics.

We are never going to be able to verify whether they are causal, or

there's a true causal relationship here.

That's why we put causal in quotation marks.

But at least they are correlated topics that might potentially explain the cause, and humans can certainly further analyze such topics to understand the issue better.

3:59

And the output would contain topics, just as in topic modeling. But we hope that these topics are not just the regular topics a topic model would give us. These topics don't have to explain the text data the best; rather, they have to explain the data in the text, meaning that they have to represent meaningful topics in the text. But also, more importantly, they should be correlated with the external time series that's given as the context.

So to understand how we solve this problem, let's first just try to solve it with a regular topic model, for example PLSA. We can apply this to the text stream, with some extension like CPLSA, or Contextual PLSA. Then we can discover the topics, check their correlation with the time series, and also discover their coverage over time.

5:05

But this approach is not going to be very good. Why? Because we are restricted to the topics discovered by PLSA or LDA, and that means the choice of topics will be very limited. We know these models try to maximize the likelihood of the text data, so the topics tend to be the major topics that explain the text data well, and they are not necessarily correlated with the time series. Even if we take the best one, the most correlated topic might still not be very good.

5:37

So in the work cited here, a better approach was proposed, called Iterative Causal Topic Modeling. The idea is to iteratively adjust the topics discovered by topic models, using the time series to induce a prior.

6:09

Then we're going to use the external time series to assess which topics are more causally related, or correlated, with it, so that we have a way to rank them. We might find that Topic 1 and Topic 4 are more correlated, while Topic 2 and Topic 3 are not.

Now, we could have stopped here, and that would be just like the simple approach that I talked about earlier: we take these topics and call them causal topics. But, as I also explained, these topics are unlikely to be very good, because they are general topics that explain the whole text collection. They are not necessarily the topics best correlated with our time series.

6:51

So what we can do in this approach is to first zoom into the word level. We look at each word in the top-ranked word list of a topic. Let's say we take Topic 1 as the target to examine. We know Topic 1 is correlated with the time series, or is at least the best that we could get from this set of topics so far.

7:23

And if the topic is correlated with the Time Series,

there must be some words that are highly correlated with the Time Series.

So here, for example, we might discover W1 and W3 are positively

correlated with Time Series, but W2 and W4 are negatively correlated.

7:41

Now, as a topic, it's not good to mix words with different correlations, so we can further separate these words. We take all the red words that indicate positive correlation, W1 and W3, as one subtopic, and we get another subtopic with the negatively correlated words, W2 and W4.

8:07

Now, these subtopics, or these variants of the topic based on the correlation analysis, are still quite related to the original Topic 1. But they are already deviating from it, because of the use of the time series information to bias the selection of words. So they are, in some sense, and we should expect so, more correlated with the time series than the original Topic 1, because Topic 1 has mixed words and here we have separated them.

8:46

So they can be expected to be better correlated with the time series. However, they may not be as semantically coherent.
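This word-splitting step can be sketched as follows. This is a minimal illustration with made-up names and toy data, assuming we already have each word's frequency-over-time series: we split a topic's top words by the sign of their Pearson correlation with the external series.

```python
import numpy as np

def split_topic_words(word_series, external_series, top_words):
    """Split a topic's top words into positively / negatively correlated
    subtopics, based on the Pearson correlation between each word's
    frequency-over-time series and the external time series."""
    positive, negative = [], []
    for w in top_words:
        x = np.asarray(word_series[w], dtype=float)
        y = np.asarray(external_series, dtype=float)
        r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
        (positive if r >= 0 else negative).append((w, r))
    return positive, negative

# Toy example: w1 and w3 move with the series; w2 and w4 move against it.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
series = {
    "w1": [1, 2, 3, 4, 5],
    "w2": [5, 4, 3, 2, 1],
    "w3": [2, 4, 5, 8, 9],
    "w4": [9, 7, 6, 3, 1],
}
pos, neg = split_topic_words(series, y, ["w1", "w2", "w3", "w4"])
print([w for w, _ in pos])  # ['w1', 'w3']
print([w for w, _ in neg])  # ['w2', 'w4']
```

The two returned word lists correspond to the two subtopics that are then fed back as priors.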

So the idea here is to go back to the topic model, using each of these subtopics as a prior to further guide the topic modeling. That is, we ask our topic model to discover topics that are very similar to each of these two subtopics. This will bias the model toward topics that are more correlated with the time series. Then we can apply the topic model to get another generation of topics, which can again be ranked based on the time series to select the highly correlated ones. We can then further analyze the component words of each topic, analyze the word-level correlations, and get even more correlated subtopics that can be further fed into the process as priors to drive the topic discovery.

9:46

So this whole process is just a heuristic way of optimizing both causality and coherence, and that's our ultimate goal. Right? So here you see, pure topic models will be very good at maximizing topic coherence; the topics will all be meaningful.

10:02

If we only use a causality test, or a correlation measure, then we might get a set of words that are strongly correlated with the time series, but they may not necessarily mean anything; they might not be semantically connected. So that would be the other extreme, at the top here.

10:21

Now, the ideal is to get causal topics that score high in both topic coherence and causal relation. This approach can be regarded as an alternating way of maximizing both dimensions. When we apply the topic models, we are maximizing coherence. But when we decompose the topics into sets of words that are strongly correlated with the time series, selecting the most strongly correlated words, we are pushing the model back along the causality dimension, to make it score better on causality. And then, when we apply the selected words as a prior to guide the topic modeling, we again go back to optimizing coherence, because the topic model will ensure that the next generation of topics is coherent. So we can iterate, alternately optimizing in this way, as shown in this picture.

11:20

So the only component of this framework that you haven't seen yet is how to measure the causality, because the rest is just topic modeling. So let's have a little discussion of that. Here is the setup.

Let's say we have a topic about government response here. Then, with topic modeling, we can get the coverage of this topic over time. So we have a time series, X sub t.
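One simple way to obtain such a coverage series from topic model output can be sketched as below. The variable names and the averaging scheme are assumptions for illustration, not necessarily the exact method used in the cited work: we average each document's topic probability within each time bin.

```python
import numpy as np

# Hypothetical setup: doc_topic[d] is P(topic | doc d) from a topic model,
# and doc_time[d] is the time bin of doc d. The coverage series X_t of a
# topic is the average probability of that topic over the docs in bin t.
def topic_coverage_series(doc_topic, doc_time, topic, n_bins):
    doc_topic = np.asarray(doc_topic, dtype=float)
    doc_time = np.asarray(doc_time)
    return np.array([
        doc_topic[doc_time == t, topic].mean() if np.any(doc_time == t) else 0.0
        for t in range(n_bins)
    ])

# Toy example: 4 documents over 2 time bins, 2 topics.
doc_topic = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]
doc_time = [0, 0, 1, 1]
x_t = topic_coverage_series(doc_topic, doc_time, topic=0, n_bins=2)
print(x_t)  # [0.8 0.3]
```

The resulting vector is the X sub t series that gets compared against the external time series.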

11:43

Now, we are also given a time series that represents external information. It's a non-text time series, Y sub t, for example the stock prices. The question here is: does Xt cause Yt?

12:08

There are many measures that we can use in this framework. For example, Pearson correlation is a commonly used measure. And we have to consider a time lag here, so that we can try to capture a causal relation, using data in the past

12:26

to correlate with the data points of Y that represent the future. By introducing such a lag, we can hopefully capture some causal relation, even using correlation measures like Pearson correlation.
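A minimal sketch of such a lagged Pearson correlation, with a hypothetical toy series where y simply follows x one step later:

```python
import numpy as np

def lagged_pearson(x, y, lag):
    """Pearson correlation between x at time t and y at time t + lag,
    so a positive lag asks whether the topic series x 'leads' y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    return np.corrcoef(x, y)[0, 1]

# Toy example: y is x shifted forward by one step, so the correlation
# at lag 1 is perfect while the unlagged correlation is weaker.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
y = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
print(round(lagged_pearson(x, y, lag=1), 3))  # 1.0
```

Scanning over several lag values and keeping the strongest one is a common way to use this in practice.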

12:52

Another commonly used measure is the Granger causality test, and the idea of this test is actually quite simple. Basically, you build an autoregressive model that uses the history of Y to predict Y itself; this is the best we can do without any other information. So we build such a model, and then we add some history information of X into it, to see whether we can improve the prediction of Y. If we can, with a statistically significant difference, then we say X has some causal influence on Y.

13:32

If, on the other hand, the difference is insignificant, that would mean X does not really have a causal relation with Y. So that's the basic idea.
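The nested-regression idea above can be sketched with plain least squares as follows. The data and helper names are made up for illustration; a real application would compare the resulting F statistic against the F distribution for a p-value, for instance via a statistics package such as statsmodels.

```python
import numpy as np

def lagged_matrix(s, lag):
    # Columns: s lagged by 1 .. lag, aligned with the target s[lag:].
    return np.column_stack([s[lag - k: len(s) - k] for k in range(1, lag + 1)])

def rss(X, Y):
    # Residual sum of squares of an OLS fit of Y on [intercept, X].
    X = np.column_stack([np.ones(len(Y)), X])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return float(resid @ resid)

def granger_f_stat(y, x, lag=1):
    """F statistic of the Granger test for 'x causes y': compare an AR
    model of y on its own lags (restricted) with one that also uses lags
    of x (full). A large F means x's history improves the prediction of y
    beyond what y's own history gives."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    Y = y[lag:]
    restricted = lagged_matrix(y, lag)
    full = np.column_stack([restricted, lagged_matrix(x, lag)])
    rss_r, rss_f = rss(restricted, Y), rss(full, Y)
    n, extra, k_full = len(Y), lag, 1 + 2 * lag
    return ((rss_r - rss_f) / extra) / (rss_f / (n - k_full))

# Toy example: y is driven by the previous value of x (plus a small
# alternating disturbance), so x should strongly "Granger-cause" y.
x = np.array([1, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5, 7], dtype=float)
y = np.zeros_like(x)
for t in range(1, len(x)):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + (0.1 if t % 2 else -0.1)
print(granger_f_stat(y, x, lag=1))  # large: x's history helps predict y
```

The restricted-versus-full comparison is exactly the "improve the prediction of Y" step described above.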

Now, we don't have time to explain this in detail, but you can read the cited reference here to learn more about this measure. It's a very convenient measure that has many applications.

13:55

So next, let's look at some sample results generated by this approach. Here the text data is New York Times articles, from June 2000 through December 2011, and the time series we used is the stock prices of two companies, American Airlines and Apple. The goal is to see whether, by injecting the time series as context, we can actually get topics biased toward the time series. Imagine if we don't use any input, any context: then the topics discovered from the New York Times by PLSA would just be the general topics that people talk about in the news, the major topics in the news of that period.

14:41

But here you see these topics are indeed biased toward each time series. In particular, if you look at the underlined words in the American Airlines result, you see airlines, airport, air, united, trade, terrorism, etc.

So it clearly has topics that are more correlated with the external time series.

On the right side,

you see that some of the topics are clearly related to Apple, right.

So you can see computer, technology, software, internet, com, web, etc.

So that just means the time series

has effectively served as a context to bias the discovery of topics.

From another perspective, these results tell us what people have talked about in each case. Not just what people have talked about in general, but which topics might be correlated with the stock prices. So these topics can serve as a starting point for people to look further into the issues and find the true causal relations.

Here are some other results, from analyzing presidential election time series. The time series data here is from the Iowa Electronic Markets, a prediction market. The text data is again the New York Times, from May 2000 to October 2000, covering the 2000 presidential election campaign. Now, what you see here are the top three words of significant topics from the New York Times.

16:21

And if you look at these topics, they are indeed quite related to the campaign.

Actually the issues are very much related to

the important issues of this presidential election.

Now here I should mention that the text data has been filtered by using

only the articles that mention these candidate names.

16:53

But the results here clearly show that the approach can uncover some

important issues in that presidential election.

So tax cut, oil energy, abortion and gun control are all known

to be important issues in that presidential election.

And that was supported by some literature in political science.

17:35

So there are two suggested readings here. One is the paper about this iterative topic modeling with time series feedback, where you can find more details about how this approach works. The second is a reading about the Granger causality test.

17:55

So in the end, let's summarize our discussion of text-based prediction. Text-based prediction is generally very useful for big data applications that involve text, because it can help us infer new knowledge about the world, and that knowledge can go beyond what's discussed in the text.

18:28

Text data is often combined with non-text data for prediction, because for this purpose, the prediction purpose, we generally would like to combine non-text data and text data, using as many clues as possible. So, as a result, joint analysis of text and non-text data is necessary, and it's also very useful.

Now, when we analyze text data together with non-text data, we can see that they help each other. Non-text data provides context for mining text data, and we discussed a number of techniques for contextual text mining. On the other hand, text data can also help interpret patterns discovered from non-text data, and this is called pattern annotation.