此课程是为影响转变数据成为更好的决定的想法而设计。最近在数据采集技术上的显著提升改变了公司进行有效决定的方式。

Loading...

来自 University of Pennsylvania 的课程

运营分析

个评分

此课程是为影响转变数据成为更好的决定的想法而设计。最近在数据采集技术上的显著提升改变了公司进行有效决定的方式。

从本节课中

Predictive Analytics, Risk

How can you evaluate and compare decisions when their impact is uncertain? In this module you will learn how to build and interpret simulation models that can help you to evaluate complex business decisions in uncertain settings. During the week, you will be introduced to some common measures of risk and reward, you’ll use simulation to estimate these quantities, and you’ll learn how to interpret and visualize your simulation results.

- Senthil VeeraraghavanAssociate Professor of Operations, Information and Decisions

The Wharton School - Sergei SavinAssociate Professor of Operations, Information and Decisions

The Wharton School - Noah GansAnheuser-Busch Professor of Management Science, Professor of Operations, Information and Decisions

The Wharton School

In session one, we compared decision making processes in business settings with

and without significant uncertainty.

We have used the wireless data plan problem as an example

illustrating a high uncertainty business setting.

We have learned about reward and risk measures, and in this session,

we will look at how we can estimate the values of reward and

risk for any course of action we choose using a simulation toolkit.

Let's begin.

Okay, in section two, we're going to build an algebraic model for

evaluating a wireless data plan, and then use the simulation toolkit

to obtain estimates of the reward and the risk measures associated with this plan.

Here's a reminder of the business context we're looking at.

The current wireless data plan that our consultant has charges $10 per

gigabyte of data.

The new data plan charges a fixed fee for all data usage up to 20 gigabytes,

and then it also charges $15 per gigabyte of data above 20.

For example, if the data usage in a particular month is 22 gigabytes,

she'll have to pay $160 for the first 20 gigabytes and $30 for the next two.

So her total payment in such month will be $190.

If, on the other hand, her data usage in a particular month

does not exceed 20 gigabytes, her monthly payment will be just $160.

Based on historical data usage values, our consultant estimates

that her family's monthly data usage is normally distributed with mean 23 and

a standard deviation of 5 gigabytes.

So she also knows the distribution of her monthly payments under the current plan,

normal with mean $230 and a standard deviation of $50.

What about the distribution of monthly payments under the new plan?

If we want to estimate the reward and the risk measures for

this new plan, in other words, the expected monthly payment and

the standard deviation of monthly payments, what are they?

Let us use the verbal description of the new plan to connect the random input,

data usage U to the random output,

whose probability distribution we're interested in, monthly payment P.

If U is below or at 20, then the monthly payment P is 160.

If U is above 20, then the monthly

payment is 160 + 15 x (U- 20).

In other words, extra payment of $15 for every gigabyte above 20.

Now we can combine both cases using one EXCEL formula, IF.

The IF formula has the following form,

IF(Condition, Choice1, Choice2).

This function evaluates condition and IF happens to be true,

then U becomes equal to Choice1.

And if the condition happens to be false, then IF becomes equal to Choice2.

So in this case, if U is greater than 20,

then IF selects Choice1, which is 15 x (U- 20).

Otherwise, it selects Choice2, no extra payment.

We have a formula that expresses the monthly payment P as a function of

monthly data usage U.

So let's return to our main question.

If U is distributed as a normal random variable with mean 23 gigabytes and

standard deviation of 5 gigabytes, what is the probability distribution of P?

What reward value does our consultant get if she subscribes to the new plan?

What risk will she be facing?

As far as the reward is concerned, the expected payment value, can we just plug

in the expected value of U 23 into our formula, and get the expected value of P?

It's very tempting to try and

get the estimate of the expected value of a random key performance indicator

by replacing all the random factors it depends on by the expected values.

For example, if we plug in 23 gigabytes into that formula, we get a value of $205.

Is this the expected value of our monthly payment P?

Unfortunately, while there are cases when you can get the correct estimate for

the expected value of a key performance indicator in this way,

in many other cases, this is the wrong way to go.

If you have a contract where payment, P, is a linear function of U,

in other words, P is equal to U times constant plus or

minus another constant, then this approach will work.

For example, in the old data plan, our consultant pays $10 for each gigabyte.

So under that plan, the monthly payment P is equal to 10 times the data usage U.

10 times U is a linear function of U, and the expected value of P in this case is

just 10 times the expected value of U, or 10 times 23, which is 230.

But here's a simple example showing us how it all can go wrong.

Let's say our data usage U can only take 2 values,

each with 50% probability, 18 gigabytes and 28 gigabytes.

The expected value of U under this probability distribution is still 23

gigabytes, and the standard deviation is still 5 gigabytes.

Let's see what kind of distribution we will get for

the monthly payment amount, P.

If U is 18, then P is equal to 160.

If U is 28, then P is 280.

The expected value of P, is 0.5

x 160 + 0.5 x 280, $220.

Now this is very different from $205 we obtained earlier by

replacing the random variable, U, by its expectation.

Please keep this observation in mind.

In general, we cannot simply replace random variables in our formulas and

hope to get the right estimates of the key performance measures.

So we're back to asking the same question.

What are the reward and risk measures that describe our new plan?

How do we estimate them?

The answer is use simulation.

Simulation is a tool for converting probability distributions of random

factors we cannot perfectly control, like data usage, into probability

distributions for outcomes we're interested in, like monthly payment.

In simulation, we'll call random factors like data usage random inputs, and

outcomes with distribution would like to obtain random outputs.

Here's how simulation works.

In each simulation step,

we generate a random instance of the input quantity like data usage U.

In other words, we know the distribution of this random input, in our case,

it is normal with mean 23 and standard deviation of 5 gigabytes.

And we will instruct Excel or any other simulation tool we use

to pull one value from that distribution.

Then, we use the formula that connects our random input U, in our case, and

the random output P, in our case,

to calculate the value of the output random variable corresponding to that

instance of the random input variable we just generated.

In other words, we ask a simulation tool to generate an instance of a random data

usage value U, and then calculate the corresponding monthly payment value, P.

We can repeat the simulation step,

which we will call a simulation run As many times as we like.

Since at every simulation run, we convert a random input value, such as data usage U

into the corresponding output value, such as payment P, the simulation basically

converts the set of random input numbers into a set of random output numbers.

We will use the term input sample and

output sample to describe the sets of numbers generated during the simulation.

Once we generate a sample of output values, for

example, is sample of payment values.

Or can use the sample to estimate the expected value of the output,

standard deviation etc.

In other words, we can estimate reward and

risk measures that we will use later to choose the best course of action.

We will use Excel for running simulations and for analysing their results.

Specifically, we will use an Excel add-in called Analysis Tool Pak to run

simulations.

This add-in is a part of standard Excel installation on Windows.

If you're using Mac,

here's a link to a free software that has similar capabilities as Analysis Tool Pak.

If you're using Google Sheets,

an add-on called XLMiner Will have similar functionality.

Okay.

We're ready to use Excel to set up and

run a simulation for the monthly payment values under the new wireless data plan.

We have created an Excel template, dataplan_0,

you can use to follow our setup.

In this first example, we'll set up and run a simulation with just

ten simulation runs to help us understand how simulation works.

Okay let's go to dataplan_0.

We start with a template dataplan_0

that contains all the data we need to set up a simulation.

In our analysis, we'll use Excel 2013 on Windows.

We'll be setting up our simulation using Analysis Tool Pack,

which is a standard add-in in Excel.

It is usually located under data tab in the portion called analysis,

right next to the solver button.

If you do not see data analysis there, you should go to file, options,

add-ins.

And here what says, manage Excel add-ins, you click go, and

you wanna make sure that the Analysis Tool Pak is checked.

While setting up our simulation, keep in mind that the ultimate purpose of running

a simulation is to obtain estimates for

the measures of reward and risk associated with a new data plan.

As we discussed earlier,

we will use the expected monthly payment under the new plan as a measure of reward,

and the standard deviation of monthly payment as a measure of risk.

Okay, in order to prepare our simulation let's first add headers to cells C1,

D1, and E1.

In C1, let's enter simulation run.

This header will indicate that in column C,

we will count the instances of our random variables.

In D1, let's put data usage,

U in gigabytes.

In column D, we will be generating random instances of the monthly data usage.

Finally, in E1,

let's put payment, P in dollars.

Column E will contain the monthly payment amounts calculated

using the random instances of the monthly data usage from column D.

In this example, we will generate ten random instances of monthly data

usages U and calculate ten corresponding values of monthly payment P.

Every time we generate a random instance of data usage,

we will count it as a simulation run.

So let's number the simulation runs will conduct 1 through 10,

and place the simulation run identifiers in column C, in cells C2 through C11.

Okay, the actual random instances of monthly data usage for

each simulation run will be store in the cells D2 through D11.

In particular, the cell D2 will contain the first random instance of monthly data

usage, the cell D3 the second random instance of monthly data usage, and so on.

But how do we generate those random instances?

Well, Analysis Tool Pad provides us with the tool to generate random numbers.

And that's the tool we're going to use.

Let's go to data, click on data analysis,

select random number generation, and click okay.

In the random number generation dialogue,

let's put in one, into the number variables box.

This tells Excel that we're going to generate the instances of a single,

random variable monthly data usage in our case.

Next, we'll put ten.

Into number of random numbers box.

This will instruct Excel to generate ten random instances of monthly data usage.

Next, in the distribution pull down menu, we choose

normal and specify 23 and five

as mean and standard deviation of the normal random variable.

Now, we have instructed Excel to generate ten random instances of monthly

data usage using a normal distribution with the parameters we specified.

As you can see from this drop down menu, Excel can generate random variables

using a number of different probability distributions, in addition to the normal.

Now, let's put some combination of numbers say, one,

two, three into the random seed box.

Random seed instructs itself to generate random numbers from the distribution with

specified in a particular way.

We do not need to worry to much about the seed value in order to run and

interpret the simulation.

Just keep in mind that if you are running Excel 2013 and Windows, and

you set up your model in the same way as I do, and

use the same number of simulation runs and the same seed, you will generate exactly

the same sequence of monthly usage values that I do.

So, if you want to compare your simulation results to mine,

you should be setting up your simulation model in the same way as I do.

Use the same number of simulation runs, like ten that I use in this case, and

the same seed, like 1, 2, 3 that I use in this case.

In practice, it does not matter much what seed you select as long as you run

a fairly long simulation.

In otherwords, as long as the value in the number of random numbers box is high.

We'll talk more about short versus long simulations and

about different seed values later this week.

Okay.

One last thing, in the output options,

let's select output range starting in the cell D2.

This way Excel will put the first random monthly usage number it generates into

the cell D2.

The second random monthly usage number into the cell D3 and so on.

Now, when you click OK, Excel will generate 10 random

monthly data usage numbers in cells D2 from D11,

using the distribution and the parameters we specified.

The numbers in the cells D2 through D11 are all instances of

a normal random variable with mean 23 and standard deviation 5.

In other words Excel generated, or using another word, simulated for us,

ten instances of what the future might hold in terms of monthly data usage.

Let's make sure that the values in the cells D2 through

D11 are visually distinguishable from the other values.

Let's change the font in the cells into green and bold.

We'll use this same visual designation for

the random inputs into our simulation models.

Now, what about the simulation outputs?

The quantity we're interested in is the monthly payment amount, P.

Let's go into the cells E2 through E11 and put in formulas that will

calculate the monthly payment value P, for any monthly Data Usage, U.

Let's start with the cell E2.

The way the new data plan works is by

charging $160 up front, and then by adding $15 for each gigabyte above 20.

Let's put this formula into the cell E2 using the value in

the cell D2 as the first possible instance of the monthly data usage.

We have already discussed an algebraic formula that calculates

the monthly payment for any value of monthly data usage.

The algebraic formula we put in the cell E2 is $160,

that's B5, plus if the data usage, D2,

exceeds 20, B4, then we're charged extra $15 for

each gigabyte of usage above 20.

Otherwise, there are no extra charges.

The result in this instance is $160

since the monthly usage in this instance fell below 20.

If we want to calculate the values of monthly payment corresponding

to the remaining nine random instances of data usage We need to copy and

paste the formula in E2 into the cells E3 through E11.

But before we do this we must use absolute cell references for

the cells B4, B5, and B6 for the formula in the cell E2.

Those are the parameters of our data plan and

they do not change when we copy and paste the payment formula.

So we go to the cell E2, And

use the shortcut F4 to put the dollar signs around B4, B5, and B6.

Let's start with B5, And

then move to B4, and

then B6 and then B4 again.

Okay.

After that, we can copy and

paste the payment formula into the cells E3 through E11.

So we now have 10 random monthly payment values that

correspond to 10 random monthly data usage that Excel generated.

In other words,

Excel generated a random sample of the future data usage values and we have used

the payment formula to convert the sample into a sample of future payment values.

This random sample of payment values is the output of our simulation.

We will use blue color and bold font to make sure that those values look different

from the random input values, and from other values on the spreadsheet.

To facilitate the future analysis of the results of our simulation,

let us calculate the average and the standard deviation of the simulated

samples of the monthly data usage, and the corresponding monthly payment values.

The average of a sample of random numbers is also called sample mean.

And the standard deviation of a sample of random numbers

is also called a sample standard deviation.

Let's put out these headers into the cells C13 and C14.

Sample mean, and

sample standard deviation.

Let's look first at the sample of monthly data usage,

where we'll calculate the average of these ten numbers in the cell D13.

So we put in the formula,

average of D2 through D11.

As we can see, the sample mean in this case is about 25.

25 gigabytes.

And in the cell D14,

we will calculate the value of the standard deviation of the same sample.

The formula we put in the cell D14 is,

STDEV of the same cells, (D2:D11).

There's several formulas for calculating the standard deviation in Excel, and

those formulas are applicable in different settings.

The STDEV formula is used for a sample of random numbers.

The sample standard deviation for this particular sample is around 7.8 gigabytes.

Let's use the font for the cells D13 and

D14, as we did for the cells D2 and D11.

So it's a green font and bold.

Now we're ready to compute the estimates of the reward and

risk measures associated with the new data plan.

As we discussed earlier, we will use the estimate of the expected value of

the monthly payments as a reward measure, and the estimate of the standard deviation

of the monthly payments as the measure of risk.

In order to calculate the sample mean and sample standard deviation for

our sample of payment values, all we have to do now is to just copy and

paste the formulas from the cells D13 and D14 into the cells E13 and E14.

We'll use the same font scheme for E13 and E14 as we did for

E2 through E11, so it's blue and bold.

So, The estimate for

the measure of reward associated with the new data plan is about $253 and

the estimate for the measure of risk is about $92.

For completeness, here's the picture of the Excel file data plan 10 we created

with all the formulas that helped us to set up and run the simulation.

In the next session, we'll have a more detail look at the simulation results.

In this session, we have learned to use simulation to estimate the reward and

risk measure associated with any potential decision we can make.

Next time, we'll look at the interpretation of the simulation results.

In particular, we will compare the results of short and long simulations,

to see how precise our simulation estimates are.

We will also look at histograms as a convenient way of presenting the results

of the simulation.

See you next time.