So in this video we are going to create

our actual simulated data,

and I'm going to show you quite a

few ways to go about this.

You can see there we're going to create

five variables, computer variables.

The first two are going to be called age and wcc.

In my world, that would stand for white cell count,

the type of cells in your blood that fight infection.

Then crp; in my world, once again,

that would mean C-reactive protein,

a type of protein that

increases in your bloodstream if you have an infection.

Then we're going to have two categorical variables,

and I'm going to do two for a specific reason:

treatment and result,

and we'll go through those.

So first let's start with age.

What we're going to do is just use the rand function.

R-A-N-D, short for random,

and that is a built-in Base function in Julia.

We can use it just to generate a random number.

In this instance, I'm using two arguments.

My first argument is a range and you see the range there,

I'm not using a step size,

in other words, the default step size

is going to be used which is one,

and it just says 18 to 80 and that is inclusive.

So 18 is included and 80 is included as well.

So it's going to be 18, 19, 20, 21, etc.

The second argument that I'm using is

just the number of values that I want.

In this instance, I want 100 values.

So just using rand in this form is

going to give me random values,

random data point values from a uniform distribution.

So all the values from 18 to 80 inclusive have

an equal likelihood of being chosen at random,

and that is what the rand function is going to do for us.

So we get these 100 random values between 18 and 80.
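
A minimal sketch of the call described here, assuming the variable is named age as in the video. The checks at the end are an addition for illustration only.

```julia
# 100 integers drawn uniformly at random from 18 to 80, both ends included
age = rand(18:80, 100)

# Illustrative checks: the count and the bounds
println(length(age))
println(all(x -> 18 <= x <= 80, age))
```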

Now, if you run this code,

you're going to see different values because we are not

seeding the pseudorandom number generator

in this instance.
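
The video does not seed the generator, but if you want the same values on every run, one option is to seed it first. This is only a sketch: the seed value 1234 is arbitrary, and the names first_draw and second_draw are made up for the illustration.

```julia
using Random  # standard library; provides Random.seed!

Random.seed!(1234)             # arbitrary seed, chosen only for this example
first_draw = rand(18:80, 100)

Random.seed!(1234)             # reseed with the same value
second_draw = rand(18:80, 100)

# Same seed, same sequence of values
println(first_draw == second_draw)  # true
```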

So next, for wcc, I'm going to take some random values again.

So you can see I'm using rand there again,

there we go, highlighted for you.

But this time it's going to be from a distribution.

Just to make things very clear,

I'm using the full package name,

a dot, and then the function inside it.

So Distributions.Normal, but because

we said using Distributions,

remember, you don't have to do this.

I just want to make it clear where

this Normal function comes from.

This Normal function is not part of base Julia,

it comes from the Distributions package.

It is saying, well,

take from a normal distribution

and it takes two arguments.

The first argument is the mean,

and the second argument is the standard deviation.

So it is creating a distribution,

a normal distribution with a mean

of 12 and a standard deviation of two,

and that is my first argument.

So that takes the place of

this range object that I made here 18 to 80,

that takes the place of that.

So I'm saying give me values at random from

this very specific normal distribution,

and I want 100 of those as well.
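
As a rough sketch, the unrounded draw could be written like this. It assumes the Distributions package is installed, and wcc_raw is a made-up name for the intermediate values.

```julia
using Distributions  # assumes the Distributions package is installed

# Normal(12, 2): mean 12, standard deviation 2
# The distribution object takes the place of the range 18:80
wcc_raw = rand(Distributions.Normal(12, 2), 100)

println(length(wcc_raw))
println(eltype(wcc_raw))
```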

But you can see that all of this,

this rand function with its two arguments,

the selection from

which it is selecting and

the number of values that I want,

100 data point values, sits as

the first argument in the round function.

I use round because I only want

a single decimal place for my white cell count.

I don't want five, six,

seven digits as far as

the decimal places are concerned, which the

plain rand function is going to return for me.

I just want one.

So round takes the two arguments,

this list of values that we're creating and then digits,

a keyword argument, equals one.

So I just want one decimal place.

And here in Julia 1.0,

very important that you put the dot there,

referring to the fact that you want

this rounding function to apply

to each element in

this list of 100 values that we are creating.

So it's round with a dot, and then

we're going to get just that one decimal place.
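
Put together, the wcc line might look like the sketch below, assuming the Distributions package is installed. The dot after round broadcasts the rounding over every element.

```julia
using Distributions  # assumes the Distributions package is installed

# Draw 100 values from Normal(12, 2), then round each one to one decimal place.
# The dot in round.( ) applies round element by element.
wcc = round.(rand(Normal(12, 2), 100), digits = 1)

println(wcc[1:5])  # first five values; they differ from run to run
```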

crp, this is another way

that you can go about this, and look very carefully.

So it's the round function again,

but I'm not using digits equals

zero as a keyword argument.

I'm actually using Int as my first argument,

which says whatever these values

are that we're going to get back,

each of these elements, I want them to be

rounded to an integer, the nearest integer.

This time around, just to show you,

we're using another distribution.

This is the chi-squared distribution.

It takes a single argument and

that argument is the degrees of freedom.

So I'm asking for four degrees of freedom.

So a chi-squared distribution

with four degrees of freedom.

That's a right-tailed distribution.

We are going to get 100 values from that,

but the round function I'm doing in a different way.

Remember the dot still, because I want it to apply

to each and every element,

and instead of using this

and then comma digits equals zero,

I'm using Int at

the beginning, and then it's going to

do the same rounding for me, but it's going

to return integer values.

Then I'm using a bit of broadcasting as well

because once I have these 100 values,

I want each of them to be multiplied by 10.

I'm just scaling each one of them,

so it's dot-star 10, the broadcast multiply.

So that will just take care of each and every element.
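
A sketch of how the crp line could be written, assuming the Distributions package is installed and that Chisq is the chi-squared distribution used in the video.

```julia
using Distributions  # assumes the Distributions package is installed

# Chisq(4): chi-squared distribution with four degrees of freedom.
# round.(Int, ...) rounds each element to the nearest integer and returns Ints.
# .* 10 then multiplies each element by 10.
crp = round.(Int, rand(Chisq(4), 100)) .* 10

println(crp[1:5])  # first five values; they differ from run to run
```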

Now, I'm going to use the rand function

again to create this computer variable called treatment,

which is going to hold a list object

for me, and this list object is going to hold

random values again. Instead of a

range like 18 to 80 or a distribution, I'm passing a list, an array in Julia terms.

This list has two characters in it. Two strings,

I should say, because we're using

double quotes, "A" and "B",

and I want 100 of those.

So again, I'm going to get A, A,

B, A, B, B, A, etc.

Again, it's a uniform distribution:

at every turn,

every one of these 100

turns, A and B have an equal likelihood of being chosen.
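
As a minimal sketch, sampling from the two strings could look like this. The variable name treatment follows the video.

```julia
# Pick "A" or "B" 100 times, each with equal probability
treatment = rand(["A", "B"], 100)

println(treatment[1:7])  # first seven picks; they differ from run to run
```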

Then I'm going to create the last one, result,

and this time I'm going to have three

strings to choose from:

improved, static, and worse.

I want 100 of them as well.
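
A sketch of the result variable. The exact spelling and capitalization of the three strings is an assumption, and the tally at the end (with the made-up names levels and tally) is added only to show how the 100 picks are spread.

```julia
# The spelling of these three levels is assumed, not taken from the video
levels = ["Improved", "Static", "Worse"]

# Pick one of the three strings 100 times, each with equal probability
result = rand(levels, 100)

# Illustrative tally: how many times each level was drawn
tally = Dict(level => count(==(level), result) for level in levels)
println(tally)
```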

So that's a very nice way to generate

your own data, and we see here we have numerical data.

We have categorical data,

and some are from a uniform distribution.

Of the numerical ones, one is from

a normal distribution and one

from a chi-squared distribution.

So a very nice range of

different data types that we are creating

here, and we're going to use them

in the next section where

we're just trying to describe this data.

So I'm going to hit Shift and Enter and that

is going to then create these values for me,

but I'll see you in the next video.