0:00
Okay, yeah, [LAUGH] if you looked at that and said, hey, I was following along
perfectly fine until we got to that slide, [LAUGH] that's completely fine.
Parsing and converting dates is just nuts.
So you'll refer to that one-page guide, as I mentioned.
0:14
And there's a ton of different options for
converting between data types and parsing those dates.
There's a link down below to a lot of those functions as well, but
let's take a look at a few.
Before we do that, just understand the fact that there are different types of
data, data types, that are stored within your tables.
You could have numeric data, like your integers, so
somebody's salary could be stored as numeric.
Or you just have strings, like their first and last name is string data.
As you saw, the nuances there, you need to have those stored.
And you need to operate, if you're trying to do equivalency operators
like name equals something, they need to be in single quotes.
And then dates, again, are in that year, year,
year, month, month, day day format for BigQuery.
Other values, of course, you can have your Booleans, your yes/no fields.
And we'll introduce these last two in great detail in our architecture
lecture as part of course three, which is your array, which is kind of
a series of values, and your structs, which are flexible containers.
Don't worry about those two yet.
1:22
Okay, so say you were given a data set that wasn't particularly clean.
And you had to convert or
treat data types as another before storing it into an alternate reporting table.
This is super useful for kind of transforming your data.
How would you use that?
So if you're familiar with a little bit of SQL,
you also probably have ran into the CAST or convert function.
So in BigQuery and standard SQL, it's the CAST function.
So you have "12345", which is a string of numbers.
You want that to be treated as an integer, boom,
the result is then an integer, 12345.
Or if you have something that's stored as 2017-08-01, treat that as an actual date.
And allow me to use these special functions that
expect data in a certain type.
Or if you want to convert it to string.
Or the bottom example is an interesting one.
If you're unsure of the data that's coming in, so say you don't know that it's apple,
you're assuming that it's going to be a.
You ask somebody to put in their, I don't know, their age or something like that.
You're expecting it as an integer, right?
And you're not doing any kind of prefiltering or validation.
You can use something like a SAFE_CAST, so instead of erroring out,
you can basically say, well I'm going to convert it.
And if it doesn't match what I think it is,
I'm going to actually have it be an empty value, or a NULL.
Now, NULLs are super interesting, right?
So they are valid values, but a NULL is the absence of a value.
It's not "", it's not an empty string.
And it's not, like if you hit the space bar once, it's not just a single space.
It is the absence of data.
And then you'll see a lot more of that when we get into a lecture on joins where
you're actually merging two different data sets together.
And when they don't have matches, then you actually have valid missing values or
NULL values.
So if you work a lot with data, you'll become very,
very familiar with NULL values.
And actually how to deal with them when you're trying to match and
justifying those NULLs.
3:30
So let's get into a little bit of that now.
If you want to match on those NULLs, for example,
you want to find all of the values where the state is missing for these charities.
because you're saying hey, well if it's a US charity, it should have a state value.
Now, you would normally think that if you're going to use an equivalency
operator, like state equals California or state equals New York,
that you would use an equal sign.
So NULLs, again, super tricky beasts,
they cannot be equivalent to anything, not even themselves.
So you actually have a separate set of operators for NULLs and that's the IS, so
state IS NULL or state IS NOT NULL.
So just be familiar with when you're using NULLs, you'll be using the IS.
And there's special functions that handle NULLs as well, so if you wanted
to pass in data and if it was NULL, then set it to zero or something like that.
There's those NULL functions as well that you can wrap around your fields.
So for the state example, take a look at these.
These are US charities, or at least the ones that are filing with the US.
And they're missing the state value here.
So we have some cities, Canada for some of these cities as well.
But they're missing those state values, which is really interesting.
5:02
So not all countries could necessarily be using a state value.
And so for example, maybe Canada doesn't use a state value, or
Italy doesn't use a state value, much like the US would.
But they're operating, or at least doing business, in the US.
5:18
As we mentioned briefly before, year, year, year,
year or YYYY-MM-DD is the expected format for dates.
And that unlocks a ton of different date functions that you can
then use on that date itself.
So extracting the year, extracting the week, extracting the day,
that's the second row there.
The extract function's very powerful if you wanted to add or subtract or
take the difference between two intervals of time.
Truncate, so you only want to look at the month or
you only want to look at the year.
Or format into a different format, you can do that as well.
So once you get it into that proper DATE or DATETIME data type,
you unlock a lot of really, really, really powerful things that you can do.
So dates, it's a necessary evil [LAUGH] inside of SQL,
but you'll get familiar with them.
6:10
You can parse these string values.
So for example, before we were just casting between values for data types.
Now we can actually do some fun things like string manipulation functions.
So you're probably already familiar with concatenation.
So if you had two string values, much like "12345 "and then "678",
you could merge those two together.
Or say you have two fields, one's a first name, another person's a last name.
You could string together their first name and last name to get first and last name.
Or if it's just, a lot of these are kind of self explanatory.
If it ends with the letter e, then return true or false.
A very popular one is convert everything to lower case or
convert everything to upper case.
So in the example we had way earlier on whether or not it IS school or
IS NOT school for our charities.
You can think of well, I don't know whether or
not in my data set it's going to capital y or lower case y.
I'm just going to put everything to lower case and then match on a lower case y.
So you would do LOWER( whatever the field name was for
that school, I think it was operates as school.
And then you can set that equal to lowercase y.
And that will allow you to not worry about guessing.
And then, of course, you could use regular expressions in here as well.