Welcome. We're going to shift gears a little in this video to talk about a perhaps less exciting but vitally important topic: data governance and profiling. Now, data governance and profiling is not something we use SQL for in the sense that I'm going to teach you a new function or anything. But it's so important when it comes to writing really clean and quick queries, and to knowing where you're getting your data from and how you actually want to get it back. So I want to touch on this a little, because although this class is about SQL for data science, it's really about the application of SQL to data science. An important part of that is understanding data governance and being able to profile your data.

So after this lesson, you should be able to define data governance and profiling, explain why data governance and profiling are important, and discuss some methods to profile your data.

Okay, let's define data profiling first. Profiling your data means looking at descriptive statistics or other information about the data. The reason this is important is that, as we've talked about, you need to understand your data before you start querying it. So profiling is a great first step toward understanding your data.

There are really simple things you can do to start profiling your data. One is just finding out how many rows are in the table. You can also look at when the object was last updated, meaning when the data was refreshed and reloaded, because this may change some of your results and may change how you need to limit the data. If it's refreshed nightly, or in real time, do you need to set a certain date parameter, or are you okay with a consistent flow of data? Understanding these things is essential for deciding how to pare down your data to get what you're looking for.
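The two table-level checks just described, how many rows there are and when the data was last loaded, can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module so that it runs anywhere. The `orders` table and its `loaded_at` column are hypothetical, and the sketch assumes your table carries a load-date column at all; many databases expose "last updated" information through their own metadata views instead.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, amount REAL, loaded_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 20.0, "2024-01-01"),
        (2, 35.5, "2024-01-01"),
        (3, 12.0, "2024-01-02"),
    ],
)

# How many rows are in the table, and what is the most recent load date?
row_count, last_loaded = conn.execute(
    "SELECT COUNT(*), MAX(loaded_at) FROM orders"
).fetchone()
print("rows:", row_count, "last loaded:", last_loaded)

# If only the latest refresh matters, limit the query with a date parameter.
recent_rows = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE loaded_at = ?", (last_loaded,)
).fetchone()[0]
print("rows in latest load:", recent_rows)
```

Running the count before and after a scheduled refresh is a quick way to see whether your results might shift between runs.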
You can also do some column-level profiling. Start by looking at the actual column data type. Is it a date? Is it a timestamp? Is it a date stamp? Is it a string or an integer? This changes how you write your queries and which functions you'll be able to use. Another thing to look at is how many distinct values are in the column. How many rows have no values in them? How many NULLs are there? Is this something you need to be concerned about and deal with, or will some of the functions you're working with take care of it? And then you have your simple descriptive statistics: minimum, maximum, average, standard deviation. Things like that are really helpful for getting to know your data, profiling it, and understanding it.

The reason this is important is that you need to be able to test your data along the way. As you start to write your queries, test to make sure you're getting back the results you expect. If you're doing a left join from a table that has 100 rows on the left side and 50 on the right side, and you only get 50 rows back, you know you did something wrong, because a left join should bring back everything from the left side. It should have at least the hundred rows. Understanding your data helps you test. That's why looking at simple things such as the number of rows, distinct values, and minimum and maximum can really help you find your way around and get you writing correct queries in the long run.

In terms of governance, this really depends on the data strategy at the company you're working for. Some have really strict governance policies, and some are more open and free. As a data scientist using SQL, it's important to understand what your read and write capabilities are in the different environments.
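Stepping back to profiling for a moment, the column checks and the left join row-count test described above can be sketched like this. It is only an illustration under stated assumptions: the `customers` and `orders` tables are hypothetical, `PRAGMA table_info` is specific to SQLite (other databases expose column types through their own catalog views), and standard deviation is left out because SQLite has no built-in function for it.

```python
import sqlite3

# Hypothetical tables, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "east"), (2, "west"), (3, None), (4, "east")],
)
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 30.0)])

# Declared column data types (SQLite-specific way to read them).
declared_types = {
    row[1]: row[2] for row in conn.execute("PRAGMA table_info(customers)")
}

# Distinct values and NULL count for one column.
distinct_regions, null_regions = conn.execute(
    """
    SELECT COUNT(DISTINCT region),
           SUM(CASE WHEN region IS NULL THEN 1 ELSE 0 END)
    FROM customers
    """
).fetchone()

# Simple descriptive statistics.
min_amount, max_amount, avg_amount = conn.execute(
    "SELECT MIN(amount), MAX(amount), AVG(amount) FROM orders"
).fetchone()

# Left join sanity check: the result must keep every row of the left table.
left_rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
joined_rows = conn.execute(
    """
    SELECT COUNT(*)
    FROM customers AS c
    LEFT JOIN orders AS o ON c.customer_id = o.customer_id
    """
).fetchone()[0]
assert joined_rows >= left_rows, "left join dropped rows from the left table"

print(declared_types, distinct_regions, null_regions)
print(min_amount, max_amount, avg_amount, left_rows, joined_rows)
```

If the final assertion fails, the usual suspects are an inner join written by mistake or a `WHERE` filter on the right-hand table that discards the unmatched rows.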
Is there a sandbox you can play and work in to do some of the transformations on your data? These are good questions to ask, and answering them is really a matter of getting in contact with whoever manages this at your organization. But understanding the governance around the data, and what you're actually able to do, is important for keeping your environments clean. This should go without saying if you've done any programming, but clean up after yourself as you write. As you start to really explore your data, you quickly type something, then clean something up, then try something else. Keep track of what you're doing and keep your work really clean.

Also understand what the promotion process is through your environments. If you've created a model and you want to write its predictive scores back to the database, what environments do you have set up? Will you be able to go through development, acceptance, and production? What is the process for that, and what are you writing to?

These are just a few tips on profiling your data and understanding the governance rules around it. It's definitely important to look into this at your organization and find out what the governance policies are, so that you know why something may or may not be working. Then you'll have the foresight to write your queries, and to extract and insert your data into tables, in the way your organization views as best practice.
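As one small illustration of cleaning up after yourself, here is a sketch of exploring with a scratch table and dropping it when you're done. The table names are hypothetical, the in-memory SQLite database merely stands in for a sandbox, and `sqlite_temp_master` is SQLite's own catalog of temporary objects. What you're allowed to create, and where, depends on your organization's governance policies.

```python
import sqlite3

# In-memory database standing in for a sandbox; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (customer_id INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)", [(1, 0.2), (2, 0.9), (3, 0.7)]
)

# Explore using a temporary scratch table rather than a permanent one.
conn.execute(
    "CREATE TEMP TABLE scratch_high_scores AS "
    "SELECT * FROM scores WHERE score > 0.5"
)
high_count = conn.execute(
    "SELECT COUNT(*) FROM scratch_high_scores"
).fetchone()[0]
print("high scores:", high_count)

# Clean up once the exploration is finished.
conn.execute("DROP TABLE IF EXISTS scratch_high_scores")

# Confirm nothing was left behind in the temporary schema.
leftover = conn.execute(
    "SELECT COUNT(*) FROM sqlite_temp_master "
    "WHERE name = 'scratch_high_scores'"
).fetchone()[0]
print("leftover scratch tables:", leftover)
```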