Imagine you're trying to organize a game of football at an international summer camp. You've gathered some interested people, you've placed them at different positions on the field, and you've introduced the equipment to use during the game. You blow the whistle to start, and all of a sudden, you realize the game isn't working quite as well as you planned. First of all, some folks think you're playing American football. Others are thinking soccer, and a few came ready for rugby. Also, it turns out that some of your players have never actually played the game before, or have only practiced on their own and don't know how to play as a team. There's general confusion as to what the rules are and what the overall game plan should be. Without some guidance, our game has quickly deteriorated into chaos.
Managing how data is used across an organization can be a lot like this. There are typically many people involved, and those people have different points of view and expectations on how things should work. The idea of data governance is to put some structure around how data is managed and used in an organization by establishing rules and processes around a variety of data-related operations and decisions. In this video, we'll cover some of the most common areas addressed by data governance, and how data governance might be set up in an organization.
Let's start by discussing four major functions of data governance: establishing and maintaining standards, establishing accountability for data, managing and communicating data development, and providing information about the data environment.
A primary role of data governance is to establish and maintain standards around data. This can take a few different forms. The first is identifying what sources are preferred for each type of data or metric used in an organization.
There's an idea called Master Data Management, or MDM, which identifies the most critical data within an organization and ensures there is a clear understanding of where that data should come from and where it should be stored. A related idea is that of common reference data. Generally speaking, reference data provides sets of allowable values for certain data attributes, or provides additional descriptive information about key ideas in the company's data environment. Sometimes this data is loosely referred to as lookup data or dimensional data. Data governance helps to ensure that reference data is complete and accurate.
Data governance also helps to establish common definitions and calculations. The same term might have different meanings across the organization, and different teams might use slightly different calculations to arrive at the same metric. Governance helps to ensure that everyone is on the same page and does things the same way.
The last set of controls is around data access and compliance. A governance process can help define who should have access to data under what circumstances, and is often applied in support of more general Sarbanes-Oxley, or SOX, controls and data privacy concerns. We'll talk about data privacy more in a separate video.
The second major role of data governance is to establish and maintain accountability for data. We'll talk a bit more about how data governance programs are structured in a few minutes, but usually organizations assign responsibility for specific data domains to individuals called data stewards. Data stewards are generally accountable for ensuring that their area has the correct definitions and are responsible for the overall state of their data domain. Governance can also help identify who is responsible for addressing various types of data quality issues. As with data privacy, we'll cover data quality in more detail in a separate video.
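To make the idea of reference data as a set of allowable values a bit more concrete, here is a minimal sketch in Python. The attribute name `region`, the allowed values, and the sample records are all hypothetical and chosen only for illustration; a real organization would typically keep its reference data in a governed table rather than in code.

```python
# Hypothetical reference data: the allowable values for a "region" attribute.
ALLOWED_REGIONS = {"NA", "EMEA", "APAC", "LATAM"}


def find_invalid_regions(records):
    """Return the records whose 'region' is missing or not in the reference set."""
    return [r for r in records if r.get("region") not in ALLOWED_REGIONS]


# Made-up sample records, for illustration only.
orders = [
    {"order_id": 1, "region": "EMEA"},
    {"order_id": 2, "region": "Europe"},  # not an allowable value
    {"order_id": 3, "region": "APAC"},
]

invalid = find_invalid_regions(orders)
print([r["order_id"] for r in invalid])  # prints [2]
```

A check like this is one way a data steward might spot records that drift away from the agreed reference values.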
The third role of data governance is to help manage the overall process of data development and to communicate changes to the data environment. Lots of teams use data, and every one of them probably has a laundry list of additions or modifications they'd like to see implemented. However, there's usually not enough capacity to accomplish them all, and there needs to be some way of prioritizing the work that needs to get done. Governance can help by providing a process for vetting, assessing, and prioritizing which data projects are undertaken, usually by rationalizing those projects against the overall business priorities of the enterprise. Because data environments are constantly evolving, there also needs to be some mechanism for letting the users of the data know when new data is added or when some change or improvement is made. Having a well-structured data governance approach can facilitate communication about data and make sure everyone is informed and aware of the changes.
The last role that data governance plays is in providing information about the data environment itself. There's a broad class of activities called metadata management, which helps to keep track of metadata, or data about data. Given that we've gone through all the trouble of creating standard definitions and calculations, it's generally useful to formally document them and provide that documentation to the enterprise. We also might want to provide information about the lineage of data and metrics, which traces where data elements come from, or keep a history of changes that have been made to a data environment. All of these would fall under metadata management. We also might want to provide information about the quality of certain data domains or metrics. Governance mechanisms can serve as a clearinghouse for this type of information. Metadata can speak to the what and where of the data environment, but it can also indicate how good the information is.
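As a rough illustration of what a piece of metadata might look like, here is a small Python sketch of a catalog entry that records a metric's definition, calculation, lineage, change history, and a quality indicator. The class name, field names, table names, and the 0-to-1 quality score are all assumptions made up for this example; they are not taken from any particular metadata tool.

```python
from dataclasses import dataclass, field


@dataclass
class MetricMetadata:
    """Hypothetical metadata record for a single governed metric."""
    name: str
    definition: str           # the agreed business definition (the "what")
    calculation: str          # the agreed calculation, written as plain text
    lineage: list = field(default_factory=list)         # source tables (the "where")
    change_history: list = field(default_factory=list)  # notes on past changes
    quality_score: float = 0.0  # assumed 0-1 scale (the "how good")


# A simple in-memory catalog keyed by metric name.
catalog = {}


def register(entry):
    """Add or replace a metadata entry in the catalog."""
    catalog[entry.name] = entry


register(MetricMetadata(
    name="net_revenue",
    definition="Revenue after returns and discounts.",
    calculation="gross_revenue - returns - discounts",
    lineage=["billing.invoices", "billing.refunds"],  # hypothetical tables
    change_history=["Initial definition approved by governance council"],
    quality_score=0.97,
))

entry = catalog["net_revenue"]
print(entry.lineage)  # prints ['billing.invoices', 'billing.refunds']
```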
Likewise, governance can help keep track of the who, including who the data stewards are and who may be involved in other data governance functions. Users can consult this information to determine who to contact with questions or concerns about the data.
So that's what data governance does. Let's switch gears a bit and talk about how it works. There can be a lot of variance in how data governance is implemented within an organization. However, there are a few characteristics that are almost always present in a successful program.
The first is cross-functional representation. The whole point of data governance is to get everyone on the same page. To do that, everyone needs to be involved. The best governance structures have broad participation across technical and nontechnical teams, usually via something like a data governance council that brings those groups together and addresses governance issues.
The second is an ongoing process and schedule. A data governance council doesn't do much good if it never convenes, doesn't convene often enough, doesn't make any decisions, or has no mechanism to execute on its decisions. A sound data governance program provides that structure.
The third common element is a set of defined roles. Someone needs to act as the de facto leader of the program. This may be a chair of the governance council or another leader. Earlier, we discussed the role of data stewards. Some form of data stewardship or ownership is critical to a successful governance program.
Beyond these ideas, data governance structures can take many forms, and you may see some functions implemented in different ways. For example, sometimes an organization will formally staff a data governance team that develops and coordinates processes and handles things like metadata management or data quality.
However, more often than not, data governance is executed virtually, with responsibilities rolled into the normal job functions of those on the cross-functional team. Likewise, the drive for data governance can come from different parts of the organization. In some cases, the function is executed out of IT. Other times it is driven by an analytics team. It's also quite common to see data governance driven by a functional group, like finance or operations. This doesn't change the cross-functional nature of the activity, but who takes the lead can say a lot about how the organization thinks about data.
Finally, some organizations may adopt more formal tools to assist in their data governance efforts, while others take a less formal or manual approach. There are robust software tools that can help with master data management, metadata management, or data quality, but not all organizations find them necessary. It's not uncommon to see organizations build their own tools or use informal documentation methods, like wikis or even standalone documents, to manage data governance activities.
At this point, we've covered the major roles that data governance programs play in organizations, as well as how those functions are executed, with some detail around what good programs have in common and how they tend to differ. Why is this important to you as a data analyst? It's really all about knowledge, context, and the ability to have confidence in the data you're using for analysis. Being tied into your company's data governance program, or helping to create one if it doesn't exist, can help you rapidly learn about what's available, understand what's good and what's not, and keep abreast of new additions or changes to the data environment. This in turn will help ensure that you're always using the best data available for the job, and will help you produce the best insights you can and have confidence in your results.