If you deal with large amounts of data, you have probably heard the term data governance and may be wondering what it is, whether it applies to you, or how to implement it. Simply put, data governance is all about the policies you develop to take care of your data – how you store it, make it available, validate it, protect it, and ultimately, how you use it. Data governance includes defining access plans: who can view, use, and share your data.
These questions are of growing importance, as businesses rely on collecting and storing large amounts of data – and analyzing that data – to fulfil their business goals. Data becomes an organization’s stock in trade, its business medium, and its trade secrets. Data breaches can cause legal entanglements, as well as a loss of confidence in the core business.
If data management is left to chance and to the workings of the various departments that must deal with your data store, you will end up without a policy for managing your data, or with each department setting its own. This is as unthinkable as lacking any policies to manage physical stock, and permitting each brick-and-mortar department leeway to create, store, and distribute merchandise. Misuse of stock or data can cost an organization millions of dollars, and thus policies are developed so that stock or data is used consistently, kept secure, and available when needed. These policies, when applied to data, are what constitute the field of data governance.
Attributes of data governance
Data governance policies must cover the entire data lifecycle, from collection to curation. Within that lifecycle, data governance must address the following:
- Collection. Where you obtain your data, and how, is the beginning of the data lifecycle. What your sources are determines the foundation of your data governance policy. One important factor determined by your sourcing strategy, for instance, is the size of your data set. Are you collecting from targeted market sourcing, existing customers, social media? Are you using an outside vendor to collect and perhaps analyze the data you collect? What is the incoming data stream? Data governance must look at these questions and establish policies that control the collection of data, how your outside vendors can interact with the data they collect or analyze for you, and what the path and the lifecycle of the data will look like.
- Validation. Validating data, particularly data collected from a wide variety of sources, is the kind of problem that keeps data managers up at night. How to distinguish significant input from noise is just the beginning of the problem. If you are collecting data from affiliates, you must be sure the data is sound. If you are collecting from social media sites, your policy needs a way of identifying which data is significant. In all cases, you must be sure that your incoming data is legitimate and has not been tampered with. This is a particular concern in the parallel computing environments often used to collect large amounts of data, which frequently rely on the Cloud and so open an additional vulnerability.
- Storage. Data governance policies must address storage, and strategies for storage will very much depend on the size of your data set. Big data, where the numbers might be in the petabytes, must be stored in secure, redundant systems. These frequently use a hierarchy that makes data available based on frequency of use, so that expensive online systems supply frequently requested data while less-requested data is stored on less expensive and less available systems. Unfortunately, these lower-priority systems may have weaker security as well, exposing sensitive but infrequently requested data, so a good data governance policy must look at a variety of factors when establishing a data storage plan.
- Access. Data governance must establish access policies, again balancing needs and security issues. Those who require the data for their work must be able to access it as needed without roadblocks; for security reasons, they should not have access to more than they need. Data should be available when legitimately requested, but sensitive data should be less available, and only to users with a sufficient security level. Levels of access should be assigned to people as well as to the data itself. Account management that interacts closely with HR and sourcing departments is a critical piece of the overall picture, so that terminated employees and vendors lose access in a timely fashion. Determining these details and ensuring ownership and responsibility is part of a complete data governance plan.
- Usage/Sharing/Analysis. How your data may be used is an extremely important part of a data governance policy. Likely use cases include managing accounts, improving customer experience, creating targeted advertising, feeding market analysis, and sharing with affiliates. What data may be shared or used for marketing must be carefully defined and protected against attacks and breaches – as, indeed, should data used for purely internal purposes. Every organization that collects data must make sure its customers know its data use and sharing policy. Ensuring compliance with regulations regarding the use of data is another valuable contribution of having formal data governance policies.
- Security. Collection, validation, storage, access, and usage are all necessarily part of your security plan, and there must be an overarching policy that addresses these and other security issues. Every part of the data lifecycle is vulnerable to attacks and to breaches caused by carelessness, so your security plan must be effective, yet it must not be prohibitive to users: security must support, not impede, necessary usage. The security strategies applied to data are defined in your data governance policy, including access protocols and the encryption used for your data at rest as well as in transit.
- Curation / Metadata. The data lifecycle is not complete without curation. An example of curation is application of metadata to a piece of data to identify it for retrieval. Metadata can include such things as the origin of the data, date of creation and/or collection, access level information, semantic classification, and other attributes as required by your business needs. Data governance can establish a metadata vocabulary and define parameters for the shelf-life of data – it’s important to remember that data can expire, and at some point may only be of use for historical data analysis.
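As a rough illustration of how the access and curation attributes above might fit together, the following Python sketch attaches metadata (origin, collection date, access level, classification, shelf-life) to a piece of data and checks a read request against it. Every name here (`AccessLevel`, `Metadata`, `can_read`, the three tiers, the sample values) is a hypothetical choice made for this example, not part of any standard or product; a real policy would define its own vocabulary.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical sensitivity tiers; a higher value is more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


@dataclass(frozen=True)
class Metadata:
    """Assumed metadata vocabulary for one piece of data."""
    origin: str                # where the data came from
    collected_on: date         # date of collection
    access_level: AccessLevel  # minimum clearance needed to read it
    classification: str        # semantic label
    shelf_life_days: int       # how long the data counts as current

    def is_expired(self, today: date) -> bool:
        # Expired data may still be useful for historical analysis.
        return today > self.collected_on + timedelta(days=self.shelf_life_days)


def can_read(user_level: AccessLevel, account_active: bool, meta: Metadata) -> bool:
    """Allow reads only for active accounts with sufficient clearance."""
    return account_active and user_level >= meta.access_level


record = Metadata(
    origin="web-signup-form",
    collected_on=date(2024, 1, 15),
    access_level=AccessLevel.RESTRICTED,
    classification="customer-contact",
    shelf_life_days=365,
)

print(can_read(AccessLevel.INTERNAL, True, record))     # clearance too low
print(can_read(AccessLevel.RESTRICTED, True, record))   # sufficient clearance
print(can_read(AccessLevel.RESTRICTED, False, record))  # terminated account
print(record.is_expired(date(2025, 6, 1)))              # past its shelf-life
```

The `account_active` flag stands in for the link to HR and sourcing systems: once an employee or vendor is terminated, the check fails regardless of the level they once held.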
Organizational issues in data governance
Data governance often has to be established over the resistance of groups that fear they will lose needed access to data, and sometimes the resistance of groups that historically do not share their data for competitive reasons. Data governance policies need to address these concerns in a way that builds acceptance among the varying groups. Organizations that are used to working in silos may have a difficult time with new data governance policies, but today’s reliance on large data sets and an accompanying proliferation of security issues make it necessary to establish and enforce organization-wide data policies.
Data has incrementally become part of the organizational infrastructure, and decisions about how to handle it have been made along the way, one small step at a time, often as a reaction to a specific problem. As a result, an organization’s approach to data can vary with the department and even with the circumstance within the department. Even if every department develops a reasonable plan for taking care of its data, at the very least the organization will find itself trying to reconcile disparate plans which may be in conflict with each other. Making sense of the requirements and demands on the data store can be an intimidating job – and if it isn’t done right, you can lose the potential power of your data to work for your marketing and customer-retention teams, and you can find yourself legally liable if your policies lead to a breach.
Adding to the problem is that in a large organization, departments tend to compete for resources, and frequently also must compete for their needs to be heard among the cacophony of equally important needs of other departments. Departments, asked only to ensure their own viability as a profit or support center, develop tunnel vision about their own requirements and find it difficult to reach a compromise without mediation.
A data governance board looks at existing data policies, unmet needs, potential security issues and much more, and creates governance policies to normalize the collection, curation, storage, access, and usage policies of your organization, while considering the needs and requirements of each department and function. The board acts as a mediator to balance these competing needs and reconcile security concerns with access demands to ensure the most efficient and secure data management policies.
Steps for successful data governance
- Create a Data Governance Organization. The Data Governance Institute recommends a data governance board be established, which can assess input from various users of data and build an organization-wide data management policy designed to satisfy multiple needs and demands from inside users, external users, and even legal demands. The board should include business stakeholders from a broad sampling of your business areas to ensure needs are being met, and it is important that all types of data ownership are represented. Security experts should also be part of the team. It is critical to know what the goals for your data governance board are, so spend some time considering the reasons the organization has a need for a formal data governance policy, and articulating them clearly.
- Develop a framework within which the many data requirements can be accommodated. The framework will ensure that all the pieces fit together into a unified whole that satisfies collection, storage, retrieval, and security requirements. To design a framework that addresses all the requirements and necessary operations, your organization will first need to articulate its end-to-end data strategy. Components must be planned to fit together and to support each other so that, for example, your retrieval requirements can be executed in a high-security environment. Compliance with regulatory mandates might also require a particular design as part of the framework so you can track and report on regulatory issues. The framework will include logging and other security measures that can give early warning of an attack. Validating the data before putting it to use is part of the framework as well. The data governance board should understand each piece of the framework, and be able to define its purpose and how it fits into the overall data lifecycle.
- Pilot Data Strategy. Typically, a strategy should be rolled out to a small section of your business at first, so that flaws in the plan, the framework, and the infrastructure are identified before requiring compliance across your organization.
- Have an Ongoing Data Governance Organization. The board is an ongoing entity, as expansion of data governance policies to new business areas will certainly require some adjustment to policies. Moreover, as technology evolves, your data policy should evolve, to keep pace with security developments, data analysis methods, data management tools, and so on.
- Understand what defines a successful data strategy. Create criteria for success, and benchmarks so you can measure your progress along the way. Defining your data management goals helps you define key success indicators so you can be sure your data governance strategy is moving in the direction you want and need.
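The framework step above calls for validating data before it is put to use and for logging that can give early warning of an attack. One possible shape for that, sketched in Python with only the standard library, is to compare an incoming batch against a SHA-256 digest supplied by the sender and to log every rejection. The function names, the logger name, and the sample payload are assumptions made for this sketch, and it presumes the expected digest arrives over a channel you already trust; a plain hash does not by itself prove who sent the data.

```python
import hashlib
import hmac
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("governance.validation")  # hypothetical logger name


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of a payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def validate_batch(payload: bytes, expected_digest: str, source: str) -> bool:
    """Accept a batch only if its digest matches the one the sender published."""
    # compare_digest does a constant-time comparison of the two hex strings.
    ok = hmac.compare_digest(sha256_hex(payload), expected_digest)
    if ok:
        log.info("accepted batch from %s", source)
    else:
        # Repeated mismatches from one source may be worth alerting on.
        log.warning("rejected batch from %s: digest mismatch", source)
    return ok


payload = b"id,email\n1,a@example.com\n"
digest = sha256_hex(payload)  # in practice, supplied by the sender

validate_batch(payload, digest, "affiliate-a")            # unchanged: accepted
validate_batch(payload + b"extra", digest, "affiliate-a")  # altered: rejected
```

Counting rejected batches per source over time is also one simple candidate for the kind of measurable benchmark the last step describes.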
Big and small organizations face similar data challenges. The larger the organization, the more data, and the more data, the more necessary it is to develop an effective, formal data governance strategy. Smaller organizations may do well enough with an informal data management policy, but only if the organization is genuinely small and its operations depend minimally on data. Even an informal plan must consider, at the very least, collection, validation, access, and storage of customer and employee data.
A more formal plan becomes appropriate when an organization becomes large and data needs become multi-departmental, when the data systems and data set are too large to be easily navigated, when business needs demand an enterprise-level strategy, or when legal or regulatory needs demand it. It is time if you find that departments are developing their own strategies to manage their data. It is time as soon as you have enough data to be a hacker’s target. In short, if you have to ask the question “Is it time?” then you can be pretty sure it’s time to put in place a formal data governance policy.