Governance: The Who, What, When, Why, Where and How of Access Data

Data can be one of the most powerful tools to improve customer experiences and increase customer acquisition and retention. Nevertheless, many businesses are feeling the pressure of data overload. Documenting business objectives helps determine what data should be captured, how the data is related, and how it should be structured to transform your data into useful information.

Master Data Management is a comprehensive platform that delivers consolidated, consistent and authoritative master data across the enterprise and distributes this master information to all operational and analytical applications. Its capabilities are designed for mastering data across multiple domains, including Customer, Supplier, Site, Account, Asset and Product, among many others.

The Data Management Association (DAMA) has identified 10 major functions of Data Management in the DAMA-DMBOK (Data Management Body of Knowledge).

Data Governance is identified as the core component of Data Management, tying together the other 9 disciplines: Data Architecture, Data Quality, Reference & Master Data, Data Security, Database Operations, Data Development, Meta-data, Document & Content, and Data Warehousing & BI.

Figure: The DMBOK Wheel, the Data Management Association (DAMA) Data Governance Framework

Effective data governance serves an important function within the enterprise, setting the parameters for data management and usage, creating processes for resolving data issues and enabling business users to make decisions based on high-quality data and well-managed information assets. But implementing a data governance framework isn’t easy. Complicating factors often come into play, such as data ownership questions, data inconsistencies across different departments and the expanding collection and use of big data in companies.

The essential WHO-WHAT-WHEN-WHERE-WHY-HOW information about Data Governance

WHAT does Data Governance mean, and what does it do?

Data governance (DG) refers to the overall management of the availability, usability, integrity, and security of the data employed in an enterprise.

DAMA defines Data Governance as: “The exercise of authority, control and shared decision-making (planning, monitoring and enforcement) over the management of data assets. Data Governance is high-level planning and control over data management.”

According to the Data Governance Institute, “Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.”

WHO is involved with Data Governance?

Data Governance is of concern to any individual or group who has an interest in how data is created, collected, processed and manipulated, stored, made available for use, or retired. We call such people Data Stakeholders.

WHEN do organizations need formal Data Governance?

Organizations need to move from informal governance to formal Data Governance when one of four situations occurs:

  • The organization gets so large that traditional management isn’t able to address data-related cross-functional activities.
  • The organization’s data systems get so complicated that traditional management isn’t able to address data-related cross-functional activities.
  • The organization’s Data Architects, SOA teams, or other horizontally-focused groups need the support of a cross-functional program that takes an enterprise (rather than siloed) view of data concerns and choices.
  • Regulation, compliance, or contractual requirements call for formal Data Governance.

WHERE in an organization are Data Governance Programs located?

This varies. They can be placed within Business Operations, IT, Compliance/Privacy, or Data Management organizational structures. What’s important is that they receive appropriate levels of leadership support and appropriate levels of involvement from Data Stakeholder groups.

WHY use a formal Data Governance Framework?

Frameworks help us organize how we think and communicate about complicated or ambiguous concepts. The use of a formal framework can help Data Stakeholders from Business, IT, Data Management, Compliance, and other disciplines come together to achieve clarity of thought and purpose.

HOW do we assess whether we are ready for Data Governance?

A Data Governance Maturity Assessment allows an organisation to measure its current state, set both interim and long-term goals for improvement, identify the best practices that will move it to the next stage, and assess its progress at any point in the process. Because every organisation differs in its business, systems, management style and so forth, performing the assessment helps in designing both short- and long-term goals for a Data Governance program that is tailored to the organisation. To get the most out of a data warehouse and business intelligence implementation, a Data Governance Maturity Assessment should be performed.

HOW does an organization “do” Data Governance?

Data Governance programs tend to start by focusing their attention on finite issues, then expand their scope to address additional concerns or additional sets of information. Establishing Data Governance therefore tends to be an iterative process; a new area of focus may go through all of the steps described above at the same time that other governance-led efforts are well established in the “govern the data” phase.

In other words, a data governance framework assigns ownership and responsibility for data, defines the processes for managing data, and leverages technologies that will help enable the aforementioned people and processes.

The objectives of data governance are to:

  1. Enable better decision-making
  2. Reduce operational friction
  3. Protect the needs of data stakeholders
  4. Train management and staff to adopt common approaches to data issues
  5. Build standard, repeatable processes
  6. Reduce costs and increase effectiveness through coordination of efforts
  7. Ensure transparency of processes
  8. Ensure a single version of the truth for your organization

Data Governance Focus Areas

Data governance touches various components of enterprise information management, and a program will have a different set of objectives and a different implementation approach depending on which of these specific areas it focuses on.

Data Governance programs with different focus areas will, however, differ in the type of rules and issues they’ll address. They’ll differ in the emphasis they give to certain data-related decisions and actions. And they’ll differ in the level of involvement required of different types of data stakeholders.

Data Governance with Focus on Data Quality

The most common objective of Data Governance programs is to standardize data definitions across an enterprise. Quality needs to be a mandatory piece of a larger governance strategy. Without it, your organization is not going to successfully manage and govern its most strategic asset: its data. Any good active data governance methodology should let you measure your data quality. This is important because data quality actually has multiple dimensions which need to be managed.

Data governance initiatives improve data quality by assigning a team responsible for the data’s accuracy, accessibility, consistency, and completeness, among other metrics. This team usually consists of executive leadership, project management, line-of-business managers, and data stewards. The team usually employs some form of methodology for tracking and improving enterprise data, such as Six Sigma, and tools for mapping, profiling, cleansing, and monitoring data. At the end of the day, data quality and data governance are not synonymous, but they are closely related.
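
To make the idea of measuring several quality dimensions concrete, here is a minimal Python sketch. It is only an illustration: the sample records, the field names (`id`, `email`, `country`) and the agreed country codes are all assumptions invented for this example, and a real program would define its own metrics and thresholds.

```python
from collections import Counter

# Hypothetical customer records; the fields and values are illustrative only.
RECORDS = [
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "us"},
    {"id": 3, "email": "li@example.com", "country": "NZ"},
    {"id": 3, "email": "li@example.com", "country": "NZ"},
]


def completeness(records, field):
    """Share of records in which `field` is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records, key):
    """Share of records whose `key` value appears exactly once."""
    counts = Counter(r[key] for r in records)
    return sum(1 for r in records if counts[r[key]] == 1) / len(records)


def consistency(records, field, allowed):
    """Share of records whose `field` value belongs to the agreed value set."""
    return sum(1 for r in records if r.get(field) in allowed) / len(records)


def quality_report(records):
    """Collects one score per dimension so each can be tracked over time."""
    return {
        "email_completeness": completeness(records, "email"),
        "id_uniqueness": uniqueness(records, "id"),
        "country_consistency": consistency(records, "country", {"US", "NZ"}),
    }


if __name__ == "__main__":
    print(quality_report(RECORDS))
```

Keeping each dimension as a separate score, rather than one blended number, lets a data steward see which kind of problem (missing values, duplicates, or non-standard codes) needs attention.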

Data Governance with Focus on Privacy / Compliance / Security

The digital era has created unprecedented opportunities to conduct business and deliver services over the Internet. Nevertheless, as organizations collect, store, process and exchange large volumes of information in the course of addressing these opportunities, they face increasing challenges in the areas of data security, maintaining data privacy and meeting related compliance obligations.

Big data privacy falls under the broad spectrum of IT governance and is a critical component of your IT strategy. You need a level of confidence in how any data is handled to make sure your organization isn’t at risk of a nasty, often public data exposure. That extends to privacy for all your data, including big data sets that are increasingly becoming part of the mainstream IT environment.

Cyber Security – Companies of all sizes need to:

  • Understand who can access which types of data, via what means, and within what parameters (time of day, department, location, and many more)
  • Determine what data is sensitive
  • Review and authorize access
  • Monitor who is actually accessing the data
  • Detect unauthorized access in real-time
  • Track data access patterns
  • Be able to perform forensics after the fact
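
Several of the needs listed above (knowing who can access what, monitoring actual access, and reviewing it afterwards) can be sketched in a few lines of Python. This is a simplified illustration, not a security product: the `AccessRule` and `AccessMonitor` names, the department and data-class labels, and the permitted hours are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass(frozen=True)
class AccessRule:
    """One hypothetical policy entry: who may read which class of data, and when."""
    department: str
    data_class: str   # illustrative labels such as "public" or "sensitive"
    start: time       # earliest permitted time of day
    end: time         # latest permitted time of day


@dataclass
class AccessMonitor:
    """Checks requests against the rules and records every attempt."""
    rules: list
    audit_log: list = field(default_factory=list)

    def request(self, user, department, data_class, at):
        allowed = any(
            r.department == department
            and r.data_class == data_class
            and r.start <= at <= r.end
            for r in self.rules
        )
        # Log allowed and denied attempts alike so patterns can be reviewed later.
        self.audit_log.append(
            {"user": user, "department": department,
             "data_class": data_class, "at": at, "allowed": allowed}
        )
        return allowed

    def denied_attempts(self):
        """Entries to investigate: requests that no rule permitted."""
        return [e for e in self.audit_log if not e["allowed"]]


def example_monitor():
    """Builds a monitor with two made-up rules for demonstration."""
    return AccessMonitor(rules=[
        AccessRule("finance", "sensitive", time(8, 0), time(18, 0)),
        AccessRule("marketing", "public", time(0, 0), time(23, 59)),
    ])


if __name__ == "__main__":
    monitor = example_monitor()
    print(monitor.request("ana", "finance", "sensitive", time(9, 30)))
    print(monitor.request("bo", "marketing", "sensitive", time(9, 30)))
    print(len(monitor.denied_attempts()))
```

Because every request is written to the audit log whether or not it succeeds, the same record supports both real-time detection of denied attempts and after-the-fact forensics.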

Data Governance with a Focus on Data Policy, Standards, Strategies

A focus on data policies, data standards, and overall data strategies is usually the first step when an organization initiates a data governance function.

Data Governance with a Focus on Data Warehouses and Business Intelligence (BI)

This type of program typically comes into existence in conjunction with a specific data warehouse, data mart, or BI tool. These types of efforts require tough data-related decisions, and organizations often implement governance to help make initial decisions, to support follow-on decisions, and to enforce standards and rules after the new system becomes operational.

Data Governance with a Focus on Architecture / Integration

This type of program typically comes into existence in conjunction with a major system acquisition, development effort, or update that requires new levels of cross-functional decision-making and accountabilities.

Data Governance with a Focus on Management Support

Data Governance programs with a focus on Management Support typically come into existence when managers find it difficult to make “routine” data-related management decisions because of their potential effect on operations or compliance efforts.

It is important to recognise that data governance is not an IT function. Accountants can play a key role in enabling Data Governance, and ensuring that it is aligned with an organization’s overall corporate governance processes. Accountants are already familiar with applying many of the principles above to the financial data that they work with on a regular basis.

Becoming involved in a data management or data governance initiative provides the opportunity to apply these principles to other parts of the organization. Developing a successful data governance strategy requires careful planning, the right people and appropriate tools and technologies. IT is a member of the data governance board, but any effective data governance program requires executive sponsorship and business involvement.


The Data Scientist: Finding the right tools for the job

Which technologies and tools look best on a resume? Do you need to know HBase, Cassandra, MySQL, Excel, SPSS, R or SAS? Professionals advise: all of the above.

Data analysts have many tools at their disposal, from linear regression to classification trees to random forests, and these tools have all been carefully implemented on computers. But ultimately, it takes a data analyst—a person—to find a way to assemble all of the tools and apply them to data to answer a question of interest to people.

While working as the chief data scientist at Facebook, Jeff Hammerbacher described how, on any given day, a team of scientists would use Python, R and Hadoop, and then have to relay the analyses to colleagues. Additionally, a recent SiSense study of data professionals found that 60 percent of respondents use three or more data warehouse and business intelligence interfaces.

Data volumes are growing rapidly, and at the same time there are many tools for dealing with that data. We can categorize these tools by the tasks and data they can handle. By request type, the tools available on the market fall into three groups: report and dashboard development tools, statistical packages, and BI tools.

Tools for Data Analysis – Reports and Dashboard Development Tools

Report generation and dashboard development are daily tasks for most organizations, which want to understand their data on a timely basis. They generate daily, weekly, monthly, yearly and ad hoc reports and dashboards. Many tools are available on the market, such as MS Excel, Tableau, QlikView and Spotfire. These tools are not only for developing reports and dashboards; they are also adding capabilities for statistical analysis. Excel is a spreadsheet application, while Tableau, QlikView and Spotfire can be considered BI tools, with powerful engines for performing ETL.

Tools for Data Analysis – Statistical Packages

SAS and SPSS are the leaders in advanced statistical modelling, and there are also many open source tools, such as R, for statistical analysis. These packages offer many built-in procedures and tools for working with your data: extracting, cleaning, formatting, tabulating and statistically analysing it. SAS and SPSS can also be considered BI tools, as they have very powerful modules for performing ETL. ETL (Extract, Transform and Load) is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse.
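
The extract, transform and load steps can be illustrated with a short Python sketch that uses only the standard library. It is a toy under stated assumptions, not how SAS or SPSS implement ETL: the CSV content, the `sales` table and its columns are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical extract from a source system; names and values are made up.
SOURCE_CSV = """customer,amount,currency
 Ana ,10.50,usd
Li,7.25,USD
Ana,4.50,usd
"""


def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, convert amounts to integer cents, upper-case codes."""
    return [
        (
            r["customer"].strip(),
            round(float(r["amount"]) * 100),
            r["currency"].strip().upper(),
        )
        for r in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount_cents INTEGER, currency TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text):
    """Runs the three steps in order and returns the target connection."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn


if __name__ == "__main__":
    conn = run_etl(SOURCE_CSV)
    query = (
        "SELECT customer, SUM(amount_cents) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    print(conn.execute(query).fetchall())
```

Keeping the three steps as separate functions mirrors the way ETL tools separate source connectors, transformation rules and warehouse loading, so each step can be changed or tested on its own.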

Tools for Data Analysis – BI Tools

MS SQL Server, Oracle, SAP, MicroStrategy, Spotfire, QlikView, SAS and SPSS Modeler are the leaders in this market. They can deal with large amounts of data and perform ETL and drill-down analysis. We will see more about BI concepts in a separate topic.

Thankfully, there are ample resources on the Web to develop and hone your skills. Big Data University, for example, offers free resources to help data professionals gain proficiency in JAQL, MapReduce, Hive, Pig and others.

It’s also important to gain experience using these skills in the “real world.” Gopinathan advises aspiring data scientists to participate heavily in open-source projects and data contests, such as Kaggle, to practice utilizing technical, scientific and visual skills in real business scenarios.

In summary, making the transition from BI specialist to data scientist is going to require the following new skills and capabilities:

  • Deep dive into the multitude of statistical and predictive analytics models. Without a doubt, you’re going to have to get out your college statistics and advanced statistics books and spend time learning how and when to apply the right analytic models given the business situation.
  • Learning new analytic tools like R, SAS and MADlib. R, for example, is an open source product for which lots of tools (like RStudio) and much training are available free online.
  • Learning more about Hadoop and related Hadoop products like HBase, Hive and Pig. There is no doubt that Hadoop is here to stay, and there will be a multitude of opportunities to use Hadoop in the data preparation stage. It’s the perfect environment for adding structure to unstructured data, performing advanced data transformations and enrichments, and profiling and cleansing your data coming from a multitude of data sources.

As you make your way through the world of data science, learning R programming and other important skills, it’s important to remember that data science isn’t just a collection of tools.

It requires a person to apply those tools in a smart way to produce results that are useful to people. Choosing the right modelling approach is often a creative exercise that demands expert human judgment.

No matter what type of company you’re interviewing for, you’re likely going to be expected to know how to use the tools of the trade. This means a statistical programming language, like R or Python, and a database querying language like SQL.
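
As a small illustration of how the two kinds of language work together, the sketch below uses SQL to pull data out of a database and Python to fit a simple linear regression to it. Everything in it is an assumption made for the example: the `ads` table, its `spend` and `sales` columns, and the sample values are invented, and the least-squares fit is written by hand rather than taken from a statistics library.

```python
import sqlite3
from statistics import mean


def load_sample(conn):
    """Creates a hypothetical table of advertising spend versus sales."""
    conn.execute("CREATE TABLE ads (spend REAL, sales REAL)")
    conn.executemany(
        "INSERT INTO ads VALUES (?, ?)",
        [(1, 3), (2, 5), (3, 7), (4, 9)],
    )


def fetch_points(conn):
    """SQL side: select the two columns to analyse."""
    return conn.execute("SELECT spend, sales FROM ads ORDER BY spend").fetchall()


def fit_line(points):
    """Python side: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in points)
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    print(fit_line(fetch_points(conn)))
```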
