“IN-Fusion TableT” – THE GOOGLE WAY


Google Fusion Tables (or simply Fusion Tables) is a web service provided by Google for data management. Fusion Tables can be used for gathering, visualising and sharing data tables. Data are stored in multiple tables that Internet users can view and download. In the 2011 upgrade of Google Docs, Fusion Tables became a default feature under the title “Tables (beta)”.

Google Fusion Tables is a cloud-based service for data management and integration. Fusion Tables enables users to upload tabular data files (spreadsheets, CSV, KML), currently up to 100 MB per data set and 250 MB of data per user. The system provides several ways of visualizing the data (e.g., charts, maps, and timelines) and the ability to filter and aggregate the data. It supports the integration of data from multiple sources by performing joins across tables that may belong to different users. Users can keep the data private, share it with a select set of collaborators, or make it public and thus crawlable by search engines.
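As a rough illustration of the kind of programmatic access this enables, the sketch below queries a public table through the Fusion Tables SQL-like REST API. It assumes the v2 query endpoint, a valid API key, and a hypothetical table ID and column names, so treat it as a sketch rather than a definitive recipe.

```python
# Minimal sketch: querying a public Fusion Tables table via its SQL-like API.
# Assumptions: the v2 query endpoint, a valid API key, and a hypothetical
# table ID and column names.
import requests

API_KEY = "YOUR_API_KEY"       # hypothetical credential
TABLE_ID = "1AbCdEfG"          # hypothetical Fusion Tables table ID

sql = f"SELECT County, Population FROM {TABLE_ID} WHERE Population > 100000"
resp = requests.get(
    "https://www.googleapis.com/fusiontables/v2/query",
    params={"sql": sql, "key": API_KEY},
)
resp.raise_for_status()

result = resp.json()           # expected shape: {'columns': [...], 'rows': [[...], ...]}
for row in result.get("rows", []):
    print(dict(zip(result["columns"], row)))
```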

The discussion feature of Fusion Tables allows collaborators to conduct detailed discussions of the data at the level of tables and of individual rows, columns, and cells. HTML can be used to style info windows and add more complex features. Fusion Tables maps have limited options and functionality compared with custom mapping applications, but they are far easier to build, and Fusion Tables does not require knowledge of JavaScript or CSS to make online maps.

Fusion Power of Visualisation

  • Upload and manage map data
  • Map points, lines or areas
  • Create pushpin, intensity, and other types of maps
  • Create other types of visualizations (charts)
  • Embed your visualizations in a Web site
  • Share and collaborate with others


A Science of Data-Visualization Storytelling

Data visualization is viewed by many disciplines as a modern equivalent of visual communication. A primary goal of data visualization is to communicate information clearly and efficiently to users via the statistical graphics, plots, information graphics, tables, and charts selected. Effective visualization helps users in analysing and reasoning about data and evidence. It makes complex data more accessible, understandable and usable.

Data visualization is both an art and a science. The rate at which data is generated has increased, driven by an increasingly information-based economy. Data created by internet activity and an expanding number of sensors in the environment, such as satellites and traffic cameras, are referred to as “Big Data”. Processing, analysing and communicating this data present a variety of ethical and analytical challenges for data visualization. The field of data science and practitioners called data scientists have emerged to help address this challenge. Well-crafted data visualization helps uncover trends, realize insights, explore sources, and tell stories.


However, visualization tools sometimes require technical knowledge or are simply too expensive. That’s why I thought about using Google Fusion Tables to provide a few complementary visualizations to Google Analytics – it is a great tool, very user friendly, and free. Google Fusion Tables provides the means to visualize data with pie charts, bar charts, line plots, scatter plots, timelines, and geographical maps. Google provides a quick step-by-step guide to using Fusion Tables to visualize Google Analytics data: how to bring in the data, prepare it, and visualize it with charts.
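As a hedged example of that “bring the data, prepare it, visualize it” flow, the sketch below loads a Google Analytics CSV export and charts sessions over time locally before (or instead of) uploading the table to Fusion Tables. The file name and column names are assumptions; adjust them to match your export.

```python
# Minimal sketch of preparing and charting an Analytics export locally.
# Assumptions: a CSV export named 'analytics_export.csv' with 'Date' and
# 'Sessions' columns (hypothetical names).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("analytics_export.csv")
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date")

# The prepared table could equally be uploaded to Fusion Tables;
# here we simply chart sessions over time.
df.plot(x="Date", y="Sessions", kind="line", title="Sessions over time")
plt.tight_layout()
plt.show()
```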

THEmatic WEB MAPping

One of the coolest features of Fusion Tables is the ability to interface with Google Maps. If a table contains geographical location data, it can be made into a layer for the Google Maps API, allowing you to visualize your data geographically. The display of information can be customized to make sure you’re getting the best visualization of your data.


One of the quickest and easiest ways to produce simple maps for your Web site is to use Google’s Fusion Tables. Fusion Tables is an online data management application designed for collaboration, visualization and publication of data. Journalists often want to create thematic web maps, in which geographic areas are filled in with colour or shade according to data values. Thanks to Google Fusion Tables, creating basic thematic maps and embedding them on a web page is now easy.
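To make the idea concrete, here is a tiny illustrative sketch of what a thematic (choropleth) map does under the hood: each area’s data value falls into a bucket, and the bucket determines the fill colour. The cut-offs, colours and density figures below are hypothetical; in Fusion Tables this is configured through the map styling options rather than code.

```python
# Illustrative sketch: bucket a value and pick a fill colour for the area.
# The cut-offs, palette and density figures are hypothetical examples.
import bisect

buckets = [25, 50, 100, 200]                                       # persons per sq km
palette = ["#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"]  # light to dark red

def fill_colour(density: float) -> str:
    """Pick the fill colour based on which bucket the value falls into."""
    return palette[bisect.bisect_right(buckets, density)]

for county, density in [("Leitrim", 20.3), ("Cork", 69.4), ("Dublin", 1380.0)]:
    print(county, fill_colour(density))
```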

Web mapping is widely used by government statistical agencies. The Irish CSO, for example, offers web mapping of the statistical data it collects and publishes.

Resources:

http://homes.cs.washington.edu/~alon/files/socc10.pdf

http://www.sco.wisc.edu/images/stories/publications/SCO_quick_and_easy_web_maps_v1.2.pdf

Accessed 02 Aug 2015

 

Ireland Mapping – HIStory or Visual Reality

 

For the “Google Fusion Tables” project assignment we created a map of Ireland showing county boundaries and the corresponding population density. The first step was to get the data into a Fusion Tables-friendly format. Statistical data from the 2011 Census of Population was used. Some data cleaning was needed to match it against the Ireland county boundary data, which was downloaded in KML format.

Both tables, containing the geographic boundary information and the Census population data, were uploaded to a Google Drive account. Fusion Tables were created within the Google Docs account by clicking Create –> More –> Fusion Table. The two tables were then merged, and the new map of Ireland was ready to view.
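For readers who prefer to do the cleaning step outside Fusion Tables, the sketch below shows roughly what the matching looked like, assuming the census figures sit in a CSV with County and Population columns and the KML attributes have been exported to a second table. File and column names are hypothetical.

```python
# Minimal local sketch of the cleaning/merge step described above.
# Assumptions: 'census_2011_population.csv' (County, Population) and
# 'county_boundaries.csv' (CountyName, ...) exported from the KML attributes.
import pandas as pd

census = pd.read_csv("census_2011_population.csv")
boundaries = pd.read_csv("county_boundaries.csv")

def clean(name: str) -> str:
    # Normalise county names so the join keys match, e.g. "Co. Dublin" vs "DUBLIN".
    return name.strip().upper().replace("CO. ", "")

census["key"] = census["County"].map(clean)
boundaries["key"] = boundaries["CountyName"].map(clean)

merged = boundaries.merge(census[["key", "Population"]], on="key", how="left")
# Any rows still missing a population figure need manual fixing before upload.
print(merged[merged["Population"].isna()])
```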

To customise the map, a few more steps were needed. By clicking “Configure styles” and then choosing “Fill colour” under “Polygons”, we were able to control how population density is shaded and the actual colours of the map. The final map with its legend tells an interesting story…

map link

Storytelling with Data

The larger towns Dublin, Cork, Galway are service centres but, in addition, usually have industrial, administrative and commercial functions. The main concentration of towns is in the east and south of the country and all of the larger centres grew up as ports. Dublin, the focus of the roads and railways, is situated where the central lowland reaches eastwards to the Irish Sea. It is the chief commercial, industrial, administrative, educational and cultural centre.

Cork city has traditionally been associated with the processing and marketing of agricultural products but it benefits also from the presence of large-scale industrial development around its outer harbour and the use of natural gas from the offshore Kinsale field.

On the west coast, the main city is Limerick, which is located at the lowest crossing place on the river Shannon. It shares in the prosperity of the Shannon Industrial Estate but its harbour facilities are now little used, though significant port and industrial activities are developing westwards along the Shannon estuary. The other significant western urban centre is Galway.

Contrasts and Consequences

Regional imbalances in population trends, employment, income and related social conditions have long been a feature of Ireland. The most striking traditional contrast is between the more prosperous east and the less developed west, though this twofold distinction is a simplification of a more complex regional pattern.

The less developed character of the west can be explained mainly in terms of its more difficult physical environment, its remoteness from external influences, markets and financial sources, its heavy dependence on small-farm agriculture and its lower levels of urbanisation and infrastructural provision. The result has been low incomes, high unemployment and underemployment and heavy migration from the area with its social consequences. In recent times inner Dublin and the central districts of other cities have been recognised as problem areas also.

Progress Facilitator: Incentives & Policies

Attempts have been made to counteract regional imbalance since the 1950s, at first focusing exclusively on the west but later promoting western development within a broader regional planning framework. The Irish-speaking Gaeltacht areas have been particularly favoured in welfare promotion. The major initial incentive was the allocation of direct state grants to manufacturing firms locating in the west, and although grant provision was later extended to all parts, a differential was maintained in favour of western areas.

The largest manufacturing concentration of this type is at Shannon, where an industrial estate was developed as part of a plan to promote traffic through the airport. While manufacturing remained the spearhead of regional policy, development efforts in other sectors assumed an increasing regional dimension, as in agriculture, forestry, fishing and tourism. Some decentralisation of government administration has been introduced. In recent years there has been a growing realisation of the role which service industries could play in regional development.

Transition Initiatives

Smart Infrastructure and Smart Cities are key elements of both the Digital Agenda for Europe and the Irish Government’s plan for economic recovery. In addition to the opportunity around job creation and service revenue, there are also wider benefits to the economy.

According to the report “The Global Technology Hub” published by ICT Ireland:

Key recommendations for Government:

“Meet the target of doubling the annual output of honours degree ICT undergraduate programmes by 2018”

Key recommendations for industry:

“Support Skillnets programmes and encourage the up-skilling of existing staff”

Key recommendations for academia:

“Increase number of places available for tech-conversion programmes”

 

Prospective Plans – Developing a Digital Society

The use of technology throughout society can greatly improve a country’s overall economic performance. Work is on-going to increase the level of Government activity using technology as an enabler in a wide range of areas – from our education system to services for citizens. Notably, Government recently published its Digital Strategy to focus on enhancing the digital and online capabilities of the business community and general public.

Ireland has the goal of making itself the most attractive location in the world for ICT skills availability. The Department of Jobs, Enterprise and Innovation published an action plan covering 2014–2018 to make Ireland a global leader in ICT talent. One of the main objectives of this plan is to increase the output of high-level graduates and to enhance ICT capacity and awareness in the education system.

This Action Plan is a collaborative effort by Government, the education system and industry to meet the goal of making Ireland the most attractive location in the world for ICT skills availability. There are a number of challenges faced by the technology industry under the umbrella of education and skills. Ireland is addressing each of these challenges, and the examples below demonstrate the improvements to date:

  1. Improving the standard of education in Ireland and increasing the uptake of science, technology, engineering and mathematics (STEM) subjects at all levels in the education system.
  2. Increasing the output of honours level graduates from college level ICT courses.
  3. Maintaining the provision of effective technology conversion courses for those from other disciplines and fields.
  4. The up-skilling of current employees in the technology sector through formal continuous professional development.
  5. The availability of language skills and the ability to attract skilled workers from outside Ireland.

Taken together, and delivered in a collaborative way across Government, State agencies, the education sector and industry, these actions will ensure that the ICT sector in Ireland continues to thrive, with benefits for everyone in our society.

MyMining

The most densely populated areas have the largest Irish towns – Dublin, Cork and Galway – as their centres. They are the main commercial, industrial, administrative, educational and cultural places. From the education map of Ireland we can see that most of the higher education system is concentrated in Dublin and the other densely populated places on the map. One of the reasons more people live in these places is the presence of colleges and universities.

[Image: data legend]

[Image: education map of Ireland]

[Image: private college data]

However, as we can see from the map, private Higher Education Institutions are situated only in Dublin.

Insights4Thoughts

The Irish Government continues to develop areas and sectors that are critical to the ongoing recovery and growth of the Irish economy. The National Digital Strategy and enhancing ICT capacity are key priorities outlined in the Action Plan.

From this perspective, private colleges, and DBS in particular, have a potential opportunity to grow into other densely populated counties of Ireland in the coming years. High Dublin rents are also a contributing factor in favour of local education.

Resources:

http://www.educationinireland.com/en/Where-can-I-study-/#private

http://www.ireland-information.com/reference/geog.html

http://www.ictireland.ie/Sectors/ICT/ICT.nsf/vPages/Papers_and_Sector_Data~ict-skills-action-plan-2014-14-03-2014/$file/ICT+Skills+Action+Plan+2014.pdf

http://www.ictireland.ie/Sectors/ICT/ICT.nsf/vPages/Papers_and_Sector_Data~the-global-technology-hub/$file/The+Global+Technology+Hub+ICT+Ireland+ISA.pdf

 

3 V’s and Beyond – The Missing V’s in Big Data?

Big data represents the newest and most comprehensive version of organizations’ long-term aspiration to establish and improve their data-driven decision-making. Data in itself is not valuable at all. The value is in how organisations will use that data and turn their organisation into an information-centric company that relies on insights derived from data analyses for their decision-making.

Early detection of the Big Data characteristics can provide a cost-effective strategy for many organizations, helping them avoid unnecessary deployment of Big Data technologies. Analytics on some data may not require Big Data techniques and technologies; current, well-established techniques and technologies may be sufficient to handle the data storage and processing. This brings us to the purpose of the characteristics of Big Data: to help identify whether a problem requires a Big Data solution.

Definition

According to Gartner, the definition of big data is:

“Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

There are differing opinions on how many characteristics – “V dimensions” – are needed to identify a project as ‘Big Data’. The original three V’s – Volume, Velocity, and Variety – appeared in 2001, when Gartner analyst Doug Laney used them to identify key dimensions of big data.

3-D Data Management

  1. Volume – The sheer volume of the data is enormous, and a very large contributor to the ever-expanding digital universe is the Internet of Things, with sensors all over the world, in all kinds of devices, creating data every second: all the emails, Twitter messages, photos, video clips, sensor readings and so on that we produce and share. Currently, the data is generated by employees, partners, machines and customers. For example, hundreds of millions of smart phones send a variety of information to the network infrastructure. This data did not exist five years ago. More sources of data, each of larger size, combine to increase the volume of data that has to be analysed. This is a major issue for those looking to put that data to use instead of letting it just disappear.
  2. Velocity – the speed at which the data is created, stored, analysed and visualized. Big data technology now allows us to analyse the data while it is being generated, without ever putting it into databases. Initially, companies analysed data using a batch process: one takes a chunk of data, submits a job to the server and waits for delivery of the result. That scheme works when the incoming data rate is slower than the batch processing rate and when the result is useful despite the delay. With new sources of data such as social and mobile applications, the batch process breaks down: the data now streams into the server in real time, in a continuous fashion, and the result is only useful if the delay is very short (a toy sketch contrasting the two approaches follows this list).
  3. Variety – Nowadays, 90% of the data generated by organisations is unstructured. Data has moved on from Excel tables and databases: it has lost much of its structure and now arrives in hundreds of formats – pure text, photos, audio, video, web pages, GPS data, sensor data, relational databases, documents, SMS, PDF, Flash and so on. One no longer has control over the input data format, and as new applications are introduced, new data formats come to life.
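As promised under Velocity above, here is a toy sketch contrasting batch processing (compute once after all the data has arrived) with streaming processing (update the answer as each event arrives). The numbers and the three-event window are purely illustrative.

```python
# Toy contrast between batch and streaming aggregation (illustrative only).
from collections import deque
import statistics

events = [5, 9, 4, 12, 7, 3, 8]     # stand-in for an incoming data feed

# Batch: wait for the full chunk, submit the job, compute once at the end.
batch_mean = statistics.mean(events)

# Streaming: update the answer as each event arrives (moving average over the
# last three events), so a useful result is available with minimal delay.
window = deque(maxlen=3)
for value in events:
    window.append(value)
    print(f"received {value}, moving average = {sum(window) / len(window):.2f}")

print(f"batch mean after the full run = {batch_mean:.2f}")
```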

The three V’s are the driving dimensions of Big Data, but they are open-ended. There is no specific volume, velocity, or variety of data that constitutes big. These may be the most common but by no means the only descriptors that have been used.


 

Quantifying ‘Big’ – How Many “V’s” in Big Data?

There are many different characteristics of Big Data on which data scientists agree, but none that by itself can be used to say that this example is Big Data and that one is not. In fact, I was able to find another eleven different characteristics claimed for Big Data. These characteristics were compiled from several sources, including IBM, Paxata, Datafloq, SAS, Data Science Central and the National Institute of Standards and Technology (NIST).

4. Value – the all-important V, characterizing the business value, ROI, and potential of big data to transform your organization from top to bottom. It is all well and good having access to big data, but unless we can turn it into value it is useless. It is so easy to fall into the buzz trap and embark on big data initiatives without a clear understanding of costs and benefits.

5. Viability – Neil Biehn, writing in Wired, sees Viability and Value as the distinct missing Vs numbers 4 and 5. According to Biehn, “we want to carefully select the attributes and factors that are most likely to predict outcomes that matter most to businesses; the secret is uncovering the latent, hidden relationships among these variables.”

6. Veracity: refers to the accuracy and reliability of the data. Veracity has an impact on confidence in the data.

7. Variability – means that the meaning of the data is changing, sometimes rapidly: dynamic, evolving, spatiotemporal data, time series, seasonal patterns, and any other type of non-static behaviour in your data sources, customers, objects of study, etc.

8. Visualization – making that vast amount of data comprehensible in a manner that is easy to understand and read.

9. Validity: data quality, governance, master data management (MDM) on massive, diverse, distributed, heterogeneous, “unclean” data collections.

10. Venue: distributed, heterogeneous data from multiple platforms, from different owners’ systems, with different access and formatting requirements, private vs. public cloud.

11. Vocabulary: schema, data models, semantics, ontologies, taxonomies, and other content- and context-based metadata that describe the data’s structure, syntax, content, and provenance.

12. Vagueness: confusion over the meaning of big data (Is it Hadoop? Is it something that we’ve always had? What’s new about it? What are the tools? Which tools should I use? etc.). (Note: Venkat Krishnamurthy, Director of Product Management at YarcData, introduced this new “V” at the Big Data Innovation Summit in Santa Clara on June 9, 2014.)

13. Virality: Defined by some users as the rate at which the data spreads; how often it is picked up and repeated by other users or events.

14. Volatility – refers to how long data is valid and how long it should be stored. In this world of real-time data, you need to determine at what point data is no longer relevant to the current analysis.

How many V’s are enough?

In recent years, revisionists have blown the count out to far too many, expanding the market space but also creating confusion. These dimensions all matter, particularly as we consider designing and implementing processes to turn raw data into “ready to use” information streams. Reaching a common definition of Big Data is one of the first tasks to tackle.

Bill Vorhies, President & Chief Data Scientist at Data-Magnum, has been working since the summer of 2013 with the US Department of Commerce National Institute of Standards and Technology (NIST) working group developing a standardized “Big Data Roadmap”. The group elected to stick with Volume, Variety, and Velocity and excluded the other dimensions from the Big Data definition on the grounds that they apply broadly to all types of data.

Author and analytics strategy consultant Seth Grimes makes a similar point in his InformationWeek piece “Big Data: Avoid ‘Wanna V’ Confusion”. In the article he differentiates the essence of Big Data, as defined by Doug Laney’s original and still valid 3 Vs, from derived qualities proposed by various vendors. In his opinion, the wanna-V backers and the contrarians mistake interpretive, derived qualities for essential attributes, and conflating inherent aspects with important objectives leads to poor prioritization and planning.

So, the above mentioned consultants believe that Variability, Veracity, Validity, Value etc. aren’t intrinsic, definitional Big Data properties. They are not absolutes. By contrast, they reflect the uses you intend for your data. They relate to your particular business needs. You discover context-dependent Variability, Veracity, Validity, and Value in your data via analyses that assess and reduce data and present insights in forms that facilitate business decision-making. This function, Big Data Analytics, is the key to understanding Big Data.

Summary:

I’ve explored many sources to bring you a complete listing of possible definitions of Big Data with the goal of being able to determine what a Big Data opportunity is and what’s not. Once you have a single view of your data, you can start to make intelligent decisions about the business, its performance and the future plans.

In conclusion, Volume, Variety, and Velocity still make the best definition, but none of these stands on its own in distinguishing Big Data from not-so-big data. Understanding these characteristics will help you analyse whether an opportunity calls for a Big Data solution, but the key is to understand that this is really about breakthrough changes in the technology of storing, retrieving, and analysing data, and then finding the opportunities that can best take advantage of them.

 

References:

Bernard Marr “Big Data: Using SMART Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance” http://media.wiley.com/product_data/excerpt/33/11189658/1118965833-18.pdf

http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Harvard Business Review October 2012 Big Data: The Management Revolution by Andrew McAfee and Erik Brynjolfsson http://ai.arizona.edu/mis510/other/Big%20Data%20-%20The%20Management%20Revolution.pdf

http://hmchen.shidler.hawaii.edu/Chen_big_data_MISQ_2012.pdf

http://www3.weforum.org/docs/WEF_GlobalInformationTechnology_Report_2014.pdf

https://hbr.org/2012/10/making-advanced-analytics-work-for-you/ar/1

http://www.mckinsey.com/insights/business_technology/making_data_analytics_work

http://www.infoivy.com/2013/12/5-steps-for-big-data-application.html

http://www.ndm.net/siem/pdf/ebook-tibco-loglogic_tcm8-17804.pdf

http://www.wired.com/2013/05/the-missing-vs-in-big-data-viability-and-value/

http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/

http://www.forbes.com/sites/gartnergroup/2013/03/27/gartners-big-data-definition-consists-of-three-parts-not-to-be-confused-with-three-vs/

http://www.sas.com/offices/NA/canada/lp/Big-Data/Extreme-Information-Management.pdf

http://www.sigmetrics.org/sigmetrics2013/bigdataanalytics/abstracts2013/bdaw2013_submission_4.pdf

QUALITY DIMENSIONS: DATA & INFORMATION

Decisions are only as good as the information on which they are based. The potential damage to service users arising from poor data quality as well as the legal, financial and reputational costs to the organisation are of such magnitude that organisations must be willing to take the time and give the necessary commitment to improve data quality. Every organization today depends on data to understand its customers and employees, design new products, reach target markets, and plan for the future. Accurate, complete, and up-to-date information is essential if you want to optimize your decision making, avoid constantly playing catch-up and maintain your competitive advantage.

Business leaders recognize the value of big data and are eager to analyse it to obtain actionable insights and improve the business outcomes. Unfortunately, the proliferation of data sources and exponential growth in data volumes can make it difficult to maintain high-quality data. To fully realize the benefits of big data, organizations need to lay a strong foundation for managing data quality with best-of-breed data quality tools and practices that can scale and be leveraged across the enterprise.

What can your organisation do to make data quality a success?

Within an organization, acceptable data quality is crucial to operational and transactional processes and to the reliability of business analytics (BA) / business intelligence (BI) reporting.

Confidence in the quality of the information they produce is a survival issue for government agencies around the world. The Health Information and Quality Authority of Ireland has adopted a business-driven approach to standards for data and information and endorses the “Seven essentials for improving data quality” guide:

[Image: “Seven essentials for improving data quality” guide]

Data quality is central to an effective performance management system throughout the organization. Data quality is a complex measure of data properties across various dimensions and is determined by whether or not the data is suitable for its intended use; this is generally referred to as being “fit for purpose”. Data is of sufficient quality if it fulfils its intended use (or re-use) in operations, decision making or planning. Maintaining data quality requires going through the data periodically and scrubbing it. Typically this involves updating it, standardizing it, and de-duplicating records to create a single view of the data, even if it is stored in multiple disparate systems.
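As a small illustration of that scrubbing step, the sketch below standardises a couple of fields and de-duplicates records to approximate a single view of a customer. The column names and the matching rule (de-duplicating on a normalised email address) are hypothetical simplifications.

```python
# Minimal sketch of standardising and de-duplicating records.
# Column names and the matching rule are hypothetical.
import pandas as pd

records = pd.DataFrame({
    "name":  ["Mary Byrne", "mary byrne ", "J. O'Neill"],
    "email": ["MARY@EXAMPLE.COM", "mary@example.com", "j.oneill@example.com"],
})

# Standardise: trim whitespace and normalise case.
records["name"] = records["name"].str.strip().str.title()
records["email"] = records["email"].str.strip().str.lower()

# De-duplicate on the standardised email to keep one record per person.
single_view = records.drop_duplicates(subset="email", keep="first")
print(single_view)
```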

Data Quality Management entails the establishment and deployment of roles, responsibilities, policies, and procedures concerning the acquisition, maintenance, dissemination, and disposition of data. A partnership between the business and technology groups is essential for any data quality management effort to succeed. The business areas are responsible for establishing the business rules that govern the data and are ultimately responsible for verifying the data quality. The Information Technology (IT) group is responsible for establishing and managing the overall environment – architecture, technical facilities, systems, and databases – that acquire, maintain, disseminate, and dispose of the electronic data assets of the organization.

A data quality assurance program.

Data quality assurance (DQA) is the process of verifying the reliability and effectiveness of data: an explicit combination of organization, methodologies, and activities that exist for the purpose of reaching and maintaining high levels of data quality. To make the most of open and shared data, public and government users need to define what data quality means with reference to their specific aim or objectives. They must understand the characteristics of the data and consider how well it meets their own needs or expectations. For each dimension of quality, consider what processes must be in place to manage it and how performance can be assessed.

Data quality control

Data quality control is the process of controlling the usage of data with known quality measurements for an application or a process. It usually follows a data quality assurance (DQA) process, which consists of the discovery and correction of data inconsistencies. Data quality is affected by the way data is entered, stored and managed. Analytics can be worthless, counterproductive and even harmful when based on data that isn’t high quality. Without high-quality data, it doesn’t matter how fast or sophisticated the analytics capability is: you simply won’t be able to turn all that data managed by IT into effective business execution.

Difference between Data and Information

Data and information are interrelated. In fact, they are often mistakenly used interchangeably.

Data is raw, unorganized facts that need to be processed. Data can be something simple and seemingly random and useless until it is organized.

When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information.

[Image: data-information-knowledge-wisdom pyramid]

If the information we derive from the data is not accurate, we cannot make reliable judgments or develop reliable knowledge from the information. And that knowledge simply cannot become wisdom, since cracks will appear as soon as it is tested.

Bad data costs time and effort, gives false impressions, results in poor forecasts and devalues everything else in the continuum.

What are the factors determining data quality?

Understanding the difference between data and information is the key to solving data quality. To be most effective, the right data needs to be available to decision makers in an accessible format at the point of decision making. The quality of data can be determined through assessment against the following internationally accepted dimensions.

In 1987 David Garvin of the Harvard Business School developed a system of thinking about quality of products. He proposes eight critical dimensions or categories of quality that can serve as a framework for strategic analysis: performance, features, reliability, conformance, durability, serviceability, aesthetics, and perceived quality.

Agencies create or collect data and information to meet their operational and regulatory requirements. They will define their own acceptable levels of data quality according to these primary purposes. It is often a mistake to stick with old quality measures when the external environment has changed.

Thus dimensions of quality also differ from user to user: completeness, legibility, relevance, reliability, accuracy, timeliness, accessibility, interpretability, coherence and validity. Data also has to be manageable in volume and cost-effective. Clearly these dimensions are not independent of each other. Attending to them will help ensure that an organisation has a good level of data quality supporting the information it produces.
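As a small, hedged illustration of assessing a couple of these dimensions in practice, the sketch below computes per-column completeness and a simple validity check on a toy table. The columns, rule and data are hypothetical.

```python
# Minimal sketch of measuring two quality dimensions on a toy table.
# Columns, validity rule and data are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "date_of_birth": pd.to_datetime(["1980-02-01", None, "1975-07-19", "2030-01-01"]),
    "county": ["Dublin", "Cork", None, "Galway"],
})

# Completeness: share of non-missing values per column.
completeness = df.notna().mean()

# Validity: share of rows whose date_of_birth is not in the future.
validity = (df["date_of_birth"] <= pd.Timestamp.today()).mean()

print("Completeness by column:\n", completeness)
print("Share of valid date_of_birth values:", validity)
```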

The dimensions contributing to data quality


Master Data Management

A lot of business problems ultimately trace back to a lack of data governance and poor-quality data. Master data management technology can address many of these issues, but only when driven by an MDM strategy that includes a vision supporting the overall business and incorporates a metrics-based business case. Data governance and organizational issues must be put front and centre, and new processes designed to manage data through the entire information management life cycle. Only then can you successfully implement the new technology you’ll introduce in a data quality or master data management initiative.

At its recent Master Data Management Summit in Europe, Gartner recommended a structured approach to implementing master data management: begin with a strategy for development and planning, then set up a process to govern data. This, in turn, aids change management of all types and helps target data at strategic business goals. Once set up, data management can be measured, monitored and adjusted to stay on course.

MDM software includes process, governance, policy, standards and tools to manage an organization’s critical data. MDM applications manage customer, supplier, product, and financial data with data governance services and supporting world-class integration and BI components. Data quality is a first step towards MDM, which allows you to start with one application knowing that MDM will be introduced as more applications get into the act.

 

Data Governance

Effective data governance serves an important function within the enterprise, setting the parameters for data management and usage, creating processes for resolving data issues and enabling business users to make decisions based on high-quality data and well-managed information assets. But implementing a data governance framework isn’t easy. Complicating factors often come into play, such as data ownership questions, data inconsistencies across different departments and the expanding collection and use of big data in companies.

At its core, data governance incorporates three key areas: people, process and technology. In other words, a data governance framework assigns ownership and responsibility for data, defines the processes for managing data, and leverages technologies that will help enable the aforementioned people and processes. At the end of the day, data quality and data governance are not synonymous, but they are closely related. Quality needs to be a mandatory piece of a larger governance strategy. Without it, your organization is not going to successfully manage and govern its most strategic asset: its data.

Any good active data governance methodology should let you measure your data quality. This is important because data quality actually has multiple dimensions which need to be managed. Data governance initiatives improve data quality by assigning a team responsible for data’s accuracy, accessibility, consistency, and completeness, among other metrics. This team usually consists of executive leadership, project management, line-of-business managers, and data stewards. The team usually employs some form of methodology for tracking and improving enterprise data, such as Six Sigma, and tools for data mapping, profiling, cleansing, and monitoring data.

International standard

ISO 8000 is the international standard that defines the requirements for quality data. Understanding this important standard and how it can be used to measure data quality is an important first step in developing any information quality strategy.

  

Resources:

  1. NSW Government Standard for Data Quality Reporting March 2015
  2. Health information and Quality Authority of Ireland 2012 “What you should know about data quality”
  3. http://www.finance.nsw.gov.au/ict/sites/default/files/NSW%20Standard%20for%20Data%20Quality%20Reporting%20FINAL%20v1.1_0.pdf
  4. http://www-01.ibm.com/software/data/quality/
  5. https://www.sas.com/content/dam/SAS/en_us/doc/factsheet/sas-data-quality-101422.pdf
  6. http://www.oracle.com/us/products/middleware/data-integration/odi-ingredients-for-success-1380930.pdf
  7. http://www.iso.org/iso/catalogue_detail.htm?csnumber=52129
  8. https://hbr.org/1987/11/competing-on-the-eight-dimensions-of-quality