Data Management is a comprehensive collection of practices, concepts, procedures, processes, and supporting systems that allow an organization to gain control of its data resources. As an overall practice, Data Management covers the entire lifecycle of a data asset, from its creation to its final retirement, including how it progresses and changes as it moves through the internal and external data streams of an enterprise. The Data Management Body of Knowledge (DMBOK) defines Data Management (or Data Resource Management) as:
“The development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise.” As well as, “The planning, execution, and oversight of policies, practices and projects that acquire, control, protect, deliver, and enhance the value of data and information assets.”1
Data management tasks include the creation of data governance policies; data analysis and architecture; database management system (DBMS) integration; data security; and data source identification, segregation and storage. Data management encompasses a variety of techniques that facilitate and ensure data control and flow from creation through processing, utilization and deletion. It is implemented through a cohesive infrastructure of technological resources and a governing framework that defines the administrative processes used throughout the life cycle of data. The term is best understood as an umbrella for an entire segment of IT rather than a single discipline.2
Data management is concerned with the end-to-end lifecycle of data, from creation to retirement, and the controlled progression of data to and from each stage within its lifecycle. Data management minimizes the risks and costs of regulatory non-compliance, legal complications, and security breaches. It also provides access to accurate data when and where it is needed, without ambiguity or conflict, thereby avoiding miscommunication. Any kind of business data is subject to data management principles and procedures, but they are particularly useful in reconciling conflicts among data from duplicative sources. Organizations that use cloud-based applications in particular find it hard to keep data orchestrated across systems. Data management practices can prevent ambiguity and make sure that data conforms to organizational best practices for access, storage, backup, and retirement, among other things. A common approach to data management is to maintain a master data file, an approach known as Master Data Management (MDM). This file provides a common definition of an asset and all its data properties in an effort to eliminate ambiguous or competing data policies and give the organization comprehensive stewardship over its data.3
Evolution of Data Management4
Beginning in the 1960s, the Association of Data Processing Service Organizations (ADAPSO) became one of a handful of groups that advanced best practices for data management, especially in terms of professional training and data quality assurance metrics. Over time, information became more popular than data as a term to describe the objectives of corporate computing (as seen, for example, in the renaming of ADAPSO as the Information Technology Association of America (ITAA), or of the National Microfilm Association as the Association for Information and Image Management (AIIM)), but the practices of data management continued to evolve. In the 1970s, the relational database management system began to emerge at the center of data management efforts. Based on relational logic, the relational database provided improved means for assuring consistent data processing and for reducing or managing duplicated data. These traits were key for transactional applications. With the rise of the relational database, relational data modeling, schema creation, deduplication and other techniques became larger parts of common data management practice. The 1980s saw the creation of the Data Management Association International, or DAMA International, chartered to improve data-related education. Data arose again as a leading descriptive term when IT professionals began to build data warehouses that employed relational techniques for offline data analytics, giving business managers a better view of their organizations' key trends for decision-making. Modeling, schema and change management all called for different treatments with the advent of data warehousing, which improved organizations' views of their operations.
Principles of Data Management5
The key principles of Data Management are illustrated in Figure 1 and described below.
Figure 1. Key principles of Data Management
Data Management Techniques6
- Avoid re-collecting data: The largest potential for waste in Data Management is reacquiring an existing dataset. This has been done frequently by public and private sector organizations and must be avoided.
- Data lifecycle control: Good Data Management requires that the whole life cycle of datasets be managed carefully. This includes:
  - Business justification, to ensure that thought has been given to why new data are required rather than existing data amended or used in new ways, how data can be specified for maximum use including the potential to meet other possible requirements, and why the costs of handling, storing and maintaining these data are acceptable and recoverable.
  - Data specification and modelling, processing, database maintenance and security, to ensure that data will be fit for purpose and held securely in their own databases.
  - Ongoing data audit, to monitor the use and continued effectiveness of the data.
  - Archiving and final destruction, to ensure that data are archived and maintained effectively until they are no longer needed or are uneconomical to retain.
- Data policy: The fundamental step for any organization wishing to implement good Data Management procedures is to define a Data Policy. The document may have different names in different public bodies but in each it should be a set of broad, high-level principles that form the guiding framework within which Data Management can operate.
- Data ownership: One key aspect of good Data Management is the clear identification of the owner of the data. Normally this is the organization or group of organizations that originally commissioned the data acquisition or compilation and retains managerial and financial control of the data. The Data Owner has legal rights over the dataset, including the intellectual property rights (IPR) and the copyright. Data ownership implies the right to exploit the data and, if continued maintenance becomes unnecessary or uneconomical, the right to destroy them, subject to the provisions of the Public Records and Freedom of Information acts. Ownership can relate to a data item, a dataset or a value-added dataset. IPR can be owned at different levels. For example, a merged or value-added dataset can be owned by one organization, even though other organizations own the constituent data. If the legal ownership is unclear, there are risks that the data can be wrongly exploited, used without payment of royalty to the owner, neglected or lost.
- Metadata: All datasets must have appropriate metadata compiled for them. At the simplest level metadata are “data about data”. Metadata provide a summary of the characteristics of a dataset. A good metadata record enables the user of a dataset or other information resource to understand the content of what they are reviewing, its potential value and its limitations.
- Data quality: Good Data Management also ensures that datasets are capable of meeting current needs successfully and are suitable for further exploitation. The ability to integrate data with other datasets is likely to add value, encourage ongoing use of the data and recover the costs of collecting the data. The creation, maintenance and development of quality data require a clear and well-specified management regime.
- Data Steward: All datasets need to be managed by a named individual, referred to here as the Data Steward (also known as the dataset manager or data custodian). A Data Steward should be given formal responsibility for the stewardship of each major dataset. They should be accountable for the management and care of the data holdings assigned to them, in line with the defined data policy.
- Data Management Plan: The Data Steward is responsible for the development of a Data Management Plan for each dataset under their responsibility. The objective of the Data Management Plan is to ensure:
  - That the dataset is fit for the purpose for which it is required.
  - That the long-term management of the dataset is considered for potential re-use.
- Data Management procedures: Individual datasets may require compilation of specific Data Management procedures. These may be needed where specific datasets require detailed operational procedures to ensure their quality; examples of this include scientific and statistical datasets.
- Data access and dissemination: Although this aspect will depend upon the business and the financial policy of the organization, the following guidance should be followed.
  - Public access to data should be provided in line with The Freedom of Information Act, The Data Protection Act and The Human Rights Act.
  - IPR and Copyright of datasets owned by public bodies must be protected, as data should be regarded as an asset.
  - IPR and Copyright of third-party data must be respected.
  - The potential for commercial re-use and exploitation of the dataset should be considered.
  - The right to use or provide access to data can be passed to a third party, subject to agreed pricing and dissemination policies.
- Data audit: Data Management audits are recommended to ensure that the management environment for given datasets is being maintained. Their purpose is to provide assurance to the Data Management Champion that the resources expended are being used appropriately. Audits of major datasets should be commissioned to ascertain the level of compliance with data policies and with the Data Management plans and procedures that have been prepared.
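The metadata, ownership and stewardship principles listed above can be made concrete with a small record structure. The following is only a minimal sketch in Python: the class `DatasetMetadata`, its field names, the helper `missing_fields`, and the sample values are all illustrative assumptions, not part of any standard or of the source guidance.

```python
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class DatasetMetadata:
    """A minimal "data about data" record; every field name is hypothetical."""
    title: str
    owner: str            # organization with managerial and financial control
    steward: str          # named individual accountable for the dataset
    created: date
    description: str = ""
    limitations: str = ""
    keywords: list[str] = field(default_factory=list)


def missing_fields(record: DatasetMetadata) -> list[str]:
    """Return the names of fields that are still empty, e.g. for a data audit."""
    return [name for name, value in asdict(record).items()
            if value in ("", [], None)]


# Sample values are invented for illustration only.
record = DatasetMetadata(
    title="River gauge readings",
    owner="Example Water Authority",
    steward="J. Doe",
    created=date(2020, 1, 15),
    description="Hourly water levels from fixed gauges.",
)
print(missing_fields(record))  # ['limitations', 'keywords']
```

A check of this kind could feed a periodic data audit by flagging datasets whose metadata are incomplete, although what counts as a required field would be set by the organization's own data policy.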
Data must always be easily accessible so that employees can work on data at any time and from anywhere. The Internet has been a useful tool in increasing data accessibility. Having data available online to access - when an employee is traveling, for example - is one technique used for managing large amounts of data.
Data Management Process7
- Data Entry: Some data management techniques apply before the data even enters the database. The data must be correct upon first entry, which means that the individual originally collecting the data must record it correctly. For example, when a customer reports an address change via the phone, the customer service representative must hear the customer correctly so that the address is correctly entered into the database. One technique for avoiding error is to repeat the information back to the customer.
- Data Backup: Lost data can have a detrimental effect on a company by costing it time as it attempts to replace what was lost, according to IBM. Companies often rely on backup methods that store crucial information so that it can be retrieved if the original is lost.
- Data Cleansing: Data cleansing is the act of fixing incorrect data, consolidating data and deleting irrelevant data. Data cleansing uses both automated software programs and manual input from a database administrator. While data cleansing removes errors and can increase company productivity and free up storage space, the process can be expensive and time-consuming.
- Replacement Data: Data is sometimes backed up by software, as mentioned previously. When a piece of data is altered for whatever reason and the data stops functioning properly, the data can be replaced by the backup data that worked in the past. Then, the programmers can determine which changes corrupted the data and determine how this data can be better updated in the future.
- Customer Data Entry: Companies can set up websites that allow customers to enter data directly. This direct data entry saves the company money by not requiring them to staff an employee to fill out the necessary forms. Customers can also correct data themselves if there is a mistake. However, not all customers are tech-savvy enough to know how to access online databases through the Internet.
- Double-Checking: Crucial data should always be checked by two pairs of eyes. When one employee edits data, the edit should be made in a different color to show that a change was made. A second employee can then review the edit to ensure that no errors were introduced.
- Data Management Consultants: Many consulting companies specialize in data quality management. These companies examine how a business carries out its data management processes and provide recommendations on how these processes can be improved.
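The three parts of data cleansing described above (fixing incorrect data, consolidating data, and deleting irrelevant data) can be illustrated with a short routine. This is a sketch under stated assumptions: the record layout (`email`, `name`, `active`), the use of the email address as the consolidation key, and the function name `cleanse` are all hypothetical choices made for the example.

```python
def cleanse(records):
    """Fix formatting, consolidate duplicates, and drop irrelevant rows.

    Assumes each record is a dict with 'email', 'name' and 'active' keys;
    this layout is an assumption made for the example.
    """
    consolidated = {}
    for rec in records:
        if not rec.get("active", True):
            continue  # delete irrelevant data: skip inactive customers
        # Fix incorrect data: normalize case and stray whitespace.
        email = rec["email"].strip().lower()
        name = " ".join(rec["name"].split()).title()
        # Consolidate: records sharing an email collapse into one entry.
        merged = consolidated.setdefault(email, {"email": email, "name": ""})
        if name:
            merged["name"] = name  # a non-empty name fills in a blank one
    return list(consolidated.values())


raw = [
    {"email": " Ann@Example.com ", "name": "ann  lee", "active": True},
    {"email": "ann@example.com", "name": "", "active": True},
    {"email": "bob@example.com", "name": "Bob Ray", "active": False},
]
print(cleanse(raw))  # [{'email': 'ann@example.com', 'name': 'Ann Lee'}]
```

Automated rules like these handle only the predictable cases; ambiguous records would still need the manual review by a database administrator that the text mentions.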
A comprehensive data management plan should be designed to organize data handling processes in order to assure data integrity and security. The stages, components, and strategies of one such plan are represented in Figure 2. The components outlined in each stage can be used as a step-by-step checklist to ensure that any issues are addressed and properly documented.
Figure 2. The Data Management Process
source: Prairie View A&M University
Integrated Data Management8
- PREPARATORY STAGE: The preparatory stage takes place during the project startup period and includes instrument construction and refinement, data collection procedures, personnel training, instrument coding, software programming for data entry, planning for data set security procedures, and documentation.
  - Instrument Construction/Refinement: The structure and format of the research instrument are critical to data accuracy and completeness.
  - Data Collection Procedures: Written standardized procedures facilitate consistency in data collection across study participants and data collectors. The adage “garbage in, garbage out” illustrates the issues in management of raw data.
  - Personnel/Training: When multiple data collectors are used, the interviewer training needs to be designed to establish consistency in data collection procedures. Written procedures are developed and included in a training manual to ensure consistency in approach.
  - Instrument Coding: During the preparatory stage, data collection instruments and individual items are assigned a code name for ease of data entry and management. Code names should be meaningful and easy to remember. Coding and naming conventions should be standardized for files, variables, programs, and other entities in a data management system.
  - Software Programming: Software programs allow for the entry, transfer, and analysis of data. Prior to data entry, data fields must be determined to assure accurate data entry. It is important to label all data programming steps in order to create a data set history.
  - Security Procedures: One of the most critical components of data management is data security.
  - Documentation: The heart of effective data management is in the documentation. Documentation includes information about recruitment sites, data collectors, and participant progression through the study. A tracking system is necessary to document each participant’s progression and completion of required data collection and intervention components.
- DATA ORGANIZATION STAGE
  - Data Entry, Conversion, Editing, and Cleaning: Accurate data cleansing is critical to the project’s success. If done improperly, the results of the study can be skewed. Data entry needs to be performed by well-trained and responsible individuals, because data must be entered with attention to detail and some individuals are better at this than others. After data entry is complete, several activities need to be performed to check the quality of the data, and the data must then be examined for accuracy.
  - Data Manipulation: Prior to initiating statistical analysis, data manipulation also needs to be completed. Data manipulation includes recoding and creating new variables and creating scales and subscales. New variables are created when it is useful to collapse categories to create fewer response options to achieve more meaningful results.
  - Data Merging: With multiple data collection points in time, each time point is entered into a separate data file. The files then need to be merged to allow analysis of change in variables across time. The data sets are linked by participant identification numbers, and checks are implemented to validate that the files were merged properly.
  - Data Backup: A high priority is the creation of backup electronic copies of all files. Electronic copies of the system codes, data, and other related files are stored on the main server. Additional backup files may be stored on a zip disk. Hard copies of questionnaires may be kept in a locked cabinet in the project office.
  - Documentation: During data entry, the data manager documents all of the item codes and recodes, variable names, the creation of scales and subscales, and any other changes to the data. Additionally, all steps taken to transform, convert, or manipulate data, as well as file mergers, must be documented.
- ANALYSIS AND DISSEMINATION STAGE
  - Preliminary Data Analysis: Preliminary analysis of the data is a valuable step that should precede the main analysis used to test the research hypotheses. It can detect issues that are not specifically related to data quality but may be important in making inferences based on the data. In addition, it supports interim reports for dissemination to project staff.
  - Baseline Data Analysis: Baseline data analysis includes both descriptive and inferential statistics. Descriptive statistics are reported for each data collection stage. At this stage an individual with statistical expertise provides consultation and supervises the analysis.
  - Linking Longitudinal Data: In longitudinal data with multiple data collection stages, files for each data collection must be merged to enable data analyses of effects and patterns across time.
  - Data Access Procedures: Limiting access to the data is a necessary part of the data management process.
  - Manuscript Preparation: All research team members need to play a role in planning, developing, and submitting manuscripts. Senior team members can mentor less experienced members in planning analyses and writing reports of findings.
  - Documentation: Documentation is also critical during the data analysis and dissemination stage. All data analysis activities must be documented to create an analysis history. A written summary of each analysis is useful for preventing unnecessary duplication of analyses.
  - Archiving Data: Archived data include all raw data, the database stored in datasets, the stored datasets, all analysis programs, all documentation, and all final standard operational procedures. In the archived data, the link between individuals and data sets is kept separate. Hard copies of raw data and zip disks are secured in a locked storage area used solely for storing archival materials.
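The data manipulation and data merging steps described in the stages above lend themselves to a short worked example. The sketch below, in Python, collapses a numeric variable into fewer categories and then links two collection waves by participant identification number, reporting any IDs that fail to match. The field names (`pid`, `age`, `score`), the age cut points, and the function names are assumptions chosen for illustration; they do not come from the plan described here.

```python
def collapse_age(age):
    """Recode a numeric age into fewer categories (cut points are illustrative)."""
    if age < 30:
        return "under_30"
    if age < 60:
        return "30_to_59"
    return "60_plus"


def merge_waves(baseline, followup):
    """Link two collection waves by participant ID and report unmatched IDs."""
    base_by_id = {row["pid"]: row for row in baseline}
    follow_by_id = {row["pid"]: row for row in followup}
    # Check: IDs must be unique within each file, otherwise rows were overwritten.
    if len(base_by_id) != len(baseline) or len(follow_by_id) != len(followup):
        raise ValueError("duplicate participant ID within a wave")
    merged = []
    for pid in sorted(base_by_id.keys() & follow_by_id.keys()):
        b, f = base_by_id[pid], follow_by_id[pid]
        merged.append({
            "pid": pid,
            "age_group": collapse_age(b["age"]),  # new, collapsed variable
            "score_t1": b["score"],
            "score_t2": f["score"],
            "change": f["score"] - b["score"],    # change across time
        })
    # Check: IDs present in only one wave are reported rather than dropped silently.
    unmatched = sorted(base_by_id.keys() ^ follow_by_id.keys())
    return merged, unmatched


baseline = [{"pid": 1, "age": 25, "score": 10}, {"pid": 2, "age": 61, "score": 7}]
followup = [{"pid": 1, "score": 14}, {"pid": 3, "score": 9}]
merged, unmatched = merge_waves(baseline, followup)
print(merged)     # one linked record for participant 1, with change == 4
print(unmatched)  # [2, 3]
```

Returning the unmatched IDs alongside the merged rows gives the data manager something concrete to record in the documentation step, since every merge and recode is meant to be logged.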
Integrated Data Management (IDM) is a tools-based approach to facilitating data management and improving performance. IDM consists of an integrated, modular environment to manage enterprise application data and optimize data-driven applications over their lifetime. IDM's purpose is to:
Data Management Best Practices9
- Produce enterprise-ready applications faster
- Improve data access, speed iterative testing
- Empower collaboration between architects, developers and DBAs
- Consistently achieve service level targets
- Automate and simplify operations
- Provide contextual intelligence across the solution stack
- Support business growth
- Accommodate new initiatives without expanding infrastructure
- Simplify application upgrades, consolidation and retirement
- Facilitate alignment, consistency and governance
- Define business policies and standards up front; share, extend, and apply throughout the lifecycle
The best way to manage data, and eventually get the insights needed to make data-driven decisions, is to begin with a business question and acquire the data that is needed to answer that question. Companies must collect vast amounts of information from various sources and then utilize best practices while going through the process of storing and managing the data, cleaning and mining the data, and then analyzing and visualizing the data in order to inform their business decisions. It’s important to keep in mind that data management best practices result in better analytics. By correctly managing and preparing the data for analytics, companies optimize their Big Data. A few data management best practices organizations and enterprises should strive to achieve include:
Benefits of Data Management10
- Simplify access to traditional and emerging data
- Scrub data to infuse quality into existing business processes
- Shape data using flexible manipulation techniques
Having a strong data management plan is very important for the success of a company. Below are a few other benefits of a strong data management plan.
Challenges to Data Management11
- Productivity: Good data management will make your organization more productive. On the flip side, poor data management will lead to your organization being very inefficient. Good data management makes it easier for employees to find and understand information that they need to do their job. In addition, it allows them to easily validate results or conclusions they may have. It also provides the structure for information to be easily shared with others and to be stored for future reference and easy retrieval.
- Cost Efficiency: Proper data management also helps your organization avoid unnecessary duplication. By storing all data and making it easy to reference, it ensures that employees do not repeat research, analysis or work that another employee has already completed.
- Operational Nimbleness: In business, the speed at which a company can make decisions and change direction is a key factor in how successful it can be. A company that takes too long to react to the market or its competitors risks disaster. A good data management system allows employees to access information and be notified of market or competitor changes faster. As a result, the company can make decisions and take action significantly faster than companies with poor data management and data sharing systems.
- Security Risks: There are also multiple risks if your data is not managed properly and your information falls into the wrong hands. For example, electronics giant Sony fell prey to computer attacks that exposed the account details of roughly 77 million PlayStation Network users. A strong data management system will greatly reduce the risk of this happening to your organization.
- Reduced Instances Of Data Loss: A data management system and plan that all your employees know and follow can greatly reduce the risk of losing vital information. A data management plan puts measures in place to ensure that important information is backed up and retrievable from a secondary source if the primary source ever becomes inaccessible.
- More Accurate Decisions: Many organizations use different sources of information for planning, trend analysis, and managing performance. Without a data management process, different employees within an organization may even use different sources of information to perform the same task because they are unaware of the correct source to use. Information is only as good as its source: garbage in, garbage out. As a result, decision makers across the organization often analyze different numbers when making decisions that affect the company, and reach poor or inaccurate conclusions. Data entry errors, conclusion errors and processing inefficiencies are all risks for companies that don’t have a strong data management plan and system.
The corrective costs of inadequate data management can be significant and can run into millions of dollars from a single occurrence. The primary cause of bad data and data loss is that no data management system or plan is in place, or that the plan or system is of poor quality. Unfortunately, organizations often realize that they have a problem only after an issue arises. Most organizations are reactive rather than proactive, which in the long run costs them significantly more.
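The idea of keeping important information backed up and retrievable from a secondary source, mentioned in the list above, can be sketched in a few lines of Python. This is an illustrative example only: the function names, the file name `customers.csv`, and the choice of a SHA-256 checksum to confirm the copy are assumptions, and a real backup plan would also cover scheduling, retention and off-site storage.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy a file to a secondary location and confirm the copy is identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} does not match the original")
    return target


# Demonstration in a throwaway directory; the file name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp) / "customers.csv"
    primary.write_text("id,name\n1,Ann\n")
    copy = backup_and_verify(primary, Path(tmp) / "backup")
    primary.unlink()              # simulate loss of the primary source
    restored = copy.read_text()   # the data is still retrievable
print(restored)
```

Verifying the copy at backup time matters because a backup that was never checked may only be found to be unusable at the moment it is needed.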
The good news: we’ve got more data than ever before to help us make better business decisions. The bad news: we’re drowning in that data and it’s becoming increasingly difficult to use it effectively. It’s a case of too much data, too little time, and too many lost opportunities. Unfortunately, high volume isn’t the only data challenge. There are numerous obstacles that can impede data analysis. Many organizations have been “limping along” dealing with these challenges manually or using workarounds that simply aren’t scalable. But growing volumes exacerbate the problem and turn these data speed bumps into roadblocks. Here are five common roadblocks to watch out for:
- Static reporting data: Chances are, your organization has millions of documents – from purchase orders to patient files to inventory records – packed with intelligence you can’t use. Why? Because this type of data lacks context and interactivity, which decreases your ability to provide comparisons and drive improvement.
- Data extraction and manipulation: Having to rely solely on your IT staff to access data limits you and puts pressure on them. What’s more, you then need to put your data into a logical order – hopefully without error. Getting access to the underlying data you need from existing reports, web pages and PDF documents in a timely fashion is critical in order to stay competitive.
- Integration of disparate data: Providing a unified view of your data from a range of sources – including PDFs, HTML, text documents and Excel files – is a typically laborious, time consuming exercise. (Just ask any of your IT staff.) Consolidating from so many sources can also cause unintentional errors. You need to be able to generate clean reports and assess their contents in order to make important and impactful decisions.
- Information and distribution challenges: If you can’t share valuable insights from sources such as annual sales report stats or warehouse inventory projections, what good is having the data in the first place? If you have bandwidth or network issues, you’re likely suffering from inefficient data delivery. Users need to be able to quickly access, manipulate and blend any type of data.
- Business intelligence (BI) challenges: From complexity to poor quality data to difficult-to-use data tools, the BI challenges are plentiful. If your everyday business staff don’t know how to use the systems you have in place, they simply won’t use them at all. Did we mention that BI challenges can lead to lost competitive advantage?
See Also
- Big Data
- Metadata
- Metadata Management
- Master Data
- Master Data Management (MDM)
- Customer Data Management (CDM)
- Data Cleansing
- Data Mining
- Business Intelligence
- Data Analysis
- Data Analytics
- Predictive Analytics
- Data Warehouse
- Enterprise Data Warehouse (EDW)