# Data Analysis


## What is Data Analysis?

Business Dictionary defines Data Analysis as "the process of evaluating data using analytical and logical reasoning to examine each component of the data provided". This form of analysis is just one of the many steps that must be completed when conducting a research experiment. Data from various sources is gathered, reviewed, and then analyzed to form some sort of finding or conclusion. There are a variety of specific data analysis methods, some of which include data mining, text analytics, business intelligence, and data visualizations.[1]

## Types of Data Analysis[2]

• Descriptive Analysis: this is the first type of data analysis that is usually conducted. It describes the main aspects of the data being analyzed. For example, it may describe how well a football player is performing by looking at the number of touchdowns. This allows one to make comparisons among different athletes.
• Exploratory Analysis: this is when one is looking for unknown relationships. This type of analysis is a great way to find new connections and to provide future recommendations.
• Inferential Analysis: When a researcher takes a small sample in order to point out something about a larger population, they are using inferential analysis. For instance, a researcher might look at the grades of all first graders to explain how well the entire elementary school is doing.
• Predictive Analysis predicts future happenings by looking at current and past facts.
• Causal Analysis is used to find out what happens to one variable when you change some other variable. So, if the police give out tickets for texting, this may cause fewer accidents to occur.
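
The difference between descriptive and inferential analysis can be sketched in a few lines of Python. This is only an illustration: the touchdown counts and grades are made-up values, the function names `describe` and `mean_confidence_interval` are hypothetical, and the interval uses a normal approximation (a small sample would normally call for a t-distribution).

```python
import math
import statistics


def describe(values):
    """Descriptive analysis: summarize the data that was actually observed."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def mean_confidence_interval(sample, confidence=0.95):
    """Inferential analysis: estimate a population mean from a sample.

    Assumption: a normal approximation is acceptable for this sketch.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # z-value that leaves (1 - confidence) / 2 in each tail
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical data, invented for illustration only.
touchdowns_per_game = [2, 0, 1, 3, 1, 2, 0, 2]
sampled_grades = [78, 85, 92, 70, 88, 95, 81, 76, 90, 84]

summary = describe(touchdowns_per_game)
low, high = mean_confidence_interval(sampled_grades)
print(summary)
print(f"Estimated mean grade lies between {low:.1f} and {high:.1f}")
```

The first function only describes the games that were played; the second makes a claim about students who were never measured, which is why it reports a range rather than a single number.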

## Categories of Data Analysis

• Quantitative Data Analysis: In quantitative data analysis you are expected to turn raw numbers into meaningful data through the application of rational and critical thinking. The same figure within the data set can be interpreted in many different ways; therefore it is important to apply fair and careful judgment. For example, questionnaire findings of research titled “A study into the impacts of informal management-employee communication on the levels of employee motivation: a case study of Agro Bravo Enterprise” may indicate that a majority (52%) of respondents assess the communication skills of their immediate supervisors as inadequate. This specific primary data finding needs to be critically analyzed and objectively interpreted by comparing it to other findings within the framework of the same research, such as the organizational culture of Agro Bravo Enterprise, the leadership styles exercised, the frequency of management-employee communications, etc.[3]
• Qualitative Data Analysis: Qualitative data analysis is the process in which we move from the raw data collected as part of the research study to explanations, understanding, and interpretation of the phenomena, people, and situations that we are studying. The aim of analyzing qualitative data is to examine the meaningful and symbolic content found within it. What we are aiming for is to identify and understand such concepts, situations, and ideas as:
• A person’s interpretation of the world/situation in which they find themselves at any given moment.
• How they come to have that point of view of their situation or environment in which they find themselves.
• How they relate to others within their world.
• How they cope within their world.
• Their own view of their history and the history of others who share their own experiences and situations.
• How they identify and see themselves and others who share their own experiences and situations.[4]

## Considerations/Issues in Data Analysis[5]

There are a number of issues that researchers should be cognizant of with respect to data analysis. These include:

• Having the necessary skills to analyze
• Concurrently selecting data collection methods and appropriate analysis
• Drawing an unbiased inference
• Inappropriate subgroup analysis
• Following acceptable norms for disciplines
• Determining statistical significance
• Lack of clearly defined and objective outcome measurements
• Providing honest and accurate analysis
• Manner of presenting data
• Environmental/contextual issues
• Data recording method
• Partitioning ‘text’ when analyzing qualitative data
• Training of staff conducting analyses
• Reliability and Validity
• Extent of analysis

## A Diagrammatic Illustration of The Data Analysis Process (Figure 1.)

Figure 1. source: Barbara Fusinska

## Phases in the Data Analysis Process (Figure 2.)[6]

The data analysis process consists of the following phases, which are iterative in nature:

• Data Requirements Specification: The data required for analysis is based on a question or an experiment. Based on the requirements of those directing the analysis, the data necessary as inputs to the analysis is identified (e.g., Population of people). Specific variables regarding a population (e.g., Age and Income) may be specified and obtained. Data may be numerical or categorical.
• Data Collection: Data Collection is the process of gathering information on targeted variables identified as data requirements. The emphasis is on ensuring the accurate and honest collection of data. Data Collection ensures that the data gathered is accurate such that the related decisions are valid. Data Collection provides both a baseline to measure and a target to improve.
• Data Processing: The data that is collected must be processed or organized for analysis. This includes structuring the data as required for the relevant Analysis Tools. For example, the data might have to be placed into rows and columns in a table within a Spreadsheet or Statistical Application. A Data Model might have to be created.
• Data Cleaning: The processed and organized data may be incomplete, contain duplicates, or contain errors. Data Cleaning is the process of preventing and correcting these errors. There are several types of Data Cleaning that depend on the type of data. For example, when cleaning financial data, certain totals might be compared against reliable published numbers or defined thresholds. Likewise, quantitative methods can be used to detect outliers, which are then excluded from the analysis.
• Data Analysis: Data that is processed, organized, and cleaned is ready for analysis. Various data analysis techniques are available to understand, interpret, and derive conclusions based on the requirements. Data Visualization may also be used to examine the data in a graphical format, to obtain additional insight regarding the messages within the data. Statistical Data Models such as Correlation and Regression Analysis can be used to identify the relations among the data variables. These models, which are descriptive of the data, are helpful in simplifying analysis and communicating results. The process might require additional Data Cleaning or additional Data Collection, and hence these activities are iterative in nature.
• Communication: The results of the data analysis are to be reported in a format required by the users to support their decisions and further action. The feedback from the users might result in additional analysis. The data analysts can choose data visualization techniques, such as tables and charts, which help in communicating the message clearly and efficiently to the users. The analysis tools provide the facility to highlight the required information with color codes and formatting in tables and charts.

Figure 2. source: Tutorials Point
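
The cleaning and analysis phases described above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not a prescribed method: the records are invented, the helper names (`clean`, `remove_outliers`, `pearson_r`, `fit_line`) are hypothetical, and the interquartile-range rule is just one common way to flag outliers.

```python
import statistics

# Hypothetical raw records: (person_id, age, income). None marks a missing value.
raw_records = [
    (1, 25, 32000),
    (2, 31, 41000),
    (2, 31, 41000),   # exact duplicate
    (3, 38, None),    # incomplete row
    (4, 45, 58000),
    (5, 52, 66000),
    (6, 29, 37000),
    (7, 41, 990000),  # suspiciously large income
]


def clean(records):
    """Drop exact duplicates and rows with missing values, keeping order."""
    seen = set()
    cleaned = []
    for row in records:
        if row in seen or any(value is None for value in row):
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def remove_outliers(records, column, k=1.5):
    """Exclude rows whose value lies outside the IQR fences (assumed rule)."""
    values = [row[column] for row in records]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in records if low <= row[column] <= high]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = cov / var_x
    return slope, my - slope * mx


records = remove_outliers(clean(raw_records), column=2)
ages = [row[1] for row in records]
incomes = [row[2] for row in records]

r = pearson_r(ages, incomes)
slope, intercept = fit_line(ages, incomes)
print(f"{len(records)} rows kept, r = {r:.3f}, income = {slope:.0f} * age + {intercept:.0f}")
```

If the correlation or the fitted line looks implausible, the usual response is to return to the cleaning or collection phase, which is the iteration the list above describes.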

## Benefits and Challenges of Data Analysis[7]

Data analysis is a proven way for organizations and enterprises to gain the information they need to make better decisions, serve their customers, and increase productivity and revenue. The benefits of data analysis are almost too numerous to count; some of the most rewarding include getting the right information for your business, getting more value out of IT departments, creating more effective marketing campaigns, and gaining a better understanding of customers. However, there is so much data available today that data analysis is a challenge. In particular, handling and presenting all of the data are two of its most challenging aspects. Traditional architectures and infrastructures are not able to handle the sheer amount of data that is being generated today, and decision-makers find it takes longer than anticipated to get actionable insight from the data. Fortunately, data management solutions and customer experience management solutions give enterprises the ability to listen to customer interactions, learn from behavior and contextual information, create more effective actionable insights, and execute more intelligently on insights in order to optimize and engage targets and improve business practices.

## Barriers to Effective Analysis[8]

Barriers to effective analysis may exist among the analysts performing the data analysis or among the audience. Distinguishing fact from opinion, cognitive biases, and innumeracy are all challenges to sound data analysis.

• Confusing Fact and Opinion: Effective analysis requires obtaining relevant facts to answer questions, support a conclusion or formal opinion, or test hypotheses. Facts by definition are irrefutable, meaning that any person involved in the analysis should be able to agree with them. For example, in August 2010, the Congressional Budget Office (CBO) estimated that extending the Bush tax cuts of 2001 and 2003 for the 2011-2020 time period would add approximately \$3.3 trillion to the national debt. Everyone should be able to agree that indeed this is what CBO reported; they can all examine the report. This makes it a fact. Whether persons agree or disagree with the CBO is their own opinion. As another example, the auditor of a public company must arrive at a formal opinion on whether the financial statements of publicly traded corporations are "fairly stated, in all material respects." This requires extensive analysis of factual data and evidence to support their opinion. When making the leap from facts to opinions, there is always the possibility that the opinion is erroneous.
• Cognitive Biases: There are a variety of cognitive biases that can adversely affect the analysis. For example, confirmation bias is the tendency to search for or interpret information in a way that confirms one's preconceptions. In addition, individuals may discredit information that does not support their views. Analysts may be trained specifically to be aware of these biases and how to overcome them. In his book Psychology of Intelligence Analysis, retired CIA analyst Richards Heuer wrote that analysts should clearly delineate their assumptions and chains of inference and specify the degree and source of the uncertainty involved in the conclusions. He emphasized procedures to help surface and debate alternative points of view.
• Innumeracy: Effective analysts are generally adept with a variety of numerical techniques. However, audiences may not have such literacy with numbers, or numeracy; they are said to be innumerate. Persons communicating the data may also be attempting to mislead or misinform, deliberately using bad numerical techniques. For example, whether a number is rising or falling may not be the key factor. More important may be the number relative to another number, such as the size of government revenue or spending relative to the size of the economy (GDP) or the amount of cost relative to revenue in corporate financial statements. This numerical technique is referred to as normalization or common-sizing. There are many such techniques employed by analysts, whether adjusting for inflation (i.e., comparing real vs. nominal data) or considering population increases, demographics, etc. Analysts apply a variety of techniques to address the various quantitative messages and may also analyze data under different assumptions or scenarios. For example, when analysts perform financial statement analysis, they will often recast the financial statements under different assumptions to help arrive at an estimate of future cash flow, which they then discount to present value based on some interest rate, to determine the valuation of the company or its stock. Similarly, the CBO analyzes the effects of various policy options on the government's revenue, outlays, and deficits, creating alternative future scenarios for key measures.
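
The numerical techniques mentioned under innumeracy (common-sizing, adjusting for inflation, and discounting to present value) are simple enough to sketch directly. The figures below are invented for illustration, and the function names `common_size`, `to_real`, and `present_value` are hypothetical; the discounting assumes one cash flow at the end of each year and a constant rate.

```python
def common_size(amount, base):
    """Normalization: express an amount as a share of a base figure,
    e.g. cost relative to revenue, or spending relative to GDP."""
    return amount / base


def to_real(nominal, price_index, base_index=100.0):
    """Inflation adjustment: convert a nominal value to real terms
    using a price index (base period index assumed to be 100)."""
    return nominal * base_index / price_index


def present_value(cash_flows, rate):
    """Discount cash flows received at the end of years 1, 2, ... to today."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


# Hypothetical company: costs rise from 60 to 90 while revenue rises from 100 to 180.
share_year_1 = common_size(60, 100)
share_year_2 = common_size(90, 180)
print(f"Cost share of revenue: {share_year_1:.0%} then {share_year_2:.0%}")

# Hypothetical nominal figure of 110 when the price index stands at 110.
real_value = to_real(110, price_index=110)

# Hypothetical forecast cash flows discounted at an assumed 10% rate.
valuation = present_value([110, 121], rate=0.10)
print(f"Real value: {real_value:.1f}, present value: {valuation:.1f}")
```

In this made-up case costs rose by half in absolute terms, yet their share of revenue fell, which is exactly the kind of message a raw "rising or falling" comparison would miss.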

For Data Analysis and Management, leveraging IT-enabled data services is essential for businesses to manage and analyze vast amounts of data effectively. Information Technology Enabled Services (ITeS) offer sophisticated tools for data processing, analytics, and reporting, driving informed decision-making and strategic insights.

Data analysis involves inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. It plays a crucial role in various contexts, from business intelligence and market research to scientific discoveries and enhancing operational efficiency.

• Statistical Analysis: Discussing the collection, analysis, interpretation, presentation, and organization of data using statistical methods and techniques. This includes descriptive statistics, inferential statistics, and probability theory.
• Data Mining: Covering techniques for exploring and analyzing large data sets to discover patterns and relationships that might not be evident initially. Data mining involves methods at the intersection of machine learning, statistics, and database systems.
• Predictive Analytics: Focusing on the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. This is key in forecasting and risk assessment.
• Data Visualization: Discussing the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
• Big Data: Exploring the concepts, tools, and challenges associated with analyzing extremely large data sets that traditional data processing software cannot handle. Big data analytics leverages advanced analytics techniques over petabytes and exabytes of data.
• Business Intelligence (BI): Covering the technologies, applications, strategies, and practices used to collect, integrate, analyze, and present pertinent business information. BI involves data analysis processes to support decision-making within organizations.
• Machine Learning: Discussing algorithms and statistical models that computer systems use to perform tasks without using explicit instructions, relying on patterns and inference instead. Machine learning is a fundamental tool in advanced data analysis.
• Quantitative Research: Focusing on the systematic empirical investigation of observable phenomena via statistical, mathematical, or computational techniques. This topic includes survey research, experimental research, and quantitative model research.
• Qualitative Analysis: Exploring methods of inquiry employed in many different academic disciplines, traditionally in the social sciences, but also in market research and further contexts. Qualitative analysis aims to understand concepts, thoughts, and experiences.
• Data Cleaning and Preprocessing: Covering the techniques used to prepare raw data for analysis, including dealing with missing values, removing outliers, and ensuring data is in the correct format for analysis.
• Time Series Analysis: Discussing methods that analyze time series data in order to extract meaningful statistics and characteristics of the data. Time series forecasting is an important area of machine learning and statistics.
• Ethics in Data Analysis: Highlighting the importance of ethical considerations in data analysis, including privacy concerns, data protection, and the responsible use of data in making decisions that affect individuals and communities.
• Data Analytics