Difference between revisions of "Data"

Revision as of 18:55, 7 April 2021

Data is distinct pieces of information, usually formatted in a special way. Data can exist in a variety of forms — as numbers or text on pieces of paper, as bits and bytes stored in electronic memory, or as facts stored in a person's mind. Since the mid-1900s, people have used the word data to mean computer information that is transmitted or stored. Strictly speaking, data is the plural of datum, a single piece of information. In practice, however, people use data as both the singular and plural form of the word, and as a mass noun.[1]



The first English use of the word "data" is from the 1640s. The word "data" was first used to mean "transmissible and storable computer information" in 1946. The expression "data processing" was first used in 1954.

The Latin word data is the plural of datum, "(thing) given," neuter past participle of dare "to give". In English the word data may be used as a plural noun in this sense, with some writers—usually those working in natural sciences, life sciences, and social sciences—using datum in the singular and data for plural, especially in the 20th century and in many cases also the 21st (for example, APA style as of the 7th edition still requires "data" to be plural). However, in everyday language and in much of the usage of software development and computer science, "data" is most commonly used in the singular as a mass noun (like "sand" or "rain"). The term big data takes the singular.

Although data are also increasingly used in other fields, it has been suggested that their highly interpretive nature might be at odds with the ethos of data as "given". Peter Checkland introduced the term capta (from the Latin capere, "to take") to distinguish between the immense number of possible data and the subset of them to which attention is oriented. Johanna Drucker has argued that since the humanities affirm knowledge production as "situated, partial, and constitutive," using data may introduce counterproductive assumptions, for example that phenomena are discrete or observer-independent. The term capta, which emphasizes the act of observation as constitutive, is offered as an alternative to data for visual representations in the humanities.[2]


Data is more than just data[3]

In the Reference Model for an Open Archival Information System (OAIS) (Wikipedia), data is defined as "[a] reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. Examples of data include a sequence of bits, a table of numbers, the characters on a page, the recording of sounds made by a person speaking, or a moon rock specimen." Types of data include:

  • observational data
  • laboratory experimental data
  • computer simulation
  • textual analysis
  • physical artifacts or relics

For social science, data is generally numeric files originating from social research methodologies or administrative records, from which statistics are produced. It also includes, however, more data formats such as audio, video, geospatial and other digital content that are germane to social science research. Digital text is becoming increasingly important in the humanities and arts. Research in these areas may think of data in the form of textual information, semantic elements, and text objects.

Computer data is information processed or stored by a computer. This information may be in the form of text documents, images, audio clips, software programs, or other types of data. Computer data may be processed by the computer's CPU and is stored in files and folders on the computer's hard disk. At its most rudimentary level, computer data is a bunch of ones and zeros, known as binary data. Because all computer data is in binary format, it can be created, processed, saved, and stored digitally. This allows data to be transferred from one computer to another using a network connection or various media devices. It also does not deteriorate over time or lose quality after being used multiple times.[4]
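
As a small illustration of the point that computer data is ultimately ones and zeros, the following Python sketch converts a piece of text to its binary form and back. The function names to_bits and from_bits are made up for this example, and UTF-8 is assumed as the text encoding; other encodings would produce different bit patterns.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Rebuild the original text from space-separated 8-bit groups."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


encoded = to_bits("Hi")
print(encoded)             # 01001000 01101001
print(from_bits(encoded))  # Hi
```

Because the round trip is exact, copying the bits to another computer reproduces the same text, which is the sense in which digital data does not lose quality when it is transferred or reused.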


Data Types[5]

Computer systems work with different types of digital data. In the early days of computing, data consisted primarily of text and numbers, but in modern-day computing, there are lots of different multimedia data types, such as audio, images, graphics and video. Ultimately, however, all data types are stored as binary digits. For each data type, there are very specific techniques to convert between the binary language of computers and how we interpret data using our senses, such as sight and sound.

Analog vs. Digital Data
There are two general ways to represent data: analog and digital. Analog data are continuous. They are 'analogous' to the actual facts they represent. Digital data are discrete, broken up into a limited number of elements. Nature is analog, while computers are digital. Many aspects of our natural world are continuous in nature. For example, think of the spectrum of colors. This is a continuous rainbow of an infinite number of shades.

Computer systems, on the other hand, are not continuous, but finite. All data are stored in binary digits, and there is a limit to how much data we can represent. For example, a color image on a computer has a limited number of colors - the number might be very large, but it is still finite.

Consider the example of color in a bit more detail. The very first monitor displays were essentially text terminals with only a single color. White or light green text appeared on a black background.

Newer monitors used more colors, enough to represent basic images, but were still quite limited. Modern displays have millions of colors and look much more natural. Still, the number of colors is finite. The finite nature of data stored on a computer influences how different types are stored as binary digits. You will see examples of this as the different types are discussed.
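
The finite nature of digital color can be sketched in a few lines of Python. The example assumes a common layout of three color channels (red, green, blue) with 8 bits each; the helper names color_count and quantize are hypothetical and only serve the illustration.

```python
def color_count(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colors that fit in the given number of bits."""
    return 2 ** (bits_per_channel * channels)


def quantize(value: float, levels: int) -> int:
    """Map a continuous value in [0.0, 1.0] to one of `levels` discrete steps."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be between 0.0 and 1.0")
    # The top of the range (1.0) is clamped into the last step.
    return min(int(value * levels), levels - 1)


print(color_count(8))             # 16777216 colors with 8 bits per channel
print(color_count(1, channels=1)) # 2 colors on a single-color text terminal
print(quantize(0.5000, 256))      # 128
print(quantize(0.5001, 256))      # 128, the small analog difference is lost
```

Two slightly different analog intensities end up as the same stored value, which is what it means for a digital representation to be discrete rather than continuous.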


The basic types of data found in databases include character strings, integers, decimals, images, audio, video and other multimedia types.

Character Strings
One of the most basic data types is plain text. In database terminology, this is referred to as a character string, or simply a string. A string represents alphanumeric data. This means that a string can contain many different characters, but that they are all considered as if they were text and not put into calculations, even if the characters are numbers.

Consider the following database table:

Database Table
source: Study.com


All of these fields are strings. Fields like the first and last name consist only of text characters, so it makes sense they are stored as a string. The field for the street address contains both numbers and characters and is also stored as a string. The student ID looks like a number, but it really represents a code. It is not a number you want to do any calculations with, so it is stored as a string. Similarly, the ZIP code looks like a number, but is also stored as a string.
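
A short Python sketch shows why code-like fields are better kept as strings. The record below is invented for illustration (the article's table is not reproduced here), and the field names are assumptions.

```python
# Hypothetical student record; every field is stored as a string.
student = {
    "student_id": "000417",
    "first_name": "Ada",
    "last_name": "Lovelace",
    "street_address": "12 Example Street",
    "zip_code": "02139",
}

# Converting a code to a number silently drops its leading zeros.
zip_as_number = int(student["zip_code"])
print(zip_as_number)        # 2139
print(student["zip_code"])  # 02139

# String operations, not arithmetic, are what make sense for codes.
print(student["zip_code"].startswith("021"))  # True
```

Storing the ZIP code as a number would turn "02139" into 2139, so the stored value would no longer match the code it is supposed to represent.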


Numeric Data Types
The second most important data type is numeric data. As a general rule, you store numbers only as a numeric data type if they represent a count or measurement of some kind and if it makes sense to perform calculations with them. A ZIP code is a number assigned to a geographic area by the postal service. It would not make much sense to determine the average value for multiple ZIP codes.

There are several different types of numeric data. An integer is a numeric value without a decimal. Integers are whole numbers and can be positive or negative. In a database, a distinction is made between short and long integers, referring to how much data storage is used for the number. A short integer is typically stored using 16 bits, which means that it can take 2^16, or 65,536, unique values (for example, -32,768 to 32,767 when negative numbers are allowed). For any number outside that range, you would need to use a long integer, which uses 32 bits or more.
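
The limits of a 16-bit integer can be checked with Python's standard struct module, which packs numbers into fixed-size binary fields. This is only a sketch: the helper names short_range and fits_in_short are made up for the example, and actual database systems may name and size their integer types differently.

```python
import struct

SHORT_BITS = 16


def short_range(signed: bool = True) -> tuple[int, int]:
    """Smallest and largest value a 16-bit integer can hold."""
    if signed:
        return -(2 ** (SHORT_BITS - 1)), 2 ** (SHORT_BITS - 1) - 1
    return 0, 2 ** SHORT_BITS - 1


def fits_in_short(value: int) -> bool:
    """True if `value` can be stored as a signed 16-bit integer."""
    try:
        struct.pack("<h", value)  # "h" is a signed 16-bit field
    except struct.error:
        return False
    return True


print(2 ** SHORT_BITS)                 # 65536 unique values
print(short_range())                   # (-32768, 32767)
print(short_range(signed=False))       # (0, 65535)
print(fits_in_short(32767))            # True
print(fits_in_short(32768))            # False, needs a longer integer
print(len(struct.pack("<i", 32768)))   # 4 bytes, i.e. 32 bits
```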

A number with a decimal is referred to as a decimal, a float or a double. The terminology varies somewhat with the software being used. The term float comes from 'floating point,' which means the position of the decimal point is not fixed but can move depending on the size of the number. The term double refers to using double the amount of storage relative to a float.
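
The difference between a float and a double can be seen by storing the same value in both sizes. The sketch below uses Python's struct module with the standard 4-byte ("f") and 8-byte ("d") floating-point formats; round_trip is a hypothetical helper name. Python's own float type is a double, so only the 4-byte version loses precision here.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


as_float = round_trip(0.1, "<f")   # 4 bytes of storage
as_double = round_trip(0.1, "<d")  # 8 bytes of storage

print(struct.calcsize("<f"), struct.calcsize("<d"))  # 4 8
print(as_double == 0.1)  # True, nothing is lost in 8 bytes
print(as_float == 0.1)   # False, 4 bytes keep fewer digits
print(abs(as_float - 0.1) < 1e-7)  # True, the error is small but real
```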

In the example table of students below, the field credits completed is an integer, while GPA is a decimal. In both these examples, it would make sense to do calculations. For example, you could use credits completed to calculate how many more credits a student needs to graduate. Or, you could determine the average GPA for all the students.

Numeric Data Table
source: Study.com
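
The two calculations mentioned above can be sketched as follows. The student records are invented for illustration, and the graduation requirement of 120 credits is an assumption, not a figure from the article.

```python
from statistics import mean

CREDITS_TO_GRADUATE = 120  # assumed requirement, for illustration only

# Hypothetical records: credits is an integer, GPA is a decimal.
students = [
    {"name": "Ada", "credits_completed": 90, "gpa": 3.5},
    {"name": "Grace", "credits_completed": 45, "gpa": 3.0},
    {"name": "Alan", "credits_completed": 118, "gpa": 4.0},
]


def credits_remaining(student: dict) -> int:
    """Credits the student still needs, never less than zero."""
    return max(CREDITS_TO_GRADUATE - student["credits_completed"], 0)


def average_gpa(records: list) -> float:
    """Mean GPA across all records."""
    return mean(record["gpa"] for record in records)


for student in students:
    print(student["name"], credits_remaining(student))
print(average_gpa(students))
```

Both results are meaningful precisely because the underlying fields are counts or measurements, unlike a student ID or ZIP code.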


Boolean Data
The Boolean data type can only represent two values: true or false. Typically, a zero is used to represent false and a one is used to represent true. In the example table of students, the field Financial Aid is stored as a Boolean, since a student is classified as having financial aid or not.


Boolean Data Table
source: Study.com
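
A minimal Python sketch of a Boolean field follows. The records and the helper name from_flag are hypothetical; the example only illustrates the zero/one convention described above and how a true/false field is used to filter records.

```python
# Hypothetical records with a Boolean financial-aid field.
students = [
    {"name": "Ada", "financial_aid": True},
    {"name": "Grace", "financial_aid": False},
    {"name": "Alan", "financial_aid": True},
]


def from_flag(flag: int) -> bool:
    """Convert a stored 0/1 flag to a Boolean, rejecting anything else."""
    if flag not in (0, 1):
        raise ValueError("a Boolean flag must be 0 or 1")
    return bool(flag)


print(from_flag(0), from_flag(1))  # False True
print(int(True), int(False))       # 1 0

# A Boolean field makes yes/no filtering straightforward.
aided = [s["name"] for s in students if s["financial_aid"]]
print(aided)  # ['Ada', 'Alan']
```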


Characteristics Of Data[6]

Not all data is of high quality, and poor-quality data is of limited use. To fully realize the benefits of data, it has to be of high quality, which means looking for certain characteristics in the data. These are:

  • Data should be precise, which means it should contain accurate information. Precision saves the user both time and money.
  • Data should be relevant and match the requirements of the user. The legitimacy of the data should therefore be checked before it is used.
  • Data should be consistent and reliable. False data is worse than incomplete data or no data at all.
  • Data must be relevant in order to be useful and of good quality. In today's world of dynamic data, relevant information is not complete at all times, but at the time of its use the data has to be comprehensive and complete in its current form.
  • High-quality data is unique to the requirements of the user. Moreover, it is easily accessible and can be processed further with ease.
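
Some of these characteristics can be checked automatically. The following Python sketch is one possible approach under stated assumptions, not an established method: it flags records that are incomplete (a required field is missing or empty) or not unique (a repeated identifier). The function name quality_report, the field names, and the sample records are all hypothetical.

```python
def quality_report(records: list, required_fields: list, key: str) -> list:
    """Return (record index, problem, details) tuples for failed checks."""
    issues = []
    seen_keys = set()
    for index, record in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            issues.append((index, "incomplete", missing))
        # Uniqueness: the identifying key must not repeat.
        value = record.get(key)
        if value in seen_keys:
            issues.append((index, "duplicate", [key]))
        seen_keys.add(value)
    return issues


records = [
    {"student_id": "000417", "name": "Ada", "zip_code": "02139"},
    {"student_id": "000418", "name": "", "zip_code": "10001"},
    {"student_id": "000417", "name": "Alan", "zip_code": "94105"},
]

for issue in quality_report(records, ["student_id", "name", "zip_code"], "student_id"):
    print(issue)
```

Checks like accuracy or relevance usually cannot be automated this simply, because they depend on comparing the data with the real world or with the user's actual needs.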


Quantitative and Qualitative Data[7]

Ultimately, all collected data is either a measurement or an observed feature of interest, which at the highest level gives two kinds of data:

  • Quantitative data: Quantitative data is information about quantities of things, things that we measure, and so we describe them in terms of numbers. As such, quantitative data is also called numerical data. Quantitative data is used when a researcher is trying to quantify a problem, or address the "what" or "how many" aspects of a research question. It is data that can either be counted or compared on a numeric scale. For example, it could be the number of first-year students at a university, or the ratings on a scale of 1-4 of the quality of food served at a restaurant. This data is usually gathered using instruments, such as a questionnaire that includes a ratings scale or a thermometer to collect weather data. Statistical analysis software, such as SPSS, is often used to analyze quantitative data.
  • Qualitative data: Qualitative data, on the other hand, gives us information about the qualities of things. These are observed phenomena, not measured ones, and so we generally label them with names. Qualitative data is also known as categorical data. It describes qualities or characteristics. It is collected using questionnaires, interviews, or observation, and frequently appears in narrative form. For example, it could be notes taken during a focus group on the quality of the food at a restaurant, or responses from an open-ended questionnaire. Qualitative data may be difficult to measure and analyze precisely. The data may be in the form of descriptive words that can be examined for patterns or meaning, sometimes through the use of coding. Coding allows the researcher to categorize qualitative data to identify themes that correspond with the research questions and to perform quantitative analysis.
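
The contrast between the two kinds of data, and the idea of coding qualitative responses, can be sketched in Python. Everything here is illustrative: the ratings, the responses, and the keyword-based codebook are invented, and real qualitative coding is normally done by a researcher reading the text rather than by simple keyword matching.

```python
from collections import Counter
from statistics import mean

# Quantitative: ratings on a 1-4 scale can be averaged directly.
ratings = [3, 4, 2, 4]
print(mean(ratings))  # 3.25

# Qualitative: open-ended responses are coded into themes first.
CODEBOOK = {
    "taste": ("bland", "delicious", "salty"),
    "service": ("slow", "friendly", "waiter"),
    "price": ("expensive", "cheap", "value"),
}


def code_response(text: str) -> set:
    """Return the set of themes whose keywords appear in the response."""
    lowered = text.lower()
    return {
        theme
        for theme, keywords in CODEBOOK.items()
        if any(keyword in lowered for keyword in keywords)
    }


def theme_counts(responses: list) -> Counter:
    """Count how many responses mention each theme."""
    counts = Counter()
    for response in responses:
        counts.update(code_response(response))
    return counts


responses = [
    "The soup was bland and the waiter was slow.",
    "Delicious, but expensive.",
    "Friendly staff.",
]
print(theme_counts(responses))
```

Once the responses are coded, the theme counts are themselves numbers, which is how coding makes quantitative analysis of qualitative data possible.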


See Also

Data Compatibility
Data Access
Data Analysis
Data Analytics
Data Architecture
Data Asset Framework (DAF)
Data Buffer
Data Center
Data Center Infrastructure
Data Center Infrastructure Management (DCIM)
Data Cleansing
Big Data
Big Data Integration
Big Data Maturity Model (BDMM)
Metadata
Data Collection
Data Consolidation
Data Deduplication
Data Delivery Platform (DDP)
Data Description (Definition) Language (DDL)
Data Dictionary
Data Discovery
Data Driven Organization
Data Element
Data Enrichment
Data Entry
Data Federation
Data Flow Diagram
Data Governance
Data Health Check
Data Hierarchy
Data Independence
Data Integration
Data Integration Framework (DIF)
Data Integrity
Data Island
Data Item
Data Lake
Data Life Cycle
Data Lineage
Data Loss Prevention (DLP)
Data Management
Data Migration
Data Minimization
Data Mining
Data Model
Data Modeling
Data Monitoring
Data Munging
Data Portability
Data Preparation
Data Presentation Architecture
Data Processing
Data Profiling
Data Proliferation
Data Propagation
Data Protection Act
Data Prototyping
Data Quality
Data Quality Assessment (DQA)
Data Quality Dimension
Data Quality Standard
Data Reconciliation
Data Reference Model (DRM)
Data Science
Data Security
Data Stewardship
Data Structure
Data Structure Diagram
Data Suppression
Data Transformation
Data Validation
Data Value Chain
Data Vault Modeling
Data Virtualization
Data Visualization
Data Warehouse
Data Wrangling
Data and Information Reference Model (DRM)
Data as a Service (DaaS)
Database (DB)
Database Design
Database Design Methodology
Database Management System (DBMS)
Database Marketing
Database Schema
Database System


References

  1. Definition - What is the Meaning of Data? Webopedia
  2. Etymology and Terminology of Data Wikipedia
  3. Data is more than just data University of Minnesota
  4. What is Computer Data? Techterms
  5. What are the Different Types of Data Study.com
  6. Five Characteristics Of Good Quality Data Rebellion Rider
  7. What Are Quantitative and Qualitative Data Types in Statistics? Chi Squared Innovations