What is Data?
Data refers to any collection of facts, figures, or statistics that can be analyzed to gain insights or knowledge about a particular topic. In the context of computing and technology, data often refers to digital information that is stored and processed by computers and other devices. This information can be structured or unstructured and can come from a wide range of sources, including sensors, databases, social media platforms, and more. The purpose of collecting and analyzing data is to gain a deeper understanding of trends, patterns, and insights that can be used to inform decision-making and improve outcomes in various fields, such as business, science, healthcare, and more.
Data is distinct pieces of information, usually formatted in a special way. Data can exist in a variety of forms — as numbers or text on pieces of paper, as bits and bytes stored in electronic memory, or as facts stored in a person's mind. Since the mid-1900s, people have used the word data to mean computer information that is transmitted or stored. Strictly speaking, data is the plural of datum, a single piece of information. In practice, however, people use data as both the singular and plural form of the word, and as a mass noun.
The first English use of the word "data" is from the 1640s. The word "data" was first used to mean "transmissible and storable computer information" in 1946. The expression "data processing" was first used in 1954.
The Latin word data is the plural of datum, "(thing) given," neuter past participle of dare "to give". In English the word data may be used as a plural noun in this sense, with some writers—usually, those working in natural sciences, life sciences, and social sciences—using datum in the singular and data for plural, especially in the 20th century and in many cases also the 21st (for example, APA style as of the 7th edition still requires "data" to be plural). However, in everyday language and in much of the usage of software development and computer science, "data" is most commonly used in the singular as a mass noun (like "sand" or "rain"). The term big data takes the singular.
Although data are also increasingly used in other fields, it has been suggested that their highly interpretive nature might be at odds with the ethos of data as "given". Peter Checkland introduced the term capta (from the Latin capere, “to take”) to distinguish between an immense number of possible data and a subset of them, to which attention is oriented. Johanna Drucker has argued that since the humanities affirm knowledge production as "situated, partial, and constitutive," using data may introduce assumptions that are counterproductive, for example, that phenomena are discrete or observer-independent. The term capta, which emphasizes the act of observation as constitutive, is offered as an alternative to data for visual representations in the humanities.
Data is more than just data
In the Reference Model for an Open Archival Information System (OAIS), data is defined as "a reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. Examples of data include a sequence of bits, a table of numbers, the characters on a page, the recording of sounds made by a person speaking, or a moon rock specimen." Types of data include:
- observational data
- laboratory experimental data
- computer simulation
- textual analysis
- physical artifacts or relics
For social science, data is generally numeric files originating from social research methodologies or administrative records, from which statistics are produced. It also includes, however, other data formats such as audio, video, geospatial, and other digital content that are germane to social science research. Digital text is becoming increasingly important in the humanities and arts. Researchers in these areas may think of data in the form of textual information, semantic elements, and text objects.
Computer data is information processed or stored by a computer. This information may be in the form of text documents, images, audio clips, software programs, or other types of data. Computer data may be processed by the computer's CPU and is stored in files and folders on the computer's hard disk. At its most rudimentary level, computer data is a sequence of ones and zeros, known as binary data. Because all computer data is in binary format, it can be created, processed, saved, and stored digitally. This allows data to be transferred from one computer to another using a network connection or various media devices. Digital data also does not deteriorate over time or lose quality after being used multiple times.
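To make the idea of binary data concrete, here is a minimal sketch in Python. The sample text "Hi" and the choice of UTF-8 encoding are assumptions made for illustration; other encodings would produce different bit patterns.

```python
# Illustrative sketch: the sample text "Hi" is an arbitrary choice.
text = "Hi"

# Encode the text into bytes; each byte is eight binary digits (bits).
raw_bytes = text.encode("utf-8")

# Render each byte as a string of ones and zeros.
bits = " ".join(f"{byte:08b}" for byte in raw_bytes)
print(bits)  # 01001000 01101001

# Copying the bytes produces an exact duplicate of the original.
copy = bytes(raw_bytes)
print(copy.decode("utf-8") == text)  # True
```

The copy is bit-for-bit identical to the original, which is one way to see why digital data does not lose quality when it is reused.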
Analog vs. Digital Data
There are two general ways to represent data: analog and digital. Analog data are continuous. They are 'analogous' to the actual facts they represent. Digital data are discrete, and broken up into a limited number of elements. Nature is analog, while computers are digital. Many aspects of our natural world are continuous in nature. For example, think of the spectrum of colors. This is a continuous rainbow of an infinite number of shades.
Computer systems, on the other hand, are not continuous, but finite. All data are stored in binary digits, and there is a limit to how much data we can represent. For example, a color image on a computer has a limited number of colors - the number might be very large, but it is still finite.
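The finite nature of digital data can be sketched with a little arithmetic. The example below assumes a common convention of 8 bits for each of the red, green, and blue channels (24 bits per pixel); the function names are hypothetical and chosen only for this illustration.

```python
def distinct_values(bits: int) -> int:
    """Number of different values that a given number of bits can represent."""
    return 2 ** bits


def quantize(value: float, bits: int) -> int:
    """Map a continuous value in the range 0.0 to 1.0 onto the nearest
    of the 2**bits discrete levels available."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be between 0.0 and 1.0")
    return round(value * (distinct_values(bits) - 1))


print(distinct_values(1))    # 2 (for example, black or white)
print(distinct_values(8))    # 256 levels for a single color channel
print(distinct_values(24))   # 16777216 colors with three 8-bit channels

# A continuous brightness is forced onto one of 256 discrete steps.
print(quantize(0.2, 8))      # 51
print(quantize(1.0, 8))      # 255
```

However many bits are used, the result is always a finite count, whereas the continuous value being approximated could take any value in its range.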
Computer systems work with different types of digital data. In the early days of computing, data consisted primarily of text and numbers, but in modern-day computing, there are lots of different multimedia data types, such as audio, images, graphics, and video. Ultimately, however, all data types are stored as binary digits. For each data type, there are very specific techniques to convert between the binary language of computers and how we interpret data using our senses, such as sight and sound.
Consider the example of color in a bit more detail. The very first monitor displays were essentially text terminals with only a single color. White or light green text appeared on a black background.
Newer monitors used more colors, enough to represent basic images, but were still quite limited. Modern displays have millions of colors and look much more natural. Still, the number of colors is finite. The finite nature of data stored on a computer influences how different types are stored as binary digits. You will see examples of this as the different types are discussed.
Types of Data
The basic types of data found in databases include character strings, integers, decimals, images, audio, video, and other multimedia types.
One of the most basic data types is plain text. In database terminology, this is referred to as a character string, or simply a string. A string represents alphanumeric data. This means that a string can contain many different characters, but all of them are treated as text and are not used in calculations, even if the characters are digits.
Consider the following database table:
All of these fields are strings. Fields like the first and last name consist only of text characters, so it makes sense they are stored as a string. The field for the street address contains both numbers and characters and is also stored as a string. The student ID looks like a number, but it really represents a code. It is not a number you want to do any calculations with, so it is stored as a string. Similarly, the ZIP code looks like a number but is also stored as a string.
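A short sketch shows one practical reason for storing codes as strings. The ZIP code and student ID values here are made up for illustration and are not taken from the table.

```python
# Hypothetical values, chosen only to illustrate the point.
zip_as_string = "02134"
student_id = "000123"

# Converting the code to a number silently drops the leading zero.
zip_as_number = int(zip_as_string)
print(zip_as_number)        # 2134
print(str(zip_as_number))   # 2134, which is no longer the original code

# As strings, the codes keep every character and can be compared or joined.
print(zip_as_string + "-" + student_id)  # 02134-000123
print(len(zip_as_string))                # 5
```

Keeping such fields as strings preserves the code exactly as it was issued.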
Numeric Data Types
The second most important data type is numeric data. As a general rule, you store numbers only as a numeric data type if they represent a count or measurement of some kind and if it makes sense to perform calculations with them. A ZIP code is a number assigned to a geographic area by the postal service. It would not make much sense to determine the average value for multiple ZIP codes.
There are several different types of numeric data. An integer is a numeric value without a decimal. Integers are whole numbers and can be positive or negative. In a database, a distinction is made between short and long integers, referring to how much data storage is used for the number. A short integer is typically stored using 16 bits, which means that it can take 2^16, or 65,536, unique values. When the integer is signed, those values are split between negative and positive numbers, giving a range of -32,768 to 32,767. For any number outside that range, you would need to use a long integer, which uses 32 bits or more.
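The storage limit of a short integer can be checked with Python's standard struct module, which packs numbers into a fixed number of bytes. This is only a sketch: the helper name fits_in_short is hypothetical, and the format codes "h" and "i" stand for signed 16-bit and 32-bit integers in struct's standard sizes, which may differ from the exact types a given database offers.

```python
import struct

print(2 ** 16)  # 65536 unique values in 16 bits


def fits_in_short(number: int) -> bool:
    """Return True if the number can be stored as a signed 16-bit integer."""
    try:
        struct.pack("<h", number)
        return True
    except struct.error:
        return False


print(fits_in_short(32767))    # True
print(fits_in_short(32768))    # False, one past the largest signed value
print(fits_in_short(-32768))   # True

# A 32-bit integer takes twice the storage and holds the larger number.
print(len(struct.pack("<h", 1)))      # 2 bytes
print(len(struct.pack("<i", 32768)))  # 4 bytes
```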
A number with a decimal is referred to as a decimal, a float, or a double. The terminology varies somewhat with the software being used. The term float comes from 'floating point,' which means the decimal point is not fixed in one position but can 'float' to wherever the size of the number requires. The term double refers to using double the amount of storage relative to a float, typically 64 bits instead of 32.
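The difference between a float and a double can be sketched with Python's struct module. The example assumes the common 32-bit and 64-bit floating-point formats (struct's "f" and "d" codes); the helper names are hypothetical.

```python
import struct


def roundtrip_float32(x: float) -> float:
    """Store x in 32 bits (a float) and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def roundtrip_float64(x: float) -> float:
    """Store x in 64 bits (a double) and read it back."""
    return struct.unpack("<d", struct.pack("<d", x))[0]


print(struct.calcsize("<f"))  # 4 bytes
print(struct.calcsize("<d"))  # 8 bytes, double the storage

# The extra storage buys precision: the 32-bit copy of 0.1 is slightly off.
print(roundtrip_float64(0.1) == 0.1)  # True
print(roundtrip_float32(0.1) == 0.1)  # False
```

Python's own float type is a double, which is why the 64-bit round trip returns exactly the value that went in.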
In the example table of students below, the field credits completed is an integer, while GPA is a decimal. In both of these examples, it would make sense to do calculations. For example, you could use credits completed to calculate how many more credits a student needs to graduate. Or, you could determine the average GPA for all the students.
The Boolean data type can only represent two values: true or false. Typically, a zero is used to represent false and a one is used to represent true. In the example table of students, the field Financial Aid is stored as a Boolean, since a student is classified as having financial aid or not.
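The student fields described above can be sketched as a small typed record. Everything specific in this example is an assumption made for illustration: the field names, the two sample students, and the 120-credit graduation requirement are hypothetical and do not come from the table.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Student:
    student_id: str         # a code, so stored as a string
    credits_completed: int  # a count, so stored as an integer
    gpa: float              # a measurement with a decimal
    financial_aid: bool     # only two possible values: True or False


CREDITS_REQUIRED = 120  # assumed graduation requirement

# Hypothetical sample records.
students = [
    Student("000123", 90, 3.5, True),
    Student("000124", 45, 3.0, False),
]


def credits_remaining(student: Student) -> int:
    """Credits the student still needs in order to graduate."""
    return max(CREDITS_REQUIRED - student.credits_completed, 0)


average_gpa = mean(s.gpa for s in students)

print(credits_remaining(students[0]))  # 30
print(average_gpa)                     # 3.25
print(int(True), int(False))           # 1 0
```

The calculations only make sense because credits and GPA are numeric; the student ID takes no part in any arithmetic.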
Characteristics Of Data
Not all data is of high quality, and low-quality data is limited in its usefulness. To fully realize the benefits of data, it has to be of high quality. This means that one should look for certain characteristics in the data. These are:
- Data should be precise, which means it should contain accurate information. Precision saves the user both time and money.
- Data should be relevant and match the requirements of the user. Hence, the legitimacy of the data should be checked before it is considered for use.
- Data should be consistent and reliable. False data is worse than incomplete data or no data at all.
- Data should be complete in order to be of good quality and useful. In today’s world of dynamic data, information is rarely complete at all times; at the time of its use, however, the data has to be comprehensive and complete in its current form.
- High-quality data is unique to the requirements of the user. Moreover, it is easily accessible and can be processed further with ease.
Quantitative and Qualitative Data
When it comes down to it, all data that is collected is either a measurement or an observed feature of interest, and at the highest level that gives us two kinds of data:
- Quantitative data: Quantitative data is information about quantities of things, things that we measure, and so we describe them in terms of numbers. As such, quantitative data are also called Numerical data. Quantitative data are used when a researcher is trying to quantify a problem or address the "what" or "how many" aspects of a research question. It is data that can either be counted or compared on a numeric scale. For example, it could be the number of first-year students at a university or the ratings on a scale of 1 to 4 of the quality of food served at a restaurant. These data are usually gathered using instruments, such as a questionnaire which includes a rating scale or a thermometer to collect weather data. Statistical analysis software, such as SPSS, is often used to analyze quantitative data.
- Qualitative data: On the other hand, qualitative data give us information about the qualities of things. They are observed phenomena, not measured, and so we generally label them with names. Qualitative data are also known as Categorical data. Qualitative data describes qualities or characteristics. It is collected using questionnaires, interviews, or observation, and frequently appears in narrative form. For example, it could be notes taken during a focus group on the quality of the food at a restaurant, or responses from an open-ended questionnaire. Qualitative data may be difficult to precisely measure and analyze. The data may be in the form of descriptive words that can be examined for patterns or meaning, sometimes through the use of coding. Coding allows the researcher to categorize qualitative data to identify themes that correspond with the research questions and to perform quantitative analysis.
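The idea of coding qualitative data can be sketched very roughly in Python. This is a simplified illustration, not a description of how researchers actually code: real coding relies on human judgment, while this sketch only matches keywords. The responses, the codebook, and the theme names are all invented for the example.

```python
from collections import Counter

# Hypothetical codebook mapping keywords to themes.
CODEBOOK = {
    "cold": "temperature",
    "warm": "temperature",
    "salty": "taste",
    "bland": "taste",
    "slow": "service",
}

# Hypothetical open-ended questionnaire responses.
responses = [
    "The soup was cold and bland.",
    "Service was slow but the food was warm.",
    "Too salty for me.",
]


def code_response(text: str) -> set:
    """Return the set of themes whose keywords appear in the response."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    return {theme for keyword, theme in CODEBOOK.items() if keyword in words}


# Once coded, the qualitative responses can be counted like quantitative data.
theme_counts = Counter(
    theme for response in responses for theme in code_response(response)
)
print(theme_counts["temperature"])  # 2
print(theme_counts["taste"])        # 2
print(theme_counts["service"])      # 1
```

The counts are the kind of quantitative summary that coding makes possible, here showing which themes come up most often.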