Metadata is information stored within a document that is not evident from simply looking at the file. It acts as an electronic “fingerprint”, recording identifying characteristics such as the creator or author of the file, the names of individuals who have accessed or edited it, the location from which it was accessed, and the amount of time spent editing it. In addition to data that is added to a document automatically, there is user-introduced metadata, such as tracked changes, versions, hidden text and embedded objects. Every time you create, open or save a Microsoft Word document, hidden information that you may not want people outside your organization to discover is created and stored within the document. Hidden information can also reside in other Microsoft application files, such as Excel spreadsheets or PowerPoint presentations, and includes:
- Your Name and Initials
- Company Name
- Computer Name
- Location of Document on Local or Network Server
- Attached Template
- Hidden Text
- Track Changes
- Non-Visible Portions of OLE Objects
- File Properties and Summary Information
- And more …1
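For Office files saved in the Open XML formats (.docx, .xlsx, .pptx), much of the file-properties information listed above sits in a small XML part inside the file's ZIP package. The sketch below reads a few of those properties using only Python's standard library. It assumes the conventional part name `docProps/core.xml`. It does not inspect hidden text, tracked changes or embedded objects, which are stored elsewhere in the package. The helper name `read_core_properties` and the sample author names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the Open XML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# (prefix, element name) -> label used in the returned dictionary.
FIELDS = {
    ("dc", "creator"): "author",
    ("cp", "lastModifiedBy"): "last_modified_by",
    ("dcterms", "created"): "created",
    ("dcterms", "modified"): "modified",
    ("cp", "revision"): "revision",
}


def read_core_properties(source) -> dict:
    """Return selected core properties from an Open XML package.

    `source` may be a file path or a binary file-like object. Assumes the
    properties are stored at the conventional part name docProps/core.xml.
    """
    with zipfile.ZipFile(source) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    props = {}
    for (prefix, name), label in FIELDS.items():
        element = root.find(f"{prefix}:{name}", NS)
        if element is not None and element.text:
            props[label] = element.text
    return props


# Hypothetical core.xml content, used only to build a demo package in memory.
SAMPLE_CORE_XML = (
    "<cp:coreProperties"
    f' xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}" xmlns:dcterms="{NS["dcterms"]}">'
    "<dc:creator>Jane Doe</dc:creator>"
    "<cp:lastModifiedBy>John Smith</cp:lastModifiedBy>"
    "<cp:revision>4</cp:revision>"
    "</cp:coreProperties>"
)

if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", SAMPLE_CORE_XML)
    buffer.seek(0)
    print(read_core_properties(buffer))
```

Running the same function on a real document shows which names and revision counts would travel with the file if it were shared.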
In addition to document files, metadata is used for images, videos, spreadsheets and web pages. Metadata is especially important on web pages, where it contains descriptions of the page’s contents as well as keywords linked to that content, usually expressed in the form of metatags. Search engines often display the metadata containing a page’s description and summary in their results. Its accuracy and level of detail therefore matter, since they can determine whether a user decides to visit the site. Search engines often evaluate metatags to help decide a web page’s relevance, and metatags were the key factor in determining search ranking until the late 1990s. The rise of search engine optimization (SEO) toward the end of the 1990s led many websites to “keyword stuff” their metadata, tricking search engines into treating their pages as more relevant than others. Since then, search engines have reduced their reliance on metatags, though they still factor them in when indexing pages. Many search engines also try to stop pages from gaming their systems by regularly changing their ranking criteria; Google is notorious for frequently changing its largely undisclosed ranking algorithms.
Metadata can be created manually or through automated information processing. Manual creation tends to be more accurate, because it allows the user to enter any information they consider relevant or necessary to describe the file. Automated metadata creation can be much more elementary, usually recording only information such as file size, file extension, when the file was created and who created it.2
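To illustrate how elementary automatically gathered metadata can be, the sketch below collects what Python's `pathlib` exposes for any file. The function name is hypothetical. The sketch deliberately omits creation time and owner, because whether and how an operating system reports those varies by platform.

```python
import datetime
from pathlib import Path


def basic_file_metadata(path) -> dict:
    """Collect metadata that the file system records automatically."""
    p = Path(path)
    info = p.stat()
    # Last modification time, converted to a timezone-aware UTC timestamp.
    modified = datetime.datetime.fromtimestamp(info.st_mtime, tz=datetime.timezone.utc)
    return {
        "name": p.name,
        "extension": p.suffix.lower(),  # "" when the name has no extension
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
    }
```

Everything returned here comes from the file system itself. Nothing describes what the file is about, which is the kind of information manual metadata creation adds.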
Metadata is an important tool for improving a Web page's search engine optimization (SEO). Search engines generally use metadata, along with a combination of other factors, to determine what is on a Web page and how relevant that content is to a given search. This data is included in the meta tags found in a Web page's HTML or XHTML. Common metadata used by most search engines includes:
- Description: This meta element describes the type of content found on a Web page. For example, the description of Techopedia's metadata definition page tells search engines that the page contains a definition of the term metadata.
- Title: This provides a title for the content on the page, which search engines show in their results. For Techopedia's metadata definition page, it is: What is Metadata? - Definition from Techopedia.com.
- Keywords: This provides the search engine with additional keywords that are related to the content that's on the page. Whether search engines still use this data is a matter of debate.3
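The sketch below extracts these three items from a page's HTML using only Python's standard-library `html.parser`. Strictly speaking, the title comes from the `<title>` element rather than a `<meta>` tag. The class and function names are hypothetical, and a production crawler would also need to handle character encodings and badly malformed markup.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name, content = attributes.get("name"), attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_page_metadata(html: str) -> dict:
    """Return the title, description and keyword list found in the HTML."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    keywords = parser.meta.get("keywords", "")
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": [k.strip() for k in keywords.split(",") if k.strip()],
    }


# Hypothetical page head used for demonstration.
SAMPLE_HTML = """<html><head>
<title>What is Metadata? - Definition from Techopedia.com</title>
<meta name="description" content="A definition of the term metadata.">
<meta name="keywords" content="metadata, meta tags, SEO">
</head><body></body></html>"""
```

Calling `extract_page_metadata(SAMPLE_HTML)` returns the title, the description string and the three keywords as a list.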
According to DAMA International's Guide to the Data Management Body of Knowledge (DMBOK), Metadata “includes information about technical and business processes, data rules and constraints, and logical and physical data structures.” Think of it as a wrapper around data that describes it, much as packaging tells you what food is in a box or container. In the past, people interacted with Metadata through cards in a library catalog; with technical advances, humans and computers now have access to distinct types of Metadata:
- Business Metadata: Provides the meaning of data by defining terms in everyday language, without regard to technical implementation. It “focuses largely on the content and condition of the data and includes details related to Data Governance.”
- Technical Metadata: Provides information on the format and structure of the data as needed by computer systems. Examples of Technical Metadata include physical database tables, access permissions, data models, backup rules, mapping documentation, data lineage, and many more.
- Operational Metadata: This type of Metadata “describes details of the processing and accessing of data.” (DMBOK) Examples of Operational Metadata include job execution logs, data sharing rules, error logs, audit results, version maintenance plans, and archive and retention rules, among many others.4
Metadata - Historical Background5
Metadata was traditionally used in library card catalogs until the 1980s, when libraries converted their catalog data to digital databases. In the 2000s, as digital formats became the prevalent way of storing data and information, metadata was also used to describe digital data according to metadata standards. The first description of "meta data" for computer systems is purportedly the one given by David Griffel and Stuart McIntosh of MIT's Center for International Studies in 1967: "In summary then, we have statements in an object language about subject descriptions of data and token codes for the data. We also have statements in a meta language describing the data relationships and transformations, and ought/is relations between norm and data."
Different disciplines have different metadata standards (e.g., for museum collections, digital audio files, websites, etc.). Describing the contents and context of data or data files increases their usefulness. For example, a web page may include metadata specifying which markup language the page is written in (e.g., HTML), what tools were used to create it, what subjects the page is about, and where to find more information about those subjects. This metadata can automatically improve the reader's experience and make it easier for users to find the web page online. A CD may include metadata providing information about the musicians, singers and songwriters whose work appears on the disc.
A principal purpose of metadata is to help users find relevant information and discover resources. Metadata also helps to organize electronic resources, provide digital identification, and support the archiving and preservation of resources. Metadata assists users in resource discovery by "allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information."
Metadata about telecommunication activities, including Internet traffic, is very widely collected by national government organizations. This data is used for traffic analysis and can be used for mass surveillance. In many countries, government organizations routinely store the metadata relating to emails, telephone calls, web pages, video traffic, IP connections and cell phone locations.
Metadata Standards6
A metadata standard or schema is a defined set of elements that has been standardized for a particular field. Some scientific disciplines already have established metadata standards for data sets, and some data repositories have their own standards. An existing standard might be exactly what you need to document your data. If no standard is already in place for your data, there are several general-purpose schemas that you can adapt to your needs. Your subject specialist will be familiar with the metadata standards used in your discipline.
source: Boston College
Why Create Metadata?
source: ANDS
Metadata enables and enhances the discovery and reuse of data.
- Finding Data
Data formats such as text can be indexed and searched themselves (as in a simple Google search). However, the ability to search formats like audio, images and video is limited, and discoverability relies on searching the metadata. Discovery metadata helps researchers find data that, for example:
- relates to a geographical area of interest (via geospatial metadata)
- relates to a research discipline of interest (via field of research, keyword or vocabulary metadata)
- is generated by another researcher whose work is of interest (via lead researcher or contributor metadata).
- Determining the value of data
To assess the usefulness, value and quality of a dataset, researchers need to understand the context around the data. This is given in metadata that:
- Describes why the data were collected, the experimental design and data collection methods, etc.
- Links to the researchers and institution(s) involved
- Identifies the research program or grant
- Points to publications that have flowed from the research data
- Explicitly provides provenance, licensing, rights, and technical information.
- Accessing data
Access to research data requires:
- information that identifies the research data collection - usually through a metadata collection (catalogue) record
- Links to the data or contact information:
- a direct download link to online data for open access, or
- contact information for the data manager for mediated access.
- Using and reusing data
To make use of any dataset, researchers need metadata on:
- how the data is structured
- what it describes
- how to read it (e.g. column headings and units)
- methodological information such as instrument settings and calibrations, reagents used, or survey questions
- exactly what they are allowed to do with the data through rights metadata such as licensing
- how to acknowledge the original creators by citing the data.
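To make the discovery use case concrete, the sketch below filters a small in-memory catalogue by keyword, contributor and geographic area. The record fields, the bounding-box convention (minimum and maximum longitude and latitude in degrees) and the sample records are all hypothetical. The overlap test also ignores boxes that cross the 180° meridian.

```python
from dataclasses import dataclass

BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass
class DatasetRecord:
    """A hypothetical catalogue record carrying discovery metadata."""
    title: str
    keywords: set[str]
    contributors: list[str]
    bbox: BBox
    license: str = "unspecified"


def overlaps(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes intersect (no antimeridian handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_datasets(records, keyword=None, area=None, contributor=None):
    """Return the records that match every criterion supplied."""
    matches = []
    for record in records:
        if keyword and keyword.lower() not in {k.lower() for k in record.keywords}:
            continue
        if area and not overlaps(record.bbox, area):
            continue
        if contributor and contributor not in record.contributors:
            continue
        matches.append(record)
    return matches


# Hypothetical sample catalogue.
CATALOGUE = [
    DatasetRecord("Daily rainfall, Queensland", {"rainfall", "climate"},
                  ["A. Researcher"], (138.0, -29.0, 154.0, -10.0), "CC-BY-4.0"),
    DatasetRecord("Soil moisture, Tasmania", {"soil", "climate"},
                  ["B. Scientist"], (144.0, -44.0, 149.0, -39.0)),
]
```

A search such as `find_datasets(CATALOGUE, keyword="climate", area=(150.0, -30.0, 153.0, -25.0))` returns only the Queensland record, because only its bounding box overlaps the area of interest.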
Metadata Creation Process7
- Step One
When preparing to describe your resources, there are a number of questions that you will want to consider:
- What are you describing?
- Are you describing a physical object, a digital object, a digital representation of a physical object?
- What kind of information do you want to record?
- Who is my audience? What information is needed to identify the resource? What information is needed to properly contextualize it? How do I want people to find it or interact with it? How do I expect them to search for or discover it? How do I expect to use it? How do I expect others to use it now and in the future? What information is required to communicate who owns it, who can use it and to what extent?
- Step Two
As you begin to answer the questions presented in Step One, list the information that you would like to include as data points, e.g., title, subject, access rights, etc. For example, if you want to overlay images onto a map, you will want to record coordinate data. This is your metadata wish list.
- Step Three
Consider the descriptive information or metadata that you may already have: Which elements or what kind of information are recorded or represented there? Is information missing about your resources? Is there information that would be challenging to find or create?
- Step Four
Find your "golden minimum." Determine what information is essential to facilitate discovery and identification and to give sufficient context, but no more. What exactly counts as the golden minimum for your project depends on your project goals and available resources.
- Step Five
Finalize your list of data points. Choose to codify this list as your own metadata schema or map it to an existing schema, such as Dublin Core.
- Step Six
Decide whether you want to make use of data value standards (controlled vocabularies, thesauri, encoding or formatting standards). If so, which standards would apply to which fields? Alternatively, you can create your own data value standards, such as a subject vocabulary specific to your topic or collection of resources, or a controlled list of names. Document your decisions as your best practices.
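Steps Four to Six can be sketched in a few lines of Python. The sketch includes a golden-minimum check, a small controlled vocabulary, and a crosswalk that maps local field names onto Dublin Core elements serialized as XML. Every local field name, the vocabulary terms and the required-field set are hypothetical choices for an imagined photo collection. Only the Dublin Core element names and namespace URI come from the Dublin Core standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize DC elements with a "dc:" prefix

# Step Five: hypothetical crosswalk from local field names to Dublin Core elements.
CROSSWALK = {
    "item_title": "title",
    "photographer": "creator",
    "topic": "subject",
    "capture_date": "date",
    "access_rights": "rights",
    "coordinates": "coverage",
}

# Step Four: the hypothetical project's "golden minimum" of required fields.
GOLDEN_MINIMUM = {"item_title", "topic", "access_rights"}

# Step Six: a hypothetical controlled vocabulary for the topic field.
TOPIC_VOCABULARY = {"architecture", "landscape", "portrait"}


def validate(record: dict) -> list[str]:
    """List problems: missing required fields and off-vocabulary topics."""
    problems = [f"missing required field: {name}"
                for name in sorted(GOLDEN_MINIMUM.difference(record))]
    topic = record.get("topic")
    if topic is not None and topic not in TOPIC_VOCABULARY:
        problems.append(f"topic not in controlled vocabulary: {topic}")
    return problems


def to_dublin_core(record: dict) -> ET.Element:
    """Map a local record onto Dublin Core elements via the crosswalk."""
    root = ET.Element("metadata")
    for name, value in record.items():
        element = CROSSWALK.get(name)
        if element is None:
            continue  # fields without a crosswalk entry are dropped
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = str(value)
    return root


if __name__ == "__main__":
    photo = {
        "item_title": "Old Post Office",
        "topic": "architecture",
        "access_rights": "CC BY 4.0",
        "coordinates": "-27.47, 153.03",
        "internal_note": "rescan at higher resolution",
    }
    print(validate(photo))
    print(ET.tostring(to_dublin_core(photo), encoding="unicode"))
```

This sketch simply drops fields without a crosswalk entry, such as `internal_note`. A project that codifies its own schema might keep them in a local namespace instead.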
Functions of Metadata
- Resource discovery
- Allowing resources to be found by relevant criteria;
- Identifying resources;
- Bringing similar resources together;
- Distinguishing dissimilar resources;
- Giving location information.
- Organizing e-resources
- Organizing links to resources based on audience or topic.
- Building these pages dynamically from metadata stored in databases.
- Facilitating interoperability
- By using defined metadata schemes, shared transfer protocols, and crosswalks between schemes, users can search resources across the network more seamlessly.
- Cross-system search, e.g., using the Z39.50 protocol;
- Metadata harvesting, e.g., using the OAI protocol.
- Digital Identification
- Elements for standard numbers, e.g., ISBN
- The location of a digital object may also be given using:
- a file name
- a URL
- persistent identifiers, e.g., PURL (Persistent URL) or DOI (Digital Object Identifier)
- Combined metadata to act as a set of identifying data, differentiating one object from another for validation purposes.
- Archiving and Preservation
- Digital information is fragile and can be corrupted or altered;
- It may become unusable as storage technologies change.
- Metadata is key to ensuring that resources will survive and continue to be accessible into the future. Archiving and preservation require special elements:
- to track the lineage of a digital object,
- to detail its physical characteristics, and
- to document its behavior in order to emulate it in future technologies.
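Two of these functions lend themselves to short checks. The sketch below validates an ISBN-13 using the standard alternating 1-and-3 weighting of its check digit. It also records a SHA-256 checksum so that later corruption or alteration of a file can be detected. The source does not prescribe a particular checksum; SHA-256 and all helper names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13 against its check digit (weights 1, 3, 1, 3, ...)."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_record(path) -> dict:
    """Capture basic characteristics to store alongside a preserved file."""
    p = Path(path)
    return {"file": p.name, "size_bytes": p.stat().st_size, "sha256": sha256_of(p)}


def fixity_ok(path, record: dict) -> bool:
    """True if the file's current checksum still matches the stored one."""
    return sha256_of(path) == record["sha256"]
```

For example, `isbn13_is_valid("978-0-306-40615-7")` returns `True`, while changing the final digit makes it `False`. In practice, a fixity record like this would typically be stored with the object's preservation metadata and re-checked periodically.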
Types of Metadata
[Figure: the types of Metadata, with examples]
source: Maureen McClarnon
See Also
Big Data
Customer Data Management (CDM)
Enterprise Data Warehouse (EDW)
Master Data Management (MDM)