Data Compatibility

What is Data Compatibility?

Data compatibility is an approach to data management that provides integrated data throughout an organization, among organizations, and across industries. Its proponents present it as the eventual replacement for traditional data integration methods, on the grounds that compatible data is already integrated data: it is integrated from the moment it is first stored and remains integrated until it is deleted. Any data from a compatible data system is integrated with data from any other compatible data system, so there is no need to spend large amounts of money, time, and effort moving and transforming data, as traditional integration methods require. This opens up opportunities for collaboration both within and outside the organization.^[1]

All data arise within a particular context and often as a result of a specific question being asked. That is all well and good until we attempt to use that same data to answer a different question within a different context. When you match an existing dataset with a new question, you have to ask if the original context in which the data were collected is compatible with the new question and the new context. If there is context compatibility, then it is often reasonable to move forward. If not, then you either have to stop or come up with some statistical principle or assumption that makes the two contexts compatible. Good data analysts will often do this in their heads quickly and may not even realize they are doing it.

Understanding context compatibility is increasingly important as data science and the analysis of existing datasets continue to take off. Existing datasets all come from somewhere and it’s important to the analyst to know where that is and whether it’s compatible with where they’re going. If there is an incompatibility between the two contexts, which in my experience is almost always the case, then any assumption or statistical principle invoked will likely introduce uncertainty into the final results. That uncertainty should at least be communicated to the audience, if not formally considered in the analysis.

In some cases, the original context of the data and the context of the new analysis will be so incompatible that it’s not worth using the data to answer the new question. Explicit recognition of this problem can save a lot of wasted time analyzing a dataset that is ultimately a poor fit for answering a given question.^[2]

Compatible Data Types^[3]

The compatibility of two data types (except reference types) is based on their technical type attributes. It is the basis for type checking in assignments to field symbols and when actual parameters are assigned to formal parameters. In value assignments and comparisons between data objects (except reference variables), compatibility also determines whether a conversion has to be performed. Reference types are a special case: reference variables have a dynamic type as well as a static type, so compatibility based entirely on technical type attributes is not sufficient for them.

Non-Generic Data Types (Except Reference Types): Two non-generic data types (not reference types) and data types that contain reference types as components are compatible if all their technical type attributes match.
- In the case of elementary data types, all technical type attributes must match. The technical type attributes are as follows:
  - The predefined ABAP type
  - The length (in the case of the types c, n, p, and x)
  - The number of decimal places (in the case of the type p)
- In the case of structured types, the technical type attribute is as follows:
  - The layout of components: The layout of a structured type covers not only the sequence of elementary components in memory but also the combination of components into substructures and whether a substructure is a boxed component. The names of the components and the semantic attributes defined in ABAP Dictionary, such as conversion routines or documentation, are not important. In compatible structures, all components are compatible in pairs. This applies recursively down to the level of elementary data types.

If two structures are both constructed identically but different substructures are declared as boxed components, the structures are not compatible.

- In the case of table types, the technical type attributes are as follows:
  - The row type
  - The table category
  - The table key

  Compatible internal tables have compatible row types and matching table categories and table keys. Other attributes, such as the initial memory requirement, are not important.
- In the case of mesh types, the technical type attributes are as follows:
  - The layout of nodes (including the node names)
  - The associations (defined by ON conditions) of every component, including names and the table key used
- In the case of enumerated types, the technical type attributes are as follows:
  - All properties of the enumerated type: Every enumerated type is unique and only compatible with itself.

Note: Types from the different categories listed here are never compatible with one another. For example, an elementary data type is never compatible with a structure, even if the structure has only one component. The TYPES statement cannot be used to define different enumerated types with the same technical type properties. Even a data type constructed with RTTC methods exactly like an existing enumerated type is not compatible with it. However, an enumerated type defined by direct or indirect reference (including RTTI) to an existing enumerated type is compatible with it.
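
The rules for elementary, structured, and table types can be modeled in a few lines. The following is a minimal sketch in Python rather than ABAP, meant only to illustrate the comparison logic. The class names `Elementary`, `Component`, `Structure`, and `Table` and the function `compatible` are hypothetical, and mesh and enumerated types are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Elementary:
    abap_type: str                  # predefined ABAP type, e.g. "c", "n", "p", "x", "i"
    length: Optional[int] = None    # only meaningful for c, n, p, and x
    decimals: Optional[int] = None  # only meaningful for p


@dataclass(frozen=True)
class Component:
    name: str            # component name: ignored by the compatibility check
    dtype: "DataType"
    boxed: bool = False  # whether a substructure is a boxed component


@dataclass(frozen=True)
class Structure:
    components: Tuple[Component, ...]


@dataclass(frozen=True)
class Table:
    row_type: "DataType"
    category: str          # e.g. "standard", "sorted", "hashed"
    key: Tuple[str, ...]
    initial_size: int = 0  # not a technical type attribute: ignored


DataType = Union[Elementary, Structure, Table]


def compatible(a: DataType, b: DataType) -> bool:
    """Return True if all modeled technical type attributes of a and b match."""
    # Types from different categories are never compatible.
    if type(a) is not type(b):
        return False
    if isinstance(a, Elementary):
        # Predefined type, length, and decimal places must all match.
        return a == b
    if isinstance(a, Structure):
        # Components are compared in pairs, recursively; names are ignored.
        return len(a.components) == len(b.components) and all(
            x.boxed == y.boxed and compatible(x.dtype, y.dtype)
            for x, y in zip(a.components, b.components)
        )
    # Tables: row type, table category, and table key must match.
    return (
        a.category == b.category
        and a.key == b.key
        and compatible(a.row_type, b.row_type)
    )
```

In this model, two structures whose components differ only in name are compatible, while a single-component structure is not compatible with the elementary type of its component.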

Generic Data Types: A non-generic data type (not a reference type) is compatible with a generic data type if its technical attributes are covered by the generic data type.
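
To make "covered by" concrete, the sketch below maps a few generic types to the predefined types they cover. It is written in Python for illustration only. The table is a small, non-exhaustive subset chosen for this example, the function name `covered_by` is hypothetical, and the ABAP documentation remains the authority on what each generic type covers.

```python
# Illustrative, non-exhaustive subset of generic types and the predefined
# types each one is assumed to cover in this sketch.
GENERIC_COVERS = {
    "csequence": {"c", "string"},
    "xsequence": {"x", "xstring"},
    "numeric": {"i", "p", "f", "decfloat16", "decfloat34"},
}

# Types that are generic only in length (and, for p, in decimal places).
LENGTH_GENERIC = {"c", "n", "p", "x"}


def covered_by(abap_type: str, generic: str) -> bool:
    """Return True if the non-generic type is covered by the generic type."""
    if generic in ("any", "data"):
        return True  # fully generic: covers every data type
    if generic in LENGTH_GENERIC:
        return abap_type == generic  # any length of the same predefined type
    return abap_type in GENERIC_COVERS.get(generic, set())
```
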
Reference Types: A reference type is the static type of a reference variable and determines which objects the variable can point to. At runtime, a reference variable also has a dynamic type, which is determined by the type of the object it points to. The dynamic type may be more specialized than the static type. For this reason, the rules for typing checks, assignments, and comparisons cannot be covered by a compatibility concept based entirely on the technical attributes of the static type. Instead, the following three points show how reference types can be used together:
- When typings are checked, the following is possible:
  - A reference variable can be passed to a formal parameter typed as a reference variable, provided that the type of the formal parameter is more general or equal to the type of the reference variable (known as an upcast) and the formal parameter cannot be changed within the procedure.
  - A reference variable can be assigned to a field symbol typed as a reference variable, provided that the reference types are identical.
- Assignments between reference variables can be carried out by using an upcast or a downcast.
- Data reference variables can be compared with all data reference variables and object reference variables can be compared with all object reference variables.

As a rule, data reference variables can only be used with data reference variables and object reference variables can be used only with object reference variables. No conversions take place between reference variables. They are either passed as unconverted reference variables, assigned to each other, compared with each other, or no action at all takes place.
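
The interplay of static and dynamic types in upcasts and downcasts can be illustrated with a small model. This is a sketch in Python, not ABAP: the classes `Animal`, `Dog`, and `Cat`, the `Ref` wrapper, and the `assign` function are all hypothetical, and `TypeError` merely stands in for whatever error the runtime reports on a failed cast.

```python
class Animal: ...
class Dog(Animal): ...
class Cat(Animal): ...


class Ref:
    """A reference variable: fixed static type, dynamic type taken from its target."""

    def __init__(self, static_type, target=None):
        self.static_type = static_type
        self.target = None
        if target is not None:
            if not isinstance(target, static_type):
                raise TypeError("target does not match the static type")
            self.target = target

    @property
    def dynamic_type(self):
        # The dynamic type may be more specialized than the static type.
        return type(self.target) if self.target is not None else None


def assign(source: Ref, dest: Ref) -> None:
    """Assign source to dest using an upcast or a runtime-checked downcast."""
    if issubclass(source.static_type, dest.static_type):
        # Upcast: destination is equal or more general, always safe.
        dest.target = source.target
    elif issubclass(dest.static_type, source.static_type):
        # Downcast: only valid if the dynamic type fits the destination.
        if source.target is not None and not isinstance(source.target, dest.static_type):
            raise TypeError("downcast failed: dynamic type does not fit")
        dest.target = source.target
    else:
        raise TypeError("unrelated reference types")
```

Assigning a `Dog` reference to an `Animal` reference is an upcast and always succeeds. Assigning that `Animal` reference back to a `Dog` reference is a downcast that succeeds only because the object it points to is actually a `Dog`.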

Backward Compatibility Vs. Forward Compatibility^[4]

It can be challenging to understand which kinds of changes to files and databases are compatible and which ones are incompatible. Further complicating things is that there are two different categories of compatibility: backward compatibility and forward compatibility.

A data format is backward compatible with its predecessor if every file that is valid under the old structure and format remains valid under the new format, that is, it can still be accessed and read without error.
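
As a minimal sketch of this definition, consider a hypothetical record format in which version 1 stores only a `name` field and version 2 adds an optional `email` field. The JSON encoding, the field names, and the function `read_v2` are assumptions made for this example.

```python
import json


def read_v2(raw: str) -> dict:
    """Version 2 reader: accepts both version 1 and version 2 records."""
    record = json.loads(raw)
    return {
        "name": record["name"],        # present in both versions
        "email": record.get("email"),  # new in version 2; None for old records
    }
```

Because the new field is optional, every record that was valid under version 1 still reads without error, which is what makes the new format backward compatible. Had `email` been required, old records would fail to load.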

There are issues with ensuring backward compatibility.

Supporting old software has a cost, which is considered one of the major drawbacks of ensuring backward compatibility. The cost increases further if hardware is required to support a legacy data storage system or application. This makes everything associated with being backward compatible more complex.

One also needs to look forward and consider forward compatibility.

Forward compatibility or upward compatibility is a design characteristic that allows a system to accept input intended for a later version of itself. The concept can be applied to entire systems, data communication protocols, file formats, and application programming languages.
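
One common way to achieve forward compatibility is for a reader to ignore input it does not understand. The sketch below contrasts a strict reader, which rejects unknown fields, with a tolerant one, which skips them. The JSON record format, the `KNOWN_FIELDS` set, and both function names are hypothetical.

```python
import json

# Fields this (older) reader understands; anything else comes from a later version.
KNOWN_FIELDS = {"name"}


def read_strict(raw: str) -> dict:
    """Reject any record that carries fields this version does not know."""
    record = json.loads(raw)
    unknown = set(record) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return record


def read_tolerant(raw: str) -> dict:
    """Keep the known fields and silently ignore the rest."""
    record = json.loads(raw)
    return {key: value for key, value in record.items() if key in KNOWN_FIELDS}
```

Only the tolerant reader is forward compatible in this model: it can process a record written by a later version that has added fields, while the strict reader fails on the same input.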

Forward compatibility for the older system usually means backward compatibility for the new system. This means that the new system or format can access and process data from the old system. The new system usually has full compatibility with the older one by being able to both process and generate data in the format of the older system.

If a change is both forward and backward compatible, then it is called fully compatible. This means you can run any combination of old and new readers and writers without breaking anything.
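
The three categories can be checked mechanically under a simplified model. The sketch below assumes a schema is a mapping from field name to a flag saying whether the field has a default, that readers ignore fields they do not know, and that a reader fails only when a field without a default is missing. These rules are loosely modeled on common schema-evolution practice and are assumptions of this example, as are the names `can_read` and `classify`.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with the writer schema satisfies the reader schema.

    Both schemas map field name -> True if the field has a default value.
    """
    return all(
        has_default or name in writer
        for name, has_default in reader.items()
    )


def classify(old: dict, new: dict) -> str:
    """Classify a schema change from old to new."""
    backward = can_read(new, old)  # new readers can read old data
    forward = can_read(old, new)   # old readers can read new data
    if backward and forward:
        return "fully compatible"
    if backward:
        return "backward compatible"
    if forward:
        return "forward compatible"
    return "incompatible"
```

Under these assumptions, adding a field with a default is fully compatible, adding a required field is only forward compatible, removing a required field is only backward compatible, and renaming a required field is incompatible.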

References

[1] Defining Data Compatibility

[2] Compatible Data in Context

[3] Compatible Data Types

[4] Backward Compatibility Vs. Forward Compatibility
