What is External Data Representation and why do we need it?
External Data Representation is an agreed standard for representing data structures and primitive data types.
The information in a running program is stored in data structures, but when it is transferred it is converted to a stream of bytes. Although different computer programs use different formats to store information, an external data representation allows data to be transferred between different kinds of computer systems.
Let’s clarify this further.
Think of two computer systems that need to communicate occasionally. These systems may use different data structures (linked lists, hash maps, binary trees, queues, etc.) and primitive data types while running programs. These differ from program to program and from architecture to architecture. For example:
- Integers: the byte order can be big-endian or little-endian
- Floats: the representation differs between architectures
- Characters: the encoding can be ASCII or Unicode
Furthermore, the two programs may be written in two different languages.
When these two systems exchange information, it is converted to a stream of bytes regardless of the form of communication, and converted back to the relevant data structures on arrival.
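To make the byte-order problem concrete, here is a small Python sketch using the standard `struct` module. The value 1984 and the variable names are only illustrative assumptions; the point is that the same integer becomes different bytes depending on the byte order, and a receiver that assumes the wrong order reads a different number.

```python
import struct

value = 1984  # illustrative value; 0x000007C0 as a 32-bit integer

# The same integer laid out as 4 bytes in each byte order.
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

print(big.hex())     # 000007c0
print(little.hex())  # c0070000

# A receiver that assumes the wrong byte order reads a different number.
misread = struct.unpack("<I", big)[0]
print(misread)       # 3221684224
```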
But what happens if the receiving computer system understands a different set of data structures and uses a different set of characters? Think of receiving a message in Japanese when your phone only recognizes Sinhala. Your phone must at least be able to tell that the message is in Japanese.
To overcome this, everyone agrees on a common, standard way to represent data. This is known as External Data Representation: an intermediate format used when transmitting data and information.
To achieve this, one of the following methods is used in communication.
- The values are converted to an agreed external format before transmission and converted to the local form on receipt; if the two computers are known to be the same type, the conversion to external format can be omitted.
- The values are transmitted in the sender’s format, together with an indication of the format used, and the recipient converts the values if necessary.
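The two methods above can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: big-endian is picked as the agreed external format, and the one-byte flag in the second method is a made-up layout, not the format of any particular protocol.

```python
import struct
import sys

def send_agreed(value: int) -> bytes:
    """Method 1: always convert to the agreed external format (big-endian here)."""
    return struct.pack(">i", value)

def receive_agreed(data: bytes) -> int:
    """Method 1: convert from the agreed format back to a local integer."""
    return struct.unpack(">i", data)[0]

def send_tagged(value: int) -> bytes:
    """Method 2: send in the sender's own byte order, prefixed by a flag byte."""
    flag = 1 if sys.byteorder == "little" else 0  # hypothetical flag values
    return bytes([flag]) + struct.pack("=i", value)

def receive_tagged(data: bytes) -> int:
    """Method 2: read the flag, then decode with the order the sender used."""
    order = "<" if data[0] == 1 else ">"
    return struct.unpack(order + "i", data[1:])[0]

print(receive_agreed(send_agreed(-42)))  # -42
print(receive_tagged(send_tagged(-42)))  # -42
```

With the first method the receiver never needs to know anything about the sender's machine. With the second, two machines of the same type skip the conversion entirely, at the cost of one extra byte per message in this sketch.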
Simply put, External Data Representation is designed to work across different languages, operating systems, and machine architectures when data and information are communicated.
I believe you now have a clear idea about External Data Representation. So let’s find out about marshalling!
Marshalling and Unmarshalling
Marshalling is the process of gathering data items and transforming them into an external data representation suitable for transmission or storage.
Unmarshalling is the opposite of marshalling: disassembling the received stream of data into the relevant data structures.
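As a minimal sketch of both directions, the Python snippet below marshals a small record into bytes and unmarshals it again. The `Reading` class, its fields, and the chosen layout are hypothetical and only serve the illustration.

```python
import struct
from dataclasses import dataclass

@dataclass
class Reading:
    # Hypothetical record used only for this example.
    sensor_id: int
    temperature: float

# Agreed external layout: unsigned 32-bit integer, then 64-bit float, big-endian.
LAYOUT = struct.Struct(">Id")

def marshal(reading: Reading) -> bytes:
    """Flatten the in-memory object into the agreed byte layout."""
    return LAYOUT.pack(reading.sensor_id, reading.temperature)

def unmarshal(data: bytes) -> Reading:
    """Rebuild the in-memory object from the received bytes."""
    sensor_id, temperature = LAYOUT.unpack(data)
    return Reading(sensor_id, temperature)

original = Reading(sensor_id=7, temperature=21.5)
wire = marshal(original)
print(len(wire))                    # 12
print(unmarshal(wire) == original)  # True
```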
Different External Data Representations
- CORBA’s Common Data Representation (CDR)
- Java’s Object Serialization
- XML (Extensible Markup Language)
CORBA’s Common Data Representation (CDR)
CORBA CDR is the external data representation defined with CORBA 2.0. It can represent all of the data types that can be used as arguments and return values in remote invocations in CORBA, which can be used with many programming languages. CDR defines 15 primitive types together with a range of composite types. The primitive types include:
- Short (16-bit)
- Long (32-bit)
- Unsigned short
- Unsigned long
- Float (32-bit)
- Double (64-bit)
- Boolean (TRUE, FALSE)
- Octet (8-bit)
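As a rough illustration of the sizes listed above, the sketch below maps each primitive to a format code from Python's standard `struct` module and prints its width. The mapping itself is my own assumption for illustration and is not taken from the CORBA specification; in particular, treating boolean as a single byte is a choice made here.

```python
import struct

# Hypothetical mapping from the CDR primitives listed above to struct codes.
CDR_TO_STRUCT = {
    "short": "h",
    "long": "i",
    "unsigned short": "H",
    "unsigned long": "I",
    "float": "f",
    "double": "d",
    "boolean": "?",
    "octet": "B",
}

def size_in_bits(cdr_type: str) -> int:
    # ">" selects struct's standard sizes, independent of the local machine.
    return struct.calcsize(">" + CDR_TO_STRUCT[cdr_type]) * 8

for name in CDR_TO_STRUCT:
    print(f"{name}: {size_in_bits(name)} bits")
```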
Marshalling in CORBA
In CORBA, the marshalling and unmarshalling processes are carried out by a middleware layer. Primitive data types are marshalled into a binary form. The transmitted data does not contain any information about the types of its contents, so the sender and the recipient are assumed to know the order and types of the data items in advance. The types of the data structures and the types of the basic data items are described in CORBA IDL, which provides a notation for describing the types of the arguments and results of RMI methods.
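The idea that no type information travels with the data can be sketched in Python as below. This is a simplified, CDR-like layout written for illustration, not real CDR: the `Person` struct is hypothetical, big-endian is assumed throughout, and each string is written as a 4-byte length followed by its characters, padded to a 4-byte boundary. Real CDR differs in details such as string termination and how byte order is signalled.

```python
import struct

# Both sides agree on this layout in advance (the role IDL plays in CORBA):
#   struct Person { string name; string place; unsigned long year; };

def _pack_string(s: str) -> bytes:
    raw = s.encode("ascii")
    padding = (-len(raw)) % 4  # pad to a 4-byte boundary
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding

def _unpack_string(data: bytes, offset: int):
    (length,) = struct.unpack_from(">I", data, offset)
    start = offset + 4
    text = data[start:start + length].decode("ascii")
    padded = length + (-length) % 4
    return text, start + padded  # decoded text and offset of the next item

def marshal_person(name: str, place: str, year: int) -> bytes:
    # Only values are written; nothing says "this is a string" or "this is a long".
    return _pack_string(name) + _pack_string(place) + struct.pack(">I", year)

def unmarshal_person(data: bytes):
    # The receiver relies on the agreed order: string, string, unsigned long.
    name, offset = _unpack_string(data, 0)
    place, offset = _unpack_string(data, offset)
    (year,) = struct.unpack_from(">I", data, offset)
    return name, place, year

wire = marshal_person("Smith", "London", 1984)
print(len(wire))               # 28
print(unmarshal_person(wire))  # ('Smith', 'London', 1984)
```

Because the bytes carry no type tags, a receiver that expected a different order of fields would decode them without error but get wrong values, which is why both sides need the shared IDL description.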
Java’s Object Serialization