Saturday 23 July 2016

United States of Variables


A thermos is an under-appreciated invention. Its whole concept is both simple and beautiful: its job is to keep things as they are. The biggest challenge a thermos (or anyone, for that matter) has to face is not to give in to the test of time. The temperature of whatever a thermos contains must remain unchanged (or change only insignificantly) for a given amount of time. What is this given amount of time, you ask? Well, the time for which we need the thing inside our thermos. If I put boiling water in a thermos because I’ll need it in an hour, then that water must retain its heat, or else what good does the thermos do me? The same must hold when the occasion is different and the water is freezing.
The point is that the thermos must preserve its contents for a certain amount of time (subject to the thermos’ capability).
Does that remind you of anything related to computers?
Memory. But for the purpose of this post, Random Access Memory. Replace in the above line the word “thermos” with “RAM” and see if the fit isn’t perfect.
Where do we go from here? Well, what about the fundamental limitation of a thermos?
A thermos cannot contain hot and cold things at the same time. The water inside of the thermos can either be cold or hot, never both.
Now can I find a similar behavioral pattern in RAM? Of course I can. When we declare a variable in a program, a place is reserved in memory and is marked with the name of the variable. This place corresponds to our thermos, and the contents of this thermos are the bits representing the value given to the variable by the programmer. This variable (which is nothing but a bunch of bits) can be classified into different types, just like water (a bunch of molecules in liquid form) can be classified by its temperature. Some of these types are int, char, float, double, etc. With this in mind, let me rewrite a line from before with certain replacements:
A place in memory cannot contain int and char (and float and double..) at the same time. The bits in this memory can either be an int or a char (or a float or a double..), never both (all).
Did that make sense? It did to me when I was new to programming. I knew for a fact that every variable must have a place in memory exclusively for itself. But then I read about unions, and, of course, I read about them in the context of C++ (they exist in C as well).
Unions (in C++) set aside a single block of memory that is shared by all the members of the union: every member starts at the same address, so the members overlap instead of sitting side by side. Members of a union are ordinary variables with ordinary names and almost any types. As an example, consider the following union:
union U
{
    int i;
    char c;
    float f;
};
U obj; //An object of the union U

First, union U needs a block of memory large enough to hold its largest member. On most modern platforms both int and float take 4 bytes (and a char always takes 1), so U reserves 4 bytes. Now assume these 4 bytes are somehow filled with some bits. The question is: what do these bits mean? The answer is that they can mean anything the programmer wants them to mean. If the programmer wishes to use this union as a float, he/she can do that through obj.f and the bits will be read as a float. If the programmer wants the union to behave like an int, obj.i reads the bytes as an int (all 4 of them where int is 4 bytes; only 2 of the 4 on old 16-bit compilers with 2-byte ints). And finally, if a character is needed, obj.c reads just 1 byte of the union and interprets those bits as a char. (Strictly speaking, standard C++ only guarantees reading the member that was most recently written; reading a different one is undefined behavior, although C allows it and compilers such as GCC document it as supported.)
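To see the layout claims for yourself, here is a minimal sketch that checks the size of U and whether its members really start at the same address. The struct S and the helper functions are hypothetical, added only for comparison: a struct with the same three members gives each one its own storage.

```cpp
#include <cassert>

// The union from the post: all three members share one block of storage.
union U
{
    int i;
    char c;
    float f;
};

// Hypothetical comparison type: a struct gives each member its own storage,
// so the members sit side by side instead of overlapping.
struct S
{
    int i;
    char c;
    float f;
};

// The union has to be big enough for each of its members.
static_assert(sizeof(U) >= sizeof(int), "U must be able to hold an int");
static_assert(sizeof(U) >= sizeof(float), "U must be able to hold a float");

// Returns true when i, c and f all start at the same address.
bool membersShareAddress(const U& obj)
{
    const void* pi = &obj.i;
    const void* pc = &obj.c;
    const void* pf = &obj.f;
    return pi == pc && pc == pf;
}

// Runtime check of the claims above; aborts if one of them fails.
void checkLayout()
{
    U obj{};                          // value-initializes the first member, i, to 0
    assert(membersShareAddress(obj));
    assert(sizeof(U) <= sizeof(S));   // in practice, overlapping members take no more room
}
```

On common desktop platforms sizeof(U) comes out as 4 while sizeof(S) is 12, but the exact numbers depend on the compiler and target, which is why the sketch only compares them.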
You see, bits are bits. There is no such thing as a bit being hot or cold. The same bits make an integer and the same bits make a character. It’s simply the programmer’s choice to use these bits a certain way. And that is exactly what type specifications are for: telling the compiler how a group of bits is to be treated. Unions take advantage of this.
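Here is a small sketch of "same bits, different type". Because reading an inactive union member is not well defined in standard C++, it swaps in std::memcpy, which copies a float's bytes into a 32-bit unsigned integer and back. It assumes float is a 32-bit IEEE-754 value (checked with static_assert); the function names floatBits and bitsToFloat are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is a 32-bit IEEE-754 value (true on mainstream platforms).
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE-754");

// Copy the float's bytes into an integer: same bits, different type.
std::uint32_t floatBits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// And back again: the integer's bits treated as a float.
float bitsToFloat(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Quick self-check: 1.0f is stored as 0x3F800000 in IEEE-754 single precision.
void checkBits()
{
    assert(floatBits(1.0f) == 0x3F800000u);
    assert(bitsToFloat(0x3F800000u) == 1.0f);
}
```

Read as an unsigned integer, the bits of 1.0f are the number 1065353216 (0x3F800000); nothing about the bits changed, only the type used to read them.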
Anonymous Unions:
In the above example I used a union to create a type of sorts (U), which let me create an object of that union. This is useful when a program needs the same union in several places or cases. But when all you need is a group of variables sharing the same memory location (the possible reasons for which I’ll discuss later), you can choose not to name your union. This saves you from first creating an object of the union and then reaching the members through the member operator (the dot). Such unions are called anonymous unions. Here’s an example to clarify things syntactically:
static union //At namespace scope C++ requires 'static'; inside a function it can be dropped
{
    int i;
    char c;
    float f;
};
Now you are free to use the variables i, c and f as you would use any normal variable. But behind the scenes, these three share the same memory location.
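A minimal sketch of both places an anonymous union can appear: at namespace scope (where C++ requires static) and inside a function (where it does not). The names gi, gf and the helper functions are hypothetical, chosen so they do not clash with the i, c and f above.

```cpp
#include <cassert>

// At namespace (global) scope, C++ requires an anonymous union to be static.
// gi and gf are hypothetical names used only in this sketch.
static union
{
    int gi;
    float gf;
};

// Inside a function, no 'static' is needed.
bool localMembersOverlap()
{
    union
    {
        int i;
        char c;
        float f;
    };              // no name and no object: i, c and f are used directly

    i = 42;         // i is now the active member
    const void* pi = &i;
    const void* pc = &c;
    const void* pf = &f;
    return pi == pc && pc == pf;
}

void checkAnonymous()
{
    gf = 2.5f;                  // write through gf ...
    assert(gf == 2.5f);         // ... and read the same member back
    assert(static_cast<const void*>(&gi) == static_cast<const void*>(&gf));
    assert(localMembersOverlap());
}
```

Note that there is no obj. anywhere: the member names are used as plain variables, yet their addresses coincide.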
Need:
Well, first of all, it saves memory. Nah, that’s not it. Saving a few bytes doesn’t matter much. The real essence of unions lies in something else, something quite rare.

Humor me for a moment. Imagine your job is picking up people at the airport, people you have never met before. My question is: what size car would you take with you to carry them? On some days you might get a really thin person, but on others you can very well end up with a huge one. The answer is a car big enough to hold a person of any size imaginable. And that is exactly how a union is meant to be used in a computer program. When you are not sure what type of data you’re going to get (from a file or some other source of input), use a union so that the same memory location can be treated in different ways. In practice you also keep track, somewhere, of which type actually arrived, so you know which member to read. A good example can be found in Flex and Bison, but it’ll be cruel to force that on you just yet. Maybe later.
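One common way to carry "a value of unknown type" is a tagged union: a small enum records which member currently holds meaningful bits, and an anonymous union holds the bits themselves. The sketch below is hypothetical (Value, makeInt, makeChar, makeFloat and describe are illustrative names, not taken from Flex, Bison or any library).

```cpp
#include <cassert>
#include <string>

// Hypothetical "value of unknown type" read from some input source.
// The tag records which union member currently holds meaningful bits.
struct Value
{
    enum class Kind { Int, Char, Float };
    Kind kind;
    union            // anonymous union member: i, c and f overlap
    {
        int i;
        char c;
        float f;
    };
};

Value makeInt(int v)     { Value x{}; x.kind = Value::Kind::Int;   x.i = v; return x; }
Value makeChar(char v)   { Value x{}; x.kind = Value::Kind::Char;  x.c = v; return x; }
Value makeFloat(float v) { Value x{}; x.kind = Value::Kind::Float; x.f = v; return x; }

// Read only the member the tag says is active.
std::string describe(const Value& v)
{
    switch (v.kind)
    {
    case Value::Kind::Int:   return "int " + std::to_string(v.i);
    case Value::Kind::Char:  return std::string("char ") + v.c;
    case Value::Kind::Float: return "float " + std::to_string(v.f);
    }
    assert(false && "unknown kind");   // unreachable if the tag is always set
    return "";
}
```

For example, describe(makeInt(42)) yields "int 42". In C++17 and later, std::variant (from the variant header) packages the same idea of a tag plus shared storage and checks the tag for you.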
