Article on multiple data compression and expansion using iterative technique by sumit sharma(E-34)

MULTIPLE DATA COMPRESSION AND EXPANSION USING ITERATIVE TECHNIQUE

Since we know that in almost all embedded devices there is memory constraint. So there should be judicious use of the available memory. So here i am discussing how to compress and expand the compressed data using iterative technique and save memory.

Since the size of char is 1 byte so it could be used to represent 256 different characters. But in how many of our files do we actually have 256 different characters . So taking the advantage over this fact we could actually compress our data.

The first step towards compressing data is to create a master array which contains the different characters that exists in our file. So if for example in our 100 character file there are only 15 different characters then master array would be of size 15. Then taking a step further in order to compress data we need to assign an index to each of the member of master array with minimum number of bits required to represent all of the members of the master array. In our case of 15 characters we could represent each member with only with 4bits rather than 8bits which were used earlier.

So after providing each member of the master array with a unique index of 4 bits each we move on to the next step. We then need to access each element of our text file and map that element with the master array and find out its unique index value. The value which we provided is of 4 bits so we could actually combine the index values of two elements and form an compressed byte which actually contains two of characters from the file in the space of 1 character. Thus effective we are saving 50% of our memory.

For example:

Data to compress: helo to linux.

Master array :helo tinux.

So here we need 4 bits to represent each member of masater array.

Elements : unique index

h : 0000

e : 0001

l : 0010

o : 0011

: 0100

t : 0101

i : 0110

n : 0111

u : 1000

x : 1001

. : 1010

Now while accessing each element of in order to compress it first we access ‘h’ it has index of 0000 but in order to represent it in character form we need 8 bits so we access another character ‘e’ with index 0001 and combine both 0000 and 0001 in order to form 8 bit character.

Actually it will be implemented like this:

we will assign h with 0 and e with 1 in unsigned char form.

h=00000000

e=00000001

Since we know that only lower 4 bits of h are significant and so are of e ,so we left shift the bits of h by 4(though there is no significance of this step but if it would not have been 0 then it is a must step) and OR both h and e.

We get 00000001 where upper 4 bits represent ‘h’ and lower 4 bits represent ‘e’.

For the next two elements :

l : assigned with 3 i.e 00000011

o : assigned with 4 i.e 00000100

On left shifting ‘l’ and Oring both we get 00110100

On the left index value of ‘l’ that we as assigned and on right index value of ‘o’.

So using this technique we could compress all our data. Now after compressing you actually generated a file that occupies 50% less space than your original file without even losing any of its content.

Now in order to revive back the uncompressed data the process is just the vice-versa. You just have to access each byte of the compressed data and separate out the lower 4 and upper 4 nibbles by masking or shifting , and compare with the indexes of the master array to look out for the character what that nibble meant and save that character in another file in order to create an uncompressed file from your compressed file.

Taking the same example:

For our first byte of the compressed i.e. 00000001

first we AND it with 11110000 to get the upper nibble since we know that upper nibble represents the index of first character and then compare that obtained nibble with the index of the master array and the element with the corresponding index is the first character , save it in another file that represents your uncompressed file .Then for the second character AND that byte from compressed file with 00001111

in order to get the lower nibble since it represented the index of the second character . As done for the first character look for the index in the master array and save that character in the same uncompressed file .

First byte of compressed data :00000001

multiply with 1111000 we get 00000000 i.e 0 which is the index of ‘h’ . So we get h as the first character of our expanded file that is same as the original first character.

Similarly

on multiplying with 00001111 we get 00000001 i.e 1 which is the index of ‘e’ . So we get e as the second character of our expanded file that is same as the original second character.

So following this procedure we could revive all our original data from the compressed data using the key i.e. The master array which we have created earlier.

In this way we can use our storage device with small storage capability to store large amount of data.

EmbLogic's Blog

Blog Members Area

Embedded Systems Trainings

Industrial Trainings

Android System Development

C with Linux

C++ with Linux

Linux Internals and System Programming

Linux Device Drivers

Embedded Applications with ARM

Embedded Systems using 8 bit Controllers

Blog Categories

Article on multiple data compression and expansion using iterative technique by sumit sharma(E-34)

Leave a Reply Cancel reply