The perfect compression algorithm

Jan 09 2010

A story from my college days.

I stormed in with full excitement to the room of two of my friends. I had found the perfect compression algorithm.

“I found a compression algorithm that can reduce the size of any file!”

“That is impossible. What is your algorithm?”

“I cannot share the algorithm right now, but I will tell you once I write a program to compress files.”

“Don’t you know that it is impossible to write an algorithm to compress all files?”

“Yes, but… now I have this algorithm!”

So what was my algorithm?

1. Consider the text to compress as a large number.

2. Find the integer square root of the number.

3. The square root and the error form the compressed text.

For example, let us compress the number 98653784578.

Integer square root of  98653784578 = 314092

Error = 114 (i.e. 98653784578 – 3140922)

314092 and 114 form the compressed text. We just saved 2 characters using our algorithm!

If the original number is of length n, then the square root and the error will be of maximum lengths n/2 each. So this will mean that the final compressed text will be of a maximum length n and possibly even smaller.

Can you find the flaw in the logic?

Looks like I was not the only one who invented a perfect compression algorithm. At least I did not spend 12 years to understand that perfect compression is not possible and I did not get publicly humiliated (not until this post, but who is reading this anyway ;) ). There are various proofs for why perfect compression cannot be achieved.

10 responses so far

  • Joyce Babu says:

    I think you will face the following problem with this logic. In order to differentiate between different integral-error pairs, you can follow two approaches.

    1. You can insert a separator between the integral part and the error. Also you need an separator between two different pairs. Now the problem is that the separator has to be unique and should not be confused as part of the integral part or the error. For that the separator should have a minimum width of n/2+1.

    2. You can set a fixed width for both integral part and error. In which case both integral and error requires a minimum length of n/2.

  • Faiz says:

    I used to think the same way. Instead I was thinking about the storing the combination sequence number (which will be larger than the number itself, if number is greater than 21). Finally my brother helped me understand the concept. If you need to store 1 bit of information, you need 1 bit. There is no other way.

  • Hugo says:

    correct me if I’m wrong, but to me the flaw seems to be in this statement:

    “If the original number is of length n, then the square root and the error will be of maximum lengths n/2 each”

    for numbers where n is uneven, the maximum length of square root and error are (n+1)/2. The same algorithm on the number 10,200:

    int_sqrt(10,200) = 100
    error = 10,200 – 100^2 = 200

    our resulting numbers are six digits long together, one longer than the original number.

  • Fazil says:

    1. Sometime length n/2 is neither correct for root nor error. Consider number 7, needs 3 bits to store. Root is 2, need 2 bits to store. Error is 3 need another 2 bits. Total of 4 bits.

    2. Second challenge is where to store length of root and error.

  • Ben says:

    This doesn’t work for all inputs. You can’t compress single bit input.

  • Ali says:

    Well, not in square but I think there is a way to save some bits if we knew some additionaly logical information.
    e.g. in single precision floating point the significand (mantissa) is 23-bits but bit-precision is 24-bits because you know that the significand always begin with 1. Thats the way it should go.

  • Ali says:

    We need logical information that does not have to be written, we should know it in advance. I actualy don’t know how, it has to be invented, but I image it in that way –> Compress a file as good as possible, than invert every second byte and compress it again. So we don’t need to save the information about byte-inversion, we just know it.
    (inverting the bytes is to easy, i wished I knew the right way)

  • hohums says:

    I think what you do is use string theory to compress the data into another dimension. Since the data is uncompressed in another dimension and doesn’t exist in your dimension its zero bytes… but there is a problem… how do you get the data back when you need it? You need to store something about where it was stored and what dimension.

    So what you do is store it one dimension over in the same spot on the same harddrive. Then you can always retrieve it. The problems is that inevitably some other dimension will try to use your HD to store some sort of useless information on it. Therefore you need to make sure that the HD is properly shielded from inter-dimensional-data-hijacking.

    There you go… zero byte compression. So simple.

  • Ali says:

    hohums, I see you watched too long the films like matrix or the-one.
    But I’ve got some news for you, the earth is a globe!

  • Who took 12 years to figure it out? Not sure if that person was working on the same dimension.

    I think like Ali said, you may have watched too much matrix or science fiction stuff. Maybe this is breakthrough but I have yet to confirm that it is. Neat concept though.

Leave a Reply