A quick little puzzle

Jan 27 2010

Here is a puzzle for you:

In a group of people, 40% are men and the rest are women. It is also given that in this group 40% of the men and 10% of the women are obese.

What is the probability that an obese person in the group is a man?

When you find the solution, post it as a comment or send it to me. The solution will be explained in a later post.

15 responses so far

Founder equity

Jan 19 2010

A question about splitting equity between co-founders:

I envision a 51 – 49 split. Do you believe this to be fair?

A 50-50 split will bring more trust and synergy in the startup. In the long run it will far outweigh any (non-existent) advantage you think you will have in keeping the 1% share to yourself.

If you have to use the 1% share to end an argument, you have already lost. In deadlock situations flipping a coin will be better than using the 1% share to force your choices every time on the other founder. Remember the saying: Win an argument and lose a friend.

If you haven’t done any significant work on the startup before the other founder join, give them equal share.

No responses yet

Chess programming and such

Jan 16 2010

Another story from my college days. Be warned, long article.

Engineering colleges in Kerala require students to submit a mini project in the third year of their computer science and engineering courses. This is in addition to the main project which is to be submitted in the last year of the course.

Two of my classmates – Praveen Kumar (who in our circles is regarded equal to Jon Skeet), Philip and I formed our team for this mini project. We were very much excited about this project that we started discussions about it a lot earlier than the official project start time. After a lot of late night debates we decided to develop a chess playing program.

Three of us were sharing a rented house near college and we used to develop software utilities for various purposes#1. The main point to note here is that we developed these programs in – wait for it – Visual Basic 6.

We were very much comfortable in Visual Basic, and since it is very easy to develop a good chess UI in Visual Basic, we started building a prototype of the application in VB6. We thought that once the basic logic is done and working, we would port it to a better and faster language. Unfortunately, we worked on the code for very long and hard that the code base grew larger and longer.

Testing the program was somewhat tricky. The alpha-beta based algorithm was in a working condition and the program was generating and making some basic moves, but we were facing two problems:

Problem 1: How do we know if the computer has played correctly? There is no way to really make sure that the computer has played the correct move (for any given depth) because we ourselves don’t know the correct move! Of course if you are a good chess player you can find out a good move for a particular board position, but that does not solve the problem. First of all you cannot be sure that your move is the best move. What if there were better moves that you just did not see? Secondly, chess players do not think exactly like a computer. They do not calculate the moves strictly using an algorithm. Some of their decisions are based on intuition.

The very best chess players can just look at a chess position and see the 2-3 lines of play that they should analyze instead of analyzing all the 30+ available moves. Computers cannot do that. Computers should look at all the moves to decide whether to analyze that line of play to more depths or not. Humans are good in chess strategy while computers are good in the tactics. This means that a human cannot tell whether a computer is playing the perfect game (for any given depth) or not.

Problem 2: Since our implementation was not complete and no optimizations were made initially, the computer would take very long time to think and make its move. We had to wait something like 4 minutes or so to get each move from the computer and it was taking up much of our development time. Verifying the correctness of the program was going to take a lot of time.

One good way to test a Chess program is to make it play against itself#2. So we would code till late nights and in the mornings when we go to college we would start the program and make it play against itself. When we come back we would look at the logs and see what happened. As much as excited we were, the results were often depressing. Most of the time the game will go into a loop where each player played (back and forth) the same moves. The reason was that once the chess AI#3 found the best move for a particular board position, it just plays the move disregarding any other facts like the history. The best way to fix this is to tell the program to consider the move history too in making its decision about which piece to move.

Another reason for frequent loops was that both the players in the game were of the same strength (they were both thinking x moves ahead). To fix this and to see some real results we made them of different strengths. One of the players will think x moves ahead while the other would think x-2 or something like that. This improvement helped us see some real results. When we came back to check the results at the end of the day, we could see one of the players checkmated!

If you are into computers, chess programming is one of the best ways to have fun. There are a lot of intricacies in the implementation so that you can tweak the program again and again to get performance improvements. The best chess programming tutorial (for beginners) I have come across is the one from GameDev.net.

I remember one other small bug we faced. Here is how a computer chess program finds out the best possible move:

Find out all the valid moves available to the player. Assume that you played the move and find out the new board position. Now look at the new board position from the perspective of the opponent and try to figure out what move he is most likely to make. If he makes that move, what will the new board position look like? What move will you make to counter his move?

You can go on and on iteratively deeper and deeper into this search tree#4 . The deeper you search, the better your decision (move) will be. For each board position the algorithm will assign a score. The score determines how good the board position is. If one of your pieces is captured, you lose that much score. For example if you lose a pawn, you lose 1 point.

Now look at the following game tree. Blue represents a move by the computer and red, a move by the opponent. The numbers indicate the points gained by the computer in each step.

For example in the first line of play, the computer gains 5 points in the first move (probably by capturing a rook) and in the counter-move the opponent fights back by gaining 3 points (probably by capturing a bishop/knight). Now as you may remember, the algorithms just looks at the total points of the player at the end of each line of play. In both of the above cases, the total gain by the computer is 2 points.

If the computer chooses between these two lines of play randomly, you are in for trouble. The problem is that the computer does not know what happens after the last move in the line of play. This means that if you follow the first line of play you are almost certain to have gained 2 points after the next two moves while in the second line of play you will lose 3 points after the next two moves. May be you can gain 5 points later in the game, but what if after two more moves when you can see further in the tree you realize that the move you saw earlier would be disastrous? Then you will have to change your game and this means that you cannot get those points.

There are two ways to solve this issue. The first is to prefer early points to later ones. In our example, it is better to choose the first line of play over the second one.

The second way is to use quiescence search:

The horizon effect can be mitigated by extending the search algorithm with a quiescence search. This gives the search algorithm ability to look beyond its horizon for a certain class of moves of major importance to the game state, such as captures.

We did a lot of modifications to the code. We used to take the printed versions of the source code (about a 100 pages) to our college and tried to find optimizations that can be applied to the code. Chess programming is one of the areas where over-optimization is not frowned upon.

So after all the hard work, the program was working fine and it could play a decent game of chess (given enough time of course).

Translating the source code

We were just about being happy at how the project was going on when it hit us – the college put a restriction that all projects must be done using Java. We had to port the application to Java soon.

Rewriting the whole application from scratch seemed time consuming and demotivating. We needed a faster way. What about converting the source VB6 source code to Java code automatically? Of course for this to work we would have to write a VB to Java language translator. It seemed too difficult considering the fast approaching deadline for project submission. We downloaded some code translation software from the internet and tried them out, but none of them seemed to work perfectly. Converting a complex VB6 UI having a fair amount of custom animations to Java is almost impossible even for an advanced translator.

Then we had an idea. Regex!

Of course Regular Expressions are not ideal for parsing source code of any kind. We had no time, so we decided to try anyway.

In the case of any decent chess playing program, there are two basic parts: a chess engine and the UI module. The chess engine does all the complex calculations – recording the user moves, finding out whether a move is valid or not, thinking about the best move for the computer etc. The UI module displays the chess board to the user and allows the user to make a move using the mouse. Now as you may have already inferred, the UI part is not very translatable between VB6 and Java. We decided to develop the UI from scratch in Java.

The interesting thing about the chess engine is that it contains lots and lots of calculations, conditions and loops which help it to make an intelligent move. I should probably note that a chess engine cannot be considered intelligent as such. A chess engine crunches millions of number very fast and finds out the best move the computer can make against a human opponent. The moves may look intelligent to a human player, but a chess engine cannot be regarded as an example of an AI engine. It is just a number cruncher beneath its layers. Anyway the point is that we had hundreds of pages of code that consisted purely of algorithms which aided the computer in playing a better game of chess.

As we found out, converting these algorithms from VB6 code to Java was not that hard as it seemed. We employed the powers of Regular Expressions to do that. Here is a glimpse of what we did.

1. Replaced:

If with if(
For with for(
Then with {
Else with } else {
End if, Next, End Function etc with }
True with true
False with false
And with &&
Or with ||
Mod with %
Exit For with break;
= in conditional statements with ==

etc.

2. Added ; (semicolon) at the end of lines which did not start with any of the above keywords.

3. Changed the array referencing code. Something like Board(x,y) in vb6 code became board[x][y] in Java code.

4. There were some complex conditional statements in the VB code that we thought would be impossible to convert to Java using Regex. We had to convert these manually.

5. Converted looping statements of the form

For i = 0 To NumMoves - 1 to

for(int i = 0; i < numMoves-1; i++){

6. Variable declarations of the form

Dim index As Integer to

int index = 0;

There was more of this kind.

Of course it was not all Regex stuff. It is impossible to get it right with Regex alone. The small program we wrote to translate the code read each line of VB code separately and converted it to Java, mostly using Regex. The whole translation (Regex replacement) was done in several passes. We never implemented a full-blown translator or language parser.

Even after this entire Regex circus the translation was not complete. We had to read through most of the code and change a lot of things to get it working correctly. There are many differences between the languages (like array indexes started 0 in Java and 1 in VB6) and those needed to be taken care of. It did take us a bit of effort to get the details correct, but finally we completed converting the code to Java with much less effort than completely rewriting it.

Please don’t let this article make you think that I advocate using the Regex for this type of work. I don’t.

#1More on that later. Remind me if I forget. I promise that this will change the life of many undergraduates. ;)

#2Another good way is to pit it against better chess engines.

#3I included the word “AI” there just so that I could write this note. Chess playing algorithm cannot be considered Artificial Intelligence. Computer chess is a number crunching problem. Implementing the algorithms in the most efficient manner is the best way to get good results. There is no intelligence involved (compared to real AI techniques such as Neural Networks).

#4It is limited by the computing power you have. On an average the number of valid moves you have for each chess move is about 30. If you are going to search 8 moves ahead, you will have to search through 30^8 nodes, which is a very large number.

10 responses so far

The perfect compression algorithm

Jan 09 2010

A story from my college days.

I stormed in with full excitement to the room of two of my friends. I had found the perfect compression algorithm.

“I found a compression algorithm that can reduce the size of any file!”

“That is impossible. What is your algorithm?”

“I cannot share the algorithm right now, but I will tell you once I write a program to compress files.”

“Don’t you know that it is impossible to write an algorithm to compress all files?”

“Yes, but… now I have this algorithm!”

So what was my algorithm?

1. Consider the text to compress as a large number.

2. Find the integer square root of the number.

3. The square root and the error form the compressed text.

For example, let us compress the number 98653784578.

Integer square root of  98653784578 = 314092

Error = 114 (i.e. 98653784578 – 3140922)

314092 and 114 form the compressed text. We just saved 2 characters using our algorithm!

If the original number is of length n, then the square root and the error will be of maximum lengths n/2 each. So this will mean that the final compressed text will be of a maximum length n and possibly even smaller.

Can you find the flaw in the logic?

Looks like I was not the only one who invented a perfect compression algorithm. At least I did not spend 12 years to understand that perfect compression is not possible and I did not get publicly humiliated (not until this post, but who is reading this anyway ;) ). There are various proofs for why perfect compression cannot be achieved.

10 responses so far

Finding the number of votes

Jan 09 2010

A little bit of math fun.

You visit this cool website and see a poll there. You want to know how many people voted in the poll. The number of people voted is important if you want to validate the results. For example, look at the following result:

If you don’t know the number of people voted in this poll, you will be left wondering what this result means. Does it mean that most people think IE6 is the best web browser? Or is it just that a single person voted in favor of IE6? You can calculate the number of people voted in the poll:

1. Note down the current status of the poll:

2. Cast your vote. Note down the new results:

In the above case I voted Yes. Here is how we calculate the number of people voted:

Let x = % of Yes votes before I voted

Let y = % of Yes votes after I voted

Total number of people voted = (100 – x) / (y-x)

In our example, number of people voted = (100-67)/(75-67) = 4

Remember that the result counts your vote too.

Many websites do not want you to know the exact number of people who voted in the poll (Mostly because there are not many votes). This is the reason why they allow you to see the results only after you have voted (Another reason for this is of course to make you vote).

Usually you can get around this by opening a new session to the server (by clearing your cookies etc) and voting again. Note that if there is no change in the percentages after you vote, it just means that a lot of people participated in the poll  and it is difficult (not impossible) to get the number of votes.

No responses yet

Elegant logic puzzles

Dec 25 2009

Nick Yee on elegant logic puzzles:

…an elegant logic puzzle is one that can be told to anyone age 10 and up and doesn’t rely on gimmicks, but always seems impossible to anyone when first told. In other words, the problem is tough but the solution is satisfactorily simple once explained. The solution doesn’t involve a person or tool that wasn’t explicitly stated in the problem itself…

Then he goes on and asks you two elegant logic puzzles. For a curious mind, both of them are extremely rewarding to muse over and the solutions are clean and elegant too. Go solve them if you haven’t already.

No responses yet

Get cached images from your visitors

Dec 12 2009

Jeff Atwood (Coding Horror fame) was in for a horror when he realized that his server crashed and his data was gone and due to some reason, the backup mechanism was not working. The complete data in Coding Horror and the StackOverflow blog disappeared.

Since his blog is very popular, many archiving systems including the Google cache have copies of the pages and I hope that they have by now recovered the complete textual data. The biggest problem in this case is getting back the images. There are not many archiving services that may have the complete backup of the images in the website.

So what should Jeff do now?

Since Coding horror is a high traffic blog, I think there is a way to get back at least some of the images. (The probability of this working depends a lot on the traffic to the website and a bit of luck)

Here are the steps:

  1. Configure the web server to return 304 for every image request. The HTTP status code 304 means that the file is not modified and this means that the browser will fetch the file from its cache if it is present there. (credit: this SuperUser answer)
  2. In every page in the website, add a small script to capture the image data and send it to the server.
  3. Save the image data in the server.
  4. Convert the pixel data to get the original images.Voila!

Capturing the image data

We are going to use the Canvas functionality in HTML 5 to get back the image data.

Here is the code you should insert into the pages of the website. It gets all the images in the current page, loads it to the HTML Canvas, gets the pixel data for the image and sends it to the server through an Ajax post.

This PHP script (Can PHP rescue Jeff? ;) To be fair, the server side code is trivial) saves the data to files in the server. Note that the files themselves will not be images, they will just contain the pixel data of the images. In addition to this, we are also saving the original file name and the image dimensions. This means that we can easily reconstruct the original images from this data. Data from every visitor is saved in a different file to just to make sure that you have enough redundancy (Watch out for his redundancy filling up your server disks)

Remember that this is a proof of concept code. You will have to modify it to use it in regular production environments and to get some real use from it. There are many limitations to this code. It goes without saying that you will get the image data back from the users only if they have the images cached in their browsers. This script will work only in the latest versions of Chrome, Firefox, Safari, Opera etc. (Don’t ever expect it to work in IE for the next decade). In addition to this, remember that the pixel data will be many times bigger than the original file size and you may have to carefully analyze the disk space usage of this script. (I guess in an emergency, none of these really matters).

You should edit the post URL in the script to match your domain name.

Finally, I have tested the code and it seems to be working (for me, at least). You need to include JQuery in the pages using this script and remember that due to security restrictions in the browsers, you will have to place all these files under the same domain name. Please tell me if there are any other flaws in the code.

[Updated: code changes to reduce the file size by 50%. The decimal numbers were converted to hex and the spaces in between the numbers removed. The file sizes can be further reduced by using the full character set.]

22 responses so far

« Newer posts Older posts »