Archive for: February, 2010

Wanted: Superman

Feb 08 2010 Published by Niyaz PK under Programming

Some time back, I could not resist sending the following reply to this person at the recruitment agency:

Dear Sandhya,

Is this a joke? Did anyone really go through the so called assignments before forwarding them to me? Are they asking me to build two full-fledged websites in 12 hours? Best of luck finding candidates who can do that.

BTW, if you can get candidates to build these sites for you, why do you have to recruit them? I have to admit, this is an excellent way to get your works done for free.

I am sorry. I genuinely not interested in working for such a stupid company.

(Thanks for contacting me with the offer though. Really appreciate that. Let me know if there are openings in companies which are not looking for superman as their programmer.)

- Niyaz

I am refraining from attaching the two assignments they asked me to complete before the interview. Not because I don’t want to expose the actual company that tried to trick me into building the websites for free, but because when you see the requirements which will make Facebook ashamed of themselves, it will make you sick for the rest of the day.

4 responses so far

The flow of PageRank

Feb 02 2010 Published by Niyaz PK under Internet, Math

You may be familiar with Google’s PageRank technology. Google considers lots of variables to calculate the PageRank of your website. This is a discussion on an extremely simplified version of PageRank.

Assume that we rank the websites by the number and quality of incoming links. The quality of an incoming link is defined as a function of the PageRank of the site which link to the other.

Let us take an example. The following figure shows how a small group of websites link to each other.

Note that website F does not have any incoming link while website G does not have any outgoing link.

Now from the given graph of links we have to find out the (relative) PageRank of each of the websites. Initially we will assume that all the pages have the same PageRank. Now we count the number of incoming links to each site and change the PageRank according to the number of incoming links.

We define PageRank of site A as:

PR(A) = ? PR(x)/L(x)

where L(x) = number of outgoing links in site x

and x denotes the sites linking to A.

When you run this algorithm for the first time, the PageRank of all the pages get updated. Now the problem is that since the PageRank of all the incoming pages have been updated, we have to re-calculate the PageRank of the pages again to take the new PageRank values into consideration. (You can predict this problem just by noticing that the function is a recursive one.) The same problem surfaces in every iteration of the algorithm.

The question is that if the PageRanks change in every iteration, how do we know when to stop the iteration? Do the PageRanks ever stabilize? (The proper term is convergence).

Here is a python script to simulate the PageRank calculation many times over and over to find out whether the values converge or not. The output values are represented as percentages. (Google considers this value as the probability of a person visiting any particular website).

The chart below shows how the PageRank changes after each iteration:

As you can see the PageRanks fluctuate highly in the initial iterations and then they stabilize. This means that the PageRank function converges.

Another think to note is that adding more nodes to the graph did not seem to affect the convergence. Even if you double the number of sites in the collection, the number of iterations taken to converge stays almost the same. Others have also reached the same results (ppt). The PageRank function is analogous to electric current flowing through a mesh. Even if there are a lot of nodes and sources, the current flow stabilizes (and stabilizes really fast).

Also note that site D has the highest PageRank, which is to be expected as it has the most incoming links. Site F has the lowest PageRank because it does not have incoming links.

According to this algorithm, linking out to other sites do not reduce the PageRank of your website. There is a problem though. Take the case of site G. It does not link to any other site. This means that the PageRank is not flowing out of site G to any other site. If site G linked to other sites, it would have increased the PageRank of the other sites by a tiny bit. (This case affects only the first link out of any site). To solve this problem, Google divides the PageRank of sites like these (called sinks) to all other sites. You may also want to read about Damping factor.

Before leaving can you explain why the PageRank of site A is greater than that of site B?

5 responses so far