2012-05-30

2012 Google Code Jam, Round 1B

In the second part of Round 1, with 1000 of the qualifiers ineligible due to already advancing, 3273 people scored points. Only the top 1000 would advance. This put the cutoff at 27 points in 1 hour 19 minutes, or 28 points or more.

I first attempted Problem A: Safety in Numbers.This problem is framed as a reality show that is apparently similar to Dancing with the Stars. Here are the bare facts, with a longer description available at the link.

  • There are N contestants
  • Each contestant gets a score from the judges from 0 to 100
  • Based on audience voting, the second part of the score is determined by multiplying the percentage of audience votes received by the total points awarded by the judges.
    • For example, if there are 2 contestants, scored 20 and 10 by the judges, they will split an additional 30 points based on the share of audience votes, so the final score may fall anywhere from 50-10 to 20-40.
  • In case of a tie for the lowest score, no one is eliminated that week, otherwise the lowest is eliminated.
The judges' scores are given, the problem is to find the minimum percentage of audience votes required to guarantee the contestant is saved no matter how the rest of the audience votes are distributed. In the 20-10 case above, the answer is
33.33 66.67
because if the first contestant gets 1/3 of the vote, they'll get 10 more points for a total of 30, and the second contestant can't beat him, but can only tie him. The same logic applies for contestant 2. a 2/3 vote would give him 30 and make him unbeatable (but tie-able, this is important to remember but I'll stop saying it now).


I struggled for awhile with this problem, first thinking I just needed to split the audience vote between the each contestant and the worst judged contestant, which works in the 2 contestant case.  Another sample set was:
24 30 21
with a stated answer of
34.666667 26.666667 38.666667
This one flustered me because my algorithm said that giving 34.6% of the 75 audience points to #1 yielded a total of 50 points, and giving the other 65.3% to #3 yielded a total of 70, meaning #1 would lose. After a few minutes, I smacked myself in the forehead and realized #1 would be safe, because #2 would be stuck at 30 in that case. This led me to a closer solution, that each contestant needed enough audience votes to get to 1/N of the total points. So for this case with 75 judge points (and 75 audience points), that meant percentages that led to 50 total points would be the correct answer.

There was one more subtlety, what if a contestant already had more than their 1/N share without any audience votes at all? For example:
45 1 1 1
In this case, there are 96 total points, so each one would need 24 based on the previous assumption. But giving 25% to #2 would yield for him 13 total points, over the required 1/N, but #3 and #4 could each get 37.5% and beat him. I modified my algorithm again to ignore any contestants, and their points, if they were already over the 1/N threshold. The rest of the contestants would need to reach, in this case
1/(N-1) * (96 - 45) = 17 points
This yielded an answer of
0 33.3 33.3 33.3
which makes sense. Contestant #1 is already safe, and the others are on even footing, so splitting the points 3 ways keeps them all safe.

I used this algorithm to solve the easy problem, but as it turns out, it wasn't exactly correct. There were more subtleties around having lots of point values near the 1/N cutoff, and still more that were higher than the new cutoff of 1/(N-1) of a lesser value. As a result, my large solution failed. The problem author suggested using a binary search in the post-contest analysis, which was way off from what I did.

I spent lots of time on Problem A, so then decided to go straight for Problem C: Equal Sums, as it was worth a few more points than problem B if solved in full, and I thought I might need them.

This problem has a very simple setup, and a very complex answer. Given a set of integers, find two subsets with the same sum. The small input would be a set of 20 positive integers under 10,000. I actually iterated over all subsets, at least until I found an answer. This meant for each set, I could loop over up to 2^20, or 1,048,576 subsets if none had a matching sum. I wasn't sure that was feasible in the 4 minutes allowed, but it wound up working.

For the large input, the set had 500 integers under 1,000,000,000,000. This meant using the same algorithm was likely not an option, as 2^500 is a very large number of subsets to loop over, and I certainly wouldn't be able to store all of the sums in the case that I had to get all the way to the end and find there were no matches. I did some searching online and found that this is roughly equivalent to an NP-complete problem, the Subset Sum problem. I was very low on time by this point so I just modified my original code to loop over all 2^500 solutions, after all, I had 8 minutes. I ran it, and it finished very quickly, causing me to question if it really did what it was supposed to. I had no time to check as the clock was winding down, so I submitted the large input without looking at it.

Just a minute or so later I found out it was wrong. There was another bug in my code that reads the input, as it can't handle numbers greater than 32 bits. Once again, at least I've found something to fix for next year. If you want an in-depth explanation of the solution to this problem, I highly suggest reading the contest analysis for it, but in a nutshell, because of the size of the numbers and the sheer numbers of available subsets, there must be matching sum subsets, and lots of them. It is actually sufficient to choose random 6-number subsets until two with matching sums are found. It seems crazy, but that's math for you. I've actually gone back to read the "Rigorous proof" section of the analysis several times, and each time it makes a little more sense to me.

In the end, I finished with a paltry 16 points, putting me in 2262nd place for this round. Round 1C was the next day at 4am local time, since the Code Jam tries to scatter the start times to be fair to the whole world. I opted not to wake up for that one, so that ends my Code Jam journey this year. 

1 comment:

  1. Good effort, though. Problems like these are hard enough just to wrap your head around, not to mention coding them.

    ReplyDelete