October 05, 2005

Medal Predictions Part II — The Per-Capita Olympics

Hmmm, well, somebody's asking for the story about medals-per-capita. Here it is, but remember, you asked for it!

In early August, 1991, I was in Havana — I don't recommend that, by the way, Havana in August — for the Pan American Games, which had really just begun but were already over for me (aside from a few consumable souvenirs and an impending stomach ailment), and I was watching television in the common room of our cinder block dormitory in the sweltering and barely completed athletes' village — while keeping one eye on the stairway in case Annie Pelletier walked past — watching television in Spanish, and muddling through because I had taken two semesters at SFU, and also because I had spent the previous four days attempting to hold a conversation with our interpreter, Rosa, and on the news I saw that the Pan American Games were magnificent, the Cuban team was unstoppable, and the Great Leader (who I had shaken hands with, incidentally) was omnipresent, and when the anchor discussed the early medal standings he had a bar chart in the graphic over his left shoulder, and he said excitedly that "Cuba holds a slight lead over los Estados Unidos" — as I said, it was early days, and Cuba had cleaned up in canoe/kayak — "with Canada a distant third," and I thought that it was nice that the home team got to be in the lead for a couple of days, but then he raised his voice almost to a shout, "pero if we consider the number of medals per million inhabitants," and the bar graph changed to show Cuba like an enormous phallus towering over second-place Canada and third-place USA, cringing in the shadow of the superpotencia cubano, and I had to admire that, making up their own scoring system for the glory of Cuba and (more importantly) making the United States of America look like a whipped dog.

The medals-per-capita argument was not invented by the Cubans, of course, and it is actually fairly popular; I have seen Canadians use it during the Olympics — here is a great example, with a table, that shows that Tonga won the 1996 per-capita Olympics, and here's another and another that popped up in 2004 — and recently my local paper carried an article purporting to show that Nova Scotia really "won" the Canada Games based on a per-capita medal table.

Any time that you reach the conclusion that Tonga is the world's leading Olympic performer, followed by The Bahamas, and that the US is only forty-first, you might want to sit back and think a little bit about your methodology. In this case, you're assuming that the expected number of medals won should (in some ideal world) scale linearly with population. (It's a bit worse than that, even; you are assuming that the expected number of medals is directly proportional to the population, but never mind.)

So why shouldn't the medal total scale linearly with population? Here are three reasons that I can think of.

  • Olympic representation isn't proportional to population. If it were, then one Tongan athlete (pop. ~100,000) would be matched by 3,000 US athletes (pop. ~300,000,000).
  • The supply of medals is finite. For the US to match Tonga on a per-capita basis (10 medals per million inhabitants in 1996) they would have to bring home 3,000 medals, which is more than the total number of medals awarded.
  • Medals are extremely scarce. Medals are awarded only for the top three finishers. Even if your country boasts the top 50 performers in the world in any event, you can still only get three medals at most. In some events you can only get one, no matter how deep your talent pool may be.
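To put numbers on the first two points, here is a minimal Python sketch of the per-capita extrapolation. It uses only the round figures quoted in the list above; the function and variable names are made up for illustration.

```python
# Round figures quoted in the list above; both populations are approximations.
TONGA_POP = 100_000
US_POP = 300_000_000
TONGA_MEDALS_PER_MILLION = 10  # Tonga's 1996 per-capita rate, as quoted above


def medals_needed(per_million_rate, population):
    """Medals a country would need to reach a given medals-per-million rate."""
    return per_million_rate * population / 1_000_000


# US athletes per Tongan athlete if team size were strictly proportional
# to population.
athlete_ratio = US_POP / TONGA_POP

# Medals the US would need to match Tonga's 1996 per-capita rate.
us_target = medals_needed(TONGA_MEDALS_PER_MILLION, US_POP)

print(f"US athletes per Tongan athlete: {athlete_ratio:.0f}")
print(f"Medals the US needs to match Tonga: {us_target:.0f}")
```

Both numbers come out at 3,000, which is the whole problem: a strictly proportional yardstick asks the big countries for more medals than exist.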

All of this suggests that there should be a law of diminishing returns; that is, that doubling your population should not lead to a doubling in Olympic medals, and in fact the larger your population is, the more you have to increase it for the same marginal benefit.

Figure 1: Summer Olympic medals (1996-2004) versus population, in three different presentations.

If you really must model Olympic performance with a single-parameter model, and if population has to be the single parameter, then you probably can't do much better than to assume a power law dependence. It still doesn't predict medal performance very well, mind you, but it's better than all of the alternatives. You can see what I mean in Figure 1 (inset). That plot shows the average number of summer Olympic medals won from 1996 through 2004 versus population. The details of the calculation are in the full caption for the enlarged figure, but here's the important point: a linear dependence is a terrible fit to the data, a logarithmic dependence is somewhat better, and a log-log (power law) dependence looks like something vaguely approximating a straight line. The "power" in the best power law fit turns out to be almost exactly 1/3, which is to say that the number of medals won is roughly proportional to the cube root of the population. If you want to double the number of medals you win at the summer Olympics, you need to increase your population by a factor of 8 (presumably without diminishing your per-capita resources, too). If you want to increase your medal total by 10%, then your population has to go up by a third.
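For anyone who wants to try this at home, here is a rough Python sketch of a power law fit done as a straight-line fit in log-log space, plus the "how much bigger do I need to be" arithmetic. This is not necessarily how the fit in Figure 1 was done. The function names are invented, and the data points are synthetic, generated from an exact cube-root law purely to check that the fit recovers the exponent. They are not Olympic results.

```python
import math


def fit_power_law(populations, medals):
    """Least-squares fit of log(medals) = log(a) + b * log(population).

    Returns (a, b) for the model medals = a * population ** b.
    """
    xs = [math.log(p) for p in populations]
    ys = [math.log(m) for m in medals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def population_factor(medal_factor, exponent=1 / 3):
    """Population multiplier needed to multiply the medal count by medal_factor."""
    return medal_factor ** (1 / exponent)


# Synthetic check only: points generated from an exact cube-root law.
# These are NOT Olympic data.
synthetic_pop = [1, 8, 27, 64, 125]  # millions
synthetic_medals = [5 * p ** (1 / 3) for p in synthetic_pop]
a, b = fit_power_law(synthetic_pop, synthetic_medals)

print(f"recovered exponent: {b:.3f}")
print(f"population factor to double medals: {population_factor(2):.2f}")
print(f"population factor for +10% medals: {population_factor(1.1):.3f}")
```

With an exponent of 1/3, the population factor is just the medal factor cubed, which is where the factor of 8 and the "up by a third" figures come from.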

I'm sure that my anonymous commenter is, by this point, regretting that s/he brought this up, but I might as well finish my thought here. Let's use this model to normalize the number of medals by population. We can look at recent history and see if my approach gives us anything more sensible than the per-capita analysis. We can compare nations on a population-corrected basis by dividing the number of medals won by the cube root of the population. Table 1 shows the resulting top 25, again using the average medal totals in 1996, 2000, and 2004 to construct the score. The scores are scaled by 100 just so that they look a bit nicer.

Table 1: Top 25 Summer Olympic nations, 1996-2004, by population-corrected medal score.

Country         Total Medals  Population (millions)  (Medals/3) / Pop.^(1/3) × 100
Australia       148           19.9                   18.3
Russia          243           144.0                  15.6
United States   301           293.0                  15.2
Germany         169           82.4                   13.0
Cuba            81            11.3                   12.1
France          108           60.4                   9.2
Italy           101           58.1                   8.7
Netherlands     66            16.3                   8.7
Hungary         55            10.0                   8.5
South Korea     85            48.2                   7.8
Romania         64            22.4                   7.6
Belarus         47            10.3                   7.2
Bulgaria        40            7.5                    6.8
Ukraine         69            47.7                   6.4
United Kingdom  73            60.3                   6.2
Greece          37            10.6                   5.6
China           172           1298.8                 5.3
Canada          48            32.5                   5.0
Norway          23            4.6                    4.6
Japan           69            127.3                  4.6
Spain           47            40.3                   4.6
Sweden          27            9.0                    4.4
Jamaica         18            2.7                    4.3
Czech Republic  27            10.2                   4.2
Poland          41            38.6                   4.1
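The score column can be recomputed with a few lines of Python. This is a sketch with an invented function name, using a handful of rows copied from Table 1 and assuming an exponent of exactly 1/3. The results may differ from the printed scores in the last digit, since the table inputs are rounded and the fitted exponent is only approximately 1/3.

```python
def population_corrected_score(total_medals, population_millions, games=3):
    """Average medals per Games divided by the cube root of population.

    With population in millions the factor of 100 is already built in,
    because the cube root of one million is 100.
    """
    return (total_medals / games) / population_millions ** (1 / 3)


# A few rows copied from Table 1: (country, total medals, population in millions).
rows = [
    ("Canada", 48, 32.5),
    ("China", 172, 1298.8),
    ("Cuba", 81, 11.3),
    ("United States", 301, 293.0),
    ("Australia", 148, 19.9),
]

# Rank from highest to lowest population-corrected score.
ranked = sorted(
    rows,
    key=lambda r: population_corrected_score(r[1], r[2]),
    reverse=True,
)
for country, medals, pop in ranked:
    print(f"{country:<15}{population_corrected_score(medals, pop):6.1f}")
```

The ordering of these five countries should match their order in Table 1, even where the last digit of a score does not.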

That's a fairly sensible list, it seems to me; there are no outrageous surprises here. The large-population Olympic powers — USA, Russia, China and Japan — all remain in the top 25. The small-population overachievers — Cuba, Hungary and others — are in the top 25, too. Two countries that come out on top in the per-capita analysis, The Bahamas and New Zealand, end up 31st and 29th, respectively, which seems reasonable. Tonga finished 61st in the three-games average, but would have cracked the top 40 in 1996 when they won their only medal.

I've glossed over a lot here, and I won't claim that this is the world's best predictive model for Olympic performance by population. But as an assessment of how well a country does for its size, it's a lot more reasonable than the per-capita method, and it isn't much more difficult to calculate. On top of that, it still makes Cuba look pretty darn good. And I think Rosa would like that.


Anonymous said...

Great analysis, Amateur. Your blogging is getting better every day!

Amateur said...

Thanks, Smithers. I have not been at this as long as you have, but like you I am aiming for some kind of maturation. It is nice to have some external feedback.

Amateur said...

Unnecessary complication: if you divide the average number of medals by the cube root of the population in millions then you don't need the scaling factor of 100. In other words,

148/3/(19.9^(1/3)) ≈ 18.3.

Host PPH said...

It is quite hard to understand Cubans if you don't have so much Spanish listening skills.