Monday, November 19, 2012

Baseball ELO fun

This is mostly just a placeholder for now. I ran all the game results for the 2012 season through a simple ELO generator I wrote. Here are the preliminary results:

American League      National League 
OAK 1320.1           ATL 1266.43
BAL 1286.49          SF  1266.01
LAA 1274.66          WAS 1261.66
TB  1271.04          CIN 1238.35
NYY 1270.66          MIL 1230.31
DET 1249.01          PHI 1226.64
TEX 1237.41          SL  1223.88
SEA 1198.28          LAD 1219.55
CWS 1189.35          SD  1216.5
TOR 1175.02          ARI 1188.84
KC  1164.9           NYM 1139.73
MIN 1126.35          COL 1129.06
CLE 1116.02          FLA 1121.09
BOS 1105.49          PIT 1116.45
                     HOU 1089.87
                     CHC 1080.85

These numbers are a straight win/loss ELO with a K-factor of 15 and nothing else taken into account. I'm going to try to find time to make some sensible adjustments for things like home/away advantage, and possibly scale the K-factor as the season progresses.
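For anyone curious what a straight win/loss ELO looks like in code, here is a minimal sketch. It is not the generator used for the table above, just the standard update rule with a K value of 15. The 400-point logistic scale is the usual chess convention and is an assumption here, as are the function names. The starting rating of 1200 is also an assumption, though the 30 ratings in the table do average out to 1200, which is what a zero-sum update starting from 1200 would give.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the standard logistic ELO curve.

    The 400-point scale is the usual chess convention (an assumption here).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(ratings, winner, loser, k=15.0, start=1200.0):
    """Apply one game result in place and return the points exchanged.

    Only the win/loss outcome is used: no home/away adjustment and no
    margin of victory. Teams not seen before begin at `start` (assumed).
    """
    r_win = ratings.get(winner, start)
    r_lose = ratings.get(loser, start)
    # The winner scored 1.0; it gains K times the shortfall vs. expectation.
    delta = k * (1.0 - expected_score(r_win, r_lose))
    ratings[winner] = r_win + delta
    ratings[loser] = r_lose - delta  # zero-sum: the loser gives up the same amount
    return delta


if __name__ == "__main__":
    # Hypothetical results for illustration only, not real 2012 games.
    games = [("OAK", "BOS"), ("OAK", "BOS"), ("BOS", "OAK")]
    ratings = {}
    for winner, loser in games:
        update_ratings(ratings, winner, loser)
    for team, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{team} {rating:.2f}")
```

Because every game moves the same number of points from loser to winner, the league average never changes. Beating a higher-rated team also moves more points than beating a lower-rated one.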

Nate Silver did something very similar a while back, which can be found here: http://www.baseballprospectus.com/article.php?articleid=5247. As far as I could find, it hasn't really been kept up to date, and there may now be some additional objective measurements that could be used to improve it.

Data from the one and only Retrosheet:
http://www.retrosheet.org/gamelogs/index.html

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Edit: charts and graphs as promised

Charts and Graphs:

Distribution

ELO Distribution grouped by 25 points
The first graph is a distribution graph. In games like chess we'd like to see a nice bell shape, indicating a normal distribution with a few really good players, a few really bad ones, and the majority somewhere in the middle. For baseball, though, that doesn't quite work. There could very well be 50% of the teams that are head and shoulders above the rest and evenly matched amongst themselves. Trading stars away, or not signing them, in hopes of building for the future can leave a significant number of teams near the bottom as well.

Method

I initially did 2 distribution graphs, the first grouping the teams by 50 points and the second grouping them by 25 (shown above). While the first one looked like a bell curve with a chunk taken out of it, the second is much more interesting, as it shows the divide better. Each group starts at the indicated ELO and includes all teams from that level up to, but not including, the next one, so a team with 1149 would have been counted in the 1125 group in the above chart.
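The grouping rule is easy to get off by one, so here is a small sketch of it. The function names are made up for illustration, and the chart itself may well have been produced some other way. Each rating is floored to the nearest multiple of the group width, so a group covers its label up to, but not including, the next label.

```python
import math
from collections import Counter


def bin_floor(rating, width=25):
    """Return the lower edge of the group that `rating` falls into."""
    return int(math.floor(rating / width) * width)


def group_ratings(ratings, width=25):
    """Count how many ratings land in each group, keyed by lower edge."""
    counts = Counter(bin_floor(r, width) for r in ratings)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # The top five American League ratings from the table above.
    al_top = [1320.1, 1286.49, 1274.66, 1271.04, 1270.66]
    print(bin_floor(1149))            # the example from the text: 1125
    print(group_ratings(al_top))      # groups of 25 points
    print(group_ratings(al_top, 50))  # the coarser groups of 50 points
```

Changing `width` from 25 to 50 reproduces the coarser of the two groupings, which is a quick way to see how much the apparent shape depends on the group size.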

Analysis

There really seem to be 2 or 3 groupings of teams: some (~9) bad teams, some (~8) average teams, and a lot (~13) of good teams, including one outstanding team, Oakland. This is more or less what I would have expected.

More to come...

Here's a sneak peek for AL East fans (you can use the slider on the bottom, or zoom in and out in the top left):