Pinball Rating (Elo) - New England Pinball League

timballs · October 14, 2015, 4:08am

Hi folks. I thought I’d share something I’ve been working on recently.

I’m playing in the New England Pinball League for the first time this season, and I wanted to see if I could use the data they have from the past few seasons on http://nepl.league.papa.org to rate players based on true head-to-head results. Currently, the IFPA rates players based on simulated head-to-head results.

The format of a New England Pinball League meet is the following:

Players are placed in a 4 or 3 player group. They play a total of 4 games against all the other members of their group. Each player receives a certain number of points based on their standing after each game is finished (usually 5-3-2-1).

If a player plays all 8 weeks in a 4 player group, they potentially have 8 weeks * 12 head-to-head matches against individual players in a regular season, for a total of 96 games. In Season 5 of the New England Pinball League, Mitch Curtis played 80 head-to-head games (a 4 player game counts as head-to-head against 3 opponents) and came out with a record of 55-25 on the season. Mitch also wound up in 1st place for the league. Under the IFPA rating system, Mitch’s record is 67-0, where he defeated every opponent 1 time. Winning over 2/3rds of your games is a huge feat in most sports with considerably long seasons. Going undefeated over 67 straight games? Nearly impossible.
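As a rough sketch of the counting above, one game's finishing order can be expanded into pairwise head-to-head results. The function name `head_to_head` and the player labels are made up for illustration; this is not the actual scraping code.

```python
from itertools import combinations

def head_to_head(finish_order):
    """Expand one game's finishing order (best first) into (winner, loser) pairs."""
    # combinations() keeps the input order, so the first player in each pair
    # always finished ahead of the second.
    return list(combinations(finish_order, 2))

# Hypothetical 4 player game, listed 1st place to 4th place.
game = ["A", "B", "C", "D"]
pairs = head_to_head(game)

# Each player shows up in 3 of the 6 pairs for a 4 player game.
results_for_a = [p for p in pairs if "A" in p]

# 8 weeks * 4 games * 3 opponents per game
max_results_per_season = 8 * 4 * 3
```

A 3 player group only produces 2 results per player per game, which is one reason a full-season player can end up below the 96-game maximum.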

In the case of a league like SuperLeague, where playing a full month’s entry will guarantee you finish ahead of hundreds of players, the rating can become very skewed and artificially inflated. The skewing of rating is obvious because of the simulated victories. The inflation comes when a walk-in plays 1 game, is assigned a rating of 1200, and instantaneously loses to 300+ players when results are processed.

Of course, the task of finding the true rating of every player by taking every qualifying result and every head-to-head result is very difficult. That’s why I’m just doing this for fun.

So, without further rambling, here are some numbers!

Google Sheets link to Season 5 NEPL with Elo rating

This is a very simple, no-frills implementation of Elo. To start, every player is assigned a rating of 1200. I use a constant (the K-factor) of 30. Player ratings do not change over time and are only affected by results.
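For anyone who hasn't seen Elo before, a minimal sketch of a single head-to-head update looks something like this. It assumes the constant of 30 is the usual K-factor and uses the standard 400-point logistic scale; neither detail is spelled out in the spreadsheet, and the function names are only for illustration.

```python
START_RATING = 1200  # every player starts here
K = 30               # assumed to be the Elo K-factor

def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, a_won, k=K):
    """Return the new (rating_a, rating_b) after one head-to-head result."""
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating in the pool is unchanged.
    return rating_a + delta, rating_b - delta

# Two new players meet and A wins.
new_a, new_b = update(START_RATING, START_RATING, a_won=True)
```

Between two equally rated players the winner gains half of K, so 15 points here. How the results within one meet are ordered (one at a time versus all at once from the pre-meet ratings) is a separate choice this sketch does not make.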

NEPL has 5 more seasons which I will be adding shortly. When they are up, I will add them to this forum post.

And here’s a chart!

This chart shows the average Elo of players each week based on which division they qualified in. Thought it was interesting to see how close the bottom 3 divisions were compared to the top 2.
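The numbers behind a chart like this can be produced with a simple group-and-average step. This is only a sketch; the `(week, division, rating)` row layout and the sample values are hypothetical, not the real spreadsheet data.

```python
from collections import defaultdict
from statistics import mean

def average_by_division(rows):
    """rows: iterable of (week, division, rating) -> {(week, division): mean rating}."""
    buckets = defaultdict(list)
    for week, division, rating in rows:
        buckets[(week, division)].append(rating)
    return {key: mean(vals) for key, vals in buckets.items()}

# Hypothetical sample rows, one per player per week.
rows = [
    (1, "A", 1300.0), (1, "A", 1260.0),
    (1, "B", 1190.0), (1, "B", 1210.0),
]
averages = average_by_division(rows)
```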

On my short list for more analysis:
Rest of NEPL Seasons.
PPL Seasons (same format)
Pinburgh results (can’t get true head to head results, but can estimate expected win/loss every round.)

Would love to hear any comments, suggestions, and ideas!

haugstrup · October 14, 2015, 3:43pm

This is super fascinating! I’ve got a database of 3600 match results for the San Francisco League (will be 4000 results by December). This makes me want to calculate ratings for all our players as well

haugstrup · October 14, 2015, 3:44pm

I’m math-challenged so the answer might be obvious, but… why Elo instead of Glicko (the ratings system IFPA uses)?

timballs · October 14, 2015, 5:43pm

Glicko is probably a better rating system, but a little more complicated. Pretty much my choice of Elo for this came down to how simple it is to implement.

With Elo, since everything is an absolute value, it’s easy enough to find the expected # of wins against any opponent when you have the true results. When you get to Pinburgh where you have only a player’s record to go on, you can compare a player’s actual win/loss to an expected win/loss based on the sum of expected wins and losses to every opponent.
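Here is a small sketch of that expected-versus-actual comparison. The ratings and win count are invented, and the standard 400-point Elo curve is assumed.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def expected_wins(player_rating, opponent_ratings):
    """Sum of win probabilities against every opponent faced."""
    return sum(expected_score(player_rating, r) for r in opponent_ratings)

# Hypothetical round: a 1200 player faces four 1200-rated opponents.
opponents = [1200, 1200, 1200, 1200]
exp = expected_wins(1200, opponents)

# Only the record is known, not who beat whom.
actual_wins = 3
over_performance = actual_wins - exp  # positive means better than expected
```

The rating change can then be driven by `over_performance` instead of by individual results, which is the part that makes a record-only event usable.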

In the future I’ll try to add more rating systems and hopefully some cool tools to compare players.

bkerins · October 14, 2015, 6:46pm

Glicko is better but harder to compute and has a time dependency, so the dates of all the matches need to be known.
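To show where the dates come in: in Glicko, a player's rating deviation (RD) grows with the number of rating periods since they last played, capped at 350. A minimal sketch of just that step follows. The default `c` is only an illustrative value, not one tuned for pinball, and this is not the PARS or IFPA code.

```python
import math

MAX_RD = 350.0  # RD assigned to an unrated player in Glicko

def inflate_rd(rd, periods_inactive, c=34.6):
    """Grow the rating deviation for time away, per the Glicko formula.

    c controls how quickly uncertainty returns; 34.6 is illustrative only.
    """
    return min(math.sqrt(rd ** 2 + (c ** 2) * periods_inactive), MAX_RD)

# A well-established player (RD 50) who sits out 10 rating periods.
rd_after_break = inflate_rd(50.0, 10)
```

Without match dates there is no way to know `periods_inactive`, which is why Elo is easier to run on data that only has results.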

Want some Glicko code? I’ll happily send it to you. It’s in Pascal. Kevin Martin wrote the version for PARS, which has the same formulas.

timballs · October 14, 2015, 8:37pm

Never used Pascal but sure! What I’ve got right now is a bunch of spaghetti Python code. Even with HTML parsers it can be quite messy to parse some of this stuff (why are there trs inside trs?)

There is information for the week when these matches are played. If I’m going to use a player’s Elo across multiple events (Pinburgh and PPL for instance) I will need some sort of time dependency.

Anyhow, probably won’t make much more progress on this until after Expo

haugstrup · October 14, 2015, 8:46pm

Same thing, I am interested in seeing the Pascal implementation even if I don’t know where to get a Pascal runtime

Tim, there’s a python lib for you here: https://code.google.com/p/pyglicko2/ (and some additional implementations in other languages at http://www.glicko.net/glicko.html )

joe · October 14, 2015, 8:50pm

This is cool.

The League Manager software has an unpublished API (HTTP/JSON) for lots of player and machine stats… I’d been holding off publishing it (read: documenting it) until 1) I was sure the v2.0 software was stable, and 2) I finished some other more important tasks. But both of those concerns are well in the rear-view mirror, so I’ll look into doing this soon.

timballs · October 14, 2015, 9:06pm

Cool! I really like the features that the league manager software has when it comes to keeping track of scores and all results.

I believe that there can be some cool stuff to come from looking at scores on individual games, eras, and stuff to find trends that may not be obvious just from observation. It’s nice since the league manager software does have all that data already in there.

One potential use for this information is to look at the strength of groups and see who has the easiest or most difficult time qualifying. I’m particularly interested to see how this will look in Pinburgh groups, and whether there is perhaps a strategy where deliberately losing makes you more likely to win. Not that I would ever be able to play that way; based on my Pinburgh performance, I was losing plenty by trying my best.

bkerins · October 14, 2015, 9:29pm

FYI I recommend Glicko and not Glicko-2, which adds some weird parameters designed to deal with things that don’t happen in pinball (people playing a ton of matches continually, like they might in an online fighting game).

haugstrup · October 14, 2015, 9:43pm

So I have to write my own implementation anyway. Oh well, should be a fun project

spraynard · October 15, 2015, 12:12am

Here’s an R package that does Elo and Glicko

Haven’t used it yet, but looks pretty straightforward if you know R

spraynard · October 15, 2015, 12:47am

Any opinions on the Stephenson rating system? Apparently it beats out Glicko in terms of accuracy.

bkerins · October 15, 2015, 2:44am

I think Stephenson is rather specifically tuned to chess; it tries to include the “home field advantage” effect of playing as white by adding a parameter determining that home-field advantage for each player. It would be possible to generate pinball ratings using Stephenson instead of Glicko; realistically there is not much difference between them, and much of what Stephenson does well is at the very high end of the rating scale (using the rating system as a ranking of the top 15-20 chess players).

The formulas are publicly available (and most are the same as Glicko), so if somebody wants to try it, go for it!

timballs · January 4, 2016, 3:01am

I’ve updated the tool so I can now run all league manager-run leagues from start to finish. Spent a lot of time making this spreadsheet to try and find a good way to run the results. @bkerins is considered the strongest player in the league and currently has the highest Elo of any league player ever.

Spreadsheet with full NEPL results (seasons 5 - 10) here.

Link to full sheet

timballs · January 4, 2016, 3:11am

I can run PPL with this tool and will do so soon if anyone from PPL is interested. FSPA seems to run a little differently so I will need to change the tooling to work with it. I’m not sure if there are any other leagues which run with the league manager software. @joe ?

joe · January 4, 2016, 3:49am

Oh, there are around 30 leagues using the League Manager system. I’m not sure what data you’re scraping… The Meet Results have some variation across league formats, while the Stats pages are more uniform.

joe · January 4, 2016, 3:51am

BTW, at Bowen’s request I’ve got some work in progress to add ratings to the Meet Standings for NEPL… I’m just behind on finishing that up.

timballs · January 4, 2016, 3:55am

Scraping meet results so I can get the week-to-week head-to-head results, but I’m using an HTML parser to do so.

joe · January 4, 2016, 5:17am

@timballs, you might find this URL much easier to handle for your needs:

http://nepl.league.papa.org/stats/6/8-8?sections=winLossByPlayer

The “6” in the example is the season number… the “8-8” is the first and last meet number you want incorporated in the stats, but if they’re the same, you get the results for just that meet… so that URL will give you head-to-head results for just season 6, meet 8.
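The URL pattern described above could be wrapped in a small helper like this. It is only a sketch: the function name `stats_url` is made up, and it only builds the string without fetching anything, since the response format isn't described here.

```python
def stats_url(season, first_meet, last_meet=None,
              section="winLossByPlayer", host="nepl.league.papa.org"):
    """Build a stats URL for one season and a range of meets.

    If last_meet is omitted, the range is a single meet (e.g. "8-8").
    """
    if last_meet is None:
        last_meet = first_meet
    return f"http://{host}/stats/{season}/{first_meet}-{last_meet}?sections={section}"

# Head-to-head results for just season 6, meet 8.
url = stats_url(6, 8)

# One URL per meet of an 8-week season, for week-to-week processing.
weekly_urls = [stats_url(6, meet) for meet in range(1, 9)]
```

Whether other leagues use the same path layout on their own hosts is an assumption to check before changing `host`.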

BTW, there’s also an API (HTTP/JSON) lurking in the system that will be far nicer than HTML scraping for stuff like this… I just need to finish up some testing on it and, more painfully, write up some documentation… but it’s coming… so many projects, so little time!