MP Ratings / Challenge Matches Open Thread


You aren’t tied. Germain has a higher rating due to some other play. It says that based on your current ratings, Germain would be 53% likely to win your next match.


You won 29 matches against Lonnie. Header is the base, each row is win-loss against that opponent. I’d love to hear better ways to present this information (that will fit on a phone screen, so a big matrix is impractical).

As Bowen says, the win probability is based on your rating and RD, not on the individual win-loss record. You can see the calculation on page 5 of the Glicko paper:

This also allows us to calculate win probability for players who have never played each other.
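
For anyone who wants to see it spelled out, here is a minimal Python sketch of the expected-outcome formula from the Glicko paper. It assumes Match Play applies the formula as published; the function names and the sample ratings are made up for illustration.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Attenuation factor: moves from 1 toward 0 as the rating deviation grows."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def win_probability(r1, rd1, r2, rd2):
    """Expected outcome for player 1 against player 2 under Glicko."""
    combined_rd = math.sqrt(rd1**2 + rd2**2)
    return 1 / (1 + 10 ** (-g(combined_rd) * (r1 - r2) / 400))


# Two hypothetical players who have never met each other
print(round(win_probability(1700, 60, 1500, 90), 2))
```

Only the two ratings and the two RDs go in, which is why head-to-head history doesn't matter. Larger RDs pull the result back toward 50%.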


Why do you suppose the ratings are so compressed? The bottom of the list has people above 1200, and the top only has one rating above 1900.

In the chess world, there are people at the very low end rated as low as 100. And on the high end you get Magnus Carlsen somewhere north of 2600.

My chess rating is somewhere around 900 now, which means I can beat most casual players I know, but get quickly crushed in a serious tournament setting. My son has played tournament chess for years, and has a 2050 rating, making him an Expert and putting him in the top 100 list for 16 year olds in the entire USA. His rating is more than double mine, and it’s absolutely representative of our relative skill level, in that I have about a 0% chance of beating him.

Anyway, why the compressed range? Seems weird to think of the top players only being rated in the 1800s, when in chess that would be a decent club player, nowhere near the top. Does the formula just need time to spread out the ratings?

One thing in chess that screws with the rating formula is “rating floors”. Put in to prevent sandbagging (intentionally lowering your rating to win $), the consequence is you have some number of players out there who by rule cannot get a lower rating. So when they lose games their rating stays the same while the opponent gains points.


I leave the chess/pinball comparisons to @bkerins. I just make websites, surely you don’t expect me to understand math!

Most Chess ratings are Elo, right? That’d make it an apples/oranges type situation.


I think it shows that chess is much more a game of skill than pinball.

I’m a better than decent chess player, but if I played a Grand Master I would lose EVERY single time.

However, if I played a game of pinball against any of the top players, I may not win every time, but I sure as hell wouldn’t lose every time either. That’s just the nature of pinball.


Except that one game of pinball is never considered to be close to the equivalent of one chess game.

Unless you even things out by playing enough balls/games that you’d actually make as many decisions as you would in one game of chess, it doesn’t really make sense to make that comparison, which is why tournaments are structured the way they are.

No one is going to bat an eye if you beat a top player on Paragon once. Beat them in cumulative points over the course of 20 games of Paragon? Now you’ve accomplished something a lot closer to one game of chess against a grandmaster.

See also: Number of players who have qualified for PAPA A by luck


So the fact I finished 4th out of 103 at a recent tournament (with a TGP of 85%, making it a ‘valid’ format), with 32 people ranked higher than me (including the world no.s 2 & 4) would be a comparison that works for you?

Or the following day finished 110th out of 203 with 41 people ranked higher than me?

Pinball tournaments/games are far more variable than chess tournaments/games. That’s why the ratings are so variable. It’s also the reason people enter them with at least a hint of a chance of winning them, or finishing significantly above other higher ranked players.

It’s the main reason why I love pinball in general, the volatility of any single tournament/game.


Chess ratings and pinball ratings are not directly comparable. Individual rating values only make sense within the distribution from which they were estimated. Just because chess rating values go higher than pinball does not mean that the chess players are better at chess than pinball players are better at pinball. Moreover, the values themselves are arbitrary, and very much determined by the starting parameters of the system (i.e., default ratings and ratings deviation). These parameters will define the entire distribution, which is how you interpret the values.

If you want to make direct comparisons between pinball and chess, you’ll have to standardize the ratings (e.g., by converting to z-scores).
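
A minimal sketch of that standardization in Python, assuming you have the full list of ratings for each pool. The two pools below are invented numbers, not real Match Play or chess data.

```python
from statistics import mean, pstdev


def z_scores(ratings):
    """Express each rating as standard deviations above or below the pool average."""
    mu = mean(ratings)
    sigma = pstdev(ratings)
    return [(r - mu) / sigma for r in ratings]


# Made-up pools, for illustration only
pinball_pool = [1250, 1400, 1500, 1600, 1850]
chess_pool = [400, 1000, 1500, 2000, 2800]

print([round(z, 2) for z in z_scores(pinball_pool)])
print([round(z, 2) for z in z_scores(chess_pool)])
```

After the conversion, a value like +1.5 means the same thing in both pools: one and a half standard deviations above that pool's average.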


Not at all. Probably the most important skill in pinball is using your skills to reduce volatility within the game. The format of those tournaments probably doesn’t reduce volatility to the levels of a match against a chess genius, but that doesn’t mean you couldn’t run a tournament that could.

Would you have fared the same if it would have been a best-of-7 round-robin?

The point is that it’s definitely possible to determine who is the more highly skilled pinball player, because the more matches you have them play, the more accurate the results. Just because almost no pinball tournaments actually do this doesn’t mean that pinball itself can be assumed to be a game of less skill than chess.
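
To put rough numbers on "more matches, more accurate", here is a small Python sketch. It assumes every game is independent and that the better player has the same chance of winning each one; the 60% figure is made up.

```python
from math import comb


def series_win_probability(p, games):
    """Chance of winning a majority of `games` independent games (use an odd count)."""
    needed = games // 2 + 1
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(needed, games + 1)
    )


# Hypothetical 60% single-game favorite over longer and longer series
for n in (1, 7, 21, 101):
    print(n, round(series_win_probability(0.6, n), 3))
```

With those assumptions, a 60% single-game favorite wins a best-of-7 about 71% of the time, and the edge keeps growing as the series gets longer.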

I would agree that the average pinball tournament is probably more volatile than the average chess tournament, and that it’s a good thing. I just don’t see how that has anything to do with a maximum level of skill with which either game can be played.


If I played against Magnus Carlsen in 100 different random pinball games, what odds would you give him to win 0 games? 1 game? 2 games? etc…

Compare that to this:

If I played 100 games of chess against Magnus Carlsen, I’d give him perfect odds to win all 100 games.

Applying the same formula across different sports, you’re going to get different values.
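
A quick Python sketch of how fast a 100-game sweep becomes unlikely, assuming independent games with a fixed per-game win chance. The per-game probabilities are hypothetical.

```python
def sweep_probability(p, games=100):
    """Chance of winning every one of `games` independent games."""
    return p**games


# Hypothetical per-game win chances for the stronger player
for p in (0.80, 0.95, 0.99, 0.999):
    print(p, round(sweep_probability(p), 4))
```

Even a player who wins 99% of individual games sweeps 100 in a row only about 37% of the time, so "perfect odds" needs a per-game edge extremely close to 100%.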

From this weekend, Nicole Bernier had a Matchplay Rating of 1422 ±106
I had a Matchplay Rating of 1677 ±86

This gave me an 81% chance of winning our head to head matchup. We played a 5-ball game of Ted Nugent and she crushed me. 450k to 110k. Am I overrated? Is she underrated? Is the nature of pinball such that we can never have large discrepancies between the numbers that make up our ratings?
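
As a rough check on the 81% figure, here is the arithmetic, assuming the plain logistic curve on the rating difference with the ± values left out:

```latex
E = \frac{1}{1 + 10^{-(1677 - 1422)/400}}
  = \frac{1}{1 + 10^{-0.6375}}
  \approx 0.81
```

If the ± values are instead treated as Glicko RDs and run through the paper's attenuation factor, g(sqrt(86² + 106²)) is about 0.918 and the same matchup comes out closer to 79%. Which of the two Match Play actually displays is an assumption here, not something confirmed in this thread. Either way, the underdog is expected to win roughly one game in five.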


Putting all the ratings up on a leaderboard isn’t nearly as useful as understanding how Win Probability can be calculated using those ratings. Your rating is a way to represent your skill level on a normal curve, and you can compare that normal distribution to another player you are playing to estimate the win probability in a matchup against that other player.

That’s real boring stuff, and it’s not really fun to talk about, because people are way more concerned about wins and losses and what’s happening in the moment. I’m excited about this project, but not for a new leaderboard. I’m excited that this crucial data is now being collected. I think that more stats help us get a better understanding of how to present pinball and make it more interesting. Maybe initially we will be able to broadcast better matches if we understand which groups are going to be hard-fought. Maybe eventually we get an understanding as to what makes an exciting game of pinball to record and commentate.


I’d argue Chess is a poor comparison with Pinball due to the number of possible external factors. A chess game is purely abstract. External factors can only affect the game by impacting your mental state.

Pinball seems much more akin to an athletic sport in that a huge number of external factors can affect a given performance.

To me, that means any individual performance is way less interesting than an analysis of the aggregate data. It seems likely that as a pinball community, we’re not yet sure about what level of confidence to have in a given ratings deviation.


Nope; it means there is less skill and more luck in pinball.

One way to measure the total amount of skill in a game is to count what are called "80/20"s. Take the best player in the game, and find another player they can beat about 80% of the time (and the other player wins 20%). Now take that player and find another player they can beat 80% of the time. The length of that chain, all the way down to the lowest-performing player, is a measure of the overall skill and variability of the game.

Chess clearly has more, way way more. The world champion can beat a pretty high grandmaster 80% of the time, and that grandmaster beats … etc etc. There are easily 10+ levels of 80/20s in chess. But in pinball, I’d estimate there are probably 4. A world champion beats a very good player 80/20, the very good player beats the average player 80/20, the average player beats an amateur 80/20, and the amateur beats a player with almost no experience 80/20.

Ratings give a picture of this, too. In both chess and in Matchplay’s pinball ratings, the 80/20 is about a difference of 240 points of rating. Right now, the highest player in chess has a 2882 rating and the worst player in chess has around a 400 rating; a range of almost 2500 points. In pinball the range is about 1000 points.
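
Here is a rough Python sketch of that count. It assumes the Elo-style logistic curve with RD ignored, and it uses the rating ranges quoted above; the function names are just for illustration.

```python
import math


def points_for_win_rate(p):
    """Rating gap at which the higher-rated player wins with probability p."""
    return 400 * math.log10(p / (1 - p))


def skill_levels(top, bottom, p=0.8):
    """How many p/(1-p) steps fit between the top and bottom ratings."""
    return (top - bottom) / points_for_win_rate(p)


print(round(points_for_win_rate(0.8)))      # size of one 80/20 step
print(round(skill_levels(2882, 400), 1))    # chess range quoted above
print(round(skill_levels(2500, 1500), 1))   # a hypothetical 1000-point pinball range
```

One 80/20 step works out to about 241 points, which gives roughly 10.3 levels for the chess range and 4.2 for a 1000-point pinball range.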

One solution is to artificially inflate and widen the ratings, which could be done easily by changing the numbers (in the same way that 7-5-3-1 and 3-2-1-0 are the same system). The current implementation uses the same overall rating methods that chess does.
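
A small Python sketch of why widening is only cosmetic. It assumes a plain logistic win curve with RD ignored; the 1500 center and the factor of 2 are arbitrary choices for illustration, not Match Play settings.

```python
def win_probability(r1, r2, scale=400):
    """Elo-style expected outcome for player 1 (RD ignored)."""
    return 1 / (1 + 10 ** (-(r1 - r2) / scale))


def widen(rating, factor=2.0, center=1500):
    """Stretch a rating away from a hypothetical center point."""
    return center + factor * (rating - center)


a, b = 1800, 1560
before = win_probability(a, b)
# Stretch the ratings and the scale constant by the same factor
after = win_probability(widen(a), widen(b), scale=800)
print(before, after)
```

The two probabilities match, so the stretched numbers look more like chess ratings without carrying any new information.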

Thanks @haugstrup for all your work! This is really cool.


The system decides this in the long term. Every time you lose, you lose rating points, and every time you win, you gain them. If you are overrated, the system will tend to correct itself.

The biggest exception is “pockets” of play where people are only being judged against one another, and it’s difficult to get true information about any of those players until there is cross-play between them and others. But, within these pockets, players will be accurately rated against one another, so the win probability calculations will still be very good.


I had never heard about counting "80/20"s before, and your explanation makes perfect sense. Thank you sir!

I agree, this IS really cool.


Update: The first of the integrations is done. If you are using Challonge for brackets, you can submit the results into Match Play Ratings. Head to and click the “Submit event” button.

If you have old brackets, feel free to submit them! Just make sure the “event date” is registered correctly. If you get a wrong date, shoot me an email with the right date.


I see there’s a test submission from FSPA to integrate results from PAPA league manager software. How’s that working out?

Cleveland Pinball League has 10 seasons of data on PAPA league manager that we could contribute. Just let me know if there’s anything we can do on our end to push that your way.


@jdelz Really Soon™ – @joe and I are trying to figure out a way to automate all the historic data so you won’t have to submit your 10 past seasons manually. Ideally I’ll import all historic data from all leagues using the league manager in one go. :crossed_fingers:

Going forward you’ll have to push a button (or something) in your league manager after each league night. Not sure exactly what it’ll look like yet. We’re making stuff up as we go along.


I accidentally added support for Brackelope imports last night. If you run tournaments with Brackelope or if you have old tournament results from Brackelope head over to to get them submitted.

You can also edit player names and IFPA numbers after you import events (not just Brackelope events, any events). This is super useful when you want to submit old tournament data, but you’re not the TD or you can’t edit the names in the original tournament anymore.

Whenever you submit an event you’ll now see “edit” links next to each player name. Click it to change the name or add an IFPA number. Match Play will automatically set the IFPA number if you set the name to exactly match a name in the IFPA database.


Can someone do this for the crazy big Brackelope tournies at Pinburgh when it was at PAPA?