Improved Ranking System ???


#1

Firstly, please don’t just look at this and think that it is an attack on the current ranking system, nor an attack on the US players - it’s not. I’ve been mailing @pinwizj back and forth and would now like others’ opinions.

I’ve been troubled by the ranking system only using the WPPR points to rank a player. I don’t believe this is the most accurate gauge of skill, for many reasons which have been gone over ad infinitum already - so there’s no need to repeat them here.

I think I have come up with a system which is more accurate than simply using WPPR points and more accurately compares players from across the globe.

To understand my thinking first it needs to be accepted that the following 2 statements are true.

If someone has a higher WPPR total, a higher Rating and a higher Eff%ge than another person they can safely be ranked as being better than that person.

Example:
Josh Sharpe
WPPR 697.75, Rating 1828.18, Eff% 41.06
Wayne Johns
WPPR 185.48, Rating 1648.83, Eff% 25.96
Therefore Josh is better than Wayne

Secondly, if someone beats more people on all 3 rankings than another person does, they are the better player of the two.

Example:
I looked at all of the current top 500 ranked people

Josh is better than 463 people out of the top 500 on all 3 rankings
Wayne is better than 139 people on all 3 rankings
Therefore Josh is better than Wayne.

Hardly ground breaking stuff so far.

However if you use this figure (how many people they are better than across all 3 rankings) as their new figure to be ranked against all of the other players it provides a more balanced ranking system, which I believe to be more accurate.

If 2 players are tied on the number of people they are better than across all 3 rankings, use the number of people they are better than across 2 rankings as the decider, then 1, then 0.

This gives the rankings marked in purple on the graphic.

Using exactly the same figures, another way of ranking would be to use the number of people that are better than that person across all 3 rankings, using 1 and 2 as the deciding factor in the event of ties.

This gives the rankings in blue on the spreadsheet.
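
The two counting schemes described above can be sketched roughly as follows. This is only a minimal sketch, not the formulas from the spreadsheet. The `Player` tuple, the function names, and the two players labelled "hypothetical" are made up for illustration. It assumes "better" means a strictly higher value, that "better across 2 rankings" means exactly 2, and that the second scheme breaks ties on 2 and then 1.

```python
from typing import NamedTuple

class Player(NamedTuple):
    name: str
    wppr: float
    rating: float
    eff: float

def metrics_won(a: Player, b: Player) -> int:
    """How many of the 3 metrics are strictly higher for a than for b."""
    return sum(x > y for x, y in zip(a[1:], b[1:]))

def tally(p: Player, field: list[Player], as_loser: bool = False) -> list[int]:
    """tally[k] = number of opponents p beats on exactly k metrics
    (with as_loser=True: number of opponents who beat p on exactly k)."""
    counts = [0, 0, 0, 0]
    for q in field:
        if q is p:
            continue
        k = metrics_won(q, p) if as_loser else metrics_won(p, q)
        counts[k] += 1
    return counts

def rank_by_players_beaten(field: list[Player]) -> list[Player]:
    # Most opponents beaten on all 3 first; ties broken on 2, then 1, then 0.
    return sorted(field, key=lambda p: tally(p, field)[::-1], reverse=True)

def rank_by_players_ahead(field: list[Player]) -> list[Player]:
    # Fewest opponents ahead on all 3 first; ties broken on 2, then 1 (assumed order).
    return sorted(field, key=lambda p: tally(p, field, as_loser=True)[:0:-1])

field = [
    Player("Wayne Johns", 185.48, 1648.83, 25.96),
    Player("B (hypothetical)", 150.0, 1700.0, 20.0),
    Player("Josh Sharpe", 697.75, 1828.18, 41.06),
    Player("A (hypothetical)", 300.0, 1500.0, 30.0),
]
for p in rank_by_players_beaten(field):
    print(p.name, tally(p, field)[::-1])
```

Both orderings compare every player against every other player, so the work grows with the square of the field size. For a top 500 that is trivial.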

I’ll list some of the pros and cons of each system as I see them

Current System (Green)

Pros

Very easy to understand how the figure is calculated

Gives big rewards for entering bigger competitions

Cons

Only uses a single value to rank everyone

Favours those players who enter lots of competitions and ‘get lucky’ or ‘over perform’ on a small percentage

Favours those players who enter large competitions and get large values of WPPR points for finishing mid table.

Ranked by number of people better than them (Blue)

Pros

Uses all 3 readily available data points

Thus taking into account not only WPPR points, but also Rating and Eff%ge

Gives a more global ranking, especially for players who don’t compete directly against each other frequently or in the same competitions

Cons

More complicated to understand initially

Gives higher value if a player has a high Rating or Eff% gained from smaller or fewer comps (see Krisztian Szalai as an example)

Ranked by number of people worse than them (Purple)

Pros

Uses all 3 readily available data points

Thus taking into account not only WPPR points, but also Rating and Eff%ge

Gives a more global ranking, especially for players who don’t compete directly against each other frequently or in the same competitions

Gives a much more balanced ranking

Cons

More complicated to understand initially

I honestly can’t think of any more.

I know your initial thought may be that this has only been designed to improve my rating - not at all.

A couple of people really stood out as having huge swings with the new systems, as their names have appeared on my radar before.

Peter Watt (15769) massively improves from his current ranking. Looking at his results, who is to say that he isn’t as good as Keith Elwin, or Cayle, or Zach? Coming from Australia he simply does not have the opportunity to amass as many WPPR points as if he played in the US regularly. I think that the new system more accurately reflects his skill level.

Louise Wagensonner (19305) massively drops. I also feel that is a fair reflection, seeing that the vast majority of her WPPR points are earned from mid table finishes in large volume competitions. She has only ever won 2 competitions out of 187, and the head-to-head results show over 200 people she has played more than 5 times and lost to more often than won. Not the stats you would expect from a player ranked 167th in the world. I think that the new system more accurately reflects her skill level as well.

This is in no way intended as an attack on individuals and their rankings; they’re ranked where they are under the current system because of their performances.

None of the 3 systems will be able to accurately account for players who don’t play frequently, but the 2 new systems won’t reward so heavily those players who play LOTS of competitions.

This ranking system was created using only the 3 readily available metrics, although I think that there are more accurate metrics which could be created.

I can’t figure out how to share the original spreadsheet, so have only got a screenshot of the top 40 or so. If Josh is able to share the original spreadsheet I sent him it would allow anyone to take a look and play around with it.

It will be interesting to hear your thoughts.


#2

I appreciate the effort, thank you! And also thank you for raising this!

I play with Peter Watt regularly. He’s an exceptional player, and I believe he firmly belongs somewhere in the top 50. As you say, his problem (and that of a few other top Australian players) is that, Down Under, the points are simply not available for grabs. It doesn’t matter if Peter wins absolutely everything here (which he damn near does already), he won’t ever get into the top 50 unless he starts spending a large part of his life in the US.

I imagine that the situation is much the same for most other countries (with maybe the exception of Canada, which is reasonably close to the US).

The current system to a large degree favours people with lots of enthusiasm, time, and money. Basically, the more tournaments I attend (and can afford to attend), the better my chances of working my way up the list. Moreover, there is no penalty for doing poorly in a tournament. If I enter a tournament and totally bomb out, all that happens is that I don’t advance up the rankings, but I don’t go down either, even though I may have had an embarrassing string of losses to much lower-ranked players (who don’t get any bonus for having beaten someone supposedly better than them).

I’m aware of the current rating system. But watching my rating over time, I get the impression that the rating system isn’t particularly accurate either. In particular, it seems really volatile. I’ve seen jumps of 200+ rating positions (both up and down) for having done nothing all that spectacular.

A ranking system that better reflects skill level would be a huge improvement, as would be a system that adds some form of penalty for doing poorly (meaning that people with lots of time and money would have less of an advantage).


#3

This statement is a complete fallacy if you’re actually an ELITE PLAYER (and I’m not saying Peter isn’t).

A large part of someone’s life in the US for some elite players is ONE WEEKEND IN PITTSBURGH PER YEAR.

If you removed all the results from Zach’s IFPA profile, and kept just the last 3 years of PAPA results (at current decayed values, not including the Circuit Final), Zach would have 592.90 WPPR points. He would be ranked 21st in the world. Again . . . that’s playing in ONE WEEKEND OF TOURNAMENTS PER YEAR.

In my discussions with Wayne it’s become pretty clear that the WPPR rankings are an ACHIEVEMENT BASED SYSTEM . . . it’s NOT a SKILL BASED SYSTEM.

My biggest questions are:

  1. Should Peter Watt be celebrated as one of the “best players in the world” (top 35), without ever having PARTICIPATED in a Major Championship (let alone ever performing well at a single one of them)?

[The subject becomes burden of proof, and with the current system the burden of proof is on Peter to prove he belongs in the discussion of the best players in the world . . . versus having a system that declares Peter an elite player without any direct evidence whatsoever against his peers.]

  2. For players like Peter Watt, what do the rankings mean to you? If you’re NEVER going to play in Majors and challenge the ‘perceived’ best in the world . . . what’s the point? Is it simply to satisfy the ego and say you’re in the top 10 of the world, versus the top 100 of the world? I honestly don’t know the answer to that, and what those motivations are for individual players to be “competitive” on the global scale with a ranking system, if they never intend on actually being competitive in reality on a global scale.

  3. If a player won’t play outside of 100 miles of their location, should a ranking system accommodate that level of non-travel? What if a player doesn’t want to travel 50 miles outside of their home? Should a system accommodate that level? What if I NEVER want to leave my house, but I’m still Zach Sharpe and I’m actually awesome? Should a pinball ranking system somehow still kowtow to my skills because I actually am awesome? Where’s that line of travel/commitment that should be EXPECTED for someone to be ranked the top 50 in the world? Clearly requiring international travel is ‘over the line’, and not leaving your house is ‘below the line’ . . . where’s the line?

Wayne didn’t realize it, but I’ve been playing around with an IFPA TrueRank system for yearsssss, which had much in common with the direction he was going. It incorporated all three of our current metrics, but instead of counting players ahead/behind you, it used the Aurcade system of percentages, expressing each of your metrics as a percentage of the top rated player’s value in that metric.
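
For anyone who wants to picture the percentage idea, here is a rough sketch. It is not the actual TrueRank formula, which isn't spelled out in this thread. It assumes each metric is divided by the best value in the field and that the three fractions are simply averaged with equal weight. The function names are made up.

```python
def percent_of_top(values: dict[str, float]) -> dict[str, float]:
    """Express each player's value as a fraction of the best value in the field."""
    top = max(values.values())
    return {name: v / top for name, v in values.items()}

def combined_score(wppr: dict[str, float],
                   rating: dict[str, float],
                   eff: dict[str, float]) -> dict[str, float]:
    """Equal-weight average of the three fractions (the weighting is an assumption)."""
    parts = [percent_of_top(m) for m in (wppr, rating, eff)]
    return {name: sum(p[name] for p in parts) / len(parts) for name in wppr}

# Figures quoted in the first post of this thread.
wppr = {"Josh Sharpe": 697.75, "Wayne Johns": 185.48}
rating = {"Josh Sharpe": 1828.18, "Wayne Johns": 1648.83}
eff = {"Josh Sharpe": 41.06, "Wayne Johns": 25.96}

scores = combined_score(wppr, rating, eff)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Unlike counting players ahead or behind, this keeps the size of the gap in each metric, so a huge lead in one metric can outweigh small deficits in the other two.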

I’ve added Wayne’s sheet to that Google Spreadsheet on the second tab for anyone that wants to look into his data (or my data):

Josh


#4

I agree with that. Given Peter as a case study, if he comes to PAPA and doesn’t win, he’s pretty much going to lose rating. If he doesn’t qualify, he’s probably going to lose 200+ points due to how distant he is in the rating system; he’ll win 32 games against people 300-500 lower than him and lose 32 games against people 0-400 lower than him. Since he’s expected to win 58 of 64 (90%+ average projected win rate or so, I don’t have the math in front of me but that’s a fair estimate) and won 26 fewer, his rating would tank and therefore so would his ranking. In his case, with the compounded punishment of losing rating AND eff% (unless he gets top 5), some players, even if he was in the US, may say that PAPA is not worth the time and effort because they would go from a Top 40 player to potentially outside the top 100+.
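
As a sanity check on the "58 of 64" estimate, here is a small sketch. It assumes a standard Elo-style logistic curve with a 400-point scale; the formula behind the IFPA rating isn't given in this thread and may well differ. The function name is made up.

```python
def expected_score(rating_gap: float) -> float:
    """Elo-style expected win probability for the player rated rating_gap higher.

    Assumption: logistic curve with a 400-point scale, as in classic Elo.
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400))

GAMES = 64
for gap in (0, 300, 400, 500):
    p = expected_score(gap)
    print(f"gap {gap:>3}: win prob {p:.3f}, expected wins in {GAMES} games: {GAMES * p:.1f}")
```

Under that assumption an average gap of about 400 points gives roughly 58 expected wins out of 64, which lines up with the estimate above. Actually winning 32 would fall 26 games short of expectation.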

Eff% as a qualifier is always a sore spot with me because it blindly does not look at local strength of schedule and punishes players developing in strong areas. Going back to Peter (with all due respect - he’s dominating Australia!), he and I have played approximately the same number of events and he’s got an 11 month head start on recorded tournaments, but I have about 2.75x the number of potential WPPRs in my pool (not counting the 100 for Pinvasion, counting eastern Ohio). I developed from a pinball schmuck in that time, taking a ton of early losses and poor performances as I developed. Hence, when I finally turned things around and started having solid performances, I built up this shadow that I’m still shaking off that puts my eff% in the basement (779th) even though I’m 210th in WPPRs/280th in rating. As another example of a Pittsburgh player who’s developed, look at Aleksander. He went from B division crusher to major champ in 18 months, was a dominant force in the StoneHedge Knockout Series (in Akron, OH), placed well in 4 circuit events (not counting Buffalo), top 50 player in WPPRs and ratings…and he’s 190th in eff% because he’s a developing player. We both are effectually “gimped” because we grew up playing Cryss, DJ, Jon, Al, G$, etc and not players of lesser skill that we would be able to quickly overtake in skill.

If a system overhaul is going to happen, it needs to be one that doesn’t punish player X for existing in place Y or having mediocre performance Z. I like the idea for the system, but it needs either new metrics or an improved analysis.


#5

What both of you are trying to do is a technique called ensemble learning. There is a lot of theory that shows this is a good approach (provably good), but without defining mathematically what the goal is, it is hard to measure. Heck, there does not seem to be any clear agreement on the non-mathematical objective.

@pinwizj have you ever thought about offering the dataset up on Kaggle? You could maybe get real data scientists to look at it.

I want to type a lot more because I am really interested in these questions, but don’t have time today.


#6

But if he’s as good as Keith Elwin, Cayle or Zach . . . he will qualify, full stop, because that’s what those three guys do EVERY YEAR. There’s no potential for a massive ratings drop if you’re actually an elite player, so there’s nothing to fear.

You can’t have it both ways with respect to earning and being given . . . respect.

Is Peter as “good” as Colin Macalpine, Tim Hansen, Helena Walter or Noah Davis? Because throwing him up in the annals of Elwin/Cayle/Zach certainly jumps over a TON of quality players that are one move to Australia away from Peter never winning another tournament down there :wink:


#7

I don’t know who or what a kaggle is . . . sounds like a monster from the Harry Potter universe :slight_smile:

Our entire database of data is always available for whoever is interested.

Personally I’m working with Shepherd on trying to run some calculations on a new metric. It’s taking the top 250 players in WPPR, filtering them out, and comparing the winning percentages of those players against just that subgroup of players. Once you strip away all the activity against players that “don’t matter” (I say that in jest), I’m curious what it looks like when you are specifically comparing that small group against just their peers. See who jumps up, see who jumps down.
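
A minimal sketch of that peer-only winning percentage, assuming the data can be reduced to a list of (winner, loser) pairs. That data shape, the function name, and the sample players are all hypothetical, and this is not the actual calculation being run.

```python
from collections import defaultdict

def peer_win_pct(matches, peers):
    """Winning percentage counting only games where BOTH players are in `peers`.

    matches: iterable of (winner, loser) name pairs (assumed data shape)
    peers:   set of names in the subgroup, e.g. the top 250 by WPPR
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in matches:
        if winner in peers and loser in peers:
            wins[winner] += 1
            games[winner] += 1
            games[loser] += 1
    # Players with no games inside the subgroup are left out entirely.
    return {p: wins[p] / games[p] for p in games}

# Hypothetical results; "Z" is outside the subgroup, so games involving Z are ignored.
matches = [("A", "B"), ("A", "C"), ("B", "A"), ("A", "Z"), ("Z", "B"), ("C", "B")]
peers = {"A", "B", "C"}

pct = peer_win_pct(matches, peers)
for name in sorted(pct, key=pct.get, reverse=True):
    print(f"{name}: {pct[name]:.1%}")
```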


#8

FWIW I studied statistics at University - if that helps you :slight_smile:

The hardest thing to do is define what makes the “best/most skilful” player.

Is it the player with the most wins, or best percentage of wins, or best percentage of placings?

Should you take into account who they are playing against, the number of people they are playing against, the format of the comp they are playing in?

What decay period do you put on these results?

Should you be penalised for having a bad competition? Would it deter people from entering competitions? (I doubt it)

As Josh has said the current ranking system is an achievement based system - my version is trying to be a more skill based ranking system.

Other metrics which could be considered were:

  • head to head results,
  • winning tournaments %ge only,
  • Eff%ge based on percentile finish rather than points gained (which differ from tournament to tournament).

The list is endless.
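
To make the percentile-finish idea concrete, here is one possible sketch. The formula is an assumption, not something already calculated by the IFPA: first place scores 1.0, last place scores 0.0, and a player's figure is the plain average over their events. The function names are made up.

```python
def finish_percentile(position: int, field_size: int) -> float:
    """1.0 for first place, 0.0 for last place, linear in between (assumed formula)."""
    if not 1 <= position <= field_size:
        raise ValueError("position must be between 1 and field_size")
    if field_size == 1:
        return 1.0
    return (field_size - position) / (field_size - 1)

def percentile_efficiency(results: list[tuple[int, int]]) -> float:
    """Average finish percentile, as a percentage, over (position, field_size) pairs."""
    return 100 * sum(finish_percentile(p, n) for p, n in results) / len(results)

# Hypothetical results: a win in a 10-player comp, last of 10, and 5th of 9.
print(percentile_efficiency([(1, 10), (10, 10), (5, 9)]))
```

Because every event is scaled to the same 0 to 1 range, a win in a small comp counts the same as a win in a large one, which is exactly the kind of trade-off that would need debating.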

In the end I decided to only use the figures which were already being calculated and readily available.

Whatever system is used there will always be differences of opinion and debate - Who’s the best football player, Ronaldo or Messi? Pele or Maradona? - impossible to give a definitive answer; any ranking system will only give an opinion.
The most effective system is the one whose opinion most people agree with.


#9

It’s called the “Banners in the rafters” system :wink:


#10

I hope that data comes out soon, I would be interested to see how that plays out. :slight_smile:

What about cutting the eff% to a 1-2 year window instead of the WPPR’s 4 year shelf life? It makes the statistic more current and keeps away from the “Pittsburgh/Seattle/NYC/etc Effect” I mentioned.

Finally, looking through the TrueWPPRs I fail to see one Pittsburgh player ranked above their WPPR ranking; we’re a bunch of kids at the WPPR table and there’s not enough pie for everyone. :wink:


#11

#WEIGHYOURTROPHYPILE

:smiley:


#12

Perfect . . . going forward every Level 257 Selfie League final will award one of these:


#13

I love IFPA and everything it does for competitive pinball. IFPA rankings have their own reason - the IFPA Championships (States or Countries, and Worlds) and ?maybe the Epstein Cup teams? Outside of that, other organizations choose to use IFPA statistics as they see fit (for division restrictions or whatever). And, if some organization wanted to create a circuit, they could use their own format/points. Like the PAPA (Stern) Circuit. And completely disregard the IFPA rankings in determining who makes the finals.

Do IFPA rankings correctly measure pinball skill in an absolute way? Who knows? A better question would be - is there even a way to do that? Maybe? I can say that the #1 player, whoever they are, will be a heavy favorite against a novice player, but will lose in a single game on occasion, whereas a Grandmaster in Chess will (I don’t even feel the need to qualify this with ‘almost’) never lose to a novice (even a single game).

In my opinion the worst thing the IFPA could do would be to somehow penalize players for participating in events. I don’t exactly know their purpose anymore (Josh didn’t say on his latest podcast with Nate - in the past he seemed torn between promoting competition and measuring skill), but the explosion in tournaments worldwide has to be, in part, attributed to the IFPA. Measuring true ability is nice/noble, but practically speaking what does it matter? Your method does have one thing in its favor - it ranks KME #1 and Josh always said that doing that was essential when tweaking the ranking system.


#14

I will give a concrete example why even under the 3 ranking system - the ratings would be skewed.

At some point last year my ‘ranking’ was over 50 spots worse than it is right now, but my rating measurement had me in the 40s (I remember seeing it and wondering what exactly that was measuring) whereas now I am over 400th. I am way better now than I was a year ago, but under the 3 measurement system I would probably be ranked higher a year ago.


#15

I did mention that to Josh in our discussions :grin:

The actual purpose was to try and be able to compare individuals who don’t compete against each other in the same competitions. The way that works is with a skill based system.
The sheer difference in WPPR points available in different areas is what makes the current system skewed.


#16

Probably not exactly what you were wanting but I’ve already done it for the current top 10 ranked UK players.


#17

A year ago you would not have had as many WPPR points, so there would have been significantly more people above you in that metric. Even if your rating has dropped (and it depends significantly on what it dropped from and to), that is no guarantee that your overall ranking would have dropped by the same degree - if at all. There’s also the very real chance that your eff%ge would have risen as you got better and didn’t have as many low place finishes, again boosting the chance that your ranking would increase.

It may be that you are a better player than you were a year ago - but there’s no guarantee that the other players aren’t better than they were a year ago too.

This system compares your metrics against everyone. Just because you’ve improved doesn’t mean that everyone else hasn’t as well.


#18

The same as it’s always been . . . to motivate players to play competitive pinball (I think we check this box) . . . and to also be the most accurate ranking system in determining who the best pinball players in the world are (Last time I checked we continue to check this box since we’re the ONLY system) :wink:

Ultimately though it’s less about the system itself, and more about what you mentioned . . . it’s what we DO with the system that is far more important.

Qualifying for the SCS and IFPA WC are the two biggest real life tangible things that the WPPR system actually does. It can be NOT ACCURATE AT ALL, but it’s still the process we use for determining who plays at these prestigious events.

We understand that at the global level, there are simply areas of the world where access to competitive pinball is far more limited. That is why we have always and will always offer Country Exemption spots into our World Championship.

Peter Watt has a golden ticket to play against the best in the world whenever he wants by being the biggest fish in the Australian pond. If he NEVER takes that opportunity, I struggle to find any meaning over why any of this matters? The only thing that can possibly matter outside of wanting to play the best in the world is to stroke one’s ego over being able to say “I’m top 50 in the world because this IFPA system said I am”.


#19

Interesting stuff, for sure.

Small request: Please freeze the header row so we wretched refuse outside of the first screen’s worth of listings can more readily see what the columns mean. :smiley:


#20

You could use the data like Rating and Eff% to highlight “up and coming” players or “underranked” players as a separate part of the website (another tab under Resources or something) because there are definitely stats you can derive from a few filters (has played fewer than 2 years, has rating >1600, EFF% >15%) and show the players who aren’t ranked in the main ranking as high in those views. We all know, from looking at IFPA, which players have a chance to rise up through the current system, at least for American players. Players pointed out when Eric Stone won nationals that he was in the “All Green” club. The data that’s up there is useful for player lookup, it just doesn’t rank you according to those other non-achievement factors.
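
The kind of filter described above could look something like this. It is only a sketch: the `PlayerStats` fields, the function name, and the sample players are hypothetical, and the thresholds are just the ones mentioned in this post.

```python
from dataclasses import dataclass

@dataclass
class PlayerStats:
    name: str
    years_active: float
    rating: float
    eff_pct: float

def up_and_coming(players, max_years=2.0, min_rating=1600.0, min_eff=15.0):
    """Players active for fewer than max_years with rating and Eff% above the cutoffs."""
    return [
        p for p in players
        if p.years_active < max_years and p.rating > min_rating and p.eff_pct > min_eff
    ]

# Hypothetical players, for illustration only.
players = [
    PlayerStats("Newcomer", 1.5, 1650.0, 22.0),
    PlayerStats("Veteran", 8.0, 1800.0, 35.0),
    PlayerStats("Low Eff", 1.0, 1700.0, 9.0),
]
print([p.name for p in up_and_coming(players)])
```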