League format with a handicap

The goal is to set up my next league (which starts tonight) so that each foursome is actually two teams battling it out and comparing combined scores. Overall league standings (for WPPRs) will still be based on individual results, but your team record, which comes from the handicapped matchups, will be used for end-of-season goodies/benefits.

I’ve been doing some math with my results from last season. To come up with a handicap, I took each score for each player for each game, and compared it to the average score for all players for that game (for the same night only, since some games were repeated on later nights).

So, each player got a “% of average score” handicap number. These ranged from 216% down to 45%. The 216% means: “that player, on average, is expected to score 216% of the league average score for a given game”, i.e., a little more than double the average score. The 45% means that player scored roughly half of the average.
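As a rough sketch of that calculation: the snippet below divides each score by the average for that game on that night, then averages those ratios per player. The function name, the tuple layout, and the choice to take a plain mean of the per-game ratios are my assumptions; the post doesn't say exactly how the per-game numbers were combined.

```python
from collections import defaultdict
from statistics import mean

def percent_of_average(scores):
    """scores: list of (night, game, player, score) tuples (hypothetical layout).
    Returns {player: mean of score / average score for that game on that night}."""
    by_game_night = defaultdict(list)
    for night, game, player, score in scores:
        by_game_night[(night, game)].append(score)
    averages = {key: mean(vals) for key, vals in by_game_night.items()}

    ratios = defaultdict(list)
    for night, game, player, score in scores:
        ratios[player].append(score / averages[(night, game)])
    # Assumption: a player's handicap is the plain mean of their per-game ratios.
    return {player: mean(r) for player, r in ratios.items()}

# Made-up scores for illustration only.
sample = [
    (1, "Game A", "ann", 300), (1, "Game A", "bob", 100), (1, "Game A", "cy", 200),
    (1, "Game B", "ann", 40), (1, "Game B", "bob", 20), (1, "Game B", "cy", 60),
]
handicaps = percent_of_average(sample)
print(handicaps)
```

A value of 1.25 here would read as “125% of average score”.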

I spent a lot of time worrying about things like: What about when someone really blows up a game? Does that artificially inflate their handicap? Does it screw up all the numbers? Why did the best player have 201% but the second best player had 216%? Etc.

I looked at a few things (like removing the 2 best scores), didn’t find anything conclusive, and then finally just gave up and created some what-if scenarios from last season. I’d create matchups where the combined “% of average score” was roughly even between the two teams.

Long story short, this created remarkably even matches. It worked when high players were paired with low players. It worked when medium players were paired with medium players. It worked with the best and worst, against the 3rd and 4th. And by worked, I mean: every matchup was either dead even (like 15 wins for one team, and 15 for the other team), or off by one (17 wins for one team, and 18 wins for the other team). I was simulating it as if those same two teams faced off every week of the season.
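A minimal sketch of that kind of what-if simulation, assuming last season's results are available as one `{player: score}` dict per game played (a hypothetical layout; `season_record` is an illustrative name, not the actual code used):

```python
def season_record(games, team_a, team_b):
    """games: list of {player: score} dicts, one per game played last season.
    team_a / team_b: tuples of player names (sizes may differ, e.g. 1 vs 2).
    Returns (wins_a, wins_b); ties and games missing a player are skipped."""
    wins_a = wins_b = 0
    for scores in games:
        if any(p not in scores for p in team_a + team_b):
            continue
        total_a = sum(scores[p] for p in team_a)
        total_b = sum(scores[p] for p in team_b)
        if total_a > total_b:
            wins_a += 1
        elif total_b > total_a:
            wins_b += 1
    return wins_a, wins_b

# Made-up scores for illustration only.
games = [
    {"ann": 900, "bob": 100, "cy": 500, "dee": 450},  # 1000 vs 950
    {"ann": 300, "bob": 200, "cy": 400, "dee": 350},  # 500 vs 750
    {"ann": 700, "cy": 600, "dee": 100},              # bob absent, skipped
]
record = season_record(games, ("ann", "bob"), ("cy", "dee"))
print(record)
```

Because the teams are just tuples, the same function covers a one-against-two matchup.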

Even doing Canadian doubles, one player against two, worked pretty well. I tested 216% (best player) vs 123% (3rd ranked) & 96% (7th ranked). That matchup ended up 18-22.

1st vs. 4th and 5th was actually 17-13 in favor of the top player.

(I thought everything was suspiciously close, so I tested several uneven pairings also, and did indeed find the expected landslide victories).

These simulations were just for ease of coding. In the real league, you’d have a different teammate each week…but an even matchup vs the other team, regardless of who is in the foursome. (Foursomes would be sculpted a bit to ensure this).

So, I’m going to move forward with this (league starts tonight). I’m pretty confident that even with uneven numbers of players, I can create relatively even matchups.

Will post again here with updates.


Hey @kdeangleo,

I help to manage a league here in Australia, plus assist on various other events.

Most tournaments still run on a Herb-like structure, and your software keeps coming up as a suggestion. Is it generally available for use, and if so, could I get a copy?

Basically I’m just weighing the convenience of a Google Doc, which provides an auto-synced option for players to view, against existing systems that might already be in place for me to use.



Messaged you so we don’t crowd this thread with anything off topic :smile:

Now that my league has ended, I wanted to follow up. I used the methodology described in my previous post - the short version of which is:

  • Un-handicapped standings are done by Outscored % (the % of other players you beat on a given game, averaged across all games played). These final standings are submitted for WPPR points.

  • You play in groups of four, and one person in the group will be your teammate. If your combined scores on that machine beat out the combined scores of the other team in your foursome, you get a Win. The teams are determined by taking % of Average Score for each player (based on their past performance), and making the two teams roughly equal when summing % of Average Score of the teammates.

  • There is a separate leaderboard for Wins…ordered by Total Wins (not winning percentage). At the end of the season, the top 3 got trophies and t-shirts. In the end-of-year tournament, seeding was done by Outscored %, but the player with the higher number of Wins was always given honors in each matchup.

  • If the number of players wasn’t divisible by 4, then you’d have 2-on-1 or even 1-on-1 matchups.
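One way the team split within a group could be sketched: try every split and keep the one where the summed “% of Average Score” is closest. The function name `split_group` and the brute-force search are my assumptions for illustration; the post only says the two sides were made “roughly equal”.

```python
from itertools import combinations

def split_group(handicaps):
    """handicaps: {player: % of average score} for one group of 2-4 players.
    Returns (team_a, team_b) minimizing the gap between summed handicaps."""
    players = sorted(handicaps)
    first, rest = players[0], players[1:]
    # Team A always contains `first`, so each split is considered once.
    # Four players split 2-on-2; three split 2-on-1; two split 1-on-1.
    mate_counts = [1] if len(players) == 4 else range(len(rest))
    best = None
    for k in mate_counts:
        for mates in combinations(rest, k):
            team_a = (first,) + mates
            team_b = tuple(p for p in rest if p not in mates)
            gap = abs(sum(handicaps[p] for p in team_a)
                      - sum(handicaps[p] for p in team_b))
            if best is None or gap < best[0]:
                best = (gap, team_a, team_b)
    return best[1], best[2]

# The Canadian doubles case from the first post: 216% vs 123% and 96%.
trio = split_group({"p1": 2.16, "p3": 1.23, "p7": 0.96})
# A made-up foursome.
four = split_group({"a": 2.0, "b": 0.5, "c": 1.3, "d": 1.2})
print(trio, four)
```

With at most four players there are only a handful of splits, so brute force is plenty.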

Even from the first league night, I felt like it wasn’t really going as I had hoped. There was some planning between teammates, but I didn’t notice that much. People were a little confused at first. I needed to be clearer about the benefits of team wins, and I should have given reminders throughout the season about why you would want to win those matches. (Although, I think having better benefits would have helped.)

At the end of the season, I had a conversation with each player individually (there were 14 who attended regularly), and found out, to my surprise, that they wanted to do it again the same way next time around. One guy even initially said that he didn’t like it because he “felt pressure not to let the team down”, but after further discussion, he voted to do teams again in our next season.

It worked well for me, because although I was the best player, I was in a lot of situations where I needed to put up a lot of points, either because my teammate hadn’t scored well, or because I was alone against two opponents.

I do wish that I could have created teams in advance each week (so people could pre-plan), but I found out that just wasn’t possible because I never really knew attendance until right at the start time.

So even though it didn’t feel as awesome as I expected, the players enjoyed it more than our previous format (where you were grouped with the closest to you in ability, i.e., 1,2,3,4 in a group, 5,6,7,8 in a group).

If anyone has suggestions for stronger incentives for Team Wins, I’d love to hear them. I’ve been considering splitting off part of the prize pool and paying out the top players in Wins. Doing it by Total Wins also incentivizes attendance.

You can see the results here: http://pinballstats.com/leagues/2 (takes 10 seconds to load if no one has visited recently). The two players with the best win/loss percentages were the two players who made the greatest leap in skills throughout the season, because of course their past performances were underestimating how well they’d play.

Personally, I do not think a handicap system where points are spotted on machines is a good idea. I do think that Pinburgh-like grouping of players within a league can be a good tool to give new players the opportunity to play people within their skill tier.

Here are the issues I have seen with trying to develop a handicap system for individual machines:

  • Average Score is a poor metric to grade players on.

The distribution of any player’s scores on any specific machine does not tend to follow a normal distribution. This has been true for every machine I have looked at so far. The curve is most similar to a log-normal distribution where the peak of the curve falls below the average. (The mean is considerably higher than the mode & median). Here’s an example of the distribution of scores on World Cup Soccer in PAPA 17 A qualifying.


  • Collecting enough data to come up with reliable handicapping is very difficult.

Over the duration of a league, many things can happen to your machines that will change the way they play. Certain changes (kickback not working, new flipper rubbers, tilt bob falling out) can very quickly shift the score range for different games. If you’re playing in a league at a location which has many games and few matches per week, it may take a while to collect enough scores to be confident in a handicap, and that is before factoring in something like the change in week 4 to widen the outlane posts on your Sterns to keep things moving along.
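On the first issue above (average score as a metric), here is a small synthetic illustration of how a log-normal shape pulls the mean above the median and mode. The parameters `mu` and `sigma` are arbitrary assumptions and are not fitted to any real pinball scores.

```python
import math
import random
from statistics import mean, median

mu, sigma = 0.0, 1.0  # arbitrary illustrative parameters, not fitted to real scores
rng = random.Random(0)  # fixed seed so the run is repeatable
scores = [rng.lognormvariate(mu, sigma) for _ in range(10_000)]

sample_mean = mean(scores)
sample_median = median(scores)
# Fraction of simulated scores that fall below the sample mean.
share_below_mean = sum(s < sample_mean for s in scores) / len(scores)

# Closed-form values for a log-normal distribution.
theory_mode = math.exp(mu - sigma ** 2)
theory_median = math.exp(mu)
theory_mean = math.exp(mu + sigma ** 2 / 2)

print(f"mode {theory_mode:.2f} < median {theory_median:.2f} < mean {theory_mean:.2f}")
print(f"sample mean {sample_mean:.2f}, sample median {sample_median:.2f}")
print(f"share of scores below the mean: {share_below_mean:.0%}")
```

In a distribution shaped like this, well over half of the simulated scores land below the average, which is the concern with grading players against it.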

That said, the biggest thing is getting feedback from league players if you do have a handicap system, and if people like what you’ve done (like in @ryanwanger’s case), then you might as well continue doing it.


Thanks for this reply - not sure how I missed it!

Just to clarify: I’m not spotting points on any machines. I’m using “% of average score” to create balanced teams in a 4 player group. That’s calculated by league scores only, and always compared to the rest of the league, on that same machine, on the same night. Changes to how a game is playing are irrelevant (unless they happen halfway through a league night).

After all of this, I’ll have to make some kind of change for 2015, since the way I’m ranking players now qualifies as “indirect competition”.

I like the Pinburgh format of playing the people closest to you in ability, but when you only have 14 people showing up, like I do, the top and bottom groups stay (mostly) the same, and the middle two just swap players back and forth.

The FSPA format alleviates this somewhat by double-swapping between 4 player groups… so for typical grouping of the example 14 person league (3-3-4-4), the two losing players from group 3 would swap with the two winning players from group 4. It helps.
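The double swap between two adjacent groups can be sketched as below. This only covers the one swap described here (bottom two of the higher group trade places with the top two of the lower group); it is not the full FSPA regrouping procedure, and `swap_adjacent` is a hypothetical helper name.

```python
def swap_adjacent(upper, lower, k=2):
    """upper / lower: players in finishing order (best first) for two adjacent
    groups, with `upper` being the higher group.
    Returns new (upper, lower) after the bottom k of `upper` trade places
    with the top k of `lower`."""
    cut = len(upper) - k
    new_upper = upper[:cut] + lower[:k]
    new_lower = upper[cut:] + lower[k:]
    return new_upper, new_lower

# Groups 3 and 4 of a 3-3-4-4 league, with made-up player labels.
group3 = ["g", "h", "i", "j"]
group4 = ["k", "l", "m", "n"]
next3, next4 = swap_adjacent(group3, group4)
print(next3, next4)
```

After the swap, each player in those two groups faces two opponents they did not play the previous week.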

In general: You really end up getting more “circulation” of players and groups than you might expect. I just did a quick analysis of a few recent FSPA-format 16 person leagues… each person played an average of ~10 opponents per (10 week) season. Of course, if you have a player or two who are significantly better or worse than everyone else, they may get “stuck” on the top or bottom and thus see fewer opponents over the course of the season… but even good players will have a bad night, and bad players will have a good night, so people do move around.


That’s not the Pinburgh format, that’s the FSPA format. :smile: The Pinburgh format starts with a full mix of everyone, and eventually heads to top and bottom groups paired by ranking.

I’d say with 14 players, just randomly pair everyone, or try to pair everyone with people they haven’t played before like a round robin might.

Just to nitpick: “play the people closest to you in ability” isn’t exactly the FSPA format either. The FSPA format is basically “win your group and move up a group next week; lose your group and move down a group next week”. Yes, the idea is to get players “of similar skill” competing with one another - and it succeeds at that goal - but if the sole intent was to match up the closest skilled players, it’d probably be more correct to use something like outscored % as the grouping criteria. The problem with that approach is that it tends to lead to group stagnation. The FSPA solution guarantees everyone will play with at least one and usually two different people from week to week… a nice compromise of the goals of playing against similarly skilled opponents, and getting the chance to play against lots of different people during the season.


Okay, okay - my shorthand descriptions are flawed. :smile:

I read through the FSPA page, and everything was clear up until the “Divisions” part. Can you explain that in more detail?

Edit: I was reading it from the papa site, which doesn’t let you copy and paste for some reason. http://papa.org/learning-center/director-resources/directors/league-formats/

Yeah, that summary on the PAPA site is… well, too summarized. :confused:

So each meet, we note each player’s “ladder position”, which is just the index of where a player is if you stacked all the groups atop each other. (e.g. assume a 15 person league, 5 groups of 3. The top person in group 1 is ladder position 1, the bottom person in group 1 is ladder position 3, the top person in group 2 is ladder position 4… the bottom person in group 5 is ladder position 15. “Top” and “bottom” of each group is based on the regrouping after each meet.) Each player’s average ladder position is computed, excluding the first two and last two meets of the season. (*) To determine divisions, we sort players by average ladder, and chop the list evenly into however many divisions the rules call for, based on the number of players participating.

(*) We exclude the first two weeks so anyone who was severely misgrouped by the league commissioner at the start of the season has a chance to move toward a more reasonable group. We exclude the last two weeks to allow a period where the divisions are locked so players know what they’re shooting for to qualify for playoffs, without people being surprised by a last-minute division switch.
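A minimal sketch of the ladder and division calculation described above. The function names and data layout are hypothetical, and two details are my assumptions since they aren't spelled out here: a player's average only counts meets they attended, and when the player count doesn't divide evenly, the extra players go to the higher divisions.

```python
from statistics import mean

def ladder_positions(groups):
    """groups: list of groups for one meet, each a list of players ordered
    top to bottom. Returns {player: 1-based ladder position}."""
    stacked = [p for group in groups for p in group]
    return {p: i for i, p in enumerate(stacked, start=1)}

def assign_divisions(meets, n_divisions, skip=2):
    """meets: one list of groups per meet, in season order.
    Ignores the first and last `skip` meets, averages each player's ladder
    position, and chops the sorted list into `n_divisions` chunks."""
    counted = meets[skip:len(meets) - skip]
    positions = {}
    for groups in counted:
        for player, pos in ladder_positions(groups).items():
            positions.setdefault(player, []).append(pos)
    ranked = sorted(positions, key=lambda p: mean(positions[p]))
    # Assumption: any remainder is handed to the higher divisions.
    base, extra = divmod(len(ranked), n_divisions)
    divisions, start = [], 0
    for d in range(n_divisions):
        end = start + base + (1 if d < extra else 0)
        divisions.append(ranked[start:end])
        start = end
    return divisions

# Made-up six-meet season with two groups of two.
meets = [
    [["d", "c"], ["b", "a"]],  # meet 1, excluded
    [["d", "c"], ["b", "a"]],  # meet 2, excluded
    [["a", "b"], ["c", "d"]],  # meet 3, counted
    [["b", "a"], ["c", "d"]],  # meet 4, counted
    [["d", "c"], ["b", "a"]],  # meet 5, excluded
    [["d", "c"], ["b", "a"]],  # meet 6, excluded
]
divisions = assign_divisions(meets, n_divisions=2)
print(divisions)
```

In this toy season, players "a" and "b" land in the top division even though the excluded meets had them in the bottom group.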

It’s easier than it sounds. Here’s the player-friendly summary from our Players Guide:

Divisions: Generally, players who usually play in the top groups will be qualified for “A” division, while players in the lower groups qualify for “B” division. If there are enough players, there may also be a “C” division, with groups divided into thirds. Players with the most league points in each division at the end of the regular season move on to the playoffs, and there play only against players in the same division.

@joe I’m considering FSPA for this next time around. Is it correct that points are only relevant to determine who goes up and who goes down, while standings and submitted results are based on average ladder rank?

Nope, standings depend on total league points earned. Average ladder rank is used to assign players to divisions.