Strikes Tournament Simulation Results, Part 2

I have revisited the simulation of strikes tournaments, this time using Python for better accessibility (code included!). The simulation is also far more thorough.

Player Model

Players have an intrinsic skill parameter chosen from a log-normal distribution:
lognormal(0, 0.25)
This parameter represents the player’s ability to achieve high scores. The skill distribution looks like this:

The average is close to 1, representing a skill level typical of a tournament player. The left tail tapers off around 0.5, representing a newer player with less ability. The right tail tapers off around 2.0, representing a top-tier player (someone in the top 50, for example).

Game Score Model

To simulate the unpredictability of a pinball game, a player generates a score chosen from another lognormal distribution. This time, the first parameter (the mean of the underlying normal distribution) is the player’s intrinsic skill, and the second (its standard deviation) is a scatter parameter of 0.45 (more on this value later):
lognormal(skill, 0.45).
Game score distributions for players of skill levels 0.5, 1.0, 1.5, and 2.0 are shown below.

It’s possible for a lower-skilled player to beat a higher-skilled player, but this is increasingly unlikely the further apart their skill levels are. Consider the following pairings:

  • 0.5 skill vs. 1.0 skill: Higher skill player wins 78.4% of games
  • 0.5 skill vs. 1.5 skill: Higher skill player wins 94.3% of games
  • 0.5 skill vs. 2.0 skill: Higher skill player wins 99.0% of games
  • 1.0 skill vs. 1.5 skill: Higher skill player wins 78.4% of games
  • 1.0 skill vs. 2.0 skill: Higher skill player wins 94.3% of games
  • 1.5 skill vs. 2.0 skill: Higher skill player wins 78.4% of games
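These percentages can also be estimated without simulation, because the difference between two log-scores is normally distributed under this model. The sketch below is a minimal illustration, assuming the scatter parameter is the standard deviation of the log-score; the name `win_probability` is hypothetical and not part of the script.

```python
import math

def win_probability(skill_low, skill_high, scatter=0.45):
    """Chance that the higher-skill player outscores the lower-skill player.

    Assumes each log-score is normal with mean = skill and std = scatter,
    so the difference of log-scores is normal with mean = skill gap and
    std = scatter * sqrt(2).
    """
    gap = skill_high - skill_low
    # Phi(gap / (scatter * sqrt(2))) written with the error function
    return 0.5 * (1.0 + math.erf(gap / (2.0 * scatter)))

for low, high in [(0.5, 1.0), (0.5, 1.5), (0.5, 2.0)]:
    print(low, "vs.", high, ":", round(100 * win_probability(low, high), 1), "%")
```

Only the skill gap matters in this formula, which is why every pairing 0.5 apart in the list shares the same win rate.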

The score scatter parameter of 0.45 was modified slightly from my previous simulations, where it was 0.50. The reason is an analysis of a couple of tournaments which took place at District 82: an 8 Fair Strikes tournament on August 11, 2022 with 107 players, and another 8 Fair Strikes tournament on August 19, 2023 with 117 players. Using the lower bound of each player’s rating (as extracted from matchplay.events circa March 1, 2024) to represent player skill, the Kendall tau values for these two tournaments were calculated to be ~0.35 and ~0.54, respectively. Therefore, I chose the scatter parameter such that my simulation produced an average Kendall tau of ~0.45 for tournaments like these, roughly halfway in between. This is a very small sample size, but it was time-consuming to analyze the tournaments.

Note that as the game score scatter parameter increases, average Kendall tau for a tournament will decrease. It is a coincidence that the game score scatter parameter and the Kendall tau are both 0.45.

Tournament Types

Five types of strikes tournaments were examined: Progressive Strikes, Fair Strikes, Lenient Group Strikes, Oprah Strikes, and Single Strikes. The players receive strikes according to these rules:

Progressive Strikes (Swiss Groupings)

  • 4 players: 0/1/2/3 strikes
  • 3 players: 0/1/2 strikes
  • 2 players: 0/1 strikes

Fair Strikes (Swiss Groupings)

  • 4 players: 0/1/1/2 strikes
  • 3 players: 0/1/2 strikes
  • 2 players: 0/2 strikes

Lenient Group Strikes (Swiss Groupings)

  • 4 players: 0/0/1/1 strikes
  • 3 players: 0/0/1 strikes
  • 2 players: 0/1 strikes

Oprah Strikes (Swiss Groupings)

  • 4 players: 0/1/1/1 strikes
  • 3 players: 0/1/1 strikes
  • 2 players: 0/1 strikes

Single Strikes (Swiss Groupings)

  • 4 players: 0/0/0/1 strikes
  • 3 players: 0/0/1 strikes
  • 2 players: 0/1 strikes
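The rules above can also be written as a lookup table, which may be easier to check against the lists than a formula. This is only a sketch: `STRIKES_BY_FINISH` and `strikes_for_game` are hypothetical names, and the full script further down computes the same strikes arithmetically.

```python
# Strikes by finishing position (best score first), copied from the rules above.
STRIKES_BY_FINISH = {
    "Progressive Strikes":   {4: [0, 1, 2, 3], 3: [0, 1, 2], 2: [0, 1]},
    "Fair Strikes":          {4: [0, 1, 1, 2], 3: [0, 1, 2], 2: [0, 2]},
    "Lenient Group Strikes": {4: [0, 0, 1, 1], 3: [0, 0, 1], 2: [0, 1]},
    "Oprah Strikes":         {4: [0, 1, 1, 1], 3: [0, 1, 1], 2: [0, 1]},
    "Single Strikes":        {4: [0, 0, 0, 1], 3: [0, 0, 1], 2: [0, 1]},
}

def strikes_for_game(fmt, scores):
    """Return the strikes each player receives, in the same order as scores."""
    table = STRIKES_BY_FINISH[fmt][len(scores)]
    # player indices ordered from highest score to lowest
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    strikes = [0] * len(scores)
    for place, player_index in enumerate(order):
        strikes[player_index] = table[place]
    return strikes

print(strikes_for_game("Fair Strikes", [10, 40, 20, 30]))
```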

Simulation Parameters

The simulations include the following permutations:

  • Progressive, Fair, Lenient Group, Oprah, and Single Strikes
  • A variety of strikes thresholds for exiting the tournament
  • Attendance including every value between 10 and 150 players

For each configuration, 5000 full tournaments were simulated, and average results were calculated across that configuration. The results include:

  • Average tournament length (in rounds)
  • Average tournament duration (in a kind of pseudo-time)
  • Average Kendall tau (a statistical measure of how well the tournament sorts players by intrinsic skill) - if the players were perfectly sorted, this would be 1.0, and if they were randomly sorted, this would be close to zero.
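For readers unfamiliar with Kendall tau, here is a small pure-Python sketch of the idea: count the pairs of players the tournament ordered correctly, subtract the pairs it ordered incorrectly, and divide by the total number of pairs. The function name is hypothetical, and this is the simple "tau-a" form; the script itself calls `scipy.stats.kendalltau`, which should agree when there are no ties.

```python
from itertools import combinations

def kendall_tau(skills_worst_to_best):
    """Kendall tau-a for a finishing order.

    skills_worst_to_best: intrinsic skills listed from last place to first place.
    A perfectly sorted tournament has skills increasing along the list.
    """
    concordant = discordant = 0
    for earlier, later in combinations(skills_worst_to_best, 2):
        if later > earlier:
            concordant += 1     # better finisher really is more skilled
        elif later < earlier:
            discordant += 1     # better finisher is less skilled
    n = len(skills_worst_to_best)
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau([0.5, 1.0, 1.5, 2.0]))    # perfectly sorted
print(kendall_tau([2.0, 1.5, 1.0, 0.5]))    # perfectly reversed
print(kendall_tau([1.0, 0.5, 1.5, 2.0]))    # one pair out of order
```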

Results

Progressive Strikes


Fair Strikes


Lenient Group Strikes


Oprah Strikes


Single Strikes


Results, Narrowed

Many of these configurations are unlikely to be selected by a tournament organizer. For example, few tournament directors would choose 2 Fair Strikes. Therefore, we can narrow the scope to a few reasonable choices for the purposes of comparison:

  • 10 Progressive Strikes
  • 8 Fair Strikes
  • 3 Lenient Group Strikes
  • 6 Oprah Strikes
  • 1 Single Strike

We know a tournament with more rounds will do a better job of sorting players by intrinsic skill (therefore higher Kendall tau), so it’s interesting to plot average Kendall tau vs. average number of rounds, for all values of attendance:

We can see that 8 Fair Strikes pulls into the lead as the most ‘efficient’ way to sort a large number of players by skill. For small numbers of players, the data is more chaotic, but 6 Oprah Strikes has a slight advantage. Two other formats (10 Progressive Strikes and 3 Lenient Group Strikes) produce rankings which are a bit more random, and therefore have a lower average Kendall tau value. 1 Single Strike is by far the least predictable format.

It’s also interesting to consider the duration of a tournament. Because of the lack of tiebreakers, strikes tournaments are already more time-efficient than many other formats. The code estimates the duration of each round as the highest sum of game scores across all groups. When using Swiss groupings, it is likely that top players will encounter each other repeatedly, leading to long game duration in the top groups. The top group could easily take 3 times longer than the bottom group to finish their game. Some amount of waiting is inevitable, but we can still compare formats:

Duration is a roughly linear function of the number of rounds, which would seem to agree with reality.

Variance

So far, the results presented include only average values across 5000 tournaments of each configuration. Let us consider the possible distribution of outcomes for the previously described popular tournament configurations:

If 100 players play a 10 Progressive Strikes tournament, it will last ~13.48 rounds on average. But there is considerable spread in the distribution. Here is the distribution after simulating 5000 tournaments:

  • 11 rounds: 11 tournaments
  • 12 rounds: 1461 tournaments
  • 13 rounds: 1605 tournaments
  • 14 rounds: 922 tournaments
  • 15 rounds: 507 tournaments
  • 16 rounds: 251 tournaments
  • 17 rounds: 140 tournaments
  • 18 rounds: 64 tournaments
  • 19 rounds: 25 tournaments
  • 20 rounds: 11 tournaments
  • 21 rounds: 3 tournaments

The uncertainty in how many rounds 10 Progressive Strikes takes to conclude is very high, due to the long right tail of the distribution. This may not be a good choice if the tournament director wishes to get to bed at a reasonable time.

By comparison, 100 players playing an 8 Fair Strikes tournament is much more constrained, with a mean runtime of 13.76 rounds and the following distribution:

  • 13 rounds: 1762 tournaments
  • 14 rounds: 2687 tournaments
  • 15 rounds: 524 tournaments
  • 16 rounds: 27 tournaments
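To put a number on the difference in spread, the two tallies above can be summarized by their mean and standard deviation. This is a minimal sketch using only the counts listed in this post; `tally_stats` is a hypothetical helper, not part of the simulation script.

```python
import math

def tally_stats(tally):
    """Mean and (population) standard deviation of a {rounds: count} tally."""
    total = sum(tally.values())
    mean = sum(rounds * count for rounds, count in tally.items()) / total
    variance = sum(count * (rounds - mean) ** 2 for rounds, count in tally.items()) / total
    return mean, math.sqrt(variance)

# counts copied from the lists above (100 players, 5000 simulated tournaments each)
progressive_10 = {11: 11, 12: 1461, 13: 1605, 14: 922, 15: 507, 16: 251,
                  17: 140, 18: 64, 19: 25, 20: 11, 21: 3}
fair_8 = {13: 1762, 14: 2687, 15: 524, 16: 27}

for name, tally in [("10 Progressive Strikes", progressive_10), ("8 Fair Strikes", fair_8)]:
    mean, std = tally_stats(tally)
    print(name, "mean rounds:", round(mean, 2), "std:", round(std, 2))
```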

Likewise, 100 players playing a 3 Lenient Group Strikes tournament also has a fairly narrow distribution around its mean of 14.15 rounds:

  • 13 rounds: 882 tournaments
  • 14 rounds: 2685 tournaments
  • 15 rounds: 1240 tournaments
  • 16 rounds: 184 tournaments
  • 17 rounds: 9 tournaments

100 players playing a 6 Oprah Strikes tournament also has a fairly narrow distribution around its mean of 12.82 rounds:

  • 12 rounds: 1662 tournaments
  • 13 rounds: 2623 tournaments
  • 14 rounds: 649 tournaments
  • 15 rounds: 60 tournaments
  • 16 rounds: 6 tournaments

There is no variance in the number of rounds for a 1 Single Strike tournament; it is deterministic. 100 players take 13 rounds to conclude.

This data is shown in the histogram plots below:

Other Topics of Interest

Suppose we wish to know whether Swiss groupings or random groupings will have the shorter runtime. We can answer this in an approximate way by comparing histograms of the same tournament configuration with the two types of groupings.

While Swiss groupings make each round take longer (due to the best players being concentrated at the top of the list), their reduction of the number of rounds outweighs this effect. Swiss groupings shorten the average runtime of an 8 Fair Strikes tournament by about 10%, from ~348 to ~314 time units.

What effect do Swiss pairings have on the ability of the tournament to sort the players by skill? Using the average Kendall tau parameter, we can see that Swiss pairings decrease sorting efficiency, but only very slightly. Swiss pairings reduce the average Kendall tau value from ~0.458 to ~0.449, a mere 2% reduction.

It seems reasonable to conclude that there is little downside to using Swiss pairings, and considerable upside for a tournament schedule.

Python Code

import math
import numpy as np
from operator import add
from scipy import stats
import csv


# USER-DEFINED PARAMETERS
SIM_TYPE = "Monte Carlo"    # "Monte Carlo" = average results for all permutations of N_PLAYERS and N_STRIKES
                            # "Histogram" = all results for N_PLAYERS_MIN and N_STRIKES_MIN
FORMAT = "Progressive Strikes"      # Progressive Strikes   [0,1,2,3]
                                    # Fair Strikes          [0,1,1,2]
                                    # Oprah Strikes         [0,1,1,1]
                                    # Lenient Group Strikes [0,0,1,1]
                                    # Single Strikes        [0,0,0,1]
N_PER_GAME = 4          # nominal number of players on each game (player count permitting)
SWISS = True            # True = Swiss groupings, False = random groupings
N_STRIKES_MIN = 3       # lowest strikes threshold for elimination (iterates from MIN to MAX)
N_STRIKES_MAX = 10      # highest strikes threshold for elimination
N_PLAYERS_MIN = 10      # lowest number of players in tournament (iterates from MIN to MAX)
N_PLAYERS_MAX = 150     # highest number of players in tournament
N_RUNS = 5000           # number of tournaments to simulate for each format and player count (higher = better statistics)
SKILL_SCATTER = 0.25    # second parameter of lognormal distribution of player skill
SCORE_SCATTER = 0.45    # second parameter of lognormal distribution of game score
RAND_SEED = 42          # starting seed for random number generator


# a player in a strikes-based tournament
class Player(object):
    def __init__(self, ID):
        self.ID = ID
        self.skill = rand.lognormal(0, SKILL_SCATTER)
        self.games = []
        self.strikes = 0
        self.bounties = 0
    
    # generate a game score
    def gameScore(self):
        return rand.lognormal(self.skill, SCORE_SCATTER)
    
    # store a game in memory
    def recordGame(self, game):
        self.games.append(game)
    
    # receive strikes & add to total
    def addStrikes(self, strikesGiven):
        self.strikes += strikesGiven
    
    # capture bounties & add to total
    def addBounties(self, bountiesCaptured):
        self.bounties += bountiesCaptured

# a strikes-based tournament
class Tournament(object):
    def __init__(self, nPlayers, nPerGame, format, nStrikes):
        self.nPlayers = nPlayers
        self.nPerGame = nPerGame
        self.format = format
        self.nStrikes = nStrikes
        self.players = []
        self.eliminatedPlayers = []
        self.groups = []
        self.numInRound = []
        self.roundDuration = 0
        self.totalDuration = 0
        self.nRounds = 0
        self.suddenDeath = False
        self.tau = 0
        self.tgpGameCounts = [0 for i in range(nPerGame - 1)]   # counts of m-player games played in tournament, where m = 2 to nPerGame
    
    # initialize list of players
    def initPlayers(self, n):
        self.players = [Player(i) for i in range(n)]
    
    # shuffle player order
    def shufflePlayers(self):
        rand.shuffle(self.players)
    
    # sort remaining players by strikes (stable sort)
    def sortByStrikes(self):
        self.players.sort(key=lambda p : p.strikes)
    
    # partition players into groups of at most n
    def partitionPlayers(self, n):
        if len(self.players) <= n:   # if all players fit in one group
            self.groups.append(self.players)
            self.players = []
        else:
            while len(self.players) > math.lcm(n, n-1): # while many players remain, group them n at a time
                self.groups.append(self.players[:n])
                self.players = self.players[n:]
            # only partition when players remain: partitions(0) yields (0,), which would
            # append an empty group and miscount it in the TGP tally below
            potentialPartitions = [i for i in partitions(len(self.players)) if max(i) <= n]
            potentialPartitions.sort(key = lambda x : min(x))
            chosenPartition = potentialPartitions[-1]   # choose partition with largest minimum group size
            for i in sorted(chosenPartition, reverse=True):
                self.groups.append(self.players[:i])
                self.players = self.players[i:]
        for g in self.groups:
            self.tgpGameCounts[len(g) - 2] += 1     # add number of 2-, 3-, 4-player game counts to tournament totals
    
    # each group plays their game
    def runGames(self):
        self.roundDuration = 0
        for g in self.groups:
            IDs = [p.ID for p in g]
            scores = [p.gameScore() for p in g]
            self.roundDuration = max(self.roundDuration, sum(scores))   # round duration is equal to the largest sum of players' scores on a game
            sortedScores = sorted(scores, reverse=True)
            ranks = [positions(sortedScores, lambda x : x == s)[0] for s in scores]
            strikesGiven = strikesGen(self.format, ranks)   # determine strikes given to each player
            for i in range(len(g)):
                g[i].addStrikes(strikesGiven[i])    # assign strikes
            for p in g:
                p.games.append(IDs) # record player IDs
            nElim = len([1 for p in g if p.strikes >= self.nStrikes])   # number of players in group who are eliminated
            iWin = positions(ranks, lambda x : x == 0)                  # indices of group winners
            if nElim > 0:
                g[iWin[0]].addBounties(nElim)                           # winner collects all bounties
    
    # remove eliminated players & return other players to main pool
    def cleanup(self):
        self.players = [p for g in self.groups for p in g if p.strikes < self.nStrikes]
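        # within a round, list players with more strikes first so the list runs from worst finish to best
        self.eliminatedPlayers.append(sorted([p for g in self.groups for p in g if p.strikes >= self.nStrikes], key = lambda p : p.strikes, reverse=True))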
        self.groups = []
    
    # run rounds until one person remains
    def runTourney(self):
        self.initPlayers(self.nPlayers)
        while len(self.players) > 1:
            self.nRounds += 1
            self.numInRound.append(len(self.players))
            self.shufflePlayers()   # randomize player order
            if SWISS:
                self.sortByStrikes()
            self.partitionPlayers(self.nPerGame)
            # print("ROUND", self.nRounds, ":", len(self.players), "remaining.  Top group:", [p.strikes for p in self.groups[0]])
            self.runGames()
            self.cleanup()
            self.totalDuration += self.roundDuration
        self.kendallTau()
    
    # compute Kendall's tau, a measure of how well the tournament ranked the players by intrinsic skill
    def kendallTau(self):
        # players are listed in order of elimination (worst finish first), so the winner goes last;
        # a positive tau then means that more skilled players tended to finish higher
        rankedPlayers = [p for eliminated in self.eliminatedPlayers for p in eliminated]
        rankedPlayers.append(self.players[0])
        ranks = range(len(rankedPlayers))
        skills = [p.skill for p in rankedPlayers]
        tau, p_value = stats.kendalltau(ranks, skills)
        self.tau = tau
    
    # report tournament results
    def report(self):
        print("After", self.nRounds, "rounds")
        print("The winner is: Player", self.players[0].ID, "with", self.players[0].strikes, "strikes")
        print("Winner's games played:")
        for g in self.players[0].games:
            print(g)

# return positions of list items matching a predicate
def positions(list, predicate):
    return [i for i, v in enumerate(list) if predicate(v)]

# signum function
def sign(x):
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0

# generator for integer partitions
def partitions(n, I=1):
    yield (n,)
    for i in range(I, n//2 + 1):
        for p in partitions(n-i, i):
            yield (i,) + p

# determine strikes assigned to each rank for a game result
def strikesGen(format, ranks):
    n = len(ranks)
    strikes = []
    if format == "Progressive Strikes":
        return ranks    # strikes equal ranks; [0, 1, 2, 3]
    elif format == "Fair Strikes":
        for i in range(n):
            strikes.append(sign(ranks[i]) + (1 - sign(n - 1 - ranks[i])))   # strikes = [0, 1, 1, 2]
    elif format == "Oprah Strikes":
        for i in range(n):
            strikes.append(sign(ranks[i]))  # all but winner get a strike; [0, 1, 1, 1]
    elif format == "Lenient Group Strikes":
        s = math.ceil(n/2)
        for i in range(n):
            if ranks[i] >= s:
                strikes.append(1)   # worse half of players get a strike, lenient when n=odd [0, 0, 1, 1]
            else:
                strikes.append(0)
    elif format == "Single Strikes":
        for i in range(n):
            strikes.append(math.floor(ranks[i]/(n - 1)))   # lowest score gets a strike; [0, 0, 0, 1]
    else:
        for i in range(n):
            strikes.append(1)   # give everyone a strike
    return strikes

# return stats on tourney
def tourneyStats(nPlayers, nPerGame, format, nStrikes, nRuns):
    roundsData = []
    durationData = []
    tauData = []
    bountyData = []
    tgpGameCounts = [0 for i in range(nPerGame - 1)]
    for i in range(nRuns):
        tourney = Tournament(nPlayers, nPerGame, format, nStrikes)
        tourney.runTourney()
        roundsData.append(tourney.nRounds)
        durationData.append(tourney.totalDuration)
        tauData.append(tourney.tau)
        bountyData.append(tourney.players[0].bounties)
        tgpGameCounts = list(map(add, tgpGameCounts, tourney.tgpGameCounts))
    avgRounds = np.mean(roundsData)
    avgDuration = np.mean(durationData)
    avgTau = np.mean(tauData)                           # average Kendall tau of rankings
    avgBounties = np.mean(bountyData)                   # average bounties collected by winning player
    tgpMultipliers = [0.5*i + 1 for i in range(nPerGame - 1)]   # TGP multipliers for 2-, 3-, 4-player games, etc.
    tgpNetMultiplier = sum([x*y for x, y in zip(tgpGameCounts, tgpMultipliers)]) / sum(tgpGameCounts)    # overall multiplier for tournament config
    tgpFinal = avgRounds*tgpNetMultiplier
    return (avgRounds, avgDuration, avgTau, avgBounties, tgpFinal)

# return histogram on tourney
def tourneyHist(nPlayers, nPerGame, format, nStrikes, nRuns):
    roundsData = []
    durationData = []
    tauData = []
    bountyData = []
    for i in range(nRuns):
        tourney = Tournament(nPlayers, nPerGame, format, nStrikes)
        tourney.runTourney()
        roundsData.append(tourney.nRounds)
        durationData.append(tourney.totalDuration)
        tauData.append(tourney.tau)
        bountyData.append(tourney.players[0].bounties)
    return (roundsData, durationData, tauData, bountyData)


### -------- SCRIPT -------- ###

# initialize random generator
rand = np.random.RandomState(RAND_SEED)

# Monte Carlo statistics for many tournament configurations
if SIM_TYPE == "Monte Carlo":
    nStrikes = range(N_STRIKES_MIN, N_STRIKES_MAX + 1)
    nPlayers = range(N_PLAYERS_MIN, N_PLAYERS_MAX + 1)
    iMax = len(nStrikes)
    jMax = len(nPlayers)
    for i in range(iMax):
        print("starting on", nStrikes[i], "strikes")
        fileID = str(nStrikes[i]).zfill(2)
        filename = FORMAT + " " + fileID + ".txt"
        with open(filename, 'w', newline='') as f:     # newline='' is what the csv module expects for output files
            writer = csv.writer(f)
            for j in range(jMax):
                print(nPlayers[j], "players")
                avgRounds, avgDuration, avgTau, avgBounties, tgpValue = tourneyStats(nPlayers[j], N_PER_GAME, FORMAT, nStrikes[i], N_RUNS)
                writer.writerow([avgRounds, avgDuration, avgTau, avgBounties, tgpValue])

# complete data for a single tournament configuration (useful for making histograms)
if SIM_TYPE == "Histogram":
    roundsData, durationData, tauData, bountyData = tourneyHist(N_PLAYERS_MIN, N_PER_GAME, FORMAT, N_STRIKES_MIN, N_RUNS)
    filename = FORMAT + " " + str(N_STRIKES_MIN).zfill(2) + " " + str(N_PLAYERS_MIN).zfill(2) + " histogram.txt"
    with open(filename, 'w', newline='') as f:         # newline='' is what the csv module expects for output files
        writer = csv.writer(f)
        for i in range(len(roundsData)):
            writer.writerow([roundsData[i], durationData[i], tauData[i], bountyData[i]])

Thanks for doing all of this analysis. Great job with the clear write-up.

These have been questions on my mind as we recently switched our 8 fair strikes 30-person tournament from Swiss to Random. It’s great to see the full analysis here.

This is true from the TD scheduling and results accuracy perspective, but my impression is that players disagree with this because Swiss feels like immediately being placed in a “losers bracket” whereas Balanced is more social and less repetitive (for a small-medium size tournament).

Is anyone actually using Oprah Strikes or Single Strikes? I ran an Oprah Strikes B Division finals once just for the sake of limiting it to exactly 3 rounds, but it doesn’t seem like something that would be useful for a main tournament format.

Flip Side in Memphis ran a 1 Single Strike tournament recently. It made for a good broadcast - being in danger of elimination in every round kept the players on their toes. But it doesn’t seem to be a popular format, probably because good players don’t like high variance. Maybe it’s considered too chaotic for a ‘serious’ event?

I haven’t seen Oprah Strikes used in a tournament.

I’m in the process of adding TGP value to the script. I hope to compare formats and see if one yields more rating value per time or per round than the others. It will also be interesting to contrast my results with the official TGP calculation that assumes all players are of equal skill.

Fantastic work! It’s interesting how there are diminishing returns on tau as the number of players increases – it suggests that maybe adding more players to a tournament eventually stops giving you more information about the ability of those players.

What do you make of tau never going above .5? Perhaps because of the chance element added to match outcomes?

Because of the continuous distribution of skill I used, it will frequently occur that player A’s skill is slightly better than player B’s skill (favored perhaps 51% to 49%), but B ends up finishing ahead of A due to the significant game score variance. That sort of thing will happen dozens or hundreds of times in a large tournament, and will prevent tau from getting too large (for any reasonable number of rounds).

One question my code doesn’t currently address is how frequently the best player wins the tournament. It seems to happen regularly in real tournaments, though assessing absolute skill is obviously not possible.

TGP

I’ve incorporated a TGP calculation into the script (code updated above), which works as follows (and is hopefully correct):

  • Run 5000 tournament simulations for a given configuration
  • Tally all 2-, 3-, and 4-player games played across all simulations: [n2, n3, n4]
  • Assign TGP multipliers to each count: [1.0, 1.5, 2.0]
  • Compute the average TGP multiplier, weighted by the game counts: (1.0*n2 + 1.5*n3 + 2.0*n4) / (n2 + n3 + n4)
  • The product of TGP multiplier and average number of rounds gives the TGP value of the tournament configuration
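The steps above amount to a weighted average followed by a multiplication. Here is a minimal standalone sketch of that arithmetic; `tgp_value` and the example counts are hypothetical, and the multipliers are the ones listed above (my reading of the TGP rules, which may not be exactly right).

```python
# TGP multipliers per game size, as listed above
TGP_MULTIPLIERS = {2: 1.0, 3: 1.5, 4: 2.0}

def tgp_value(game_counts, avg_rounds):
    """TGP estimate for a tournament configuration.

    game_counts: {players_per_game: number of such games across all simulations}
    avg_rounds:  average number of rounds across all simulations
    """
    total_games = sum(game_counts.values())
    weighted = sum(TGP_MULTIPLIERS[size] * count for size, count in game_counts.items())
    net_multiplier = weighted / total_games     # count-weighted average multiplier
    return avg_rounds * net_multiplier

# made-up counts, for illustration only
print(tgp_value({2: 1, 3: 1, 4: 2}, avg_rounds=10))
```

If every game were a 4-player game, the net multiplier would be exactly 2.0; any 2- or 3-player games pull it below that.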

Results

TGP values shown here are neither rounded nor truncated.





One obvious takeaway is that TGP is not monotonic with number of players, especially at the small end of tournament size. Certain specific attendance values produce more 3-player games than others, and this drags down the average TGP, only for it to bounce back with the addition of one more player.

A smooth curve could be fit to the data, permitting a much easier calculation, but this would ignore the idiosyncrasies of actual matchmaking. We have the computing power, so we may as well build the complete lookup table.

The next point to consider is whether one tournament format is more ‘time-efficient’ than another. That is, if you wished to maximize the TGP value per tournament duration, which format is best? It ends up being a close race, which is probably a good thing - that would indicate that the TGP formula (assuming I’ve interpreted it correctly) is relatively unbiased across tournament formats. But the bias is not zero. This comparison is shown below:

At larger sizes, 3 Lenient Group Strikes has an advantage (of about 1 TGP), while for smaller player counts, 10 Progressive Strikes and 8 Fair Strikes are slightly advantaged. For some reason, 10 Progressive Strikes and 8 Fair Strikes are nearly indistinguishable from each other - the data overlaps almost perfectly.

I don’t know why 3 Lenient Group Strikes pulls into the lead at high player counts. There’s nothing about that format which would seem to create fewer 2- and 3-player groups than would an 8 Fair Strikes tournament. Both tournament types have a relatively small right tail in their duration distribution. Any ideas?


how does head to head strikes measure up? both rounds-wise and kendall-tau?

I investigated Head-to-Head Strikes with anywhere from 1 to 7 strikes, and compared it to other formats. I ended up picking 6 Head-to-Head Strikes as most similar to other popular formats.

I also changed the strikes threshold for Oprah Strikes in the charts. Previously, I reported 6 Oprah Strikes. On further consideration, 7 Oprah Strikes is perhaps a better comparison to popular formats like 8 Fair Strikes or 10 Progressive Strikes.

Here are the results:




As expected, having only 2 players per game results in shorter rounds, but more rounds are required to compete with other formats. The Kendall tau value for 6 Head-to-Head Strikes is comparable with 8 Fair Strikes, so it’s a solid choice for sifting the best players to the top of the pile.

Unfortunately, the TGP formula heavily penalizes 2-player games, and 6 Head-to-Head Strikes is drastically inefficient in TGP-per-time compared to all other popular formats, except at very low player counts (like fewer than 10 players). With 150 players, 6 Head-to-Head Strikes doesn’t even reach 23 TGP, despite having the longest average duration.

EDIT: Added Kendall tau vs. duration plot, which better illustrates how 6 Head-to-Head Strikes is very comparable to the main popular formats in its ability to sort players by skill.

Thanks! just strengthens my conviction that 3/4 strikes head to head tournaments truly are the worst.

I think it’s reasonable for the TGP formula to penalize 2-player games. The time to complete a game scales (on average) linearly with n, the number of players. However, the number of player-to-player comparisons of skill increases quadratically with n.

Time        Comparisons        Comparisons per time
-----------------------------------------------
n           n*(n - 1)/2        (n - 1)/2
-----------------------------------------------
2            1                 0.5
3            3                 1
4            6                 1.5
5           10                 2
6           15                 2.5

We learn more information more quickly when there are more players on the same game. The TGP formula is actually being generous when it weights 2-player games as being half as valuable as 4-player games. If we weight the game value by ‘information gained’, 2-player games should be worth only 1/6 as much as 4-player games! Instead, TGP effectively weights the value by game duration (i.e., directly proportional to number of players).
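The pairwise counts are easy to check with a few lines of code. This is only an illustrative sketch (the function name is hypothetical), assuming game time is proportional to the number of players as stated above.

```python
def comparisons(n):
    """Number of distinct player-vs-player comparisons in an n-player game."""
    return n * (n - 1) // 2

for n in range(2, 7):
    # time is taken to be proportional to n, so comparisons per time = (n - 1) / 2
    print(n, comparisons(n), comparisons(n) / n)

# ratio of comparisons in a 4-player game to a 2-player game
print(comparisons(4) / comparisons(2))
```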


misterschu’s comment got me thinking, so I worked up a visualization for ‘TGP time efficiency’:

The most obvious trend is that efficiency goes down as player count goes up. This makes sense, as more players means more games, and greater probability of having one really long game that forces everyone to wait.

Oddly, 1 Single Strike has a tiny edge over the popular formats, which are all clustered tightly. Not what I’d have guessed. The deficiency of Head-to-Head Strikes is apparent, as it falls well short of the other formats by ~25%.

ya, there is no penalization and the TGP system is pretty transparently designed (as far as direct play goes) to scale with tournament time. not just from 2/3/4 player groups earning at 4/6/8% tgp per game played, but also in the way the tgp guide gets updated as different, more time-efficient formats (e.g. flip frenzy), become popular.

that said, 2-player matches have the efficiency of potentially being 5 ball games instead of 6, reducing match time by ~17%, while basically every 4 player match (except e.g. 0/0/1/1 strikes) at best becomes an 11 ball game instead of 12, ~8% reduced match time – and last player having a walk-off is also less frequent in 4-player.


How much time is saved when the last player gets to skip ball 3? Here is the recipe I used to investigate:

  1. Generate 100 players from the lognormal skill distribution previously described.
  2. Discard all but the N most skilled players, then randomize their order.
  3. Generate ball scores for each player, for 3 balls, from the lognormal game score distribution previously described.
  4. Add up the ball scores for all but the last player
  5. If the last player’s ball 1 and ball 2 scores are sufficient to win the game, discard their ball 3 score.
  6. The game duration is approximated as the total of all ball scores (except those discarded).
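The recipe above can be sketched in a few lines of Python. This is a reconstruction rather than the exact code I ran: the function names are hypothetical, it uses the standard library's `random.lognormvariate` instead of NumPy, and it assumes each ball score is drawn from lognormal(skill, 0.45).

```python
import random

SKILL_SCATTER = 0.25    # skill distribution scatter, as in the main simulation
SCORE_SCATTER = 0.45    # assumed here to apply to each individual ball score

def simulate_game(rng, n_per_game, skip_rule, pool_size=100):
    """Duration of one game between the n_per_game most skilled of pool_size players."""
    skills = sorted(rng.lognormvariate(0, SKILL_SCATTER) for _ in range(pool_size))
    top = skills[-n_per_game:]      # keep only the most skilled players
    rng.shuffle(top)                # randomize playing order
    balls = [[rng.lognormvariate(s, SCORE_SCATTER) for _ in range(3)] for s in top]
    duration = sum(sum(b) for b in balls)
    best_other = max(sum(b) for b in balls[:-1])    # best total among all but the last player
    last = balls[-1]
    if skip_rule and last[0] + last[1] > best_other:
        duration -= last[2]         # last player has already won; ball 3 is not played
    return duration

def average_duration(n_per_game, skip_rule, n_games=20000, seed=42):
    rng = random.Random(seed)       # same seed gives the same games with and without the rule
    return sum(simulate_game(rng, n_per_game, skip_rule) for _ in range(n_games)) / n_games

for n in (4, 2):
    skipped = average_duration(n, True, n_games=2000)
    played = average_duration(n, False, n_games=2000)
    print(n, "players: reduction of", round(100 * (1 - skipped / played), 2), "%")
```

Because both runs use the same seed, the only difference between them is whether the final ball is counted, so the comparison is not blurred by sampling noise.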

Distributions

Adding together three lognormal distributions does not create another lognormal distribution. It shrinks the tails and makes the result look more like a normal distribution (a result of the central limit theorem). So keep in mind that the total scores for each player in this ‘simulation’ are effectively coming from a distribution with a smaller right tail than what I used in the main tournament simulation.
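One way to see the shrinking tail is to compare the skewness of a single lognormal sample with that of a sum of three. The sketch below is illustrative only: the helper names are hypothetical, and it uses a log-mean of 0 with the 0.45 scatter for simplicity rather than any particular player's skill.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: third central moment divided by the cube of the std deviation."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return sum((x - mean) ** 3 for x in xs) / (len(xs) * std ** 3)

def sample_skews(n=20000, sigma=0.45, seed=42):
    rng = random.Random(seed)
    single = [rng.lognormvariate(0, sigma) for _ in range(n)]
    triple = [sum(rng.lognormvariate(0, sigma) for _ in range(3)) for _ in range(n)]
    return skewness(single), skewness(triple)

skew_single, skew_triple = sample_skews()
print("single lognormal skewness:", round(skew_single, 2))
print("sum of three skewness:   ", round(skew_triple, 2))
```

A lower skewness for the sum means a relatively lighter right tail, which is the effect described above.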

As an example of what adding lognormal distributions does, here is a lognormal distribution (orange) and the result of three lognormals added together (blue), scaled so they have very similar means (20,000 data points for each histogram):

What kind of distribution best approximates single ball scores on a pinball machine is certainly debatable. It likely varies dramatically by game. So keep in mind these limitations. Even the assumption that game duration is proportional to score is almost certainly wrong, especially on modern games.

Results

Using 4 players per game, I simulated 20,000 games with and without the ‘skip the last ball’ rule in effect. The average game duration was 73.796 when the final ball was preferentially skipped, compared to an average of 74.089 when the final ball was always played. That is a reduction in duration of about 0.4%.

Using 2 players per game, I simulated 20,000 games to perform the same comparison. The average game duration was 39.532 when the final ball was preferentially skipped, compared to an average of 41.093 when the final ball was always played. That is a reduction in duration of about 3.8%.

The overall reduction in time is quite small. Why would this be? A confluence of many factors is needed to save time. The last player needs either exceptionally good scores on balls 1 or 2, or to be significantly more skilled than the other player(s). This is unlikely because I structured the simulation to match the N most skilled players against each other. In other words, I looked at only matches between the best players in the tournament. Additionally, I could have used a ball score distribution with a larger right tail than a lognormal distribution has.

Another limiting factor is that in a round of multiple matches, only the longest match affects round duration. Plenty of the time savings from 5- or 11-ball games (walk-off wins) therefore shorten an individual match without shortening the tournament round.

I don’t know if that was part of your simulation; it’s not clear to me what values you used for N or when N was varied.

Maybe I’m missing something, but is that what the data is showing? The logic behind the 2X value for 4P groups was that it takes longer, therefore should be worth more. In your “Avg Kendall Tau vs. Avg Duration” plot, you’re getting comparable Tau per Duration for a H2H as other formats.

If 6X H2H does just as good a job at sorting the players as the other formats and takes an equivalent amount of time, doesn’t that make the penalty unwarranted?

I agree with your view that a 4P match gets more data points than a single 2P match (it’s basically the equivalent of six 2P games at once), though I disagree that information scales linearly with the number of comparisons, because not every comparison provides equivalent information. For example, two equally skilled opponents provide more information than two very differently skilled players. You can actually compute the Fisher information of every comparison from your player skill values if you want to go down that rabbit hole (but I think Tau is plenty fine).
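As a rough illustration of that idea, here is a sketch under the thread's score model, where the log of a game score is normal with mean equal to skill and scatter 0.45. In that model the win probability depends only on the skill difference, and the Fisher information of a single win/loss result about that difference is (dp/dd)² / (p(1 − p)). The function names are made up for this example and it is not code from the simulation.

```python
from math import erf, exp, pi, sqrt

SCATTER = 0.45  # score scatter from the game score model

def win_probability(skill_a, skill_b, scatter=SCATTER):
    """P(A outscores B) when log-scores are normal(skill, scatter)."""
    # The difference of two log-scores has standard deviation scatter * sqrt(2).
    z = (skill_a - skill_b) / (scatter * sqrt(2))
    return 0.5 * (1 + erf(z / sqrt(2)))

def fisher_information(skill_a, skill_b, scatter=SCATTER):
    """Information one win/loss result carries about the skill difference."""
    s = scatter * sqrt(2)
    z = (skill_a - skill_b) / s
    p = win_probability(skill_a, skill_b, scatter)
    # Derivative of the win probability with respect to the skill difference.
    dp = exp(-0.5 * z * z) / (s * sqrt(2 * pi))
    return dp * dp / (p * (1 - p))

for a, b in [(1.0, 1.0), (1.0, 1.5), (0.5, 2.0)]:
    print(a, "vs", b,
          "win prob for first player:", round(win_probability(a, b), 3),
          "information:", round(fisher_information(a, b), 3))
```

The information is largest for evenly matched players and falls off as the skill gap grows, which is the sense in which lopsided pairings are "low-information".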

It is interesting that 2P head-to-head doesn’t take 6 times as many rounds to get equivalent Tau as 4P formats. I’d bet that some of the 4P formats produce a lot of “low-information” comparisons. Moreover, several strikes formats toss comparisons (e.g., when two players both avoid/get strikes). In contrast, for H2H every comparison matters, and assuming Swiss pairing, it may be more efficient at creating high-information pairings.

I was unclear. N here represents the number of players per game. I tested N=4 and N=2. I looked at 20,000 individual games, not in the context of any tournament. Very good point about many 5- or 11-ball games having no effect on tournament length! I glossed over this by assuming the top N players’ game was the only relevant one.

My assumptions regarding the overall system are:

  • Top players will prefer less chaotic formats (i.e., higher tau value) and ones with higher TGP value
  • Tournament organizers and players alike will prefer more time-efficient formats
  • The rules-makers want a diversity of formats

The rules-makers therefore have to provide reasonably balanced TGP-per-time across formats, or else everyone is incentivized to pick the ‘one efficient format’.

Perhaps Head-to-Head tournaments are viewed unfavorably by the rules-makers out of historical bias? Not sure about that one. They do seem to be punished unduly by the existing rules.

I didn’t mean to suggest that number of skill comparisons within a game ought to be the basis of anything. It’s just a narrow lens for looking at the problem. Good point about not every comparison being equally meaningful. As you say, many formats discard information, but it seemingly has little effect on the overall ability of the tournament to sort players. Fair Strikes, for example, seems just as good as Progressive Strikes, if not better.

As for the Fisher information stuff (and Bayesian probability in general), it’s definitely a rabbit hole I’d rather not go down unless necessary. My model is admittedly crude, but I think it has some comparative power. I’m certainly open to other player skill and game score distributions. Lognormal is just the first well-behaved all-positive distribution that popped into my head.

Well, no one picks H2H because it provides terrible returns on TGP. But as your sims show – they can do just as good, if not better, than 4P when it comes to accurately ranking players by skill. So it seems to me like you have some nice evidence that TGP is extremely wrong when it comes to valuing head-to-head strikes tournaments, which is very cool!

In what sense? Bad TGP per round, but it should be about the same TGP per duration, with 2-player games taking half as long as 4-player games.

Unless the sim’s duration estimates are off. Not sure how accurate the sim estimates are.

H2H strikes are pretty uncommon though, right? That was my understanding at least. I’ve been sort of out of the loop lately…
