teepeemm/bracket

Seeding Statistics of Elimination Tournaments

This repository analyzes elimination tournaments documented on Wikipedia, and is the source for
Prescott, Timothy. 2025. "Seeding Statistics of Elimination Tournaments." Scatterplot 2 (1). https://doi.org/10.1080/29932955.2025.2523666.
If you're not as programmatically inclined, you can use a GUI with JavaScript.

I've tried to be consistent in how I handle renaming and merging, but may not have entirely succeeded. It's also possible that a .txt file was downloaded under conditions slightly different from those now in tourneys.json.

From start to finish, python analyze.py takes roughly 30 minutes (a wild guess). It saves its results along the way, so you can interrupt the process and restart it later, for example if you get bored or if Pywikibot's throttling becomes too onerous.
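As a rough illustration of how saving results along the way makes a long run resumable, here is a minimal sketch. It is not the code in analyze.py: the helper name cached_step and the choice of JSON cache files are assumptions made only for this example.

```python
import json
from pathlib import Path


def cached_step(cache_path, compute):
    """Return the result stored at cache_path, computing and saving it if absent.

    Hypothetical helper: analyze.py may organize its saved results differently.
    """
    path = Path(cache_path)
    if path.exists():
        # A previous (possibly interrupted) run already finished this step.
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename it into place, so an
    # interrupt cannot leave a half-written cache file behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(result), encoding="utf-8")
    tmp.replace(path)
    return result
```

With this pattern, restarting after an interrupt only repeats the step that was in progress; every step whose file already exists is read back from disk.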

The documentation for the Python and JavaScript files can be generated by pdoc -o docs *.py and jsdoc -c jsdoc.conf.json, respectively.

There are a large number of files created along the way that are .gitignored:

  • {group}/{tournament}/{year}.txt: the content of that year's entry in Wikipedia
  • {group}/{tournament}/None.txt: same as above, but for tournaments whose years are all on one page
  • [group/[tournament/]]state.csv: each (normalized) university and its state
  • [group/[tournament/]]winloss.csv: the matrix counting how often each seed defeated each other seed
  • {group}/{tournament}/winlossplot.tex
  • {group}/{tournament}/winlossprobs.tex
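To show how a seed-versus-seed count matrix like the one in winloss.csv could be turned into upset rates by seed difference, here is a small sketch. The in-memory layout (wins[i][j] is the number of times seed i+1 defeated seed j+1), the function name, and the sample numbers are all assumptions for illustration; the actual CSV layout may differ, and the counts below are made up. It only computes empirical rates and does not fit the logistic model described in the paper.

```python
from collections import defaultdict


def upset_rate_by_seed_gap(wins):
    """Map seed gap -> fraction of games won by the worse (higher-numbered) seed.

    Assumed layout: wins[i][j] is the number of times seed i+1 defeated seed j+1.
    """
    upsets = defaultdict(int)
    games = defaultdict(int)
    n = len(wins)
    for better in range(n):
        for worse in range(better + 1, n):
            gap = worse - better
            games[gap] += wins[better][worse] + wins[worse][better]
            upsets[gap] += wins[worse][better]
    # Skip gaps with no recorded games to avoid dividing by zero.
    return {gap: upsets[gap] / games[gap] for gap in sorted(games) if games[gap]}


# Made-up counts for a four-seed bracket, for illustration only.
example = [
    [0, 6, 8, 9],
    [4, 0, 7, 8],
    [2, 3, 0, 6],
    [1, 2, 4, 0],
]
rates = upset_rate_by_seed_gap(example)
```

Pooling games by seed gap matches the idea of treating the upset probability as a function of the difference in seeding alone, rather than of the two seeds separately.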

Reseeding files:

  • [group/[tournament/]]reseed.csv: how much each university should be reseeded
  • [group/]reseed_filtered.csv: same as reseed, but trimmed so that points not plotted by TeX are omitted (TeX had trouble with the file size)
  • [group/]reseed_approx.csv: same as reseed, but a linear approximation of its components
  • [group/[tournament/]]state_reseed.csv: same as reseed, but grouped by state
  • [group/[tournament/]]tz_reseed.csv: same as reseed, but grouped by timezone

To cite the paper, you can use

@article{Prescott31122025,
    author = {Timothy Prescott},
    title = {Seeding Statistics of Elimination Tournaments},
    journal = {Scatterplot},
    volume = {2},
    number = {1},
    pages = {2523666},
    year = {2025},
    publisher = {Taylor \& Francis},
    doi = {10.1080/29932955.2025.2523666},
    URL = {https://doi.org/10.1080/29932955.2025.2523666},
    eprint = {https://doi.org/10.1080/29932955.2025.2523666},
    abstract = {We develop and provide Python code and a website
        to statistically analyze seedings in elimination tournaments.
        We are able to apply this code to fifty-eight thousand games
        to estimate the probability of an upset solely as a logistic
        function of the difference in seeding.
        We are also able to examine how well or poorly a team
        performs compared to its seeding.
        We conclude that the only team that is consistently
        underrated is \textbackslash your\_favorite\_team,
        while the only team that is consistently overrated is
        \textbackslash your\_hated\_rival.}
}

Non-BibTeX citation styles are also available at the Scatterplot website.

You can also use the link at the top right of the page to "Cite this repository".
