Understanding and Implementing the Hypergeometric Test in Python

I. Understanding the Hypergeometric Distribution

  • The population size, usually denoted N.
    In this case, the total number of marbles in the jar: 100.
  • The number of “successes” in the population, usually denoted K.
    In this case, the number of red marbles in the jar: 10.
  • The sample size, usually denoted n.
    In this case, the number of draws from the jar: 10.
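To make the three parameters concrete, here is a minimal sketch that tabulates the probability of drawing exactly k red marbles for the jar described above. It uses only the standard library and the textbook hypergeometric formula; the function name hypergeom_pmf and the variable k are illustrative choices, not part of any library.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k successes in n draws without replacement,
    from a population of size N that contains K successes."""
    # Outside the feasible range the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Jar from the example: 100 marbles, 10 of them red, 10 draws.
N, K, n = 100, 10, 10

pmf = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]
for k, p in enumerate(pmf):
    print(f"P(X = {k:2d}) = {p:.6g}")
```

The probabilities sum to 1, and the expected number of red marbles is n * K / N, which is 1 for this jar.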

II. The Hypergeometric Test

III. Implementing the Hypergeometric Test in Python

  • M is the population size (previously N)
  • n is the number of successes in the population (previously K)
  • N is the sample size (previously n)
  • x is still the number of drawn “successes”.
from scipy.stats import hypergeom
# sf(x-1) is P(X > x-1) = P(X >= x): the chance of drawing at least x successes
pval = hypergeom.sf(x-1, M, n, N)
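As a worked sketch of the call above, the snippet below plugs in the marble numbers and cross-checks the result against a direct sum of the tail probabilities. The observed count x = 3 is a made-up value for illustration, and the name exact_tail is hypothetical. The reason for x-1 is that the survival function sf(k) returns P(X > k), so sf(x-1) gives P(X >= x), the probability of a result at least as extreme as the one observed.

```python
from math import comb
from scipy.stats import hypergeom

# scipy's naming: M = population size, n = successes in population, N = sample size
M, n, N = 100, 10, 10
x = 3  # hypothetical observation: 3 red marbles among the 10 drawn

# One-sided p-value: probability of drawing at least x red marbles by chance
pval = hypergeom.sf(x - 1, M, n, N)

# Cross-check by summing the probability mass function over the upper tail
exact_tail = sum(
    comb(n, k) * comb(M - n, N - k) for k in range(x, min(n, N) + 1)
) / comb(M, N)

print(f"p-value from hypergeom.sf: {pval:.6g}")
print(f"p-value from direct sum:   {exact_tail:.6g}")
```

The two numbers should agree up to floating-point error. Passing x instead of x-1 would silently drop P(X = x) from the tail and understate the p-value.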

V. Appendix

  • The hypergeometric distribution is the lesser-known cousin of the binomial distribution, which describes the probability of k successes in n draws with replacement. The hypergeometric distribution describes probabilities of drawing marbles from the jar without putting them back in the jar after each draw.
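  • The hypergeometric probability mass function is given by (using the original variable convention, with k the number of “successes” drawn)

    $$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$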




conscious mammalian organism, fanatical tea snob.

Alex Lenail