Understanding and Implementing the Hypergeometric Test in Python

Whenever I can’t find a sufficiently good tutorial or explanation of a topic online, I feel compelled to write one. I hope this helps you.

I. Understanding the Hypergeometric Distribution

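The hypergeometric distribution describes the probability of drawing a given number of “successes” when sampling without replacement from a finite population, such as drawing marbles from a jar. It is defined by the following parameters:

  • The population size, usually denoted N.
    In this case, the total number of marbles in the jar.
  • The number of “successes” in the population, usually denoted K.
    In this case, the number of red marbles in the jar: 10.
  • The sample size, usually denoted n.
    In this case, the number of draws from the jar: 10.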

II. The Hypergeometric Test

Suppose we suspect that this is no regular jar: although the red marbles are in the minority, we anticipate drawing a disproportionate number of them.
We draw 10 marbles, of which 7 are red (X = 7), and we want to know how unlikely such a result is to occur by chance. In other words, we want the probability of drawing 7 or more red marbles, P(X ≥ 7).
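
To make that tail probability concrete, here is a minimal sketch that computes it from first principles using only the standard library. The total number of marbles is not given in this section, so N_TOTAL = 50 is a purely hypothetical value chosen for illustration, and the helper names hypergeom_pmf and hypergeom_tail are my own rather than part of any library.

```python
from math import comb  # requires Python 3.8+


def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k successes in n draws without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_tail(x, N, K, n):
    """P(X >= x): at least x successes in n draws without replacement."""
    upper = min(K, n)  # cannot draw more successes than exist, or than draws made
    return sum(hypergeom_pmf(k, N, K, n) for k in range(x, upper + 1))


N_TOTAL = 50     # ASSUMPTION: hypothetical jar size, not taken from the text
K_RED = 10       # red marbles in the jar
N_DRAWS = 10     # marbles drawn
X_OBSERVED = 7   # red marbles observed in the sample

p_value = hypergeom_tail(X_OBSERVED, N_TOTAL, K_RED, N_DRAWS)
print(f"P(X >= {X_OBSERVED}) = {p_value:.3e}")
```

The result depends heavily on the assumed jar size, so treat the printed number as an illustration of the calculation rather than an answer to the example above.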

III. Implementing the Hypergeometric Test in Python

Thanks to the great work of the open-source contributors over at scipy, implementing this test is no trouble at all. However, scipy names the parameters differently from the convention used above, which deserves an explanation:

  • M is the population size (previously N)
  • n is the number of successes in the population (previously K)
  • N is the sample size (previously n)
  • x is still the number of drawn “successes” (X above).
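
from scipy.stats import hypergeom
# x: drawn successes, M: population size, n: successes in population, N: sample size
pval = hypergeom.sf(x - 1, M, n, N)  # sf(x - 1) = P(X > x - 1) = P(X >= x)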
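
The x - 1 is easy to get wrong, so here is a small runnable sketch of the call. scipy's sf(k) is the survival function P(X > k), which is why passing x - 1 gives P(X ≥ x), the probability of a result at least as extreme as the one observed. The population size M = 50 is a hypothetical value assumed only for this example.

```python
from scipy.stats import hypergeom

M = 50  # ASSUMPTION: hypothetical population size (total marbles in the jar)
n = 10  # successes in the population (red marbles)
N = 10  # sample size (number of draws)
x = 7   # observed successes (red marbles drawn)

# One-sided p-value: probability of drawing x or more red marbles.
pval = hypergeom.sf(x - 1, M, n, N)

# Cross-check: the same tail, summed directly from the pmf.
tail_from_pmf = sum(hypergeom.pmf(k, M, n, N) for k in range(x, N + 1))

# Common off-by-one mistake: sf(x) is P(X > x), which leaves out X == x.
pval_strict = hypergeom.sf(x, M, n, N)

print(f"P(X >= {x}) via sf(x - 1): {pval:.3e}")
print(f"P(X >= {x}) via pmf sum:   {tail_from_pmf:.3e}")
print(f"P(X >  {x}) via sf(x):     {pval_strict:.3e}")
```

The first two printed values should agree up to floating-point error, while the third is smaller because it omits the observed outcome itself.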

IV. Appendix

  • The hypergeometric distribution is the lesser-known cousin of the binomial distribution, which describes the probability of k successes in n draws with replacement. The hypergeometric distribution describes probabilities of drawing marbles from the jar without putting them back in the jar after each draw.
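  • The hypergeometric probability mass function, using the original variable convention (N for the population size, K for the number of successes in the population, n for the sample size, and k for the number of drawn successes), is given by

    $$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$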
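
The difference between drawing with and without replacement is easier to see with numbers. The sketch below compares the binomial and hypergeometric probabilities of drawing at least 7 red marbles in 10 draws, using only the standard library. The jar size of 50 is a hypothetical assumption for illustration, and the function names are my own.

```python
from math import comb  # requires Python 3.8+


def binomial_pmf(k, n, p):
    """P(X = k) in n draws WITH replacement, success probability p per draw."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k, N, K, n):
    """P(X = k) in n draws WITHOUT replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


N_TOTAL = 50  # ASSUMPTION: hypothetical jar size, not taken from the text
K_RED = 10    # red marbles in the jar
N_DRAWS = 10  # marbles drawn
X_MIN = 7     # threshold: at least this many red marbles

p_red = K_RED / N_TOTAL  # chance of red on every draw when marbles are put back

binom_tail = sum(binomial_pmf(k, N_DRAWS, p_red) for k in range(X_MIN, N_DRAWS + 1))
hyper_tail = sum(
    hypergeom_pmf(k, N_TOTAL, K_RED, N_DRAWS) for k in range(X_MIN, N_DRAWS + 1)
)

print(f"with replacement (binomial):          {binom_tail:.3e}")
print(f"without replacement (hypergeometric): {hyper_tail:.3e}")
```

With these assumed numbers the hypergeometric tail comes out smaller than the binomial one: every red marble removed from the jar makes the next red draw less likely, so extreme counts are rarer without replacement.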
