# Understanding and Implementing the Hypergeometric Test in Python

Whenever I come across a topic that lacks a sufficiently good tutorial or explanation online, I feel compelled to write one. I hope this one helps you.

## I. Understanding the Hypergeometric Distribution

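The hypergeometric distribution describes the probability of drawing a given number of "successes" when sampling without replacement from a finite population. Picture a jar of marbles, some of them red, from which we draw a handful without putting any back. The distribution is defined by three parameters:

• The population size, usually denoted N.
In this case, the total number of marbles in the jar.
• The number of "successes" in the population, usually denoted K.
In this case, the number of red marbles in the jar: 10.
• The sample size, usually denoted n.
In this case, the number of draws from the jar: 10.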
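To make the jar scenario concrete, here is a small simulation sketch that repeats the draw many times without replacement and tallies how many red marbles each draw contains. The jar's total size is not stated here, so `N_TOTAL = 100` is an assumed value; `K_RED = 10` and `N_DRAWS = 10` follow the text, and `simulate_red_counts` is a hypothetical helper name.

```python
import random
from collections import Counter

# Assumption: the jar holds 100 marbles in total (not stated in the text).
N_TOTAL = 100
K_RED = 10    # red marbles in the jar
N_DRAWS = 10  # marbles drawn per trial


def simulate_red_counts(trials: int, seed: int = 0) -> Counter:
    """Repeat the draw `trials` times and count how many red marbles each draw held."""
    rng = random.Random(seed)
    jar = ["red"] * K_RED + ["other"] * (N_TOTAL - K_RED)
    counts: Counter = Counter()
    for _ in range(trials):
        draw = rng.sample(jar, N_DRAWS)  # sample() draws without replacement
        counts[draw.count("red")] += 1
    return counts


trials = 20_000
counts = simulate_red_counts(trials)
for reds in sorted(counts):
    # The relative frequency approximates P(X = reds)
    print(f"{reds} red: {counts[reds] / trials:.4f}")
```

The printed frequencies approximate the hypergeometric distribution for these parameters; more trials give a closer approximation.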

## II. The Hypergeometric Test

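Suppose we suspect that this is no ordinary jar: even though red marbles are in the minority, we expect to draw a disproportionate number of them.
We draw 10 marbles, of which 7 are red (X = 7), and we want to know how unlikely such a result is to occur by chance. The hypergeometric test answers this with a p-value, P(X ≥ 7): the probability, under the hypergeometric distribution, of drawing at least 7 red marbles if the draws really are random.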
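The p-value is the sum of the hypergeometric probabilities of every outcome at least as extreme as the one observed, here 7, 8, 9 or 10 red marbles. The sketch below computes it exactly with `math.comb` and `fractions.Fraction`. It assumes a jar of 100 marbles (a hypothetical value), and `hypergeom_pmf` and `upper_tail_pvalue` are illustrative helper names, not library functions.

```python
from fractions import Fraction
from math import comb

# Assumption: the jar holds N = 100 marbles in total. The text gives only
# K = 10 red marbles, n = 10 draws and X = 7 observed red marbles.
N, K, n, x = 100, 10, 10, 7


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """Exact P(X = k) for k successes in n draws without replacement."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


def upper_tail_pvalue(x: int, N: int, K: int, n: int) -> Fraction:
    """P(X >= x): the chance of drawing at least x successes at random."""
    # X can never exceed min(K, n), so the sum stops there
    return sum(
        (hypergeom_pmf(k, N, K, n) for k in range(x, min(K, n) + 1)),
        Fraction(0),
    )


p = upper_tail_pvalue(x, N, K, n)
print(float(p))
```

Under this assumed jar size the p-value comes out slightly below one in a million, so 7 red marbles in 10 draws would be very surprising for an ordinary jar. A different total number of marbles would give a different p-value.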

## III. Implementing the Hypergeometric Test in Python

Thanks to the great work of the open-source contributors behind scipy, implementing this test takes a single line. However, scipy's `hypergeom` names its parameters differently from the convention above, which deserves an explanation:

• M is the population size, the total number of marbles in the jar (previously N)
• n is the number of successes in the population (previously K)
• N is the sample size (previously n)
• x is still the number of drawn "successes" (X above).

```python
from scipy.stats import hypergeom

# sf(k) = P(X > k), so sf(x - 1) = P(X >= x): the chance of x or more red marbles
pval = hypergeom.sf(x - 1, M, n, N)
```
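The one-liner above assumes `x`, `M`, `n` and `N` are already defined. A self-contained version for the jar example might look like this, again assuming a hypothetical jar of `M = 100` marbles. It also compares `sf` with an explicit sum of `hypergeom.pmf` values to show what `sf(x - 1)` computes.

```python
from scipy.stats import hypergeom

# Assumption: M = 100 is a hypothetical jar size; the text does not state it.
M = 100  # total marbles in the jar (population size)
n = 10   # red marbles in the jar ("successes" in the population)
N = 10   # marbles drawn (sample size)
x = 7    # red marbles observed in the draw

# sf(k) returns P(X > k), so sf(x - 1) gives P(X >= x)
pval = hypergeom.sf(x - 1, M, n, N)

# The same upper tail, written as an explicit sum of point probabilities
pval_sum = sum(hypergeom.pmf(k, M, n, N) for k in range(x, min(n, N) + 1))

print(pval, pval_sum)
```

A common slip is to call `hypergeom.sf(x, M, n, N)`, which returns P(X > x) and silently leaves out the observed outcome itself.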

## V. Appendix

• The hypergeometric distribution is the lesser-known cousin of the binomial distribution, which describes the probability of k successes in n draws with replacement. The hypergeometric distribution instead describes draws without replacement: marbles are not put back in the jar after each draw.
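• The hypergeometric probability mass function is given by (using the original variable convention, with N the population size, K the number of successes in the population and n the sample size):

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$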
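To see how drawing without replacement differs from drawing with replacement, this sketch implements the hypergeometric PMF directly from binomial coefficients, P(X = k) = C(K, k)·C(N − K, n − k) / C(N, n), next to the binomial PMF with success probability K / N. The population size `N = 100` is an assumption for illustration, and the function names are hypothetical.

```python
from math import comb

# Assumption: N = 100 is a hypothetical population size; K = 10 red marbles
# and n = 10 draws follow the jar example.
N, K, n = 100, 10, 10


def hypergeom_pmf(k: int) -> float:
    """P(X = k) when drawing n marbles WITHOUT replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k: int) -> float:
    """P(X = k) when drawing n marbles WITH replacement (success prob K / N)."""
    p = K / N
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Each PMF sums to 1 over k = 0..n
hyper_total = sum(hypergeom_pmf(k) for k in range(n + 1))
binom_total = sum(binom_pmf(k) for k in range(n + 1))

# Upper tails P(X >= 7) under each model
hyper_tail = sum(hypergeom_pmf(k) for k in range(7, n + 1))
binom_tail = sum(binom_pmf(k) for k in range(7, n + 1))

print(f"without replacement: {hyper_tail:.3e}")
print(f"with replacement:    {binom_tail:.3e}")
```

Under these assumed numbers the upper tail is smaller without replacement: each red marble drawn leaves fewer red marbles in the jar, which makes extreme counts less likely than when every draw sees the same proportion K / N.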

Alex Lenail: conscious mammalian organism, fanatical tea snob.