# Understanding and Implementing the Hypergeometric Test in Python

## I. Understanding the Hypergeometric Distribution

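The hypergeometric distribution describes the probability of drawing a given number of “successes” when sampling without replacement from a finite population. Take a jar of marbles as the running example. Three parameters define the scenario: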

• The population size, usually denoted N.
In this case, the total number of marbles in the jar: 100.
• The number of “successes” in the population, usually denoted K.
In this case, the number of red marbles in the jar: 10.
• The sample size, usually denoted n.
In this case, the number of draws from the jar: 10.
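
To make these three parameters concrete, here is a minimal sketch that tabulates the probability of drawing exactly k red marbles from this jar. It relies only on the standard library and on the textbook closed form for the hypergeometric probability mass function; the helper name `hypergeom_pmf` is an illustrative choice, not part of any library.

```python
from math import comb

N = 100  # population size: marbles in the jar
K = 10   # successes in the population: red marbles
n = 10   # sample size: number of draws


def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k successes in n draws without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Probability of every possible number of red marbles in the sample.
pmf = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]

for k, p in enumerate(pmf):
    print(f"P(X = {k:2d}) = {p:.3e}")

# Sanity checks: the probabilities sum to 1 and the mean equals n * K / N.
total = sum(pmf)
mean = sum(k * p for k, p in enumerate(pmf))
print(f"sum = {total:.6f}, mean = {mean:.6f}")
```

On average we would expect a single red marble per 10 draws (n · K / N = 1), which is the baseline any surprising result gets compared against.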

## II. The Hypergeometric Test

Suppose we suspect that this is no regular jar: even though red marbles are in the minority, we expect to draw a disproportionate number of them.
We draw 10 marbles, of which 7 are red (X = 7), and we want to know how likely a result at least this extreme would be by chance alone.
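
Stated as a formula, the quantity of interest is the upper-tail probability of the hypergeometric distribution, i.e. the p-value of a one-sided test. Plugging the jar's numbers (N = 100, K = 10, n = 10) into the standard hypergeometric probability mass function gives:

$$
p = P(X \ge 7) = \sum_{k=7}^{10} \frac{\binom{10}{k}\binom{90}{10-k}}{\binom{100}{10}}
$$

The sum stops at 10 because the sample cannot contain more red marbles than the number of draws. Evaluating it yields a value a little under one in a million, so 7 red marbles would be a very surprising draw from a regular jar.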

## III. Implementing the Hypergeometric Test in Python

Thanks to the great work of the open-source contributors over at SciPy, implementing this test is no trouble at all, but the function's argument names deserve an explanation, because they differ from the convention used above:

• M is the population size (previously N)
• n is the number of successes in the population (previously K)
• N is the sample size (previously n)
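• x is the number of drawn “successes” (previously X).

`from scipy.stats import hypergeom`

`pval = hypergeom.sf(x-1, M, n, N)`

Because `sf(k)` is the survival function P(X > k), passing `x-1` yields P(X ≥ x), the probability of drawing at least as many successes as observed.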
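
For reference, here is the same call as a small runnable sketch with the jar's numbers filled in. It assumes SciPy is installed; the cross-check against a direct sum of `hypergeom.pmf` is an addition for illustration and not something the test itself requires.

```python
from scipy.stats import hypergeom

M = 100  # population size (N in the earlier notation)
n = 10   # successes in the population (K in the earlier notation)
N = 10   # sample size (n in the earlier notation)
x = 7    # observed number of successes in the sample

# sf(k) is the survival function P(X > k), so x - 1 gives P(X >= x).
pval = hypergeom.sf(x - 1, M, n, N)

# Cross-check: sum the probability mass function over x, ..., N directly.
pval_check = sum(hypergeom.pmf(k, M, n, N) for k in range(x, N + 1))

print(f"p-value via sf:  {pval:.3e}")
print(f"p-value via pmf: {pval_check:.3e}")
```

Both numbers should agree up to floating-point error. Passing `x` instead of `x - 1` would silently drop the observed outcome from the tail and understate the p-value.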

## V. Appendix

• The hypergeometric distribution is the lesser-known cousin of the binomial distribution, which describes the probability of k successes in n draws with replacement. The hypergeometric distribution describes probabilities of drawing marbles from the jar without putting them back in the jar after each draw.
• The hypergeometric probability mass function is given by (using the original variable convention N, K, n):

$$
P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}
$$
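
To see how much the "with replacement" versus "without replacement" distinction matters for the jar example, the sketch below computes the same tail probability under both distributions. It uses only the standard library; the function names are illustrative, and the binomial success probability K / N assumes each marble is returned to the jar before the next draw.

```python
from math import comb

N, K, n = 100, 10, 10  # jar example: population, red marbles, draws
x = 7                  # observed number of red marbles


def hypergeom_pmf(k):
    # Without replacement: the jar's composition changes after each draw.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k):
    # With replacement: every draw succeeds with the same probability K / N.
    p = K / N
    return comb(n, k) * p**k * (1 - p) ** (n - k)


tail_hyper = sum(hypergeom_pmf(k) for k in range(x, n + 1))
tail_binom = sum(binom_pmf(k) for k in range(x, n + 1))

print(f"P(X >= {x}) without replacement: {tail_hyper:.3e}")
print(f"P(X >= {x}) with replacement:    {tail_binom:.3e}")
```

For this jar the binomial tail comes out larger, since sampling with replacement never depletes the supply of red marbles. Using the binomial here would therefore make the observed draw look less surprising than it is.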

## More from Alex Lenail

conscious mammalian organism, fanatical tea snob.
