Deriving twelve probability distributions

May 15, 2023

In my studies I have commonly encountered only five discrete and seven continuous distributions.

Uniform (discrete and continuous)

There is little to be said of the uniform distributions, except that they are the most basic. In fact, the uniform distribution forms the basis of the scenarios from which the following distributions are derived.

Normal

The Normal distribution is forced by two requirements.

  1. Its joint distribution $P_{X_1,X_2}(x_1,x_2)$ sees independent $X_1,X_2$.
  2. $P_{X_1,X_2}$ is rotationally symmetric.

Of note is the integral

$$\int_{-\infty}^\infty e^{-x^2/2}\,dx,$$

which via transformation into polar coordinates equals $\sqrt{2\pi}$, thus giving the normal distribution its constant.
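As a sanity check, here is a quick numerical sketch (Python with numpy, an arbitrary choice of tool) recovering the same constant:

```python
import numpy as np

# Numerical check: the Gaussian integral over the whole real line is sqrt(2*pi).
dx = 1e-4
x = np.arange(-10, 10, dx)                 # tails beyond |x| = 10 are negligible
integral = np.sum(np.exp(-x**2 / 2)) * dx  # simple Riemann sum

print(integral, np.sqrt(2 * np.pi))        # both ~2.5066
```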

Geometric, Exponential, Gamma, and Poisson

Consider this scenario:

A bakery sees on average $\lambda<1$ customers per minute, whose arrival times are distributed uniformly.

Questions on this scenario lead to the four distributions.

Geometric: One way this scenario could be fulfilled is by the following underlying mechanic: with probability $\lambda$, one customer arrives in each minute. Otherwise, no customer arrives in that minute. In which minute will the next customer arrive?

We take the PMF $P(x)$ as the probability that the next customer arrives in the $x$-th minute. It must be, then, that for the $x-1$ minutes prior, no customer arrives:

$$P(x)=(1-\lambda)^{x-1}\lambda.$$

It is self-evident that the mean should be $1/\lambda$. The variance is not as easy to compute, but is $(1-\lambda)/\lambda^2$, obtained by splitting $E[X^2]=E[X(X-1)]+E[X]$.
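As a check, a minimal simulation of the coin-per-minute mechanic (with an arbitrary $\lambda=0.3$) bears out both moments:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.3  # arbitrary per-minute arrival probability

# Flip a lam-biased coin for each of 200 minutes, 100,000 times over,
# and record the (1-based) index of the first success in each run.
flips = rng.random((100_000, 200)) < lam   # 200 minutes is ample for lam = 0.3
samples = flips.argmax(axis=1) + 1         # index of the first True per row

print(samples.mean(), 1 / lam)             # both ~3.33
print(samples.var(), (1 - lam) / lam**2)   # both ~7.78
```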

Exponential: Dropping the earlier geometric mechanic, how many minutes will pass before the next customer arrives?

The Exponential distribution is simply the continuous analogue of the geometric. Should we divide a minute into $n$ moments, we would expect $\lambda/n$ customers per moment; equivalently, for large enough $n$, each moment sees a customer with probability $\lambda/n$. For $x$ minutes to pass before the next customer arrives, then, we must see $xn$ moments with no customer and a final moment with a customer:

$$P(x)\approx \lim_{n\to\infty}(1-\lambda/n)^{xn}(\lambda/n)\propto e^{-\lambda x},$$

recalling that $e^x=\lim_{n\to\infty}(1+x/n)^n$.

Indeed, $P(x)$ must integrate to $1$, and we have

$$\int_0^\infty e^{-\lambda x}\,dx=\Big[-e^{-\lambda x}/\lambda\Big]_0^\infty=1/\lambda,$$

so $P(x)$ must equal $\lambda e^{-\lambda x}$.

As the continuous analogue of the geometric, the mean should remain the same, at $1/\lambda$. The variance is computed by a double integration by parts, and is $1/\lambda^2$.
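The moments-per-minute construction can also be simulated directly; the sketch below (arbitrary $\lambda=0.5$ and $n=1000$ moments per minute) recovers both moments:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 0.5, 1000  # arbitrary rate; n moments per minute

# Geometric waiting time in moments, each succeeding with probability lam/n,
# converted back to minutes: for large n this approximates Exponential(lam).
moments = rng.geometric(lam / n, size=100_000)
minutes = moments / n

print(minutes.mean(), 1 / lam)    # both ~2.0
print(minutes.var(), 1 / lam**2)  # both ~4.0
```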

Gamma: How many minutes will pass before the next $n$ customers arrive?

Immediately, this is simply the sum of $n$ exponential distributions, and, by its name, must invoke the Gamma function. To intuit the Gamma function, consider how it must extend the factorial function with an integral. Certainly, this integral must be evaluated by cascading integration by parts:

$$\Gamma(n+1)=\int_0^\infty x^n e^{-x}\,dx.$$

Perhaps the $-x$ in the exponent is unintuitive. But consider this: all the trash terms in the integration by parts must vanish, leaving only the factorial term. $x^n$ vanishes at the $0$ limit, so the $e$ term must vanish at the $\infty$ limit. Hence, the exponent must be negative.

Alternatively, consider the weak Stirling's approximation derived earlier:

$$\Gamma(n+1)=n!\approx (n/e)^n.$$

Given that $\Gamma$ must be an integral, its integrand must then be similar to the derivative of $(n/e)^n$. Though weakly linked, there is indeed some similarity in form between $x^n e^{-x}\,dx$ and $(n/e)^n$.
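A quick numerical check that this integral does extend the factorial (the step size and the cutoff at $x=60$ are arbitrary choices):

```python
import numpy as np
from math import factorial

# Check Gamma(n+1) = n! by evaluating the integral of x^n * e^(-x)
# with a plain Riemann sum.
dx = 1e-3
x = np.arange(dx, 60, dx)  # the integrand is negligible past x = 60 here

for n in range(6):
    gamma_n_plus_1 = np.sum(x**n * np.exp(-x)) * dx
    print(n, gamma_n_plus_1, factorial(n))  # e.g. n = 5: ~120.0 vs 120
```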

To intuit the PDF $P_G(x)$ of the Gamma distribution, we first take the sum of two exponential RVs, each with PDF $P(x)$:

$$\begin{aligned}
P_G(x)=\mathbb{P}(\mathrm{Gamma}(n{=}2,\lambda)=x)
&\propto\int_{i=0}^x P(i)\,P(x-i)\,di\\
&=\int_{i=0}^x \lambda^2 e^{-\lambda i}e^{-\lambda(x-i)}\,di\\
&=\lambda^2\Big[e^{-\lambda x}\,i\Big]_{i=0}^x\\
&=\lambda^2 e^{-\lambda x}\,x\\
&=\lambda^n x^{n-1}e^{-\lambda x},
\end{aligned}$$

of which the final line may be inferred from the general form on the line prior. It remains to scale this PDF to integrate to unity. To do this, of course, we invoke the $\Gamma$ function, with a clever substitution $y=\lambda x$:

$$\begin{aligned}
P_G(x)&=\lambda^n (y/\lambda)^{n-1}e^{-y}=y^n e^{-y}(1/x),\\
\int_{x=0}^\infty P_G(x)\,dx&=\int_{x=0}^\infty y^n e^{-y}(1/x)\,dx\\
&=\int_{y=0}^\infty y^n e^{-y}(1/x)(1/\lambda)\,dy\\
&=\int_{y=0}^\infty y^{n-1}e^{-y}\,dy\\
&=\Gamma(n),
\end{aligned}$$

using the facts that $y=\lambda x$ gives $dx=(1/\lambda)\,dy$, and that $(1/x)(1/\lambda)=1/y$.

And thus we scale the proportional PDF earlier by $1/\Gamma(n)$:

$$P_G(x)=\lambda^n x^{n-1}e^{-\lambda x}/\Gamma(n).$$
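As a check on the scaling argument, the sketch below integrates the unnormalized density numerically and compares against $\Gamma(n)$ (arbitrary $\lambda=2$, $n=4$):

```python
import numpy as np
from math import gamma

lam, n = 2.0, 4  # arbitrary parameters
dx = 1e-4
x = np.arange(dx, 40, dx)  # the density is negligible past x = 40 here

# The unnormalized density integrates to Gamma(n), so dividing by
# Gamma(n) yields a proper PDF.
unnormalized = lam**n * x**(n - 1) * np.exp(-lam * x)
print(np.sum(unnormalized) * dx, gamma(n))  # both ~6.0 (= 3!)
```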

In other literature, you may see the variable $n$ replaced with $\alpha$ and $\lambda$ replaced with $\beta$.

As the Gamma distribution is the sum of i.i.d. exponential distributions, its mean is necessarily $n/\lambda$ and its variance $n/\lambda^2$.
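This is easily simulated: summing $n$ independent exponential waiting times (arbitrary $\lambda=0.5$, $n=3$) recovers both moments:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 0.5, 3  # arbitrary parameters

# Each Gamma(n, lam) sample is the sum of n independent Exponential(lam)
# waiting times (numpy's exponential takes scale = 1/lam).
samples = rng.exponential(scale=1 / lam, size=(100_000, n)).sum(axis=1)

print(samples.mean(), n / lam)    # both ~6.0
print(samples.var(), n / lam**2)  # both ~12.0
```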

Poisson: Suppose now that $\lambda \geq 1$, for ease of notation. How many customers will arrive in a minute?

Again, the approach of taking the number of moments $n\to\infty$ is fruitful. With $n$ moments per minute and $n$ large, each moment sees a customer with probability $\lambda/n$. So for $x$ customers to arrive in the minute, $x$ of those $n$ moments must have a customer, and the rest must not:

$$\begin{aligned}
P(x)&=\lim_{n\to\infty}{n\choose x}(\lambda/n)^x(1-\lambda/n)^{n-x}\\
&=\lim_{n\to\infty}\frac{n!}{x!(n-x)!}(\lambda/n)^x(1-\lambda/n)^{-x}e^{-\lambda}\\
&=e^{-\lambda}\lim_{n\to\infty}\frac{n!}{x!(n-x)!}\left(\frac{\lambda}{n}\cdot\frac{n}{n-\lambda}\right)^x\\
&=e^{-\lambda}\frac{\lambda^x}{x!}\lim_{n\to\infty}\frac{n(n-1)\cdots(n-x+1)}{(n-\lambda)^x}\\
&=\lambda^x e^{-\lambda}/x!.
\end{aligned}$$

By definition, the mean is $\lambda$. The variance, perhaps surprisingly, is also $\lambda$. The intuition lies in the binomial distribution: when $n$ intervals are used, the mean is $n(\lambda/n)=\lambda$ and the variance is $n(\lambda/n)(1-\lambda/n)$; the factor $1-\lambda/n$ tends to unity as $n$ goes to $\infty$, and so the variance is the same as the mean, at $\lambda$.

Notably, the Poisson distribution is also the limit of the $\mathrm{Binomial}(n,\lambda/n)$ distribution taken $n\to\infty$.
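A small numerical sketch of that limit (arbitrary $\lambda=3$ and $x=2$): the $\mathrm{Binomial}(n,\lambda/n)$ PMF at $x$ approaches the Poisson PMF as $n$ grows.

```python
from math import comb, exp, factorial

lam, x = 3.0, 2  # arbitrary rate and customer count

# Binomial(n, lam/n) PMF at x, for growing n.
for n in (10, 100, 10_000):
    p = lam / n
    print(n, comb(n, x) * p**x * (1 - p)**(n - x))

# Poisson(lam) PMF at x, the n -> infinity limit: ~0.2240.
print("poisson:", lam**x * exp(-lam) / factorial(x))
```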

Bernoulli, Binomial, and Beta

There is little to be said of the Bernoulli and Binomial: they are too common, and easily derived. The mean of the Binomial is $np$, and the variance $np(1-p)$, which is useful in gaining intuition for the Poisson distribution.

To derive the Beta distribution, consider the following scenario.

A possibly unfair coin is flipped $14$ times, and it comes up heads $10$ times and tails $4$ times. Suppose if you guess the bias of