Derivation and Intuition behind the Poisson Distribution

Cutting out all the bs, let's jump into deriving the Poisson distribution in an intuitive way using a simple example, applying the Bernoulli distribution, the Binomial theorem, L'Hôpital's rule and Taylor series. A classic approach is to consider the probability of observing a given number of rare events in a fixed interval of time or space, where the events occur independently and at a constant average rate.

Example: Modeling Rare Events with Subintervals

Suppose we have an interval of 1 hour, and we want to model the number of cars that arrive on a quiet road during this hour. Let's divide this hour into $n$ very small subintervals. In each subinterval, a car can either arrive or not arrive.

1. Total number of subintervals:

We divide the hour into $n$ subintervals. Each subinterval is very small (think of it as a few seconds or less), so the probability of a car arriving in any given subinterval is small.

2. Bernoulli trial in each subinterval:

In each small subinterval, a car either arrives (success) or does not (failure); we assume the subintervals are short enough that at most one car can arrive in each. Each subinterval is therefore a Bernoulli trial, where:

The probability that a car arrives in any subinterval is $p$. Here, $p$ is very small because the intervals are tiny.
The probability that no car arrives in that subinterval is $1 - p$.
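
To make the Bernoulli-trial picture concrete, here is a minimal simulation sketch. The values of `n` and `p` and the helper name `simulate_hour` are assumptions chosen purely for illustration; they are not taken from the example above.

```python
import random

def simulate_hour(n, p, rng):
    """Simulate one hour as n independent Bernoulli trials.

    Returns a list of 0/1 outcomes, one per subinterval
    (1 = a car arrived, 0 = no car arrived).
    """
    return [1 if rng.random() < p else 0 for _ in range(n)]

# Illustrative values only (assumed, not from the text):
# one subinterval per second and a small arrival probability in each.
n = 3600
p = 0.001

rng = random.Random(42)  # fixed seed so the run is repeatable
outcomes = simulate_hour(n, p, rng)
total_cars = sum(outcomes)  # number of subintervals in which a car arrived
print(f"{total_cars} cars arrived across {n} subintervals")
```

Most entries of `outcomes` are 0, which is exactly the "rare event" behaviour described above: success in any single subinterval is unlikely, but there are many subintervals.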

To restate the setup: we are counting how many times an event (here, a car arriving) occurs in an interval of time, say 1 hour, and we have broken this 1-hour interval into $n$ very small subintervals, each of length $\frac{1}{n}$ hours.

Binomial Distribution for Total Arrivals:

Now, the total number of cars arriving in the full hour is the sum of the outcomes of all these small Bernoulli trials. Since there are $n$ independent subintervals, each with the same arrival probability $p$, the total number of cars in the entire hour follows a Binomial distribution:

$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} $$

where:

$X$ is the total number of cars arriving in the full hour.
$k$ is the number of arrivals (successes).
$p$ is the probability that a car arrives in a single subinterval.
$1 - p$ is the probability that no car arrives in a single subinterval.
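
As a quick numerical check of the Binomial formula, the sketch below evaluates $P(X = k)$ directly. The values of `n` and `p` and the function name `binomial_pmf` are illustrative assumptions, not part of the original example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative values only (assumed): 3600 one-second subintervals,
# each with a small probability of a car arriving.
n = 3600
p = 0.001

# Probability of seeing exactly k cars in the hour, for small k.
for k in range(6):
    print(f"P(X = {k}) = {binomial_pmf(k, n, p):.4f}")

# Sum over k = 0..50 only: for these values the remaining terms are
# vanishingly small, and very large k would overflow float conversion.
total = sum(binomial_pmf(k, n, p) for k in range(51))
print(f"Sum of P(X = k) for k = 0..50: {total:.6f}")
```

With many trials and a small $p$, almost all of the probability mass sits on small values of $k$, which is the regime the rest of the derivation focuses on.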

Poisson Approximation Setup:

We want to move towards the Poisson process, which models the number of rare events in a continuous time interval. To do this, we'll let: