The Central Limit Theorem tells us that if we repeatedly draw samples of size n from a population, compute each sample's mean, and plot the frequencies of those means, we get an approximately normal distribution. As the sample size n grows toward infinity, the approximation gets better and better. The Central Limit Theorem can be applied to essentially any population distribution (with finite variance).

Reference video:

Central limit theorem (video) | Khan Academy

Explain the Theorem Like I’m Five

Let’s say you are studying the population of beer drinkers in the US. You’d like to understand the mean age of those people but you don’t have time to survey the entire US population.

Instead of surveying the whole population, you collect one sample of 100 beer drinkers in the US. With this data, you can calculate an arithmetic mean. Maybe for this sample the mean age is 35 years old. Say you collect another sample of 100 beer drinkers. For that new sample, the mean age is 39 years old. As you collect more and more means of these samples of 100 beer drinkers, you build what is called a sampling distribution. The sampling distribution is the distribution of the sample means. In this example, 35 and 39 would be two observations in that sampling distribution.

The statement of the theorem says that the sampling distribution, the distribution of the sample means you collected, will approximately take the shape of a bell curve around the population mean. This shape is also known as a normal distribution. Don't get the statement wrong: the CLT is not saying that any population has a normal distribution. It says the sampling distribution does.

As your samples get bigger, the sampling distribution will look more and more like a normal distribution. The theorem holds for any population, regardless of its distribution.
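The claim is easy to check with a quick simulation. The sketch below (plain-Python; the exponential population with mean 1.0 is an assumption chosen just because it is strongly non-normal) builds a sampling distribution of means and shows it centers on the population mean with spread shrinking like σ/√n:

```python
import random
import statistics

random.seed(42)

def sampling_distribution(sample_size, num_samples):
    """Draw num_samples independent samples from an exponential
    population (mean 1.0, sd 1.0) and return each sample's mean."""
    return [
        statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

means = sampling_distribution(sample_size=50, num_samples=2000)

# The sampling distribution centers on the population mean (1.0) ...
center = statistics.mean(means)
# ... and its spread shrinks like sigma / sqrt(n) = 1 / sqrt(50) ~ 0.141.
spread = statistics.stdev(means)
print(round(center, 2), round(spread, 2))
```

A histogram of `means` would show the bell shape even though the underlying exponential population is heavily skewed.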

Why is it important?

The Central Limit Theorem is at the core of what every data scientist does daily: make statistical inferences about data.

By knowing that our sample mean sits somewhere in an approximately normal sampling distribution, we know that about 68 percent of sample means lie within one standard error of the population mean, about 95 percent lie within two standard errors, and so on.
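That 68/95 rule can be seen directly in simulation. A minimal sketch (again assuming an exponential population with mean 1.0, where the standard error of the mean is 1/√n): count how many simulated sample means land within one and two standard errors of the population mean.

```python
import random
import statistics

random.seed(0)

POP_MEAN = 1.0        # exponential(1) population mean
N = 50                # size of each sample
SE = 1.0 / N ** 0.5   # standard error of the mean: sigma / sqrt(n)

means = [
    statistics.mean(random.expovariate(1.0) for _ in range(N))
    for _ in range(5000)
]

# Fraction of sample means within 1 and 2 standard errors of the mean.
within_1 = sum(abs(m - POP_MEAN) <= SE for m in means) / len(means)
within_2 = sum(abs(m - POP_MEAN) <= 2 * SE for m in means) / len(means)
print(round(within_1, 2), round(within_2, 2))  # roughly 0.68 and 0.95
```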

The CLT is not limited to making inferences from a sample about a population.

There are four kinds of inferences we can make based on the CLT:

  1. We have the information of a valid sample. We can make accurate assumptions about its population.
  2. We have the information of the population. We can make accurate assumptions about a valid sample from that population.
  3. We have the information of a population and a valid sample. We can accurately infer whether the sample was drawn from that population.
  4. We have information about two different valid samples. We can accurately infer whether the two samples were drawn from the same population.
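Inference 3, for instance, is often operationalized with a z-score: standardize the sample mean by the standard error and see how many standard errors it sits from the claimed population mean. A minimal sketch (the ages and population parameters below are hypothetical, made up for illustration):

```python
import statistics

def z_score(sample, pop_mean, pop_sd):
    """How many standard errors the sample mean lies from the
    population mean. A large |z| (say, beyond 2) suggests the sample
    was probably not drawn from that population."""
    se = pop_sd / len(sample) ** 0.5   # standard error of the mean
    return (statistics.mean(sample) - pop_mean) / se

# Hypothetical sample of beer-drinker ages vs. a claimed population
# with mean age 40 and standard deviation 10.
ages = [34, 36, 35, 33, 37]
z = z_score(ages, pop_mean=40, pop_sd=10)
print(round(z, 2))  # -1.12: within 2 SEs, consistent with the claim
```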

Sampling distribution of sample mean (Skewness and kurtosis)

Notice how, even for such an irregular distribution, we get close to a normal distribution. As we increase the sample size from 5 to 25, the sampling distribution moves closer to normal: its skewness decreases and its kurtosis moves toward the normal's value.
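That trend can be reproduced numerically. The sketch below (assuming an exponential population, whose skewness is 2) estimates the skewness of the sampling distribution at n = 5 and n = 25; theory says it should shrink like 2/√n:

```python
import random
import statistics

random.seed(1)

def skewness(xs):
    """Standardized third central moment of a list of numbers."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def sample_means(n, reps=4000):
    """Means of reps samples of size n from an exponential(1) population."""
    return [
        statistics.mean(random.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

sk5 = skewness(sample_means(5))    # theory: 2 / sqrt(5)  ~ 0.89
sk25 = skewness(sample_means(25))  # theory: 2 / sqrt(25) = 0.40
print(round(sk5, 2), round(sk25, 2))
```

The same experiment with a fourth-moment statistic would show the excess kurtosis shrinking toward zero as n grows.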