Basics of Statistics

Siddhartha Purwar
4 min read · Jul 17, 2021


Statistics, Measures of Central Location, Variability.

If you already know what these terms mean, then this article is not for you :)

Statistics

Statistics is the study of data (numerical or otherwise) to extract useful insight from it. Data is everywhere: we create data and make decisions based on it. Most of the time, though, the information we want cannot be read directly off the data in its original form (such data is called raw data). So we use tools, mainly mathematical ones, to understand the data better, and this is the work of a statistician.

Statistics can be classified into two areas based on what we want to know from the data.

1. Inferential statistics

This type of statistics is used when we want to generalize findings from a small group to a larger group, that is, when we want to draw conclusions about a population from a sample.

What are population and sample?

Imagine you are interested in knowing the average height of men and women in India. All the men and women living in India are then your population. But there is a problem you may have guessed: it is practically impossible to ask every individual man and woman for their height. One very effective solution is to focus on a small fraction of the population, which we call a sample. As a good statistician, you should find the average height of men and women in a carefully chosen sample and then use inferential statistics to draw a conclusion about the whole population.
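To make the population and sample idea concrete, here is a minimal sketch in Python. The heights are randomly generated, made-up numbers (not real survey data), and the population size, sample size, and the 160 cm / 10 cm parameters are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Synthetic "population": 100,000 made-up heights in cm (not real data).
population = [random.gauss(160, 10) for _ in range(100_000)]

# Simple random sample of 500 individuals, drawn without replacement.
sample = random.sample(population, 500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f} cm")
print(f"Sample mean:     {sample_mean:.2f} cm")
```

With a random sample like this, the sample mean usually lands close to the population mean, which is why measuring a small group can tell us something about the large one.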

2. Descriptive statistics

This type of statistics is used when we want to summarize the values in the data with measures like mean, mode, range, etc. In other words, if there are a thousand values in a data sample, you may use the mean and variance to describe the whole data set.

From here onward, I will focus on descriptive statistics.

Measures of Central Location

A typical dataset may contain thousands of values, so we often use a single value to represent the whole dataset. Such values are known as measures of central location or measures of central tendency.

Let's continue our example where we want to know the average height of men and women in India. But why do we need the average height, and how can it help us? We may have many values in our dataset from the sample, and we need a representative value. Having a representative value lets later analysis of that sample work with a single number instead of every individual value.

Let's understand it with one more example. Suppose a scooter-making company wants to know how high the seat should be from the ground. They can use the mean height of men and women to decide the height of the seat. This removes the headache of manufacturing scooters to fit individual heights; the average height simply serves as the standard.

These are the 3 most commonly used measures of central location:

Mean: This is the arithmetic average of the data.

Median: This is the middle observation once the dataset is sorted in ascending or descending order. If the number of observations is even, it is the average of the two middle observations.

Mode: This is the most frequently observed value in the dataset.
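The three measures above can be computed with Python's built-in `statistics` module. This is only a small sketch; the `heights` list is made-up data used for illustration.

```python
import statistics

# Made-up heights in cm, for illustration only.
heights = [150, 155, 155, 160, 165, 170, 172]

mean_height = statistics.mean(heights)      # sum of values / number of values
median_height = statistics.median(heights)  # middle value of the sorted data
mode_height = statistics.mode(heights)      # most frequently observed value

print("Mean:  ", mean_height)
print("Median:", median_height)
print("Mode:  ", mode_height)
```

Note that the three measures do not have to agree: here 155 appears twice, so it is the mode even though it is not the middle value.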

Variability

Take two samples:

Sample A: 30 40 50 60 70

Sample B: 10 30 50 70 90

If you compute a central value, say the mean, it is the same for both samples: 50. But we can clearly see that the two samples are quite different. So concluding that the samples are statistically the same just because their means match is wrong.

But what is the difference between the two samples above? It is the variation of the values around the mean. Although both samples have the same mean, the values in sample A lie closer to the mean than those in sample B.
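A quick way to see this is to subtract the mean from every value in each sample. The sketch below does that for Sample A and Sample B; the helper name `deviations_from_mean` is just a name chosen for this example.

```python
import statistics

sample_a = [30, 40, 50, 60, 70]
sample_b = [10, 30, 50, 70, 90]

def deviations_from_mean(values):
    """Return how far each value lies from the mean of the list."""
    center = statistics.mean(values)
    return [v - center for v in values]

dev_a = deviations_from_mean(sample_a)
dev_b = deviations_from_mean(sample_b)

print("Means:", statistics.mean(sample_a), statistics.mean(sample_b))
print("Deviations in A:", dev_a)
print("Deviations in B:", dev_b)

# Average distance from the mean, ignoring the sign of each deviation.
avg_abs_dev_a = sum(abs(d) for d in dev_a) / len(dev_a)
avg_abs_dev_b = sum(abs(d) for d in dev_b) / len(dev_b)
print("Average absolute deviation:", avg_abs_dev_a, avg_abs_dev_b)
```

Both means are 50, but every deviation in Sample B is twice as large as the matching one in Sample A.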

Higher variability means the data is more dispersed.

We use the variance, the range, and the interquartile range to get a numerical value that represents the variation in the data.
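Here is a rough sketch of these three measures for Sample A and Sample B, again using Python's `statistics` module (Python 3.8 or newer for `quantiles`). Two choices here are assumptions of this example: `statistics.variance` computes the sample variance (dividing by n - 1), and the quartiles use the `"inclusive"` method. Other quartile conventions exist and can give slightly different interquartile ranges.

```python
import statistics

sample_a = [30, 40, 50, 60, 70]
sample_b = [10, 30, 50, 70, 90]

def describe_spread(values):
    """Return (range, sample variance, interquartile range) of the values."""
    value_range = max(values) - min(values)
    # Sample variance: squared deviations from the mean, divided by n - 1.
    variance = statistics.variance(values)
    # Quartile cut points Q1, Q2, Q3; the IQR is Q3 - Q1.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return value_range, variance, q3 - q1

range_a, var_a, iqr_a = describe_spread(sample_a)
range_b, var_b, iqr_b = describe_spread(sample_b)

print("Sample A -> range:", range_a, "variance:", var_a, "IQR:", iqr_a)
print("Sample B -> range:", range_b, "variance:", var_b, "IQR:", iqr_b)
```

All three numbers come out larger for Sample B, which matches what we saw by eye: B is more dispersed than A.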

I am stopping here. Thanks for reading ☺

In case you have any doubts or want to know more, you may leave a comment or send me an email. I will be happy to help you.

Written by Siddhartha Purwar

"Data data data, I can't make bricks without clay"
