# Statistics Basics – Measures of Central Tendency & Measures of Variability

September 26, 2017

Measures of Central Tendency and Measures of Variability are frequently used in data analysis. This post provides simple definitions of the common measures.

**Measures of Central Tendency**

**Mean / Average** – sum of all data points or observations in a dataset divided by the total number of data points or observations in the dataset.

The mean or average of this dataset with 5 numbers {2, 4, 6, 8, 10} is: 6

Sum of all data points: 2 + 4 + 6 + 8 + 10 = 30

Number of data points: 5

Mean = 30 / 5 = 6
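As a quick illustration, the same calculation can be sketched in Python. The function name `mean` is just a label chosen for this example; Python's standard library also offers `statistics.mean`.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)


data = [2, 4, 6, 8, 10]
print(mean(data))  # 6.0
```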

**Median** – with the values (data points) in the dataset listed in increasing (ascending) order, the median is the midpoint of the values, such that there are an equal number of data points above and below the median. If there is an odd number of data points in the dataset, then the median is the single middle value. If there is an even number of data points in the dataset, then the median is the mean/average of the two middle values.

The median of the same dataset {2, 4, 6, 8, 10} is: 6

This dataset has an odd number of data points (5), and the middle data point is the value 6, with 2 numbers below (2, 4) and 2 numbers above (8, 10).

Using an example of a dataset with an even number of data points:

The median of this dataset {2, 4, 6, 8, 10, 12} is: (6 + 8) / 2 = 7

Since there are 2 middle data points (6, 8), then we need to calculate the mean of those 2 numbers to determine the median.
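The odd and even cases can be sketched in Python as follows. This is a minimal illustration, and the function name `median` is chosen for this example only (the standard library has `statistics.median` for real use).

```python
def median(values):
    """Return the middle value of the sorted data (mean of the two middle values if even)."""
    ordered = sorted(values)  # the median is defined on the sorted values
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one data point")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 6, 8, 10]))      # 6
print(median([2, 4, 6, 8, 10, 12]))  # 7.0
```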

**Mode** – the data point that appears the most times in the dataset.

Using our original dataset {2, 4, 6, 8, 10}, each value appears only once, so none appears more times than the others and this dataset does not have a mode.

Using a new dataset {2, 2, 4, 4, 4, 4, 6, 8, 8, 8, 10}, the Mode in this case is: 4

4 is the value that appears the most times in the dataset.
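Here is a small Python sketch of finding the mode by counting occurrences. The function name `modes` is made up for this example. It returns a list so that ties can be reported, and it returns an empty list when no value appears more often than the others, which is an assumption that follows the "no mode" convention used above.

```python
from collections import Counter


def modes(values):
    """Return the value(s) with the highest count, or [] if no value stands out."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    winners = [value for value, count in counts.items() if count == highest]
    # If several distinct values all share the same count, none appears
    # more often than the others, so we report no mode.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners


print(modes([2, 4, 6, 8, 10]))                    # []
print(modes([2, 2, 4, 4, 4, 4, 6, 8, 8, 8, 10]))  # [4]
```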

**Measures of Variability**

**Min** – the minimum value of all the values in the dataset.

Min {2, 3, 3, 4, 5, 5, 5, 6, 7, **1**, 3, 2, 7, 7, 8, 2, 3, 9} is 1.

**Max** – the maximum value of all the values in the dataset.

Max {2, 3, 3, 4, 5, 5, 5, 6, 7, 1, 3, 2, 7, 7, 8, 2, 3, **9**} is 9.
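In Python, the built-in `min` and `max` functions give these values directly. The sketch below also shows a hand-written single pass, using a hypothetical helper named `min_max`, just to make the idea explicit.

```python
def min_max(values):
    """Return (smallest, largest) found in one pass over the values."""
    iterator = iter(values)
    try:
        smallest = largest = next(iterator)
    except StopIteration:
        raise ValueError("min_max requires at least one data point") from None
    for value in iterator:
        if value < smallest:
            smallest = value
        elif value > largest:
            largest = value
    return smallest, largest


data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 1, 3, 2, 7, 7, 8, 2, 3, 9]
print(min_max(data))         # (1, 9)
print(min(data), max(data))  # 1 9
```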

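**Variance** – a calculated value that quantifies how tightly clustered or how widely dispersed the values in the dataset are around their average/mean value. It is the average of the squared differences from the mean. (Dividing by the number of data points, as done here, gives the population variance; the sample variance divides by one less than the number of data points.)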

Variance of {2, 3, 4, 5, 6} is calculated as follows …

First find the Mean. Mean = (2 + 3 + 4 + 5 + 6) / 5 = 4

Then, find the Squared Differences from the Mean … where ^2 means squared …

(2 – 4)^2 = (-2)^2 = 4

(3 – 4)^2 = (-1)^2 = 1

(4 – 4)^2 = (0)^2 = 0

(5 – 4)^2 = (1)^2 = 1

(6 – 4)^2 = (2)^2 = 4

Average of Squared Differences: (4 + 1 + 0 + 1 + 4) / 5 = 2
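The steps above can be sketched in Python. The function name `population_variance` is chosen for this example. It divides by the number of data points, matching the calculation above; the standard library's `statistics.pvariance` uses the same definition, so it is printed alongside as a cross-check.

```python
import statistics


def population_variance(values):
    """Return the average of the squared differences from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one data point")
    mu = sum(values) / n                                 # step 1: the mean
    squared_diffs = [(x - mu) ** 2 for x in values]      # step 2: squared differences
    return sum(squared_diffs) / n                        # step 3: their average


data = [2, 3, 4, 5, 6]
print(population_variance(data))    # 2.0
print(statistics.pvariance(data))   # same definition from the standard library
```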

**Standard Deviation** – a calculated value that quantifies how tightly clustered or how widely dispersed the values in the dataset are around their mean, expressed in the same units as the data. It is the square root of the **Variance** (defined above).

For the above dataset, Standard Deviation {2, 3, 4, 5, 6} = Square Root (2) ≈ 1.414
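A minimal Python sketch of the same calculation follows. The name `population_std_dev` is made up for this example, and it assumes the variance is computed by dividing by the number of data points, as in the variance example above.

```python
import math


def population_std_dev(values):
    """Return the square root of the population variance."""
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation requires at least one data point")
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(variance)


data = [2, 3, 4, 5, 6]
print(round(population_std_dev(data), 3))  # 1.414
```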

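**Kurtosis** – a calculated value that describes how heavy the tails of the dataset's distribution are compared with the tails of a normal distribution\*. Higher kurtosis means more of the data sits far out in the tails.

**Skewness** – a calculated value that describes how asymmetric the distribution of the dataset is. A normal distribution\* is perfectly symmetric and has a skewness of 0; positive skewness indicates a longer right tail, and negative skewness indicates a longer left tail.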

\* A normal distribution, also known as the bell curve, is a symmetric probability distribution in which most values are toward the center (closer to the average) and fewer and fewer observations occur as you go further from the center.
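Neither skewness nor kurtosis comes with a worked example here, so this Python sketch uses one common convention as an assumption: the moment-based population formulas, where skewness is the third central moment divided by the variance raised to the power 1.5, and excess kurtosis is the fourth central moment divided by the squared variance, minus 3 (so that a normal distribution scores 0). The function names are made up for this example, and statistical libraries may apply different corrections by default.

```python
def central_moment(values, k):
    """Return the k-th central moment: the average of (x - mean) ** k."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** k for x in values) / n


def skewness(values):
    """Moment-based skewness; 0 for perfectly symmetric data."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    m2 = central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return central_moment(values, 4) / m2 ** 2 - 3


symmetric = [2, 3, 4, 5, 6]
print(round(skewness(symmetric), 3))         # 0.0
print(round(excess_kurtosis(symmetric), 3))  # -1.3

right_tailed = [1, 2, 2, 3, 10]
print(skewness(right_tailed) > 0)            # True: one large value stretches the right tail
```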

**Range** – the difference between the largest number in the dataset and the smallest number in the dataset.

Range {2, 4, 6, 8, 10} = 10 – 2 = 8
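In Python this is a one-liner built on `max` and `min`. The function name `value_range` is chosen for this example (to avoid shadowing Python's built-in `range`).

```python
def value_range(values):
    """Return the largest value minus the smallest value."""
    if not values:
        raise ValueError("range requires at least one data point")
    return max(values) - min(values)


print(value_range([2, 4, 6, 8, 10]))  # 8
```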

Thanks for reading!
