Mean vs. Median

Suppose we have a set of values $S$ , defined as:

$\begin{eqnarray*} S &=& \{31, 61, 45, 212, 10, 21\} \end{eqnarray*}$

If we were to compute the arithmetic mean, we would obtain:

$\begin{eqnarray*} M_{a} &=& \frac{31 + 61 + 45 + 212 + 10 + 21}{6} \\ &=& \frac{380}{6} \\ &\approx& 63.33 \end{eqnarray*}$

However, this mean is not representative of the set: the exceptional value $212$ drags the mean above every other value in the set.

This is why it is sometimes useful to compute the median, which is less sensitive to such exceptional values. To compute the median, we sort the set $S$ in increasing order, thereby obtaining $S'$ :

$\begin{eqnarray*} S' &=& \{10, 21, 31, 45, 61, 212\} \end{eqnarray*}$

and, since the number of values is even, calculate the arithmetic mean of the two values in the middle of the set:

$\begin{eqnarray*} \mathtt{median} &=& \frac{31+45}{2} \\ &=& 38 \end{eqnarray*}$

which gives us a median that is a better representation of the middle of the original set $S$ .
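The comparison above can be reproduced with a minimal Python sketch. It assumes the standard-library `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median

# The set S from the text, including the exceptional value 212.
values = [31, 61, 45, 212, 10, 21]

# Arithmetic mean: sum of the values divided by their count.
arithmetic_mean = mean(values)

# Median: for an even count, the mean of the two middle values
# of the sorted list.
middle = median(values)

print(sorted(values))
print(round(arithmetic_mean, 2))
print(middle)
```

Removing $212$ from the list and rerunning the sketch shows how much more the mean moves than the median.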

Standard Deviation

The standard deviation can be thought of as a measure of how far the data values are spread from the mean.

Let $X$ be a random variable with mean value $\mu$ :

$\begin{eqnarray*} E[X] &=& \mu \end{eqnarray*}$

where the operator $E$ denotes the average, or expected value, of $X$ . The standard deviation of $X$ is then the quantity:

$\begin{eqnarray*} \sigma &=& \sqrt{E[X^{2}] - (E[X])^{2}} \end{eqnarray*}$
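As a rough illustration of this formula, the sketch below treats a small list of numbers as the whole population, so that $E$ becomes a plain average. The data set is made up for the illustration and does not come from the text; `pstdev` is the population standard deviation from Python's `statistics` module.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical data, treated as the entire population.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# E[X]: the average of the values.
e_x = sum(data) / n

# E[X^2]: the average of the squared values.
e_x2 = sum(x * x for x in data) / n

# sigma = sqrt(E[X^2] - (E[X])^2)
sigma = sqrt(e_x2 - e_x ** 2)

print(e_x, e_x2, sigma)
print(pstdev(data))  # population standard deviation, for comparison
```

Both printed standard deviations should agree up to floating-point rounding, since `pstdev` divides by the number of elements, just as the expectation does.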

Example

Given a set $S$ of measurements with the values:

$\begin{eqnarray*} S &=& \{3, 4, 5, 1, 2\} \end{eqnarray*}$

Calculate the arithmetic average $m$ of the elements of the set:

$\begin{eqnarray*} m &=& \frac{3 + 4 + 5 + 1 + 2}{5} \\ &=& 3 \end{eqnarray*}$

Calculate the squared difference between each element of the set $S$ and the mean $m$ to obtain the set $S'$ :

$\begin{eqnarray*} S' &=& \{(3-3)^{2}, (4-3)^{2}, (5-3)^{2}, (1-3)^{2}, (2-3)^{2}\} \\ &=& \{0, 1, 4, 4, 1\} \end{eqnarray*}$

By summing the elements of $S'$ ( $0+1+4+4+1=10$ ) and then dividing by the number of elements minus one, we obtain the sample variance $v$ :

$\begin{eqnarray*} v &=& \frac{10}{4} \\ &=& 2.5 \end{eqnarray*}$

and the standard deviation $\sigma$ is the square root of the variance:

$\begin{eqnarray*} \sigma &=& \sqrt{v} \\ &=&\sqrt{2.5} \\ &\approx& 1.58 \end{eqnarray*}$

By subtracting the standard deviation from the mean and adding it to the mean, we find the bounds of the interval that lies within one standard deviation of the mean.

For the lower bound, we have:

$\begin{eqnarray*} m - \sigma &\approx& 3 - 1.58 \\ &=& 1.42 \end{eqnarray*}$

For the upper bound, we have:

$\begin{eqnarray*} m + \sigma &\approx& 3 + 1.58 \\ &=& 4.58 \end{eqnarray*}$

which means that the values within one standard deviation of the mean lie between $1.42$ and $4.58$ ; in this set, those are $2$ , $3$ and $4$ .
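The steps of this example can be sketched in Python as follows. The manual computation divides by the number of elements minus one, which is what `statistics.variance` and `statistics.stdev` do as well; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

data = [3, 4, 5, 1, 2]

# Step 1: arithmetic average of the elements.
m = mean(data)

# Step 2: squared difference between each element and the mean.
squared_diffs = [(x - m) ** 2 for x in data]

# Step 3: sum the squared differences and divide by (n - 1).
v = sum(squared_diffs) / (len(data) - 1)

# Step 4: the standard deviation is the square root of the variance.
sigma = sqrt(v)

# Bounds one standard deviation below and above the mean.
lower, upper = m - sigma, m + sigma
inside = [x for x in data if lower <= x <= upper]

print(m, squared_diffs, v)
print(round(sigma, 2), round(lower, 2), round(upper, 2))
print(inside)
print(variance(data), stdev(data))  # library results, for comparison
```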

Chebyshev's Theorem

The fraction of any set of numbers lying within $k$ standard deviations of the mean of those numbers is at least:

$\begin{eqnarray*} 1-\frac{1}{k^{2}} \end{eqnarray*}$

where $\mathtt{within}$ is the chosen distance from the mean, so that:

$\begin{eqnarray*} k &=& \frac{\mathtt{within}}{\sigma} \end{eqnarray*}$

and

$\begin{eqnarray*} k &>& 1 \end{eqnarray*}$

Example

Taking values for $k$ ( $>1$ ), we can observe the following:

for $k=2$ , we obtain $0.75$ , meaning that at least $75\%$ of the values must be within two standard deviations from the mean.
for $k=3$ , we obtain approximately $0.889$ , meaning that at least $88.9\%$ of the values must be within three standard deviations from the mean.
for $k=4$ , we obtain $0.9375$ , meaning that at least $93.75\%$ of the values must be within four standard deviations from the mean.

Given the previous example, we know that the mean is $m=3$ and the standard deviation is $\sigma\approx1.58$ . For $k=2$ , the distance from the mean is $k\sigma=2\cdot1.58=3.16$ . Subtracting that value from the mean, we obtain the lower bound $l=m-3.16=-0.16$ ; adding it to the mean, we obtain the upper bound $h=m+3.16=6.16$ . This means that at least $75\%$ of the values must lie between the lower bound $l=-0.16$ and the upper bound $h=6.16$ . The same can be performed for other values of $k$ .
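The bound can also be checked numerically. The following is a minimal sketch, not a definitive implementation: `chebyshev_bound` and `fraction_within` are hypothetical helper names, and the sample standard deviation (`statistics.stdev`) is used so that the numbers match the example above.

```python
from statistics import mean, stdev

def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed fraction of values within k standard deviations of the mean."""
    m = mean(values)
    s = stdev(values)
    inside = [x for x in values if abs(x - m) <= k * s]
    return len(inside) / len(values)

data = [3, 4, 5, 1, 2]

for k in (2, 3, 4):
    # The observed fraction should never fall below the bound.
    print(k, chebyshev_bound(k), fraction_within(data, k))
```

For this small set every value already lies within two standard deviations of the mean, so the observed fraction is well above the guaranteed minimum; the theorem only promises a lower limit.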
