
Mean vs. Median

Suppose we have a set of values $S$, defined as:

\begin{eqnarray*}
S &=& \{31, 61, 45, 212, 10, 21\}
\end{eqnarray*}

If we compute the arithmetic mean $M_{a}$ by summing the elements and dividing by their count, we obtain:

\begin{eqnarray*}
M_{a} &=& \frac{31 + 61 + 45 + 212 + 10 + 21}{6} \\
&=& \frac{380}{6} \\
&\approx& 63.33
\end{eqnarray*}
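The same computation can be sketched in Python using only the standard library. The list `S` is simply the set above written as a Python list.

```python
from statistics import mean

S = [31, 61, 45, 212, 10, 21]

# Arithmetic mean: sum of the elements divided by their count
m_a = sum(S) / len(S)
print(round(m_a, 2))  # 63.33

# statistics.mean computes the same value
assert abs(m_a - mean(S)) < 1e-9
```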

However, this mean does not represent the set well: it is larger than five of the six values. The cause is the exceptional value $212$ (an outlier), which drags the mean far above the rest of the values in the set.

For this reason it is sometimes more useful to compute the median. The median is much less sensitive to extreme values, so it gives a better picture of the typical value in the set. To compute the median, we sort the set $S$ in increasing order, which gives $S'$:

\begin{eqnarray*}
S' &=& \{10, 21, 31, 45, 61, 212\}
\end{eqnarray*}

Because $S'$ has an even number of elements, the median is the arithmetic mean of the two values in the middle, $31$ and $45$:

\begin{eqnarray*}
\mathtt{median} &=& \frac{31+45}{2} \\
&=& 38
\end{eqnarray*}

This median represents the middle of the original set $S$ much better than the mean $M_{a} \approx 63.33$.
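The sort-then-take-the-middle procedure can be sketched as follows. `median_of` is a hypothetical helper written for illustration. It is checked against the standard library's `statistics.median`, which averages the two middle values when the count is even, just as described above.

```python
from statistics import median

S = [31, 61, 45, 212, 10, 21]

def median_of(values):
    """Middle element of the sorted values, or the mean of the two
    middle elements when the number of values is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sorted(S))     # [10, 21, 31, 45, 61, 212]
print(median_of(S))  # 38.0
assert median_of(S) == median(S)
```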

Standard Deviation

The standard deviation can be thought of as a measure of how far, on average, the data values lie from the mean.

Let $X$ be a random variable with mean value $\mu$:

\begin{eqnarray*}
E[X] &=& \mu
\end{eqnarray*}

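Here the operator $E$ denotes the average, or expected value. The standard deviation of $X$ is the square root of the average squared distance from the mean. This can be rewritten in terms of $E[X^{2}]$ and $E[X]$:

\begin{eqnarray*}
\sigma &=& \sqrt{E[(X-\mu)^{2}]} \\
&=& \sqrt{E[X^{2}] - (E[X])^{2}}
\end{eqnarray*}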
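Below is a minimal sketch of the formula $\sigma = \sqrt{E[X^{2}] - (E[X])^{2}}$. It assumes every value in the list is equally likely, so each expectation is a plain average over the list. `std_from_moments` is a hypothetical name. The formula gives the population standard deviation, which the standard library exposes as `statistics.pstdev`. For data with large values and a small spread, this subtraction form can lose floating-point precision, which is why libraries usually work with deviations from the mean instead.

```python
import math
from statistics import pstdev

def std_from_moments(values):
    """sigma = sqrt(E[X^2] - (E[X])^2), with each value equally likely."""
    n = len(values)
    e_x = sum(values) / n                  # E[X]
    e_x2 = sum(x * x for x in values) / n  # E[X^2]
    return math.sqrt(e_x2 - e_x ** 2)

data = [3, 4, 5, 1, 2]
print(std_from_moments(data))  # about 1.4142, i.e. sqrt(2)
assert math.isclose(std_from_moments(data), pstdev(data))
```

This population value ($\sqrt{2}$) differs from the $\approx 1.58$ of the worked example, because the example divides by the number of elements minus one.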

Example

Given a set $S$ of measurements with the values:

\begin{eqnarray*}
S &=& \{3, 4, 5, 1, 2\}
\end{eqnarray*}

First, calculate the arithmetic mean $m$ of the elements of the set:

\begin{eqnarray*}
m &=& \frac{3 + 4 + 5 + 1 + 2}{5} \\
&=& 3
\end{eqnarray*}

Next, compute the squared difference between each element of $S$ and the mean $m$ to obtain the set $S'$:

\begin{eqnarray*}
S' &=& \{(3-3)^{2}, (4-3)^{2}, (5-3)^{2}, (1-3)^{2}, (2-3)^{2}\} \\
&=& \{0, 1, 4, 4, 1\}
\end{eqnarray*}
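The squared deviations can be computed with a list comprehension. `S_prime` stands in for $S'$.

```python
S = [3, 4, 5, 1, 2]
m = sum(S) / len(S)  # arithmetic mean, 3.0

# Squared difference between each element and the mean
S_prime = [(x - m) ** 2 for x in S]
print(S_prime)  # [0.0, 1.0, 4.0, 4.0, 1.0]
```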

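Summing the elements of $S'$ ($0+1+4+4+1=10$) and dividing by the number of elements minus one ($5-1=4$) gives the sample variance $v$. Dividing by $n-1$ rather than $n$ (Bessel's correction) is customary when the values are a sample from a larger population. Dividing by $n=5$ instead would give the population variance $2$, which matches $E[X^{2}] - (E[X])^{2} = 11 - 9 = 2$ from the definition above. With the $n-1$ denominator: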

\begin{eqnarray*}
v &=& \frac{10}{4} \\
&=& 2.5
\end{eqnarray*}

and the standard deviation $\sigma$ is the square root of the variance:

\begin{eqnarray*}
\sigma &=& \sqrt{v} \\
&=&\sqrt{2.5} \\
&\approx& 1.58
\end{eqnarray*}
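The variance and standard deviation steps can be sketched as follows. The standard library functions `statistics.variance` and `statistics.stdev` also use the $n-1$ denominator, so they reproduce the values above.

```python
import math
from statistics import stdev, variance

S = [3, 4, 5, 1, 2]
n = len(S)
m = sum(S) / n

# Sample variance: sum of squared deviations divided by n - 1
v = sum((x - m) ** 2 for x in S) / (n - 1)
sigma = math.sqrt(v)

print(v)                # 2.5
print(round(sigma, 2))  # 1.58

assert math.isclose(v, variance(S))
assert math.isclose(sigma, stdev(S))
```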

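Subtracting the standard deviation from the mean, and adding it to the mean, gives an interval in which many of the values typically lie. For normally distributed data this is roughly 68% of them, although no such fraction is guaranteed for an arbitrary data set.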

For the lower bound, we have:

\begin{eqnarray*}
m - \sigma &=& 3 - 1.58 \\
&=& 1.42
\end{eqnarray*}

For the upper bound, we have:

\begin{eqnarray*}
m + \sigma &=& 3 + 1.58 \\
&=& 4.58
\end{eqnarray*}

In this example, three of the five values ($2$, $3$ and $4$) lie between $1.42$ and $4.58$.
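The interval $[m-\sigma, m+\sigma]$ and the values that fall inside it can be checked like this. The sketch uses the sample standard deviation (`statistics.stdev`), as the worked example does.

```python
import statistics

S = [3, 4, 5, 1, 2]
m = statistics.mean(S)
sigma = statistics.stdev(S)  # sample standard deviation (n - 1 denominator)

lower, upper = m - sigma, m + sigma
inside = [x for x in S if lower <= x <= upper]

print(round(lower, 2), round(upper, 2))  # 1.42 4.58
print(inside)                            # [3, 4, 2]
```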

Chebyshev's Theorem

For any set of numbers, the fraction of the numbers lying within $k$ standard deviations of their mean is at least:

\begin{eqnarray*}
1-\frac{1}{k^{2}}
\end{eqnarray*}

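Here $k$ is the distance $d$ from the mean, measured in standard deviations:

\begin{eqnarray*}
k &=& \frac{d}{\sigma}
\end{eqnarray*}

and

\begin{eqnarray*}
k &>& 1
\end{eqnarray*}

The restriction to $k > 1$ matters because for $k \le 1$ the bound $1-\frac{1}{k^{2}}$ is at most $0$ and gives no information.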
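The bound itself is a one-line function. `chebyshev_min_fraction` is a hypothetical name used for illustration. It rejects $k \le 1$, where the bound carries no information.

```python
def chebyshev_min_fraction(k):
    """Chebyshev lower bound on the fraction of values lying within
    k standard deviations of the mean (only informative for k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3, 4):
    print(k, round(chebyshev_min_fraction(k), 4))
# 1.5 0.5556
# 2 0.75
# 3 0.8889
# 4 0.9375
```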

Example

From the previous example, the mean is $m=3$ and the standard deviation is $\sigma\approx1.58$. For $k=2$, the distance from the mean is $d = k\sigma = 2 \times 1.58 = 3.16$. Subtracting this value from the mean gives the lower bound $l=m-3.16=-0.16$, and adding it to the mean gives the upper bound $h=m+3.16=6.16$. Since $1-\frac{1}{2^{2}}=0.75$, at least $75\%$ of the values must lie between $l=-0.16$ and $h=6.16$. Here, in fact, all five do. The same can be done for any other value of $k > 1$.
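The $k=2$ case can be checked numerically: compute the interval $[m - k\sigma, m + k\sigma]$ and compare the observed fraction of values inside it with Chebyshev's guarantee. The sketch assumes the sample standard deviation used in the example.

```python
import statistics

S = [3, 4, 5, 1, 2]
m = statistics.mean(S)
sigma = statistics.stdev(S)  # about 1.58, as in the example

k = 2
lower, upper = m - k * sigma, m + k * sigma
fraction_inside = sum(lower <= x <= upper for x in S) / len(S)
guaranteed = 1 - 1 / k ** 2

print(round(lower, 2), round(upper, 2))  # -0.16 6.16
print(fraction_inside, guaranteed)       # 1.0 0.75
assert fraction_inside >= guaranteed
```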