Mean vs. Median

Suppose we have a set of values $S$, defined as:

\begin{eqnarray*}
S &=& \{31, 61, 45, 212, 10, 21\}
\end{eqnarray*}

If we compute the arithmetic mean $M_{a}$, we obtain:

\begin{eqnarray*}
M_{a} &=& \frac{31 + 61 + 45 + 212 + 10 + 21}{6} \\
&=& \frac{380}{6} \\
&\approx& 63.33
\end{eqnarray*}

However, this mean is not representative of the set: the exceptional value $212$ drags it above every other value in the set, the next largest of which is only $61$.

This is why it is sometimes useful to compute the median, which is far less sensitive to such exceptional values. To compute the median, we sort the set $S$ in increasing order, thereby obtaining $S'$:

\begin{eqnarray*}
S' &=& \{10, 21, 31, 45, 61, 212\}
\end{eqnarray*}

and, since the number of values is even, calculate the arithmetic mean of the two values in the middle of the set:

\begin{eqnarray*}
\mathtt{median} &=& \frac{31+45}{2} \\
&=& 38
\end{eqnarray*}

which gives us a median that is a better representation of the middle of the original set $S$.
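The comparison above can be reproduced in a few lines. This is only a minimal sketch: the choice of Python and of its standard `statistics` module is an assumption, since the text does not prescribe any tool.

```python
import statistics

# The set S from the text, stored as a plain list.
S = [31, 61, 45, 212, 10, 21]

mean = statistics.mean(S)      # sum of the values divided by their count
median = statistics.median(S)  # mean of the two middle values of sorted(S)

print(sorted(S))       # [10, 21, 31, 45, 61, 212]
print(round(mean, 2))  # 63.33
print(median)          # 38.0
```

Removing $212$ from the list changes the mean considerably while the median moves only slightly, which is the effect described above.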

Standard Deviation

The standard deviation can be thought of as measuring how far the data values lie from the mean.

Let $X$ be a random variable with mean value $\mu$:

\begin{eqnarray*}
E[X] &=& \mu
\end{eqnarray*}

where the operator $E$ denotes the average or expected value of $X$. The standard deviation of $X$ is then the quantity:

\begin{eqnarray*}
\sigma &=& \sqrt{E[X^{2}] - (E[X])^{2}}
\end{eqnarray*}
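To connect this formula with the idea of distance from the mean, it may help to expand the mean squared distance from $\mu$. The short derivation below uses only the linearity of $E$ and the definition $E[X]=\mu$:

\begin{eqnarray*}
E[(X-\mu)^{2}] &=& E[X^{2} - 2\mu X + \mu^{2}] \\
&=& E[X^{2}] - 2\mu E[X] + \mu^{2} \\
&=& E[X^{2}] - 2\mu^{2} + \mu^{2} \\
&=& E[X^{2}] - (E[X])^{2}
\end{eqnarray*}

so $\sigma$ is the square root of the average squared distance between $X$ and its mean.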

Example

Given a set $S$ of measurements with the values:

\begin{eqnarray*}
S &=& \{3, 4, 5, 1, 2\}
\end{eqnarray*}

Calculate the arithmetic average $m$ of the elements of the set:

\begin{eqnarray*}
m &=& \frac{3 + 4 + 5 + 1 + 2}{5} \\
&=& 3
\end{eqnarray*}

Calculate the squared difference between each element of the set $S$ and the mean $m$ to obtain the set $S'$:

\begin{eqnarray*}
S' &=& \{(3-3)^{2}, (4-3)^{2}, (5-3)^{2}, (1-3)^{2}, (2-3)^{2}\} \\
&=& \{0, 1, 4, 4, 1\}
\end{eqnarray*}

Summing the elements of $S'$ ($0+1+4+4+1=10$) and then dividing by the number of elements minus one ($5-1=4$) gives the sample variance $v$:

\begin{eqnarray*}
v &=& \frac{10}{4} \\
&=& 2.5
\end{eqnarray*}

and the standard deviation $\sigma$ is the square root of the variance:

\begin{eqnarray*}
\sigma &=& \sqrt{v} \\
&=&\sqrt{2.5} \\
&\approx& 1.58
\end{eqnarray*}
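The steps of this example can be sketched in code as follows. Python and its standard `statistics` module are an assumption here, not something the text prescribes; the variable names mirror the symbols used above.

```python
import math
import statistics

# The measurements from the example.
S = [3, 4, 5, 1, 2]

m = statistics.mean(S)                      # arithmetic average
squared_diffs = [(x - m) ** 2 for x in S]   # the set S' from the text
v = sum(squared_diffs) / (len(S) - 1)       # divide by n - 1 (sample variance)
sigma = math.sqrt(v)                        # standard deviation

print(squared_diffs)    # [0, 1, 4, 4, 1]
print(v)                # 2.5
print(round(sigma, 2))  # 1.58

# The standard library offers the same computation directly.
print(statistics.variance(S), statistics.stdev(S))

# Dividing by n instead of n - 1 gives the population variance,
# which is what the formula with E[X] describes.
print(statistics.pvariance(S))
```

Note that `statistics.variance` and `statistics.stdev` divide by the number of elements minus one, as the example does, whereas `statistics.pvariance` and `statistics.pstdev` divide by the number of elements.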

By subtracting the standard deviation from the mean and adding it to the mean, we find the bounds between which most values are placed.

For the lower bound, we have:

\begin{eqnarray*}
m - \sigma &\approx& 3 - 1.58 \\
&=& 1.42
\end{eqnarray*}

For the upper bound, we have:

\begin{eqnarray*}
m + \sigma &\approx& 3 + 1.58 \\
&=& 4.58
\end{eqnarray*}

which means that most values (here $2$, $3$ and $4$, that is three out of five) are placed between $1.42$ and $4.58$.
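As a quick check of these bounds, the following sketch counts how many of the measurements fall between them. It assumes Python with the standard `statistics` module; the names `lower`, `upper` and `inside` are chosen for illustration only.

```python
import statistics

S = [3, 4, 5, 1, 2]

m = statistics.mean(S)
sigma = statistics.stdev(S)  # sample standard deviation (divides by n - 1)

lower = m - sigma
upper = m + sigma

# Keep only the values lying within one standard deviation of the mean.
inside = [x for x in S if lower <= x <= upper]

print(round(lower, 2), round(upper, 2))  # 1.42 4.58
print(sorted(inside))                    # [2, 3, 4]
print(len(inside) / len(S))              # 0.6
```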

Chebyshev's Theorem

The fraction of any set of numbers lying within $k$ standard deviations of the mean of those numbers is at least:

\begin{eqnarray*}
1-\frac{1}{k^{2}}
\end{eqnarray*}

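where $k$ is the distance from the mean ($\mathtt{within}$) expressed in units of the standard deviation $\sigma$: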

\begin{eqnarray*}
k &=& \frac{\mathtt{within}}{\sigma}
\end{eqnarray*}

and

\begin{eqnarray*}
k &>& 1
\end{eqnarray*}

Example

Taking values for $k$ ($>1$), we can observe the following:

  • for $k=2$, we obtain $0.75$, meaning that at least $75\%$ of the values must be within two standard deviations of the mean.
  • for $k=3$, we obtain $8/9\approx0.889$, meaning that at least about $88.9\%$ of the values must be within three standard deviations of the mean.
  • for $k=4$, we obtain $0.9375$, meaning that at least $93.75\%$ of the values must be within four standard deviations of the mean.
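The values in the list above follow directly from the formula and can be tabulated with a small helper. This is a sketch in Python; the function name `chebyshev_lower_bound` is hypothetical and introduced only for illustration.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        # The theorem is stated for k > 1; for k <= 1 the bound is not useful.
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    print(k, f"{chebyshev_lower_bound(k):.2%}")
# 2 75.00%
# 3 88.89%
# 4 93.75%
```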

Returning to the previous example, we know that the mean is $m=3$ and the standard deviation is $\sigma\approx1.58$. For $k=2$, we have $k\sigma=2\cdot1.58=3.16$. Subtracting that value from the mean gives the lower bound $l=m-3.16=-0.16$; adding it to the mean gives the upper bound $h=m+3.16=6.16$. This means that at least $75\%$ of the values must lie between the lower bound $l=-0.16$ and the upper bound $h=6.16$. The same can be done for other values of $k$.
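The check for $k=2$ can also be carried out numerically. The sketch below assumes Python with the standard `statistics` module; `low` and `high` stand for the bounds $l$ and $h$, and the other names are illustrative.

```python
import statistics

S = [3, 4, 5, 1, 2]
k = 2

m = statistics.mean(S)
sigma = statistics.stdev(S)  # sample standard deviation, as in the example

low = m - k * sigma   # lower bound l
high = m + k * sigma  # upper bound h

# Fraction of the values that actually lie within k standard deviations.
fraction_inside = sum(1 for x in S if low <= x <= high) / len(S)
# Fraction guaranteed by Chebyshev's theorem.
guaranteed = 1 - 1 / k**2

print(round(low, 2), round(high, 2))  # -0.16 6.16
print(fraction_inside, guaranteed)    # 1.0 0.75
```

Here all five values fall within the bounds, which is consistent with the theorem: it only guarantees a minimum fraction, and the actual fraction may be larger.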


fuss/mathematics/probabilities_and_statistics.txt · Last modified: 2017/02/22 18:30 (external edit)
