Mean vs. Median

Suppose we have a set of values $S$, defined as:

\begin{eqnarray*}
S &=& \{31, 61, 45, 212, 10, 21\}
\end{eqnarray*}

If we were to compute the arithmetic mean, we would obtain:

\begin{eqnarray*}
M_{a} &=& \frac{31 + 61 + 45 + 212 + 10 + 21}{6} \\
&\approx& 63.3(3)
\end{eqnarray*}

However, notice that this mean does not look entirely "right": the exceptional value $212$ drags the mean towards a value much higher than most of the other values in the set.

This is why it is sometimes useful to compute the median, which gives a more representative picture of the middle of the set. In order to compute the median, we sort the set $S$ in increasing order, thereby obtaining $S'$:

\begin{eqnarray*}
S' &=& \{10, 21, 31, 45, 61, 212\}
\end{eqnarray*}

and calculate the arithmetic mean between the two values in the middle of the set:

\begin{eqnarray*}
\mathtt{median} &=& \frac{31+45}{2} \\
&=& 38
\end{eqnarray*}

which gives us a median that is a better representation of the middle of the original set $S$.
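The comparison can be reproduced with a minimal Python sketch using the standard library `statistics` module. The variable names `mean_value` and `median_value` are chosen for illustration only and do not appear in the text.

```python
from statistics import mean, median

# The set S from the example; a list is used so that order is irrelevant.
S = [31, 61, 45, 212, 10, 21]

# Arithmetic mean: the sum of the values divided by their count.
mean_value = mean(S)

# Median: for an even number of values, the average of the two middle
# values of the sorted list (here 31 and 45).
median_value = median(S)

print(f"mean   = {mean_value:.2f}")
print(f"median = {median_value}")
```

The outlier $212$ pulls the mean well above most of the values, whereas the median only depends on the two middle values of the sorted set.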

Standard Deviation

The standard deviation can be thought of as measuring how far the data values are spread from the mean.

Let there be a random variable $X$ with mean value $\mu$

\begin{eqnarray*}
E[X] &=& \mu
\end{eqnarray*}

where the operator $E$ denotes the average or expected value of $X$, then the standard deviation of $X$ is the quantity:

\begin{eqnarray*}
\sigma &=& \sqrt{E[X^{2}] - (E[X])^{2}}
\end{eqnarray*}
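
The quantity under the square root is the variance, which is perhaps more commonly introduced as the expected squared deviation from the mean. As a short sketch of why the two forms agree, using only the linearity of $E$ and $E[X]=\mu$:

\begin{eqnarray*}
E[(X-\mu)^{2}] &=& E[X^{2} - 2\mu X + \mu^{2}] \\
&=& E[X^{2}] - 2\mu E[X] + \mu^{2} \\
&=& E[X^{2}] - 2\mu^{2} + \mu^{2} \\
&=& E[X^{2}] - (E[X])^{2}
\end{eqnarray*}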

Example

Given a set $S$ of measurements with the values:

\begin{eqnarray*}
S &=& \{3, 4, 5, 1, 2\}
\end{eqnarray*}

Calculate the arithmetic average $m$ of the elements of the set:

\begin{eqnarray*}
m &=& \frac{3 + 4 + 5 + 1 + 2}{5} \\
&=& 3
\end{eqnarray*}

Calculate the squared difference between each element of the set $S$ and the mean $m$ to obtain the set $S'$:

\begin{eqnarray*}
S' &=& \{(3-3)^{2}, (4-3)^{2}, (5-3)^{2}, (1-3)^{2}, (2-3)^{2}\} \\
&=& \{0, 1, 4, 4, 1\}
\end{eqnarray*}

By summing up the elements of $S'$ ($0+1+4+4+1=10$) and then dividing by the number of elements minus one, we obtain the (sample) variance $v$:

\begin{eqnarray*}
v &=& \frac{10}{4} \\
&=& 2.5
\end{eqnarray*}

and the standard deviation $\sigma$ is the square root of the variance:

\begin{eqnarray*}
\sigma &=& \sqrt{v} \\
&=&\sqrt{2.5} \\
&\approx& 1.58
\end{eqnarray*}

By subtracting the standard deviation from the mean and adding it to the mean, we find the bounds between which most values are placed.

For the lower bound, we have:

\begin{eqnarray*}
m - \sigma &=& 3 - 1.58 \\
&=& 1.42
\end{eqnarray*}

For the upper bound, we have:

\begin{eqnarray*}
m + \sigma &=& 3 + 1.58 \\
&=& 4.58
\end{eqnarray*}

which means that most values are placed between $1.42$ and $4.58$.
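
The steps of this example can be retraced with a minimal Python sketch. It assumes the sample variance (division by $n-1$) as in the worked example; the names `squared_diffs`, `lower`, `upper` and `inside` are illustrative and not part of the text.

```python
from math import sqrt

# The measurements from the example.
S = [3, 4, 5, 1, 2]
n = len(S)

# Arithmetic average of the elements.
m = sum(S) / n

# Squared difference between each element and the mean (the set S').
squared_diffs = [(x - m) ** 2 for x in S]

# Sample variance: sum of squared differences divided by n - 1.
v = sum(squared_diffs) / (n - 1)

# Standard deviation: square root of the variance.
sigma = sqrt(v)

# Bounds one standard deviation below and above the mean.
lower, upper = m - sigma, m + sigma

# Elements of S that fall between the bounds.
inside = [x for x in S if lower <= x <= upper]

print(f"m = {m}, v = {v}, sigma = {sigma:.2f}")
print(f"bounds = ({lower:.2f}, {upper:.2f}), inside = {inside}")
```

Here three of the five values ($2$, $3$ and $4$) fall between the bounds. The standard library function `statistics.stdev` computes the same sample standard deviation directly.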

Using the Standard Deviation to Better Understand Lifespans

One of the preconceptions floating around popular culture is that lifespan is tied to genetics, environment and other factors such as wealth. Whilst these factors do apply, they apply at the level of entire populations rather than to individuals. For instance, one of the more "common" complaints is that a male lifespan reaching only into the late 60s is an unfulfilled one. However, looking at the standard deviation on the bell curve indicates that, for the most part, dying in the late 60s is just as likely as dying in the late 80s, with the median line being somewhere in between.

The chart is lifted from "optimum pensions" and traces the lifespan of women in Australia with the original article going into more details.

Even if the chart is pretty random (the only one available on Google that looked tidy) and its application to women arbitrary, it can be observed that, within two standard deviations, females in Australia are almost as likely to die between the ages of 73 and 81 as they are to die between 97 and 105. The median is placed at 89, but note that all ages between 73 and 105 are to be found within two standard deviations, which means that about $95\%$ of all women in Australia die between 73 and 105. Compensating for males, that would make an age of, say, 70 a statistically plausible age to die. Concerning the median, a person of exactly $89$ years is $71\%$ as likely to die within a range of $8$ years both younger and older and only $10\%$ more likely to die beyond the median.

With that said, metrics concerning life expectancy are often either over- or underestimated, frequently on subjective grounds. Statistically speaking, however, many ages that are sometimes considered "too young" fall well within the expected statistical distribution of a population.

Chebyshev's Theorem

For any set of numbers, the fraction of those numbers lying within $k$ standard deviations of their mean is at least:

\begin{eqnarray*}
1-\frac{1}{k^{2}}
\end{eqnarray*}

where:

\begin{eqnarray*}
k &=& \frac{\mathtt{within}}{\sigma}
\end{eqnarray*}

and

\begin{eqnarray*}
k &>& 1
\end{eqnarray*}

Example

Taking values for $k$ ($>1$), we can observe the following:

  • for $k=2$, we obtain $0.75$, meaning that at least $75\%$ of the values must be within two standard deviations from the mean.
  • for $k=3$, we obtain approximately $0.889$, meaning that at least about $88.9\%$ of the values must be within three standard deviations from the mean.
  • for $k=4$, we obtain $0.9375$, meaning that at least $93.75\%$ of the values must be within four standard deviations from the mean.

Given the previous example, we know that the mean is $m=3$ and the standard deviation is $\sigma\approx1.58$. For $k=2$, we have $k\sigma=2\cdot1.58=3.16$; subtracting that value from the mean, we obtain the lower bound $l=m-3.16=-0.16$, and adding it to the mean, we obtain the upper bound $h=m+3.16=6.16$. This means that at least $75\%$ of the values must lie between the lower bound $l=-0.16$ and the upper bound $h=6.16$. The same can be performed for other values of $k$.
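
The bound and the resulting intervals can be checked with a minimal Python sketch. The helper `chebyshev_bound` and the dictionary `results` are hypothetical names introduced here for illustration; the sketch reuses the sample standard deviation from the previous example, which can only widen the interval compared to the population standard deviation, so the bound still holds.

```python
from statistics import mean, stdev


def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


# The measurements from the previous example.
S = [3, 4, 5, 1, 2]
m = mean(S)
sigma = stdev(S)  # sample standard deviation (division by n - 1)

results = {}
for k in (2, 3, 4):
    low, high = m - k * sigma, m + k * sigma
    # Fraction of the values that actually fall inside the interval.
    observed = sum(low <= x <= high for x in S) / len(S)
    results[k] = (chebyshev_bound(k), low, high, observed)
    print(f"k={k}: bound={chebyshev_bound(k):.4f}, "
          f"interval=({low:.2f}, {high:.2f}), observed={observed:.2f}")
```

For this small set every value already lies within two standard deviations of the mean, so the observed fraction comfortably exceeds the guaranteed minimum; the theorem only states a lower limit that holds for any distribution.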


fuss/mathematics/probabilities_and_statistics.txt · Last modified: by office

Wizardry and Steamworks

© 2025 Wizardry and Steamworks
