Suppose we have a set of values that contains one exceptional value, much larger than all of the others.
If we were to compute the arithmetic mean of the $n$ values $x_1, \ldots, x_n$, we would obtain:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
However, this mean does not look entirely "right": the exceptional value drags the mean toward a value much higher than most of the other values in the set.
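A minimal Python sketch of the effect, using a small hypothetical set (these values are invented for illustration; the original example's values are not reproduced here) in which one element, 1000, is far larger than the rest:

```python
# Hypothetical values: nine "ordinary" values and one outlier (1000).
values = [2, 3, 3, 4, 5, 5, 6, 7, 8, 1000]

def arithmetic_mean(xs):
    """Return the arithmetic mean: the sum of the elements divided by their count."""
    if not xs:
        raise ValueError("the mean of an empty set is undefined")
    return sum(xs) / len(xs)

mean = arithmetic_mean(values)
print(mean)                          # 104.3
print(sum(x > mean for x in values))  # 1 (only the outlier lies above the mean)
```

Nine of the ten values lie below the mean, which is exactly the distortion described above.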
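This is why it is sometimes useful to compute the median, which gives a more representative picture of the middle of the set. To compute the median, we sort the set in increasing order and, since the set has an even number of elements, take the arithmetic mean of the two values in the middle of the sorted set:

$$\tilde{x} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$$

where $x_{(i)}$ denotes the $i$-th smallest value. (For an odd number of elements, the median is simply the middle value of the sorted set.)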
This median is a better representation of the middle of the original set than the mean, because it is not pulled toward the exceptional value.
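The median procedure can be sketched in Python as follows; the helper `median` and the sample values are hypothetical illustrations, and the function also handles an odd number of elements by returning the single middle value:

```python
def median(xs):
    """Return the middle value of the sorted data, or the mean of the
    two middle values when the number of elements is even."""
    if not xs:
        raise ValueError("the median of an empty set is undefined")
    s = sorted(xs)  # sort in increasing order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # odd count: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even count: mean of the two middle values

# Hypothetical values with one outlier (1000).
values = [2, 3, 3, 4, 5, 5, 6, 7, 8, 1000]
print(median(values))  # 5.0
```

Python's standard library offers the same computation as `statistics.median`.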
The standard deviation can be thought of as a measure of how far the data values are spread from the mean.
Let $X$ be a random variable with mean value

$$\mu = E[X]$$

where the operator $E$ denotes the average or expected value of $X$. Then the standard deviation of $X$ is the quantity:

$$\sigma = \sqrt{E\left[(X - \mu)^2\right]}$$
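To make the definition concrete, the sketch below computes $\mu = E[X]$ and $\sigma$ for a fair six-sided die, an assumed example that is not taken from the text. Exact fractions are used so that the mean and variance come out as exact rational numbers:

```python
import math
from fractions import Fraction

# Assumed example: a fair six-sided die, each outcome with probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [Fraction(1, 6)] * 6

def expected_value(xs, ps):
    """E[X] for a discrete random variable: the sum of x * P(X = x)."""
    return sum(x * p for x, p in zip(xs, ps))

mu = expected_value(outcomes, probabilities)  # mean value E[X]
# E[(X - mu)^2]: the expected value of the squared deviation from the mean
variance = expected_value([(x - mu) ** 2 for x in outcomes], probabilities)
sigma = math.sqrt(variance)

print(mu)               # 7/2
print(variance)         # 35/12
print(round(sigma, 4))  # 1.7078
```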
Given a set of $n$ measurements with the values $x_1, x_2, \ldots, x_n$, the standard deviation is computed in the following steps.
Calculate the arithmetic mean $\bar{x}$ of the $n$ elements $x_i$ of the set:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Calculate the squared difference between each element of the set and the mean, obtaining the set of squared deviations:

$$d_i = (x_i - \bar{x})^2, \quad i = 1, \ldots, n$$
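Summing these squared differences and dividing by the number of elements minus one gives the variance $s^2$:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Dividing by $n - 1$ rather than $n$ (Bessel's correction) makes $s^2$ an unbiased estimate of the population variance when the measurements are a sample.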
The standard deviation is then the square root of the variance:

$$s = \sqrt{s^2}$$
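The four steps can be sketched in Python as follows; the function name `sample_standard_deviation` and the measurements are hypothetical:

```python
import math

def sample_standard_deviation(xs):
    """Return (mean, variance, standard deviation), dividing by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(xs) / n                             # step 1: arithmetic mean
    squared_diffs = [(x - mean) ** 2 for x in xs]  # step 2: squared differences
    variance = sum(squared_diffs) / (n - 1)        # step 3: divide by n - 1
    return mean, variance, math.sqrt(variance)     # step 4: square root

# Hypothetical measurements, not the original example's values.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
mean, variance, s = sample_standard_deviation(measurements)
print(mean, round(variance, 4), round(s, 4))  # mean 5.0, variance 32/7 ≈ 4.5714, s ≈ 2.1381
```

The result should agree with `statistics.stdev` from the standard library, which also divides by $n - 1$; `statistics.pstdev` divides by $n$ instead.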
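Subtracting the standard deviation from the mean, and adding it to the mean, gives the bounds between which most of the values lie. How many values count as "most" depends on the distribution: for approximately normally distributed data, about 68% of the values fall within one standard deviation of the mean, but no such guarantee holds for arbitrary data.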
For the lower bound, we have:

$$\bar{x} - s$$

For the upper bound, we have:

$$\bar{x} + s$$
which means that most values lie between $\bar{x} - s$ and $\bar{x} + s$.
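A short sketch of the bounds $\bar{x} \pm s$, using hypothetical measurements and the standard library's `statistics.stdev` for the sample standard deviation; it also counts how many values actually fall between the bounds:

```python
import statistics

# Hypothetical measurements.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(measurements)
s = statistics.stdev(measurements)  # sample standard deviation (divides by n - 1)

lower, upper = mean - s, mean + s
inside = [x for x in measurements if lower <= x <= upper]
print(f"[{lower:.4f}, {upper:.4f}] contains {len(inside)} of {len(measurements)} values")
# [2.8619, 7.1381] contains 6 of 8 values
```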
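By Chebyshev's inequality, the fraction of any set of numbers lying within $k$ standard deviations of the mean of those numbers is at least:

$$1 - \frac{1}{k^2}$$

where $k$ is the distance from the mean measured in standard deviations, and $k > 1$ (for $k \le 1$ the bound is zero or negative and therefore says nothing).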
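The Chebyshev bound is a one-line function; the sketch below (the name `chebyshev_lower_bound` is hypothetical) rejects $k \le 1$, where the bound carries no information, and prints the guaranteed fraction for a few values of $k$:

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean,
    according to Chebyshev's inequality: 1 - 1/k**2 (informative for k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2%} of the values")
# k = 1.5: 55.56%, k = 2: 75.00%, k = 3: 88.89%, k = 4: 93.75%
```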
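Taking values of $k$ greater than 1, we can observe, for example, that at least $1 - 1/2^2 = 75\%$ of the values lie within 2 standard deviations of the mean, at least $1 - 1/3^2 \approx 88.9\%$ lie within 3 standard deviations, and at least $1 - 1/4^2 = 93.75\%$ lie within 4 standard deviations.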
Going back to the previous example, we know the mean $\bar{x}$ and the standard deviation $s$. Now we pick a value of $k$: we compute $k s$ and subtract it from the mean to obtain the lower bound $\bar{x} - k s$, then add it to the mean to obtain the upper bound $\bar{x} + k s$. This means that at least a fraction $1 - 1/k^2$ of the values must lie between the lower bound and the upper bound. The same can be done for other values of $k$.
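The procedure can also be checked empirically. This sketch, with a hypothetical helper `fraction_within` and invented data containing one outlier, compares the observed fraction of values inside $\bar{x} \pm k s$ with the Chebyshev guarantee $1 - 1/k^2$. Because the sample standard deviation $s$ (dividing by $n - 1$) is at least as large as the population one, the interval is slightly wider than the one in the inequality, so the guarantee still holds:

```python
import statistics

def fraction_within(xs, k):
    """Fraction of values lying in [mean - k*s, mean + k*s], where s is the
    sample standard deviation."""
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)
    lower, upper = mean - k * s, mean + k * s
    return sum(lower <= x <= upper for x in xs) / len(xs)

# Hypothetical data set with one outlier.
values = [2, 3, 3, 4, 5, 5, 6, 7, 8, 1000]
for k in (1.5, 2, 3):
    observed = fraction_within(values, k)
    bound = 1 - 1 / k ** 2
    print(f"k = {k}: observed {observed:.2f}, Chebyshev guarantees at least {bound:.2f}")
```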