StandDev1
In this video tutorial
we will look at sample data
and using the sample data
we will compute
some measures of central tendency
and measures of dispersion
the sample data we will look at will be
the 6 test scores you see
80, 60, 70, 80, 100 and 90
and so let's determine
some measures of central tendency
the sample mean, the median and the mode.
now the sample mean
is known as x bar,
and x bar represents the sample mean
whereas the population mean
is represented by mu,
now we are not doing population mean
so we will just get rid of that
and we will talk about the equation for the sample mean.
which is x bar equals
the sum of, that's the summation symbol,
the sum of x
all the data values
divided by n.
So x bar, the sample mean
equals the sum of the data values
divided by n, the number of data values
well to find the sum of the data values
we will take the data values 80, plus 60, plus 70, plus 80
(it's hard to write here!), plus 100, plus 90
and divide by 6
when we compute this the sum of the data values
sigma x is 480
and when we divide 480 by 6
which is the sample size n
we find that the sample mean
is 80.
So our sample mean is 80.
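
The sample mean computation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the variable names (scores, total, n, x_bar) are chosen here and are not part of the video.

```python
# The six test scores from the tutorial
scores = [80, 60, 70, 80, 100, 90]

total = sum(scores)   # sigma x, the sum of the data values
n = len(scores)       # the sample size
x_bar = total / n     # the sample mean, x bar

print(total, n, x_bar)
```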
Now the median,
the median is the middle data value
and in order to find the median
what we want to do is first put the data values in order
we don't want to try to find the median unless your data values are ordered.
Now if there is an odd number of data values,
then there is always a single middle value, a true median.
And if there is an even number of data values,
as there is in this case, then what you need to do
is find the average of the two middle values.
Obviously this one is 80
but just to show you
what you would do is you would take the two data values
you would add them up and divide by 2
average them.
So we have 160 divided by 2
and that also is 80.
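
As a rough sketch, the median steps just described (order the values, then take the middle one or average the two middle ones) might look like this in Python. The names ordered, mid, and median are illustrative choices, not something shown in the video.

```python
scores = [80, 60, 70, 80, 100, 90]

# Step 1: put the data values in order
ordered = sorted(scores)
n = len(ordered)
mid = n // 2

# Step 2: an odd count has one middle value; an even count averages the two middle values
if n % 2 == 1:
    median = ordered[mid]
else:
    median = (ordered[mid - 1] + ordered[mid]) / 2

print(ordered, median)
```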
So our mean is 80, our median is 80
and the mode is the data value that occurs the most
now by most we mean more frequent than the other data values
so there could be no mode
if all the data values occur equally often, whether once, or twice, or three times
then there is no mode
but if one data value occurs more than the others
then there is a mode and in this case 80 occurs twice
and the other data values only occur once
so the mode is 80.
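
One way to check the mode is to count how often each value occurs. The sketch below follows the wording used here, where a mode exists only if one value occurs more often than all the others; treating a tie as "no mode" is an assumption of this sketch, and some textbooks would instead report several modes.

```python
from collections import Counter

scores = [80, 60, 70, 80, 100, 90]

counts = Counter(scores)          # how many times each value occurs
highest = max(counts.values())    # the largest frequency
most_common = [value for value, count in counts.items() if count == highest]

# Assumption: report a mode only when exactly one value has the highest frequency
mode = most_common[0] if len(most_common) == 1 else None

print(counts[80], mode)
```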
So for our measures of central tendency,
we have 80 for all of our measures of central tendency
and what you have learned in this chapter is that if the mean,
equals the median and the mode
which they do in this case
then the data are consistent with a symmetric distribution such as the Normal Distribution.
And in a Normal Distribution the mean
is the best measure of central tendency.
If the distribution were skewed,
in other words if the mean were greater than or less than the median
then that would indicate that the mean is being drawn towards an outlier
so if the mean were less than the median, there would be a low-lying outlier,
and if the mean were greater than the median there would be a high-valued outlier,
which would draw the mean from center
thus leaving the median as the better measure of central tendency.
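
To see an outlier pull the mean away from the median, here is a small sketch. The seventh score of 10 is a made-up value added purely for illustration; it is not part of the tutorial's data.

```python
import statistics

scores = [80, 60, 70, 80, 100, 90]

# Hypothetical: add one very low score (10) to act as a low-lying outlier
with_outlier = scores + [10]

mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

# The mean drops below the median, while the median stays where it was
print(mean_with_outlier, median_with_outlier)
```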
OK, so now for our measures of dispersion,
the range, the sample variance and the sample standard deviation,
the range is simply the maximum value minus the minimum value
in our case then the range
will be 100 minus 60
so our range is 40.
And the range really all that is going to tell you
is the width if you will of your x-axis
so we have a range of 40, we have a low end value of 60 and a high end value of 100.
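
The range is a one-line computation. A minimal sketch, with variable names chosen here for illustration:

```python
scores = [80, 60, 70, 80, 100, 90]

low = min(scores)         # the low end value
high = max(scores)        # the high end value
data_range = high - low   # maximum minus minimum

print(low, high, data_range)
```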
Now for sample variance and sample standard deviation
these are a little more confusing
the sample variance, the notation is s-squared.
If we had population variance it would be sigma-squared
but again we are not talking about population here we are talking about a sample
and the formula for sample variance
is s-squared equals
the sum of each data value minus the mean
quantity squared
divided by
n minus 1
Now if this were population data we would replace this with a capital N
and use all the data values in our population
but with sample data we use 1 less than the sample size.
So in order to start this I find that a chart helps the best
we will write down our x values, our data values,
and they are 60, 70, 80, 80, 90 and 100
I have to get used to writing with this!
Now the next step and actually the first step
in computing the variance and standard deviation
is to compute the mean
we've already done that here. We have computed the mean up here to be 80
so once you compute the mean you then need to compute the deviations.
and the deviations are simply the difference between the data value and the mean.
so we take our data values and we subtract the mean
and this gives us the deviations
now when we say deviations, we are referring to the change from the mean
so in other words 60 is 20 below the mean, hence the negative 20
70 is 10 below the mean of 80
80 is the mean so it has a deviation of 0
and again here
hopefully as I go along and do this more I will get better with this pen...
90 is 10 greater than the mean so we will have a positive 10
and 100 is 20 greater than the mean.
Now a quick check here
if we do not round our mean
which we didn't in this case,
our mean came out to be exactly 80
when we find the sum of the deviations
we should find that it's zero.
and as long as you don't have to round your mean this should hold true
Now if you have to round your mean it may not hold exactly true.
It may be "off" a bit.
But here we have negative 30 and a positive 30
we do get zero when we find the sum of the deviations.
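
The deviations column of the chart, and the quick check that it sums to zero, can be sketched as follows. The names x_bar and deviations are illustrative choices.

```python
scores = [60, 70, 80, 80, 90, 100]

x_bar = sum(scores) / len(scores)   # the mean, exactly 80 here

# Each deviation is the data value minus the mean
deviations = [x - x_bar for x in scores]

# Quick check: with an unrounded mean, the deviations sum to zero
print(deviations, sum(deviations))
```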
So averaging the deviations themselves would not make sense, because their sum is zero!
So our next step here, we found the deviations,
Now we have to square each deviation
in order to get rid of the negatives.
so we are going to take our deviations, the data value minus the mean, and we are going to square them.
so negative 20 times negative 20 is positive 400
negative 10 times negative 10 is positive 100
0 times 0 is 0
10 times 10 is 100
20 times 20 is 400
so now we have squared our deviations
Next keeping with the numerator we want to find the sum of the squared deviations
well that simply means we want to add all these values up
so.... to find our variance
what we are going to do
is take 400, plus 100, plus our 0's, plus 100, plus 400
and we are going to divide by n minus 1; n is 6, so that is 6 minus 1, which is 5
when we find the sum of our squared deviations
we compute 1000
and 1000 divided by 5 is 200.
Therefore, our sample variance is 200.
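
Putting the chart together, a sketch of the sample variance computation might look like this. The hand-built result is compared against `statistics.variance` from Python's standard library, which also divides by n minus 1; the other names are illustrative.

```python
import statistics

scores = [60, 70, 80, 80, 90, 100]
n = len(scores)
x_bar = sum(scores) / n

# Square each deviation to get rid of the negatives
squared_deviations = [(x - x_bar) ** 2 for x in scores]

# Sum the squared deviations and divide by n - 1 (sample data)
sum_of_squares = sum(squared_deviations)
s_squared = sum_of_squares / (n - 1)

print(squared_deviations, sum_of_squares, s_squared)
print(statistics.variance(scores))  # library cross-check
```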
And now our sample standard deviation, well the symbol for standard deviation is simply s,
and you see s is just the square root of s-squared
well, s therefore is just the square root of the variance
the variance was 200
the formula for the standard deviation is the same as for the variance, just with a square root
so it is the SQUARE ROOT of the sum of the squared deviations divided by n minus 1
and again in the sample standard deviation it is n minus 1
whereas in a population standard deviation this would be N, the number of data values in the population
so we just need the square root of 200
and we find that to be about 14.14, or if we round to a whole number,
14, so our sample standard deviation is about 14.
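
As a final sketch, the sample standard deviation is the square root of the sample variance. The code below recomputes the variance so it stands alone, and compares the result with `statistics.stdev` from the standard library; the variable names are illustrative.

```python
import math
import statistics

scores = [60, 70, 80, 80, 90, 100]
n = len(scores)
x_bar = sum(scores) / n

# Sample variance: sum of squared deviations divided by n - 1
s_squared = sum((x - x_bar) ** 2 for x in scores) / (n - 1)

# Sample standard deviation: the square root of the sample variance
s = math.sqrt(s_squared)

print(round(s, 2), round(s))
print(statistics.stdev(scores))  # library cross-check
```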