Can't help but think of her phrase: And now for a moment of geek! I was thinking about something yesterday that hit me like a minor epiphany. It was, drum roll please: in interpreting statistical data, it makes a significant difference whether we take the average of a set of data or the median. (Statisticians usually call the average the mean, but we'll say average since that's the more familiar term; it's what we grew up with in school, where it all came down to our average test score for our grades.)
This idea came to me after an article I read a few weeks ago in the Wall Street Journal reporting that this year here in New York City, income is down in every borough except Staten Island, where it is up. Manhattan's median income is down 9% to $63,000. This is still almost double the median income of the Bronx, for example, but Manhattan's drop was the largest of the 4 boroughs where income fell. I then noticed a mention that the average income in the city is about $124,000 a year. This got me thinking, first trying to remember what the median of statistical data is. I had known at one point. Then I remembered: the median is the midpoint of the data. For example, in the numbers 1, 2, and 6, the average, which we are very familiar with, is 3. The median, however, is 2: it is literally the middle number once the numbers are put in order. So for this set 3 is the average but the median is 2. We think of the average as the typical value of a set of numbers (perhaps "representative" does a good job of connoting what we have in mind), yet there is a case to be made that in the set (1, 2, and 6) 2 is more typical and representative than 3.
For one thing, the number 3 never comes up at all in the actual set. All three numbers are different, so there is no mode (the number in a set that occurs most often; in our set each occurs once), but 2 of the 3 numbers are actually less than 3. An average or a median has two uses. The first is to find what is most representative. Examples of this are all around; two are the average of exam scores in school and the batting average of baseball players. Here the average serves as the fairest appraisal of the performance of a student, an athlete or a worker.
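For anyone who wants to check the arithmetic on the 1, 2, 6 set, here is a minimal sketch using Python's standard statistics module (Python 3.8 or later). The variable names are just mine for illustration.

```python
from statistics import mean, median, multimode

data = [1, 2, 6]

average = mean(data)     # (1 + 2 + 6) / 3
middle = median(data)    # the middle value once the data is sorted
modes = multimode(data)  # every value tied for most frequent

# Which values in the set fall below the average?
below_average = [x for x in data if x < average]

print(average, middle, modes, below_average)
# prints: 3 2 [1, 2, 6] [1, 2]
```

Since every value occurs exactly once, multimode hands back all three, which is another way of saying the set has no single mode.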
The other use of the average (or the median, which is not used for students or athletes) is to predict future performance. In other words, the two uses are to appraise past performance and to predict future performance. If we want to predict future performance, the median can arguably be better.
In practice, the median is used for some kinds of appraising and predicting and the average for others. For income levels the median is preferable: ironically, it tells us much more accurately what the typical American makes than the actual average does. The average can be skewed in our current economy of rising income inequality. As the top 1% keeps getting richer and the other 99% poorer, average income becomes a badly skewed measure. Any economic analysis arguing that the average U.S. income is $200,000, so it isn't fair to raise taxes on people making just a little more than that because this is what average Americans make, comes from an economic sophist who, if he doesn't write for the Wall Street Journal editorial page, has missed his calling.
I think one measure of how well the average predicts and assesses is how far apart the median and the average are. The wider the gap, the less representative the average.
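To see how a single very high income drags the average away from the median, here is a rough sketch. The ten incomes are made up for illustration and are not real data; I picked them so the average lands on the $200,000 figure from the hypothetical above. The gap measure (the difference as a fraction of the median) is just one possible way to put a number on it.

```python
from statistics import mean, median

# Made-up annual incomes for ten households: nine modest, one very high.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000,
           55_000, 60_000, 65_000, 70_000, 1_550_000]

avg = mean(incomes)    # pulled upward by the single large income
mid = median(incomes)  # halfway between the 5th and 6th incomes

# How many households earn less than the "average" income?
below_avg = sum(1 for x in incomes if x < avg)

# One possible gap measure: the difference as a fraction of the median.
gap = (avg - mid) / mid

print(avg, mid, below_avg, round(gap, 2))
# prints: 200000 52500.0 9 2.81
```

Nine of the ten households earn less than the average, while the median sits right in the middle of the modest incomes.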
As we saw above, the average is better for some data and the median for others. In school it's a good thing you get graded by average. Otherwise a student with failing grades on 4 tests would have no chance of coming back on the fifth, since the middle score would still be a failing one. Yet there are probably situations where a student would do better being graded by median, such as bombing just one test. Assuming the teacher doesn't just throw away the lowest score, the median would work better for them.
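The two grading scenarios can be sketched with made-up test scores. The scores and the passing mark of 60 are assumptions of mine, chosen only to illustrate the point.

```python
from statistics import mean, median

PASSING = 60  # hypothetical passing mark

# Student A: four failing tests, then a perfect fifth.
comeback = [50, 55, 52, 58, 100]
# Student B: four strong tests and one bombed test.
one_bad_day = [85, 88, 90, 92, 20]

for name, scores in [("comeback", comeback), ("one bad day", one_bad_day)]:
    print(name, "average:", mean(scores), "median:", median(scores))
# prints: comeback average: 63 median: 55
# prints: one bad day average: 75 median: 88

# Graded by average, student A passes; graded by median, A still fails.
a_passes_by_average = mean(comeback) >= PASSING
a_passes_by_median = median(comeback) >= PASSING
```

Student A is rescued by the average, while student B's one bad test costs 13 points under the average and nothing under the median.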
So that's my moment of geek: choice of average or median in assessing data makes a big difference. Using the wrong one distorts the data and provides misleading results.