Histograms vs. Averages
Last week, while I was doing my Christmas shopping online (in order to save myself some time and mental anguish), I stumbled upon a data visualization query that ended up costing me:
What does the number 3.6 mean?
Specifically, what does 3.6 mean when it’s the average rating that a product has received from the community on a scale of 1-5?
At a glance, 3.6 looks like a great score! It’s higher than the scale’s midpoint of 3, and it would lead you to believe that a majority of the people who made this purchase were happy with their choice.
But the truth isn’t always so simple when it comes to data sets.
Although averages can be useful, they’re most applicable when your data is tightly clustered (like, say, a 1000-student MIT class where everyone scores over 80%). When your data is spread over a wider range (like, say, a 1-5 star product review), you need another way to interpret the data set while ensuring that the story being told is still relevant to the person reading it. In this case, I needed to know which of the widgets I intended to buy came most highly recommended by the jury of my peers.
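To make that concrete, here is a minimal sketch using Python's standard `statistics` module. Both lists of ratings are made up for illustration (they are not real review data): they share the same average, but the standard deviation, one common measure of spread, shows how differently the ratings are distributed around it.

```python
from statistics import mean, pstdev

# Hypothetical ratings, invented for illustration only.
clustered_ratings = [3, 4, 4, 4, 3]   # everyone roughly agrees
polarized_ratings = [1, 5, 5, 5, 2]   # opinions are split

for label, ratings in (("clustered", clustered_ratings),
                       ("polarized", polarized_ratings)):
    # Same mean in both cases; the spread is what differs.
    print(f"{label}: mean={mean(ratings):.1f}, spread={pstdev(ratings):.2f}")
```

The averages are identical, so the average alone cannot tell these two sets of opinions apart.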
Fortunately, histograms are well suited to this very problem! Thanks to online retailers like Amazon and iTunes turning to histograms, product reviews have become more informative than ever before.
In addition to showing you the overall average, histograms communicate at a glance exactly how that number came about. Take these customer reviews for two separate widgets that both received an average rating of 3.6.
Now we’re able to see the story behind the number 3.6!
We can tell that while Widget A may have more 5-star ratings, it’s also completely unbalanced: it has just as many 4-star reviews as 1-star reviews, and just a single person thought that Widget A deserved the middle-of-the-road rating of 3 stars.
Compare those results with Widget B, which has ratings that are balanced throughout the entire spectrum, and it paints an entirely different picture.
While Widget B may have some 1-star ratings, the large number of 3-star scores tells me that, at worst, some people found this product to be completely average. When I compare that with Widget A, which seems to be a “love it or hate it” type of product, my choice is clear.
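For readers who want to try the comparison themselves, here is a rough sketch in Python. The review counts are hypothetical: I made them up so that both widgets average exactly 3.6 and roughly match the descriptions above, and they are not the actual numbers from the retailer. The helper names `mean_rating` and `text_histogram` are likewise my own.

```python
def mean_rating(counts):
    """Average star rating from a {stars: number_of_reviews} mapping."""
    total_reviews = sum(counts.values())
    total_stars = sum(stars * n for stars, n in counts.items())
    return total_stars / total_reviews


def text_histogram(counts):
    """Return one bar of '#' characters per star level, 5 stars first."""
    lines = []
    for stars in range(5, 0, -1):
        n = counts.get(stars, 0)
        lines.append(f"{stars} star | {'#' * n} {n}")
    return "\n".join(lines)


# Hypothetical review counts, invented for illustration.
widget_a = {5: 9, 4: 4, 3: 1, 2: 2, 1: 4}   # "love it or hate it"
widget_b = {5: 5, 4: 6, 3: 6, 2: 2, 1: 1}   # balanced across the scale

for name, counts in (("Widget A", widget_a), ("Widget B", widget_b)):
    print(f"{name}: average {mean_rating(counts):.1f}")
    print(text_histogram(counts))
    print()
```

Both widgets print the same average, but the bars underneath make the difference in shape obvious.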
And so, with the help of data visualizations, my online shopping experience is brought to a close.