"Are we being dazzled by data?" That was the question put forth in a presentation by Tim Harford, economist, journalist and author of The Undercover Economist, at Teradata Universe in Prague.
The resounding answer, perhaps unsurprisingly, is yes. "Sometimes the story is much bigger than what's captured in the data," he cautioned. "When we see beautiful information, we have to ask, 'is it true?'"
Data visualization is a tricky business. An analyst has an obligation to deliver fair and factual data, and not to be swayed by the underlying objectives of a study. And while pretty charts and graphs may be essential to deliver the message, especially if the audience is unfamiliar with the topic, oversimplification may deliver the wrong idea or fail to capture the whole truth.
Concerned about our tendency to believe what we see, Harford says three big things worry him about the way data is often presented.
Data that has an agenda: Presented in the right way, data can have a "wow" factor that evokes an emotional response. But Harford argues that the data may not really justify the statement. "It can be designed to tell a particular story, and we deserve to know what's going on behind the data." Even with the best of intentions, like promoting drug studies and education, data used to manipulate is not data used with integrity.
Open data: Data has become easy to get hold of, as have the tools to produce graphics and deliver "information." While this transparency is generally a good thing, data can easily be presented without exercising any statistical judgement. What can look like a compelling case may really be noise. Think correlation versus causation. When data is presented incorrectly, and enthusiastically enough, to those who don't understand it, people may feel obliged to act on the information without really appreciating what's going on.
Compelling illustrations of things we can't measure: People easily accept as absolute fact something that is really an assumption in the data. A common example is the supposed ability to reliably measure emotions based on what people say or tweet. "People moan on Twitter that aren't really sad," he says. And what about the people who are not on Twitter at all? "Twitter is not representative of the world." But it is there to be foraged, and the more analysis we run the more compelling the data looks. "There are some things in the world we'll never know," adds Harford. "We should not convey that information with certainty."
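Harford's warning that a compelling case "may really be noise" can be made concrete with a small experiment. The sketch below is an illustration only, not something shown in the talk: it generates a number of completely unrelated random series and then searches for the pair that correlates most strongly. The function names (`pearson`, `strongest_spurious_pair`) and the default sizes (50 series of 10 points each) are assumptions chosen for the example.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_pair(num_series=50, length=10, seed=0):
    """Generate independent noise series and return the most correlated pair.

    Every series is drawn independently, so any correlation found
    between two of them is pure chance.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(length)]
              for _ in range(num_series)]
    # Compare every pair of series and keep the one with the largest |r|.
    best = max(
        itertools.combinations(range(num_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


if __name__ == "__main__":
    pair, r = strongest_spurious_pair()
    print(f"Series {pair[0]} and {pair[1]} are unrelated, yet r = {r:.2f}")
```

With 50 short series there are 1,225 pairs to choose from, so a strong-looking correlation is almost guaranteed to turn up somewhere even though no series has anything to do with any other. Plotted on its own, that one pair could look like a compelling story.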
Data sets can tell different stories in different displays. In the age of big data, it is increasingly important to focus on information that matters, and that matters to the right audience. A data scientist or analyst must be clear on the information's purpose before delivering it, and remain wary of blind trust from the executives to whom they report.
It must be remembered that data science is still a process, and its presentation is an art.