If you’ve spent any significant time looking at data visualizations, or in a STEM classroom, you’ve almost certainly had this maxim drilled into your head: Correlation does not imply causation. In plain English, it means “just because A and B seem to be related doesn’t mean that A caused B to happen.” Statisticians and chart nerds love to illustrate this fallacy by setting up patently absurd correlations, like matching up the divorce rate in Maine with per capita consumption of margarine. No one would seriously believe that eating margarine causes divorce. But what about subtler correlations like this one?
If, after scanning that graph, you can’t help but think that higher housing prices are somehow causing women to have fewer babies… well, you wouldn’t be alone. Carl Bergstrom and Jevin West, two researchers at the University of Washington, think that the very format of the graph itself (one set of numbers laid out horizontally, another set arranged vertically) may be partly to blame.
That classic “X vs Y axis” graph, called a scatterplot, is a workhorse visualization in science and statistics. Researchers use it to explore how closely two sets of measurements are related to each other. Scatterplots make this exploration easier, because correlations literally line up as visual patterns right in front of your eyes.
The trouble, says Bergstrom, is that these “correlation-only” scatterplots follow exactly the same visual conventions as graphs that are explicitly intended to show causation. Which graphs? According to Bergstrom, nearly every one you saw in high school. Whether we were fussing with f(x)’s in geometry class or filling out lab reports in chemistry, for those of us whose visual-statistical education ended shortly after senior prom, the whole idea of plotting data on an X-Y grid means “this thing causes that thing.”
“Because of conventions that the horizontal axis variable influences the vertical axis variable, we’re trained or at least habituated to think in causal terms when looking at scatterplots,” Bergstrom says.
But Bergstrom and West don’t want to rebuild graphing from the ground up: “We are stuck with the norms we already have,” they write. Their solution? Keep the same Cartesian grid system we all learned on in high school, but display it at a 45-degree angle to create what they call a “diamond plot.” Here’s that graph about home prices and fertility again, redisplayed according to Bergstrom and West’s scheme:
The correlations themselves still form clear visual patterns on the grid, just like they did in old-fashioned scatterplots. But with both sets of numbers tilted at symmetrical angles, neither axis appears to take causal precedence over the other. In other words, the format of the graph doesn’t nudge you to project nonexistent storylines onto the data.
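For readers who want to see what the tilt amounts to numerically, here is a minimal sketch of a 45-degree rotation applied to ordinary (x, y) pairs. It is not Bergstrom and West’s own code. The function name `to_diamond`, the counterclockwise direction (so the original horizontal axis runs up and to the right and the original vertical axis runs up and to the left), and the sample values are all assumptions made for illustration.

```python
import math


def to_diamond(points):
    """Rotate (x, y) pairs 45 degrees counterclockwise about the origin.

    Hypothetical helper: the orientation is an assumption, not taken
    from Bergstrom and West's description.
    """
    s = math.sqrt(2) / 2  # cos(45 degrees) equals sin(45 degrees)
    # Standard 2D rotation: x' = x*cos - y*sin, y' = x*sin + y*cos
    return [((x - y) * s, (x + y) * s) for x, y in points]


# Made-up illustrative values, not data from the article.
sample = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rotated = to_diamond(sample)
```

In this sketch a point with equal x and y lands on the vertical centerline, while a point that lies on either original axis is pushed out to one side by the same amount, which is the symmetry the diamond layout relies on. The rotated pairs could then be handed to any plotting library.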
That’s the hunch, anyway. Bergstrom and West freely admit that they still need to validate diamond plots with rigorous user testing. Alberto Cairo, information designer and author of The Functional Art, thinks that “the diamond [plot] is an intriguing idea.” But he also thinks that the problem lies less with graph design and more in our own built-in cognitive bias to see causation in everything. “We evolved to detect patterns, even if patterns are just the product of random clustering, and come up with stories to explain them,” he says. “How to overcome these biases? A conscious effort, informed by education, to curb our impulse to jump to conclusions.”
Bergstrom agrees that our natural pattern-recognition habits are a major factor in misinterpreting scatterplots; he just doesn’t think it’s the only factor. He and West are planning to test diamond plots this fall. But Bergstrom also understands that putting standard graphs at a Dutch angle might cause more problems than it solves, by making the visualizations harder to read. “If it turns out that diamond plots are effective at reducing unwarranted causal inferences without imposing too great a cognitive cost [on users], of course we will be using them going forward,” he says. “If not, well, that’s the nature of science: You propose an idea, test it, and discard it if the evidence stacks up to the contrary.”