Are Big Data Black Swan Killers?


The dream of big data is to make experience predictable, and black swan theory says it can’t be done, at least not wholly reliably. Both are right.

A black swan is a major event or accomplishment that surprises everyone, defying predictions because it depends on causes nobody considered, like when a stock market bubble bursts or somebody invents fidget spinners.

Such events seem predictable in hindsight, suggesting 1) they were inevitable, if only people had known what to look for, and therefore 2) they can be understood, and either replicated or, in the case of bad things like car crashes, prevented.

It’s why we think famous stock pickers and celebrities are somehow different from, or did things differently than, the hordes of analysts and wannabe stars who toil in obscurity. It also funds how-to programs that promise to mirror their efforts, in hopes of realizing the same successes.

Black swan theory says that those assumptions are wrong, both because the universe of incremental influences is infinite and because the passage of time changes the relationships between variables along with their values.

Surprises are, by definition, surprises.

It’s also why big data can’t kill them.

The two pillars of data science are, well, data and probability, so the miracle of predictability rests less on deterministic insight than on incredibly well-informed guesses. Its success therefore depends on the quantity and quality of its inputs and models.

Big data works best in environments with defined contexts, whether circumstantial or by design. It’s easier to calculate the probability of a single act, like how someone will vote on election day, than to predict the air currents that may or may not produce a storm. Predicting behavior online is easier than predicting it in the real world, since the data points (and the behaviors under scrutiny) are structurally limited.

The assumption is that big data’s batting average improves over time, and that’s true, especially for events and behaviors that need only be likely, not guaranteed.

Big data can match speed limits to the number and placement of drivers on a road. It can deliver useful insights into personal health, improve the results of online advertising, and accomplish tons of other useful (and commercially lucrative) things.

Pierre-Simon Laplace, a 19th-century French mathematician, expressed the determinist vision of data and probability:

We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.

But anything short of that omniscience can’t predict the unpredictable, since the surprises of the future are, by definition, not repeats of the surprises of the past. Probability is an approximation, not a certainty.
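A toy illustration of why past data can’t see a black swan coming: a model that estimates probabilities purely from observed frequencies assigns zero probability to any outcome it has never seen, no matter how much history it absorbs. This is a hypothetical sketch, not a description of any real system; the `empirical_probability` helper, the outcome labels, and the counts are all made up for illustration.

```python
from collections import Counter

def empirical_probability(history, outcome):
    """Hypothetical helper: estimate P(outcome) as its relative frequency in past observations."""
    counts = Counter(history)  # missing outcomes count as 0
    return counts[outcome] / len(history)

# Made-up record of 10,000 past market days: only small moves ever observed.
history = ["up"] * 5200 + ["down"] * 4800

print(empirical_probability(history, "up"))     # 0.52
print(empirical_probability(history, "crash"))  # 0.0: never observed, so treated as impossible
```

Adding more days sharpens the estimates for "up" and "down," but it never moves "crash" off zero until one has already happened. Smoothing techniques such as add-one (Laplace) smoothing give unseen outcomes a small nonzero probability, but only outcomes someone thought to list in advance.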

It will be interesting to witness how big data insights make our lives more predictable and reliable.

It’ll be even more interesting when we’re surprised by the occasional and inevitable black swan.