“All models are wrong, but some are useful.”

So proclaimed WIRED editor-in-chief Chris Anderson 7 years ago (borrowing statistician George Box’s famous aphorism), opening the July 2008 issue, a collection of stories on the advent of “The Petabyte Age,” with a piece entitled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”. Anderson concludes this preface, saying:

WIRED is known for occasional overstatement, and it’s generally unwise to give any sort of credence to those who use the phrase “crunch the numbers” (numbers are many things, but crunchiness is not one of their qualities). WIRED’s captain’s strongly-worded op-ed can probably be written off as nothing but bluster and hype. Anderson supports his far-fetched claims in a variety of ways, but one direct quote surprised me.

“To set the record straight: That’s a silly statement, I didn’t say it, and I disagree with it.” Peter Norvig writes in a blog post (All we want are the facts, ma’am) after the publication of Anderson’s piece, staunchly disavowing the quote attributed to him in the article. Norvig continues:

OK. So these two intellectuals generally agree with one another: the scientific method is still alive and relevant, and not likely to be made obsolete anytime soon.

Except, perhaps not. That is to say, when Anderson, referring to the opinions he advances in his piece, writes “This kind of thinking is poised to go mainstream,” I feel more inclined to perceive him as uncharacteristically clairvoyant from the vantage point of 2015. What’s changed since 2008?

A scientific model is an explanation and a prediction bundled into one. Or at least, it used to be. As Norvig points out in another blog post of his (On Chomsky and the Two Cultures of Statistical Learning), drawing on Leo Breiman’s paper “Statistical Modeling: The Two Cultures,” there exists another kind of model, one which is growing in appeal and becoming widely adopted. He describes the dichotomy between these two kinds of model in the following way:

This second kind of model is different from the former in that it offers a prediction without an explanation. This is what Chomsky finds problematic.

The reason Anderson’s piece seems more apt to me now than it did in 2008 is that the balance of power between these two approaches has begun to shift toward the latter. The models we develop today are not only less frequently explanatory, but oftentimes veritable black boxes: we may not even understand the mechanisms that give them such great prediction accuracy in the first place. Recently, explanatory models of complex phenomena (such as language) have been ceding the stage to a class of model with little to no explanatory power but a great deal of predictive power. Even where these models might have explanatory power, researchers have routinely de-emphasized those aspects. I’m talking about neural networks, and “Deep Learning,” a modeling approach which has experienced a renaissance.

Neural networks have come to the fore in the past 5 or so years as some of the most powerful tools for modeling arbitrary data. They are universal function approximators, meaning they can, in principle, approximate any continuous function to arbitrary precision (provided sufficient compute, data, and an adequate architecture). For the purposes of many engineering tasks, their rediscovery is a great boon, but scientists will have to consider them cautiously, because they belong staunchly to the algorithmic modeling culture, which today corresponds roughly to what we call Machine Learning.
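To make this concrete, here is a minimal, self-contained sketch (plain Python, no libraries; the architecture and hyperparameters are illustrative choices of mine, not drawn from any cited work) of a tiny network learning XOR, a function no single linear unit can represent:

```python
import math, random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR is not linearly separable; a small hidden layer suffices to model it.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

H = 4  # hidden units (an illustrative choice)
# W1[j] = [weight on x1, weight on x2, bias] for hidden unit j
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]
# W2 = [weight on each hidden unit] + [output bias]
W2 = [random.uniform(-1, 1) for _ in range(H + 1)]

lr = 0.5
for _ in range(20000):
    for (x1, x2), y in data:
        # forward pass
        h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in W1]
        o = sigmoid(sum(W2[j] * h[j] for j in range(H)) + W2[H])
        # backward pass: gradient of squared error through the sigmoids
        do = (o - y) * o * (1 - o)
        for j in range(H):
            dh = do * W2[j] * h[j] * (1 - h[j])
            W2[j] -= lr * do * h[j]
            W1[j][0] -= lr * dh * x1
            W1[j][1] -= lr * dh * x2
            W1[j][2] -= lr * dh
        W2[H] -= lr * do

def predict(x1, x2):
    h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in W1]
    return sigmoid(sum(W2[j] * h[j] for j in range(H)) + W2[H])

for (x1, x2), y in data:
    print((x1, x2), "->", round(predict(x1, x2), 3))
```

Notice that nothing in the code encodes an explanation of XOR; the fit emerges from random initialization and repeated gradient updates, which is precisely what makes the result predictive without being explanatory.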

Machine Learning imports an almost entirely distinct paradigm and methodology into research, parallel to but separate from science. In traditional science, models are generated in a directed, deliberate, well-reasoned way (the scientific method). In Machine Learning, hypothesis generation proceeds by trial and error, always to some degree, if not entirely. Models are not so much created as evolved (trained, in machine learning parlance).

And this allows for a profound intellectual laziness on the part of some researchers who imbibe the Deep Learning kool-aid, best summarized by their catchphrase: “Throw a neural network at the problem.” Doing so de-prioritizes understanding, practically exporting the role of science, of learning, to machines. This does not bode well for robust and principled theory in science, which is why I feel inclined to give credence to Anderson’s inflammatory 2008 op-ed. A few years back, many researchers would find that they had some data they sought to model, that deep learning strategies consistently provided the best prediction accuracy, and this would be the tacit signal that the research was complete.
Chomsky seems to believe this is calamitous; he has “derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior.” (MIT Tech Review)

I tend to agree with him. But let’s hear Norvig out. He proceeds in his blog:

There’s a lot to unpack here.

Someday, surely, we will see the principle underlying existence as so simple, so beautiful, so obvious that we will all say to each other, ‘Oh, how could we all have been so blind, so long.’ — John Archibald Wheeler

Here is a beautiful affirmation of the traditional ontology of science — the faith that there exists a model of reality which is so simple and yet corresponds so well with observed phenomena that we grant this model the designation of scientific theory. This intuition, this faith in the existence of such a model (exhibited by Wheeler’s quote) is exactly what Norvig says Chomsky must abandon for “complex, messy domains” like social science.

However, Norvig also says something unobtrusively radical above.
The crux of his argument, I believe, is cautiously worded as:

Here, Norvig is making the claim that these models may in fact contain explanatory power, buried more deeply than the insights one might garner from another type of model. Anderson claims “science can advance even without coherent models, unified theories, or really any mechanistic explanation at all,” which may seem fairly accurate when one considers that classification accuracy keeps going up and we don’t have a clear idea why. But he’s mistaken, because uninterpretable models aren’t particularly actionable. Science, after all, is explanation and prediction. We mustn’t give in to this idea, but instead discover means of understanding these complex models and teasing out whatever explanations of the modeled phenomena might lie therein.
Norvig illustrates the idea with this koan:

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.
“What are you doing?”, asked Minsky.
“I am training a randomly wired neural net to play Tic-Tac-Toe,” Sussman replied.
“Why is the net wired randomly?”, asked Minsky.
“I do not want it to have any preconceptions of how to play”, Sussman said.
Minsky shut his eyes.
“Why do you close your eyes?”, Sussman asked his teacher.
“So that the room will be empty.”
At that moment, Sussman was enlightened.

The implication here is that there might exist robust theories and beautiful models underlying many of these complex phenomena, but that we simply choose not to look for them inside the many parameters of the neural network. In this koan, Sussman stands in for the modern-day practitioner of deep learning, who merely seeks to make it work rather than understand how it works. Minsky, by contrast, exhibits the intellectual courage needed to extract the secrets these models are less willing to give up: to peer inside the black box.
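Peering inside the box need not be hopeless. As a toy illustration (my own, not from the koan or from Norvig), here is the smallest possible case: a single trained threshold unit whose learned parameters can be read, afterward, as an explanation:

```python
# A single linear threshold unit trained on AND with the classic perceptron
# rule. A toy stand-in for "opening the black box": after training, the
# parameters themselves constitute a readable explanation.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w1, w2, b = 0.0, 0.0, 0.0
for _ in range(25):  # the perceptron rule converges quickly on this task
    for (x1, x2), y in data:
        pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        err = y - pred
        w1 += err * x1
        w2 += err * x2
        b += err

def predict(x1, x2):
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

# The learned parameters are interpretable: both input weights are positive
# and the bias is negative, so the weighted sum clears the threshold only
# when both inputs are on. That reading is the explanation hiding inside
# the trained model.
print("w1 =", w1, "w2 =", w2, "b =", b)
```

A modern deep network has millions of parameters instead of three, which is exactly why extracting the analogous reading demands the kind of dedicated effort the visualization papers below represent.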

Once this neural network renaissance of “deep learning” really got going (circa 2013), it didn’t take long for machine learning researchers to begin trying to pry open the black box. The first of these attempts, as far as I can tell, was Zeiler and Fergus’s seminal “Visualizing and Understanding Convolutional Networks”.

TensorFlow, a deep learning library released by Google, now has the methods presented in this paper built into its “TensorBoard” tool.
But I think the most honest and thorough introspection I’ve seen must be Szegedy et al.’s “Intriguing Properties of Neural Networks”.

In both of these papers, the authors seek to understand how these tremendously powerful models work, to develop a theory of neural networks instead of merely abdicating the role of thinking to these algorithmic tools.

Notable also is the work that’s been done on the “Manifold Hypothesis” of deep learning, which is extremely well described and visualized by Christopher Olah’s blog post:

He describes how neural networks might be manipulating the manifold upon which the data lies in order to separate it, which he visualizes in two dimensions:
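The intuition can be sketched in miniature (my own toy example, not Olah’s): two classes lying on concentric circles cannot be separated by any straight line in the raw coordinates, but a single nonlinear transform of the manifold, here just the radius, makes one threshold suffice:

```python
import math, random

random.seed(0)

# Two classes lying on concentric circles: inseparable by any straight
# line in the raw (x, y) coordinates.
def ring(radius, n):
    return [(radius * math.cos(t), radius * math.sin(t))
            for t in (random.uniform(0, 2 * math.pi) for _ in range(n))]

inner = ring(1.0, 100)  # class 0
outer = ring(3.0, 100)  # class 1

# One nonlinear feature "unrolls" the manifold: the distance from origin.
def r(p):
    x, y = p
    return math.sqrt(x * x + y * y)

# In the transformed one-dimensional space, a single threshold separates
# the classes perfectly.
threshold = 2.0
acc = (sum(r(p) < threshold for p in inner)
       + sum(r(p) >= threshold for p in outer)) / 200
print("accuracy after transform:", acc)
```

A neural network’s hidden layers can learn transformations of this kind on their own, which is what makes the manifold picture such an appealing candidate explanation for why these models work.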

This sort of work excites me a great deal, because it is unrelentingly intellectually brave to delve into these complex machines and try to understand them, rather than merely trying to get them to work.
It also points the way forward for both machine learning research and machine learning-empowered research. Finally, it illuminates the contours of some truly beautiful questions:

How might we develop a language to begin to unravel complexity?
Might it be the case that what Norvig describes as “complex, messy domains” are in fact describable by simple, elegant theories in a language different from any we currently know? Recall that Sir Isaac Newton had to invent calculus, and within that language, the vast complexity of physics was reduced to simple rules. Might these phenomena (life, intelligence) necessitate the inception of a new language, and perhaps a new paradigm, to deconvolve their complexity and explain it simply? How would one begin to go about this?



