Deep Learning and Knowledge Representation

William Marble
Tue, Apr 5, 2022

In “Deep Learning is Hitting A Wall,” Gary Marcus argues that deep learning is reaching its limits as a paradigm for artificial intelligence. Recent difficulties with self-driving cars and unreliable language model outputs illustrate the limitations of systems that cannot understand the world, or the meaning of the words they are parroting. Instead, Marcus advocates that the field of AI turn its attention to symbolic approaches. Such approaches would focus on encoding information about the world and deriving new information from it using simple operations:

What does “manipulating symbols” really mean? Ultimately, it means two things: having sets of symbols (essentially just patterns that stand for things) to represent information, and processing (manipulating) those symbols in a specific way, using something like algebra (or logic, or computer programs) to operate over those symbols.
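Here is a minimal Python sketch of those two ingredients. It is not taken from Marcus's article. Symbols are plain strings, and the "manipulation" is forward chaining, which repeatedly applies IF-THEN rules until no new facts can be derived. The `forward_chain` function and the weather facts are hypothetical illustrations.

```python
# Symbols are plain strings; a rule maps a set of premise symbols to a conclusion symbol.
Rule = tuple[frozenset[str], str]

def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Apply every rule whose premises all hold, repeating until nothing new is derived."""
    known = set(facts)  # copy so the caller's set is not modified
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rules: IF raining THEN ground_wet; IF ground_wet AND freezing THEN ground_icy.
rules: list[Rule] = [
    (frozenset({"raining"}), "ground_wet"),
    (frozenset({"ground_wet", "freezing"}), "ground_icy"),
]

derived = forward_chain({"raining", "freezing"}, rules)
print(sorted(derived))  # ['freezing', 'ground_icy', 'ground_wet', 'raining']
```

The second rule fires only after the first has added `ground_wet`. This chaining of explicit, inspectable steps is what makes the derivation easy to trace.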

This article is as interesting for its exposition of symbolic approaches (relatively unknown to me) as for its sociology of science. Symbolic approaches were embraced by early computer scientists such as von Neumann. They became dominant in the 1970s after infighting between the symbolic camp and the neural network camp. Neural nets regained prominence in the 1980s, advanced by researchers who avoided the symbolic approach. The schism persists today: Marcus’s polemic is itself a testament to the rift. These are competing paradigms, though there are glimmers of a synthesis on the horizon.

One argument Marcus makes for symbolic approaches regards interpretability:

Deep learning systems are black boxes; we can look at their inputs, and their outputs, but we have a lot of trouble peering inside. We don’t know exactly why they make the decisions they do, and often don’t know what to do about them (except to gather more data) if they come up with the wrong answers. This makes them inherently unwieldy and uninterpretable, and in many ways unsuited for “augmented cognition” in conjunction with humans. Hybrids that allow us to connect the learning prowess of deep learning, with the explicit, semantic richness of symbols, could be transformative.

I find this argument persuasive. If the goal is to create a computer that can reason like humans, then we should learn from the way we, ourselves, organize knowledge.

Scientific investigation does not merely involve recording data about the world. Scientists seek to represent data through parsimonious, interpretable rules that are (mostly) consistent with the rules summarizing other data. That is, we create theories: abstract rules that explain observations about how the world works. Theories vary in their degree of structure; Newton’s laws are simpler than Einstein’s theory.[1] In social science, we have theories that explain the relationships between variables in the social world. One example is Meltzer and Richard’s model of inequality and redistribution. When we assess new situations unlike those we have seen before, theory guides us on what to expect. When a theory’s predictions don’t match the data, we do not update the theory at random. Instead, we update it based on our knowledge of the world, grounded in other theories. It stands to reason that artificial intelligence should make use of this method of representation.

These reflections also suggest an answer to a question I discussed with Matt Tyler early in grad school: If you had a black-box algorithm that could perfectly predict, say, how legislators would vote, could it be published in the American Political Science Review? My intuition is that most political scientists would not find such a paper interesting or important. The reason is that the way the algorithm represents knowledge — e.g., through a deep neural network — is not compatible with the way that political scientists represent knowledge.

Perhaps algebraic methods could change my answer to this question. I’m not very familiar with qualitative methods, but something like qualitative comparative analysis, or quantitative variants like Bayesian rule sets, might be helpful for theory generation. These methods seek to explain relationships in data through algebraic statements (e.g., IF high inequality THEN high redistribution). Relationships represented this way may be more easily integrated into our existing mental schemas than statistical relationships.[2]
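The sketch below is a rough illustration of how a single rule like IF high inequality THEN high redistribution might be checked against data. It scores the rule on made-up, binary-coded cases, assuming crisp-set coding. It uses two measures from crisp-set QCA:

* Consistency is the share of cases with the condition that also show the outcome.
* Coverage is the share of cases with the outcome that the condition accounts for.

This is not an implementation of full QCA or of Bayesian rule sets, both of which search over many candidate rules. The `Case` type, the helper functions, and the data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Case:
    """One hypothetical observation, coded as crisp (True/False) set memberships."""
    high_inequality: bool
    high_redistribution: bool

Predicate = Callable[[Case], bool]

def consistency(cases: list[Case], condition: Predicate, outcome: Predicate) -> float:
    """Share of cases satisfying the IF part that also satisfy the THEN part."""
    matched = [c for c in cases if condition(c)]
    if not matched:
        return float("nan")
    return sum(outcome(c) for c in matched) / len(matched)

def coverage(cases: list[Case], condition: Predicate, outcome: Predicate) -> float:
    """Share of cases satisfying the THEN part that the IF part accounts for."""
    with_outcome = [c for c in cases if outcome(c)]
    if not with_outcome:
        return float("nan")
    return sum(condition(c) for c in with_outcome) / len(with_outcome)

def high_inequality(c: Case) -> bool:
    return c.high_inequality

def high_redistribution(c: Case) -> bool:
    return c.high_redistribution

# Made-up toy data, for illustration only (not real observations).
cases = [
    Case(True, True), Case(True, True), Case(True, True), Case(True, False),
    Case(False, True), Case(False, True), Case(False, False),
]

print(consistency(cases, high_inequality, high_redistribution))  # 0.75
print(coverage(cases, high_inequality, high_redistribution))     # 0.6
```

On this toy data, the rule holds in three of the four high-inequality cases (consistency 0.75). However, it accounts for only three of the five high-redistribution cases (coverage 0.6). The output is a plain statement plus two proportions, which is easy to reason about when looking for a theory to explain it.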

We could then seek to develop theory that accounts for those patterns. Ideally, the theory would provide novel predictions that can be further tested against the data. And so on. In other words, maybe we should be investing in symbolic methods for theory development and traditional statistical methods for testing.

  1. A working definition of the “simplicity” of a theory might be the number of free parameters.

  2. A similar argument can be made for tree-based methods.