Asking for a program that can discover new scientific laws without having a program that can, say, do anagrams, is like wanting to go to the moon without having the ability to find your way around town. I do not make the comparison idly. The level of performance that Simon and his colleague Langley wish to achieve in Bacon is on the order of that of the greatest scientists. It seems they feel that they are but a step away from the mechanization of genius. After his Procter Lecture, Simon was asked by a member of the audience, "How many scientific lifetimes does a five-hour run of Bacon represent?" After a few hundred milliseconds of human information processing, he replied, "Probably not more than one." I don't disagree with that. However, I would have put it differently. I would have said, "Probably not more than one millionth."
Anagrams and Epiphenomena
It's clear that I feel we're much further away from programs that do human-level scientific thinking than Simon does. Personally, I would just like to see a program that can do anagrams the way a person does. Why anagrams? Because they constitute a "toy domain" where some very significant subcognitive processes play the central role.
What I mean is this. When you look at a "Jumble" such as "telkin" in the newspaper, you immediately begin shifting around letters into tentative groups, making such stabs as "knitle", "klinte", "linket", "keltin", "tinkle"-and then you notice that indeed, "tinkle" is a word. The part of this process that I am interested in is the part that precedes the recognition of "tinkle" as a word. It's that part that involves experimentation, based only on the "style" or "feel" of English words-using intuitions about letter affinities, plausible clusters and their stabilities, syllable qualities, and so on. When you first read a jumble in the newspaper, you play around, rearranging, regrouping, reshuffling, in complex ways that you have no control over. In fact, it feels as if you throw the letters up into the air separately, and when they come down, they have somehow magically "glommed" together in some English-like word! It's a marvelous feeling, and it is anything but cognitive, anything but conscious. (Yet, interestingly, you take credit for being good at anagrams, if you are good!)
It turns out that most literate people can handle Jumbles (i.e., single-word anagrams) of five or six letters, sometimes seven or eight letters. With practice, maybe even ten or twelve. But beyond that, it gets very hard to keep the letters in your head. It is especially hard if there are repeated letters, since one tends to get confused about which letters there are multiple copies of. (In one case, I rearranged the letters "dinnal" into "nadlid"-incorrectly. You can try "raregarden", if you dare.) Now in one sense, the fact that the problem gets harder and harder with more and more letters is hardly surprising. It is obviously related to the famous "7 plus or minus 2" figure that psychologist George A. Miller first reported in connection with short-term memory capacity. But there are different ways of interpreting such a connection.
One way to think about how this might come about is to assume that concepts for the individual letters get "activated" and then interact. When too many get activated simultaneously, you get swamped with combinations: you drop some letters, make too many copies of others, and so on. This view says that you simply encounter an explosion of connections, and your system gets overloaded. It does not postulate any explicit "storage location" in memory-a fixed set of registers or data structures-in which letters get placed and then shoved around. In this model, short-term memory (and its associated "magic number") is an epiphenomenon (or "innocently emergent" phenomenon, as Daniel Dennett calls it), by which I mean it is a consequence that emerges out of the design of the system, a product of many interacting factors, something that was not necessarily known, predictable, or even anticipated to emerge at all. This is the view that I advocate.
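The emergent reading can be caricatured in a few lines of code. In the toy sketch below, every assumption (the fixed "activation pool", the rule for counting competing letter-groupings) is invented purely for illustration; the point is only that no capacity of seven is written down anywhere, and yet a breakdown in that neighborhood falls out of the combinatorics:

```python
import math

# Toy model: every ordered grouping of 2..n letters competes for a fixed
# pool of activation. The pool size is an arbitrary invented constant;
# nothing here stores a "short-term memory size".

ACTIVATION_POOL = 5000

def competing_groupings(n_letters: int) -> int:
    """Count ordered letter sequences of length 2..n, all vying at once."""
    return sum(math.perm(n_letters, k) for k in range(2, n_letters + 1))

for n in range(3, 11):
    demand = competing_groupings(n)
    status = "fine" if demand <= ACTIVATION_POOL else "swamped"
    print(f"{n} letters -> {demand:>8} groupings: {status}")
```

With these made-up numbers, overload sets in right around seven letters. Enlarge the pool and the breakdown point moves-but only ever as a consequence of the interactions, never as a stored parameter.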
A contrasting view might be to build a model of cognition in which you have an explicit structure called "short-term memory", containing about seven (or five, or nine) "slots" into which certain data structures can be fitted, and when it is full, well, then it is full and you have to wait until an empty slot opens up. This is one approach that has been followed by Newell and associates in work on production systems. The problem with this approach is that it takes something that clearly is a very complex consequence of underlying mechanisms and simply plugs it in as an explicit structure, bypassing the question of what those underlying mechanisms might be. It is difficult for me to believe that any model of cognition based on such a "bypass" could be an accurate model.
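The bypass view, by contrast, can be written down directly-which is exactly what makes it suspect. Here is a minimal caricature (mine, not actual production-system code):

```python
from collections import deque

# The "bypass": the magic number is simply installed as an explicit
# constant, rather than emerging from any underlying mechanism.
STM_SLOTS = 7

class ShortTermMemory:
    def __init__(self, slots: int = STM_SLOTS):
        self.buffer = deque(maxlen=slots)  # when full, the oldest item is evicted

    def attend(self, item):
        self.buffer.append(item)

stm = ShortTermMemory()
for letter in "raregarden":   # ten letters pushed into seven slots...
    stm.attend(letter)
print(list(stm.buffer))       # ...and the first three are simply gone
```

Here the number seven does sit in a local spot, available for twiddling: change the constant and the "capacity" changes instantly. That is just the property that, I am arguing, real short-term memory lacks.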
When a computer's operating system begins thrashing (i.e., bogging down in its timesharing performance) at around 35 users, do you go find the systems programmer and say, "Hey, go raise the thrashing-number in memory from 35 to 60, okay?" No, you don't. It wouldn't make any sense. This particular value of 35 is not stored in some local spot in the computer's memory where it can be easily accessed and modified. In that way, it is very different from, say, a student's grade in a university's administrative data base, or a letter in a word in an article you're writing on your home computer. That number 35 emerges dynamically from a host of strategic decisions made by the designers of the operating system, of the computer's hardware, and so on. It is not available for twiddling. There is no "thrashing-threshold dial" to crank on an operating system, unfortunately.
Why should there be a "short-term-memory-size" dial on an intelligence? Why should 7 be a magic number built into the system explicitly from the start? If the size of short-term memory really were explicitly stored in our genes, then surely it would take only a simple mutation to reset the "dial" at 8 or 9 or 50, so that intelligence would evolve at ever-increasing rates. I doubt that AI people think that this is even remotely close to the truth; and yet they sometimes act as if it made sense to assume it is a close approximation to the truth.
It is standard practice for AI people to bypass epiphenomena ("collective phenomena", if you prefer) by simply installing structures that mimic the superficial features of those epiphenomena. (Such mimics are the "shadows" of genuine cognitive acts, as John Searle calls them in his paper cited above.) The expectation-or at least the hope-is for tremendous performance to issue forth; yet the systems lack the complex underpinning necessary.
The anagrams problem is one that exemplifies mechanisms of thought that AI people have not explored. How do those letters swirl among one another, fluidly and tentatively making and breaking alliances? Glomming together, then coming apart, almost like little biological objects in a cell. AI people have not paid much attention to such problems as anagrams. Perhaps they would say that the problem is "already solved". After all, a virtuoso programmer has made a program print out all possible words that anagrammize into other words in English. Or perhaps they would point out that in principle you can do an "alphabetize" followed by a "hash" and thereby retrieve, from any given set of letters, all the words they anagrammize into. Well, this is all fine and dandy, but it is really beside the point. It is merely a show of brute force, and has nothing to contribute to our understanding of how we actually do anagrams ourselves, just as most chess programs have absolutely nothing to say about how chess masters play (as de Groot, and later, Simon and coworkers have pointed out).
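For concreteness, here is the alphabetize-and-hash trick in a few lines (a sketch, in which the six-word list stands in for a full English lexicon):

```python
from collections import defaultdict

# Sorting a word's letters ("alphabetizing") yields a key shared by all of
# its anagrams, so one dictionary ("hash") lookup retrieves every answer.

LEXICON = ["tinkle", "inlet", "listen", "silent", "enlist", "tinsel"]

anagram_index = defaultdict(list)
for word in LEXICON:
    anagram_index["".join(sorted(word))].append(word)

def anagrams_of(letters: str) -> list[str]:
    return anagram_index.get("".join(sorted(letters)), [])

print(anagrams_of("telkin"))  # ['tinkle'] -- the Jumble, solved by brute force
print(anagrams_of("netsil"))  # ['listen', 'silent', 'enlist', 'tinsel']
```

It answers instantly, and that is exactly the problem: the lookup says nothing whatsoever about the tentative regrouping and reshuffling that precede "tinkle" popping out for a person.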
Is the domain of anagrams simply a trivial, silly, "toy" domain? Or is it serious? I maintain that it is a far purer, far more interesting domain than many of the complex real-world domains of the expert systems, precisely because it is so playful, so unconscious, so enjoyable, for people. It is obviously more related to creativity and spontaneity than it is to logical derivations, but that does not make it-or the mode of thinking that it represents-any less worthy of attention. In fact, because it epitomizes the unconscious mode of thought, I think it more worthy of attention.
In short, it seems to me that something fundamental is missing in the orthodox AI "information-processing" model of cognition, and that is some sort of substrate from which intelligence emerges as an epiphenomenon. Most AI people do not want to tackle that kind of underpinning work. Could it be that they really believe that machines already can think, already have concepts, already can do analogies? It seems that a large camp of AI people really do believe these things.
Not Cognition, But Subcognition, Is Computational
Such beliefs arise, in my opinion, from a confusion of levels, exemplified by the title of Barr's paper: "Cognition as Computation". Am I really computing when I think? Admittedly, my neurons may be performing sums in an analog way, but does this pseudo-arithmetical hardware mean that the epiphenomena themselves are also doing arithmetic, or should be-or even can be-described in conventional computer-science terminology? Does the fact that taxis stop at red lights mean that traffic jams stop at red lights? One should not confuse the properties of objects with the properties of statistical ensembles of those objects. In this analogy, traffic jams play the role of thoughts and taxis play the role of neurons or neuron-firings. It is not meant to be a deep analogy, only one that emphasizes that what you see at the top level need not have anything to do with the underlying swarm of activities bringing it into existence. In particular, something can be computational at one level, but not at another level.
Yet many AI people, despite considerable sophistication in thinking about a given system at different levels, still seem to miss this. Most AI work goes into efforts to build rational thought ("cognition") out of smaller rational thoughts (elementary steps of deduction, for instance, or elementary moves in a search tree). It comes down to thinking that what we see at the top level of our minds-our ability to think-comes out of rational "information-processing" activity, with no deeper levels below that.
Many interesting ideas, in fact, have been inspired by this hope. I find much of the work in AI to be fascinating and provocative, yet somehow I feel dissatisfied with the overall trend. For instance, there are some people who believe that the ultimate solution to AI lies in getting better and better theorem-proving mechanisms in some predicate calculus. They have developed extremely efficient and novel ways of thinking about logic. Some people-Simon and Newell particularly-have argued that the ultimate solution lies in getting more and more efficient ways of searching a vast space of possibilities. (They refer to "selective heuristic search" as the key mechanism of intelligence.) Again, many interesting discoveries have come out of this.
Then there are others who think that the key to thought involves making some complex language in which pattern matching or backtracking or inheritance or planning or reflective logic is easily carried out. Now admittedly, such systems, when developed, are good for solving a large class of problems, exemplified by such AI chestnuts as the missionaries-and-cannibals problem, cryptarithmetic problems, retrograde chess problems, and many other specialized sorts of basically logical analysis. However, these kinds of techniques of building small logical components up to make large logical structures have not proven good for such things as recognizing your mother, or for drawing the alphabet in a novel and pleasing way.
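To make the contrast vivid: once the world has been precategorized, the whole missionaries-and-cannibals problem fits in a few dozen lines. In this minimal sketch (the representation and names are my own), a state is just a triple and "solving" is blind breadth-first search:

```python
from collections import deque

# State: (missionaries on left bank, cannibals on left bank, boat on left?).
START, GOAL = (3, 3, True), (0, 0, False)

def safe(m, c):
    # Missionaries are never outnumbered on either bank.
    return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

def successors(state):
    m, c, boat = state
    for dm, dc in [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]:  # boatloads
        nm = m - dm if boat else m + dm
        nc = c - dc if boat else c + dc
        if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc):
            yield (nm, nc, not boat)

def solve():
    frontier, seen = deque([[START]]), {START}
    while frontier:
        path = frontier.popleft()
        if path[-1] == GOAL:
            return path
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])

for state in solve():
    print(state)  # the classic eleven-crossing solution, found blindly
```

The world comes to the program fully specified in advance, the search succeeds perfectly, and that success sheds no light on recognizing your mother.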
One group of AI people who seem to have a different attitude consists of those who are working on problems of perception and recognition. There, the idea of coordinating many parallel processes is important, as is the idea that pieces of evidence can add up in a self-reinforcing way, so as to bring about the locking-in of a hypothesis that no one of the pieces of evidence could on its own justify. It is not easy to describe the flavor of this kind of program architecture without going into multiple technical details. However, it is very different in flavor from that of programs operating in a world where everything comes clean and precategorized-where everything is specified in advance: "There are three missionaries and three cannibals and one boat and one river and . . ." which is immediately turned into a predicate-calculus statement or a frame representation, ready to be manipulated by an "inference engine". The missing link seems to be the one between perception and cognition, which I would rephrase as the link between subcognition and cognition, that gap between the sub-100-millisecond world and the super-100-millisecond world.
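Though the full flavor needs technical detail, the self-reinforcing core can at least be gestured at. In this toy (the hypotheses, the clues, and the update rule are all invented for illustration), no single clue justifies the winner; it locks in only because partial support keeps feeding back on itself:

```python
# Three hypotheses share a pool of weak, ambiguous clues. Each pass, every
# clue nudges the hypotheses it is compatible with, and scores renormalize,
# so support compounds until one hypothesis dominates.

HYPOTHESES = {"tinkle": 0.34, "tinsel": 0.33, "tingle": 0.33}  # near-even priors
CLUES = [
    {"tinkle", "tingle"},   # each clue is compatible with two hypotheses
    {"tinkle", "tinsel"},
    {"tinkle", "tingle"},
]

scores = dict(HYPOTHESES)
for _ in range(20):  # relaxation passes
    for compatible in CLUES:
        for h in compatible:
            scores[h] *= 1.05          # self-reinforcing nudge
    total = sum(scores.values())
    scores = {h: s / total for h, s in scores.items()}

print(scores, "->", max(scores, key=scores.get))  # "tinkle" locks in
```

No one clue could justify "tinkle" on its own; the hypothesis wins because three weak, overlapping nudges compound across the parallel passes.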
Earlier, I mentioned the brain and referred to the "neural substrate" of cognition. Although I am not pressing for a neurophysiological approach to AI, I am unlike many AI people in that I believe that any AI model eventually has to converge to brainlike hardware, or at least to an architecture that at some level of abstraction is "isomorphic" to brain architecture (also at some level of abstraction). This may sound empty, since that level could be anywhere, but I believe that the level at which the isomorphism must apply will turn out to be considerably lower than (I think) most AI people believe. This disagreement is intimately connected to the question of whether cognition should or should not be described as "computation".
Passive Symbols and Formal Rules
One way to explore this disagreement is to look at some of the ways that Simon and Newell express themselves about "symbols".
At the root of intelligence are symbols, with their denotative power and their susceptibility to manipulation. And symbols can be manufactured of almost anything that can be arranged and patterned and combined. Intelligence is mind implemented by any patternable kind of matter.
From this quotation and others, one can see that to Simon and Newell, a symbol seems to be any token, any character inside a computer that has an ASCII code (a standard but arbitrarily assigned sequence of seven bits). To me, by contrast, "symbol" connotes something with representational power. To them (if I am not mistaken), it would be fine to call a bit (inside a computer) or a neuron-firing a "symbol". However, I cannot feel comfortable with that usage of the term.
To me, the crux of the word "symbol" is its connection with the verb "to symbolize", which means "to denote", "to represent", "to stand for", and so on. Now, in the quote above, Simon refers to the "denotative power" of symbols-yet elsewhere in his paper, Barr quotes Simon as saying that thought is "the manipulation of formal tokens". It is not clear to me which side of the fence Simon and Newell really are on.
It takes an immense amount of richness for something to represent something else. The letter 'I' does not in and of itself stand for the person I am, or for the concept of selfhood. That quality comes to it from the way that the word behaves in the totality of the English language. It comes from a massively complex set of usages and patterns and regularities, ones that are regular enough for babies to be able to detect so that they too eventually come to say 'I' to talk about themselves.
Formal tokens such as 'I' or "hamburger" are in themselves empty. They do not denote. Nor can they be made to denote in the full, rich, intuitive sense of the term by having them obey some rules. You can't simply push around some Pnames of Lisp atoms according to complex rules and hope to come out with genuine thought or understanding. (This, by the way, is probably a charitable way to interpret John Searle's point in his above-mentioned paper-namely, as a rebellion against claims that programs that can manipulate tokens such as "John", "ate", "a", "hamburger" actually have understanding. Manipulation of empty tokens is not enough to create understanding-although it is enough to imbue them with meaning in a limited sense of the term, as I stress in my book Gödel, Escher, Bach-particularly in Chapters II through VI.)
Active Symbols and the Ant Colony Metaphor
So what is enough? What am I advocating? What do I mean by "symbol"? I gave an exposition of my concept of active symbols in Chapters XI and XII of Gödel, Escher, Bach. However, the notion was first presented in the dialogue "Prelude ... Ant Fugue" in that book, which revolved about a hypothetical conscious ant colony. The purpose of the discussion was not to speculate about whether ant colonies are conscious or not, but to set up an extended metaphor for brain activity-a framework in which to discuss the relationship between "holistic", or collective, phenomena, and the microscopic events that make them up.
One of the ideas that inspired the dialogue has been stated by E. O. Wilson in his book The Insect Societies this way: "Mass communication is defined as the transfer, among groups, of information that a single individual could not pass to another." One has to imagine teams of ants cooperating on tasks, and information passing from team to team that no ant is aware of (if ants indeed are "aware" of information at all-but that is another question). One can carry this up a few levels and imagine hyperhyperteams carrying and passing information that no hyperteam, not to mention team or solitary ant, ever dreamt of.
I feel it is critical to focus on collective phenomena, particularly on the idea that some information or knowledge or ideas can exist at the level of collective activities, while being totally absent at the lowest level. In fact, one can even go so far as to say that no information exists at that lowest level. It is hardly an amazing revelation, when transported back to the brain: namely, that no ideas are flowing in those neurotransmitters that spark back and forth between neurons. Yet such a simple notion undermines the idea that thought and "symbol manipulation" are the same thing, if by "symbol" one means a formal token such as a bit or a letter or a Lisp Pname.
What is the difference? Why couldn't symbol manipulation-in the sense that I believe Simon and Newell and many writers on AI mean it-accomplish the same thing? The crux of the matter is that these people see symbols as lifeless, dead, passive objects-things to be manipulated by some overlying program. I see symbols-representational structures in the brain (or perhaps someday in a computer)-as active, like the imaginary hyperhyperteams in the ant colony. That is the level at which denotation takes place, not at the level of the single ant. The single ant has no right to be called "symbolic", because its actions stand for nothing. (Of course, in a real ant colony, we have no reason to believe that teams at any level genuinely stand for objects outside the colony (or inside it, for that matter)-but the ant-colony metaphor is only a thinly disguised way of making discussion of the brain more vivid.)

