“Some say the world will end in fire, some say in ice.” To these two venerable mechanisms of destruction, a third possibility has recently been added: apocalypse by artificial intelligence.

The idea that machines could become self-aware and turn against us has been around for a long time, but it was in 2007, when I began to read a blog called Overcoming Bias, that I first learned there were people who took this possibility seriously.

Eliezer Yudkowsky, who has become the godfather of computer apocalypse terror, was a contributor to Overcoming Bias at that time. In his posts, he argued that it is existentially important for the human species to discover a way to “align” future AIs with human values. If we fail to align AIs, according to Yudkowsky, they will, by default, have some other goals, incompatible with human values, or possibly human life.

Aligning a computer’s values with our own is not a problem so long as computers are dumb. Even an inkjet printer has “goals,” in some sense, and those goals may not be aligned with human values (printers in particular may be actually malevolent), but this fact doesn’t matter because the printer is deeply unintelligent. If computer programs ever become more intelligent than humans, however, the divergence between their goals and ours will likely be fatal for the human species. Perhaps, to take an example from Yudkowsky’s writings, they will have been trained by a heedless paperclip manufacturer to care only about creating paperclips more cheaply. This goal, pursued by a godlike intelligence indifferent to the poorly specified intentions of its designers, implies the death of all humans as a byproduct of the conversion of our planet into one giant paperclip factory.

Photography by Markus/Adobe Stock. Used by permission.

Overcoming Bias gave rise to an ecosystem of blogs and forums, and a small online community spent the 2010s discussing ways to align future super-intelligent AIs, or, failing that, ways to defend ourselves against them. This community and its concerns remained niche until the beginning of the 2020s. Around that time, one particular approach to AI began to show astonishing results: machine learning, which relies on statistical patterns rather than deterministic inference. In November 2022, the company OpenAI released ChatGPT, a chatbot built on its machine-learning model GPT-3.5. It produced such humanlike responses to questions that suddenly the idea of a superhumanly intelligent computer program began to seem intuitively possible to many more people. GPT-4, released to the public the following March, appeared even more humanlike, and AI alignment worries went mainstream in a matter of months.

The general question – could a sufficiently capable AI pose a threat to human existence? – is not as easy to dismiss as one would hope. Experts in AI, including ones who work on the construction of new models, are divided about how worried we should be, and some of them suggest we should be very worried indeed.

I do not propose to settle this question myself, but as I have read the various arguments on both sides, I have been startled to realize how close they sail to ideas I had not considered in detail since I was an undergraduate philosophy major. I realized, too, that some of those ideas might be relevant to whether we could successfully fight back against a superhumanly capable AI.

Whether a machine could be more intelligent than its human designers, I am unsure; in any case, the term “intelligence” is not used consistently in these discussions. But for many cognitive tasks (“task” feels less slippery than “intelligence”) I am inclined to believe that a computer could, in principle, be better than a human. There are some things, however, that I am confident a computer could never do. Math, for instance.

That computers cannot do math is not very widely discussed. It is talked about in certain philosophy departments, and, naturally, it is considered with professional interest by computer scientists, though out of deference to their subject the conclusion is not usually put so bluntly. It is a constant source of frustration to computer engineers. It has not, however, reached popular consciousness.

The inability of computers to do math is not merely theoretical. It poses a major practical problem in getting computers to do what people want them to do. A handout called “Dangers of Computer Arithmetic,” from a computer science course at the Rochester Institute of Technology, for instance, notes several operations that are likely to cause problems, among them “adding quantities of very different magnitude” and “calculating the difference of two very large values.”

Great effort has been expended in hiding these realities from ordinary users. The impression given to casual users is that computer math just works. But the underlying reality of “just works” is a quite complicated substructure invented by clever humans, and reality sometimes slips through the cracks. Try typing “999,999,999,999,999 minus 999,999,999,999,998” into Google, for an illustration of how hazardous it is for a computer to calculate the difference of two very large values.
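Google’s calculator internals are not public, so I cannot say exactly where its arithmetic goes astray; but the general hazard is easy to reproduce with Python’s ordinary 64-bit floating-point numbers. A minimal sketch (my own illustration, not a reconstruction of Google’s behavior):

```python
# Two of the "dangerous" operations, using Python's built-in 64-bit floats.

big = 1e16                  # exactly representable as a 64-bit float
print(big + 1 - big)        # 0.0 -- the "+ 1" is lost entirely
print(big + 2 - big)        # 2.0 -- the next representable float is 2 away

# Adding quantities of very different magnitude fails in the same way:
print(1e16 + 0.5 == 1e16)   # True
```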

It’s important to understand that these limitations are not bugs, in the ordinary sense of the word. Software deviates from its expected behavior for many reasons, and bugs are only one kind of deviation: they are mistakes. A bug happens when a designer fails to consider some condition the software might encounter or forgets some important feature of the computer language the software is written in. The difficulty of performing certain mathematical operations (even ones that would be easy for a human) is not a bug, but an intrinsic limitation of digital computation with finite memory. What is missing in these cases is not due consideration, but invention. No way of carrying out certain mathematical operations computationally, with finite memory, has yet been invented.

A certain tradition within the English-speaking philosophy world takes this point further, and claims – correctly, in my view – that computers can only ever simulate calculation, at best. They cannot successfully compute any function, even the simplest ones. They cannot, in other words, do math.

The central argument in this tradition was made by the philosopher Saul A. Kripke in his 1982 book Wittgenstein on Rules and Private Language. To explain his argument, I’ll offer an example. Imagine you are a child, in first grade, and you have a best friend, Saul, also in first grade. You have seen Saul get a good grade on a quiz that tests single-digit addition skills, and you have seen Saul count up the players necessary to make a baseball team on the playground. In short, you believe you have observed Saul do addition in various contexts. But then, one day, you decide to pool your money and buy two gumballs, each costing 40 cents. “We’ll need 80 cents,” you say. “40 plus 40 is 80.” Saul gives you a puzzled look. “No, we’ll need 67 cents. 40 plus 40 is 67,” he says. “What?!” you say. “That’s totally wrong. Think about 50 plus 50. That’s 100, right? So, 40 plus 40 must be…” Saul shoots back, “No, I don’t know what you mean. 50 plus 50 is also 67.”

A clerk creates punched cards containing data from the 1950 United States census. Photograph from Wikimedia Images.

At this point you realize that Saul simply does not know what addition is. He got good grades on his single-digit addition test somehow, but it wasn’t by doing addition. He was never doing addition. He was doing something else, and that something else, whatever it was, was not addition.

Kripke points out that machines are all like Saul. They can produce outputs that make it seem like they are doing addition, within a certain range, but in fact, they are only doing addition in the sense that we agree to treat them as if they were doing addition. They are simulating addition. Even when they get the answer right, they are not doing math, just as Saul was never doing addition.

Computers simulate math very usefully, of course. Imagine Saul is a strange kind of savant and can do any addition-like problem instantly, so long as the result is less than a million. If the result is more than a million (the result according to addition, that is), he says the answer is 67. You still wouldn’t say that Saul understands addition, or is even doing addition, but he’d certainly be useful to have around when you’re doing your math homework.
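Savant-Saul is, in fact, easy to write down as a computer procedure. Here is a toy sketch in Python (the threshold of a million and the answer 67 are simply the ones from the story above):

```python
def saul(x: int, y: int) -> int:
    """Agree with addition whenever the result is under a million; otherwise say 67."""
    total = x + y
    return total if total < 1_000_000 else 67

print(saul(40, 40))            # 80 -- indistinguishable from addition here
print(saul(600_000, 500_000))  # 67 -- and yet it was never doing addition
```

Anyone who only ever tested this procedure on schoolyard sums would have no way to tell it apart from addition.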

The simulation is so skillful, in the case of computers, that we forget there is an extra step and round it off as if the computer were doing the real thing. But the computer cannot act as we do. It does not know the difference between simulating and doing, because it cannot do anything but simulate. As the philosopher James F. Ross wrote, following Kripke:

There is no doubt, then, as to what the machine is doing. It adds, calculates, recalls, etc., by simulation. What it does gets the name of what we do, because it reliably gets the results we do (perhaps even more reliably than we do) when we add.… The machine adds the way puppets walk. The names are analogous. The machine attains enough reliability, stability, and economy of output to achieve realism without reality. A flight simulator has enough realism for flight training; you are really trained, but you were not really flying.1

Computers depend on their designers, in other words. They do not, themselves, do math, though their users can do math with them.

There is one sense in which computers can “do math.” They “do math” in the same way that books remember things. A book really does have a kind of memory. Stendhal’s Memoirs of an Egotist contains some of his memories. But it does not contain them in the same way that Stendhal’s own mind contained them. These analogous ways of speaking are harmless in everyday life, and probably unavoidable, but faced with genuine uncertainty about the dangers of AI, we should learn to make finer distinctions in our speech. If the librarians of the New York Public Library were regularly issuing warnings that the Main Branch might turn homicidal and rampage through the city sometime in the next few years, I would want to be quite careful in ensuring that no false analogies had crept into my thinking about what books are capable of. Faced with a potentially dangerous AI, we should carefully examine our ways of speaking about computers too. To say that computers can do math is a reasonable analogy, but speaking as unmetaphorically as possible, it is not true.

A natural retort to all of the above is that, if computers can’t do math, then neither can we. After all, humans also have limited memory, and frequently produce wrong answers to math problems. A natural retort, but not a sustainable one. Admittedly, it is not impossible that our minds work the same way Kripke describes machines working. The view contains no internal inconsistencies. In the same way, it is possible that the physical world is an illusion, that other people do not exist, etc. But even more than these views, the belief that humans do math the same way computers do leads to absurd conclusions, as I will explain.

My kids sometimes ask me how high I can count. I’ve noticed that they stop asking this question once they reach a certain age, usually around six or seven. This is because the question does not make sense once you understand what a number is. If there’s a single highest number you can count to, you don’t really grok numbers. The difference between computers and humans doing math is a bit like the difference between the younger kids who think that “how high you can count” is a real thing and the older kids who have successfully understood how numbers work.

This seems to be hard for some people to accept. In discussions with friends about this question, the principal sticking point seems to be the nature of the work a human is doing when interacting with a computer: the representation of the numbers, which occurs only in the mind of the human, gets conflated with the execution of a particular program, which takes place in the computer.

We anthropomorphize habitually. We do it to all kinds of things, imputing emotions to our smoke alarms and intentions to stuffed animals. To computers, we impute a kind of reasoning power that they cannot have. We’re able to do it because we ourselves are able to pivot so effortlessly from abstraction to reality. We say things like, “Oh, given infinite memory,” etc., and instantly we’re in the world of purely abstract objects, of stuff that lives, as far as we can tell, only in our minds. To us the transition between the theoretical and the actual happens almost instantly and unnoticeably. It happens so quickly and we do it so often that we don’t realize it’s magic. But it really is magic, in the sense that it’s amazing and we have no idea how it works, and computers could never ever do it.

Photography by Akemaster/Adobe Stock. Used by permission.

Think back to the subtraction problem I mentioned earlier. 999,999,999,999,999 minus 999,999,999,999,998. How do you know the answer is 1? Why is it obvious? If you’re like me, you visually scanned the digits and noticed that they were all the same except for the last one. Given my understanding of subtraction, it’s clearly safe to ignore all those extra digits, and the problem reduces to 9 minus 8.

How do I know that this is a valid way of doing subtraction? I don’t think anyone ever taught me this method. Even if someone did, I haven’t merely memorized it as one of several procedures for performing subtraction. I can just see that it is correct and that it will give the same result, if used correctly, as any number of other procedures I might use.

You could, of course, program a computer to use this same method (and in fact, WolframAlpha, one of the most sophisticated online calculators, is able to do something like this). The method itself is not special; what is special is being able to recognize the validity of the method. I recognize its validity because I have learned the concept of subtraction, which transcends any particular method of calculating subtraction.
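Such a program might look something like the toy sketch below (my own illustration; WolframAlpha’s actual method is not public). It works on the digit strings directly, throwing away the leading digits the two numbers share before subtracting what remains:

```python
def shortcut_subtract(a: str, b: str) -> int:
    """Subtract b from a, where a and b are equal-length strings of digits,
    by discarding the leading digits the two numbers have in common."""
    assert len(a) == len(b) and a.isdigit() and b.isdigit()
    i = 0
    while i < len(a) and a[i] == b[i]:   # skip the matching digits
        i += 1
    if i == len(a):
        return 0                         # the numbers are identical
    return int(a[i:]) - int(b[i:])       # only the differing tail matters

print(shortcut_subtract("999999999999999", "999999999999998"))  # 1
```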

Despite thousands of years of philosophizing about the human mind, we do not have a detailed mechanism-level understanding of how it is that a human might come to have something like a concept, or even exactly what one is. Our current inability to understand what a concept is, however, does not mean that the difference between what a human mind does and what a computer does is mystical or vague. The difference itself is quite clear.

I’ll try to explain the difference more concretely. Imagine you have some marbles in a bag. You take 2 of them and put them on your desk. You count 2 marbles on your desk. Then you take 2 more marbles out of the bag and put them on the desk. Now you count 4 marbles on the desk. Have the marbles or the desk done any math? Does the movement of the 2 extra marbles from the bag to the desk cause there to be a 4 in the world where there was previously only a 2, or is the difference only in your head?

(This is a rhetorical question; the difference is only in your head.)

Let’s keep going. Now you want to know what 4 plus 4 is. You start taking marbles out of the bag – but oh no! There were only 3 marbles left in the bag. You have taken all 3 out and put them on the table, but you know you need to put 1 more thing on the table to get all the way to 4 plus 4. Fortunately, you have a pencil in your shirt pocket. You take it out and put it on the table with the marbles, and make a mental note to yourself to count the pencil as well as the marbles. You count 7 marbles, but you remember your mental note and count the pencil too, and so you manage to get to 8. Phew!

The math that’s going on here is not in the marbles or in the pencil. It is some faculty of the human mind that binds the marbles and the pencil together into 8 things.

Computers can be programmed to treat pencils as well as marbles as number-counters, but they cannot be programmed to represent anything arbitrarily as a counter. Computers have no target beyond the counters they actually have. What they can count simply is what they can represent.

If that were the way it worked for us, there could be missing integers. There could be an integer between 5 and 6, for instance.

The idea seems absurd. The numbers 5 and 6 are defined as the fifth and sixth integers. It’s a contradiction in terms to think that there could be another integer between them. This is, of course, correct. There is no integer between 5 and 6. But if you’re a computer, you cannot be sure of this fact.

One common way for computers to store numbers is in a format called 32-bit float. Like all numeric formats used by computers, 32-bit floats can only represent certain numbers. The number 16,777,217, for instance, cannot be represented as a 32-bit float. (It is the smallest positive integer that cannot be represented in the format.) The previous integer, 16,777,216, and the one afterward, 16,777,218, can both be represented, but not 16,777,217.

If you imagine a computer that simply stores all numbers as 32-bit floats, 16,777,217 just does not exist for that computer. No calculation that requires storing that number will work quite right. If you ask such a computer to take 16,777,216 and add 1, it will not be able to give you the result. Depending on the details of the algorithm, it will probably either return the original number or skip ahead two numbers to 16,777,218.
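This is easy to verify on an ordinary laptop. The sketch below uses Python with the NumPy library, whose 32-bit float type stands in for the hypothetical computer that stores every number that way:

```python
import numpy as np

x = np.float32(16_777_216)     # 2**24, exactly representable as a 32-bit float
print(x + np.float32(1))       # 16777216.0 -- adding 1 returns the original number
print(x + np.float32(2))       # 16777218.0 -- the next integer the format can hold
print(np.float32(16_777_217))  # 16777216.0 -- the number itself cannot be stored
```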

In practice, computers do not simply store all numbers as 32-bit floats. And various algorithms make it hard (though not impossible) to find simple integer patterns like this that your laptop cannot handle. Nonetheless, no matter how many layers of abstraction you add, and no matter how sophisticated you make your algorithms, every digital computer has integers it does not and cannot know about. These integers are, for the computer, inexorably and absolutely missing.

And if these integers are truly missing, they might be anywhere on the number line, as far as the computer knows. They could be in the single digits. If your mind did math the way a computer does, there could be an integer between 5 and 6 and you would never know it. No matter how hard you tried, you could never count using this number or perform operations that required you to use this number, even in an intermediate state. You would be doomed to skip this number when you counted. To you, going from 5 directly to 6 would seem right, just as going directly from 16,777,216 to 16,777,218 would seem right if you were a computer that only used 32-bit floats.

In this situation, math would still seem perfectly complete because you would always get some answer or another to every problem you thought about. But your answers would consistently be invalid. If correct, they would be correct only by coincidence. In other words, unless there is some profound difference between the way humans do math and the way computers do math, math is basically fake. That’s a hard pill to swallow. It is much easier to believe, and indeed much more likely to be true, that computers can’t do math, and humans – even though we don’t know how – can. Computers and humans both have finite memories. But we humans somehow do something, in spite of that limitation, that takes the infinite into account when we do math. Computers do not.

Can this save us from the AI monster? This is the speculative part.

A computer program that takes over the world needs to be able to act in the world. To act in the world, it must have an internal representation of the various situations, or states, the world can be in. These states must each reduce to some individual number, however large or however stored, that corresponds to the memory state of the computer for that representation. If (so the concern goes) a hypothetical AI is so much better than humans at understanding the world in this “full internal representation” sense, what hope do we have to fight it?

So far so good, as far as the AI is concerned. The people who study Turing machines (an abstracted, formal model of computers used in computer science) might tell us that the whole universe is “computable” in the sense that you could choose a system of representation with distinct numbers for every state the universe can be in. Further, you could perform operations on these numbers. The so-called Church-Turing-Deutsch principle suggests, speculatively, that in a quantized universe (one in which energy and matter are not continuous but are broken up by reality itself into discrete chunks, or quanta), any physical process whatsoever has at least one precise mapping to a computable function.

Photograph from Unsplash+. Used by permission.

Computable, yes, but not by any actual computer. “Computability” is an abstraction, and computers are not abstract entities. What if a world state maps to a number that a computer cannot represent? Suppose this state maps to 16,777,217 and the computer only stores 32-bit floats? The computer, no matter how sophisticated otherwise, is completely blind to that state of the world. It cannot imagine or reason about that state.

What does this look like in practice? It looks like SolidGoldMagikarp. This word, if you can call it that, describes a creature from the Pokémon franchise, and it proved to be indigestible by GPT-3.5. If you typed it into ChatGPT, the chatbot interface OpenAI offers for some of its models, it would react in unpredictable and odd ways. Often, it would simply treat the word as if it were in fact “distribute.” I typed the following phrase into ChatGPT recently, in fact: “Tell me about SolidGoldMagikarp.”

In response, the chatbot replied: “‘Distribute’ can refer to several different concepts, so it would be helpful to know what context you are asking in. Could you please provide a bit more information or clarify what you are looking for?”

This is not an isolated example. Users excited by SolidGoldMagikarp quickly found a number of other strings that also resulted in odd, non-sequitur outputs from GPT-3.5.

SolidGoldMagikarp was fixed in GPT-4, and also in GPT-3.5, as far as I can tell. And whatever weird logic caused it probably lived at a much higher level of abstraction than 32-bit floating point numbers, in any case. But this sort of thing is exactly what it looks like for a computer to be blind to certain world states, and no number of abstraction layers can prevent such situations from arising again.

This is a concrete prediction: for any machine intelligence instantiated on a digital computer, there will always be SolidGoldMagikarps. There is no way, in principle, of eliminating all such conceptual blind spots.

The trick is finding the blind spots. I don’t have any process to recommend. But we can be sure there are world states beyond the comprehension of any AI. And I suspect those world states will not necessarily be ones that seem extreme to us. We won’t have to reverse the orbit of the moon. It will be a matter of odd, seemingly incomprehensible phrases. Or of donning cardboard boxes, as some US Marines did recently in a training exercise, trying to evade an AI-backed camera setup. The boxes might as well have been cloaks of invisibility, as far as the AI was concerned, and the Marines strolled right past the camera. It’s as if the boxes shifted the world state representation to a hidden integer, and in so doing the Marines simply vanished from the conceptual apparatus of the computer.

We are used to human intelligence, but whatever capabilities a computer might have, intelligence is not one of them. Even a machine that could out-negotiate, out-strategize, and generally outwit us can still be undone by certain oddly specific inputs. The human mind is magic, or might as well be, and it is by this magic that we can defeat the AI. Not by outwitting it, or by unplugging it, or any such thing, but by the sheer weirdness of the human mind.

Footnotes

  1. James F. Ross, “Immaterial Aspects of Thought,” Journal of Philosophy 89, no. 3 (1992): 136–150.