Much of the industry buzz around AI tends to focus on two things: the capabilities of the software models, and the GPUs that currently run them. Caught up in the whirlwind of news, it’s easy to forget that GPUs, Graphics Processing Units, were never intended for AI computing tasks; they were designed largely for CGI and video games. It just so happened that GPUs’ knack for parallel processing and matrix operations was exactly what was needed to bring the highly successful neural network paradigm of AI models to life. What AI needs to truly thrive, however, are built-to-purpose specialized chips.
Much of the time, discussions about what AI can and can’t do are framed in terms of the software, when without realizing it we’re often actually talking about hardware constraints. Much of the software we develop is unwittingly shaped around what the underlying hardware accepts and permits. This encourages an unfortunate, suboptimal bias in the direction of development, as if we keep climbing the same hill to get the best view of the landscape because we can’t see the mountain on the other side. When people talk about how real-time learning is a major problem for language models, a big part of the problem is the disconnect between the models and the hardware. We have an easy time learning in real time because the structure of our brains supports that dynamic. Hardware with a similar structure could benefit language models that need to continuously update and adapt to new data without full retraining, mimicking more closely how humans learn and adapt language over time. On GPUs, by contrast, training models is extremely expensive and data intensive.
Because the industry mainly relies on repurposed GPUs to do most of its AI computing, we’re running into issues with energy efficiency, latency, and memory capacity that threaten to curtail future growth.
Energy efficiency: ChatGPT is said to consume as much electricity as 17,000 average American households every day. All else being equal, this consumption can only be expected to increase across the industry, potentially reaching unsustainable levels. Brains, as it so happens, are remarkably energy and heat efficient; presumably brain-like chips would be too.
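To put that figure in rough perspective, here is a quick back-of-the-envelope conversion. The per-household consumption is the approximate U.S. average, and the 17,000-household comparison is the widely reported estimate cited above, not a measurement of my own.

```python
# Rough scale implied by the "17,000 households" figure.
# An average US household uses ~10,500 kWh/year, i.e. about 29 kWh/day.
households = 17_000
kwh_per_household_per_day = 29

daily_mwh = households * kwh_per_household_per_day / 1_000
print(f"~{daily_mwh:,.0f} MWh per day")  # on the order of 500 MWh every day
```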
Latency: Today’s large language models are autoregressive, generating output one token at a time, and so they struggle with latency. In autoregressive models, latency scales at least linearly with the length of the input and output sequences, which essentially means the largest models start to chug when you feed them large contexts. Counterintuitively, where we would hope that one of the benefits of larger models is the ability to handle larger contexts more efficiently, the opposite is true. The bigger the model and the input, the more “graph traversal” the processing has to do, increasing the latency as the data propagates from layer to layer. Hardware that is isomorphic to the software, where all the logic processing, memory management, and control are integrated, could greatly reduce this traversal cost.
Network delays must also be factored in on top of this native processing latency: it takes time for data to pass over the network to and from the remote servers where the processing happens. Our brains are only derided as “slow” compared to computers because they are electrochemical and meat-based; beyond that, they are remarkably efficient at organizing information. A neuromorphic chip, furthermore, could execute AI computations locally, without the need to ping remote servers across a network, allowing for immediate on-site processing.
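To make the per-token cost described above concrete, here is a minimal sketch of the autoregressive loop. The forward function is a stand-in, not a real model; the point is the loop structure, in which every generated token requires its own full pass over an ever-growing context.

```python
import numpy as np

def forward(context: np.ndarray) -> int:
    """Stand-in for one transformer step: does work proportional to the
    context it attends over, then emits a single fake next-token id."""
    attention_work = context[:, None] * context[None, :]  # toy O(n^2) "attention"
    return int(attention_work.sum()) % 50_000

def generate(prompt: list[int], new_tokens: int) -> list[int]:
    tokens = list(prompt)
    for _ in range(new_tokens):
        # One full forward pass per generated token, re-reading everything so far.
        tokens.append(forward(np.array(tokens)))
    return tokens

# Doubling new_tokens roughly doubles the number of passes, and each pass
# gets more expensive as the context grows: latency compounds with length.
print(generate([101, 7, 42], new_tokens=10))
```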
Memory bottlenecks: While the speed at which GPUs can compute has increased rapidly in the past 15 years, the speed at which they can actually move data (read and write to VRAM) has not kept pace, resulting in a “memory wall” that impedes progress. As Dr Sanjay Basu, Senior Director of Gen AI/GPU Cloud Engineering at Oracle writes, “Looking ahead, the continued scaling of AI models requires codesigning memory and interconnect fabrics with specialized software scheduling. To deliver new levels of intelligence, we must innovatively address the memory wall through cross-stack architectural advances from circuits to algorithms.” In other words, we need more brain-like chips for our brain-like software.
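A back-of-the-envelope sketch shows how lopsided the arithmetic is. The hardware numbers below are illustrative assumptions, roughly in the range of a current datacenter GPU, and the model is an assumed 7-billion-parameter network served in half precision.

```python
params = 7e9                 # assumed model size (parameters)
bytes_per_param = 2          # FP16 weights
weight_bytes = params * bytes_per_param       # ~14 GB streamed per generated token

mem_bandwidth = 3.0e12       # assumed VRAM bandwidth, bytes/second
peak_compute = 1.0e15        # assumed peak half-precision FLOP/s

flops_per_token = 2 * params                  # ~one multiply-add per weight
t_memory = weight_bytes / mem_bandwidth       # time just to move the weights
t_compute = flops_per_token / peak_compute    # time to do the actual math

print(f"memory-bound time per token:  {t_memory * 1e3:.2f} ms")
print(f"compute-bound time per token: {t_compute * 1e3:.3f} ms")
# Moving the data takes hundreds of times longer than computing with it:
# the arithmetic units sit idle while they wait, which is the memory wall.
```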
Comparing the Relative Suitability of Chip Classes for AI Computing
The evolution of computing hardware can generally be divided into three categories, each progressively better suited to AI computation.
CPUs: The conventional von Neumann architecture. CPUs process inputs sequentially, one after the other. They consist of a control unit, which interprets inputs and queues up operations; an arithmetic unit, which performs logic on those inputs; and a memory unit, which persists and shares data between the control and arithmetic units. The workhorse of everyday computing, CPUs still have an important role to play for operating systems, and should continue to be the backbone for most conventional computing tasks. Unfortunately, CPUs are hopelessly inefficient for AI. Brain-like computing requires massively parallel operations rather than step-by-step sequential processing. Notably, in CPUs, control, logic, and memory are distinct functions assigned to physically separate districts of the chip.
GPUs: Graphics Processing Units are more specialized chips that are optimized to execute many operations in parallel. Whereas CPUs have three big units, GPUs consist of many smaller ones. Specifically, they have stacks of arithmetic units, or cores, and each stack is paired with its own independently operating memory and control units. In short, this lets GPUs do a ton of math at once, particularly matrix operations (which essentially means they can solve a bunch of related equations in a grid simultaneously). This makes them really good at multidimensional tasks, such as rendering 3D video game graphics, running weather simulations, or implementing an AI language model. GPUs can do amazing things, but all that heavy math comes at a cost: they draw far more power than CPUs and are slower at sequential, branching tasks, which limits what can be done with them. Notably, however, GPUs still keep the memory, logic, and control units separate. Perhaps you can see where this is going.
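As a tiny illustration of why handing math to parallel matrix hardware matters, here is a sketch comparing the same multiply-accumulate work done one element at a time versus as a single matrix product dispatched to an optimized backend. The sizes and timings are only illustrative.

```python
import time
import numpy as np

n = 128
A = np.random.rand(n, n).astype(np.float32)
B = np.random.rand(n, n).astype(np.float32)

def matmul_one_at_a_time(A, B):
    """Every multiply-add done serially, the way a single scalar core would."""
    C = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

t0 = time.perf_counter()
C_serial = matmul_one_at_a_time(A, B)
t_serial = time.perf_counter() - t0

t0 = time.perf_counter()
C_parallel = A @ B          # dispatched to an optimized, parallel matmul kernel
t_parallel = time.perf_counter() - t0

print(f"serial loop: {t_serial:.3f}s  vs  matrix kernel: {t_parallel * 1000:.3f}ms")
# A neural network forward pass is essentially a long chain of such products,
# which is why hardware built around parallel matrix math is the whole game.
```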
NPUs: Neural Processing Units resemble GPUs at first glance, in that they have distributed cores. The major difference is that NPUs unite the memory, logic, and control functions in the same units. With this “in-memory” processing, storage and computation happen in the same physical location at each node. The more integrated approach reduces the distance data must travel between where it is stored and where it is processed, which decreases latency and increases efficiency and calculation speed. NPUs tend to be highly specialized for AI computing, which makes them less suitable for more conventional CPU-optimized tasks.
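An order-of-magnitude sketch suggests why co-locating memory and compute pays off. The per-operation energy figures below are illustrative assumptions in the ballpark of published estimates for older process nodes, not numbers for any particular NPU.

```python
# Energy per operation, in picojoules (assumed, order-of-magnitude values).
pj_multiply_accumulate = 5     # one multiply-add in the arithmetic unit
pj_fetch_from_dram = 640       # pulling a 32-bit operand from off-chip DRAM
pj_fetch_from_local_sram = 5   # pulling it from memory sitting next to the ALU

# The dominant cost per multiply-accumulate is where the operand lives:
far = pj_multiply_accumulate + pj_fetch_from_dram
near = pj_multiply_accumulate + pj_fetch_from_local_sram
print(f"operand far from compute: {far} pJ per MAC")
print(f"operand next to compute:  {near} pJ per MAC (~{far // near}x less energy)")
```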
(There is also a fourth class, Tensor Processing Units, or TPUs, developed by Google. TPUs are likewise specialized for AI computing and are sort of a cross between GPUs and NPUs. However, they are the most narrowly focused of the types, being dedicated exclusively to tensor operations. Their design philosophy is fixed around a particular way of doing AI, which may limit their scope of application.)
Apple is Ahead of the Game
If you’ve bought an iPhone in the last several years, chances are you’re already carrying around a neural chip in your pocket. In 2017, Apple introduced its A11 Bionic chip. The A11, the first of its lineage, contained a neural engine with two cores that could perform 600 billion operations per second. Subsequent iterations of the chip improved performance, added cores, and increased throughput at a pace that, on paper, outstrips Moore’s Law. The A17 Pro, released in 2023, has a neural engine with 16 cores and can perform 35 trillion operations per second. Parallel to its phone chips, Apple also released a line of chips for Macs, the M series (M1 through M3), which has exhibited a similar growth curve over time.
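For a quick sense of that growth curve, the headline figures above work out to roughly a doubling every year. It is a crude comparison, since different generations count operations at different precisions, but it is illustrative all the same.

```python
ops_2017 = 0.6e12   # A11 neural engine, operations per second
ops_2023 = 35e12    # A17 Pro neural engine, operations per second
years = 2023 - 2017

annual_growth = (ops_2023 / ops_2017) ** (1 / years)
print(f"~{annual_growth:.1f}x per year")   # roughly 2x per year on paper
```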
These Apple Neural Engines are the reason you can talk to Siri with almost zero latency and the reason your phone can organize your photo library on its own. In line with Apple’s respectable privacy policy, all of that processing is done locally, inside the phone’s neural chip. That is the power of neural engines and NPUs.
NPUs still have a long way to go, and surely their R&D, design and construction present enormous scientific, engineering and manufacturing challenges, but the foundation for them is already established.
So keep an eye on Apple. They have been playing their cards close to the chest, reacting little to the explosion of generative AI buzz in 2023. But they have been laying the hardware groundwork for a major play in the AI space.
Having covered some of the technical details of neural chips and what sets them apart from other chip classes, let’s now explore some of their more intriguing implications.
Literalizing the Brain Analogy
Neural network software models are often said to be merely “inspired by” how the real brain works, or based on an analogy to it. Engineers and scientists who know a bit about both worlds know that how these programs work differs substantially from the neurobiology. How many of those differences, however, are merely surface-level implementation details? We now have computer programs that can talk, see, and to a certain extent reason, all thanks to a collection of theories that were “only inspired by” how the brain works. I would argue the success of this approach hints at something deeper. Something at their core must be right. These models work because they embody certain as-yet-undecoded general laws of intelligence that are also realized by biological brains.
What differs most substantially between brains and AI is the hardware. I would argue that many of the constraints we face in AI development have less to do with the underlying theories and more to do with the fact that we lack specialized, dedicated chips for neural computation. For best results, structure must complement function. In the coming years, I predict, we will build literal artificial brains.
Functionalism Works
In the philosophy of mind there’s a whole school of thought called functionalism, which basically says that what matters is not what the stuff performing cognitive processes is made of; it’s the functional architecture that’s key. Form is more important than substance. Certainly the physical properties of the substrate, such as its electrical conductivity and thermoregulatory properties, matter for engineering and determine how well it computes. But what’s special about our brains isn’t that they’re made of meat. What counts is the architecture, and how that form constrains the underlying physics and chemistry to perform certain functional behaviors in an ultra-organized environment.
Recent successes in AI have essentially proved the functionalist proposition correct. The functionalist standpoint posits the thesis of multiple realizability: the claim that cognitive processes ought to be realizable in different substrates. The fact that both we and computers can now do something as complex as language means both our brains and these models are drawing on the same, as yet unarticulated fundamental laws. (Arguing that language models don’t understand or process language like we do is, again, beside the point. I am able to conduct a convincing conversation with one, and its ability to hold that conversation even somewhat intelligently is not unrelated to its neural underpinnings.)
So the argument is simple. These AI development challenges have less to do with the software models and more to do with the hardware that must run them. GPUs are not an optimal technology for the applications they have been retrofitted to perform.
Sentient Machines at Last?
Here’s a bit of intriguing speculation. If functionalism is true and the hypothesis is correct that the conscious features of the brain are attributable to laws of abstract form and complexity, then neuromorphic chips could lead to the advent of machine sentience. GPUs aren’t sentient, because their structure is inappropriate for it. The same can’t be said of neural chips. The success of the neural network theory of AI already proves that the pattern-recognition powers of the brain are attributable to abstract form. So would taking it a step further lead to machine consciousness? And no, I’m not saying your iPhone is conscious; it’s still missing many crucial ingredients. But its potential for consciousness is drastically higher than that of any other inorganic material lying around.
In summary, the future of AI is bound up at least as much in hardware as in software. Investors, developers, and everyone else with a stake in the AI space ought to keep their eyes on neuromorphic computing and not get hung up on GPUs.