Fact, Fiction, and Fantasy: Demystifying AI Hype in the Information Age

Nancy Fulda, an AI researcher and science fiction writer, offers a primer on how neural networks actually work⁠—explaining backpropagation, weights, and training in accessible terms. She recounts how her own research on ChatGPT’s ability to predict statistical voting patterns was sensationalized by headlines into claims of AI omniscience, illustrating the terminology gap between researchers and the public. Fulda emphasizes that while AI is genuinely transformative⁠—enabling new medicines, restoring sight, generating video⁠—current systems lack internal state between interactions and remain far from sentience, with roughly 80% of commercial AI projects failing.

Portrait of Nancy Fulda
Nancy Fulda

Nancy Fulda is an American computer scientist, researcher, and award-winning science fiction author who specializes in artificial intelligence and natural language processing. She holds both Bachelor’s and Master’s degrees from Brigham Young University. Her technical career began at the German Aerospace Center (DLR), where she worked on autonomous robotic systems. She later pursued doctoral research at the University of Utah, focusing on the cognitive aspects of human-machine interaction and the ethical implications of emerging technologies.

In 1710, Jonathan Swift wrote, “Falsehood flies, and the truth comes limping after it.” That quote never struck home for me quite so much as it did one morning last year, when I opened my daily news browser and saw a headline about some of my recent research. The research focused on statistical patterns of voter behavior. I was working with some political scientists; they found this stuff super cool, while I was more interested in the AI side of everything. We asked ChatGPT whether it could predict the statistical voting behavior of specific demographic groups. In and of itself, this is not particularly exciting. Anybody who is even remotely familiar with American politics can tell you that a white Catholic man from Texas is likely to vote Republican, and a Black Episcopalian woman from New York is probably casting her vote with the Democrats. Our question was whether ChatGPT understood these identity and geopolitical nuances too. So we ran a study comparing what ChatGPT predicted specific demographic groups would say with actual national survey data showing what people in those groups actually did say.

This, however, is the headline that appeared about the research. Spoiler: that is not what the study said. We were looking at broad statistical trends, which are very different from knowing what any specific person is likely to do. And yet, somehow, my innocent research project about statistical behaviors and ChatGPT’s ability to predict those patterns turned into a pseudo-omniscient AI with prognostication abilities. Suddenly, ChatGPT was predicting how specific people would vote. Any person. Perfectly.

This and similar experiences have changed my approach to breaking news stories about AI. Instead of reading a news story and thinking, “That’s so cool,” I read a news story and think, “That’s so cool. I wonder what the researchers actually did.” This usually leads me on a rabbit-hole adventure following the links. And there is usually something really amazing that did happen; it is just not always correlated with the headline. It has also led me to ponder the ways in which stories about AI mutate, because it is such a fascinating and exciting technology, one that I wrote about before I worked in it, and one that I work in so that I can write about it better.

I’ve come to the conclusion that there is a broad terminology gap. When we talk about AI, the researchers who develop AI systems, the practitioners who deploy them, the managers who make decisions about them, and the journalists who write about them are all using the same words. But our foundational conceptions of what those words mean are sometimes fundamentally different. Today, I’d like to bridge that gap just a tiny bit.

And I think a young man of my acquaintance perhaps summed up the problem best. “You know,” he said, “I’ve been training and deploying AI models for years. I can talk about batch normalization, layers, learning rates, cleaning input data: all of the words. I know all the words, and I know how to use all the things.” But then he said, “I’ve just never met anyone who can explain to me how a neural network actually functions.”

What’s going on under the hood? I was perhaps too enthusiastic at this point. I said, “Yeah, I know that. I can explain that. I’ll give you a crash course.” So we spent five minutes, and he found it very useful. This might be unfortunate for you, because I’m going to attempt to repeat that experiment. Unbeknownst to you all, you have signed up for a five-minute crash course on AI technology. Here we go.

First, I would like to emphasize that I’m going to talk about neural networks today. But neural networks are only a tiny drop in the vast ocean of technologies that fall under the label AI. All of my favorite, mostly outdated and forgotten, learning algorithms are there: genetic algorithms, Bayesian learning, k-nearest-neighbor methods, predicate logic systems. AI describes all of those. And yet, when there has been a flashy news headline about AI at any time in the past year, it has pretty much been about a neural network. So that is the technology we’re focusing on today.

So, my friends, this is a neural network. The circles represent neurons. They are loosely based on, though they bear little resemblance to, the neurons in our physical brains. The connections between them are called weights; they represent the strength of a synaptic connection between two neurons. The job of any given neuron in this network is to sum up all of the incoming signals that reach it through those connections and then decide whether the grand total of that sum is big enough for it to pass any signal on to its neighbors.
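That job description fits in a few lines of Python. This is a toy sketch of a single neuron, not any real framework’s API: sum the weighted inputs, then pass the total on only if it clears a threshold.

```python
def neuron_fires(inputs, weights, threshold=0.0):
    """Toy neuron: sum the weighted incoming signals, then decide
    whether the grand total is big enough to pass a signal on."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total > threshold else 0.0

# Two incoming signals: one strong connection, one weak one.
signal = neuron_fires([1.0, 0.5], [0.8, 0.1])
print(signal)  # about 0.85, above threshold, so the neuron fires
```

A negative total, by contrast, stays below the threshold and nothing is passed on.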

Tracing the paths of these signals in a real brain would be very difficult, because the brain is recurrently connected. That means neurons connect backwards, forwards, and sideways in three dimensions, and the web of computation becomes very complex. Fortunately for us, the neural network architecture used by ChatGPT, DALL-E, Stable Diffusion, and similar technologies has links that travel in only one direction. Input comes in at the top. (You can draw it from the bottom going up too, but on my slide, it comes in at the top.) The numbers only ever travel in one direction: the signal bounces through these neurons, gets permuted along the way by a process I’m going to explain to you, and at the bottom we get some output. It’s going to make sense. It’s really fun. We only use one math equation all the way through.

Now, depending on how you interpret those numbers, the AI system has done something. For example, suppose we wanted to create a system that said whether the sum of the incoming numbers was greater than or less than zero. We could say: if the left output neuron has a high value, the sum is greater than zero; and if the right output neuron has a high value, the sum is less than zero. Then you look at those two output neurons, you compare their values, and you say, oh, here’s the answer to my question.
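Reading the answer off the output layer is just a comparison. A toy sketch (the function name is mine, purely for illustration):

```python
def interpret_sign(left_output, right_output):
    """Read off the network's answer: a high left neuron means the
    sum was greater than zero; a high right one means less than zero."""
    return "greater than zero" if left_output > right_output else "less than zero"

print(interpret_sign(0.9, 0.1))  # the left neuron won: "greater than zero"
```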

So here we go. We’re going to pretend we have some inputs. They are numbers. They can be positive or negative; they can be any value you can imagine, although they’re usually in the range of negative one to one. They come in. Each of those numbers then gets multiplied by a weight value (remember, the synaptic strength of the connection) to determine how much of that neuron’s activation is actually going to reach its neighbor. To simplify things, we’ll just look at this one little neuron in the second layer. It has three friends. They each have a certain amount of signal that they are sending forward through the path of computation. Each of those signals gets multiplied by some weight value. Let’s pretend that the first connection has a weight value of 0.2, the second 0.1, and the third a weight value of zero. Those neurons do not talk to each other.

All right, math majors in the room. If we are going to calculate the incoming signal to that second-layer neuron, what’s it getting? It’s getting 1.1 times 0.2. Yeah, this is participatory now.

Okay, what comes in next? 0.5 times 0.1, right?

And then 0.8 times 0. And if anybody wants to math it all out, I used a calculator just to make sure I didn’t have a lie on my slides. We have a total incoming signal of 0.27.

That’s great. This neuron now knows the grand total sum of its incoming signals. Its job is now to decide how much signal to pass on. There are various algorithmic ways to decide that, but one of the simplest is to just say, “I’ll pass it all on.” That doesn’t work very well when you’re training an actual machine learning system, but we’re going to go with it. So this neuron is going to pass on some signal. The same thing happens with all the other neurons in that layer, and then you get to some sort of output value.
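The worked example can be checked in code. This is a toy sketch of one fully connected layer with the “pass it all on” (identity) activation, using the multiplications from the example: 1.1 × 0.2 + 0.5 × 0.1 + 0.8 × 0.

```python
def layer_forward(inputs, weight_matrix):
    """One fully connected layer with an identity activation
    (the 'pass it all on' simplification).
    weight_matrix[j][i] connects input neuron i to layer neuron j."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weight_matrix]

# The three friends send 1.1, 0.5, and 0.8; the incoming weights
# on our one little neuron are 0.2, 0.1, and 0.
incoming = layer_forward([1.1, 0.5, 0.8], [[0.2, 0.1, 0.0]])
print(incoming)  # total incoming signal: 0.27 (up to float rounding)
```

A real layer would simply have more rows in the weight matrix, one per neuron.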

This is the fundamental process happening inside ChatGPT, inside Stable Diffusion, inside Midjourney, inside all of those cool technologies we read about and see on the Internet. Now, it’s happening at a much, much larger scale. We’re talking about hundreds of millions to billions of little weight values connecting all of those neurons. That is more numbers than there are stars in the Milky Way galaxy. These things are phenomenally huge. That’s what makes them so exciting. But this is what’s going on. This is what chugs along under the hood. And if you set all of those weight values perfectly to the exact right numbers, you end up with the magical things Carl just told us about.

You can imagine it like a giant pinball machine. If you adjust all the springs and levers and knobs and bouncy things just right, you can make it so that balls of a certain size and weight come out one hole, and balls of a different size and weight come out somewhere else. Fundamentally, that’s what’s going on. We’re just talking about lots and lots of balls with lots and lots of different sizes and weights, and then you get a pretty picture. But the same principle applies.

“Okay, fine, Dr. Fulda,” you are all asking me. “But how do you choose the perfect magic combination of weights when there are 175 billion of them?” This is where the magic of machine learning comes in. We are not going to math this part out, but there is a beautiful thing called backpropagation. It is the training algorithm used for all of the headline-making systems I am aware of right now. It’s not the only one, but it’s the big one. And backpropagation works like this.

You start by randomly initializing all the weights to totally random values somewhere between zero and one. You run your data through, and guess what? It does a really terrible job at whatever you want your AI system to learn. It outputs garbage. That’s okay, because garbage is the first step to awesomeness.

We then look at those output values and we say, okay, sure, it gave us maybe a 0.3 and a 0.7. We wanted those to be something different. And that is why you’ll hear about training data and training examples when we talk about neural networks: we have to tell the neural network what it was supposed to do in order for it to get better. It needs feedback.

So, what if we really wanted to have seen 0.2 and 0.9? The backpropagation algorithm says, okay, I’m going to figure out how wrong I was and in which direction. In this case, the neuron that gave us the 0.3 was too high; we wanted 0.2. So we calculate that error signal and send it backwards through the network. We wanted 0.2, we got 0.3. That means that each of the weights coming into that neuron was a little too high. Too much signal got through, and the neuron’s value was too large.

We therefore adjust each of the weights a slight little bit in the direction that would have made the answer more correct. The amount that each weight gets adjusted depends on the amount of signal it transmitted to that neuron: the weights that were more responsible for the mistake get a larger adjustment, and the weights that weren’t very responsible get a smaller adjustment. You do this a couple hundred thousand to a couple hundred million times, with that much training data, and the weights slowly, slowly, slowly converge to a set of perfect pinball-machine characteristics. And you get magic. It is amazing.
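The adjust-in-proportion-to-responsibility idea can be sketched for a single layer. For one layer with an identity activation, the update below is the delta rule, which is the one-layer special case of backpropagation; real backpropagation also sends the error through earlier layers. This is a toy illustration, not a production training loop, and the starting numbers are chosen to reproduce the 0.3 and 0.7 outputs from the example.

```python
def train_step(inputs, weights, targets, lr=0.1):
    """Run the layer forward, measure how wrong each output was and in
    which direction, then nudge every weight in proportion to the
    signal it transmitted (more signal = more responsibility)."""
    outputs = [sum(x * w for x, w in zip(inputs, row)) for row in weights]
    for j, (out, target) in enumerate(zip(outputs, targets)):
        error = out - target  # positive means too much signal got through
        for i, x in enumerate(inputs):
            weights[j][i] -= lr * error * x
    return outputs

inputs = [1.0, 0.5]
weights = [[0.3, 0.0], [0.4, 0.6]]  # starting values: outputs 0.3 and 0.7
targets = [0.2, 0.9]                # what we wanted to see instead
for _ in range(200):
    outputs = train_step(inputs, weights, targets)
print(outputs)  # after many repetitions, very close to 0.2 and 0.9
```

Each pass shrinks the remaining error by a constant factor, which is the “slowly, slowly converge” behavior described above.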

There is nothing more pleasurable to me than working with these systems. I did once call one of them my sixth child. I stopped doing that at some point, but that is how I feel about them. They are so fun, and they are driving so many amazing things. You now can explain this to your friends and neighbors, right? This is how it’s all done under the hood.

Obviously, it’s way more complicated than that. There are gymnastics happening with how the input data are structured and combined, and with how the input prompts are created. There are all kinds of supplementary technologies; you’ll hear about ReLU activations, batch normalization, and more. But the core technology, the real heart and soul of it, you just learned. You know what’s going on now.

This brings us back to our question and our quest for truth on the Internet. Well, let’s start with some verifiable facts. What is really going on out there if we can’t always trust the headlines?

Well, we do have really big language models, and they’re going to get bigger. These suckers are phenomenally giant. They are so giant that nobody’s laptop or home computer can train them; those machines struggle to run even very, very small versions of them. Generally, you need a big data center. It’s just that big. And they’re doing really cool things.

But it is also true, when you read it, that the scientists don’t really know why the neural network did what it did. We put numbers in; based on the numbers that come out, we tell it it made a mistake, and the neural network carries out the process of deciding how to change all the weights. We have no way at present to determine why this output came out instead of that output, except to say, well, we ran the numbers, and that’s what it did. There are lines of research seeking to address this, looking at ways to attach meaning to some of those numbers in a way that humans would understand, but that research is not very far along yet.

Also true, AI is so amazing, you guys. It is, in fact, generating videos and artwork. It is, in fact, giving voice to the mute and eyesight to the blind. It is, in fact, changing our world, helping researchers to develop new medicines, helping us to envision new futures. It is the biggest change I’ve seen in my lifetime. And yes, I am also including COVID in that statement. It’s just phenomenal.

However, you should all know that AI is also very prone to failure and not suited to all tasks. These suckers are really challenging to train. When they work, we get splashy, fantastic news headlines. When they fail, they fail silently, and nobody wants to admit that they even tried. A recent study found that about 80% of AI-related projects in commercial domains fail. They just don’t work. Sometimes it’s because the project was ill-conceived, sometimes because it had the wrong project leader, but often, very often, because the problem just wasn’t one that these technologies are currently suited to solve.

It is also true that AI will soon be heavily regulated. Right now, we are in the Wild West phase of the AI revolution: the technology is here, and legislation has not caught up yet. This is very similar to the experience of one of my favorite science heroes, Nikola Tesla, who in the early days of electricity, before regulation had caught up, had a tower in Colorado that was 80 feet tall, with a metal spire that went several hundred feet up above that. It put out sparks of simulated lightning: big thunderbolts that frightened horses 25 miles away, that caused the hooves of cows to spark when they walked, and that caused butterflies, due to the static electricity in the air, to have a blue-glowing halo of St. Elmo’s Fire around them as they flapped their wings. It’s the stuff of horror stories, but it’s also the basis of the mad-scientist tropes of science fiction, which makes me very cool, right? We are in that space with AI. It is going to change. I’m not sure how much, or where the legislation will fall. It will protect the public in many ways; it will also slow down progress in many ways. And it’s up to each person to decide whether they think either of those things is good or bad.

The third thing I want to emphasize before I run out of time is that sapience, or sentience, or artificial general intelligence, is a big step away from the current models making the headlines today. I don’t know how long it will take us to make that step. But the current systems based on neural networks don’t maintain any internal state between interactions. We pass information in, we get information out. Sometimes we store the information we passed in so we can pass it in again, and then we end up with a chatbot that maintains a coherent conversation. But the weights in the neural network have not changed as a result of that discussion. The system does not rewrite itself, at least not in the commercially deployed systems.
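The statelessness point can be made concrete with a toy sketch. The `chat_model` function here is a hypothetical stand-in, not any real API: the apparent memory lives entirely in the transcript that the calling code re-sends on every turn, not in the model.

```python
def chat_model(transcript):
    """Hypothetical stand-in for a deployed language model: a pure,
    stateless function of whatever text it is handed. No weights
    change, and nothing is remembered between calls."""
    return "[reply based on {} characters of context]".format(len(transcript))

history = []
for user_message in ["Hello!", "What did I just say?"]:
    history.append("User: " + user_message)
    # The illusion of memory: the whole transcript is re-sent each turn.
    reply = chat_model("\n".join(history))
    history.append("Assistant: " + reply)
print(len(history))  # 4 transcript lines, all stored outside the model
```

Delete `history` and the “conversation” is gone; the model itself never changed.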

Now, there is research around that, and I will not go into the details here, but I do want to show them very, very quickly. You should feel free, anytime today, to come up and talk to me. These are the things my research lab dreams of and seeks funding for; they keep me awake at night, and they’re really cool. So, what’s coming next?

Neural parasites: since AI models are really difficult to train and you don’t want to fine-tune them, can you have a little bit of code that injects itself and changes the model for you? Insights from biology: can we make neural networks more like what actually happens in our brains? What if a neural network had mirror neurons like people do, so that when it saw someone being sad, it also started to feel sad? What would that mean for us? What if we could manipulate its stored parametric memory directly? Analog computation: what if we didn’t use GPUs anymore? GPUs are energy hogs; with analog computation, these systems would take orders of magnitude less energy to deploy and run. And there’s a huge, huge focus on AI safety. It may comfort you, or perhaps disturb you, to know that AI researchers are thinking deeply and heavily about this topic. It comes up at every academic conference I go to, and people are wondering: What should we be doing? What can we do? How would that look? There are lots of really interesting conversations happening in that space.

Thank you so much for your time today.