History, Current Uses, and Future of Artificial Intelligence



Figure 1: The Bellman equation, formulated by Richard Bellman

Artificial intelligence was first theorized in 1956 by John McCarthy, who coined the term. He got many of the core concepts right, but his timeline was around seven decades off. Richard Bellman formulated the equation shown in Figure 1 in the late 1950s. We do not need to dive into its nuances, but know that this equation started the first phase of AI: it attempted to mimic our brain's reward circuit.
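For reference, a standard textbook form of the Bellman equation is shown below. The original figure did not survive extraction, so this is an assumption based on Bellman's dynamic-programming work rather than a reproduction of the figure:

    V(s) = \max_{a}\left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \right]

Here V(s) is the value of being in state s, R(s,a) is the immediate reward for taking action a, the discount factor gamma down-weights future rewards, and P(s' | s,a) is the probability of landing in state s'. The key idea, mirrored in the brain's reward circuit, is that the value of a state is the reward now plus the expected value of whatever comes next.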

To understand AI, we need two concepts from neuroscience: how memories form and how the dopamine reward pathway works. With these two concepts in hand, AI becomes much easier to grasp.


Our brain has almost 100 billion neurons. A neuron alone is not intelligent; neurons connect to one another through structures called synapses. Like every living creature, we perceive the world through our senses. For each unique perception, we have chains of synapses called neural circuits. This may be the sound of your name, the smell of cinnamon, the sensation of walking, and so on. Everything that you know is stored within neural circuits all over your brain. We perceive a near-infinite amount of sensation, so we need a way to know what is actually worth paying attention to. Memory is the biasing (or strengthening) of certain neural circuits, making us more likely to recognize that same sensation again.

Dopamine is our brain's currency of what is "good." When people hear dopamine, they think of pleasure. However, dopamine is actually the neuromodulator that regulates desire, pursuit, and motivation; in short, it drives us to do things. When our brain releases dopamine before and during certain activities, it enters a neuroplastic state, meaning neural circuits are ready to be changed. From this state, circuits related to the pursuit of the goal or activity are strengthened or weakened. For everything we do, we carry an implicit expectation of how much dopamine will be released. Once we do the activity, our brain compares the dopamine actually released to that expectation, and the larger the difference, the greater the strengthening or weakening of the relevant circuits.
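To make that concrete, here is a minimal Python sketch of the expectation-versus-outcome update just described. Every name and number in it is an illustrative assumption, not a model of real neurochemistry:

    # Reward-prediction-error update: shift the expectation toward the
    # actual outcome, with bigger surprises causing bigger changes.
    learning_rate = 0.1      # how strongly circuits change per experience
    expected_reward = 0.5    # the implicit expectation before the activity

    def update_expectation(expected, actual, lr=learning_rate):
        error = actual - expected      # the "dopamine surprise"
        return expected + lr * error   # change is proportional to the error

    expected_reward = update_expectation(expected_reward, actual=1.0)
    print(expected_reward)  # 0.55 -- the expectation nudged upward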

So, now that we understand these neuroscience concepts, let's talk about artificial neural networks. The main takeaway is that artificial intelligence mimics our brain's dopamine reward pathway and our memory formation. Neural networks have an input layer, where the network takes in information. The data is then passed through the hidden layers. Neurons in the hidden layers are connected through artificial synapses called parameters; simply put, these parameters are the weights that the inputs are multiplied by. At the end of the network is the output layer, which holds the outputs we defined before building the network. For example, if the inputs of a neural network are images, the output might be "cat" or "non-cat." Once the network makes a prediction, it compares that prediction to the actual value. Then, through a process called backpropagation, it updates the weights of the network to correct itself. The greater the error between its prediction and the actual value, the greater the change. Sound familiar? This is exactly how our brain's dopamine reward pathway works.
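Here is a minimal sketch of that forward pass and weight update in Python. The layer sizes, the sigmoid activation, and the fake data are all illustrative assumptions, not a reproduction of any particular model:

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(4, 3))  # "artificial synapses": input -> hidden
    W2 = rng.normal(size=(3, 1))  # hidden -> output

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x):
        h = sigmoid(x @ W1)  # hidden layer: inputs multiplied by weights
        y = sigmoid(h @ W2)  # output layer: a value between 0 and 1
        return h, y

    x = rng.normal(size=(1, 4))  # one fake input (e.g., image features)
    target = np.array([[1.0]])   # the "actual value" from labeled data

    # One step of backpropagation: bigger error means bigger weight change.
    h, y = forward(x)
    error = y - target
    d_out = error * y * (1 - y)           # gradient at the output layer
    d_hid = (d_out @ W2.T) * h * (1 - h)  # propagated back to the hidden layer
    lr = 0.5
    W2 -= lr * (h.T @ d_out)              # update the "synapses"
    W1 -= lr * (x.T @ d_hid)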

Artificial intelligence needs labeled data for the network to learn from. When I say "the actual value," that value comes from a labeled dataset. For example, suppose we wanted to train a neural network to classify images of cats. The dataset would contain images along with labels of "cat" or "non-cat." The network would take in an image, break it down into numbers, and feed those numbers through its layers. At the end of the network, the output is a value between 0 and 1, with 1 meaning cat and 0 meaning non-cat. Every time the model guesses wrong, it updates its weights.
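As a small illustration, here is how such a labeled dataset might be laid out, and how the network's 0-to-1 output maps back to a readable label. The filenames and the 0.5 threshold are assumptions for the sketch:

    # A labeled dataset pairs each example with its "actual value."
    dataset = [
        ("img_001.jpg", 1),  # 1 = cat
        ("img_002.jpg", 0),  # 0 = non-cat
        ("img_003.jpg", 1),
    ]

    def to_label(score):
        """Map the network's output (between 0 and 1) to a label."""
        return "cat" if score >= 0.5 else "non-cat"

    print(to_label(0.93))  # cat
    print(to_label(0.12))  # non-cat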

Many different types of artificial neural networks have been created. Each was designed for a specific task and can only perform that task. But all of this suddenly changed in 2017.

The transformer architecture took the world by storm. What makes transformers so powerful is the attention mechanism, which allows the model to understand the relationships among words. For example, given the sentence "The dog crossed the street because it saw a treat," older models could not tell whether "it" referred to the dog or the street. The attention mechanism learns these relationships, so as transformer models are trained, associations among words, such as dog and treat, are remembered.

Transformers are great at working with words. First, the model converts words into tokens, because it understands numbers, not words. Each word is assigned a token; for example, the word cat might be assigned 1108, and it will always map to 1108, because the vocabulary is defined before the model even starts training. These tokens are passed through the neural network with the attention mechanism, and the model produces an output token. Since tokens always map back to specific words, we can convert that output token into text we can read.

Virtually every modern AI system uses transformers; it is by far the most powerful architecture we have. Image generation, classification, and so on have all converged onto this one architecture. Transformers can also do their computations in parallel, meaning that during training the calculations inside the network happen at the same time rather than sequentially. This is a massive benefit as models grow, because it makes it much easier to train a large model across multiple GPUs.
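Below is a minimal Python sketch of the two ideas just described: tokenization as a fixed lookup, and scaled dot-product attention comparing every token against every other token. The vocabulary ids other than 1108 and all matrix values are made-up assumptions:

    import numpy as np

    # Tokenization, at its simplest, is a fixed lookup table.
    vocab = {"cat": 1108, "dog": 42, "treat": 7}  # only 1108 is from the text
    tokens = [vocab[w] for w in ["dog", "treat", "cat"]]
    print(tokens)  # [42, 7, 1108]

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def attention(Q, K, V):
        """Scaled dot-product attention: each token's query is scored
        against every token's key, and the scores decide how much of
        each token's value flows into the output."""
        scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise relationships
        return softmax(scores) @ V               # blend values by relevance

    rng = np.random.default_rng(0)
    n_tokens, d_model = 3, 8                     # 3 tokens, 8-dim vectors
    Q = rng.normal(size=(n_tokens, d_model))
    K = rng.normal(size=(n_tokens, d_model))
    V = rng.normal(size=(n_tokens, d_model))
    print(attention(Q, K, V).shape)              # (3, 8): one vector per token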

Now, among transformers, one type of model reigns king: the large language model (LLM). Something that may surprise you is that an LLM only predicts one word at a time; it does not generate the whole response at once. LLMs are extremely good at predicting the next word in a sequence. For example, if you ask, "What is the tallest building in the world?", the model reads that sentence and predicts the most likely next word. It then appends the predicted word to the question plus whatever it has generated so far, and keeps predicting one word at a time until it has produced the full answer. There is more that goes into the model, such as contextualization and knowing when to stop responding, but we won't get that detailed in this article.

The awesome thing about LLMs is that, unlike other models, even other types of transformer models, they do not need labeled data. An LLM trains by predicting the next word across all the data it is given. In the case of ChatGPT, that is basically the whole public internet, multiple times over. Through this unfathomable amount of training, which consumes enough energy to power a country, it gains immense knowledge from trying to predict the next word on a vast variety of data. The ChatGPT we actually interact with is then fine-tuned on labeled data: a large set of Q&A examples that show the model how to respond.

ChatGPT has also begun to utilize hybrid models. For example, when you ask ChatGPT to generate an image, its LLM interprets your request and produces a detailed textual description, which is then passed to the DALL-E image generation model. We can expect to see more hybrid models in the future; imagine creating your own song from a prompt, or a one-minute cartoon.
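Here is a minimal sketch of that one-word-at-a-time loop. The fake_llm below is a canned stand-in for a real model, purely an assumption for illustration:

    def generate(prompt_tokens, predict_next, max_tokens=50):
        tokens = list(prompt_tokens)
        for _ in range(max_tokens):
            next_tok = predict_next(tokens)  # predict one word at a time
            if next_tok == "<end>":          # the model decides when to stop
                break
            tokens.append(next_tok)          # feed the output back as input
        return tokens

    # A canned stand-in for a real LLM: it just replays a fixed answer.
    answers = iter(["The", "tallest", "building", "is", "the",
                    "Burj", "Khalifa.", "<end>"])
    fake_llm = lambda tokens: next(answers)

    prompt = ["What", "is", "the", "tallest", "building", "in", "the", "world?"]
    print(" ".join(generate(prompt, fake_llm)))
    # What is the tallest building in the world? The tallest building is the Burj Khalifa.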

Let's now talk about what to expect from AI in the next year or two. One surprising property of LLMs is that they can do things they were never meant to do; when you train a model on a giant corpus of data, you don't decide what it learns. One of the hottest topics in AI right now is the emergence of LLMs in offensive cybersecurity. A capture-the-flag challenge for LLMs was conducted in 2023, and the results were surprising: the LLMs did exceptionally well on both the offensive and defensive ends. They aren't quite as good as the best humans in those domains, due to the dynamic nature of cybersecurity, but we can 100% expect LLMs to be used on both the offensive and defensive sides of cybersecurity in the future. AI will gain a stronger foothold in the industry as LLMs grow more powerful, especially with the arrival of GPT-5, which is rumored to have around 10 trillion parameters, roughly five times GPT-4. Extensive fine-tuning should also make it much better at communicating and giving you exactly what you need. ChatGPT is an incredibly powerful tool, arguably the best teacher in the world; understand how to fully utilize it.

All the AI giants, such as Meta, Microsoft, and Google, are currently in a race for artificial general intelligence (AGI). Today's AI is considered artificial narrow intelligence: it cannot think critically and reason like we do. It learns from data and is extremely good at making predictions. Artificial general intelligence will have the ability to think critically like us, will have an expert-level understanding of every domain, and will be able to solve complex tasks it was not trained to do. In the attempt to create AGI, the AI giants are pouring resources into LLMs. Currently, bigger = better: the more parameters and the more data LLMs train on, the better they perform.

The final evolution of artificial intelligence is known as artificial superintelligence. As of now, this is still a theoretical concept. To be honest, it sounds pretty scary. We may be capable of creating something much more powerful than us. Artificial superintelligence will be miles ahead of us in every single domain. It will be able to improve upon itself at an exponential rate. Its capabilities are quite literally outside of our understanding.

Around 10,000 years ago, most humans were still hunting in tribes. Five thousand years later, the first societies emerged. We made slow and steady progress as a species, and as time went on, our rate of progression increased. Just in the past hundred years, we developed cars, airplanes, computers, nukes, cell phones, and the internet, and now we are at the dawn of the age of AI. Progress will only get faster and faster, and AI will be at the heart of this exponential progress. The question everyone should be asking themselves is: so what? Throughout history, there have been certain unavoidable waves: electricity, cars, the internet. Those who did not adapt to the new technology were swept away, while the early adopters ran circles around those too stubborn or too naive to change. AI is THE unavoidable wave. "AI" gets tossed around as a buzzword, but actual AI is here to stay. Understand how to leverage it. For us in cybersecurity, AI will affect our industry more than almost any other. The AI wave is here; it is your choice whether to ride it or get swept under it.

Written By: Daniel FioRito
