Is AI really magical, or is it just good at remembering stuff?
It is past 2 a.m. and I have stopped to take a break. A break from generating code with AI models. It seems like we, as software engineers, are not needed at all anymore. The AI models are pure magic.
So I set out to learn about all the magic behind this thing. If I believe in one thing, it is that magic doesn't exist. A thing is only magical until you understand it.
P.S. - THIS IS NOT ANOTHER OF THOSE AI-GENERATED BLOGS. IT IS 100% HUMAN WRITING, BY A HUMAN WHO CANNOT SLEEP AND IS OVERLY WORRIED ABOUT WHETHER HE WILL HAVE A JOB IN THE NEAR FUTURE.
Why is everyone running after AI?
For the past year or so, all we have heard about is AI, AI and more AI. AI can do this, AI can do that. That company released a new AI that can do something else. We don't need developers anymore. There have been massive layoffs. And so on and so on.
The thing is, AI makes your life easy. The things you used to wish you could do are possible today. You can write that program in seconds. You can read a book in minutes. You can even book that flight ticket, hotel and taxi in the blink of an eye. Maybe a couple of blinks, depending on how strong your internet connection is.
What is an AI model?
An AI model, put simply, is just a program that can do tasks that used to need human intelligence: generating text, images or videos, translating text, transcribing speech to text, and so on.
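To make that concrete, here is a minimal sketch of calling one of these models, assuming the Hugging Face `transformers` library is installed (t5-small is a real, freely downloadable model, but any small translation model would do):

```python
# A minimal sketch, assuming the Hugging Face `transformers` library
# and its dependencies are installed: pip install transformers sentencepiece
from transformers import pipeline

# Load a small pre-trained model that translates English to French.
translator = pipeline("translation_en_to_fr", model="t5-small")

result = translator("AI models are just programs.")
print(result[0]["translation_text"])
```

One line of real work, and a task that used to need a human who knows both languages is done.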
Why did AI get popular all of a sudden?
A normal CPU has something like 8-16 cores. Running an AI program needs far more than that, because these programs execute enormous numbers of operations at the same time, and for that they need thousands of cores working in parallel. Hence we had to move from CPUs to GPUs and even TPUs, processors with thousands of cores built for parallel processing. These days, we have the technology that allows us to manufacture these types of processors with ease. Again, when I say ease, it means only big corporations can, because it obviously needs a lot of money in the first place.
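To see why the core count matters: the basic operation inside a neural network is multiplying big matrices, and every cell of the result can be computed independently, i.e. in parallel. A minimal sketch, assuming PyTorch is installed and a CUDA-capable GPU is available:

```python
import time
import torch

# Two large random matrices; multiplying them is the core
# operation inside every neural network layer.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# On the CPU: a handful of cores share the work.
start = time.time()
a @ b
print(f"CPU: {time.time() - start:.3f}s")

# On the GPU (if one is available): thousands of cores work in parallel.
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()
    start = time.time()
    a_gpu @ b_gpu
    torch.cuda.synchronize()
    print(f"GPU: {time.time() - start:.3f}s")
```

On typical hardware the GPU run is orders of magnitude faster, purely because thousands of cores split the work.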
Since the big companies had the money, they built AI models and are selling them to us as subscriptions. You would have seen ChatGPT asking you to pay $20 per month to use it.
So, what is inside this AI program?
To be frank, the details are far too complicated, but here is what I've learned. There is something called neural networks, and an architecture called the transformer. Basically, once you enter data, say a sentence, that sentence goes into a weird mesh of neural networks. A neural network is just a collection of neurons connected to each other via paths. Each of these paths has a value associated with it, which they call its weight.
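Here is a minimal sketch of such a mesh in plain NumPy: three neurons connected to two neurons, so six paths, each carrying a weight (the numbers are made up):

```python
import numpy as np

# Three input neurons connected to two output neurons: six paths.
# Each entry of this matrix is one path, and its value is the weight.
weights = np.array([[0.2, -0.5],
                    [0.8,  0.1],
                    [-0.3, 0.7]])

# The input data (e.g. a sentence already turned into numbers).
inputs = np.array([1.0, 0.5, -1.0])

# Signals flow along the paths and get scaled by the weights.
outputs = inputs @ weights
print(outputs)  # [ 0.9  -1.15]
```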
The number of such paths in a neural network is called its parameter count. Apparently, the more paths/parameters it has, the better the model. So when you see something like DeepSeek 14B or 70B, it basically means DeepSeek has two neural networks: one with 14 billion parameters and one with 70 billion. And since larger is better, 70B wins.
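Counting parameters is literally counting those weights. A minimal sketch with PyTorch, using a made-up two-layer network (the layer sizes are arbitrary, chosen only to show how the count adds up):

```python
import torch.nn as nn

# A made-up two-layer network, just to show where the number comes from.
model = nn.Sequential(
    nn.Linear(1024, 4096),  # 1024 * 4096 weights + 4096 biases
    nn.ReLU(),
    nn.Linear(4096, 1024),  # 4096 * 1024 weights + 1024 biases
)

total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # 8,393,728 -- tiny by today's standards
```

A 70B model is the same idea, just with vastly wider and deeper layers.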
Why is larger better?
Basically, the neural network can be imagined as a human brain. As a human brain gains experience, it builds and strengthens connections; similarly, as a neural network is trained, the connections between its neurons get their weights adjusted. A human brain grows based on the data/experiences it has. Similarly, an AI neural network grows based on the data that is fed into it, which they call a dataset. This is also a reason why AI models are appearing so quickly: data is readily available today. This data is passed to the AI model to train it (adjust its weights) for specific use cases.
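What "training" looks like in code is a loop that repeatedly nudges the weights so the output matches the data. A minimal sketch with PyTorch and a single made-up training example:

```python
import torch

# One made-up training example: input x should produce the target y.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor(10.0)

# One weight per input path, starting at random values.
w = torch.randn(3, requires_grad=True)

for step in range(100):
    prediction = (w * x).sum()      # what the network currently outputs
    loss = (prediction - y) ** 2    # how wrong that output is
    loss.backward()                 # which way to nudge each weight
    with torch.no_grad():
        w -= 0.01 * w.grad          # adjust the weights slightly
        w.grad.zero_()

print((w * x).sum().item())  # close to 10.0: the weights have "learned"
```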
For example, the same neural network can be trained on two types of data and produce an entirely different mesh. That is why we have text models, image models, embedding models and so on. The difference is the type of data they were trained on.
The more complicated the mesh, the more data it was fed. And the more data it is fed, the better it gets at its job. Just like a human being with a lot of experience will be better than a baby who is just learning to walk.
So, what is GPT?
Well. The transformer is the architecture: take data, process it in parallel, adjust weights, and create a complicated mesh of neural networks. This architecture then gave birth to what they call GPT (Generative Pre-trained Transformer), a special type of transformer that can generate patterns from the neural network. Put simply, it has been trained on so much English text that it can recognise and generate patterns of text. So if you give it a sentence, it can dissect it and generate a response to it.
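You can watch this pattern generation happen. A minimal sketch, assuming `transformers` is installed, using GPT-2 (a small, openly available ancestor of the models behind ChatGPT):

```python
from transformers import pipeline

# GPT-2 is a small, freely downloadable GPT. Give it a sentence
# and it continues the pattern, one word piece at a time.
generator = pipeline("text-generation", model="gpt2")

out = generator("The best thing about being a software engineer is",
                max_new_tokens=20)
print(out[0]["generated_text"])
```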
You might feel it is magical. But think about how two humans would have a conversation. Let us say one human says, "Hey, how is it going?" As a human being, you know that it is a positive sentence, a happy emotion. You know the person who has asked is asking about how your life is. You know that you should reply in a positive manner and keep it short and simple. Plus, you also know that you should ask him something in return as a gesture.
Now, how do you know all of this? Well, it is because you have seen others do it: your parents, movies, etc. That means you are pre-trained for this sentence. And that is exactly what GPTs are, too. They have been fed billions of such sentences and made to understand what reply to give, what emotion is depicted, what follow-ups to ask, and so on. Put simply, they have seen billions of such sentences and know what to reply and when.
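Under the hood, that "knowing what to reply" is nothing but next-word probabilities learned from all those sentences. A minimal sketch that peeks at GPT-2's top guesses for the word after the greeting above:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Feed in the start of the greeting and ask for the model's raw scores.
inputs = tokenizer("Hey, how is it", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Turn the scores for the next position into probabilities, highest first.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx)):>10}  {p.item():.1%}")
```

No understanding, no feelings. Just "I have seen this pattern billions of times, and here is what usually comes next."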
So next time you talk to ChatGPT, remember, it knows how to reply to anything that you type to it.
Here is a thought
AI learns in much the same way as humans. But unlike humans, who age, get distracted and have responsibilities, AI has none of that. It just keeps learning day in and day out. So the question is, will humans be able to survive this era?
Happy AI-ing!