How Does ChatGPT Work?
You've heard all about ChatGPT these past months, but do you know how it works?
ChatGPT can feel like magic, yet it communicates with users through nothing more than a single entry field. In that respect it resembles Google: users type a question and get an answer back. The difference is that ChatGPT builds its answer from the user's intent and the surrounding context. You can't ask Google to write code for you, for example, but you can ask ChatGPT. So how does ChatGPT do this? This article looks at the answer.
Main Phases of ChatGPT Operation
When you ask Google something, it searches the index it has already built and returns the best answer it can find. Google's work happens in two phases: a spidering and data-gathering phase, and a user-interaction phase.
ChatGPT works in a similar two-phase way. Its data-gathering phase is known as pre-training, and its user-interaction phase is called inference.
Pre-Training
AI pre-training takes two broad approaches: supervised and unsupervised. In supervised training, a model learns from a labeled dataset in which each input is paired with a correct output. For example, a company could train a model on transcripts of customer-service conversations so that it learns which answer to give for which question. The model learns the complete mapping from inputs to outputs, which makes it accurate on the cases it was trained on. This approach has scale limits, though: if every possible input and output had to be anticipated and labeled in advance, training would take impossibly long.
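To make the idea concrete, here is a deliberately crude Python sketch of an input-to-output mapping. The questions, answers, and function name are all hypothetical, and real supervised systems learn a statistical model rather than a lookup table, but the scale limit is the same: inputs nobody anticipated get no useful answer.

```python
# Toy illustration of supervised training's core idea: a learned
# mapping from anticipated inputs to labeled outputs. Everything
# here (data, names) is made up for illustration.

labeled_data = {
    "Where is my order?": "Orders usually ship within 2 business days.",
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
}

def supervised_bot(question: str) -> str:
    # Only inputs anticipated at training time get a real answer --
    # the scale limit described above.
    return labeled_data.get(question, "Sorry, I wasn't trained on that question.")

print(supervised_bot("Where is my order?"))      # anticipated: works
print(supervised_bot("Explain quantum physics")) # unanticipated: fails
```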
ChatGPT, however, has very few such limits. It can explain quantum physics, write your resume, generate code, and handle just about anything under the sun. Since there is no way to anticipate every question a user might ask, ChatGPT's pre-training is largely unsupervised: the model learns the underlying patterns and structure of language without a specific task in mind.
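Here is a minimal sketch of what "learning patterns without labels" can mean. The toy model below counts which word tends to follow which in raw text, then generates new text from those statistics. Real pre-training predicts tokens with a huge neural network rather than a bigram counter, and the corpus here is made up, but neither case needs labeled data.

```python
import random
from collections import defaultdict

# Unsupervised learning on raw text: no labels, just the text itself.
# We record, for every word, which words follow it in the corpus.
corpus = "the cat sat on the mat and the cat slept on the sofa".split()

next_words = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    next_words[current].append(following)

def generate(start: str, length: int = 6) -> str:
    # Walk the learned statistics to produce new text.
    words = [start]
    for _ in range(length):
        candidates = next_words.get(words[-1])
        if not candidates:
            break
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate("the"))
```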
Transformer Architecture
The transformer architecture is what processes natural-language data inside ChatGPT. It is a type of neural network, loosely inspired by the way the brain processes information. The transformer uses self-attention to process sequences of words, weighing every word in a sequence against every other word, much as a human reads a whole sentence or paragraph to work out context.
The transformer is built from several stacked layers, each containing sublayers such as self-attention and feed-forward networks. These layers are what let the model capture relationships between words.
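Below is a minimal NumPy sketch of scaled dot-product self-attention, the core operation just described. It skips the learned query, key, and value projections (and everything else in a real transformer layer), so treat it as an illustration of the mechanism, not a production implementation.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    # x: (sequence_length, model_dim) word embeddings.
    # Real transformers first project x into queries, keys, and values
    # with learned matrices; this sketch reuses x for all three.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # how strongly each word attends to each other word
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ x  # each output row mixes the whole sequence, weighted by attention

seq = np.random.randn(4, 8)       # a toy "sentence": 4 words, 8-dim embeddings
print(self_attention(seq).shape)  # (4, 8): one context-aware vector per word
```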
ChatGPT Training Dataset
ChatGPT's training dataset is massive. The model is based on the generative pre-trained transformer 3 (GPT-3) architecture, which was trained on several large text collections, among them a dataset called WebText2; in total, the raw text gathered for training ran to roughly 45TB before filtering. Learning from that much text is what allowed the model to pick up relationships and patterns and to decipher context so accurately, and it is one of the main reasons ChatGPT is so effective and popular.