Artificial Intelligence has never looked more advanced than it does now with OpenAI’s ChatGPT. The model, trained with Reinforcement Learning from Human Feedback, can help you with coding, inventing stories, and even telling a joke. Although the software has some limitations, it has been mind-blowing for users on social media as they share the discoveries they are making with it.
In a blog post, ChatGPT creators explain how it works:
We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses.
To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process.
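The comparison step above hinges on a reward model that learns to score trainer-preferred responses higher than rejected ones. As a minimal sketch (not OpenAI’s actual implementation), a common way to train on ranked pairs is a pairwise ranking loss of the form `-log(sigmoid(r_preferred - r_rejected))`:

```python
import numpy as np

def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Penalize the reward model when it fails to score the
    trainer-preferred response above the rejected one:
    loss = -log(sigmoid(r_preferred - r_rejected))."""
    margin = reward_preferred - reward_rejected
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))

# Toy scores a reward model might assign to two completions of one prompt
loss_when_correct = pairwise_ranking_loss(2.0, 0.5)  # already ranked correctly -> small loss
loss_when_wrong = pairwise_ranking_loss(0.5, 2.0)    # ranked incorrectly -> large loss
print(loss_when_correct < loss_when_wrong)
```

Minimizing this loss over many trainer-ranked pairs yields a scalar reward signal, which the policy can then be fine-tuned against with Proximal Policy Optimization.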
This software is fine-tuned from a model in the GPT-3.5 series. Both were trained on an Azure AI supercomputing infrastructure. For example, one tweet that blew up showed how a senior data engineer at Twitter could, in theory, deceive Elon Musk by generating a plausible-sounding account of code contributions. What’s interesting is how the AI learns and adapts to what the user wants, as you can read here.
Another compelling use case for ChatGPT is having the AI write a Seinfeld script in which Jerry needs to learn the bubble sort algorithm. The results are impressive.
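For reference, the algorithm Jerry is trying to learn in that script is a few lines long: bubble sort repeatedly sweeps the list and swaps adjacent out-of-order pairs until a full pass makes no swaps.

```python
def bubble_sort(items):
    """Sort a list by repeatedly swapping adjacent out-of-order pairs.

    Stops early once a full pass produces no swaps, meaning
    the list is already sorted.
    """
    items = list(items)  # work on a copy
    n = len(items)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):  # the last i elements are already in place
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:
            break
    return items

print(bubble_sort([5, 1, 4, 2, 8]))  # -> [1, 2, 4, 5, 8]
```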
ChatGPT can also serve as a debugging companion or even help create a business strategy plan. That said, it’s important to understand that the software still has some limitations and will sometimes tell you things that are wrong:
- ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers;
- ChatGPT is sensitive to tweaks to the input phrasing or to attempting the same prompt multiple times. For example, given one phrasing of a question, the model can claim to not know the answer, but given a slight rephrase, can answer correctly;
- The model is often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI;
- Ideally, the model would ask clarifying questions when the user provided an ambiguous query. Instead, these current models usually guess what the user intended;
- It will sometimes respond to harmful instructions or exhibit biased behavior.
While users experiment with the AI, it’s important to note that it is designed not to reinforce violence, bullying, or anything that could harm a person. Since everyone is trying to use ChatGPT, it may be at full capacity, but you can take your chance here.