“GPT-4 to detect GPT-4 errors.” That is how OpenAI President Greg Brockman described the AI company’s latest proposal for improving its flagship model at programming. We’re talking about CriticGPT, a GPT-4-based model specifically designed to catch errors in ChatGPT’s code output.
The Microsoft-backed firm says that CriticGPT has proven very effective at helping people find errors in the famous chatbot’s responses. In internal tests, it explains, people who reviewed ChatGPT code with CriticGPT’s help outperformed those who worked alone 60% of the time. Now the model is ready to move to the next stage.
A new tool for reinforcement learning
Training models such as GPT-4 involves what is known as reinforcement learning from human feedback (RLHF). Broadly speaking, it is a machine learning technique that uses input from humans, the so-called AI trainers, to improve the model’s accuracy on certain tasks.
OpenAI will begin deploying CriticGPT-like models to its trainers to help them catch the increasingly subtle bugs that GPT-4 tends to produce in ChatGPT. “This is a step toward being able to evaluate the outputs of advanced AI systems, which can be difficult for people to rate without better tools,” the company said on its blog.
But how does CriticGPT work? As the image above shows, the model writes “critiques” of ChatGPT’s responses. These critiques are not always correct, but they can help human trainers spot problems that might otherwise go unnoticed. OpenAI describes this mechanism as “assistance” to the RLHF process.
CriticGPT, being based on GPT-4, was itself trained with reinforcement learning from human feedback. Curious as it may seem, the test results suggest that using GPT-4 to check GPT-4 is a good way to make ChatGPT better at programming, a field where some studies have pointed to a significant share of incorrect answers from the model.
The company is also trying to improve the safety of its models following the dissolution of its “superalignment” team. To that end, it has set up a committee whose members include Sam Altman. One of this committee’s missions is to make recommendations to the board of directors, which is to say the board of the very company whose CEO is Sam Altman.
Images | OpenAI | Milad Fakurian | Village Global