Anyone can help improve GPT-4 with OpenAI Evals

OpenAI has open-sourced OpenAI Evals so that anyone can help improve GPT-4 by reporting its shortcomings, the company announced in a blog post.

Mar 16, 2023


Along with the GPT-4 release, OpenAI has also open-sourced OpenAI Evals to help improve the LLM. Users will be able to report shortcomings, which will help drive further improvements.

OpenAI revealed its latest language model a couple of days ago, and it is currently one of the most talked-about topics on the internet. But the company did not just release GPT-4; it also open-sourced OpenAI Evals, its software framework for evaluating models. The move should help OpenAI find and fix problems that benchmarks and evaluations bring to light more quickly.

we are open-sourcing OpenAI Evals, our framework for automated evaluation of AI model performance, to allow anyone to help improve our models.

— Sam Altman (@sama) March 14, 2023

OpenAI uses Evals to guide the development of its LLMs, both to identify shortcomings and to prevent regressions. Now that the framework is open source, users can apply it to track performance across model versions and product integrations. For example, the blog post says Stripe has used Evals to complement its human evaluations and measure the accuracy of its GPT-powered documentation tool.

OpenAI Evals will be helpful for further improvements

"Because the code is all open-source, Evals supports writing new classes to implement custom evaluation logic. In our own experience, however, many benchmarks follow one of a few 'templates,' so we have also included the templates that have been most useful internally (including a template for 'model-graded evals'—we've found that GPT-4 is surprisingly capable of checking its own work). Generally, the most effective way to build a new eval will be to instantiate one of these templates along with providing data. We're excited to see what others can build with these templates and with Evals more generally," the post says.
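To make the idea of "instantiating a template along with providing data" more concrete, here is a minimal Python sketch of an exact-match eval. It is not the OpenAI Evals API: the `MatchTemplate` and `Sample` classes and the stand-in `toy_model` function are hypothetical names invented for illustration, and the real framework has its own classes and data formats (see its GitHub repository). The point is only that the checking logic lives in a reusable template, while each new eval just supplies samples.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One test case: a prompt and the answer considered correct (hypothetical format)."""
    prompt: str
    ideal: str


class MatchTemplate:
    """Hypothetical exact-match template: the logic is reusable, the data varies per eval."""

    def __init__(self, samples: List[Sample]):
        self.samples = samples

    def check(self, completion: str, ideal: str) -> bool:
        # Ignore surrounding whitespace and letter case when comparing answers.
        return completion.strip().lower() == ideal.strip().lower()

    def run(self, model: Callable[[str], str]) -> Dict[str, float]:
        if not self.samples:
            return {"accuracy": 0.0}
        # Count how many model completions match the ideal answer.
        correct = sum(self.check(model(s.prompt), s.ideal) for s in self.samples)
        return {"accuracy": correct / len(self.samples)}


def toy_model(prompt: str) -> str:
    # Stand-in for a real model call: answers from a fixed lookup table.
    answers = {"Capital of France?": "Paris", "2 + 2 =": "4"}
    return answers.get(prompt, "I don't know")


# "Instantiating the template with data": only the samples are new.
trivia_eval = MatchTemplate([
    Sample("2 + 2 =", "4"),
    Sample("Capital of France?", "paris"),
    Sample("Capital of Spain?", "Madrid"),
])
print(trivia_eval.run(toy_model))  # {'accuracy': 0.6666666666666666}
```

Running the same eval against two model versions and comparing their scores is one simple way such a setup could surface a regression, which is the kind of tracking across model versions the blog post describes.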

The company has also invited everyone to use OpenAI Evals to test its models. This should benefit both sides: OpenAI gets to improve its product, while developers and other customers get better features and a better experience.

Unfortunately, OpenAI will not pay contributors. However, the company plans to grant GPT-4 access to those who contribute "high-quality" benchmarks. If you want to contribute to OpenAI Evals, check out the official GitHub page.

ChatGPT's recent popularity and success may shape the future of the industry. Microsoft has already invested heavily in OpenAI, and other tech giants are trying to stay in the race. Google might launch Bard at this year's I/O event. Meanwhile, Apple reportedly told its employees last month that its engineers have been working on a large language model and other AI tools.

For its part, OpenAI may use Evals to solve issues faster and gain the upper hand over other companies in the AI race.