We thought ChatGPT was great for programming: a study found that half of its answers are wrong

Searching for answers on Stack Overflow or Google isn’t as cool anymore. Many programmers have found ChatGPT to be a great tool for streamlining their work and depending less on those platforms. However, OpenAI’s artificial intelligence (AI) chatbot is far from perfect, and relying entirely on it may not be wise.

ChatGPT, like any other large language model (LLM) based tool, has several limitations. The company itself, led by Sam Altman, points out on its website that the chatbot “can make mistakes” and invites you to verify important information. Now, in the world of programming, how well (or poorly) does it do its job? Let’s see what some researchers say.

When more than 50% of the answers are incorrect

A group of researchers from Purdue University presented a study this month, motivated by the “growing popularity of ChatGPT” and the tendency of LLMs to “generate invented texts” that users without enough experience in the subject generally find difficult to recognize. Many answers, in fact, are plausible but wrong.

“We found that 52% of ChatGPT responses contain misinformation,” the researchers say. They add that 77% of the responses are more detailed than human responses (which does not guarantee their accuracy) and that 78% of them suffer from different degrees of inconsistency. These figures are hard to ignore.

To obtain these values, the researchers took 517 programming questions from Stack Overflow. They then examined the correctness, coherence, completeness and conciseness of the responses from ChatGPT based on GPT-3.5, and conducted a large-scale linguistic analysis as well as a user study to understand ChatGPT responses from different points of view.
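To put the percentages in perspective, here is a minimal sketch that converts them into approximate answer counts. It assumes each percentage applies to the full set of 517 questions, which the article does not state explicitly. The function name `approx_count` is made up for this illustration.

```python
# Rough conversion of reported percentages into approximate counts.
# Assumption: every percentage is measured over all 517 questions.

TOTAL_QUESTIONS = 517  # Stack Overflow questions used in the study


def approx_count(total: int, percent: float) -> int:
    """Return the approximate number of items that a percentage represents."""
    return round(total * percent / 100)


if __name__ == "__main__":
    reported = {
        "contain misinformation": 52,
        "more detailed than human responses": 77,
    }
    for label, percent in reported.items():
        count = approx_count(TOTAL_QUESTIONS, percent)
        print(f"{percent}% {label}: about {count} of {TOTAL_QUESTIONS}")
```

Under that assumption, roughly 269 of the 517 responses would contain incorrect information.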

The Purdue researchers chose GPT-3.5, the most widely used free version, instead of GPT-4, the latest version of the language model at the time of the study. It should be noted that they also ran parallel tests with GPT-4 and concluded that while the newer model performs “slightly better,” both have a high inaccuracy rate.

When we talk about ChatGPT we are referring to an AI chatbot that can be used for different tasks, from helping us program to writing a letter. In the programming world we also have other AI-powered tools designed specifically for developers, such as GitHub Copilot, which integrates into development environments.

In any case, we are witnessing firsthand how AI changes the way we work, and in the process we are discovering the benefits and defects of the tools we use. For now, ChatGPT seems far from being able to surpass human responses in the field of programming. In fact, posting answers generated with this tool is prohibited on Stack Overflow.

Images | Saputera Gem | Rivage

About the author

Redaction TLN
