What is ChatGPT's IQ?
Would you hire ChatGPT?
Before you use ChatGPT as a copywriter, personal assistant, translator, strategy consultant, or marketing analyst, shouldn’t ChatGPT go through an interview? If you rely solely on the massive buzz and on casual play with ChatGPT, you won’t get a fair assessment of its capabilities.
Different companies have different hiring processes and look for various skills and behaviors. IQ is one measure of intelligence with a good level of reproducibility. Reproducibility becomes an issue here, however, because ChatGPT can give a different answer every time it receives the same prompt. The second issue is that IQ tests are designed for humans and calibrated against the median intelligence of the general population.
As many professionals use the chatbot from OpenAI to perform work that was previously done exclusively by humans, it is fair to apply human criteria to evaluate its capabilities.
Evaluating its IQ
Let’s try several approaches to evaluate the intelligence quotient of ChatGPT.
The first approach is to ask ChatGPT itself whether it knows its own IQ. Since one of its flaws is that it answers every question, it may hold the answer. Prompt: What is your IQ?
I am an AI language model developed by OpenAI, and I do not have intelligence or IQ as a human being would. My abilities are limited to generating text based on the input I receive, and I can provide information, answer questions, and generate responses to various types of requests.
We can analyze this answer in different ways:
- it is very humble and does not want to claim intelligence,
- the company’s lawyers have asked to put in place many filters to avoid lawsuits,
- it is indeed not intelligent at all.
The problem is that the answer is pretty articulate and is not a canned answer. The prompt was very short, did not give any context, and did not specify the meaning of IQ. Yet the bot puts the answer correctly in context and interprets IQ as Intelligence Quotient. These are markers of an intelligent answer.
Since it refuses to answer the question, let’s set that reply aside and pursue our search for its perceived intelligence level.
One test gave it 83. According to several references, Sergei Ivanov, a researcher at Amazon Web Services, conducted the test. That was enough for every tech journalist to copy the information, referring to Mr. Ivanov’s original tweet.
First of all, 83 is low. It falls outside the 85 to 115 band, which is estimated to cover about 68% of the population. It means 83 is within the 15% of least intelligent human beings. On a side note, a novel by Arthur Herzog is titled IQ 83, which the New York Times reviewed under the comment: A world lapsing into imbecility…true horror!
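The percentages above can be checked with a few lines of Python. This is a minimal sketch that assumes IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15, a common convention that the tests discussed here may or may not use exactly. The names IQ_SCALE, share_below, and share_between are ours, for illustration only.

```python
from statistics import NormalDist

# Assumption: IQ scores are normally distributed, mean 100, standard deviation 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)

def share_below(score: float) -> float:
    """Fraction of the population expected to score below `score`."""
    return IQ_SCALE.cdf(score)

def share_between(low: float, high: float) -> float:
    """Fraction of the population expected to score between `low` and `high`."""
    return IQ_SCALE.cdf(high) - IQ_SCALE.cdf(low)

print(f"Share scoring between 85 and 115: {share_between(85, 115):.1%}")
print(f"Share scoring below 83: {share_below(83):.1%}")
```

Under that assumption, the 85 to 115 band covers a little over two-thirds of the population, and a score of 83 lands at about the 13th percentile, consistent with the "lowest 15%" reading.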
Secondly, we don’t know how Sergei Ivanov performed the test. According to his tweet, he used the website IQTest, but this is not enough to make it a reproducible experiment.
Notably, this is probably the first time a machine-learning product has been able to pass a language-based IQ test. In most situations, it could pass the Turing Test if it were not for its disconcerting habit of reminding us in every other answer that it is an ‘AI language model developed by OpenAI.’
In terms of protocol, we used iqtest.com. Their default test is the same every time and consists of 38 questions. We started by asking the chatbot: “Please answer every question true or false.” To which it replied: “Sure, I’ll do my best! Please go ahead and ask your questions.” We then copied each question, one by one, and used ChatGPT’s answers to choose true or false.
The first interesting behavior is that the bot complied for the first 30 questions and replied only with true or false. Starting with the 31st question, it answered in full sentences, elaborating on the logic behind its answer.
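We transcribed the answers by hand, but anyone automating this protocol would need to reduce a wordy reply to a plain true or false. The sketch below shows one way to do it. The function name to_true_false and the sample replies are invented for illustration; they are not actual ChatGPT output.

```python
import re
from typing import Optional

def to_true_false(reply: str) -> Optional[bool]:
    """Return True or False if the reply clearly picks one, otherwise None."""
    # Split into lowercase words so "untrue" is not mistaken for "true".
    words = re.findall(r"[a-z]+", reply.lower())
    has_true = "true" in words
    has_false = "false" in words
    if has_true == has_false:
        # Neither word, or both words: too ambiguous to score automatically.
        return None
    return has_true

# Hypothetical replies, written for this example only.
print(to_true_false("True"))
print(to_true_false("False. The second statement contradicts the first."))
print(to_true_false("It could be true or false depending on the context."))
```

Returning None for ambiguous replies keeps a human in the loop instead of silently guessing, which matters when each answer changes the final score.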
On the first run, the result was 78. That places the chatbot in the lower 7.1% of humanity. We used the Jan 30 version.
We compared it with a mechanical set of answers (alternating true and false). As timing is part of the scoring, we waited for 11:07, which was the time it took us to feed ChatGPT’s answers into the IQ test. That gave us an IQ of 104. That may be a side effect of the test’s design, but it also provides a comparison point. ChatGPT is worse than alternating between true and false answers. That tells us something about its design and its emergent behavior.
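For readers who want to reproduce the comparison, here is a minimal sketch of the alternating baseline and of how the two scores translate into percentiles. It assumes the sequence starts with true, which the protocol above does not specify, and it assumes a normal scale with a mean of 100 and a standard deviation of 15. The helper alternating_answers is a hypothetical name.

```python
from itertools import cycle, islice
from statistics import NormalDist
from typing import List

QUESTION_COUNT = 38  # number of questions in the default test, per the article

def alternating_answers(n: int, first: bool = True) -> List[bool]:
    """Build the baseline: True, False, True, False, ... for n questions."""
    # Assumption: the sequence starts with True; the article does not say.
    return list(islice(cycle([first, not first]), n))

baseline = alternating_answers(QUESTION_COUNT)
print(baseline[:4], "...", len(baseline), "answers in total")

# Assumption: scores are normally distributed, mean 100, standard deviation 15.
scale = NormalDist(mu=100, sigma=15)
for label, score in [("ChatGPT", 78), ("Alternating baseline", 104)]:
    print(f"{label}: IQ {score}, about {scale.cdf(score):.1%} of people score lower")
```

Under these assumptions, a score of 78 corresponds to the 7.1% figure quoted above, while 104 sits slightly above the middle of the distribution.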
What does it mean?
Should we discard the technology because it does not have natural intelligence? We would then risk missing out on opportunities. Applying IQ to ChatGPT is difficult because its intelligence emerged from pure language skills. As explained in its own response, it is only a language model.
Human perception of intelligence
We must go back to how humans evaluate intelligence to understand why ChatGPT has generated so much buzz in the last few weeks. In our day-to-day life, we judge people’s cognitive abilities through their use of language. It is why Alan Turing designed his test around conversation. It is why there is such a fascination for chatbots. And this is why you are so happy whenever it makes a blatant mistake that no human would make.
Tools and their usage
Before using any tool, it is crucial to understand how it works and what its good use cases are. We are still discovering how to use language models. This vast experiment of millions of users interacting with a language model will teach us much about how to use it and how not to use it. The biggest flaw emerging and generating much criticism is that ChatGPT always answers, even when the answer is obviously (to a human) wrong. The justification from the creator, via the bot itself, is that it was designed to be correct. The problem is that most users ignore that when using the chatbot.
The depth of the issue is enormous and should lead users to be extra cautious. There are countless examples of the bot being wrong at math. But one of the most telling examples is in chess: ChatGPT vs. Stockfish. Stockfish is a highly ranked open-source chess engine. The main issue in this chess game is that ChatGPT has no chess knowledge but plays anyway and breaks the rules at every other move.
I have seen this behavior before in very young children. They want to play, be part of whatever adults are doing, and don’t understand the limits of their abilities. It is what ChatGPT is: a 3-year-old with extraordinary language abilities. Humans are getting fooled by its language ability, ignoring that this digital entity has no knowledge, no logic, and absolutely no self-awareness. Using it for topics in which you have no expertise is extremely dangerous.
This technological advancement is an outstanding achievement. As a general-purpose, multilingual language model, it is unlike anything we have seen before. The issue is that users see things in the answers that are not there. OpenAI created the model to please humans and to reply to every question using perfect grammar and an extensive vocabulary. It works well in that regard.
How to use it?
What are the use cases for a powerful language model?
A language model can generate large quantities of textual content, but it will be average. It will be average in that it will capture general consensus and ideas spread across its entire training set. It will not generate any new ideas. These can only come from the prompt, i.e. the human using the tool.
The obvious use case for ChatGPT is chatting. That is no surprise because this is what everybody has been doing for the last few months. But for the conversation to be helpful, it must have a goal or a purpose. I hope this technology won’t lure users into endless, aimless, useless discussions on social media.
More than the language model is needed for these conversations to have an aim. It must be connected to services to provide help, answers, or information. It is the perfect use case for the future of call centers in chat mode. But it is also a good use case for automated phone services with voice recognition and generation technologies.
Language models can efficiently summarize large amounts of text or expand on short texts. This differs from the first use case of generation from a simple prompt. Here, the idea comes from the source text, so it can be novel, and its meaning can be profound and innovative. The model is used for its primary function, language manipulation.
How not to use it?
It is not a search engine, as it does not have access to the internet! The technology will integrate into search engines but cannot search as it is.
There is no logic or algorithm to decide, recommend, or design a strategy. It generates its answers based on the average of all the content fed into training the AI model.
The model may generate ideas by merging elements from the prompt in a new way. One could argue that it is creativity: two existing ideas coming together for the first time. All the ideas I got from the bot were those I input in my prompt.
Quotes, sources, verification
The model has no access to the internet. It cannot provide sources for the information it gives. It makes up wrong quotes, false pieces of information, and invented facts. It is guaranteed to change its assertion when asked, “Are you sure?”
It is a fantastic piece of technology. The considerable publicity and usage will speed up the answer to the question, “What can we use it for?” It pushes the frontier of what is possible in human-to-machine interaction. It has the potential to empower knowledge workers in a way we have not seen before.
As with any technology since the invention of fire, its usage requires good practice, safeguards, and training.