We took ChatGPT, Google’s AI called Bard, and the AI that most experts regard as the true ChatGPT challenger, Claude.ai. We gave them a simple and practical task involving research, evaluation and writing up their findings. We were surprised by the results.
Practical AI – Has ChatGPT gotten lazy? Can Claude.ai or Google’s Bard do better? Is free better than paid?
This is part of a series of AI use cases for businesses, designed to help you understand how to use current AI models in real-world examples.
In the world of AI language models, there’s a growing demand for practical applications that can assist with everyday tasks. One such task is conducting internet research and organizing the gathered information effectively. In this experiment, we explore how three AI models fare in handling this task: ChatGPT, Claude.ai, and Google Bard.
For those who want to use AI to do research, I’ve done a pretty exhaustive use case to help get you started. It’s long, but I wanted to give you all the details, including a rather shocking final point for those who read to the end (or just scroll down).
For those who just want the bottom line, here it is:
Top findings:
– ChatGPT has indeed gotten “lazy” even in the paid version
– Splitting the task into smaller components can get ChatGPT to do more detailed work, but it is time-consuming and you can use up your allotted processing time – even on the paid version
– The free version of ChatGPT was faster but limited in what it could do
– Claude.ai vastly outperformed ChatGPT, though it missed some details or used different sources.
– When Claude.ai was told of its error, it went back to redo the research and filled in almost all of the missing details
– Google’s Bard could not perform this task. This is hugely surprising given that Google should excel at internet search. But we’ve seen this many times before in our testing. Despite what Google keeps announcing, we have yet to find a task that Bard can do well.
And for those who want to read this post to the end, there is an absolutely shocking revelation about Google’s Bard.
The full use case:
I’m always looking for practical ways to use generative AI for common business-related tasks. I set out with a simple research task:
· take a list of items where I have some initial research
· organize my current notes
· research the internet and update what I have already collected
· put it into a format that I can use
· give me a suggestion of books I might read next
For a simple use case, I chose a list of books I’ve read in the past few months. I read a lot, and keeping track of the authors, their characters, and what I’ve already read makes it easier to choose new books and avoid accidentally re-reading one.
It’s simple, but it’s also the basis of what Kindle is doing when it suggests new books for me in a series I’ve already read. I can also imagine all kinds of uses for this in any business. Keeping track of what people have already bought and using that to make viable suggestions as to what they might need or want is a great use of a large language model.
The setup:
I gave the AI the list of books that I’d read, along with some notes I’d made. My notes consisted of lists of authors, some of the character names, and lists of what I’d read. The record of what I’d read was not fully up to date. Sometimes I had the titles. Sometimes just simple notes that I’d read a series up to a certain book or date. Here’s what I asked the AI model to do:
Step 1 – Let’s clean up the list we have
Take the information I had and organize it by author, by character and by books that I’d read.
Step 2 – Clean up the data and research what is missing
Go through my notes and fill in what is missing. List the books that I’ve already read in order in each series. I gave an example of where data was missing. For some authors I’d just listed books 1 to 8 with no other information.
Step 3 – Organize the list into a table that I could download
Step 4 – Based on what I’d read, suggest what other books I might want to read from my Kindle Unlimited subscription.
Step 5 – Take my notes from this testing and draft a report I can publish
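(A side note for readers who would rather script this than type it into a chat window: below is a minimal sketch of how the full five-step prompt might be sent through OpenAI’s Python SDK. The model name, file name, and prompt wording are my own illustrations, not a transcript of what I typed. Also note that the internet research in this experiment came from the chat product’s built-in browsing; the plain API call below only shows how to structure the whole task as a single prompt.)

```python
# Minimal sketch: sending the full five-step task as one prompt.
# Assumes OpenAI's Python SDK and an OPENAI_API_KEY in the environment;
# the model name, file name, and wording are illustrative.
from openai import OpenAI

client = OpenAI()

# Hypothetical file holding the raw reading notes described above.
with open("reading_notes.txt") as f:
    notes = f.read()

task = (
    "Here are my reading notes. "
    "Step 1: organize them by author, by character, and by books read. "
    "Step 2: research each series and fill in missing titles, in order. "
    "Step 3: present the result as a table I can download. "
    "Step 4: suggest similar books available on Kindle Unlimited. "
    "Step 5: draft a short report of what you did.\n\n" + notes
)

response = client.chat.completions.create(
    model="gpt-4-turbo",  # assumption: substitute whatever model you use
    messages=[{"role": "user", "content": task}],
)
print(response.choices[0].message.content)
```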
Here are the results:
1. ChatGPT – Has it gotten lazy?
It took the list and organized it with no problem. But at first it refused to do anything. This is a paid version of ChatGPT. Here’s what it said to me:
I’ve heard lots of comments that ChatGPT has gotten “lazy.” In fact, I’d experienced it myself. Sometimes it just gives a very short answer where once it would have given a much fuller explanation.
The remedy for this is to break the task into smaller components and have ChatGPT execute them one by one. This is also a way we have found to get a more accurate and reliable answer from ChatGPT. Normally, though, you can give it the full process step by step in one prompt and let it work through the steps.
In this case, I asked it to proceed author by author. But it did one author at a time and waited for me to tell it to proceed. This was not only time-consuming, but I was worried that it would forget what it had done to date. It’s a noted issue that ChatGPT will often “remember” the start and end of a session but forget things in the middle. So I had to keep getting it to refresh the original table.
It took more than an hour to get halfway through a list of 35 authors. Then I had to stop mid-project, when ChatGPT announced that I had exhausted the amount of processing it could do for me at this time and I’d have to start again in two or three hours. I will repeat: this is a paid version.
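If you want to script that workaround rather than babysit the chat, here is a minimal sketch of the author-by-author approach, assuming OpenAI’s Python SDK; the author names, model name, and prompt wording are placeholders. Re-sending the running table with every request is the scripted equivalent of asking the chat to keep refreshing it, and it sidesteps the “forgets the middle” problem.

```python
# Sketch of the "smaller components" workaround: one author per request,
# carrying the running table forward so nothing is lost mid-conversation.
# Assumes OpenAI's Python SDK; names, model, and prompts are illustrative.
from openai import OpenAI

client = OpenAI()

authors = ["Author A", "Author B"]  # placeholder for the list of 35 authors
running_table = "Author | Character | Books read\n"

for author in authors:
    prompt = (
        f"Here is the table so far:\n{running_table}\n"
        f"Now research {author}: list their series in order, "
        "mark which books I've read, and return the updated table only."
    )
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # assumption: substitute your model
        messages=[{"role": "user", "content": prompt}],
    )
    # Keep the latest table as the context for the next author.
    running_table = response.choices[0].message.content

print(running_table)
```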
It allowed me to continue with the free version (3.5), and surprisingly the earlier version was a lot faster. But it couldn’t go out to the internet, so it couldn’t find information on the authors I was reading – presumably they weren’t in its training data. It would be interesting to see how it might do on a research task covered by what it was trained on.
It did, however, suggest a good list of books that I could read next.
2. Claude.ai
I did the same test with the same prompts on the free version of Claude.ai. If you are in Canada, you’ll have to use a VPN, as Claude.ai is not available in all countries. It also checks your phone number to see if you are in the US, but if you use a VPN and log in with a Google account you can get past this.
Claude.ai not only took the full task, but it completed it in minutes. It searched the internet, updated the lists and gave me a full report.
But it missed some items. When I compared its output with the list from ChatGPT, a few entries were missing. This could be inaccuracy, but it could also just reflect that Claude.ai picked different sources.
I told Claude.ai that it had missed a few items and asked it to go back and check its answers. This is another classic way to ensure accuracy: ask the model to make sure it hasn’t missed anything. It apologized, went back to check, and found most of what it had missed. I spotted one book title that it didn’t get on this second round, but it was the most recent in the series, so it’s possible it found another list that was not fully up to date.
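For those scripting this, the “check your work” follow-up is just another turn in the conversation. Here is a minimal sketch, assuming Anthropic’s Python SDK and an ANTHROPIC_API_KEY in the environment; the model name and the placeholder messages are illustrative, not what I actually sent.

```python
# Sketch of the "go back and check" follow-up as a second conversation turn.
# Assumes Anthropic's Python SDK; model name and messages are illustrative.
import anthropic

client = anthropic.Anthropic()

history = [
    {"role": "user", "content": "Research and complete my book list: ..."},
    {"role": "assistant", "content": "<Claude's first answer goes here>"},
    {"role": "user", "content": (
        "You missed a few items. Go back, re-check each series against "
        "your sources, and list anything you left out the first time."
    )},
]

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumption: substitute your model
    max_tokens=2000,
    messages=history,
)
print(message.content[0].text)
```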
While Claude.ai did all this at lightning speed, when it finished it did say that I had only three prompts left. Doing this research was obviously resource-intensive – but remember, I’m using the free version of Claude.ai, so some restrictions are to be expected.
It also suggested a good list of books for me to read.
3. Google Bard – a huge disappointment
If ChatGPT has gotten lazy, Bard has been a continuing disappointment. It claims it “can’t search the internet.” This is strange, given that it’s put out by a company that has more experience in searching the internet than any other in the world.
Google continues to share videos of how intelligent its AI is going to be, but so far they have produced little in the way of results except well-edited demos.
Where you can get ChatGPT to be less lazy, Bard will not budge. It says, in effect, “search it yourself.”
Even when asked to provide suggestions on what to read next, it pushed more work back to the reader: it wanted me to tell it what I enjoyed about the books before it would try to answer.
Conclusion (and something shocking):
In any research, these models are only as reliable as their source data. It didn’t fit with this experiment because I started out with a list I’d researched myself, but I highly recommend that any time you use a large language model for research, you get it to identify its sources. That’s the only way to figure out whether it’s drawing on reliable data.
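In a scripted workflow, that source check is one more follow-up turn. A minimal sketch, again assuming OpenAI’s Python SDK; the placeholder messages and model name are illustrative.

```python
# Sketch: end a research session by asking the model to cite its sources.
# Assumes OpenAI's Python SDK; model name and messages are illustrative.
from openai import OpenAI

client = OpenAI()

follow_up = (
    "For every item in the table above, list the specific source "
    "(site name and URL) you used, so I can verify it myself."
)
response = client.chat.completions.create(
    model="gpt-4-turbo",  # assumption: substitute your model
    messages=[
        {"role": "user", "content": "<the original research prompt>"},
        {"role": "assistant", "content": "<the model's research answer>"},
        {"role": "user", "content": follow_up},
    ],
)
print(response.choices[0].message.content)
```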
ChatGPT did some amazing research in its early days, but it has clearly become “lazier,” even for paid users. OpenAI denies doing anything to throttle it back, but the issues are real. Some of them can be overcome by prompting.
When well prompted, ChatGPT is still the most accurate and writes exceptionally well. Claude.ai is far less “lazy” and takes a lot less time to process than ChatGPT. It has some minor inaccuracies, but careful prompting can reduce or eliminate them.
Google’s Bard failed at all of the research tasks we gave it. The final task was writing a simple blog post summarizing this experiment, and all three models did a good job of that summary. Bard’s summary was initially exceptional, except for one point: it lied about its results.
Here’s exactly what Bard said. Bear in mind that it simply did not do any of the tasks it was asked to do, and the notes it was given to summarize made that clear:
Even if Bard had given any results on its research, I’d be very careful in using them. Whether this was a simple “hallucination” or something in its training that makes it report itself as superior no matter what the results, it’s troubling to say the least.