In 2022, the release of a revolutionary technology shocked millions of people around the world. In a development long foreseen by science fiction, the release of intelligent AI chatbots gave people a whole new way to approach problems. ChatGPT’s ability to respond instantaneously, accurately, and in a jarringly human fashion cemented its popularity across the internet. Now, over two years after ChatGPT’s initial release, a new, similarly fashioned Chinese AI assistant has once again captured the internet’s fascination. Despite ChatGPT’s access to more powerful AI chips and its longer, more costly development, many AI users describe DeepSeek as comparable to today’s highest-ranked AI tool. Some have even declared it better than ChatGPT, which would make DeepSeek truly revolutionary in the world of AI development. But how do they really hold up against one another? Is DeepSeek really better than ChatGPT?
Mechanically, DeepSeek and ChatGPT are both large language models (LLMs) built with machine learning. This refers to the practice of feeding a program an incredibly large amount of data, to the point where it can identify features of that data by itself. In the same way that showing a baby a million examples of a rectangle would help them identify rectangles, showing an LLM a trillion examples of sentences helps it build its own. Both DeepSeek and ChatGPT contain a massive number of parameters, which help them identify the variables within a prompt and in turn generate a matching response. In this article, ChatGPT-4o, with a reported one trillion parameters, will face off against DeepSeek-v3, with 671 billion parameters. It is also notable that ChatGPT and DeepSeek have different strategies for generating responses. DeepSeek employs a mixture-of-experts approach, which essentially means referring each task only to the most qualified experts: for any given input, only a small subset of the model’s parameters is activated. ChatGPT, on the other hand, uses a more traditional dense transformer model, where every “expert” works on every task. This makes ChatGPT more consistent, albeit at the cost of some of its efficiency.
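To make the routing distinction concrete, here is a toy sketch in plain Python. The “experts” are made-up simple functions and the gate scores are invented for illustration; this is not either model’s actual architecture, just a minimal picture of how a mixture-of-experts layer calls only its top-scoring experts while a dense layer runs every expert on every input:

```python
import math

def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, experts, gate_scores, top_k=2):
    """Mixture-of-experts: only the top_k highest-scoring experts run."""
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([gate_scores[i] for i in chosen])
    # Experts outside `chosen` are never called at all, which is
    # where the efficiency savings come from.
    return sum(w * experts[i](token) for w, i in zip(weights, chosen))

def dense_layer(token, experts):
    """Dense-model analogue: every expert processes every input."""
    return sum(expert(token) for expert in experts) / len(experts)

# Three hypothetical "experts", each just a simple function.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

# With top_k=1, only the expert with the highest gate score runs.
print(moe_layer(3, experts, [0.1, 5.0, 0.2], top_k=1))  # only x * 2 runs
print(dense_layer(3, experts))                          # all three run
```

In a real mixture-of-experts model the experts are neural network layers and the gate scores are themselves learned, but the core trade-off is the same as in the sketch: less computation per input, in exchange for behavior that can vary depending on which experts get picked.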
I settled on three subjective tests that judge each AI in a variety of fields. These tests were judged by only the most reliable of judges—my friends—and the identities of the AI were not divulged until after the grading had been completed. Here are the results.
Category 1: Writing
I asked both programs to respond to the following prompt: Write me a complete short story (~1,000 words) that is scary and ends in a huge twist.
In my opinion, this was the task that both programs performed the best on. Starting with ChatGPT, the AI decided that a strange reflection would make the perfect basis for its story. The plot centered around a woman whose reflection in the mirror didn’t match the reality she was in. As the plot progressed, the discrepancies increased in magnitude until her reflection went missing at the end of the story. DeepSeek settled on a more traditional story: a haunted house (very original) emanating whispers from its walls. The main character, a news reporter, decides to venture into the house to try and land a good story, but what she finds instead is an animated doll that chases her out. Finally, in classic horror story fashion, the protagonist reinvestigates the scene of the mystery and ends up a victim herself, consumed by the walls of the house and joining the voices that guided her there in the first place. The judges found DeepSeek’s story to be more fleshed out and scarier overall, and it earned the higher score of 7.4 out of 10. Judges mostly found ChatGPT’s story (rated 6.5 out of 10) to be confusing and lacking a clear ending, with one judge saying, “I think Emily (the protagonist in ChatGPT’s story) is [crazy].” Perhaps ChatGPT’s larger parameter count contributed to a more scrambled story. The judges also enjoyed the twist a lot more in DeepSeek’s story, and thus felt that it answered the prompt a lot better. With that, the first point goes to DeepSeek.
Category 2: Cooking
I next asked the following: Generate a short recipe with detailed ingredients and steps to make a complete pasta dish.
Interestingly, both AI decided to go with some variant of a creamy garlic parmesan pasta, so I guess those searches must be quite popular online. The recipes were strikingly similar, but DeepSeek added slightly more variety by incorporating spinach whereas ChatGPT didn’t. After getting ingredients and collecting four panelists, we started by making ChatGPT’s dish. Judges found that the pasta was palatable, although not anything close to what you might find at a restaurant. They particularly thought that the recipe was too cheesy, to the point where the cheese overpowered the rest of the elements. The judges also thought that the texture was chewier than they’d hoped, and the flavors weren’t homogeneous throughout the dish. Although ChatGPT’s recipe had a nice simplicity to it, it was pretty bland overall. By contrast, the judges found DeepSeek’s recipe to be a lot more balanced in terms of its flavors and textures. It wasn’t too cheesy, and it was creamier than ChatGPT’s. DeepSeek’s dish was also described as better seasoned and more consistent in terms of flavor. The conclusion was that DeepSeek’s dish was better, with a rating of 7.25 out of 10 in comparison to ChatGPT’s score of 6.25 out of 10. With this, the second point also goes to DeepSeek. (Writer’s note: Our human attempted to make his own pasta dish that he claimed would easily beat both of the AI creations. His dish was unanimously voted as the worst of the three without the identity of the chef even being revealed. AI takeover?)
Category 3: Music
The next prompt was: Write me a creative music piece that captures the atmosphere of winter.
This was the task that both AI struggled the most with, which makes sense; with fewer sources, music is much harder to adapt directly from the internet compared to writing. The overall consensus from the judges was that both AI were horrendous at making tunes—but which one was worse? ChatGPT decided to adopt an incredibly basic, repeating pattern, moving stepwise in alternating directions. Most judges failed to see the resemblance to winter due to its use of only one note value: the eighth note. However bad ChatGPT was at creating music, though, DeepSeek did unimaginably worse. DeepSeek’s music was voted unanimously as dreadful, consisting almost entirely of stepwise motion. The slower tempo made the piece feel even more boring and “copy-pasted,” and judges thought that the rests at the end felt incredibly out of place. The scores came out to 4.4 for ChatGPT and 2.2 for DeepSeek out of ten; although, based on this competition, it can be concluded that AI platforms in general do not make great musicians.
So yes, DeepSeek is better than ChatGPT, at least for the specific criteria I tested. It’s for sure better than me at writing. ChatGPT has taken a massive plummet and fallen off into obscurity, never to see the light of day again. Or has it? After extensively using both of these AI (with only pure intentions), here are my official conclusions: DeepSeek is better than ChatGPT at executing tasks, but ChatGPT was a lot more receptive to feedback when I asked it to alter its responses. At times, DeepSeek would just stop responding, which grew very annoying. Its responses also came with certain restrictions due to a much stronger self-censor. What does this all mean, though? Maybe the next time you go cooking, and for whatever reason want to use an AI recipe instead of finding one on the internet, you might decide to use DeepSeek over ChatGPT. In the end, though, this article was written mainly for fun, so use whichever AI you want (with pure intentions only, of course), whether that be DeepSeek, ChatGPT, or some crazy new AI that will inevitably pop up after this article is published.