DeepSeek for dummies: AI trade isn't dying, but could be changing

Eli Rodney
January 28, 2025
Artificial Intelligence

A quick, non-tech primer on LLMs

Looking at LLMs through the lens of financial analysis, here’s what you need to understand.

From a cost perspective, LLM deployment has two distinct phases: training and inference.

In training, huge amounts of data are collected, processed, and fed to the model (pre-training) before it goes through fine-tuning and reinforcement learning phases. The training phase is incredibly expensive, but it's a short, one-time CapEx cycle.

When the trained model is actually deployed for people to use, the compute demand shifts to inference. Costs in this phase ramp alongside the real-world use of the model.
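For intuition, here's a minimal sketch of that cost structure with purely illustrative numbers (the training spend and per-token serving cost below are assumptions, not figures from any company): training is a fixed, one-time outlay, while inference spend scales with tokens served.

```python
# Minimal sketch of LLM cost structure (illustrative numbers only):
# training is a one-time outlay; inference scales with usage.
TRAINING_COST = 100e6          # assumed one-time training spend, USD
INFERENCE_PER_1M_TOKENS = 1.0  # assumed blended serving cost, USD

def total_cost(millions_of_tokens_served: float) -> float:
    """Cumulative spend after serving the given volume of tokens."""
    return TRAINING_COST + millions_of_tokens_served * INFERENCE_PER_1M_TOKENS

for served in (0, 50_000_000, 1_000_000_000):  # millions of tokens
    print(f"{served:>13,}M tokens served -> ${total_cost(served)/1e6:,.0f}M total")
```

Early in a model's life, training dominates the ledger; at scale, inference does, which is why real-world usage is what ultimately drives compute demand.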

As models get bigger & better, training cost goes vertical

GPT-3, the model family behind ChatGPT when it first launched in late 2022, is believed to have cost roughly $5M to train. As new LLMs have gotten bigger and better, these costs have increased by a factor of 100 or more.

Mark Zuckerberg’s comments on Meta’s Llama family of models illustrate this perfectly:

The amount of compute needed to train Llama 4 will likely be almost 10x more than what we used to train Llama 3.

We’re training the Llama 4 models on a cluster that is bigger than 100,000 H100s, bigger than anything that I’ve seen reported for what others are doing.

Mark Zuckerberg (META) - Q2/Q3 2024 earnings call

Let’s do some napkin math here.

While pricing isn’t explicitly disclosed, a single H100 GPU is believed to cost roughly $30K. Let’s assume Meta has some negotiating power and can bring that down to $20K. That’s more than $2B of GPUs alone being used to train Meta’s next series of LLMs.
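As a sanity check, here's that napkin math in a few lines of Python; both per-GPU prices are the street estimates above, not disclosed figures:

```python
# Napkin math from above: estimated hardware cost of Meta's Llama 4
# training cluster. Both prices are estimates, not disclosures.
num_gpus = 100_000               # "bigger than 100,000 H100s" (Zuckerberg)
est_list_price = 30_000          # commonly cited H100 price estimate, USD
assumed_discount_price = 20_000  # assumed volume pricing for Meta

print(f"At list:    ${num_gpus * est_list_price / 1e9:.1f}B")          # $3.0B
print(f"Discounted: ${num_gpus * assumed_discount_price / 1e9:.1f}B")  # $2.0B
```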

The appetite for GPUs doesn’t seem to be slowing down either, with the CEO of Anthropic (another AI lab) suggesting that big tech GPU procurement could reach the millions of units by 2026.

DeepSeek makes industry rethink CapEx

Chinese company DeepSeek launched a series of models recently, including its general purpose v3 LLM and its more advanced R1 reasoning model.

The performance of both models is on par with comparable models from OpenAI:

[Figure: benchmark comparison showing DeepSeek on par with, or better than, OpenAI in all categories. Source: arXiv]

That isn’t what has the investment world going crazy, though; it’s the cost.

DeepSeek claims the final training run for its v3 model cost less than $6M and took just 55 days to complete, a small fraction of the CapEx figures quoted by big U.S. tech companies.

Its API costs reflect this (all prices in USD per million tokens):

| Model Type | Cost Type | DeepSeek | OpenAI |
| --- | --- | --- | --- |
| General Purpose | Input | $0.14/M | $2.50/M |
| General Purpose | Output | $0.28/M | $10.00/M |
| Reasoning | Input | $0.55/M | $15.00/M |
| Reasoning | Output | $2.19/M | $60.00/M |
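To make the gap concrete, here's a minimal sketch of what those prices imply for a hypothetical workload. The request volume and token counts are assumptions; only the per-million-token prices come from the table above:

```python
# Hypothetical workload: 1M requests/month, ~1,500 input and 500
# output tokens per request, on the general-purpose models above.
requests_per_month = 1_000_000
input_m = requests_per_month * 1_500 / 1e6   # input tokens, in millions
output_m = requests_per_month * 500 / 1e6    # output tokens, in millions

def monthly_bill(price_in: float, price_out: float) -> float:
    """USD per month at the given per-million-token prices."""
    return input_m * price_in + output_m * price_out

print(f"DeepSeek v3: ${monthly_bill(0.14, 0.28):>8,.0f}/month")   # $350
print(f"OpenAI:      ${monthly_bill(2.50, 10.00):>8,.0f}/month")  # $8,750
```

At these prices, the same workload is roughly 25x cheaper, which is the kind of gap that forces a rethink of application-layer unit economics.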

But as always, there is important nuance missing from the headlines.

DeepSeek’s v3, while a testament to the efficiency gains still left on the table, isn’t an apples-to-apples cost comparison. The ~$6M figure covers only the final training run; it excludes prior research and ablation experiments, and it assumes rented rather than owned hardware.

The multi-billion dollar training budgets from big tech are being deployed to push the edge of what’s possible, while DeepSeek is taking the approach of optimizing what already exists. Both are important, but they’re not comparable.

Mental framework: LLM = commodity

The bottom line is that LLMs are a commodity, and efficiency gains in the underlying models unlock previously uneconomic use cases. To cement this in your head, think about oil:

When the price of oil rises, so does the price of value-added products that depend on it, hurting end-market demand.

If technology emerges that enables more efficient extraction of oil (e.g., modern fracking techniques), new supply is unlocked, allowing producers of value-added products to lower prices while maintaining margins. This drives increased demand from end markets and, in turn, increased demand for oil.

AI is undergoing a similar shift right now.

With elevated training costs for new models, the cost of value-added products (software) that leverage AI must also be high, and some products may not be economically feasible to build at all.

In this analogy, DeepSeek’s more efficient training techniques are the equivalent of fracking technology. The cost of value-added products should come down as a result, and entirely new categories of products may now be economically feasible, increasing demand for inference.
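Here's an illustrative sketch of that feasibility shift, using the reasoning-model prices from the table above. The per-call token counts and the $0.02 of monetizable value are assumptions chosen for illustration:

```python
# Feasibility check for a hypothetical AI feature, priced with the
# reasoning-model rates above (USD per 1M tokens). Token counts and
# the value-per-call figure are assumptions, not real product data.
TOKENS_IN, TOKENS_OUT = 2_000, 1_000
VALUE_PER_CALL = 0.02  # assumed revenue attributable to the feature

def cost_per_call(price_in: float, price_out: float) -> float:
    return TOKENS_IN / 1e6 * price_in + TOKENS_OUT / 1e6 * price_out

for name, p_in, p_out in [("OpenAI", 15.00, 60.00), ("DeepSeek R1", 0.55, 2.19)]:
    cost = cost_per_call(p_in, p_out)
    verdict = "viable" if cost < VALUE_PER_CALL else "uneconomic"
    print(f"{name:<12} ${cost:.4f}/call -> {verdict}")
# OpenAI       $0.0900/call -> uneconomic
# DeepSeek R1  $0.0033/call -> viable
```

The same feature that loses money per call at one price point becomes profitable at the other; multiply that across thousands of product ideas and you get more inference demand, not less.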

We’re already seeing signs that this is true. Don’t take it from us; listen to Satya Nadella explain why Microsoft isn’t just renting out GPUs:

I mean the good news for us is that we're not waiting for that inference to show up, right? If you sort of think about the point we even made that this is going to be the fastest growth to $10 billion of any business in our history, it's all inference, right?

…we are literally going to the real demand, which is in the enterprise space or our own products like GitHub Copilot or M365 Copilot.

Satya Nadella (MSFT) - Q1 2025 earnings call

Who will benefit most from cheaper AI?

As the cost of leveraging LLMs in software falls, capital should flow to the application layer, where companies are using the now cheaper technology to create value.

Here are some Canadian names we think are well positioned to take advantage of this shift:

  • Shopify (SHOP): Management has talked about leveraging AI in all facets of the business for a number of quarters, both internally and externally. Given the company’s scale, the potential impact of these initiatives on margins and revenue is huge.

  • OpenText (OTEX): The company has leaned heavily into AI development via its Aviator line of products, which layer LLM-powered capabilities on top of its clients’ enterprise data.

  • Constellation (CSU): As a serial acquirer of vertical market software companies with a track record of cost discipline, Constellation should be able to leverage LLMs to strip out costs and enhance its product offerings.

Disclaimer: Bullpen Finance Inc. is not a registered investment advisor. The information provided is for educational purposes only and should not be considered investment advice. See our terms of service for more information.