AI Prices Are Falling Fast. Here's What SMBs Should Do Now.
DeepSeek price cuts and cache-hit pricing mean SMBs can run AI tools daily for a fraction of last year's cost. Here's what changed and how to act.
AI API costs have dropped dramatically in the past 12 months, and two developments explain most of it: DeepSeek's aggressive pricing on V3-Pro and the spread of cache-hit pricing across major providers. Together, these make daily AI usage genuinely affordable for small businesses. A task that cost $50/month in API fees in early 2024 can now run for under $5 with the right model and caching setup. That changes the math on what's worth building.
Why are AI prices dropping so fast right now?
AI pricing has fallen faster in the past year than almost anyone predicted. If you priced out an API-based workflow in early 2024 and shelved it because the costs didn't make sense, it's time to look again. The economics have shifted enough that tools which were marginal bets are now obvious ones.
Two specific developments are driving this, and understanding them helps you make better decisions about where to deploy AI in your business.
What did DeepSeek actually change about AI costs?
DeepSeek's release of its V3 model family forced a repricing moment across the industry. Their V3-Pro model delivers performance that benchmarks competitively with much more expensive models, at a fraction of the cost. According to reporting from Berea Online, DeepSeek's pricing on V3-Pro undercuts comparable Western models significantly, and the competitive pressure pushed providers like OpenAI, Anthropic, and Google to respond with their own cuts.
This isn't a niche development. When the cheapest credible option drops, everyone else adjusts or loses customers. That's what happened here, and SMBs are direct beneficiaries.
The competitive pressure DeepSeek created did more to lower AI costs for small businesses than any single product launch from a US provider.
For operators who don't want to route data through DeepSeek's infrastructure for policy or compliance reasons, the practical effect is still real: the price war they triggered brought costs down across the board, including on OpenAI and Anthropic APIs.
What is cache-hit pricing and why does it matter for repetitive business work?
Cache-hit pricing is the second big development, and it's underappreciated. Here's how it works: when you send a prompt to an AI API, the provider processes your input tokens to generate a response. If you send a very similar prompt again (same system instructions, same context), the provider can reuse the cached computation instead of starting from scratch. Cached tokens cost significantly less, often 50–90% less than standard input pricing.
For most business workflows, this is a big deal. Think about what repetitive AI tasks actually look like in practice:
- Drafting responses to customer emails using a consistent system prompt
- Running the same data extraction logic across new invoices each week
- Summarizing support tickets using a template you've already written
- Generating product descriptions from a fixed format
All of these tasks reuse large chunks of the same prompt context every single run. With cache-hit pricing, you pay full price once and a fraction of that for every repeat call. If you're running a workflow hundreds of times a month, the savings compound fast.
OpenAI introduced prompt caching for GPT-4o in late 2024, and Anthropic has offered it on Claude models as well. The feature exists. Most SMBs just haven't structured their prompts to take advantage of it.
How much cheaper has AI actually gotten in real numbers?
The price compression over the last 12–18 months is striking when you look at specific models. GPT-4-level capability that cost roughly $30 per million output tokens in early 2023 is now available for under $2 per million on comparable models, according to publicly available API pricing pages from OpenAI and Anthropic.
For context on what that means in practice: a workflow that processes 500 customer emails per month, averaging 800 tokens of input and 300 tokens of output per email, would have cost roughly $50–$60/month at 2023 pricing. At current rates, with caching on the system prompt, that same workflow runs under $5/month.
That's not a rounding error. That's a business case that didn't exist before.
Which AI tasks are now worth doing daily that weren't before?
The cost drop doesn't just make existing use cases cheaper. It makes new categories of use cases viable. When you're paying fractions of a cent per run, you can justify running AI on every transaction, every ticket, every lead, not just batching it weekly.
Here are the task categories where daily AI use now makes clear financial sense for most SMBs:
| Task Category | Example | Why It Works Now | |---|---|---| | Customer communication | Draft reply suggestions for support inbox | High prompt reuse = cache savings | | Document processing | Extract fields from invoices, contracts | Repetitive structure = cheap per-doc | | Lead qualification | Score and summarize inbound leads | High volume + low token count | | Internal reporting | Weekly summaries from CRM or ops data | Template-heavy = cacheable | | Content operations | First-draft generation from a brief | Consistent format = reusable prompt |
None of these are new ideas. What's new is that the cost-per-run no longer requires you to batch, throttle, or justify each call individually.
What should you actually watch out for when costs drop?
Cheaper isn't automatically better. A few things to keep in mind:
Model selection still matters. Cheaper models sometimes hallucinate more or follow instructions less reliably. The right move is to test the specific task you're automating on the model you're considering, not assume that cheap equals good enough.
Data routing is a real consideration. Some DeepSeek models route data through infrastructure subject to Chinese data law. If your workflows touch customer PII, contracts, or anything sensitive, verify where your data goes before optimizing for the lowest price.
Volume can create its own costs. Running AI on every event sounds great until you've got a runaway loop calling the API 10,000 times because of a bug. Set rate limits and cost alerts in your API dashboard before you automate anything at scale.
What we'd actually do
- Audit one repetitive workflow you're already doing manually and price it out. Pick something with consistent structure (email replies, invoice processing, lead notes) and calculate what it would cost to run via API at current rates with caching. Most SMBs are surprised how low the number is.
- Structure your prompts for caching. Put your static system instructions and context at the top of every prompt, before the variable input. This is the single easiest change that cuts API costs for repetitive tasks, often by 50% or more.
- Don't optimize for price before you've validated the output. Run your chosen model on 20–30 real examples from your business before you commit to a provider or build automation around it. Cheap and wrong is worse than slightly more expensive and reliable.
If you want to work through this with other SMB operators who are building real workflows, not just reading about them, that's exactly what we do at skool.com/aiforbusiness.
FAQ
How much have AI API prices actually dropped in the last year?
GPT-4-level capability that cost around $30 per million output tokens in early 2023 is now available for under $2 on comparable models. A workflow processing 500 emails per month that cost $50–$60 at 2023 rates can now run for under $5 with prompt caching applied correctly.
Is DeepSeek safe to use for business data?
It depends on your data. Some DeepSeek models route data through infrastructure subject to Chinese law. For workflows involving customer PII, contracts, or anything sensitive, verify data routing before using DeepSeek. For many SMBs, the smarter play is using DeepSeek-driven competition to negotiate better terms with US-based providers.
What is prompt caching and how do I use it?
Prompt caching lets providers reuse computation from repeated prompt prefixes, cutting your cost on those tokens by 50–90%. To use it, put all static context (system instructions, templates, background info) at the top of your prompt before the variable input. OpenAI and Anthropic both support this on their main models.
Want this running in your business?
The Skool community is where we show the full builds, share the templates, and help you implement. Three tiers, from team training to fractional AI expert.
- Weekly Q&A with Alex and Cameron
- Templates and frameworks you can steal
- Real builds, running in real businesses
More on AI Strategy
The Chamber's AI Hiring Guide: What SMBs Should Actually Use
The U.S. Chamber of Commerce released an AI hiring tools guide for small businesses. Here's what it recommends, what's missing, and what's worth acting on.
ChatGPT and Claude Prices Are Dropping. Now What?
OpenAI and Anthropic are both eyeing significant price cuts on AI tools. Here's what that means for the AI budget you're managing right now.
Microsoft Bundles Copilot Into M365 Plans: What SMBs Need to Know
Starting July 1, Microsoft embeds Copilot into Business Standard and Premium plans. Here's what changes on your bill and inside your daily workflows.