How does NumStack calculate AI token costs?

NumStack uses real published pricing from OpenAI, Anthropic, Google, and other AI providers. Enter your typical usage (input tokens, output tokens, requests per day) to see projected monthly costs across all models.

Why do different AI models have different token prices?

Different models have different computational costs. Smaller models like GPT-4o mini or Haiku are cheaper but less capable. Larger models like GPT-4o or Claude Sonnet are more powerful but cost more per token.

How can I reduce my AI API costs?

Use smaller models for simple tasks, cache repeated prompts, trim output length (output tokens cost 3-5x more than input), and batch requests instead of real-time when latency allows.

AI Cost Optimization Handbook

You're building an AI product. You found a great model. You shipped it to users. Then the bill comes: $5,000. Or $50,000. You weren't prepared. This guide shows you how to forecast AI costs before they surprise you, and how to optimize them as you scale.

Why AI Costs Are Hard to Calculate

When you use traditional cloud services (AWS, Google Cloud), costs are predictable:

Virtual machine? $0.05/hour, always
Database storage? $0.023/GB/month, always
Bandwidth? Consistent pricing, always

AI APIs are different. The problem: Your cost depends on what you do, not just how long you run. With OpenAI's GPT-4, a 10-word prompt costs less than a 1,000-word prompt. A 100-token response costs less than a 10,000-token response. Your monthly bill is the sum of thousands of tiny transactions. Without understanding token economics, you can't budget.

What Are Tokens?

Let's start with the basics.

Tokens are the unit of measurement for language models. Think of them as "chunks of text."

1 token ≈ 4 characters of English text
1 token ≈ 0.75 words
100 tokens ≈ 75 words

So if you send a 500-word prompt and get back a 200-word response:

Input tokens: ~667
Output tokens: ~267
Total tokens: ~934

You pay for both input and output, and output tokens typically cost 3-5x more than input tokens (3x for GPT-4o and 5x for Claude 3.5 Sonnet at the prices listed below).
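As a quick sanity check, the word-to-token rule of thumb can be turned into a small helper. This is only a rough sketch based on the "100 tokens ≈ 75 words" approximation; `estimate_tokens` is a hypothetical name, and a provider's real tokenizer will give different counts, especially for code or non-English text.

```python
import math


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from a word count (100 tokens per 75 words)."""
    return math.ceil(word_count * 100 / 75)


prompt_tokens = estimate_tokens(500)    # 500-word prompt
response_tokens = estimate_tokens(200)  # 200-word response
print(prompt_tokens, response_tokens, prompt_tokens + response_tokens)
# prints: 667 267 934
```

For billing-grade numbers, count tokens with the provider's own tokenizer rather than this approximation.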

How Pricing Works

Here's where it gets complex. Different providers charge differently:

OpenAI (GPT-4, GPT-4o)

GPT-4o: $5 per 1M input tokens, $15 per 1M output tokens
GPT-4 Turbo: $10 per 1M input tokens, $30 per 1M output tokens
Example: 1,000 requests/month × (500 input tokens + 200 output tokens) = 500,000 input tokens + 200,000 output tokens. On GPT-4o that is $2.50 + $3.00 = $5.50/month

Anthropic (Claude)

Claude 3.5 Sonnet: $3 per 1M input tokens, $15 per 1M output tokens
Claude 3 Opus: $15 per 1M input tokens, $75 per 1M output tokens
Example: Same usage on Claude 3.5 Sonnet = $1.50 + $3.00 = $4.50/month (cheaper than GPT-4o)

Google (Gemini)

Gemini 1.5 Flash: $0.075 per 1M input tokens, $0.30 per 1M output tokens
Gemini 1.5 Pro: $1.25 per 1M input tokens, $5 per 1M output tokens
Example: Same usage on the Flash model = ~$0.04 + $0.06 = ~$0.10/month (much cheaper)

The key insight: Model choice can change your costs by 10x or more, even with identical usage patterns.
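To compare models on the same workload, a small lookup table is enough. The sketch below copies the per-1M-token prices listed above into a dictionary (the `PRICES` table and `usage_cost` helper are illustrative names, and published prices change often, so treat the numbers as a snapshot).

```python
# USD per 1M tokens as (input, output), copied from the price lists above.
PRICES = {
    "GPT-4o": (5.00, 15.00),
    "GPT-4 Turbo": (10.00, 30.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3 Opus": (15.00, 75.00),
    "Gemini Flash": (0.075, 0.30),
    "Gemini 1.5 Pro": (1.25, 5.00),
}


def usage_cost(model, requests, input_tokens, output_tokens):
    """Cost in USD for `requests` calls with the given tokens per call."""
    in_rate, out_rate = PRICES[model]
    input_cost = requests * input_tokens * in_rate / 1_000_000
    output_cost = requests * output_tokens * out_rate / 1_000_000
    return input_cost + output_cost


# Same workload for every model: 1,000 requests, 500 input + 200 output tokens.
for model in PRICES:
    print(f"{model}: ${usage_cost(model, 1_000, 500, 200):.2f}")
```

With these prices the same 1,000 requests cost $5.50 on GPT-4o, $4.50 on Claude 3.5 Sonnet, and about $0.10 on the Flash model.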

Hidden Costs You're Not Thinking About

Beyond token pricing, watch for:

1. Context Window Costs

If you use a model with a larger context window, you might send more context (older messages, longer documents), which increases input tokens. Example: A model with a 16K context window might carry 4K tokens of previous conversation history on every request, while a smaller window forces you to summarize. More context = more tokens = higher bill.

2. Tool Call Overhead

When your AI agent uses tools (API calls, database queries), each tool invocation + result adds tokens. Example: An agent that calls 5 tools per request × 200 tokens/tool = 1,000 extra tokens per request. At scale, this multiplies fast.
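To put a dollar figure on that overhead, here is a minimal sketch. It assumes the extra tool tokens are billed at the input rate, borrows GPT-4o's $5 per 1M input price from the list above, and uses an illustrative volume of 100,000 requests/month; `tool_overhead_cost` is a hypothetical helper, not part of any provider SDK.

```python
def tool_overhead_cost(requests, tools_per_request, tokens_per_tool,
                       input_rate_per_million):
    """Return (extra tokens, extra USD) caused by tool calls in one month."""
    extra_tokens = requests * tools_per_request * tokens_per_tool
    extra_cost = extra_tokens * input_rate_per_million / 1_000_000
    return extra_tokens, extra_cost


# Assumed volume: 100,000 requests/month, 5 tools per request, 200 tokens per tool.
extra_tokens, extra_cost = tool_overhead_cost(100_000, 5, 200, 5.00)
print(f"{extra_tokens:,} extra tokens = ${extra_cost:.2f}/month")
# prints: 100,000,000 extra tokens = $500.00/month
```

In practice some of those tokens are tool arguments generated by the model and billed at the higher output rate, so this is a lower bound.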

3. Retry Costs

If your prompt times out or fails, you retry and pay again. Example: A 5% failure rate on 100,000 requests/month = 5,000 failed requests you're re-running. If failed attempts are billed, that's roughly an extra 5% on your bill.
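The 5% figure is a first-order estimate, because a retry can fail too. A slightly more careful sketch treats retries as a geometric series. It assumes failures are independent, every failed attempt is billed in full, and each request is retried until it succeeds; `expected_attempts` is an illustrative helper name.

```python
def expected_attempts(failure_rate: float) -> float:
    """Average billed attempts per successful request when failures are retried."""
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1 / (1 - failure_rate)


requests = 100_000
billed_attempts = requests * expected_attempts(0.05)
print(f"extra attempts: {billed_attempts - requests:,.0f}")
# prints: extra attempts: 5,263
```

The difference from the simple 5,000 estimate is small at a 5% failure rate but grows quickly as reliability drops.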

4. Vector Embeddings (if using RAG)

If you embed documents for search, each embedding costs money. Example: Embedding 10,000 documents at OpenAI's rates (~$0.02 per 1M tokens) = ~$0.20 if documents average 1K tokens each.
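The embedding arithmetic is the same per-token calculation with a single rate. The sketch below uses the ~$0.02 per 1M tokens figure from the example; `embedding_cost` is a hypothetical helper, and the rate is an assumption to replace with your provider's current price.

```python
def embedding_cost(num_documents, avg_tokens_per_document,
                   rate_per_million=0.02):
    """USD to embed a corpus once at the given per-1M-token rate."""
    total_tokens = num_documents * avg_tokens_per_document
    return total_tokens * rate_per_million / 1_000_000


print(f"${embedding_cost(10_000, 1_000):.2f}")
# prints: $0.20
```

This is the cost of embedding the corpus once; re-embedding documents when they change, and embedding each search query, adds to it.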

5. Function Calling Complexity

When you give an AI model access to many functions/tools, the model needs more context just to understand what's available. More context = higher input token cost.

How to Calculate Your True Monthly Cost

Here's the step-by-step process:

Step 1: Estimate Your Monthly Requests

How many user requests will you get per month?
Be conservative (assume 1/10 of what you think initially)

Example: Building a chat summarization tool. You estimate 10,000 summaries/month.

Step 2: Estimate Input Tokens Per Request

What's the average length of user input?
Do you include context (previous messages, documents)?

Example: Average user input = 300 tokens (about 225 words). You include 100 tokens of context from prior conversation. Total input: 400 tokens.

Step 3: Estimate Output Tokens Per Request

How long is the AI response?
Plan for variation (some requests need 500 tokens, others 1,000)

Example: Average summary = 150 tokens.

Step 4: Calculate Total Tokens/Month

(Input tokens + Output tokens) × Requests/month = (400 + 150) × 10,000 = 5,500,000 tokens/month

Step 5: Apply Pricing

Pick your model's price per token, multiply: (Input tokens × input rate) + (Output tokens × output rate)

Example with OpenAI GPT-4o:

Input: 4,000,000 × $0.000005 = $20
Output: 1,500,000 × $0.000015 = $22.50
Total: $42.50/month

Step 6: Add Hidden Costs

Tool calls: +10% = $4.25
Retries (5%): +$2.12
Revised total: $48.87/month
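Steps 1 through 6 can be collapsed into one function. This is a minimal sketch under the same assumptions as the worked example (GPT-4o at $5/$15 per 1M tokens, tool overhead and retries modeled as flat percentages of the base cost); the name `monthly_cost` and its parameters are illustrative.

```python
def monthly_cost(requests, input_tokens, output_tokens,
                 input_rate, output_rate,
                 tool_overhead=0.0, retry_rate=0.0):
    """Rates are USD per 1M tokens; overheads are fractions of the base cost."""
    input_cost = requests * input_tokens * input_rate / 1_000_000
    output_cost = requests * output_tokens * output_rate / 1_000_000
    base = input_cost + output_cost
    tools = base * tool_overhead   # Step 6: tool call overhead
    retries = base * retry_rate    # Step 6: retried requests
    return {
        "input": input_cost,
        "output": output_cost,
        "base": base,
        "total": base + tools + retries,
    }


# Steps 1-3: 10,000 requests, 400 input and 150 output tokens each.
estimate = monthly_cost(10_000, 400, 150, 5.00, 15.00,
                        tool_overhead=0.10, retry_rate=0.05)
for label, value in estimate.items():
    print(f"{label}: ${value:.3f}")
```

Keeping the overheads as explicit parameters makes it easy to rerun the forecast once you have measured real tool usage and failure rates.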

How to Optimize (Save 50%+)

Once you know your costs, optimize:

1. Right-Size Your Model

Don't use GPT-4o if GPT-4o Mini does the job.

Benchmark: Test multiple models on your use case
Compare: Cost per request vs quality of output
Decide: Smaller models such as GPT-4o Mini or Claude 3.5 Haiku are often good enough for simpler tasks, at a much lower per-token price than their larger siblings

Savings: 30-50%

2. Reduce Input Tokens

Compress context (don't send the whole conversation, just relevant parts)
Use smaller summaries instead of raw documents
Implement caching for repeated prompts

Savings: 20-40%
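Caching repeated prompts can be as simple as keying responses by a hash of the prompt. The sketch below is an application-level cache, not a provider feature: `PromptCache` is a hypothetical class, and `fake_model` stands in for whatever function actually calls your model API.

```python
import hashlib


class PromptCache:
    """In-memory cache that skips the API call for prompts seen before."""

    def __init__(self, call_model):
        self._call_model = call_model  # any function: prompt -> response text
        self._store = {}
        self.hits = 0
        self.misses = 0

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1           # served from cache, no tokens billed
            return self._store[key]
        self.misses += 1             # first time seen, pay for the call
        response = self._call_model(prompt)
        self._store[key] = response
        return response


def fake_model(prompt: str) -> str:
    # Placeholder for a real API call.
    return f"response to: {prompt}"


cache = PromptCache(fake_model)
for prompt in ["reset password", "refund policy", "reset password"]:
    cache.complete(prompt)
print(cache.hits, cache.misses)
# prints: 1 2
```

An exact-match cache only helps when identical prompts recur; in production you would also want an expiry policy and a shared store rather than a per-process dictionary.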

3. Reduce Output Tokens

Ask the model to be concise ("Respond in 2-3 sentences")
Use structured output (JSON templates reduce rambling)
Cap output with a max-tokens limit (500 tokens, not 2,000), and add stop sequences where a natural end marker exists

Savings: 15-30%

4. Batch Requests

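Move work that doesn't need an immediate answer (nightly summaries, bulk classification) to a provider's asynchronous batch API instead of calling in real time
OpenAI and Anthropic, among others, offer batch APIs at a 50% discount, with results typically returned within 24 hours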

Savings: 50% (if volume justifies it)
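The full 50% only applies to the share of traffic that can wait for a batch job. A rough sketch of the blended effect follows; `blended_cost` is an illustrative helper, the 50% discount is the figure quoted above, and the 60% batchable share is an assumption made up for the example.

```python
def blended_cost(base_cost, batch_share, batch_discount=0.50):
    """Monthly cost when `batch_share` of the spend moves to a discounted batch API."""
    if not 0 <= batch_share <= 1:
        raise ValueError("batch_share must be between 0 and 1")
    return base_cost * (1 - batch_share * batch_discount)


# Assumed: a $42.50/month workload where 60% of requests can be batched.
print(f"${blended_cost(42.50, 0.60):.2f}")
# prints: $29.75
```

Here moving 60% of the spend to batch cuts the bill by 30%, not 50%.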

5. Use a Cheaper Model for Certain Tasks

Simple tasks (classification, routing): Use a cheap model (Gemini Flash, GPT-4o Mini)
Complex tasks (reasoning, creativity): Use an expensive model (Claude Opus, GPT-4 Turbo)

Savings: 60-80% on high-volume tasks
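A routing layer like this can start as a plain lookup on task type. The sketch below is one possible shape, not a recommended architecture: the task labels, the 80/20 traffic mix, and the choice of Gemini Flash and GPT-4 Turbo prices from the lists above are all assumptions made for illustration.

```python
# USD per 1M tokens as (input, output), taken from the price lists above.
CHEAP = (0.075, 0.30)       # Gemini Flash
EXPENSIVE = (10.00, 30.00)  # GPT-4 Turbo

SIMPLE_TASKS = {"classification", "routing"}  # illustrative task labels


def pick_rates(task_type: str):
    """Send simple tasks to the cheap model, everything else to the expensive one."""
    return CHEAP if task_type in SIMPLE_TASKS else EXPENSIVE


def request_cost(task_type, input_tokens, output_tokens):
    in_rate, out_rate = pick_rates(task_type)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Assumed mix: 8,000 simple and 2,000 complex requests per month,
# each with 400 input and 150 output tokens.
workload = [("classification", 8_000), ("reasoning", 2_000)]
routed = sum(n * request_cost(task, 400, 150) for task, n in workload)
unrouted = sum(n * request_cost("reasoning", 400, 150) for _, n in workload)
print(f"routed ${routed:.2f} vs all-expensive ${unrouted:.2f} "
      f"({1 - routed / unrouted:.0%} saved)")
# prints: routed $17.60 vs all-expensive $85.00 (79% saved)
```

The saving depends almost entirely on how much traffic is genuinely simple, so measure the mix before committing to a number.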

Real-World Example: Building a Chatbot

Let's say you're building a customer support chatbot. Here's how to forecast costs:

Assumptions:

100 customers
Average 5 support conversations/customer/month = 500 conversations/month
Average conversation: 4 exchanges (customer asks, AI responds, repeat)
Total API calls: 2,000/month

Input Tokens Per Call:

Customer message: 50 tokens (average)
System prompt (instructions): 200 tokens
Previous conversation context: 100 tokens
Total input: 350 tokens

Output Tokens Per Call:

AI response: 150 tokens average

Cost Calculation (OpenAI GPT-4o):

Input: 2,000 × 350 × $0.000005 = $3.50
Output: 2,000 × 150 × $0.000015 = $4.50
Monthly cost: $8

But with optimization:

Switch to GPT-4o Mini (same quality for support): -50%
Compress context (cache previous messages better): -20%
Optimized cost: $3.20/month

That's 60% savings on a small use case. At 10,000 conversations/month (20x the volume), the same optimizations would save about $96/month ($1,152/year).
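The chatbot forecast can be reproduced in a few lines, which also makes it easy to rerun at other volumes. This sketch uses the assumptions stated above (4 exchanges per conversation, 350 input and 150 output tokens per call, GPT-4o at $5/$15 per 1M tokens); the function names are illustrative.

```python
def chatbot_monthly_cost(conversations, exchanges=4, input_tokens=350,
                         output_tokens=150, input_rate=5.00, output_rate=15.00):
    """Monthly USD cost; rates are per 1M tokens."""
    calls = conversations * exchanges
    per_call = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return calls * per_call


def apply_reductions(cost, reductions):
    """Apply each fractional reduction in turn; savings compound, they do not add."""
    for reduction in reductions:
        cost *= 1 - reduction
    return cost


baseline = chatbot_monthly_cost(500)
optimized = apply_reductions(baseline, [0.50, 0.20])  # smaller model, then less context
print(f"baseline ${baseline:.2f}, optimized ${optimized:.2f}, "
      f"saved {1 - optimized / baseline:.0%}")
# prints: baseline $8.00, optimized $3.20, saved 60%
```

Note that a 50% cut followed by a 20% cut leaves 40% of the original bill, which is why the combined saving is 60% rather than 70%. Changing the `conversations` argument rescales the whole forecast.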

Use This to Your Advantage

Most founders don't understand AI costs. This is your competitive advantage:

1. Forecast accurately — Budget correctly, don't run out of money
2. Optimize relentlessly — Every 10% reduction in tokens saves thousands annually
3. Price your product smartly — Understand your unit economics
4. Scale sustainably — Know when a feature is profitable vs a money loser

Calculate Your Costs Now

Don't guess. Use our AI Cost Calculator to forecast your specific scenario:

→ Go to Token Cost Calculator

Enter your expected:

Monthly requests
Average input tokens
Average output tokens
Your model

Get projected monthly costs across OpenAI, Anthropic, Google, and more. Then start optimizing.

Have questions? Check our FAQ or calculator methodology pages.
