How does NumStack calculate AI token costs?

NumStack uses real published pricing from OpenAI, Anthropic, Google, and other AI providers. Enter your typical usage (input tokens, output tokens, requests per day) to see projected monthly costs across all models.
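As a rough illustration of the arithmetic behind such a projection, here is a minimal sketch. It assumes per-million-token pricing and a 30-day month; the `Pricing` class, the `monthly_cost` function, and the prices shown are hypothetical placeholders, not NumStack's actual implementation or any provider's real rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens (placeholder values, not real prices)."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_cost(pricing, input_tokens, output_tokens, requests_per_day,
                 days_per_month=30):
    """Project a monthly bill from typical per-request token counts."""
    per_request = (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000
    return per_request * requests_per_day * days_per_month


# Hypothetical model: $1.00 per 1M input tokens, $4.00 per 1M output tokens.
example = Pricing(input_per_mtok=1.00, output_per_mtok=4.00)

# 2,000 input tokens and 500 output tokens per request, 1,000 requests per day.
print(f"${monthly_cost(example, 2_000, 500, 1_000):.2f} per month")
```

Running the same function over a table of models gives a side-by-side comparison for one usage profile.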

Why do different AI models have different token prices?

Larger models need more compute per token, so providers charge more for them. Smaller models like GPT-4o mini or Haiku are cheaper but less capable. Larger models like GPT-4o or Claude Sonnet are more powerful but cost more per token.

How can I reduce my AI API costs?

Use smaller models for simple tasks, cache repeated prompts, trim output length (output tokens typically cost 3-5x more than input tokens), and use batch processing instead of real-time requests when latency allows.


Agent Workflow Cost Calculator

Model the end-to-end cost of your AI pipeline. Add each step, set token counts and daily volumes, and instantly see where your budget is going.

Workflow Steps (3/10)

Each step has the same fields:

• Step Name
• Model (provider shown: OpenAI for Step 1, Anthropic for Steps 2 and 3)
• Input Tokens
• Output Tokens
• Runs / Day: times this step runs each day

Why model your full agent pipeline?

Single-call calculators miss the real picture. A production AI agent typically runs 5–10 LLM calls per user request — routing, retrieval, reasoning, validation, and response generation. Each step has different token counts, different models, and different call frequencies.

The Agent Workflow Calculator lets you model every step independently, so you can see exactly where your AI budget is going before you deploy and identify which model swap would save the most money.
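The per-step model can be sketched in a few lines. This is an illustrative approximation, not the calculator's actual code: the `Step` class, the `breakdown` helper, the step names, and every price and token count below are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    input_price: float   # USD per 1M input tokens (placeholder)
    output_price: float  # USD per 1M output tokens (placeholder)
    input_tokens: int
    output_tokens: int
    runs_per_day: int

    def daily_cost(self) -> float:
        per_run = (
            self.input_tokens * self.input_price
            + self.output_tokens * self.output_price
        ) / 1_000_000
        return per_run * self.runs_per_day


def breakdown(steps):
    """Return (name, daily cost, share of total), most expensive step first."""
    total = sum(s.daily_cost() for s in steps)
    rows = [
        (s.name, s.daily_cost(), s.daily_cost() / total if total else 0.0)
        for s in steps
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical three-step workflow with placeholder prices.
steps = [
    Step("router", 0.25, 1.00, 500, 20, 10_000),
    Step("reasoner", 3.00, 12.00, 4_000, 800, 10_000),
    Step("validator", 0.25, 1.00, 1_200, 50, 10_000),
]

for name, cost, share in breakdown(steps):
    print(f"{name:<10} ${cost:8.2f}/day  {share:6.1%}")
```

Sorting by cost puts the step that dominates the bill at the top, which is usually the first candidate for a cheaper model or a tighter output cap.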

Common multi-step agent patterns

• RAG pipeline: Intent router → vector search → context expansion → answer generation
• Tool-use agent: Planner → tool selector → executor → summarizer
• Reasoning chain: Draft → critic → revise → final
• Classification + generation: Cheap classifier first, expensive generator only when needed

How to reduce multi-step pipeline costs

• Route cheap tasks to Haiku, Flash, or GPT-4o mini, which can save 80–95% vs flagship models
• Short-circuit early: classify intent first, skip expensive steps for simple queries
• Cache repeated context (system prompts, retrieved docs) across turns
• Cap output tokens per step, since unnecessary verbosity is expensive
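To see how the short-circuit tip plays out, the sketch below estimates the expected cost per request when a cheap classifier runs on every request and only a fraction escalate to an expensive generator. The function names, prices, token counts, and the 30% escalation rate are all hypothetical, and the sketch ignores whatever it costs to answer the simple queries that do not escalate.

```python
def cost_per_call(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one call, with prices given per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def expected_cost_with_router(classifier_cost, generator_cost, escalation_rate):
    """Every request pays for the classifier; only a fraction reach the generator."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return classifier_cost + escalation_rate * generator_cost


# Placeholder numbers: a small classifier call and a large generator call.
classifier = cost_per_call(300, 5, input_price=0.25, output_price=1.00)
generator = cost_per_call(3_000, 600, input_price=3.00, output_price=12.00)

# Assume 30% of requests are complex enough to need the generator.
routed = expected_cost_with_router(classifier, generator, escalation_rate=0.30)
savings = 1 - routed / generator

print(f"always generate: ${generator:.5f} per request")
print(f"with router:     ${routed:.5f} per request ({savings:.0%} lower)")
```

The saving depends almost entirely on the escalation rate: if most requests still reach the generator, the classifier only adds cost.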
