Overview
LLM APIs bill by token usage, counting both input tokens (what you send) and output tokens (what the model returns). Understanding and reducing token usage is the main lever for managing API costs.
Core Principles
How LLM Pricing Works
- Token-based billing: Cost is calculated per token, not per message
- Input + Output tokens: You pay for both what you send and what you receive
- Stateless API: the model keeps no memory between requests
- Context resending: every follow-up message resends the entire conversation history as input
- Compounding costs: the input per message grows linearly with conversation length, so the cumulative token count grows quadratically
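Token Usage Example
Assume each new prompt is 1,000 tokens and each reply is 2,000 tokens. Because the history is resent, each message's input includes every earlier prompt and reply: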
Message 1: 1,000 input + 2,000 output = 3,000 tokens
Message 2: 4,000 input (includes Message 1) + 2,000 output = 6,000 tokens
Message 3: 7,000 input (includes Messages 1-2) + 2,000 output = 9,000 tokens
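The arithmetic above can be reproduced with a short Python sketch. It assumes, as the example does, a fixed 1,000-token prompt and 2,000-token reply per turn, and it also reports the cumulative total, which is what you are actually billed for across the conversation.

```python
# Reproduces the three-message example: every turn adds a 1,000-token prompt
# and a 2,000-token reply, and the whole history is resent as input.

def turn_usage(num_turns: int, prompt_tokens: int, reply_tokens: int) -> list[tuple[int, int, int]]:
    """Return (input, output, total) tokens billed for each turn."""
    usage = []
    history = 0  # tokens from earlier prompts and replies that get resent
    for _ in range(num_turns):
        input_tokens = history + prompt_tokens
        usage.append((input_tokens, reply_tokens, input_tokens + reply_tokens))
        history = input_tokens + reply_tokens  # this turn becomes part of the history
    return usage


if __name__ == "__main__":
    rows = turn_usage(num_turns=3, prompt_tokens=1_000, reply_tokens=2_000)
    for i, (inp, out, total) in enumerate(rows, start=1):
        print(f"Message {i}: {inp:,} input + {out:,} output = {total:,} tokens")
    print(f"Cumulative: {sum(total for _, _, total in rows):,} tokens")
```

Under the same assumptions, 10 turns bill 165,000 tokens in total rather than 10 × 3,000 = 30,000: turn k costs 3,000 × k tokens, and summing over k grows quadratically.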
Optimization Rules
Rule 1: Start New Chats for New Tasks
- Always start a fresh conversation for unrelated tasks
- Use the /clear command in Claude Code to clear the conversation history
- Keep one chat window per task
- Start fresh when switching contexts
- Never reuse long chat threads for different tasks
- Don't let conversations grow beyond the scope they need
Exceptions:
- When the next task builds directly on the previous one
- When working on complex, interconnected features that require ongoing context
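A rough way to see what Rule 1 saves: the hypothetical Python sketch below bills two unrelated five-turn tasks, once with a /clear between them and once in a single thread. The prompt and reply sizes are the same illustrative assumptions as the token usage example.

```python
# Compares starting task B in a fresh chat versus continuing the thread from task A.
# Assumed (illustrative) sizes: 1,000-token prompts and 2,000-token replies.

def total_tokens(num_turns: int, prompt: int, reply: int, starting_history: int = 0) -> int:
    """Total tokens billed for num_turns turns that begin with starting_history tokens of context."""
    history = starting_history
    total = 0
    for _ in range(num_turns):
        input_tokens = history + prompt  # full history is resent each turn
        total += input_tokens + reply
        history = input_tokens + reply
    return total


PROMPT, REPLY = 1_000, 2_000
task_a_turns, task_b_turns = 5, 5

task_a = total_tokens(task_a_turns, PROMPT, REPLY)
history_after_a = task_a_turns * (PROMPT + REPLY)  # context left over from task A

fresh = task_a + total_tokens(task_b_turns, PROMPT, REPLY)                    # /clear before task B
reused = task_a + total_tokens(task_b_turns, PROMPT, REPLY, history_after_a)  # same thread

print(f"Fresh chat for task B: {fresh:,} tokens")
print(f"Reused thread:         {reused:,} tokens")
```

Under these assumptions the fresh chat bills 90,000 tokens and the reused thread 165,000, because every turn of task B resends the 15,000 tokens left over from task A.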
Rule 2: Summarize Long Conversations
- When a conversation must continue, summarize it once the context window is about 50% full
- Use the /compact command in Claude Code
- Pass custom summary instructions as an argument to /compact
- Keep the summary focused on relevant information
- Trim unnecessary context and side discussions
Example instructions:
- "Please only summarize and keep the last message"
- "Focus on technical details, skip general discussion"
- "Summarize only action items and open issues"
- "Format summary as XML/JSON for structured data"
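The sketch below estimates what compacting at 50% of the context window saves over a 60-turn conversation, which stays inside the assumed window even without compaction. It is a simulation, not how Claude Code implements /compact internally; the 200,000-token window, the 3,000-token summary size, and billing the summary request as full history in plus summary out are all assumptions.

```python
# Simulates a long conversation with and without compaction at a fraction of the window.
# All sizes are illustrative assumptions, not measured values.
from typing import Optional, Tuple

WINDOW = 200_000                               # assumed context window, in tokens
PROMPT, REPLY, SUMMARY = 1_000, 2_000, 3_000   # assumed sizes, in tokens


def simulate(num_turns: int, compact_at: Optional[float]) -> Tuple[int, int]:
    """Return (total tokens billed, number of compactions) for num_turns turns."""
    history = total = compactions = 0
    for _ in range(num_turns):
        # Compact before a turn once the resent history reaches the threshold.
        if compact_at is not None and history >= compact_at * WINDOW:
            total += history + SUMMARY  # summary request: full history in, summary out
            history = SUMMARY           # the summary replaces the old history
            compactions += 1
        input_tokens = history + PROMPT
        total += input_tokens + REPLY
        history = input_tokens + REPLY
    return total, compactions


for label, threshold in [("No compaction", None), ("Compact at 50%", 0.5)]:
    total, count = simulate(60, threshold)
    print(f"{label}: {total:,} tokens, {count} compaction(s)")
```

With these numbers one compaction happens (before turn 35) and the total drops from 5,490,000 to 3,021,000 tokens. The saving comes from every later turn resending a 3,000-token summary instead of the full history; a larger summary or a different threshold shifts the trade-off.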
Rule 3: Choose the Right Model
- Match model capability to task complexity
- Use the /model command to switch models manually
- Start with powerful models for planning and architecture
- Switch to lighter models for implementation and refinement
- Don't default to the most expensive model for simple tasks
Model Selection Guidelines:
Use Powerful Models (Claude Opus, GPT-4) For:
- Complex reasoning and problem-solving
- System architecture design
- High-level planning and strategy
- Edge case identification
- Critical decision-making
Use Lighter Models (Claude Sonnet, GPT-3.5) For:
- Code refactoring and cleanup
- Documentation writing
- Simple implementations
- Routine coding tasks
- Follow-up questions and clarifications
Example list prices (USD, at the time of writing; check the provider's current pricing):
- Claude Opus: $15/1M input tokens, $75/1M output tokens
- Claude Sonnet: $3/1M input tokens, $15/1M output tokens
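Per-request cost follows directly from these prices: tokens divided by one million, times the per-million rate, summed over input and output. A minimal Python sketch, using the listed prices as assumed values (the `opus` and `sonnet` keys are shorthand, not API model IDs):

```python
# Per-request cost from per-million-token prices.
# Prices change; treat these values as an example, not current pricing.

PRICES_PER_MILLION = {  # model shorthand -> (input USD, output USD) per 1M tokens
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens / 1M multiplied by the per-million price."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    for model in ("opus", "sonnet"):
        print(f"{model}: ${request_cost(model, 7_000, 2_000):.3f}")
```

For Message 3 of the token usage example (7,000 input + 2,000 output), this gives $0.255 on Opus and $0.051 on Sonnet; at these prices Opus costs five times as much for the same tokens.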
Best Practices
Conversation Management
- Monitor context usage: Watch for capacity indicators (aim for <50%)
- Be specific in requests: Avoid vague prompts that generate unnecessary output
- Use focused follow-ups: Ask targeted questions rather than broad requests
- Clean up regularly: Don't let conversations accumulate unnecessary history
Cost-Effective Workflows
- Planning Phase: Use powerful model for architecture and planning
- Implementation Phase: Switch to lighter model for coding
- Review Phase: Use appropriate model based on complexity of review needed
- Documentation Phase: Use lighter model for writing and formatting
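To compare the phase-based workflow with running everything on the most powerful model, the sketch below prices a hypothetical project. The per-phase token counts are invented for illustration, the review phase is assumed to need only Sonnet, and the prices are the example list prices given earlier.

```python
# Compares a hypothetical project run on Opus only versus a phase-based model split.
from typing import Optional

PRICES_PER_MILLION = {"opus": (15.00, 75.00), "sonnet": (3.00, 15.00)}  # example prices

# (phase, model in the split workflow, input tokens, output tokens): hypothetical numbers
PHASES = [
    ("planning", "opus", 20_000, 5_000),
    ("implementation", "sonnet", 150_000, 40_000),
    ("review", "sonnet", 30_000, 5_000),
    ("documentation", "sonnet", 10_000, 8_000),
]


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def workflow_cost(force_model: Optional[str] = None) -> float:
    """Sum phase costs; if force_model is given, every phase runs on that model."""
    return sum(cost(force_model or model, inp, out) for _, model, inp, out in PHASES)


print(f"All phases on Opus: ${workflow_cost('opus'):.2f}")
print(f"Phase-based split:  ${workflow_cost():.2f}")
```

With these numbers the all-Opus run costs $7.50 and the split workflow $2.04. Most of the difference comes from the implementation phase, which consumes the most tokens.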
Token Waste Prevention
- Avoid uploading irrelevant files or large documents
- Don't include unnecessary context in prompts
- Minimize repetitive back-and-forth on simple topics
- Don't let automatic model selection pick an overpowered model for a simple task
- Be precise and direct in your requests
- Structure complex requests clearly
- Use clear formatting to reduce ambiguity
Advanced Techniques
Custom Model Configuration
Example:
/model claude-3-5-sonnet-20241022
- Copy exact model IDs from the provider's documentation
- Keep a list of the model IDs you use most often for quick switching
- Decide in advance which model fits each type of work
Context Optimizations
- Preemptive summarization: Summarize before hitting limits
- Strategic context: Only include relevant previous messages
- Structured prompts: Use clear formatting to reduce interpretation overhead
- Batch related questions: Group similar requests to minimize context switching
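"Strategic context" can be approximated by sending only a pinned first message (for example, the task description) plus the newest messages that fit a token budget. A minimal Python sketch; the `Message` type is hypothetical, and the 4-characters-per-token estimate is a rough assumption, since real tokenizers count differently.

```python
# Keeps the pinned first message plus the newest messages that fit in a token budget.
from dataclasses import dataclass


@dataclass
class Message:  # hypothetical message format
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def select_context(messages: list[Message], budget: int) -> list[Message]:
    """Return the first message plus the newest messages that fit within budget tokens.

    The pinned first message is always kept, even if it alone exceeds the budget.
    """
    if not messages:
        return []
    pinned, rest = messages[0], messages[1:]
    used = estimate_tokens(pinned.text)
    kept: list[Message] = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg.text)
        if used + cost > budget:
            break  # stop at the first message that does not fit
        kept.append(msg)
        used += cost
    return [pinned] + list(reversed(kept))  # restore chronological order
```

Dropping the oldest messages first is the simplest policy. Selecting by relevance instead would need a scoring step, such as a keyword or embedding match, which this sketch does not include.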
Performance vs. Cost Balance
- Long-term projects: Invest in planning with powerful models upfront
- Iterative work: Use lighter models for incremental changes
- Critical tasks: Don't compromise on model quality for important decisions
- Routine tasks: Optimize aggressively for cost on repetitive work
Monitoring and Measurement
- Track token usage patterns over time
- Monitor cost per task type
- Identify inefficient conversation patterns
- Measure model performance vs. cost for different task types
- Set budget alerts and usage limits
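A minimal tracker for cost per task type with a budget check, as a Python sketch. The task names, budget, and prices are illustrative assumptions; in practice the token counts would come from the API response (the Anthropic Messages API, for example, reports `usage.input_tokens` and `usage.output_tokens`).

```python
# Records tokens and cost per task type and flags when total spend passes a budget.
from collections import defaultdict

PRICES_PER_MILLION = {"opus": (15.00, 75.00), "sonnet": (3.00, 15.00)}  # example prices


class UsageTracker:
    def __init__(self, budget_usd: float) -> None:
        self.budget_usd = budget_usd
        self.tokens: defaultdict[str, int] = defaultdict(int)    # task type -> total tokens
        self.cost: defaultdict[str, float] = defaultdict(float)  # task type -> total USD

    def record(self, task_type: str, model: str, input_tokens: int, output_tokens: int) -> None:
        input_price, output_price = PRICES_PER_MILLION[model]
        self.tokens[task_type] += input_tokens + output_tokens
        self.cost[task_type] += (input_tokens * input_price + output_tokens * output_price) / 1_000_000

    def total_cost(self) -> float:
        return sum(self.cost.values())

    def over_budget(self) -> bool:
        return self.total_cost() > self.budget_usd


if __name__ == "__main__":
    tracker = UsageTracker(budget_usd=1.00)
    tracker.record("architecture", "opus", 20_000, 5_000)
    tracker.record("refactoring", "sonnet", 40_000, 10_000)
    for task, usd in tracker.cost.items():
        print(f"{task}: {tracker.tokens[task]:,} tokens, ${usd:.3f}")
    print("Over budget:", tracker.over_budget())
```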
Quick Reference Commands
Claude Code Commands
- /clear: Start a new conversation (clears the history)
- /compact: Summarize and compress the conversation
- /model: Switch between models
- /model default: Return to the default model selection
- /model claude-3-5-sonnet-20241022: Use a specific model
Decision Matrix
| Task Type | Recommended Model | Justification |
|---|---|---|
| Architecture Planning | Powerful (Opus/GPT-4) | Requires deep reasoning |
| Code Implementation | Light (Sonnet/GPT-3.5) | Straightforward execution |
| Bug Fixing | Medium-Light | Depends on complexity |
| Documentation | Light | Mostly formatting and clarity |
| Code Review | Medium | Balance of insight and cost |
| Research/Analysis | Powerful | Requires comprehensive understanding |
Remember: The goal is to build more while spending less. These rules help you optimize for both cost and performance without sacrificing quality.
TR (Turkish title, translated): Using AI While Spending Fewer Tokens

