LLM Token Optimization Rules

'drose · 19 Oct 2025

Overview

LLM API providers bill by token usage, counting both input tokens (what you send) and output tokens (what the model returns). Understanding and optimizing token usage is crucial for managing API costs effectively.

Core Principles

How LLM Pricing Works

Token-based billing: Cost is calculated per token, not per message
Input + Output tokens: You pay for both what you send and what you receive
Stateless nature: LLMs don't remember previous conversations
Context resending: Every follow-up message includes the entire conversation history
Compounding costs: Input tokens per message grow linearly with conversation length, so the cumulative tokens billed over a conversation grow roughly quadratically

Token Usage Examples

Code:

Message 1: 1,000 input + 2,000 output = 3,000 tokens
Message 2: 4,000 input (includes Message 1) + 2,000 output = 6,000 tokens
Message 3: 7,000 input (includes Messages 1-2) + 2,000 output = 9,000 tokens
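
The arithmetic above can be reproduced with a short Python sketch. It assumes every turn adds 1,000 new input tokens and returns 2,000 output tokens, and that the full history is resent each time; the function name and defaults are illustrative, not part of any provider API.

```python
def conversation_token_usage(turns, new_input=1_000, output=2_000):
    """Return (input, output, total) per turn for a chat that resends its full history."""
    rows = []
    history = 0  # tokens already in the conversation before this turn
    for _ in range(turns):
        input_tokens = history + new_input  # old history plus the new prompt
        total = input_tokens + output
        rows.append((input_tokens, output, total))
        history = total  # the next turn resends everything so far
    return rows


rows = conversation_token_usage(3)
for i, (inp, out, total) in enumerate(rows, start=1):
    print(f"Message {i}: {inp:,} input + {out:,} output = {total:,} tokens")
print(f"Billed across all messages: {sum(t for _, _, t in rows):,} tokens")
```

Three messages are billed as 18,000 tokens in total, even though only 9,000 tokens of distinct text were exchanged.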

Optimization Rules

Rule 1: Start New Chats for New Tasks

Always start a fresh conversation for unrelated tasks

Use the /clear command in Claude Code to wipe conversation history
One chat window per task
Start fresh when switching contexts
Never reuse long chat threads for different tasks
Don't let conversations grow beyond necessary scope

Exceptions:

When the next task builds directly on the previous one
Working on complex, interconnected features requiring ongoing context

Rule 2: Summarize Long Conversations

When conversations must continue, summarize at 50% context capacity

Use the /compact command in Claude Code
Provide custom summary instructions
Focus summaries on relevant information only
Trim unnecessary context and discussions
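
The 50% rule can be written as a simple threshold check. This is a minimal sketch: the function name is hypothetical, and the 200,000-token window in the usage lines is only an assumed example value, so substitute the actual limit of the model you use.

```python
def should_summarize(used_tokens, context_window, threshold=0.5):
    """Return True once the conversation has used `threshold` of the context window."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return used_tokens / context_window >= threshold


# Assumed example window of 200,000 tokens; not a value taken from any provider.
print(should_summarize(80_000, 200_000))   # 40% used
print(should_summarize(100_000, 200_000))  # 50% used
```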

Example Summary Instructions:

"Please only summarize and keep the last message"
"Focus on technical details, skip general discussion"
"Summarize only action items and open issues"
"Format summary as XML/JSON for structured data"

Rule 3: Choose the Right Model

Match model capability to task complexity

Use the /model command to switch models manually
Start with powerful models for planning and architecture
Switch to lighter models for implementation and refinement
Don't default to the most expensive model for simple tasks

Model Selection Guidelines:

Use Powerful Models (Claude Opus, GPT-4) For:

Complex reasoning and problem-solving
System architecture design
High-level planning and strategy
Edge case identification
Critical decision-making

Use Lighter Models (Claude Sonnet, GPT-3.5) For:

Code refactoring and cleanup
Documentation writing
Simple implementations
Routine coding tasks
Follow-up questions and clarifications

Cost Comparison Example:

Claude Opus: $15/1M input tokens, $75/1M output tokens
Claude Sonnet: $3/1M input tokens, $15/1M output tokens
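
To see what these rates mean for a single request, here is a small sketch that prices one call at both tiers. The prices are copied from the comparison above and may be out of date, so check the provider's current pricing; the dictionary keys and function name are illustrative.

```python
# USD per 1M tokens, copied from the comparison above (may be out of date).
PRICES = {
    "opus": {"input": 15.0, "output": 75.0},
    "sonnet": {"input": 3.0, "output": 15.0},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the cost in USD of one request at the given model's rates."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Price the third message from the token usage example: 7,000 input + 2,000 output.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 7_000, 2_000):.3f}")
```

Both rates differ by a factor of five, so the same request costs five times as much on the more powerful model.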

Best Practices

Conversation Management

Monitor context usage: Watch for capacity indicators (aim for <50%)
Be specific in requests: Avoid vague prompts that generate unnecessary output
Use focused follow-ups: Ask targeted questions rather than broad requests
Clean up regularly: Don't let conversations accumulate unnecessary history

Cost-Effective Workflows

Planning Phase: Use a powerful model for architecture and planning
Implementation Phase: Switch to a lighter model for coding
Review Phase: Choose a model based on the complexity of the review
Documentation Phase: Use a lighter model for writing and formatting

Token Waste Prevention

Avoid uploading irrelevant files or large documents
Don't include unnecessary context in prompts
Minimize repetitive back-and-forth on simple topics
Don't let automatic model selection pick an overpowered model
Be precise and direct in your requests
Structure complex requests clearly
Use appropriate formatting to reduce ambiguity

Advanced Techniques

Custom Model Configuration

Code:

/model claude-3-5-sonnet-20241022

Copy exact model names from provider documentation
Save frequently used models for quick switching
Create model profiles for different types of work

Context Optimizations

Preemptive summarization: Summarize before hitting limits
Strategic context: Only include relevant previous messages
Structured prompts: Use clear formatting to reduce interpretation overhead
Batch related questions: Group similar requests to minimize context switching

Performance vs. Cost Balance

Long-term projects: Invest in planning with powerful models upfront
Iterative work: Use lighter models for incremental changes
Critical tasks: Don't compromise on model quality for important decisions
Routine tasks: Optimize aggressively for cost on repetitive work

Monitoring and Measurement

Track token usage patterns over time
Monitor cost per task type
Identify inefficient conversation patterns
Measure model performance vs. cost for different task types
Set budget alerts and usage limits
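
One way to track cost per task type and get a budget alert is a small in-memory tracker. This is a sketch under stated assumptions: the class and method names are hypothetical, and the costs are numbers you record yourself, not values read from a provider API.

```python
from collections import defaultdict


class UsageTracker:
    """Accumulate spend per task type and flag when a budget is exceeded."""

    def __init__(self, budget_usd):
        self.budget_usd = budget_usd
        self.cost_by_task = defaultdict(float)

    def record(self, task_type, cost_usd):
        """Add the cost of one request to its task type."""
        self.cost_by_task[task_type] += cost_usd

    @property
    def total(self):
        return sum(self.cost_by_task.values())

    def over_budget(self):
        return self.total > self.budget_usd


tracker = UsageTracker(budget_usd=0.30)
tracker.record("planning", 0.25)
tracker.record("coding", 0.10)
for task, cost in tracker.cost_by_task.items():
    print(f"{task}: ${cost:.2f}")
if tracker.over_budget():
    print("Budget alert: spend exceeds the configured limit")
```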

Quick Reference Commands

Claude Code Commands

/clear - Start new conversation
/compact - Summarize and compress conversation
/model - Switch between models
/model default - Return to default model selection
/model claude-3-5-sonnet-20241022 - Use specific model

Decision Matrix

Task Type	Recommended Model	Justification
Architecture Planning	Powerful (Opus/GPT-4)	Requires deep reasoning
Code Implementation	Light (Sonnet/GPT-3.5)	Straightforward execution
Bug Fixing	Medium-Light	Depends on complexity
Documentation	Light	Mostly formatting and clarity
Code Review	Medium	Balance of insight and cost
Research/Analysis	Powerful	Requires comprehensive understanding
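
The matrix can also be kept as a lookup table in a script. A minimal sketch: the tier names mirror the table, while the function name and the "medium" fallback for unlisted task types are assumptions added for illustration.

```python
# Task type -> recommended model tier, mirroring the decision matrix.
RECOMMENDED_TIER = {
    "architecture planning": "powerful",
    "code implementation": "light",
    "bug fixing": "medium-light",
    "documentation": "light",
    "code review": "medium",
    "research/analysis": "powerful",
}


def recommend_tier(task_type, default="medium"):
    """Look up the tier for a task type; unknown types fall back to `default` (an assumption)."""
    return RECOMMENDED_TIER.get(task_type.strip().lower(), default)


print(recommend_tier("Architecture Planning"))
print(recommend_tier("Documentation"))
```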

Remember: The goal is to build more while spending less. These rules help you optimize for both cost and performance without sacrificing quality.

TR: Using AI While Spending Fewer Tokens

ByFelez · 7 Jan 2026

nice topic
