
Overview

Reasoning models provide step-by-step thinking processes, making them ideal for complex problem-solving, math, coding, and analysis tasks. RedPill supports all major reasoning models with TEE protection.
Reasoning tokens show the model’s “thinking process” before generating the final answer, leading to more accurate and well-reasoned responses.

Supported Reasoning Models

OpenAI O-Series

| Model | Reasoning Capability | Best For |
| --- | --- | --- |
| openai/o1 | Very High | Complex problem-solving |
| openai/o1-mini | High | Faster reasoning tasks |
| openai/o3-mini | Very High | Latest reasoning model |

Anthropic Claude

| Model | Reasoning Capability | Best For |
| --- | --- | --- |
| anthropic/claude-sonnet-4 | Very High | Analysis, research |
| anthropic/claude-4.1 | Highest | Complex reasoning |
| anthropic/claude-sonnet-4.5 | High | Balanced performance |

Google Gemini

| Model | Reasoning Capability | Best For |
| --- | --- | --- |
| google/gemini-2.0-flash-thinking | High | Fast thinking |

Other Thinking Models

| Model | Reasoning Capability |
| --- | --- |
| qwen/qwq-32b-preview | High |
| alibaba/qwen-plus-latest | Medium-High |

Basic Usage

Simple Reasoning Request

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_REDPILL_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

response = client.chat.completions.create(
    model="openai/o1",
    messages=[
        {
            "role": "user",
            "content": "Solve this problem step by step: If a train travels at 60 mph for 2.5 hours, then 80 mph for 1 hour, what is the total distance traveled?"
        }
    ]
)

print(response.choices[0].message.content)

Response with Reasoning

The model will show its thinking process:

1. First segment: 60 mph × 2.5 hours = 150 miles
2. Second segment: 80 mph × 1 hour = 80 miles
3. Total distance = 150 + 80 = 230 miles

Answer: 230 miles

Controlling Reasoning Effort

Some models support controlling how much they “think”:
response = client.chat.completions.create(
    model="openai/o1",
    messages=[{"role": "user", "content": "Complex math problem..."}],
    reasoning={
        "effort": "high"  # Options: "low", "medium", "high"
    }
)

Effort Levels

| Level | Description | Use Case |
| --- | --- | --- |
| low | Quick thinking | Simple problems |
| medium | Balanced | Most tasks |
| high | Deep reasoning | Complex problems |
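One way to apply these effort levels in practice is to choose one programmatically based on a rough complexity guess. The heuristic below is an illustrative sketch only; the keyword list and length thresholds are assumptions, not part of the API:

```python
def pick_effort(prompt: str) -> str:
    """Heuristic sketch: map a rough complexity guess to an effort level.

    The marker keywords and length thresholds here are illustrative
    assumptions; tune them for your own workload.
    """
    hard_markers = ("prove", "optimize", "debug", "strategy", "step by step")
    if any(marker in prompt.lower() for marker in hard_markers) or len(prompt) > 1500:
        return "high"
    if len(prompt) > 300:
        return "medium"
    return "low"
```

The returned value can then be passed straight into the `reasoning={"effort": ...}` parameter shown above.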

Limiting Reasoning Tokens

Control cost by limiting reasoning tokens:
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Analyze this dataset..."}],
    reasoning={
        "max_tokens": 2000  # Limit reasoning to 2000 tokens
    }
)
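Because reasoning tokens are billed at the output rate, you can estimate a request's cost from the response's usage fields. This is a sketch that assumes the standard OpenAI-compatible `usage.prompt_tokens` and `usage.completion_tokens` fields (with reasoning tokens counted inside `completion_tokens`) and uses the o1 prices listed under Cost Considerations:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate request cost in USD.

    Assumes reasoning tokens are included in completion_tokens and
    billed at the output rate (per-million-token prices).
    """
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000

# Example with openai/o1 pricing: $15/1M input, $60/1M output
cost = estimate_cost(1200, 4000, 15.0, 60.0)
print(f"${cost:.3f}")  # $0.258
```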

Excluding Reasoning from Response

Get only the final answer, not the thinking process:
response = client.chat.completions.create(
    model="openai/o1",
    messages=[{"role": "user", "content": "Calculate..."}],
    reasoning={
        "include_reasoning": False  # Only return final answer
    }
)

Use Case: Math & Science

math_problem = """
A water tank can be filled by three pipes A, B, and C.
- Pipe A can fill the tank in 6 hours
- Pipe B can fill it in 8 hours
- Pipe C can fill it in 12 hours

If all three pipes are opened simultaneously, how long will it take to fill the tank?
"""

response = client.chat.completions.create(
    model="openai/o1",
    messages=[{"role": "user", "content": math_problem}],
    reasoning={"effort": "high"}
)

print(response.choices[0].message.content)
Output (with reasoning):
Let me think through this step by step:

1. Find rate for each pipe:
   - Pipe A: 1/6 tank per hour
   - Pipe B: 1/8 tank per hour
   - Pipe C: 1/12 tank per hour

2. Combined rate when all open:
   1/6 + 1/8 + 1/12 = 4/24 + 3/24 + 2/24 = 9/24 = 3/8 tank per hour

3. Time to fill one tank:
   1 ÷ (3/8) = 8/3 = 2 hours and 40 minutes

Answer: 2 hours 40 minutes (or 2.67 hours)
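The pipe arithmetic above is easy to verify programmatically, which is a useful habit when checking a model's reasoning output:

```python
from fractions import Fraction

# Each pipe's fill rate in tanks per hour
combined_rate = Fraction(1, 6) + Fraction(1, 8) + Fraction(1, 12)
hours = 1 / combined_rate  # time to fill one tank

print(combined_rate)  # 3/8
print(hours)          # 8/3  (i.e. 2 hours 40 minutes)
```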

Use Case: Code Debugging

code_problem = """
This Python function is supposed to find duplicates in a list, but it's not working correctly:

def find_duplicates(lst):
    seen = set()
    duplicates = []
    for item in lst:
        if item in seen:
            duplicates.append(item)
        seen.add(item)
    return duplicates

Test case: find_duplicates([1, 2, 3, 2, 4, 3, 5, 3])
Expected: [2, 3]  (each duplicate reported once)
Actual: [2, 3, 3]

What's wrong and how do I fix it?
"""

response = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # Also great for code reasoning
    messages=[{"role": "user", "content": code_problem}]
)

print(response.choices[0].message.content)
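For reference, one fix the model might arrive at is to track which duplicates have already been reported, so each duplicate value appears in the result exactly once. A sketch, not the model's actual output:

```python
def find_duplicates(lst):
    """Return each value that appears more than once, reported exactly once."""
    seen = set()
    reported = set()
    duplicates = []
    for item in lst:
        # Only report a duplicate the first time we re-encounter it
        if item in seen and item not in reported:
            duplicates.append(item)
            reported.add(item)
        seen.add(item)
    return duplicates

print(find_duplicates([1, 2, 3, 2, 4, 3, 5, 3]))  # [2, 3]
```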

Use Case: Logic Puzzles

puzzle = """
Five houses in a row, each painted a different color.
- The English person lives in the red house
- The Swede has a dog
- The Dane drinks tea
- The green house is directly to the left of the white house
- The person in the green house drinks coffee
- The person who smokes Pall Mall has birds
- The person in the yellow house smokes Dunhill
- The person in the middle house drinks milk
- The Norwegian lives in the first house
- The person who smokes Blend lives next to the one with cats
- The person with a horse lives next to the one who smokes Dunhill
- The person who smokes Blue Master drinks beer
- The German smokes Prince
- The Norwegian lives next to the blue house
- The person who smokes Blend has a neighbor who drinks water

Who owns the fish?
"""

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": puzzle}],
    reasoning={"effort": "high"}
)

print(response.choices[0].message.content)

Use Case: Research & Analysis

research_query = """
Analyze the pros and cons of implementing a microservices architecture
for a startup with 10 engineers building an e-commerce platform.
Consider:
- Development complexity
- Operational overhead
- Scalability benefits
- Team coordination
- Cost implications
- Time to market

Provide a recommendation with detailed reasoning.
"""

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": research_query}],
    reasoning={"effort": "high"}
)

print(response.choices[0].message.content)

Use Case: Strategic Planning

strategy_question = """
Our SaaS company has $500K ARR and is growing 15% MoM.
We have:
- 2 engineers
- 1 designer
- 1 sales person
- $200K in the bank

Should we:
A) Hire 2 more engineers to ship features faster
B) Hire 2 sales people to accelerate growth
C) Hire 1 engineer + 1 sales person
D) Focus on profitability and don't hire

Think through each option's implications over the next 12 months.
"""

response = client.chat.completions.create(
    model="anthropic/claude-4.1",
    messages=[{"role": "user", "content": strategy_question}],
    reasoning={"effort": "high"}
)

print(response.choices[0].message.content)

Chain of Thought Prompting

Enhance reasoning with explicit prompting:
prompt = """
Let's solve this step by step:

Problem: A bakery sells cakes for $12 each. The ingredients cost $5 per cake,
and fixed costs are $500/month. How many cakes must they sell to profit $2000/month?

Please show your work:
1. Calculate profit per cake
2. Determine total profit needed
3. Calculate cakes needed
4. Verify the answer
"""

response = client.chat.completions.create(
    model="openai/o1",
    messages=[{"role": "user", "content": prompt}]
)

print(response.choices[0].message.content)
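The bakery problem has a closed-form answer, so it is worth checking the model's work directly:

```python
import math

price, ingredient_cost, fixed_costs, target_profit = 12, 5, 500, 2000

margin_per_cake = price - ingredient_cost  # $7 profit per cake
# Sales must cover fixed costs plus the target profit
cakes_needed = math.ceil((target_profit + fixed_costs) / margin_per_cake)
profit = cakes_needed * margin_per_cake - fixed_costs

print(cakes_needed)  # 358
print(profit)        # 2006 (meets the $2000/month target)
```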

Multi-Step Reasoning

For complex multi-step problems:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_REDPILL_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

# Step 1: Break down the problem
breakdown = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{
        "role": "user",
        "content": "Break down the problem of optimizing a database with 10M rows and slow queries into 5 actionable steps"
    }]
)

steps = breakdown.choices[0].message.content

# Step 2: Analyze each step
analysis = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[
        {"role": "user", "content": "Break down the problem..."},
        {"role": "assistant", "content": steps},
        {"role": "user", "content": "For each step, explain the reasoning and potential trade-offs"}
    ],
    reasoning={"effort": "high"}
)

print(analysis.choices[0].message.content)

Injecting Reasoning

Use one model’s reasoning to improve another:
# Get reasoning from Claude
reasoning_response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "How should I structure a Redis cache layer?"}],
    reasoning={"effort": "high"}
)

reasoning = reasoning_response.choices[0].message.content

# Use the reasoning with GPT-5
final_response = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[
        {"role": "user", "content": f"Based on this reasoning:\n\n{reasoning}\n\nGenerate a detailed implementation plan with code examples."}
    ]
)

print(final_response.choices[0].message.content)

Cost Considerations

Reasoning tokens are charged as output tokens:
| Model | Input Price | Reasoning Price | Output Price |
| --- | --- | --- | --- |
| openai/o1 | $15/1M | $60/1M | $60/1M |
| anthropic/claude-4.1 | $3/1M | $15/1M | $15/1M |
| google/gemini-2.0-flash-thinking | Free | Free | Free |
Tips to manage costs:
  • Use max_tokens to limit reasoning
  • Set effort: "low" for simple tasks
  • Use cheaper models for initial exploration
  • Cache results for repeated questions
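The last tip can be as simple as an in-memory dictionary keyed by a hash of the request. The `cached_completion` helper below is an illustrative sketch, not part of the SDK:

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_completion(client, model: str, messages: list, **kwargs) -> str:
    """Return a cached answer for identical (model, messages, kwargs) requests.

    Illustrative in-memory cache; production code would want eviction
    and persistence. Assumes kwargs are JSON-serializable.
    """
    key = hashlib.sha256(
        json.dumps({"model": model, "messages": messages, "kwargs": kwargs},
                   sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        response = client.chat.completions.create(
            model=model, messages=messages, **kwargs
        )
        _cache[key] = response.choices[0].message.content
    return _cache[key]
```

Repeated identical questions then hit the cache instead of paying for reasoning tokens again.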

Best Practices

Best for:
  • Complex math and logic problems
  • Strategic decision-making
  • Code debugging and optimization
  • Research and analysis
  • Multi-step problem solving
Not needed for:
  • Simple Q&A
  • Creative writing
  • Basic information retrieval
  • Casual conversation
Prompting tips:
  • Be specific about what you want
  • Ask for step-by-step solutions
  • Provide all necessary context
  • Use “Let’s think step by step” prompts
  • Request verification of answers
Model selection:
  • o1: Best for math and science
  • Claude 4: Best for analysis and strategy
  • o3-mini: Good balance of speed and reasoning
  • DeepSeek: Best for code-related reasoning
Cost management:
  • Start with lower effort levels
  • Increase only if needed
  • Use cheaper models for testing
  • Cache expensive results
  • Monitor token usage

Comparison with Regular Models

| Feature | Regular Models | Reasoning Models |
| --- | --- | --- |
| Speed | Fast | Slower |
| Cost | Lower | Higher |
| Accuracy | Good | Better |
| Explainability | Limited | Detailed |
| Best For | General tasks | Complex problems |
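These trade-offs suggest a simple routing pattern: send a request to a reasoning model only when the task warrants it. The model names and task categories below are illustrative assumptions:

```python
REGULAR_MODEL = "openai/gpt-5"  # fast and cheaper; example name
REASONING_MODEL = "openai/o1"   # slower, stronger on hard problems

def route_model(task_type: str) -> str:
    """Route complex task types to a reasoning model, the rest to a regular one.

    The task categories here are an illustrative heuristic.
    """
    complex_tasks = {"math", "debugging", "strategy", "analysis"}
    return REASONING_MODEL if task_type in complex_tasks else REGULAR_MODEL
```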

Next Steps

  • Supported Models: browse all reasoning models
  • Function Calling: combine reasoning with tools
  • Streaming: stream reasoning tokens
  • Pricing: understand reasoning costs