The Inference Intelligence Layer

Understand Your LLM Inference Topology

Reconstruct the complete inference map from your codebase. See models, vendors, runtimes, hardware, costs, and performance—all without telemetry or vendor lock-in.

500+ Detection Targets
<60s Analysis Time
0 Cloud Dependencies

One Command. Complete Visibility.

No configuration. No telemetry. No vendor lock-in. Just run one command and get your complete inference StackMap.

$ peakinfer analyze .

Everything You Need to Optimize Inference

PeakInfer gives you the intelligence layer that sits above all vendors, runtimes, and hardware.

StackMap Knowledge Graph

Reconstruct complete inference topology from code. Maps models, vendors, runtimes, hardware, and dataflows into a canonical knowledge graph.

Pricing Delta Engine

Real-time pricing intelligence across all providers. Track cost deltas, spot pricing, and find the best deals for your architecture.

Static Code Analysis

Detects LLM calls, routing logic, retry patterns, batching, caching, and more across Python, TypeScript, Go, and Java.
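To give a flavor of what static detection involves, here is a minimal sketch. This is not PeakInfer's implementation, and the sample callsite is hypothetical; it simply shows how Python's standard `ast` module can locate `chat.completions.create` calls and extract the model name:

```python
import ast

# Hypothetical sample: the kind of callsite a static analyzer looks for.
SAMPLE = """
import openai

client = openai.OpenAI()

def summarize(text):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": text}],
    )
"""

def find_llm_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, model) pairs for chat.completions.create calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Match attribute chains ending in "...completions.create(...)"
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "create"
            and isinstance(node.func.value, ast.Attribute)
            and node.func.value.attr == "completions"
        ):
            # Pull out the model= keyword if it is a string literal.
            model = next(
                (kw.value.value for kw in node.keywords
                 if kw.arg == "model" and isinstance(kw.value, ast.Constant)),
                "<dynamic>",
            )
            hits.append((node.lineno, model))
    return hits

print(find_llm_calls(SAMPLE))
```

A real analyzer also has to handle aliased clients, wrapped helpers, and dynamic model names, which is where the "500+ detection targets" come in.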

Cost Optimization

Identifies hotspots, suggests alternatives, and calculates cost savings. Compare vendors, runtimes, and hardware options.

Multi-Provider Comparison

Compare OpenAI, Anthropic, Together, Fireworks, and 20+ providers side-by-side. See pricing, latency, and performance deltas.

Privacy First

All analysis happens locally. No telemetry. No cloud accounts. Your code never leaves your machine, except when the Claude Code SDK analysis feature is used.

Visualize Your Inference Stack

See how your codebase connects to models, vendors, runtimes, and hardware in an interactive StackMap.

Example StackMap:

  • Codebase
  • Vendors: OpenAI (19 calls), Anthropic (7 calls), Together (2 calls)
  • Models: gpt-4o (9 calls), gpt-4o-mini (5 calls), claude-sonnet-4 (7 calls), llama-3-70b (2 calls)
  • Runtimes: vLLM
  • Hardware: NVIDIA H100
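Conceptually, a StackMap is a layered graph from codebase down to hardware. A minimal sketch of the demo topology above as adjacency lists (node names and call counts come from the demo; the runtime and hardware edges are illustrative, since only one of each is shown):

```python
# Layered StackMap as simple adjacency lists with call counts on edges.
stackmap = {
    "codebase": {"OpenAI": 19, "Anthropic": 7, "Together": 2},
    "OpenAI": {"gpt-4o": 9, "gpt-4o-mini": 5},
    "Anthropic": {"claude-sonnet-4": 7},
    "Together": {"llama-3-70b": 2},
    # Illustrative: llama-3-70b served by vLLM on an H100.
    "llama-3-70b": {"vLLM": 2},
    "vLLM": {"NVIDIA H100": 2},
}

def total_calls(graph, root="codebase"):
    """Sum the call counts on the edges leaving the root node."""
    return sum(graph[root].values())

print(total_calls(stackmap))  # 28 calls across all vendors
```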

The Bloomberg Terminal of Inference

Real-time pricing intelligence across all providers. Track deltas, find savings, and optimize costs.

Real-Time Pricing Intelligence

Updated weekly from public sources and community contributions

Vendor      Model             Input / 1M tokens   Output / 1M tokens   Monthly Cost     Price Delta
OpenAI      gpt-4o            $2.50               $10.00               $890 - $1,290    12%
Anthropic   claude-sonnet-4   $3.00               $15.00               $210 - $380
Together    llama-3-70b       $0.20               $0.20                $50 - $70        8%
Fireworks   llama-3-70b       $0.15               $0.15                $38 - $52        24%
Alternative providers can save up to 36% on monthly costs. Run peakinfer pricing for detailed comparisons.
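The monthly figures are straightforward arithmetic over per-token prices. A back-of-the-envelope sketch, using a hypothetical workload and the llama-3-70b prices from the table above:

```python
def monthly_cost(input_tokens_m, output_tokens_m, in_price, out_price):
    """Monthly cost in dollars, given traffic and $/1M-token prices."""
    return input_tokens_m * in_price + output_tokens_m * out_price

# Hypothetical workload: 100M input + 20M output tokens per month.
together = monthly_cost(100, 20, 0.20, 0.20)   # $24.00
fireworks = monthly_cost(100, 20, 0.15, 0.15)  # $18.00

savings = 1 - fireworks / together
print(f"Fireworks saves {savings:.0%} vs Together")  # 25%
```

Because pricing is linear in token volume, the percentage saved is independent of workload size; only the input/output mix matters when the two prices differ.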

Built for AI Engineering Teams

Codebase Audits

Understand exactly how LLMs are used across your entire codebase. Find all inference callsites, routing logic, and optimization opportunities.

  • Complete inference inventory
  • Hotspot identification
  • Pattern detection

Cost Optimization

Reduce inference costs by 20-40% through intelligent model selection, vendor comparison, and optimization suggestions.

  • Monthly cost estimates
  • Alternative provider suggestions
  • Batching and caching recommendations

PR Reviews

GitHub Actions integration shows StackMap changes, pricing deltas, and optimization opportunities in every PR.

  • Automatic PR comments
  • Cost regression detection
  • Team-wide visibility
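A PR-review setup along these lines can be sketched as a GitHub Actions workflow. The workflow shape below is an assumption for illustration; only the npm install and peakinfer analyze commands come from this page:

```yaml
# Sketch: run PeakInfer on every pull request.
name: peakinfer
on: [pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm install -g peakinfer
      - run: peakinfer analyze .
```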

Architecture Planning

Make informed decisions about vendors, runtimes, and hardware. Compare options side-by-side with real pricing data.

  • Vendor-agnostic comparisons
  • Hardware cost modeling
  • Runtime efficiency analysis

Ready to Optimize Your Inference?

Get started with PeakInfer in seconds. No signup required. No cloud accounts. Just one command.

$ npm install -g peakinfer
$ peakinfer analyze .