AI Gateway with Semantic Caching

Cut your LLM API costs by 40-70%.

SemanticGuard is an AI gateway with intelligent caching for OpenAI, Anthropic, and Google. One line of code. See savings in minutes.

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";
import { withSemanticGuard } from "@semanticguard/ai-sdk";

const openai = createOpenAI({
  apiKey: "sk-...",
  fetch: withSemanticGuard({ gatewayUrl: "https://semanticguard.dev" }),
});

// All calls now cached + tracked automatically
const result = await generateText({
  model: openai("gpt-4o"),
  prompt: "Summarize this document...",
});

Know exactly what you spend

Real-time cost dashboard by model, project, and request. Shadow Mode shows savings you are leaving on the table before you enable caching.

Save 40-70% automatically

Intelligent multi-layer caching that understands when two requests mean the same thing. Cache hits return in under 50ms. Zero false positives by design.

Works with any provider

OpenAI, Anthropic, Google, Azure OpenAI, AWS Bedrock, Mistral, and any AI SDK provider. One fetch wrapper. No vendor lock-in.
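
For example, wiring Anthropic and Google through the gateway looks just like the OpenAI snippet above, using the standard AI SDK provider packages (the keys are placeholders):

import { createAnthropic } from "@ai-sdk/anthropic";
import { createGoogleGenerativeAI } from "@ai-sdk/google";
import { withSemanticGuard } from "@semanticguard/ai-sdk";

// Same wrapper, different providers: each accepts a custom fetch.
const anthropic = createAnthropic({
  apiKey: "sk-ant-...",
  fetch: withSemanticGuard({ gatewayUrl: "https://semanticguard.dev" }),
});

const google = createGoogleGenerativeAI({
  apiKey: "...",
  fetch: withSemanticGuard({ gatewayUrl: "https://semanticguard.dev" }),
});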

What are you building?

Domain-aware caching that understands your use case.

AI-Powered API

50-80% savings

Your API handles 100K requests/day. 60% are semantically identical across users. Cache hits return in 4ms instead of 2 seconds.

fetch: withSemanticGuard({ domain: 'general' })

Chatbot or Copilot

40-70% savings

Users ask 'how do I reset my password?' 50 different ways. Cache the first answer, serve the rest instantly.

fetch: withSemanticGuard({ domain: 'customer-support' })

Agentic Workflows

60-85% savings

Your agent generates nearly the same output on every run: same template, different parameters. Cache the template.

fetch: withSemanticGuard({ domain: 'agentic' })

RAG Pipeline

30-50% savings

Same question against the same documents? Serve the cached synthesis instead of re-computing embeddings and LLM calls.

fetch: withSemanticGuard({ domain: 'rag' })

Content Generation

40-60% savings

Same email template, different recipient. Same social post, different product. Cache the template, swap the variables.

fetch: withSemanticGuard({ domain: 'content-gen' })

Dev Tools

50-70% savings

Your code review bot says 'add error handling' 200 times a week. Cache repeating patterns across similar code.

fetch: withSemanticGuard({ domain: 'dev-tools' })
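
The domain option presumably combines with the gateway config from the quick start. Assuming both options share one object (the snippets above show them separately, so this is an assumption), a full setup might look like:

import { createOpenAI } from "@ai-sdk/openai";
import { withSemanticGuard } from "@semanticguard/ai-sdk";

const openai = createOpenAI({
  apiKey: "sk-...",
  fetch: withSemanticGuard({
    gatewayUrl: "https://semanticguard.dev",
    domain: "customer-support", // pick the domain that matches your use case
  }),
});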

How It Works

1. Add the wrapper

Add fetch: withSemanticGuard() to your AI SDK provider config. One line, any provider.

2. See what you spend

Shadow Mode runs immediately. See cost per request, per model, and exactly how much caching would save you.

3. Turn on caching

Flip the switch when you are ready. Cache hits return in under 50ms. Your dashboard shows real savings in real time.

Built for production

Your keys, your data

API keys pass through and are never stored. Prompts are logged only if you opt in.

Fail-open design

If the cache is down, requests go straight to your provider. Zero downtime risk.
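
Conceptually, fail-open is just a fetch wrapper with a fallback. A minimal sketch of the idea, not SemanticGuard's actual code:

// Conceptual sketch, not SemanticGuard's implementation: if the gateway
// path throws (network error, timeout), the request falls through to the
// provider directly, so a cache outage never blocks traffic.
function failOpen(gatewayFetch: typeof fetch): typeof fetch {
  return async (input, init) => {
    try {
      return await gatewayFetch(input, init); // normal path: via the gateway
    } catch {
      return fetch(input, init); // gateway down: straight to the provider
    }
  };
}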

Same-vendor routing

Auxiliary calls use your vendor's cheapest model. Data never leaves your vendor.

Edge runtime

Runs on Vercel Edge. Sub-50ms cache hits. No cold starts.

Pricing

Start free with Shadow Mode. See your savings before you commit.

Free

$0

10K requests/mo

  • Shadow Mode shows potential savings
  • Exact match cache
  • Cost analytics dashboard
  • MCP server for AI agents
Get started
Pro (Popular)

$49/mo

50K included, then $0.50/1K

  • Full semantic cache
  • Advanced pattern matching
  • Advanced analytics + projections
  • Up to 500K requests/mo
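For example, 150K requests in a month comes to $49 + (100 × $0.50) = $99.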
Start free, upgrade later

Enterprise

15% of documented savings

  • $500/mo minimum commitment
  • Unlimited requests
  • We win when you save
  • AWS/GCP marketplace billing
Talk to sales

FAQ

How can caching save 40-70% when prompts are rarely identical?

SemanticGuard uses multiple caching strategies that go far beyond simple key-value matching. It understands prompt structure, detects reusable patterns across requests, and verifies every cache match before serving. The result: 40-70% of LLM calls in production apps can be served from cache with zero compromise on response quality.
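
To make that concrete, here is a toy sketch of one such layer, embedding-based matching with a similarity threshold. The embed function and the 0.95 cutoff are illustrative assumptions, not SemanticGuard's actual pipeline:

// Toy sketch of embedding-based matching, NOT SemanticGuard's pipeline.
// `embed` is a hypothetical function returning a unit-length vector.
declare function embed(text: string): Promise<number[]>;

type CacheEntry = { vector: number[]; response: string };
const entries: CacheEntry[] = [];

// Cosine similarity; for unit vectors this reduces to the dot product.
const similarity = (a: number[], b: number[]) =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

async function lookup(prompt: string): Promise<string | null> {
  const v = await embed(prompt);
  for (const entry of entries) {
    // The 0.95 threshold is illustrative; a production system would also
    // verify the match before serving it, as described above.
    if (similarity(v, entry.vector) >= 0.95) return entry.response;
  }
  return null; // miss: call the provider, then store { vector: v, response }
}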

What about false positives, where a user gets the wrong cached answer?

Zero false positives is our design goal. We use multiple verification layers before serving a cached response, and you can tune the matching sensitivity from the dashboard. You are always in control of what gets cached and when.

What happens to my API keys?

Your upstream API keys are passed through to the provider at request time and never stored in plaintext. We store only a one-way hash for identification. Your data stays with your chosen vendor.
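
For illustration only: a one-way hash of this kind can be computed with Web Crypto. SHA-256 here is an assumption, not necessarily the hash we use:

// Illustrative: derive a non-reversible identifier from an API key.
// SHA-256 via Web Crypto is an assumption; the text above only
// promises a one-way hash.
async function keyFingerprint(apiKey: string): Promise<string> {
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(apiKey)
  );
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join(""); // hex fingerprint, stored instead of the key itself
}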

How do I know it will actually save me money?

Start with Shadow Mode (the free tier default). It logs every request and shows what you would save if caching were enabled. No cached responses are served until you explicitly turn on caching.

How hard is it to integrate?

One line of code. Add fetch: withSemanticGuard() to your AI SDK provider config. No API format changes, no vendor lock-in. Works with any provider that accepts a custom fetch function.