AI Gateway with Semantic Caching
SemanticGuard is an AI gateway with intelligent caching for OpenAI, Anthropic, and Google. One line of code. See savings in minutes.
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";
import { withSemanticGuard } from "@semanticguard/ai-sdk";

const openai = createOpenAI({
  apiKey: "sk-...",
  fetch: withSemanticGuard({ gatewayUrl: "https://semanticguard.dev" }),
});

// All calls now cached + tracked automatically
const result = await generateText({
  model: openai("gpt-4o"),
  prompt: "Summarize this document...",
});
Real-time cost dashboard by model, project, and request. Shadow Mode shows savings you are leaving on the table before you enable caching.
Intelligent multi-layer caching that understands when two requests mean the same thing. Cache hits return in under 50ms. Zero false positives by design.
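At its core, semantic caching means embedding the prompt and comparing it against stored entries by similarity rather than exact string match. A minimal sketch of the idea in TypeScript (the toy character-bigram embedding and the 0.95 threshold are illustrative stand-ins, not SemanticGuard's actual model or tuning):

```typescript
// Semantic cache sketch: embed each prompt, compare by cosine similarity,
// and serve a cached response when similarity clears a threshold.
type CacheEntry = { embedding: number[]; response: string };

function embed(text: string): number[] {
  // Toy embedding: character-bigram counts hashed into 64 buckets.
  // A real system would use a learned embedding model instead.
  const vec = new Array(64).fill(0);
  const t = text.toLowerCase();
  for (let i = 0; i < t.length - 1; i++) {
    const h = (t.charCodeAt(i) * 31 + t.charCodeAt(i + 1)) % 64;
    vec[h] += 1;
  }
  return vec;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}

class SemanticCache {
  private entries: CacheEntry[] = [];
  constructor(private threshold = 0.95) {}

  // Return the best cached response above the threshold, if any.
  get(prompt: string): string | undefined {
    const e = embed(prompt);
    let best: CacheEntry | undefined;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosine(e, entry.embedding);
      if (score >= bestScore) {
        bestScore = score;
        best = entry;
      }
    }
    return best?.response;
  }

  set(prompt: string, response: string): void {
    this.entries.push({ embedding: embed(prompt), response });
  }
}
```

A high threshold is what "zero false positives by design" implies: when in doubt, miss the cache and pay for a fresh completion rather than serve a wrong answer.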
OpenAI, Anthropic, Google, Azure OpenAI, AWS Bedrock, Mistral, and any AI SDK provider. One fetch wrapper. No vendor lock-in.
Domain-aware caching that understands your use case.
Your API handles 100K requests/day. 60% are semantically identical across users. Cache hits return in 4ms instead of 2 seconds.
fetch: withSemanticGuard({ domain: 'general' })

Users ask 'how do I reset my password?' 50 different ways. Cache the first answer, serve the rest instantly.
fetch: withSemanticGuard({ domain: 'customer-support' })

Your agent generates the same output for every user. Same template, different parameters. Cache the template.
fetch: withSemanticGuard({ domain: 'agentic' })

Same question against the same documents? Serve the cached synthesis instead of re-computing embeddings and LLM calls.
fetch: withSemanticGuard({ domain: 'rag' })

Same email template, different recipient. Same social post, different product. Cache the template, swap the variables.
fetch: withSemanticGuard({ domain: 'content-gen' })

Your code review bot says 'add error handling' 200 times a week. Cache repeating patterns across similar code.
fetch: withSemanticGuard({ domain: 'dev-tools' })

Add fetch: withSemanticGuard() to your AI SDK provider config. One line, any provider.
Shadow Mode runs immediately. See cost per request, per model, and exactly how much caching would save you.
Flip the switch when you are ready. Cache hits return in under 50ms. Your dashboard shows real savings in real time.
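The shadow-then-enable flow can be pictured as a wrapper that always forwards to the provider while tallying would-be cache hits. A sketch under assumed names (`shadowMode` and `wouldHitCache` are illustrative, not SemanticGuard's API):

```typescript
// Shadow-mode sketch: every request still hits the provider; the wrapper
// only records whether a cache lookup would have answered it, so the
// dashboard can report potential savings with zero behavior change.
type ShadowStats = { requests: number; wouldHaveHit: number };

function shadowMode(
  wouldHitCache: (prompt: string) => boolean,
  stats: ShadowStats,
) {
  return async (
    prompt: string,
    callProvider: (p: string) => Promise<string>,
  ): Promise<string> => {
    stats.requests += 1;
    if (wouldHitCache(prompt)) stats.wouldHaveHit += 1; // tally only, never serve
    return callProvider(prompt); // always forward to the real provider
  };
}
```

Flipping the switch then means serving the cached response on a hit instead of merely counting it.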
Your keys, your data
API keys pass through and are never stored. Prompts are logged only if you opt in.
Fail-open design
If cache is down, requests go straight to your provider. Zero downtime risk.
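Fail-open is a standard pattern: try the gateway with a short timeout, and on any error send the original request untouched. A sketch of the shape (the function name, the 250 ms timeout, and the single-gateway-URL routing are assumptions for illustration, not SemanticGuard's internals):

```typescript
// Fail-open fetch wrapper: gateway problems can never block a request.
type Fetch = typeof fetch;

function failOpenFetch(gatewayUrl: string, baseFetch: Fetch = fetch): Fetch {
  return (async (input: any, init?: any) => {
    try {
      // Try the gateway first, with a short timeout so a slow or
      // unreachable gateway cannot stall the request.
      const res = await baseFetch(gatewayUrl, {
        ...init,
        signal: AbortSignal.timeout(250),
      });
      if (res.ok) return res;
    } catch {
      // Gateway down or timed out: fall through.
    }
    // Fail open: send the original request straight to the provider.
    return baseFetch(input, init);
  }) as Fetch;
}
```

The key property is that the worst case for your users is the latency they already had without caching.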
Same-vendor routing
Auxiliary calls use your vendor's cheapest model. Data never leaves your vendor.
Edge runtime
Runs on Vercel Edge. Sub-50ms cache hits. No cold starts.
Start free with Shadow Mode. See your savings before you commit.
$0 for 10K requests/mo
$49/mo for 50K requests included, then $0.50 per additional 1K
15% of documented savings