An LLM Gateway abstracts providers (OpenAI, Anthropic, local models) and adds caching, rate limiting, and fallback

Your app calls openai.chat.completions.create() directly. Tomorrow, OpenAI has an outage or raises prices 10x. An LLM Gateway abstracts away the provider and adds resilience.
```typescript
// ❌ Coupled to OpenAI
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function chat(message: string) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: message }]
  });
  return response.choices[0].message.content;
}
```
If you want to switch to Anthropic Claude, you have to rewrite everything.
```typescript
// Gateway abstracts providers behind one interface
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface LLMGateway {
  chat(messages: Message[]): Promise<string>;
}

// Assumes `openai` and `anthropic` SDK clients are instantiated elsewhere
class OpenAIProvider implements LLMGateway {
  async chat(messages: Message[]): Promise<string> {
    const response = await openai.chat.completions.create({
      model: 'gpt-4',
      messages
    });
    return response.choices[0].message.content ?? '';
  }
}

class AnthropicProvider implements LLMGateway {
  async chat(messages: Message[]): Promise<string> {
    const response = await anthropic.messages.create({
      model: 'claude-3-opus',
      max_tokens: 1024, // required by the Anthropic Messages API
      messages
    });
    return response.content[0].text;
  }
}

// The app depends only on the generic interface
const gateway: LLMGateway = new OpenAIProvider();
const answer = await gateway.chat([{ role: 'user', content: 'Hello' }]);
```
```typescript
class CachingGateway implements LLMGateway {
  private cache = new Map<string, string>();

  constructor(private provider: LLMGateway) {}

  async chat(messages: Message[]): Promise<string> {
    const key = JSON.stringify(messages);
    if (this.cache.has(key)) {
      return this.cache.get(key)!;
    }
    const response = await this.provider.chat(messages);
    this.cache.set(key, response);
    return response;
  }
}
```
```typescript
class RateLimitedGateway implements LLMGateway {
  constructor(
    private provider: LLMGateway,
    private bucket: TokenBucket
  ) {}

  async chat(messages: Message[]): Promise<string> {
    await this.bucket.consume(1);
    return this.provider.chat(messages);
  }
}
```
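The `TokenBucket` used above is never defined in the text. A minimal sketch, assuming only the `consume(n)` signature and the `(capacity, refillPerSecond)` constructor implied by the usage below, could look like this:

```typescript
// Minimal token-bucket sketch: holds up to `capacity` tokens,
// refilled continuously at `refillPerSecond` tokens per second.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity; // start full
  }

  // Resolves once `count` tokens are available, waiting if necessary
  async consume(count: number): Promise<void> {
    while (true) {
      this.refill();
      if (this.tokens >= count) {
        this.tokens -= count;
        return;
      }
      // Sleep roughly long enough for the missing tokens to refill
      const waitMs = ((count - this.tokens) / this.refillPerSecond) * 1000;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }

  private refill(): void {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = now;
  }
}
```

A production version would also need to handle concurrent waiters fairly, but this is enough to make the `RateLimitedGateway` above runnable.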
```typescript
class FallbackGateway implements LLMGateway {
  constructor(
    private primary: LLMGateway,
    private fallback: LLMGateway
  ) {}

  async chat(messages: Message[]): Promise<string> {
    try {
      return await this.primary.chat(messages);
    } catch (error) {
      console.warn('Primary failed, using fallback');
      return await this.fallback.chat(messages);
    }
  }
}
```
```typescript
class LoadBalancedGateway implements LLMGateway {
  private currentIndex = 0;

  constructor(private providers: LLMGateway[]) {}

  async chat(messages: Message[]): Promise<string> {
    const provider = this.providers[this.currentIndex];
    this.currentIndex = (this.currentIndex + 1) % this.providers.length;
    return provider.chat(messages);
  }
}
```
```typescript
// Combining features
const gateway = new CachingGateway(
  new RateLimitedGateway(
    new FallbackGateway(
      new OpenAIProvider(),
      new AnthropicProvider()
    ),
    new TokenBucket(100, 10)
  )
);
// 1. Check the cache
// 2. Apply the rate limit
// 3. Try OpenAI
// 4. On failure, fall back to Anthropic
```
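The ordering of those decorators is observable. A sketch with stub providers instead of real SDK clients (the `CachingGateway` and `FallbackGateway` classes are restated here only so the snippet is self-contained; `StubProvider` is hypothetical) shows a failing primary being covered by the fallback, and the second identical request never reaching a provider at all:

```typescript
type Message = { role: string; content: string };

interface LLMGateway {
  chat(messages: Message[]): Promise<string>;
}

// Hypothetical in-memory provider: counts calls, optionally always fails
class StubProvider implements LLMGateway {
  calls = 0;
  constructor(private reply: string, private failing = false) {}
  async chat(_: Message[]): Promise<string> {
    this.calls++;
    if (this.failing) throw new Error('provider down');
    return this.reply;
  }
}

class FallbackGateway implements LLMGateway {
  constructor(private primary: LLMGateway, private fallback: LLMGateway) {}
  async chat(messages: Message[]): Promise<string> {
    try {
      return await this.primary.chat(messages);
    } catch {
      return this.fallback.chat(messages);
    }
  }
}

class CachingGateway implements LLMGateway {
  private cache = new Map<string, string>();
  constructor(private provider: LLMGateway) {}
  async chat(messages: Message[]): Promise<string> {
    const key = JSON.stringify(messages);
    const hit = this.cache.get(key);
    if (hit !== undefined) return hit;
    const response = await this.provider.chat(messages);
    this.cache.set(key, response);
    return response;
  }
}

(async () => {
  const primary = new StubProvider('from-primary', true); // always fails
  const fallback = new StubProvider('from-fallback');
  const gateway = new CachingGateway(new FallbackGateway(primary, fallback));

  const msgs: Message[] = [{ role: 'user', content: 'Hello' }];
  await gateway.chat(msgs); // primary fails → fallback answers, result cached
  await gateway.chat(msgs); // served from cache, no provider call
  console.log(fallback.calls); // the fallback was only called once
})();
```

Because caching sits outermost, repeated requests cost nothing; if you instead wanted cache misses to be rate limited per provider, you would nest the decorators the other way around.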
Alternatively, libraries like LiteLLM provide this abstraction out of the box:

```python
from litellm import completion

# Abstracts providers automatically
response = completion(
    model="gpt-4",  # or "claude-3-opus", "gemini-pro"
    messages=[{"role": "user", "content": "Hello"}]
)
```
```typescript
import { ChatOpenAI, ChatAnthropic } from 'langchain/chat_models';

const model = new ChatOpenAI(); // easily swappable for new ChatAnthropic()
const response = await model.call([{ role: 'user', content: 'Hello' }]);
```
An LLM Gateway decouples your app from vendors and adds caching, rate limiting, and resilience. For production, it is essential.
If you call OpenAI directly with no abstraction, you are one API change away from rewriting everything.