What We're Building
This guide walks you through adding AI capabilities to an existing SaaS product. Unlike building AI-first applications from scratch, integrating AI into established products requires careful consideration of existing architecture, user workflows, and business models.
We'll cover the most common AI integration patterns that SaaS companies are adopting:
Streaming Chat
Real-time AI assistant with context awareness
Document Analysis
Extract insights from uploaded files
Content Generation
AI-powered reports and summaries
Usage Billing
Token-based metering and limits
By the end of this guide, you'll have production-ready patterns for integrating multiple AI providers, handling rate limits, streaming responses, and tracking AI usage for billing purposes.
Prerequisites
Before starting this guide, make sure you have the following:
- Existing SaaS application - A working product with authentication, database, and API infrastructure
- API keys - Claude API key (Anthropic) and/or OpenAI API key
- Node.js 18+ and experience with TypeScript
- PostgreSQL database for storing usage data and conversation history
- Redis instance for rate limiting and job queues (can use Upstash for serverless)
- Basic understanding of streaming responses and Server-Sent Events (SSE)
We recommend using both Claude and OpenAI APIs. This provides redundancy if one service has issues, and allows you to use the best model for each task (e.g., Claude for nuanced conversations, GPT-4 for structured outputs).
Tech Stack Specification
Here's the recommended technology stack for AI integration into existing SaaS products:
| Layer | Technology | Why This Choice |
|---|---|---|
| AI Integration | Claude API, OpenAI | Core AI capabilities with provider redundancy and model flexibility |
| Caching | Redis / Upstash | Response caching to reduce API costs and improve latency |
| Queue | BullMQ / Inngest | Background processing for long-running AI tasks without timeouts |
| Rate Limiting | Upstash | Usage management with sliding window algorithms per tier |
| Database | PostgreSQL | Usage tracking, conversation history, and cost analytics |
AI Integration Architecture
Here's how the AI components integrate with a typical SaaS architecture:
┌──────────────────────────────────────────────────────────────────────┐
│                        CLIENT (React/Next.js)                        │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐           │
│   │ Chat Widget  │    │  Doc Upload  │    │  AI Reports  │           │
│   └──────┬───────┘    └──────┬───────┘    └──────┬───────┘           │
│          │                   │                   │                   │
│          └───────────────────┼───────────────────┘                   │
│                              │ SSE / REST                            │
└──────────────────────────────┼───────────────────────────────────────┘
                               │
┌──────────────────────────────┼───────────────────────────────────────┐
│                        API LAYER (Node.js)                           │
│   ┌──────────────────────────────────────────────────────────────┐   │
│   │                     Rate Limiter (Redis)                     │   │
│   │      Per-user limits │ Per-org limits │ Global limits        │   │
│   └──────────────────────────────────────────────────────────────┘   │
│          │                   │                   │                   │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐           │
│   │  AI Router   │    │ Usage Tracker│    │  Job Queue   │           │
│   │  Claude/GPT  │    │ Tokens/Costs │    │    BullMQ    │           │
│   └──────┬───────┘    └──────┬───────┘    └──────┬───────┘           │
└──────────┼───────────────────┼───────────────────┼───────────────────┘
           │                   │                   │
┌──────────┼───────────────────┼───────────────────┼───────────────────┐
│          ▼                   ▼                   ▼                   │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐           │
│   │  Claude API  │    │  PostgreSQL  │    │    Redis     │           │
│   │  OpenAI API  │    │  Usage logs  │    │ Queue + Rate │           │
│   └──────────────┘    └──────────────┘    └──────────────┘           │
│                             DATA LAYER                               │
└──────────────────────────────────────────────────────────────────────┘
AI Agent Workflow
Here's how to leverage AI tools throughout this build to maximize productivity:
Project Setup with Claude Code
Use Claude Code to scaffold the AI integration layer based on your existing codebase structure:
# Example prompt for Claude Code
"Analyze my existing SaaS codebase and create an AI integration
layer that includes:
1. A unified AI client that supports both Claude and OpenAI
2. Rate limiting middleware using Redis
3. Usage tracking with PostgreSQL
4. Streaming response handlers for SSE
Follow my existing patterns for error handling and authentication.
Use TypeScript and match my current code style."
UI Generation with v0.dev
Generate chat interfaces and AI-powered UI components quickly:
When prompting v0.dev, describe your existing design system colors and patterns. Ask for "a chat widget that matches a dark SaaS dashboard with purple accent colors" to get components that integrate seamlessly.
Development with Cursor
Cursor excels at understanding API documentation. Paste Claude or OpenAI API docs directly and ask Cursor to generate type-safe client wrappers and error handling.
Step-by-Step Build Guide
Phase 1: AI Architecture Patterns for SaaS
Before writing code, establish your AI architecture patterns and database schema. These patterns apply to any SaaS product adding AI features, whether you're building chat, analysis, or generation capabilities:
-- AI usage tracking schema
CREATE TABLE ai_conversations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id),
org_id UUID REFERENCES organizations(id),
title VARCHAR(255),
context JSONB, -- Store app-specific context
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE ai_messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID REFERENCES ai_conversations(id),
role VARCHAR(20) NOT NULL, -- 'user', 'assistant', 'system'
content TEXT NOT NULL,
model VARCHAR(50), -- 'claude-3-opus', 'gpt-4', etc.
tokens_input INTEGER,
tokens_output INTEGER,
cost_cents DECIMAL(10,4),
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE ai_usage_daily (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
org_id UUID REFERENCES organizations(id),
date DATE NOT NULL,
total_tokens BIGINT DEFAULT 0,
total_requests INTEGER DEFAULT 0,
total_cost_cents DECIMAL(10,2) DEFAULT 0,
UNIQUE(org_id, date)
);
CREATE INDEX idx_ai_messages_conversation ON ai_messages(conversation_id);
CREATE INDEX idx_ai_usage_org_date ON ai_usage_daily(org_id, date);
Phase 2: Claude API Integration Setup
Start by initializing both provider clients and a shared model configuration. This thin layer makes it easy to switch models or add providers later; Phase 3 builds the unified client with automatic failover on top of it:
// lib/ai/providers.ts
import Anthropic from '@anthropic-ai/sdk';
import OpenAI from 'openai';
// Initialize API clients
export const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY!,
});
export const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY!,
});
// Model configurations for different use cases
export const AI_MODELS = {
// Claude models - great for nuanced conversation
claude: {
fast: 'claude-3-haiku-20240307',
balanced: 'claude-3-sonnet-20240229',
powerful: 'claude-3-opus-20240229',
},
// OpenAI models - great for structured output
openai: {
fast: 'gpt-4o-mini',
balanced: 'gpt-4o',
powerful: 'gpt-4-turbo',
},
} as const;
// Helper to select model based on task complexity
export function selectModel(
provider: 'claude' | 'openai',
complexity: 'fast' | 'balanced' | 'powerful'
): string {
return AI_MODELS[provider][complexity];
}
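As a concrete illustration of how selectModel might be driven, here is a naive routing heuristic: keyword cues first, then prompt length. The estimateComplexity function is an assumption for illustration, not part of the API above, and the model tables are reproduced from providers.ts so the snippet stands alone:

```typescript
const AI_MODELS = {
  claude: {
    fast: 'claude-3-haiku-20240307',
    balanced: 'claude-3-sonnet-20240229',
    powerful: 'claude-3-opus-20240229',
  },
  openai: {
    fast: 'gpt-4o-mini',
    balanced: 'gpt-4o',
    powerful: 'gpt-4-turbo',
  },
} as const;

type Complexity = 'fast' | 'balanced' | 'powerful';

// Guess task complexity from the prompt itself (illustrative heuristic)
function estimateComplexity(prompt: string): Complexity {
  if (/analy[sz]e|compare|reason|step[- ]by[- ]step/i.test(prompt)) return 'powerful';
  if (prompt.length < 200) return 'fast';
  return 'balanced';
}

function selectModel(
  provider: 'claude' | 'openai',
  complexity: Complexity
): string {
  return AI_MODELS[provider][complexity];
}

// A short, simple prompt routes to the cheap tier
const model = selectModel('claude', estimateComplexity('Summarize this sentence.'));
// model === 'claude-3-haiku-20240307'
```

In production you would replace the regex heuristic with something informed by your actual workloads, but the shape of the routing stays the same.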
Phase 3: Streaming Responses Implementation
Streaming is essential for responsive AI chat. Users see tokens as they arrive instead of waiting for the complete response. This unified client supports both Claude and OpenAI with automatic fallback:
// lib/ai/unified-client.ts
import Anthropic from '@anthropic-ai/sdk';
import OpenAI from 'openai';
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
export interface ChatMessage {
role: 'user' | 'assistant' | 'system';
content: string;
}
export interface StreamCallbacks {
onToken: (token: string) => void;
onComplete: (usage: { input: number; output: number }) => void;
onError: (error: Error) => void;
}
export async function streamChat(
messages: ChatMessage[],
options: {
model?: string;
systemPrompt?: string;
maxTokens?: number;
},
callbacks: StreamCallbacks
): Promise<void> {
const model = options.model || 'claude-3-sonnet-20240229';
const isClaude = model.startsWith('claude');
try {
if (isClaude) {
await streamClaude(messages, options, callbacks);
} else {
await streamOpenAI(messages, options, callbacks);
}
} catch (error) {
// Fallback to alternate provider on failure
console.error(`Primary AI failed, attempting fallback`, error);
try {
if (isClaude) {
await streamOpenAI(messages, { ...options, model: 'gpt-4o' }, callbacks);
} else {
await streamClaude(messages, { ...options, model: 'claude-3-sonnet-20240229' }, callbacks);
}
} catch (fallbackError) {
callbacks.onError(fallbackError as Error);
}
}
}
async function streamClaude(
messages: ChatMessage[],
options: { systemPrompt?: string; maxTokens?: number; model?: string },
callbacks: StreamCallbacks
): Promise<void> {
const stream = await anthropic.messages.stream({
model: options.model || 'claude-3-sonnet-20240229',
max_tokens: options.maxTokens || 1024,
system: options.systemPrompt,
messages: messages.filter(m => m.role !== 'system').map(m => ({
role: m.role as 'user' | 'assistant',
content: m.content,
})),
});
for await (const event of stream) {
if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
callbacks.onToken(event.delta.text);
}
}
const finalMessage = await stream.finalMessage();
callbacks.onComplete({
input: finalMessage.usage.input_tokens,
output: finalMessage.usage.output_tokens,
});
}
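streamChat above calls a streamOpenAI counterpart that is never shown. Here is one possible sketch. Two deliberate deviations from the call sites above, made so the snippet is self-contained: the client is injected as a parameter (typed structurally, so no SDK import is needed), and token usage is read from the final chunk via `stream_options: { include_usage: true }`. In the app you would pass the module-level `openai` instance and keep the original three-argument signature.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

interface StreamCallbacks {
  onToken: (token: string) => void;
  onComplete: (usage: { input: number; output: number }) => void;
  onError: (error: Error) => void;
}

// Minimal structural slice of the OpenAI SDK surface this sketch touches
interface OpenAIChunk {
  choices: { delta: { content?: string | null } }[];
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

interface OpenAILike {
  chat: {
    completions: {
      create(args: {
        model: string;
        max_tokens: number;
        messages: ChatMessage[];
        stream: true;
        stream_options: { include_usage: boolean };
      }): Promise<AsyncIterable<OpenAIChunk>>;
    };
  };
}

async function streamOpenAI(
  client: OpenAILike,
  messages: ChatMessage[],
  options: { systemPrompt?: string; maxTokens?: number; model?: string },
  callbacks: StreamCallbacks
): Promise<void> {
  // Unlike Claude, OpenAI takes the system prompt as an ordinary leading message
  const fullMessages: ChatMessage[] = options.systemPrompt
    ? [{ role: 'system', content: options.systemPrompt }, ...messages]
    : messages;

  const stream = await client.chat.completions.create({
    model: options.model || 'gpt-4o',
    max_tokens: options.maxTokens || 1024,
    messages: fullMessages,
    stream: true,
    stream_options: { include_usage: true }, // final chunk reports token usage
  });

  let usage = { input: 0, output: 0 };
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content;
    if (token) callbacks.onToken(token);
    if (chunk.usage) {
      usage = { input: chunk.usage.prompt_tokens, output: chunk.usage.completion_tokens };
    }
  }
  callbacks.onComplete(usage);
}
```

The structural typing also makes the function trivially testable with a fake client, which is useful for exercising the fallback path in streamChat without burning tokens.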
Phase 4: Usage Tracking and Rate Limiting
Implement rate limiting before exposing AI endpoints. This protects your costs and ensures fair usage across customers. Use Upstash for serverless-friendly rate limiting:
// lib/ai/rate-limiter.ts
import { Redis } from '@upstash/redis';
import { Ratelimit } from '@upstash/ratelimit';
const redis = new Redis({
url: process.env.UPSTASH_REDIS_URL!,
token: process.env.UPSTASH_REDIS_TOKEN!,
});
// Different limits for different subscription tiers
export const rateLimiters = {
// Free tier: 10 requests per minute
free: new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(10, '1 m'),
prefix: 'ai:free',
}),
// Pro tier: 60 requests per minute
pro: new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(60, '1 m'),
prefix: 'ai:pro',
}),
// Enterprise: 200 requests per minute
enterprise: new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(200, '1 m'),
prefix: 'ai:enterprise',
}),
};
export async function checkRateLimit(
orgId: string,
tier: 'free' | 'pro' | 'enterprise'
): Promise<{ success: boolean; remaining: number }> {
const limiter = rateLimiters[tier];
const { success, remaining } = await limiter.limit(orgId);
return { success, remaining };
}
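To show how a route might consume checkRateLimit, here is a small sketch. The pure limitToResponse helper maps a limiter result onto an HTTP status and headers; the Retry-After value of 60 seconds is an assumption matching the one-minute sliding windows above, and the commented lines show the wiring into a Next.js-style handler (getOrgTier is a hypothetical lookup):

```typescript
interface LimitResult {
  success: boolean;
  remaining: number;
}

// Pure decision helper: translate a limiter result into response metadata
function limitToResponse(result: LimitResult): {
  status: number;
  headers: Record<string, string>;
} {
  if (!result.success) {
    // 60s Retry-After mirrors the 1-minute sliding window configured above
    return {
      status: 429,
      headers: { 'Retry-After': '60', 'X-RateLimit-Remaining': '0' },
    };
  }
  return {
    status: 200,
    headers: { 'X-RateLimit-Remaining': String(result.remaining) },
  };
}

// In a route handler (sketch):
// const { success, remaining } = await checkRateLimit(orgId, await getOrgTier(orgId));
// const { status, headers } = limitToResponse({ success, remaining });
// if (status === 429) return new Response('Rate limit exceeded', { status, headers });
```

Surfacing the remaining quota in a header lets the chat widget warn users before they hit the wall instead of failing mid-conversation.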
Track usage for billing and analytics with a dedicated usage tracker:
// lib/ai/usage-tracker.ts
import { db } from '@/lib/db';
interface UsageRecord {
orgId: string;
userId: string;
model: string;
tokensInput: number;
tokensOutput: number;
}
// Model pricing per 1K tokens (in cents)
const PRICING = {
'claude-3-opus': { input: 1.5, output: 7.5 },
'claude-3-sonnet': { input: 0.3, output: 1.5 },
'claude-3-haiku': { input: 0.025, output: 0.125 },
'gpt-4-turbo': { input: 1.0, output: 3.0 },
'gpt-4o': { input: 0.5, output: 1.5 },
};
export async function trackUsage(usage: UsageRecord): Promise<void> {
// Match full model IDs (e.g. 'claude-3-sonnet-20240229') against pricing key
// prefixes; unknown models are still recorded, at zero cost, rather than
// crashing the tracker on an undefined lookup
const pricingKey = Object.keys(PRICING).find(k => usage.model.startsWith(k));
const pricing = pricingKey
? PRICING[pricingKey as keyof typeof PRICING]
: { input: 0, output: 0 };
const costCents =
(usage.tokensInput / 1000) * pricing.input +
(usage.tokensOutput / 1000) * pricing.output;
// Update daily aggregates (upsert pattern)
await db.query(`
INSERT INTO ai_usage_daily (org_id, date, total_tokens, total_requests, total_cost_cents)
VALUES ($1, CURRENT_DATE, $2, 1, $3)
ON CONFLICT (org_id, date)
DO UPDATE SET
total_tokens = ai_usage_daily.total_tokens + $2,
total_requests = ai_usage_daily.total_requests + 1,
total_cost_cents = ai_usage_daily.total_cost_cents + $3
`, [
usage.orgId,
usage.tokensInput + usage.tokensOutput,
costCents
]);
}
export async function checkMonthlyLimit(
orgId: string,
limitCents: number
): Promise<{ withinLimit: boolean; used: number }> {
const result = await db.query(`
SELECT COALESCE(SUM(total_cost_cents), 0) as total
FROM ai_usage_daily
WHERE org_id = $1
AND date >= DATE_TRUNC('month', CURRENT_DATE)
`, [orgId]);
const used = parseFloat(result.rows[0].total);
return { withinLimit: used < limitCents, used };
}
Phase 5: Background AI Processing
Long-running AI tasks like document analysis or report generation should run in the background. The example below uses BullMQ over Redis; serverless-friendly alternatives like Inngest or QStash follow the same pattern. This avoids timeout issues and provides better UX:
// lib/ai/content-generator.ts
import { Queue, Worker } from 'bullmq';
import { anthropic } from './providers'; // client initialized in Phase 2
const connection = {
host: process.env.REDIS_HOST,
port: parseInt(process.env.REDIS_PORT || '6379')
};
// Queue for long-running AI generation tasks
export const contentQueue = new Queue('ai-content', { connection });
interface ContentJob {
type: 'report' | 'email' | 'summary';
data: Record<string, any>;
orgId: string;
userId: string;
webhookUrl?: string;
}
// Add a content generation job
export async function queueContentGeneration(
job: ContentJob
): Promise<string> {
const result = await contentQueue.add('generate', job, {
attempts: 3,
backoff: { type: 'exponential', delay: 1000 },
});
return result.id!;
}
// Worker to process content generation
const worker = new Worker('ai-content', async (job) => {
const { type, data, orgId, userId, webhookUrl } = job.data as ContentJob;
const templates = {
report: `Generate a detailed report based on: ${JSON.stringify(data)}`,
email: `Write a professional email: ${JSON.stringify(data)}`,
summary: `Create an executive summary: ${JSON.stringify(data)}`,
};
const response = await anthropic.messages.create({
model: 'claude-3-opus-20240229', // Use Opus for quality content
max_tokens: 4096,
messages: [{ role: 'user', content: templates[type] }],
});
const content = response.content[0].type === 'text'
? response.content[0].text
: '';
// Notify via webhook if provided
if (webhookUrl) {
await fetch(webhookUrl, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ jobId: job.id, content, type }),
});
}
return { content, tokensUsed: response.usage };
}, { connection });
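While the worker runs, clients usually poll a status endpoint. BullMQ exposes `Queue.getJob(id)` and `Job.getState()`; the pure helper below collapses BullMQ's state strings into a response shape an API might return (the exact shape is an assumption), with the queue wiring shown as comments:

```typescript
// BullMQ job states collapse into a small API-facing status
type BullState =
  | 'completed' | 'failed' | 'active'
  | 'waiting' | 'delayed' | 'unknown';

function mapJobState(state: BullState): { status: string; done: boolean } {
  switch (state) {
    case 'completed': return { status: 'done', done: true };
    case 'failed': return { status: 'failed', done: true };
    case 'active': return { status: 'processing', done: false };
    default: return { status: 'queued', done: false };
  }
}

// In a status route (sketch, using contentQueue from above):
// const job = await contentQueue.getJob(jobId);
// const state = job ? await job.getState() : 'unknown';
// return Response.json({ jobId, ...mapJobState(state as BullState) });
```

Polling every few seconds is fine at small scale; the webhookUrl path above is the better fit once job volume grows.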
Phase 6: Caching Strategies for Cost Optimization
Caching AI responses is one of the most impactful ways to reduce costs. Use Redis/Upstash to cache identical or similar queries:
// lib/ai/cache.ts - AI Response Caching
import { Redis } from '@upstash/redis';
import crypto from 'crypto';
const redis = new Redis({
url: process.env.UPSTASH_REDIS_URL!,
token: process.env.UPSTASH_REDIS_TOKEN!,
});
// Cache TTL in seconds (24 hours for most responses)
const CACHE_TTL = 86400;
// Generate a cache key from the prompt and context
function generateCacheKey(
prompt: string,
model: string,
context?: string
): string {
const input = `${model}:${prompt}:${context || ''}`;
return `ai:cache:${crypto.createHash('sha256').update(input).digest('hex')}`;
}
// Check cache before calling AI
export async function getCachedResponse(
prompt: string,
model: string,
context?: string
): Promise<string | null> {
const key = generateCacheKey(prompt, model, context);
return await redis.get(key);
}
// Store response in cache
export async function cacheResponse(
prompt: string,
model: string,
response: string,
context?: string,
ttl: number = CACHE_TTL
): Promise<void> {
const key = generateCacheKey(prompt, model, context);
await redis.setex(key, ttl, response);
}
// Wrapper that handles caching automatically
export async function withCache<T>(
cacheKey: { prompt: string; model: string; context?: string },
fetchFn: () => Promise<T>,
options: { ttl?: number; skipCache?: boolean } = {}
): Promise<T & { cached: boolean }> {
if (!options.skipCache) {
const cached = await getCachedResponse(
cacheKey.prompt,
cacheKey.model,
cacheKey.context
);
if (cached) {
return { ...JSON.parse(cached), cached: true };
}
}
const result = await fetchFn();
await cacheResponse(
cacheKey.prompt,
cacheKey.model,
JSON.stringify(result),
cacheKey.context,
options.ttl
);
return { ...result, cached: false };
}
Cache hit rates of 30-50% are common for SaaS products with repeated queries. Combined with model tiering (Haiku for simple tasks, Sonnet for complex), you can reduce AI costs by 60-80%.
Key caching strategies that work for any SaaS:
- Semantic caching: Cache similar queries, not just identical ones. Use embeddings to find semantically similar prompts.
- Context-aware caching: Include relevant context (user tier, date range, etc.) in the cache key for personalized but cacheable responses.
- Tiered TTLs: Use shorter TTLs (1 hour) for dynamic content, longer TTLs (24+ hours) for factual or reference content.
- Cache warming: Pre-generate responses for common queries during off-peak hours.
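One property worth checking for context-aware caching: keys must be deterministic for identical inputs but distinct across contexts, so personalized entries never leak between orgs. This standalone snippet reproduces generateCacheKey from the cache module to verify exactly that:

```typescript
import crypto from 'crypto';

// Reproduced from lib/ai/cache.ts so the check runs standalone
function generateCacheKey(prompt: string, model: string, context?: string): string {
  const input = `${model}:${prompt}:${context || ''}`;
  return `ai:cache:${crypto.createHash('sha256').update(input).digest('hex')}`;
}

const a = generateCacheKey('Summarize Q3 revenue', 'claude-3-haiku-20240307', 'org_123');
const b = generateCacheKey('Summarize Q3 revenue', 'claude-3-haiku-20240307', 'org_123');
const c = generateCacheKey('Summarize Q3 revenue', 'claude-3-haiku-20240307', 'org_456');
// a === b (same inputs), a !== c (different org context)
```

Anything that should differentiate responses, such as user tier or locale, must go into the context string, or two tenants will silently share cached answers.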
Common Issues and Solutions
Here are some common issues you might encounter and how to solve them:
Serverless Timeouts
Serverless platforms cap execution time, which clashes with slow AI responses. On Vercel, Node serverless functions time out after 10 seconds on the Hobby plan (higher limits are configurable on paid plans), while the Edge runtime (export const runtime = 'edge') can keep streaming well beyond that as long as the response starts promptly. Use the Edge runtime for streaming chat, and move heavy content generation to background jobs (see Phase 5).
Context Window Management
When conversations get long, you'll hit context limits. Implement a sliding window or summarization strategy:
// Rough heuristic: ~4 characters per token for English text
function estimateTokens(text: string): number {
return Math.ceil(text.length / 4);
}
// Trim conversation history to fit the context window
function trimConversation(
messages: ChatMessage[],
maxTokens: number = 100000
): ChatMessage[] {
let tokenCount = 0;
const recent: ChatMessage[] = [];
// Always keep the system message, and keep it first
const system = messages.find(m => m.role === 'system');
// Walk from most recent to oldest, respecting the token limit.
// Copy before reversing so the caller's array is not mutated.
for (const msg of [...messages].reverse()) {
if (msg.role === 'system') continue;
const msgTokens = estimateTokens(msg.content);
if (tokenCount + msgTokens > maxTokens) break;
recent.unshift(msg);
tokenCount += msgTokens;
}
return system ? [system, ...recent] : recent;
}
Controlling AI Costs
AI costs can spiral quickly. Combine rate limiting (Phase 4) with caching (Phase 6) for maximum savings:
- Hard limits per organization: Stop AI access when monthly spend exceeds plan limit
- Model tiering: Use Claude Haiku or GPT-4o-mini for simple tasks, reserve Opus for complex analysis
- Response caching: See Phase 6 for detailed caching patterns - aim for 30-50% cache hit rates
- Alert thresholds: Notify admins when usage hits 80% of limits
Next Steps
You now have a solid foundation for AI integration. Here's how to expand:
- Add RAG (Retrieval-Augmented Generation): Connect a vector database like Pinecone or pgvector to give AI access to your product's knowledge base
- Implement AI Agents: Allow AI to take actions in your app (create tasks, update records) with proper permission guards
- Build Custom Fine-Tuning: Use OpenAI's fine-tuning API to create models specialized for your domain
- Add Voice Interface: Integrate Whisper for voice input and ElevenLabs for AI-generated audio responses
- Create AI Marketplace: Let users create and share custom AI prompts/workflows within your platform
Adding AI to an existing SaaS product is complex. If you need expert guidance on architecture, implementation, or optimization, reach out for a consultation.
Follow the Vibe Coding Enthusiast
Follow JD for product updates on LinkedIn and personal takes on X.