A founder asked me a few weeks ago how much of his application he would have to rewrite to "add AI." None of it, in his case. LLM integration is usually a service layer you bolt on, not a core rewrite — and the fastest way to torch a budget is to assume otherwise.
I am Adriano. I have shipped 250+ projects since 2009, including AI features inside production apps and a self-initiated AI product, Instill (30+ active users, 1,000+ skills, 45+ projects, built on the same patterns this article describes). Some of my LLM integrations took a weekend. Others took three months. The difference was rarely the model. It was how clearly the integration was planned against what already existed.
TL;DR
- You can add LLM features to an existing web app without rebuilding it. Treat AI as a service layer, not a rewrite.
- Three architecture patterns cover most cases: direct API calls, middleware proxy, or async queue.
- Real API costs for mid-market applications: $200 to $3,000 a month depending on volume and model.
- A 4-phase roadmap (Audit, Prototype, Harden, Scale) keeps the existing app stable while you bolt on intelligence.
- Use third-party APIs for standard tasks. Build or fine-tune only when your data is the competitive advantage.
Table of contents
- What LLM integration actually means
- When adding AI makes sense, and when it does not
- Three architecture patterns for LLM integration
- Real API costs you will actually pay
- Build vs. buy: the decision framework
- The 4-phase integration roadmap
- Common mistakes that kill LLM projects
- FAQ
- Next steps
What LLM integration actually means
An LLM is a kind of AI that reads and writes human language. When founders say "add AI to my app," they usually mean connecting their existing web application to one of these models through an API (Application Programming Interface — a standard way for two systems to talk).
Think of it like adding a new payment processor. The app already works. The checkout is not getting rewritten. You are connecting to Stripe's API so the app can process payments. LLM integration works the same way: your app sends text to a model provider, the provider processes it, your app receives a response.
In practice: the user types a question into a search bar. Your backend sends it to an LLM API along with the relevant context (your product docs, knowledge base, FAQ). The model returns an answer. The round trip takes 1 to 3 seconds.
Your existing database, authentication, and frontend stay where they are. You are adding a capability to a working system, not replacing the system.
When adding AI makes sense, and when it does not
Before spending a dollar on development, run the use case through these filters.
Good candidates for LLM integration
Customer-facing search and support. Traditional keyword search matches exact words. LLM-powered search understands intent. "My account is locked" matches an article titled "Password Reset Guide" even though the words do not overlap.
Content generation and summarisation. Any workflow where users create or consume text gets faster with an LLM. McKinsey's State of AI research puts productivity gains on writing and summarising tasks in the 30 to 60% range when the integration is well designed.
Data extraction from unstructured text. If your team manually pulls fields from PDFs, emails, or forms, an LLM automates 70 to 80% of the work and flags the rest for review. Insurance claims, invoice intake, contract review — all strong candidates. According to Goldman Sachs research, generative AI could automate roughly a quarter of current work tasks across major economies, and document-heavy operations sit near the top of the list. One client I worked with cut 40 hours a month of manual document processing through a single workflow built on this exact pattern.
Internal tools and admin panels. Adding a natural-language query layer ("show me all customers in Texas who have not ordered in 90 days") saves hours of building custom filter UIs.
Poor candidates for LLM integration
Anything that needs guaranteed correctness. LLMs produce plausible text, not certified text. Medical diagnosis, legal compliance, financial calculations need deterministic systems. You can use an LLM as an assistant, but a human or a rules engine signs off the output.
Simple rule-based tasks. If the logic is "if X then Y," a conditional statement does the job. Calling an LLM API for that costs money and adds latency.
Apps with very low text volume. A few dozen requests a day, mostly structured data, and LLM integration is overhead with no payoff.
For a broader view of where AI fits beyond a single web app, see AI solutions for business.
Three architecture patterns for LLM integration
Three patterns cover the bulk of LLM integration work. The right one depends on your existing stack, latency target, and how much control you need.
Pattern 1: direct API calls (simplest)
Your backend calls the LLM provider's API directly when a user triggers an AI feature.
Architecture. Three stops. The user's browser, your backend, and the model API. The request flows left to right, the response flows back.
Best for. Prototypes and low-volume applications, under 1,000 AI requests a day. Fast to ship — days, not weeks. No new infrastructure. The trade-off: every request waits 1 to 3 seconds for the model, and there is no caching or rate limiting unless you add it.
A typical first build: an internal knowledge base search where the backend sends user questions to a model API along with the relevant docs as context, and the answer comes back in 2 seconds. This is the same speed-first pattern that worked on the Cuez API rebuild (3s to 300ms, 10x faster) — keep the existing system, add the new layer cleanly, do not rewrite.
Pattern 2: middleware proxy layer (balanced)
You add a lightweight service between your backend and the LLM API. The proxy handles caching, rate limiting, prompt management, cost tracking, and fallback logic.
Architecture. Same three stops as Pattern 1, plus a fourth box (the AI proxy) between backend and model API. The proxy caches responses, enforces rate limits, manages prompts in one place, and retries or falls back to a different model on errors.
Best for. Production applications running 1,000+ AI requests a day. Caching usually cuts API calls by 30 to 50%. The proxy makes swapping models a configuration change instead of a code rewrite.
This is the pattern I recommend for most production integrations. The cost of the extra service is small. The optionality you get back is large.
Pattern 3: async queue (most resilient)
AI requests go into a message queue (RabbitMQ, Amazon SQS, Redis). A separate worker processes them in the background and stores the results.
Architecture. Two flows. The user triggers the AI feature, your backend drops a job into a queue and tells the user "processing." A background worker picks up jobs, calls the model API, stores results, and notifies the frontend when done.
Best for. High-volume applications (10,000+ daily requests) and batch work. Hypothetical: a catalogue of 15,000 product descriptions to generate. Queue-based processing handles that in a few hours with parallel workers and automatic retry. The trade-off: more infrastructure to build, and users do not get instant responses.
For more on building AI capabilities into a web app from the ground up, see building AI into your web app.
Real API costs you will actually pay
Most blog posts dodge this with "it depends." Here are actual numbers from production projects in 2025-2026.
Cost per request (approximate)
| Model | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Typical request cost |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | $0.003 to $0.02 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | $0.004 to $0.025 |
| GPT-4o Mini | $0.15 | $0.60 | $0.0003 to $0.002 |
| Claude 3.5 Haiku | $0.80 | $4.00 | $0.001 to $0.008 |
A "token" is roughly three-quarters of a word. A typical customer-support exchange uses about 700 tokens total.
Monthly estimates by scale
| Scale | Daily requests | Mid-tier model/mo | Premium model/mo |
|---|---|---|---|
| Small | 100 to 500 | $15 to $100 | $50 to $350 |
| Medium | 500 to 5,000 | $100 to $800 | $350 to $2,500 |
| Large | 5,000 to 50,000 | $800 to $5,000 | $2,500 to $15,000 |
What these numbers leave out
API costs are 20 to 40% of the total. The rest:
- Development time. $5,000 to $30,000 for the initial integration depending on pattern complexity.
- Prompt engineering. 10 to 20 hours of testing the instructions you send to the model.
- Monitoring and maintenance. 2 to 5 hours a month for quality checks and prompt updates.
A realistic all-in budget for a mid-market SaaS adding one AI feature: $8,000 to $25,000 upfront, plus $300 to $2,000 a month ongoing. For context on labour costs, Bureau of Labor Statistics data puts a US software developer's fully loaded hour around $80, so a single feature paying back even a few hours a week of in-house engineering time covers the ongoing API spend. The biggest variable is not the AI. It is how clean your existing data and codebase already are.
Build vs. buy: the decision framework
For 95% of businesses reading this, the answer is: use the API. Here is how to decide.
Use a third-party API when
- The use case is general. Summarisation, search, content generation, classification all work out of the box with major models.
- Speed matters more than customisation. API integration ships in 2 to 4 weeks. Training a custom model takes 3 to 6 months.
- Your data volume is small to medium. Under 100,000 documents, RAG (Retrieval-Augmented Generation — feeding relevant documents to the model alongside the user's question so it answers from your data) with a third-party API will outperform a custom-trained model.
Consider building or fine-tuning when
- Your data is the product. A proprietary dataset that makes your AI uniquely better is worth protecting with a fine-tuned model.
- Regulatory rules demand it. Healthcare, defence, parts of financial services — sometimes data cannot leave your infrastructure. Self-hosted open-source models (Llama 3, Mistral) handle that.
- You need cost efficiency at very large scale. At 1 million+ API calls a day, a self-hosted model can run 60 to 80% cheaper. Below that volume, operational overhead eats the savings.
The middle path: RAG with API calls
For most clients, the right answer is RAG with a third-party API. You store your data in a vector database (a store optimised for finding similar text). When a user asks a question, your app finds the relevant documents and sends them to the model along with the question. The model answers based on your specific data without you training anything. You get roughly 80% of the benefit of a custom model at 10% of the cost. I cover the full pattern in practical RAG for existing apps.
The 4-phase integration roadmap
This sequence has worked across the LLM integrations I have shipped into production. Plan 6 to 12 weeks for a mid-market application.
Phase 1: audit (week 1 to 2)
Map the existing architecture, identify the highest-value AI use case, and assess data readiness. Deliverables: an architecture diagram with the proposed integration point, a data quality assessment, a cost estimate, and a go/no-go recommendation.
What kills projects here. Skipping the audit. Teams that jump to coding waste two to three times the budget because they hit data or architecture problems mid-build.
Phase 2: prototype (week 3 to 5)
Build a working proof of concept using Pattern 1 (direct API calls) against your real data, not sample data. Get five to ten internal users testing it. Measure response time, accuracy, and actual API cost.
Every LLM demo looks impressive against clean inputs. The test that matters is whether it gives useful answers when fed the messy, incomplete data your real system contains.
Phase 3: harden (week 6 to 9)
Move from prototype to production. Switch to Pattern 2 if needed. Add error handling, caching, rate limiting, monitoring, and input validation.
The detail most teams miss: input validation. Users will type anything into your AI feature, including prompt injection attempts (instructions designed to trick the model into ignoring its rules). A hardened integration validates and sanitises every input before it reaches the model.
Phase 4: scale (week 10 to 12)
Roll out to all users behind a feature flag. Set up analytics to measure business impact. Optimise costs by routing easier requests to cheaper models without quality loss. Document the architecture for the team that will maintain it.
Common mistakes that kill LLM projects
Patterns I see most often, in roughly the order they appear.
Starting with the model instead of the problem. "We want to add GPT-4 to our app" is not a goal. "We want to cut support resolution time by 40%" is. Start with the outcome, then choose the tool.
Ignoring latency. LLM API calls take 1 to 5 seconds. If users expect instant responses, you need streaming (the answer appears word by word) or background processing. A 4-second loading spinner is not acceptable UX.
Sending too much context per request. Founders want to feed the model "everything." Sending the entire knowledge base on every request is expensive and slow. RAG solves it by sending only the relevant documents for each question.
Not budgeting for prompt work. The prompt (the instructions you give the model) determines about 80% of output quality. I plan 10 to 20 hours for prompt development on every project. Skip it and you get answers that are technically correct and unhelpful.
Treating it as a one-time project. Model providers update their models. A prompt that worked in January can produce different results after a March update. Plan 2 to 5 hours a month for monitoring and minor tuning.
These patterns apply beyond a single feature. If the broader question is AI automation for business operations, the same principles hold whether you are automating support, document processing, or internal workflows. For a from-scratch build, see custom applications.
FAQ
How long does it take to add LLM features to an existing web app?
Plan 6 to 12 weeks from audit to full production for a single AI feature. A basic proof of concept can work in 1 to 2 weeks. Hardening for production takes the rest. Timeline depends on codebase complexity and data readiness.
Do I need to rewrite my application to integrate an LLM?
No. LLM integration works through APIs — you add a capability alongside the existing code. Database, authentication, and frontend stay the same. The new code is the layer that sends requests to the model and handles responses, usually a few hundred lines.
What does LLM integration cost for a mid-size SaaS application?
Budget $8,000 to $25,000 for the initial development plus $300 to $2,000 a month ongoing. Direct API calls are cheapest to implement; async queue-based is the most expensive. Ongoing cost depends on volume and model choice. If you want one person owning both the build and the maintenance, my AI automation retainer is $3,000/month, single tier, monthly cancel.
Can I switch LLM providers after integration?
Yes, especially with the middleware proxy pattern. The proxy abstracts provider-specific calls, so switching becomes a configuration change rather than a code rewrite. This optionality is the main reason I default to Pattern 2 in production.
Is my data safe when using LLM APIs?
The major providers (OpenAI, Anthropic) offer enterprise plans with SOC 2 compliance, contractual no-training guarantees on your data, and signed data processing agreements. If data cannot leave your infrastructure at all, self-hosted open-source models (Llama 3, Mistral) give you full control.
What is the difference between RAG and fine-tuning?
RAG retrieves relevant pieces of your data at query time and feeds them to the model. Fine-tuning re-trains the model on your data so it answers a certain way. RAG is faster, cheaper, easier to keep current, and the right answer for most business cases. Fine-tuning is the right answer when style or domain precision matters more than freshness.
How do I handle hallucinations?
Three layers. Ground the model with retrieved context (RAG). Cite sources in the answer so users can verify. Add confidence scoring and an easy escalation to a human for low-confidence responses. You will not eliminate hallucinations, but you will make them rare and easy to catch.
Reflecting on sixteen years of shipping software
Every successful LLM integration I have shipped has come down to the same handful of choices: pick a use case where AI plausibly beats the existing approach, integrate as a service layer rather than a rewrite, plan for input validation and latency before launch, and stay around to tune the prompts after real users start typing into it. Skip any of those steps and the project costs more and ships less.
That is the same pattern I have used since 2009. From the Cuez API rescue (3s to 300ms), to the 40+ payment provider integrations at bolttech, a $1B+ unicorn, to the AI features inside Instill, the lever has been the same: read the problem accurately first, ship the smallest useful version, then improve in public against real usage.
LLM integration is not a model choice. It is a planning choice with a model attached.
Next steps
If you are past the "should I add AI" question and into "how do I do it without breaking what works," the answer starts with Phase 1: a focused audit of the existing system, the data, and the use case that delivers the most value.
I do this work with clients every month. For a clear assessment of where LLM integration fits into your application, and an honest answer about whether it is worth the spend, book a free strategy call. No pitch.
Further reading
- AI Automation services — $3,000/month retainer
- Custom Applications — monthly subscription from $3,499/month
- Instill case study — self-initiated AI product, 30+ users, 1,000+ skills
- Cuez case study — 10x faster API
- Practical RAG: add AI to your existing app
- AI agents for business owners