Quick take: The most critical architecture decision is API vs. fine-tuning vs. building from scratch. Using APIs like GPT-4 gets you live in weeks but costs $10k-$100k monthly at scale. Fine-tuning costs $50k-$200k upfront but drops usage costs by 80%. Building from scratch costs $500k+ and takes 12+ months. Most startups should start with APIs.
Your technical co-founder or vendor will make dozens of architecture decisions that determine whether your AI product succeeds or fails. You don’t need to understand the implementation details, but you should understand the business implications of five key choices.
These decisions affect your costs, timeline, scalability, and exit options. They’re made early, often before you realize they matter, and they’re expensive to reverse. Here’s what you need to know.
Architecture Decision Overview
| Decision | Cost Impact | Timeline Impact | When to Decide | Reversal Cost |
|---|---|---|---|---|
| API vs. Fine-tuned vs. From-Scratch | 10-100x variance | 3 months to 2 years | Before starting development | 50-100% of original cost |
| Cloud Provider (AWS vs. GCP vs. Azure) | 20-40% cost variance | Minor | First month | 30-50% of infrastructure work |
| Vector Database Choice | Storage costs vary 5x | Minor | When building retrieval | 20-40% of data infrastructure |
| Streaming vs. Batch Processing | Infrastructure costs vary 3x | Affects UX significantly | Architecture planning | 40-60% of backend work |
| Microservices vs. Monolith | Complexity vs. scale | 50% longer for microservices | Architecture planning | 70-90% of codebase |
1. API vs. Fine-Tuned vs. From-Scratch Models
This decision shapes roughly 80% of your cost structure and sets your timeline. Using third-party APIs like GPT-4 or Claude means you pay per request but can launch in weeks. Fine-tuning an existing model costs $50k-$200k upfront but reduces ongoing costs dramatically. Building a model from scratch costs $500k+ and takes a year or more.
APIs make sense when you need general capabilities quickly. You’re renting intelligence from OpenAI or Anthropic. Your cost scales linearly with usage. At 1,000 users, you might pay $2k per month. At 100,000 users, you might pay $200k. The advantage is speed to market and no ML expertise required.
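That linear scaling is easy to sanity-check with back-of-envelope math. The sketch below reproduces the figures above; the per-user request count, tokens per request, and per-token price are illustrative assumptions, not actual provider pricing.

```python
# Back-of-envelope API cost model. All default values are illustrative
# assumptions for a rough estimate, not real provider pricing.

def monthly_api_cost(users, requests_per_user=30,
                     tokens_per_request=2_000, price_per_1k_tokens=0.03):
    """Estimate monthly LLM API spend for a given user count."""
    total_tokens = users * requests_per_user * tokens_per_request
    return total_tokens / 1_000 * price_per_1k_tokens

for users in (1_000, 10_000, 100_000):
    print(f"{users:>7} users: ~${monthly_api_cost(users):,.0f}/month")
```

With these assumptions, 1,000 users land near $2k per month and 100,000 users near $200k, matching the ranges above; plug in your own usage numbers to see where API costs stop being sustainable.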
Fine-tuning takes a base model and trains it on your specific use case. You need ML engineers and training data, but you end up with a model optimized for your task. Costs drop to 10-20% of API pricing at scale. This makes sense when you have specific requirements, proprietary data, or usage that makes API costs unsustainable.
Building from scratch means training your own model on your own architecture. Only consider this if you have truly unique requirements, massive scale, or your model is your core IP. Companies like OpenAI and Anthropic do this. Most startups shouldn’t. Budget $500k minimum and 12-18 months.
The trap is starting with APIs, scaling to where costs hurt, then discovering fine-tuning requires rebuilding significant infrastructure. Plan your path early. If you think you’ll need fine-tuning, architect for it from day one even if you start with APIs.
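One cheap way to "architect for it from day one" is to keep all model calls behind a single interface so the provider can be swapped without touching business logic. This is a minimal sketch; the class and method names are hypothetical, and real implementations would wrap the OpenAI or Anthropic SDKs and, later, your own fine-tuned model server.

```python
# Sketch of a provider-agnostic completion interface. Names are
# hypothetical; real clients would wrap a vendor SDK or an in-house
# fine-tuned model endpoint.
from abc import ABC, abstractmethod

class CompletionClient(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class ApiClient(CompletionClient):
    def complete(self, prompt: str) -> str:
        # Day one: delegate to a third-party API here.
        return f"[api] {prompt}"

class FineTunedClient(CompletionClient):
    def complete(self, prompt: str) -> str:
        # Later: call your own fine-tuned model server instead.
        return f"[fine-tuned] {prompt}"

def answer(client: CompletionClient, prompt: str) -> str:
    # Application code depends only on the interface, so swapping
    # providers is a one-line change at the call site, not a rewrite.
    return client.complete(prompt)
```

The point is not the specific classes but the seam: if every model call goes through one interface, the API-to-fine-tuned migration becomes an infrastructure project instead of an application rewrite.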
2. Cloud Provider (AWS vs. GCP vs. Azure)
Your cloud provider affects costs by 20-40% and determines which AI services integrate easily. AWS dominates market share and has the most mature services. GCP offers better pricing on compute and tight integration with Google’s AI tools. Azure is strongest for enterprises already using Microsoft infrastructure.
For AI workloads, the differences matter more than for simple web apps. GCP’s TPUs are optimized for machine learning and can be 30-40% cheaper than AWS for training. AWS has more regions and availability zones, giving it broader geographic coverage. Azure has the best enterprise support and compliance certifications.
Your choice also determines which managed AI services you can use easily. AWS has SageMaker, GCP has Vertex AI, Azure has Azure ML. These aren’t interchangeable. If you build deeply on one platform’s AI services, switching means rebuilding.
Most startups pick based on what their team knows or where they get startup credits. That’s fine early on, but recognize it’s a multi-year commitment. Switching clouds mid-growth is expensive and risky. Consider where you’ll be in two years, not just where you are today.
Ask your technical team which provider they’re choosing and why. If the answer is “that’s what I know” or “we got credits,” push for a more strategic analysis. The right choice can save you 30% on infrastructure costs at scale.
3. Vector Database Choice
Vector databases store embeddings for semantic search, recommendation systems, and retrieval-augmented generation. Your choice affects storage costs by 5x and query performance significantly. Options include Pinecone (managed service), Weaviate (open source), Qdrant (high performance), or vector extensions in PostgreSQL.
Managed services like Pinecone are easiest to start with. You pay for storage and queries, typically $50-$500 per month at small scale, scaling to $5k-$50k at higher volumes. They handle infrastructure, scaling, and updates. The tradeoff is ongoing cost and dependency.
Self-hosted options like Weaviate or Qdrant give you more control and lower costs at scale but require DevOps expertise. You pay for compute and storage directly, which is usually 60-80% cheaper than managed services once you have enough volume to justify the engineering overhead.
PostgreSQL with pgvector is increasingly popular for startups that need vector search but don’t want another database to manage. Performance is worse than specialized vector databases, but for many use cases it’s good enough and eliminates a dependency.
The decision point is whether you’re building a core feature on vector search or using it as a supporting capability. If your product is fundamentally about semantic search or recommendations, invest in a proper vector database. If you’re just adding RAG to improve an LLM’s responses, PostgreSQL might suffice.
4. Streaming vs. Batch Processing
Streaming architecture processes data in real-time as it arrives. Batch processing handles data in scheduled groups. Streaming costs more to build and operate but provides instant results. Batch is simpler and cheaper but introduces delays.
For AI products, this often manifests as streaming LLM responses word-by-word versus waiting for the complete response. Users strongly prefer streaming for long responses because it feels faster even when total time is similar. But streaming adds complexity to your frontend and backend.
Streaming architecture requires persistent connections, state management, and careful error handling. You need infrastructure like Redis or message queues. Development time increases by 30-50% compared to simple request-response patterns. Operating costs rise because you maintain more active connections.
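The difference between the two delivery patterns can be sketched with a generator; `fake_model_tokens` is a stand-in for a real LLM token stream, and production streaming would additionally need the connection handling and error recovery described above.

```python
# Streaming vs. batch delivery of a model response, sketched with a
# generator. fake_model_tokens stands in for a real LLM token stream.

def fake_model_tokens():
    for token in ["Our", "refund", "policy", "allows", "returns."]:
        yield token

def batch_response():
    # Batch: the user sees nothing until every token is generated.
    return " ".join(fake_model_tokens())

def streaming_response():
    # Streaming: each token is forwarded as soon as it is produced,
    # which feels faster even though total generation time is the same.
    for token in fake_model_tokens():
        yield token

print(batch_response())
for token in streaming_response():
    print(token, end=" ", flush=True)
```

The generator makes the tradeoff concrete: total work is identical, but streaming shifts when the user first sees output, at the cost of keeping a connection open for the whole response.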
Batch processing works well for operations that don’t need to feel instant. Report generation, data analysis, email processing, and bulk updates fit batch patterns. You can run these jobs during off-peak hours, optimize for cost over speed, and handle failures more simply.
The mistake is forcing everything into one pattern. Most AI products need both. Use streaming for user-facing interactions where responsiveness matters. Use batch for background jobs, analytics, and operations users don’t wait for.
5. Microservices vs. Monolith
Microservices split your application into independent services that communicate over networks. Monoliths keep everything in one codebase. Microservices scale better and let teams work independently. Monoliths are simpler to build, test, and deploy.
The conventional wisdom is that startups should start with monoliths and split into microservices as they grow. This is mostly right, but AI applications have a twist. AI model serving often benefits from being a separate service from day one because scaling requirements differ drastically from your web application.
Your web app might handle 100 requests per second comfortably. Your AI model might take 2-3 seconds per inference and need GPU instances. Running these on the same infrastructure is inefficient. Splitting model serving into its own service lets you scale and optimize each component independently.
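The capacity gap is easy to quantify with Little’s law (concurrent requests ≈ arrival rate × service time). The numbers below use the figures from this section, with an assumed 50 ms web latency and 2.5 s inference latency as the midpoint of the 2-3 s range.

```python
# Why model serving scales differently from the web tier: rough
# capacity arithmetic via Little's law (concurrency = rate * latency).
# Latencies are assumptions: ~50 ms for web, 2.5 s for inference.

web_rps, web_latency_s = 100, 0.05
model_rps, model_latency_s = 100, 2.5

web_concurrency = web_rps * web_latency_s        # requests in flight
model_concurrency = model_rps * model_latency_s  # requests in flight

print(f"web tier:   ~{web_concurrency:.0f} concurrent requests")
print(f"model tier: ~{model_concurrency:.0f} concurrent requests")
```

At the same request rate, the model tier holds roughly 50x more requests in flight, and those requests need GPUs. That is the concrete reason to give model serving its own independently scaled service.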
Beyond that separation, bias toward monolithic architecture early. Microservices add enormous complexity. You need service discovery, API gateways, distributed tracing, and coordination across teams. Development velocity drops by 30-50% because simple changes require coordinating multiple services.
Ask your team how they’re structuring services and why. One monolith for your web app plus one service for model inference is reasonable. Splitting into 5-10 microservices before product-market fit is over-engineering unless you have strong reasons.
How We Selected These Decisions
We analyzed 50+ AI architecture reviews and identified decisions that have the highest business impact for non-technical founders. We excluded purely technical decisions that don’t significantly affect costs, timelines, or strategic options.
We prioritized decisions that are hard to reverse and made early in development. These are the choices where founder involvement makes the biggest difference. Later-stage optimizations matter less because you’ll have context and resources to handle them.
Frequently Asked Questions
Should I hire a technical advisor to help with these decisions?
Yes, especially if you don’t have a technical co-founder. A few hours with an experienced AI architect can save you months and hundreds of thousands of dollars. These decisions are too important to make based on what your development team happens to know.
Can I change these decisions later if I make the wrong choice?
Some are easier to change than others. Switching vector databases is usually manageable. Changing from APIs to fine-tuned models requires significant rework. Moving to a different cloud provider is expensive but possible. Going from monolith to microservices or vice versa is essentially a rewrite.
How do I know if my technical team is making good decisions?
Ask them to explain the tradeoffs and why they chose their approach. Good engineers can articulate the business implications of technical choices. Red flags include “that’s just what we always use” or inability to explain alternatives.
Should I optimize for current costs or future scale?
Balance both. Don’t pay for scale you might never need, but don’t paint yourself into corners that require expensive rewrites. The right approach is often starting simple with a clear path to scale when needed.
What if my technical team disagrees with these recommendations?
These are frameworks, not rules. Context matters. But if your team can’t explain why their approach makes sense for your specific situation, that’s a problem. Good technical leaders welcome business-focused questions.
Key Takeaways
- API vs. fine-tuned vs. from-scratch models determines your cost structure and timeline more than any other decision
- Cloud provider choice affects infrastructure costs by 20-40% and creates multi-year lock-in through managed AI services
- Vector database selection impacts storage costs by 5x, with managed services easier but more expensive at scale
- Streaming vs. batch processing involves tradeoffs between user experience and development complexity
- Most AI startups should use a monolith for their web app but split model serving into a separate service
- These decisions are made early, often before founders realize they matter, and are expensive to reverse later
- Non-technical founders should understand the business implications even if they don’t understand implementation details
SFAI Labs helps founders evaluate architecture decisions before they become expensive mistakes. We provide architecture reviews, vendor oversight, and strategic technical guidance. Schedule a 1-hour architecture consultation.