MONOPOLY — Technology Decision Matrix
Table of Contents
- Database Selection
- Cache Selection
- Message Queue / Event Streaming
- API Protocol
- Search Engine
- Object Storage
- Container Orchestration
- Load Balancer
- Observability Stack
- CDN
1. Database Selection
Relational (SQL)
| Database | Best For | Avoid When | Scale Ceiling |
|---|
| PostgreSQL | Complex queries, JSONB, GIS, strong consistency, most default use cases | Ultra-high write throughput (>100K writes/s) | ~10TB single node; use Citus for horizontal |
| MySQL / MariaDB | Read-heavy apps, legacy systems, WordPress/Drupal ecosystem | Complex queries, full ACID at scale | ~10TB; use Vitess for sharding |
| CockroachDB | Global distributed SQL, geo-partitioning, multi-region | Simple single-region apps (overkill) | Petabyte-scale |
| PlanetScale | MySQL-compatible, serverless, branch-based workflow | Complex JOINs (foreign keys removed by design) | Very high — Vitess based |
| Amazon Aurora | AWS-native apps, managed PostgreSQL/MySQL, high availability | Non-AWS environments | Up to 128TB, 15 replicas |
NoSQL
| Database | Best For | Avoid When | Scale Ceiling |
|---|
| MongoDB | Flexible schema, document model, prototyping | Financial transactions requiring ACID | Petabyte-scale with sharding |
| DynamoDB | Key-value at massive scale, AWS-native, serverless, predictable latency | Complex queries, ad-hoc analytics, JOINs | Unlimited (AWS-managed) |
| Cassandra | Write-heavy, time-series, wide-column, geographically distributed | Read-heavy with complex queries | Petabyte-scale; used at Apple, Netflix |
| Redis | Cache, sessions, leaderboards, pub/sub, rate limiting | Primary data store for complex models | ~1TB per node; cluster for more |
| Elasticsearch | Full-text search, log aggregation, analytics | Primary database (durability risk) | Petabyte-scale with clusters |
| InfluxDB | Time-series metrics, IoT, monitoring data | General-purpose data | Very high write throughput |
| Neo4j | Graph data, social networks, recommendation engines, fraud detection | Non-graph data (overhead not worth it) | Billions of nodes |
Decision Framework
Is your data relational (joins, foreign keys, transactions)?
YES → Start with PostgreSQL
NO → Continue below
Is your primary access pattern key-value?
YES, need extreme scale → DynamoDB or Cassandra
YES, need speed/cache → Redis
Is your data document-shaped (nested, flexible schema)?
YES → MongoDB
Is it time-series (metrics, logs, IoT)?
YES → InfluxDB or TimescaleDB
Is it graph (relationships are the data)?
YES → Neo4j
Is it search?
YES → Elasticsearch / OpenSearch
2. Cache Selection
| Technology | Best For | Max Single Node | Cluster Support |
|---|
| Redis | Sessions, leaderboards, pub/sub, complex data structures, Lua scripting | ~1TB RAM | Yes (Redis Cluster, Redis Sentinel) |
| Memcached | Simple key-value, multi-threaded, large object cache | ~64GB RAM | Yes (client-side sharding) |
| Varnish | HTTP reverse proxy cache, full-page caching | RAM bound | Limited |
| CloudFront / CDN | Static assets, edge caching globally | N/A (distributed) | Built-in global distribution |
Default recommendation: Redis — more features, better ecosystem, active development.
Use Memcached only when: you need multi-threading for CPU-bound caching workloads and don't need data structures beyond string.
3. Message Queue / Event Streaming
| Technology | Model | Best For | Throughput | Retention |
|---|
| Apache Kafka | Log-based streaming | Event sourcing, high-throughput pipelines, replay, audit | Millions msg/s | Days to forever |
| RabbitMQ | AMQP message broker | Task queues, RPC, routing, fanout | 50K–100K msg/s | Until consumed |
| AWS SQS | Managed queue | AWS-native, simple task queue, serverless | Very high (managed) | Up to 14 days |
| AWS SNS | Pub/sub notification | Fan-out to many subscribers (email, SMS, Lambda, SQS) | Very high (managed) | No retention |
| Google Pub/Sub | Managed streaming | GCP-native, global, serverless | Very high (managed) | Up to 7 days |
| Redis Pub/Sub | In-memory pub/sub | Real-time notifications, low latency, fire-and-forget | Very high | None (no retention) |
| NATS | Lightweight messaging | IoT, microservices, low latency | Very high | JetStream adds retention |
Decision Matrix
Need event replay / audit trail?
YES → Kafka or Kinesis
Need simple task queue with retries and DLQ?
AWS shop → SQS
Self-hosted → RabbitMQ
Need real-time pub/sub with no persistence?
Redis Pub/Sub or NATS
Need fan-out to multiple consumers?
Kafka (consumer groups) or SNS → SQS fan-out
Need < 5 minutes guaranteed delivery, AWS-native, zero ops?
SQS
Volume > 1 million messages/second?
Kafka (self-hosted) or Kinesis (managed)
4. API Protocol
| Protocol | Best For | Avoid When |
|---|
| REST (HTTP/JSON) | Public APIs, CRUD, browser clients, simplicity | Strict typing required; high-performance internal services |
| GraphQL | Complex client data requirements, mobile (reduce over-fetching), BFF pattern | Simple CRUD; not worth the complexity |
| gRPC (HTTP/2 + Protobuf) | Internal microservice communication, low latency, strict contracts, streaming | Public browser APIs (needs gRPC-web) |
| WebSocket | Real-time bidirectional (chat, live dashboards, multiplayer games) | One-way server push (use SSE instead) |
| SSE (Server-Sent Events) | Server → client push (notifications, live feeds) | Bidirectional communication |
| GraphQL Subscriptions | Real-time with GraphQL schema consistency | Simple push scenarios |
Default recommendation:
- External / public: REST
- Internal service-to-service: gRPC
- Real-time features: WebSocket or SSE
5. Search Engine
| Technology | Best For | Avoid When |
|---|
| Elasticsearch | Full-text search, log analytics (ELK), complex aggregations | Simple lookups; operational overhead is high |
| OpenSearch | AWS-native Elasticsearch alternative | Non-AWS preferred setups |
| Typesense | Simple, fast full-text search, typo tolerance, easy ops | Complex aggregations at massive scale |
| Algolia | Managed search-as-a-service, fast setup, great UI | High volume (expensive); self-hosted preference |
| Meilisearch | Self-hosted, developer-friendly, fast relevancy | Enterprise-scale analytics |
| PostgreSQL FTS | Basic full-text search, already using PostgreSQL | High relevancy requirements or large datasets |
Rule of thumb: Use PostgreSQL FTS under 1M documents. Move to Typesense or Elasticsearch above that.
6. Object Storage
| Service | Best For | Egress Cost |
|---|
| AWS S3 | AWS-native apps, de facto standard, massive ecosystem | $0.09/GB (expensive) |
| Cloudflare R2 | S3-compatible, zero egress cost, global | $0.00 egress |
| GCS | GCP-native | $0.12/GB |
| Azure Blob | Azure-native | $0.087/GB |
| Backblaze B2 | Cost-sensitive, S3-compatible | Free with Cloudflare |
| MinIO | Self-hosted S3-compatible | Self-managed |
Cost optimization tip: Use Cloudflare R2 for user-facing media delivery (zero egress). Use S3 for internal/AWS-integrated storage.
7. Container Orchestration
| Technology | Best For | Avoid When |
|---|
| Kubernetes (K8s) | Large teams, complex deployments, multi-cloud, full control | Small teams (ops overhead is very high) |
| AWS ECS + Fargate | AWS-native, serverless containers, simpler than K8s | Multi-cloud or K8s ecosystem tools needed |
| AWS EKS | Managed K8s on AWS, best of both | Small teams; Fargate may be enough |
| GKE (Google) | Best managed K8s, GCP-native, Autopilot mode | Non-GCP environments |
| Docker Compose | Local dev, small single-server deployments | Production at any meaningful scale |
| Nomad | HashiCorp ecosystem, simpler than K8s, multi-workload | K8s ecosystem tools required |
Startup default: ECS + Fargate (zero cluster management).
Scale default: EKS or GKE once team > 5 engineers or services > 10.
8. Load Balancer
| Technology | Layer | Best For |
|---|
| AWS ALB | L7 (HTTP/HTTPS) | AWS apps, path-based routing, WebSocket, HTTP/2 |
| AWS NLB | L4 (TCP/UDP) | Ultra-low latency, static IP, non-HTTP protocols |
| GCP GLB | L7 global | GCP apps, global anycast, single IP worldwide |
| Nginx | L4/L7 | Self-hosted, reverse proxy, flexible config |
| HAProxy | L4/L7 | High performance self-hosted, advanced routing |
| Cloudflare | L7 global + DDoS | DDoS protection + CDN + load balancing combined |
| Traefik | L7 | Kubernetes-native, automatic SSL, service discovery |
9. Observability Stack
Metrics
| Tool | Best For |
|---|
| Prometheus + Grafana | Self-hosted, open-source, Kubernetes-native |
| Datadog | Managed, APM + infra + logs unified, expensive |
| CloudWatch | AWS-native, zero setup, integrated with AWS services |
| New Relic | APM-focused, good for application-level insights |
Logging
| Tool | Best For |
|---|
| ELK Stack (Elasticsearch + Logstash + Kibana) | Self-hosted, powerful, high volume |
| Loki + Grafana | Lightweight, Kubernetes-native, cheap |
| Splunk | Enterprise, compliance, expensive |
| AWS CloudWatch Logs | AWS-native, zero setup |
| Datadog Logs | Unified with metrics, expensive |
Distributed Tracing
| Tool | Best For |
|---|
| Jaeger | Open-source, Kubernetes-native, OpenTelemetry |
| Zipkin | Simple, lightweight, good integrations |
| AWS X-Ray | AWS-native, integrates with Lambda, ECS |
| Datadog APM | Managed, unified with metrics and logs |
| Honeycomb | High-cardinality event-based observability |
Recommended open-source stack: Prometheus + Grafana + Loki + Jaeger (all integrate via OpenTelemetry)
Recommended managed stack: Datadog (expensive but unified) or Grafana Cloud
10. CDN
| Technology | Best For | Edge Locations |
|---|
| Cloudflare | DDoS protection + CDN + DNS, best free tier, edge workers | 300+ |
| AWS CloudFront | AWS-native, deep S3 and API GW integration | 450+ |
| Akamai | Enterprise, highest performance, expensive | 4000+ |
| Fastly | Real-time purging, streaming, VCL customization | 90+ |
| Vercel Edge / Netlify | Jamstack, frontend-first, zero config | 100+ |
Default recommendation: Cloudflare for most use cases (best value, DDoS included, free SSL, Workers for edge compute).
Scale Benchmarks Quick Reference
| Technology | Write Throughput | Read Throughput | Notes |
|---|
| PostgreSQL (single) | ~10K writes/s | ~50K reads/s | With connection pooling |
| PostgreSQL (replicas) | ~10K writes/s | ~200K reads/s | 4 replicas |
| MySQL (single) | ~15K writes/s | ~60K reads/s | |
| Cassandra | ~1M writes/s | ~500K reads/s | 10-node cluster |
| Redis | ~1M ops/s | ~1M ops/s | Single node in-memory |
| Kafka | ~1M msgs/s | ~1M msgs/s | Per partition |
| Elasticsearch | ~50K docs/s | ~10K queries/s | Per node |
| MongoDB | ~50K writes/s | ~100K reads/s | Per replica set |
All benchmarks are approximate and depend heavily on hardware, payload size, and query complexity.
Limitations
- This is a reference document and may not cover all edge cases. Always verify architectures before production.