
System Design: From Zero to Production

A practical guide to designing scalable systems. Learn real-world patterns used by companies like Netflix, Uber, and Stripe.

Ioodu · Updated: Mar 1, 2024 · 30 min read
#System Design #Architecture #Engineering

System Design Fundamentals

Every senior engineer must think about system design. Whether you’re building a startup’s first product or scaling to millions of users, the principles remain the same.

The Four Pillars

┌─────────────────────────────────────────────────────────────┐
│                         SCALABILITY                         │
│          Handle growth in users, data, and traffic          │
└──────────────────────────────┬──────────────────────────────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
          ▼                    ▼                    ▼
 ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
 │   RELIABILITY   │  │  AVAILABILITY   │  │ MAINTAINABILITY │
 │ Tolerate faults │  │ Limit downtime  │  │  Code clarity   │
 └─────────────────┘  └─────────────────┘  └─────────────────┘

1. Load Balancing

The Problem

A single server can’t handle millions of users. We need multiple servers and a load balancer that distributes traffic across them.

The Solution

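interface LoadBalancer {
  registerServer(server: Server): void;
  removeServer(server: Server): void;
  getNextServer(request: Request): Server;
}

// Shared bookkeeping so each strategy only implements selection
abstract class BaseLB implements LoadBalancer {
  protected servers: Server[] = [];

  registerServer(server: Server): void {
    this.servers.push(server);
  }

  removeServer(server: Server): void {
    this.servers = this.servers.filter((s) => s !== server);
  }

  abstract getNextServer(request: Request): Server;
}

class RoundRobinLB extends BaseLB {
  private currentIndex = 0;

  getNextServer(): Server {
    if (this.servers.length === 0) throw new Error('No servers registered');
    // Modulo on read keeps the index valid after a server is removed
    const server = this.servers[this.currentIndex % this.servers.length];
    this.currentIndex = (this.currentIndex + 1) % this.servers.length;
    return server;
  }
}

class WeightedRoundRobinLB extends BaseLB {
  // Servers with more capacity get more traffic.
  // Assumes each Server has a positive integer `weight`.
  private counter = 0;

  getNextServer(): Server {
    const total = this.servers.reduce((sum, s) => sum + s.weight, 0);
    if (total <= 0) throw new Error('No servers registered');
    // Weight-based selection: each server owns `weight` slots per cycle
    let slot = this.counter++ % total;
    for (const s of this.servers) {
      if (slot < s.weight) return s;
      slot -= s.weight;
    }
    throw new Error('Unreachable: slot is always below the total weight');
  }
}

class LeastConnectionsLB extends BaseLB {
  // Route to server with fewest active connections.
  // Assumes each Server tracks `activeConnections`.
  getNextServer(): Server {
    if (this.servers.length === 0) throw new Error('No servers registered');
    return this.servers.reduce((min, s) =>
      s.activeConnections < min.activeConnections ? s : min);
  }
}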

Health Checks

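class HealthCheck {
  async check(server: Server): Promise<boolean> {
    try {
      // fetch has no `timeout` option; abort the request after 5 s instead
      const response = await fetch(server.healthEndpoint, {
        method: 'GET',
        signal: AbortSignal.timeout(5000)
      });
      return response.ok;
    } catch {
      // Network errors and timeouts both count as unhealthy
      return false;
    }
  }

  // Returns the timer so callers can stop monitoring with clearInterval()
  monitor(servers: Server[], interval: number): ReturnType<typeof setInterval> {
    return setInterval(async () => {
      // Check servers in parallel so one slow host doesn't delay the rest
      await Promise.all(servers.map(async (server) => {
        server.setHealthy(await this.check(server));
      }));
    }, interval);
  }
}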
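
The health flag set by the monitor only helps if the balancer consults it when picking a server. The following is a minimal sketch of round robin that skips unhealthy servers. The `BackendServer` shape and the `HealthAwareRoundRobin` class are hypothetical names introduced for illustration; they are not the `Server` type used above.

```typescript
// Hypothetical server shape for this sketch; not the article's Server type.
interface BackendServer {
  id: string;
  healthy: boolean;
}

class HealthAwareRoundRobin {
  private servers: BackendServer[] = [];
  private currentIndex = 0;

  registerServer(server: BackendServer): void {
    this.servers.push(server);
  }

  // Returns the next healthy server, or null when none is available.
  getNextServer(): BackendServer | null {
    const count = this.servers.length;
    // Try each server at most once, starting from the rotation position.
    for (let i = 0; i < count; i++) {
      const index = (this.currentIndex + i) % count;
      const candidate = this.servers[index];
      if (candidate.healthy) {
        this.currentIndex = (index + 1) % count;
        return candidate;
      }
    }
    return null;
  }
}

const lb = new HealthAwareRoundRobin();
const a: BackendServer = { id: 'a', healthy: true };
const b: BackendServer = { id: 'b', healthy: false };
const c: BackendServer = { id: 'c', healthy: true };
[a, b, c].forEach((s) => lb.registerServer(s));

// 'b' is skipped until a health check marks it healthy again.
const picks = [1, 2, 3, 4].map(() => lb.getNextServer()?.id);
```

Returning `null` instead of throwing leaves the caller free to decide what "no healthy backend" means, for example responding with a 503.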

2. Caching Strategies

Cache Patterns

interface Cache<T> {
  get(key: string): Promise<T | null>;
  set(key: string, value: T, ttl?: number): Promise<void>;
  delete(key: string): Promise<void>;
}

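// Read-through cache
// Assumes DataSource<T> exposes load(key): Promise<T | null>
class ReadThroughCache<T> implements Cache<T> {
  constructor(
    private cache: Cache<T>,
    private dataSource: DataSource<T>
  ) {}

  async get(key: string): Promise<T | null> {
    // Check cache first; compare against null so falsy values (0, '') still count as hits
    const cached = await this.cache.get(key);
    if (cached !== null) return cached;

    // Load from source and cache, but never cache a missing value
    const value = await this.dataSource.load(key);
    if (value !== null) {
      await this.cache.set(key, value, 3600); // 1 hour TTL
    }
    return value;
  }

  set(key: string, value: T, ttl?: number): Promise<void> {
    return this.cache.set(key, value, ttl);
  }

  delete(key: string): Promise<void> {
    return this.cache.delete(key);
  }
}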

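// Write-through cache: every write goes to the database and the cache
class WriteThroughCache<T> {
  constructor(
    private cache: Cache<T>,
    // Assumed minimal persistence interface; not defined in this article
    private database: { save(key: string, value: T): Promise<void> }
  ) {}

  async set(key: string, value: T, ttl?: number): Promise<void> {
    // Persist first: if the database write fails, the cache is left untouched
    await this.database.save(key, value);
    await this.cache.set(key, value, ttl);
  }
}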

Cache Invalidation

┌─────────────────────────────────────────┐
│           Cache Invalidation            │
├─────────────────────────────────────────┤
│ 1. TTL-based (simple, eventual)         │
│    cache.set(key, value, ttl=300)       │
│                                         │
│ 2. Write-through (consistent, slower)   │
│    DB write → Cache update              │
│                                         │
│ 3. Write-behind (fast, complex)         │
│    Cache write → Queue → DB write       │
│                                         │
│ 4. Delete-based (eventual)              │
│    DB write → Delete cache              │
└─────────────────────────────────────────┘
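
Two of these strategies are easy to show in memory: TTL-based expiry and delete-based invalidation, shown here as a database write followed by deleting the cached entry. The sketch below is illustrative only. `TtlCache`, `updatePrice`, and the injectable `now` clock are hypothetical names, and a plain `Map` stands in for the database.

```typescript
// Minimal in-memory TTL cache. The clock is injectable so expiry can be
// demonstrated without waiting; `now` returns milliseconds.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();

  constructor(private now: () => number = () => Date.now()) {}

  set(key: string, value: T, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  get(key: string): T | null {
    const entry = this.entries.get(key);
    if (entry === undefined) return null;
    if (this.now() >= entry.expiresAt) {
      // Expired entries are dropped lazily on read.
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }

  delete(key: string): void {
    this.entries.delete(key);
  }
}

// Delete-based invalidation: update the source of truth, then drop the
// cached copy so the next read reloads it.
function updatePrice(
  db: Map<string, number>,
  cache: TtlCache<number>,
  sku: string,
  price: number
): void {
  db.set(sku, price);
  cache.delete(sku);
}

let fakeNow = 0;
const priceCache = new TtlCache<number>(() => fakeNow);
const priceDb = new Map<string, number>([['sku-1', 100]]);

priceCache.set('sku-1', 100, 300);            // cached for 300 s
fakeNow = 299000;
const beforeExpiry = priceCache.get('sku-1'); // still cached
fakeNow = 300000;
const afterExpiry = priceCache.get('sku-1');  // expired

priceCache.set('sku-1', 100, 300);
updatePrice(priceDb, priceCache, 'sku-1', 120);
const afterWrite = priceCache.get('sku-1');   // invalidated by the write
```

Deleting after the database write, not before it, narrows the window in which a concurrent reader can repopulate the cache with the old value.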

3. Database Design

Sharding Strategies

// Horizontal sharding by user ID
class ShardedDatabase {
  private shards: Map<number, Database> = new Map();
  private shardCount: number = 4;

  constructor() {
    for (let i = 0; i < this.shardCount; i++) {
      this.shards.set(i, new Database(`shard_${i}`));
    }
  }

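  private getShard(userId: string): number {
    // Modulo hashing (not consistent hashing): simple, but changing
    // shardCount remaps most keys. `hash` is assumed to return a
    // non-negative integer.
    return hash(userId) % this.shardCount;
  }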

  async saveUser(user: User): Promise<void> {
    const shardId = this.getShard(user.id);
    await this.shards.get(shardId)!.save(user);
  }

  async getUser(userId: string): Promise<User | null> {
    const shardId = this.getShard(userId);
    return this.shards.get(shardId)!.find(userId);
  }
}
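
The `getShard` method above picks a shard with a modulo, so changing the shard count reassigns most keys. A consistent hash ring limits that movement: when a shard is added, keys either stay where they are or move to the new shard. The sketch below is one simple way to build such a ring. `HashRing`, `fnv1a`, and the choice of 100 virtual nodes per shard are illustrative assumptions, not part of the code above.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units (non-cryptographic).
function fnv1a(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned
}

class HashRing {
  // Ring positions sorted ascending, each owned by one shard.
  private ring: { position: number; shard: string }[] = [];

  constructor(private virtualNodes = 100) {}

  addShard(shard: string): void {
    // Several virtual nodes per shard spread its share around the ring.
    for (let v = 0; v < this.virtualNodes; v++) {
      this.ring.push({ position: fnv1a(`${shard}#${v}`), shard });
    }
    this.ring.sort((x, y) => x.position - y.position);
  }

  removeShard(shard: string): void {
    this.ring = this.ring.filter((node) => node.shard !== shard);
  }

  // A key belongs to the first ring position at or after its hash,
  // wrapping around to the start of the ring.
  getShard(key: string): string {
    if (this.ring.length === 0) throw new Error('No shards registered');
    const h = fnv1a(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].position < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].shard;
  }
}

const ring = new HashRing();
['shard_0', 'shard_1', 'shard_2', 'shard_3'].forEach((s) => ring.addShard(s));

const keys = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const before = keys.map((k) => ring.getShard(k));

ring.addShard('shard_4');
const after = keys.map((k) => ring.getShard(k));

// Keys that changed owner; they can only have moved to the new shard.
const movedKeys = keys.filter((_, i) => before[i] !== after[i]);
```

With N shards of equal weight, roughly 1/N of the keys are expected to move when one shard is added; the exact share depends on the hash function and the number of virtual nodes.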

CQRS Pattern

// Separate read and write models for scale
class CQRSStore {
  // Write model - optimized for writes
  private writeDb: SQLDatabase;
  private eventStore: EventStore;

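  async saveOrder(order: Order): Promise<void> {
    await this.writeDb.transaction(async (trx) => {
      await trx.orders.insert(order);
    });
    // Publish only after the commit so a rollback never emits an event.
    // If publishing can fail, a transactional outbox closes the remaining gap.
    await this.eventStore.publish('OrderCreated', order);
  }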

  // Read model - optimized for reads
  private readDb: ReadDatabase;
  private readReplicas: ReadDatabase[];

  async getOrderSummary(orderId: string): Promise<OrderSummary> {
    // Read from replica for scale
    const replica = this.getLeastLoadedReplica();
    return replica.query(`
      SELECT o.*, u.name as user_name
      FROM orders o
      JOIN users u ON o.user_id = u.id
      WHERE o.id = ?
    `, [orderId]);
  }
}
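
An alternative to joining tables at read time is a denormalized read model that is kept up to date from the events the write side emits. The sketch below shows that flow in memory. `OrderWriteModel`, `OrderReadModel`, `OrderSummaryView`, and the event fields are hypothetical names, and events are delivered synchronously for simplicity; in a real system the projection usually runs asynchronously, so reads are eventually consistent.

```typescript
interface OrderCreated {
  type: 'OrderCreated';
  orderId: string;
  userName: string;
  total: number;
}

// Denormalized read model: shaped for the query, no joins at read time.
interface OrderSummaryView {
  orderId: string;
  userName: string;
  total: number;
}

// Write side: validates commands and emits events.
class OrderWriteModel {
  private log: OrderCreated[] = [];
  private subscribers: ((e: OrderCreated) => void)[] = [];

  onEvent(handler: (e: OrderCreated) => void): void {
    this.subscribers.push(handler);
  }

  createOrder(orderId: string, userName: string, total: number): void {
    if (total <= 0) throw new Error('Order total must be positive');
    const evt: OrderCreated = { type: 'OrderCreated', orderId, userName, total };
    this.log.push(evt);
    this.subscribers.forEach((notify) => notify(evt));
  }
}

// Read side: a projection that applies events to build query-ready views.
class OrderReadModel {
  private summaries = new Map<string, OrderSummaryView>();

  apply(e: OrderCreated): void {
    this.summaries.set(e.orderId, {
      orderId: e.orderId,
      userName: e.userName,
      total: e.total
    });
  }

  getOrderSummary(orderId: string): OrderSummaryView | null {
    return this.summaries.get(orderId) ?? null;
  }
}

const writeModel = new OrderWriteModel();
const readModel = new OrderReadModel();
writeModel.onEvent((e) => readModel.apply(e));

writeModel.createOrder('o-1', 'Ada', 42);
const summary = readModel.getOrderSummary('o-1');
```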

4. Message Queues

Event-Driven Architecture

interface MessageQueue {
  publish(topic: string, message: any): Promise<void>;
  subscribe(topic: string, handler: Handler): Promise<void>;
}

class OrderService {
  constructor(
    private queue: MessageQueue,
    private orderRepo: OrderRepository // assumed persistence interface with save()
  ) {}

  async createOrder(order: Order): Promise<Order> {
    // Create order
    const saved = await this.orderRepo.save(order);

    // Publish events (async, decoupled)
    await this.queue.publish('order.created', {
      orderId: saved.id,
      userId: saved.userId,
      total: saved.total,
      items: saved.items
    });

    return saved;
  }
}

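class NotificationService {
  constructor(private queue: MessageQueue) {}

  async start(): Promise<void> {
    await this.queue.subscribe('order.created', async (event) => {
      await this.sendConfirmationEmail(event.userId, event.orderId);
    });
  }
}

// Inventory reacts to the same event in its own service, so a failed
// email never blocks a stock update (and vice versa)
class InventoryService {
  constructor(private queue: MessageQueue) {}

  async start(): Promise<void> {
    await this.queue.subscribe('order.created', async (event) => {
      await this.updateInventory(event.items);
    });
  }
}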
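
To see the decoupling in action without a broker, the publish/subscribe contract can be approximated in memory. The sketch below is a simplified, synchronous stand-in: `InMemoryQueue`, `OrderEvent`, and `EventHandler` are hypothetical names, and real brokers add persistence, acknowledgements, and delivery across processes.

```typescript
type OrderEvent = { orderId: string; userId: string };
type EventHandler = (payload: OrderEvent) => void;

// In-memory stand-in for a message broker.
class InMemoryQueue {
  private handlers = new Map<string, EventHandler[]>();

  subscribe(topic: string, handler: EventHandler): void {
    const existing = this.handlers.get(topic) ?? [];
    this.handlers.set(topic, existing.concat(handler));
  }

  // Delivers to every subscriber and returns how many succeeded;
  // one failing handler does not stop the rest.
  publish(topic: string, payload: OrderEvent): number {
    let delivered = 0;
    for (const handler of this.handlers.get(topic) ?? []) {
      try {
        handler(payload);
        delivered++;
      } catch {
        // A real system would retry or dead-letter the message here.
      }
    }
    return delivered;
  }
}

const queue = new InMemoryQueue();
const emailsSent: string[] = [];
const stockUpdates: string[] = [];

queue.subscribe('order.created', (e) => emailsSent.push(e.orderId));
queue.subscribe('order.created', () => { throw new Error('analytics down'); });
queue.subscribe('order.created', (e) => stockUpdates.push(e.orderId));

// The publisher does not know who is listening or whether a consumer failed.
const deliveredCount = queue.publish('order.created', { orderId: 'o-1', userId: 'u-1' });
```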

Handling Failures

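class ReliableMessageHandler {
  constructor(private deadLetterQueue: MessageQueue) {}

  async handle(message: Message): Promise<void> {
    const maxRetries = 3;
    let attempts = 0;
    let lastError: unknown; // declared outside the loop so the DLQ payload can include it

    while (attempts < maxRetries) {
      try {
        // process() is the application-specific handler, not shown here
        await this.process(message);
        return;
      } catch (error) {
        lastError = error;
        attempts++;
        if (attempts < maxRetries) {
          // Exponential backoff: 2 s, then 4 s
          await this.sleep(Math.pow(2, attempts) * 1000);
        }
      }
    }

    // Send to dead letter queue after all retries fail
    await this.deadLetterQueue.publish(message.topic, {
      originalMessage: message,
      failedAttempts: attempts,
      lastError: lastError instanceof Error ? lastError.message : String(lastError)
    });
  }

  private sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }
}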

5. API Design

Versioning Strategy

// URL-based versioning
/app/v1/users
/app/v2/users

// Header-based versioning
GET /users
Accept-Version: v1

// GraphQL: one endpoint, no version in the URL; the schema evolves by adding fields and deprecating old ones
POST /graphql
{ "query": "{ users { id name } }" }
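
On the server, URL-based and header-based versioning can share one dispatch step. The sketch below resolves the version from the path first, then from an `accept-version` header, and falls back to a default. The function names, the sample user data, the lowercase header key, and the choice of `v1` as the default are all assumptions made for illustration.

```typescript
type UserV1 = { id: number; name: string };
type UserV2 = { id: number; firstName: string; lastName: string };

// One handler per supported version (sample data is illustrative).
const handlersByVersion = new Map<string, () => UserV1[] | UserV2[]>([
  ['v1', () => [{ id: 1, name: 'Ada Lovelace' }]],
  ['v2', () => [{ id: 1, firstName: 'Ada', lastName: 'Lovelace' }]],
]);

const DEFAULT_VERSION = 'v1';

// Resolve the version from the URL first, then the header, then the default.
function resolveVersion(
  path: string,
  headers: Record<string, string | undefined>
): string {
  // Matches paths such as /app/v2/users: one prefix segment, then the version.
  const match = /^\/\w+\/(v\d+)(?:\/|$)/.exec(path);
  if (match) return match[1];
  return headers['accept-version'] ?? DEFAULT_VERSION;
}

function listUsers(
  path: string,
  headers: Record<string, string | undefined> = {}
): { status: number; body: string | UserV1[] | UserV2[] } {
  const version = resolveVersion(path, headers);
  const handler = handlersByVersion.get(version);
  if (handler === undefined) {
    return { status: 404, body: `Unknown API version: ${version}` };
  }
  return { status: 200, body: handler() };
}

const viaUrl = listUsers('/app/v2/users');
const viaHeader = listUsers('/users', { 'accept-version': 'v2' });
const fallback = listUsers('/users');
const unknown = listUsers('/app/v9/users');
```

Keeping each version behind its own handler lets `v1` stay frozen while `v2` changes the response shape.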

Rate Limiting

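class RateLimiter {
  constructor(
    private redis: Redis,
    private handler: RequestHandler // assumed downstream handler with handle(req)
  ) {}

  async isAllowed(
    userId: string,
    limit: number,
    window: number
  ): Promise<boolean> {
    // Fixed-window counter: one key per user, reset when the key expires
    const key = `rate:${userId}`;
    const current = await this.redis.incr(key);

    if (current === 1) {
      // First request in the window starts the timer. INCR and EXPIRE are
      // separate commands, so a crash between them leaves a key with no TTL;
      // a Lua script or MULTI/EXEC makes the pair atomic.
      await this.redis.expire(key, window);
    }

    return current <= limit;
  }

  async handleRequest(req: Request): Promise<Response> {
    const limit = 1000;
    const window = 60; // seconds (1 minute)
    const allowed = await this.isAllowed(req.userId, limit, window);

    if (!allowed) {
      return new Response('Rate limit exceeded', {
        status: 429,
        headers: {
          'Retry-After': String(window)
        }
      });
    }

    return this.handler.handle(req);
  }
}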
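
The counter above is a fixed-window limiter, which allows short bursts at window boundaries: a client can spend a full quota at the end of one window and another at the start of the next. A token bucket is a common alternative that smooths this out. The sketch below is a single-process, in-memory version; `TokenBucket`, its parameters, and the injectable `now` clock are hypothetical, and a distributed setup would need shared state such as Redis.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,        // maximum burst size
    private refillPerSecond: number, // sustained request rate
    private now: () => number = () => Date.now()
  ) {
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = this.now();
  }

  // Consumes one token if available; returns false when the caller
  // should be rejected (for example with HTTP 429).
  tryConsume(): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    // Add tokens for the time elapsed, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = current;

    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

let clockMs = 0;
const bucket = new TokenBucket(3, 1, () => clockMs);

// Four requests at t = 0: the first three use the burst capacity.
const burst = [1, 2, 3, 4].map(() => bucket.tryConsume());

// Two seconds later, two tokens have been refilled.
clockMs = 2000;
const later = [1, 2, 3].map(() => bucket.tryConsume());
```

Capacity bounds the burst size and the refill rate bounds the sustained rate, so the two can be tuned separately.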

6. Real-World Architecture Example

E-commerce Platform

┌─────────────────────────────────────────────────────────────┐
│                      CDN (Cloudflare)                       │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│                  Load Balancer (ALB/Nginx)                  │
│               Health checks, SSL termination                │
└──────────────────────────────┬──────────────────────────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
           ▼                   ▼                   ▼
      ┌─────────┐         ┌─────────┐         ┌─────────┐
      │  Web    │         │  API    │         │  Admin  │
      │  Tier   │         │  Tier   │         │  Tier   │
      └────┬────┘         └────┬────┘         └────┬────┘
           │                   │                   │
           └───────────────────┼───────────────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
           ▼                   ▼                   ▼
     ┌───────────┐       ┌───────────┐       ┌───────────┐
     │  Redis    │       │   Kafka   │       │  Search   │
     │  Cache    │       │  Events   │       │   (ES)    │
     └───────────┘       └─────┬─────┘       └───────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
           ▼                   ▼                   ▼
     ┌───────────┐       ┌───────────┐       ┌───────────┐
     │  Primary  │       │ Replicas  │       │ Analytics │
     │  Postgres │       │  (Read)   │       │  (Click)  │
     └───────────┘       └───────────┘       └───────────┘

Conclusion

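System design isn’t about memorizing solutions; it’s about understanding trade-offs:

  • Consistency vs Availability: per the CAP theorem, a distributed system must give up one of the two during a network partition
  • Latency vs Throughput: batching raises throughput but adds latency to each request; real-time processing does the reverse
  • Complexity vs Reliability: More components = more failure points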

Practice by designing systems for familiar products. Start simple, iterate, and always measure in production.


Next: Deep dive into specific technologies and their trade-offs.
