
DineHub Documentation

Welcome to DineHub — a resilient, multi-region cloud restaurant ordering system designed for scale.

What is DineHub?

DineHub is a distributed restaurant ordering platform that connects customers with restaurants across multiple geographic regions. It’s designed from the ground up for high availability, security, and horizontal scalability.

Why This Architecture?

Modern cloud applications face three fundamental challenges:

| Challenge    | Traditional Approach              | Our Approach                               |
|--------------|-----------------------------------|--------------------------------------------|
| Availability | Single points of failure          | Multi-region with automatic failover       |
| Security     | Perimeter-based firewalls         | Zero-trust mesh with encryption everywhere |
| Scalability  | Vertical scaling (bigger servers) | Horizontal scaling (more servers)          |

DineHub demonstrates how to build a production-ready system that addresses these challenges through deliberate architectural decisions.

System Overview

At its core, DineHub consists of three layers:

┌─────────────────────────────────────────────────────────────┐
│                      USER INTERFACE                          │
│                   React + Bun + Tailwind                     │
│         Fast, type-safe, with real-time updates              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    SERVICE LAYER                             │
│             Spring Boot + GraalVM Native Image               │
│      Stateless, horizontally scalable, sub-second startup  │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    DATA LAYER                                │
│         Citus (Distributed PostgreSQL)                     │
│    Data sharded across regions, automatic query routing     │
└─────────────────────────────────────────────────────────────┘

Key Features

For Customers

  • Browse restaurants across multiple regions
  • Place orders with real-time status tracking
  • Secure authentication via JWT or Google OAuth
  • Responsive design that works on mobile and desktop

For Restaurant Owners

  • Manage restaurant listings and menus
  • View and process incoming orders
  • Track order lifecycle from pending to delivered
  • Role-based access control for staff

For Operators

  • Deploy to multiple regions with single commands
  • Monitor system health via built-in observability
  • Scale horizontally by adding nodes
  • Zero-downtime deployments with automatic rollback

Architecture Highlights

Multi-Region Deployment

Unlike traditional applications deployed to a single data center, DineHub runs across multiple AWS regions, for example:

  • US East (Virginia) — Primary region for North America
  • EU West (Ireland) — Primary region for Europe
  • Additional regions can be added as needed

Each region contains a complete stack: ingress, backend, and database workers. If one region fails, traffic automatically routes to healthy regions.

Zero-Trust Networking

We don’t trust the network—even our own. All internal communication happens over encrypted tunnels:

  • Tailscale mesh: WireGuard-encrypted connections between all nodes
  • Headscale: Self-hosted coordination (no dependency on Tailscale SaaS)
  • No public IPs: Only the ingress node is exposed to the internet
  • Mutual authentication: Every connection is authenticated at both ends

Distributed Database

Traditional databases become bottlenecks under load. We use Citus to distribute PostgreSQL horizontally:

  • Coordinator node: Routes queries to appropriate workers
  • Worker nodes: Store data shards distributed by restaurant_id
  • Automatic sharding: Data automatically distributed as restaurants grow
  • Query parallelization: Complex queries execute across multiple workers
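The distribution step above can be sketched in SQL. The table and column names below are illustrative, not DineHub's actual schema; `create_distributed_table` is the Citus function that shards a table by a column:

```sql
-- The distribution column (restaurant_id) must be part of any unique
-- constraint, so it is included in the primary key here.
CREATE TABLE orders (
    id            bigserial,
    restaurant_id bigint      NOT NULL,
    status        text        NOT NULL DEFAULT 'PENDING',
    created_at    timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (restaurant_id, id)
);

-- Hash-partition rows across the worker nodes by restaurant_id.
SELECT create_distributed_table('orders', 'restaurant_id');
```

After this, all orders for one restaurant live on the same shard, so single-restaurant queries touch a single worker.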

Immutable Infrastructure

We treat infrastructure as code—literally. Our Nix configuration:

  • Version controlled: All changes tracked in Git
  • Reproducible: Same configuration always produces same system
  • Atomic: Deployments succeed or roll back completely
  • Testable: Infrastructure tested in VMs before production

Technology Choices

Frontend: Bun + React + Tailwind

  • Bun: Fast all-in-one JavaScript runtime (10x faster than Node for bundling)
  • React 19: Concurrent rendering and automatic batching
  • Tailwind v4: PostCSS-free, CSS-first styling with zero runtime
  • TanStack Query: Automatic caching and background refetching

Why not Node? Bun provides a unified toolchain without webpack configuration hell.

Backend: Spring Boot + GraalVM

  • Spring Boot 4: Mature ecosystem with production-ready defaults
  • GraalVM Native Image: Compiles to native binary for fast startup and low memory
  • PostgreSQL + Citus: Proven relational database with horizontal scaling
  • JWT Authentication: Stateless tokens for horizontal scalability

Why native compilation? Cold starts matter when auto-scaling. A native binary starts in milliseconds, not seconds.
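The statelessness of JWTs is visible in their structure: the claims travel inside the token itself, so any backend instance can serve the request without a shared session store. A minimal sketch (the claim names and the `customer-42` subject are made up for illustration; real tokens must also be signature-verified):

```shell
# base64url-encode stdin (the sample strings are multiples of 3 bytes,
# so no '=' padding appears and none needs stripping)
b64url() { base64 | tr -d '\n' | tr '/+' '_-'; }

header=$(printf '{"alg":"HS256"}' | b64url)
claims='{"sub":"customer-42","role":"CUSTOMER"}'
payload=$(printf '%s' "$claims" | b64url)
echo "token: ${header}.${payload}.<signature>"

# Any backend can recover the claims from the token alone:
decoded=$(printf '%s' "$payload" | tr '_-' '/+' | base64 -d)
echo "claims: $decoded"
```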

Infrastructure: Nix + NixOS

  • Nix Flakes: Reproducible builds with locked dependencies
  • NixOS: Declarative Linux distribution configured entirely via Nix
  • deploy-rs: Atomic deployments with automatic rollback
  • Tailscale: Self-hosted mesh networking without VPN complexity

Why Nix? Traditional configuration management drifts over time. Nix guarantees that what we build today can be rebuilt identically in five years.

API Design: OpenAPI + Schemathesis

  • OpenAPI Specification: Single source of truth for API contracts in specs/openapi.yaml
  • Schemathesis: Property-based testing that validates implementation matches specification
  • Redocly: Documentation generation and spec linting
  • Contract Testing: API consumers can rely on documented behavior being accurate

This specification-first approach means the API documentation is never out of date—it’s automatically validated against the implementation on every build.

Getting Started

Prerequisites

You’ll need Nix installed (the Determinate Systems installer is recommended):

curl -fsSL https://install.determinate.systems/nix | sh -s -- install --determinate

Quick Start

  1. Enter the development environment (installs all tools automatically):

    nix develop
    
  2. Start the local development stack (backend + frontend + database):

    nix run .#compose
    
  3. View the documentation (what you’re reading now):

    nix run .#docs.serve
    
  4. Run the full test suite:

    nix flake check -L
    

Project Structure

├── frontend/          # Bun + React SPA
├── backend/           # Spring Boot service
├── nix/               # Nix configuration
├── docs/              # This documentation
├── flake.nix          # Nix entry point
└── README.md          # Quick reference

Documentation Guide

This documentation is organized into sections:

System Architecture

Component Guides

  • Frontend — UI layer design and React patterns
  • Backend — Service layer architecture and domain model
  • Nix Build System — Reproducible builds and declarative infrastructure

API Reference

Design Principles

Throughout this system, we follow these principles:

  1. Type Safety First: TypeScript and Java with strict compilation catch errors at build time
  2. Security by Default: Encryption everywhere, least-privilege access, no secrets in code
  3. Horizontal Scalability: Design for adding nodes, not bigger nodes
  4. Reproducibility: Builds and deployments must be repeatable and version-controlled
  5. Observability: Every component exposes metrics and health checks
  6. Developer Experience: Complex infrastructure, simple development workflow

Contributing

This is a university software engineering project. To contribute:

  1. Enter the dev shell: nix develop
  2. Create a branch for your changes
  3. Run tests before committing: nix flake check -L
  4. Format code: nix fmt
  5. Submit merge request with clear description


DineHub was built by Trinity College Dublin Software Engineering Group 26 as a software engineering project demonstrating modern cloud architecture patterns.

Infrastructure

Designing for resilience, security, and scale.

This section documents how we deploy and operate the DineHub restaurant ordering system across multiple cloud regions.


At a Glance

| Aspect     | Our Solution                               |
|------------|--------------------------------------------|
| Compute    | AWS EC2 with NixOS                         |
| Networking | Tailscale mesh (self-hosted via Headscale) |
| Database   | Citus (distributed PostgreSQL)             |
| Ingress    | nginx reverse proxy                        |
| Deployment | NixOS modules + deploy-rs                  |

Documentation

  • System Architecture — Complete system design with diagrams and component overview
  • Deployment Process — How we deploy, rollback, and manage infrastructure changes
  • Networking — Zero-trust mesh networking with Tailscale
  • Database — Distributed PostgreSQL with Citus for horizontal scaling
  • Security — Defense in depth across all layers

System Architecture

DineHub - A resilient multi-region cloud restaurant ordering system

This document explains how our system is designed to be highly available, secure, and scalable across multiple cloud regions.


Overview

Our architecture follows three core principles:

| Principle              | What it means                                                      |
|------------------------|--------------------------------------------------------------------|
| High Availability      | The system stays online even if a server or an entire region fails |
| Security by Default    | All internal traffic is encrypted; only web ports are public       |
| Horizontal Scalability | We can add more servers to handle increased load                   |

System Diagram

                         ┌──────────────────┐
                         │    Internet      │
                         │   (customers)    │
                         └────────┬─────────┘
                                  │
                            ports 80/443
                                  │
                    ╔═════════════▼═════════════╗
                    ║      INGRESS NODE         ║
                    ║  ┌─────────────────────┐  ║
                    ║  │   nginx (reverse    │  ║
                    ║  │   proxy + TLS)      │  ║
                    ║  └─────────────────────┘  ║
                    ╚═════════════╤═════════════╝
                                  │
    ══════════════════════════════╪══════════════════════════════
              Tailscale Mesh Network (encrypted, private)
    ══════════════════════════════╪══════════════════════════════
                                  │
           ┌──────────────────────┼──────────────────────┐
           │                      │                      │
    ╔══════▼══════╗        ╔══════▼══════╗        ╔══════▼══════╗
    ║   BACKEND   ║        ║   BACKEND   ║        ║  HEADSCALE  ║
    ║  Region A   ║        ║  Region B   ║        ║  (control)  ║
    ╚══════╤══════╝        ╚══════╤══════╝        ╚═════════════╝
           │                      │
           └──────────┬───────────┘
                      │
               ╔══════▼══════╗
               ║    CITUS    ║
               ║ COORDINATOR ║
               ╚══════╤══════╝
                      │
        ┌─────────────┼─────────────┐
        │             │             │
    ╔═══▼═══╗     ╔═══▼═══╗     ╔═══▼═══╗
    ║WORKER ║     ║WORKER ║     ║WORKER ║
    ║   1   ║     ║   2   ║     ║   3   ║
    ╚═══════╝     ╚═══════╝     ╚═══════╝

Components

1. Ingress Layer (nginx)

The ingress node is the only part of our system exposed to the internet.

Responsibilities:

  • Terminates TLS/HTTPS connections
  • Load balances requests across backend servers
  • Rate limits to prevent abuse

Internet → nginx (:443) → Tailscale → Backend servers
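A trimmed-down sketch of what the ingress configuration could look like; the hostnames are illustrative MagicDNS names, and TLS certificate and rate-limit directives are omitted for brevity:

```nginx
upstream dinehub_backend {
    # Backends reached over the Tailscale mesh; nginx drops a server
    # from rotation after repeated failures.
    server backend-us.internal:8080 max_fails=3 fail_timeout=10s;
    server backend-eu.internal:8080 max_fails=3 fail_timeout=10s;
}

server {
    listen 443 ssl;
    server_name dinehub.example;

    location / {
        proxy_pass http://dinehub_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```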

2. Secure Networking (Tailscale + Headscale)

We use Headscale, an open-source, self-hosted implementation of the Tailscale control server, to create a private mesh network.

Why this approach?

| Traditional Approach           | Our Approach                               |
|--------------------------------|--------------------------------------------|
| Complex firewall rules         | Simple: block everything except Tailscale  |
| VPN tunnels between regions    | Automatic mesh between all nodes           |
| Public IPs on every server     | Only ingress has public exposure           |
| Manual certificate management  | WireGuard encryption built in              |

How it works:

  • Every server runs the Tailscale client
  • Headscale (our control server) authenticates nodes
  • Nodes communicate via private 100.x.x.x addresses
  • All traffic is encrypted with WireGuard
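On NixOS, enrolling a node in the mesh is a few lines of configuration. A sketch using the standard Tailscale module; the login-server URL and the auth-key path are placeholders:

```nix
services.tailscale = {
  enable = true;
  # Pre-auth key provisioned out of band; lets the node join unattended.
  authKeyFile = "/run/secrets/tailscale-authkey";
  # Point the client at our self-hosted Headscale rather than Tailscale SaaS.
  extraUpFlags = [ "--login-server=https://headscale.example.com" ];
};
# Traffic arriving over the mesh interface is trusted by the host firewall.
networking.firewall.trustedInterfaces = [ "tailscale0" ];
```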

3. Backend Servers

Stateless application servers that handle the business logic.

Key properties:

  • Can be deployed in multiple regions for lower latency
  • Horizontally scalable (add more when needed)
  • Connect to database over the secure mesh

4. Distributed Database (Citus + PostgreSQL)

Citus extends PostgreSQL to distribute data across multiple servers.

                    ┌─────────────────────┐
                    │    Coordinator      │
                    │  (receives queries) │
                    └──────────┬──────────┘
                               │
            ┌──────────────────┼──────────────────┐
            │                  │                  │
      ┌─────▼─────┐      ┌─────▼─────┐      ┌─────▼─────┐
      │  Worker 1 │      │  Worker 2 │      │  Worker 3 │
      │ orders    │      │ orders    │      │ orders    │
      │ 1-1000    │      │ 1001-2000 │      │ 2001-3000 │
      └───────────┘      └───────────┘      └───────────┘

How Citus distributes data:

  1. Tables are “sharded” by a key (e.g., restaurant_id)
  2. Each worker holds a portion of the data
  3. Queries are routed to the relevant workers
  4. Results are combined and returned

Example: When a customer orders from Restaurant #42:

  • Coordinator receives the query
  • Routes it to the worker holding Restaurant #42’s data
  • Worker processes and returns the result
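Because the filter is on the distribution column, the coordinator can route the whole query to a single worker. A hypothetical query against the illustrative `orders` table:

```sql
-- restaurant_id is the distribution column, so Citus resolves the
-- shard for value 42 and sends the query to exactly one worker.
SELECT id, status, created_at
FROM orders
WHERE restaurant_id = 42
ORDER BY created_at DESC
LIMIT 20;
```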

Request Flow

Here’s what happens when a customer places an order:

┌────────────┐     ┌─────────────┐     ┌─────────────┐     ┌───────────────┐
│  Customer  │────▶│   nginx     │────▶│   Backend   │────▶│    Citus      │
│  (browser) │     │  (ingress)  │     │  (Region A) │     │  (database)   │
└────────────┘     └─────────────┘     └─────────────┘     └───────────────┘
      │                   │                   │                    │
      │   HTTPS :443      │    Tailscale      │     Tailscale      │
      │   (encrypted)     │    (encrypted)    │     (encrypted)    │

  1. Customer’s browser connects to nginx over HTTPS
  2. nginx forwards request to a backend over Tailscale
  3. Backend queries Citus coordinator over Tailscale
  4. Coordinator fetches data from workers
  5. Response flows back through the same path

Cloud Infrastructure

We deploy on AWS EC2 instances running NixOS.

Instance Sizes

| Role              | Instance  | Why                                   |
|-------------------|-----------|---------------------------------------|
| Headscale         | t3.micro  | Low resource needs, just coordination |
| Ingress           | t3.small  | Handles TLS termination               |
| Backend           | t3.medium | Application processing                |
| Citus Coordinator | t3.medium | Query routing                         |
| Citus Workers     | t3.medium | Data storage and queries              |

Security Groups

Because of Tailscale, our firewall rules are minimal:

Ingress node:

Inbound:  80 (HTTP), 443 (HTTPS), 41641/UDP (Tailscale)
Outbound: All (for Tailscale)

All other nodes:

Inbound:  41641/UDP (Tailscale only)
Outbound: All (for Tailscale)

No database ports, no backend ports exposed to the internet.


Deployment

All infrastructure is defined as NixOS modules in our repository.

| Module               | Purpose                       |
|----------------------|-------------------------------|
| backend-service.nix  | Backend application service   |
| postgres-service.nix | Citus distributed PostgreSQL  |

This means:

  • Infrastructure is version controlled
  • Deployments are reproducible
  • Configuration changes are atomic
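For a flavor of what such a module contains, here is an abbreviated, hypothetical sketch of backend-service.nix; the package name and paths are illustrative stand-ins for our flake's outputs:

```nix
{ config, pkgs, ... }:
{
  systemd.services.dinehub-backend = {
    description = "DineHub backend (GraalVM native image)";
    wantedBy = [ "multi-user.target" ];
    after = [ "network-online.target" ];
    serviceConfig = {
      # pkgs.dinehub-backend stands in for the flake's backend package.
      ExecStart = "${pkgs.dinehub-backend}/bin/backend";
      Restart = "on-failure";
      DynamicUser = true;  # ephemeral unprivileged user per start
    };
  };
}
```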

Handling Failures

If a backend server fails:

  • nginx detects it via health checks
  • Traffic is routed to healthy backends
  • No customer impact

If a database worker fails:

  • Queries to that shard will fail temporarily
  • Worker can rejoin and resync
  • Other shards continue working

If an entire region fails:

  • Traffic shifts to the healthy region
  • May need to promote replica workers

Security Summary

| Attack Vector                | Mitigation                                  |
|------------------------------|---------------------------------------------|
| Network sniffing             | All traffic encrypted (WireGuard)           |
| Unauthorized server access   | Tailscale requires authentication           |
| Database exposed to internet | Database only accessible via mesh           |
| DDoS on backend              | Only nginx is public; rate limiting enabled |

Future Work

  • Headscale NixOS module
  • Tailscale client module
  • nginx ingress module
  • Secrets management with sops-nix
  • Prometheus monitoring (stretch goal)
  • Automated database backups
  • deploy-rs for one-command deployments

Deployment Process

How we deploy DineHub across multiple regions with confidence

Philosophy

Our deployment process follows the principle of immutable infrastructure: once deployed, servers are never modified in place. Instead, we build new systems from scratch and atomically switch traffic to them. This eliminates “configuration drift” and makes deployments predictable and reversible.

The NixOS Approach

Traditional deployment processes often involve:

  • SSHing into servers to run commands
  • Patching files in place
  • Hoping the application restarts correctly
  • Manual rollback procedures when things go wrong

NixOS eliminates these risks through declarative configuration:

  1. Describe the desired state in Nix expressions
  2. Build the system locally or in CI
  3. Activate atomically — either the new system works completely, or the old system remains
  4. Rollback automatically if health checks fail
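With deploy-rs, the atomic activation in step 3 is driven by a node definition in the flake. A sketch; the hostname is illustrative:

```nix
deploy.nodes.backend-us = {
  hostname = "backend-us.internal";
  profiles.system = {
    user = "root";
    # activate.nixos wraps the system closure so deploy-rs can switch
    # to it atomically and roll back if post-activation checks fail.
    path = deploy-rs.lib.x86_64-linux.activate.nixos
      self.nixosConfigurations.backend-us;
  };
};
```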

Deployment Pipeline

Stage 1: Build

Every deployment starts with building the new system configuration:

Developer Machine          CI/CD (Garnix)              Binary Cache
     │                           │                           │
     │── nix flake check ───────▶│                           │
     │                           │── build packages ────────▶│
     │                           │                           │── cache builds
     │                           │◀── success/failure ───────│
     │◀── build results ─────────│                           │

The build process:

  • Compiles the backend to a GraalVM native image
  • Bundles the frontend with Bun
  • Runs all tests (unit, integration, property-based)
  • Validates OpenAPI spec with Schemathesis property-based testing
  • Validates NixOS configurations
  • Caches successful builds for reuse

OpenAPI Validation

As part of the build pipeline, we validate that our implementation matches the OpenAPI specification:

  • Specification-first: The OpenAPI spec in specs/openapi.yaml defines the API contract
  • Auto-generation: Spring Boot controllers generate OpenAPI documentation from code
  • Schemathesis testing: Property-based testing verifies implementation matches spec
  • Linting: Redocly validates the spec for correctness and consistency

This ensures API consumers can rely on the documented behavior.

Stage 2: Test

Before deploying to production, we validate in isolated environments:

  • VM Tests: Full system integration tests in NixOS VMs
  • Staging Environment: Identical to production but with synthetic data
  • Health Checks: Automated probes verify endpoints respond correctly

Stage 3: Deploy

Deployments use deploy-rs, which provides atomic activation:

┌─────────────────────────────────────────────────────────────┐
│                    Deployment Flow                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Build system closure locally                            │
│     └─ All packages + dependencies computed                 │
│                                                             │
│  2. Upload to target node                                   │
│     └─ Nix copy-closure sends only missing packages         │
│                                                             │
│  3. Activate new configuration                              │
│     └─ System switches to new generation                    │
│                                                             │
│  4. Run activation hook                                     │
│     └─ Services restart with new configuration              │
│                                                             │
│  5. Verify health checks                                    │
│     └─ Confirm services respond correctly                   │
│                                                             │
│  6. On failure: automatic rollback                          │
│     └─ Previous generation restored                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Rolling Deployments

When deploying to multiple backend servers, we use a rolling deployment strategy:

  1. Take one server out of the load balancer
  2. Deploy new version to that server
  3. Verify health checks pass
  4. Return server to load balancer
  5. Repeat for remaining servers

This ensures:

  • Zero downtime: At least some servers always available
  • Gradual rollout: Issues caught before affecting all traffic
  • Easy rollback: Can revert individual servers if problems arise

Configuration Management

Secrets Handling

Sensitive configuration (database passwords, JWT keys) is managed separately from code:

  • Encrypted at rest: Secrets stored encrypted in the repository using agenix
  • Decrypted at deploy: Only the target machine can decrypt its secrets
  • Never in Nix store: Unencrypted secrets never touch the world-readable Nix store
  • Access controlled: Each secret specifies which users/services can read it
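Declaring a secret with agenix looks roughly like this; the file name and owning user are illustrative:

```nix
# The .age file is committed encrypted; only the target host's key
# can decrypt it, and it lands outside the world-readable Nix store.
age.secrets.db-password = {
  file = ../secrets/db-password.age;
  owner = "dinehub-backend";   # only this service user may read it
};
# At runtime the service reads config.age.secrets.db-password.path,
# which resolves to a file under /run/agenix/.
```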

Environment-Specific Configuration

Different environments (dev, staging, production) have different needs:

  • Development: Local database, debug logging, hot reloading
  • Staging: Production-like but isolated, synthetic data
  • Production: Multiple regions, real data, optimized settings

These differences are captured in Nix expressions rather than environment variables scattered across systems.

Disaster Recovery

Backup Strategy

The distributed database provides natural redundancy:

  • Citus workers: Store shards across multiple nodes
  • Cross-region replicas: Critical data replicated to other regions
  • Point-in-time recovery: PostgreSQL WAL archiving enables restoration to any moment
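Point-in-time recovery rests on standard PostgreSQL WAL archiving, which the NixOS PostgreSQL module can enable declaratively. A sketch; the archive destination is a placeholder:

```nix
services.postgresql.settings = {
  wal_level = "replica";         # enough detail in WAL for PITR
  archive_mode = "on";
  # Copy each completed WAL segment to durable storage; in practice
  # this would push to object storage rather than a local path.
  archive_command = "cp %p /var/backup/wal/%f";
};
```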

Recovery Procedures

If a region fails completely:

  1. Traffic rerouting: DNS or ingress configuration points to healthy regions
  2. Database promotion: Replica in healthy region promoted to primary
  3. Re-provisioning: Failed region rebuilt from Nix configuration
  4. Data reconciliation: When failed region recovers, data synchronized

Monitoring Deployments

Deployment Metrics

We track deployment health through:

  • Success rate: Percentage of deployments that activate without rollback
  • Time to deploy: Duration from build start to activation complete
  • Error rates: API errors, 5xx responses, failed health checks
  • Resource usage: Memory, CPU, disk during and after deployment

Observability Integration

Deployments integrate with the monitoring stack:

  • Prometheus: Metrics scraped before/after deployment
  • Loki: Log aggregation to detect errors
  • Grafana: Dashboards showing deployment impact
  • Alerts: Automatic notifications for failed deployments

Continuous Deployment

Automated Pipeline

Changes flow automatically from commit to production:

Git Commit → CI Build → Tests Pass → Staging Deploy → Prod Deploy
                │           │              │             │
                ▼           ▼              ▼             ▼
            Build      Integration    Smoke Tests   Rolling
            Packages   Tests          Validation    Rollout

Safety Mechanisms

Automation includes safety checks:

  • Required checks: Build must pass before deployment
  • Manual gates: Production deployments may require approval
  • Canary analysis: New version serves small percentage of traffic first
  • Automatic rollback: Failed health checks trigger immediate rollback

Development vs Production

Key Differences

| Aspect             | Development      | Production                |
|--------------------|------------------|---------------------------|
| Process management | process-compose  | systemd                   |
| Database           | Local PostgreSQL | Citus distributed cluster |
| Networking         | localhost        | Tailscale mesh            |
| Secrets            | Plain text files | agenix encrypted          |
| Updates            | Hot reloading    | Atomic deployment         |
| Monitoring         | Console logs     | Prometheus/Grafana        |

Despite these differences, the same Nix expressions describe both environments. The differences are parameterized rather than being separate code paths.

Troubleshooting Deployments

Common Issues

  • Build failures: Missing dependencies, compilation errors
  • Health check failures: Services start but don’t respond correctly
  • Configuration errors: Secrets or environment variables missing
  • Network issues: Tailscale connectivity problems between nodes

Debug Commands

When deployments fail:

  • Check service status: systemctl status backend
  • View logs: journalctl -u backend -f
  • Test health endpoints: curl localhost:8080/actuator/health
  • Verify Tailscale: tailscale status
  • Rollback if needed: nixos-rebuild switch --rollback

Future Improvements

  • Blue/Green deployments: Instant cutover with ability to rollback
  • Feature flags: Deploy code disabled, enable gradually
  • Chaos engineering: Intentionally break things to test resilience
  • Automated capacity scaling: Add/remove nodes based on load

Networking Architecture

How DineHub nodes communicate securely across regions

Philosophy

Traditional network security relies on perimeter-based firewalls: block everything from the outside, trust everything on the inside. This model breaks down in cloud environments where:

  • Services span multiple regions and cloud providers
  • Containers and VMs come and go dynamically
  • Internal traffic must still be protected

DineHub adopts zero-trust networking: encrypt everything, authenticate every connection, verify every request—regardless of whether it’s “internal” or “external.”

The Tailscale Mesh

What is Tailscale?

Tailscale is a mesh VPN built on WireGuard, a modern, high-performance VPN protocol. Unlike traditional VPNs that tunnel all traffic through a central gateway, Tailscale creates direct, encrypted connections between every pair of nodes.

Why Self-Hosted?

We use Headscale, an open-source implementation of the Tailscale control server:

  • No vendor dependency: We control the coordination server
  • Private infrastructure: No data flows through Tailscale’s SaaS
  • Custom policies: Define our own access rules and ACLs
  • Cost: No per-user licensing fees

Mesh Topology

                    ┌─────────────────────┐
                    │     Internet        │
                    └──────────┬──────────┘
                               │ HTTPS
                               ▼
                    ┌─────────────────────┐
                    │   Ingress Node      │
                    │   (nginx :443)      │
                    └──────────┬──────────┘
                               │
              ╔════════════════╪═════════════════╗
              ║   Tailscale Mesh Network (100.x) ║
              ║   All traffic encrypted via      ║
              ║   WireGuard                      ║
              ╚════════════════╪═════════════════╝
                               │
        ┌──────────────────────┼──────────────────────┐
        │                      │                      │
        ▼                      ▼                      ▼
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│  Backend-US   │◀────▶│  Backend-EU   │◀────▶│   Headscale   │
│               │      │               │      │   Control     │
│ • Port 8080   │      │ • Port 8080   │      │ • Port 443    │
│ • No public IP│      │ • No public IP│      │• No public IP │
└───────┬───────┘      └───────┬───────┘      └───────────────┘
        │                      │
        └──────────────────────┼──────────────────────┐
                               │                      │
                               ▼                      ▼
                    ┌─────────────────┐    ┌─────────────────┐
                    │DB Coordinator   │    │  DB Worker      │
                    │• Port 5432      │    │  • Port 5432    │
                    │• No public IP   │    │  • No public IP │
                    └─────────────────┘    └─────────────────┘

Network Segmentation

Security Zones

We organize infrastructure into security zones based on exposure:

Public Zone (Ingress only)

  • Exposed to internet on ports 80/443
  • nginx reverse proxy terminates TLS
  • All traffic forwarded to private zone via Tailscale

Private Zone (Application layer)

  • Backend servers in multiple regions
  • Only accessible via Tailscale (100.x.x.x addresses)
  • No public IPs, no inbound firewall rules

Data Zone (Database layer)

  • Citus coordinator and workers
  • Same Tailscale-only access as private zone
  • Additional PostgreSQL authentication

Control Plane (Headscale)

  • Manages Tailscale authentication
  • No user-facing services
  • Minimal attack surface

Communication Patterns

Request Flow

When a customer places an order:

  1. Browser → Ingress: HTTPS over public internet
  2. Ingress → Backend: HTTP over Tailscale (encrypted by WireGuard)
  3. Backend → Database: PostgreSQL protocol over Tailscale
  4. Coordinator → Workers: Internal Citus protocol over Tailscale

Every hop is authenticated and encrypted—even traffic between nodes in the same data center.

Inter-Region Communication

When a US-based backend queries a database in EU:

  1. Backend sends query to Citus coordinator (via Tailscale)
  2. Coordinator routes to appropriate worker (may be in EU)
  3. Worker processes query, returns results
  4. Coordinator aggregates and returns to backend

Tailscale automatically establishes the most direct path, potentially bypassing the public internet entirely if nodes are in the same cloud provider’s backbone.

Service Discovery

DNS Resolution

Tailscale provides MagicDNS, automatically assigning DNS names to nodes:

  • backend-us.internal → 100.64.0.1
  • db-coordinator.internal → 100.64.0.2
  • db-worker-1.internal → 100.64.0.3

Services reference each other by stable DNS names rather than IP addresses, simplifying configuration changes.

Health-Based Routing

nginx upstream configuration dynamically adjusts based on backend health:

  • Health checks verify backends respond correctly
  • Failed backends automatically removed from rotation
  • New backends automatically added when healthy
  • Geographic affinity: prefer local region when possible

Access Control

Tailscale ACLs

Access control lists define who can talk to whom:

Groups:
- ingress-nodes: ingress-01, ingress-02
- backend-nodes: backend-us, backend-eu
- database-nodes: db-coord, db-worker-1, db-worker-2

Rules:
- ingress-nodes → backend-nodes: allowed
- backend-nodes → database-nodes: allowed
- database-nodes → backend-nodes: denied
- public-internet → anything: denied (except ingress :443)

This “default deny” approach means new nodes can’t communicate until explicitly permitted.
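Concretely, rules like these could be expressed in Tailscale's ACL policy syntax, which Headscale largely follows. A sketch in HuJSON (so comments are allowed); the tag names are illustrative:

```json
{
  "acls": [
    // ingress may reach backends on the application port
    { "action": "accept", "src": ["tag:ingress"], "dst": ["tag:backend:8080"] },
    // backends may reach the database tier on the PostgreSQL port
    { "action": "accept", "src": ["tag:backend"], "dst": ["tag:database:5432"] }
    // anything not matched above is denied by default
  ]
}
```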

Authentication

Tailscale uses cryptographic identity:

  • Node authentication: Each node has a unique private key
  • User authentication: Nodes associated with user identity
  • Multi-factor auth: Headscale can require MFA for node enrollment
  • Certificate rotation: Keys automatically rotated

Performance Considerations

Latency

Tailscale adds minimal overhead:

  • WireGuard encryption: ~1-2ms latency increase
  • Direct connections: No central hub to traverse
  • Protocol optimization: UDP-based, handles NAT traversal

For cross-region traffic, geographic latency dominates—Tailscale doesn’t add meaningful overhead.

Bandwidth

WireGuard is efficient:

  • Small overhead: ~60 bytes per packet (vs. 150+ for IPSec)
  • Modern crypto: ChaCha20-Poly1305 optimized for mobile/embedded
  • No head-of-line blocking: UDP transport

Typical throughput exceeds 1 Gbps between cloud instances.

Reliability

The mesh topology provides natural redundancy:

  • No single point of failure: If Headscale is down, existing connections continue
  • Automatic reconnection: Nodes reconnect if paths change
  • Path optimization: Routes around failed intermediate hops

Firewall Configuration

Minimal Rules

Because Tailscale handles authentication and encryption, firewall rules are simple:

Ingress Node:

  • Inbound: 80/tcp, 443/tcp, 41641/udp (Tailscale)
  • Outbound: All (for Tailscale mesh)

All Other Nodes:

  • Inbound: 41641/udp (Tailscale only)
  • Outbound: All (for Tailscale mesh)

No rules for application ports (8080, 5432)—Tailscale provides the connectivity.
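On NixOS, the per-node rules above can be written declaratively. A sketch for a non-ingress node (the option names are standard NixOS; the values are illustrative):

```nix
{
  services.tailscale.enable = true;

  networking.firewall = {
    enable = true;
    # Only the Tailscale/WireGuard port is open to the world.
    allowedUDPPorts = [ 41641 ];
    # Application traffic (8080, 5432) arrives over the mesh interface,
    # which is trusted because Tailscale already authenticated the peer.
    trustedInterfaces = [ "tailscale0" ];
  };
}
```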

Why This Works

Traditional firewall rules would require:

  • Opening port 5432 between specific IP ranges
  • Managing security groups per region
  • Updating rules when topology changes

With Tailscale:

  • Single UDP port for all connectivity
  • Identity-based rather than IP-based rules
  • Automatic updates as nodes join/leave

Troubleshooting

Common Issues

  • Nodes not connecting: Check if enrolled in Tailscale network
  • DNS not resolving: Verify MagicDNS enabled
  • High latency: Check if direct connection established (relayed traffic is slower)
  • Certificate errors: Node may need re-authentication

Diagnostic Commands

# Check Tailscale status
tailscale status

# Test connectivity to another node
tailscale ping backend-us

# Report network conditions (NAT type, relay latency)
tailscale netcheck

# Debug connection issues
tailscale bugreport

Future Enhancements

  • IPv6 support: Native IPv6 addressing within mesh
  • Subnet routers: Extend Tailscale to legacy infrastructure
  • Access request workflows: Temporary access grants
  • Audit logging: Comprehensive connection logs
  • Network policies: Kubernetes-style micro-segmentation

Database Architecture

Distributed data storage with Citus and PostgreSQL

Philosophy

Traditional monolithic databases eventually hit scalability limits—either they run out of storage or can’t handle concurrent query volume. Scaling vertically (bigger servers) has practical limits and creates single points of failure.

DineHub adopts horizontal database scaling: distribute data across multiple servers, with each server handling a subset of the data. This provides both capacity and performance scaling.

Why Citus?

The Problem with Single Databases

As data grows, a single PostgreSQL server faces challenges:

  • Storage limits: Hardware can only hold so much data
  • Query performance: Large tables become slow to scan
  • Concurrent load: Limited CPU/memory for parallel queries
  • Availability: Single server failure means downtime

Citus Solution

Citus extends PostgreSQL to distribute tables across multiple servers:

  • Horizontal scaling: Add servers as data grows
  • Query parallelization: Complex queries execute across workers
  • High availability: Replicas provide fault tolerance
  • PostgreSQL compatible: Standard SQL, tools, and drivers work

Architecture

The Coordinator

The coordinator is the entry point for all database queries:

  • Receives queries: Applications connect here like normal PostgreSQL
  • Plans execution: Determines which workers hold relevant data
  • Routes requests: Sends sub-queries to appropriate workers
  • Aggregates results: Combines worker responses into final result

From the application perspective, the coordinator looks like a standard PostgreSQL server.

The Workers

Workers store actual data and execute queries:

  • Hold shards: Each worker contains portions of distributed tables
  • Process queries: Execute SQL against local data
  • Return results: Send partial results back to coordinator

Workers are standard PostgreSQL servers with Citus extension installed.

Data Distribution

                    ┌─────────────────────┐
                    │     Coordinator     │
                    │  (Query planner &   │
                    │   result aggregator)│
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
              ▼                ▼                ▼
    ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
    │    Worker 1     │ │    Worker 2     │ │    Worker 3     │
    │                 │ │                 │ │                 │
    │  Restaurants    │ │  Restaurants    │ │  Restaurants    │
    │  ID: 1-1000     │ │  ID: 1001-2000  │ │  ID: 2001-3000  │
    │                 │ │                 │ │                 │
    │  Orders         │ │  Orders         │ │  Orders         │
    │  (same shard)   │ │  (same shard)   │ │  (same shard)   │
    └─────────────────┘ └─────────────────┘ └─────────────────┘

Sharding Strategy

Distribution Column

Tables are distributed by a distribution column:

  • Restaurants: Distributed by restaurant_id
  • Orders: Also distributed by restaurant_id (co-located with restaurant)
  • Users: Distributed by user_id

This “co-location” means a restaurant’s orders reside on the same worker as the restaurant itself, making join queries efficient.
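Under these assumptions about table and column names, co-location is declared when the tables are distributed (create_distributed_table and its colocate_with parameter are the standard Citus API):

```sql
-- Distribute both tables by restaurant_id and co-locate their shards,
-- so a restaurant's orders live on the same worker as the restaurant row.
SELECT create_distributed_table('restaurants', 'restaurant_id');
SELECT create_distributed_table('orders', 'restaurant_id',
                                colocate_with => 'restaurants');
```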

Shard Assignment

Citus hash-partitions rows into a fixed set of shards and assigns each shard to a worker:

  • Hash of distribution column determines shard
  • Each shard assigned to one primary worker
  • Replicas may exist on other workers for availability

When to Distribute

Not all tables should be distributed:

Distribute (large tables):

  • Restaurants (millions of rows expected)
  • Orders (billions of rows expected)
  • Order items (billions of rows expected)

Reference tables (replicated to all workers):

  • Cuisine types (small, lookup data)
  • Configuration (rarely changes)

Reference tables are replicated to every worker, making joins fast but updates expensive.
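Reference tables use a separate Citus function (the table name is illustrative):

```sql
-- Replicate a small lookup table to every worker so joins against it
-- never leave the local node.
SELECT create_reference_table('cuisine_types');
```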

Query Execution

Simple Queries

Single-row lookups by distribution column are fast:

SELECT * FROM orders WHERE restaurant_id = 42;

Coordinator hashes 42, determines which worker holds the shard, and routes directly to that worker.

Complex Queries

Aggregations and joins may involve multiple workers:

SELECT region, COUNT(*) FROM restaurants GROUP BY region;

Execution:

  1. Coordinator sends query to all workers
  2. Each worker counts local restaurants
  3. Workers return partial counts
  4. Coordinator merges the per-region partial counts and returns the final result

This parallel execution provides near-linear speedup with added workers.
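Conceptually, the pushdown looks like this (Citus names shard tables `<table>_<shardid>`; the shard ID here is illustrative):

```sql
-- Rewritten query sent to each worker against its local shard:
SELECT region, count(*) FROM restaurants_102008 GROUP BY region;
-- The coordinator then sums the per-region partial counts from all workers.
```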

Cross-Shard Joins

Joins between distributed tables require care:

Efficient (co-located join):

SELECT * FROM restaurants r
JOIN orders o ON r.id = o.restaurant_id
WHERE r.id = 42;

Both tables share distribution column, so data is on same worker.

Less efficient (repartition join):

SELECT * FROM orders o
JOIN users u ON o.customer_id = u.id;

Different distribution columns require data movement between workers.

High Availability

Replication Strategy

Each shard has replicas on different workers:

  • Primary: Handles reads and writes
  • Standby: Receives streaming replication, takes over if primary fails
  • Cross-region: Replicas in other regions for disaster recovery

Failover Process

If a worker fails:

  1. Detection: Health checks notice unresponsive worker
  2. Promotion: Standby replica promoted to primary
  3. Reconfiguration: Coordinator routes queries to new primary
  4. Recovery: Failed worker repaired, rejoins as replica

This process is automatic—applications don’t need to change connection strings.

Split-Brain Prevention

The cluster’s failover machinery prevents split-brain scenarios:

  • Only one primary per shard at a time
  • Writes blocked until consensus achieved
  • Clients may see brief unavailability during failover

Performance Optimization

Query Planning

The coordinator analyzes queries to optimize distribution:

  • Pushdown: Move filters and aggregations to workers
  • Pruning: Skip workers that can’t have relevant data
  • Parallelization: Split work across multiple workers

Index Strategy

Index recommendations change with distribution:

  • Distribution column: Always indexed (used for routing)
  • Join columns: Index if frequently joined
  • Filter columns: Index if selective filters common
  • Coordinator: May need indexes for final aggregation

Monitoring

Key metrics for distributed databases:

  • Shard imbalance: Are workers evenly loaded?
  • Query latency: Coordinator vs worker time breakdown
  • Replication lag: Standby replicas behind primary?
  • Connection pooling: Managing thousands of connections

Operational Considerations

Adding Workers

Scale out by adding workers:

  1. Provision new worker nodes
  2. Run citus_add_node() to add to cluster
  3. Existing shards don’t move automatically; new rows still hash to those shards
  4. Newly created distributed tables place shards on the new workers
  5. Optional: Rebalance shards for even distribution
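In SQL, the scale-out step might look like this (the hostname is illustrative; the rebalance function name varies by Citus version, with rebalance_table_shards() on older releases):

```sql
-- Register a new worker with the cluster
SELECT citus_add_node('db-worker-3.internal', 5432);

-- Optionally move existing shards onto it for an even distribution
SELECT citus_rebalance_start();
```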

Schema Changes

Schema modifications propagate to all workers:

-- This runs on coordinator and all workers
ALTER TABLE restaurants ADD COLUMN rating FLOAT;

Citus propagates supported DDL to workers automatically; most schema changes just work.

Backup and Recovery

Backup strategies for distributed data:

  • Logical backups: pg_dump on coordinator captures distributed schema
  • Per-worker backups: Physical backups of each worker’s data
  • Point-in-time recovery: WAL archiving for granular recovery
  • Cross-region replicas: Live replicas for disaster recovery

Trade-offs

Benefits

  • Scalability: Add capacity by adding servers
  • Performance: Parallel query execution
  • Availability: Replicas provide fault tolerance
  • PostgreSQL compatible: Familiar SQL and tooling

Complexity

  • Query planning: Must consider distribution in query design
  • Operational overhead: More servers to monitor and maintain
  • Transaction limitations: Cross-shard transactions have overhead
  • Migration: Existing applications may need modification

When Not to Use

Citus may be overkill for:

  • Small datasets (< 100GB)
  • Simple workloads (mostly single-row lookups)
  • Strong consistency requirements across shards (use single PostgreSQL)

DineHub’s expected scale (thousands of restaurants, millions of orders) justifies the complexity.

Future Enhancements

  • Citus MX: Multi-coordinator for higher availability
  • Columnar storage: Analytics queries on compressed columnar data
  • Automatic rebalancing: Dynamic shard redistribution
  • Read replicas: Offload read traffic to standbys
  • Global indexes: Cross-shard indexes for unique constraints

Security Architecture

Defense in depth for a multi-region cloud application

Philosophy

Security is not a feature you add at the end—it’s a property that emerges from careful design at every layer. DineHub follows defense in depth: multiple independent security mechanisms, each sufficient on its own, so that if one fails, others still protect the system.

We assume attackers will eventually breach some defenses. The goal is to make that breach useless—to limit what they can access and detect the intrusion quickly.

Security Layers

┌──────────────────────────────────────────────────────────────┐
│ Layer 7: Application Security                                │
│ • Authentication (JWT)                                       │
│ • Authorization (RBAC)                                       │
│ • Input validation                                           │
│ • Output encoding                                            │
├──────────────────────────────────────────────────────────────┤
│ Layer 6: API Security                                        │
│ • HTTPS/TLS                                                  │
│ • Rate limiting                                              │
│ • CORS policies                                              │
│ • API versioning                                             │
├──────────────────────────────────────────────────────────────┤
│ Layer 5: Network Security                                    │
│ • Zero-trust mesh (Tailscale)                                │
│ • Firewall rules                                             │
│ • Private IPs only                                           │
│ • No lateral movement                                        │
├──────────────────────────────────────────────────────────────┤
│ Layer 4: Host Security                                       │
│ • Immutable infrastructure                                   │
│ • Minimal attack surface                                     │
│ • Automatic updates                                          │
│ • Read-only filesystems                                      │
├──────────────────────────────────────────────────────────────┤
│ Layer 3: Secrets Management                                  │
│ • Encryption at rest                                         │
│ • No secrets in code                                         │
│ • Rotation policies                                          │
│ • Audit logging                                              │
├──────────────────────────────────────────────────────────────┤
│ Layer 2: Access Control                                      │
│ • Least privilege                                            │
│ • Multi-factor authentication                                │
│ • Role-based permissions                                     │
│ • Session management                                         │
├──────────────────────────────────────────────────────────────┤
│ Layer 1: Physical Security                                   │
│ • Cloud provider guarantees                                  │
│ • Multi-region distribution                                  │
│ • Encrypted storage                                          │
└──────────────────────────────────────────────────────────────┘

Application Security

Authentication

We use JWT (JSON Web Tokens) for authentication:

  • Stateless: No server-side session storage
  • Self-contained: Token carries user identity
  • Expirable: Short-lived tokens (24 hours)
  • Revocable: Token blacklist for logout

JWTs are signed with a server-side secret. Attackers can’t forge tokens without the secret, and expired tokens are automatically rejected.
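A minimal HS256 sketch of the signing and verification just described, using only Node's crypto module. It illustrates why tokens can't be forged without the secret and why expired tokens are rejected; a production system should use a vetted JWT library rather than this:

```typescript
import { createHmac } from "node:crypto";

const b64url = (data: string): string =>
  Buffer.from(data).toString("base64url");

// Sign a payload: header.payload.signature, all base64url-encoded.
export function sign(payload: Record<string, unknown>, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const sig = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  return `${header}.${body}.${sig}`;
}

// Verify: recompute the signature; reject forgeries and expired tokens.
export function verify(
  token: string,
  secret: string,
): Record<string, unknown> | null {
  const [header, body, sig] = token.split(".");
  const expected = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  if (sig !== expected) return null; // forged or tampered token
  const payload = JSON.parse(
    Buffer.from(body, "base64url").toString(),
  ) as Record<string, unknown>;
  const exp = payload.exp;
  if (typeof exp === "number" && exp < Date.now() / 1000) return null; // expired
  return payload;
}
```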

Authorization

Role-Based Access Control (RBAC) defines what users can do:

  • USER: Browse restaurants, place orders, manage own orders
  • RESTAURANT_OWNER: All USER permissions + manage owned restaurants + view restaurant orders
  • ADMIN: Full system access

Permissions are enforced at the API endpoint level. Even if a user knows an endpoint exists, they can’t access resources they don’t own.

Input Validation

All user input is treated as untrusted:

  • Type validation: DTOs with strict typing
  • Range checks: Numeric values within expected ranges
  • Length limits: Prevent buffer overflow attempts
  • Format validation: Emails, UUIDs match expected patterns
  • Sanitization: Remove or escape dangerous characters

Validation happens at the API boundary, before data reaches business logic.

Network Security

Zero-Trust Architecture

We don’t trust the network—even our internal network:

  • Encryption everywhere: All traffic encrypted via WireGuard (Tailscale)
  • Mutual authentication: Both sides verify each other’s identity
  • No implicit trust: Every connection requires explicit authorization
  • Micro-segmentation: Services can only talk to required dependencies

Network Segmentation

Infrastructure divided into security zones:

Public zone (ingress only):

  • Exposed to internet
  • Minimal attack surface (nginx only)
  • All traffic forwarded to private zone

Private zone (application servers):

  • No public IPs
  • Access only via Tailscale
  • Can only initiate connections to data zone

Data zone (databases):

  • No external access except from application servers
  • Additional PostgreSQL authentication
  • Encrypted storage volumes

This segmentation means compromising one zone doesn’t automatically grant access to others.

Firewall Strategy

Traditional firewalls rely on IP whitelisting. We use identity-based access control:

  • Single port: Tailscale uses one UDP port for all connectivity
  • No application ports exposed: Database and application ports not visible to network
  • Cryptographic identity: Nodes authenticated by certificates, not IP addresses

Secrets Management

Secret Lifecycle

Secrets follow a strict lifecycle:

  1. Generation: Cryptographically secure random generation
  2. Distribution: Encrypted transmission to target systems
  3. Storage: Encrypted at rest, never in version control
  4. Usage: Runtime injection, not baked into images
  5. Rotation: Regular rotation with overlap period
  6. Revocation: Immediate invalidation on compromise

Storage

Secrets are stored encrypted using agenix:

  • Encryption: Age encryption with recipient public keys
  • Repository: Encrypted files stored in Git
  • Decryption: Only target machines can decrypt (private keys on machines)
  • Access: Each secret specifies authorized users/services

This means:

  • Developers can see encrypted blobs but not plaintext
  • CI/CD can deploy encrypted secrets but not read them
  • Production machines decrypt their own secrets at runtime
  • Compromised Git repo doesn’t expose secrets
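With agenix, this access mapping lives in a secrets.nix file that pairs each encrypted file with the public keys allowed to decrypt it. A sketch with placeholder keys and hostnames:

```nix
# secrets.nix — each .age file lists the recipients that can decrypt it.
let
  backendUs = "ssh-ed25519 AAAAC3... root@backend-us";
  backendEu = "ssh-ed25519 AAAAC3... root@backend-eu";
in
{
  "db-password.age".publicKeys = [ backendUs backendEu ];
  "jwt-signing-key.age".publicKeys = [ backendUs backendEu ];
}
```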

Secret Types

Different secrets have different handling:

  • Database credentials: Rotated monthly, stored per-environment
  • JWT signing keys: Rotated quarterly, symmetric for performance
  • API keys: Rotated on employee departure, tracked usage
  • TLS certificates: Auto-renewed via Let’s Encrypt

Host Security

Immutable Infrastructure

Servers are immutable—never modified after deployment:

  • No SSH access: Configuration via Nix, not manual commands
  • Read-only root: Root filesystem mounted read-only
  • Ephemeral storage: Local state treated as disposable
  • Reproducible builds: Same Nix expression always produces same system

If a server is compromised, we don’t try to clean it—we replace it.

Attack Surface Reduction

Minimal software installed on each server:

  • Single purpose: Each server runs one service
  • No shells: No bash, no sshd (except for debugging)
  • No compilers: No gcc, no development tools
  • Minimal services: Only required systemd units

Automatic Updates

Security patches apply automatically:

  • NixOS updates: System packages updated via Nix
  • Rolling updates: New generation activated atomically
  • Rollback: Automatic fallback if updates fail
  • Rebootless: Most updates don’t require restart

Secrets in Code

What Never Goes in Code

These must never be committed to version control:

  • Database passwords
  • API keys (Stripe, AWS, etc.)
  • JWT signing secrets
  • TLS private keys
  • Encryption keys
  • Credentials for external services

What Can Go in Code

These are safe to commit:

  • Public API endpoints
  • Non-sensitive configuration (timeouts, limits)
  • Default values that get overridden
  • Public keys used to encrypt secrets

Detection

Pre-commit hooks and CI checks scan for:

  • High-entropy strings (potential secrets)
  • Known secret patterns (AWS keys, JWTs)
  • Hardcoded passwords
  • Private keys

Incident Response

Detection

Security events generate logs:

  • Authentication failures: Failed login attempts
  • Authorization failures: Access denied errors
  • Anomalous patterns: Unusual traffic, query patterns
  • System calls: Auditd logs for privileged operations

Logs aggregate in Loki for analysis and alerting.

Response Playbook

If compromise suspected:

  1. Isolate: Remove compromised nodes from load balancer
  2. Preserve: Capture logs and memory dumps before termination
  3. Analyze: Determine scope of compromise
  4. Rotate: Revoke and regenerate all potentially exposed secrets
  5. Reimage: Replace compromised servers with fresh instances
  6. Monitor: Enhanced monitoring for recurrence

Recovery

Recovery is fast because infrastructure is code:

  • Reprovision: New servers from Nix configuration in minutes
  • Data restore: From encrypted backups
  • Secret rotation: Automated via deploy pipeline
  • Verification: Health checks confirm clean state

Compliance Considerations

While not formally certified, DineHub’s design supports:

Data Protection

  • Encryption at rest: Database volumes encrypted
  • Encryption in transit: TLS 1.3 for all external traffic
  • Access logging: Audit trails for data access
  • Right to deletion: Data can be purged per request

Security Standards

Architecture aligns with:

  • OWASP Top 10: Addressed through input validation, authentication, etc.
  • CIS Benchmarks: NixOS configuration follows hardening guidelines
  • NIST Cybersecurity Framework: Identify, protect, detect, respond, recover

Threat Model

Assumed Threats

We design against these threats:

  • External attackers: Attempting to breach perimeter
  • Insider threats: Malicious or compromised employees
  • Supply chain: Compromised dependencies or build tools
  • Cloud provider: Curious cloud administrators
  • Physical theft: Stolen laptops with production access

Not Addressed

Out of scope for this project:

  • Nation-state actors: Advanced persistent threats with unlimited resources
  • Social engineering: Phishing, pretexting (user training issue)
  • Denial of wallet: Resource exhaustion attacks that inflate cloud bills (rely on provider billing limits)

Security Checklist

For new features, verify:

  • Input validation on all user-supplied data
  • Authentication required for sensitive operations
  • Authorization checks enforce ownership/roles
  • No secrets in code (use agenix)
  • Database queries parameterized (no SQL injection)
  • Output encoded (XSS prevention)
  • Rate limiting prevents abuse
  • Logging for security-relevant events
  • Tests include security scenarios

Future Enhancements

  • Web Application Firewall: Rule-based request filtering
  • Bug bounty program: External security researchers
  • Penetration testing: Annual third-party assessment
  • Security headers: CSP, HSTS, X-Frame-Options
  • Certificate pinning: Prevent MITM attacks
  • Behavioral analytics: ML-based anomaly detection

Frontend Architecture

Design philosophy and architectural patterns for the user interface layer

Philosophy

The frontend follows a modern React SPA architecture designed for developer productivity, type safety, and runtime performance. We prioritize declarative UI patterns, compile-time optimizations, and minimal runtime overhead.

Technology Choices

Why Bun?

We chose Bun over Node.js for three primary reasons:

  1. Unified toolchain: Bun replaces the npm/webpack/babel toolchain with a single, fast executable. This reduces configuration complexity and ensures all tools (bundler, test runner, package manager) work together seamlessly.

  2. Performance: Bun’s bundler is significantly faster than webpack or Vite for our use case, reducing development feedback loops.

  3. Built-in TypeScript: No additional compilation step required—TypeScript is first-class.

Why React 19?

React 19 brings several architectural improvements:

  • Concurrent rendering by default: Better perceived performance through prioritized updates
  • Automatic batching: Fewer re-renders without manual optimization
  • Server components: Foundation for future server-side rendering if needed
  • Actions: Simplified form handling and mutations

Why Tailwind CSS v4?

Tailwind v4 represents a significant architectural shift:

  • PostCSS-free: No build-time CSS processing pipeline, reducing build complexity
  • CSS-first configuration: Theme configuration lives in CSS rather than JavaScript
  • Zero-runtime: All styles are generated at build time
  • Predictable bundle size: Only used utilities are included

Application Structure

The frontend organizes code by responsibility rather than by file type:

API Layer

The API layer follows a repository pattern abstraction. Rather than making raw HTTP calls throughout components, we provide domain-specific API objects that encapsulate:

  • Endpoint paths and HTTP methods
  • Request/response type definitions
  • Error handling conventions
  • Authentication header injection

This pattern means components work with semantic methods like restaurantsApi.create(data) rather than raw fetch() calls, making the codebase more maintainable and easier to test.
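A hypothetical sketch of this repository-pattern layer; the names (restaurantsApi, API_BASE, the Restaurant shape) are illustrative, not taken from the actual codebase:

```typescript
type Restaurant = { id: string; name: string };

const API_BASE = "/api/v1";

// Pure helper: build the Authorization header from whatever holds the JWT.
export function authHeaders(token: string | null): Record<string, string> {
  return token ? { Authorization: `Bearer ${token}` } : {};
}

// Shared request wrapper: base URL, JSON headers, auth injection,
// and a single place to turn HTTP errors into thrown Errors.
async function request<T>(
  path: string,
  opts: { method?: string; body?: string } = {},
): Promise<T> {
  const token =
    (globalThis as { localStorage?: { getItem(k: string): string | null } })
      .localStorage?.getItem("token") ?? null;
  const res = await fetch(`${API_BASE}${path}`, {
    method: opts.method ?? "GET",
    body: opts.body,
    headers: { "Content-Type": "application/json", ...authHeaders(token) },
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return (await res.json()) as T;
}

// Components call semantic methods instead of raw fetch():
export const restaurantsApi = {
  list: () => request<Restaurant[]>("/restaurants"),
  create: (data: Omit<Restaurant, "id">) =>
    request<Restaurant>("/restaurants", {
      method: "POST",
      body: JSON.stringify(data),
    }),
};
```

In tests, the wrapper gives one seam to stub: mock `fetch` once and every domain API is covered.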

State Management

We use a hybrid state approach:

  • Server state (data from API): Managed by TanStack Query, which handles caching, background refetching, and optimistic updates automatically
  • Client state (UI-only state): Managed by React’s built-in useState and Context API
  • Authentication state: Global context provider that persists to localStorage

This separation prevents the common anti-pattern of over-fetching or storing server data in global state where it can become stale.

Component Architecture

Components are organized into three tiers:

  1. Page components: Route-level components that compose domain-specific UI
  2. Feature components: Reusable components specific to a domain (e.g., RestaurantCard)
  3. UI primitives: Generic, unstyled components from shadcn/ui (Button, Card, Input)

This three-tier architecture ensures separation of concerns: pages handle routing and data fetching, feature components handle domain logic, and primitives handle accessibility and styling.

Routing Architecture

The routing layer implements route guards for authentication:

  • Public routes: Accessible to all users (landing page, login, signup)
  • Protected routes: Require valid JWT token (dashboard, order placement)
  • Guest-only routes: Redirect authenticated users away (login page when already logged in)

Route guards are implemented as wrapper components that check authentication state and redirect accordingly. This keeps authentication logic centralized and reusable.

Authentication Flow

The authentication system uses JWT tokens stored in localStorage with the following flow:

  1. User submits credentials via login form
  2. Backend validates and returns JWT + user metadata
  3. Token is stored in localStorage and Context state
  4. Subsequent API calls include token in Authorization header
  5. Protected routes check for token presence before rendering

The token has a 24-hour expiration. On app load, the Context provider checks for an existing token and restores the authentication state, providing a seamless user experience.

Build System Design

The build system is designed around Bun’s native bundler with a custom wrapper script that:

  • Discovers entry points automatically (HTML files in src/)
  • Applies Tailwind CSS transformation
  • Generates linked sourcemaps for debugging
  • Copies static assets to the output directory
  • Reports bundle sizes for optimization visibility

The key architectural decision here is convention over configuration: the build script automatically finds entry points rather than requiring a configuration file, making the build process easier to understand and modify.

Styling Philosophy

Our styling approach follows utility-first CSS with semantic theming:

Utility-First

Instead of writing custom CSS classes, we compose utility classes directly in the JSX. This approach:

  • Eliminates the need to name CSS classes
  • Makes styling changes explicit in version control
  • Prevents unused CSS from accumulating
  • Enables rapid prototyping

Dark Mode

Dark mode is implemented via CSS custom properties and Tailwind’s dark: variants. The theme toggle adds/removes a dark class on the document root, which triggers CSS custom property updates throughout the component tree.

Component Styling

UI primitives (from shadcn/ui) are built on Radix UI for accessibility and Tailwind for styling. They accept a className prop for composition, allowing parent components to override or extend styles without modifying the primitive.

Data Fetching Patterns

Data fetching follows a stale-while-revalidate pattern:

  1. Component mounts and requests data
  2. TanStack Query checks the cache first
  3. If cached data exists (even if stale), it’s shown immediately
  4. Background request fetches fresh data
  5. UI updates with fresh data when available

This pattern provides instant UI feedback while ensuring data freshness, eliminating loading spinners for cached data.

Animation Strategy

Animations are implemented with Framer Motion for:

  • Page transitions: Smooth fade/slide between routes
  • Micro-interactions: Button hover states, loading indicators
  • Layout animations: List reordering, expanding panels

We avoid CSS animations for complex sequences and JavaScript animations for simple hover states—Framer Motion provides the right abstraction for component-level animations while deferring to CSS for simple transitions.

Error Handling

Error handling follows a progressive enhancement model:

  1. API layer: Catches HTTP errors and throws typed Error objects
  2. TanStack Query: Catches errors and provides error state to components
  3. Components: Display error UI or retry controls
  4. Global boundary: Unhandled errors caught by error boundary showing fallback UI

This layered approach ensures errors are handled at the appropriate level of abstraction.

Development Experience

The frontend architecture prioritizes developer experience through:

  • Hot reloading: Bun’s dev server provides instant updates
  • Type safety: Full TypeScript coverage with strict mode
  • IDE integration: Tailwind IntelliSense provides autocomplete for utility classes
  • Consistent formatting: treefmt enforces formatting across the codebase

Backend Architecture

Design philosophy and architectural patterns for the service layer

Philosophy

The backend follows domain-driven design principles with a focus on type safety, explicit contracts, and defensive programming. We prioritize compile-time safety over runtime flexibility, using Java’s type system to prevent errors before they reach production.

Technology Choices

Why Spring Boot?

Spring Boot provides an opinionated framework that balances productivity with flexibility:

  1. Ecosystem maturity: Comprehensive libraries for security, data access, and testing
  2. Production-ready: Built-in metrics, health checks, and configuration management
  3. Community standards: Wide adoption means extensive documentation and tooling

Why GraalVM Native Image?

We compile the backend to a native executable rather than running on the JVM:

  • Startup time: Sub-second startup vs. JVM warmup time, critical for auto-scaling scenarios
  • Memory efficiency: Smaller memory footprint enables running on smaller instances
  • Self-contained: Single executable file with no runtime dependencies
  • Container-friendly: Smaller Docker images and faster cold starts

The trade-off is longer build times and some reflection limitations, which we mitigate with explicit configuration.

Why PostgreSQL with Citus?

Our database choice reflects the multi-region requirements:

  • PostgreSQL: Proven reliability, ACID compliance, and extensive feature set
  • Citus extension: Enables horizontal scaling by distributing data across multiple nodes
  • Compatibility: Standard PostgreSQL protocol works with all existing tools

Domain Architecture

The backend organizes code around business domains rather than technical layers:

Domain Structure

Each domain (User, Restaurant, Order) contains:

  • Entity: JPA-mapped data model
  • Repository: Data access abstraction
  • Controller: HTTP request handling
  • DTOs: Data transfer objects for API contracts

This structure keeps related code together, making it easier to understand domain boundaries and modify functionality.

Entity Relationships

The domain model centers around three main entities:

  1. User: Authentication and identity
  2. Restaurant: Business listings with ownership
  3. Order: Transactions linking users to restaurants

The relationships are:

  • User owns Restaurants (one-to-many)
  • User places Orders (one-to-many)
  • Restaurant receives Orders (one-to-many)
  • Order contains OrderItems (embedded collection)

Value Objects vs Entities

We distinguish between entities (have identity and lifecycle) and value objects (defined by attributes):

  • Entities: User, Restaurant, Order (have UUID identity)
  • Value Objects: OrderItem (no identity, belongs to Order)

This distinction guides persistence decisions—entities get their own tables, value objects are embedded.
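
The distinction shows up directly in equality semantics. A minimal plain-Java sketch (hypothetical fields, not the project's actual JPA-mapped classes): entities compare by identity, value objects by attributes.

```java
import java.util.UUID;

// Entity: equality is identity, carried by a stable UUID.
final class Order {
    final UUID id = UUID.randomUUID();
    String status = "PENDING";

    @Override public boolean equals(Object o) {
        return o instanceof Order other && id.equals(other.id);
    }
    @Override public int hashCode() { return id.hashCode(); }
}

// Value object: equality is attribute-by-attribute; records generate it.
record OrderItem(String menuItem, int quantity, long priceCents) {}

public class IdentityDemo {
    public static void main(String[] args) {
        // Two orders with identical attributes are still different orders...
        System.out.println(new Order().equals(new Order()));   // false
        // ...but two identical line items are the same value.
        System.out.println(new OrderItem("Pad Thai", 2, 1250)
                .equals(new OrderItem("Pad Thai", 2, 1250))); // true
    }
}
```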

Security Architecture

Authentication Model

The system uses JWT (JSON Web Tokens) for stateless authentication:

  • Tokens are signed with a server-side secret
  • Tokens contain user identity and expiration
  • Clients send tokens in Authorization header
  • Server validates signature without database lookup

This stateless approach scales horizontally without session affinity requirements.
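
To illustrate why validation needs no database lookup, here is a toy HS256 check built only on the JDK's Mac API. This is a sketch of the principle, not the project's code: a real deployment uses a maintained JWT library and also checks the expiration claim.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class JwtSketch {
    // Verifying an HS256 token needs only the server-side secret:
    // recompute HMAC-SHA256 over "header.payload" and compare signatures.
    static boolean verify(String token, byte[] secret) throws Exception {
        String[] parts = token.split("\\.");
        if (parts.length != 3) return false;
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] expected = mac.doFinal(
                (parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII));
        byte[] actual = Base64.getUrlDecoder().decode(parts[2]);
        return MessageDigest.isEqual(expected, actual); // constant-time compare
    }

    static String sign(String header, String payload, byte[] secret) throws Exception {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String body = enc.encodeToString(header.getBytes(StandardCharsets.UTF_8))
                + "." + enc.encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        return body + "." + enc.encodeToString(
                mac.doFinal(body.getBytes(StandardCharsets.US_ASCII)));
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "demo-secret".getBytes(StandardCharsets.UTF_8);
        String token = sign("{\"alg\":\"HS256\"}", "{\"sub\":\"user-42\"}", secret);
        System.out.println(verify(token, secret)); // true

        // Forge the payload without re-signing: the signature no longer matches.
        String[] p = token.split("\\.");
        String forgedPayload = Base64.getUrlEncoder().withoutPadding()
                .encodeToString("{\"sub\":\"admin\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(p[0] + "." + forgedPayload + "." + p[2], secret)); // false
    }
}
```

Because the check is pure computation over the token and the shared secret, any replica can validate any request.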

Authorization Patterns

Authorization follows role-based access control (RBAC):

  • ROLE_USER: Standard customer permissions
  • ROLE_RESTAURANT_OWNER: Can manage owned restaurants and view their orders
  • ROLE_ADMIN: Full system access

Roles are checked at the controller method level using Spring Security’s method security annotations.

Security Boundaries

The security model defines clear boundaries:

  • Public endpoints: No authentication required (login, registration, restaurant listing)
  • Authenticated endpoints: Valid JWT required (order placement)
  • Ownership endpoints: JWT + resource ownership check (order modification)
  • Admin endpoints: JWT + ROLE_ADMIN required (system management)

API Design Principles

RESTful Conventions

The API follows REST conventions with some pragmatic exceptions:

  • Resources map to domain entities (/restaurants, /orders)
  • HTTP verbs indicate action (GET, POST, PUT, DELETE)
  • Status codes convey outcome (200, 201, 400, 401, 403, 404)
  • Plural nouns for collections (/restaurants not /restaurant)

Request/Response Contracts

API contracts are defined through DTOs (Data Transfer Objects):

  • Input DTOs: Define valid request shape and validation rules
  • Output DTOs: Control what data is exposed
  • Validation annotations: Jakarta Bean Validation for input sanitization

This DTO pattern decouples the internal domain model from the public API, allowing independent evolution.
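
The split can be illustrated with hypothetical records (not the project's actual DTOs): the output DTO simply omits fields that must never be exposed.

```java
import java.util.UUID;

// Internal entity: contains fields that must never leave the server.
record User(UUID id, String email, String passwordHash, String role) {}

// Output DTO: the API contract exposes only what clients need.
record UserResponse(UUID id, String email, String role) {
    static UserResponse from(User u) {
        return new UserResponse(u.id(), u.email(), u.role());
    }
}

public class DtoDemo {
    public static void main(String[] args) {
        User u = new User(UUID.randomUUID(), "a@example.com", "bcrypt-hash", "ROLE_USER");
        UserResponse out = UserResponse.from(u);
        // The serialized response has no passwordHash field at all.
        System.out.println(out.email()); // a@example.com
    }
}
```

Renaming an entity column now only requires updating the mapping in `from`, not every API consumer.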

Error Handling

Error responses follow a consistent structure:

  • HTTP status code indicates error category
  • Response body contains human-readable message
  • Validation errors include field-level details

The GlobalExceptionHandler translates exceptions to appropriate HTTP responses, ensuring clients receive consistent error formats.
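
A minimal sketch of such a translation, with hypothetical exception-to-status rules (the real mapping lives in the project's GlobalExceptionHandler as a Spring @RestControllerAdvice):

```java
import java.util.NoSuchElementException;

// Hypothetical error shape: status category plus human-readable message.
record ErrorResponse(int status, String message) {}

public class ErrorMapping {
    // Translate exception categories to HTTP status codes, one rule per category.
    static ErrorResponse toResponse(Exception e) {
        if (e instanceof IllegalArgumentException) return new ErrorResponse(400, e.getMessage());
        if (e instanceof SecurityException)        return new ErrorResponse(403, "Forbidden");
        if (e instanceof NoSuchElementException)   return new ErrorResponse(404, "Not found");
        return new ErrorResponse(500, "Internal server error"); // never leak internals
    }

    public static void main(String[] args) {
        System.out.println(toResponse(new IllegalArgumentException("quantity must be positive")));
        System.out.println(toResponse(new NoSuchElementException()));
    }
}
```

Centralizing the mapping means controllers can throw domain exceptions freely while clients always see the same error shape.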

Data Access Patterns

Repository Abstraction

Data access follows the Repository pattern through Spring Data JPA:

  • Interfaces extend JpaRepository for CRUD operations
  • Method names derive queries automatically (findByIsActiveTrue)
  • Custom queries use @Query annotation for complex SQL
  • Pagination returns Spring’s Page abstraction

This abstraction means controllers work with domain objects rather than SQL, making the code more testable and database-agnostic.
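
The testability claim can be made concrete with a plain-Java sketch of the pattern (hypothetical interface, not Spring Data's actual generated proxy): tests swap in an in-memory implementation behind the same interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

record Restaurant(UUID id, String name, boolean isActive) {}

// The shape Spring Data provides: callers see domain objects, never SQL.
interface RestaurantRepository {
    Optional<Restaurant> findById(UUID id);
    List<Restaurant> findByIsActiveTrue(); // query derived from the method name
    Restaurant save(Restaurant r);
}

// In-memory stand-in for tests; production uses the JPA-backed implementation.
class InMemoryRestaurantRepository implements RestaurantRepository {
    private final Map<UUID, Restaurant> rows = new HashMap<>();
    public Optional<Restaurant> findById(UUID id) { return Optional.ofNullable(rows.get(id)); }
    public List<Restaurant> findByIsActiveTrue() {
        return rows.values().stream().filter(Restaurant::isActive).toList();
    }
    public Restaurant save(Restaurant r) { rows.put(r.id(), r); return r; }
}

public class RepoDemo {
    public static void main(String[] args) {
        RestaurantRepository repo = new InMemoryRestaurantRepository();
        repo.save(new Restaurant(UUID.randomUUID(), "Noodle Bar", true));
        repo.save(new Restaurant(UUID.randomUUID(), "Closed Cafe", false));
        System.out.println(repo.findByIsActiveTrue().size()); // 1
    }
}
```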

Transaction Boundaries

Transactions are managed at the service layer:

  • Spring’s @Transactional annotation marks business operations
  • Read operations use read-only transactions for optimization
  • Write operations ensure atomicity across multiple database calls

Validation Strategy

Input validation occurs at multiple layers:

  1. DTO annotations: Jakarta Bean Validation (@NotNull, @Email, etc.)
  2. Controller: @Valid annotation triggers validation
  3. Service layer: Business rule validation
  4. Database constraints: Final integrity enforcement

This defense-in-depth approach catches errors as early as possible.

Order Lifecycle Design

State Machine

Orders follow a defined state machine with six states:

  • PENDING: Initial state when order is created
  • CONFIRMED: Restaurant has acknowledged the order
  • PREPARING: Food is being prepared
  • READY: Order is ready for pickup/delivery
  • DELIVERED: Order has been delivered to customer
  • CANCELLED: Order was cancelled

State Transition Rules

Not all transitions are valid:

  • PENDING can transition to CONFIRMED, PREPARING, or CANCELLED
  • CONFIRMED can transition to PREPARING
  • PREPARING can transition to READY
  • READY can transition to DELIVERED
  • CANCELLED is terminal

These rules are enforced in the service layer, preventing invalid state changes.
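
The transition table above can be captured directly in code. A minimal sketch (not the project's actual service-layer implementation): an enum that knows its own legal successors, so an invalid change is rejected before it reaches the database.

```java
import java.util.EnumSet;
import java.util.Set;

// Transition table exactly as described above; states with no successors are terminal.
enum OrderStatus {
    PENDING, CONFIRMED, PREPARING, READY, DELIVERED, CANCELLED;

    private Set<OrderStatus> allowedNext() {
        return switch (this) {
            case PENDING   -> EnumSet.of(CONFIRMED, PREPARING, CANCELLED);
            case CONFIRMED -> EnumSet.of(PREPARING);
            case PREPARING -> EnumSet.of(READY);
            case READY     -> EnumSet.of(DELIVERED);
            case DELIVERED, CANCELLED -> EnumSet.noneOf(OrderStatus.class);
        };
    }

    public boolean canTransitionTo(OrderStatus next) {
        return allowedNext().contains(next);
    }
}

public class StateMachineDemo {
    public static void main(String[] args) {
        System.out.println(OrderStatus.PENDING.canTransitionTo(OrderStatus.CANCELLED)); // true
        System.out.println(OrderStatus.READY.canTransitionTo(OrderStatus.CANCELLED));   // false
        System.out.println(OrderStatus.CANCELLED.canTransitionTo(OrderStatus.PENDING)); // false
    }
}
```

Keeping the table in one place means adding a new state is a single localized change, and the compiler's exhaustive switch flags any state left unhandled.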

Permission by State

Different states have different permissions:

  • PENDING orders: Customer can cancel or modify
  • Non-PENDING orders: Only restaurant owner or admin can modify
  • Status changes: Only the restaurant side can advance the status

This reflects real-world business rules where customers have limited control after order confirmation.

Testing Strategy

Test Pyramid

Testing follows the pyramid model:

  • Unit tests: Fast, isolated, test individual functions
  • Integration tests: Test database interactions and API contracts
  • End-to-end tests: Full request/response cycles (property-based with jqwik)

Test Slices

Spring Boot’s test slices allow targeted testing:

  • @WebMvcTest: Test controllers in isolation
  • @DataJpaTest: Test repositories with in-memory database
  • @SpringBootTest: Full integration tests

Test Data

Tests use dedicated test data rather than production data:

  • Factory methods create valid entities
  • Builders allow flexible test data construction
  • Each test starts with clean database state

Documentation Requirements

Javadoc Standards

All public APIs require Javadoc:

  • Class-level description explains purpose
  • Method-level documentation describes behavior
  • Parameter and return value documented
  • Exceptions and preconditions noted

The build enforces this via javac's -Xdoclint:missing flag, failing the build when documentation is missing.

OpenAPI Generation

The API documentation is generated from:

  • SpringDoc annotations on controllers
  • DTO schemas from class definitions
  • Security scheme definitions

This ensures API docs stay synchronized with implementation.

Contract Testing with Schemathesis

Beyond traditional unit and integration tests, we use Schemathesis for property-based API testing. While unit tests verify specific inputs produce expected outputs, contract testing verifies the API adheres to its OpenAPI specification under all circumstances.

What is Schemathesis?

Schemathesis reads the OpenAPI specification and automatically generates thousands of test cases:

  • Valid inputs: Ensures documented behavior matches implementation
  • Edge cases: Boundary values, maximum lengths, special characters
  • Invalid inputs: Malformed JSON, wrong types, missing fields
  • Security cases: SQL injection attempts, XSS payloads

This catches bugs that manual test writing might miss—developers tend to test “happy paths” while Schemathesis explores the entire input space.

Testing Philosophy

Schemathesis operates on a simple principle: if the API claims to accept certain inputs in its OpenAPI spec, it must handle them gracefully. This creates a contract between API provider and consumers:

  • For providers: Any change that breaks Schemathesis tests is a breaking change
  • For consumers: Can rely on documented behavior being accurate
  • For both: Reduces integration surprises

Integration in CI

Schemathesis runs automatically during nix flake check:

  1. Build the backend and generate OpenAPI spec
  2. Start the backend in a test VM
  3. Run Schemathesis against the running API
  4. Fail the build if any tests fail

This ensures the OpenAPI specification remains accurate and the implementation handles edge cases correctly.

Configuration

Schemathesis is configured to:

  • Generate ASCII-only test data (avoiding HTTP header encoding issues)
  • Exclude certain endpoints that require external services (Google OAuth)
  • Skip stateful operations that would invalidate subsequent tests (logout)
  • Use automatic parallelism based on CPU cores

The configuration lives in schemathesis.toml at the project root.

Deployment Architecture

Native Binary

The application compiles to a native binary that:

  • Contains the Spring Boot application + embedded Tomcat
  • Includes all dependencies statically linked
  • Runs without JVM installation
  • Starts in milliseconds

Service Configuration

The binary runs as a systemd service:

  • Automatic restart on failure
  • Environment variables for configuration
  • Health check endpoint for load balancers
  • Graceful shutdown handling

Database Migrations

Schema changes are managed through:

  • Flyway migrations in version control
  • Automatic execution on startup
  • Rollback scripts for recovery
  • Compatibility with distributed Citus schema

Nix Build System

Philosophy and architectural patterns for reproducible builds and declarative infrastructure

Philosophy

Nix is not just a package manager—it’s a fundamentally different approach to software construction. We treat the entire system as a pure function: given the same inputs (source code + dependencies), we always produce the same outputs (binaries + configurations).

Core Concepts

What is Reproducibility?

Traditional build systems produce different outputs based on:

  • System libraries installed on the build machine
  • Environment variables and PATH
  • Network state during dependency resolution
  • Implicit dependencies not declared in the build file

Nix eliminates these variables by:

  • Isolating builds in clean environments with only declared dependencies
  • Locking all inputs including transitive dependencies and their hashes
  • Content-addressable storage where outputs are named by their content hash
  • No global state—each build starts from a pristine environment

The Flake Paradigm

A flake is a self-contained, versioned package description:

  • Declarative: Build instructions written in Nix expression language
  • Reproducible: flake.lock pins every dependency to exact versions
  • Composable: Other flakes can depend on your flake
  • Hermetic: No access to the outside world during builds

This means a build that succeeds on one developer’s machine will succeed identically on CI and production.

Architecture Layers

The Nix architecture separates concerns into four layers:

Layer 1: Package Definitions

Purpose: Describe how to build software from source

Packages define:

  • Source location (Git repository, local path, etc.)
  • Build dependencies (compilers, libraries, tools)
  • Build script (configure, make, install equivalents)
  • Runtime dependencies (libraries needed at runtime)

Key insight: Packages are values in a functional language. They don’t execute—they describe what would be built.

Layer 2: Development Environment

Purpose: Provide a shell with all tools needed for development

The devShell provides:

  • Exact versions of compilers and build tools
  • Project-specific utilities (formatters, linters)
  • Environment variables and shell hooks
  • Isolation from host system packages

When you run nix develop, you enter a subshell where java, bun, and other tools are exactly as specified—regardless of what’s installed on your laptop.

Layer 3: Process Composition

Purpose: Orchestrate multi-service local development

Process-compose replaces Docker Compose for local development:

  • Declares which processes to run (backend, frontend, database)
  • Manages dependencies between services
  • Provides unified logging and monitoring
  • Restart policies for failed processes

Unlike Docker, processes run natively on the host—no virtualization overhead, faster startup, and easier debugging.

Layer 4: System Configuration

Purpose: Define entire NixOS machines

NixOS modules describe:

  • Operating system configuration (users, networking, services)
  • Service definitions with systemd units
  • Security hardening and firewall rules
  • Secrets management integration

These configurations are deployed to create reproducible infrastructure—prod server #1 is identical to prod server #2 because both are built from the same expression.

Dependency Management

Lock Files

Nix flakes generate flake.lock files that pin:

  • Direct flake inputs (nixpkgs version)
  • Transitive dependencies (libraries your dependencies use)
  • Git revisions and content hashes

This means even if nixpkgs updates a library, your build continues using the pinned version until you explicitly update the lock file.

Supply Chain Security

Nix provides multiple layers of supply chain protection:

  1. Source verification: Dependencies are fetched by content hash, not just URL
  2. Reproducible builds: Same source always produces same output
  3. Binary caches: Signed pre-built binaries reduce compilation time
  4. Sandboxing: Builds cannot access the network or modify files outside their directory

If a dependency’s content doesn’t match the expected hash, the build fails rather than accepting a potentially compromised package.

Build Isolation

The Sandbox

Nix builds run in isolated environments that:

  • Have no network access
  • See only explicitly declared dependencies
  • Start with an empty filesystem (except the source)
  • Cannot write outside their output directory

This isolation catches missing dependencies that would work on your laptop (where you have tools installed) but fail in CI.

Pure Functions

Builds are pure functions—they depend only on their inputs:

buildPackage(source, dependencies, buildScript) => output

The same inputs always produce the same output, enabling:

  • Caching: If inputs haven’t changed, reuse previous output
  • Sharing: Multiple users can share the same built package
  • Verification: Rebuild and verify outputs match expectations
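
The caching behavior can be sketched as a toy content-addressed store keyed by a hash of all inputs (illustrative only; real Nix hashes the full derivation, including transitive dependencies, and runs the build in a sandbox):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

public class BuildCache {
    static final Map<String, String> store = new HashMap<>(); // content-addressed "store"
    static int builds = 0;

    // Key the output by a hash of every input, as Nix keys store paths.
    static String inputHash(String source, String deps, String script) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update((source + "\0" + deps + "\0" + script).getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(md.digest());
    }

    static String buildPackage(String source, String deps, String script) throws Exception {
        String key = inputHash(source, deps, script);
        if (!store.containsKey(key)) {   // cache hit skips the build entirely
            builds++;
            store.put(key, "output-of(" + key.substring(0, 8) + ")");
        }
        return store.get(key);
    }

    public static void main(String[] args) throws Exception {
        String a = buildPackage("src-v1", "gcc-13", "make install");
        String b = buildPackage("src-v1", "gcc-13", "make install"); // same inputs
        buildPackage("src-v2", "gcc-13", "make install");            // changed input
        System.out.println(a.equals(b)); // true: reused, not rebuilt
        System.out.println(builds);      // 2: only distinct inputs were built
    }
}
```

Because the key covers every input, a shared binary cache is safe: any machine that computes the same key can trust the cached output.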

Development Workflow

Entering the Environment

When you run nix develop, Nix:

  1. Evaluates the devShell expression
  2. Builds any missing tools
  3. Sets up environment variables
  4. Spawns a new shell with modified PATH
  5. Runs shell hooks (e.g., setting FLAKE_ROOT)

The resulting shell has exactly the tools needed—no more, no less.

Incremental Builds

During development, Nix provides:

  • Incremental compilation: Only changed files rebuild
  • Development shells: Different shells for different tasks
  • Direnv integration: Automatically enter devShell when entering project directory

Testing Changes

The nix flake check command runs the full CI pipeline locally:

  1. Builds all packages (backend, frontend, docs)
  2. Runs unit and integration tests
  3. Checks formatting compliance
  4. Validates NixOS configurations
  5. Runs VM-based integration tests

This means “works on my machine” is actually meaningful—the exact same checks run locally and in CI.

Deployment Architecture

NixOS Systems

NixOS is a Linux distribution where everything is configured through Nix expressions:

  • System packages: Installed via Nix, not apt/yum
  • System services: Defined as systemd units in Nix
  • Configuration files: Generated by Nix templates
  • Users and groups: Declared in Nix, not useradd

A NixOS machine is built by evaluating a Nix expression that returns a complete system configuration.

Deploy-rs

Deploy-rs is the deployment tool that:

  1. Builds the system configuration locally
  2. Copies closure (package + dependencies) to remote machine
  3. Activates the new system configuration
  4. Waits for confirmation that the new configuration is healthy
  5. Rolls back automatically if activation fails or confirmation never arrives

This means failed deployments are atomic—the system either fully activates or reverts to the previous state.

Secrets Management

Secrets are managed separately from configuration:

  • Encrypted at rest: Secrets stored encrypted in Git
  • Decrypted at activation: Age/ragenix decrypts on target machine
  • Available as files: Services read secrets from filesystem
  • Never in Nix store: Unencrypted secrets never touch the world-readable store

This separation means configuration can be public while secrets remain encrypted.

Networking and Infrastructure

Tailscale Mesh

The infrastructure uses Tailscale for private networking:

  • Mesh topology: Every node connects to every other node directly
  • WireGuard encryption: All traffic encrypted with modern crypto
  • Headscale control: Self-hosted coordination server
  • MagicDNS: Private DNS resolution for internal services

This architecture means services communicate over encrypted tunnels without public IPs or complex firewall rules.

Service Discovery

Services find each other through:

  • DNS names: Headscale provides internal DNS
  • Static IPs: Tailscale assigns stable IPs in the 100.x.x.x range
  • NixOS module coordination: Services configured to know about each other

No load balancers or service meshes required—just direct encrypted connections.

CI/CD Integration

The Check Pipeline

nix flake check is the universal CI command:

  1. Build verification: All packages compile successfully
  2. Test execution: Unit, integration, and property-based tests
  3. Formatting validation: All code follows project standards
  4. Linting: Static analysis catches potential issues
  5. VM tests: Full system integration tests in VMs

Caching Strategy

Nix provides multiple caching layers:

  • Local store: Already-built packages on your machine
  • Binary cache: Shared cache (Garnix, Cachix) for common packages
  • Build cache: CI artifacts reused between builds

This means builds are incremental—you only rebuild what changed, not the world.

Troubleshooting and Debugging

Build Failures

When builds fail, Nix provides:

  • Complete build logs with all commands executed
  • Environment variable dumps
  • Option to keep failed build directory for inspection
  • --show-trace for detailed evaluation traces

Development Mode

For debugging build issues:

  • nix develop enters the build environment
  • genericBuild runs the build phases interactively
  • Failed phases can be re-run with modifications

Why Reproducibility Matters

Reproducibility isn’t just a nice property—it enables:

  • Bisecting: Git bisect works because old commits still build
  • Security auditing: Rebuild and verify package contents
  • Disaster recovery: Infrastructure rebuilt from Git in minutes
  • Team consistency: Everyone uses exact same tools
  • CI confidence: Local build success predicts CI success

When to Use Nix

Nix excels when you need:

  • Reproducible builds across environments
  • Declarative configuration that can be versioned
  • Hermetic builds that don’t depend on system state
  • Atomic upgrades with rollback capability
  • Cross-language projects with unified tooling

Nix adds complexity when:

  • The project is simple with few dependencies
  • The team is unfamiliar with functional programming
  • Rapid iteration matters more than reproducibility
  • You need tight integration with non-Nix build systems

For this project, the complexity is justified by the multi-language nature (Java + TypeScript + Nix) and the production deployment requirements.

OpenAPI Docs

Placeholder for the OpenAPI docs; this section is generated by the Nix build.

Frontend Docs

Placeholder for the frontend docs; this section is generated by the Nix build.

Backend Docs

Placeholder for the backend docs; this section is generated by the Nix build.

Testing Strategy

The project uses a layered testing approach:

  • Unit Tests — JUnit 5 for backend, Bun test runner for frontend
  • Integration Tests — Spring Boot Test with Testcontainers
  • Property-Based Tests — jqwik for generative testing, Schemathesis for API contract testing
  • VM Tests — Full system integration in NixOS virtual machines