DineHub Documentation
Welcome to DineHub — a resilient, multi-region cloud restaurant ordering system designed for scale.
What is DineHub?
DineHub is a distributed restaurant ordering platform that connects customers with restaurants across multiple geographic regions. It’s designed from the ground up for high availability, security, and horizontal scalability.
Why This Architecture?
Modern cloud applications face three fundamental challenges:
| Challenge | Traditional Approach | Our Approach |
|---|---|---|
| Availability | Single points of failure | Multi-region with automatic failover |
| Security | Perimeter-based firewalls | Zero-trust mesh with encryption everywhere |
| Scalability | Vertical scaling (bigger servers) | Horizontal scaling (more servers) |
DineHub demonstrates how to build a production-ready system that addresses these challenges through deliberate architectural decisions.
System Overview
At its core, DineHub consists of three layers:
┌─────────────────────────────────────────────────────────────┐
│ USER INTERFACE │
│ React + Bun + Tailwind │
│ Fast, type-safe, with real-time updates │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ SERVICE LAYER │
│ Spring Boot + GraalVM Native Image │
│ Stateless, horizontally scalable, sub-second startup │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ Citus (Distributed PostgreSQL) │
│ Data sharded across regions, automatic query routing │
└─────────────────────────────────────────────────────────────┘
Key Features
For Customers
- Browse restaurants across multiple regions
- Place orders with real-time status tracking
- Secure authentication via JWT or Google OAuth
- Responsive design that works on mobile and desktop
For Restaurant Owners
- Manage restaurant listings and menus
- View and process incoming orders
- Track order lifecycle from pending to delivered
- Role-based access control for staff
For Operators
- Deploy to multiple regions with a single command
- Monitor system health via built-in observability
- Scale horizontally by adding nodes
- Zero-downtime deployments with automatic rollback
Architecture Highlights
Multi-Region Deployment
Unlike traditional applications deployed to a single data center, DineHub runs across multiple AWS regions, for example:
- US East (Virginia) — Primary region for North America
- EU West (Ireland) — Primary region for Europe
- Additional regions can be added as needed
Each region contains a complete stack: ingress, backend, and database workers. If one region fails, traffic automatically routes to healthy regions.
Zero-Trust Networking
We don’t trust the network—even our own. All internal communication happens over encrypted tunnels:
- Tailscale mesh: WireGuard-encrypted connections between all nodes
- Headscale: Self-hosted coordination (no dependency on Tailscale SaaS)
- No public IPs: Only the ingress node is exposed to the internet
- Mutual authentication: Every connection is authenticated at both ends
Distributed Database
Traditional databases become bottlenecks under load. We use Citus to distribute PostgreSQL horizontally:
- Coordinator node: Routes queries to appropriate workers
- Worker nodes: Store data shards distributed by restaurant_id
- Automatic sharding: Data automatically distributed as restaurants grow
- Query parallelization: Complex queries execute across multiple workers
Immutable Infrastructure
We treat infrastructure as code—literally. Our Nix configuration:
- Version controlled: All changes tracked in Git
- Reproducible: Same configuration always produces same system
- Atomic: Deployments succeed or roll back completely
- Testable: Infrastructure tested in VMs before production
Technology Choices
Frontend: Bun + React + Tailwind
- Bun: Fast all-in-one JavaScript runtime (10x faster than Node for bundling)
- React 19: Concurrent rendering and automatic batching
- Tailwind v4: PostCSS-free, CSS-first styling with zero runtime
- TanStack Query: Automatic caching and background refetching
Why not Node? Bun provides a unified toolchain without webpack configuration hell.
Backend: Spring Boot + GraalVM
- Spring Boot 4: Mature ecosystem with production-ready defaults
- GraalVM Native Image: Compiles to native binary for fast startup and low memory
- PostgreSQL + Citus: Proven relational database with horizontal scaling
- JWT Authentication: Stateless tokens for horizontal scalability
Why native compilation? Cold starts matter when auto-scaling. A native binary starts in milliseconds, not seconds.
Infrastructure: Nix + NixOS
- Nix Flakes: Reproducible builds with locked dependencies
- NixOS: Declarative Linux distribution configured entirely via Nix
- deploy-rs: Atomic deployments with automatic rollback
- Tailscale: Self-hosted mesh networking without VPN complexity
Why Nix? Traditional configuration management drifts over time. Nix guarantees that what we build today can be rebuilt identically in five years.
API Design: OpenAPI + Schemathesis
- OpenAPI Specification: Single source of truth for API contracts in specs/openapi.yaml
- Schemathesis: Property-based testing that validates implementation matches specification
- Redocly: Documentation generation and spec linting
- Contract Testing: API consumers can rely on documented behavior being accurate
This specification-first approach means the API documentation is never out of date—it’s automatically validated against the implementation on every build.
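As a toy illustration of what contract validation means, the sketch below checks a response body against a schema fragment. The schema excerpt and field names here are hypothetical, not taken from specs/openapi.yaml; Schemathesis automates this far more thoroughly, generating many inputs per endpoint directly from the spec.

```python
# Hypothetical excerpt of an order schema -- not the real DineHub spec.
order_schema = {
    "required": ["id", "restaurantId", "status"],
    "properties": {
        "id": {"type": "integer"},
        "restaurantId": {"type": "integer"},
        "status": {"type": "string"},
    },
}

TYPES = {"integer": int, "string": str, "boolean": bool}

def conforms(body: dict, schema: dict) -> list[str]:
    """Return a list of contract violations (empty means conformant)."""
    errors = [f"missing required field: {f}" for f in schema["required"] if f not in body]
    for field, rules in schema["properties"].items():
        if field in body and not isinstance(body[field], TYPES[rules["type"]]):
            errors.append(f"{field}: expected {rules['type']}")
    return errors

print(conforms({"id": 7, "restaurantId": 42, "status": "PENDING"}, order_schema))  # []
print(conforms({"id": "7", "status": "PENDING"}, order_schema))
```

A failing check at build time means either the implementation or the spec is wrong; either way, the mismatch surfaces before consumers see it.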
Getting Started
Prerequisites
You’ll need Nix installed (the Determinate Systems installer is recommended):
curl -fsSL https://install.determinate.systems/nix | sh -s -- install --determinate
Quick Start
- Enter the development environment (installs all tools automatically): nix develop
- Start the local development stack (backend + frontend + database): nix run .#compose
- View the documentation (what you’re reading now): nix run .#docs.serve
- Run the full test suite: nix flake check -L
Project Structure
├── frontend/ # Bun + React SPA
├── backend/ # Spring Boot service
├── nix/ # Nix configuration
├── docs/ # This documentation
├── flake.nix # Nix entry point
└── README.md # Quick reference
Documentation Guide
This documentation is organized into sections:
System Architecture
- Infrastructure — How we deploy and operate across regions
- Component Architecture — Design patterns for frontend, backend, and build system
Component Guides
- Frontend — UI layer design and React patterns
- Backend — Service layer architecture and domain model
- Nix Build System — Reproducible builds and declarative infrastructure
API Reference
- OpenAPI Documentation — Interactive API reference
Design Principles
Throughout this system, we follow these principles:
- Type Safety First: TypeScript and Java with strict compilation catch errors at build time
- Security by Default: Encryption everywhere, least-privilege access, no secrets in code
- Horizontal Scalability: Design for adding nodes, not bigger nodes
- Reproducibility: Builds and deployments must be repeatable and version-controlled
- Observability: Every component exposes metrics and health checks
- Developer Experience: Complex infrastructure, simple development workflow
Contributing
This is a university software engineering project. To contribute:
- Enter the dev shell: nix develop
- Create a branch for your changes
- Run tests before committing: nix flake check -L
- Format code: nix fmt
- Submit a merge request with a clear description
Resources
- Source Code: GitLab Repository
- Documentation: Live Docs
- Issue Tracker: GitLab Issues
- CI/CD: Garnix (via nix flake check)
DineHub was built by Trinity College Dublin Software Engineering Group 26 as a software engineering project demonstrating modern cloud architecture patterns.
Infrastructure
Designing for resilience, security, and scale.
This section documents how we deploy and operate the DineHub restaurant ordering system across multiple cloud regions.
At a Glance
| Aspect | Our Solution |
|---|---|
| Compute | AWS EC2 with NixOS |
| Networking | Tailscale mesh (self-hosted via Headscale) |
| Database | Citus (distributed PostgreSQL) |
| Ingress | nginx reverse proxy |
| Deployment | NixOS modules + deploy-rs |
Documentation
- System Architecture — Complete system design with diagrams and component overview
- Deployment Process — How we deploy, rollback, and manage infrastructure changes
- Networking — Zero-trust mesh networking with Tailscale
- Database — Distributed PostgreSQL with Citus for horizontal scaling
- Security — Defense in depth across all layers
System Architecture
DineHub - A resilient multi-region cloud restaurant ordering system
This document explains how our system is designed to be highly available, secure, and scalable across multiple cloud regions.
Overview
Our architecture follows three core principles:
| Principle | What it means |
|---|---|
| High Availability | The system stays online even if a server or entire region fails |
| Security by Default | All internal traffic is encrypted; only web ports are public |
| Horizontal Scalability | We can add more servers to handle increased load |
System Diagram
┌──────────────────┐
│ Internet │
│ (customers) │
└────────┬─────────┘
│
ports 80/443
│
╔═════════════▼═════════════╗
║ INGRESS NODE ║
║ ┌─────────────────────┐ ║
║ │ nginx (reverse │ ║
║ │ proxy + TLS) │ ║
║ └─────────────────────┘ ║
╚═════════════╤═════════════╝
│
══════════════════════════════╪══════════════════════════════
Tailscale Mesh Network (encrypted, private)
══════════════════════════════╪══════════════════════════════
│
┌──────────────────────┼──────────────────────┐
│ │ │
╔══════▼══════╗ ╔══════▼══════╗ ╔══════▼══════╗
║ BACKEND ║ ║ BACKEND ║ ║ HEADSCALE ║
║ Region A ║ ║ Region B ║ ║ (control) ║
╚══════╤══════╝ ╚══════╤══════╝ ╚═════════════╝
│ │
└──────────┬───────────┘
│
╔══════▼══════╗
║ CITUS ║
║ COORDINATOR ║
╚══════╤══════╝
│
┌─────────────┼─────────────┐
│ │ │
╔═══▼═══╗ ╔═══▼═══╗ ╔═══▼═══╗
║WORKER ║ ║WORKER ║ ║WORKER ║
║ 1 ║ ║ 2 ║ ║ 3 ║
╚═══════╝ ╚═══════╝ ╚═══════╝
Components
1. Ingress Layer (nginx)
The ingress node is the only part of our system exposed to the internet.
Responsibilities:
- Terminates TLS/HTTPS connections
- Load balances requests across backend servers
- Rate limits to prevent abuse
Internet → nginx (:443) → Tailscale → Backend servers
2. Secure Networking (Tailscale + Headscale)
We use Headscale, a self-hosted, open-source implementation of the Tailscale control server, to create a private mesh network.
Why this approach?
| Traditional Approach | Our Approach |
|---|---|
| Complex firewall rules | Simple: block everything except Tailscale |
| VPN tunnels between regions | Automatic mesh between all nodes |
| Public IPs on every server | Only ingress has public exposure |
| Manual certificate management | WireGuard encryption built-in |
How it works:
- Every server runs the Tailscale client
- Headscale (our control server) authenticates nodes
- Nodes communicate via private 100.x.x.x addresses
- All traffic is encrypted with WireGuard
3. Backend Servers
Stateless application servers that handle the business logic.
Key properties:
- Can be deployed in multiple regions for lower latency
- Horizontally scalable (add more when needed)
- Connect to database over the secure mesh
4. Distributed Database (Citus + PostgreSQL)
Citus extends PostgreSQL to distribute data across multiple servers.
┌─────────────────────┐
│ Coordinator │
│ (receives queries) │
└──────────┬──────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
│ orders │ │ orders │ │ orders │
│ 1-1000 │ │ 1001-2000 │ │ 2001-3000 │
└───────────┘ └───────────┘ └───────────┘
How Citus distributes data:
- Tables are “sharded” by a key (e.g., restaurant_id)
- Each worker holds a portion of the data
- Queries are routed to the relevant workers
- Results are combined and returned
Example: When a customer orders from Restaurant #42:
- Coordinator receives the query
- Routes it to the worker holding Restaurant #42’s data
- Worker processes and returns the result
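The routing step can be sketched in a few lines. This is an illustration of the idea, not Citus internals: crc32 stands in for Citus's hash function, and the shard placement scheme here is assumed for the example.

```python
import zlib

WORKERS = ["worker-1", "worker-2", "worker-3"]
NUM_SHARDS = 32  # matches Citus's default shard count

def shard_for(restaurant_id: int) -> int:
    # Hash the distribution column to pick a shard (crc32 as a stand-in hash).
    return zlib.crc32(str(restaurant_id).encode()) % NUM_SHARDS

def worker_for(restaurant_id: int) -> str:
    # Assumed round-robin placement: shard i lives on worker i % len(WORKERS).
    return WORKERS[shard_for(restaurant_id) % len(WORKERS)]

# Every query for Restaurant #42 deterministically lands on the same worker:
print(worker_for(42))
```

Because the hash is deterministic, the coordinator never needs to scan all workers for a single-restaurant query; it routes directly to the one that owns the shard.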
Request Flow
Here’s what happens when a customer places an order:
┌────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────────┐
│ Customer │────▶│ nginx │────▶│ Backend │────▶│ Citus │
│ (browser) │ │ (ingress) │ │ (Region A) │ │ (database) │
└────────────┘ └─────────────┘ └─────────────┘ └───────────────┘
│ │ │ │
│ HTTPS :443 │ Tailscale │ Tailscale │
│ (encrypted) │ (encrypted) │ (encrypted) │
- Customer’s browser connects to nginx over HTTPS
- nginx forwards request to a backend over Tailscale
- Backend queries Citus coordinator over Tailscale
- Coordinator fetches data from workers
- Response flows back through the same path
Cloud Infrastructure
We deploy on AWS EC2 instances running NixOS.
Instance Sizes
| Role | Instance | Why |
|---|---|---|
| Headscale | t3.micro | Low resource needs, just coordination |
| Ingress | t3.small | Handles TLS termination |
| Backend | t3.medium | Application processing |
| Citus Coordinator | t3.medium | Query routing |
| Citus Workers | t3.medium | Data storage and queries |
Security Groups
Because of Tailscale, our firewall rules are minimal:
Ingress node:
Inbound: 80 (HTTP), 443 (HTTPS), 41641/UDP (Tailscale)
Outbound: All (for Tailscale)
All other nodes:
Inbound: 41641/UDP (Tailscale only)
Outbound: All (for Tailscale)
No database ports, no backend ports exposed to the internet.
Deployment
All infrastructure is defined as NixOS modules in our repository.
| Module | Purpose |
|---|---|
| backend-service.nix | Backend application service |
| postgres-service.nix | Citus distributed PostgreSQL |
This means:
- Infrastructure is version controlled
- Deployments are reproducible
- Configuration changes are atomic
Handling Failures
If a backend server fails:
- nginx detects it via health checks
- Traffic is routed to healthy backends
- No customer impact
If a database worker fails:
- Queries to that shard will fail temporarily
- Worker can rejoin and resync
- Other shards continue working
If an entire region fails:
- Traffic shifts to the healthy region
- May need to promote replica workers
Security Summary
| Attack Vector | Mitigation |
|---|---|
| Network sniffing | All traffic encrypted (WireGuard) |
| Unauthorized server access | Tailscale requires authentication |
| Database exposed to internet | Database only accessible via mesh |
| DDoS on backend | Only nginx is public; rate limiting enabled |
Future Work
- Headscale NixOS module
- Tailscale client module
- nginx ingress module
- Secrets management with sops-nix
- Prometheus monitoring (stretch goal)
- Automated database backups
- deploy-rs for one-command deployments
Deployment Process
How we deploy DineHub across multiple regions with confidence
Philosophy
Our deployment process follows the principle of immutable infrastructure: once deployed, servers are never modified in place. Instead, we build new systems from scratch and atomically switch traffic to them. This eliminates “configuration drift” and makes deployments predictable and reversible.
The NixOS Approach
Traditional deployment processes often involve:
- SSHing into servers to run commands
- Patching files in place
- Hoping the application restarts correctly
- Manual rollback procedures when things go wrong
NixOS eliminates these risks through declarative configuration:
- Describe the desired state in Nix expressions
- Build the system locally or in CI
- Activate atomically — either the new system works completely, or the old system remains
- Rollback automatically if health checks fail
Deployment Pipeline
Stage 1: Build
Every deployment starts with building the new system configuration:
Developer Machine CI/CD (Garnix) Binary Cache
│ │ │
│── nix flake check ───────▶│ │
│ │── build packages ────────▶│
│ │ │── cache builds
│ │◀── success/failure ───────│
│◀── build results ─────────│ │
The build process:
- Compiles the backend to a GraalVM native image
- Bundles the frontend with Bun
- Runs all tests (unit, integration, property-based)
- Validates OpenAPI spec with Schemathesis property-based testing
- Validates NixOS configurations
- Caches successful builds for reuse
OpenAPI Validation
As part of the build pipeline, we validate that our implementation matches the OpenAPI specification:
- Specification-first: The OpenAPI spec in specs/openapi.yaml defines the API contract
- Auto-generation: Spring Boot controllers generate OpenAPI documentation from code
- Schemathesis testing: Property-based testing verifies implementation matches spec
- Linting: Redocly validates the spec for correctness and consistency
This ensures API consumers can rely on the documented behavior.
Stage 2: Test
Before deploying to production, we validate in isolated environments:
- VM Tests: Full system integration tests in NixOS VMs
- Staging Environment: Identical to production but with synthetic data
- Health Checks: Automated probes verify endpoints respond correctly
Stage 3: Deploy
Deployments use deploy-rs, which provides atomic activation:
┌─────────────────────────────────────────────────────────────┐
│ Deployment Flow │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. Build system closure locally │
│ └─ All packages + dependencies computed │
│ │
│ 2. Upload to target node │
│ └─ Nix copy-closure sends only missing packages │
│ │
│ 3. Activate new configuration │
│ └─ System switches to new generation │
│ │
│ 4. Run activation hook │
│ └─ Services restart with new configuration │
│ │
│ 5. Verify health checks │
│ └─ Confirm services respond correctly │
│ │
│ 6. On failure: automatic rollback │
│ └─ Previous generation restored │
│ │
└─────────────────────────────────────────────────────────────┘
Rolling Deployments
When deploying to multiple backend servers, we use a rolling deployment strategy:
- Take one server out of the load balancer
- Deploy new version to that server
- Verify health checks pass
- Return server to load balancer
- Repeat for remaining servers
This ensures:
- Zero downtime: At least some servers always available
- Gradual rollout: Issues caught before affecting all traffic
- Easy rollback: Can revert individual servers if problems arise
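The rolling strategy above can be simulated concisely. This sketch uses assumed names and callbacks; the real rollout is driven by deploy-rs and nginx health checks, not Python.

```python
def rolling_deploy(servers, deploy, healthy):
    """Deploy server-by-server; a failed health check reverts only that server."""
    results = {}
    for server in servers:
        # 1. Take the server out of the load balancer (the rest keep serving).
        deploy(server)                    # 2. Deploy the new version to it.
        if healthy(server):               # 3. Verify health checks pass.
            results[server] = "updated"   # 4. Return it to the load balancer.
        else:
            results[server] = "reverted"  # Roll back just this server.
    return results

print(rolling_deploy(
    ["backend-us-1", "backend-us-2", "backend-eu-1"],
    deploy=lambda s: None,                   # stand-in for the actual deploy step
    healthy=lambda s: s != "backend-eu-1",   # simulate one node failing its check
))
# {'backend-us-1': 'updated', 'backend-us-2': 'updated', 'backend-eu-1': 'reverted'}
```

Since only one server is out of rotation at a time, capacity dips by at most one node and a bad release never takes down the whole pool.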
Configuration Management
Secrets Handling
Sensitive configuration (database passwords, JWT keys) is managed separately from code:
- Encrypted at rest: Secrets stored encrypted in the repository using agenix
- Decrypted at deploy: Only the target machine can decrypt its secrets
- Never in Nix store: Unencrypted secrets never touch the world-readable Nix store
- Access controlled: Each secret specifies which users/services can read it
Environment-Specific Configuration
Different environments (dev, staging, production) have different needs:
- Development: Local database, debug logging, hot reloading
- Staging: Production-like but isolated, synthetic data
- Production: Multiple regions, real data, optimized settings
These differences are captured in Nix expressions rather than environment variables scattered across systems.
Disaster Recovery
Backup Strategy
The distributed database provides natural redundancy:
- Citus workers: Store shards across multiple nodes
- Cross-region replicas: Critical data replicated to other regions
- Point-in-time recovery: PostgreSQL WAL archiving enables restoration to any moment
Recovery Procedures
If a region fails completely:
- Traffic rerouting: DNS or ingress configuration points to healthy regions
- Database promotion: Replica in healthy region promoted to primary
- Re-provisioning: Failed region rebuilt from Nix configuration
- Data reconciliation: When failed region recovers, data synchronized
Monitoring Deployments
Deployment Metrics
We track deployment health through:
- Success rate: Percentage of deployments that activate without rollback
- Time to deploy: Duration from build start to activation complete
- Error rates: API errors, 5xx responses, failed health checks
- Resource usage: Memory, CPU, disk during and after deployment
Observability Integration
Deployments integrate with the monitoring stack:
- Prometheus: Metrics scraped before/after deployment
- Loki: Log aggregation to detect errors
- Grafana: Dashboards showing deployment impact
- Alerts: Automatic notifications for failed deployments
Continuous Deployment
Automated Pipeline
Changes flow automatically from commit to production:
Git Commit → CI Build → Tests Pass → Staging Deploy → Prod Deploy
│ │ │ │
▼ ▼ ▼ ▼
Build Integration Smoke Tests Rolling
Packages Tests Validation Rollout
Safety Mechanisms
Automation includes safety checks:
- Required checks: Build must pass before deployment
- Manual gates: Production deployments may require approval
- Canary analysis: New version serves small percentage of traffic first
- Automatic rollback: Failed health checks trigger immediate rollback
Development vs Production
Key Differences
| Aspect | Development | Production |
|---|---|---|
| Process management | process-compose | systemd |
| Database | Local PostgreSQL | Citus distributed cluster |
| Networking | localhost | Tailscale mesh |
| Secrets | Plain text files | agenix encrypted |
| Updates | Hot reloading | Atomic deployment |
| Monitoring | Console logs | Prometheus/Grafana |
Despite these differences, the same Nix expressions describe both environments. The differences are parameterized rather than being separate code paths.
Troubleshooting Deployments
Common Issues
- Build failures: Missing dependencies, compilation errors
- Health check failures: Services start but don’t respond correctly
- Configuration errors: Secrets or environment variables missing
- Network issues: Tailscale connectivity problems between nodes
Debug Commands
When deployments fail:
- Check service status: systemctl status backend
- View logs: journalctl -u backend -f
- Test health endpoints: curl localhost:8080/actuator/health
- Verify Tailscale: tailscale status
- Rollback if needed: nixos-rebuild switch --rollback
Future Improvements
- Blue/Green deployments: Instant cutover with the ability to roll back
- Feature flags: Deploy code disabled, enable gradually
- Chaos engineering: Intentionally break things to test resilience
- Automated capacity scaling: Add/remove nodes based on load
Networking Architecture
How DineHub nodes communicate securely across regions
Philosophy
Traditional network security relies on perimeter-based firewalls: block everything from the outside, trust everything on the inside. This model breaks down in cloud environments where:
- Services span multiple regions and cloud providers
- Containers and VMs come and go dynamically
- Internal traffic must still be protected
DineHub adopts zero-trust networking: encrypt everything, authenticate every connection, verify every request—regardless of whether it’s “internal” or “external.”
The Tailscale Mesh
What is Tailscale?
Tailscale is a mesh VPN built on WireGuard, a modern, high-performance VPN protocol. Unlike traditional VPNs that tunnel all traffic through a central gateway, Tailscale creates direct, encrypted connections between every pair of nodes.
Why Self-Hosted?
We use Headscale, an open-source implementation of the Tailscale control server:
- No vendor dependency: We control the coordination server
- Private infrastructure: No data flows through Tailscale’s SaaS
- Custom policies: Define our own access rules and ACLs
- Cost: No per-user licensing fees
Mesh Topology
┌─────────────────────┐
│ Internet │
└──────────┬──────────┘
│ HTTPS
▼
┌─────────────────────┐
│ Ingress Node │
│ (nginx :443) │
└──────────┬──────────┘
│
╔════════════════╪═════════════════╗
║ Tailscale Mesh Network (100.x) ║
║ All traffic encrypted via ║
║ WireGuard ║
╚════════════════╪═════════════════╝
│
┌──────────────────────┼──────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Backend-US │◀────▶│ Backend-EU │◀────▶│ Headscale │
│ │ │ │ │ Control │
│ • Port 8080 │ │ • Port 8080 │ │ • Port 443 │
│ • No public IP│ │ • No public IP│ │• No public IP │
└───────┬───────┘ └───────┬───────┘ └───────────────┘
│ │
└──────────────────────┼──────────────────────┐
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│DB Coordinator │ │ DB Worker │
│• Port 5432 │ │ • Port 5432 │
│• No public IP │ │ • No public IP │
└─────────────────┘ └─────────────────┘
Network Segmentation
Security Zones
We organize infrastructure into security zones based on exposure:
Public Zone (Ingress only)
- Exposed to internet on ports 80/443
- nginx reverse proxy terminates TLS
- All traffic forwarded to private zone via Tailscale
Private Zone (Application layer)
- Backend servers in multiple regions
- Only accessible via Tailscale (100.x.x.x addresses)
- No public IPs, no inbound firewall rules
Data Zone (Database layer)
- Citus coordinator and workers
- Same Tailscale-only access as private zone
- Additional PostgreSQL authentication
Control Plane (Headscale)
- Manages Tailscale authentication
- No user-facing services
- Minimal attack surface
Communication Patterns
Request Flow
When a customer places an order:
- Browser → Ingress: HTTPS over public internet
- Ingress → Backend: HTTP over Tailscale (encrypted by WireGuard)
- Backend → Database: PostgreSQL protocol over Tailscale
- Coordinator → Workers: Internal Citus protocol over Tailscale
Every hop is authenticated and encrypted—even traffic between nodes in the same data center.
Inter-Region Communication
When a US-based backend queries a database in EU:
- Backend sends query to Citus coordinator (via Tailscale)
- Coordinator routes to appropriate worker (may be in EU)
- Worker processes query, returns results
- Coordinator aggregates and returns to backend
Tailscale automatically establishes the most direct path, potentially bypassing the public internet entirely if nodes are in the same cloud provider’s backbone.
Service Discovery
DNS Resolution
Tailscale provides MagicDNS, automatically assigning DNS names to nodes:
- backend-us.internal → 100.64.0.1
- db-coordinator.internal → 100.64.0.2
- db-worker-1.internal → 100.64.0.3
Services reference each other by stable DNS names rather than IP addresses, simplifying configuration changes.
Health-Based Routing
nginx upstream configuration dynamically adjusts based on backend health:
- Health checks verify backends respond correctly
- Failed backends automatically removed from rotation
- New backends automatically added when healthy
- Geographic affinity: prefer local region when possible
Access Control
Tailscale ACLs
Access control lists define who can talk to whom:
Groups:
- ingress-nodes: ingress-01, ingress-02
- backend-nodes: backend-us, backend-eu
- database-nodes: db-coord, db-worker-1, db-worker-2
Rules:
- ingress-nodes → backend-nodes: allowed
- backend-nodes → database-nodes: allowed
- database-nodes → backend-nodes: denied
- public-internet → anything: denied (except ingress :443)
This “default deny” approach means new nodes can’t communicate until explicitly permitted.
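The evaluation logic behind default deny is simple to state. The sketch below is illustrative, not Headscale's actual ACL engine, and reuses the groups from the rules above: a connection succeeds only when an explicit group-to-group rule permits it.

```python
GROUPS = {
    "ingress-nodes": {"ingress-01", "ingress-02"},
    "backend-nodes": {"backend-us", "backend-eu"},
    "database-nodes": {"db-coord", "db-worker-1", "db-worker-2"},
}
# Only these directed group pairs are permitted; everything else is denied.
ALLOWED = {("ingress-nodes", "backend-nodes"), ("backend-nodes", "database-nodes")}

def group_of(node: str):
    return next((g for g, members in GROUPS.items() if node in members), None)

def can_connect(src: str, dst: str) -> bool:
    """Default deny: allow only explicitly permitted group pairs."""
    return (group_of(src), group_of(dst)) in ALLOWED

print(can_connect("ingress-01", "backend-us"))   # True
print(can_connect("db-worker-1", "backend-us"))  # False: databases may not call backends
print(can_connect("laptop", "db-coord"))         # False: unknown nodes match no group
```

Note the asymmetry: backends may open connections to databases, but not the reverse, which limits lateral movement if a database node is ever compromised.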
Authentication
Tailscale uses cryptographic identity:
- Node authentication: Each node has a unique private key
- User authentication: Nodes associated with user identity
- Multi-factor auth: Headscale can require MFA for node enrollment
- Certificate rotation: Keys automatically rotated
Performance Considerations
Latency
Tailscale adds minimal overhead:
- WireGuard encryption: ~1-2ms latency increase
- Direct connections: No central hub to traverse
- Protocol optimization: UDP-based, handles NAT traversal
For cross-region traffic, geographic latency dominates—Tailscale doesn’t add meaningful overhead.
Bandwidth
WireGuard is efficient:
- Small overhead: ~60 bytes per packet (vs. 150+ for IPSec)
- Modern crypto: ChaCha20-Poly1305 optimized for mobile/embedded
- No head-of-line blocking: UDP transport
Typical throughput exceeds 1 Gbps between cloud instances.
Reliability
The mesh topology provides natural redundancy:
- No single point of failure: If Headscale is down, existing connections continue
- Automatic reconnection: Nodes reconnect if paths change
- Path optimization: Routes around failed intermediate hops
Firewall Configuration
Minimal Rules
Because Tailscale handles authentication and encryption, firewall rules are simple:
Ingress Node:
- Inbound: 80/tcp, 443/tcp, 41641/udp (Tailscale)
- Outbound: All (for Tailscale mesh)
All Other Nodes:
- Inbound: 41641/udp (Tailscale only)
- Outbound: All (for Tailscale mesh)
No rules for application ports (8080, 5432)—Tailscale provides the connectivity.
Why This Works
Traditional firewall rules would require:
- Opening port 5432 between specific IP ranges
- Managing security groups per region
- Updating rules when topology changes
With Tailscale:
- Single UDP port for all connectivity
- Identity-based rather than IP-based rules
- Automatic updates as nodes join/leave
Troubleshooting
Common Issues
- Nodes not connecting: Check if enrolled in Tailscale network
- DNS not resolving: Verify MagicDNS enabled
- High latency: Check if direct connection established (relayed traffic is slower)
- Certificate errors: Node may need re-authentication
Diagnostic Commands
# Check Tailscale status
tailscale status
# Test connectivity to another node
tailscale ping backend-us
# Check network conditions (NAT type, relay latency)
tailscale netcheck
# Debug connection issues
tailscale bugreport
Future Enhancements
- IPv6 support: Native IPv6 addressing within mesh
- Subnet routers: Extend Tailscale to legacy infrastructure
- Access request workflows: Temporary access grants
- Audit logging: Comprehensive connection logs
- Network policies: Kubernetes-style micro-segmentation
Database Architecture
Distributed data storage with Citus and PostgreSQL
Philosophy
Traditional monolithic databases eventually hit scalability limits—either they run out of storage or can’t handle concurrent query volume. Scaling vertically (bigger servers) has practical limits and creates single points of failure.
DineHub adopts horizontal database scaling: distribute data across multiple servers, with each server handling a subset of the data. This provides both capacity and performance scaling.
Why Citus?
The Problem with Single Databases
As data grows, a single PostgreSQL server faces challenges:
- Storage limits: Hardware can only hold so much data
- Query performance: Large tables become slow to scan
- Concurrent load: Limited CPU/memory for parallel queries
- Availability: Single server failure means downtime
Citus Solution
Citus extends PostgreSQL to distribute tables across multiple servers:
- Horizontal scaling: Add servers as data grows
- Query parallelization: Complex queries execute across workers
- High availability: Replicas provide fault tolerance
- PostgreSQL compatible: Standard SQL, tools, and drivers work
Architecture
The Coordinator
The coordinator is the entry point for all database queries:
- Receives queries: Applications connect here like normal PostgreSQL
- Plans execution: Determines which workers hold relevant data
- Routes requests: Sends sub-queries to appropriate workers
- Aggregates results: Combines worker responses into final result
From the application perspective, the coordinator looks like a standard PostgreSQL server.
The Workers
Workers store actual data and execute queries:
- Hold shards: Each worker contains portions of distributed tables
- Process queries: Execute SQL against local data
- Return results: Send partial results back to coordinator
Workers are standard PostgreSQL servers with Citus extension installed.
Data Distribution
┌─────────────────────┐
│ Coordinator │
│ (Query planner & │
│ result aggregator)│
└──────────┬──────────┘
│
┌────────────────┼────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
│ │ │ │ │ │
│ Restaurants │ │ Restaurants │ │ Restaurants │
│ ID: 1-1000 │ │ ID: 1001-2000 │ │ ID: 2001-3000 │
│ │ │ │ │ │
│ Orders │ │ Orders │ │ Orders │
│ (same shard) │ │ (same shard) │ │ (same shard) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Sharding Strategy
Distribution Column
Tables are distributed by a distribution column:
- Restaurants: Distributed by restaurant_id
- Orders: Also distributed by restaurant_id (co-located with restaurants)
- Users: Distributed by user_id
This “co-location” means a restaurant’s orders reside on the same worker as the restaurant itself, making join queries efficient.
Shard Assignment
Citus uses consistent hashing to assign shards to workers:
- Hash of distribution column determines shard
- Each shard assigned to one primary worker
- Replicas may exist on other workers for availability
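The routing idea can be sketched in a few lines. This is an illustration only, not Citus's actual algorithm: Citus hashes the distribution column with PostgreSQL's internal hash functions and assigns hash ranges to a fixed number of shards (32 per table by default), and `shardFor` / `SHARD_COUNT` are hypothetical names:

```typescript
// Illustrative sketch of hash-based shard routing (not Citus's real
// hash function; Citus uses PostgreSQL's internal hashing plus hash ranges).
import { createHash } from "node:crypto";

const SHARD_COUNT = 32; // Citus creates 32 shards per distributed table by default

// Map a distribution-column value to a shard index deterministically.
function shardFor(distributionValue: number | string): number {
  const digest = createHash("sha256").update(String(distributionValue)).digest();
  return digest.readUInt32BE(0) % SHARD_COUNT; // first 4 bytes as an unsigned int
}

// Co-location: equal distribution values always hash to the same shard,
// so a restaurant and its orders (both keyed by restaurant_id) are stored
// on the same worker, keeping their joins local.
const restaurantShard = shardFor(42);
const orderShard = shardFor(42);
console.log(restaurantShard === orderShard); // true: co-located
```

Because the mapping depends only on the value, the coordinator can route a query like `WHERE restaurant_id = 42` to a single worker without consulting the data itself.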
When to Distribute
Not all tables should be distributed:
Distribute (large tables):
- Restaurants (millions of rows expected)
- Orders (billions of rows expected)
- Order items (billions of rows expected)
Reference tables (replicated to all workers):
- Cuisine types (small, lookup data)
- Configuration (rarely changes)
Reference tables are replicated to every worker, making joins fast but updates expensive.
Query Execution
Simple Queries
Single-row lookups by distribution column are fast:
SELECT * FROM orders WHERE restaurant_id = 42;
Coordinator hashes 42, determines which worker holds the shard, and routes directly to that worker.
Complex Queries
Aggregations and joins may involve multiple workers:
SELECT region, COUNT(*) FROM restaurants GROUP BY region;
Execution:
- Coordinator sends query to all workers
- Each worker counts local restaurants
- Workers return partial counts
- Coordinator merges the per-region partial counts and returns the final result
This parallel execution provides near-linear speedup with added workers.
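The scatter-gather execution above can be sketched as follows. The in-memory "workers" stand in for real worker nodes, and in reality the planner pushes SQL down to each worker rather than calling functions:

```typescript
// Sketch of the scatter-gather pattern used for distributed aggregation.
type Row = { region: string };

// Three workers, each holding a subset (shard) of the restaurants table.
const workers: Row[][] = [
  [{ region: "eu" }, { region: "us" }],
  [{ region: "us" }, { region: "us" }],
  [{ region: "apac" }, { region: "eu" }],
];

// Each worker computes a partial GROUP BY over its local shard.
function partialCount(rows: Row[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of rows) counts.set(r.region, (counts.get(r.region) ?? 0) + 1);
  return counts;
}

// The coordinator merges the partial results into the final answer.
function mergeCounts(partials: Map<string, number>[]): Map<string, number> {
  const total = new Map<string, number>();
  for (const p of partials)
    for (const [region, n] of p) total.set(region, (total.get(region) ?? 0) + n);
  return total;
}

const result = mergeCounts(workers.map(partialCount));
console.log(result.get("us")); // 3
```

The partial-count step runs on all workers in parallel; only the small merged maps travel back to the coordinator, which is why speedup is near-linear as workers are added.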
Cross-Shard Joins
Joins between distributed tables require care:
Efficient (co-located join):
SELECT * FROM restaurants r
JOIN orders o ON r.id = o.restaurant_id
WHERE r.id = 42;
Both tables share distribution column, so data is on same worker.
Less efficient (repartition join):
SELECT * FROM orders o
JOIN users u ON o.customer_id = u.id;
Different distribution columns require data movement between workers.
High Availability
Replication Strategy
Each shard has replicas on different workers:
- Primary: Handles reads and writes
- Standby: Receives streaming replication, takes over if primary fails
- Cross-region: Replicas in other regions for disaster recovery
Failover Process
If a worker fails:
- Detection: Health checks notice unresponsive worker
- Promotion: Standby replica promoted to primary
- Reconfiguration: Coordinator routes queries to new primary
- Recovery: Failed worker repaired, rejoins as replica
This process is automatic—applications don’t need to change connection strings.
Split-Brain Prevention
Citus uses consensus mechanisms to prevent split-brain scenarios:
- Only one primary per shard at a time
- Writes blocked until consensus achieved
- Clients may see brief unavailability during failover
Performance Optimization
Query Planning
The coordinator analyzes queries to optimize distribution:
- Pushdown: Move filters and aggregations to workers
- Pruning: Skip workers that can’t have relevant data
- Parallelization: Split work across multiple workers
Index Strategy
Index recommendations change with distribution:
- Distribution column: Always indexed (used for routing)
- Join columns: Index if frequently joined
- Filter columns: Index if selective filters common
- Coordinator: May need indexes for final aggregation
Monitoring
Key metrics for distributed databases:
- Shard imbalance: Are workers evenly loaded?
- Query latency: Coordinator vs worker time breakdown
- Replication lag: Standby replicas behind primary?
- Connection pooling: Managing thousands of connections
Operational Considerations
Adding Workers
Scale out by adding workers:
- Provision new worker nodes
- Run citus_add_node() to add the node to the cluster
- Existing data doesn’t automatically redistribute
- New data uses new workers
- Optional: Rebalance shards for even distribution
Schema Changes
Schema modifications propagate to all workers:
-- This runs on coordinator and all workers
ALTER TABLE restaurants ADD COLUMN rating FLOAT;
Citus handles distribution automatically—DDL just works.
Backup and Recovery
Backup strategies for distributed data:
- Logical backups: pg_dump on coordinator captures distributed schema
- Per-worker backups: Physical backups of each worker’s data
- Point-in-time recovery: WAL archiving for granular recovery
- Cross-region replicas: Live replicas for disaster recovery
Trade-offs
Benefits
- Scalability: Add capacity by adding servers
- Performance: Parallel query execution
- Availability: Replicas provide fault tolerance
- PostgreSQL compatible: Familiar SQL and tooling
Complexity
- Query planning: Must consider distribution in query design
- Operational overhead: More servers to monitor and maintain
- Transaction limitations: Cross-shard transactions have overhead
- Migration: Existing applications may need modification
When Not to Use
Citus may be overkill for:
- Small datasets (< 100GB)
- Simple workloads (mostly single-row lookups)
- Strong consistency requirements across shards (use single PostgreSQL)
DineHub’s expected scale (thousands of restaurants, millions of orders) justifies the complexity.
Future Enhancements
- Citus MX: Multi-coordinator for higher availability
- Columnar storage: Analytics queries on compressed columnar data
- Automatic rebalancing: Dynamic shard redistribution
- Read replicas: Offload read traffic to standbys
- Global indexes: Cross-shard indexes for unique constraints
Security Architecture
Defense in depth for a multi-region cloud application
Philosophy
Security is not a feature you add at the end—it’s a property that emerges from careful design at every layer. DineHub follows defense in depth: multiple independent security mechanisms, each sufficient on its own, so that if one fails, others still protect the system.
We assume attackers will eventually breach some defenses. The goal is to make that breach useless—to limit what they can access and detect the intrusion quickly.
Security Layers
┌──────────────────────────────────────────────────────────────┐
│ Layer 7: Application Security │
│ • Authentication (JWT) │
│ • Authorization (RBAC) │
│ • Input validation │
│ • Output encoding │
├──────────────────────────────────────────────────────────────┤
│ Layer 6: API Security │
│ • HTTPS/TLS │
│ • Rate limiting │
│ • CORS policies │
│ • API versioning │
├──────────────────────────────────────────────────────────────┤
│ Layer 5: Network Security │
│ • Zero-trust mesh (Tailscale) │
│ • Firewall rules │
│ • Private IPs only │
│ • No lateral movement │
├──────────────────────────────────────────────────────────────┤
│ Layer 4: Host Security │
│ • Immutable infrastructure │
│ • Minimal attack surface │
│ • Automatic updates │
│ • Read-only filesystems │
├──────────────────────────────────────────────────────────────┤
│ Layer 3: Secrets Management │
│ • Encryption at rest │
│ • No secrets in code │
│ • Rotation policies │
│ • Audit logging │
├──────────────────────────────────────────────────────────────┤
│ Layer 2: Access Control │
│ • Least privilege │
│ • Multi-factor authentication │
│ • Role-based permissions │
│ • Session management │
├──────────────────────────────────────────────────────────────┤
│ Layer 1: Physical Security │
│ • Cloud provider guarantees │
│ • Multi-region distribution │
│ • Encrypted storage │
└──────────────────────────────────────────────────────────────┘
Application Security
Authentication
We use JWT (JSON Web Tokens) for authentication:
- Stateless: No server-side session storage
- Self-contained: Token carries user identity
- Expirable: Short-lived tokens (24 hours)
- Revocable: Token blacklist for logout
JWTs are signed with a server-side secret. Attackers can’t forge tokens without the secret, and expired tokens are automatically rejected.
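A minimal sketch of how signing and verification work, assuming HS256 and illustrative secret/claim values; production code would use a vetted JWT library rather than hand-rolled crypto:

```typescript
// Minimal JWT sign/verify sketch with HMAC-SHA256 (illustration only).
import { createHmac, timingSafeEqual } from "node:crypto";

const b64url = (buf: Buffer) => buf.toString("base64url");

function sign(payload: object, secret: string): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const body = b64url(Buffer.from(JSON.stringify(payload)));
  const sig = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  return `${header}.${body}.${b64url(sig)}`;
}

function verify(token: string, secret: string): object | null {
  const [header, body, sig] = token.split(".");
  const expected = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(sig, "base64url");
  // Constant-time comparison prevents timing attacks on the signature.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(body, "base64url").toString());
  // Reject expired tokens (exp is seconds since epoch, per RFC 7519).
  if (claims.exp !== undefined && claims.exp < Date.now() / 1000) return null;
  return claims;
}

const token = sign({ sub: "user-123", exp: Date.now() / 1000 + 86_400 }, "demo-secret");
console.log(verify(token, "demo-secret") !== null); // true: signature and expiry OK
console.log(verify(token, "wrong-secret"));         // null: forged token rejected
```

Without the secret an attacker cannot produce a valid signature, and the expiry check bounds the damage of a leaked token to its lifetime.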
Authorization
Role-Based Access Control (RBAC) defines what users can do:
- USER: Browse restaurants, place orders, manage own orders
- RESTAURANT_OWNER: All USER permissions + manage owned restaurants + view restaurant orders
- ADMIN: Full system access
Permissions are enforced at the API endpoint level. Even if a user knows an endpoint exists, they can’t access resources they don’t own.
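A sketch of such an endpoint-level check combining role and ownership; the role names come from the list above, while the function and field names are illustrative:

```typescript
// RBAC check with ownership enforcement (illustrative names).
type Role = "USER" | "RESTAURANT_OWNER" | "ADMIN";

interface Principal { id: string; role: Role; }
interface Restaurant { id: string; ownerId: string; }

// ADMIN can do anything; RESTAURANT_OWNER only for restaurants they own.
function canManageRestaurant(p: Principal, r: Restaurant): boolean {
  if (p.role === "ADMIN") return true;
  return p.role === "RESTAURANT_OWNER" && r.ownerId === p.id;
}

const alice: Principal = { id: "a1", role: "RESTAURANT_OWNER" };
const bob: Principal = { id: "b2", role: "USER" };
const bistro: Restaurant = { id: "r9", ownerId: "a1" };

console.log(canManageRestaurant(alice, bistro)); // true: she owns it
console.log(canManageRestaurant(bob, bistro));   // false: wrong role, not owner
```

Note that knowing the restaurant's ID is not enough: the check compares the authenticated identity against the resource's owner, so enumeration of endpoints or IDs gains an attacker nothing.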
Input Validation
All user input is treated as untrusted:
- Type validation: DTOs with strict typing
- Range checks: Numeric values within expected ranges
- Length limits: Prevent buffer overflow attempts
- Format validation: Emails, UUIDs match expected patterns
- Sanitization: Remove or escape dangerous characters
Validation happens at the API boundary, before data reaches business logic.
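A boundary-validation sketch for a hypothetical order-item DTO, showing type, format, length, and range checks before the value reaches business logic (field names and limits are illustrative):

```typescript
// Validate untrusted input at the API boundary (illustrative DTO).
interface OrderItemInput { restaurantId: string; name: string; quantity: number; }

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function validateOrderItem(raw: unknown): OrderItemInput {
  if (typeof raw !== "object" || raw === null) throw new Error("body must be an object");
  const { restaurantId, name, quantity } = raw as Record<string, unknown>;
  // Format validation: IDs must look like UUIDs.
  if (typeof restaurantId !== "string" || !UUID_RE.test(restaurantId))
    throw new Error("restaurantId must be a UUID");
  // Type + length limits on free-text fields.
  if (typeof name !== "string" || name.length === 0 || name.length > 200)
    throw new Error("name must be 1-200 characters");
  // Range check on numeric fields.
  if (typeof quantity !== "number" || !Number.isInteger(quantity) ||
      quantity < 1 || quantity > 100)
    throw new Error("quantity must be an integer in [1, 100]");
  return { restaurantId, name, quantity };
}

const item = validateOrderItem({
  restaurantId: "123e4567-e89b-12d3-a456-426614174000",
  name: "Pad Thai",
  quantity: 2,
});
console.log(item.quantity); // 2
```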
Network Security
Zero-Trust Architecture
We don’t trust the network—even our internal network:
- Encryption everywhere: All traffic encrypted via WireGuard (Tailscale)
- Mutual authentication: Both sides verify each other’s identity
- No implicit trust: Every connection requires explicit authorization
- Micro-segmentation: Services can only talk to required dependencies
Network Segmentation
Infrastructure divided into security zones:
Public zone (ingress only):
- Exposed to internet
- Minimal attack surface (nginx only)
- All traffic forwarded to private zone
Private zone (application servers):
- No public IPs
- Access only via Tailscale
- Can only initiate connections to data zone
Data zone (databases):
- No external access except from application servers
- Additional PostgreSQL authentication
- Encrypted storage volumes
This segmentation means compromising one zone doesn’t automatically grant access to others.
Firewall Strategy
Traditional firewalls rely on IP whitelisting. We use identity-based access control:
- Single port: Tailscale uses one UDP port for all connectivity
- No application ports exposed: Database and application ports not visible to network
- Cryptographic identity: Nodes authenticated by certificates, not IP addresses
Secrets Management
Secret Lifecycle
Secrets follow a strict lifecycle:
- Generation: Cryptographically secure random generation
- Distribution: Encrypted transmission to target systems
- Storage: Encrypted at rest, never in version control
- Usage: Runtime injection, not baked into images
- Rotation: Regular rotation with overlap period
- Revocation: Immediate invalidation on compromise
Storage
Secrets are stored encrypted using agenix:
- Encryption: Age encryption with recipient public keys
- Repository: Encrypted files stored in Git
- Decryption: Only target machines can decrypt (private keys on machines)
- Access: Each secret specifies authorized users/services
This means:
- Developers can see encrypted blobs but not plaintext
- CI/CD can deploy encrypted secrets but not read them
- Production machines decrypt their own secrets at runtime
- Compromised Git repo doesn’t expose secrets
Secret Types
Different secrets have different handling:
- Database credentials: Rotated monthly, stored per-environment
- JWT signing keys: Rotated quarterly, symmetric for performance
- API keys: Rotated on employee departure, tracked usage
- TLS certificates: Auto-renewed via Let’s Encrypt
Host Security
Immutable Infrastructure
Servers are immutable—never modified after deployment:
- No SSH access: Configuration via Nix, not manual commands
- Read-only root: Root filesystem mounted read-only
- Ephemeral storage: Local state treated as disposable
- Reproducible builds: Same Nix expression always produces same system
If a server is compromised, we don’t try to clean it—we replace it.
Attack Surface Reduction
Minimal software installed on each server:
- Single purpose: Each server runs one service
- No shells: No bash, no sshd (except for debugging)
- No compilers: No gcc, no development tools
- Minimal services: Only required systemd units
Automatic Updates
Security patches apply automatically:
- NixOS updates: System packages updated via Nix
- Rolling updates: New generation activated atomically
- Rollback: Automatic fallback if updates fail
- Rebootless: Most updates don’t require restart
Secrets in Code
What Never Goes in Code
These must never be committed to version control:
- Database passwords
- API keys (Stripe, AWS, etc.)
- JWT signing secrets
- TLS private keys
- Encryption keys
- Credentials for external services
What Can Go in Code
These are safe to commit:
- Public API endpoints
- Non-sensitive configuration (timeouts, limits)
- Default values that get overridden
- Encryption of secrets (public keys)
Detection
Pre-commit hooks and CI checks scan for:
- High-entropy strings (potential secrets)
- Known secret patterns (AWS keys, JWTs)
- Hardcoded passwords
- Private keys
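High-entropy detection can be sketched with Shannon entropy; the threshold and length cutoff below are illustrative, not the values any particular scanner uses:

```typescript
// Sketch of a high-entropy string detector of the kind secret scanners use.
function shannonEntropy(s: string): number {
  const freq = new Map<string, number>();
  for (const ch of s) freq.set(ch, (freq.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of freq.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p); // bits of information per character
  }
  return h;
}

// Long tokens with entropy above ~4 bits/char look like random keys;
// English-like identifiers score noticeably lower.
function looksLikeSecret(token: string): boolean {
  return token.length > 20 && shannonEntropy(token) > 4.0;
}

console.log(looksLikeSecret("configuration_timeout_value"));      // false
console.log(looksLikeSecret("AKx9fQ2mZ7pL0sVwYbT3rNhJ5dGcE1aU")); // true
```

Entropy checks complement pattern matching: known prefixes (AWS keys, JWTs) are caught by regexes, while entropy catches random-looking blobs that match no known pattern.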
Incident Response
Detection
Security events generate logs:
- Authentication failures: Failed login attempts
- Authorization failures: Access denied errors
- Anomalous patterns: Unusual traffic, query patterns
- System calls: Auditd logs for privileged operations
Logs aggregate in Loki for analysis and alerting.
Response Playbook
If compromise suspected:
- Isolate: Remove compromised nodes from load balancer
- Preserve: Capture logs and memory dumps before termination
- Analyze: Determine scope of compromise
- Rotate: Revoke and regenerate all potentially exposed secrets
- Reimage: Replace compromised servers with fresh instances
- Monitor: Enhanced monitoring for recurrence
Recovery
Recovery is fast because infrastructure is code:
- Reprovision: New servers from Nix configuration in minutes
- Data restore: From encrypted backups
- Secret rotation: Automated via deploy pipeline
- Verification: Health checks confirm clean state
Compliance Considerations
While not formally certified, DineHub’s design supports:
Data Protection
- Encryption at rest: Database volumes encrypted
- Encryption in transit: TLS 1.3 for all external traffic
- Access logging: Audit trails for data access
- Right to deletion: Data can be purged per request
Security Standards
Architecture aligns with:
- OWASP Top 10: Addressed through input validation, authentication, etc.
- CIS Benchmarks: NixOS configuration follows hardening guidelines
- NIST Cybersecurity Framework: Identify, protect, detect, respond, recover
Threat Model
Assumed Threats
We design against these threats:
- External attackers: Attempting to breach perimeter
- Insider threats: Malicious or compromised employees
- Supply chain: Compromised dependencies or build tools
- Cloud provider: Curious cloud administrators
- Physical theft: Stolen laptops with production access
Not Addressed
Out of scope for this project:
- Nation-state actors: Advanced persistent threats with unlimited resources
- Social engineering: Phishing, pretexting (user training issue)
- Denial of wallet: Resource exhaustion attacks (cloud billing limits)
Security Checklist
For new features, verify:
- Input validation on all user-supplied data
- Authentication required for sensitive operations
- Authorization checks enforce ownership/roles
- No secrets in code (use agenix)
- Database queries parameterized (no SQL injection)
- Output encoded (XSS prevention)
- Rate limiting prevents abuse
- Logging for security-relevant events
- Tests include security scenarios
Future Enhancements
- Web Application Firewall: Rule-based request filtering
- Bug bounty program: External security researchers
- Penetration testing: Annual third-party assessment
- Security headers: CSP, HSTS, X-Frame-Options
- Certificate pinning: Prevent MITM attacks
- Behavioral analytics: ML-based anomaly detection
Frontend Architecture
Design philosophy and architectural patterns for the user interface layer
Philosophy
The frontend follows a modern React SPA architecture designed for developer productivity, type safety, and runtime performance. We prioritize declarative UI patterns, compile-time optimizations, and minimal runtime overhead.
Technology Choices
Why Bun?
We chose Bun over Node.js for three primary reasons:
- Unified toolchain: Bun replaces the npm/webpack/babel toolchain with a single, fast executable. This reduces configuration complexity and ensures all tools (bundler, test runner, package manager) work together seamlessly.
- Performance: Bun’s bundler is significantly faster than webpack or Vite for our use case, reducing development feedback loops.
- Built-in TypeScript: No additional compilation step required—TypeScript is first-class.
Why React 19?
React 19 brings several architectural improvements:
- Concurrent rendering by default: Better perceived performance through prioritized updates
- Automatic batching: Fewer re-renders without manual optimization
- Server components: Foundation for future server-side rendering if needed
- Actions: Simplified form handling and mutations
Why Tailwind CSS v4?
Tailwind v4 represents a significant architectural shift:
- PostCSS-free: No build-time CSS processing pipeline, reducing build complexity
- CSS-first configuration: Theme configuration lives in CSS rather than JavaScript
- Zero-runtime: All styles are generated at build time
- Predictable bundle size: Only used utilities are included
Application Structure
The frontend organizes code by responsibility rather than by file type:
API Layer
The API layer follows a repository pattern abstraction. Rather than making raw HTTP calls throughout components, we provide domain-specific API objects that encapsulate:
- Endpoint paths and HTTP methods
- Request/response type definitions
- Error handling conventions
- Authentication header injection
This pattern means components work with semantic methods like restaurantsApi.create(data) rather than raw fetch() calls, making the codebase more maintainable and easier to test.
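A sketch of what such a domain-specific API object can look like. The `restaurantsApi.create` method is named in the text above; the base URL, storage key, and helper names are assumptions:

```typescript
// Repository-pattern API layer: domain methods wrapping a shared request helper.
interface Restaurant { id: string; name: string; }
type RestaurantInput = Omit<Restaurant, "id">;

const BASE_URL = "/api/v1"; // assumed endpoint prefix

// Inject the JWT from localStorage when present (no-op on the server/tests).
function authHeaders(): Record<string, string> {
  const token = (globalThis as any).localStorage?.getItem("token");
  return token ? { Authorization: `Bearer ${token}` } : {};
}

async function request<T>(
  path: string,
  init: { method?: string; body?: string } = {},
): Promise<T> {
  const res = await fetch(`${BASE_URL}${path}`, {
    ...init,
    headers: { "Content-Type": "application/json", ...authHeaders() },
  });
  // Throw typed errors here so components never inspect raw responses.
  if (!res.ok) throw new Error(`HTTP ${res.status} on ${path}`);
  return res.json() as Promise<T>;
}

// Components call semantic methods instead of raw fetch().
const restaurantsApi = {
  list: () => request<Restaurant[]>("/restaurants"),
  create: (data: RestaurantInput) =>
    request<Restaurant>("/restaurants", { method: "POST", body: JSON.stringify(data) }),
};
```

Because all HTTP concerns (paths, headers, error translation) live in one place, tests can stub `restaurantsApi` without mocking the network.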
State Management
We use a hybrid state approach:
- Server state (data from API): Managed by TanStack Query, which handles caching, background refetching, and optimistic updates automatically
- Client state (UI-only state): Managed by React’s built-in useState and Context API
- Authentication state: Global context provider that persists to localStorage
This separation prevents the common anti-pattern of over-fetching or storing server data in global state where it can become stale.
Component Architecture
Components are organized into three tiers:
- Page components: Route-level components that compose domain-specific UI
- Feature components: Reusable components specific to a domain (e.g., RestaurantCard)
- UI primitives: Generic, unstyled components from shadcn/ui (Button, Card, Input)
This three-tier architecture ensures separation of concerns: pages handle routing and data fetching, feature components handle domain logic, and primitives handle accessibility and styling.
Routing Architecture
The routing layer implements route guards for authentication:
- Public routes: Accessible to all users (landing page, login, signup)
- Protected routes: Require valid JWT token (dashboard, order placement)
- Guest-only routes: Redirect authenticated users away (login page when already logged in)
Route guards are implemented as wrapper components that check authentication state and redirect accordingly. This keeps authentication logic centralized and reusable.
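The guard decision itself reduces to a small pure function, sketched here without JSX (the real guards are wrapper components; names are illustrative):

```typescript
// Route-guard decision logic: given the guard kind and auth state,
// return a redirect target, or null to render the requested route.
type Guard = "public" | "protected" | "guestOnly";

function redirectFor(guard: Guard, isAuthenticated: boolean): string | null {
  if (guard === "protected" && !isAuthenticated) return "/login";    // must log in first
  if (guard === "guestOnly" && isAuthenticated) return "/dashboard"; // already logged in
  return null; // render the requested route
}

console.log(redirectFor("protected", false)); // "/login"
console.log(redirectFor("guestOnly", true));  // "/dashboard"
console.log(redirectFor("public", false));    // null
```

Keeping the decision pure makes it trivially unit-testable; the wrapper components only read auth context and perform the redirect.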
Authentication Flow
The authentication system uses JWT tokens stored in localStorage with the following flow:
- User submits credentials via login form
- Backend validates and returns JWT + user metadata
- Token is stored in localStorage and Context state
- Subsequent API calls include token in Authorization header
- Protected routes check for token presence before rendering
The token has a 24-hour expiration. On app load, the Context provider checks for an existing token and restores the authentication state, providing a seamless user experience.
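The restore-on-load step can be sketched as follows. The client only decodes the payload to read the expiry (it cannot verify the signature; the server does that on every request), and the claim and field names are illustrative:

```typescript
// Restore a session from a stored JWT on app load (illustrative names).
interface Session { userId: string; token: string; }

function restoreSession(stored: string | null, nowSec: number): Session | null {
  if (!stored) return null; // nothing persisted: start logged out
  try {
    const [, payload] = stored.split(".");
    // Decode the base64url payload; no signature check on the client.
    const claims = JSON.parse(Buffer.from(payload, "base64url").toString());
    if (typeof claims.exp === "number" && claims.exp < nowSec) return null; // expired
    return { userId: claims.sub, token: stored };
  } catch {
    return null; // malformed token: treat as logged out
  }
}
```

Dropping expired or malformed tokens here avoids a flash of "logged in" UI followed by a 401 from the first API call.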
Build System Design
The build system is designed around Bun’s native bundler with a custom wrapper script that:
- Discovers entry points automatically (HTML files in src/)
- Applies Tailwind CSS transformation
- Generates linked sourcemaps for debugging
- Copies static assets to the output directory
- Reports bundle sizes for optimization visibility
The key architectural decision here is convention over configuration: the build script automatically finds entry points rather than requiring a configuration file, making the build process easier to understand and modify.
Styling Philosophy
Our styling approach follows utility-first CSS with semantic theming:
Utility-First
Instead of writing custom CSS classes, we compose utility classes directly in the JSX. This approach:
- Eliminates the need to name CSS classes
- Makes styling changes explicit in version control
- Prevents unused CSS from accumulating
- Enables rapid prototyping
Dark Mode
Dark mode is implemented via CSS custom properties and Tailwind’s dark: variants. The theme toggle adds/removes a dark class on the document root, which triggers CSS custom property updates throughout the component tree.
Component Styling
UI primitives (from shadcn/ui) are built on Radix UI for accessibility and Tailwind for styling. They accept a className prop for composition, allowing parent components to override or extend styles without modifying the primitive.
Data Fetching Patterns
Data fetching follows a stale-while-revalidate pattern:
- Component mounts and requests data
- TanStack Query checks the cache first
- If cached data exists (even if stale), it’s shown immediately
- Background request fetches fresh data
- UI updates with fresh data when available
This pattern provides instant UI feedback while ensuring data freshness, eliminating loading spinners for cached data.
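The pattern's core can be sketched in a tiny cache; TanStack Query adds staleness windows, deduplication, and retries on top of this idea, so treat the class below as illustration only:

```typescript
// Minimal stale-while-revalidate cache sketch.
type Fetcher<T> = () => Promise<T>;

class SwrCache<T> {
  private cache = new Map<string, T>();

  // Return cached (possibly stale) data immediately when present, and
  // always kick off a background refresh that updates the cache.
  get(key: string, fetcher: Fetcher<T>, onFresh: (v: T) => void): T | undefined {
    const stale = this.cache.get(key);
    fetcher().then((fresh) => {
      this.cache.set(key, fresh);
      onFresh(fresh); // UI re-renders with fresh data
    });
    return stale; // undefined on first load: show a loading state
  }
}
```

The first call returns `undefined` (cache miss, spinner shown once); every later call returns instantly from cache while the refresh races in the background.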
Animation Strategy
Animations are implemented with Framer Motion for:
- Page transitions: Smooth fade/slide between routes
- Micro-interactions: Button hover states, loading indicators
- Layout animations: List reordering, expanding panels
We avoid CSS animations for complex sequences and JavaScript animations for simple hover states—Framer Motion provides the right abstraction for component-level animations while deferring to CSS for simple transitions.
Error Handling
Error handling follows a progressive enhancement model:
- API layer: Catches HTTP errors and throws typed Error objects
- TanStack Query: Catches errors and provides error state to components
- Components: Display error UI or retry controls
- Global boundary: Unhandled errors caught by error boundary showing fallback UI
This layered approach ensures errors are handled at the appropriate level of abstraction.
Development Experience
The frontend architecture prioritizes developer experience through:
- Hot reloading: Bun’s dev server provides instant updates
- Type safety: Full TypeScript coverage with strict mode
- IDE integration: Tailwind IntelliSense provides autocomplete for utility classes
- Consistent formatting: treefmt enforces formatting across the codebase
Backend Architecture
Design philosophy and architectural patterns for the service layer
Philosophy
The backend follows domain-driven design principles with a focus on type safety, explicit contracts, and defensive programming. We prioritize compile-time safety over runtime flexibility, using Java’s type system to prevent errors before they reach production.
Technology Choices
Why Spring Boot?
Spring Boot provides an opinionated framework that balances productivity with flexibility:
- Ecosystem maturity: Comprehensive libraries for security, data access, and testing
- Production-ready: Built-in metrics, health checks, and configuration management
- Community standards: Wide adoption means extensive documentation and tooling
Why GraalVM Native Image?
We compile the backend to a native executable rather than running on the JVM:
- Startup time: Sub-second startup vs. JVM warmup time, critical for auto-scaling scenarios
- Memory efficiency: Smaller memory footprint enables running on smaller instances
- Self-contained: Single executable file with no runtime dependencies
- Container-friendly: Smaller Docker images and faster cold starts
The trade-off is longer build times and some reflection limitations, which we mitigate with explicit configuration.
Why PostgreSQL with Citus?
Our database choice reflects the multi-region requirements:
- PostgreSQL: Proven reliability, ACID compliance, and extensive feature set
- Citus extension: Enables horizontal scaling by distributing data across multiple nodes
- Compatibility: Standard PostgreSQL protocol works with all existing tools
Domain Architecture
The backend organizes code around business domains rather than technical layers:
Domain Structure
Each domain (User, Restaurant, Order) contains:
- Entity: JPA-mapped data model
- Repository: Data access abstraction
- Controller: HTTP request handling
- DTOs: Data transfer objects for API contracts
This structure keeps related code together, making it easier to understand domain boundaries and modify functionality.
Entity Relationships
The domain model centers around three main entities:
- User: Authentication and identity
- Restaurant: Business listings with ownership
- Order: Transactions linking users to restaurants
The relationships are:
- User owns Restaurants (one-to-many)
- User places Orders (one-to-many)
- Restaurant receives Orders (one-to-many)
- Order contains OrderItems (embedded collection)
Value Objects vs Entities
We distinguish between entities (have identity and lifecycle) and value objects (defined by attributes):
- Entities: User, Restaurant, Order (have UUID identity)
- Value Objects: OrderItem (no identity, belongs to Order)
This distinction guides persistence decisions—entities get their own tables, value objects are embedded.
Security Architecture
Authentication Model
The system uses JWT (JSON Web Tokens) for stateless authentication:
- Tokens are signed with a server-side secret
- Tokens contain user identity and expiration
- Clients send tokens in Authorization header
- Server validates signature without database lookup
This stateless approach scales horizontally without session affinity requirements.
Authorization Patterns
Authorization follows role-based access control (RBAC):
- ROLE_USER: Standard customer permissions
- ROLE_RESTAURANT_OWNER: Can manage owned restaurants and view their orders
- ROLE_ADMIN: Full system access
Roles are checked at the controller method level using Spring Security’s method security annotations.
Security Boundaries
The security model defines clear boundaries:
- Public endpoints: No authentication required (login, registration, restaurant listing)
- Authenticated endpoints: Valid JWT required (order placement)
- Ownership endpoints: JWT + resource ownership check (order modification)
- Admin endpoints: JWT + ROLE_ADMIN required (system management)
API Design Principles
RESTful Conventions
The API follows REST conventions with some pragmatic exceptions:
- Resources map to domain entities (/restaurants, /orders)
- HTTP verbs indicate action (GET, POST, PUT, DELETE)
- Status codes convey outcome (200, 201, 400, 401, 403, 404)
- Plural nouns for collections (/restaurants not /restaurant)
Request/Response Contracts
API contracts are defined through DTOs (Data Transfer Objects):
- Input DTOs: Define valid request shape and validation rules
- Output DTOs: Control what data is exposed
- Validation annotations: Jakarta Bean Validation for input sanitization
This DTO pattern decouples the internal domain model from the public API, allowing independent evolution.
Error Handling
Error responses follow a consistent structure:
- HTTP status code indicates error category
- Response body contains human-readable message
- Validation errors include field-level details
The GlobalExceptionHandler translates exceptions to appropriate HTTP responses, ensuring clients receive consistent error formats.
Data Access Patterns
Repository Abstraction
Data access follows the Repository pattern through Spring Data JPA:
- Interfaces extend JpaRepository for CRUD operations
- Method names derive queries automatically (findByIsActiveTrue)
- Custom queries use @Query annotation for complex SQL
- Pagination returns Spring’s Page abstraction
This abstraction means controllers work with domain objects rather than SQL, making the code more testable and database-agnostic.
Transaction Boundaries
Transactions are managed at the service layer:
- Spring’s @Transactional annotation marks business operations
- Read operations use read-only transactions for optimization
- Write operations ensure atomicity across multiple database calls
Validation Strategy
Input validation occurs at multiple layers:
- DTO annotations: Jakarta Bean Validation (@NotNull, @Email, etc.)
- Controller: @Valid annotation triggers validation
- Service layer: Business rule validation
- Database constraints: Final integrity enforcement
This defense-in-depth approach catches errors as early as possible.
Order Lifecycle Design
State Machine
Orders follow a defined state machine with six states:
- PENDING: Initial state when order is created
- CONFIRMED: Restaurant has acknowledged the order
- PREPARING: Food is being prepared
- READY: Order is ready for pickup/delivery
- DELIVERED: Order has been delivered to customer
- CANCELLED: Order was cancelled
State Transition Rules
Not all transitions are valid:
- PENDING can transition to CONFIRMED, PREPARING, or CANCELLED
- CONFIRMED can transition to PREPARING
- PREPARING can transition to READY
- READY can transition to DELIVERED
- CANCELLED is terminal
These rules are enforced in the service layer, preventing invalid state changes.
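Encoding the rules above as data makes the enforcement a one-line lookup. This is a sketch (the real backend enforces this in Java service code), with the transition table taken directly from the list above:

```typescript
// Order state machine: valid transitions encoded as data.
type OrderStatus =
  | "PENDING" | "CONFIRMED" | "PREPARING" | "READY" | "DELIVERED" | "CANCELLED";

const TRANSITIONS: Record<OrderStatus, OrderStatus[]> = {
  PENDING:   ["CONFIRMED", "PREPARING", "CANCELLED"],
  CONFIRMED: ["PREPARING"],
  PREPARING: ["READY"],
  READY:     ["DELIVERED"],
  DELIVERED: [], // no outgoing transitions listed
  CANCELLED: [], // terminal per the rules above
};

// Reject any transition not in the table.
function transition(from: OrderStatus, to: OrderStatus): OrderStatus {
  if (!TRANSITIONS[from].includes(to))
    throw new Error(`invalid transition ${from} -> ${to}`);
  return to;
}

console.log(transition("PENDING", "CONFIRMED")); // "CONFIRMED"
// transition("DELIVERED", "PENDING") would throw
```

A data-driven table keeps the business rules auditable in one place and makes adding a state a one-line change plus its allowed transitions.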
Permission by State
Different states have different permissions:
- PENDING orders: Customer can cancel or modify
- Non-PENDING orders: Only restaurant owner or admin can modify
- Status changes: Only restaurant side can advance status
This reflects real-world business rules where customers have limited control after order confirmation.
Testing Strategy
Test Pyramid
Testing follows the pyramid model:
- Unit tests: Fast, isolated, test individual functions
- Integration tests: Test database interactions and API contracts
- End-to-end tests: Full request/response cycles (property-based with jqwik)
Test Slices
Spring Boot’s test slices allow targeted testing:
- @WebMvcTest: Test controllers in isolation
- @DataJpaTest: Test repositories with in-memory database
- @SpringBootTest: Full integration tests
Test Data
Tests use dedicated test data rather than production data:
- Factory methods create valid entities
- Builders allow flexible test data construction
- Each test starts with clean database state
Documentation Requirements
Javadoc Standards
All public APIs require Javadoc:
- Class-level description explains purpose
- Method-level documentation describes behavior
- Parameter and return value documented
- Exceptions and preconditions noted
The build enforces this via the -Xdoclint:missing javadoc flag, failing the build when documentation is missing.
OpenAPI Generation
The API documentation is generated from:
- SpringDoc annotations on controllers
- DTO schemas from class definitions
- Security scheme definitions
This ensures API docs stay synchronized with implementation.
Contract Testing with Schemathesis
Beyond traditional unit and integration tests, we use Schemathesis for property-based API testing. While unit tests verify specific inputs produce expected outputs, contract testing verifies the API adheres to its OpenAPI specification under all circumstances.
What is Schemathesis?
Schemathesis reads the OpenAPI specification and automatically generates thousands of test cases:
- Valid inputs: Ensures documented behavior matches implementation
- Edge cases: Boundary values, maximum lengths, special characters
- Invalid inputs: Malformed JSON, wrong types, missing fields
- Security cases: SQL injection attempts, XSS payloads
This catches bugs that manual test writing might miss—developers tend to test “happy paths” while Schemathesis explores the entire input space.
Testing Philosophy
Schemathesis operates on a simple principle: if the API claims to accept certain inputs in its OpenAPI spec, it must handle them gracefully. This creates a contract between API provider and consumers:
- For providers: Any change that breaks Schemathesis tests is a breaking change
- For consumers: Can rely on documented behavior being accurate
- For both: Reduces integration surprises
Integration in CI
Schemathesis runs automatically during nix flake check:
- Build the backend and generate OpenAPI spec
- Start the backend in a test VM
- Run Schemathesis against the running API
- Fail the build if any tests fail
This ensures the OpenAPI specification remains accurate and the implementation handles edge cases correctly.
Configuration
Schemathesis is configured to:
- Generate ASCII-only test data (avoiding HTTP header encoding issues)
- Exclude certain endpoints that require external services (Google OAuth)
- Skip stateful operations that would invalidate subsequent tests (logout)
- Use automatic parallelism based on CPU cores
The configuration lives in schemathesis.toml at the project root.
Deployment Architecture
Native Binary
The application compiles to a native binary that:
- Contains the Spring Boot application + embedded Tomcat
- Includes all dependencies statically linked
- Runs without JVM installation
- Starts in milliseconds
Service Configuration
The binary runs as a systemd service:
- Automatic restart on failure
- Environment variables for configuration
- Health check endpoint for load balancers
- Graceful shutdown handling
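A hedged sketch of such a unit as a NixOS module. The service name, package, and values are illustrative, though the option names (systemd.services.*, serviceConfig.Restart) are standard NixOS options:

```nix
{ config, pkgs, ... }:
{
  systemd.services.dinehub-backend = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      ExecStart = "${pkgs.dinehub-backend}/bin/backend";  # hypothetical package
      Restart = "on-failure";   # automatic restart on failure
      RestartSec = 5;
      TimeoutStopSec = 30;      # allow graceful shutdown before SIGKILL
    };
    environment = {
      SERVER_PORT = "8080";     # configuration via environment variables
    };
  };
}
```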
Database Migrations
Schema changes are managed through:
- Flyway migrations in version control
- Automatic execution on startup
- Rollback scripts for recovery
- Compatibility with distributed Citus schema
Nix Build System
Philosophy and architectural patterns for reproducible builds and declarative infrastructure
Philosophy
Nix is not just a package manager—it’s a fundamentally different approach to software construction. We treat the entire system as a pure function: given the same inputs (source code + dependencies), we always produce the same outputs (binaries + configurations).
Core Concepts
What is Reproducibility?
Traditional build systems produce different outputs based on:
- System libraries installed on the build machine
- Environment variables and PATH
- Network state during dependency resolution
- Implicit dependencies not declared in the build file
Nix eliminates these variables by:
- Isolating builds in clean environments with only declared dependencies
- Locking all inputs including transitive dependencies and their hashes
- Content-addressable storage where outputs are named by their content hash
- No global state—each build starts from a pristine environment
The Flake Paradigm
A flake is a self-contained, versioned package description:
- Declarative: Build instructions written in Nix expression language
- Reproducible: flake.lock pins every dependency to exact versions
- Composable: Other flakes can depend on your flake
- Hermetic: No access to the outside world during builds
This means a build that succeeds on one developer’s machine will succeed identically on CI and production.
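For orientation, a minimal flake skeleton might look like this. It is a generic example, not the project's actual flake.nix:

```nix
{
  description = "Example flake";

  # Inputs are pinned to exact revisions in flake.lock.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  # Outputs are a pure function of the locked inputs.
  outputs = { self, nixpkgs }:
    let
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      packages.x86_64-linux.default = pkgs.hello;
      devShells.x86_64-linux.default = pkgs.mkShell {
        packages = [ pkgs.git ];
      };
    };
}
```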
Architecture Layers
The Nix architecture separates concerns into four layers:
Layer 1: Package Definitions
Purpose: Describe how to build software from source
Packages define:
- Source location (Git repository, local path, etc.)
- Build dependencies (compilers, libraries, tools)
- Build script (configure, make, install equivalents)
- Runtime dependencies (libraries needed at runtime)
Key insight: Packages are values in a functional language. They don’t execute—they describe what would be built.
Layer 2: Development Environment
Purpose: Provide a shell with all tools needed for development
The devShell provides:
- Exact versions of compilers and build tools
- Project-specific utilities (formatters, linters)
- Environment variables and shell hooks
- Isolation from host system packages
When you run nix develop, you enter a subshell where java, bun, and other tools are exactly as specified—regardless of what’s installed on your laptop.
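A sketch of such a devShell; the specific packages and hook are assumptions about this project's toolchain, not a copy of its real configuration:

```nix
devShells.default = pkgs.mkShell {
  packages = [
    pkgs.jdk      # Java toolchain for the backend
    pkgs.bun      # frontend runtime and test runner
    pkgs.nixfmt   # formatter (illustrative choice)
  ];
  shellHook = ''
    export FLAKE_ROOT=$PWD   # shell hook, as described above
  '';
};
```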
Layer 3: Process Composition
Purpose: Orchestrate multi-service local development
Process-compose replaces Docker Compose for local development:
- Declares which processes to run (backend, frontend, database)
- Manages dependencies between services
- Provides unified logging and monitoring
- Restart policies for failed processes
Unlike with Docker, processes run natively on the host: no virtualization overhead, faster startup, and easier debugging.
Layer 4: System Configuration
Purpose: Define entire NixOS machines
NixOS modules describe:
- Operating system configuration (users, networking, services)
- Service definitions with systemd units
- Security hardening and firewall rules
- Secrets management integration
These configurations are deployed to create reproducible infrastructure—prod server #1 is identical to prod server #2 because both are built from the same expression.
Dependency Management
Lock Files
Nix flakes generate flake.lock files that pin:
- Direct flake inputs (nixpkgs version)
- Transitive dependencies (libraries your dependencies use)
- Git revisions and content hashes
This means even if nixpkgs updates a library, your build continues using the pinned version until you explicitly update the lock file.
Supply Chain Security
Nix provides multiple layers of supply chain protection:
- Source verification: Dependencies are fetched by content hash, not just URL
- Reproducible builds: Same source always produces same output
- Binary caches: Signed pre-built binaries reduce compilation time
- Sandboxing: Builds cannot access the network or modify files outside their directory
If a dependency’s content doesn’t match the expected hash, the build fails rather than accepting a potentially compromised package.
Build Isolation
The Sandbox
Nix builds run in isolated environments that:
- Have no network access
- See only explicitly declared dependencies
- Start with an empty filesystem (except the source)
- Cannot write outside their output directory
This isolation catches missing dependencies that would work on your laptop (where you have tools installed) but fail in CI.
Pure Functions
Builds are pure functions—they depend only on their inputs:
buildPackage(source, dependencies, buildScript) => output
The same inputs always produce the same output, enabling:
- Caching: If inputs haven’t changed, reuse previous output
- Sharing: Multiple users can share the same built package
- Verification: Rebuild and verify outputs match expectations
Development Workflow
Entering the Environment
When you run nix develop, Nix:
- Evaluates the devShell expression
- Builds any missing tools
- Sets up environment variables
- Spawns a new shell with modified PATH
- Runs shell hooks (e.g., setting FLAKE_ROOT)
The resulting shell has exactly the tools needed—no more, no less.
Incremental Builds
During development, Nix provides:
- Incremental compilation: Only changed files rebuild
- Development shells: Different shells for different tasks
- Direnv integration: Automatically enter devShell when entering project directory
Testing Changes
The nix flake check command runs the full CI pipeline locally:
- Builds all packages (backend, frontend, docs)
- Runs unit and integration tests
- Checks formatting compliance
- Validates NixOS configurations
- Runs VM-based integration tests
This means “works on my machine” is actually meaningful—the exact same checks run locally and in CI.
Deployment Architecture
NixOS Systems
NixOS is a Linux distribution where everything is configured through Nix expressions:
- System packages: Installed via Nix, not apt/yum
- System services: Defined as systemd units in Nix
- Configuration files: Generated by Nix templates
- Users and groups: Declared in Nix, not useradd
A NixOS machine is built by evaluating a Nix expression that returns a complete system configuration.
Deploy-rs
Deploy-rs is the deployment tool that:
- Builds the system configuration locally
- Copies closure (package + dependencies) to remote machine
- Activates the new system configuration
- Waits for a confirmation step after activation
- Rolls back automatically if activation fails or confirmation never arrives
This means failed deployments are atomic—the system either fully activates or reverts to the previous state.
Secrets Management
Secrets are managed separately from configuration:
- Encrypted at rest: Secrets stored encrypted in Git
- Decrypted at activation: Age/ragenix decrypts on target machine
- Available as files: Services read secrets from filesystem
- Never in Nix store: Unencrypted secrets never touch the world-readable store
This separation means configuration can be public while secrets remain encrypted.
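Sketched with the agenix/ragenix module options; the secret name, path, and consuming service are illustrative:

```nix
{ config, ... }:
{
  age.secrets.db-password = {
    file = ../secrets/db-password.age;  # stored encrypted in Git
    owner = "dinehub";                  # readable only by the service user
  };

  # The service reads the decrypted file at runtime; the plaintext
  # never enters the world-readable Nix store.
  systemd.services.dinehub-backend.environment = {
    DB_PASSWORD_FILE = config.age.secrets.db-password.path;
  };
}
```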
Networking and Infrastructure
Tailscale Mesh
The infrastructure uses Tailscale for private networking:
- Mesh topology: Every node connects to every other node directly
- WireGuard encryption: All traffic encrypted with modern crypto
- Headscale control: Self-hosted coordination server
- MagicDNS: Private DNS resolution for internal services
This architecture means services communicate over encrypted tunnels without public IPs or complex firewall rules.
Service Discovery
Services find each other through:
- DNS names: headscale provides internal DNS
- Static IPs: Tailscale assigns stable IPs in the 100.x.x.x range
- NixOS module coordination: Services configured to know about each other
No load balancers or service meshes required—just direct encrypted connections.
CI/CD Integration
The Check Pipeline
nix flake check is the universal CI command:
- Build verification: All packages compile successfully
- Test execution: Unit, integration, and property-based tests
- Formatting validation: All code follows project standards
- Linting: Static analysis catches potential issues
- VM tests: Full system integration tests in VMs
Caching Strategy
Nix provides multiple caching layers:
- Local store: Already-built packages on your machine
- Binary cache: Shared cache (Garnix, Cachix) for common packages
- Build cache: CI artifacts reused between builds
This means builds are incremental—you only rebuild what changed, not the world.
Troubleshooting and Debugging
Build Failures
When builds fail, Nix provides:
- Complete build logs with all commands executed
- Environment variable dumps
- Option to keep failed build directory for inspection
- --show-trace for detailed evaluation traces
Development Mode
For debugging build issues:
- nix develop enters the build environment
- genericBuild runs the build phases interactively
- Failed phases can be re-run with modifications
Why Reproducibility Matters
Reproducibility isn’t just a nice property—it enables:
- Bisecting: Git bisect works because old commits still build
- Security auditing: Rebuild and verify package contents
- Disaster recovery: Infrastructure rebuilt from Git in minutes
- Team consistency: Everyone uses exact same tools
- CI confidence: Local build success predicts CI success
When to Use Nix
Nix excels when you need:
- Reproducible builds across environments
- Declarative configuration that can be versioned
- Hermetic builds that don’t depend on system state
- Atomic upgrades with rollback capability
- Cross-language projects with unified tooling
Nix adds complexity when:
- The project is simple, with few dependencies
- The team is unfamiliar with functional programming
- Rapid iteration matters more than reproducibility
- You must integrate with non-Nix build systems
For this project, the complexity is justified by the multi-language nature (Java + TypeScript + Nix) and the production deployment requirements.
OpenAPI Docs
Placeholder for openapi docs - this gets filled out properly by the nix build
Frontend Docs
Placeholder for frontend docs - this gets filled out properly by the nix build
Backend Docs
Placeholder for backend docs - this gets filled out properly by the nix build
Testing Strategy
The project uses a layered testing approach:
- Unit Tests — JUnit 5 for backend, Bun test runner for frontend
- Integration Tests — Spring Boot Test with Testcontainers
- Property-Based Tests — jqwik for generative testing, Schemathesis for API contract testing
- VM Tests — Full system integration in NixOS virtual machines
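To make the generative-testing idea concrete, here is a hand-rolled sketch of a property-based check in plain Java. jqwik automates the input generation and shrinking shown manually here, and the property (discounting never increases a total) and the method under test are invented for illustration:

```java
import java.util.Random;

// Hand-rolled property check: instead of a few hand-picked examples,
// assert an invariant across many randomly generated inputs.
class PropertySketch {

    static long applyDiscount(long subtotalCents, int percent) {
        return subtotalCents * (100 - percent) / 100;
    }

    // Property: for any non-negative subtotal and percent in 0..100,
    // the discounted total stays between 0 and the original subtotal.
    public static boolean holdsForRandomInputs(long seed, int trials) {
        Random rng = new Random(seed);
        for (int i = 0; i < trials; i++) {
            long subtotal = rng.nextInt(1_000_000);  // generated input
            int percent = rng.nextInt(101);          // 0..100 inclusive
            long discounted = applyDiscount(subtotal, percent);
            if (discounted < 0 || discounted > subtotal) {
                return false;                        // property violated
            }
        }
        return true;
    }
}
```

A framework like jqwik would express the same idea declaratively with annotated parameters and would additionally shrink any failing input to a minimal counterexample.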