DineHub Documentation
Welcome to DineHub — a resilient, multi-region cloud restaurant ordering system designed for scale.
What is DineHub?
DineHub is a distributed restaurant ordering platform that connects customers with restaurants across multiple geographic regions. It’s designed from the ground up for high availability, security, and horizontal scalability.
Why This Architecture?
Modern cloud applications face three fundamental challenges:
| Challenge | Traditional Approach | Our Approach |
|---|---|---|
| Availability | Single points of failure | Multi-region with automatic failover |
| Security | Perimeter-based firewalls | Zero-trust mesh with encryption everywhere |
| Scalability | Vertical scaling (bigger servers) | Horizontal scaling (more servers) |
DineHub demonstrates how to build a production-ready system that addresses these challenges through deliberate architectural decisions.
System Overview
At its core, DineHub consists of three layers:
┌─────────────────────────────────────────────────────────────┐
│ USER INTERFACE │
│ React + Bun + Tailwind │
│ Fast, type-safe, with real-time updates │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ SERVICE LAYER │
│ Spring Boot + GraalVM Native Image │
│ Stateless, horizontally scalable, sub-second startup │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ Citus (Distributed PostgreSQL) │
│ Data sharded across regions, automatic query routing │
└─────────────────────────────────────────────────────────────┘
Key Features
For Customers
- Browse restaurants across multiple regions
- Place orders with real-time status tracking
- Secure authentication via JWT or Google OAuth
- Responsive design that works on mobile and desktop
For Restaurant Owners
- Manage restaurant listings and menus
- View and process incoming orders
- Track order lifecycle from pending to delivered
- Role-based access control for staff
For Operators
- Deploy to multiple regions with a single command
- Monitor system health via built-in observability
- Scale horizontally by adding nodes
- Zero-downtime deployments with automatic rollback
Architecture Highlights
Multi-Region Deployment
Unlike traditional applications deployed to a single data center, DineHub runs across multiple AWS regions, for example:
- US East (Virginia) — Primary region for North America
- EU West (Ireland) — Primary region for Europe
- Additional regions can be added as needed
Each region contains a complete stack: ingress, backend, and database workers. If one region fails, traffic automatically routes to healthy regions.
Zero-Trust Networking
We don’t trust the network—even our own. All internal communication happens over encrypted tunnels:
- Tailscale mesh: WireGuard-encrypted connections between all nodes
- Headscale: Self-hosted coordination (no dependency on Tailscale SaaS)
- No public IPs: Only the ingress node is exposed to the internet
- Mutual authentication: Every connection is authenticated at both ends
Distributed Database
Traditional databases become bottlenecks under load. We use Citus to distribute PostgreSQL horizontally:
- Coordinator node: Routes queries to appropriate workers
- Worker nodes: Store data shards distributed by restaurant_id
- Automatic sharding: Data automatically distributed as restaurants grow
- Query parallelization: Complex queries execute across multiple workers
Immutable Infrastructure
We treat infrastructure as code—literally. Our Nix configuration:
- Version controlled: All changes tracked in Git
- Reproducible: Same configuration always produces same system
- Atomic: Deployments succeed or roll back completely
- Testable: Infrastructure tested in VMs before production
Technology Choices
Frontend: Bun + React + Tailwind
- Bun: Fast all-in-one JavaScript runtime (10x faster than Node for bundling)
- React 19: Concurrent rendering and automatic batching
- Tailwind v4: PostCSS-free, CSS-first styling with zero runtime
- TanStack Query: Automatic caching and background refetching
Why not Node? Bun provides a unified toolchain without webpack configuration hell.
Backend: Spring Boot + GraalVM
- Spring Boot 4: Mature ecosystem with production-ready defaults
- GraalVM Native Image: Compiles to native binary for fast startup and low memory
- PostgreSQL + Citus: Proven relational database with horizontal scaling
- JWT Authentication: Stateless tokens for horizontal scalability
Why native compilation? Cold starts matter when auto-scaling. A native binary starts in milliseconds, not seconds.
Infrastructure: Nix + NixOS
- Nix Flakes: Reproducible builds with locked dependencies
- NixOS: Declarative Linux distribution configured entirely via Nix
- deploy-rs: Atomic deployments with automatic rollback
- Tailscale: Self-hosted mesh networking without VPN complexity
Why Nix? Traditional configuration management drifts over time. Nix guarantees that what we build today can be rebuilt identically in five years.
API Design: OpenAPI + Schemathesis
- OpenAPI Specification: Single source of truth for API contracts in specs/openapi.yaml
- Schemathesis: Property-based testing that validates implementation matches specification
- Redocly: Documentation generation and spec linting
- Contract Testing: API consumers can rely on documented behavior being accurate
This specification-first approach means the API documentation is never out of date—it’s automatically validated against the implementation on every build.
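As a toy illustration of what contract validation means, the sketch below checks a response body against a schema fragment. The schema excerpt and field names here are hypothetical, not taken from specs/openapi.yaml; Schemathesis automates this far more thoroughly, generating many inputs per endpoint directly from the spec.

```python
# Hypothetical excerpt of an order schema -- not the real DineHub spec.
order_schema = {
    "required": ["id", "restaurantId", "status"],
    "properties": {
        "id": {"type": "integer"},
        "restaurantId": {"type": "integer"},
        "status": {"type": "string"},
    },
}

TYPES = {"integer": int, "string": str, "boolean": bool}

def conforms(body: dict, schema: dict) -> list[str]:
    """Return a list of contract violations (empty means conformant)."""
    errors = [f"missing required field: {f}" for f in schema["required"] if f not in body]
    for field, rules in schema["properties"].items():
        if field in body and not isinstance(body[field], TYPES[rules["type"]]):
            errors.append(f"{field}: expected {rules['type']}")
    return errors

print(conforms({"id": 7, "restaurantId": 42, "status": "PENDING"}, order_schema))  # []
print(conforms({"id": "7", "status": "PENDING"}, order_schema))
```

A failing check at build time means either the implementation or the spec is wrong; either way, the mismatch surfaces before consumers see it.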
Getting Started
Prerequisites
You’ll need Nix installed (the Determinate Systems installer is recommended):
curl -fsSL https://install.determinate.systems/nix | sh -s -- install --determinate
Quick Start
- Enter the development environment (installs all tools automatically): nix develop
- Start the local development stack (backend + frontend + database): nix run .#compose
- View the documentation (what you’re reading now): nix run .#docs.serve
- Run the full test suite: nix flake check -L
Project Structure
├── frontend/ # Bun + React SPA
├── backend/ # Spring Boot service
├── nix/ # Nix configuration
├── docs/ # This documentation
├── flake.nix # Nix entry point
└── README.md # Quick reference
Documentation Guide
This documentation is organized into sections:
System Architecture
- Infrastructure — How we deploy and operate across regions
- Component Architecture — Design patterns for frontend, backend, and build system
Component Guides
- Frontend — UI layer design and React patterns
- Backend — Service layer architecture and domain model
- Nix Build System — Reproducible builds and declarative infrastructure
API Reference
- OpenAPI Documentation — Interactive API reference
Design Principles
Throughout this system, we follow these principles:
- Type Safety First: TypeScript and Java with strict compilation catch errors at build time
- Security by Default: Encryption everywhere, least-privilege access, no secrets in code
- Horizontal Scalability: Design for adding nodes, not bigger nodes
- Reproducibility: Builds and deployments must be repeatable and version-controlled
- Observability: Every component exposes metrics and health checks
- Developer Experience: Complex infrastructure, simple development workflow
Contributing
This is a university software engineering project. To contribute:
- Enter the dev shell: nix develop
- Create a branch for your changes
- Run tests before committing: nix flake check -L
- Format code: nix fmt
- Submit a merge request with a clear description
Resources
- Source Code: GitLab Repository
- Documentation: Live Docs
- Issue Tracker: GitLab Issues
- CI/CD: Garnix (via nix flake check)
DineHub was built by Trinity College Dublin Software Engineering Group 26 as a software engineering project demonstrating modern cloud architecture patterns.
Infrastructure
Designing for resilience, security, and scale.
This section documents how we deploy and operate the DineHub restaurant ordering system across multiple cloud regions.
At a Glance
| Aspect | Our Solution |
|---|---|
| Compute | AWS EC2 with NixOS |
| Networking | Tailscale mesh (self-hosted via Headscale) |
| Database | Citus (distributed PostgreSQL) |
| Ingress | nginx reverse proxy |
| Deployment | NixOS modules + deploy-rs |
Documentation
- System Architecture — Complete system design with diagrams and component overview
- Deployment Process — How we deploy, rollback, and manage infrastructure changes
- Networking — Zero-trust mesh networking with Tailscale
- Database — Distributed PostgreSQL with Citus for horizontal scaling
- Security — Defense in depth across all layers
System Architecture
DineHub - A resilient multi-region cloud restaurant ordering system
This document explains how our system is designed to be highly available, secure, and scalable across multiple cloud regions.
Overview
Our architecture follows three core principles:
| Principle | What it means |
|---|---|
| High Availability | The system stays online even if a server or entire region fails |
| Security by Default | All internal traffic is encrypted; only web ports are public |
| Horizontal Scalability | We can add more servers to handle increased load |
System Diagram
┌──────────────────┐
│ Internet │
│ (customers) │
└────────┬─────────┘
│
ports 80/443
│
╔═════════════▼═════════════╗
║ INGRESS NODE ║
║ ┌─────────────────────┐ ║
║ │ nginx (reverse │ ║
║ │ proxy + TLS) │ ║
║ └─────────────────────┘ ║
╚═════════════╤═════════════╝
│
══════════════════════════════╪══════════════════════════════
Tailscale Mesh Network (encrypted, private)
══════════════════════════════╪══════════════════════════════
│
┌──────────────────────┼──────────────────────┐
│ │ │
╔══════▼══════╗ ╔══════▼══════╗ ╔══════▼══════╗
║ BACKEND ║ ║ BACKEND ║ ║ HEADSCALE ║
║ Region A ║ ║ Region B ║ ║ (control) ║
╚══════╤══════╝ ╚══════╤══════╝ ╚═════════════╝
│ │
└──────────┬───────────┘
│
╔══════▼══════╗
║ CITUS ║
║ COORDINATOR ║
╚══════╤══════╝
│
┌─────────────┼─────────────┐
│ │ │
╔═══▼═══╗ ╔═══▼═══╗ ╔═══▼═══╗
║WORKER ║ ║WORKER ║ ║WORKER ║
║ 1 ║ ║ 2 ║ ║ 3 ║
╚═══════╝ ╚═══════╝ ╚═══════╝
Components
1. Ingress Layer (nginx)
The ingress node is the only part of our system exposed to the internet.
Responsibilities:
- Terminates TLS/HTTPS connections
- Load balances requests across backend servers
- Rate limits to prevent abuse
Internet → nginx (:443) → Tailscale → Backend servers
2. Secure Networking (Tailscale + Headscale)
We use Headscale, a self-hosted, open-source implementation of the Tailscale control server, to create a private mesh network.
Why this approach?
| Traditional Approach | Our Approach |
|---|---|
| Complex firewall rules | Simple: block everything except Tailscale |
| VPN tunnels between regions | Automatic mesh between all nodes |
| Public IPs on every server | Only ingress has public exposure |
| Manual certificate management | WireGuard encryption built-in |
How it works:
- Every server runs the Tailscale client
- Headscale (our control server) authenticates nodes
- Nodes communicate via private 100.x.x.x addresses
- All traffic is encrypted with WireGuard
3. Backend Servers
Stateless application servers that handle the business logic.
Key properties:
- Can be deployed in multiple regions for lower latency
- Horizontally scalable (add more when needed)
- Connect to database over the secure mesh
4. Distributed Database (Citus + PostgreSQL)
Citus extends PostgreSQL to distribute data across multiple servers.
┌─────────────────────┐
│ Coordinator │
│ (receives queries) │
└──────────┬──────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
│ orders │ │ orders │ │ orders │
│ 1-1000 │ │ 1001-2000 │ │ 2001-3000 │
└───────────┘ └───────────┘ └───────────┘
How Citus distributes data:
- Tables are “sharded” by a key (e.g., restaurant_id)
- Each worker holds a portion of the data
- Queries are routed to the relevant workers
- Results are combined and returned
Example: When a customer orders from Restaurant #42:
- Coordinator receives the query
- Routes it to the worker holding Restaurant #42’s data
- Worker processes and returns the result
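The routing step can be sketched in a few lines. This is an illustration of the idea, not Citus internals: crc32 stands in for Citus's hash function, and the shard placement scheme here is assumed for the example.

```python
import zlib

WORKERS = ["worker-1", "worker-2", "worker-3"]
NUM_SHARDS = 32  # matches Citus's default shard count

def shard_for(restaurant_id: int) -> int:
    # Hash the distribution column to pick a shard (crc32 as a stand-in hash).
    return zlib.crc32(str(restaurant_id).encode()) % NUM_SHARDS

def worker_for(restaurant_id: int) -> str:
    # Assumed round-robin placement: shard i lives on worker i % len(WORKERS).
    return WORKERS[shard_for(restaurant_id) % len(WORKERS)]

# Every query for Restaurant #42 deterministically lands on the same worker:
print(worker_for(42))
```

Because the hash is deterministic, the coordinator never needs to scan all workers for a single-restaurant query; it routes directly to the one that owns the shard.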
Request Flow
Here’s what happens when a customer places an order:
┌────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────────┐
│ Customer │────▶│ nginx │────▶│ Backend │────▶│ Citus │
│ (browser) │ │ (ingress) │ │ (Region A) │ │ (database) │
└────────────┘ └─────────────┘ └─────────────┘ └───────────────┘
│ │ │ │
│ HTTPS :443 │ Tailscale │ Tailscale │
│ (encrypted) │ (encrypted) │ (encrypted) │
- Customer’s browser connects to nginx over HTTPS
- nginx forwards request to a backend over Tailscale
- Backend queries Citus coordinator over Tailscale
- Coordinator fetches data from workers
- Response flows back through the same path
Cloud Infrastructure
We deploy on AWS EC2 instances running NixOS.
Instance Sizes
| Role | Instance | Why |
|---|---|---|
| Headscale | t3.micro | Low resource needs, just coordination |
| Ingress | t3.small | Handles TLS termination |
| Backend | t3.medium | Application processing |
| Citus Coordinator | t3.medium | Query routing |
| Citus Workers | t3.medium | Data storage and queries |
Security Groups
Because of Tailscale, our firewall rules are minimal:
Ingress node:
Inbound: 80 (HTTP), 443 (HTTPS), 41641/UDP (Tailscale)
Outbound: All (for Tailscale)
All other nodes:
Inbound: 41641/UDP (Tailscale only)
Outbound: All (for Tailscale)
No database ports, no backend ports exposed to the internet.
Deployment
All infrastructure is defined as NixOS modules in our repository.
| Module | Purpose |
|---|---|
| backend-service.nix | Backend application service |
| postgres-service.nix | Citus distributed PostgreSQL |
This means:
- Infrastructure is version controlled
- Deployments are reproducible
- Configuration changes are atomic
Handling Failures
If a backend server fails:
- nginx detects it via health checks
- Traffic is routed to healthy backends
- No customer impact
If a database worker fails:
- Queries to that shard will fail temporarily
- Worker can rejoin and resync
- Other shards continue working
If an entire region fails:
- Traffic shifts to the healthy region
- May need to promote replica workers
Security Summary
| Attack Vector | Mitigation |
|---|---|
| Network sniffing | All traffic encrypted (WireGuard) |
| Unauthorized server access | Tailscale requires authentication |
| Database exposed to internet | Database only accessible via mesh |
| DDoS on backend | Only nginx is public; rate limiting enabled |
Future Work
- Headscale NixOS module
- Tailscale client module
- nginx ingress module
- Secrets management with sops-nix
- Prometheus monitoring (stretch goal)
- Automated database backups
- deploy-rs for one-command deployments
Deployment Process
How we deploy DineHub across multiple regions with confidence
Philosophy
Our deployment process follows the principle of immutable infrastructure: once deployed, servers are never modified in place. Instead, we build new systems from scratch and atomically switch traffic to them. This eliminates “configuration drift” and makes deployments predictable and reversible.
The NixOS Approach
Traditional deployment processes often involve:
- SSHing into servers to run commands
- Patching files in place
- Hoping the application restarts correctly
- Manual rollback procedures when things go wrong
NixOS eliminates these risks through declarative configuration:
- Describe the desired state in Nix expressions
- Build the system locally or in CI
- Activate atomically — either the new system works completely, or the old system remains
- Rollback automatically if health checks fail
Deployment Pipeline
Stage 1: Build
Every deployment starts with building the new system configuration:
Developer Machine CI/CD (Garnix) Binary Cache
│ │ │
│── nix flake check ───────▶│ │
│ │── build packages ────────▶│
│ │ │── cache builds
│ │◀── success/failure ───────│
│◀── build results ─────────│ │
The build process:
- Compiles the backend to a GraalVM native image
- Bundles the frontend with Bun
- Runs all tests (unit, integration, property-based)
- Validates OpenAPI spec with Schemathesis property-based testing
- Validates NixOS configurations
- Caches successful builds for reuse
OpenAPI Validation
As part of the build pipeline, we validate that our implementation matches the OpenAPI specification:
- Specification-first: The OpenAPI spec in specs/openapi.yaml defines the API contract
- Auto-generation: Spring Boot controllers generate OpenAPI documentation from code
- Schemathesis testing: Property-based testing verifies implementation matches spec
- Linting: Redocly validates the spec for correctness and consistency
This ensures API consumers can rely on the documented behavior.
Stage 2: Test
Before deploying to production, we validate in isolated environments:
- VM Tests: Full system integration tests in NixOS VMs
- Staging Environment: Identical to production but with synthetic data
- Health Checks: Automated probes verify endpoints respond correctly
Stage 3: Deploy
Deployments use deploy-rs, which provides atomic activation:
┌─────────────────────────────────────────────────────────────┐
│ Deployment Flow │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. Build system closure locally │
│ └─ All packages + dependencies computed │
│ │
│ 2. Upload to target node │
│ └─ Nix copy-closure sends only missing packages │
│ │
│ 3. Activate new configuration │
│ └─ System switches to new generation │
│ │
│ 4. Run activation hook │
│ └─ Services restart with new configuration │
│ │
│ 5. Verify health checks │
│ └─ Confirm services respond correctly │
│ │
│ 6. On failure: automatic rollback │
│ └─ Previous generation restored │
│ │
└─────────────────────────────────────────────────────────────┘
Rolling Deployments
When deploying to multiple backend servers, we use a rolling deployment strategy:
- Take one server out of the load balancer
- Deploy new version to that server
- Verify health checks pass
- Return server to load balancer
- Repeat for remaining servers
This ensures:
- Zero downtime: At least some servers always available
- Gradual rollout: Issues caught before affecting all traffic
- Easy rollback: Can revert individual servers if problems arise
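The rolling strategy above can be simulated concisely. This sketch uses assumed names and callbacks; the real rollout is driven by deploy-rs and nginx health checks, not Python.

```python
def rolling_deploy(servers, deploy, healthy):
    """Deploy server-by-server; a failed health check reverts only that server."""
    results = {}
    for server in servers:
        # 1. Take the server out of the load balancer (the rest keep serving).
        deploy(server)                    # 2. Deploy the new version to it.
        if healthy(server):               # 3. Verify health checks pass.
            results[server] = "updated"   # 4. Return it to the load balancer.
        else:
            results[server] = "reverted"  # Roll back just this server.
    return results

print(rolling_deploy(
    ["backend-us-1", "backend-us-2", "backend-eu-1"],
    deploy=lambda s: None,                   # stand-in for the actual deploy step
    healthy=lambda s: s != "backend-eu-1",   # simulate one node failing its check
))
# {'backend-us-1': 'updated', 'backend-us-2': 'updated', 'backend-eu-1': 'reverted'}
```

Since only one server is out of rotation at a time, capacity dips by at most one node and a bad release never takes down the whole pool.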
Configuration Management
Secrets Handling
Sensitive configuration (database passwords, JWT keys) is managed separately from code:
- Encrypted at rest: Secrets stored encrypted in the repository using agenix
- Decrypted at deploy: Only the target machine can decrypt its secrets
- Never in Nix store: Unencrypted secrets never touch the world-readable Nix store
- Access controlled: Each secret specifies which users/services can read it
Environment-Specific Configuration
Different environments (dev, staging, production) have different needs:
- Development: Local database, debug logging, hot reloading
- Staging: Production-like but isolated, synthetic data
- Production: Multiple regions, real data, optimized settings
These differences are captured in Nix expressions rather than environment variables scattered across systems.
Disaster Recovery
Backup Strategy
The distributed database provides natural redundancy:
- Citus workers: Store shards across multiple nodes
- Cross-region replicas: Critical data replicated to other regions
- Point-in-time recovery: PostgreSQL WAL archiving enables restoration to any moment
Recovery Procedures
If a region fails completely:
- Traffic rerouting: DNS or ingress configuration points to healthy regions
- Database promotion: Replica in healthy region promoted to primary
- Re-provisioning: Failed region rebuilt from Nix configuration
- Data reconciliation: When failed region recovers, data synchronized
Monitoring Deployments
Deployment Metrics
We track deployment health through:
- Success rate: Percentage of deployments that activate without rollback
- Time to deploy: Duration from build start to activation complete
- Error rates: API errors, 5xx responses, failed health checks
- Resource usage: Memory, CPU, disk during and after deployment
Observability Integration
Deployments integrate with the monitoring stack:
- Prometheus: Metrics scraped before/after deployment
- Loki: Log aggregation to detect errors
- Grafana: Dashboards showing deployment impact
- Alerts: Automatic notifications for failed deployments
Continuous Deployment
Automated Pipeline
Changes flow automatically from commit to production:
Git Commit → CI Build → Tests Pass → Staging Deploy → Prod Deploy
│ │ │ │
▼ ▼ ▼ ▼
Build Integration Smoke Tests Rolling
Packages Tests Validation Rollout
Safety Mechanisms
Automation includes safety checks:
- Required checks: Build must pass before deployment
- Manual gates: Production deployments may require approval
- Canary analysis: New version serves small percentage of traffic first
- Automatic rollback: Failed health checks trigger immediate rollback
Development vs Production
Key Differences
| Aspect | Development | Production |
|---|---|---|
| Process management | process-compose | systemd |
| Database | Local PostgreSQL | Citus distributed cluster |
| Networking | localhost | Tailscale mesh |
| Secrets | Plain text files | agenix encrypted |
| Updates | Hot reloading | Atomic deployment |
| Monitoring | Console logs | Prometheus/Grafana |
Despite these differences, the same Nix expressions describe both environments. The differences are parameterized rather than being separate code paths.
Troubleshooting Deployments
Common Issues
- Build failures: Missing dependencies, compilation errors
- Health check failures: Services start but don’t respond correctly
- Configuration errors: Secrets or environment variables missing
- Network issues: Tailscale connectivity problems between nodes
Debug Commands
When deployments fail:
- Check service status: systemctl status backend
- View logs: journalctl -u backend -f
- Test health endpoints: curl localhost:8080/actuator/health
- Verify Tailscale: tailscale status
- Rollback if needed: nixos-rebuild switch --rollback
Future Improvements
- Blue/Green deployments: Instant cutover with the ability to roll back
- Feature flags: Deploy code disabled, enable gradually
- Chaos engineering: Intentionally break things to test resilience
- Automated capacity scaling: Add/remove nodes based on load
Networking Architecture
How DineHub nodes communicate securely across regions
Philosophy
Traditional network security relies on perimeter-based firewalls: block everything from the outside, trust everything on the inside. This model breaks down in cloud environments where:
- Services span multiple regions and cloud providers
- Containers and VMs come and go dynamically
- Internal traffic must still be protected
DineHub adopts zero-trust networking: encrypt everything, authenticate every connection, verify every request—regardless of whether it’s “internal” or “external.”
The Tailscale Mesh
What is Tailscale?
Tailscale is a mesh VPN built on WireGuard, a modern, high-performance VPN protocol. Unlike traditional VPNs that tunnel all traffic through a central gateway, Tailscale creates direct, encrypted connections between every pair of nodes.
Why Self-Hosted?
We use Headscale, an open-source implementation of the Tailscale control server:
- No vendor dependency: We control the coordination server
- Private infrastructure: No data flows through Tailscale’s SaaS
- Custom policies: Define our own access rules and ACLs
- Cost: No per-user licensing fees
Mesh Topology
┌─────────────────────┐
│ Internet │
└──────────┬──────────┘
│ HTTPS
▼
┌─────────────────────┐
│ Ingress Node │
│ (nginx :443) │
└──────────┬──────────┘
│
╔════════════════╪═════════════════╗
║ Tailscale Mesh Network (100.x) ║
║ All traffic encrypted via ║
║ WireGuard ║
╚════════════════╪═════════════════╝
│
┌──────────────────────┼──────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Backend-US │◀────▶│ Backend-EU │◀────▶│ Headscale │
│ │ │ │ │ Control │
│ • Port 8080 │ │ • Port 8080 │ │ • Port 443 │
│ • No public IP│ │ • No public IP│ │• No public IP │
└───────┬───────┘ └───────┬───────┘ └───────────────┘
│ │
└──────────────────────┼──────────────────────┐
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│DB Coordinator │ │ DB Worker │
│• Port 5432 │ │ • Port 5432 │
│• No public IP │ │ • No public IP │
└─────────────────┘ └─────────────────┘
Network Segmentation
Security Zones
We organize infrastructure into security zones based on exposure:
Public Zone (Ingress only)
- Exposed to internet on ports 80/443
- nginx reverse proxy terminates TLS
- All traffic forwarded to private zone via Tailscale
Private Zone (Application layer)
- Backend servers in multiple regions
- Only accessible via Tailscale (100.x.x.x addresses)
- No public IPs, no inbound firewall rules
Data Zone (Database layer)
- Citus coordinator and workers
- Same Tailscale-only access as private zone
- Additional PostgreSQL authentication
Control Plane (Headscale)
- Manages Tailscale authentication
- No user-facing services
- Minimal attack surface
Communication Patterns
Request Flow
When a customer places an order:
- Browser → Ingress: HTTPS over public internet
- Ingress → Backend: HTTP over Tailscale (encrypted by WireGuard)
- Backend → Database: PostgreSQL protocol over Tailscale
- Coordinator → Workers: Internal Citus protocol over Tailscale
Every hop is authenticated and encrypted—even traffic between nodes in the same data center.
Inter-Region Communication
When a US-based backend queries a database in EU:
- Backend sends query to Citus coordinator (via Tailscale)
- Coordinator routes to appropriate worker (may be in EU)
- Worker processes query, returns results
- Coordinator aggregates and returns to backend
Tailscale automatically establishes the most direct path, potentially bypassing the public internet entirely if nodes are in the same cloud provider’s backbone.
Service Discovery
DNS Resolution
Tailscale provides MagicDNS, automatically assigning DNS names to nodes:
- backend-us.internal → 100.64.0.1
- db-coordinator.internal → 100.64.0.2
- db-worker-1.internal → 100.64.0.3
Services reference each other by stable DNS names rather than IP addresses, simplifying configuration changes.
Health-Based Routing
nginx upstream configuration dynamically adjusts based on backend health:
- Health checks verify backends respond correctly
- Failed backends automatically removed from rotation
- New backends automatically added when healthy
- Geographic affinity: prefer local region when possible
Access Control
Tailscale ACLs
Access control lists define who can talk to whom:
Groups:
- ingress-nodes: ingress-01, ingress-02
- backend-nodes: backend-us, backend-eu
- database-nodes: db-coord, db-worker-1, db-worker-2
Rules:
- ingress-nodes → backend-nodes: allowed
- backend-nodes → database-nodes: allowed
- database-nodes → backend-nodes: denied
- public-internet → anything: denied (except ingress :443)
This “default deny” approach means new nodes can’t communicate until explicitly permitted.
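The evaluation logic behind default deny is simple to state. The sketch below is illustrative, not Headscale's actual ACL engine, and reuses the groups from the rules above: a connection succeeds only when an explicit group-to-group rule permits it.

```python
GROUPS = {
    "ingress-nodes": {"ingress-01", "ingress-02"},
    "backend-nodes": {"backend-us", "backend-eu"},
    "database-nodes": {"db-coord", "db-worker-1", "db-worker-2"},
}
# Only these directed group pairs are permitted; everything else is denied.
ALLOWED = {("ingress-nodes", "backend-nodes"), ("backend-nodes", "database-nodes")}

def group_of(node: str):
    return next((g for g, members in GROUPS.items() if node in members), None)

def can_connect(src: str, dst: str) -> bool:
    """Default deny: allow only explicitly permitted group pairs."""
    return (group_of(src), group_of(dst)) in ALLOWED

print(can_connect("ingress-01", "backend-us"))   # True
print(can_connect("db-worker-1", "backend-us"))  # False: databases may not call backends
print(can_connect("laptop", "db-coord"))         # False: unknown nodes match no group
```

Note the asymmetry: backends may open connections to databases, but not the reverse, which limits lateral movement if a database node is ever compromised.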
Authentication
Tailscale uses cryptographic identity:
- Node authentication: Each node has a unique private key
- User authentication: Nodes associated with user identity
- Multi-factor auth: Headscale can require MFA for node enrollment
- Certificate rotation: Keys automatically rotated
Performance Considerations
Latency
Tailscale adds minimal overhead:
- WireGuard encryption: ~1-2ms latency increase
- Direct connections: No central hub to traverse
- Protocol optimization: UDP-based, handles NAT traversal
For cross-region traffic, geographic latency dominates—Tailscale doesn’t add meaningful overhead.
Bandwidth
WireGuard is efficient:
- Small overhead: ~60 bytes per packet (vs. 150+ for IPSec)
- Modern crypto: ChaCha20-Poly1305 optimized for mobile/embedded
- No head-of-line blocking: UDP transport
Typical throughput exceeds 1 Gbps between cloud instances.
Reliability
The mesh topology provides natural redundancy:
- No single point of failure: If Headscale is down, existing connections continue
- Automatic reconnection: Nodes reconnect if paths change
- Path optimization: Routes around failed intermediate hops
Firewall Configuration
Minimal Rules
Because Tailscale handles authentication and encryption, firewall rules are simple:
Ingress Node:
- Inbound: 80/tcp, 443/tcp, 41641/udp (Tailscale)
- Outbound: All (for Tailscale mesh)
All Other Nodes:
- Inbound: 41641/udp (Tailscale only)
- Outbound: All (for Tailscale mesh)
No rules for application ports (8080, 5432)—Tailscale provides the connectivity.
Why This Works
Traditional firewall rules would require:
- Opening port 5432 between specific IP ranges
- Managing security groups per region
- Updating rules when topology changes
With Tailscale:
- Single UDP port for all connectivity
- Identity-based rather than IP-based rules
- Automatic updates as nodes join/leave
Troubleshooting
Common Issues
- Nodes not connecting: Check if enrolled in Tailscale network
- DNS not resolving: Verify MagicDNS enabled
- High latency: Check if direct connection established (relayed traffic is slower)
- Certificate errors: Node may need re-authentication
Diagnostic Commands
# Check Tailscale status
tailscale status
# Test connectivity to another node
tailscale ping backend-us
# Check network conditions (NAT type, relay latency)
tailscale netcheck
# Debug connection issues
tailscale bugreport
Future Enhancements
- IPv6 support: Native IPv6 addressing within mesh
- Subnet routers: Extend Tailscale to legacy infrastructure
- Access request workflows: Temporary access grants
- Audit logging: Comprehensive connection logs
- Network policies: Kubernetes-style micro-segmentation
Database Architecture
Distributed data storage with Citus and PostgreSQL
Philosophy
Traditional monolithic databases eventually hit scalability limits—either they run out of storage or can’t handle concurrent query volume. Scaling vertically (bigger servers) has practical limits and creates single points of failure.
DineHub adopts horizontal database scaling: distribute data across multiple servers, with each server handling a subset of the data. This provides both capacity and performance scaling.
Why Citus?
The Problem with Single Databases
As data grows, a single PostgreSQL server faces challenges:
- Storage limits: Hardware can only hold so much data
- Query performance: Large tables become slow to scan
- Concurrent load: Limited CPU/memory for parallel queries
- Availability: Single server failure means downtime
Citus Solution
Citus extends PostgreSQL to distribute tables across multiple servers:
- Horizontal scaling: Add servers as data grows
- Query parallelization: Complex queries execute across workers
- High availability: Replicas provide fault tolerance
- PostgreSQL compatible: Standard SQL, tools, and drivers work
Architecture
The Coordinator
The coordinator is the entry point for all database queries:
- Receives queries: Applications connect here like normal PostgreSQL
- Plans execution: Determines which workers hold relevant data
- Routes requests: Sends sub-queries to appropriate workers
- Aggregates results: Combines worker responses into final result
From the application perspective, the coordinator looks like a standard PostgreSQL server.
The Workers
Workers store actual data and execute queries:
- Hold shards: Each worker contains portions of distributed tables
- Process queries: Execute SQL against local data
- Return results: Send partial results back to coordinator
Workers are standard PostgreSQL servers with Citus extension installed.
Data Distribution
┌─────────────────────┐
│ Coordinator │
│ (Query planner & │
│ result aggregator)│
└──────────┬──────────┘
│
┌────────────────┼────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
│ │ │ │ │ │
│ Restaurants │ │ Restaurants │ │ Restaurants │
│ ID: 1-1000 │ │ ID: 1001-2000 │ │ ID: 2001-3000 │
│ │ │ │ │ │
│ Orders │ │ Orders │ │ Orders │
│ (same shard) │ │ (same shard) │ │ (same shard) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Sharding Strategy
Distribution Column
Tables are distributed by a distribution column:
- Restaurants: Distributed by restaurant_id
- Orders: Also distributed by restaurant_id (co-located with restaurants)
- Users: Distributed by user_id
This “co-location” means a restaurant’s orders reside on the same worker as the restaurant itself, making join queries efficient.
Shard Assignment
Citus uses consistent hashing to assign shards to workers:
- Hash of distribution column determines shard
- Each shard assigned to one primary worker
- Replicas may exist on other workers for availability
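The routing idea can be sketched in a few lines. This is an illustration only, not Citus's actual algorithm: Citus hashes the distribution column with PostgreSQL's internal hash functions and assigns hash ranges to a fixed number of shards (32 per table by default), and `shardFor` / `SHARD_COUNT` are hypothetical names:

```typescript
// Illustrative sketch of hash-based shard routing (not Citus's real
// hash function; Citus uses PostgreSQL's internal hashing plus hash ranges).
import { createHash } from "node:crypto";

const SHARD_COUNT = 32; // Citus creates 32 shards per distributed table by default

// Map a distribution-column value to a shard index deterministically.
function shardFor(distributionValue: number | string): number {
  const digest = createHash("sha256").update(String(distributionValue)).digest();
  return digest.readUInt32BE(0) % SHARD_COUNT; // first 4 bytes as an unsigned int
}

// Co-location: equal distribution values always hash to the same shard,
// so a restaurant and its orders (both keyed by restaurant_id) are stored
// on the same worker, keeping their joins local.
const restaurantShard = shardFor(42);
const orderShard = shardFor(42);
console.log(restaurantShard === orderShard); // true: co-located
```

Because the mapping depends only on the value, the coordinator can route a query like `WHERE restaurant_id = 42` to a single worker without consulting the data itself.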
When to Distribute
Not all tables should be distributed:
Distribute (large tables):
- Restaurants (millions of rows expected)
- Orders (billions of rows expected)
- Order items (billions of rows expected)
Reference tables (replicated to all workers):
- Cuisine types (small, lookup data)
- Configuration (rarely changes)
Reference tables are replicated to every worker, making joins fast but updates expensive.
Query Execution
Simple Queries
Single-row lookups by distribution column are fast:
SELECT * FROM orders WHERE restaurant_id = 42;
Coordinator hashes 42, determines which worker holds the shard, and routes directly to that worker.
Complex Queries
Aggregations and joins may involve multiple workers:
SELECT region, COUNT(*) FROM restaurants GROUP BY region;
Execution:
- Coordinator sends query to all workers
- Each worker counts local restaurants
- Workers return partial counts
- Coordinator merges the per-region partial counts and returns the final result
This parallel execution provides near-linear speedup with added workers.
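The scatter-gather execution above can be sketched as follows. The in-memory "workers" stand in for real worker nodes, and in reality the planner pushes SQL down to each worker rather than calling functions:

```typescript
// Sketch of the scatter-gather pattern used for distributed aggregation.
type Row = { region: string };

// Three workers, each holding a subset (shard) of the restaurants table.
const workers: Row[][] = [
  [{ region: "eu" }, { region: "us" }],
  [{ region: "us" }, { region: "us" }],
  [{ region: "apac" }, { region: "eu" }],
];

// Each worker computes a partial GROUP BY over its local shard.
function partialCount(rows: Row[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of rows) counts.set(r.region, (counts.get(r.region) ?? 0) + 1);
  return counts;
}

// The coordinator merges the partial results into the final answer.
function mergeCounts(partials: Map<string, number>[]): Map<string, number> {
  const total = new Map<string, number>();
  for (const p of partials)
    for (const [region, n] of p) total.set(region, (total.get(region) ?? 0) + n);
  return total;
}

const result = mergeCounts(workers.map(partialCount));
console.log(result.get("us")); // 3
```

The partial-count step runs on all workers in parallel; only the small merged maps travel back to the coordinator, which is why speedup is near-linear as workers are added.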
Cross-Shard Joins
Joins between distributed tables require care:
Efficient (co-located join):
SELECT * FROM restaurants r
JOIN orders o ON r.id = o.restaurant_id
WHERE r.id = 42;
Both tables share distribution column, so data is on same worker.
Less efficient (repartition join):
SELECT * FROM orders o
JOIN users u ON o.customer_id = u.id;
Different distribution columns require data movement between workers.
High Availability
Replication Strategy
Each shard has replicas on different workers:
- Primary: Handles reads and writes
- Standby: Receives streaming replication, takes over if primary fails
- Cross-region: Replicas in other regions for disaster recovery
Failover Process
If a worker fails:
- Detection: Health checks notice unresponsive worker
- Promotion: Standby replica promoted to primary
- Reconfiguration: Coordinator routes queries to new primary
- Recovery: Failed worker repaired, rejoins as replica
This process is automatic—applications don’t need to change connection strings.
Split-Brain Prevention
Citus uses consensus mechanisms to prevent split-brain scenarios:
- Only one primary per shard at a time
- Writes blocked until consensus achieved
- Clients may see brief unavailability during failover
Performance Optimization
Query Planning
The coordinator analyzes queries to optimize distribution:
- Pushdown: Move filters and aggregations to workers
- Pruning: Skip workers that can’t have relevant data
- Parallelization: Split work across multiple workers
Index Strategy
Index recommendations change with distribution:
- Distribution column: Always indexed (used for routing)
- Join columns: Index if frequently joined
- Filter columns: Index if selective filters common
- Coordinator: May need indexes for final aggregation
Monitoring
Key metrics for distributed databases:
- Shard imbalance: Are workers evenly loaded?
- Query latency: Coordinator vs worker time breakdown
- Replication lag: Standby replicas behind primary?
- Connection pooling: Managing thousands of connections
Operational Considerations
Adding Workers
Scale out by adding workers:
- Provision new worker nodes
- Run citus_add_node() to add the node to the cluster
- Existing data doesn’t automatically redistribute
- New data uses new workers
- Optional: Rebalance shards for even distribution
Schema Changes
Schema modifications propagate to all workers:
-- This runs on coordinator and all workers
ALTER TABLE restaurants ADD COLUMN rating FLOAT;
Citus handles distribution automatically—DDL just works.
Backup and Recovery
Backup strategies for distributed data:
- Logical backups: pg_dump on coordinator captures distributed schema
- Per-worker backups: Physical backups of each worker’s data
- Point-in-time recovery: WAL archiving for granular recovery
- Cross-region replicas: Live replicas for disaster recovery
Trade-offs
Benefits
- Scalability: Add capacity by adding servers
- Performance: Parallel query execution
- Availability: Replicas provide fault tolerance
- PostgreSQL compatible: Familiar SQL and tooling
Complexity
- Query planning: Must consider distribution in query design
- Operational overhead: More servers to monitor and maintain
- Transaction limitations: Cross-shard transactions have overhead
- Migration: Existing applications may need modification
When Not to Use
Citus may be overkill for:
- Small datasets (< 100GB)
- Simple workloads (mostly single-row lookups)
- Strong consistency requirements across shards (use single PostgreSQL)
DineHub’s expected scale (thousands of restaurants, millions of orders) justifies the complexity.
Future Enhancements
- Citus MX: Multi-coordinator for higher availability
- Columnar storage: Analytics queries on compressed columnar data
- Automatic rebalancing: Dynamic shard redistribution
- Read replicas: Offload read traffic to standbys
- Global indexes: Cross-shard indexes for unique constraints
Security Architecture
Defense in depth for a multi-region cloud application
Philosophy
Security is not a feature you add at the end—it’s a property that emerges from careful design at every layer. DineHub follows defense in depth: multiple independent security mechanisms, each sufficient on its own, so that if one fails, others still protect the system.
We assume attackers will eventually breach some defenses. The goal is to make that breach useless—to limit what they can access and detect the intrusion quickly.
Security Layers
┌──────────────────────────────────────────────────────────────┐
│ Layer 7: Application Security │
│ • Authentication (JWT) │
│ • Authorization (RBAC) │
│ • Input validation │
│ • Output encoding │
├──────────────────────────────────────────────────────────────┤
│ Layer 6: API Security │
│ • HTTPS/TLS │
│ • Rate limiting │
│ • CORS policies │
│ • API versioning │
├──────────────────────────────────────────────────────────────┤
│ Layer 5: Network Security │
│ • Zero-trust mesh (Tailscale) │
│ • Firewall rules │
│ • Private IPs only │
│ • No lateral movement │
├──────────────────────────────────────────────────────────────┤
│ Layer 4: Host Security │
│ • Immutable infrastructure │
│ • Minimal attack surface │
│ • Automatic updates │
│ • Read-only filesystems │
├──────────────────────────────────────────────────────────────┤
│ Layer 3: Secrets Management │
│ • Encryption at rest │
│ • No secrets in code │
│ • Rotation policies │
│ • Audit logging │
├──────────────────────────────────────────────────────────────┤
│ Layer 2: Access Control │
│ • Least privilege │
│ • Multi-factor authentication │
│ • Role-based permissions │
│ • Session management │
├──────────────────────────────────────────────────────────────┤
│ Layer 1: Physical Security │
│ • Cloud provider guarantees │
│ • Multi-region distribution │
│ • Encrypted storage │
└──────────────────────────────────────────────────────────────┘
Application Security
Authentication
We use JWT (JSON Web Tokens) for authentication:
- Stateless: No server-side session storage
- Self-contained: Token carries user identity
- Expirable: Short-lived tokens (24 hours)
- Revocable: Token blacklist for logout
JWTs are signed with a server-side secret. Attackers can’t forge tokens without the secret, and expired tokens are automatically rejected.
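A minimal sketch of how signing and verification work, assuming HS256 and illustrative secret/claim values; production code would use a vetted JWT library rather than hand-rolled crypto:

```typescript
// Minimal JWT sign/verify sketch with HMAC-SHA256 (illustration only).
import { createHmac, timingSafeEqual } from "node:crypto";

const b64url = (buf: Buffer) => buf.toString("base64url");

function sign(payload: object, secret: string): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const body = b64url(Buffer.from(JSON.stringify(payload)));
  const sig = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  return `${header}.${body}.${b64url(sig)}`;
}

function verify(token: string, secret: string): object | null {
  const [header, body, sig] = token.split(".");
  const expected = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(sig, "base64url");
  // Constant-time comparison prevents timing attacks on the signature.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(body, "base64url").toString());
  // Reject expired tokens (exp is seconds since epoch, per RFC 7519).
  if (claims.exp !== undefined && claims.exp < Date.now() / 1000) return null;
  return claims;
}

const token = sign({ sub: "user-123", exp: Date.now() / 1000 + 86_400 }, "demo-secret");
console.log(verify(token, "demo-secret") !== null); // true: signature and expiry OK
console.log(verify(token, "wrong-secret"));         // null: forged token rejected
```

Without the secret an attacker cannot produce a valid signature, and the expiry check bounds the damage of a leaked token to its lifetime.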
Authorization
Role-Based Access Control (RBAC) defines what users can do:
- USER: Browse restaurants, place orders, manage own orders
- RESTAURANT_OWNER: All USER permissions + manage owned restaurants + view restaurant orders
- ADMIN: Full system access
Permissions are enforced at the API endpoint level. Even if a user knows an endpoint exists, they can’t access resources they don’t own.
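A sketch of such an endpoint-level check combining role and ownership; the role names come from the list above, while the function and field names are illustrative:

```typescript
// RBAC check with ownership enforcement (illustrative names).
type Role = "USER" | "RESTAURANT_OWNER" | "ADMIN";

interface Principal { id: string; role: Role; }
interface Restaurant { id: string; ownerId: string; }

// ADMIN can do anything; RESTAURANT_OWNER only for restaurants they own.
function canManageRestaurant(p: Principal, r: Restaurant): boolean {
  if (p.role === "ADMIN") return true;
  return p.role === "RESTAURANT_OWNER" && r.ownerId === p.id;
}

const alice: Principal = { id: "a1", role: "RESTAURANT_OWNER" };
const bob: Principal = { id: "b2", role: "USER" };
const bistro: Restaurant = { id: "r9", ownerId: "a1" };

console.log(canManageRestaurant(alice, bistro)); // true: she owns it
console.log(canManageRestaurant(bob, bistro));   // false: wrong role, not owner
```

Note that knowing the restaurant's ID is not enough: the check compares the authenticated identity against the resource's owner, so enumeration of endpoints or IDs gains an attacker nothing.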
Input Validation
All user input is treated as untrusted:
- Type validation: DTOs with strict typing
- Range checks: Numeric values within expected ranges
- Length limits: Prevent buffer overflow attempts
- Format validation: Emails, UUIDs match expected patterns
- Sanitization: Remove or escape dangerous characters
Validation happens at the API boundary, before data reaches business logic.
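A boundary-validation sketch for a hypothetical order-item DTO, showing type, format, length, and range checks before the value reaches business logic (field names and limits are illustrative):

```typescript
// Validate untrusted input at the API boundary (illustrative DTO).
interface OrderItemInput { restaurantId: string; name: string; quantity: number; }

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function validateOrderItem(raw: unknown): OrderItemInput {
  if (typeof raw !== "object" || raw === null) throw new Error("body must be an object");
  const { restaurantId, name, quantity } = raw as Record<string, unknown>;
  // Format validation: IDs must look like UUIDs.
  if (typeof restaurantId !== "string" || !UUID_RE.test(restaurantId))
    throw new Error("restaurantId must be a UUID");
  // Type + length limits on free-text fields.
  if (typeof name !== "string" || name.length === 0 || name.length > 200)
    throw new Error("name must be 1-200 characters");
  // Range check on numeric fields.
  if (typeof quantity !== "number" || !Number.isInteger(quantity) ||
      quantity < 1 || quantity > 100)
    throw new Error("quantity must be an integer in [1, 100]");
  return { restaurantId, name, quantity };
}

const item = validateOrderItem({
  restaurantId: "123e4567-e89b-12d3-a456-426614174000",
  name: "Pad Thai",
  quantity: 2,
});
console.log(item.quantity); // 2
```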
Network Security
Zero-Trust Architecture
We don’t trust the network—even our internal network:
- Encryption everywhere: All traffic encrypted via WireGuard (Tailscale)
- Mutual authentication: Both sides verify each other’s identity
- No implicit trust: Every connection requires explicit authorization
- Micro-segmentation: Services can only talk to required dependencies
Network Segmentation
Infrastructure divided into security zones:
Public zone (ingress only):
- Exposed to internet
- Minimal attack surface (nginx only)
- All traffic forwarded to private zone
Private zone (application servers):
- No public IPs
- Access only via Tailscale
- Can only initiate connections to data zone
Data zone (databases):
- No external access except from application servers
- Additional PostgreSQL authentication
- Encrypted storage volumes
This segmentation means compromising one zone doesn’t automatically grant access to others.
Firewall Strategy
Traditional firewalls rely on IP whitelisting. We use identity-based access control:
- Single port: Tailscale uses one UDP port for all connectivity
- No application ports exposed: Database and application ports not visible to network
- Cryptographic identity: Nodes authenticated by certificates, not IP addresses
Secrets Management
Secret Lifecycle
Secrets follow a strict lifecycle:
- Generation: Cryptographically secure random generation
- Distribution: Encrypted transmission to target systems
- Storage: Encrypted at rest, never in version control
- Usage: Runtime injection, not baked into images
- Rotation: Regular rotation with overlap period
- Revocation: Immediate invalidation on compromise
Storage
Secrets are stored encrypted using agenix:
- Encryption: Age encryption with recipient public keys
- Repository: Encrypted files stored in Git
- Decryption: Only target machines can decrypt (private keys on machines)
- Access: Each secret specifies authorized users/services
This means:
- Developers can see encrypted blobs but not plaintext
- CI/CD can deploy encrypted secrets but not read them
- Production machines decrypt their own secrets at runtime
- Compromised Git repo doesn’t expose secrets
Secret Types
Different secrets have different handling:
- Database credentials: Rotated monthly, stored per-environment
- JWT signing keys: Rotated quarterly, symmetric for performance
- API keys: Rotated on employee departure, tracked usage
- TLS certificates: Auto-renewed via Let’s Encrypt
Host Security
Immutable Infrastructure
Servers are immutable—never modified after deployment:
- No SSH access: Configuration via Nix, not manual commands
- Read-only root: Root filesystem mounted read-only
- Ephemeral storage: Local state treated as disposable
- Reproducible builds: Same Nix expression always produces same system
If a server is compromised, we don’t try to clean it—we replace it.
Attack Surface Reduction
Minimal software installed on each server:
- Single purpose: Each server runs one service
- No shells: No bash, no sshd (except for debugging)
- No compilers: No gcc, no development tools
- Minimal services: Only required systemd units
Automatic Updates
Security patches apply automatically:
- NixOS updates: System packages updated via Nix
- Rolling updates: New generation activated atomically
- Rollback: Automatic fallback if updates fail
- Rebootless: Most updates don’t require restart
Secrets in Code
What Never Goes in Code
These must never be committed to version control:
- Database passwords
- API keys (Stripe, AWS, etc.)
- JWT signing secrets
- TLS private keys
- Encryption keys
- Credentials for external services
What Can Go in Code
These are safe to commit:
- Public API endpoints
- Non-sensitive configuration (timeouts, limits)
- Default values that get overridden
- Encryption of secrets (public keys)
Detection
Pre-commit hooks and CI checks scan for:
- High-entropy strings (potential secrets)
- Known secret patterns (AWS keys, JWTs)
- Hardcoded passwords
- Private keys
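High-entropy detection can be sketched with Shannon entropy; the threshold and length cutoff below are illustrative, not the values any particular scanner uses:

```typescript
// Sketch of a high-entropy string detector of the kind secret scanners use.
function shannonEntropy(s: string): number {
  const freq = new Map<string, number>();
  for (const ch of s) freq.set(ch, (freq.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of freq.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p); // bits of information per character
  }
  return h;
}

// Long tokens with entropy above ~4 bits/char look like random keys;
// English-like identifiers score noticeably lower.
function looksLikeSecret(token: string): boolean {
  return token.length > 20 && shannonEntropy(token) > 4.0;
}

console.log(looksLikeSecret("configuration_timeout_value"));      // false
console.log(looksLikeSecret("AKx9fQ2mZ7pL0sVwYbT3rNhJ5dGcE1aU")); // true
```

Entropy checks complement pattern matching: known prefixes (AWS keys, JWTs) are caught by regexes, while entropy catches random-looking blobs that match no known pattern.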
Incident Response
Detection
Security events generate logs:
- Authentication failures: Failed login attempts
- Authorization failures: Access denied errors
- Anomalous patterns: Unusual traffic, query patterns
- System calls: Auditd logs for privileged operations
Logs aggregate in Loki for analysis and alerting.
Response Playbook
If compromise suspected:
- Isolate: Remove compromised nodes from load balancer
- Preserve: Capture logs and memory dumps before termination
- Analyze: Determine scope of compromise
- Rotate: Revoke and regenerate all potentially exposed secrets
- Reimage: Replace compromised servers with fresh instances
- Monitor: Enhanced monitoring for recurrence
Recovery
Recovery is fast because infrastructure is code:
- Reprovision: New servers from Nix configuration in minutes
- Data restore: From encrypted backups
- Secret rotation: Automated via deploy pipeline
- Verification: Health checks confirm clean state
Compliance Considerations
While not formally certified, DineHub’s design supports:
Data Protection
- Encryption at rest: Database volumes encrypted
- Encryption in transit: TLS 1.3 for all external traffic
- Access logging: Audit trails for data access
- Right to deletion: Data can be purged per request
Security Standards
Architecture aligns with:
- OWASP Top 10: Addressed through input validation, authentication, etc.
- CIS Benchmarks: NixOS configuration follows hardening guidelines
- NIST Cybersecurity Framework: Identify, protect, detect, respond, recover
Threat Model
Assumed Threats
We design against these threats:
- External attackers: Attempting to breach perimeter
- Insider threats: Malicious or compromised employees
- Supply chain: Compromised dependencies or build tools
- Cloud provider: Curious cloud administrators
- Physical theft: Stolen laptops with production access
Not Addressed
Out of scope for this project:
- Nation-state actors: Advanced persistent threats with unlimited resources
- Social engineering: Phishing, pretexting (user training issue)
- Denial of wallet: Resource exhaustion attacks (cloud billing limits)
Security Checklist
For new features, verify:
- Input validation on all user-supplied data
- Authentication required for sensitive operations
- Authorization checks enforce ownership/roles
- No secrets in code (use agenix)
- Database queries parameterized (no SQL injection)
- Output encoded (XSS prevention)
- Rate limiting prevents abuse
- Logging for security-relevant events
- Tests include security scenarios
Future Enhancements
- Web Application Firewall: Rule-based request filtering
- Bug bounty program: External security researchers
- Penetration testing: Annual third-party assessment
- Security headers: CSP, HSTS, X-Frame-Options
- Certificate pinning: Prevent MITM attacks
- Behavioral analytics: ML-based anomaly detection
Frontend Architecture
Design philosophy and architectural patterns for the user interface layer
Philosophy
The frontend follows a modern React SPA architecture designed for developer productivity, type safety, and runtime performance. We prioritize declarative UI patterns, compile-time optimizations, and minimal runtime overhead.
Technology Choices
Why Bun?
We chose Bun over Node.js for three primary reasons:
- Unified toolchain: Bun replaces the npm/webpack/babel toolchain with a single, fast executable. This reduces configuration complexity and ensures all tools (bundler, test runner, package manager) work together seamlessly.
- Performance: Bun’s bundler is significantly faster than webpack or Vite for our use case, reducing development feedback loops.
- Built-in TypeScript: No additional compilation step required—TypeScript is first-class.
Why React 19?
React 19 brings several architectural improvements:
- Concurrent rendering by default: Better perceived performance through prioritized updates
- Automatic batching: Fewer re-renders without manual optimization
- Server components: Foundation for future server-side rendering if needed
- Actions: Simplified form handling and mutations
Why Tailwind CSS v4?
Tailwind v4 represents a significant architectural shift:
- PostCSS-free: No build-time CSS processing pipeline, reducing build complexity
- CSS-first configuration: Theme configuration lives in CSS rather than JavaScript
- Zero-runtime: All styles are generated at build time
- Predictable bundle size: Only used utilities are included
Application Structure
The frontend organizes code by responsibility rather than by file type:
API Layer
The API layer follows a repository pattern abstraction. Rather than making raw HTTP calls throughout components, we provide domain-specific API objects that encapsulate:
- Endpoint paths and HTTP methods
- Request/response type definitions
- Error handling conventions
- Authentication header injection
This pattern means components work with semantic methods like restaurantsApi.create(data) rather than raw fetch() calls, making the codebase more maintainable and easier to test.
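A sketch of what such a domain-specific API object can look like. The `restaurantsApi.create` method is named in the text above; the base URL, storage key, and helper names are assumptions:

```typescript
// Repository-pattern API layer: domain methods wrapping a shared request helper.
interface Restaurant { id: string; name: string; }
type RestaurantInput = Omit<Restaurant, "id">;

const BASE_URL = "/api/v1"; // assumed endpoint prefix

// Inject the JWT from localStorage when present (no-op on the server/tests).
function authHeaders(): Record<string, string> {
  const token = (globalThis as any).localStorage?.getItem("token");
  return token ? { Authorization: `Bearer ${token}` } : {};
}

async function request<T>(
  path: string,
  init: { method?: string; body?: string } = {},
): Promise<T> {
  const res = await fetch(`${BASE_URL}${path}`, {
    ...init,
    headers: { "Content-Type": "application/json", ...authHeaders() },
  });
  // Throw typed errors here so components never inspect raw responses.
  if (!res.ok) throw new Error(`HTTP ${res.status} on ${path}`);
  return res.json() as Promise<T>;
}

// Components call semantic methods instead of raw fetch().
const restaurantsApi = {
  list: () => request<Restaurant[]>("/restaurants"),
  create: (data: RestaurantInput) =>
    request<Restaurant>("/restaurants", { method: "POST", body: JSON.stringify(data) }),
};
```

Because all HTTP concerns (paths, headers, error translation) live in one place, tests can stub `restaurantsApi` without mocking the network.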
State Management
We use a hybrid state approach:
- Server state (data from API): Managed by TanStack Query, which handles caching, background refetching, and optimistic updates automatically
- Client state (UI-only state): Managed by React’s built-in useState and Context API
- Authentication state: Global context provider that persists to localStorage
This separation prevents the common anti-pattern of over-fetching or storing server data in global state where it can become stale.
Component Architecture
Components are organized into three tiers:
- Page components: Route-level components that compose domain-specific UI
- Feature components: Reusable components specific to a domain (e.g., RestaurantCard)
- UI primitives: Generic, unstyled components from shadcn/ui (Button, Card, Input)
This three-tier architecture ensures separation of concerns: pages handle routing and data fetching, feature components handle domain logic, and primitives handle accessibility and styling.
Routing Architecture
The routing layer implements route guards for authentication:
- Public routes: Accessible to all users (landing page, login, signup)
- Protected routes: Require valid JWT token (dashboard, order placement)
- Guest-only routes: Redirect authenticated users away (login page when already logged in)
Route guards are implemented as wrapper components that check authentication state and redirect accordingly. This keeps authentication logic centralized and reusable.
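The guard decision itself reduces to a small pure function, sketched here without JSX (the real guards are wrapper components; names are illustrative):

```typescript
// Route-guard decision logic: given the guard kind and auth state,
// return a redirect target, or null to render the requested route.
type Guard = "public" | "protected" | "guestOnly";

function redirectFor(guard: Guard, isAuthenticated: boolean): string | null {
  if (guard === "protected" && !isAuthenticated) return "/login";    // must log in first
  if (guard === "guestOnly" && isAuthenticated) return "/dashboard"; // already logged in
  return null; // render the requested route
}

console.log(redirectFor("protected", false)); // "/login"
console.log(redirectFor("guestOnly", true));  // "/dashboard"
console.log(redirectFor("public", false));    // null
```

Keeping the decision pure makes it trivially unit-testable; the wrapper components only read auth context and perform the redirect.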
Authentication Flow
The authentication system uses JWT tokens stored in localStorage with the following flow:
- User submits credentials via login form
- Backend validates and returns JWT + user metadata
- Token is stored in localStorage and Context state
- Subsequent API calls include token in Authorization header
- Protected routes check for token presence before rendering
The token has a 24-hour expiration. On app load, the Context provider checks for an existing token and restores the authentication state, providing a seamless user experience.
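The restore-on-load step can be sketched as follows. The client only decodes the payload to read the expiry (it cannot verify the signature; the server does that on every request), and the claim and field names are illustrative:

```typescript
// Restore a session from a stored JWT on app load (illustrative names).
interface Session { userId: string; token: string; }

function restoreSession(stored: string | null, nowSec: number): Session | null {
  if (!stored) return null; // nothing persisted: start logged out
  try {
    const [, payload] = stored.split(".");
    // Decode the base64url payload; no signature check on the client.
    const claims = JSON.parse(Buffer.from(payload, "base64url").toString());
    if (typeof claims.exp === "number" && claims.exp < nowSec) return null; // expired
    return { userId: claims.sub, token: stored };
  } catch {
    return null; // malformed token: treat as logged out
  }
}
```

Dropping expired or malformed tokens here avoids a flash of "logged in" UI followed by a 401 from the first API call.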
Build System Design
The build system is designed around Bun’s native bundler with a custom wrapper script that:
- Discovers entry points automatically (HTML files in src/)
- Applies Tailwind CSS transformation
- Generates linked sourcemaps for debugging
- Copies static assets to the output directory
- Reports bundle sizes for optimization visibility
The key architectural decision here is convention over configuration: the build script automatically finds entry points rather than requiring a configuration file, making the build process easier to understand and modify.
Styling Philosophy
Our styling approach follows utility-first CSS with semantic theming:
Utility-First
Instead of writing custom CSS classes, we compose utility classes directly in the JSX. This approach:
- Eliminates the need to name CSS classes
- Makes styling changes explicit in version control
- Prevents unused CSS from accumulating
- Enables rapid prototyping
Dark Mode
Dark mode is implemented via CSS custom properties and Tailwind’s dark: variants. The theme toggle adds/removes a dark class on the document root, which triggers CSS custom property updates throughout the component tree.
Component Styling
UI primitives (from shadcn/ui) are built on Radix UI for accessibility and Tailwind for styling. They accept a className prop for composition, allowing parent components to override or extend styles without modifying the primitive.
Data Fetching Patterns
Data fetching follows a stale-while-revalidate pattern:
- Component mounts and requests data
- TanStack Query checks the cache first
- If cached data exists (even if stale), it’s shown immediately
- Background request fetches fresh data
- UI updates with fresh data when available
This pattern provides instant UI feedback while ensuring data freshness, eliminating loading spinners for cached data.
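The pattern's core can be sketched in a tiny cache; TanStack Query adds staleness windows, deduplication, and retries on top of this idea, so treat the class below as illustration only:

```typescript
// Minimal stale-while-revalidate cache sketch.
type Fetcher<T> = () => Promise<T>;

class SwrCache<T> {
  private cache = new Map<string, T>();

  // Return cached (possibly stale) data immediately when present, and
  // always kick off a background refresh that updates the cache.
  get(key: string, fetcher: Fetcher<T>, onFresh: (v: T) => void): T | undefined {
    const stale = this.cache.get(key);
    fetcher().then((fresh) => {
      this.cache.set(key, fresh);
      onFresh(fresh); // UI re-renders with fresh data
    });
    return stale; // undefined on first load: show a loading state
  }
}
```

The first call returns `undefined` (cache miss, spinner shown once); every later call returns instantly from cache while the refresh races in the background.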
Animation Strategy
Animations are implemented with Framer Motion for:
- Page transitions: Smooth fade/slide between routes
- Micro-interactions: Button hover states, loading indicators
- Layout animations: List reordering, expanding panels
We avoid CSS animations for complex sequences and JavaScript animations for simple hover states—Framer Motion provides the right abstraction for component-level animations while deferring to CSS for simple transitions.
Error Handling
Error handling follows a progressive enhancement model:
- API layer: Catches HTTP errors and throws typed Error objects
- TanStack Query: Catches errors and provides error state to components
- Components: Display error UI or retry controls
- Global boundary: Unhandled errors caught by error boundary showing fallback UI
This layered approach ensures errors are handled at the appropriate level of abstraction.
Development Experience
The frontend architecture prioritizes developer experience through:
- Hot reloading: Bun’s dev server provides instant updates
- Type safety: Full TypeScript coverage with strict mode
- IDE integration: Tailwind IntelliSense provides autocomplete for utility classes
- Consistent formatting: treefmt enforces formatting across the codebase
Backend Architecture
Design philosophy and architectural patterns for the service layer
Philosophy
The backend follows domain-driven design principles with a focus on type safety, explicit contracts, and defensive programming. We prioritize compile-time safety over runtime flexibility, using Java’s type system to prevent errors before they reach production.
Technology Choices
Why Spring Boot?
Spring Boot provides an opinionated framework that balances productivity with flexibility:
- Ecosystem maturity: Comprehensive libraries for security, data access, and testing
- Production-ready: Built-in metrics, health checks, and configuration management
- Community standards: Wide adoption means extensive documentation and tooling
Why GraalVM Native Image?
We compile the backend to a native executable rather than running on the JVM:
- Startup time: Sub-second startup vs. JVM warmup time, critical for auto-scaling scenarios
- Memory efficiency: Smaller memory footprint enables running on smaller instances
- Self-contained: Single executable file with no runtime dependencies
- Container-friendly: Smaller Docker images and faster cold starts
The trade-off is longer build times and some reflection limitations, which we mitigate with explicit configuration.
Why PostgreSQL with Citus?
Our database choice reflects the multi-region requirements:
- PostgreSQL: Proven reliability, ACID compliance, and extensive feature set
- Citus extension: Enables horizontal scaling by distributing data across multiple nodes
- Compatibility: Standard PostgreSQL protocol works with all existing tools
Domain Architecture
The backend organizes code around business domains rather than technical layers:
Domain Structure
Each domain (User, Restaurant, Order) contains:
- Entity: JPA-mapped data model
- Repository: Data access abstraction
- Controller: HTTP request handling
- DTOs: Data transfer objects for API contracts
This structure keeps related code together, making it easier to understand domain boundaries and modify functionality.
Entity Relationships
The domain model centers around three main entities:
- User: Authentication and identity
- Restaurant: Business listings with ownership
- Order: Transactions linking users to restaurants
The relationships are:
- User owns Restaurants (one-to-many)
- User places Orders (one-to-many)
- Restaurant receives Orders (one-to-many)
- Order contains OrderItems (embedded collection)
Value Objects vs Entities
We distinguish between entities (have identity and lifecycle) and value objects (defined by attributes):
- Entities: User, Restaurant, Order (have UUID identity)
- Value Objects: OrderItem (no identity, belongs to Order)
This distinction guides persistence decisions—entities get their own tables, value objects are embedded.
Security Architecture
Authentication Model
The system uses JWT (JSON Web Tokens) for stateless authentication:
- Tokens are signed with a server-side secret
- Tokens contain user identity and expiration
- Clients send tokens in Authorization header
- Server validates signature without database lookup
This stateless approach scales horizontally without session affinity requirements.
Authorization Patterns
Authorization follows role-based access control (RBAC):
- ROLE_USER: Standard customer permissions
- ROLE_RESTAURANT_OWNER: Can manage owned restaurants and view their orders
- ROLE_ADMIN: Full system access
Roles are checked at the controller method level using Spring Security’s method security annotations.
Security Boundaries
The security model defines clear boundaries:
- Public endpoints: No authentication required (login, registration, restaurant listing)
- Authenticated endpoints: Valid JWT required (order placement)
- Ownership endpoints: JWT + resource ownership check (order modification)
- Admin endpoints: JWT + ROLE_ADMIN required (system management)
API Design Principles
RESTful Conventions
The API follows REST conventions with some pragmatic exceptions:
- Resources map to domain entities (/restaurants, /orders)
- HTTP verbs indicate action (GET, POST, PUT, DELETE)
- Status codes convey outcome (200, 201, 400, 401, 403, 404)
- Plural nouns for collections (/restaurants not /restaurant)
Request/Response Contracts
API contracts are defined through DTOs (Data Transfer Objects):
- Input DTOs: Define valid request shape and validation rules
- Output DTOs: Control what data is exposed
- Validation annotations: Jakarta Bean Validation for input sanitization
This DTO pattern decouples the internal domain model from the public API, allowing independent evolution.
Error Handling
Error responses follow a consistent structure:
- HTTP status code indicates error category
- Response body contains human-readable message
- Validation errors include field-level details
The GlobalExceptionHandler translates exceptions to appropriate HTTP responses, ensuring clients receive consistent error formats.
Data Access Patterns
Repository Abstraction
Data access follows the Repository pattern through Spring Data JPA:
- Interfaces extend JpaRepository for CRUD operations
- Method names derive queries automatically (findByIsActiveTrue)
- Custom queries use @Query annotation for complex SQL
- Pagination returns Spring’s Page abstraction
This abstraction means controllers work with domain objects rather than SQL, making the code more testable and database-agnostic.
Transaction Boundaries
Transactions are managed at the service layer:
- Spring’s @Transactional annotation marks business operations
- Read operations use read-only transactions for optimization
- Write operations ensure atomicity across multiple database calls
Validation Strategy
Input validation occurs at multiple layers:
- DTO annotations: Jakarta Bean Validation (@NotNull, @Email, etc.)
- Controller: @Valid annotation triggers validation
- Service layer: Business rule validation
- Database constraints: Final integrity enforcement
This defense-in-depth approach catches errors as early as possible.
Order Lifecycle Design
State Machine
Orders follow a defined state machine with six states:
- PENDING: Initial state when order is created
- CONFIRMED: Restaurant has acknowledged the order
- PREPARING: Food is being prepared
- READY: Order is ready for pickup/delivery
- DELIVERED: Order has been delivered to customer
- CANCELLED: Order was cancelled
State Transition Rules
Not all transitions are valid:
- PENDING can transition to CONFIRMED, PREPARING, or CANCELLED
- CONFIRMED can transition to PREPARING
- PREPARING can transition to READY
- READY can transition to DELIVERED
- CANCELLED is terminal
These rules are enforced in the service layer, preventing invalid state changes.
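Encoding the rules above as data makes the enforcement a one-line lookup. This is a sketch (the real backend enforces this in Java service code), with the transition table taken directly from the list above:

```typescript
// Order state machine: valid transitions encoded as data.
type OrderStatus =
  | "PENDING" | "CONFIRMED" | "PREPARING" | "READY" | "DELIVERED" | "CANCELLED";

const TRANSITIONS: Record<OrderStatus, OrderStatus[]> = {
  PENDING:   ["CONFIRMED", "PREPARING", "CANCELLED"],
  CONFIRMED: ["PREPARING"],
  PREPARING: ["READY"],
  READY:     ["DELIVERED"],
  DELIVERED: [], // no outgoing transitions listed
  CANCELLED: [], // terminal per the rules above
};

// Reject any transition not in the table.
function transition(from: OrderStatus, to: OrderStatus): OrderStatus {
  if (!TRANSITIONS[from].includes(to))
    throw new Error(`invalid transition ${from} -> ${to}`);
  return to;
}

console.log(transition("PENDING", "CONFIRMED")); // "CONFIRMED"
// transition("DELIVERED", "PENDING") would throw
```

A data-driven table keeps the business rules auditable in one place and makes adding a state a one-line change plus its allowed transitions.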
Permission by State
Different states have different permissions:
- PENDING orders: Customer can cancel or modify
- Non-PENDING orders: Only restaurant owner or admin can modify
- Status changes: Only restaurant side can advance status
This reflects real-world business rules where customers have limited control after order confirmation.
Testing Strategy
Test Pyramid
Testing follows the pyramid model:
- Unit tests: Fast, isolated, test individual functions
- Integration tests: Test database interactions and API contracts
- End-to-end tests: Full request/response cycles (property-based with jqwik)
Test Slices
Spring Boot’s test slices allow targeted testing:
- @WebMvcTest: Test controllers in isolation
- @DataJpaTest: Test repositories with in-memory database
- @SpringBootTest: Full integration tests
Test Data
Tests use dedicated test data rather than production data:
- Factory methods create valid entities
- Builders allow flexible test data construction
- Each test starts with clean database state
Documentation Requirements
Javadoc Standards
All public APIs require Javadoc:
- Class-level description explains purpose
- Method-level documentation describes behavior
- Parameter and return value documented
- Exceptions and preconditions noted
The build enforces this via the -Xdoclint:missing javadoc flag, failing the build when documentation is missing.
OpenAPI Generation
The API documentation is generated from:
- SpringDoc annotations on controllers
- DTO schemas from class definitions
- Security scheme definitions
This ensures API docs stay synchronized with implementation.
Contract Testing with Schemathesis
Beyond traditional unit and integration tests, we use Schemathesis for property-based API testing. While unit tests verify specific inputs produce expected outputs, contract testing verifies the API adheres to its OpenAPI specification under all circumstances.
What is Schemathesis?
Schemathesis reads the OpenAPI specification and automatically generates thousands of test cases:
- Valid inputs: Ensures documented behavior matches implementation
- Edge cases: Boundary values, maximum lengths, special characters
- Invalid inputs: Malformed JSON, wrong types, missing fields
- Security cases: SQL injection attempts, XSS payloads
This catches bugs that manual test writing might miss—developers tend to test “happy paths” while Schemathesis explores the entire input space.
Testing Philosophy
Schemathesis operates on a simple principle: if the API claims to accept certain inputs in its OpenAPI spec, it must handle them gracefully. This creates a contract between API provider and consumers:
- For providers: Any change that breaks Schemathesis tests is a breaking change
- For consumers: Can rely on documented behavior being accurate
- For both: Reduces integration surprises
Integration in CI
Schemathesis runs automatically during nix flake check:
- Build the backend and generate OpenAPI spec
- Start the backend in a test VM
- Run Schemathesis against the running API
- Fail the build if any tests fail
This ensures the OpenAPI specification remains accurate and the implementation handles edge cases correctly.
Configuration
Schemathesis is configured to:
- Generate ASCII-only test data (avoiding HTTP header encoding issues)
- Exclude certain endpoints that require external services (Google OAuth)
- Skip stateful operations that would invalidate subsequent tests (logout)
- Use automatic parallelism based on CPU cores
The configuration lives in schemathesis.toml at the project root.
Deployment Architecture
Native Binary
The application compiles to a native binary that:
- Contains the Spring Boot application + embedded Tomcat
- Includes all dependencies statically linked
- Runs without JVM installation
- Starts in milliseconds
Service Configuration
The binary runs as a systemd service:
- Automatic restart on failure
- Environment variables for configuration
- Health check endpoint for load balancers
- Graceful shutdown handling
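A hedged sketch of such a unit as a NixOS module. The service name, package, and values are illustrative, though the option names (systemd.services.*, serviceConfig.Restart) are standard NixOS options:

```nix
{ config, pkgs, ... }:
{
  systemd.services.dinehub-backend = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      ExecStart = "${pkgs.dinehub-backend}/bin/backend";  # hypothetical package
      Restart = "on-failure";   # automatic restart on failure
      RestartSec = 5;
      TimeoutStopSec = 30;      # allow graceful shutdown before SIGKILL
    };
    environment = {
      SERVER_PORT = "8080";     # configuration via environment variables
    };
  };
}
```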
Database Migrations
Schema changes are managed through:
- Flyway migrations in version control
- Automatic execution on startup
- Rollback scripts for recovery
- Compatibility with distributed Citus schema
Nix Build System
Philosophy and architectural patterns for reproducible builds and declarative infrastructure
Philosophy
Nix is not just a package manager—it’s a fundamentally different approach to software construction. We treat the entire system as a pure function: given the same inputs (source code + dependencies), we always produce the same outputs (binaries + configurations).
Core Concepts
What is Reproducibility?
Traditional build systems produce different outputs based on:
- System libraries installed on the build machine
- Environment variables and PATH
- Network state during dependency resolution
- Implicit dependencies not declared in the build file
Nix eliminates these variables by:
- Isolating builds in clean environments with only declared dependencies
- Locking all inputs including transitive dependencies and their hashes
- Content-addressable storage where outputs are named by their content hash
- No global state—each build starts from a pristine environment
The Flake Paradigm
A flake is a self-contained, versioned package description:
- Declarative: Build instructions written in Nix expression language
- Reproducible: flake.lock pins every dependency to exact versions
- Composable: Other flakes can depend on your flake
- Hermetic: No access to the outside world during builds
This means a build that succeeds on one developer’s machine will succeed identically on CI and production.
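For orientation, a minimal flake skeleton might look like this. It is a generic example, not the project's actual flake.nix:

```nix
{
  description = "Example flake";

  # Inputs are pinned to exact revisions in flake.lock.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  # Outputs are a pure function of the locked inputs.
  outputs = { self, nixpkgs }:
    let
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      packages.x86_64-linux.default = pkgs.hello;
      devShells.x86_64-linux.default = pkgs.mkShell {
        packages = [ pkgs.git ];
      };
    };
}
```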
Architecture Layers
The Nix architecture separates concerns into four layers:
Layer 1: Package Definitions
Purpose: Describe how to build software from source
Packages define:
- Source location (Git repository, local path, etc.)
- Build dependencies (compilers, libraries, tools)
- Build script (configure, make, install equivalents)
- Runtime dependencies (libraries needed at runtime)
Key insight: Packages are values in a functional language. They don’t execute—they describe what would be built.
Layer 2: Development Environment
Purpose: Provide a shell with all tools needed for development
The devShell provides:
- Exact versions of compilers and build tools
- Project-specific utilities (formatters, linters)
- Environment variables and shell hooks
- Isolation from host system packages
When you run nix develop, you enter a subshell where java, bun, and other tools are exactly as specified—regardless of what’s installed on your laptop.
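A sketch of such a devShell; the specific packages and hook are assumptions about this project's toolchain, not a copy of its real configuration:

```nix
devShells.default = pkgs.mkShell {
  packages = [
    pkgs.jdk      # Java toolchain for the backend
    pkgs.bun      # frontend runtime and test runner
    pkgs.nixfmt   # formatter (illustrative choice)
  ];
  shellHook = ''
    export FLAKE_ROOT=$PWD   # shell hook, as described above
  '';
};
```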
Layer 3: Process Composition
Purpose: Orchestrate multi-service local development
Process-compose replaces Docker Compose for local development:
- Declares which processes to run (backend, frontend, database)
- Manages dependencies between services
- Provides unified logging and monitoring
- Restart policies for failed processes
Unlike with Docker, processes run natively on the host: no virtualization overhead, faster startup, and easier debugging.
Layer 4: System Configuration
Purpose: Define entire NixOS machines
NixOS modules describe:
- Operating system configuration (users, networking, services)
- Service definitions with systemd units
- Security hardening and firewall rules
- Secrets management integration
These configurations are deployed to create reproducible infrastructure—prod server #1 is identical to prod server #2 because both are built from the same expression.
Dependency Management
Lock Files
Nix flakes generate flake.lock files that pin:
- Direct flake inputs (nixpkgs version)
- Transitive dependencies (libraries your dependencies use)
- Git revisions and content hashes
This means even if nixpkgs updates a library, your build continues using the pinned version until you explicitly update the lock file.
Supply Chain Security
Nix provides multiple layers of supply chain protection:
- Source verification: Dependencies are fetched by content hash, not just URL
- Reproducible builds: Same source always produces same output
- Binary caches: Signed pre-built binaries reduce compilation time
- Sandboxing: Builds cannot access the network or modify files outside their directory
If a dependency’s content doesn’t match the expected hash, the build fails rather than accepting a potentially compromised package.
Build Isolation
The Sandbox
Nix builds run in isolated environments that:
- Have no network access
- See only explicitly declared dependencies
- Start with an empty filesystem (except the source)
- Cannot write outside their output directory
This isolation catches missing dependencies that would work on your laptop (where you have tools installed) but fail in CI.
Pure Functions
Builds are pure functions—they depend only on their inputs:
buildPackage(source, dependencies, buildScript) => output
The same inputs always produce the same output, enabling:
- Caching: If inputs haven’t changed, reuse previous output
- Sharing: Multiple users can share the same built package
- Verification: Rebuild and verify outputs match expectations
Development Workflow
Entering the Environment
When you run nix develop, Nix:
- Evaluates the devShell expression
- Builds any missing tools
- Sets up environment variables
- Spawns a new shell with modified PATH
- Runs shell hooks (e.g., setting FLAKE_ROOT)
The resulting shell has exactly the tools needed—no more, no less.
Incremental Builds
During development, Nix provides:
- Incremental compilation: Only changed files rebuild
- Development shells: Different shells for different tasks
- Direnv integration: Automatically enter devShell when entering project directory
Testing Changes
The nix flake check command runs the full CI pipeline locally:
- Builds all packages (backend, frontend, docs)
- Runs unit and integration tests
- Checks formatting compliance
- Validates NixOS configurations
- Runs VM-based integration tests
This means “works on my machine” is actually meaningful—the exact same checks run locally and in CI.
Deployment Architecture
NixOS Systems
NixOS is a Linux distribution where everything is configured through Nix expressions:
- System packages: Installed via Nix, not apt/yum
- System services: Defined as systemd units in Nix
- Configuration files: Generated by Nix templates
- Users and groups: Declared in Nix, not useradd
A NixOS machine is built by evaluating a Nix expression that returns a complete system configuration.
Deploy-rs
Deploy-rs is the deployment tool that:
- Builds the system configuration locally
- Copies closure (package + dependencies) to remote machine
- Activates the new system configuration
- Waits for a confirmation step after activation
- Rolls back automatically if activation fails or confirmation never arrives
This means failed deployments are atomic—the system either fully activates or reverts to the previous state.
Secrets Management
Secrets are managed separately from configuration:
- Encrypted at rest: Secrets stored encrypted in Git
- Decrypted at activation: Age/ragenix decrypts on target machine
- Available as files: Services read secrets from filesystem
- Never in Nix store: Unencrypted secrets never touch the world-readable store
This separation means configuration can be public while secrets remain encrypted.
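Sketched with the agenix/ragenix module options; the secret name, path, and consuming service are illustrative:

```nix
{ config, ... }:
{
  age.secrets.db-password = {
    file = ../secrets/db-password.age;  # stored encrypted in Git
    owner = "dinehub";                  # readable only by the service user
  };

  # The service reads the decrypted file at runtime; the plaintext
  # never enters the world-readable Nix store.
  systemd.services.dinehub-backend.environment = {
    DB_PASSWORD_FILE = config.age.secrets.db-password.path;
  };
}
```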
Networking and Infrastructure
Tailscale Mesh
The infrastructure uses Tailscale for private networking:
- Mesh topology: Every node connects to every other node directly
- WireGuard encryption: All traffic encrypted with modern crypto
- Headscale control: Self-hosted coordination server
- MagicDNS: Private DNS resolution for internal services
This architecture means services communicate over encrypted tunnels without public IPs or complex firewall rules.
Service Discovery
Services find each other through:
- DNS names: headscale provides internal DNS
- Static IPs: Tailscale assigns stable IPs in the 100.x.x.x range
- NixOS module coordination: Services configured to know about each other
No load balancers or service meshes required—just direct encrypted connections.
CI/CD Integration
The Check Pipeline
nix flake check is the universal CI command:
- Build verification: All packages compile successfully
- Test execution: Unit, integration, and property-based tests
- Formatting validation: All code follows project standards
- Linting: Static analysis catches potential issues
- VM tests: Full system integration tests in VMs
Caching Strategy
Nix provides multiple caching layers:
- Local store: Already-built packages on your machine
- Binary cache: Shared cache (Garnix, Cachix) for common packages
- Build cache: CI artifacts reused between builds
This means builds are incremental—you only rebuild what changed, not the world.
Troubleshooting and Debugging
Build Failures
When builds fail, Nix provides:
- Complete build logs with all commands executed
- Environment variable dumps
- Option to keep failed build directory for inspection
- --show-trace for detailed evaluation traces
Development Mode
For debugging build issues:
- nix develop enters the build environment
- genericBuild runs the build phases interactively
- Failed phases can be re-run with modifications
Why Reproducibility Matters
Reproducibility isn’t just a nice property—it enables:
- Bisecting: Git bisect works because old commits still build
- Security auditing: Rebuild and verify package contents
- Disaster recovery: Infrastructure rebuilt from Git in minutes
- Team consistency: Everyone uses exact same tools
- CI confidence: Local build success predicts CI success
When to Use Nix
Nix excels when you need:
- Reproducible builds across environments
- Declarative configuration that can be versioned
- Hermetic builds that don’t depend on system state
- Atomic upgrades with rollback capability
- Cross-language projects with unified tooling
Nix adds complexity when:
- The project is simple, with few dependencies
- The team is unfamiliar with functional programming
- Rapid iteration matters more than reproducibility
- You must integrate with non-Nix build systems
For this project, the complexity is justified by the multi-language nature (Java + TypeScript + Nix) and the production deployment requirements.
OpenAPI Docs
Placeholder for openapi docs - this gets filled out properly by the nix build
Frontend Docs
Placeholder for frontend docs - this gets filled out properly by the nix build
Backend Docs
Placeholder for backend docs - this gets filled out properly by the nix build
Testing Strategy
The project uses a layered testing approach:
- Unit Tests — JUnit 5 for backend, Bun test runner for frontend
- Integration Tests — Spring Boot Test with Testcontainers
- Property-Based Tests — jqwik for generative testing, Schemathesis for API contract testing
- VM Tests — Full system integration in NixOS virtual machines
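To make the generative-testing idea concrete, here is a hand-rolled sketch of a property-based check in plain Java. jqwik automates the input generation and shrinking shown manually here, and the property (discounting never increases a total) and the method under test are invented for illustration:

```java
import java.util.Random;

// Hand-rolled property check: instead of a few hand-picked examples,
// assert an invariant across many randomly generated inputs.
class PropertySketch {

    static long applyDiscount(long subtotalCents, int percent) {
        return subtotalCents * (100 - percent) / 100;
    }

    // Property: for any non-negative subtotal and percent in 0..100,
    // the discounted total stays between 0 and the original subtotal.
    public static boolean holdsForRandomInputs(long seed, int trials) {
        Random rng = new Random(seed);
        for (int i = 0; i < trials; i++) {
            long subtotal = rng.nextInt(1_000_000);  // generated input
            int percent = rng.nextInt(101);          // 0..100 inclusive
            long discounted = applyDiscount(subtotal, percent);
            if (discounted < 0 || discounted > subtotal) {
                return false;                        // property violated
            }
        }
        return true;
    }
}
```

A framework like jqwik would express the same idea declaratively with annotated parameters and would additionally shrink any failing input to a minimal counterexample.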