Performance & Scalability
Spec Source: Document 13 — Performance & Scalability | Last Updated: February 2026
Overview
This document defines the performance targets, caching strategy, database optimization approach, media handling pipeline, monitoring architecture, and scalability plan for the DoCurious platform. These are engineering benchmarks — they inform infrastructure decisions, set contractual SLAs, and provide the goalposts that every backend and frontend change is measured against.
DoCurious serves a school-heavy user base with predictable traffic patterns: weekday mornings ramp up with school logins, afternoons peak with post-school personal usage, and September/January onboarding windows create seasonal spikes. The architecture is designed to handle these patterns efficiently at every growth phase from 500-user beta through 500K+ mature platform.
STATUS: PARTIAL
The frontend implements route-level code splitting via React.lazy (50+ lazy-loaded page components), Suspense-based loading states with PageSkeleton, tree shaking through Vite 7's production build, and Tailwind CSS purging. No backend, CDN, caching layer, monitoring infrastructure, or load testing pipeline exists yet — the platform runs entirely against an in-memory mock API. Performance targets from the spec are documented here as the contracts the future backend must meet.
How It Works
Load Projections
DoCurious plans for five growth phases. Each phase determines infrastructure sizing, auto-scaling thresholds, and cost projections.
| Phase | Timeline | Total Users | DAU | Schools | Vendors | Avg req/s | Peak req/s |
|---|---|---|---|---|---|---|---|
| Beta | Months 1--3 | 500 | 100 | 5 | 10 | 5 | 20 |
| Launch | Months 3--6 | 5,000 | 1,000 | 25 | 50 | 50 | 200 |
| Growth | Months 6--12 | 25,000 | 5,000 | 100 | 200 | 250 | 1,000 |
| Scale | Year 2 | 100,000 | 20,000 | 500 | 500 | 1,000 | 5,000 |
| Mature | Year 3+ | 500,000+ | 100,000+ | 2,000+ | 2,000+ | 5,000 | 25,000 |
Traffic follows a school-driven daily pattern: near-zero overnight, morning ramp at 7--9 AM (school logins, teacher assignments), midday plateau at 9 AM--3 PM (steady school usage), afternoon peak at 3--6 PM (post-school personal usage, TR uploads), moderate evening at 6--10 PM (parent dashboards), then wind-down. Weekdays carry higher traffic than weekends. September and January see onboarding spikes from new school-year starts.
STATUS: PLANNED
Load projections are spec targets. No production traffic data exists yet.
Response Time Targets
Every API endpoint and page transition has latency contracts at three percentiles. These are the numbers the backend must meet and the monitoring system must enforce.
| Operation | p50 Target | p95 Target | p99 Target |
|---|---|---|---|
| Page load (initial) | < 1.5s | < 3.0s | < 5.0s |
| Page load (subsequent / SPA navigation) | < 300ms | < 800ms | < 1.5s |
| API response (simple read) | < 50ms | < 150ms | < 300ms |
| API response (complex query) | < 150ms | < 500ms | < 1.0s |
| Search query | < 100ms | < 200ms | < 500ms |
| Recommendation feed | < 150ms | < 300ms | < 600ms |
| Image upload (acknowledgment) | < 200ms | < 500ms | < 1.0s |
| Image processing (async) | < 5s | < 15s | < 30s |
| Notification delivery (in-app) | < 500ms | < 1.0s | < 2.0s |
| Email delivery (to ESP) | < 2s | < 5s | < 10s |
STATUS: PLANNED
These targets are engineering contracts. No backend exists to measure against them.
Web Vitals Targets
Frontend performance is measured against Core Web Vitals. These targets align with Google's "Good" thresholds and ensure the app feels fast on school Chromebooks and budget phones alike.
| Metric | Target | What It Measures |
|---|---|---|
| LCP (Largest Contentful Paint) | < 2.5s | When main content loads |
| FID (First Input Delay) | < 100ms | Responsiveness to first interaction |
| CLS (Cumulative Layout Shift) | < 0.1 | Visual stability — no layout jank |
| TTFB (Time to First Byte) | < 600ms | Server response time |
| FCP (First Contentful Paint) | < 1.8s | When first content appears on screen |
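Real-user measurement is how these targets get enforced once the app is in production. A minimal reporting sketch, assuming the web-vitals npm package (v3, which still exports onFID; v4 replaces FID with INP) and a hypothetical /api/v1/rum collection endpoint:

```ts
// Client-side Web Vitals reporter sketch. The /api/v1/rum endpoint does not
// exist yet and is an assumption; metric names map to the table above.
import { onLCP, onFID, onCLS, onTTFB, onFCP, type Metric } from 'web-vitals';

function report(metric: Metric): void {
  const body = JSON.stringify({
    name: metric.name,     // "LCP" | "FID" | "CLS" | "TTFB" | "FCP"
    value: metric.value,   // milliseconds (CLS is unitless)
    rating: metric.rating, // "good" | "needs-improvement" | "poor"
    id: metric.id,
  });
  // sendBeacon survives page unloads; fall back to fetch with keepalive.
  if (!navigator.sendBeacon?.('/api/v1/rum', body)) {
    fetch('/api/v1/rum', { method: 'POST', body, keepalive: true });
  }
}

onLCP(report);
onFID(report);
onCLS(report);
onTTFB(report);
onFCP(report);
```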
Availability Targets
| Metric | Target |
|---|---|
| Uptime | 99.9% (8.76 hours downtime/year max) |
| Planned maintenance window | < 2 hours/month, scheduled outside school hours |
| Mean time to detect (MTTD) | < 5 minutes |
| Mean time to recover (MTTR) | < 1 hour (critical), < 4 hours (non-critical) |
Throughput Targets
| Resource | Target |
|---|---|
| Concurrent users (per server) | 500 |
| WebSocket connections (if used) | 10,000 per node |
| File uploads (concurrent) | 100 per server |
| Background job processing | 1,000 jobs/minute |
| Email sends | 10,000/hour |
| Push notifications | 50,000/hour |
Caching Strategy
Cache Layers
The caching architecture has four tiers. A request is served by the first tier that can answer it; only a miss at every tier reaches the database.
[Browser Cache] → [CDN] → [Application Cache (Redis)] → [Database]

Browser cache. Static assets (JS, CSS, images) get Cache-Control: max-age=31536000 (one year) with content-hash filenames — Vite already generates hashed filenames in production builds. API responses use no-cache for dynamic data and short TTLs for semi-static content (categories, featured challenges). A Service Worker caches offline-capable assets.
CDN. All static assets are served via CDN (CloudFront, Cloudflare, or equivalent). User-uploaded media is served via CDN with signed URLs for access control. Geographic distribution reduces latency for users across different school districts. Cache invalidation uses versioned filenames for assets and explicit purge for media.
Application cache (Redis). Hot data lives in Redis with TTL-based expiration. This layer absorbs the vast majority of read traffic.
| Data | TTL | Invalidation Trigger |
|---|---|---|
| User session | 30 days | Logout, password change |
| User profile | 15 minutes | Profile update |
| Challenge detail | 1 hour | Challenge edit |
| Challenge quality scores | 24 hours | Daily recompute |
| Recommendation feeds | 1 hour | Recompute |
| Popular/trending feeds | 6 hours | Daily recompute |
| Category list | 24 hours | Category change |
| Search autocomplete | 6 hours | Index update |
| Notification count (unread) | Real-time | New notification / read |
| Feature flags | 5 minutes | Flag change |
| Rate limit counters | Per window | Automatic expiry |
Database query cache. Prepared statement caching, connection pooling via PgBouncer, and materialized views for expensive aggregations. Analytics views refresh every 15 minutes; leaderboards refresh every hour.
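A minimal read-through sketch of the Redis tier, assuming ioredis; key names, the REDIS_URL fallback, and the loader callback are illustrative, while the TTLs mirror the table above:

```ts
// Read-through cache sketch with ioredis. Key naming and connection details
// are assumptions, not settled conventions.
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

const TTL_SECONDS = {
  userProfile: 15 * 60,       // 15 minutes
  challengeDetail: 60 * 60,   // 1 hour
  categoryList: 24 * 60 * 60, // 24 hours
} as const;

// Return the cached value if present; otherwise load from the database,
// cache it with a TTL, and return it.
async function cached<T>(key: string, ttlSeconds: number, load: () => Promise<T>): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit) as T;

  const value = await load();
  await redis.set(key, JSON.stringify(value), 'EX', ttlSeconds);
  return value;
}

// Write-through invalidation: drop the key when the underlying row changes.
async function invalidateChallenge(challengeId: string): Promise<void> {
  await redis.del(`challenge:${challengeId}`);
}

// Usage (loadChallengeFromDb is a placeholder for the real query):
// const challenge = await cached(`challenge:${id}`, TTL_SECONDS.challengeDetail,
//                                () => loadChallengeFromDb(id));
```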
STATUS: PLANNED
No caching infrastructure exists. The frontend runs against in-memory mock data. Vite's content-hashed filenames are the only cache-related behavior currently in place.
Cache Warming
On deployment or cache flush, the system pre-warms critical data to avoid cold-cache latency spikes (a staggered warming sketch follows this list):
- Pre-warm category list
- Pre-warm featured content
- Pre-warm popular/trending feeds
- Pre-warm top 1,000 challenge details
- Stagger warming requests to avoid thundering herd
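A staggered warm-up sketch; the endpoint paths and the 50 ms spacing are assumptions rather than spec values:

```ts
// Cache warming sketch: hit a handful of hot endpoints after deploy so the
// normal read-through path repopulates Redis, spacing requests out to avoid
// a thundering herd against the origin.
const WARM_PATHS = [
  '/api/v1/categories',
  '/api/v1/challenges/featured',
  '/api/v1/feeds/popular',
  '/api/v1/feeds/trending',
];

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function warmCache(baseUrl: string): Promise<void> {
  for (const path of WARM_PATHS) {
    try {
      await fetch(`${baseUrl}${path}`); // populates the cache via the normal read path
    } catch (err) {
      console.warn(`cache warm failed for ${path}`, err); // log and continue
    }
    await sleep(50); // stagger requests instead of firing them all at once
  }
}
```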
Cache Invalidation Patterns
| Pattern | When Used | Example |
|---|---|---|
| Write-through | Update cache on write | User profile, challenge detail |
| TTL-based | Let cache expire naturally | Feeds, scores |
| Event-driven | Invalidate on event | Notification count on new notification |
| Never cache | Sensitive data | Authentication tokens, PII lookups |
Database Optimization
Database Selection
Primary: PostgreSQL. ACID compliance for transactional data, full-text search for initial search implementation, JSON/JSONB support for flexible schemas, row-level security for data isolation, and mature replication and backup tooling. The local development database (docurious_prod) runs PostgreSQL 17.
Cache/Queue: Redis. Session storage, application cache, rate limiting, job queues, and real-time notification counts.
Future (at scale): Elasticsearch or Meilisearch for search when PostgreSQL FTS is insufficient, a data warehouse for analytics separate from the transactional DB, and read replicas for query distribution.
Indexing Strategy
Every table gets primary key, created_at, and foreign key indexes. Critical query indexes are designed around the actual access patterns:
| Table | Index | Supports |
|---|---|---|
| challenges | (status, category_id, created_at) | Browse by category |
| challenges | (vendor_id, status) | Vendor dashboard |
| challenges | GIN index on search vector | Full-text search |
| track_records | (user_id, status, created_at DESC) | My Track Records |
| track_records | (challenge_id, status) | Challenge TR gallery |
| track_records | (status, created_at) | Verification queue |
| notifications | (user_id, read, created_at DESC) | Notification center |
| assignments | (class_id, due_date, status) | Student assignments |
| users | (email) UNIQUE | Login lookup |
| community_members | (community_id, user_id) UNIQUE | Membership check |
| events | (challenge_id, start_time) | Event calendar |
Partial indexes improve performance for common filtered queries: track_records WHERE status = 'pending_review' (verification queue), challenges WHERE status = 'active' (exclude archived from browse), notifications WHERE read = false (unread count).
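A migration-style sketch of a few of these indexes using node-postgres and raw DDL; index names are assumptions and the eventual migration tooling is undecided:

```ts
// Index creation sketch with the pg Pool. In a real setup this would live in
// the migration tool's up() step rather than a standalone script.
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function createIndexes(): Promise<void> {
  // Composite index serving "browse by category".
  await pool.query(`
    CREATE INDEX IF NOT EXISTS idx_challenges_browse
      ON challenges (status, category_id, created_at)
  `);

  // Partial index: only pending submissions, so verification-queue scans
  // never touch approved or rejected rows.
  await pool.query(`
    CREATE INDEX IF NOT EXISTS idx_track_records_pending
      ON track_records (created_at)
      WHERE status = 'pending_review'
  `);

  // Partial index backing the unread-notification badge count.
  await pool.query(`
    CREATE INDEX IF NOT EXISTS idx_notifications_unread
      ON notifications (user_id)
      WHERE read = false
  `);
}

createIndexes()
  .catch((err) => console.error(err))
  .finally(() => pool.end());
```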
Query Optimization Guidelines
- No N+1 queries — always use eager loading or joins for related data
- Cursor-based pagination for infinite scroll, offset-based for admin tables
- Maximum query time — 500ms hard limit, alert on > 200ms
- EXPLAIN ANALYZE on all new queries touching large tables
- No SELECT * — always specify columns
- Limit result sets — maximum 100 per page, 50 default
Connection Management
| Setting | Value |
|---|---|
| Connection pooling | PgBouncer (recommended) |
| Pool size | 20--50 connections per instance |
| Connection timeout | 5 seconds |
| Query timeout | 30 seconds (hard kill) |
| Idle connection cleanup | 10 minutes |
Data Partitioning (At Scale)
When tables exceed approximately 50 million rows, partition by month on created_at: audit logs, notifications, analytics events, search analytics. Track records may be partitioned by year if volume warrants it.
Image & Media Optimization
Image Pipeline
[User Upload] → [Validation] → [Virus Scan] → [EXIF Strip]
→ [Store Original] → [Generate Variants] → [CDN]

Five image variants are generated asynchronously for every upload:
| Variant | Dimensions | Quality | Use |
|---|---|---|---|
| Thumbnail | 150 x 150 (crop) | 80% | Grid views, lists |
| Small | 400px wide | 85% | Card previews |
| Medium | 800px wide | 85% | Detail views |
| Large | 1600px wide | 90% | Full-screen view |
| Original | As uploaded | 100% | Download, export |
Processing uses WebP format with JPEG fallback. All images below the fold use lazy loading. Responsive images are served via srcset.
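A variant-generation sketch assuming the sharp library and an already-validated upload buffer; storage and CDN upload are out of scope here:

```ts
// Generates the WebP variants from the table above. The Variant shape and
// function names are illustrative; a JPEG-fallback pass would mirror this.
import sharp from 'sharp';

interface Variant { name: string; width: number; height?: number; quality: number; }

// Thumbnail is a centre crop; the other sizes scale by width without enlarging.
const VARIANTS: Variant[] = [
  { name: 'thumbnail', width: 150, height: 150, quality: 80 },
  { name: 'small',  width: 400,  quality: 85 },
  { name: 'medium', width: 800,  quality: 85 },
  { name: 'large',  width: 1600, quality: 90 },
];

async function generateVariants(original: Buffer): Promise<Map<string, Buffer>> {
  const out = new Map<string, Buffer>();
  for (const v of VARIANTS) {
    const buffer = await sharp(original)
      .rotate() // honour EXIF orientation before metadata is dropped
      .resize(v.width, v.height, {
        fit: v.height ? 'cover' : 'inside',
        withoutEnlargement: true,
      })
      .webp({ quality: v.quality })
      .toBuffer(); // sharp strips EXIF by default, covering the EXIF-strip step
    out.set(v.name, buffer);
  }
  return out;
}
```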
Upload Limits
| User Type | Max File Size | Max Files per TR | Daily Upload Limit |
|---|---|---|---|
| General user | 50 MB | 20 | 500 MB |
| Student (under 13) | 25 MB | 10 | 200 MB |
| Teacher | 50 MB | 20 | 500 MB |
| Vendor | 100 MB | 50 | 2 GB |
| Admin | 100 MB | Unlimited | Unlimited |
Video Handling
Videos are not hosted on DoCurious. Users provide links (YouTube, Vimeo, etc.) and the platform embeds via oEmbed or iframe. Thumbnails are extracted from the embed provider. Direct video upload with transcoding is a planned enhancement.
Storage Architecture
Cloud object storage (S3 or equivalent) with bucket structure: /{environment}/{content-type}/{user-id}/{file-id}. Content types: avatars, challenge-covers, track-record-media, exports. Lifecycle policies delete orphaned files after 30 days. Cross-region replication provides disaster recovery.
STATUS: PLANNED
The frontend MediaUpload component handles client-side file selection and preview. No server-side image processing, CDN delivery, or storage backend exists.
Frontend Performance Budget
Bundle Strategy
STATUS: BUILT
The frontend implements route-level code splitting with React.lazy for 50+ page components, Suspense boundaries with PageSkeleton fallbacks, and Vite 7 production builds with tree shaking and Tailwind CSS purging. Auth pages (Login, Register, ForgotPassword, ResetPassword, VerifyEmail) are eagerly loaded for fast initial render; everything else is lazy.
Code splitting by route ensures each page loads only its own dependencies. The router at src/routes/index.tsx lazy-loads every page component except the five auth pages, which stay eager for a fast initial login.
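A condensed sketch of that pattern, assuming React Router; the real file wires 50+ routes and the import paths shown are illustrative:

```tsx
// Auth pages are imported eagerly; every other page goes through React.lazy
// behind a Suspense boundary that renders PageSkeleton while the chunk loads.
import { lazy, Suspense } from 'react';
import { Routes, Route } from 'react-router-dom';
import Login from '../pages/Login'; // eager: part of the initial bundle
import PageSkeleton from '../components/common/PageSkeleton';

// Each lazy() call becomes its own chunk in the Vite production build.
const Dashboard = lazy(() => import('../pages/Dashboard'));
const Explore = lazy(() => import('../pages/Explore'));

export function AppRoutes() {
  return (
    <Suspense fallback={<PageSkeleton />}>
      <Routes>
        <Route path="/login" element={<Login />} />
        <Route path="/dashboard" element={<Dashboard />} />
        <Route path="/explore" element={<Explore />} />
      </Routes>
    </Suspense>
  );
}
```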
| Resource | Budget |
|---|---|
| HTML document | < 50 KB |
| CSS (total) | < 100 KB gzipped |
| JS (initial route) | < 200 KB gzipped |
| Images (above fold) | < 500 KB total |
| Web fonts | < 100 KB total |
| Total page weight (initial) | < 1 MB |
| HTTP requests (initial) | < 30 |
Target total JS across all routes: < 1 MB gzipped.
Asset Optimization
- Images: WebP with JPEG fallback, responsive srcset
- Fonts: System font stack preferred; custom fonts use font-display: swap
- CSS: Critical CSS inlined, remainder async loaded
- Icons: SVG sprite or icon font (no individual image requests)
- Preload: Critical resources via <link rel="preload">
- Prefetch: Likely next navigation via <link rel="prefetch">
Rendering Strategy
| Page Type | Strategy | Examples |
|---|---|---|
| Landing, About, Privacy, Terms | Static generation | Landing.tsx, PrivacyPolicy.tsx, TermsOfService.tsx |
| Challenge browse, category pages | Incremental Static Regeneration | Explore.tsx, ExploreCategoryView.tsx |
| Dashboard, notifications, admin | Client-side rendering | Dashboard.tsx, Notifications.tsx |
| All pages (initial load) | Server-side rendering for SEO | Full SSR pass |
STATUS: PARTIAL
The current build is a pure client-side SPA (Vite + React). SSR, static generation, and ISR are planned architectural changes that will require a framework migration (e.g., to Next.js or Remix) or a custom SSR layer.
Background Job Architecture
Job Categories
| Category | Queue | Priority | Concurrency |
|---|---|---|---|
| Email delivery | email | Medium | 10 workers |
| Push notification | push | Medium | 5 workers |
| Image processing | media | Low | 5 workers |
| Search index update | search | Low | 3 workers |
| Recommendation recompute | recommendations | Low | 2 workers |
| Analytics aggregation | analytics | Low | 2 workers |
| Data export generation | exports | Low | 2 workers |
| Scheduled notifications | scheduler | Medium | 3 workers |
| Cleanup (expired tokens, etc.) | maintenance | Low | 1 worker |
Processing Requirements
- At-least-once delivery with idempotent job design
- Dead letter queue for failed jobs after 3 retries
- Exponential backoff on retry: 1s, 10s, 60s
- Job timeout: 5 minutes default, 30 minutes for exports and analytics
- Alert on: queue depth > 1,000, failure rate > 5%, processing latency > 5 minutes
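A queue/worker sketch assuming BullMQ over Redis (one of the candidates named later in this document); the 1s/10s/60s backoff and 10-worker concurrency come from this spec, while the queue payload and the deliverEmail call are placeholders:

```ts
// BullMQ sketch: enqueue with an idempotency key, retry three times with the
// spec's custom backoff schedule, and surface failures for the DLQ sweep.
import { Queue, Worker } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };
const BACKOFF_MS = [1_000, 10_000, 60_000];

export const emailQueue = new Queue('email', { connection });

// At-least-once semantics: the job id doubles as an idempotency key.
export async function enqueueEmail(to: string, template: string): Promise<void> {
  await emailQueue.add('send', { to, template }, {
    jobId: `email:${template}:${to}`,
    attempts: 3,
    backoff: { type: 'custom' },
  });
}

const worker = new Worker('email', async (job) => {
  // deliverEmail is a placeholder for the real ESP call; it must be idempotent.
  // await deliverEmail(job.data.to, job.data.template);
}, {
  connection,
  concurrency: 10, // matches the email queue's 10 workers
  settings: {
    // 1s, 10s, 60s instead of BullMQ's default exponential curve.
    backoffStrategy: (attemptsMade: number) => BACKOFF_MS[attemptsMade - 1] ?? 60_000,
  },
});

// After the third failed attempt the job stays in the failed set; a separate
// sweep would move it to a dead-letter queue and raise the failure-rate alert.
worker.on('failed', (job, err) => console.error(`email job ${job?.id} failed`, err));
```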
Scheduled Jobs
| Job | Schedule | Purpose |
|---|---|---|
| Daily analytics aggregation | 2:00 AM UTC | Compute daily metrics |
| Challenge quality score recompute | 3:00 AM UTC | Update quality/trending scores |
| Recommendation feed generation | Every hour | Refresh user feeds |
| Popular/trending feed update | 4:00 AM UTC | Update global feeds |
| Digest email generation | 6:00 AM per timezone | Weekly/daily digests |
| Streak check | 12:01 AM per timezone | Evaluate streak status |
| Expired token cleanup | 1:00 AM UTC | Remove expired sessions, reset tokens |
| Orphaned file cleanup | 5:00 AM UTC (weekly) | Remove unattached uploads |
| Backup verification | 6:00 AM UTC (weekly) | Verify backup integrity |
| Leaderboard recompute | Every hour | Update leaderboard rankings |
| School health score update | 4:00 AM UTC (daily) | Recompute school health |
| Data retention cleanup | 3:00 AM UTC (monthly) | Remove expired data per retention policy |
STATUS: PLANNED
No background job infrastructure exists. All scheduled computations will need a queue system (SQS, BullMQ, or equivalent) and worker processes.
Monitoring & Alerting
Application Monitoring
Metrics to track per endpoint: request rate, response time (p50, p95, p99), and error rate (4xx, 5xx). Platform-wide: active users (real-time), database query time per query pattern, cache hit/miss ratio, and background job queue depth and latency.
Recommended tooling: APM via Datadog, New Relic, or open-source Grafana + Prometheus. Error tracking via Sentry. Log aggregation via ELK stack, Datadog Logs, or CloudWatch.
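A per-endpoint instrumentation sketch assuming prom-client with an Express-style API server and a Prometheus scrape of /metrics; the bucket boundaries are chosen around the API latency targets above:

```ts
// Express middleware recording a latency histogram labelled by route, method,
// and status, plus the default process metrics (CPU, memory, event-loop lag).
import express from 'express';
import client from 'prom-client';

const app = express();
client.collectDefaultMetrics();

const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Request latency by route, method, and status',
  labelNames: ['route', 'method', 'status'],
  buckets: [0.05, 0.1, 0.15, 0.3, 0.5, 1, 2, 5], // seconds, around the p50/p95/p99 targets
});

app.use((req, res, next) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => {
    end({
      route: req.route?.path ?? req.path,
      method: req.method,
      status: String(res.statusCode),
    });
  });
  next();
});

// Prometheus scrape endpoint.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.send(await client.register.metrics());
});
```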
Infrastructure Monitoring
| Metric | Alert Threshold |
|---|---|
| CPU utilization | 80% |
| Memory utilization | 85% |
| Disk utilization | 80% |
| Database connections (active/idle/waiting) | 90% pool used |
| Redis memory usage | 80% |
| SSL certificate expiry | < 14 days |
Alerting Rules
| Alert | Condition | Severity | Channel |
|---|---|---|---|
| High error rate | 5xx rate > 1% for 5 min | Critical | Pager + Slack |
| Slow responses | p95 > 2s for 10 min | High | Slack |
| Database connection exhaustion | > 90% pool used | Critical | Pager + Slack |
| Disk space low | > 80% used | High | Slack |
| Job queue backing up | Depth > 1,000 for 15 min | High | Slack |
| Certificate expiring | < 14 days | Medium | |
| Memory pressure | > 90% for 5 min | High | Slack |
| Zero traffic | No requests for 5 min | Critical | Pager |
Operations Dashboards
Three dashboards are specified:
Operations Dashboard: Request rate, error rate, response time (real-time), active users, system health (CPU, memory, disk), recent deployments marked on timeline.
Database Dashboard: Query rate and latency, connection pool status, slow query log, replication lag.
Background Jobs Dashboard: Queue depths by category, processing rate, failure rate, recent failures with details.
STATUS: PLANNED
No monitoring infrastructure exists. The frontend has a dev-only Debug Panel (Ctrl+Shift+D) with a Network Log tab that records mock API calls — this is a development tool, not production monitoring.
Scalability Architecture
Horizontal Scaling
| Component | Scaling Approach | Trigger |
|---|---|---|
| Web/API servers | Auto-scale based on CPU / request rate | CPU > 70% or requests > 80% capacity |
| Background workers | Scale by queue depth | Queue depth > 500 |
| Database (reads) | Add read replicas | Read latency > 100ms or CPU > 70% |
| Database (writes) | Vertical scale first, partition later | Write latency > 50ms |
| Redis | Cluster mode at scale | Memory > 80% |
| File storage | Managed service (auto-scales) | N/A |
| CDN | Managed service (auto-scales) | N/A |
Architecture Scaling Tiers
Small (up to 5K users): Single application server (or 2 for redundancy), single PostgreSQL instance, single Redis instance, managed file storage (S3), CDN for static assets.
Medium (5K--50K users): Auto-scaling application servers (2--5 instances), PostgreSQL with read replica, Redis with persistence, dedicated background worker instances, Elasticsearch for search replacing PostgreSQL FTS.
Large (50K--500K users): 5--20 application server instances, PostgreSQL primary + multiple read replicas, Redis cluster, dedicated analytics data warehouse, dedicated search cluster, microservice extraction for high-traffic paths (notifications, recommendations).
Database Read/Write Split
At the Medium tier and above: write operations go to the primary database, read operations go to read replica(s). Application-level routing handles the split. Replication lag is monitored with alerts if it exceeds 1 second. Critical reads (authentication, authorization) always hit the primary.
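A sketch of the application-level routing, assuming two node-postgres pools; the environment variable names and the forcePrimary flag are illustrative:

```ts
// Read/write split sketch: writes and auth-critical reads go to the primary,
// everything else may be served by a replica that can lag by up to ~1 second.
import { Pool, QueryResult } from 'pg';

const primary = new Pool({ connectionString: process.env.PRIMARY_DATABASE_URL });
const replica = new Pool({ connectionString: process.env.REPLICA_DATABASE_URL });

// Simple heuristic; CTE-wrapped writes would need an explicit forcePrimary.
const WRITE_PREFIX = /^\s*(insert|update|delete)\b/i;

export function query(
  sql: string,
  params: unknown[] = [],
  opts: { forcePrimary?: boolean } = {},
): Promise<QueryResult> {
  const pool = opts.forcePrimary || WRITE_PREFIX.test(sql) ? primary : replica;
  return pool.query(sql, params);
}

// Usage: authentication always reads from the primary.
// const user = await query(
//   'SELECT id, password_hash FROM users WHERE email = $1',
//   [email],
//   { forcePrimary: true },
// );
```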
API Design Standards
Response Format
All API responses follow a consistent envelope:
Success:
```json
{
"data": { ... },
"meta": {
"page": 1,
"per_page": 20,
"total": 150,
"total_pages": 8
}
}
```

Error:

```json
{
"error": {
"code": "VALIDATION_ERROR",
"message": "Human-readable description",
"details": [
{ "field": "email", "message": "Invalid email format" }
]
}
}
```

Pagination
Cursor-based (for feeds, infinite scroll): ?cursor={opaque_token}&limit=20. Response includes next_cursor (null if no more results). Default limit 20, max 100.
Offset-based (for admin tables, finite lists): ?page=1&per_page=20. Response includes total count and total pages. Default per_page 20, max 100.
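A keyset-pagination sketch of the cursor style, assuming node-postgres; table and column names (and the uuid id type) are illustrative:

```ts
// Cursor pagination: the cursor is an opaque base64url wrapper around the
// last row's (created_at, id) pair, giving stable ordering even when new
// rows are inserted between requests.
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

const encodeCursor = (createdAt: string, id: string): string =>
  Buffer.from(JSON.stringify({ createdAt, id })).toString('base64url');
const decodeCursor = (cursor: string): { createdAt: string; id: string } =>
  JSON.parse(Buffer.from(cursor, 'base64url').toString('utf8'));

export async function feedPage(cursor: string | null, limit = 20) {
  const capped = Math.min(limit, 100); // default 20, max 100
  const after = cursor ? decodeCursor(cursor) : null;

  const { rows } = await pool.query(
    `SELECT id, title, created_at
       FROM challenges
      WHERE status = 'active'
        AND ($1::timestamptz IS NULL OR (created_at, id) < ($1::timestamptz, $2::uuid))
      ORDER BY created_at DESC, id DESC
      LIMIT $3`,
    [after?.createdAt ?? null, after?.id ?? null, capped],
  );

  const last = rows.at(-1);
  return {
    data: rows,
    meta: {
      next_cursor:
        rows.length === capped && last ? encodeCursor(last.created_at, last.id) : null,
    },
  };
}
```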
Rate Limiting
Every response includes rate limit headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704067200
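A fixed-window limiter sketch that emits these headers, assuming Express and ioredis; the 100-requests-per-hour window is taken from the example values above, not from a settled policy:

```ts
// Fixed-window rate limiter: one Redis counter per client per window, with
// the three X-RateLimit-* headers set on every response.
import type { Request, Response, NextFunction } from 'express';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');
const LIMIT = 100;
const WINDOW_SECONDS = 3600;

export async function rateLimit(req: Request, res: Response, next: NextFunction): Promise<void> {
  const windowStart = Math.floor(Date.now() / 1000 / WINDOW_SECONDS) * WINDOW_SECONDS;
  const key = `ratelimit:${req.ip}:${windowStart}`;

  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, WINDOW_SECONDS); // counter expires with the window

  res.set({
    'X-RateLimit-Limit': String(LIMIT),
    'X-RateLimit-Remaining': String(Math.max(0, LIMIT - count)),
    'X-RateLimit-Reset': String(windowStart + WINDOW_SECONDS), // unix timestamp, as in the example
  });

  if (count > LIMIT) {
    res.status(429).json({ error: { code: 'RATE_LIMITED', message: 'Too many requests' } });
    return;
  }
  next();
}
```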
Load Testing
Test Scenarios
| Scenario | Target | Tool |
|---|---|---|
| Sustained load | Handle expected daily traffic for 1 hour | k6, Locust, or Artillery |
| Peak load | Handle 3x average load for 15 min | Same |
| Spike test | Handle sudden 10x traffic for 5 min | Same |
| Soak test | Handle average load for 24 hours (memory leaks) | Same |
| School onboarding surge | 500 students login simultaneously | Same |
| Media upload burst | 100 concurrent uploads | Same |
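A k6 sketch of the sustained-load scenario; the staging URL and endpoint are illustrative, and 50 virtual users with a one-second think time approximates the Launch-phase average request rate:

```ts
// k6 script: ramp to 50 VUs, hold for an hour, and fail the run if p95
// latency or the error rate breaches the targets defined above.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 50 },  // ramp up
    { duration: '60m', target: 50 }, // sustain expected daily traffic
    { duration: '5m', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // complex-query p95 target
    http_req_failed: ['rate<0.01'],   // < 1% errors
  },
};

export default function () {
  const res = http.get('https://staging.docurious.example/api/v1/challenges?page=1');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // think time between requests per virtual user
}
```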
Test Frequency
- Before major releases: full suite
- Weekly: sustained load test (automated)
- Monthly: peak and spike tests
- Quarterly: soak test
Test Environment
Staging environment matching production topology, synthetic data at 2x expected production volume, production-like database size. Results are compared against the response time and throughput targets defined above.
STATUS: PLANNED
No load testing infrastructure or test scripts exist.
Design Decisions
Why cursor-based pagination for feeds? Offset-based pagination breaks when new content is inserted — users see duplicates or miss items. Cursor-based pagination provides stable, consistent results for infinite scroll feeds. Offset-based is kept for admin tables where total count and page navigation are needed.
Why no direct video hosting? Video transcoding, storage, and delivery are expensive and complex. YouTube and Vimeo handle it well. By embedding rather than hosting, DoCurious avoids significant infrastructure cost and complexity while still allowing video evidence in Track Records. Direct upload is a future enhancement.
Why Redis for sessions instead of JWTs? Server-side sessions in Redis allow immediate invalidation on logout or password change. JWTs cannot be revoked before expiry without a blocklist — which is effectively reimplementing server-side sessions. Redis sessions are simpler and more secure for the security model DoCurious needs.
Why PostgreSQL FTS before Elasticsearch? PostgreSQL's built-in full-text search is sufficient for the Launch and early Growth phases. It avoids the operational overhead of a separate search cluster. The migration to Elasticsearch or Meilisearch is planned for when FTS query performance degrades at scale.
Why eager-load auth pages but lazy-load everything else? Login, registration, and password reset are the first screens users see. Lazy-loading them would add a visible loading spinner to the very first interaction. All other pages benefit from code splitting because users only visit a subset of the 55+ pages in any session.
Technical Implementation
Current Frontend Performance Stack
| Layer | Implementation | File |
|---|---|---|
| Route code splitting | React.lazy + Suspense for 50+ page components | src/routes/index.tsx |
| Loading fallbacks | PageSkeleton component during lazy load | src/components/common/PageSkeleton.tsx |
| Error boundaries | ErrorBoundary wrapping route segments | src/components/common/ErrorBoundary.tsx |
| Build tooling | Vite 7 with tree shaking, minification, content hashing | vite.config.ts |
| CSS optimization | Tailwind CSS 4 with automatic purging of unused classes | @tailwindcss/vite plugin |
| Component memoization | useMemo / useCallback / memo used across 52 files | Various components |
| Dev performance inspection | Debug Panel with Network Log tab (dev-only) | src/components/debug/ |
| Type checking | TypeScript strict mode for compile-time safety | tsconfig.json |
| Testing | Vitest with jsdom for unit/component tests | vitest config |
What Needs Building
| Feature | Priority | Dependencies |
|---|---|---|
| Backend API with response time enforcement | Critical | Express/Fastify server, PostgreSQL |
| Redis caching layer with TTL strategy | Critical | Redis instance, backend API |
| CDN configuration for static + media assets | Critical | Cloud provider (AWS/Cloudflare) |
| Image processing pipeline (variant generation) | High | Sharp/Pillow, object storage, background workers |
| Connection pooling (PgBouncer) | High | PostgreSQL deployment |
| Database indexing per spec strategy | High | Backend ORM / migration tooling |
| Background job queue (BullMQ, SQS) | High | Redis or SQS, worker processes |
| APM + error tracking (Sentry, Datadog) | High | Production deployment |
| Rate limiting middleware | Medium | Redis, API gateway |
| Load testing scripts and CI pipeline | Medium | k6 or Locust, staging environment |
| SSR / static generation for SEO pages | Medium | Framework migration or custom SSR |
| Service Worker for offline assets | Low | Frontend PWA setup |
| Elasticsearch migration for search | Low | Scale-phase trigger |
| Data partitioning for large tables | Low | 50M+ row threshold |
Related Features
- Explore & Discovery — Search and recommendation performance targets (< 100ms search, < 150ms recommendations) drive database indexing and caching strategy
- Challenges — Challenge browse, detail, and gallery pages are the highest-traffic read paths that caching and CDN must optimize
- Track Records — TR media uploads drive the image processing pipeline, storage architecture, and upload limit enforcement
- Notifications — Notification delivery infrastructure (50K push/hour, 10K email/hour) requires background job queues and real-time cache invalidation
- Gamification — Leaderboard recomputation, XP aggregation, and streak checks are scheduled background jobs with materialized view dependencies
- School Administration — School onboarding surges (500 simultaneous student logins) are a key load testing scenario; school health scores are daily scheduled recomputes
- Vendor — Vendor analytics dashboards rely on materialized views refreshed every 15 minutes
- Accounts — Authentication, session management, and rate limiting are performance-critical paths that always hit the primary database