Processes 160M+ order records with a static Parquet architecture, achieving sub-second dashboard loads
💡Business Impact: Provides executives with instant access to business intelligence that previously required manual database queries
An executive analytics platform I built from scratch when existing solutions couldn't handle the scale and constraints. After investigating query timeouts across 160M+ records, I realized we needed a completely different approach than the traditional database-API-frontend architecture. My breakthrough was a static file design that replaces expensive API infrastructure with compressed Parquet files served directly to browsers. This eliminated thousands of dollars in monthly hosting costs while achieving instant dashboard performance.
MySQL 5.7 → Python/Polars processing → Compressed Parquet files → Direct frontend loading via hyparquet.
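A minimal sketch of that pipeline, assuming a recent Polars with read_database_uri (which delegates to connectorx) and illustrative table/column names (orders, order_date, amount), not the production schema:

```python
import polars as pl

# Placeholder connection string for the read-only MySQL 5.7 server.
MYSQL_URI = "mysql://readonly_user:***@legacy-host:3306/orders_db"

# 1. Pull raw rows from MySQL.
raw = pl.read_database_uri(
    "SELECT order_id, order_date, amount FROM orders", MYSQL_URI
)

# 2. Aggregate to daily metrics entirely in Polars.
daily = (
    raw.group_by("order_date")
    .agg(pl.col("amount").sum().alias("gmv"), pl.len().alias("orders"))
    .sort("order_date")
)

# 3. Emit a ZSTD-compressed Parquet file the browser loads directly.
daily.write_parquet("orders_daily.parquet", compression="zstd")
```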
I built timestamp-based change detection that processes only modified records, reducing 20-minute full scans to 5-minute targeted updates.
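A hedged sketch of how that change detection could look, using the 48-hour lookback window described later in this writeup; the URI, table, and updated_at column are assumptions:

```python
from datetime import datetime, timedelta
import polars as pl

LOOKBACK = timedelta(hours=48)  # re-scan window for late-arriving updates

def load_changed_rows(uri: str, last_run: datetime) -> pl.DataFrame:
    # Only rows modified since the cutoff are pulled from MySQL.
    cutoff = last_run - LOOKBACK
    query = f"""
        SELECT order_id, order_date, amount, updated_at
        FROM orders
        WHERE updated_at >= '{cutoff:%Y-%m-%d %H:%M:%S}'
    """
    return pl.read_database_uri(query, uri)

def merge_increment(cached: pl.DataFrame, changed: pl.DataFrame) -> pl.DataFrame:
    # Drop stale copies of any re-extracted rows, then append the fresh ones.
    kept = cached.join(changed.select("order_id"), on="order_id", how="anti")
    return pl.concat([kept, changed]).sort("order_date")
```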
JSON index files map date ranges to Parquet row groups, enabling instant filtering without API calls.
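One way to build such an index from Parquet row-group statistics with pyarrow; the file and column names are placeholders, and the real index format may differ:

```python
import json
import pyarrow.parquet as pq

pf = pq.ParquetFile("orders_daily.parquet")
date_idx = pf.schema_arrow.get_field_index("order_date")

index = []
for rg in range(pf.metadata.num_row_groups):
    # Each row group's min/max date comes from its column statistics.
    stats = pf.metadata.row_group(rg).column(date_idx).statistics
    index.append({"row_group": rg, "min": str(stats.min), "max": str(stats.max)})

# The dashboard fetches this small file first, then asks hyparquet to read
# only the row groups whose [min, max] range overlaps the active filter.
with open("orders_daily.index.json", "w") as f:
    json.dump(index, f)
```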
Two-repository system: my backend writes processed files directly into the frontend repository, where they deploy as static assets on Vercel, eliminating API hosting costs while maintaining sub-second performance.
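A loose sketch of that publish step, assuming the frontend checkout lives at ../frontend-repo and serves static files from public/data (both paths hypothetical):

```python
import shutil
import subprocess
from pathlib import Path

FRONTEND_REPO = Path("../frontend-repo")       # hypothetical checkout
DATA_DIR = FRONTEND_REPO / "public" / "data"   # served statically by Vercel

def publish(parquet_files: list[Path]) -> None:
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    for f in parquet_files:
        shutil.copy2(f, DATA_DIR / f.name)
    # Committing to the frontend repo triggers a Vercel redeploy.
    subprocess.run(["git", "-C", str(FRONTEND_REPO), "add", "public/data"], check=True)
    subprocess.run(
        ["git", "-C", str(FRONTEND_REPO), "commit", "-m", "data: refresh parquet"],
        check=True,
    )
    subprocess.run(["git", "-C", str(FRONTEND_REPO), "push"], check=True)
```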
Ensured type safety across the entire full-stack application
Built complex dashboards with 13 internal views for data visualization
Implemented complex routing and data fetching patterns
Advanced query optimization on legacy MySQL 5.7 with read-only constraints
Built secure REST APIs with authentication and private routing
Designed multi-tenant database architecture for B2B customers
Optimized data processing from 10+ min to under 5 min for 160M+ records
Reduced query times by 50% through advanced indexing and query restructuring
Created intuitive charts and graphs for complex business metrics
Deployed scalable multi-tenant architecture on AWS
Impact: The incremental caching system reduced processing time from 20 minutes to 5 minutes by combining timestamp-based updates with a 48-hour lookback window, eliminating unnecessary full table scans while ensuring data completeness.
75% performance improvement gained by migrating from pandas to Polars (Rust-based)
Sole developer
Aug 2023 - ongoing
architecture, backend, data optimization, performance tuning
Legacy Database Performance Crisis: I faced constant query timeouts and memory issues processing 160M+ records on read-only MySQL 5.7. I couldn't create indexes or temporary tables, or use modern SQL features like CTEs (only added in MySQL 8.0). This forced me to completely rewrite every query and develop creative optimization strategies.
Frontend Bundle Size Disaster: I discovered that my initial JSON-based architecture was creating 25MB+ bundle sizes, causing terrible loading performance. Users were abandoning dashboards before they loaded. I needed a fundamental shift in data format and processing approach while maintaining sub-second responsiveness.
Executive Data Access Bottleneck: Leadership had no real-time access to critical business metrics across 2M offers and $143M GMV operations. They were stuck with manual queries that took hours and outdated dashboards requiring constant interpretation, creating decision-making bottlenecks.
Query Optimization & Processing Revolution: I rewrote every query from scratch to work within MySQL 5.7 constraints, using subqueries and EXISTS clauses instead of modern features. Then I migrated from pandas to Polars, achieving 75% performance gains. My incremental update system tracks only changed records, reducing processing from 20 minutes to 5 minutes.
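An illustrative example of the rewrite pattern, with placeholder tables rather than the production schema: the CTE form that MySQL 5.7 rejects, restated as a correlated EXISTS semi-join:

```python
# CTE form, which requires MySQL 8.0+ and fails on 5.7:
#   WITH active AS (SELECT order_id FROM offers WHERE status = 'active')
#   SELECT o.order_id, o.order_date, o.amount
#   FROM orders o JOIN active a ON a.order_id = o.order_id
#
# Equivalent 5.7-compatible rewrite using a correlated EXISTS subquery:
QUERY_57_COMPAT = """
SELECT o.order_id, o.order_date, o.amount
FROM orders o
WHERE EXISTS (
    SELECT 1 FROM offers f
    WHERE f.order_id = o.order_id AND f.status = 'active'
)
"""
```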
Static Architecture Innovation: I eliminated API infrastructure entirely by serving compressed Parquet files directly to the frontend. My implementation uses ZSTD compression and hyparquet client-side processing with JSON indexing for instant filtering. This reduced bundle size by 94% (25MB→1.6MB) while maintaining sub-second dashboard performance.
Executive Dashboard Transformation: I replaced manual database queries with automated data pipelines feeding 13 different dashboards. Executives now have instant access to current business metrics instead of waiting for manual query results and interpretation. My system updates every 5 minutes automatically.
Constraint-Driven Innovation Leads to Superior Solutions: The read-only database limitations I faced forced me to create a static architecture that not only solved immediate problems but eliminated hosting costs and improved performance beyond traditional approaches. Sometimes constraints drive more innovative solutions than unlimited resources.
Systematic Optimization Methodology Compounds Results: I learned that breaking down performance problems into measurable components (query optimization, data format, processing pipeline) enables targeted improvements that compound. My 75% query gains + 94% bundle reduction + incremental updates created transformational performance improvement.
Scale-Appropriate Engineering Over Premature Optimization: I discovered that choosing maintainable, cost-effective solutions (static files vs complex API infrastructure) appropriate for the scale and requirements often outperforms sophisticated but unnecessary architectures. Engineering judgment matters more than following standard patterns.
My incremental updates process only changed records, reducing load on the production database by 75%
Rollbar monitoring alerts on processing failures, with automated recovery strategies (see the sketch after this list)
Static Parquet files eliminate API downtime risks while maintaining data freshness (5min updates)
Type validation and data integrity checks during my pandas→Polars processing pipeline
Index-based chunking enables instant filtering across years of daily aggregated data
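A hedged sketch of the failure-alerting pattern referenced above, using the Rollbar Python SDK's init and report_exc_info calls; the retry policy and pipeline hook are assumptions, not the production recovery logic:

```python
import time
import rollbar

rollbar.init("SERVER_POST_TOKEN")  # placeholder access token

def run_with_recovery(pipeline, attempts: int = 3, backoff_s: int = 60) -> None:
    for attempt in range(1, attempts + 1):
        try:
            pipeline()
            return
        except Exception:
            # Ship the current stack trace to Rollbar so the failure alerts.
            rollbar.report_exc_info()
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)  # simple linear backoff, then retry
```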