Tested 6 LLMs across 6 processing approaches to find the optimal cost-performance balance
💡Business Impact: Enables cost-effective, SEO-optimized title generation for large product catalogs while meeting the marketplace's strict 75-79 character requirement
An e-commerce title optimization tool that generates SEO-optimized product titles within strict marketplace character limits (75-79 characters). The system addresses the challenge of scaling manual SEO work across large product catalogs while maintaining both search optimization and platform compliance requirements.
Six processing approaches evaluated: iterative, async progressive, tool-calling, batch-processing, batch-api, and early-async patterns.
OpenAI Batch API with prompt caching delivering 90% cost reduction, handling 57K products with programmatic fallbacks for the 38% that fail character constraints.
Hybrid design combining AI feature extraction with deterministic template generation for guaranteed 75-79 character compliance.
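A minimal sketch of the template half of this hybrid, assuming the AI step has already returned a `features` dict (the field names and filler keywords here are hypothetical, not the shipped schema): fields are joined in priority order, over-long titles are cut at a word boundary, and short ones are padded with filler keywords until they reach the window.

```python
def build_title(features: dict, min_len: int = 75, max_len: int = 79) -> str:
    """Deterministically assemble a 75-79 character title from
    AI-extracted features (illustrative sketch, not the shipped code)."""
    # Join fields in priority order; missing fields are simply skipped.
    parts = [features.get(k, "") for k in ("brand", "product", "attributes", "keywords")]
    title = " ".join(p for p in parts if p)
    # Over the limit: cut at the last word boundary inside the window.
    if len(title) > max_len:
        cut = title[:max_len]
        title = cut.rsplit(" ", 1)[0] if " " in cut else cut
    # Under the limit: append filler keywords that still fit.
    for filler in features.get("fillers", ("Premium", "Quality", "New")):
        if len(title) >= min_len:
            break
        candidate = f"{title} {filler}"
        if len(candidate) <= max_len:
            title = candidate
    return title
```

Because the length logic is plain string arithmetic rather than a model output, compliance is guaranteed by construction whenever the extracted features contain enough material to fill the window.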
Built batch processing pipeline for 4.5k listings
Stored optimized product data and processing metrics
Optimized GPT-4o-mini for cost-effective title generation
Achieved $0.00003 per item through prompt engineering
Fine-tuned prompts for e-commerce SEO optimization
Linear scaling architecture processing 4.5k items in 30 minutes
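The pipeline's input side can be sketched as a JSONL builder for the OpenAI Batch API; the prompt text and `custom_id` scheme below are hypothetical stand-ins, while the request envelope (`custom_id`/`method`/`url`/`body`) follows the Batch API's documented line format:

```python
import json

def to_batch_request(item_id: str, title: str, system_prompt: str,
                     model: str = "gpt-4o-mini") -> dict:
    """One JSONL line for the OpenAI Batch API (/v1/chat/completions)."""
    return {
        "custom_id": item_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                # Static system instructions go first so the shared prefix
                # can be reused by automatic prompt caching across requests.
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"Rewrite this product title: {title}"},
            ],
        },
    }

def write_batch_file(items, system_prompt: str, path: str = "batch_input.jsonl") -> None:
    """Serialize (item_id, title) pairs into a Batch API input file."""
    with open(path, "w") as f:
        for item_id, title in items:
            f.write(json.dumps(to_batch_request(item_id, title, system_prompt)) + "\n")
```

The resulting file would then be uploaded via `client.files.create(purpose="batch")` and executed with `client.batches.create(..., completion_window="24h")` in the OpenAI SDK.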
Impact: This code reveals why 90%+ character compliance became a pyrrhic victory: meeting the length requirement through expensive iteration made these approaches practically unusable despite their technical success.
Sole developer
July 2025 - ongoing
LLM evaluation, model comparison, cost optimization, experimental approaches
Prompt Engineering Constraint Barrier: Multiple strategies (progressive generation, tool-calling validation, iterative refinement) all failed to achieve reliable character-limit compliance, revealing a fundamental LLM limitation with precise mathematical instructions rather than a prompt-design issue.
Model Performance vs. Cost Disconnect: Systematic evaluation showed premium models (GPT-5) cost 10x more while delivering the same constraint-compliance failures as budget options, with quality improvements that do not address the core precision problem.
Architecture Scale Dilemma: Individual processing allowed iterative refinement but became economically prohibitive. Batch processing enabled large-scale optimization but removed real-time validation essential for constraint debugging and improvement.
AI+Template Hybrid Architecture: Shifted from pure generation to intelligent parsing approach where AI extracts product features and reorganizes content, while deterministic templates handle precise character control. This separates AI strengths (content intelligence) from mathematical precision requirements.
Strategic Model Optimization: Selected GPT-4o-mini as cost-performance sweet spot, achieving adequate content quality at $0.000075/item through batch processing. Implemented prompt caching for 90% cost reduction on repeated system instructions, making large-scale processing economically viable.
Production-Ready Batch Pipeline: Built comprehensive OpenAI Batch API system with automated fallbacks, constraint validation, and error handling. Created monitoring framework tracking compliance rates and cost metrics across different processing approaches.
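The monitoring side of that pipeline can be sketched as a small per-approach metrics record; the class name, fields, and window check below are hypothetical illustrations of what "tracking compliance rates and cost metrics" could look like:

```python
from dataclasses import dataclass

@dataclass
class RunMetrics:
    """Per-approach compliance and cost tracking (hypothetical shape)."""
    approach: str
    total: int = 0
    compliant: int = 0
    fallbacks: int = 0
    cost_usd: float = 0.0

    def record(self, title: str, cost: float, used_fallback: bool = False) -> None:
        """Record one processed item; it counts as compliant only if the
        final title lands inside the 75-79 character window."""
        self.total += 1
        self.cost_usd += cost
        if used_fallback:
            self.fallbacks += 1
        if 75 <= len(title) <= 79:
            self.compliant += 1

    @property
    def compliance_rate(self) -> float:
        return self.compliant / self.total if self.total else 0.0
```

Keeping one such record per processing approach is what makes the cross-approach comparisons above (compliance rate vs. cost per item) straightforward to compute.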
Cost-Quality Analysis Essential Before Production Deployment: Systematic model evaluation revealed that highest accuracy models (GPT-5) cost 10x more with minimal accuracy gains over GPT-4o-mini (95% vs 90%). Economic viability matters more than marginal accuracy improvements for production e-commerce applications. Always establish cost targets before optimizing for quality metrics.
Template-Based Optimization Outperforms Pure AI for Constrained Problems: Character limit compliance requires deterministic control that AI models struggle with efficiently. Hybrid approaches combining AI feature extraction with template generation can achieve better cost-performance ratios than pure AI solutions. Sometimes constraints drive innovation toward more efficient architectures.
Batch Processing Architecture Enables Economic AI Deployment: Individual API calls for 4,500+ items would cost $600+ and take hours. Batch API reduced costs to $135 and processing time to 30 minutes, achieving linear scaling. Architecture design choices (sequential vs. batch vs. parallel) have an outsized impact on deployment economics for AI applications.
Manual validation for experimental/development phase
Token usage tracking for cost scaling analysis
Manual error management during model optimization
Quality assurance through manual review process
Through systematic prompt engineering across 6 LLMs, discovered that explicit character constraints in prompts achieve only 62% compliance despite validation instructions. Tested multiple prompt strategies: iterative refinement with feedback loops, progressive generation with early validation, and tool-calling approaches for constraint checking. Key insight: AI models excel at content optimization but struggle with precise length control, which led to a hybrid prompt design combining feature-extraction prompts with template completion.
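The iterative-refinement strategy tested above can be sketched as a feedback loop; `generate` stands in for a hypothetical chat-completion wrapper, and the extra round-trips this loop triggers are exactly what made per-item costs balloon when models repeatedly missed the window:

```python
def refine_until_compliant(generate, title: str, max_attempts: int = 3,
                           lo: int = 75, hi: int = 79):
    """Iterative refinement with a length-feedback loop (sketch).
    `generate(prompt)` is a hypothetical wrapper around one model call."""
    prompt = f"Rewrite as an SEO title of exactly {lo}-{hi} characters: {title}"
    candidate = title
    for _ in range(max_attempts):
        candidate = generate(prompt).strip()
        n = len(candidate)
        if lo <= n <= hi:
            return candidate, True
        # Feed the measured length back to the model and try again.
        direction = "shorten" if n > hi else "lengthen"
        prompt = (f"Your title was {n} characters; {direction} it to "
                  f"{lo}-{hi} characters: {candidate}")
    return candidate, False  # caller falls back to template generation
```

Each failed attempt doubles as a data point: counting how often the loop exhausts `max_attempts` is how the 62% single-shot compliance figure generalizes into a per-strategy cost comparison.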