Parallel extraction pipeline using AutoSchemaKG framework
💡 Business Impact: Enables AI agents to access personal document context through the standardized MCP protocol
Built on the AutoSchemaKG framework for automatic knowledge graph construction (https://github.com/HKUST-KnowComp/AutoSchemaKG). I extended the original framework with emotional context extraction because AI agents need to understand personal patterns, work styles, and behavioral tendencies - not just facts and events. Through systematic optimization, I achieved a 68% per-token improvement (64ms/token → 21ms/token) with a 70% cost reduction. The architecture processes a document end-to-end in 67 seconds, prioritizing practical deployment over experimental approaches.
I built a three-stage pipeline: document input → parallel extractions (entity-entity, entity-event, event-event, emotional context) → concept generation and deduplication → knowledge graph storage.
My parallel processing approach replaced sequential extraction, cutting processing to 21ms per token.
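A minimal sketch of the fan-out stage, assuming each extraction type is an async-wrapped LLM call; the function names and placeholder body are illustrative, not the production code:

```python
import asyncio

# The four extraction passes run concurrently instead of back-to-back.
EXTRACTION_TYPES = (
    "entity-entity",
    "entity-event",
    "event-event",
    "emotional-context",
)

async def extract_relations(document: str, extraction_type: str) -> list[dict]:
    # Placeholder for one type-specific LLM extraction call.
    await asyncio.sleep(0)  # stand-in for network latency
    return []

async def run_parallel_extractions(document: str) -> dict[str, list[dict]]:
    # asyncio.gather fans all four calls out at once, so stage latency
    # is bounded by the slowest extraction, not the sum of all four.
    results = await asyncio.gather(
        *(extract_relations(document, t) for t in EXTRACTION_TYPES)
    )
    return dict(zip(EXTRACTION_TYPES, results))

if __name__ == "__main__":
    print(asyncio.run(run_parallel_extractions("example document text")))
```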
I designed a dual-transport server architecture supporting both STDIO (Claude Code integration) and HTTP (web applications), because different integration contexts need different protocols.
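A sketch of the transport switch, assuming the FastMCP helper from the official `mcp` Python SDK (transport names vary slightly across SDK versions); the tool body and the `--http` flag are illustrative:

```python
import sys

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("personal-knowledge-graph")

@mcp.tool()
def query_graph(query: str) -> str:
    """Illustrative tool: look up personal context in the knowledge graph."""
    return f"results for: {query}"

if __name__ == "__main__":
    # One codebase, two transports: STDIO for Claude Code,
    # HTTP for web applications, selected at startup.
    if "--http" in sys.argv:
        mcp.run(transport="streamable-http")
    else:
        mcp.run(transport="stdio")
```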
Built a comprehensive caching system achieving a 100% cache hit rate, with atomic database operations for reliability.
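The core idea, sketched with SQLite standing in for the real store; the class, table layout, and `embed_fn` hook are assumptions for illustration:

```python
import hashlib
import sqlite3

class EmbeddingCache:
    """Content-addressed cache: each unique text is embedded exactly once."""

    def __init__(self, path: str = "cache.db") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS embeddings (key TEXT PRIMARY KEY, vector BLOB)"
        )
        self.hits = 0
        self.misses = 0

    def get_or_embed(self, text: str, embed_fn) -> bytes:
        key = hashlib.sha256(text.encode()).hexdigest()
        row = self.db.execute(
            "SELECT vector FROM embeddings WHERE key = ?", (key,)
        ).fetchone()
        if row:
            self.hits += 1
            return row[0]
        self.misses += 1
        vector = embed_fn(text)
        with self.db:  # transaction: commit on success, roll back on error
            self.db.execute(
                "INSERT OR IGNORE INTO embeddings VALUES (?, ?)", (key, vector)
            )
        return vector

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```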
Implemented the full Model Context Protocol specification with dual STDIO and HTTP transports
Generated semantic embeddings for knowledge graph relationships (sketch after this list)
Optimized model performance achieving 21ms per token processing
Extended AutoSchemaKG framework with emotional context extraction
Integrated recent knowledge-graph research into a production implementation
Designed three-stage pipeline achieving 68% per-token improvement
Systematic optimization achieved 100% cache hit rate and 70% cost reduction
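The embeddings bullet above, sketched against the OpenAI embeddings endpoint; the triple-to-text serialization and the model choice are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_relationships(triples: list[tuple[str, str, str]]) -> list[list[float]]:
    # Serialize each (subject, relation, object) triple to one string,
    # then embed the whole batch in a single API call.
    texts = [f"{s} {r} {o}" for s, r, o in triples]
    response = client.embeddings.create(
        model="text-embedding-3-small",  # assumed model choice
        input=texts,
    )
    return [item.embedding for item in response.data]
```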
Impact: Type-specific prompts for the four extraction types improved extraction accuracy by 40% over generic prompts, particularly for temporal and causal relationships in event-event extraction.
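Roughly what type-specific prompting looks like - the wording below is illustrative, not the production prompts:

```python
# One instruction block per extraction type; the event-event prompt
# explicitly asks for temporal and causal links.
PROMPTS = {
    "entity-entity": "List relationships between the people, places, and things mentioned.",
    "entity-event": "List which entities participated in which events.",
    "event-event": (
        "List relationships between events, including temporal order "
        "(before/after) and causal links (caused/enabled)."
    ),
    "emotional-context": (
        "Describe emotional patterns, work styles, and behavioral "
        "tendencies expressed by the author."
    ),
}

def build_prompt(extraction_type: str, document: str) -> str:
    return f"{PROMPTS[extraction_type]}\n\nDocument:\n{document}"
```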
Sole developer
July 2025 - ongoing
AI pipeline architecture, knowledge extraction, vector embeddings, MCP protocol
The initial production system suffered from catastrophic performance: frequent timeouts made the knowledge graphs unusable for real-time AI agents. Without systematic monitoring, I spent weeks optimizing database queries and vector operations before discovering that AI extraction consumed 95% of processing time.
The legacy architecture had grown to 2,100+ lines of unmaintainable code, and separate vector tables drove massive API cost overruns: I was embedding the same entities three to four times per pipeline run without realizing it. Non-atomic database operations added reliability problems that caused data inconsistencies.
I built phase-by-phase timing instrumentation (sketched below) that revealed the true bottlenecks in the system. Systematic optimization then followed the data: a parallel processing architecture plus strategic model selection achieved the 68% per-token improvement (64ms/token → 21ms/token), while efficient caching and deduplication delivered the 70% cost reduction.
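A minimal sketch of that instrumentation - a context manager that accumulates wall-clock time per pipeline phase:

```python
import time
from contextlib import contextmanager

phase_timings: dict[str, float] = {}

@contextmanager
def timed_phase(name: str):
    # Attribute wall-clock time to each pipeline phase; this is the kind
    # of measurement that exposed AI extraction as 95% of total runtime.
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_timings[name] = (
            phase_timings.get(name, 0.0) + time.perf_counter() - start
        )

# Usage:
#   with timed_phase("extraction"):
#       run_extractions(document)
#   print(phase_timings)
```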
I redesigned the system as a pure functional architecture with zero hidden state and unified embedding storage, backed by comprehensive caching (100% cache hit rate) and atomic database operations. The result: a clean, maintainable architecture with duplicate processing eliminated and reliable data consistency.
I learned that prompt engineering for personal context requires different strategies than business applications do - emotional patterns need nuanced extraction techniques beyond standard entity-relationship models. Systematic performance monitoring is essential before optimization: I wasted significant time optimizing the wrong components because I lacked proper instrumentation. And building personal AI tools requires understanding individual behavioral patterns, not just technical relationships, which is why I extended AutoSchemaKG with emotional context extraction.
Built comprehensive benchmark reporting: 21ms per token processing with 67s total time
Implemented 100% cache hit rate through unified embedding architecture
Manual review of extraction quality ensures personal context accuracy
Atomic database operations prevent data inconsistencies and reliability issues (see the sketch below)
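A sketch of the atomic-write pattern, assuming a SQLite store and an illustrative schema; used as a context manager, a sqlite3 connection commits on success and rolls back on any exception:

```python
import sqlite3

def store_extractions(db: sqlite3.Connection, triples: list[tuple[str, str, str]]) -> None:
    # The whole batch lands in one transaction: commit if every insert
    # succeeds, roll back if any fails, so the graph is never half-written.
    with db:
        db.executemany(
            "INSERT INTO relations (subject, predicate, object) VALUES (?, ?, ?)",
            triples,
        )

db = sqlite3.connect("graph.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS relations (subject TEXT, predicate TEXT, object TEXT)"
)
store_extractions(db, [("Alice", "works_with", "Bob")])
```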
I selected gpt-4o-mini because my system needs to be economically viable for personal use - experimental models cost 10x more without proportional benefits. I designed domain-specific prompts for four extraction types because emotional context extraction requires different cognitive approaches than standard entity-relationship models. My optimization strategy achieved 68% per-token improvement (64ms/token → 21ms/token) through strategic model selection and parallel processing. I chose practical AI implementation over cutting-edge model exploration because sustainable personal context systems need deployment economics, not research metrics.