Resume Roast Arena
Industry
Gen AI and Cloud
Client
End Users, Public Tool
Service
Analytics
Date
January 2026
Work in Progress Project
Building a distributed, event-driven resume processing system with idempotent APIs and worker-based pipelines supporting multi-format ingestion (PDF, DOCX, images).
• Implemented a fault-tolerant extraction pipeline using Apache Tika with Tesseract OCR fallback, confidence-based routing, and strict retry vs dead-letter handling; artifacts persisted to object storage.
• Developing a deterministic normalization layer to structure resumes into sections, entities, signals, and metrics as a foundation for anonymization, scoring, and LLM-assisted feedback; system designed for scale with plans for public launch and Reddit-based distribution (project is under development).
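The confidence-based routing and retry vs dead-letter handling described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the threshold, the attempt limit, the `Action` names, and the two error classes are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"              # extraction is good enough, pass downstream
    OCR_FALLBACK = "ocr_fallback"  # re-run the document through OCR
    RETRY = "retry"                # put the message back on the queue
    DEAD_LETTER = "dead_letter"    # park the message for manual inspection


class TransientError(Exception):
    """Hypothetical: failures worth retrying (timeouts, throttling)."""


class PermanentError(Exception):
    """Hypothetical: failures a retry cannot fix (corrupt or unsupported file)."""


MIN_CONFIDENCE = 0.6  # assumed threshold, not taken from the project
MAX_ATTEMPTS = 3      # assumed retry budget, not taken from the project


def route_extraction(confidence: float, used_ocr: bool) -> Action:
    """Decide what to do with a finished extraction based on its confidence."""
    if confidence >= MIN_CONFIDENCE:
        return Action.ACCEPT
    if not used_ocr:
        # Low-confidence text extraction: try the OCR path once.
        return Action.OCR_FALLBACK
    # OCR was already tried and confidence is still low.
    return Action.DEAD_LETTER


def route_failure(error: Exception, attempt: int) -> Action:
    """Decide between retry and dead-letter when a worker raises."""
    if isinstance(error, TransientError) and attempt < MAX_ATTEMPTS:
        return Action.RETRY
    return Action.DEAD_LETTER
```

Keeping both decisions as pure functions makes the policy easy to unit test without a queue or an OCR engine running.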
Tech Stack:
Python (monolith POC) with FastAPI - Base backend (app factory pattern)
Azure Service Bus - Multiple buses for different stages of the process, acting as message queues
Python Persistent Workers (horizontally scaled) - 3 workers for different stages of the pipeline
PostgreSQL and Azure SQL Edge with SQLAlchemy asyncio - User data, metadata, links, global data
Azure Blob Storage - 3-layer structure; object store for artifacts at each stage
Python Periodic Workers - Periodically running workers to process global metadata update queues and to recalculate and update global data, leaderboards, states, etc.
Docker - Initial containerization (will be shifted to AKS once in production)
Apache Tika and Tesseract OCR - For parsing artifacts
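One way the idempotent ingestion and the layered blob layout could fit together is to derive a key from the uploaded file's content and reuse it in every stage's artifact path. The sketch below is an assumption for illustration only: the layer names in `STAGES`, the SHA-256 keying, and the path format are hypothetical, not the project's actual scheme.

```python
import hashlib

# Hypothetical names for the three storage layers.
STAGES = ("raw", "extracted", "normalized")


def idempotency_key(file_bytes: bytes) -> str:
    """Same file content always yields the same key, so re-uploads can be detected."""
    return hashlib.sha256(file_bytes).hexdigest()


def artifact_path(stage: str, key: str, extension: str) -> str:
    """Build a deterministic blob path for one stage's artifact."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    # A two-character prefix spreads blobs across virtual folders.
    return f"{stage}/{key[:2]}/{key}.{extension}"
```

Because the path is a pure function of stage and content, a worker that processes the same message twice writes to the same blob instead of creating a duplicate.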
Yet to be implemented but planned:
Normalization and anonymization layers
Better implementation of the dead-letter queue
Azure VMSS for the backend
Later, a Next.js-based frontend (mostly built with v0/Lovable)
GitHub Repo (public dev, not latest): GitHub



