Resume Roast Arena

Industry

Gen AI and Cloud

Client

End Users, Public Tool

Service

Analytics

Date

January 2026

Work in Progress Project

Building a distributed, event-driven resume processing system with idempotent APIs and worker-based pipelines supporting multi-format ingestion (PDF, DOCX, images).
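As a rough illustration of what an idempotent submission could look like, the sketch below derives a key from the user and the file contents so that re-uploading the same resume returns the existing job instead of creating a duplicate. The `InMemoryJobStore` class, the key derivation, and the job fields are all assumptions for illustration, not the project's actual API.

```python
import hashlib


class InMemoryJobStore:
    """Hypothetical stand-in for the metadata database."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict] = {}

    def submit(self, user_id: str, file_bytes: bytes) -> tuple[dict, bool]:
        """Return (job, created). Submitting the same file twice returns the same job."""
        # Assumed key scheme: hash of user id plus raw file bytes.
        key = hashlib.sha256(user_id.encode() + b"\x00" + file_bytes).hexdigest()
        existing = self._jobs.get(key)
        if existing is not None:
            return existing, False
        job = {"job_id": key[:16], "status": "queued"}
        self._jobs[key] = job
        return job, True
```

In a real deployment the lookup and insert would need to be atomic (for example, a unique constraint on the key column) so that two concurrent requests cannot both create a job.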

• Implemented a fault-tolerant extraction pipeline using Apache Tika with Tesseract OCR fallback, confidence-based routing, and strict retry vs dead-letter handling; artifacts persisted to object storage.

• Developing a deterministic normalization layer to structure resumes into sections, entities, signals, and metrics as a foundation for anonymization, scoring, and LLM-assisted feedback; system designed for scale, with plans for public launch and Reddit-based distribution (project is under development).
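The extraction flow described in the first bullet (primary parser, OCR fallback on low confidence, retry for transient failures, dead-letter for everything else) can be sketched roughly as follows. The extractors are passed in as plain callables standing in for Tika and Tesseract; the threshold, the retry budget, and every name here are assumptions rather than the project's real values.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class TransientError(Exception):
    """Failure worth retrying, e.g. a timeout."""


class PermanentError(Exception):
    """Failure that retrying cannot fix, e.g. a corrupt file."""


@dataclass
class Extraction:
    text: str
    confidence: float  # 0.0 to 1.0
    source: str = ""


Extractor = Callable[[bytes], Extraction]

MIN_CONFIDENCE = 0.6  # assumed threshold for accepting the primary result
MAX_ATTEMPTS = 3      # assumed retry budget per message


def extract(data: bytes, primary: Extractor, ocr_fallback: Extractor) -> Extraction:
    """Use the primary parser; fall back to OCR only when confidence is low."""
    result = primary(data)
    result.source = "primary"
    if result.confidence >= MIN_CONFIDENCE:
        return result
    fallback = ocr_fallback(data)
    fallback.source = "ocr"
    # Keep whichever attempt is more trustworthy.
    return fallback if fallback.confidence > result.confidence else result


def handle_message(
    data: bytes, attempt: int, primary: Extractor, ocr_fallback: Extractor
) -> tuple[str, Optional[Extraction]]:
    """Return (outcome, result); outcome is 'done', 'retry' or 'dead_letter'.

    `attempt` is 1-based: the first delivery of a message is attempt 1.
    """
    try:
        return "done", extract(data, primary, ocr_fallback)
    except TransientError:
        if attempt < MAX_ATTEMPTS:
            return "retry", None
        return "dead_letter", None
    except PermanentError:
        return "dead_letter", None
```

Separating transient from permanent failures keeps corrupt files from burning through the retry budget, while timeouts still get another chance before the message is dead-lettered.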

Tech Stack:

Python (Monolith POC), FastAPI - Base backend (app factory pattern)

Azure Service Bus - Multiple buses for different stages of the process, acting as message queues

Python Persistent Workers (horizontally scaled) - 3 workers for different stages of the pipeline

PostgreSQL and Azure SQL Edge with SQLAlchemy asyncio - User data, metadata, links, global data

Azure Blob - 3-layer structure, object store for artifacts at each stage

Python Periodic Workers - Workers that run on a schedule to process global metadata update queues and to recalculate and update global data, leaderboards, states, etc.

Docker - Initial containerization (will be shifted to AKS once in production)

Apache Tika and Tesseract OCR - For parsing artifacts
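One way the 3-layer object store listed above could be laid out is a prefix per pipeline stage, with each job's artifacts grouped under its job id. The layer names (`raw`, `extracted`, `normalized`) and the `blob_path` helper are hypothetical; the actual container layout may differ.

```python
from enum import Enum


class Stage(str, Enum):
    """Assumed layer names, one per pipeline stage."""

    RAW = "raw"
    EXTRACTED = "extracted"
    NORMALIZED = "normalized"


def blob_path(stage: Stage, job_id: str, filename: str) -> str:
    """Build a blob name of the form <stage>/<job_id>/<filename>."""
    # Keep only the final path component so a user-supplied name
    # cannot escape the job's prefix.
    safe_name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    return f"{stage.value}/{job_id}/{safe_name}"
```

For example, `blob_path(Stage.RAW, "abc123", "cv.pdf")` gives `raw/abc123/cv.pdf`, so every stage of a job can be found or cleaned up by prefix.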

Yet to be implemented but planned:

Normalization and anonymization layers

Better implementation of the dead-letter queue

Azure VMSS for backend

Later, a Next.js-based frontend (mostly built with v0/Lovable)

GitHub Repo (public dev, not latest): GitHub
