Resume Roast Arena

Industry

Gen AI and Cloud

Client

End users, public tool

Service

Analytics

Date

January 2026

Work in Progress Project

Building a distributed, event-driven resume processing system with idempotent APIs and worker-based pipelines supporting multi-format ingestion (PDF, DOCX, images).
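One way an idempotent submission endpoint could behave is sketched below. This is a minimal illustration, not the project's actual code: the `idempotency_key` helper, the `InMemoryJobStore` class, and the choice of hashing the user ID together with the file content are all assumptions made for the example.

```python
import hashlib


def idempotency_key(user_id: str, file_bytes: bytes) -> str:
    """Derive a stable key from the uploader and the file content (assumed scheme)."""
    digest = hashlib.sha256()
    digest.update(user_id.encode("utf-8"))
    digest.update(b"\x00")  # separator so ("ab", b"c") and ("a", b"bc") hash differently
    digest.update(file_bytes)
    return digest.hexdigest()


class InMemoryJobStore:
    """Hypothetical stand-in for a real database table keyed by idempotency key."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict] = {}

    def submit(self, user_id: str, file_bytes: bytes) -> tuple[dict, bool]:
        """Return (job, created). A repeated upload returns the existing job."""
        key = idempotency_key(user_id, file_bytes)
        existing = self._jobs.get(key)
        if existing is not None:
            return existing, False
        job = {"job_id": key[:16], "status": "queued"}
        self._jobs[key] = job
        return job, True
```

With this shape, a client that retries the same upload after a timeout gets the original job back instead of enqueuing a second one.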

• Implemented a fault-tolerant extraction pipeline using Apache Tika with Tesseract OCR fallback, confidence-based routing, and strict retry vs dead-letter handling; artifacts persisted to object storage.

• Developing a deterministic normalization layer that structures resumes into sections, entities, signals, and metrics as a foundation for anonymization, scoring, and LLM-assisted feedback. The system is designed for scale, with plans for a public launch and Reddit-based distribution. (Project is under development.)
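The confidence-based routing and the retry vs dead-letter split described in the extraction bullet can be sketched roughly as follows. Everything named here is hypothetical: the `Route` values, the `TransientError` and `PermanentError` categories, the 0.6 threshold, and the three-attempt budget are assumptions chosen for illustration, and the real pipeline may decide differently.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ACCEPT = "accept"
    OCR_FALLBACK = "ocr_fallback"
    RETRY = "retry"
    DEAD_LETTER = "dead_letter"


class TransientError(Exception):
    """Failure worth retrying, e.g. a timeout (hypothetical category)."""


class PermanentError(Exception):
    """Failure that retrying cannot fix, e.g. a corrupt file (hypothetical category)."""


@dataclass
class Extraction:
    text: str
    confidence: float  # 0.0 to 1.0, however the extractor defines it


MIN_CONFIDENCE = 0.6  # assumed threshold
MAX_ATTEMPTS = 3      # assumed retry budget


def route_result(result: Extraction, used_ocr: bool) -> Route:
    """Decide what to do with a completed extraction."""
    if result.confidence >= MIN_CONFIDENCE:
        return Route.ACCEPT
    if not used_ocr:
        # Low-confidence text extraction: try OCR before giving up.
        return Route.OCR_FALLBACK
    # OCR was already tried and is still low confidence.
    return Route.DEAD_LETTER


def route_failure(error: Exception, attempt: int) -> Route:
    """Retry only transient errors, and only while attempts remain."""
    if isinstance(error, TransientError) and attempt < MAX_ATTEMPTS:
        return Route.RETRY
    return Route.DEAD_LETTER
```

Keeping the decision in pure functions like these makes the retry policy easy to unit test separately from the queue client.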


Tech Stack:

Python (monolith POC) with FastAPI - base backend (app factory pattern)

Azure Service Bus - separate buses for the different pipeline stages, acting as message queues

Python Persistent Workers (horizontally scaled) - 3 workers for different stages of the pipeline

PostgreSQL and Azure SQL Edge with SQLAlchemy (asyncio) - user data, metadata, links, global data

Azure Blob - 3-layer structure, object store for artifacts at each stage

Python Periodic Workers - periodically running workers that process the global metadata update queues and recalculate and update global data, leaderboards, states, etc.

Docker - initial containerization (will be shifted to AKS once in production)

Apache Tika and Tesseract OCR - for parsing artifacts
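As an illustration of the layered Azure Blob layout mentioned in the stack, the sketch below builds one object path per pipeline stage. The layer names (`raw`, `extracted`, `normalized`), the `blob_path` helper, and the path shape are assumptions for the example; the source only states that there are three layers.

```python
from enum import Enum


class Layer(Enum):
    # Hypothetical layer names; the project only specifies that there are three.
    RAW = "raw"
    EXTRACTED = "extracted"
    NORMALIZED = "normalized"


def blob_path(layer: Layer, job_id: str, filename: str) -> str:
    """Build a blob name of the form <layer>/<job_id>/<filename>."""
    for part in (job_id, filename):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"{layer.value}/{job_id}/{filename}"
```

Keying every layer by the same job ID lets a worker find the previous stage's artifact without a database lookup.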

Yet to be implemented but planned:

Normalization and Anonymization layers

Better implementation of the dead-letter queue

Azure VMSS for backend

Later, a Next.js-based frontend (mostly built with v0/Lovable)


GitHub repo (public dev, not latest): GitHub


