A binary classifier predicting which healthcare claims need a specific revenue-cycle action, refactored from a notebook POC into a scheduled, versioned production pipeline.
EU Blue Card eligible · open to relocation · available Q3 2026
Production machine learning for healthcare revenue cycles.
I'm Saurabh Joshi, a data scientist with five years of experience shipping ML pipelines that run in production, not just in notebooks. I'm relocating to Germany and looking for a senior ML role.
- 5 yrs production ML
- 0.90 RCM model AUC
- Spark on Kubernetes
- EU Blue Card track
Selected work
A console app for CARC/RARC-weighted claim scoring, rebuilt as a containerized FastAPI service and enterprise proof of concept.
A configurable PySpark service for pulling and deduplicating allocation data across workspaces, with the config-driven query layer that keeps it flexible.
A from-scratch MLOps capstone: a retrieval-augmented agent backed by an MCP server and a LoRA-fine-tuned extraction model, with an evaluation harness.
About
I build the unglamorous parts of machine learning well: incremental extraction, partitioned storage, retry logic, model versioning, and serving — the layer between a good model and one a team can actually depend on. My home turf is healthcare revenue cycle management, where a misfired prediction has a real cost attached.
I'm honest about where I'm growing. I'm investing deliberately in transformer internals, LLM fine-tuning, and MLOps depth — and the work shows up in my recent projects rather than just on a skills list.
I'm relocating to Germany and looking for a senior data science / ML engineering role in Berlin or Munich.
Technical stack
Production ML
- XGBoost
- scikit-learn
- Optuna
- NLP / LLMs
- Model versioning
- Threshold optimization
Data Engineering
- PySpark
- Spark on Kubernetes
- SQL Server
- Hive-partitioned Parquet
- Incremental pipelines
Serving & MLOps
- FastAPI
- Flask
- Docker
- Docker Compose
- Structured logging
- CI/CD
Cloud
- AWS
- Azure
- Azure DevOps
Playground
// status: in development
Interactive, in-browser ML demos are landing here soon — a threshold explorer over a real classifier and a client-side inference demo, both running entirely in your browser. No server required.
What I'm focused on
updated June 2026
- Deepening transformer internals and LLM fine-tuning (LoRA, eval harnesses).
- Building an MLOps capstone: RAG + agent + MCP server with a fine-tuned extractor.
- German at A1, working through Netzwerk neu toward B1.
- Targeting senior ML roles in Berlin and Munich.
Recent writing
Turning a five-script POC into a pipeline that survives Monday
A proof of concept runs once, on your machine, while you watch. Production runs unattended, forever. Here's the refactor that bridged the two for an RCM classifier.
Your classifier's default threshold is a bug
An AUC of 0.90 told me the model was good. A 0.5 cutoff told the business it was useless. Here's how I closed that gap on a production RCM classifier.
Hiring for a senior ML role in Germany?
I'd like to hear about it. The fastest way to reach me is email.
hello@saurabhjoshi.dev