EU Blue Card eligible · open to relocation · available Q3 2026

Production machine learning for healthcare revenue cycles.

I'm Saurabh Joshi, a data scientist with five years of experience shipping ML pipelines that run in production, not in notebooks. I'm relocating to Germany and looking for a senior ML role.

Download CV ↓ Get in touch →

5 yrs: production ML
0.90: RCM model AUC
Spark: on Kubernetes
EU: Blue Card track

01 / work

Selected work

Production RCM claim classifier

A binary classifier predicting which healthcare claims need a specific revenue-cycle action, refactored from a notebook POC into a scheduled, versioned production pipeline.

0.67 F1 (positive)

0.90 ROC AUC

5 scripts → 1 package from POC

XGBoost PySpark SQL Server Parquet Optuna Python

Claim prioritization web service

A console app for CARC/RARC-weighted claim scoring, rebuilt as a containerized FastAPI service and enterprise proof of concept.

CLI → API interface

Compose deploy

FastAPI Docker Docker Compose Python pyodbc

Spark DataFrame extraction API

A configurable PySpark service for pulling and deduplicating allocation data across workspaces, with a config-driven query layer that keeps it flexible.

Spark/K8s runtime

config-driven queries

PySpark Spark on Kubernetes Python SQL

RAG + agent + fine-tuned extractor

A from-scratch MLOps capstone: a retrieval-augmented agent backed by an MCP server and a LoRA-fine-tuned extraction model, with an evaluation harness.

in build status

harness eval

PyTorch LoRA RAG MCP Python

02 / about

About

I build the unglamorous parts of machine learning well: incremental extraction, partitioned storage, retry logic, model versioning, and serving — the layer between a good model and one a team can actually depend on. My home turf is healthcare revenue cycle management, where a misfired prediction has a real cost attached.

I'm honest about where I'm growing. I'm investing deliberately in transformer internals, LLM fine-tuning, and MLOps depth — and the work shows up in my recent projects rather than just on a skills list.

I'm relocating to Germany and looking for a senior data science / ML engineering role in Berlin or Munich.

03 / skills

Technical stack

Production ML

XGBoost
scikit-learn
Optuna
NLP / LLMs
Model versioning
Threshold optimization

Data Engineering

PySpark
Spark on Kubernetes
SQL Server
Hive-partitioned Parquet
Incremental pipelines

Serving & MLOps

FastAPI
Flask
Docker
Docker Compose
Structured logging
CI/CD

Cloud

AWS
Azure
Azure DevOps

04 / playground

Playground

// status: in development

Interactive, in-browser ML demos are landing here soon — a threshold explorer over a real classifier and a client-side inference demo, both running entirely in your browser. No server required.

05 / now

What I'm focused on

updated June 2026

Deepening transformer internals and LLM fine-tuning (LoRA, eval harnesses).
Building an MLOps capstone: RAG + agent + MCP server with a fine-tuned extractor.
German at A1, working through Netzwerk neu toward B1.
Targeting senior ML roles in Berlin and Munich.

06 / writing

Recent writing

Turning a five-script POC into a pipeline that survives Monday

A proof of concept runs once, on your machine, while you watch. Production runs unattended, forever. Here's the refactor that bridged the two for an RCM classifier.

9 Jun 2026

Your classifier's default threshold is a bug

An AUC of 0.90 told me the model was good. A 0.5 cutoff told the business it was useless. Here's how I closed that gap on a production RCM classifier.

28 May 2026

all writing →

Hiring for a senior ML role in Germany?

I'd like to hear about it. The fastest way to reach me is email.

hello@saurabhjoshi.dev