
EU Blue Card eligible · open to relocation · available Q3 2026

Production machine learning
for healthcare revenue cycles.

I'm Saurabh Joshi, a data scientist with five years of experience shipping ML pipelines that run in production, not just in notebooks. I'm relocating to Germany and looking for a senior ML role.

5 yrs production ML
0.90 RCM model AUC
Spark on Kubernetes
EU Blue Card track

01 / work

Selected work

A binary classifier predicting which healthcare claims need a specific revenue-cycle action, refactored from a notebook POC into a scheduled, versioned production pipeline.

0.67 F1 (positive)
0.90 ROC AUC
5 → 1 pkg from POC
XGBoost PySpark SQL Server Parquet Optuna Python

A console app for CARC/RARC weighted claim scoring, rebuilt as a containerized FastAPI service and enterprise proof of concept.

CLI → API interface
Compose deploy
FastAPI Docker Docker Compose Python pyodbc

A configurable PySpark service for pulling and deduplicating allocation data across workspaces, with a config-driven query layer that keeps it flexible.

Spark/K8s runtime
config-driven
PySpark Spark on Kubernetes Python SQL

A from-scratch MLOps capstone: a retrieval-augmented agent backed by an MCP server and a LoRA-fine-tuned extraction model, with an evaluation harness.

status: in build
eval harness
PyTorch LoRA RAG MCP Python

02 / about

About

I build the unglamorous parts of machine learning well: incremental extraction, partitioned storage, retry logic, model versioning, and serving — the layer between a good model and one a team can actually depend on. My home turf is healthcare revenue cycle management, where a misfired prediction has a real cost attached.

I'm honest about where I'm growing. I'm investing deliberately in transformer internals, LLM fine-tuning, and MLOps depth — and the work shows up in my recent projects rather than just on a skills list.

I'm relocating to Germany and looking for a senior data science / ML engineering role in Berlin or Munich.

03 / skills

Technical stack

Production ML

  • XGBoost
  • scikit-learn
  • Optuna
  • NLP / LLMs
  • Model versioning
  • Threshold optimization

Data Engineering

  • PySpark
  • Spark on Kubernetes
  • SQL Server
  • Hive-partitioned Parquet
  • Incremental pipelines

Serving & MLOps

  • FastAPI
  • Flask
  • Docker
  • Docker Compose
  • Structured logging
  • CI/CD

Cloud

  • AWS
  • Azure
  • Azure DevOps

04 / playground

Playground

// status: in development

Interactive ML demos are landing here soon: a threshold explorer over a real classifier and a client-side inference demo, both running entirely in your browser with no server required.

05 / now

What I'm focused on

updated June 2026

06 / writing

Recent writing


Hiring for a senior ML role in Germany?

I'd like to hear about it. The fastest way to reach me is email.

hello@saurabhjoshi.dev