Data Platform for Clinical Genomic Research

At Tempus from 2019 to 2022, as a paired engineering lead on a ‘one pizza’ team, I designed and built a custom clinical and molecular research platform that enabled scientific collaboration on one of the largest anonymized multimodal patient datasets of its time (and possibly still).

Our platform supported applied, hypothesis-driven science for medical and pharmaceutical molecular research, as exemplified by Tempus Lens and Tempus Plus.

(The Cancer Genome Atlas, TCGA, is a public example of a similar dataset.)

Tempus, with its mission to improve health outcomes through AI, gave me strong exposure to bioinformatics and remains the best-managed and most disciplined software product firm I’ve worked at.

Since my time there, Tempus has gone public and launched an FDA-approved atrial fibrillation detection tool, among much else.

One memorable experience was a “gap week” project in which a teammate and I ran a grid search to optimize data science workflow latency and cost. We tested the hypothesis that Dask and Parquet could replace Spark on an EMR cluster, rewriting the SSBM SQL queries in Dask, Pandas, and Spark and benchmarking each against Redshift. The resulting report informed the development of Tempus Lens.
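To illustrate what “rewriting an SSBM SQL query in Pandas” involves, here is a minimal sketch of Star Schema Benchmark query Q1.1 (revenue from discounted 1993 line orders) expressed as DataFrame operations. It is not the code from the original benchmark; it assumes the standard SSB column names (lo_orderdate, lo_discount, lo_quantity, lo_extendedprice, d_datekey, d_year), and the function name ssb_q1_1 is hypothetical. The filters are applied to the fact table before the join, which is the kind of hand-ordered predicate pushdown a SQL engine like Redshift would otherwise do for you.

```python
import pandas as pd


def ssb_q1_1(lineorder: pd.DataFrame, dates: pd.DataFrame):
    """Hypothetical Pandas rewrite of SSB Q1.1:

    SELECT SUM(lo_extendedprice * lo_discount) AS revenue
    FROM lineorder, date
    WHERE lo_orderdate = d_datekey
      AND d_year = 1993
      AND lo_discount BETWEEN 1 AND 3
      AND lo_quantity < 25;
    """
    # Filter the large fact table first so the join touches fewer rows.
    lo = lineorder[
        lineorder["lo_discount"].between(1, 3)  # inclusive, like SQL BETWEEN
        & (lineorder["lo_quantity"] < 25)
    ]
    # Keep only the date keys for the requested year.
    d = dates.loc[dates["d_year"] == 1993, ["d_datekey"]]
    # Inner join reproduces the implicit equi-join in the FROM/WHERE clauses.
    joined = lo.merge(d, left_on="lo_orderdate", right_on="d_datekey", how="inner")
    # Aggregate revenue; for Pandas this returns a NumPy scalar.
    return (joined["lo_extendedprice"] * joined["lo_discount"]).sum()


if __name__ == "__main__":
    # Tiny illustrative tables, not real benchmark data.
    lineorder = pd.DataFrame({
        "lo_orderdate": [19930101, 19930102, 19940101, 19930101],
        "lo_discount": [2, 5, 1, 3],
        "lo_quantity": [10, 10, 10, 30],
        "lo_extendedprice": [100, 200, 300, 400],
    })
    dates = pd.DataFrame({
        "d_datekey": [19930101, 19930102, 19940101],
        "d_year": [1993, 1993, 1994],
    })
    print(ssb_q1_1(lineorder, dates))
```

With the sample tables only the first row passes all three predicates, so the revenue is 100 × 2 = 200. Because dask.dataframe mirrors these Pandas calls, the same function body should largely carry over to Dask DataFrames loaded with dask.dataframe.read_parquet, with .compute() called on the lazy result.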

Tools: Python, R, SQL, Bash, Pandas, Dask, Spark, Parquet, Redshift, SSBM queries, matrix factorization, Docker, Kubernetes, multimodal data

AWS: Fargate, EKS, S3