Data Platform for Clinical Genomic Research

  • Designed and built a custom clinical and molecular research platform for scientific collaboration in 6 months
  • The platform supported collaborative bioinformatics and applied science for medical and pharmaceutical research, built on one of the largest anonymized clinical and molecular datasets.
  • Tempus Lens & Tempus Plus

It was a blast working at Tempus. The mission (AI to improve health outcomes) is strong, the exposure to bioinformatics was amazing, and it remains the best-managed and most disciplined software product firm I've worked at: so much growth, yet well organized, with talented co-workers. I was employee 400 or so, and now they're above 2,000.

The most sheer fun I had at Tempus was during the 'gap week' between quarterly deliveries, when we could do whatever we wanted.

A teammate and I decided to run a grid search across architectures to find the best-fit tooling for data science workflows, weighing latency and cost.

The hypothesis we tested was roughly:

Can we use Dask and Parquet (flat files) instead of Spark on an EMR cluster for analysis? (It would be cheaper.)
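One way to frame that comparison is a small grid-search harness that times each query on each backend and converts runtime into an approximate cost. This is a minimal sketch using only the Python standard library; the backend names, the toy query callables, and the hourly prices are hypothetical placeholders, and charging a flat hourly rate is a simplifying assumption (real EMR, Fargate, and Redshift billing is more nuanced).

```python
import itertools
import statistics
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical backend shape: query name -> zero-argument callable that runs it.
# In a real comparison these callables would submit work to Pandas, Dask, Spark, or Redshift.
Backend = Dict[str, Callable[[], object]]


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the median wall-clock time in seconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def grid_search(
    backends: Dict[str, Backend],
    hourly_cost_usd: Dict[str, float],
    queries: List[str],
    repeats: int = 3,
) -> List[Tuple[str, str, float, float]]:
    """Time every (backend, query) pair; estimate cost as an assumed hourly rate x runtime."""
    results = []
    for backend_name, query in itertools.product(backends, queries):
        seconds = time_call(backends[backend_name][query], repeats)
        cost = hourly_cost_usd[backend_name] * seconds / 3600.0
        results.append((backend_name, query, seconds, cost))
    # Cheapest first, then fastest, so the best cost/latency trade-offs surface at the top.
    return sorted(results, key=lambda r: (r[3], r[2]))


# Usage with toy, in-process "backends" (hypothetical names and prices).
toy_backends = {
    "local_dask": {"q1": lambda: sum(range(10_000)), "q2": lambda: sorted(range(10_000))},
    "emr_spark": {"q1": lambda: sum(range(10_000)), "q2": lambda: sorted(range(10_000))},
}
for row in grid_search(toy_backends, {"local_dask": 0.0, "emr_spark": 1.0}, ["q1", "q2"]):
    print(row)
```

Using the median of several runs dampens one-off noise such as cold caches; a real benchmark would also record data size and cluster shape alongside each row.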

I rewrote the SSBM SQL queries in Dask, Pandas, and Spark to benchmark them against Redshift. We compiled the results into a report, which informed how we built out Tempus Lens after the break.
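To illustrate what rewriting an SSBM query in a DataFrame API can look like, here is a sketch of Star Schema Benchmark query Q1.1 in Pandas. This is not the code from the report; the tiny `lineorder` and `dates` frames are synthetic stand-ins for the benchmark tables. Dask's DataFrame API mirrors much of Pandas', so a similar function would typically run on Dask frames loaded with `dask.dataframe.read_parquet`, followed by a final `.compute()`.

```python
import pandas as pd

# SSB Q1.1 in SQL, for reference:
#   SELECT SUM(lo_extendedprice * lo_discount) AS revenue
#   FROM lineorder, date
#   WHERE lo_orderdate = d_datekey
#     AND d_year = 1993
#     AND lo_discount BETWEEN 1 AND 3
#     AND lo_quantity < 25;


def ssb_q1_1(lineorder: pd.DataFrame, dates: pd.DataFrame) -> float:
    """Revenue from discounted, low-quantity orders placed in 1993."""
    # Join the fact table to the date dimension (lo_orderdate = d_datekey).
    joined = lineorder.merge(dates, left_on="lo_orderdate", right_on="d_datekey")
    # Apply the WHERE predicates; Series.between is inclusive, like SQL BETWEEN.
    mask = (
        (joined["d_year"] == 1993)
        & joined["lo_discount"].between(1, 3)
        & (joined["lo_quantity"] < 25)
    )
    filtered = joined[mask]
    # SUM(lo_extendedprice * lo_discount)
    return float((filtered["lo_extendedprice"] * filtered["lo_discount"]).sum())


# Synthetic sample data: only the first order satisfies every predicate.
dates = pd.DataFrame({"d_datekey": [19930101, 19940101], "d_year": [1993, 1994]})
lineorder = pd.DataFrame(
    {
        "lo_orderdate": [19930101, 19930101, 19930101, 19940101],
        "lo_discount": [2, 5, 1, 2],
        "lo_quantity": [10, 10, 30, 10],
        "lo_extendedprice": [100.0, 200.0, 300.0, 400.0],
    }
)
print(ssb_q1_1(lineorder, dates))  # 200.0
```

Filtering `lineorder` on discount and quantity before the merge would shrink the join, which matters far more on benchmark-scale data than on this toy example.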

Tools: Spark, Pandas, Dask, Matrix Factorization, SQL, SSBM Queries, Python, Docker, R, Kubernetes, Bash, Parquet, Redshift, Multimodal data

AWS: Fargate, EKS, S3