Work Experience

Senior Software Engineer

June 2019 - Present
Netflix

Work on Datamesh Team

  • Provided oncall and user support for Datamesh, a streaming pipeline abstraction layer built on top of Flink and Kafka.
  • Co-architected and implemented a multi-tenant Kafka architecture for efficiently provisioning and scaling tens of thousands of Kafka topics across hundreds of Kafka clusters. Improved operability, maintainability, and scalability with the new design.
  • Implemented cost optimizations for the Flink and Kafka infrastructure in Datamesh, resulting in large annual cost savings.
  • Performed critical infrastructure preparation for Netflix’s largest live events (NFL and Mike Tyson).

Work on RDE (Realtime Data Engines)

  • RDE is the umbrella team comprising the Kafka and Flink subteams. Served oncall for both the Flink and Kafka infrastructure and provided user support for both platforms.
  • Co-designed the strategy for a large-scale migration of several thousand Flink jobs from a legacy control plane to the new RDE Flink control plane.
  • Worked on a team to successfully execute the migration without user job disruption and without data loss.
  • Implemented automated topic mapping for producers, which allows producers to automatically discover the Kafka cluster their data needs to be published to.
  • Implemented automated topic move, which allows topics to be seamlessly moved to a new cluster without disrupting producers or consumers.

Work on Kafka Team

  • Operated the Kafka control plane, which manages all of Netflix’s Kafka clusters.
  • Implemented cluster metadata improvements for the Kafka control plane.
  • Implemented authorization bug fixes for Netflix’s Kafka fork.
  • Planned the migration from non-secure Kafka clusters to secure Kafka clusters.

Work on Flink Team

  • Operated the Flink control plane, which manages thousands of production Flink jobs.
  • Implemented numerous bug fixes and improvements for Netflix’s Flink fork, Netflix’s internal Flink library, and the Flink control plane.
  • Optimized Netflix’s internal Avro serialization and deserialization libraries, achieving more than a 100x performance improvement.
  • Integrated Netflix’s Avro serialization and deserialization libraries with Flink.
  • Worked with our largest user to migrate their large stateful Flink jobs, each with 800+ CPUs, from JSON to Avro. After the migration, job state size and CPU count were cut in half, resulting in large cost savings.
  • Implemented robust autoscaling for tens of thousands of Flink jobs in production. This has been running in production for several years, reducing oncall burden and yielding large yearly cost savings.
  • LinkedIn Engineering autoscaling talk
  • Flink Forward autoscaling talk

Senior Staff Software Engineer

Apr 2019 - June 2019
MapR Technologies, San Jose, CA

Staff Software Engineer

Aug 2017 - Apr 2019
MapR Technologies, San Jose, CA

Work on MapR Cloud

  • Designed and created a service to deploy, manage, and access MapR clusters in the cloud. Evaluated AWS, Azure, and GCP across multiple dimensions to pick the cloud with the highest ROI on which to launch the service. Met with members of the Azure cloud and AKS (Azure Kubernetes Service) teams.
  • Deployed the service on Azure with AKS.
  • Designed security model to allow remote access to the MapR cluster through the browser for users.
  • Designed security model for operators to manage the clusters.

Work on Drill

  • Started the resource management project for concurrent queries on Drill.
  • Worked on operator spilling when processing large data sets.
  • Fixed race conditions and memory leaks in the execution engine.
  • General refactoring work to improve project testability, build stability, and maintainability.

Apache Drill Committer

May 2018 - Present
The Apache Software Foundation

Committer for Apache Drill.

Apache Apex Committer

Jun 2015 - Present
The Apache Software Foundation

Committer for Apache Apex.

Software Engineer

Dec 2016 - Aug 2017
GE Digital, San Ramon, CA

Built from scratch a scalable, fault-tolerant lambda function service similar to AWS Lambda.

  • Built the engine on top of Apache Mesos and wrote it in Go.
  • Used Docker for secure execution of lambda functions.
  • The engine supported realtime stream processing.
  • Created Go helper libraries for dependency injection and for writing Spring-style REST APIs.

Principal Software Engineer

Apr 2016 - Dec 2016
Logichub, Mountain View, CA

Worked on an automated cybersecurity threat detection platform.

  • Co-designed and developed a data processing platform on top of Apache Spark.
  • Designed and implemented domain-specific data caching and data provenance algorithms.
  • Used the Scala Play REST framework.
  • Wrote various Spark jobs.

Software Engineer

Aug 2014 - Apr 2016
DataTorrent, San Jose, CA

Worked on Apache Apex, a real-time stream processing platform built on top of Hadoop.

  • Implemented critical POCs for potential customers.
  • Fixed bugs and added features to Apache Apex and Apache Apex Malhar.
  • Built a data aggregation and visualization engine on top of Apache Apex, which was used in production.
  • Co-designed and authored App Data Tracker, which is an Apex application that allows users to collect, aggregate, and visualize metrics published by other apps on their cluster.

Software Engineer

Jul 2012 - Aug 2014
Oracle, Santa Clara, CA

  • Designed and implemented a benchmark analysis system which handled hundreds of millions of data points.
  • Wrote an agent that collected benchmark results generated by other teams.
  • Built a backend server on top of GlassFish which stored and queried results from MySQL.
  • Built a front end in Java Swing.
  • Designed a search algorithm which allowed the user to navigate the sparse high-dimensional data set.
  • Used the tool to catch critical performance regressions before they hit customers.
  • Maintained various pieces of infrastructure: internal website, wiki, and Solaris package server.