About Me
Hello, and welcome. I'm Henry Li.
My career is dedicated to building the data foundations, scalable infrastructure, and safety systems that allow enterprises to genuinely trust their data and AI.
Solving today's enterprise AI bottlenecks takes a rare blend of rigorous statistical theory and large-scale engineering. Over the years, my work has spanned modeling unpredictable biological systems, architecting enterprise data quality monitors, and building autonomous AI agents. Today, I focus on translating the mathematical and technical underpinnings of these systems into practical, reliable, and secure business solutions — though I'd be the first to say that "trust" in a data or AI system is a moving target, and much of what makes that work interesting is that the goalposts keep shifting as the systems themselves evolve.
Industry Experience & Scaling AI
My approach to complex data problems was shaped by working inside large-scale tech operations.
Uber — Data Science
I joined Uber as a Data Scientist during a period of rapid expansion for its machine learning systems, where the fidelity of data feeding those models had become a critical bottleneck. I led the effort to build Uber's first internal data quality monitoring system for ML pipelines — an early blueprint for what would become the modern data monitoring industry — and was the lead author on the Uber Engineering post that introduced the work publicly.
What struck me then, and still feels unresolved, is how much of "data quality" resists formal definition: a dataset can pass every statistical check and still be subtly wrong in ways only a domain expert would catch. That gap between measurable quality and meaningful quality is, I think, one of the more interesting open questions in the field.
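A toy sketch of what I mean (purely illustrative — not any production system, and the check names are invented): a batch whose labels were silently swapped upstream sails through the kinds of basic checks a monitor might run.

```python
# Yesterday's currency column vs. today's, where an upstream job
# silently swapped the labels. Every amount is now mislabeled.
good = ["USD"] * 95 + ["EUR"] * 5
bad = ["EUR"] * 95 + ["USD"] * 5

def passes_basic_checks(col, expected_rows=100, allowed=("USD", "EUR")):
    """Checks a monitor might plausibly run: row count, nulls, value domain."""
    return (
        len(col) == expected_rows
        and all(v is not None for v in col)
        and set(col) <= set(allowed)
    )

print(passes_basic_checks(good))  # True
print(passes_basic_checks(bad))   # True -- yet the data is semantically wrong
```

Both batches pass; only someone who knows that this business is overwhelmingly USD-denominated would notice the problem.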
Bigeye — Founding Data Scientist
Recognizing that every modern enterprise needed this level of observability, I joined Bigeye as its Founding Data Scientist. From 2020 through 2023, I translated hyperscale internal monitoring tools into a flexible, enterprise-grade platform. I architected the machine learning and automation engines at the core of the product, abstracting away the complexity of data observability so clients could automatically detect and remediate anomalies. We deployed these capabilities to enterprise leaders like Zoom, Confluent, and Instacart, as well as to highly regulated organizations within the Intelligence Community — an experience that deeply informed my understanding of how enterprise buyers evaluate and procure critical infrastructure.
One lesson I carried away: the hardest part of observability is rarely the detection; it's deciding which anomalies warrant human attention in a world where data is always drifting a little. That prioritization problem only grows when AI systems start reacting to the data in real time.
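One way to think about that prioritization problem, as a hypothetical sketch (field names, weights, and the tolerance are all invented for illustration, not drawn from any product): weight statistical severity by blast radius, and tolerate routine drift.

```python
# Invented anomaly records: how far a metric deviated (in z-score-like
# units) and how many downstream consumers depend on the table.
anomalies = [
    {"table": "payments", "deviation": 6.0, "downstream_consumers": 12},
    {"table": "logs_raw", "deviation": 9.0, "downstream_consumers": 1},
    {"table": "users", "deviation": 2.5, "downstream_consumers": 7},
]

def triage_score(a, drift_tolerance=3.0):
    """Ignore deviation within normal drift; scale the rest by impact."""
    excess = max(0.0, a["deviation"] - drift_tolerance)
    return excess * a["downstream_consumers"]

ranked = sorted(anomalies, key=triage_score, reverse=True)
print([a["table"] for a in ranked])
# ['payments', 'logs_raw', 'users'] -- moderate deviation with wide
# impact outranks a larger deviation on a table almost nobody reads
```

The interesting design question is hidden in `drift_tolerance`: set it by hand and it goes stale; learn it from history and an AI system reacting to the data can shift the baseline underneath it.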
Nova AI — Co-Founder
After Bigeye, I co-founded Nova AI, an agentic software testing company. Working with clients such as JB Hi-Fi, we transformed manual QA workflows into environments assisted by AI agents, substantially reducing test-writing time and freeing engineering teams to focus on the user paths that actually mattered. Building agents that act on real codebases made the reliability question concrete in a new way — it's one thing to generate a plausible test, another to trust an agent to know when it shouldn't act. I suspect the industry is still early in working out how to give these systems a well-calibrated sense of their own uncertainty.
Academic Foundations & Mentorship
My approach to drawing precise, replicable insights from highly interdependent systems is rooted in my academic training. I hold a PhD in Structural Biology (Biophysics) and a Master of Public Policy from Stanford, along with a BA in Statistics and Biochemistry from UC Berkeley.
Throughout my academic journey, I had the privilege of learning from and collaborating with some of the most accomplished scientists and economists of their generation. I mention them by name below not to claim credit for their lifelong work, but to express my gratitude — observing their intellect up close and learning from their uncompromising standard of rigor continues to shape how I approach problems today.
UC Berkeley — Biophysics of DNA Replication
At UC Berkeley, I studied the biophysical mechanisms of DNA replication under Professor John Kuriyan and Dr. Brian Kelch. Running X-ray diffraction experiments at Lawrence Berkeley National Laboratory taught me how to navigate and extract insights from intricate biological systems — and, perhaps more importantly, how much of scientific progress depends on being comfortable with partial answers while the fuller picture takes shape.
Stanford — Statistics & Computational Biology
At Stanford, I was drawn to the intersection of statistics and computational biology, and was advised by an extraordinary group of scientists: Professor Wing Hung Wong (recipient of the COPSS Presidents' Award in Statistics), Professor Michael Levitt (Nobel Laureate in Chemistry), Professor Chiara Sabatti, and Professor Garry Nolan. Under their guidance, I developed projects modeling cellular states through high-dimensional gene networks, with peer-reviewed publications in PLOS Computational Biology and the Annals of Applied Statistics.
High-dimensional biology is a humbling area to work in: the models are always reductions of something far more tangled, and the question of what we're justified in concluding from them is one I still think about in my current work on AI systems.
Stanford — Public Policy & Biosecurity
A parallel interest in how policy is used to manage complex societal systems led me to Stanford's Master of Public Policy program, where I worked with Professor Joe Nation and Professor Bill Sharpe (Nobel Laureate in Economics). Together we applied rigorous statistical simulations to model the multi-decade behavior of public pension systems, evaluating policy shifts to improve long-term solvency.
This dual-track education also introduced me to biosecurity. I collaborated with Dr. Milana Trounce, Director of Biosecurity at Stanford, to co-author a publication on the academic discourse of the field. Analyzing how institutions evaluate and mitigate catastrophic biological risks fundamentally shaped my view of systemic threat models and risk governance.
Selected Work
🔍 Data anomaly detection
- Lessons Learned from Uber: Designing an Intelligent Data Quality Monitor — Bigeye Publication, 2021.
- Anomaly Detection Part 1: The Key to Effective Data Observability — Bigeye Publication, 2021.
- Anomaly Detection Part 2: The Bigeye Approach — Bigeye Publication, 2021.
- Data Anomaly Detection Requires Hindsight and Foresight — Bigeye Publication, 2022.
📈 Intelligent infrastructure
Data anomaly detection at scale
- Monitoring Data Quality at Scale with Statistical Modeling — Uber Engineering Blog, 2020.
Other Uber internal contributions
- Intelligent code deployment and rollback
- Cost planning and data asset usage
🧬 Biosciences and statistical research
- Partition-Assisted Clustering and Multiple Alignments of Networks (PAC-MAN): Accurate and Scalable High-Dimensional Multi-Sample Single-Cell Data Analysis — PLOS Computational Biology, 2017.
- Learning a Nonlinear Dynamical System Model of Gene Regulation: A Perturbed Steady-State Approach — Annals of Applied Statistics, 2013.
- Modeling Stochastic Noise in Gene Regulatory Systems — Quantitative Biology, 2014.
- Genome-Wide Mapping of DNA Hydroxymethylation in Osteoarthritic Chondrocytes — Arthritis & Rheumatology, 2015.
- Stable 5-Hydroxymethylcytosine (5hmC) Acquisition Marks Gene Activation During Chondrogenic Differentiation — Journal of Bone and Mineral Research, 2015.
- Analysis of Scholarly Discourse in Biosecurity from 1986–2020 — bioRxiv, 2020.
🤓 Opinion pieces on data science
- You Need to Be Constantly Exploring the Data in Your AI Pipeline — VentureBeat, 2021.
- Solving One of ML and AI's Biggest Challenges: Exploratory Data Analysis — Datanami, 2021.
- Calling All Data Scientists: Data Observability Needs You — Data Science Central, 2022.
- Making Sense of Machine Learning and Artificial Intelligence Models by Monitoring the Training Data — Bigeye Publication, 2023.
Contact Me
Find me here: