Engineering Online Experimentation and ML Evaluations

Architecture, Statistics and Machine Learning for Production-Scale Systems

Paperback, English, 2026

769 SEK

Forthcoming

Online experimentation is now essential for modern software and machine learning teams. This book provides an engineer-first, end-to-end guide to building and operating production-ready experimentation platforms. Part I establishes the core foundations of credible experimentation, including hypothesis testing, power analysis, sample sizing, metric design, and common pitfalls such as peeking, multiple testing, and novelty or learning effects. Part II focuses on platform engineering: traffic and identity management, mutual exclusion, event and logging design, ETL/ELT pipelines, building a stats engine with SciPy and statsmodels, SRM detection, integrating deployments with feature flags and canaries, and setting up guardrail and health monitoring. Part III presents advanced designs that improve speed and sensitivity: sequential testing with alpha spending, bootstrap intervals for ratios and quantiles, A/B/n testing with ANOVA, interleaving for ranking systems, switchback and geo experiments, and multi-armed bandits.
Part IV connects experimentation to ML workflows, covering offline, shadow, canary, and A/B evaluation pipelines; Bayesian optimization for adaptive experimentation; counterfactual and IPS methods for learning from logs; and safe retraining supported by strong governance.

What you will learn:
- Design trustworthy experiments with proper metrics, guardrails, α/power/MDE settings, and safeguards against peeking and multiple-testing errors
- Build a production-ready experimentation stack with assignment, identity/diversion, logging, ETL/ELT, a stats engine, and SRM checks
- Run advanced designs at scale, including sequential tests, bootstrap CIs, interleaving, switchback/geo experiments, and multi-armed bandits
- Evaluate ML systems from offline to online, leverage experiment logs for learning, and enable safe retraining with governance

Who this book is for:
The primary audience for this book includes Data Engineers, ML Engineers, and Platform or Software Architects. It is also well suited for Product and Data Scientists who want a deeper understanding of experimentation systems and the engineering principles behind them.

Product information

Release date: 2026-07-01
Dimensions: 178 x 254 mm
Format: Paperback
Language: English
Number of pages: 288
Publisher: Apress
ISBN: 9798868827204

Belongs to the following categories

Systems Science and AI, under Data and IT

Ming Lei is a data and ML engineering leader with 20 years of experience building end-to-end ML systems for internet ads and search, and experimentation platforms for e-commerce. He has designed large-scale systems that operationalize rigorous statistical methods (such as sequential testing, bootstrapping, multi-armed bandits, and Bayesian optimization) and support ML evaluation from offline analysis to online deployment. His leadership spans roles at eBay, Meta (Facebook), Google, and Appen. He holds multiple US patents and advanced degrees in computer science (UC Riverside) and economics (Clark University), along with a B.S. in physics (Wuhan University). He is based in the northwestern United States.

Part I: The Statistical and Foundational Core
  Chapter 1: The Experimentation Mindset
  Chapter 2: The Statistical Engine of Experimentation
  Chapter 3: Designing Trustworthy Experiments
  Chapter 4: Metric Design and Variance Reduction
Part II: Platform Engineering: Building a Production Experimentation System
  Chapter 5: Architecture of an Experimentation Platform
  Chapter 6: User Identity, Diversion, and Segmentation
  Chapter 7: Instrumentation and Event Design
  Chapter 8: The ETL/ELT Pipeline and Statistical Engine
  Chapter 9: Data Quality and Health Checks
  Chapter 10: Deployment and Release Strategies
Part III: Beyond Basic A/B Testing: Advanced Experimental Designs
  Chapter 11: Accelerating Experiments and Analyzing Complex Metrics
  Chapter 12: Advanced Designs: Multi-Variant and Factorial Experiments
  Chapter 13: Evaluating Ranking Systems: Online Interleaving Experiments
  Chapter 14: Switchback and Geo-Experiments: Testing on Time and Space
  Chapter 15: Multi-Armed Bandits: Balancing Exploration and Exploitation
  Chapter 16: Contextual Bandits: Personalized Exploration and Exploitation
Part IV: Online Experimentation for Machine Learning Systems
  Chapter 17: Testing Machine Learning Systems
  Chapter 18: Adaptive Experimentation for Model Optimization
  Chapter 19: Machine Learning from Experiment: Counterfactual Learning
  Chapter 20: Deploying Experiment-Trained Models: Safe Retraining Pipelines and Governance