Advances in Financial Machine Learning

Hardcover, English, 2018

619 kr

Made to order. Ships within 7-10 business days

Free shipping for members on purchases of at least 249 kr.

Learn to understand and implement the latest machine learning innovations to improve your investment performance

Machine learning (ML) is changing virtually every aspect of our lives. Today, ML algorithms accomplish tasks that – until recently – only expert humans could perform. And finance is ripe for disruptive innovations that will transform how the following generations understand money and invest.

In the book, readers will learn how to:

Structure big data in a way that is amenable to ML algorithms
Conduct research with ML algorithms on big data
Use supercomputing methods and backtest their discoveries while avoiding false positives

Advances in Financial Machine Learning addresses real-life problems faced by practitioners every day, and explains scientifically sound solutions using math, supported by code and examples. Readers become active users who can test the proposed solutions in their individual setting.

Written by a recognized expert and portfolio manager, this book will equip investment professionals with the groundbreaking tools needed to succeed in modern finance.

Product information

Publication date: 2018-05-04
Dimensions: 158 x 231 x 31 mm
Weight: 816 g
Format: Hardcover
Language: English
Number of pages: 400
Publisher: John Wiley & Sons Inc
ISBN: 9781119482086

Belongs to the following categories

Finance, in Economics and Leadership
Systems Science and AI, in Data and IT

DR. MARCOS LÓPEZ DE PRADO is a principal and head of machine learning at AQR Capital Management. Marcos is also a research fellow at Lawrence Berkeley National Laboratory (U.S. Department of Energy, Office of Science). SSRN ranks him as one of the most-read authors in economics, and he has published dozens of scientific articles on machine learning and supercomputing in the leading academic journals. Marcos earned a PhD in financial economics (2003) and a second PhD in mathematical finance (2011) from Universidad Complutense de Madrid, and is a recipient of Spain's National Award for Academic Excellence (1999). He completed his post-doctoral research at Harvard University and Cornell University, where he teaches a graduate course in financial machine learning at the School of Engineering. Marcos has an Erdős #2 and an Einstein #4 according to the American Mathematical Society.

About the Author
PREAMBLE
1 Financial Machine Learning as a Distinct Subject
  1.1 Motivation
  1.2 The Main Reason Financial Machine Learning Projects Usually Fail
    1.2.1 The Sisyphus Paradigm
    1.2.2 The Meta-Strategy Paradigm
  1.3 Book Structure
    1.3.1 Structure by Production Chain
    1.3.2 Structure by Strategy Component
    1.3.3 Structure by Common Pitfall
  1.4 Target Audience
  1.5 Requisites
  1.6 FAQs
  1.7 Acknowledgments
  Exercises
  References
  Bibliography
Part 1 Data Analysis
2 Financial Data Structures
  2.1 Motivation
  2.2 Essential Types of Financial Data
    2.2.1 Fundamental Data
    2.2.2 Market Data
    2.2.3 Analytics
    2.2.4 Alternative Data
  2.3 Bars
    2.3.1 Standard Bars
    2.3.2 Information-Driven Bars
  2.4 Dealing with Multi-Product Series
    2.4.1 The ETF Trick
    2.4.2 PCA Weights
    2.4.3 Single Future Roll
  2.5 Sampling Features
    2.5.1 Sampling for Reduction
    2.5.2 Event-Based Sampling
  Exercises
  References
3 Labeling
  3.1 Motivation
  3.2 The Fixed-Time Horizon Method
  3.3 Computing Dynamic Thresholds
  3.4 The Triple-Barrier Method
  3.5 Learning Side and Size
  3.6 Meta-Labeling
  3.7 How to Use Meta-Labeling
  3.8 The Quantamental Way
  3.9 Dropping Unnecessary Labels
  Exercises
  Bibliography
4 Sample Weights
  4.1 Motivation
  4.2 Overlapping Outcomes
  4.3 Number of Concurrent Labels
  4.4 Average Uniqueness of a Label
  4.5 Bagging Classifiers and Uniqueness
    4.5.1 Sequential Bootstrap
    4.5.2 Implementation of Sequential Bootstrap
    4.5.3 A Numerical Example
    4.5.4 Monte Carlo Experiments
  4.6 Return Attribution
  4.7 Time Decay
  4.8 Class Weights
  Exercises
  References
  Bibliography
5 Fractionally Differentiated Features
  5.1 Motivation
  5.2 The Stationarity vs. Memory Dilemma
  5.3 Literature Review
  5.4 The Method
    5.4.1 Long Memory
    5.4.2 Iterative Estimation
    5.4.3 Convergence
  5.5 Implementation
    5.5.1 Expanding Window
    5.5.2 Fixed-Width Window Fracdiff
  5.6 Stationarity with Maximum Memory Preservation
  5.7 Conclusion
  Exercises
  References
  Bibliography
Part 2 Modelling
6 Ensemble Methods
  6.1 Motivation
  6.2 The Three Sources of Errors
  6.3 Bootstrap Aggregation
    6.3.1 Variance Reduction
    6.3.2 Improved Accuracy
    6.3.3 Observation Redundancy
  6.4 Random Forest
  6.5 Boosting
  6.6 Bagging vs. Boosting in Finance
  6.7 Bagging for Scalability
  Exercises
  References
  Bibliography
7 Cross-Validation in Finance
  7.1 Motivation
  7.2 The Goal of Cross-Validation
  7.3 Why K-Fold CV Fails in Finance
  7.4 A Solution: Purged K-Fold CV
    7.4.1 Purging the Training Set
    7.4.2 Embargo
    7.4.3 The Purged K-Fold Class
  7.5 Bugs in Sklearn’s Cross-Validation
  Exercises
  Bibliography
8 Feature Importance
  8.1 Motivation
  8.2 The Importance of Feature Importance
  8.3 Feature Importance with Substitution Effects
    8.3.1 Mean Decrease Impurity
    8.3.2 Mean Decrease Accuracy
  8.4 Feature Importance without Substitution Effects
    8.4.1 Single Feature Importance
    8.4.2 Orthogonal Features
  8.5 Parallelized vs. Stacked Feature Importance
  8.6 Experiments with Synthetic Data
  Exercises
  References
9 Hyper-Parameter Tuning with Cross-Validation
  9.1 Motivation
  9.2 Grid Search Cross-Validation
  9.3 Randomized Search Cross-Validation
    9.3.1 Log-Uniform Distribution
  9.4 Scoring and Hyper-parameter Tuning
  Exercises
  References
  Bibliography
Part 3 Backtesting
10 Bet Sizing
  10.1 Motivation
  10.2 Strategy-Independent Bet Sizing Approaches
  10.3 Bet Sizing from Predicted Probabilities
  10.4 Averaging Active Bets
  10.5 Size Discretization
  10.6 Dynamic Bet Sizes and Limit Prices
  Exercises
  References
  Bibliography
11 The Dangers of Backtesting
  11.1 Motivation
  11.2 Mission Impossible: The Flawless Backtest
  11.3 Even If Your Backtest Is Flawless, It Is Probably Wrong
  11.4 Backtesting Is Not a Research Tool
  11.5 A Few General Recommendations
  11.6 Strategy Selection
  Exercises
  References
  Bibliography
12 Backtesting through Cross-Validation
  12.1 Motivation
  12.2 The Walk-Forward Method
    12.2.1 Pitfalls of the Walk-Forward Method
  12.3 The Cross-Validation Method
  12.4 The Combinatorial Purged Cross-Validation Method
    12.4.1 Combinatorial Splits
    12.4.2 The Combinatorial Purged Cross-Validation Backtesting Algorithm
    12.4.3 A Few Examples
  12.5 How Combinatorial Purged Cross-Validation Addresses Backtest Overfitting
  Exercises
  References
13 Backtesting on Synthetic Data
  13.1 Motivation
  13.2 Trading Rules
  13.3 The Problem
  13.4 Our Framework
  13.5 Numerical Determination of Optimal Trading Rules
    13.5.1 The Algorithm
    13.5.2 Implementation
  13.6 Experimental Results
    13.6.1 Cases with Zero Long-Run Equilibrium
    13.6.2 Cases with Positive Long-Run Equilibrium
    13.6.3 Cases with Negative Long-Run Equilibrium
  13.7 Conclusion
  Exercises
  References
14 Backtest Statistics
  14.1 Motivation
  14.2 Types of Backtest Statistics
  14.3 General Characteristics
  14.4 Performance
    14.4.1 Time-Weighted Rate of Return
  14.5 Runs
    14.5.1 Returns Concentration
    14.5.2 Drawdown and Time under Water
    14.5.3 Runs Statistics for Performance Evaluation
  14.6 Implementation Shortfall
  14.7 Efficiency
    14.7.1 The Sharpe Ratio
    14.7.2 The Probabilistic Sharpe Ratio
    14.7.3 The Deflated Sharpe Ratio
    14.7.4 Efficiency Statistics
  14.8 Classification Scores
  14.9 Attribution
  Exercises
  References
  Bibliography
15 Understanding Strategy Risk
  15.1 Motivation
  15.2 Symmetric Payouts
  15.3 Asymmetric Payouts
  15.4 The Probability of Strategy Failure
    15.4.1 Algorithm
    15.4.2 Implementation
  Exercises
  References
16 Machine Learning Asset Allocation
  16.1 Motivation
  16.2 The Problem with Convex Portfolio Optimization
  16.3 Markowitz’s Curse
  16.4 From Geometric to Hierarchical Relationships
    16.4.1 Tree Clustering
    16.4.2 Quasi-Diagonalization
    16.4.3 Recursive Bisection
  16.5 A Numerical Example
  16.6 Out-of-Sample Monte Carlo Simulations
  16.7 Further Research
  16.8 Conclusion
  Appendices
    16.A.1 Correlation-based Metric
    16.A.2 Inverse Variance Allocation
    16.A.3 Reproducing the Numerical Example
    16.A.4 Reproducing the Monte Carlo Experiment
  Exercises
  References
Part 4 Useful Financial Features
17 Structural Breaks
  17.1 Motivation
  17.2 Types of Structural Break Tests
  17.3 CUSUM Tests
    17.3.1 Brown-Durbin-Evans CUSUM Test on Recursive Residuals
    17.3.2 Chu-Stinchcombe-White CUSUM Test on Levels
  17.4 Explosiveness Tests
    17.4.1 Chow-Type Dickey-Fuller Test
    17.4.2 Supremum Augmented Dickey-Fuller
    17.4.3 Sub- and Super-Martingale Tests
  Exercises
  References
18 Entropy Features
  18.1 Motivation
  18.2 Shannon’s Entropy
  18.3 The Plug-in (or Maximum Likelihood) Estimator
  18.4 Lempel-Ziv Estimators
  18.5 Encoding Schemes
    18.5.1 Binary Encoding
    18.5.2 Quantile Encoding
    18.5.3 Sigma Encoding
  18.6 Entropy of a Gaussian Process
  18.7 Entropy and the Generalized Mean
  18.8 A Few Financial Applications of Entropy
    18.8.1 Market Efficiency
    18.8.2 Maximum Entropy Generation
    18.8.3 Portfolio Concentration
    18.8.4 Market Microstructure
  Exercises
  References
  Bibliography
19 Microstructural Features
  19.1 Motivation
  19.2 Review of the Literature
  19.3 First Generation: Price Sequences
    19.3.1 The Tick Rule
    19.3.2 The Roll Model
    19.3.3 High-Low Volatility Estimator
    19.3.4 Corwin and Schultz
  19.4 Second Generation: Strategic Trade Models
    19.4.1 Kyle’s Lambda
    19.4.2 Amihud’s Lambda
    19.4.3 Hasbrouck’s Lambda
  19.5 Third Generation: Sequential Trade Models
    19.5.1 Probability of Information-based Trading
    19.5.2 Volume-Synchronized Probability of Informed Trading
  19.6 Additional Features from Microstructural Datasets
    19.6.1 Distribution of Order Sizes
    19.6.2 Cancellation Rates, Limit Orders, Market Orders
    19.6.3 Time-Weighted Average Price Execution Algorithms
    19.6.4 Options Markets
    19.6.5 Serial Correlation of Signed Order Flow
  19.7 What Is Microstructural Information?
  Exercises
  References
Part 5 High-performance Computing Recipes
20 Multiprocessing and Vectorization
  20.1 Motivation
  20.2 Vectorization Example
  20.3 Single-Thread vs. Multithreading vs. Multiprocessing
  20.4 Atoms and Molecules
    20.4.1 Linear Partitions
    20.4.2 Two-Nested Loops Partitions
  20.5 Multiprocessing Engines
    20.5.1 Preparing the Jobs
    20.5.2 Asynchronous Calls
    20.5.3 Unwrapping the Callback
    20.5.4 Pickle/Unpickle Objects
    20.5.5 Output Reduction
  20.6 Multiprocessing Example
  Exercises
  Reference
  Bibliography
21 Brute Force and Quantum Computers
  21.1 Motivation
  21.2 Combinatorial Optimization
  21.3 The Objective Function
  21.4 The Problem
  21.5 An Integer Optimization Approach
    21.5.1 Pigeonhole Partitions
    21.5.2 Feasible Static Solutions
    21.5.3 Evaluating Trajectories
  21.6 A Numerical Example
    21.6.1 Random Matrices
    21.6.2 Static Solution
    21.6.3 Dynamic Solution
  Exercises
  References
22 High-Performance Computational Intelligence and Forecasting Technologies (Kesheng Wu and Horst D. Simon)
  22.1 Motivation
  22.2 Regulatory Response to the Flash Crash of 2010
  22.3 Background
  22.4 HPC Hardware
  22.5 HPC Software
    22.5.1 Message Passing Interface
    22.5.2 Hierarchical Data Format 5
    22.5.3 In Situ Processing
    22.5.4 Convergence
  22.6 Use Cases
    22.6.1 Supernova Hunting
    22.6.2 Blobs in Fusion Plasma
    22.6.3 Intraday Peak Electricity Usage
    22.6.4 The Flash Crash of 2010
    22.6.5 Volume-synchronized Probability of Informed Trading Calibration
    22.6.6 Revealing High Frequency Events with Non-uniform Fast Fourier Transform
  22.7 Summary and Call for Participation
  22.8 Acknowledgments
  References
Index