Business intelligence is a broad category of applications and technologies for gathering, providing access to, and analyzing data to help enterprise users make better business decisions. The term implies comprehensive knowledge of all the factors that affect a business, such as customers, competitors, business partners, the economic environment, and internal operations, thereby enabling optimal decisions to be made. Business Intelligence provides readers with an introduction and practical guide to the mathematical models and analysis methodologies vital to business intelligence.

This book:
Combines detailed coverage with a practical guide to the mathematical models and analysis methodologies of business intelligence.
Covers all the hot topics, such as data warehousing, data mining and its applications, machine learning, classification, supply optimization models, decision support systems, and analytical methods for performance evaluation.
Is made accessible to readers through the careful definition and introduction of each concept, followed by the extensive use of examples and numerous real-life case studies.
Explains how to utilise mathematical models and analysis models to make effective, good-quality business decisions.

This book is aimed at postgraduate students following data analysis and data mining courses. Researchers looking for systematic and broad coverage of topics in operations research and mathematical models for decision-making will find this an invaluable guide.
Carlo Vercellis - School of Management, Politecnico di Milano, Italy
As well as teaching courses in Operations Research and Business Intelligence, Professor Vercellis is director of the research group MOLD (Mathematical Modeling, Optimization, Learning from Data). He has written four books in Italian, contributed to numerous other books, and has had many papers published in a variety of international journals.
Preface

I Components of the decision-making process
1 Business intelligence
1.1 Effective and timely decisions
1.2 Data, information and knowledge
1.3 The role of mathematical models
1.4 Business intelligence architectures
1.4.1 Cycle of a business intelligence analysis
1.4.2 Enabling factors in business intelligence projects
1.4.3 Development of a business intelligence system
1.5 Ethics and business intelligence
1.6 Notes and readings
2 Decision support systems
2.1 Definition of system
2.2 Representation of the decision-making process
2.2.1 Rationality and problem solving
2.2.2 The decision-making process
2.2.3 Types of decisions
2.2.4 Approaches to the decision-making process
2.3 Evolution of information systems
2.4 Definition of decision support system
2.5 Development of a decision support system
2.6 Notes and readings
3 Data warehousing
3.1 Definition of data warehouse
3.1.1 Data marts
3.1.2 Data quality
3.2 Data warehouse architecture
3.2.1 ETL tools
3.2.2 Metadata
3.3 Cubes and multidimensional analysis
3.3.1 Hierarchies of concepts and OLAP operations
3.3.2 Materialization of cubes of data
3.4 Notes and readings

II Mathematical Models and Methods
4 Mathematical models for decision making
4.1 Structure of mathematical models
4.2 Development of a model
4.3 Classes of models
4.4 Notes and readings
5 Data mining
5.1 Definition of data mining
5.1.1 Models and methods for data mining
5.1.2 Data mining, classical statistics and OLAP
5.1.3 Applications of data mining
5.2 Representation of input data
5.3 Data mining process
5.4 Analysis methodologies
5.5 Notes and readings
6 Data preparation
6.1 Data validation
6.1.1 Incomplete data
6.1.2 Data affected by noise
6.2 Data transformation
6.2.1 Standardization
6.2.2 Feature extraction
6.3 Data reduction
6.3.1 Sampling
6.3.2 Feature selection
6.3.3 Principal component analysis
6.3.4 Data discretization
7 Data exploration
7.1 Univariate analysis
7.1.1 Graphical analysis of categorical attributes
7.1.2 Graphical analysis of numerical attributes
7.1.3 Measures of central tendency for numerical attributes
7.1.4 Measures of dispersion for numerical attributes
7.1.5 Measures of relative location for numerical attributes
7.1.6 Identification of outliers for numerical attributes
7.1.7 Measures of heterogeneity for categorical attributes
7.1.8 Analysis of the empirical density
7.1.9 Summary statistics
7.2 Bivariate analysis
7.2.1 Graphical analysis
7.2.2 Measures of correlation for numerical attributes
7.2.3 Contingency tables for categorical attributes
7.3 Multivariate analysis
7.3.1 Graphical analysis
7.3.2 Measures of correlation for numerical attributes
7.4 Notes and readings
8 Regression
8.1 Structure of regression models
8.2 Simple linear regression
8.2.1 Calculating the regression line
8.3 Multiple linear regression
8.3.1 Calculating the regression coefficients
8.3.2 Assumptions on the residuals
8.3.3 Treatment of categorical predictive attributes
8.3.4 Ridge regression
8.3.5 Generalized linear regression
8.4 Validation of regression models
8.4.1 Normality and independence of the residuals
8.4.2 Significance of the coefficients
8.4.3 Analysis of variance
8.4.4 Coefficient of determination
8.4.5 Coefficient of linear correlation
8.4.6 Multicollinearity of the independent variables
8.4.7 Confidence and prediction limits
8.5 Selection of predictive variables
8.5.1 Example of development of a regression model
8.6 Notes and readings
9 Time series
9.1 Definition of time series
9.1.1 Index numbers
9.2 Evaluating time series models
9.2.1 Distortion measures
9.2.2 Dispersion measures
9.2.3 Tracking signal
9.3 Analysis of the components of time series
9.3.1 Moving average
9.3.2 Decomposition of a time series
9.4 Exponential smoothing models
9.4.1 Simple exponential smoothing
9.4.2 Exponential smoothing with trend adjustment
9.4.3 Exponential smoothing with trend and seasonality
9.4.4 Simple adaptive exponential smoothing
9.4.5 Exponential smoothing with damped trend
9.4.6 Initial values for exponential smoothing models
9.4.7 Removal of trend and seasonality
9.5 Autoregressive models
9.5.1 Moving average models
9.5.2 Autoregressive moving average models
9.5.3 Autoregressive integrated moving average models
9.5.4 Identification of autoregressive models
9.6 Combination of predictive models
9.7 The forecasting process
9.7.1 Characteristics of the forecasting process
9.7.2 Selection of a forecasting method
9.8 Notes and readings
10 Classification
10.1 Classification problems
10.1.1 Taxonomy of classification models
10.2 Evaluation of classification models
10.2.1 Holdout method
10.2.2 Repeated random sampling
10.2.3 Cross-validation
10.2.4 Confusion matrices
10.2.5 ROC curve charts
10.2.6 Cumulative gain and lift charts
10.3 Classification trees
10.3.1 Splitting rules
10.3.2 Univariate splitting criteria
10.3.3 Example of development of a classification tree
10.3.4 Stopping criteria and pruning rules
10.4 Bayesian methods
10.4.1 Naive Bayesian classifiers
10.4.2 Example of naive Bayes classifier
10.4.3 Bayesian networks
10.5 Logistic regression
10.6 Neural networks
10.6.1 The Rosenblatt perceptron
10.6.2 Multi-level feed-forward networks
10.7 Support vector machines
10.7.1 Structural risk minimization
10.7.2 Maximal margin hyperplane for linear separation
10.7.3 Nonlinear separation
10.8 Notes and readings
11 Association rules
11.1 Motivation and structure of association rules
11.2 Single-dimension association rules
11.3 Apriori algorithm
11.3.1 Generation of frequent itemsets
11.3.2 Generation of strong rules
11.4 General association rules
11.5 Notes and readings
12 Clustering
12.1 Clustering methods
12.1.1 Taxonomy of clustering methods
12.1.2 Affinity measures
12.2 Partition methods
12.2.1 K-means algorithm
12.2.2 K-medoids algorithm
12.3 Hierarchical methods
12.3.1 Agglomerative hierarchical methods
12.3.2 Divisive hierarchical methods
12.4 Evaluation of clustering models
12.5 Notes and readings

III Business Intelligence Applications
13 Marketing models
13.1 Relational marketing
13.1.1 Motivations and objectives
13.1.2 An environment for relational marketing analysis
13.1.3 Lifetime value
13.1.4 The effect of latency in predictive models
13.1.5 Acquisition
13.1.6 Retention
13.1.7 Cross-selling and up-selling
13.1.8 Market basket analysis
13.1.9 Web mining
13.2 Salesforce management
13.2.1 Decision processes in salesforce management
13.2.2 Models for salesforce management
13.2.3 Response functions
13.2.4 Sales territory design
13.2.5 Calls and product presentations planning
13.3 Business case studies
13.3.1 Retention in telecommunications
13.3.2 Acquisition in the automotive industry
13.3.3 Cross-selling in the retail industry
13.4 Notes and readings
14 Logistic and production models
14.1 Supply chain optimization
14.2 Optimization models for logistics planning
14.2.1 Tactical planning
14.2.2 Extra capacity
14.2.3 Multiple resources
14.2.4 Backlogging
14.2.5 Minimum lots and fixed costs
14.2.6 Bill of materials
14.2.7 Multiple plants
14.3 Revenue management systems
14.3.1 Decision processes in revenue management
14.4 Business case studies
14.4.1 Logistics planning in the food industry
14.4.2 Logistics planning in the packaging industry
14.5 Notes and readings
15 Data envelopment analysis
15.1 Efficiency measures
15.2 Efficient frontier
15.3 The CCR model
15.3.1 Definition of target objectives
15.3.2 Peer groups
15.4 Identification of good operating practices
15.4.1 Cross-efficiency analysis
15.4.2 Virtual inputs and virtual outputs
15.4.3 Weight restrictions
15.5 Other models
15.6 Notes and readings

Appendix A Software tools
Appendix B Dataset repositories
References
Index