Practical Guide to Data Mining for Business and Industry

Hardback, English, 2014

By Andrea Ahlemeyer-Stubbe, Shirley Coleman

1,109 SEK

Data mining is well on its way to becoming a recognized discipline in the overlapping areas of IT, statistics, machine learning, and AI. A Practical Guide to Data Mining for Business and Industry presents a user-friendly approach to data mining methods and covers the typical uses to which they are applied. The methodology is complemented by case studies, making the book a versatile reference that lets readers look up specific methods as well as specific applications. The book is organised so that statisticians, computer scientists, and economists can cross-reference from a particular application or method to the sectors that interest them.

Product information

Publication date: 2014-05-13
Dimensions: 155 x 231 x 23 mm
Weight: 522 g
Format: Hardback
Language: English
Pages: 328
Publisher: John Wiley & Sons Inc
ISBN: 9781119977131

Categories

Databases (Computing and IT)
Business Negotiation (Economics and Leadership)
Business Administration (Economics and Leadership)

Table of contents

Glossary of terms

Part I Data Mining Concept

1 Introduction
1.1 Aims of the Book
1.2 Data Mining Context
1.2.1 Domain Knowledge
1.2.2 Words to Remember
1.2.3 Associated Concepts
1.3 Global Appeal
1.4 Example Datasets Used in This Book
1.5 Recipe Structure
1.6 Further Reading and Resources

2 Data Mining Definition
2.1 Types of Data Mining Questions
2.1.1 Population and Sample
2.1.2 Data Preparation
2.1.3 Supervised and Unsupervised Methods
2.1.4 Knowledge-Discovery Techniques
2.2 Data Mining Process
2.3 Business Task: Clarification of the Business Question behind the Problem
2.4 Data: Provision and Processing of the Required Data
2.4.1 Fixing the Analysis Period
2.4.2 Basic Unit of Interest
2.4.3 Target Variables
2.4.4 Input Variables/Explanatory Variables
2.5 Modelling: Analysis of the Data
2.6 Evaluation and Validation during the Analysis Stage
2.7 Application of Data Mining Results and Learning from the Experience

Part II Data Mining Practicalities

3 All about data
3.1 Some Basics
3.1.1 Data, Information, Knowledge and Wisdom
3.1.2 Sources and Quality of Data
3.1.3 Measurement Level and Types of Data
3.1.4 Measures of Magnitude and Dispersion
3.1.5 Data Distributions
3.2 Data Partition: Random Samples for Training, Testing and Validation
3.3 Types of Business Information Systems
3.3.1 Operational Systems Supporting Business Processes
3.3.2 Analysis-Based Information Systems
3.3.3 Importance of Information
3.4 Data Warehouses
3.4.1 Topic Orientation
3.4.2 Logical Integration and Homogenisation
3.4.3 Reference Period
3.4.4 Low Volatility
3.4.5 Using the Data Warehouse
3.5 Three Components of a Data Warehouse: DBMS, DB and DBCS
3.5.1 Database Management System (DBMS)
3.5.2 Database (DB)
3.5.3 Database Communication Systems (DBCS)
3.6 Data Marts
3.6.1 Regularly Filled Data Marts
3.6.2 Comparison between Data Marts and Data Warehouses
3.7 A Typical Example from the Online Marketing Area
3.8 Unique Data Marts
3.8.1 Permanent Data Marts
3.8.2 Data Marts Resulting from Complex Analysis
3.9 Data Mart: Do’s and Don’ts
3.9.1 Do’s and Don’ts for Processes
3.9.2 Do’s and Don’ts for Handling
3.9.3 Do’s and Don’ts for Coding/Programming

4 Data Preparation
4.1 Necessity of Data Preparation
4.2 From Small and Long to Short and Wide
4.3 Transformation of Variables
4.4 Missing Data and Imputation Strategies
4.5 Outliers
4.6 Dealing with the Vagaries of Data
4.6.1 Distributions
4.6.2 Tests for Normality
4.6.3 Data with Totally Different Scales
4.7 Adjusting the Data Distributions
4.7.1 Standardisation and Normalisation
4.7.2 Ranking
4.7.3 Box–Cox Transformation
4.8 Binning
4.8.1 Bucket Method
4.8.2 Analytical Binning for Nominal Variables
4.8.3 Quantiles
4.8.4 Binning in Practice
4.9 Timing Considerations
4.10 Operational Issues

5 Analytics
5.1 Introduction
5.2 Basis of Statistical Tests
5.2.1 Hypothesis Tests and P Values
5.2.2 Tolerance Intervals
5.2.3 Standard Errors and Confidence Intervals
5.3 Sampling
5.3.1 Methods
5.3.2 Sample Sizes
5.3.3 Sample Quality and Stability
5.4 Basic Statistics for Pre-analytics
5.4.1 Frequencies
5.4.2 Comparative Tests
5.4.3 Cross Tabulation and Contingency Tables
5.4.4 Correlations
5.4.5 Association Measures for Nominal Variables
5.4.6 Examples of Output from Comparative and Cross Tabulation Tests
5.5 Feature Selection/Reduction of Variables
5.5.1 Feature Reduction Using Domain Knowledge
5.5.2 Feature Selection Using Chi-Square
5.5.3 Principal Components Analysis and Factor Analysis
5.5.4 Canonical Correlation, PLS and SEM
5.5.5 Decision Trees
5.5.6 Random Forests
5.6 Time Series Analysis

6 Methods
6.1 Methods Overview
6.2 Supervised Learning
6.2.1 Introduction and Process Steps
6.2.2 Business Task
6.2.3 Provision and Processing of the Required Data
6.2.4 Analysis of the Data
6.2.5 Evaluation and Validation of the Results (during the Analysis)
6.2.6 Application of the Results
6.3 Multiple Linear Regression for use when Target is Continuous
6.3.1 Rationale of Multiple Linear Regression Modelling
6.3.2 Regression Coefficients
6.3.3 Assessment of the Quality of the Model
6.3.4 Example of Linear Regression in Practice
6.4 Regression when the Target is not Continuous
6.4.1 Logistic Regression
6.4.2 Example of Logistic Regression in Practice
6.4.3 Discriminant Analysis
6.4.4 Log-Linear Models and Poisson Regression
6.5 Decision Trees
6.5.1 Overview
6.5.2 Selection Procedures of the Relevant Input Variables
6.5.3 Splitting Criteria
6.5.4 Number of Splits (Branches of the Tree)
6.5.5 Symmetry/Asymmetry
6.5.6 Pruning
6.6 Neural Networks
6.7 Which Method Produces the Best Model? A Comparison of Regression, Decision Trees and Neural Networks
6.8 Unsupervised Learning
6.8.1 Introduction and Process Steps
6.8.2 Business Task
6.8.3 Provision and Processing of the Required Data
6.8.4 Analysis of the Data
6.8.5 Evaluation and Validation of the Results (during the Analysis)
6.8.6 Application of the Results
6.9 Cluster Analysis
6.9.1 Introduction
6.9.2 Hierarchical Cluster Analysis
6.9.3 K-Means Method of Cluster Analysis
6.9.4 Example of Cluster Analysis in Practice
6.10 Kohonen Networks and Self-Organising Maps
6.10.1 Description
6.10.2 Example of SOMs in Practice
6.11 Group Purchase Methods: Association and Sequence Analysis
6.11.1 Introduction
6.11.2 Analysis of the Data
6.11.3 Group Purchase Methods
6.11.4 Examples of Group Purchase Methods in Practice

7 Validation and Application
7.1 Introduction to Methods for Validation
7.2 Lift and Gain Charts
7.3 Model Stability
7.4 Sensitivity Analysis
7.5 Threshold Analytics and Confusion Matrix
7.6 ROC Curves
7.7 Cross-Validation and Robustness
7.8 Model Complexity

Part III Data Mining in Action

8 Marketing: Prediction
8.1 Recipe 1: Response Optimisation: to Find and Address the Right Number of Customers
8.2 Recipe 2: To Find the x% of Customers with the Highest Affinity to an Offer
8.3 Recipe 3: To Find the Right Number of Customers to Ignore
8.4 Recipe 4: To Find the x% of Customers with the Lowest Affinity to an Offer
8.5 Recipe 5: To Find the x% of Customers with the Highest Affinity to Buy
8.6 Recipe 6: To Find the x% of Customers with the Lowest Affinity to Buy
8.7 Recipe 7: To Find the x% of Customers with the Highest Affinity to a Single Purchase
8.8 Recipe 8: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Communication Areas
8.9 Recipe 9: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Insurance Areas

9 Intra-Customer Analysis
9.1 Recipe 10: To Find the Optimal Amount of Single Communication to Activate One Customer
9.2 Recipe 11: To Find the Optimal Communication Mix to Activate One Customer
9.3 Recipe 12: To Find and Describe Homogeneous Groups of Products
9.4 Recipe 13: To Find and Describe Groups of Customers with Homogeneous Usage
9.5 Recipe 14: To Predict the Order Size of Single Products or Product Groups
9.6 Recipe 15: Product Set Combination
9.7 Recipe 16: To Predict the Future Customer Lifetime Value of a Customer

10 Learning from a Small Testing Sample and Prediction
10.1 Recipe 17: To Predict Demographic Signs (Like Sex, Age, Education and Income)
10.2 Recipe 18: To Predict the Potential Customers of a Brand New Product or Service in Your Databases
10.3 Recipe 19: To Understand Operational Features and General Business Forecasting

11 Miscellaneous
11.1 Recipe 20: To Find Customers Who Will Potentially Churn
11.2 Recipe 21: Indirect Churn Based on a Discontinued Contract
11.3 Recipe 22: Social Media Target Group Descriptions
11.4 Recipe 23: Web Monitoring
11.5 Recipe 24: To Predict Who is Likely to Click on a Special Banner

12 Software and Tools: A Quick Guide
12.1 List of Requirements When Choosing a Data Mining Tool
12.2 Introduction to the Idea of Fully Automated Modelling (FAM)
12.2.1 Predictive Behavioural Targeting
12.2.2 Fully Automatic Predictive Targeting and Modelling Real-Time Online Behaviour
12.3 FAM Function
12.4 FAM Architecture
12.5 FAM Data Flows and Databases
12.6 FAM Modelling Aspects
12.7 FAM Challenges and Critical Success Factors
12.8 FAM Summary

13 Overviews
13.1 To Make Use of Official Statistics
13.2 How to Use Simple Maths to Make an Impression
13.2.1 Approximations
13.2.2 Absolute and Relative Values
13.2.3 % Change
13.2.4 Values in Context
13.2.5 Confidence Intervals
13.2.6 Rounding
13.2.7 Tables
13.2.8 Figures
13.3 Differences between Statistical Analysis and Data Mining
13.3.1 Assumptions
13.3.2 Values Missing Because ‘Nothing Happened’
13.3.3 Sample Sizes
13.3.4 Goodness-of-Fit Tests
13.3.5 Model Complexity
13.4 How to Use Data Mining in Different Industries
13.5 Future Views

Bibliography
Index

“A Practical Guide to Data Mining for Business and Industry gives practical tools on how information can be extracted from masses of data. The book is very well written, in a conversational tone that makes it enjoyable to read. The authors are excellent communicators. If you are interested in learning about data mining, learning to do a particular task in data mining, looking for a textbook to use in a data mining or analytics course, or have a problem or data analytic task you are working on, this book would be an excellent place to start.” (Mathematical Association of America, 23 August 2014)