Intelligent Data Analysis
From Data Gathering to Data Comprehension
Hardcover, English, 2020
By Deepak Gupta (Dr. APJ Abdul Kalam Technical University, Lucknow, India), Siddhartha Bhattacharyya (CHRIST (Deemed to be University), Bengaluru, India), Ashish Khanna (National Institute of Technology, Kurukshetra, India), Kalpna Sagar (Guru Gobind Singh Indraprastha University, Delhi, India)
1 879 kr
Product information
- Publication date: 2020-06-25
- Dimensions: 178 x 246 x 28 mm
- Weight: 907 g
- Format: Hardcover
- Language: English
- Series: Wiley Series in Intelligent Signal and Data Processing
- Number of pages: 432
- Publisher: John Wiley & Sons Inc
- ISBN: 9781119544456
Deepak Gupta completed his PhD (CSE) at Dr. APJ Abdul Kalam Technical University, Lucknow, India; his M.E. (CTA) at Delhi College of Engineering, New Delhi, India; and his B.Tech at Guru Gobind Singh Indraprastha University, Delhi, India. He completed his postdoctoral research on the Internet of Things (IoT) at the National Institute of Telecommunications, Ghaziabad, India. He is a guest editor for SCI and SCOPUS and has co-authored 38 books and published 95 research papers.

Siddhartha Bhattacharyya, PhD, is a Professor of Computer Science at CHRIST (Deemed to be University), Bengaluru, India.

Ashish Khanna received his PhD from the National Institute of Technology, Kurukshetra, India. He completed his M.Tech and B.Tech at Guru Gobind Singh Indraprastha University, Delhi, India in 2004. He has published 100 research papers and co-authored 22 textbooks on engineering. His research includes distributed computing, distributed systems, cloud computing, and opportunistic networks. He completed his postdoctoral research on the Internet of Things (IoT) at the National Institute of Telecommunications, Ghaziabad, India.

Kalpna Sagar received her B.Tech from Indira Gandhi Institute of Technology, Guru Gobind Singh Indraprastha University, Delhi, India and her M.Tech from University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, Delhi, India. She is currently pursuing her PhD. Her research includes software engineering, human-computer interaction, and data mining. She has published numerous research papers and is currently an Assistant Professor and Assistant Dean of Academics at KIET Group of Institutions, Dr. APJ Abdul Kalam Technical University, Lucknow, India.
- List of Contributors
- Series Preface
- Preface
- 1 Intelligent Data Analysis: Black Box Versus White Box Modeling (Sarthak Gupta, Siddhant Bagga, and Deepak Kumar Sharma)
  - 1.1 Introduction
  - 1.1.1 Intelligent Data Analysis
  - 1.1.2 Applications of IDA and Machine Learning
  - 1.1.3 White Box Models Versus Black Box Models
  - 1.1.4 Model Interpretability
  - 1.2 Interpretation of White Box Models
  - 1.2.1 Linear Regression
  - 1.2.2 Decision Tree
  - 1.3 Interpretation of Black Box Models
  - 1.3.1 Partial Dependence Plot
  - 1.3.2 Individual Conditional Expectation
  - 1.3.3 Accumulated Local Effects
  - 1.3.4 Global Surrogate Models
  - 1.3.5 Local Interpretable Model-Agnostic Explanations
  - 1.3.6 Feature Importance
  - 1.4 Issues and Further Challenges
  - 1.5 Summary
  - References
- 2 Data: Its Nature and Modern Data Analytical Tools (Ravinder Ahuja, Shikhar Asthana, Ayush Ahuja, and Manu Agarwal)
  - 2.1 Introduction
  - 2.2 Data Types and Various File Formats
  - 2.2.1 Structured Data
  - 2.2.2 Semi-Structured Data
  - 2.2.3 Unstructured Data
  - 2.2.4 Need for File Formats
  - 2.2.5 Various Types of File Formats
  - 2.2.5.1 Comma Separated Values (CSV)
  - 2.2.5.2 ZIP
  - 2.2.5.3 Plain Text (txt)
  - 2.2.5.4 JSON
  - 2.2.5.5 XML
  - 2.2.5.6 Image Files
  - 2.2.5.7 HTML
  - 2.3 Overview of Big Data
  - 2.3.1 Sources of Big Data
  - 2.3.1.1 Media
  - 2.3.1.2 The Web
  - 2.3.1.3 Cloud
  - 2.3.1.4 Internet of Things
  - 2.3.1.5 Databases
  - 2.3.1.6 Archives
  - 2.3.2 Big Data Analytics
  - 2.3.2.1 Descriptive Analytics
  - 2.3.2.2 Predictive Analytics
  - 2.3.2.3 Prescriptive Analytics
  - 2.4 Data Analytics Phases
  - 2.5 Data Analytical Tools
  - 2.5.1 Microsoft Excel
  - 2.5.2 Apache Spark
  - 2.5.3 Open Refine
  - 2.5.4 R Programming
  - 2.5.4.1 Advantages of R
  - 2.5.4.2 Disadvantages of R
  - 2.5.5 Tableau
  - 2.5.5.1 How Tableau Works
  - 2.5.5.2 Tableau Feature
  - 2.5.5.3 Advantages
  - 2.5.5.4 Disadvantages
  - 2.5.6 Hadoop
  - 2.5.6.1 Basic Components of Hadoop
  - 2.5.6.2 Benefits
  - 2.6 Database Management System for Big Data Analytics
  - 2.6.1 Hadoop Distributed File System
  - 2.6.2 NoSql
  - 2.6.2.1 Categories of NoSql
  - 2.7 Challenges in Big Data Analytics
  - 2.7.1 Storage of Data
  - 2.7.2 Synchronization of Data
  - 2.7.3 Security of Data
  - 2.7.4 Fewer Professionals
  - 2.8 Conclusion
  - References
- 3 Statistical Methods for Intelligent Data Analysis: Introduction and Various Concepts (Shubham Kumaram, Samarth Chugh, and Deepak Kumar Sharma)
  - 3.1 Introduction
  - 3.2 Probability
  - 3.2.1 Definitions
  - 3.2.1.1 Random Experiments
  - 3.2.1.2 Probability
  - 3.2.1.3 Probability Axioms
  - 3.2.1.4 Conditional Probability
  - 3.2.1.5 Independence
  - 3.2.1.6 Random Variable
  - 3.2.1.7 Probability Distribution
  - 3.2.1.8 Expectation
  - 3.2.1.9 Variance and Standard Deviation
  - 3.2.2 Bayes’ Rule
  - 3.3 Descriptive Statistics
  - 3.3.1 Picture Representation
  - 3.3.1.1 Frequency Distribution
  - 3.3.1.2 Simple Frequency Distribution
  - 3.3.1.3 Grouped Frequency Distribution
  - 3.3.1.4 Stem and Leaf Display
  - 3.3.1.5 Histogram and Bar Chart
  - 3.3.2 Measures of Central Tendency
  - 3.3.2.1 Mean
  - 3.3.2.2 Median
  - 3.3.2.3 Mode
  - 3.3.3 Measures of Variability
  - 3.3.3.1 Range
  - 3.3.3.2 Box Plot
  - 3.3.3.3 Variance and Standard Deviation
  - 3.3.4 Skewness and Kurtosis
  - 3.4 Inferential Statistics
  - 3.4.1 Frequentist Inference
  - 3.4.1.1 Point Estimation
  - 3.4.1.2 Interval Estimation
  - 3.4.2 Hypothesis Testing
  - 3.4.3 Statistical Significance
  - 3.5 Statistical Methods
  - 3.5.1 Regression
  - 3.5.1.1 Linear Model
  - 3.5.1.2 Nonlinear Models
  - 3.5.1.3 Generalized Linear Models
  - 3.5.1.4 Analysis of Variance
  - 3.5.1.5 Multivariate Analysis of Variance
  - 3.5.1.6 Log-Linear Models
  - 3.5.1.7 Logistic Regression
  - 3.5.1.8 Random Effects Model
  - 3.5.1.9 Overdispersion
  - 3.5.1.10 Hierarchical Models
  - 3.5.2 Analysis of Survival Data
  - 3.5.3 Principal Component Analysis
  - 3.6 Errors
  - 3.6.1 Error in Regression
  - 3.6.2 Error in Classification
  - 3.7 Conclusion
  - References
- 4 Intelligent Data Analysis with Data Mining: Theory and Applications (Shivam Bachhety, Ramneek Singhal, and Rachna Jain)
  - Objective
  - 4.1 Introduction to Data Mining
  - 4.1.1 Importance of Intelligent Data Analytics in Business
  - 4.1.2 Importance of Intelligent Data Analytics in Health Care
  - 4.2 Data and Knowledge
  - 4.3 Discovering Knowledge in Data Mining
  - 4.3.1 Process Mining
  - 4.3.2 Process of Knowledge Discovery
  - 4.4 Data Analysis and Data Mining
  - 4.5 Data Mining: Issues
  - 4.6 Data Mining: Systems and Query Language
  - 4.6.1 Data Mining Systems
  - 4.6.2 Data Mining Query Language
  - 4.7 Data Mining Methods
  - 4.7.1 Classification
  - 4.7.2 Cluster Analysis
  - 4.7.3 Association
  - 4.7.4 Decision Tree Induction
  - 4.8 Data Exploration
  - 4.9 Data Visualization
  - 4.10 Probability Concepts for Intelligent Data Analysis (IDA)
  - Reference
- 5 Intelligent Data Analysis: Deep Learning and Visualization (Than D. Le and Huy V. Pham)
  - 5.1 Introduction
  - 5.2 Deep Learning and Visualization
  - 5.2.1 Linear and Logistic Regression and Visualization
  - 5.2.2 CNN Architecture
  - 5.2.2.1 Vanishing Gradient Problem
  - 5.2.2.2 Convolutional Neural Networks (CNNs)
  - 5.2.3 Reinforcement Learning
  - 5.2.4 Inception and ResNet Networks
  - 5.2.5 Softmax
  - 5.3 Data Processing and Visualization
  - 5.3.1 Regularization for Deep Learning and Visualization
  - 5.3.1.1 Regularization for Linear Regression
  - 5.4 Experiments and Results
  - 5.4.1 Mask RCNN Based on Object Detection and Segmentation
  - 5.4.2 Deep Matrix Factorization
  - 5.4.2.1 Network Visualization
  - 5.4.3 Deep Learning and Reinforcement Learning
  - 5.5 Conclusion
  - References
- 6 A Systematic Review on the Evolution of Dental Caries Detection Methods and Its Significance in Data Analysis Perspective (Soma Datta, Nabendu Chaki, and Biswajit Modak)
  - 6.1 Introduction
  - 6.1.1 Analysis of Dental Caries
  - 6.2 Different Caries Lesion Detection Methods and Data Characterization
  - 6.2.1 Point Detection Method
  - 6.2.2 Visible Light Property Method
  - 6.2.3 Radiographs
  - 6.2.4 Light-Emitting Devices
  - 6.2.5 Optical Coherent Tomography (OCT)
  - 6.2.6 Software Tools
  - 6.3 Technical Challenges with the Existing Methods
  - 6.3.1 Challenges in Data Analysis Perspective
  - 6.4 Result Analysis
  - 6.5 Conclusion
  - Acknowledgment
  - References
- 7 Intelligent Data Analysis Using Hadoop Cluster – Inspired MapReduce Framework and Association Rule Mining on Educational Domain (Pratiyush Guleria and Manu Sood)
  - 7.1 Introduction
  - 7.1.1 Research Areas of IDA
  - 7.1.2 The Need for IDA in Education
  - 7.2 Learning Analytics in Education
  - 7.2.1 Role of Web-Enabled and Mobile Computing in Education
  - 7.2.2 Benefits of Learning Analytics
  - 7.2.3 Future Research Directions of IDA
  - 7.3 Motivation
  - 7.4 Literature Review
  - 7.4.1 Association Rule Mining and Big Data
  - 7.5 Intelligent Data Analytical Tools
  - 7.6 Intelligent Data Analytics Using MapReduce Framework in an Educational Domain
  - 7.6.1 Data Description
  - 7.6.2 Objective
  - 7.6.3 Proposed Methodology
  - 7.6.3.1 Stage 1 Map Reduce Algorithm
  - 7.6.3.2 Stage 2 Apriori Algorithm
  - 7.7 Results
  - 7.8 Conclusion and Future Scope
  - References
- 8 Influence of Green Space on Global Air Quality Monitoring: Data Analysis Using K-Means Clustering Algorithm (Gihan S. Pathirana and Malka N. Halgamuge)
  - 8.1 Introduction
  - 8.2 Material and Methods
  - 8.2.1 Data Collection
  - 8.2.2 Data Inclusion Criteria
  - 8.2.3 Data Preprocessing
  - 8.2.4 Data Analysis
  - 8.3 Results
  - 8.4 Quantitative Analysis
  - 8.4.1 K-Means Clustering
  - 8.4.2 Level of Difference of Green Area
  - 8.5 Discussion
  - 8.6 Conclusion
  - References
- 9 IDA with Space Technology and Geographic Information System (Bright Keswani, Tarini Ch. Mishra, Ambarish G. Mohapatra, Poonam Keswani, Priyatosh Sahu, and Anish Kumar Sarangi)
  - 9.1 Introduction
  - 9.1.1 Real-Time in Space
  - 9.1.2 Generating Programming Triggers
  - 9.1.3 Analytical Architecture
  - 9.1.4 Remote Sensing Big Data Acquisition Unit (RSDU)
  - 9.1.5 Data Processing Unit
  - 9.1.6 Data Analysis and Decision Unit
  - 9.1.7 Analysis
  - 9.1.8 Incorporating Machine Learning and Artificial Intelligence
  - 9.1.8.1 Methodologies Applicable
  - 9.1.8.2 Support Vector Machines (SVM) and Cross-Validation
  - 9.1.8.3 Massively Parallel Computing and I/O
  - 9.1.8.4 Data Architecture and Governance
  - 9.1.9 Real-Time Spacecraft Detection
  - 9.1.9.1 Active Phased Array
  - 9.1.9.2 Relay Communication
  - 9.1.9.3 Low-Latency Random Access
  - 9.1.9.4 Channel Modeling and Prediction
  - 9.2 Geospatial Techniques
  - 9.2.1 The Big-GIS
  - 9.2.2 Technologies Applied
  - 9.2.2.1 Internet of Things and Sensor Web
  - 9.2.2.2 Cloud Computing
  - 9.2.2.3 Stream Processing
  - 9.2.2.4 Big Data Analytics
  - 9.2.2.5 Coordinated Observation
  - 9.2.2.6 Big Geospatial Data Management
  - 9.2.2.7 Parallel Geocomputation Framework
  - 9.2.3 Data Collection Using GIS
  - 9.2.3.1 NoSQL Databases
  - 9.2.3.2 Parallel Processing
  - 9.2.3.3 Knowledge Discovery and Intelligent Service
  - 9.2.3.4 Data Analysis
  - 9.3 Comparative Analysis
  - 9.4 Conclusion
  - References
- 10 Application of Intelligent Data Analysis in Intelligent Transportation System Using IoT (Rakesh Roshan and Om Prakash Rishi)
  - 10.1 Introduction to Intelligent Transportation System (ITS)
  - 10.1.1 Working of Intelligent Transportation System
  - 10.1.2 Services of Intelligent Transportation System
  - 10.1.3 Advantages of Intelligent Transportation System
  - 10.2 Issues and Challenges of Intelligent Transportation System (ITS)
  - 10.2.1 Communication Technology Used Currently in ITS
  - 10.2.2 Challenges in the Implementation of ITS
  - 10.2.3 Opportunity for Popularity of Automated/Autonomous/Self-Driving Car or Vehicle
  - 10.3 Intelligent Data Analysis Makes an IoT-Based Transportation System Intelligent
  - 10.3.1 Introduction to Intelligent Data Analysis
  - 10.3.2 How IDA Makes IoT-Based Transportation Systems Intelligent
  - 10.3.2.1 Traffic Management Through IoT and Intelligent Data Analysis
  - 10.3.2.2 Tracking of Multiple Vehicles
  - 10.4 Intelligent Data Analysis for Security in Intelligent Transportation System
  - 10.5 Tools to Support IDA in an Intelligent Transportation System
  - References
- 11 Applying Big Data Analytics on Motor Vehicle Collision Predictions in New York City (Dhanushka Abeyratne and Malka N. Halgamuge)
  - 11.1 Introduction
  - 11.1.1 Overview of Big Data Analytics on Motor Vehicle Collision Predictions
  - 11.2 Materials and Methods
  - 11.2.1 Collection of Raw Data
  - 11.2.2 Data Inclusion Criteria
  - 11.2.3 Data Preprocessing
  - 11.2.4 Data Analysis
  - 11.3 Classification Algorithms and K-Fold Validation Using Data Set Obtained from NYPD (2012–2017)
  - 11.3.1 Classification Algorithms
  - 11.3.1.1 k-Fold Cross-Validation
  - 11.3.2 Statistical Analysis
  - 11.4 Results
  - 11.4.1 Measured Processing Time and Accuracy of Each Classifier
  - 11.4.2 Measured p-Value in Each Vehicle Group Using K-Means Clustering/One-Way ANOVA
  - 11.4.3 Identified High Collision Concentration Locations of Each Vehicle Group
  - 11.4.4 Measured Different Criteria for Further Analysis of NYPD Data Set (2012–2017)
  - 11.5 Discussion
  - 11.6 Conclusion
  - References
- 12 A Smart and Promising Neurological Disorder Diagnostic System: An Amalgamation of Big Data, IoT, and Emerging Computing Techniques (Prableen Kaur and Manik Sharma)
  - 12.1 Introduction
  - 12.1.1 Difference Between Neurological and Psychological Disorders
  - 12.2 Statistics of Neurological Disorders
  - 12.3 Emerging Computing Techniques
  - 12.3.1 Internet of Things
  - 12.3.2 Big Data
  - 12.3.3 Soft Computing Techniques
  - 12.4 Related Works and Publication Trends of Articles
  - 12.5 The Need for Neurological Disorders Diagnostic System
  - 12.5.1 Design of Smart and Intelligent Neurological Disorders Diagnostic System
  - 12.6 Conclusion
  - References
- 13 Comments-Based Analysis of a Bug Report Collection System and Its Applications (Arvinder Kaur and Shubhra Goyal)
  - 13.1 Introduction
  - 13.2 Background
  - 13.2.1 Issue Tracking System
  - 13.2.2 Bug Report Statistics
  - 13.3 Related Work
  - 13.3.1 Data Extraction Process
  - 13.3.2 Applications of Bug Report Comments
  - 13.3.2.1 Bug Summarization
  - 13.3.2.2 Emotion Mining
  - 13.4 Data Collection Process
  - 13.4.1 Steps of Data Extraction
  - 13.4.2 Block Diagram for Data Extraction
  - 13.4.3 Reports Generated
  - 13.4.3.1 Bug Attribute Report
  - 13.4.3.2 Long Description Report
  - 13.4.3.3 Bug Comments Reports
  - 13.4.3.4 Error Report
  - 13.5 Analysis of Bug Reports
  - 13.5.1 Research Question 1: Is the Performance of Software Affected by Open Bugs that are Critical in Nature?
  - 13.5.2 Research Question 2: How Can Test Leads Improve the Performance of Software Systems?
  - 13.5.3 Research Question 3: Which Are the Most Error-Prone Areas that Can Cause System Failure?
  - 13.5.4 Research Question 4: Which Are the Most Frequent Words and Keywords to Predict Most Critical Bugs?
  - 13.5.5 Research Question 5: What is the Importance of Frequent Words Mined from Bug Reports?
  - 13.6 Threats to Validity
  - 13.7 Conclusion
  - References
- 14 Sarcasm Detection Algorithms Based on Sentiment Strength (Pragya Katyayan and Nisheeth Joshi)
  - 14.1 Introduction
  - 14.2 Literature Survey
  - 14.3 Experiment
  - 14.3.1 Data Collection
  - 14.3.2 Finding SentiStrengths
  - 14.3.3 Proposed Algorithm
  - 14.3.4 Explanation of the Algorithms
  - 14.3.5 Classification
  - 14.3.5.1 Explanation
  - 14.3.6 Evaluation
  - 14.4 Results and Evaluation
  - 14.5 Conclusion
  - References
- 15 SNAP: Social Network Analysis Using Predictive Modeling (Samridhi Seth and Rahul Johari)
  - 15.1 Introduction
  - 15.1.1 Types of Predictive Analytics Models
  - 15.1.2 Predictive Analytics Techniques
  - 15.1.2.1 Regression Techniques
  - 15.1.2.2 Machine Learning Techniques
  - 15.2 Literature Survey
  - 15.3 Comparative Study
  - 15.4 Simulation and Analysis
  - 15.4.1 Few Analyses Made on the Data Set Are Given Below
  - 15.4.1.1 Duration of Each Contact Was Found
  - 15.4.1.2 Total Number of Contacts of Source Node with Destination Node Was Found for all Nodes
  - 15.4.1.3 Total Duration of Contact of Source Node with Each Node Was Found
  - 15.4.1.4 Mobility Pattern Describes Direction of Contact and Relation Between Number of Contacts and Duration of Contact
  - 15.4.1.5 Unidirectional Contact, that is, Only 1 Node is Contacting Second Node but Vice Versa is Not There
  - 15.4.1.6 Graphical Representation for the Duration of Contacts with Each Node is Given below
  - 15.4.1.7 Rank and Percentile for Number of Contacts with Each Node
  - 15.4.1.8 Data Set is Described for Three Days Where Time is Calculated in Seconds. Data Set can be Divided Into Three Days. Some of the Analyses Conducted on the Data set Day Wise Are Given Below
  - 15.5 Conclusion and Future Work
  - References
- 16 Intelligent Data Analysis for Medical Applications (Moolchand Sharma, Vikas Chaudhary, Prerna Sharma, and R. S. Bhatia)
  - 16.1 Introduction
  - 16.1.1 IDA (Intelligent Data Analysis)
  - 16.1.1.1 Elicitation of Background Knowledge
  - 16.1.2 Medical Applications
  - 16.2 IDA Needs in Medical Applications
  - 16.2.1 Public Health
  - 16.2.2 Electronic Health Record
  - 16.2.3 Patient Profile Analytics
  - 16.2.3.1 Patient’s Profile
  - 16.3 IDA Methods Classifications
  - 16.3.1 Data Abstraction
  - 16.3.2 Data Mining Method
  - 16.3.3 Temporal Data Mining
  - 16.4 Intelligent Decision Support System in Medical Applications
  - 16.4.1 Need for Intelligent Decision System (IDS)
  - 16.4.2 Understanding Intelligent Decision Support: Some Definitions
  - 16.4.3 Advantages/Disadvantages of IDS
  - 16.5 Conclusion
  - References
- 17 Bruxism Detection Using Single-Channel C4-A1 on Human Sleep S2 Stage Recording (Md Belal Bin Heyat, Dakun Lai, Faijan Akhtar, Mohd Ammar Bin Hayat, Shafan Azad, Shadab Azad, and Shajan Azad)
  - 17.1 Introduction
  - 17.1.1 Side Effect of Poor Snooze
  - 17.2 History of Sleep Disorder
  - 17.2.1 Classification of Sleep Disorder
  - 17.2.2 Sleep Stages of the Human
  - 17.3 Electroencephalogram Signal
  - 17.3.1 Electroencephalogram Generation
  - 17.3.1.1 Classification of Electroencephalogram Signal
  - 17.4 EEG Data Measurement Technique
  - 17.4.1 10–20 Electrode Positioning System
  - 17.4.1.1 Procedure of Electrode Placement
  - 17.5 Literature Review
  - 17.6 Subjects and Methodology
  - 17.6.1 Data Collection
  - 17.6.2 Low Pass Filter
  - 17.6.3 Hanning Window
  - 17.6.4 Welch Method
  - 17.7 Data Analysis of the Bruxism and Normal Data Using EEG Signal
  - 17.8 Result
  - 17.9 Conclusions
  - Acknowledgments
  - References
- 18 Handwriting Analysis for Early Detection of Alzheimer’s Disease (Rajib Saha, Anirban Mukherjee, Aniruddha Sadhukhan, Anisha Roy, and Manashi De)
  - 18.1 Introduction and Background
  - 18.2 Proposed Work and Methodology
  - 18.3 Results and Discussions
  - 18.3.1 Character Segmentation
  - 18.4 Conclusion
  - References
- Index