Data Wrangling
Concepts, Applications and Tools
Hardcover, English, 2023
By M. Niranjanamurthy (M S Ramaiah Institute of Technology, India), Kavita Sheoran (MSIT, India), Geetika Dhand (Maharaja Surajmal Institute of Technology, India), Prabhjot Kaur
2 529 kr
Product information
- Publication date: 2023-06-28
- Weight: 758 g
- Format: Hardcover
- Language: English
- Number of pages: 368
- Publisher: John Wiley & Sons Inc
- ISBN: 9781119879688
M. Niranjanamurthy, PhD, is an assistant professor in the Department of Computer Applications, M S Ramaiah Institute of Technology, Bangalore, Karnataka. He earned his PhD in computer science at JJTU, Rajasthan, India. He has over 11 years of teaching experience and two years of industry experience as a software engineer. He has published several books, and he is working on numerous books for Scrivener Publishing. He has published over 60 papers for scholarly journals and conferences, and he is working as a reviewer in 22 scientific journals. He also has numerous awards to his credit.

Kavita Sheoran, PhD, is an associate professor in the Computer Science Department, MSIT, Delhi, and she earned her PhD in computer science from Gautam Buddha University, Greater Noida. With over 17 years of teaching experience, she has published various papers in reputed journals and has published two books.

Geetika Dhand, PhD, is an associate professor in the Department of Computer Science and Engineering at Maharaja Surajmal Institute of Technology. After earning her PhD in computer science from Manav Rachna International Institute of Research and Studies, Faridabad, she has taught for over 17 years. She has published one book and a number of papers in technical journals.

Prabhjot Kaur has over 19 years of teaching experience and has earned two PhDs for her work in two different research areas. She has authored two books and more than 40 research papers in reputed journals and conferences. She also has one patent to her credit.
- 1 Basic Principles of Data Wrangling (Akshay Singh, Surender Singh and Jyotsna Rathee)
  - 1.1 Introduction
  - 1.2 Data Workflow Structure
  - 1.3 Raw Data Stage
    - 1.3.1 Data Input
    - 1.3.2 Output Actions at Raw Data Stage
    - 1.3.3 Structure
    - 1.3.4 Granularity
    - 1.3.5 Accuracy
    - 1.3.6 Temporality
    - 1.3.7 Scope
  - 1.4 Refined Stage
    - 1.4.1 Data Design and Preparation
    - 1.4.2 Structure Issues
    - 1.4.3 Granularity Issues
    - 1.4.4 Accuracy Issues
    - 1.4.5 Scope Issues
    - 1.4.6 Output Actions at Refined Stage
  - 1.5 Produced Stage
    - 1.5.1 Data Optimization
    - 1.5.2 Output Actions at Produced Stage
  - 1.6 Steps of Data Wrangling
  - 1.7 Do’s for Data Wrangling
  - 1.8 Tools for Data Wrangling
  - References
- 2 Skills and Responsibilities of Data Wrangler (Prabhjot Kaur, Anupama Kaushik and Aditya Kapoor)
  - 2.1 Introduction
  - 2.2 Role as an Administrator (Data and Database)
  - 2.3 Skills Required
    - 2.3.1 Technical Skills
      - 2.3.1.1 Python
      - 2.3.1.2 R Programming Language
      - 2.3.1.3 SQL
      - 2.3.1.4 MATLAB
      - 2.3.1.5 Scala
      - 2.3.1.6 Excel
      - 2.3.1.7 Tableau
      - 2.3.1.8 Power BI
    - 2.3.2 Soft Skills
      - 2.3.2.1 Presentation Skills
      - 2.3.2.2 Storytelling
      - 2.3.2.3 Business Insights
      - 2.3.2.4 Writing/Publishing Skills
      - 2.3.2.5 Listening
      - 2.3.2.6 Stop and Think
      - 2.3.2.7 Soft Issues
  - 2.4 Responsibilities as Database Administrator
    - 2.4.1 Software Installation and Maintenance
    - 2.4.2 Data Extraction, Transformation, and Loading
    - 2.4.3 Data Handling
    - 2.4.4 Data Security
    - 2.4.5 Data Authentication
    - 2.4.6 Data Backup and Recovery
    - 2.4.7 Security and Performance Monitoring
    - 2.4.8 Effective Use of Human Resource
    - 2.4.9 Capacity Planning
    - 2.4.10 Troubleshooting
    - 2.4.11 Database Tuning
  - 2.5 Concerns for a DBA
  - 2.6 Data Mishandling and Its Consequences
    - 2.6.1 Phases of Data Breaching
    - 2.6.2 Data Breach Laws
    - 2.6.3 Best Practices For Enterprises
  - 2.7 The Long-Term Consequences: Loss of Trust and Diminished Reputation
  - 2.8 Solution to the Problem
  - 2.9 Case Studies
    - 2.9.1 UBER Case Study
      - 2.9.1.1 Role of Analytics and Business Intelligence in Optimization
      - 2.9.1.2 Mapping Applications for City Ops Teams
      - 2.9.1.3 Marketplace Forecasting
      - 2.9.1.4 Learnings from Data
    - 2.9.2 PepsiCo Case Study
      - 2.9.2.1 Searching for a Single Source of Truth
      - 2.9.2.2 Finding the Right Solution for Better Data
      - 2.9.2.3 Enabling Powerful Results with Self-Service Analytics
  - 2.10 Conclusion
  - References
- 3 Data Wrangling Dynamics (Simarjit Kaur, Anju Bala and Anupam Garg)
  - 3.1 Introduction
  - 3.2 Related Work
  - 3.3 Challenges: Data Wrangling
  - 3.4 Data Wrangling Architecture
    - 3.4.1 Data Sources
    - 3.4.2 Auxiliary Data
    - 3.4.3 Data Extraction
    - 3.4.4 Data Wrangling
      - 3.4.4.1 Data Accessing
      - 3.4.4.2 Data Structuring
      - 3.4.4.3 Data Cleaning
      - 3.4.4.4 Data Enriching
      - 3.4.4.5 Data Validation
      - 3.4.4.6 Data Publication
  - 3.5 Data Wrangling Tools
    - 3.5.1 Excel
    - 3.5.2 Altair Monarch
    - 3.5.3 Anzo
    - 3.5.4 Tabula
    - 3.5.5 Trifacta
    - 3.5.6 Datameer
    - 3.5.7 Paxata
    - 3.5.8 Talend
  - 3.6 Data Wrangling Application Areas
  - 3.7 Future Directions and Conclusion
  - References
- 4 Essentials of Data Wrangling (Menal Dahiya, Nikita Malik and Sakshi Rana)
  - 4.1 Introduction
  - 4.2 Holistic Workflow Framework for Data Projects
    - 4.2.1 Raw Stage
    - 4.2.2 Refined Stage
    - 4.2.3 Production Stage
  - 4.3 The Actions in Holistic Workflow Framework
    - 4.3.1 Raw Data Stage Actions
      - 4.3.1.1 Data Ingestion
      - 4.3.1.2 Creating Metadata
    - 4.3.2 Refined Data Stage Actions
    - 4.3.3 Production Data Stage Actions
  - 4.4 Transformation Tasks Involved in Data Wrangling
    - 4.4.1 Structuring
    - 4.4.2 Enriching
    - 4.4.3 Cleansing
  - 4.5 Description of Two Types of Core Profiling
    - 4.5.1 Individual Values Profiling
      - 4.5.1.1 Syntactic
      - 4.5.1.2 Semantic
    - 4.5.2 Set-Based Profiling
  - 4.6 Case Study
    - 4.6.1 Importing Required Libraries
    - 4.6.2 Changing the Order of the Columns in the Dataset
    - 4.6.3 To Display the DataFrame (Top 10 Rows) and Verify that the Columns are in Order
    - 4.6.4 To Display the DataFrame (Bottom 10 rows) and Verify that the Columns Are in Order
    - 4.6.5 Generate the Statistical Summary of the DataFrame for All the Columns
  - 4.7 Quantitative Analysis
    - 4.7.1 Maximum Number of Fires on Any Given Day
    - 4.7.2 Total Number of Fires for the Entire Duration for Every State
    - 4.7.3 Summary Statistics
  - 4.8 Graphical Representation
    - 4.8.1 Line Graph
    - 4.8.2 Pie Chart
    - 4.8.3 Bar Graph
  - 4.9 Conclusion
  - References
- 5 Data Leakage and Data Wrangling in Machine Learning for Medical Treatment (P.T. Jamuna Devi and B.R. Kavitha)
  - 5.1 Introduction
  - 5.2 Data Wrangling and Data Leakage
  - 5.3 Data Wrangling Stages
    - 5.3.1 Discovery
    - 5.3.2 Structuring
    - 5.3.3 Cleaning
    - 5.3.4 Improving
    - 5.3.5 Validating
    - 5.3.6 Publishing
  - 5.4 Significance of Data Wrangling
  - 5.5 Data Wrangling Examples
  - 5.6 Data Wrangling Tools for Python
  - 5.7 Data Wrangling Tools and Methods
  - 5.8 Use of Data Preprocessing
  - 5.9 Use of Data Wrangling
  - 5.10 Data Wrangling in Machine Learning
  - 5.11 Enhancement of Express Analytics Using Data Wrangling Process
  - 5.12 Conclusion
  - References
- 6 Importance of Data Wrangling in Industry 4.0 (Rachna Jain, Geetika Dhand, Kavita Sheoran and Nisha Aggarwal)
  - 6.1 Introduction
    - 6.1.1 Data Wrangling Entails
  - 6.2 Steps in Data Wrangling
    - 6.2.1 Obstacles Surrounding Data Wrangling
  - 6.3 Data Wrangling Goals
  - 6.4 Tools and Techniques of Data Wrangling
    - 6.4.1 Basic Data Munging Tools
    - 6.4.2 Data Wrangling in Python
    - 6.4.3 Data Wrangling in R
  - 6.5 Ways for Effective Data Wrangling
    - 6.5.1 Ways to Enhance Data Wrangling Pace
  - 6.6 Future Directions
  - References
- 7 Managing Data Structure in R (Mittal Desai and Chetan Dudhagara)
  - 7.1 Introduction to Data Structure
  - 7.2 Homogeneous Data Structures
    - 7.2.1 Vector
    - 7.2.2 Factor
    - 7.2.3 Matrix
    - 7.2.4 Array
  - 7.3 Heterogeneous Data Structures
    - 7.3.1 List
    - 7.3.2 Dataframe
  - References
- 8 Dimension Reduction Techniques in Distributional Semantics: An Application Specific Review (Pooja Kherwa, Jyoti Khurana, Rahul Budhraj, Sakshi Gill, Shreyansh Sharma and Sonia Rathee)
  - 8.1 Introduction
  - 8.2 Application Based Literature Review
  - 8.3 Dimensionality Reduction Techniques
    - 8.3.1 Principal Component Analysis
    - 8.3.2 Linear Discriminant Analysis
      - 8.3.2.1 Two-Class LDA
      - 8.3.2.2 Three-Class LDA
    - 8.3.3 Kernel Principal Component Analysis
    - 8.3.4 Locally Linear Embedding
    - 8.3.5 Independent Component Analysis
    - 8.3.6 Isometric Mapping (Isomap)
    - 8.3.7 Self-Organising Maps
    - 8.3.8 Singular Value Decomposition
    - 8.3.9 Factor Analysis
    - 8.3.10 Auto-Encoders
  - 8.4 Experimental Analysis
    - 8.4.1 Datasets Used
    - 8.4.2 Techniques Used
    - 8.4.3 Classifiers Used
    - 8.4.4 Observations
    - 8.4.5 Results Analysis Red-Wine Quality Dataset
  - 8.5 Conclusion
  - References
- 9 Big Data Analytics in Real Time for Enterprise Applications to Produce Useful Intelligence (Prashant Vats and Siddhartha Sankar Biswas)
  - 9.1 Introduction
  - 9.2 The Internet of Things and Big Data Correlation
  - 9.3 Design, Structure, and Techniques for Big Data Technology
  - 9.4 Aspiration for Meaningful Analyses and Big Data Visualization Tools
    - 9.4.1 From Information to Guidance
    - 9.4.2 The Transition from Information Management to Valuation Offerings
  - 9.5 Big Data Applications in the Commercial Surroundings
    - 9.5.1 IoT and Data Science Applications in the Production Industry
      - 9.5.1.1 Devices that are Inter Linked
      - 9.5.1.2 Data Transformation
    - 9.5.2 Predictive Analysis for Corporate Enterprise Applications in the Industrial Sector
  - 9.6 Big Data Insights’ Constraints
    - 9.6.1 Technological Developments
    - 9.6.2 Representation of Data
    - 9.6.3 Data That Is Fragmented and Imprecise
    - 9.6.4 Extensibility
    - 9.6.5 Implementation in Real Time Scenarios
  - 9.7 Conclusion
  - References
- 10 Generative Adversarial Networks: A Comprehensive Review (Jyoti Arora, Meena Tushir, Pooja Kherwa and Sonia Rathee)
  - List of Abbreviations
  - 10.1 Introduction
  - 10.2 Background
    - 10.2.1 Supervised vs Unsupervised Learning
    - 10.2.2 Generative Modeling vs Discriminative Modeling
  - 10.3 Anatomy of a GAN
  - 10.4 Types of GANs
    - 10.4.1 Conditional GAN (CGAN)
    - 10.4.2 Deep Convolutional GAN (DCGAN)
    - 10.4.3 Wasserstein GAN (WGAN)
    - 10.4.4 Stack GAN
    - 10.4.5 Least Square GAN (LSGANs)
    - 10.4.6 Information Maximizing GAN (INFOGAN)
  - 10.5 Shortcomings of GANs
  - 10.6 Areas of Application
    - 10.6.1 Image
    - 10.6.2 Video
    - 10.6.3 Artwork
    - 10.6.4 Music
    - 10.6.5 Medicine
    - 10.6.6 Security
  - 10.7 Conclusion
  - References
- 11 Analysis of Machine Learning Frameworks Used in Image Processing: A Review (Gurpreet Kaur and Kamaljit Singh Saini)
  - 11.1 Introduction
  - 11.2 Types of ML Algorithms
    - 11.2.1 Supervised Learning
    - 11.2.2 Unsupervised Learning
    - 11.2.3 Reinforcement Learning
  - 11.3 Applications of Machine Learning Techniques
    - 11.3.1 Personal Assistants
    - 11.3.2 Predictions
    - 11.3.3 Social Media
    - 11.3.4 Fraud Detection
    - 11.3.5 Google Translator
    - 11.3.6 Product Recommendations
    - 11.3.7 Videos Surveillance
  - 11.4 Solution to a Problem Using ML
    - 11.4.1 Classification Algorithms
    - 11.4.2 Anomaly Detection Algorithm
    - 11.4.3 Regression Algorithm
    - 11.4.4 Clustering Algorithms
    - 11.4.5 Reinforcement Algorithms
  - 11.5 ML in Image Processing
    - 11.5.1 Frameworks and Libraries Used for ML Image Processing
  - 11.6 Conclusion
  - References
- 12 Use and Application of Artificial Intelligence in Accounting and Finance: Benefits and Challenges (Ram Singh, Rohit Bansal and Niranjanamurthy M.)
  - 12.1 Introduction
    - 12.1.1 Artificial Intelligence in Accounting and Finance Sector
  - 12.2 Uses of AI in Accounting & Finance Sector
    - 12.2.1 Pay and Receive Processing
    - 12.2.2 Supplier on Boarding and Procurement
    - 12.2.3 Audits
    - 12.2.4 Monthly, Quarterly Cash Flows, and Expense Management
    - 12.2.5 AI Chatbots
  - 12.3 Applications of AI in Accounting and Finance Sector
    - 12.3.1 AI in Personal Finance
    - 12.3.2 AI in Consumer Finance
    - 12.3.3 AI in Corporate Finance
  - 12.4 Benefits and Advantages of AI in Accounting and Finance
    - 12.4.1 Changing the Human Mindset
    - 12.4.2 Machines Imitate the Human Brain
    - 12.4.3 Fighting Misrepresentation
    - 12.4.4 AI Machines Make Accounting Tasks Easier
    - 12.4.5 Invisible Accounting
    - 12.4.6 Build Trust through Better Financial Protection and Control
    - 12.4.7 Active Insights Help Drive Better Decisions
    - 12.4.8 Fraud Protection, Auditing, and Compliance
    - 12.4.9 Machines as Financial Guardians
    - 12.4.10 Intelligent Investments
    - 12.4.11 Consider the “Runaway Effect”
    - 12.4.12 Artificial Control and Effective Fiduciaries
    - 12.4.13 Accounting Automation Avenues and Investment Management
  - 12.5 Challenges of AI Application in Accounting and Finance
    - 12.5.1 Data Quality and Management
    - 12.5.2 Cyber and Data Privacy
    - 12.5.3 Legal Risks, Liability, and Culture Transformation
    - 12.5.4 Practical Challenges
    - 12.5.5 Limits of Machine Learning and AI
    - 12.5.6 Roles and Skills
    - 12.5.7 Institutional Issues
  - 12.6 Suggestions and Recommendation
  - 12.7 Conclusion and Future Scope of the Study
  - References
- 13 Obstacle Avoidance Simulation and Real-Time Lane Detection for AI-Based Self-Driving Car (B. Eshwar, Harshaditya Sheoran, Shivansh Pathak and Meena Rao)
  - 13.1 Introduction
    - 13.1.1 Environment Overview
      - 13.1.1.1 Simulation Overview
      - 13.1.1.2 Agent Overview
      - 13.1.1.3 Brain Overview
    - 13.1.2 Algorithm Used
      - 13.1.2.1 Markovs Decision Process (MDP)
      - 13.1.2.2 Adding a Living Penalty
      - 13.1.2.3 Implementing a Neural Network
  - 13.2 Simulations and Results
    - 13.2.1 Self-Driving Car Simulation
    - 13.2.2 Real-Time Lane Detection and Obstacle Avoidance
    - 13.2.3 About the Model
    - 13.2.4 Preprocessing the Image/Frame
  - 13.3 Conclusion
  - References
- 14 Impact of Suppliers Network on SCM of Indian Auto Industry: A Case of Maruti Suzuki India Limited (Ruchika Pharswan, Ashish Negi and Tridib Basak)
  - 14.1 Introduction
  - 14.2 Literature Review
    - 14.2.1 Prior Pandemic Automobile Industry/COVID-19 Thump on the Automobile Sector
    - 14.2.2 Maruti Suzuki India Limited (MSIL) During COVID-19 and Other Players in the Automobile Industry and How MSIL Prevailed
  - 14.3 Methodology
  - 14.4 Findings
    - 14.4.1 Worldwide Economic Impact of the Epidemic
    - 14.4.2 Effect on Global Automobile Industry
    - 14.4.3 Effect on Indian Automobile Industry
    - 14.4.4 Automobile Industry Scenario That Can Be Expected Post COVID-19 Recovery
  - 14.5 Discussion
    - 14.5.1 Competitive Dimensions
    - 14.5.2 MSIL Strategies
    - 14.5.3 MSIL Operations and Supply Chain Management
    - 14.5.4 MSIL Suppliers Network
    - 14.5.5 MSIL Manufacturing
    - 14.5.5 MSIL Distributors Network
    - 14.5.6 MSIL Logistics Management
  - 14.6 Conclusion
  - References
- About the Editors
- Index
You may also be interested in
Advanced Mathematics in Scientific Computing, Communication and Security
Dipti Jadha, Biswadip Basu Mallik (Institute of Engineering & Management, Kolkata, India), Pritam Wani, Narendrakumar Dasre, M. Niranjanamurthy (M S Ramaiah Institute of Technology, India)
3 279 kr
Artificial Intelligence and Machine Learning in Data Science and Analytics
M. Niranjanamurthy (M S Ramaiah Institute of Technology, India), Valentina Emilia Balas (University of Arad, Romania), Manoj Kumar (Central University of Haryana, India)
3 249 kr
Mathematics and Computer Science for Real-World Applications, Volume 4
Biswadip Basu Mallik (Institute of Engineering & Management, Kolkata, India), M. Niranjanamurthy (M S Ramaiah Institute of Technology, India), Sharmistha Ghosh (Institute of Engineering and Management, India), Krishanu Deyasi (Institute of Engineering and Management, India), Santanu Das (Institute of Engineering and Management, India)
3 559 kr
Advances in Data Science and Analytics
M. Niranjanamurthy (M. S. Ramaiah Institute of Technology, India), Hemant Kumar Gianey (Vellore Institute of Technology, India), Amir H. Gandomi (University of Technology Sydney, Australia)
2 899 kr