Data Science Programming All-in-One For Dummies
Paperback, English, 2020
By John Paul Mueller, Luca Massaron, John Paul (Indiana University of Pennsylvania) Mueller
399 kr
Special-order item. Ships within 7-10 business days.
Free shipping for members on purchases of at least 249 kr.
Your logical, linear guide to the fundamentals of data science programming
Data science is exploding (in a good way), with a forecast of 1.7 megabytes of new information created every second for each human being on the planet by 2020 and 11.5 million job openings by 2026. It clearly pays dividends to be in the know. This friendly guide charts a path through the fundamentals of data science and then delves into the actual work: linear regression, logistic regression, machine learning, neural networks, recommender engines, and cross-validation of models.
Data Science Programming All-in-One For Dummies is a compilation of the key data science, machine learning, and deep learning programming languages: Python and R. It helps you decide which programming languages are best for specific data science needs. It also gives you the guidelines to build your own projects to solve problems in real time.
- Get grounded: the ideal start for new data professionals
- What lies ahead: learn about specific areas that data is transforming
- Be meaningful: find out how to tell your data story
- See clearly: pick up the art of visualization
Whether you’re a beginning student or already mid-career, get your copy now and add even more meaning to your life, and everyone else’s!
Product information
- Publication date: 2020-02-10
- Dimensions: 188 x 234 x 51 mm
- Weight: 998 g
- Format: Paperback
- Language: English
- Series: For Dummies
- Number of pages: 768
- Publisher: John Wiley & Sons Inc
- ISBN: 9781119626114
John Mueller has produced 114 books and more than 600 articles on topics ranging from functional programming techniques to working with Amazon Web Services (AWS). Luca Massaron, a Google Developer Expert (GDE), interprets big data and transforms it into smart data through simple and effective data mining and machine learning techniques.
- Introduction
  - About This Book
  - Foolish Assumptions
  - Icons Used in This Book
  - Beyond the Book
  - Where to Go from Here
- Book 1: Defining Data Science
  - Chapter 1: Considering the History and Uses of Data Science
    - Considering the Elements of Data Science
      - Considering the emergence of data science
      - Outlining the core competencies of a data scientist
      - Linking data science, big data, and AI
      - Understanding the role of programming
    - Defining the Role of Data in the World
      - Enticing people to buy products
      - Keeping people safer
      - Creating new technologies
      - Performing analysis for research
      - Providing art and entertainment
      - Making life more interesting in other ways
    - Creating the Data Science Pipeline
      - Preparing the data
      - Performing exploratory data analysis
      - Learning from data
      - Visualizing
      - Obtaining insights and data products
    - Comparing Different Languages Used for Data Science
      - Obtaining an overview of data science languages
      - Defining the pros and cons of using Python
      - Defining the pros and cons of using R
    - Learning to Perform Data Science Tasks Fast
      - Loading data
      - Training a model
      - Viewing a result
  - Chapter 2: Placing Data Science within the Realm of AI
    - Seeing the Data to Data Science Relationship
      - Considering the data architecture
      - Acquiring data from various sources
      - Performing data analysis
      - Archiving the data
    - Defining the Levels of AI
      - Beginning with AI
      - Advancing to machine learning
      - Getting detailed with deep learning
    - Creating a Pipeline from Data to AI
      - Considering the desired output
      - Defining a data architecture
      - Combining various data sources
      - Checking for errors and fixing them
      - Performing the analysis
      - Validating the result
      - Enhancing application performance
  - Chapter 3: Creating a Data Science Lab of Your Own
    - Considering the Analysis Platform Options
      - Using a desktop system
      - Working with an online IDE
      - Considering the need for a GPU
    - Choosing a Development Language
    - Obtaining and Using Python
      - Working with Python in this book
      - Obtaining and installing Anaconda for Python
      - Defining a Python code repository
      - Working with Python using Google Colaboratory
      - Defining the limits of using Azure Notebooks with Python and R
    - Obtaining and Using R
      - Obtaining and installing Anaconda for R
      - Starting the R environment
      - Defining an R code repository
    - Presenting Frameworks
      - Defining the differences
      - Explaining the popularity of frameworks
      - Choosing a particular library
    - Accessing the Downloadable Code
  - Chapter 4: Considering Additional Packages and Libraries You Might Want
    - Considering the Uses for Third-Party Code
    - Obtaining Useful Python Packages
      - Accessing scientific tools using SciPy
      - Performing fundamental scientific computing using NumPy
      - Performing data analysis using pandas
      - Implementing machine learning using Scikit-learn
      - Going for deep learning with Keras and TensorFlow
      - Plotting the data using matplotlib
      - Creating graphs with NetworkX
      - Parsing HTML documents using Beautiful Soup
    - Locating Useful R Libraries
      - Using your Python code in R with reticulate
      - Conducting advanced training using caret
      - Performing machine learning tasks using mlr
      - Visualizing data using ggplot2
      - Enhancing ggplot2 using esquisse
      - Creating graphs with igraph
      - Parsing HTML documents using rvest
      - Wrangling dates using lubridate
      - Making big data simpler using dplyr and purrr
  - Chapter 5: Leveraging a Deep Learning Framework
    - Understanding Deep Learning Framework Usage
    - Working with Low-End Frameworks
      - Chainer
      - PyTorch
      - MXNet
      - Microsoft Cognitive Toolkit/CNTK
    - Understanding TensorFlow
      - Grasping why TensorFlow is so good
      - Making TensorFlow easier by using TFLearn
      - Using Keras as the best simplifier
      - Getting your copy of TensorFlow and Keras
      - Fixing the C++ build tools error in Windows
      - Accessing your new environment in Notebook
- Book 2: Interacting with Data Storage
  - Chapter 1: Manipulating Raw Data
    - Defining the Data Sources
      - Obtaining data locally
      - Using online data sources
      - Employing dynamic data sources
      - Considering other kinds of data sources
    - Considering the Data Forms
      - Working with pure text
      - Accessing formatted text
      - Deciphering binary data
    - Understanding the Need for Data Reliability
  - Chapter 2: Using Functional Programming Techniques
    - Defining Functional Programming
      - Differences with other programming paradigms
      - Understanding its goals
    - Understanding Pure and Impure Languages
      - Using the pure approach
      - Using the impure approach
    - Comparing the Functional Paradigm
      - Imperative
      - Procedural
      - Object-oriented
      - Declarative
    - Using Python for Functional Programming Needs
    - Understanding How Functional Data Works
      - Working with immutable data
      - Considering the role of state
      - Eliminating side effects
      - Passing by reference versus by value
    - Working with Lists and Strings
      - Creating lists
      - Evaluating lists
      - Performing common list manipulations
      - Understanding the Dict and Set alternatives
      - Considering the use of strings
    - Employing Pattern Matching
      - Looking for patterns in data
      - Understanding regular expressions
      - Using pattern matching in analysis
      - Working with pattern matching
    - Working with Recursion
      - Performing tasks more than once
      - Understanding recursion
      - Using recursion on lists
      - Considering advanced recursive tasks
      - Passing functions instead of variables
    - Performing Functional Data Manipulation
      - Slicing and dicing
      - Mapping your data
      - Filtering data
      - Organizing data
  - Chapter 3: Working with Scalars, Vectors, and Matrices
    - Considering the Data Forms
    - Defining Data Type through Scalars
    - Creating Organized Data with Vectors
      - Defining a vector
      - Creating vectors of a specific type
      - Performing math on vectors
      - Performing logical and comparison tasks on vectors
      - Multiplying vectors
    - Creating and Using Matrices
      - Creating a matrix
      - Creating matrices of a specific type
      - Using the matrix class
      - Performing matrix multiplication
      - Executing advanced matrix operations
    - Extending Analysis to Tensors
    - Using Vectorization Effectively
    - Selecting and Shaping Data
      - Slicing rows
      - Slicing columns
      - Dicing
      - Concatenating
      - Aggregating
    - Working with Trees
      - Understanding the basics of trees
      - Building a tree
    - Representing Relations in a Graph
      - Going beyond trees
      - Arranging graphs
  - Chapter 4: Accessing Data in Files
    - Understanding Flat File Data Sources
    - Working with Positional Data Files
    - Accessing Data in CSV Files
      - Working with a simple CSV file
      - Making use of header information
    - Moving On to XML Files
      - Working with a simple XML file
      - Parsing XML
      - Using XPath for data extraction
    - Considering Other Flat-File Data Sources
    - Working with Nontext Data
    - Downloading Online Datasets
      - Working with package datasets
      - Using public domain datasets
  - Chapter 5: Working with a Relational DBMS
    - Considering RDBMS Issues
      - Defining the use of tables
      - Understanding keys and indexes
      - Using local versus online databases
      - Working in read-only mode
    - Accessing the RDBMS Data
      - Using the SQL language
      - Relying on scripts
      - Relying on views
      - Relying on functions
    - Creating a Dataset
      - Combining data from multiple tables
      - Ensuring data completeness
      - Slicing and dicing the data as needed
    - Mixing RDBMS Products
  - Chapter 6: Working with a NoSQL DBMS
    - Considering the Ramifications of Hierarchical Data
      - Understanding hierarchical organization
      - Developing strategies for freeform data
      - Performing an analysis
      - Working around dangling data
    - Accessing the Data
      - Creating a picture of the data form
      - Employing the correct transiting strategy
      - Ordering the data
    - Interacting with Data from NoSQL Databases
    - Working with Dictionaries
    - Developing Datasets from Hierarchical Data
    - Processing Hierarchical Data into Other Forms
- Book 3: Manipulating Data Using Basic Algorithms
  - Chapter 1: Working with Linear Regression
    - Considering the History of Linear Regression
    - Combining Variables
      - Working through simple linear regression
      - Advancing to multiple linear regression
      - Considering which question to ask
      - Reducing independent variable complexity
    - Manipulating Categorical Variables
      - Creating categorical variables
      - Renaming levels
      - Combining levels
    - Using Linear Regression to Guess Numbers
      - Defining the family of linear models
      - Using more variables in a larger dataset
      - Understanding variable transformations
      - Doing variable transformations
      - Creating interactions between variables
      - Understanding limitations and problems
    - Learning One Example at a Time
      - Using Gradient Descent
      - Implementing Stochastic Gradient Descent
      - Considering the effects of regularization
  - Chapter 2: Moving Forward with Logistic Regression
    - Considering the History of Logistic Regression
    - Differentiating between Linear and Logistic Regression
      - Considering the model
      - Defining the logistic function
      - Understanding the problems that logistic regression solves
      - Fitting the curve
      - Considering a pass/fail example
    - Using Logistic Regression to Guess Classes
      - Applying logistic regression
      - Considering when classes are more
      - Defining logistic regression performance
    - Switching to Probabilities
      - Specifying a binary response
      - Transforming numeric estimates into probabilities
    - Working through Multiclass Regression
      - Understanding multiclass regression
      - Developing a multiclass regression implementation
  - Chapter 3: Predicting Outcomes Using Bayes
    - Understanding Bayes’ Theorem
      - Delving into Bayes history
      - Considering the basic theorem
    - Using Naïve Bayes for Predictions
      - Finding out that Naïve Bayes isn’t so naïve
      - Predicting text classifications
      - Getting an overview of Bayesian inference
    - Working with Networked Bayes
      - Considering the network types and uses
      - Understanding Directed Acyclic Graphs (DAGs)
      - Employing networked Bayes in predictions
      - Deciding between automated and guided learning
    - Considering the Use of Bayesian Linear Regression
    - Considering the Use of Bayesian Logistic Regression
  - Chapter 4: Learning with K-Nearest Neighbors
    - Considering the History of K-Nearest Neighbors
    - Learning Lazily with K-Nearest Neighbors
      - Understanding the basis of KNN
      - Predicting after observing neighbors
      - Choosing the k parameter wisely
    - Leveraging the Correct k Parameter
      - Understanding the k parameter
      - Experimenting with a flexible algorithm
    - Implementing KNN Regression
    - Implementing KNN Classification
- Book 4: Performing Advanced Data Manipulation
  - Chapter 1: Leveraging Ensembles of Learners
    - Leveraging Decision Trees
      - Growing a forest of trees
      - Seeing Random Forests in action
      - Understanding the importance measures
      - Configuring your system for importance measures with Python
      - Seeing importance measures in action
    - Working with Almost Random Guesses
      - Understanding the premise
      - Bagging predictors with AdaBoost
    - Meeting Again with Gradient Descent
      - Understanding the GBM difference
      - Seeing GBM in action
    - Averaging Different Predictors
  - Chapter 2: Building Deep Learning Models
    - Discovering the Incredible Perceptron
      - Understanding perceptron functionality
      - Touching the nonseparability limit
    - Hitting Complexity with Neural Networks
      - Considering the neuron
      - Pushing data with feed-forward
      - Defining hidden layers
      - Executing operations
      - Considering the details of data movement through the neural network
      - Using backpropagation to adjust learning
    - Understanding More about Neural Networks
      - Getting an overview of the neural network process
      - Defining the basic architecture
      - Documenting the essential modules
      - Solving a simple problem
    - Looking Under the Hood of Neural Networks
      - Choosing the right activation function
      - Relying on a smart optimizer
      - Setting a working learning rate
    - Explaining Deep Learning Differences with Other Forms of AI
      - Adding more layers
      - Changing the activations
      - Adding regularization by dropout
      - Using online learning
      - Transferring learning
      - Learning end to end
  - Chapter 3: Recognizing Images with CNNs
    - Beginning with Simple Image Recognition
      - Considering the ramifications of sight
      - Working with a set of images
      - Extracting visual features
      - Recognizing faces using Eigenfaces
      - Classifying images
    - Understanding CNN Image Basics
    - Moving to CNNs with Character Recognition
      - Accessing the dataset
      - Reshaping the dataset
      - Encoding the categories
      - Defining the model
      - Using the model
    - Explaining How Convolutions Work
      - Understanding convolutions
      - Simplifying the use of pooling
      - Describing the LeNet architecture
    - Detecting Edges and Shapes from Images
      - Visualizing convolutions
      - Unveiling successful architectures
      - Discussing transfer learning
  - Chapter 4: Processing Text and Other Sequences
    - Introducing Natural Language Processing
      - Defining the human perspective as it relates to data science
      - Considering the computer perspective as it relates to data science
    - Understanding How Machines Read
      - Creating a corpus
      - Performing feature extraction
      - Understanding the BoW
      - Processing and enhancing text
      - Maintaining order using n-grams
      - Stemming and removing stop words
      - Scraping textual datasets from the web
      - Handling problems with raw text
      - Storing processed text data in sparse matrices
    - Understanding Semantics Using Word Embeddings
    - Using Scoring and Classification
      - Performing classification tasks
      - Analyzing reviews from e-commerce
- Book 5: Performing Data-Related Tasks
  - Chapter 1: Making Recommendations
    - Realizing the Recommendation Revolution
    - Downloading Rating Data
      - Navigating through anonymous web data
      - Encountering the limits of rating data
    - Leveraging SVD
      - Considering the origins of SVD
      - Understanding the SVD connection
  - Chapter 2: Performing Complex Classifications
    - Using Image Classification Challenges
      - Delving into ImageNet and COCO
      - Learning the magic of data augmentation
    - Distinguishing Traffic Signs
      - Preparing the image data
      - Running a classification task
  - Chapter 3: Identifying Objects
    - Distinguishing Classification Tasks
      - Understanding the problem
      - Performing localization
      - Classifying multiple objects
      - Annotating multiple objects in images
      - Segmenting images
    - Perceiving Objects in Their Surroundings
      - Considering vision needs in self-driving cars
      - Discovering how RetinaNet works
      - Using the Keras-RetinaNet code
    - Overcoming Adversarial Attacks on Deep Learning Applications
      - Tricking pixels
      - Hacking with stickers and other artifacts
  - Chapter 4: Analyzing Music and Video
    - Learning to Imitate Art and Life
      - Transferring an artistic style
      - Reducing the problem to statistics
      - Understanding that deep learning doesn’t create
    - Mimicking an Artist
      - Defining a new piece based on a single artist
      - Combining styles to create new art
      - Visualizing how neural networks dream
      - Using a network to compose music
      - Other creative avenues
    - Moving toward GANs
      - Finding the key in the competition
      - Considering a growing field
  - Chapter 5: Considering Other Task Types
    - Processing Language in Texts
      - Considering the processing methodologies
      - Defining understanding as tokenization
      - Putting all the documents into a bag
      - Using AI for sentiment analysis
    - Processing Time Series
      - Defining sequences of events
      - Performing a prediction using LSTM
  - Chapter 6: Developing Impressive Charts and Plots
    - Starting a Graph, Chart, or Plot
      - Understanding the differences between graphs, charts, and plots
      - Considering the graph, chart, and plot types
      - Defining the plot
      - Drawing multiple lines
      - Drawing multiple plots
      - Saving your work
    - Setting the Axis, Ticks, and Grids
      - Getting the axis
      - Formatting the ticks
      - Adding grids
    - Defining the Line Appearance
      - Working with line styles
      - Adding markers
    - Using Labels, Annotations, and Legends
      - Adding labels
      - Annotating the chart
      - Creating a legend
    - Creating Scatterplots
      - Depicting groups
      - Showing correlations
    - Plotting Time Series
      - Representing time on axes
      - Plotting trends over time
    - Plotting Geographical Data
      - Getting the toolkit
      - Drawing the map
      - Plotting the data
    - Visualizing Graphs
      - Understanding the adjacency matrix
      - Using NetworkX basics
- Book 6: Diagnosing and Fixing Errors
  - Chapter 1: Locating Errors in Your Data
    - Considering the Types of Data Errors
    - Obtaining the Required Data
      - Considering the data sources
      - Obtaining reliable data
      - Making human input more reliable
      - Using automated data collection
    - Validating Your Data
      - Figuring out what’s in your data
      - Removing duplicates
      - Creating a data map and a data plan
    - Manicuring the Data
      - Dealing with missing data
      - Considering data misalignments
      - Separating out useful data
    - Dealing with Dates in Your Data
      - Formatting date and time values
      - Using the right time transformation
  - Chapter 2: Considering Outrageous Outcomes
    - Deciding What Outrageous Means
    - Considering the Five Mistruths in Data
      - Commission
      - Omission
      - Perspective
      - Bias
      - Frame-of-reference
    - Considering Detection of Outliers
      - Understanding outlier basics
      - Finding more things that can go wrong
      - Understanding anomalies and novel data
    - Examining a Simple Univariate Method
      - Using the pandas package
      - Leveraging the Gaussian distribution
      - Making assumptions and checking out
    - Developing a Multivariate Approach
      - Using principal component analysis
      - Using cluster analysis
      - Automating outliers detection with Isolation Forests
  - Chapter 3: Dealing with Model Overfitting and Underfitting
    - Understanding the Causes
      - Considering the problem
      - Looking at underfitting
      - Looking at overfitting
      - Plotting learning curves for insights
    - Determining the Sources of Overfitting and Underfitting
      - Understanding bias and variance
      - Having insufficient data
      - Being fooled by data leakage
    - Guessing the Right Features
      - Selecting variables like a pro
      - Using nonlinear transformations
      - Regularizing linear models
  - Chapter 4: Obtaining the Correct Output Presentation
    - Considering the Meaning of Correct
    - Determining a Presentation Type
      - Considering the audience
      - Defining a depth of detail
      - Ensuring that the data is consistent with audience needs
      - Understanding timeliness
    - Choosing the Right Graph
      - Telling a story with your graphs
      - Showing parts of a whole with pie charts
      - Creating comparisons with bar charts
      - Showing distributions using histograms
      - Depicting groups using boxplots
      - Defining a data flow using line graphs
      - Seeing data patterns using scatterplots
    - Working with External Data
      - Embedding plots and other images
      - Loading examples from online sites
      - Obtaining online graphics and multimedia
  - Chapter 5: Developing Consistent Strategies
    - Standardizing Data Collection Techniques
    - Using Reliable Sources
    - Verifying Dynamic Data Sources
      - Considering the problem
      - Analyzing streams with the right recipe
    - Looking for New Data Collection Trends
    - Weeding Old Data
    - Considering the Need for Randomness
      - Considering why randomization is needed
      - Understanding how probability works
- Index