A Course in Statistics with R
Hardcover, English, 2016
By Prabhanjan N. Tattar, Suresh Ramaiah, B. G. Manjunath
1 319 kr
Product information
- Publication date: 2016-05-06
- Dimensions: 183 x 252 x 46 mm
- Weight: 1,293 g
- Format: Hardcover
- Language: English
- Number of pages: 696
- Publisher: John Wiley & Sons Inc
- ISBN: 9781119152729
Prabhanjan N. Tattar, Business Analysis Senior Advisor at Dell International Services, Bangalore, India. Professor Tattar is a statistician providing analytical solutions to business problems, including statistical models and machine learning as appropriate.
Suresh Ramaiah, Assistant Professor of Statistics at Dharwad University, Dharwad, India.
B. G. Manjunath, Business Analysis Advisor at Dell International Services, Bangalore, India.
Table of contents

List of Figures
List of Tables
Preface
Acknowledgments

Part I: THE PRELIMINARIES

1 Why R?
  1.1 Why R?
  1.2 R Installation
  1.3 There is Nothing such as PRACTICALS
  1.4 Datasets in R and Internet
    1.4.1 List of Web-sites containing DATASETS
    1.4.2 Antique Datasets
  1.5 http://cran.r-project.org
    1.5.1 http://r-project.org
    1.5.2 http://www.cran.r-project.org/web/views/
    1.5.3 Is subscribing to R-Mailing List useful?
  1.6 R and its Interface with other Software
  1.7 help and/or ?
  1.8 R Books
  1.9 A Road Map

2 The R Basics
  2.1 Introduction
  2.2 Simple Arithmetics and a Little Beyond
    2.2.1 Absolute Values, Remainders, etc.
    2.2.2 round, floor, etc.
    2.2.3 Summary Functions
    2.2.4 Trigonometric Functions
    2.2.5 Complex Numbers
    2.2.6 Special Mathematical Functions
  2.3 Some Basic R Functions
    2.3.1 Summary Statistics
    2.3.2 is, as, is.na, etc.
    2.3.3 factors, levels, etc.
    2.3.4 Control Programming
    2.3.5 Other Useful Functions
    2.3.6 Calculus*
  2.4 Vectors and Matrices in R
    2.4.1 Vectors
    2.4.2 Matrices
  2.5 Data Entering and Reading from Files
    2.5.1 Data Entering
    2.5.2 Reading Data from External Files
  2.6 Working with Packages
  2.7 R Session Management
  2.8 Further Reading
  2.9 Complements, Problems, and Programs

3 Data Preparation and Other Tricks
  3.1 Introduction
  3.2 Manipulation with Complex Format Files
  3.3 Reading Datasets of Foreign Formats
  3.4 Displaying R Objects
  3.5 Manipulation Using R Functions
  3.6 Working with Time and Date
  3.7 Text Manipulations
  3.8 Scripts and Text Editors for R
    3.8.1 Text Editors for Linuxians
  3.9 Further Reading
  3.10 Complements, Problems, and Programs

4 Exploratory Data Analysis
  4.1 Introduction: The Tukey's School of Statistics
  4.2 Essential Summaries of EDA
  4.3 Graphical Techniques in EDA
    4.3.1 Boxplot
    4.3.2 Histogram
    4.3.3 Histogram Extensions and the Rootogram
    4.3.4 Pareto Chart
    4.3.5 Stem-and-Leaf Plot
    4.3.6 Run Chart
    4.3.7 Scatter Plot
  4.4 Quantitative Techniques in EDA
    4.4.1 Trimean
    4.4.2 Letter Values
  4.5 Exploratory Regression Models
    4.5.1 Resistant Line
    4.5.2 Median Polish
  4.6 Further Reading
  4.7 Complements, Problems, and Programs

Part II: PROBABILITY AND INFERENCE

5 Probability Theory
  5.1 Introduction
  5.2 Sample Space, Set Algebra, and Elementary Probability
  5.3 Counting Methods
    5.3.1 Sampling: The Diverse Ways
    5.3.2 The Binomial Coefficients and the Pascal's Triangle
    5.3.3 Some Problems Based on Combinatorics
  5.4 Probability: A Definition
    5.4.1 The Prerequisites
    5.4.2 The Kolmogorov Definition
  5.5 Conditional Probability and Independence
  5.6 Bayes Formula
  5.7 Random Variables, Expectations, and Moments
    5.7.1 The Definition
    5.7.2 Expectation of Random Variables
  5.8 Distribution Function, Characteristic Function, and Moment Generation Function
  5.9 Inequalities
    5.9.1 The Markov Inequality
    5.9.2 The Jensen's Inequality
    5.9.3 The Chebyshev Inequality
  5.10 Convergence of Random Variables
    5.10.1 Convergence in Distributions
    5.10.2 Convergence in Probability
    5.10.3 Convergence in rth Mean
    5.10.4 Almost Sure Convergence
  5.11 The Law of Large Numbers
    5.11.1 The Weak Law of Large Numbers
  5.12 The Central Limit Theorem
    5.12.1 The de Moivre-Laplace Central Limit Theorem
    5.12.2 CLT for iid Case
    5.12.3 The Lindeberg-Feller CLT
    5.12.4 The Liapounov CLT
  5.13 Further Reading
    5.13.1 Intuitive, Elementary, and First Course Source
    5.13.2 The Classics and Second Course Source
    5.13.3 The Problem Books
    5.13.4 Other Useful Sources
    5.13.5 R for Probability
  5.14 Complements, Problems, and Programs

6 Probability and Sampling Distributions
  6.1 Introduction
  6.2 Discrete Univariate Distributions
    6.2.1 The Discrete Uniform Distribution
    6.2.2 The Binomial Distribution
    6.2.3 The Geometric Distribution
    6.2.4 The Negative Binomial Distribution
    6.2.5 Poisson Distribution
    6.2.6 The Hypergeometric Distribution
  6.3 Continuous Univariate Distributions
    6.3.1 The Uniform Distribution
    6.3.2 The Beta Distribution
    6.3.3 The Exponential Distribution
    6.3.4 The Gamma Distribution
    6.3.5 The Normal Distribution
    6.3.6 The Cauchy Distribution
    6.3.7 The t-Distribution
    6.3.8 The Chi-square Distribution
    6.3.9 The F-Distribution
  6.4 Multivariate Probability Distributions
    6.4.1 The Multinomial Distribution
    6.4.2 Dirichlet Distribution
    6.4.3 The Multivariate Normal Distribution
    6.4.4 The Multivariate t Distribution
  6.5 Populations and Samples
  6.6 Sampling from the Normal Distributions
  6.7 Some Finer Aspects of Sampling Distributions
    6.7.1 Sampling Distribution of Median
    6.7.2 Sampling Distribution of Mean of Standard Distributions
  6.8 Multivariate Sampling Distributions
    6.8.1 Noncentral Univariate Chi-square, t, and F Distributions
    6.8.2 Wishart Distribution
    6.8.3 Hotelling's T² Distribution
  6.9 Bayesian Sampling Distributions
  6.10 Further Reading
  6.11 Complements, Problems, and Programs

7 Parametric Inference
  7.1 Introduction
  7.2 Families of Distribution
    7.2.1 The Exponential Family
    7.2.2 Pitman Family
  7.3 Loss Functions
  7.4 Data Reduction
    7.4.1 Sufficiency
    7.4.2 Minimal Sufficiency
  7.5 Likelihood and Information
    7.5.1 The Likelihood Principle
    7.5.2 The Fisher Information
  7.6 Point Estimation
    7.6.1 Maximum Likelihood Estimation
    7.6.2 Method of Moments Estimator
  7.7 Comparison of Estimators
    7.7.1 Unbiased Estimators
    7.7.2 Improving Unbiased Estimators
  7.8 Confidence Intervals
  7.9 Testing Statistical Hypotheses – The Preliminaries
  7.10 The Neyman-Pearson Lemma
  7.11 Uniformly Most Powerful Tests
  7.12 Uniformly Most Powerful Unbiased Tests
    7.12.1 Tests for the Means: One- and Two-Sample t-Test
  7.13 Likelihood Ratio Tests
    7.13.1 Normal Distribution: One-Sample Problems
    7.13.2 Normal Distribution: Two-Sample Problem for the Mean
  7.14 Behrens-Fisher Problem
  7.15 Multiple Comparison Tests
    7.15.1 Bonferroni's Method
    7.15.2 Holm's Method
  7.16 The EM Algorithm*
    7.16.1 Introduction
    7.16.2 The Algorithm
    7.16.3 Introductory Applications
  7.17 Further Reading
    7.17.1 Early Classics
    7.17.2 Texts from the Last 30 Years
  7.18 Complements, Problems, and Programs

8 Nonparametric Inference
  8.1 Introduction
  8.2 Empirical Distribution Function and Its Applications
    8.2.1 Statistical Functionals
  8.3 The Jackknife and Bootstrap Methods
    8.3.1 The Jackknife
    8.3.2 The Bootstrap
    8.3.3 Bootstrapping Simple Linear Model*
  8.4 Non-parametric Smoothing
    8.4.1 Histogram Smoothing
    8.4.2 Kernel Smoothing
    8.4.3 Nonparametric Regression Models*
  8.5 Non-parametric Tests
    8.5.1 The Wilcoxon Signed-Ranks Test
    8.5.2 The Mann-Whitney Test
    8.5.3 The Siegel-Tukey Test
    8.5.4 The Wald-Wolfowitz Run Test
    8.5.5 The Kolmogorov-Smirnov Test
    8.5.6 Kruskal-Wallis Test*
  8.6 Further Reading
  8.7 Complements, Problems, and Programs

9 Bayesian Inference
  9.1 Introduction
  9.2 Bayesian Probabilities
  9.3 The Bayesian Paradigm for Statistical Inference
    9.3.1 Bayesian Sufficiency and the Principle
    9.3.2 Bayesian Analysis and Likelihood Principle
    9.3.3 Informative and Conjugate Prior
    9.3.4 Non-informative Prior
  9.4 Bayesian Estimation
    9.4.1 Inference for Binomial Distribution
    9.4.2 Inference for the Poisson Distribution
    9.4.3 Inference for Uniform Distribution
    9.4.4 Inference for Exponential Distribution
    9.4.5 Inference for Normal Distributions
  9.5 The Credible Intervals
  9.6 Bayes Factors for Testing Problems
  9.7 Further Reading
  9.8 Complements, Problems, and Programs

Part III: STOCHASTIC PROCESSES AND MONTE CARLO

10 Stochastic Processes
  10.1 Introduction
  10.2 Kolmogorov's Consistency Theorem
  10.3 Markov Chains
    10.3.1 The m-Step TPM
    10.3.2 Classification of States
    10.3.3 Canonical Decomposition of an Absorbing Markov Chain
    10.3.4 Stationary Distribution and Mean First Passage Time of an Ergodic Markov Chain
    10.3.5 Time Reversible Markov Chain
  10.4 Application of Markov Chains in Computational Statistics
    10.4.1 The Metropolis-Hastings Algorithm
    10.4.2 Gibbs Sampler
    10.4.3 Illustrative Examples
  10.5 Further Reading
  10.6 Complements, Problems, and Programs

11 Monte Carlo Computations
  11.1 Introduction
  11.2 Generating the (Pseudo-) Random Numbers
    11.2.1 Useful Random Generators
    11.2.2 Probability Through Simulation
  11.3 Simulation from Probability Distributions and Some Limit Theorems
    11.3.1 Simulation from Discrete Distributions
    11.3.2 Simulation from Continuous Distributions
    11.3.3 Understanding Limit Theorems through Simulation
    11.3.4 Understanding The Central Limit Theorem
  11.4 Monte Carlo Integration
  11.5 The Accept-Reject Technique
  11.6 Application to Bayesian Inference
  11.7 Further Reading
  11.8 Complements, Problems, and Programs

Part IV: LINEAR MODELS

12 Linear Regression Models
  12.1 Introduction
  12.2 Simple Linear Regression Model
    12.2.1 Fitting a Linear Model
    12.2.2 Confidence Intervals
    12.2.3 The Analysis of Variance (ANOVA)
    12.2.4 The Coefficient of Determination
    12.2.5 The "lm" Function from R
    12.2.6 Residuals for Validation of the Model Assumptions
    12.2.7 Prediction for the Simple Regression Model
    12.2.8 Regression through the Origin
  12.3 The Anscombe Warnings and Regression Abuse
  12.4 Multiple Linear Regression Model
    12.4.1 Scatter Plots: A First Look
    12.4.2 Other Useful Graphical Methods
    12.4.3 Fitting a Multiple Linear Regression Model
    12.4.4 Testing Hypotheses and Confidence Intervals
  12.5 Model Diagnostics for the Multiple Regression Model
    12.5.1 Residuals
    12.5.2 Influence and Leverage Diagnostics
  12.6 Multicollinearity
    12.6.1 Variance Inflation Factor
    12.6.2 Eigen System Analysis
  12.7 Data Transformations
    12.7.1 Linearization
    12.7.2 Variance Stabilization
    12.7.3 Power Transformation
  12.8 Model Selection
    12.8.1 Backward Elimination
    12.8.2 Forward and Stepwise Selection
  12.9 Further Reading
    12.9.1 Early Classics
    12.9.2 Industrial Applications
    12.9.3 Regression Details
    12.9.4 Modern Regression Texts
    12.9.5 R for Regression
  12.10 Complements, Problems, and Programs

13 Experimental Designs
  13.1 Introduction
  13.2 Principles of Experimental Design
  13.3 Completely Randomized Designs
    13.3.1 The CRD Model
    13.3.2 Randomization in CRD
    13.3.3 Inference for the CRD Models
    13.3.4 Validation of Model Assumptions
    13.3.5 Contrasts and Multiple Testing for the CRD Model
  13.4 Block Designs
    13.4.1 Randomization and Analysis of Balanced Block Designs
    13.4.2 Incomplete Block Designs
    13.4.3 Latin Square Design
    13.4.4 Graeco Latin Square Design
  13.5 Factorial Designs
    13.5.1 Two Factorial Experiment
    13.5.2 Three-Factorial Experiment
    13.5.3 Blocking in Factorial Experiments
  13.6 Further Reading
  13.7 Complements, Problems, and Programs

14 Multivariate Statistical Analysis - I
  14.1 Introduction
  14.2 Graphical Plots for Multivariate Data
  14.3 Definitions, Notations, and Summary Statistics for Multivariate Data
    14.3.1 Definitions and Data Visualization
    14.3.2 Early Outlier Detection
  14.4 Testing for Mean Vectors: One Sample
    14.4.1 Testing for Mean Vector with Known Variance-Covariance Matrix
    14.4.2 Testing for Mean Vectors with Unknown Variance-Covariance Matrix
  14.5 Testing for Mean Vectors: Two-Samples
  14.6 Multivariate Analysis of Variance
    14.6.1 Wilks Test Statistic
    14.6.2 Roy's Test
    14.6.3 Pillai's Test Statistic
    14.6.4 The Lawley-Hotelling Test Statistic
  14.7 Testing for Variance-Covariance Matrix: One Sample
    14.7.1 Testing for Sphericity
  14.8 Testing for Variance-Covariance Matrix: k-Samples
  14.9 Testing for Independence of Sub-vectors
  14.10 Further Reading
  14.11 Complements, Problems, and Programs

15 Multivariate Statistical Analysis - II
  15.1 Introduction
  15.2 Classification and Discriminant Analysis
    15.2.1 Discrimination Analysis
    15.2.2 Classification
  15.3 Canonical Correlations
  15.4 Principal Component Analysis – Theory and Illustration
    15.4.1 The Theory
    15.4.2 Illustration Through a Dataset
  15.5 Applications of Principal Component Analysis
    15.5.1 PCA for Linear Regression
    15.5.2 Biplots
  15.6 Factor Analysis
    15.6.1 The Orthogonal Factor Analysis Model
    15.6.2 Estimation of Loadings and Communalities
  15.7 Further Reading
    15.7.1 The Classics and Applied Perspectives
    15.7.2 Multivariate Analysis and Software
  15.8 Complements, Problems, and Programs

16 Categorical Data Analysis
  16.1 Introduction
  16.2 Graphical Methods for CDA
    16.2.1 Bar and Stacked Bar Plots
    16.2.2 Spine Plots
    16.2.3 Mosaic Plots
    16.2.4 Pie Charts and Dot Charts
    16.2.5 Four-Fold Plots
  16.3 The Odds Ratio
  16.4 The Simpson's Paradox
  16.5 The Binomial, Multinomial, and Poisson Models
    16.5.1 The Binomial Model
    16.5.2 The Multinomial Model
    16.5.3 The Poisson Model
  16.6 The Problem of Overdispersion
  16.7 The χ²-Tests of Independence
  16.8 Further Reading
  16.9 Complements, Problems, and Programs

17 Generalized Linear Models
  17.1 Introduction
  17.2 Regression Problems in Count/Discrete Data
  17.3 Exponential Family and the GLM
  17.4 The Logistic Regression Model
  17.5 Inference for the Logistic Regression Model
    17.5.1 Estimation of the Regression Coefficients and Related Parameters
    17.5.2 Estimation of the Variance-Covariance Matrix of β̂
    17.5.3 Confidence Intervals and Hypotheses Testing for the Regression Coefficients
    17.5.4 Residuals for the Logistic Regression Model
    17.5.5 Deviance Test and Hosmer-Lemeshow Goodness-of-Fit Test
  17.6 Model Selection in Logistic Regression Models
  17.7 Probit Regression
  17.8 Poisson Regression Model
  17.9 Further Reading
  17.10 Complements, Problems, and Programs

Appendix A: Open Source Software – An Epilogue
Appendix B: The Statistical Tables
Bibliography
Author Index
Subject Index
R Codes
"Integrates the theory and applications of statistics using R ... the book has been written to bridge the gap between theory and applications and explain how mathematical expressions are converted into R programs. The book has been primarily designed as a useful companion for a Masters student during each semester of the course, but will also help applied statisticians in revisiting the underpinnings of the subject." (Zentralblatt MATH, 2016)