Advances in Data Science
Symbolic, Complex, and Network Data
Hardcover, English, 2020
By Edwin Diday (Université de Paris IX - Dauphine, France), Rong Guan (Central University of Finance and Economics, China), Gilbert Saporta (Conservatoire National des Arts et Métiers, France), Huiwen Wang (Beihang University, China)
2 259 kr
Product information
- Publication date: 2020-02-14
- Dimensions: 163 x 239 x 20 mm
- Weight: 499 g
- Format: Hardcover
- Language: English
- Number of pages: 258
- Publisher: ISTE Ltd and John Wiley & Sons Inc
- ISBN: 9781786305763
Edwin Diday is Emeritus Professor at Paris-Dauphine University-PSL. He helped to introduce the symbolic data analysis paradigm and the dynamic clustering method (opening the path to local models), as well as pyramidal clustering for spatial representation of overlapping clusters.
Rong Guan is Associate Professor at the School of Statistics and Mathematics, Central University of Finance and Economics, Beijing. Her research covers complex and symbolic data analysis and financial distress diagnosis.
Gilbert Saporta is Emeritus Professor at Conservatoire National des Arts et Métiers, France. His current research focuses on functional data analysis and clusterwise and sparse methods. He is Honorary President of the French Statistical Society.
Huiwen Wang is Professor at the School of Economics and Management, Beihang University, Beijing. Her research covers dimension reduction, PLS regression, symbolic data analysis, compositional data analysis, functional data analysis and statistical modeling methods for mixed data.
Preface

Part 1. Symbolic Data
Chapter 1. Explanatory Tools for Machine Learning in the Symbolic Data Analysis Framework
Edwin DIDAY
1.1. Introduction
1.2. Introduction to Symbolic Data Analysis
1.2.1. What are complex data?
1.2.2. What are “classes” and “class of complex data”?
1.2.3. Which kind of class variability?
1.2.4. What are “symbolic variables” and “symbolic data tables”?
1.2.5. Symbolic Data Analysis (SDA)
1.3. Symbolic data tables from Dynamic Clustering Method and EM
1.3.1. The “dynamical clustering method” (DCM)
1.3.2. Examples of DCM applications
1.3.3. Clustering methods by mixture decomposition
1.3.4. Symbolic data tables from clustering
1.3.5. A general way to compare results of clustering methods by the “explanatory power” of their associated symbolic data table
1.3.6. Quality criteria of classes and variables based on the cells of the symbolic data table containing intervals or inferred distributions
1.4. Criteria for ranking individuals, classes and their bar chart descriptive symbolic variables
1.4.1. A theoretical framework for SDA
1.4.2. Characterization of a category and a class by a measure of discordance
1.4.3. Link between a characterization by the criteria W and the standard Tf-Idf
1.4.4. Ranking the individuals, the symbolic variables and the classes of a bar chart symbolic data table
1.5. Two directions of research
1.5.1. Parametrization of concordance and discordance criteria
1.5.2. Improving the explanatory power of any machine learning tool by a filtering process
1.6. Conclusion
1.7. References
Chapter 2. Likelihood in the Symbolic Context
Richard EMILION and Edwin DIDAY
2.1. Introduction
2.2. Probabilistic setting
2.2.1. Description variable and class variable
2.2.2. Conditional distributions
2.2.3. Symbolic variables
2.2.4. Examples
2.2.5. Probability measures on (ℂ, C), likelihood
2.3. Parametric models for p = 1
2.3.1. LDA model
2.3.2. BLS method
2.3.3. Interval-valued variables
2.3.4. Probability vectors and histogram-valued variables
2.4. Nonparametric estimation for p = 1
2.4.1. Multihistograms and multivariate polygons
2.4.2. Dirichlet kernel mixtures
2.4.3. Dirichlet Process Mixture (DPM)
2.5. Density models for p ≥ 2
2.6. Conclusion
2.7. References
Chapter 3. Dimension Reduction and Visualization of Symbolic Interval-Valued Data Using Sliced Inverse Regression
Han-Ming WU, Chiun-How KAO and Chun-houh CHEN
3.1. Introduction
3.2. PCA for interval-valued data and the sliced inverse regression
3.2.1. PCA for interval-valued data
3.2.2. Classic SIR
3.3. SIR for interval-valued data
3.3.1. Quantification approaches
3.3.2. Distributional approaches
3.4. Projections and visualization in DR subspace
3.4.1. Linear combinations of intervals
3.4.2. The graphical representation of the projected intervals in the 2D DR subspace
3.5. Some computational issues
3.5.1. Standardization of interval-valued data
3.5.2. The slicing schemes for iSIR
3.5.3. The evaluation of DR components
3.6. Simulation studies
3.6.1. Scenario 1: aggregated data
3.6.2. Scenario 2: data based on interval arithmetic
3.6.3. Results
3.7. A real data example: face recognition data
3.8. Conclusion and discussion
3.9. References
Chapter 4. On the “Complexity” of Social Reality. Some Reflections About the Use of Symbolic Data Analysis in Social Sciences
Frédéric LEBARON
4.1. Introduction
4.2. Social sciences facing “complexity”
4.2.1. The total social fact, a designation of “complexity” in social sciences
4.2.2. Two families of answers
4.2.3. The contemporary deepening of the two approaches, “reductionist” and “encompassing”
4.2.4. Issues of scale and heterogeneity
4.3. Symbolic data analysis in the social sciences: an example
4.3.1. Symbolic data analysis
4.3.2. An exploratory case study on European data
4.3.3. A sociological interpretation
4.4. Conclusion
4.5. References

Part 2. Complex Data
Chapter 5. A Spatial Dependence Measure and Prediction of Georeferenced Data Streams Summarized by Histograms
Rosanna VERDE and Antonio BALZANELLA
5.1. Introduction
5.2. Processing setup
5.3. Main definitions
5.4. Online summarization of a data stream through CluStream for Histogram data
5.5. Spatial dependence monitoring: a variogram for histogram data
5.6. Ordinary kriging for histogram data
5.7. Experimental results on real data
5.8. Conclusion
5.9. References
Chapter 6. Incremental Calculation Framework for Complex Data
Huiwen WANG, Yuan WEI and Siyang WANG
6.1. Introduction
6.2. Basic data
6.2.1. The basic data space
6.2.2. Sample covariance matrix
6.3. Incremental calculation of complex data
6.3.1. Transformation of complex data
6.3.2. Online decomposition of covariance matrix
6.3.3. Adopted algorithms
6.4. Simulation studies
6.4.1. Functional linear regression
6.4.2. Compositional PCA
6.5. Conclusion
6.6. Acknowledgment
6.7. References

Part 3. Network Data
Chapter 7. Recommender Systems and Attributed Networks
Françoise FOGELMAN-SOULIÉ, Lanxiang MEI, Jianyu ZHANG, Yiming LI, Wen GE, Yinglan LI and Qiaofei YE
7.1. Introduction
7.2. Recommender systems
7.2.1. Data used
7.2.2. Model-based collaborative filtering
7.2.3. Neighborhood-based collaborative filtering
7.2.4. Hybrid models
7.3. Social networks
7.3.1. Non-independence
7.3.2. Definition of a social network
7.3.3. Properties of social networks
7.3.4. Bipartite networks
7.3.5. Multilayer networks
7.4. Using social networks for recommendation
7.4.1. Social filtering
7.4.2. Extension to use attributes
7.4.3. Remarks
7.5. Experiments
7.5.1. Performance evaluation
7.5.2. Datasets
7.5.3. Analysis of one-mode projected networks
7.5.4. Models evaluated
7.5.5. Results
7.6. Perspectives
7.7. References
Chapter 8. Attributed Networks Partitioning Based on Modularity Optimization
David COMBE, Christine LARGERON, Baptiste JEUDY, Françoise FOGELMAN-SOULIÉ and Jing WANG
8.1. Introduction
8.2. Related work
8.3. Inertia based modularity
8.4. I-Louvain
8.5. Incremental computation of the modularity gain
8.6. Evaluation of I-Louvain method
8.6.1. Performance of I-Louvain on artificial datasets
8.6.2. Run-time of I-Louvain
8.7. Conclusion
8.8. References

Part 4. Clustering
Chapter 9. A Novel Clustering Method with Automatic Weighting of Tables and Variables
Rodrigo C. DE ARAÚJO, Francisco DE ASSIS TENORIO DE CARVALHO and Yves LECHEVALLIER
9.1. Introduction
9.2. Related Work
9.3. Definitions, notations and objective
9.3.1. Choice of distances
9.3.2. Criterion W measures the homogeneity of the partition P on the set of tables
9.3.3. Optimization of the criterion W
9.4. Hard clustering with automated weighting of tables and variables
9.4.1. Clustering algorithms MND–W and MND–WT
9.5. Applications: UCI data sets
9.5.1. Application I: Iris plant
9.5.2. Application II: multi-features dataset
9.6. Conclusion
9.7. References
Chapter 10. Clustering and Generalized ANOVA for Symbolic Data Constructed from Open Data
Simona KORENJAK-ČERNE, Nataša KEJAR and Vladimir BATAGELJ
10.1. Introduction
10.2. Data description based on discrete (membership) distributions
10.3. Clustering
10.3.1. TIMSS – study of teaching approaches
10.3.2. Clustering countries based on age–sex distributions of their populations
10.4. Generalized ANOVA
10.5. Conclusion
10.6. References

List of Authors
Index