Big Data Analytics for Large-Scale Multimedia Search
Hardcover, English, 2019
By Stefanos Vrochidis, Benoit Huet, Edward Y. Chang, Ioannis Kompatsiaris
SEK 1,869
Product information
- Publication date: 2019-04-19
- Dimensions: 163 x 244 x 25 mm
- Weight: 839 g
- Format: Hardcover
- Language: English
- Number of pages: 376
- Publisher: John Wiley & Sons Inc
- ISBN: 9781119376972
Stefanos Vrochidis is a Senior Researcher with the Information Technologies Institute (CERTH-ITI) in Greece. His research interests include multimedia retrieval, semantic multimedia analysis, multimodal big data analytics, web data mining, multimodal interaction and security applications.

Benoit Huet is an Assistant Professor in the Data Science Department of EURECOM, France. His current research interests include large-scale multimedia content analysis, mining and indexing, multimodal fusion, and affective and socially-aware multimedia.

Edward Y. Chang has served as President of AI Research and Healthcare at HTC since 2012. Before taking up his current post, he was a director of research at Google from 2006 to 2012 and a professor at the University of California, Santa Barbara, from 1999 to 2006. He is an IEEE Fellow, recognized for his contributions to scalable machine learning.

Ioannis Kompatsiaris is a Senior Researcher with the Information Technologies Institute (CERTH-ITI) in Greece, where he leads the Multimedia, Knowledge and Social Media Analytics Lab. His research interests include large-scale multimedia and social media analysis, knowledge structures and reasoning, eHealth, security and environmental applications.
Introduction xv
List of Contributors xix
About the Companion Website xxiii

Part I Feature Extraction from Big Multimedia Data 1

1 Representation Learning on Large and Small Data 3
Chun-Nan Chou, Chuen-Kai Shie, Fu-Chieh Chang, Jocelyn Chang and Edward Y. Chang
1.1 Introduction 3
1.2 Representative Deep CNNs 5
1.2.1 AlexNet 6
1.2.1.1 ReLU Nonlinearity 6
1.2.1.2 Data Augmentation 7
1.2.1.3 Dropout 8
1.2.2 Network in Network 8
1.2.2.1 MLP Convolutional Layer 9
1.2.2.2 Global Average Pooling 9
1.2.3 VGG 10
1.2.3.1 Very Small Convolutional Filters 10
1.2.3.2 Multi-scale Training 11
1.2.4 GoogLeNet 11
1.2.4.1 Inception Modules 11
1.2.4.2 Dimension Reduction 12
1.2.5 ResNet 13
1.2.5.1 Residual Learning 13
1.2.5.2 Identity Mapping by Shortcuts 14
1.2.6 Observations and Remarks 15
1.3 Transfer Representation Learning 15
1.3.1 Method Specifications 17
1.3.2 Experimental Results and Discussion 18
1.3.2.1 Results of Transfer Representation Learning for OM 19
1.3.2.2 Results of Transfer Representation Learning for Melanoma 20
1.3.2.3 Qualitative Evaluation: Visualization 21
1.3.3 Observations and Remarks 23
1.4 Conclusions 24
References 25

2 Concept-Based and Event-Based Video Search in Large Video Collections 31
Foteini Markatopoulou, Damianos Galanopoulos, Christos Tzelepis, Vasileios Mezaris and Ioannis Patras
2.1 Introduction 32
2.2 Video Preprocessing and Machine Learning Essentials 33
2.2.1 Video Representation 33
2.2.2 Dimensionality Reduction 34
2.3 Methodology for Concept Detection and Concept-Based Video Search 35
2.3.1 Related Work 35
2.3.2 Cascades for Combining Different Video Representations 37
2.3.2.1 Problem Definition and Search Space 37
2.3.2.2 Problem Solution 38
2.3.3 Multi-Task Learning for Concept Detection and Concept-Based Video Search 40
2.3.4 Exploiting Label Relations 41
2.3.5 Experimental Study 42
2.3.5.1 Dataset and Experimental Setup 42
2.3.5.2 Experimental Results 43
2.3.5.3 Computational Complexity 47
2.4 Methods for Event Detection and Event-Based Video Search 48
2.4.1 Related Work 48
2.4.2 Learning from Positive Examples 49
2.4.3 Learning Solely from Textual Descriptors: Zero-Example Learning 50
2.4.4 Experimental Study 52
2.4.4.1 Dataset and Experimental Setup 52
2.4.4.2 Experimental Results: Learning from Positive Examples 53
2.4.4.3 Experimental Results: Zero-Example Learning 53
2.5 Conclusions 54
2.6 Acknowledgments 55
References 55

3 Big Data Multimedia Mining: Feature Extraction Facing Volume, Velocity, and Variety 61
Vedhas Pandit, Shahin Amiriparian, Maximilian Schmitt, Amr Mousa and Björn Schuller
3.1 Introduction 61
3.2 Scalability through Parallelization 64
3.2.1 Process Parallelization 64
3.2.2 Data Parallelization 64
3.3 Scalability through Feature Engineering 65
3.3.1 Feature Reduction through Spatial Transformations 66
3.3.2 Laplacian Matrix Representation 66
3.3.3 Parallel Latent Dirichlet Allocation and Bag of Words 68
3.4 Deep Learning-Based Feature Learning 68
3.4.1 Adaptability that Conquers both Volume and Velocity 70
3.4.2 Convolutional Neural Networks 72
3.4.3 Recurrent Neural Networks 73
3.4.4 Modular Approach to Scalability 74
3.5 Benchmark Studies 76
3.5.1 Dataset 76
3.5.2 Spectrogram Creation 77
3.5.3 CNN-Based Feature Extraction 77
3.5.4 Structure of the CNNs 78
3.5.5 Process Parallelization 79
3.5.6 Results 80
3.6 Closing Remarks 81
3.7 Acknowledgements 82
References 82

Part II Learning Algorithms for Large-Scale Multimedia 89

4 Large-Scale Video Understanding with Limited Training Labels 91
Jingkuan Song, Xu Zhao, Lianli Gao and Liangliang Cao
4.1 Introduction 91
4.2 Video Retrieval with Hashing 91
4.2.1 Overview 91
4.2.2 Unsupervised Multiple Feature Hashing 93
4.2.2.1 Framework 93
4.2.2.2 The Objective Function of MFH 93
4.2.2.3 Solution of MFH 95
4.2.2.3.1 Complexity Analysis 96
4.2.3 Submodular Video Hashing 97
4.2.3.1 Framework 97
4.2.3.2 Video Pooling 97
4.2.3.3 Submodular Video Hashing 98
4.2.4 Experiments 99
4.2.4.1 Experiment Settings 99
4.2.4.1.1 Video Datasets 99
4.2.4.1.2 Visual Features 99
4.2.4.1.3 Algorithms for Comparison 100
4.2.4.2 Results 100
4.2.4.2.1 CC_WEB_VIDEO 100
4.2.4.2.2 Combined Dataset 100
4.2.4.3 Evaluation of SVH 101
4.2.4.3.1 Results 102
4.3 Graph-Based Model for Video Understanding 103
4.3.1 Overview 103
4.3.2 Optimized Graph Learning for Video Annotation 104
4.3.2.1 Framework 104
4.3.2.2 OGL 104
4.3.2.2.1 Terms and Notations 104
4.3.2.2.2 Optimal Graph-Based SSL 105
4.3.2.2.3 Iterative Optimization 106
4.3.3 Context Association Model for Action Recognition 107
4.3.3.1 Context Memory 108
4.3.4 Graph-Based Event Video Summarization 109
4.3.4.1 Framework 109
4.3.4.2 Temporal Alignment 110
4.3.5 TGIF: A New Dataset and Benchmark on Animated GIF Description 111
4.3.5.1 Data Collection 111
4.3.5.2 Data Annotation 112
4.3.6 Experiments 114
4.3.6.1 Experimental Settings 114
4.3.6.1.1 Datasets 114
4.3.6.1.2 Features 114
4.3.6.1.3 Baseline Methods and Evaluation Metrics 114
4.3.6.2 Results 115
4.4 Conclusions and Future Work 116
References 116

5 Multimodal Fusion of Big Multimedia Data 121
Ilias Gialampoukidis, Elisavet Chatzilari, Spiros Nikolopoulos, Stefanos Vrochidis and Ioannis Kompatsiaris
5.1 Multimodal Fusion in Multimedia Retrieval 122
5.1.1 Unsupervised Fusion in Multimedia Retrieval 123
5.1.1.1 Linear and Non-linear Similarity Fusion 123
5.1.1.2 Cross-modal Fusion of Similarities 124
5.1.1.3 Random Walks and Graph-based Fusion 124
5.1.1.4 A Unifying Graph-based Model 126
5.1.2 Partial Least Squares Regression 127
5.1.3 Experimental Comparison 128
5.1.3.1 Dataset Description 128
5.1.3.2 Settings 129
5.1.3.3 Results 129
5.1.4 Late Fusion of Multiple Multimedia Rankings 130
5.1.4.1 Score Fusion 131
5.1.4.2 Rank Fusion 132
5.1.4.2.1 Borda Count Fusion 132
5.1.4.2.2 Reciprocal Rank Fusion 132
5.1.4.2.3 Condorcet Fusion 132
5.2 Multimodal Fusion in Multimedia Classification 132
5.2.1 Related Literature 134
5.2.2 Problem Formulation 136
5.2.3 Probabilistic Fusion in Active Learning 137
5.2.3.1 If P(S=0|V,T) = 0 138
5.2.3.2 If P(S=0|V,T) ≠ 0 138
5.2.3.3 Incorporating Informativeness in the Selection (P(S|V)) 139
5.2.3.4 Measuring Oracle’s Confidence (P(S|T)) 139
5.2.3.5 Re-training 140
5.2.4 Experimental Comparison 141
5.2.4.1 Datasets 141
5.2.4.2 Settings 142
5.2.4.3 Results 143
5.2.4.3.1 Expanding with Positive, Negative or Both 143
5.2.4.3.2 Comparing with Sample Selection Approaches 145
5.2.4.3.3 Comparing with Fusion Approaches 147
5.2.4.3.4 Parameter Sensitivity Investigation 147
5.2.4.3.5 Comparing with Existing Methods 148
5.3 Conclusions 151
References 152

6 Large-Scale Social Multimedia Analysis 157
Benjamin Bischke, Damian Borth and Andreas Dengel
6.1 Social Multimedia in Social Media Streams 157
6.1.1 Social Multimedia 157
6.1.2 Social Multimedia Streams 158
6.1.3 Analysis of the Twitter Firehose 160
6.1.3.1 Dataset: Overview 160
6.1.3.2 Linked Resource Analysis 160
6.1.3.3 Image Content Analysis 162
6.1.3.4 Geographic Analysis 164
6.1.3.5 Textual Analysis 166
6.2 Large-Scale Analysis of Social Multimedia 167
6.2.1 Large-Scale Processing of Social Multimedia Analysis 167
6.2.1.1 Batch-Processing Frameworks 167
6.2.1.2 Stream-Processing Frameworks 168
6.2.1.3 Distributed Processing Frameworks 168
6.2.2 Analysis of Social Multimedia 169
6.2.2.1 Analysis of Visual Content 169
6.2.2.2 Analysis of Textual Content 169
6.2.2.3 Analysis of Geographical Content 170
6.2.2.4 Analysis of User Content 170
6.3 Large-Scale Multimedia Opinion Mining System 170
6.3.1 System Overview 171
6.3.2 Implementation Details 171
6.3.2.1 Social Media Data Crawler 171
6.3.2.2 Social Multimedia Analysis 173
6.3.2.3 Analysis of Visual Content 174
6.3.3 Evaluations: Analysis of Visual Content 175
6.3.3.1 Filtering of Synthetic Images 175
6.3.3.2 Near-Duplicate Detection 177
6.4 Conclusion 178
References 179

7 Privacy and Audiovisual Content: Protecting Users as Big Multimedia Data Grows Bigger 183
Martha Larson, Jaeyoung Choi, Manel Slokom, Zekeriya Erkin, Gerald Friedland and Arjen P. de Vries
7.1 Introduction 183
7.1.1 The Dark Side of Big Multimedia Data 184
7.1.2 Defining Multimedia Privacy 184
7.2 Protecting User Privacy 188
7.2.1 What to Protect 188
7.2.2 How to Protect 189
7.2.3 Threat Models 191
7.3 Multimedia Privacy 192
7.3.1 Privacy and Multimedia Big Data 192
7.3.2 Privacy Threats of Multimedia Data 194
7.3.2.1 Audio Data 194
7.3.2.2 Visual Data 195
7.3.2.3 Multimodal Threats 195
7.4 Privacy-Related Multimedia Analysis Research 196
7.4.1 Multimedia Analysis Filters 196
7.4.2 Multimedia Content Masking 198
7.5 The Larger Research Picture 199
7.5.1 Multimedia Security and Trust 199
7.5.2 Data Privacy 200
7.6 Outlook on Multimedia Privacy Challenges 202
7.6.1 Research Challenges 202
7.6.1.1 Multimedia Analysis 202
7.6.1.2 Data 202
7.6.1.3 Users 203
7.6.2 Research Reorientation 204
7.6.2.1 Professional Paranoia 204
7.6.2.2 Privacy as a Priority 204
7.6.2.3 Privacy in Parallel 205
References 205

Part III Scalability in Multimedia Access 209

8 Data Storage and Management for Big Multimedia 211
Björn Þór Jónsson, Gylfi Þór Gudmundsson, Laurent Amsaleg and Philippe Bonnet
8.1 Introduction 211
8.1.1 Multimedia Applications and Scale 212
8.1.2 Big Data Management 213
8.1.3 System Architecture Outline 213
8.1.4 Metadata Storage Architecture 214
8.1.4.1 Lambda Architecture 214
8.1.4.2 Storage Layer 215
8.1.4.3 Processing Layer 216
8.1.4.4 Serving Layer 216
8.1.4.5 Dynamic Data 216
8.1.5 Summary and Chapter Outline 217
8.2 Media Storage 217
8.2.1 Storage Hierarchy 217
8.2.1.1 Secondary Storage 218
8.2.1.2 The Five-Minute Rule 218
8.2.1.3 Emerging Trends for Local Storage 219
8.2.2 Distributed Storage 220
8.2.2.1 Distributed Hash Tables 221
8.2.2.2 The CAP Theorem and the PACELC Formulation 221
8.2.2.3 The Hadoop Distributed File System 221
8.2.2.4 Ceph 222
8.2.3 Discussion 222
8.3 Processing Media 222
8.3.1 Metadata Extraction 223
8.3.2 Batch Processing 223
8.3.2.1 Map-Reduce and Hadoop 224
8.3.2.2 Spark 225
8.3.2.3 Comparison 226
8.3.3 Stream Processing 226
8.4 Multimedia Delivery 226
8.4.1 Distributed In-Memory Buffering 227
8.4.1.1 Memcached and Redis 227
8.4.1.2 Alluxio 227
8.4.1.3 Content Distribution Networks 228
8.4.2 Metadata Retrieval and NoSQL Systems 228
8.4.2.1 Key-Value Stores 229
8.4.2.2 Document Stores 229
8.4.2.3 Wide Column Stores 229
8.4.2.4 Graph Stores 229
8.4.3 Discussion 229
8.5 Case Studies: Facebook 230
8.5.1 Data Popularity: Hot, Warm or Cold 230
8.5.2 Mentions Live 231
8.6 Conclusions and Future Work 231
8.6.1 Acknowledgments 232
References 232

9 Perceptual Hashing for Large-Scale Multimedia Search 239
Li Weng, I-Hong Jhuo and Wen-Huang Cheng
9.1 Introduction 240
9.1.1 Related Work 240
9.1.2 Definitions and Properties of Perceptual Hashing 241
9.1.3 Multimedia Search using Perceptual Hashing 243
9.1.4 Applications of Perceptual Hashing 243
9.1.5 Evaluating Perceptual Hash Algorithms 244
9.2 Unsupervised Perceptual Hash Algorithms 245
9.2.1 Spectral Hashing 245
9.2.2 Iterative Quantization 246
9.2.3 K-Means Hashing 247
9.2.4 Kernelized Locality Sensitive Hashing 249
9.3 Supervised Perceptual Hash Algorithms 250
9.3.1 Semi-Supervised Hashing 250
9.3.2 Kernel-Based Supervised Hashing 252
9.3.3 Restricted Boltzmann Machine-Based Hashing 253
9.3.4 Supervised Semantic-Preserving Deep Hashing 255
9.4 Constructing Perceptual Hash Algorithms 257
9.4.1 Two-Step Hashing 257
9.4.2 Hash Bit Selection 258
9.5 Conclusion and Discussion 260
References 261

Part IV Applications of Large-Scale Multimedia Search 267

10 Image Tagging with Deep Learning: Fine-Grained Visual Analysis 269
Jianlong Fu and Tao Mei
10.1 Introduction 269
10.2 Basic Deep Learning Models 270
10.3 Deep Image Tagging for Fine-Grained Image Recognition 272
10.3.1 Attention Proposal Network 274
10.3.2 Classification and Ranking 275
10.3.3 Multi-Scale Joint Representation 276
10.3.4 Implementation Details 276
10.3.5 Experiments on CUB-200-2011 277
10.3.6 Experiments on Stanford Dogs 280
10.4 Deep Image Tagging for Fine-Grained Sentiment Analysis 281
10.4.1 Learning Deep Sentiment Representation 282
10.4.2 Sentiment Analysis 283
10.4.3 Experiments on SentiBank 283
10.5 Conclusion 284
References 285

11 Visually Exploring Millions of Images using Image Maps and Graphs 289
Kai Uwe Barthel and Nico Hezel
11.1 Introduction and Related Work 290
11.2 Algorithms for Image Sorting 293
11.2.1 Self-Organizing Maps 293
11.2.2 Self-Sorting Maps 294
11.2.3 Evolutionary Algorithms 295
11.3 Improving SOMs for Image Sorting 295
11.3.1 Reducing SOM Sorting Complexity 295
11.3.2 Improving SOM Projection Quality 297
11.3.3 Combining SOMs and SSMs 297
11.4 Quality Evaluation of Image Sorting Algorithms 298
11.4.1 Analysis of SOMs 298
11.4.2 Normalized Cross-Correlation 299
11.4.3 A New Image Sorting Quality Evaluation Scheme 299
11.5 2D Sorting Results 301
11.5.1 Image Test Sets 301
11.5.2 Experiments 302
11.6 Demo System for Navigating 2D Image Maps 304
11.7 Graph-Based Image Browsing 306
11.7.1 Generating Semantic Image Features 306
11.7.2 Building the Image Graph 307
11.7.3 Visualizing and Navigating the Graph 310
11.7.4 Prototype for Image Graph Navigation 312
11.8 Conclusion and Future Work 313
References 313

12 Medical Decision Support Using Increasingly Large Multimodal Data Sets 317
Henning Müller and Devrim Ünay
12.1 Introduction 317
12.2 Methodology for Reviewing the Literature in this Chapter 320
12.3 Data, Ground Truth, and Scientific Challenges 321
12.3.1 Data Annotation and Ground Truthing 321
12.3.2 Scientific Challenges and Evaluation as a Service 321
12.3.3 Other Medical Data Resources Available 322
12.4 Techniques Used for Multimodal Medical Decision Support 323
12.4.1 Visual and Non-Visual Features Describing the Image Content 323
12.4.2 General Machine Learning and Deep Learning 323
12.5 Application Types of Image-Based Decision Support 326
12.5.1 Localization 326
12.5.2 Segmentation 326
12.5.3 Classification 327
12.5.4 Prediction 327
12.5.5 Retrieval 327
12.5.6 Automatic Image Annotation 328
12.5.7 Other Application Types 328
12.6 Discussion on Multimodal Medical Decision Support 328
12.7 Outlook on the Next Steps of Multimodal Medical Decision Support 329
References 330

Conclusions and Future Trends 337
Index 339