Chemoinformatics

Basic Concepts and Methods

Paperback, English, 2018

By Thomas Engel, Johann Gasteiger

1 149 kr

This essential guide to the knowledge and tools of the field covers everything from basic concepts to modern methods, while also forming a bridge to bioinformatics. The textbook has a clear, didactic structure: it starts with the basics and the theory, then provides an overview of the methods. Exercises at the end of each section or chapter make learning easier. Software tools are explained in detail, so that students learn not only the necessary theoretical background but also how to use the different software packages available. The wide range of applications is presented in the companion book Applied Chemoinformatics - Achievements and Future Opportunities (ISBN 9783527342013). The book is intended for Master's and PhD students in chemistry, biochemistry, and computer science, and also provides an excellent introduction for other newcomers to the field.

Product information

Publication date: 2018-08-01
Dimensions: 170 x 244 x 31 mm
Weight: 1,111 g
Format: Paperback
Language: English
Number of pages: 608
Publisher: Wiley-VCH Verlag GmbH
ISBN: 9783527331093

Belongs to the following categories

Life sciences in Science and technology

Johann Gasteiger is Professor Emeritus of Chemistry at the University of Erlangen-Nuremberg, Germany, and co-founder of "Computer-Chemie-Centrum". He has received numerous awards and is a member of several societies and editorial boards. His research interests are the development of software for drug design, simulation of chemical reactions, organic synthesis design, simulation of spectra, and chemical information processing by neural networks and genetic algorithms.

Thomas Engel is a coordinator at the Department of Chemistry and Biochemistry of the Ludwig-Maximilians-Universität in Munich, Germany. He received his academic degrees at the University of Würzburg. Since 2001 he has been a lecturer at various universities, promoting and establishing courses in scientific computing. He is also a member of the Chemistry-Information-Computer section (CIC) of the GDCh and of the Molecular Graphics and Modeling Society (German section).

Foreword
List of Contributors

1 Introduction
Thomas Engel and Johann Gasteiger
1.1 The Rationale for the Books
1.2 The Objectives of Chemoinformatics
1.3 Learning in Chemoinformatics
1.4 Outline of the Book
1.5 The Scope of the Book
1.6 Teaching Chemoinformatics
References

2 Principles of Molecular Representations
Thomas Engel
2.1 Introduction
2.2 Chemical Nomenclature
2.2.1 Non-systematic Nomenclature (Trivial Names)
2.2.2 Systematic Nomenclature of Chemical Compounds
2.2.3 Drawbacks of Chemical Nomenclature for Data Processing
2.3 Chemical Notations
2.3.1 Empirical Formulas of Inorganic and Organic Compounds
2.3.2 Line Notations
2.4 Mathematical Notations
2.4.1 Introduction into Graph Theory
2.4.2 Matrix Representations
2.4.2.1 Adjacency Matrix
2.4.2.2 Incidence Matrix
2.4.2.3 Distance Matrix
2.4.2.4 Bond Matrix
2.4.2.5 Bond–Electron Matrix
2.4.2.6 Summary on Matrix Representations
2.4.3 Connection Table
2.5 Specific Types of Chemical Structures
2.5.1 General Concepts of Isomerism
2.5.2 Tautomerism
2.5.3 Markush Structures
2.5.4 Beyond a Connection Table Representation
2.5.4.1 Representation of Molecular Structures by Electron Systems
2.6 Spatial Representation of Structures
2.6.1 Representation of Configurational Isomers
2.6.2 Chirality
2.6.3 3D Coordinate Systems
2.7 Molecular Surfaces
Selected Reading
References

3 Computer Processing of Chemical Structure Information
Thomas Engel
3.1 Introduction
3.2 Standard File Formats for Chemical Structure Information
3.2.1 SMILES
3.2.1.1 Stereochemistry in SMILES
3.2.1.2 Summary on SMILES
3.2.2 SMARTS
3.2.3 SYBYL Line Notation
3.2.4 The International Chemical Identifier (InChI) and InChIKey
3.2.5 XYZ Format
3.2.6 Z-Matrix
3.2.7 The Molfile Format Family
3.2.7.1 Structure of a Molfile
3.2.7.2 Stereochemistry in the Molfile
3.2.7.3 Structure of an SDfile
3.2.8 The PDB File Format
3.2.8.1 Introduction/History
3.2.8.2 General Description
3.2.8.3 Analysis of a Sample PDB File
3.2.9 Metadata Formats
3.2.9.1 STAR-Based File Formats and Dictionaries
3.2.9.2 CIF File Format
3.2.9.3 mmCIF File Format
3.2.9.4 CML
3.2.9.5 CSRML
3.2.10 Libraries for Handling Information in Structure File Formats
3.3 Input and Output of Chemical Structures
3.3.1 Molecule Editors
3.3.2 Molecule Viewers
3.4 Processing Constitutional Information
3.4.1 Structure Isomers and Isomorphism
3.4.2 Tautomerism
3.4.3 Unambiguous and Biunique Representation by Canonicalization
3.4.3.1 The Morgan Algorithm
3.4.4 Ring Perception
3.4.4.1 Introduction
3.4.4.2 Graph Terminology
3.4.4.3 Ring Perception Strategies
3.5 Processing 3D Structure Information
3.5.1 Detection and Specification of Chirality
3.5.1.1 Detection of Chirality
3.5.1.2 Specification of Chirality
3.5.2 Automatic Generation of 3D Structures
3.5.3 Automatic Generation of Ensemble of Conformations
3.6 Visualization of Molecular Models
3.6.1 Introduction
3.6.2 Models of the 3D Structure
3.6.2.1 Wire Frame and Capped Sticks Model
3.6.2.2 Ball-and-Stick Model
3.6.2.3 Space-Filling Model
3.6.2.4 Crystallographic Models
3.6.3 Models of Biological Macromolecules
3.6.4 Virtual Reality
3.6.5 3D Printing
3.7 Calculation of Molecular Surfaces
3.7.1 Van der Waals Surface
3.7.2 Connolly Surface
3.7.3 Solvent-Accessible Surface
3.7.4 Enzyme Cavity Surface (Union Surface)
3.7.5 Isovalue-Based Electron Density Surface
3.7.6 Experimentally Determined Surfaces
3.7.7 Visualization of Molecular Surface Properties
3.7.8 Property-based Isosurfaces
3.7.8.1 Electrostatic Potentials
3.7.8.2 Hydrogen Bonding Potential
3.7.8.3 Polarizability and Hydrophobicity Potential
3.7.8.4 Spin Density
3.7.8.5 Vector Fields
3.7.8.6 Volumetric Properties
3.8 Chemoinformatic Toolkits and Workflow Environments
Selected Reading
References

4 Representation of Chemical Reactions
Oliver Sacher and Johann Gasteiger
4.1 Introduction
4.2 Reaction Equation
4.3 Reaction Types
4.4 Reaction Center and Reaction Mechanisms
4.5 Chemical Reactivity
4.5.1 Physicochemical Effects
4.5.1.1 Charge Distribution
4.5.1.2 Inductive Effect
4.5.1.3 Resonance Effect
4.5.1.4 Polarizability Effect
4.5.1.5 Steric Effect
4.5.1.6 Stereoelectronic Effects
4.5.2 Simple Methods for Quantifying Chemical Reactivity
4.5.2.1 Frontier Molecular Orbital Theory
4.5.2.2 Linear Free Energy Relationships
4.6 Learning from Reaction Information
4.7 Building of Reaction Databases
4.7.1 Contents
4.7.2 Reaction Data Exchange Formats
4.7.2.1 RXN/RDF format by MDL/Symyx
4.7.2.2 Reaction SMILES/SMIRKS by Daylight Chemical Information Systems
4.7.2.3 Chemical Markup Language
4.7.2.4 International Chemical Identifier for Reactions (RinChI)
4.7.3 Input and Output of Reactions
4.8 Reaction Center Perception
4.9 Reaction Classification
4.9.1 Model-Driven Approaches
4.9.1.1 Ugi’s Scheme and Some Follow-Ups
4.9.1.2 InfoChem’s Reaction Classification
4.9.2 Data-Driven Approaches
4.9.2.1 HORACE
4.9.2.2 Reaction Landscapes
4.10 Stereochemistry of Reactions
4.11 Reaction Networks
Selected Reading
References

5 The Data
5.1 Introduction
5.2 Data Types
5.2.1 Numerical Data
5.2.2 Molecular Structures
5.2.3 Bit Vectors
5.2.3.1 Hash Codes
5.2.3.2 Structural Keys
5.2.3.3 Fingerprints
5.2.4 Chemical Reactions
5.2.5 Molecular Spectra
5.3 Storage and Manipulation of Data
5.3.1 Experimental Data
5.3.1.1 Types of Data on Properties
5.3.1.2 Accuracy of the Data
5.3.2 Data Storage and Exchange
5.3.2.1 DAT File
5.3.2.2 JCAMP-DX
5.3.2.3 Predictive Model Markup Language (PMML)
5.3.3 Real-World Data
5.3.3.1 Data Complexity
5.3.3.2 Outliers and Redundant Objects
5.3.4 Data Transformation
5.3.4.1 Fast Fourier Transformation
5.3.4.2 Wavelet Transformation
5.3.5 Preparation of Datasets for Building of Models and Validations of Their Quality
5.4 Conclusions
Selected Reading
References

6 Databases and Data Sources in Chemistry
Engelbert Zass and Thomas Engel
6.1 Introduction
6.2 Chemical Literature and Databases
6.2.1 Classification of Chemical Literature
6.2.2 The Origin of Chemical Databases
6.2.3 Evolution of Database Systems and User Interfaces
6.3 Major Chemical Database Systems
6.3.1 SciFinder
6.3.2 Reaxys
6.3.3 SciFinder versus Reaxys
6.4 Compound Databases
6.4.1 2D Structures
6.4.1.1 Searching Organic Compounds
6.4.1.2 Searching Inorganic and Coordination Compounds
6.4.2 Sequences of Biopolymers
6.4.3 3D Structures
6.4.4 Catalog Databases
6.5 Databases with Properties of Compounds
6.5.1 Physical Properties
6.5.2 Thermodynamic and Thermochemical Data
6.5.3 Spectra
6.5.3.1 Spectroscopic Databases
6.5.3.2 Compound Databases with Spectroscopic Information
6.5.4 Biological, Environmental, and Safety Information Sources
6.5.4.1 Biological Information
6.5.4.2 Pharmaceutical and Medical Information
6.5.4.3 Toxicity, Environmental, and Safety Information
6.6 Reaction Databases
6.6.1 Comprehensive Reaction Databases
6.6.2 Synthetic Methodology Databases
6.7 Bibliographic and Citation Databases
6.7.1 Bibliographic Databases
6.7.1.1 Special Bibliographic Databases
6.7.1.2 Patent Bibliographic Databases
6.7.1.3 Searching Bibliographic Databases
6.7.1.4 Linking to Full Text
6.7.2 Citation Databases
6.7.2.1 General Citation Databases
6.7.2.2 Patent Citation Databases
6.8 Full-Text Databases
6.8.1 Electronic Journals
6.8.2 Patents
6.8.3 Lexika and Encyclopedias
6.9 Architecture of a Structure-Searchable Database
Selected Reading
References

7 Searching Chemical Structures
Nikolay Kochev, Valentin Monev, and Ivan Bangov
7.1 Introduction
7.2 Full Structure Search
7.3 Substructure Search
7.3.1 Basic Concepts
7.3.2 Backtracking Algorithm
7.3.3 Optimization of the Backtracking Algorithm
7.3.4 Screening
7.3.5 Superstructure Searching
7.3.6 Automorphism Searching
7.3.7 Maximum Common Substructure Searching
7.3.8 Specific Line Notations for Substructure Searching
7.3.9 Chemotypes for Database Searching
7.4 Similarity Search
7.4.1 Similarity Basics
7.4.2 Similarity Measures
7.4.3 Descriptor Selection and Coding
7.4.4 Similarity Measures Based on Maximum Common Substructure
7.5 Three-Dimensional Structure Search Methods
7.5.1 Pharmacophore Searching
7.5.2 3D Similarity Searching
7.6 Sequence Searching in Protein and Nucleic Acid Databases
7.6.1 Sequence Similarity Definition
7.6.2 Dynamic Programming Algorithm
7.6.3 Fast Sequence Searching in Large Databases
7.7 Summary
Selected Reading
References

8 Computational Chemistry
8.1 Empirical Approaches to the Calculation of Properties
Johann Gasteiger
8.1.1 Introduction
8.1.2 Additivity of Atomic Contributions
8.1.3 Attenuation Models
8.1.3.1 Calculation of Charge Distribution
8.1.3.2 Polarizability Effect
Selected Reading
References
8.2 Molecular Mechanics
Harald Lanig
8.2.1 Introduction
8.2.2 No Force Field Calculation without Atom Types
8.2.3 The Functional Form of Common Force Fields
8.2.3.1 Bond Stretching
8.2.3.2 Angle Bending
8.2.3.3 Torsional Terms
8.2.3.4 Out-of-Plane Bending
8.2.3.5 Electrostatic Interactions
8.2.3.6 Van der Waals Interactions
8.2.3.7 Cross Terms
8.2.3.8 Advanced Interatomic Potentials and Future Development
8.2.4 Available Force Fields
8.2.4.1 Force Fields for Small Molecules
8.2.4.2 Force Fields for Biomolecules
Selected Readings
References
8.3 Molecular Dynamics
Harald Lanig
8.3.1 Introduction
8.3.2 The Continuous Movement of Molecules
8.3.3 Methods
8.3.3.1 Algorithms
8.3.3.2 Ways for Speeding up the Calculations
8.3.3.3 Solvent Effects
8.3.3.4 Periodic Boundary Conditions
8.3.4 Constant Energy, Temperature, or Pressure?
8.3.5 Long-Range Forces
8.3.6 Application of Molecular Dynamics Techniques
8.3.7 Future Perspectives
Selected Readings
References
8.4 Quantum Mechanics
Tim Clark
8.4.1 Hückel Molecular Orbital Theory
8.4.2 Semiempirical MO Theory
8.4.3 Ab Initio Molecular Orbital Theory
8.4.4 Density Functional Theory
8.4.5 Properties from Quantum Mechanical Calculations
8.4.5.1 Net Atomic Charges
8.4.5.2 Dipole and Higher Multipole Moments
8.4.5.3 Polarizabilities
8.4.5.4 Orbital Energies
8.4.5.5 Surface Descriptors
8.4.5.6 Local Ionization Potential
8.4.6 Quantum Mechanical Techniques for Very Large Molecules
8.4.6.1 Linear Scaling Methods
8.4.6.2 Hybrid QM/MM Calculations
8.4.7 The Future of Quantum Mechanical Methods in Chemoinformatics
Selected Reading
References

9 Modeling and Prediction of Properties (QSPR/QSAR)
Johann Gasteiger

10 Calculation of Structure Descriptors
Lothar Terfloth and Johann Gasteiger
10.1 Introduction
10.1.1 QSPR/QSAR Modeling
10.1.2 Overview
10.1.3 Classification of Compounds and Similarity Searching
10.1.4 Definition of the Terms “Structure Descriptor” and “Molecular Descriptor”
10.1.5 Classification of Structure Descriptors
10.1.6 Structure Descriptors with a Fixed Length
10.2 Structure Descriptors for Classification and Similarity Searching
10.2.1 2D Structure Descriptors (Topological Descriptors)
10.2.1.1 Structural Keys
10.2.1.2 Fingerprints
10.2.1.3 Distance and Similarity Measures
10.2.1.4 Chemotypes: Data Mining for Compounds with Structural Features
10.2.1.5 Multilevel Neighborhoods of Atoms
10.2.1.6 Descriptors from Shannon Entropy Calculations
10.2.1.7 Chemically Advanced Template Search (CATS2D) Descriptors
10.2.1.8 Descriptors from Chemical Bond Information
10.2.2 3D Descriptors
10.2.2.1 Geometric Atom Pair Descriptors
10.2.2.2 CATS3D and CHARGE3D
10.2.2.3 Pharmacophores
10.2.3 Field-Based Molecular Similarity
10.2.3.1 Electron Density
10.2.3.2 General Field-Based Similarity Indices
10.3 Structure Descriptors for Quantitative Modeling
10.3.1 0-D Molecular Descriptors
10.3.2 1D Molecular Descriptors
10.3.3 2D Molecular Descriptors (Topological Descriptors)
10.3.3.1 Single-Valued Descriptors
10.3.3.2 Topological Descriptors as Vectors
10.3.4 3D Descriptors
10.3.4.1 3D Structure Generation
10.3.4.2 3D Autocorrelation Vector
10.3.4.3 3D Molecule Representation of Structures Based on Electron Diffraction Code (3D MoRSE Code)
10.3.4.4 Radial Distribution Function Code
10.3.4.5 Other 3D Descriptors
10.3.5 Chirality Descriptors
10.3.5.1 Chirality Codes
10.3.5.2 Conformation-Independent Chirality Code (CICC)
10.3.5.3 Conformation-Dependent Chirality Code (CDCC)
10.3.5.4 Descriptors of Molecular Shape and Molecular Surfaces
10.3.5.5 Global Shape Descriptors
10.3.5.6 Autocorrelation of Molecular Surface Properties
10.3.5.7 2D Maps of Molecular Surfaces
10.3.5.8 Charged Partial Surface Area
10.3.6 Field-Based Methods
10.3.6.1 Comparative Molecular Field Analysis (CoMFA)
10.3.6.2 Comparative Molecular Similarity Analysis (CoMSIA)
10.3.6.3 3D Molecular Interaction Fields
10.3.7 Descriptors for an Ensemble of Conformations (4D Descriptors)
10.3.7.1 4D-QSAR
10.3.8 Quantum Chemical Descriptors
10.4 Descriptors That Are Not Calculated from the Chemical Structure
10.5 Summary and Outlook
Selected Reading
References

11 Data Analysis and Data Handling (QSPR/QSAR)
11.1 Methods for Multivariate Data Analysis
Kurt Varmuza
11.1.1 Introduction into Multivariate Data Analysis
11.1.1.1 Aims
11.1.1.2 Notation and Symbols
11.1.2 Basics of Statistical Data Evaluation
11.1.2.1 Data Distribution, Central Value, and Spread
11.1.2.2 Correlation
11.1.2.3 Discrimination
11.1.3 Multivariate Data
11.1.3.1 Overview
11.1.3.2 Preprocessing
11.1.3.3 Distances and Similarities
11.1.3.4 Linear Latent Variables
11.1.4 Evaluation of Empirical Models
11.1.4.1 Overview
11.1.4.2 Optimum Model Complexity
11.1.4.3 Performance Criteria for Calibration Models
11.1.4.4 Performance Criteria for Classification Models
11.1.4.5 Cross-Validation
11.1.4.6 Bootstrap
11.1.5 Exploration: Analyzing the Independent Variables
11.1.5.1 Overview
11.1.5.2 Principal Component Analysis (PCA)
11.1.5.3 Nonlinear Mapping
11.1.5.4 Cluster Analysis
11.1.5.5 Example: Exploratory Data Analysis of Mass Spectra from Meteorite Samples
11.1.6 Calibration: Building a Quantitative Model
11.1.6.1 Overview
11.1.6.2 Ordinary Least Squares (OLS) Regression
11.1.6.3 Principal Component Regression (PCR)
11.1.6.4 Partial Least Squares (PLS) Regression
11.1.6.5 Variable Selection
11.1.6.6 Example: Prediction of Gas Chromatographic Retention Indices for Polycyclic Aromatic Hydrocarbons
11.1.7 Classification: Discriminating Samples
11.1.7.1 Overview
11.1.7.2 Linear Discriminant Analysis (LDA)
11.1.7.3 Discriminant Partial Least Squares (D-PLS) Analysis
11.1.7.4 k-Nearest Neighbor (KNN) Classification
11.1.7.5 Support Vector Machine (SVM)
11.1.7.6 Classification Trees (CART)
11.1.7.7 Example: Classification of Meteorite Samples Using Mass Spectral Data
Acknowledgements
Selected Reading
References
11.2 Artificial Neural Networks (ANNs)
Jure Zupan
11.2.1 How to Learn a New Method?
11.2.2 Multivariate Representation of Data
11.2.3 Overview of Artificial Neural Networks (ANNs)
11.2.4 Error Back-Propagation ANNs
11.2.5 Kohonen and Counter-Propagation ANN
11.2.6 Training of the ANN: Adapting the Weights
11.2.7 Controlling Model Complexity and Optimizing Predictivity
11.2.8 Few General Remarks about ANNs
Selected Reading
References
11.3 Deep and Shallow Neural Networks
David A. Winkler
11.3.1 Drug Design in the Era of Big Data and Artificial Intelligence (AI)
11.3.2 Deep Learning
11.3.3 Controlling Model Complexity and Optimizing Predictivity Using Regularization
11.3.4 Universal Approximation Theorem
11.3.5 Do QSAR Models Generated by Neural Networks Meet the Requirements of the Universal Approximation Theorem?
11.3.6 Comparison of the Performance of Deep and Shallow Regularized Neural Networks on Drug Datasets
11.3.7 A Few General Remarks about Neural Networks for Drug Discovery
Selected Reading
References

12 QSAR/QSPR Revisited
Alexander Golbraikh and Alexander Tropsha
12.1 Best Practices of QSAR Modeling
12.1.1 Introduction
12.1.2 Key Concepts
12.1.3 Predictive QSAR Modeling Workflow
12.1.4 Dataset Curation
12.1.5 Modelability Studies
12.1.6 Development of QSAR Models: Internal and External Validation
12.1.7 Prediction Accuracy Criteria for QSAR Models for a Continuous Response Variable
12.1.8 Prediction Accuracy Criteria for Category QSAR Models
12.1.9 Time-Split Validation
12.1.10 Validation by Y-Randomization
12.1.11 Applicability Domain of QSAR Models
12.1.11.1 Leverage AD for Regression QSAR Models
12.1.11.2 Residual Standard Deviation (RSD) as AD
12.1.11.3 Other Widely Used ADs
12.1.12 Ensemble Modeling
12.1.13 Model Interpretation: Structural Alerts
12.1.14 Virtual Screening
12.1.15 Conclusions
12.2 The Data Science of QSAR Modeling
12.2.1 Introduction
12.2.2 Data Curation: Trust but Verify!
12.2.3 Models as Decision Support Tools
12.2.4 Conclusions
Selected Reading
References

13 Bioinformatics
Heinrich Sticht
13.1 Introduction
13.2 Sequence Databases
13.2.1 GenBank
13.2.2 UniProt
13.3 Searching Sequence Databases
13.3.1 Tools for Sequence Database Searches
13.3.2 Scoring Matrices
13.3.3 Interpretation of the Results of a Database Search
13.4 Characterization of Protein Families
13.4.1 Multiple Sequence Alignment
13.4.2 Sequence Signatures
13.5 Homology Modeling
Selected Reading
References

14 Future Directions
Johann Gasteiger
14.1 Access to Chemical Information
14.2 Representation of Chemical Compounds
14.3 Representation of Chemical Reactions
14.4 Learning from Chemical Information
14.5 Training in Chemoinformatics

Answers Section
Index