Formalizing Natural Languages
The NooJ Approach
Hardcover, English, 2016
By Max Silberztein (International NooJ Association)
Price: SEK 2,369
Product information
- Publication date: 2016-01-08
- Dimensions: 165 x 241 x 23 mm
- Weight: 653 g
- Format: Hardcover
- Language: English
- Number of pages: 346
- Publisher: ISTE Ltd and John Wiley & Sons Inc
- ISBN: 9781848219021
Max Silberztein is President of the International NooJ Association. His research focuses on computational linguistics and language formalization.
Table of contents
- Acknowledgments
- Chapter 1. Introduction: the Project
  - 1.1. Characterizing a set of infinite size
  - 1.2. Computers and linguistics
  - 1.3. Levels of formalization
  - 1.4. Not applicable
    - 1.4.1. Poetry and plays on words
    - 1.4.2. Stylistics and rhetoric
    - 1.4.3. Anaphora, coreference resolution, and semantic disambiguation
    - 1.4.4. Extralinguistic calculations
  - 1.5. NLP applications
    - 1.5.1. Automatic translation
    - 1.5.2. Part-of-speech (POS) tagging
    - 1.5.3. Linguistic rather than stochastic analysis
  - 1.6. Linguistic formalisms: NooJ
  - 1.7. Conclusion and structure of this book
  - 1.8. Exercises
  - 1.9. Internet links
- Part 1. Linguistic Units
- Chapter 2. Formalizing the Alphabet
  - 2.1. Bits and bytes
  - 2.2. Digitizing information
  - 2.3. Representing natural numbers
    - 2.3.1. Decimal notation
    - 2.3.2. Binary notation
    - 2.3.3. Hexadecimal notation
  - 2.4. Encoding characters
    - 2.4.1. Standardization of encodings
    - 2.4.2. Accented Latin letters, diacritical marks, and ligatures
    - 2.4.3. Extended ASCII encodings
    - 2.4.4. Unicode
  - 2.5. Alphabetical order
  - 2.6. Classification of characters
  - 2.7. Conclusion
  - 2.8. Exercises
  - 2.9. Internet links
- Chapter 3. Defining Vocabulary
  - 3.1. Multiple vocabularies and the evolution of vocabulary
  - 3.2. Derivation
    - 3.2.1. Derivation applies to vocabulary elements
    - 3.2.2. Derivations are unpredictable
    - 3.2.3. Atomicity of derived words
  - 3.3. Atomic linguistic units (ALUs)
    - 3.3.1. Classification of ALUs
  - 3.4. Multiword units versus analyzable sequences of simple words
    - 3.4.1. Semantics
    - 3.4.2. Usage
    - 3.4.3. Transformational analysis
  - 3.5. Conclusion
  - 3.6. Exercises
  - 3.7. Internet links
- Chapter 4. Electronic Dictionaries
  - 4.1. Could editorial dictionaries be reused?
  - 4.2. LADL electronic dictionaries
    - 4.2.1. Lexicon-grammar
    - 4.2.2. DELA
  - 4.3. Dubois and Dubois-Charlier electronic dictionaries
    - 4.3.1. The Dictionnaire électronique des mots
    - 4.3.2. Les Verbes Français (LVF)
  - 4.4. Specifications for the construction of an electronic dictionary
    - 4.4.1. One ALU = one lexical entry
    - 4.4.2. Importance of derivation
    - 4.4.3. Orthographic variation
    - 4.4.4. Inflection of simple words, compound words, and expressions
    - 4.4.5. Expressions
    - 4.4.6. Integration of syntax and semantics
  - 4.5. Conclusion
  - 4.6. Exercises
  - 4.7. Internet links
- Part 2. Languages, Grammars and Machines
- Chapter 5. Languages, Grammars, and Machines
  - 5.1. Definitions
    - 5.1.1. Letters and alphabets
    - 5.1.2. Words and languages
    - 5.1.3. ALU, vocabularies, phrases, and languages
    - 5.1.4. Empty string
    - 5.1.5. Free language
    - 5.1.6. Grammars
    - 5.1.7. Machines
  - 5.2. Generative grammars
  - 5.3. Chomsky-Schützenberger hierarchy
    - 5.3.1. Linguistic formalisms
  - 5.4. The NooJ approach
    - 5.4.1. A multifaceted approach
    - 5.4.2. Unified notation
    - 5.4.3. Cascading architecture
  - 5.5. Conclusion
  - 5.6. Exercises
  - 5.7. Internet links
- Chapter 6. Regular Grammars
  - 6.1. Regular expressions
    - 6.1.1. Some examples of regular expressions
  - 6.2. Finite-state graphs
  - 6.3. Non-deterministic and deterministic graphs
  - 6.4. Minimal deterministic graphs
  - 6.5. Kleene’s theorem
  - 6.6. Regular expressions with outputs and finite-state transducers
  - 6.7. Extensions of regular grammars
    - 6.7.1. Lexical symbols
    - 6.7.2. Syntactic symbols
    - 6.7.3. Symbols defined by grammars
    - 6.7.4. Special operators
  - 6.8. Conclusion
  - 6.9. Exercises
  - 6.10. Internet links
- Chapter 7. Context-Free Grammars
  - 7.1. Recursion
    - 7.1.1. Right recursion
    - 7.1.2. Left recursion
    - 7.1.3. Middle recursion
  - 7.2. Parse trees
  - 7.3. Conclusion
  - 7.4. Exercises
  - 7.5. Internet links
- Chapter 8. Context-Sensitive Grammars
  - 8.1. The NooJ approach
    - 8.1.1. The a^n b^n c^n language
    - 8.1.2. The language a^(2^n)
    - 8.1.3. Handling reduplications
    - 8.1.4. Grammatical agreements
    - 8.1.5. Lexical constraints in morphological grammars
  - 8.2. NooJ contextual constraints
  - 8.3. NooJ variables
    - 8.3.1. Variables’ scope
    - 8.3.2. Computing a variable’s value
    - 8.3.3. Inheriting a variable’s value
  - 8.4. Conclusion
  - 8.5. Exercises
  - 8.6. Internet links
- Chapter 9. Unrestricted Grammars
  - 9.1. Linguistic adequacy
  - 9.2. Conclusion
  - 9.3. Exercise
  - 9.4. Internet links
- Part 3. Automatic Linguistic Parsing
- Chapter 10. Text Annotation Structure
  - 10.1. Parsing a text
  - 10.2. Annotations
    - 10.2.1. Limits of XML/TEI representation
  - 10.3. Text annotation structure (TAS)
  - 10.4. Exercise
  - 10.5. Internet links
- Chapter 11. Lexical Analysis
  - 11.1. Tokenization
    - 11.1.1. Letter recognition
    - 11.1.2. Apostrophe/quote
    - 11.1.3. Dash/hyphen
    - 11.1.4. Dot/period/point ambiguity
  - 11.2. Word forms
    - 11.2.1. Space and punctuation
    - 11.2.2. Numbers
    - 11.2.3. Words in upper case
  - 11.3. Morphological analyses
    - 11.3.1. Inflectional morphology
    - 11.3.2. Derivational morphology
    - 11.3.3. Lexical morphology
    - 11.3.4. Agglutinations
  - 11.4. Multiword unit recognition
  - 11.5. Recognizing expressions
    - 11.5.1. Characteristic constituent
    - 11.5.2. Varying the characteristic constituent
    - 11.5.3. Varying the light verb
    - 11.5.4. Resolving ambiguity
    - 11.5.5. Annotating expressions
  - 11.6. Conclusion
  - 11.7. Exercise
- Chapter 12. Syntactic Analysis
  - 12.1. Local grammars
    - 12.1.1. Named entities
    - 12.1.2. Grammatical word sequences
    - 12.1.3. Automatically identifying ambiguity
  - 12.2. Structural grammars
    - 12.2.1. Complex atomic linguistic units
    - 12.2.2. Structured annotations
    - 12.2.3. Ambiguities
    - 12.2.4. Syntax trees vs parse trees
    - 12.2.5. Dependency grammar and tree
    - 12.2.6. Resolving ambiguity transparently
  - 12.3. Conclusion
  - 12.4. Exercises
  - 12.5. Internet links
- Chapter 13. Transformational Analysis
  - 13.1. Implementing transformations
  - 13.2. Theoretical problems
    - 13.2.1. Equivalence of transformation sequences
    - 13.2.2. Ambiguities in transformed sentences
    - 13.2.3. Theoretical sentences
    - 13.2.4. The number of transformations to be implemented
  - 13.3. Transformational analysis with NooJ
    - 13.3.1. Applying a grammar in “generation” mode
    - 13.3.2. The transformation’s arguments
  - 13.4. Question answering
  - 13.5. Semantic analysis
  - 13.6. Machine translation
  - 13.7. Conclusion
  - 13.8. Exercises
  - 13.9. Internet links
- Conclusion
- Bibliography
- Index
This book lays the groundwork for a better understanding of both the computational linguistics (CL) and the natural language processing (NLP) perspectives: it shows how to describe language (CL) in order to build the best NLP applications (NLP). The book bridges the gap between theoretical linguistic phenomena and practical language models. It shows how computational linguists and language engineers, working together, can bring us closer to a better understanding of language by both humans and computers.

The author takes us on a stroll through the layers of language processing, explaining each one soundly and giving examples and counterexamples that clarify every step we take along the way. Starting with the tiniest bits of written language, the alphabet, and moving on to the dictionary and the atomic linguistic units that populate it, he explains the importance of each step, giving us solid ground on which to build any language project we might venture to undertake.

Silberztein knows how to invite an audience into his Project, as he calls it, and introduces the topic in a way that makes you want to read the book to the last page (and solve all the CL and NLP problems along the way). He moves smoothly through Parts 1, 2 and 3, building each topic on the previous one, as if playing with Lego blocks.

He begins by demonstrating the importance of defining basic (atomic) linguistic units, starting with the alphabet and vocabulary, which prepare us for the construction of electronic dictionaries. It is the design of the e-dictionary that will enable and support us in formalizing the language we are interested in.
Thus, it is no surprise that a thorough classification and understanding of our basic resources is needed to prepare (and prepare well) and specify the affixes [re-, de-, un-, -ation], simple words [home, love, sky], multiword units [sweet potatoes, more and more, round table] and expressions [to give up, to turn off, to take off] that we will work with to construct and annotate new words, phrases and sentences.

He then takes regular grammars, context-free grammars, context-sensitive grammars and unrestricted grammars and makes them all work through NooJ’s multifaceted approach. The (beautiful) simplicity of this application mirrors the way we, as humans, process vocabulary, grammar, orthography, syntax, semantics, and so on, which makes NooJ easy to use for beginners and advanced users alike.

It is only to be expected that the journey ends with applications in both parsing and generating written text. We are presented with lexical analysis, syntactic analysis (local and structural) and transformational analysis, which open the door to more sophisticated NLP applications (question answering, machine translation, semantic analyzers, etc.).

The primary audience of "Formalizing Natural Languages: The NooJ Approach" is linguists, i.e. computational linguists and NLP people (or, as the author likes to call them, language engineers). But since the book holds the key to a whole sea of possible applications in other subfields, I would recommend it to etymologists, sociolinguists, psycholinguists, forensic linguists, internet linguists, corpus linguists, or indeed to any data scientist today. Because each chapter ends with exercises and additional internet links, the book is also suitable as course reading for classes in NLP, CL, machine translation, and similar subjects.
The book is written in a way that improves our understanding of how natural language can be formalized, and it has the power to reveal new applications for almost any type of written text. Since the book and NooJ as a tool came into existence in an era dominated by unstructured data, the potential of the presented tool is limited only by the imagination of its user.

Kristina Kocijan, Department of Information and Communication Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia