Corpus Linguistics for Writing Development provides a practical introduction to using corpora in the study of first and second language learners’ written language over time and across different levels of proficiency. Focusing on development in the use of vocabulary, formulaic language, and grammar, this book
• discusses how corpus research can contribute to our understanding of writing development and to pedagogical practice;
• reviews a range of corpus techniques for studying writing development from the perspectives of vocabulary, grammar, and formulaic language and interrogates the methodological bases of those techniques; and
• guides readers to perform practical analyses of learner writing using the R open-source programming language.

Aimed at the novice researcher, this book will be key reading for advanced undergraduate and postgraduate students in the fields of education, language, and linguistics. It will particularly appeal to those interested in first or second language writing, language assessment, and learner corpus research.
Philip Durrant is Associate Professor in Language Education at the University of Exeter, United Kingdom.
Table of Contents

Part One: Foundations

Chapter 1. Studying Writing Development with a Corpus
1. Introduction
2. Using a corpus to study writing development
3. How does writing development relate to vocabulary, grammar, formulaic language?
4. Outline of the book

Chapter 2. Learner Corpus Analysis in Practice: Some Basics
1. Introduction
2. Some housekeeping: getting your computer ready
3. Getting to know R and RStudio
   3.1 Introduction: why learn R?
   3.2 Entering commands: the Console and Scripts
   3.3 Functions
   3.4 Vectors
   3.5 Getting help
4. Some fundamentals of corpus research: encoding, markup, annotation, and metadata
5. Corpora used in this book
6. Automatically annotating your corpus for part of speech and syntactic relationships
   6.1 Introduction
   6.2 Make sure you have the required software
   6.3 Prepare the corpus for parsing
   6.4 Make a list of the files you want to process
   6.5 Run the CoreNLP pipeline
7. Conclusion

Part Two: Studying Vocabulary in Writing Development

Chapter 3. Understanding Vocabulary in Learner Writing
1. Introduction
2. Theorizing development in vocabulary
   2.1 Introduction
   2.2 Breadth, depth, and fluency
   2.3 Aspects of word knowledge
3. Measures of vocabulary development
   3.1 Introduction
   3.2 Lexical diversity
   3.3 Lexical sophistication
      3.3.1 Word length
      3.3.2 Word frequency
      3.3.3 Register-based measures
      3.3.4 Contextual distinctiveness
      3.3.5 Semantic measures
      3.3.6 Psycholinguistic measures
4. Complicating factors
   4.1 Introduction
   4.2 What is a ‘word’?
      4.2.1 Defining words
      4.2.2 Defining word tokens
      4.2.3 Defining word types
   4.3 Choosing a suitable reference corpus
   4.4 Relationships between measures of diversity and sophistication
   4.5 Vocabulary knowledge depth
5. Conclusion
6. Taking it further

Chapter 4. Vocabulary Research in Practice: Diversity and Academic Vocabulary
1. Introduction
2. Measuring vocabulary diversity
   2.1 Getting the metadata and corpus filenames
   2.2 Generating CTTR scores
   2.3 Recording the results
   2.4 Analysing vocabulary diversity
3. Studying academic vocabulary
   3.1 Preparing the list of academic vocabulary
   3.2 Converting the parsed corpus to an easier-to-use format
   3.3 Identifying AVL words in the learner corpus
   3.4 Visualizing variation in measures
   3.5 Investigating the patterns
4. Conclusion

Part Three: Studying Grammar in Writing Development

Chapter 5. Understanding Grammar in Learner Writing
1. Introduction
2. Studying development through grammar
   2.1 Models of grammar
   2.2 Selecting and interpreting grammatical features
3. Approaches to grammatical development
   3.1 Varieties of grammatical approaches
   3.2 Development in grammatical complexity
   3.3 Multi-dimensional analysis
   3.4 Usage-based models of development
4. Conclusion
5. Taking it further

Chapter 6. Grammar Research in Practice: Evaluating Parser Accuracy
1. Introduction
2. Reading a parsed corpus
3. Accuracy evaluation and fixtagging: an introduction
4. Accuracy evaluation and fixtagging: a worked example
   4.1 Hand-annotating a sample of texts
   4.2 Getting metadata and filenames
   4.3 Identifying and counting adjectives
   4.4 Identifying true positives, false positives, and false negatives
   4.5 Calculating precision and recall
   4.6 Identifying matches and differences in hand vs. computer parses
   4.7 Identifying and fixing parsing errors
5. Tracing development in a grammatical feature
   5.1 Counting a feature in texts
   5.2 Visualizing variation across learner groups
6. Conclusion

Part Four: Studying Formulaic Language in Writing Development

Chapter 7. Understanding Formulaic Language in Learner Writing
1. Introduction
2. Defining formulaic language
3. How can we study formulaic language in a corpus?
   3.1 A frequency-based approach to studying formulaic language
   3.2 Lexical bundles
   3.3 Collocations
4. Conclusion
5. Taking it further

Chapter 8. Formulaic Language Research in Practice: Academic Collocations
1. Introduction
2. Identifying collocations in a reference corpus
   2.1 Editing the parsed corpus
   2.2 Identifying lemmas and verb + noun combinations
   2.3 Identifying collocations
3. Quantifying the use of academic collocations across learner groups
   3.1 Preparing the learner corpus
   3.2 Identifying academic collocations in the learner corpus
   3.3 Understanding use of academic collocations across levels
4. Conclusion