Grammar and Corpora 2022 (GaC2022) is a three-day conference, taking place on-site in the city of Ghent, Belgium, from Thursday, June 30th, to Saturday, July 2nd, 2022.
The conference begins on Thursday morning and ends on Saturday mid-afternoon, leaving time for a 1.5-hour boat tour on the rivers Leie and Schelde around the city centre. There will be an informal, pre-conference warm-up gathering in a café in the city on Wednesday evening, a welcome reception on Thursday evening (included in the conference fee), and a conference dinner on Friday evening.
The final version of the programme can be found here: GaC22 – programme (version of 30 June 2022). It features six keynote talks and just under sixty papers (20-minute presentations plus 10 minutes for discussion) in three parallel sessions, covering a wide range of topics within the remit of the conference theme. On Friday, one of the sessions hosts an all-day panel on Language Productivity, which also includes a poster session.
The final version of the book of abstracts (including the programme) can be found here: GaC22 – book of abstracts (last update 1 July 2022). The author names and presentation titles in the programme are hyperlinked to the individual abstracts.
We look forward to welcoming you in Ghent. Please see here for registration, fees and payment.
Florent Perek (University of Birmingham, United Kingdom)
Constructions and the company they keep
One major contribution of corpus linguistics to the study of grammar is the realisation that there is a non-trivial relation between words and the syntactic contexts in which they occur (cf. Sinclair 1991, Hunston & Francis 2000, inter alia). Corpus-based studies consistently report that grammatical constructions can be very choosy as to which words they combine with, sometimes in seemingly unpredictable ways. This has led some scholars to propose that syntactic constructions, just like morphological patterns, display varying degrees of productivity (e.g. Goldberg 2006). In diachrony, the lexical fillers of constructions may also vary, as speakers come to use language in slightly different ways from their forebears, gradually expanding (or shrinking) the distribution of constructions (e.g. Rudanko 2011).
In this talk, I will show how distributional semantic models (also known as vector space models) can be used as a powerful tool to explore lexico-grammatical associations of this kind and how they vary, for instance over time. In line with the idea that “you shall know a word by the company it keeps” (Firth 1957: 11), distributional semantics aims to capture the meaning of words through their lexical collocates in large corpora, drawing on the intuition that words with similar meanings tend to co-occur with a common set of lexical items (Lenci 2008). Distributional semantic methods offer a robust, data-driven way to identify semantic areas in the distribution of a construction and to track changes in that distribution.
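As a rough illustration of the distributional idea (not the speaker's actual pipeline), the Python sketch below builds count-based co-occurrence vectors from a tiny invented corpus and compares words by cosine similarity. The corpus, the window size of two words and the function names are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt

# Toy corpus, invented for illustration only.
corpus = [
    "the cat drank the milk",
    "the dog drank the water",
    "the cat chased the mouse",
    "the dog chased the cat",
    "she read the book",
    "he read the letter",
]

WINDOW = 2  # assumed context window: two words on either side


def cooccurrence_vectors(sentences, window=WINDOW):
    """Map each word to a Counter of the words found within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            context = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    context[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counter returns 0 for unseen keys)."""
    dot = sum(count * v[w] for w, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(corpus)
print(round(cosine(vectors["cat"], vectors["dog"]), 3))
print(round(cosine(vectors["cat"], vectors["book"]), 3))
```

Here "cat" and "dog" share collocates such as "drank" and "chased", so they come out more similar than "cat" and "book". Real distributional models are built from far larger corpora and usually reweight raw counts (for example with positive pointwise mutual information) and reduce dimensionality, which this sketch leaves out.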
Through a number of case studies on the productivity of constructions in diachrony (e.g. Perek 2016, 2018), I will show that, despite the still prevalent emphasis on measures based on type frequency (the number of different items), the most appropriate way to study syntactic productivity is to look at the variability and spread of a construction in semantic space. Indeed, constructions with a similar increase in type frequency can be productive in very different ways, as the distributional semantic methods I will demonstrate are able to capture. In the last part of the talk, I will offer some reflections on further developments and variations of these methods, and on ways to address some of their limitations.
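The contrast between type frequency and semantic spread can be made concrete with a small sketch. It assumes that every lexical filler of a construction already has a distributional vector; the 2-D vectors below are invented for illustration (real ones have hundreds of dimensions). Type frequency simply counts distinct fillers, while mean pairwise cosine distance is used here as one possible, hypothetical dispersion measure, not necessarily the measure used in Perek (2016, 2018).

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """1 minus the cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def type_frequency(fillers):
    """Number of distinct lexical items attested in the construction."""
    return len(fillers)


def semantic_dispersion(fillers):
    """Hypothetical spread measure: mean pairwise cosine distance between filler vectors."""
    pairs = list(combinations(fillers.values(), 2))
    if not pairs:  # fewer than two fillers: no spread to measure
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Two hypothetical constructions with the same type frequency (four fillers each).
# Construction A: fillers cluster in a single semantic area.
construction_a = {
    "push": (1.0, 0.1),
    "shove": (0.9, 0.2),
    "elbow": (0.95, 0.15),
    "jostle": (0.85, 0.25),
}
# Construction B: fillers are spread across semantic space.
construction_b = {
    "push": (1.0, 0.1),
    "sing": (0.1, 1.0),
    "lie": (0.7, 0.7),
    "worm": (-0.5, 0.8),
}

for name, fillers in [("A", construction_a), ("B", construction_b)]:
    print(name, type_frequency(fillers), round(semantic_dispersion(fillers), 3))
```

Both toy constructions have a type frequency of four, yet B is far more dispersed than A, which is the kind of difference that a type count alone cannot reveal.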
Firth, J. R. (1957). A synopsis of linguistic theory 1930-1955. In Studies in linguistic analysis (Special volume of the Philological Society), 1–32. Oxford: Blackwell.
Goldberg, A. E. (2006). Constructions at Work: The Nature of Generalization in Language. Oxford: Oxford University Press.
Hunston, S., & Francis, G. (2000). Pattern Grammar: A Corpus-driven Approach to the Lexical Grammar of English. Amsterdam: John Benjamins.
Lenci, A. (2008). Distributional semantics in linguistic and cognitive research. Rivista di Linguistica 20(1), 1–31.
Perek, F. (2016). Using distributional semantics to study syntactic productivity in diachrony: A case study. Linguistics 54(1): 149–188.
Perek, F. (2018). Recent change in the productivity and schematicity of the way-construction: A distributional semantic analysis. Corpus Linguistics and Linguistic Theory 14(1), 65–97.
Rudanko, J. (2011). Changes in Complementation in British and American English: Corpus-Based Studies on Non-Finite Complements in Recent English. Basingstoke: Palgrave Macmillan.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Sali Tagliamonte (University of Toronto, Canada)
Grammar, statistics and 1000 people’s stories: Studying language variation and change in natural speech data
In this presentation I offer an overview of my research program investigating language variation and change. The data come from a large archive of vernacular speech from Ontario, Canada, in which I have been documenting language variation and change among people born from the late 1800s up to the early 2000s. As of 2021, the archive comprises 19 communities, ranging from the largest city, Toronto, to many localities in the Near North (e.g. Tagliamonte, 2013; 2014).
Using a selection of well-studied, variable grammatical features as case studies, I demonstrate how the findings arising from these materials provide important new insights into the nature of language variation and change. In some cases, cross-linguistic regularities expose typological tendencies (negation) (Burnett et al., 2018), or long-standing assumptions about ongoing grammatical trends are overturned by evidence of stability (Rothlisberger & Tagliamonte, to appear). Smaller pockets of change within variable systems also come to light, such as the lexicalization of certain constructions (ever) (Franco & Tagliamonte, to appear) or the splitting off of separate developments (anyway/anyways) (Franco & Tagliamonte, 2020). At the same time, social, geographic and cultural influences are also at play (e.g. Tagliamonte et al., 2010; Gardner & Tagliamonte, 2020). Taken together, these findings demonstrate that variation is best understood within a broad, contrastive perspective, and that statistical techniques applied to corpus data offer an important means of detecting patterns not only within the varieties or dialects under investigation but also across languages, leading to more integrated explanations.
Burnett, H., Tagliamonte, S. A. & Koopman, H. (2018). Soft Syntax and the Evolution of Negative and Polarity Indefinites in the History of English. Language Variation and Change 30(1): 83-107.
Franco, K. & Tagliamonte, S. A. (2020). New -way(s) with -ward(s): lexicalization, splitting and sociolinguistic patterns. Language Variation and Change 32(2): 217-239.
Franco, K. & Tagliamonte, S. A. (to appear). The most stable it’s ever been: The preterit/present perfect alternation in spoken Ontario English. English Language and Linguistics.
Gardner, M. & Tagliamonte, S. A. (2020). The bike, the back, and the boyfriend: Confronting the “definite article conspiracy” in Canadian and British English. English World Wide 41(2): 226-255.
Rothlisberger, M. & Tagliamonte, S. A. (to appear). The social embedding of a syntactic alternation: Variable particle placement in Ontario English. Language Variation and Change 32(3): 317-348.
Tagliamonte, S. A. (2013). Roots of English: Exploring the history of dialects. Cambridge: Cambridge University Press.
Tagliamonte, S. A. (2014). System and society in the evolution of change: The view from Canada. In Green, E. & Meyer, C. (Eds.), Variability in Current World Englishes, 199–238. Berlin and New York: Mouton de Gruyter.
Tagliamonte, S. A., D’Arcy, A. & Jankowski, B. (2010). Social work and linguistic systems: Marking possession in Canadian English. Language Variation and Change 22(1): 1-25.
Anke Lüdeling, Julia Lukassek, Anna Shadrova (Humboldt-Universität zu Berlin, Germany)
Variability in Grammatical Categories and Structures: The Case of Word Formation
How variable are word formation categories across speakers? This question is interesting because (concatenative) word formation is both a grammatical process and intimately tied to the lexicon. Most of the complex words that we find in any given corpus are highly lexicalized. At the same time, word formation is grammatical in the sense that we can identify categories and patterns that allow for productivity and are subject to language dynamics such as grammaticalization, similarly to syntactic pattern formation. There is evidence that these categories and patterns are accessible even in highly lexicalized words (Smolka/Libben/Dressler 2019). We know from previous research that parts of speech, constituents and dependencies show highly stable proportions across corpora (otherwise, many applications of natural language processing, such as probabilistic tagging or parsing, would not be possible). The lexicon, on the other hand, is less easily described in statistical terms (Piantadosi 2014, Williams et al. 2015).
In our talk, we will discuss the distributions of word-formation patterns of verbs and nouns in two corpora of German. The very high inter- and intra-speaker variability that we find (cf. Shadrova et al. 2021) has far-reaching methodological and theoretical implications. Specifically, we will address issues with usage-based modeling of grammar and acquisition, such as the notion of “constructions all the way down” (Goldberg 2006: 18) or the construct of a homogeneous native speaker in corpus-based grammar research.
Goldberg, Adele E. (2006) Constructions at Work: The Nature of Generalization in Language. Oxford: Oxford University Press.
Piantadosi, Steven T. (2014) Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review 21(5), 1112-1130.
Shadrova, Anna, Pia Linscheid, Julia Lukassek, Anke Lüdeling & Sarah Schneider (2021) A challenge for contrastive L1/L2 corpus studies: Large inter- and intra-individual variation across morphological, but not global syntactic categories in task-based corpus data of a homogeneous L1 German group. Frontiers in Psychology 12, https://doi.org/10.3389/fpsyg.2021.716485.
Smolka, Eva, Gary Libben & Wolfgang U. Dressler (2019) When morphological structure overrides meaning: Evidence from German prefix and particle verbs. Language, Cognition and Neuroscience 34(5), 599-614.
Williams, Jake Ryland, Paul R. Lessard, Suma Desu, Eric M. Clark, James P. Bagrow, Christopher M. Danforth & Peter Sheridan Dodds (2015) Zipf’s law holds for phrases, not words. Scientific Reports 5(1), 1-7, arXiv:1406.5181.
Marieke Meelen (University of Cambridge, United Kingdom)
Corpus annotation of the ‘conscious self’. From manuscript to egophoric grammars in low-resource historical languages
The verbal systems of some languages mark the speaker’s personal involvement in an event. In Lhasa Tibetan, for example, nga em-chi yin ‘I’m a doctor’, ’di nga’i bu-mo yin ‘This is my daughter’, and ’di khyed-rang-gi gsol-ja yin ‘This is your tea [that I have made for you]’ all end with yin, and all three sentences involve the speaker in some way. In another branch of the Tibeto-Burman language family, Kathmandu Newar uses vowel lengthening to indicate the speaker’s involvement (ji: a:pwa twan-ā ‘I drank too much’ vs. chã/wa a:pwa twan-a ‘you/(s)he drank too much’), but only if the speaker was consciously involved. There is, for example, a clear distinction between the egophoric long vowel -ā in ji: Mānaj nāpalān-ā ‘I met Manoj as planned’ and the non-egophoric short vowel -a in ji: Mānaj nāpalān-a ‘I met Manoj by coincidence’. This phenomenon, whereby the speaker’s knowledge, experience or personal involvement is grammatically expressed, is called ‘egophoricity’. It was first described for Kathmandu Newar (Hale 1970) and Lhasa Tibetan (DeLancey 1980), but is now known from languages of the Himalayas, New Guinea and equatorial South America. But how do languages develop egophoric marking?
In this talk I will show that Newar and Tibetan offer an excellent starting point for answering this nearly unexplored question. Unlike other languages with egophoric marking, such as Awa Pit (Barbacoan), Kaluli (Trans New Guinea) and Guambiano (Coconucan), Tibetan and Newar varieties have long literary traditions (Tibetan since 650 CE and Newar since 1112 CE). Unlike their present-day descendants, neither Classical Tibetan nor Classical Newar exhibits egophoricity (Tournadre & Jiatso 2001). This means that, in theory, we should be able to create annotated diachronic corpora to explore this open research question.
In practice, however, things are not so simple. Since historical Newar and Tibetan are both low-resource and under-researched, creating well-annotated corpora is not a straightforward task. I will therefore first discuss the creation of deeply annotated corpora in six different Tibetan and Newar varieties. For some of these varieties we currently only have photographs of 16th–17th-century manuscripts; for others, we need to go to Nepal to collect data in the field and then transcribe it. Each variety therefore presents unique challenges that need to be addressed to arrive at the sophisticated level of annotation required to understand how egophoric marking emerges and develops over time.
I will then present some crucial case studies at different stages of the annotation workflow to illustrate how the challenges of low-resource historical languages can be overcome, and why close collaboration between philologists, NLP experts and linguists in different areas (e.g. those specializing in historical linguistics, phonetics, morphosyntax, semantics and pragmatics) is essential to tackle complex questions of language variation and change, such as the emergence of egophoricity.
M.C. Parafita Couto (Leiden University, the Netherlands)
The role of multilingual corpora in describing multilingual grammars
In this talk, I will (i) illustrate how (open access) corpora of naturalistic multilingual speech help us describe the distributional patterns that arise and shed light on the grammaticality of these structures, and (ii) discuss whether psycho-/neurolinguistic findings align with corpus-based findings. For these purposes, I will discuss the competing theoretical and methodological tensions in the structural study of code-switching, that is, when multilingual speakers “go back and forth” between the languages they speak within a conversation, or even within a sentence (Deuchar 2012). I will show how corpus analyses of production data can provide a wealth of information about the naturalistic occurrences of code-switches, and enable the predictions of different theoretical models to be assessed in an ecologically valid way. Determining the grammatical constraints that may predict code-switching patterns has been the focus of many recent studies (cf. Backus 2015, Balam et al. 2020, Toribio 2017, López 2020, among many others), some of which also employ psycho-/neurolinguistic measures (Beatty-Martínez et al. 2018, Pablos et al. 2019, Van Hell et al. 2018, Vaughan-Evans et al. 2020, inter alia). I will discuss how the processing of code-switched speech often aligns with the code-switching patterns previously reported in naturalistic production in the specific multilingual community, highlighting the importance of studying code-switching from a language ecological perspective. I will finish with a call for rapprochement between domains and argue for open access corpora, which are often collected at public expense. The availability of these data will help us further unravel recent theoretical and empirical questions and criticisms being raised about the description and nature of code-switching grammars (e.g. Toribio 2018, Parafita Couto et al. in press).
Backus, A. (2015). A usage-based approach to code-switching: The need for reconciling structure and function. In G. Stell & K. Yakpo (Eds). Code-switching between Structural and Sociolinguistic Perspectives (19-37). Berlin/Munich/Boston: De Gruyter.
Balam, O., Parafita Couto, M.C., & Stadthagen-González, H. (2020). Bilingual verbs in three Spanish/English code-switching communities. International Journal of Bilingualism, 24(5–6), 952–967.
Beatty-Martínez, A. L., Valdés Kroff, J. R., & Dussias, P. E. (2018). From the field to the lab: A converging methods approach to the study of codeswitching. Languages, 3(2), 1–19.
Deuchar, M. (2012). Code-switching. In Chapelle, C.A. (ed.) Encyclopedia of Applied Linguistics. New York: Wiley, 657-664.
López, L. (2020). Bilingual grammar. Toward an integrated model. Cambridge University Press.
Pablos, L., Parafita Couto, M.C., Boutonnet, B., De Jong, A., Perquin, M., De Haan, A., & Schiller, N.O. (2019). Adjective-Noun order in Papiamento-Dutch code-switching. Linguistic Approaches to Bilingualism, 9(4), 710–735.
Parafita Couto, M.C., Greidanus Romaneli, M., & Bellamy, K. (in press). Code-switching at the interface between language, culture, and cognition. Lapurdum, IKER UMR 5478 CNRS.
Parafita Couto, M.C., Bellamy, K., & Ameka, F. (in press). Theoretical linguistic approaches to multilingual code-switching. In J. Cabrelli, A. Chaouch-Orozco, J. González Alonso, S. Pereira Soares, E. Puig-Mayenco & J. Rothman (Eds.), Cambridge Handbook of Third Language Acquisition and Processing. Cambridge: Cambridge University Press.
Toribio, A. J. (2017). Structural approaches to code-switching: Research then and now. In R.E.V. Lopes, J. Ornelas de Avelar & S. M. L. Cyrino (Eds.) Romance Languages and Linguistic Theory 12. Selected papers from the 45th Linguistic Symposium on Romance Languages (LSRL), Campinas, Brazil (pp. 213-233). Amsterdam: John Benjamins Publishing Company.
Toribio, A. J. (2018). The future of code-switching research. In L. López (Ed.), Code-Switching: Experimental Answers to Theoretical Questions. In honor of Kay González Vilbazo (Issues in Hispanic and Lusophone Linguistics 19), 257–267. Amsterdam: John Benjamins.
van Hell, J. G., Fernandez, C., Kootstra, G. J., Litcofsky, K. A., & Ting, C.Y. (2018). Electrophysiological and experimental-behavioral approaches to the study of intra-sentential code-switching. Linguistic Approaches to Bilingualism, 8(1), 144-171.
Vaughan-Evans, A., Parafita Couto, M.C., Boutonnet, B., Hoshino, N., Webb-Davies, P., Deuchar, M., & Thierry, G. (2020). Switchmate! An Electrophysiological Attempt to Adjudicate Between Competing Accounts of Adjective-Noun Code-Switching. Frontiers in Psychology, 11:54976.
Gert De Sutter (Ghent University, Belgium)
Understanding grammatical variation from a bilingual perspective: descriptive, methodological and theoretical insights from corpus-based translation research
Grammatical variation has played a key role in understanding how language use in general and variation in particular functions in society, how it is constrained and how it can be represented cognitively. As such, it is a central topic in many usage-based linguistic disciplines, such as sociolinguistics, psycholinguistics, corpus linguistics, probabilistic linguistics and cognitive linguistics, with empirical research into grammatical variation leading to major advances in terms of description, methodology and theory.
It nevertheless seems fair to say that our current understanding of grammatical variation is primarily based on studies of monolingual language use, which does not do justice to the ever-increasing amount of multilingual communication resulting from greater mobility and globalisation. It is therefore crucial to investigate to what extent our current understanding of grammatical variation applies to bilingual language production or, alternatively, how it should be adjusted to incorporate insights from bilingualism more accurately.
To address this issue, I will present three corpus-based translation studies of grammatical variation phenomena, namely pre- vs. postverbal subject placement in Dutch, the English that/zero alternation in complement clauses and the genitive alternation in Dutch. These studies rely on parallel corpora (source texts and their translations) and/or monolingual comparable corpora (translated and non-translated texts in the same language). This will allow us to evaluate (i) to what extent translations, being a prototypical example of bilingual communicative events, exhibit probabilistic patterns of grammatical variation similar to those of monolingual text production, (ii) to what extent mainstream multifactorial statistical analysis is capable of accurately detecting variation patterns in this type of multilingual data and (iii) to what extent bilingual text production is affected by constraints such as structural priming, structural integration cost, markedness of coding and statistical pre-emption. To conclude, I will discuss the implications of these insights for (probabilistic, cognitive-linguistic) theories of grammatical variation, for statistical analysis and for the use of parallel corpus data in linguistic research.