Bibliometric Analysis

Scopus-indexed publications in corpus linguistics

This project combines a scoping review framework (Munn et al. 2018) with quantitative bibliometric indicators to analyze Scopus-indexed corpus linguistics publications from 2010 to 2020. It aims to identify the corpus types most frequently used, determine which languages receive the greatest research focus, identify the main subfields of linguistics that use corpus methods, map temporal trends and emerging knowledge gaps, and highlight the notable underrepresentation of phonetic and phonological studies in corpus-based research.

Methodological Framework

  • Data collection from the Scopus database using a multifaceted search query
  • Two-stage screening process (title screening followed by abstract review)
  • Annotation of corpus types (written, spoken, both, sign language) and languages covered
  • Mapping of research streams via quantitative bibliometric indicators
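
The screening and annotation steps above can be pictured as a small tallying routine. The following is only a minimal sketch under stated assumptions: the `Record` class, its field names, and the four sample records are hypothetical and not taken from the project's data. The two screening decisions are assumed to be made by reviewers and stored as flags; the code does not attempt to automate them.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Record:
    """Hypothetical record for one publication retrieved from the database."""
    title: str
    passed_title: bool       # reviewer decision at stage 1 (title screening)
    passed_abstract: bool    # reviewer decision at stage 2 (abstract review)
    corpus_type: str = ""    # "written", "spoken", "both", or "sign language"
    languages: List[str] = field(default_factory=list)


def screen(records: List[Record]) -> List[Record]:
    """Keep only records that passed both screening stages."""
    return [r for r in records if r.passed_title and r.passed_abstract]


def tally(records: List[Record]) -> Tuple[Counter, Counter]:
    """Count annotated corpus types and languages across included records."""
    corpus_counts = Counter(r.corpus_type for r in records)
    language_counts = Counter(lang for r in records for lang in r.languages)
    return corpus_counts, language_counts


# Invented sample data, for illustration only.
records = [
    Record("Study A", True, True, "written", ["English"]),
    Record("Study B", True, False),
    Record("Study C", True, True, "spoken", ["Spanish", "English"]),
    Record("Study D", False, False),
]

included = screen(records)
corpus_counts, language_counts = tally(included)
print(corpus_counts)
print(language_counts)
```

A publication that covers several languages contributes one count per language, which is one plausible reading of the "occurrences" reported under Key Findings.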

Key Findings

Corpus Types
  • Written corpora: 823 instances
  • Spoken corpora: 547 instances
  • Both written and spoken: 92 instances
  • Sign language corpora: 19 instances
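
To put these counts in proportion, the short sketch below converts them to percentage shares. It assumes that the four categories are mutually exclusive and that their sum is a sensible denominator; the source reports "instances", which may not correspond one-to-one to publications.

```python
# Counts as reported above; the variable names are illustrative.
corpus_counts = {
    "written": 823,
    "spoken": 547,
    "both written and spoken": 92,
    "sign language": 19,
}

# Assumption: the categories do not overlap, so the sum is the total
# number of annotated corpus instances.
total = sum(corpus_counts.values())

# Percentage share of each corpus type, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in corpus_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

Under that assumption the total is 1,481 instances, with written corpora at about 55.6%, spoken at 36.9%, combined written and spoken at 6.2%, and sign language at 1.3%.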

Languages Covered
  • Total languages identified: 212
  • English (and its variants): 505 occurrences
  • Spanish: 164 occurrences
  • French: 84 occurrences
  • German: 71 occurrences

Research Streams and Contributions

  1. Computational Linguistics Hub: Combines NLP and speech recognition, using large corpora for language processing tasks.
  2. Natural Language Processing (NLP): Focuses on algorithm development for analyzing linguistic patterns.
  3. Discourse Studies: Investigates language use in context through the integration of corpus methods and discourse analysis.
  4. Speech Perception & Processing: Explores auditory processing. Studies that apply corpus methods to phonetics and phonology nevertheless remain comparatively uncommon.

Conclusion

This bibliometric analysis highlights the evolving landscape of corpus linguistic research. The findings illuminate the extensive use of written data, the underrepresentation of certain languages alongside research concentrated mainly on Germanic languages, and the emergence of distinct research streams that collectively inform current scholarly practices and future research directions in the field.