Bibliometric Analysis (Scopus database) | Alona Kononenko-Szoszkiewicz, PhD

This project employs a scoping review framework (Munn et al. 2018) combined with quantitative bibliometric metrics to analyze all Scopus‑indexed publications from 2010 to 2020. It aims to identify the corpus types most frequently used, determine which languages receive the greatest research focus, identify the main subfields of linguistics that utilize corpus methods, map temporal trends and emerging knowledge gaps, and highlight the notable underrepresentation of phonetic and phonological studies in corpus‑based research.

Methodological Framework

Data collection from the Scopus database using a multifaceted search query
Two-stage screening process (title screening followed by abstract review)
Annotation of corpus types (written, spoken, both, sign language) and languages covered
Mapping of research streams via quantitative bibliometric indicators
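
The two-stage screening step can be illustrated with a small sketch. This is not the project's actual procedure: screening of this kind is normally done by human reviewers, and the keyword lists, the Record class, and the function names below are all hypothetical stand-ins for those decisions.

```python
from dataclasses import dataclass

# Hypothetical screening terms (assumption): the project's real inclusion
# criteria are not stated in this summary.
TITLE_TERMS = ("corpus", "corpora")
ABSTRACT_TERMS = ("corpus-based", "corpus linguistics", "spoken corpus", "written corpus")


@dataclass
class Record:
    """Minimal stand-in for one exported bibliographic record."""
    title: str
    abstract: str


def passes_title(record: Record) -> bool:
    """Stage 1: keep records whose title mentions any title term."""
    title = record.title.lower()
    return any(term in title for term in TITLE_TERMS)


def passes_abstract(record: Record) -> bool:
    """Stage 2: keep records whose abstract mentions any abstract term."""
    abstract = record.abstract.lower()
    return any(term in abstract for term in ABSTRACT_TERMS)


def screen(records: list[Record]) -> list[Record]:
    """Apply title screening first, then abstract review on the survivors."""
    stage_one = [r for r in records if passes_title(r)]
    return [r for r in stage_one if passes_abstract(r)]


# Invented example records, for illustration only.
records = [
    Record("A spoken corpus of learner Spanish", "We present a spoken corpus of ..."),
    Record("Corpus callosum imaging", "An MRI study of brain structure."),
    Record("Syntax in child language", "A corpus-based study of ..."),
]
kept = screen(records)
```

Running the two stages in sequence mirrors the order described above: the cheaper title check removes clearly irrelevant records before the more demanding abstract review.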

Key Findings

Corpus Types

Written corpora: 823 instances
Spoken corpora: 547 instances
Both written and spoken: 92 instances
Sign language corpora: 19 instances
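
To put these counts in proportion, the sketch below converts them to percentage shares. It assumes the four categories are mutually exclusive annotations, so that their sum can serve as the denominator; the summary does not state this explicitly.

```python
# Counts as reported above.
corpus_counts = {
    "written": 823,
    "spoken": 547,
    "both written and spoken": 92,
    "sign language": 19,
}

# Assumption: categories do not overlap, so the total is a valid denominator.
total = sum(corpus_counts.values())

# Percentage share of each corpus type, rounded to one decimal place.
shares = {
    corpus_type: round(100 * count / total, 1)
    for corpus_type, count in corpus_counts.items()
}

for corpus_type, share in shares.items():
    print(f"{corpus_type}: {share}%")
```

Under that assumption, written corpora make up a little over half of the 1,481 annotated instances, and sign language corpora just over one percent.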

Languages Covered

Total languages identified: 212
English (and its variants): 505 occurrences
Spanish: 164 occurrences
French: 84 occurrences
German: 71 occurrences
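
Counting "English (and its variants)" as one language implies that variant labels are folded into a single entry before tallying. A minimal sketch of that idea follows; the variant mapping, the sample labels, and the function names are hypothetical and are not taken from the project's annotation scheme.

```python
from collections import Counter

# Hypothetical mapping from variant labels to a single language name (assumption).
VARIANT_TO_LANGUAGE = {
    "american english": "English",
    "british english": "English",
    "australian english": "English",
}


def normalize(label: str) -> str:
    """Fold known variants into their parent language; tidy casing otherwise."""
    cleaned = label.strip()
    return VARIANT_TO_LANGUAGE.get(cleaned.lower(), cleaned.title())


def tally(labels: list[str]) -> Counter:
    """Count occurrences per normalized language label."""
    return Counter(normalize(label) for label in labels)


# Invented annotation labels, for illustration only.
sample = ["English", "British English", "spanish", "American English", "French", "Spanish "]
counts = tally(sample)
```

Here counts.most_common() would list the languages in descending order of occurrence, which is the form in which the figures above are reported.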

Research Streams and Contributions

Computational Linguistics Hub: Combines NLP and Speech Recognition using large corpora for language processing tasks.
Natural Language Processing (NLP): Focuses on algorithm development for analyzing linguistic patterns.
Discourse Studies: Investigates language use in context through the integration of corpus methods and discourse analysis.
Speech Perception & Processing: Explores auditory processing. However, studies that apply corpus methods to phonetics and phonology remain comparatively uncommon.

Conclusion

This bibliometric analysis highlights the evolving landscape of corpus linguistic research. The findings point to the extensive use of written data, the underrepresentation of certain languages alongside a predominance of research on Germanic languages, and the emergence of distinct research streams that collectively inform current scholarly practices and future research directions in the field.