Automatic Dependency Parsing of a Learner English Corpus REALEC
The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in a general English course. The essays are part of the students' preparation for an independent final examination similar to international English exams. When adapting existing dependency parsing tools to learner data, one has to take into account to what extent students' mistakes provoke errors in the parser output. Ungrammatical and stylistically inappropriate utterances may challenge parsers trained on grammatically correct written texts. In our experiments, we compared the output of the dependency parser UDPipe (trained on UD-English 2.0) with the results of manual parsing, placing a particular focus on parses of ungrammatical English clauses. We show how mistakes made by students affect the parser's output. Overall, UDPipe performed reasonably well (UAS 92.9, LAS 91.7). Errors in the automatic annotation fall into three cases: a) incorrect detection of the head, b) incorrect detection of the relation type, and c) both. We propose some solutions which could improve the automatic output and thus make the assessment of syntactic complexity more reliable.
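To make the reported UAS and LAS figures concrete, the following is a minimal sketch of how attachment scores are computed from head indices and relation labels. The sentence and parses are invented toy data for illustration, not examples from REALEC.

```python
# Sketch of UAS/LAS scoring over dependency parses.
# gold/pred: one (head_index, deprel) pair per token, in sentence order.
# UAS counts tokens with the correct head; LAS additionally requires
# the correct relation label.

def attachment_scores(gold, pred):
    assert len(gold) == len(pred)
    n = len(gold)
    uas_hits = sum(1 for (gh, _), (ph, _) in zip(gold, pred) if gh == ph)
    las_hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n

# Toy learner sentence "He go to school ." where the parser attaches
# every head correctly but mislabels one relation (obl vs nmod).
gold = [(2, "nsubj"), (0, "root"), (4, "case"), (2, "obl"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (4, "case"), (2, "nmod"), (2, "punct")]

uas, las = attachment_scores(gold, pred)
print(uas, las)  # 100.0 80.0
```

Under this scheme, case b) from the error typology (wrong relation type with the correct head) lowers LAS but not UAS, while cases a) and c) lower both.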