Machine learning remains the prevalent approach to solving problems in natural language processing for both speech and text domains. This approach generally requires large amounts of annotated or highly structured data in order to process language effectively. Even in the era of Big Data and a greater focus on low-resource languages, there are a significant number of languages and NLP domains where the availability of data resources to facilitate effective machine learning remains a challenge.
Machine learning in low-resource environments typically focusses on low-resource languages that have limited linguistic resources and data available. This remains an important topic of interest, especially given the recent DARPA LORELEI call (http://www.darpa.mil/program/low-resource-languages-for-emergent-incidents). There are, however, an increasing number of domains where other limitations on the availability of these resources have a similar impact on the successful use of machine learning, including biomedical knowledge extraction, on-line harassment detection, and crisis resolution. In all of these environments, it is necessary to develop novel approaches in order to expand and improve resource acquisition and machine learning methods.
The MaLLoRE workshop aims to bring together researchers working on various low-resource problems in order to disseminate knowledge on different tools and approaches to solving data sparsity, domain adaptation, annotation, and evaluation problems. The intention of the workshop is to provide an inclusive environment for the exchange of ideas, experiences, and demonstrations that facilitate the agile development of language resources for machine learning in domains where resources are limited due to security, confidentiality, or financial constraints.
The main topics of interest include, but are not limited to:
- machine learning with limited training data;
- unsupervised and semi-supervised machine learning techniques;
- semi-automated resource development;
- active and on-line learning;
- domain adaptation methods;
- using multilingual resources in monolingual resource development;
- reusing non-traditional resources in machine learning;
- crowdsourcing; and
- evaluation in low-resource environments.
The workshop forms part of a cooperative effort by the Department of Arts and Culture of South Africa and the Dutch Language Union to expedite greater cooperation between research institutions working on natural language processing in Belgium, the Netherlands, and South Africa, with the aim of encouraging joint research projects. This call does not preclude submissions from other regions, and all interested parties are encouraged to make submissions to this workshop.
Authors are invited to submit full-length papers (4-8 pages) in English describing original, on-going or completed work through the online submission form on START (https://www.softconf.com/lrec2016/main/).
All submitted papers will be peer-reviewed by at least two members of the programme committee, and accepted papers will be published as workshop proceedings.
Submissions should follow the main LREC style sheet as published on the LREC conference website.
Submission deadline: 15 February 2016
Notification to authors: 4 March 2016
Submission of final version: 21 March 2016
Workshop: 28 May 2016
Identify, Describe and Share your LRs
Describing your LRs in the LRE Map is now a normal practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences).
To continue the efforts initiated at LREC 2014 about “Sharing LRs” (data, tools, web-services, etc.), authors will have the possibility, when submitting a paper, to upload LRs in a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, may become a new “regular” feature for conferences in our field, thus contributing to creating a common repository where everyone can deposit and share data.
As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2016 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN, http://www.islrn.org), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.
Roald Eiselen (CTexT, North-West University, South Africa)
Febe de Wet (Meraka Institute, CSIR, South Africa)
Walter Daelemans (CLiPS, University of Antwerp, Belgium)