MPhil Projects

Note: please contact shg36@cam.ac.uk to discuss the project before applying.

Curriculum learning for text simplification

Proposer: Sian Gooding
Supervisors: Sian Gooding, Zheng Yuan, Ted Briscoe
Special Resources: Access to an NLIP machine/server + Access to a GPU
Areas: Machine translation, curriculum learning, text simplification

Humans and animals learn much more effectively when examples are not presented randomly but organized in a meaningful way, for instance by gradually introducing more concepts, and increasingly complex ones [1]. This approach has been referred to as curriculum learning. Applying a curriculum learning framework to neural machine translation (NMT) has been shown to reduce training time and improve overall performance [2].

Text simplification is aimed at reducing the reading and grammatical complexity of text whilst retaining the meaning. NMT approaches have dominated the text simplification field over the past few years [3]. In this setting, the aim is to ‘translate’ complex text to simpler text using traditional translation architectures. In this project, you will investigate whether curriculum learning can be used to improve an NMT system for text simplification. You will reimplement a neural translation system and experiment with how the ordering of training data impacts the model performance.
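As a rough illustration of what "ordering of training data" could mean in practice, the sketch below sorts complex/simple sentence pairs by a difficulty score and gradually widens the pool that training batches are drawn from. Everything in it is an assumption made for illustration: the difficulty proxy (token count of the complex sentence), the square-root schedule in `competence`, and the function names are hypothetical and are not taken from the project or prescribed by [2].

```python
import math
import random


def difficulty(pair):
    """Hypothetical difficulty proxy: whitespace token count of the complex sentence."""
    complex_sentence, _simple_sentence = pair
    return len(complex_sentence.split())


def competence(step, total_steps, initial=0.1):
    """Fraction of the difficulty-sorted data available at a given step.

    Illustrative square-root schedule that grows from `initial` to 1.0;
    the exact shape is an assumption, not a requirement of the project.
    """
    progress = min(1.0, step / total_steps)
    return min(1.0, math.sqrt(progress * (1 - initial ** 2) + initial ** 2))


def curriculum_batches(pairs, total_steps, batch_size, seed=0):
    """Yield one batch per step, sampled from the easiest examples first."""
    rng = random.Random(seed)
    ordered = sorted(pairs, key=difficulty)  # easiest examples come first
    for step in range(total_steps):
        # Number of examples unlocked at this step (always at least one).
        limit = max(1, round(competence(step, total_steps) * len(ordered)))
        pool = ordered[:limit]
        yield [rng.choice(pool) for _ in range(batch_size)]


if __name__ == "__main__":
    # Toy data: complex sentences of 1 to 10 tokens, each with a dummy simplification.
    toy_pairs = [(" ".join(["word"] * n), "simple") for n in range(1, 11)]
    for step, batch in enumerate(curriculum_batches(toy_pairs, total_steps=5, batch_size=4)):
        print(step, [difficulty(pair) for pair in batch])
```

Comparing a model trained on batches like these against one trained on uniformly shuffled batches is one simple way to measure the effect of ordering; other difficulty measures (for example word rarity) could be swapped in by replacing `difficulty`.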

Proficiency prediction from reading behaviour

Proposer: Sian Gooding
Supervisors: Sian Gooding, Ted Briscoe
Special Resources: Access to an NLIP machine/server + Access to a GPU
Areas: Reading comprehension, interaction, AI in edtech

Reading comprehension is based on a range of reader-related, text-related, and situational factors. It has been defined as the ability to process text, understand its meaning, and integrate the meaning with what the reader already knows. Being able to judge whether a reader would comprehend a given text requires knowledge of their background and level of proficiency. Techniques have been explored that measure how an individual interacts with text whilst reading to predict their proficiency level. For instance, features from eye tracking have been used to predict the language proficiency of readers [4].

On-device reading has predominantly taken the place of traditional formats. Such devices allow access to implicit user feedback by measuring how a user interacts with the text they read. A key advantage of implicit feedback techniques is that they obtain this information unobtrusively, without interrupting the reader. In this project, you will use a dataset collected in collaboration with Google that contains the implicit interactions of participants whilst reading [5][6]. Using features extracted from these text interactions, the aim is to predict the comprehension score and proficiency of readers.

Sources

[1] Curriculum learning (Bengio et al., 2009)
[2] Competence-based Curriculum Learning for Neural Machine Translation (Platanios et al., 2019)
[3] A Survey on Text Simplification (Sikka and Mago, 2020)

[4] Assessing Language Proficiency from Eye Movements in Reading (Berzak et al., 2018)
[5] Predicting Text Readability from Scrolling Interactions (Gooding et al., 2021)
[6] Google AI Blog (Gooding, 2021)