Thesis Topics 2015

Posted on 28-10-2014 by Maarten Marx | DiLiPaD, eXist, education

Below is a list of possible thesis topics that can be done in 2015 in the context of the DiLiPaD or ExPoSe projects.
These topics can be taken as a Bachelor or Master thesis in Information Science or Artificial Intelligence at UvA.

Mining contrastive opinions on political texts using cross-perspective topic model

We replicate the research done in [1], but now on Dutch data and using Dutch topics.

[1] Yi Fang, Luo Si, Naveen Somasundaram, and Zhengtao Yu. 2012. Mining contrastive opinions on political texts using cross-perspective topic model. In Proceedings of the fifth ACM international conference on Web search and data mining (WSDM ’12). ACM, New York, NY, USA, 63-72. http://doi.acm.org/10.1145/2124295.2124306

Expected skills of the student
* Python programming with text
* Statistics

Linking parliament to news

In this project we link news articles to parliamentary proceedings. We look both for articles that describe events that happened in parliament, and for articles that describe an event (e.g., the MH17 disaster) that was discussed in parliament. See the more elaborate description below.

Expected skills of the student
* Python programming with text

Enabling diachronic comparative research on text corpora

We create tools to enable humanities scholars to do diachronic comparative research using large volumes of text. We work with two use cases:
1) Did pillarization ("verzuiling") really take place in the Netherlands? (research by Voerman and Wijfjes). The data consists of Dutch newspapers from 1920-1980 and Dutch broadcasting guides (“omroepgidsen”).
2) Treatment of the topic ‘immigration’ in the Dutch, UK and Canadian parliaments from 1945 to the present (DiLiPaD research project).

Expected skills of the student
* Python programming with text
* visualization, design of interfaces
* interest in humanities research
You work in the DiLiPaD project and might take an internship in the Dutch Parliament or the Dutch eScience center.

Authorship attribution

You participate in the PAN 2015 Author Identification Task.
Given a document, who wrote it?

This task focuses on authorship verification and on methods to answer the question whether two given documents have the same author or not. This question accurately emulates the real-world problem that most forensic linguists face every day.

You work together with Hosein Azarbonyad and Maarten Marx. In house we have a large collection of tweets, emails, political speeches and much more training and test material.
Expected skills of the student
* Python programming with text

Author Profiling

You participate in the PAN 2015 Author Profiling Task.
Given a document, what are its author’s traits?

This task is concerned with predicting an author’s demographics from her writing. For example, an author’s style may reveal her age and gender.

You work together with Hosein Azarbonyad and Maarten Marx. In house we have a large collection of tweets, emails, political speeches and much more training and test material.
Expected skills of the student
* Python programming with text

Social Book Search Lab

See the website for the task description. You participate in an international challenge. The deadline is in June.

Expected skills of the student
* Python programming with text
* IR
* recommendation systems

Further information

In the project we will turn the Dutch (or European Parliament) parliamentary proceedings into hyperdocuments. We will do this for current documents as well as for older documents (the corpus is complete from 1814 on and available at http://search.politicalmashup.nl/).
Adding links and connecting parliamentary proceedings to the Linked Open Data Cloud has been done by both applicants in the PoliticalMashup and PoliMedia projects. UvA has concentrated on linking named entities to biographies and DBpedia and on linking to internal documents (e.g., linking a vote to the document (motion, proposal for law, or amendment) that is being voted upon). VU has done similar linking for the modern EU proceedings and has connected the older Dutch proceedings to news articles from the KB news corpus.

We focus on linking news articles to proceedings as this task still needs additional work and in fact it uses linked named entities as part of its input. There are two types of linking tasks in this setting, directly connected to two types of news articles:
1. Articles that describe an event that happened in parliament and that is recorded in the proceedings (a vote, a debate, etc.).
2. Articles that describe an entity (event, person, organization) which is also being discussed in parliament (e.g., the disaster with MH17; Vestia; the Afghan interpreter who worked for the Dutch in Afghanistan and was refused asylum in the Netherlands).
The first task is related to known-item search: given a good description of the parliamentary event, find articles that are about that event. The task amounts to creating a complex, very high-precision query against a news database, with quality measured by mean reciprocal rank (MRR). We will use the EMM Newsexplorer database and modern proceedings for experiments and evaluation.
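
As an illustration of the MRR measure mentioned above, here is a minimal Python sketch. It is not part of the project code: the function name `mean_reciprocal_rank`, the article identifiers and the data layout (one ranked result list and one set of correct articles per parliamentary event) are all assumptions made for the example.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average, over all queries, of 1/rank of the first correct result.

    ranked_lists  -- one ranked list of article ids per query (hypothetical ids)
    relevant_sets -- one set of correct article ids per query
    A query with no correct result in its list contributes 0.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, article_id in enumerate(ranked, start=1):
            if article_id in relevant:
                total += 1.0 / rank
                break  # only the first correct hit counts
    return total / len(ranked_lists)


# Three hypothetical parliamentary events, each with a ranked list of articles.
rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
correct = [{"a"}, {"y"}, {"r"}]
mrr = mean_reciprocal_rank(rankings, correct)  # (1 + 1/2 + 0) / 3
```

Because only the position of the first correct article matters, MRR suits the known-item setting, where one good match per event is enough.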

The second task is more challenging, as can be seen from the third example, which concerns an entity that is identified only by a definite description. We also aim for high precision, but in order to have at least some recall we must take an IR approach and present a ranked list of candidates. Precision at 5 and precision at 10 are the appropriate evaluation measures.
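
Precision at k can be sketched in the same way. Again this is only an illustration under assumptions: the function name `precision_at_k` and the document identifiers are invented, and the sketch divides by k even when fewer than k candidates are returned, which is one common convention but not the only one.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked candidates that are relevant.

    ranked   -- ranked list of candidate ids (hypothetical ids)
    relevant -- set of ids judged relevant
    Divides by k even if fewer than k candidates were returned (an assumption).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for candidate in ranked[:k] if candidate in relevant)
    return hits / k


# One hypothetical query with six returned candidates, two of them relevant.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d4", "d9"}
p_at_5 = precision_at_k(ranked, relevant, 5)    # 2 hits in the top 5
p_at_10 = precision_at_k(ranked, relevant, 10)  # 2 hits, divided by 10
```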

As we aim for a system that can and will be used in practice, we cannot stop at a system-level evaluation. For a system that will make mistakes to be accepted by users, we must add high-quality explanations to our links. These explanations can be compared to the snippets on Google’s result page: often these are sufficient for users to discard wrong candidates and to steer them towards the most relevant candidate.
Our system of explanations will be evaluated by a user study.