Thesis Topics 2015

Posted on 28-10-2014 by Maarten Marx | DiLiPaD, eXist, education

This post lists possible thesis topics for 2015 in the context of the DiLiPaD and ExPoSe projects.
The topics are suitable for a Bachelor's or Master's thesis in Information Science or Artificial Intelligence at the UvA.

Talk of Europe

Posted on 06-10-2014 by Maarten Marx | DiLiPaD

Maarten Marx gives a presentation about DiLiPaD at the Talk of Europe Creative Camp on October 8, 2014 in Hilversum, The Netherlands.


DiLiPaD is looking for a scientific programmer

Posted on 17-09-2014 by Maarten Marx | DiLiPaD


DiLiPaD is a “Digging into Data” project in which the UvA collaborates with the University of Toronto, King’s College London and the British Parliament. Its goal is to enable comparative research based on the proceedings of the parliaments of Canada, the Netherlands and the UK.
The aim is to make at least 100, and preferably 200, years of proceedings from each country available in complete, well-structured digital form. Historians and political scientists will study how these three countries have dealt with immigration over the past 100 years. Computer scientists and linguists will support them in this work.
More information can be found on the DiLiPaD blog.

We are looking for

A Master's student in AI or computer science with an interest in machine learning, information retrieval and natural language processing. You are skilled at scripting, work under Linux, can set up and maintain a LAMP system, and enjoy building something that other people will actually use.

Appointment: 1 day per week, for at most 2 years.
Where: Informatics Institute, Science Park, ILPS group.
Project: DiLiPaD (Digging into Linked Parliamentary Data)

We offer

  • work in a lively international academic environment
  • a project at the heart of current affairs
  • if desired, collaboration with the Dutch House of Representatives (Tweede Kamer) and/or the National Archives (Nationaal Archief)
  • the opportunity to visit Toronto
  • work on heavy machines (48 GB RAM)


Send your CV with a short cover letter to
For further information, you can also contact Maarten Marx.


26 September 2014

Dutch parliamentary data in RDF format

Posted on 12-08-2014 by Maarten Marx | data, DiLiPaD, ExPoSe, Political Mashup

The Dutch parliamentary data created by PoliticalMashup is now also available in RDF format, and can be queried from a SPARQL endpoint.
The dataset is enriched with recognized named entities, which are linked to DBpedia and Wikipedia.
More information can be found in the ParlBench paper and this description of a benchmark experiment.


The SPARQL endpoint contains only 1% of the data. To query the full dataset with SPARQL, you must download it and load it into your own endpoint. The full post includes a short description of how to do that.
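Once the dump is loaded into a local endpoint, querying it follows the standard SPARQL 1.1 Protocol. The sketch below, using only Python's standard library, builds a query-via-GET request that asks for results in the SPARQL JSON results format and turns the response into plain dictionaries. The endpoint address is a hypothetical local setup, and because the post does not spell out the dataset's vocabulary, the example query is vocabulary-agnostic: it simply counts the most frequent predicates, which is a reasonable first step for exploring an unfamiliar RDF dump.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical address of a locally installed SPARQL endpoint into which the
# downloaded RDF dump has been loaded; adjust to your own setup.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"

# Vocabulary-agnostic first query: which predicates occur most often?
PREDICATE_COUNTS = """SELECT ?p (COUNT(*) AS ?n)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?n)
LIMIT 10"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(json_text):
    """Turn a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(json_text)
    variables = doc["head"]["vars"]
    # Unbound variables are simply absent from a binding, so skip them.
    return [
        {v: row[v]["value"] for v in variables if v in row}
        for row in doc["results"]["bindings"]
    ]
```

To run the query against a live endpoint, pass the request to `urllib.request.urlopen` and hand `response.read().decode("utf-8")` to `parse_bindings`. Keeping request construction and result parsing separate makes both easy to test without a running server.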



From Text to Political Positions. Text analysis across disciplines

Posted on 13-05-2014 by Maarten Marx | DiLiPaD, ExPoSe

The book From Text to Political Positions. Text analysis across disciplines, edited by Bertie Kaal, Isa Maks and Annemarie van Elfrinkhof, has recently been published.
It contains chapters by DiLiPaD researchers Graeme Hirst and Maarten Marx.


From Text to Political Positions addresses cross-disciplinary innovation in political text analysis for party positioning. Drawing on political science, computational methods and discourse analysis, it presents a diverse collection of analytical models, including purely quantitative and purely qualitative approaches. By bringing together the prevailing text-analysis methods from each discipline, the volume aims to alert researchers to new and exciting possibilities for text analysis beyond their own disciplinary boundaries.
The volume builds on the fact that each of the disciplines has a common interest in extracting information from political texts. The focus on political texts thus facilitates interdisciplinary cross-overs. The volume also includes chapters combining methods as examples of cross-disciplinary endeavours. These chapters present an open discussion of the constraints and (dis)advantages of either quantitative or qualitative methods when evaluating the possibilities of combining analytic tools.


Digging into Parliamentary Data Tutorial

Posted on 08-05-2014 by Maarten Marx | DiLiPaD, ExPoSe

Jaap Kamps and Maarten Marx (both University of Amsterdam) will give a tutorial called Digging into Parliamentary Data at Digital Libraries 2014 in London.
Date: Monday, 8 September 2014.

The tutorial will show how to annotate textual corpora incrementally, from OCR’ed flat text to encoded structure and entities, and will demonstrate the remarkable power of such lightweight semantic annotation: linear text becomes valuable research data that can be sliced and diced to present many views.
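As a rough illustration of the first annotation steps, the sketch below takes OCR’ed flat text, repairs words hyphenated across line breaks, and splits the text into speeches by detecting speaker labels. The label format (a capitalised name, optionally followed by a party in parentheses, then a colon) is a hypothetical convention chosen for this example, not the layout of any particular parliament; real OCR output needs considerably more robust rules.

```python
import re

# Hypothetical speaker-label convention: "Name (Party): text" or "Name: text"
# at the start of a line. Real proceedings use other, often messier, layouts.
SPEAKER_LINE = re.compile(
    r"^(?P<speaker>[A-Z][^:()\n]{0,60}?)(?:\s*\((?P<party>[^)]+)\))?:\s*(?P<text>.*)$"
)


def dehyphenate(text):
    """Join words split across line ends, e.g. 'immi-\\ngration' -> 'immigration'.

    Note: this also joins genuinely hyphenated compounds that happen to break
    at a line end; a real pipeline would check candidates against a lexicon.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def segment_speeches(flat_text):
    """Split flat text into a list of {'speaker', 'party', 'text'} dicts."""
    speeches = []
    for line in dehyphenate(flat_text).splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            # A new speaker label opens a new speech.
            speeches.append({
                "speaker": match.group("speaker").strip(),
                "party": match.group("party"),
                "text": match.group("text"),
            })
        elif speeches:
            # Any other line continues the current speech; text before the
            # first speaker label is ignored in this sketch.
            speeches[-1]["text"] = (speeches[-1]["text"] + " " + line).strip()
    return speeches
```

Each later step (linking speakers to person records, tagging entities) can then work on these structured speeches instead of on raw lines, which is the incremental idea the tutorial describes.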

This tutorial is part of the Digging into Linked Parliamentary Data (DiLiPaD) and Exploratory Political Search (ExPoSe) projects.

Text of tutorial proposal.
Additional tutorial material (Slides, links, etc).

PoliticalMashup and Parliamentary Data

Posted on 16-04-2014 by Maarten Marx | DiLiPaD, parliament

The PoliticalMashup project at the University of Amsterdam started collecting parliamentary proceedings in 2008. We began with Dutch proceedings and have since expanded to proceedings from other European states as well.
Our aim is to transform all proceedings into a rich common XML format and to store them in a single XML database system. With this database system we facilitate comparative diachronic research by historians, political scientists, linguists and communication scientists.
Currently we collect data from The Netherlands, the UK, Flanders, Germany, Denmark, Sweden and Norway.
The main problem in keeping the collection up to date, and in extending it as far back as possible, is changing data formats. The content and layout of the proceedings are generally very stable over time, but the technical formats differ greatly, especially in the “digital era” (starting roughly around 1995). For older material, OCR errors and badly placed scans are major challenges. Beyond these challenges with the texts themselves, much work is needed to recognize, disambiguate and link political entities (speakers, parties, constituencies, ministerial functions, etc.) to existing databases.


After consultation with a panel of users consisting of scientists, journalists and archivists we decided to focus on the following aims:

  • create a complete copy of the proceedings of the meetings in parliament;
  • add metadata that records, for each word spoken in parliament, when it was said, who said it, in what role, on behalf of which party, and in which context. Where possible, also indicate the type of speech act (e.g., a speech from the central lectern, an interruption of a speech, a shout from the benches);
  • give each entity a unique identifier which is resolvable by a Handle system comparable to the DOI system; do this for real entities (persons, parties) and textual objects (proceedings, topics, speeches, paragraphs, votes, etc);
  • use these identifiers to link data to existing databases and link the parliamentary data to the Linked Open Data Cloud.
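A minimal sketch of what such a record could look like for a single speech, using Python's standard `xml.etree.ElementTree`. The element and attribute names, the dotted identifier scheme and the resolver URL are all hypothetical illustrations of the aims listed above; they are not the PoliticalMashup XML format or an actual Handle configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical base URL of a Handle-style resolver; not a real service.
RESOLVER = "https://resolver.example.org/"


def make_id(*parts):
    """Hypothetical identifier scheme: join the parts with dots."""
    return ".".join(str(p) for p in parts)


def speech_element(collection, date, number, speech):
    """Wrap one speech in an XML element carrying who/when/role/party/type metadata."""
    sid = make_id(collection, date, "speech", number)
    attrs = {
        "id": sid,
        "resolve": RESOLVER + sid,                 # where the identifier would resolve
        "date": date,                              # when it was said
        "speaker": speech.get("speaker"),          # who said it
        "speaker-ref": speech.get("speaker_ref"),  # link to an external person record
        "role": speech.get("role"),                # in what role (e.g. MP, minister, chair)
        "party": speech.get("party"),              # on behalf of which party
        "party-ref": speech.get("party_ref"),      # link to an external party record
        "type": speech.get("type"),                # speech act type, e.g. "interruption"
    }
    # ElementTree cannot serialise None, so leave out attributes that are unknown.
    el = ET.Element("speech", {k: v for k, v in attrs.items() if v is not None})
    # Paragraphs get their own identifiers derived from the speech identifier.
    for i, text in enumerate(speech.get("paragraphs", []), start=1):
        p = ET.SubElement(el, "p", {"id": make_id(sid, "p", i)})
        p.text = text
    return el
```

Deriving paragraph identifiers from the speech identifier keeps every textual object addressable, while the `-ref` attributes are where links to external databases and the Linked Open Data Cloud would go.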

Available tools

Proceedings of the UK and the Netherlands are collected continuously, up to the previous day (“until yesterday”). The collections start in 1935 (UK) and 1814 (NL), respectively. They can be downloaded and accessed through a Search Interface.


DiLiPaD’s Dutch principal investigators Jaap Kamps and Maarten Marx collaborate with the Information Office of the Dutch House of Representatives, the Dutch Royal Library, the Dutch National Archives, the Dutch Documentation Centre for Political Parties, and scientists from the humanities, social sciences and computer science.

Linking Hansards to related news articles

Posted on 15-04-2014 by Maarten Marx | DiLiPaD, ExPoSe, ODE, parliament

We describe a simple technique with which to link news articles to debates in Parliament.
The technique uses the news search engine EMM Newsexplorer.
As search strings we use

  • the date of the debate
  • the speakers
  • the first ten words from a unigram parsimonious language model created from the debate

Results on oral questions are promising. In this post we explain how we find the relevant news articles and how we evaluate the results. Code is provided.
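The third ingredient, the top words of a parsimonious language model, can be sketched as follows. This assumes the usual EM formulation, in which each term in the debate is explained by a mixture of a debate-specific model and a background (collection) model; EM shifts probability mass away from terms that the background already explains well, so function words drop out and debate-specific vocabulary remains. The mixing weight, iteration count, pruning threshold and add-one smoothing are illustrative choices, not the settings used in the post, and the sketch does not call EMM NewsExplorer, whose query interface is not described here.

```python
from collections import Counter


def parsimonious_lm(doc_tokens, background_counts, lam=0.1, iterations=20, threshold=1e-4):
    """Estimate a parsimonious unigram model P(t|D) with EM.

    lam is the weight of the document model against the background model;
    background_counts holds term counts over the whole collection.
    """
    tf = Counter(doc_tokens)
    bg_total = sum(background_counts.values())
    # Crude add-one smoothing so every document term has P(t|C) > 0.
    p_c = {t: (background_counts.get(t, 0) + 1) / (bg_total + len(tf)) for t in tf}
    doc_len = sum(tf.values())
    p_d = {t: c / doc_len for t, c in tf.items()}  # start from maximum likelihood
    for _ in range(iterations):
        # E-step: expected count of t generated by the document model
        # rather than by the background model.
        e = {}
        for t, p in p_d.items():
            num = lam * p
            e[t] = tf[t] * num / (num + (1 - lam) * p_c[t])
        # M-step: renormalise, drop terms below the threshold, renormalise again.
        z = sum(e.values())
        p_d = {t: v / z for t, v in e.items() if v / z >= threshold}
        z = sum(p_d.values())
        p_d = {t: v / z for t, v in p_d.items()}
    return p_d


def top_terms(model, k=10):
    """Highest-probability terms first; ties broken alphabetically for reproducibility."""
    return [t for t, _ in sorted(model.items(), key=lambda item: (-item[1], item[0]))[:k]]


def search_terms(date, speakers, debate_tokens, background_counts, k=10):
    """Collect the three ingredients listed above: date, speakers and top-k model terms."""
    return {
        "date": date,
        "speakers": list(speakers),
        "terms": top_terms(parsimonious_lm(debate_tokens, background_counts), k),
    }
```

With a background in which words like "the" are very frequent, those words end up at the bottom of the model (or are pruned), while rare, debate-specific terms rise to the top and become the search words.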

Learning to classify election manifestos

Posted on 15-04-2014 by Maarten Marx | DiLiPaD, Political Mashup, results

The article Automatic thematic classification of election manifestos by Suzan Verberne, Eva D’hondt, Antal van den Bosch and Maarten Marx has been published in Information Processing & Management (Volume 50, Issue 4, July 2014, Pages 554–567).

DiLiPaD on Twitter

Posted on 04-04-2014 by Maarten Marx | DiLiPaD

The DiLiPaD project is on Twitter,
DiLiPaD blogs at
