PoliticalMashup and Parliamentary Data

Posted on 16-04-2014 by Maarten Marx | DiLiPaD, parliament | No comments

The PoliticalMashup project at the University of Amsterdam started collecting parliamentary proceedings in 2008. We started with Dutch proceedings and have since moved to collecting proceedings from other European states as well.
Our aim is to transform all proceedings into a rich common XML format and store them in a single XML database system. With this database system we facilitate comparative diachronic research for historians, political scientists, linguists and communication scientists.
Currently we collect data from the Netherlands, the UK, Flanders, Germany, Denmark, Sweden and Norway.
The main problem in keeping the collection up to date and extending it as far back as possible is changing data formats. The content and layout of the proceedings are in general very stable over time, but the technical formats vary considerably, especially in the “digital era” (starting roughly around 1995). For older material, OCR errors and badly placed scans are major challenges. Besides these challenges with the texts, much work is needed to recognize, disambiguate and link political entities (speakers, parties, constituencies, ministerial functions, etc.) to existing databases.


After consultation with a panel of users consisting of scientists, journalists and archivists we decided to focus on the following aims:

  • create a complete copy of the proceedings of the meetings in parliament;
  • add metadata that records, for each word spoken in parliament, when it was said, who said it, in what role, on behalf of which party, and in which context. If possible, also indicate the type of speech act (e.g., speech from the central lectern, interruption of a speech, shout from the benches);
  • give each entity a unique identifier which is resolvable by a Handle system comparable to the DOI system; do this for real entities (persons, parties) and textual objects (proceedings, topics, speeches, paragraphs, votes, etc);
  • use these identifiers to link data to existing databases and link the parliamentary data to the Linked Open Data Cloud.
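As a minimal sketch of the second aim, the record below shows one way the per-word metadata could be represented. The class, field names and role values are illustrative assumptions, not the PoliticalMashup XML schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SpeechContext:
    """Metadata shared by every word of one speech (hypothetical field names)."""
    speech_id: str                     # unique identifier of the speech
    spoken_on: date                    # when it was said
    speaker_id: str                    # who said it
    role: str                          # in what role, e.g. "mp" or "minister"
    party_id: Optional[str]            # on behalf of which party, if any
    topic_id: str                      # in which context (debate or agenda item)
    speech_act: Optional[str] = None   # e.g. "speech", "interruption", "shout"


def annotate_words(text: str, context: SpeechContext) -> List[Tuple[str, SpeechContext]]:
    """Pair each whitespace-separated word with the metadata of its speech."""
    return [(word, context) for word in text.split()]
```

Storing the metadata once per speech and attaching it to each word by reference keeps the record small while still answering "who said this word, when, and in what role".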

Available tools

Proceedings of the UK and the Netherlands are actively collected “until yesterday”. The collections start in 1935 (UK) and 1814 (NL) respectively. They can be downloaded and accessed through a Search Interface.


DiLiPaD’s Dutch principal investigators Jaap Kamps and Maarten Marx collaborate with the Information Office of the Dutch House of Commons, the Dutch Royal Library, the Dutch National Archive, the Dutch Documentation Centre for Political Parties, and scientists from the Humanities, Social and Computer Sciences.

Linking Hansards to related news articles

Posted on 15-04-2014 by Maarten Marx | DiLiPaD, ExPoSe, ODE, parliament | No comments

We describe a simple technique with which to link news articles to debates in Parliament.
The technique uses the news search engine EMM Newsexplorer.
As search strings we use

  • the date of the debate
  • the speakers
  • the first ten words from a unigram parsimonious language model created from the debate

Results on oral questions are promising. In this post we explain how we find the relevant news articles and evaluate the results. Code is provided.
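As a rough illustration of how such search strings could be assembled (this is not the code from the post), the sketch below estimates a unigram parsimonious language model with a standard EM loop and takes its top terms. The function names, the mixing weight `lam`, the add-one smoothing of the background model and the ISO date format are all assumptions; how the strings are submitted to EMM Newsexplorer is left out.

```python
from collections import Counter
from datetime import date
from typing import Dict, List


def parsimonious_top_terms(
    debate_tokens: List[str],
    background_counts: Dict[str, int],
    k: int = 10,
    lam: float = 0.1,      # weight of the debate model (assumed value)
    iterations: int = 50,
) -> List[str]:
    """Return the k terms with the highest parsimonious-model probability."""
    tf = Counter(debate_tokens)
    bg_total = sum(background_counts.values())
    vocab_size = len(set(background_counts) | set(tf))

    def p_background(term: str) -> float:
        # Add-one smoothing so unseen terms never get probability zero.
        return (background_counts.get(term, 0) + 1) / (bg_total + vocab_size)

    total = sum(tf.values())
    p_doc = {t: c / total for t, c in tf.items()}  # maximum-likelihood start
    for _ in range(iterations):
        # E-step: expected count of each term explained by the debate model.
        expected = {
            t: tf[t] * lam * p_doc[t] / ((1 - lam) * p_background(t) + lam * p_doc[t])
            for t in tf
        }
        # M-step: renormalise the expected counts.
        norm = sum(expected.values())
        p_doc = {t: e / norm for t, e in expected.items()}
    return sorted(p_doc, key=lambda t: (-p_doc[t], t))[:k]


def build_search_strings(debate_date: date, speakers: List[str], terms: List[str]) -> List[str]:
    """Combine date, speakers and top terms into one list of search strings."""
    return [debate_date.isoformat(), *speakers, *terms]
```

The EM loop shifts probability mass away from words that the background collection already explains well, so frequent function words drop out and debate-specific terms rise to the top.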
| read more…

War in Parliament: What a Digital Approach Can Add to the Study of Parliamentary History

Posted on 15-04-2014 by Maarten Marx | results | No comments

The article War in Parliament: What a Digital Approach Can Add to the Study of Parliamentary History by Hinke Piersma, Ismee Tames (both NIOD), Lars Buitinck, Johan van Doornik and Maarten Marx (all Informatics Institute, UvA) has been published in Digital Humanities Quarterly.
| read more…

Learning to classify election manifestos

Posted on 15-04-2014 by Maarten Marx | DiLiPaD, Political Mashup, results | No comments

The article Automatic thematic classification of election manifestos by Suzan Verberne, Eva D’hondt, Antal van den Bosch and Maarten Marx has been published in Information Processing & Management (Volume 50, Issue 4, July 2014, Pages 554–567).
| read more…

DiLiPaD on Twitter

Posted on 04-04-2014 by Maarten Marx | DiLiPaD | No comments

The DiLiPaD project is on Twitter at https://twitter.com/parl_data.
DiLiPaD blogs at http://dilipad.history.ac.uk/.

UK Hansards in PoliticalMashup format

Posted on 03-04-2014 by Maarten Marx | data, DiLiPaD, parliament, Political Mashup | No comments
Debates of the House of Lords and House of Commons from 1935 until “yesterday” are available in the XML format developed within the PoliticalMashup project. The debates are available as one dump of XML files and through a rudimentary search interface.
All debates are available in XML, RDF and HTML formats, via a simple parameter:

| read more…

Digging into Parliamentary Data

Posted on 16-01-2014 by Maarten Marx | DiLiPaD, parliament, research | No comments

A consortium consisting of the University of Amsterdam, King’s College London and the University of Toronto was awarded a Digging into Data grant for the project Digging into Parliamentary Data.
| read more…

Household income in Georgian Asset Declarations

Posted on 24-12-2013 by Maarten Marx | parliament, TI | No comments

The Georgian Asset Declarations are a great source of open government data and a fantastic example of openness. Unfortunately they are not available in an easily machine-readable format, but only as PDF. Still, with some work one can do interesting analyses on them.
At TI Georgia, the income of public officials is an item of interest. In this post we report on our work in which we calculated the total household income of each public official.

In fact we calculated how much each member of the household contributes, expressed as a percentage, to the total household income. We calculated the income based on the questions about “entrepreneurial activity” and “paid work”. See the Asset Declaration of Kakhaber Chikhradze as an example.
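The percentage calculation can be sketched as follows. This is a minimal illustration, not the script used at TI Georgia: the function name and the record layout (member, income source, amount) are assumptions, and all amounts are assumed to have been converted to a single currency beforehand.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def household_shares(records: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Return each household member's share of total income, in percent.

    Each record is (member, income_source, amount); amounts must already be
    expressed in one currency.
    """
    per_member: Dict[str, float] = defaultdict(float)
    for member, _source, amount in records:
        # Sum "paid work" and "entrepreneurial activity" per member.
        per_member[member] += amount
    total = sum(per_member.values())
    if total <= 0:
        raise ValueError("total household income must be positive")
    return {member: 100.0 * amount / total for member, amount in per_member.items()}
```

Because declarations may list incomes in different currencies, the conversion step matters: shares computed on mixed GEL and USD amounts would be meaningless.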

Kakhaber Chikhradze is a counterexample to the idea that Georgian men are macho: he works at the Ministry of Defense and earns a mere 6,300 GEL. His mother and wife earn 144K and 70K USD, respectively. Even his son earns more, at 16K USD. He is also the only one in the household without a car.
| read more…

Georgian First Names

Posted on 05-12-2013 by Maarten Marx | data, TI | No comments

Ani Elchishvili from TI Georgia created a spreadsheet listing over 1500 Georgian first names together with their gender. For 1144 of them, she also found a specific webpage (often on Wikipedia) about the name.
The data is available as a tab-separated CSV spreadsheet and also as an HTML table.

| read more…

Overview of Georgian passed laws

Posted on 28-11-2013 by Maarten Marx | parliament, TI | No comments

The Georgian Government has a convenient page for finding all passed laws.
TI Georgia created a text-scraper which collects all information from that page and turns it into a spreadsheet. The version of November 2013 is available as a Google Fusion Table.
We also downloaded, for each law, the accompanying file listing how each member voted and the final outcome, and turned these into two spreadsheets: outcomes, with one line for each law, and votesperlaw.csv.gz and votesperlaw.xml.gz, with one line for each vote of an MP.

The post below contains more information about this interesting dataset. Because we have aggregated data that was previously available only in separate files, we can do new analyses. For instance, we can count, for each MP, the number of votes at which he or she was present.
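A presence count of this kind could look roughly like the sketch below. The column names `mp` and `vote` and the value `absent` are assumptions made for illustration; the actual layout of votesperlaw.csv.gz may differ.

```python
import csv
import io
from typing import Dict, Iterable, Mapping


def count_presence(
    rows: Iterable[Mapping[str, str]],
    mp_field: str = "mp",          # assumed column holding the MP's name
    vote_field: str = "vote",      # assumed column holding the recorded vote
    absent_value: str = "absent",  # assumed marker for a missed vote
) -> Dict[str, int]:
    """Count, for each MP, the number of votes at which the MP was present."""
    present: Dict[str, int] = {}
    for row in rows:
        mp = row[mp_field]
        present.setdefault(mp, 0)  # MPs who never attended still appear, with 0
        if row[vote_field].strip().lower() != absent_value:
            present[mp] += 1
    return present


if __name__ == "__main__":
    # Tiny made-up sample with one line per vote of an MP.
    sample = "law,mp,vote\n1,A,yes\n1,B,absent\n2,A,no\n2,B,yes\n"
    print(count_presence(csv.DictReader(io.StringIO(sample))))
```

For the compressed file, the same function could be fed from `csv.DictReader` over `gzip.open(path, "rt", newline="")`, again assuming the column names match.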

The analysis is rather crude and reflects the noise present in the data. We left the noise in on purpose.

| read more…
