PoliticalMashup and Parliamentary Data

Posted on 16-04-2014 by Maarten Marx | DiLiPaD, parliament

The PoliticalMashup project at the University of Amsterdam started collecting parliamentary proceedings in 2008. We started with Dutch proceedings and have since moved on to collecting proceedings from other European states as well.
Our aim is to transform all proceedings into a rich common XML format and store them in a single XML database system. With this database system we facilitate comparative diachronic research for historians, political scientists, linguists and communication scientists.
Currently we collect data from the Netherlands, the UK, Flanders, Germany, Denmark, Sweden and Norway.
The main problem in keeping the collection up to date and going back as far as possible is changing data formats. The content and layout of the proceedings are in general very stable over time, but the technical formats vary considerably, especially in the “digital era” (starting roughly around 1995). For older material, OCR errors and badly placed scans are major challenges. Besides these challenges with the texts, much work is needed to recognize, disambiguate and link political entities (speakers, parties, constituencies, ministerial functions, etc.) to existing databases.


After consultation with a panel of users consisting of scientists, journalists and archivists we decided to focus on the following aims:

  • create a complete copy of the proceedings of the meetings in parliament;
  • add metadata which records, for each word spoken in parliament, when it was said, who said it, in what role, on behalf of which party, and in which context. If possible, also indicate the type of speech act (e.g., speech from the central lectern, interruption of a speech, shout from the benches, etc.);
  • give each entity a unique identifier which is resolvable by a Handle system comparable to the DOI system; do this for real entities (persons, parties) and textual objects (proceedings, topics, speeches, paragraphs, votes, etc.);
  • use these identifiers to link data to existing databases and link the parliamentary data to the Linked Open Data Cloud.
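To make the second and third aim concrete, here is a minimal sketch of how metadata attached to a speech can be propagated to every word in it. The class, the field names and the identifier layout are hypothetical illustrations and not the PoliticalMashup XML schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Speech:
    """One contiguous utterance; every word in `text` inherits this metadata."""
    speech_id: str   # hypothetical identifier of the textual object
    speaker_id: str  # hypothetical identifier of the person
    party_id: str    # hypothetical identifier of the party
    role: str        # e.g. "mp", "minister", "chair"
    spoken_on: date
    context: str     # e.g. the topic under debate
    speech_act: str  # e.g. "speech", "interruption"
    text: str


def word_records(speech):
    """Yield one record per word, each with its own identifier and the speech metadata."""
    for position, word in enumerate(speech.text.split(), start=1):
        yield {
            "word_id": f"{speech.speech_id}.w{position}",
            "word": word,
            "speaker": speech.speaker_id,
            "party": speech.party_id,
            "role": speech.role,
            "date": speech.spoken_on.isoformat(),
            "context": speech.context,
            "speech_act": speech.speech_act,
        }


example = Speech("example.d1.s1", "example.person.1", "example.party.1", "mp",
                 date(2014, 4, 16), "oral questions", "interruption",
                 "Will the minister answer")
records = list(word_records(example))
```

Because each word identifier is derived from the identifier of its speech, a query for any word can be traced back to the speaker, party, role and date without storing that metadata more than once per speech.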

Available tools

Proceedings of the UK and the Netherlands are actively collected “until yesterday”. The collections start in 1935 (UK) and 1814 (NL) respectively. They can be downloaded and accessed through a Search Interface.


DiLiPaD’s Dutch principal investigators Jaap Kamps and Maarten Marx collaborate with the Information Office of the Dutch House of Representatives (Tweede Kamer), the Dutch Royal Library, the Dutch National Archive, the Dutch Documentation Centre for Political Parties, and scientists from the Humanities, Social and Computer Sciences.

Linking Hansards to related news articles

Posted on 15-04-2014 by Maarten Marx | DiLiPaD, ExPoSe, ODE, parliament

We describe a simple technique with which to link news articles to debates in Parliament.
The technique uses the news search engine EMM Newsexplorer.
As search strings we use

  • the date of the debate
  • the speakers
  • the first ten words from a unigram parsimonious language model created from the debate
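The three ingredients above can be sketched as follows. This is an illustration under stated assumptions, not the code from the post: the parsimonious model is estimated with the usual EM iteration against a background corpus, the mixing weight `lam`, the number of iterations and the function names are chosen here for the example, and the way the query string is submitted to the search engine is left out.

```python
from collections import Counter


def parsimonious_terms(debate_tokens, background_tokens, lam=0.1, iterations=20, top_n=10):
    """Return the top_n terms of a unigram parsimonious language model of the debate."""
    tf = Counter(debate_tokens)
    # Background counts include the debate itself, so every debate term has P(t|C) > 0.
    background = Counter(background_tokens) + tf
    background_total = sum(background.values())
    p_c = {t: background[t] / background_total for t in tf}

    # Start from the maximum-likelihood estimate of P(t|D).
    doc_total = sum(tf.values())
    p_d = {t: tf[t] / doc_total for t in tf}

    for _ in range(iterations):
        # E-step: expected count of each term that is explained by the debate model.
        e = {t: tf[t] * lam * p_d[t] / ((1 - lam) * p_c[t] + lam * p_d[t]) for t in tf}
        # M-step: renormalise the expected counts into probabilities.
        norm = sum(e.values())
        p_d = {t: e[t] / norm for t in tf}

    return sorted(p_d, key=p_d.get, reverse=True)[:top_n]


def build_query(debate_date, speakers, terms):
    """Join the date, the speaker names and the model terms into one search string."""
    return " ".join([debate_date, *speakers, *terms])


# Toy data: "the" is frequent in the debate but even more frequent in the background.
debate = ["the"] * 5 + ["pension"] * 3 + ["minister"] * 2
background = ["the"] * 1000 + ["minister"] * 50 + ["pension"] * 2 + ["other"] * 500
query = build_query("2014-04-15", ["Jansen"], parsimonious_terms(debate, background))
```

The point of the parsimonious model is that words which the background corpus already explains well lose probability mass in every iteration, so the highest-ranked terms are the ones specific to this debate.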

Results on oral questions are promising. In this post we explain how we find the relevant news articles and evaluate the results. Code is provided.
| read more…

UK Hansards in PoliticalMashup format

Posted on 03-04-2014 by Maarten Marx | data, DiLiPaD, parliament, Political Mashup
Debates of the House of Lords and House of Commons from 1935 until “yesterday” are available in the XML format developed within the PoliticalMashup project. The debates are available as one dump of XML files and through a rudimentary search interface.
All debates are available in XML, RDF and HTML formats, via a simple parameter:

| read more…

Digging into Parliamentary Data

Posted on 16-01-2014 by Maarten Marx | DiLiPaD, parliament, research

A consortium consisting of the University of Amsterdam, King’s College London and the University of Toronto was awarded a Digging into Data grant for the project Digging into Parliamentary Data.
| read more…

Household income in Georgian Asset Declarations

Posted on 24-12-2013 by Maarten Marx | parliament, TI

The Georgian Asset Declarations are a great source of open government data and a fantastic example of openness. Unfortunately they are not available in an easily machine-readable format, but only as PDF. Still, with some work one can do interesting analyses on them.
At TI Georgia, the income of public officials is an item of interest. In this post we report on our work in which we calculated the total household income of each public official.

In fact we calculated how much each member of the household contributes, expressed as a percentage, to the total household income. We calculate the income based on the questions about “entrepreneurial activity” and “paid work”. See the Asset Declaration of Kakhaber Chikhradze as an example.
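The calculation can be sketched as follows. The record layout and the figures are hypothetical and do not come from any declaration; the sketch also assumes that all amounts have already been converted to one currency.

```python
from collections import defaultdict


def income_shares(records):
    """Return each household member's share (in percent) of the total household income.

    `records` is a list of (member, source, amount) tuples, where source is
    "entrepreneurial activity" or "paid work" and all amounts share one currency.
    """
    per_member = defaultdict(float)
    for member, _source, amount in records:
        per_member[member] += amount
    total = sum(per_member.values())
    if total == 0:
        # No reported income at all: every share is zero.
        return {member: 0.0 for member in per_member}
    return {member: 100.0 * amount / total for member, amount in per_member.items()}


# Hypothetical household, for illustration only.
household = [
    ("official", "paid work", 10_000),
    ("spouse", "paid work", 20_000),
    ("spouse", "entrepreneurial activity", 10_000),
]
shares = income_shares(household)
```

Summing both income questions per person before dividing keeps the shares adding up to 100 percent for every household with a non-zero total.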

Kakhaber Chikhradze is a counterexample to the idea that Georgian men are macho: he works at the Ministry of Defense and earns a mere 6,300 GEL. His mother and wife earn 144K and 70K USD, respectively. Even his son earns more, with 16K USD. He is also the only one in the household without a car.
| read more…

Overview of Georgian passed laws

Posted on 28-11-2013 by Maarten Marx | parliament, TI

The Georgian Government has a convenient page for finding all passed laws.
TI Georgia created a text-scraper which collects all information from that page and turns it into a spreadsheet. The version of November 2013 is available as a Google Fusion Table.
For each law we also downloaded the accompanying file listing how each member voted and the final outcome, and turned these into two spreadsheets: outcomes, with one line for each law, and votesperlaw.csv.gz and votesperlaw.xml.gz, with one line for each vote of an MP.

The post below contains more information about this interesting dataset. Because we have aggregated data that was previously only available in separate files, we can do new analyses. For instance, we can count for each MP at how many of the votes he or she was present.
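As an illustration of such an analysis, the sketch below counts presence per MP from a table with one line per vote of an MP. The column names `mp`, `law_id` and `vote`, and the value `absent`, are assumptions made for this example; the actual layout of votesperlaw.csv.gz may differ.

```python
import csv
import io
from collections import Counter


def presence_counts(rows, absent_values=("absent",)):
    """Return {mp: (votes_present, votes_total)} from one row per vote of an MP."""
    present = Counter()
    total = Counter()
    for row in rows:
        total[row["mp"]] += 1
        if row["vote"].strip().lower() not in absent_values:
            present[row["mp"]] += 1
    return {mp: (present[mp], total[mp]) for mp in total}


# Hypothetical sample in the assumed layout.
sample = io.StringIO(
    "law_id,mp,vote\n"
    "1,MP A,yes\n"
    "1,MP B,absent\n"
    "2,MP A,no\n"
    "2,MP B,yes\n"
    "3,MP A,absent\n"
)
counts = presence_counts(csv.DictReader(sample))
```

For the compressed file, the same function could be fed a `csv.DictReader` over `gzip.open(path, "rt", newline="")` once the real column names are known.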

The analysis is rather crude and reflects the noise present in the data. We left the noise in on purpose.

| read more…

Age difference in marriage in Georgian Parliament

Posted on 13-11-2013 by Maarten Marx | parliament, TI, xquery

To show the wealth of TI Georgia’s database of Asset Declarations of Public Officials we created a spreadsheet containing all couples in which one partner works in the Parliament of Georgia. There are 239 such couples in the spreadsheet.

| read more…

Government=positive, opposition=negative

Posted on 21-10-2013 by Maarten Marx | ODE, parliament, Political Mashup

Linguistic research by Graeme Hirst and his students shows that members of governing parties can be distinguished well from members of the opposition by looking at the polarity of their language.
PoliticalMashup replicated that research for the Tweede Kamer (the Dutch House of Representatives). We looked at all proceedings (Handelingen) since the Rutte II cabinet took office in November 2012.
From these we extracted only the adjectives and compared them with a word list that gives the sentiment of many adjectives (available from the Taalunie).

What did we find? In the Netherlands too, both cabinet members and members of the governing parties avoid negative words. Members of the opposition, by contrast, use negative terms much more freely.
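A comparison of this kind can be sketched as follows. The tiny lexicon, its scores and the adjective lists are invented for the example; the sketch only assumes a lexicon that maps an adjective to a signed sentiment score, and it ignores adjectives that the lexicon does not cover.

```python
def negative_share(adjectives, lexicon):
    """Return the fraction of lexicon-covered adjectives that have a negative score."""
    scores = [lexicon[a] for a in adjectives if a in lexicon]
    if not scores:
        return 0.0
    return sum(1 for s in scores if s < 0) / len(scores)


# Hypothetical lexicon: positive scores for positive adjectives, negative for negative ones.
toy_lexicon = {"goed": 2, "mooi": 3, "slecht": -3}

# Hypothetical adjectives extracted from the speeches of two groups.
opposition = ["slecht", "slecht", "goed", "groot"]  # "groot" is not in the lexicon
government = ["goed", "mooi", "slecht", "goed"]

opposition_share = negative_share(opposition, toy_lexicon)
government_share = negative_share(government, toy_lexicon)
```

A group that avoids negative words ends up with a lower share; comparing the shares of the opposition, the governing parties and the cabinet members gives a first impression of the difference in polarity.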

The top 30 words per group

We only show adjectives that have a sentiment score. The score ranges from one to four plus signs or minus signs. For clarity we have attached the score to each word.

Click on a cloud to enlarge it. Hover your mouse over a cloud to see which group it belongs to.

Wordle: MP opposition 12-13 Wordle: MP VVD & PvdA 12-13 Wordle: Govn Members 12-13
| read more…

Hilarity in the House

Posted on 02-10-2013 by Maarten Marx | ODE, parliament

For some time now the Dutch proceedings (Handelingen) have also contained markers for non-verbal events such as drumming on the benches or hilarity. The German proceedings have done this for much longer, and there we often find the lovely term Heiterkeit, which can be translated as gladness, joviality, cheerfulness, high spirits, joy or merriment.

We looked at the General Political Debate (Algemene Beschouwingen) of Wednesday 25 September to see who gave rise to laughter or drumming on the benches, and with which statement.
The list is available as a spreadsheet on Google Fusion Table.

Zip file with the Handelingen data and the XQuery script used to create the CSV.

Handelingen 2012-2013 available in FoLiA format in DANS Easy

Posted on 30-09-2013 by Maarten Marx | data, ODE, parliament

The proceedings of the Dutch parliament (Handelingen der Staten Generaal) from the parliamentary year 2012-2013, in XML format following the proceedings.rncl schema, have been made available via DANS Easy under the identifier urn:nbn:nl:ui:13-l67p-ty.
The text of each paragraph in the proceedings has been split into sentences and then into words.
Each word has been annotated with its lemma and part of speech. Named entities have been recognized and linked to their most likely Wikipedia pages.
This information is stored in the FoLiA format and was produced with the Frog software.
