Dutch parliamentary data in RDF format

Posted on 12-08-2014 by Maarten Marx | data, DiLiPaD, ExPoSe, Political Mashup

The Dutch parliamentary data created by PoliticalMashup is now also available in RDF format, and can be queried from a SPARQL endpoint.
The dataset is enriched with recognized named entities which are linked to DBpedia and Wikipedia.
More information can be found in the ParlBench paper and this description of a benchmark experiment.
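A SPARQL endpoint can be queried over plain HTTP. The Python sketch below follows the SPARQL 1.1 Protocol (a GET request with a `query` parameter) and flattens the answer, which is assumed to come back in the standard SPARQL JSON results format. The endpoint URL is an assumption (Virtuoso's default local address). The query is deliberately generic and assumes nothing about the dataset's vocabulary.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: Virtuoso serves SPARQL at /sparql on port 8890 by
# default. Replace this with the endpoint you actually use.
ENDPOINT = "http://localhost:8890/sparql"

# Generic query; it does not rely on any ParlBench-specific predicates.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def fetch(endpoint: str, query: str) -> dict:
    """Send the request and decode the JSON response (needs a live endpoint)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return json.load(response)


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} dictionaries.

    Unbound variables are simply left out of a row.
    """
    variables = results["head"]["vars"]
    return [
        {v: b[v]["value"] for v in variables if v in b}
        for b in results["results"]["bindings"]
    ]
```

For example, `rows(fetch(ENDPOINT, QUERY))` would return up to ten triples as dictionaries, provided an endpoint is reachable at that address.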


The SPARQL endpoint contains only 1% of the data. To query the complete dataset with SPARQL, you must download it and load it into your own endpoint. Below is a short description of how to do that.



First, make sure the Virtuoso server is up and running.
Unpack the data located in /pm-rdf/data/ and copy it to a subfolder of a directory listed in the DirsAllowed parameter of the virtuoso.ini configuration file (see above).
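Virtuoso only reads files from directories listed in DirsAllowed, so it can be worth checking the target folder before copying the data. The Python sketch below does that. It assumes DirsAllowed sits in the [Parameters] section as a comma-separated list, and that relative entries such as "." are resolved against the server's working directory, passed in as base_dir. Both function names are hypothetical.

```python
import configparser
import os


def dirs_allowed(ini_text: str, base_dir: str) -> list[str]:
    """Return the DirsAllowed entries from virtuoso.ini text as absolute paths.

    Relative entries are resolved against base_dir (assumed to be the
    directory the server runs from).
    """
    # interpolation=None: values may contain '%'.
    # strict=False: tolerate duplicate keys instead of raising.
    # Option names are matched case-insensitively by configparser.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    raw = parser.get("Parameters", "DirsAllowed", fallback="")
    return [
        os.path.abspath(os.path.join(base_dir, entry.strip()))
        for entry in raw.split(",")
        if entry.strip()
    ]


def is_allowed(path: str, allowed: list[str]) -> bool:
    """True if path is an allowed directory or lies somewhere below one."""
    target = os.path.abspath(path)
    return any(os.path.commonpath([target, root]) == root for root in allowed)
```

Comparing with `os.path.commonpath` rather than a string prefix avoids treating a sibling such as /data/pm-rdf-other as if it were inside /data/pm-rdf.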

Then run the loadRDF.sh script, which selects 1% of the data and uploads it to Virtuoso.
(If Virtuoso is installed somewhere other than above, change the ISQL_CMD= parameter at the beginning of the script.)
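The statements that loadRDF.sh sends through ISQL_CMD are not described here. For orientation, Virtuoso's own bulk loader is typically driven by registering files with ld_dir (or ld_dir_all, which also descends into subfolders such as the year folders), then calling rdf_loader_run() and checkpoint. The Python sketch below only generates such statements. It is not the script's actual method, and the file mask and graph IRIs in the example are placeholders.

```python
def sql_literal(value: str) -> str:
    """Quote a value as an SQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def bulk_load_script(folders: dict[str, str], mask: str) -> str:
    """Build isql statements that register folders and run the bulk loader.

    folders maps each input folder to the graph IRI its files should go
    into; mask is a file-name pattern such as '*.rdf' (an assumption).
    """
    lines = [
        # ld_dir_all registers matching files in the folder and below it.
        f"ld_dir_all({sql_literal(folder)}, {sql_literal(mask)}, {sql_literal(graph)});"
        for folder, graph in folders.items()
    ]
    lines.append("rdf_loader_run();")  # load everything registered above
    lines.append("checkpoint;")        # persist the loaded data
    return "\n".join(lines) + "\n"
```

The resulting text could then be passed to the isql client that ISQL_CMD points at; the Virtuoso bulk-loading documentation describes the exact invocation for a given installation.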

From the loadRDF.sh usage:

# The script requires writing permissions to the directory where it is executed.
# The script requires the input RDF files to be in $INPUT_FOLDER directory in their corresponding folders:
# - 'members' for members
# - 'parties' for parties
# - 'proceedings' with 'year' folders inside for proceedings
# - 'paragraphs' with 'year' folders inside for paragraphs
# - 'tagged-entities' with 'year' folders inside for tagged entities
# Input:
# -----
# This script takes one parameter:
# $1 - absolute path to the folder with RDF files.
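Because the script expects a fixed folder layout, it can help to verify $INPUT_FOLDER before starting a long load. The Python sketch below checks for the folders named in the usage text and that the path is absolute. It assumes year folders are named with digits only (for example 1995), which the usage text does not state. check_layout is a hypothetical helper.

```python
import os

# Folder names taken from the loadRDF.sh usage text.
FLAT = ["members", "parties"]
BY_YEAR = ["proceedings", "paragraphs", "tagged-entities"]


def check_layout(input_folder: str) -> list[str]:
    """Return a list of problems with the expected layout (empty if fine)."""
    problems = []
    if not os.path.isabs(input_folder):
        problems.append(f"{input_folder}: must be an absolute path")
    for name in FLAT + BY_YEAR:
        if not os.path.isdir(os.path.join(input_folder, name)):
            problems.append(f"missing folder: {name}")
    for name in BY_YEAR:
        folder = os.path.join(input_folder, name)
        if os.path.isdir(folder):
            # Assumption: year folders are purely numeric, e.g. "1995".
            years = [
                d for d in os.listdir(folder)
                if d.isdigit() and os.path.isdir(os.path.join(folder, d))
            ]
            if not years:
                problems.append(f"no year folders in: {name}")
    return problems
```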

Questions asked by the script:

Do you want to load members? (y/n) y
Do you want to load parties? (y/n) y
Do you want to load proceedings? (y/n) y
Enter the number (integer) indicating what percent of proceedings you want to load? 1
Do you want to load paragraphs? (y/n) y
Do you want to load tagged entities? (y/n) y

# After some time, again:
Do you want to load proceedings? (y/n) y
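Since the script asks the same questions on every run, the answers could be supplied in advance. The Python sketch below assumes that loadRDF.sh reads its answers from standard input and runs under bash; the usage text confirms neither. The answers follow the order of the prompts above, including the repeated proceedings question. answers and run_loader are hypothetical helpers.

```python
import subprocess


def answers(percent: int) -> str:
    """Answers in prompt order: members, parties, proceedings, percentage,
    paragraphs, tagged entities, and the repeated proceedings question."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    replies = ["y", "y", "y", str(percent), "y", "y", "y"]
    return "\n".join(replies) + "\n"


def run_loader(script: str, input_folder: str, percent: int) -> int:
    """Run the loader with scripted answers on stdin; return its exit code.

    Assumes the script is a bash script that reads answers from stdin.
    """
    result = subprocess.run(
        ["bash", script, input_folder],
        input=answers(percent),
        text=True,
    )
    return result.returncode
```

For example, `run_loader("/path/to/loadRDF.sh", "/data/pm-rdf", 1)` would answer every prompt with y and request 1% of the proceedings; both paths are placeholders.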

Note that loading 1% of the data should take about one minute, whereas loading all of it takes approximately one hour.