The Dutch parliamentary data created by PoliticalMashup is now also available in RDF format, and can be queried from a SPARQL endpoint.
The dataset is enriched with recognized named entities which are linked to DBpedia and Wikipedia.
More information can be found in the ParlBench paper and this description of a benchmark experiment.
The public SPARQL endpoint contains only 1% of all data. If you want to query the complete data using SPARQL, you must download it and load it into your own endpoint. Below is a short description of how to do that.
First, make sure the Virtuoso server is up and running.
Unpack the data located in /pm-rdf/data/ and copy it to a subfolder of a path listed in the DirsAllowed parameter of the virtuoso.ini configuration file (see above).
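The copy step could be sketched in shell roughly as follows. The helper names dirs_allowed and copy_rdf_data are hypothetical and not part of Virtuoso or of the pm-rdf scripts, and all paths in the usage note are placeholders; the actual location of virtuoso.ini and of the unpacked data depends on your installation.

```shell
#!/bin/sh
# Print the value of the DirsAllowed parameter from a virtuoso.ini file.
# (dirs_allowed is a hypothetical helper, not part of Virtuoso.)
# $1 - path to virtuoso.ini
dirs_allowed() {
    sed -n 's/^[[:space:]]*DirsAllowed[[:space:]]*=[[:space:]]*//p' "$1"
}

# Copy the unpacked RDF data into a subfolder of an allowed directory.
# (copy_rdf_data is a hypothetical helper as well.)
# $1 - folder with the unpacked RDF data
# $2 - a directory listed in DirsAllowed
# $3 - name of the subfolder to create inside $2
copy_rdf_data() {
    mkdir -p "$2/$3" && cp -R "$1"/. "$2/$3"/
}
```

For example, dirs_allowed /path/to/virtuoso.ini prints the comma-separated list of allowed directories; after picking one of them, copy_rdf_data /path/to/unpacked/data /path/to/allowed/dir pm-rdf copies the data into a pm-rdf subfolder there.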
Then run the script that selects 1% of the data and uploads it to Virtuoso.
(If Virtuoso is installed somewhere other than the location above, change the ISQL_CMD= parameter at the beginning of the script.)
From the loadRDF.sh usage:
    # The script requires writing permissions to the directory where it is executed.
    # The script requires the input RDF files to be in $INPUT_FOLDER directory in their corresponding folders:
    # - 'members' for members
    # - 'parties' for parties
    # - 'proceedings' with 'year' folders inside for proceedings
    # - 'paragraphs' with 'year' folders inside for paragraphs
    # - 'tagged-entities' with 'year' folders inside for tagged entities
    #
    # Input:
    # -----
    # This scripts takes one parameter:
    # $1 - absolute path to the folder with RDF files.
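Before running the script, it may help to verify that the input folder has the layout described in the usage text. The following is a minimal sketch; missing_folders is a hypothetical helper and not part of loadRDF.sh, and it only checks the five top-level folders, not the 'year' folders inside them.

```shell
#!/bin/sh
# Print the name of every expected subfolder that is missing from the
# input folder. (missing_folders is a hypothetical helper.)
# $1 - absolute path to the folder with RDF files
missing_folders() {
    for name in members parties proceedings paragraphs tagged-entities; do
        [ -d "$1/$name" ] || echo "$name"
    done
}
```

Calling missing_folders with the same absolute path that will be passed to loadRDF.sh prints nothing when all five folders are present.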
Questions asked by the install script:
    Do you want to load members? (y/n) y
    Do you want to load parties? (y/n) y
    Do you want to load proceedings? (y/n) y
    Enter the number (integer) indicating what percent of proceedings you want to load? 1
    Do you want to load paragraphs? (y/n) y
    Do you want to load tagged entities? (y/n) y
    # After some time, again:
    Do you want to load proceedings? (y/n) y
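If the prompts need to be answered without typing, the answers could be generated in the order the script asks its questions. This is only a sketch: load_answers is a hypothetical helper, and piping its output into loadRDF.sh assumes that the script reads its answers from standard input, which is not confirmed here.

```shell
#!/bin/sh
# Print one answer per line, in the order the install script asks:
# members, parties, proceedings, percentage, paragraphs, tagged entities,
# and the repeated proceedings question.
# (load_answers is a hypothetical helper, not part of loadRDF.sh.)
# $1 - integer percentage of proceedings to load
load_answers() {
    printf 'y\ny\ny\n%s\ny\ny\ny\n' "$1"
}
```

Under that assumption, a run loading 1% would look like: load_answers 1 | ./loadRDF.sh /absolute/path/to/rdf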
Note that adding 1% of the data should take around one minute, whereas loading all the data would take approx. one hour.
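Once loading has finished, a quick way to check the result is to count the triples in your own endpoint. The sketch below assumes that the local Virtuoso SPARQL endpoint is reachable at http://localhost:8890/sparql (Virtuoso's default address; adjust ENDPOINT if yours differs) and that curl is installed. The names run_query and COUNT_QUERY are chosen for illustration.

```shell
#!/bin/sh
# Assumption: the local Virtuoso SPARQL endpoint uses its default address.
ENDPOINT="${ENDPOINT:-http://localhost:8890/sparql}"

# Send a SPARQL query to the endpoint and print the JSON result.
# (run_query is a hypothetical helper.)
# $1 - the query text
run_query() {
    curl -s -G "$ENDPOINT" \
        -H 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$1"
}

# Query that counts all triples in the store.
COUNT_QUERY='SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }'
```

Running run_query "$COUNT_QUERY" after loading 1% and again after loading more data should show the triple count growing.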