Authors: Carlos Martin and Maarten Marx

We created a corpus consisting of all parliamentary documents from
Spain since its first legislative period in 1977. The documents were
collected from the web page of the Spanish Congress
and converted into a uniform XML format
with extensive metadata in the Dublin Core standard.
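
As an illustration only, the conversion step can be pictured as wrapping each document's text in XML with a Dublin Core metadata header. The sketch below is not the authors' pipeline: the element names `document`, `meta`, `body` and `p`, the function `build_document`, and the sample values are all hypothetical. Only the Dublin Core namespace URI and the element names `title`, `date` and `language` come from the standard itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_document(metadata, paragraphs):
    """Wrap paragraphs in a hypothetical XML layout with Dublin Core metadata.

    `metadata` maps Dublin Core element names (e.g. "title") to string values.
    The layout (document/meta/body/p) is an assumption made for this sketch.
    """
    root = ET.Element("document")
    meta = ET.SubElement(root, "meta")
    for field, value in metadata.items():
        # Qualified name in Clark notation: {namespace}localname
        ET.SubElement(meta, f"{{{DC_NS}}}{field}").text = value
    body = ET.SubElement(root, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text
    return root


# Placeholder values, not taken from the corpus.
example = build_document(
    {"title": "Example session record", "date": "1977-01-01", "language": "es"},
    ["First paragraph of the record.", "Second paragraph of the record."],
)
xml_text = ET.tostring(example, encoding="unicode")
```

Keeping the metadata in a separate namespace lets generic Dublin Core tools read it without knowing anything about the layout of the document body.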

The collection contains over 50,000 documents, totaling almost 1 million
pages and over half a billion tokens.
We also collected a complete list of names and biographical data of
all members of parliament during this period.
All this data is available for download and will be updated daily.

This abstract describes the parliamentary data and the data
collection and transformation process, and presents some use cases for
this corpus. The corpus
can be used for corpus-linguistic and political science research,
and is suitable for performing scalability tests for XML information



author =      {C. Martin and M. Marx},
title =      {Parliamentary documents from {Spain}},
booktitle =      {Proceedings},
year =     2010,
note =     {\url{}}
