Paper on parliamentary debates
Tim Gielissen and Maarten Marx wrote a paper, Exemelification of Parliamentary Debates (PDF), on the many opportunities that parliamentary data offers information retrieval researchers.
The paper appeared in the proceedings of the 9th Dutch-Belgian Information Retrieval Workshop (DIR 2009).
In the abstract they write:
In this paper we analyze the structure of the parliamentary proceedings and sketch a widely applicable DTD. We show how proceedings in PDF format can be transformed into deeply nested XML. We call this exemelification.
Having the proceedings in XML makes a wide range of applications possible. We elaborate on four of these:
- entry point retrieval;
- advanced content and structure search;
- automatic creation of tables of contents and hyperlinked navigation menus;
- large savings on storage space and bandwidth for scanned documents.
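
To give an idea of what content and structure search and an automatic table of contents could look like once the proceedings are nested XML, here is a minimal sketch in Python. The element and attribute names (proceedings, topic, speech, p, speaker, party) and the sample text are made up for illustration; they are not the DTD from the paper, and the functions search and table_of_contents are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names, attributes and text are illustrative
# only and are NOT taken from the DTD sketched in the paper.
SAMPLE = """
<proceedings date="2009-02-03">
  <topic title="Education budget">
    <speech speaker="Jansen" party="A">
      <p>The budget for schools must rise.</p>
      <p>Teachers deserve better pay.</p>
    </speech>
    <speech speaker="De Vries" party="B">
      <p>We cannot afford a higher budget.</p>
    </speech>
  </topic>
  <topic title="Rail transport">
    <speech speaker="Jansen" party="A">
      <p>Trains should run on time.</p>
    </speech>
  </topic>
</proceedings>
""".strip()


def search(root, word, speaker=None):
    """Content and structure search: find paragraphs containing `word`,
    optionally restricted to speeches by one speaker.

    Returns a list of (topic title, speaker, paragraph text) tuples, so
    each hit points at a paragraph rather than at the whole document.
    """
    hits = []
    for topic in root.iter("topic"):
        for speech in topic.iter("speech"):
            if speaker is not None and speech.get("speaker") != speaker:
                continue
            for p in speech.iter("p"):
                text = p.text or ""
                if word.lower() in text.lower():
                    hits.append((topic.get("title"), speech.get("speaker"), text))
    return hits


def table_of_contents(root):
    """Derive a table of contents from the topic elements, in document order."""
    return [topic.get("title") for topic in root.iter("topic")]


root = ET.fromstring(SAMPLE)

if __name__ == "__main__":
    for title in table_of_contents(root):
        print(title)
    for hit in search(root, "budget", speaker="Jansen"):
        print(hit)
```

Because the speaker and the topic are part of the structure, a query such as "paragraphs about the budget spoken by Jansen" needs no text heuristics, and each result is an entry point inside the debate instead of a whole PDF.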
