Federal Register 2.0

Posted on 28-07-2010 by Maarten Marx | XML, data | No comments

Ed Summers posted the following message on the W3C EGov public mailing list:

I don’t know if this got discussed on here much yet, but I discovered
today via the Sunlight Foundation blog [1] that the Federal Register
2.0 site was recently released [2]. The Federal Register is one of the
most important government publications in the US, since it is the most
comprehensive publication of all the rules and regulations of the
various agencies that make up US federal government.

The new site is interesting to me for a few reasons:

- it uses opensource technologies (ruby, ruby on rails, mysql, sphinx,
nginx, apache2, varnish)
- the source code for the website itself is opensource, and available
to people to contribute changes/enhancements on github
- there is machine readable data available in various flavors of xml
- there are permalinks for each entry in the Federal Register, which
encourages citability
- it is deployed in the cloud on Amazon’s ec2/s3
- it was the result of an egov software contest organized by the
Sunlight Foundation

I wrote up some more of my thoughts in my blog [3], if you care to
comment here or there. If anyone from NARA, GPO or Sunlight Foundation
are reading, nice work!

//Ed

[1] http://sunlightlabs.com/blog/2010/meet-the-new-federal-register/
[2] http://www.federalregister.gov/
[3] http://inkdroid.org/journal/2010/07/27/federal-register-embraces-the-web-and-opensource/

Some missing aspects
This XML collection is potentially a great resource, but at least three things need to be done before the XML can be reused reliably in a mashup:

  1. Provide a DTD or Schema
  2. The XML does not contain any of the metadata shown in the “infobox” on the right of the HTML page.
    In particular, reference and provenance information such as the Document Citation and the Document ID is needed.
  3. The XML contains no persistent URI pointing to itself, nor a URI pointing to the HTML page based on the XML.
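
To make points 2 and 3 concrete, here is a minimal sketch in Python of what a self-describing document could look like. Everything in it is an assumption made for illustration: the element names (`metadata`, `documentCitation`, `documentId`, `selfUri`, `htmlUri`), the stand-in `<document>` root, the placeholder citation, the example.org URI, and the guess that the document page lives at the paragraph link minus its `/p-12` suffix. None of it reflects the actual Federal Register XML.

```python
import xml.etree.ElementTree as ET


def add_provenance(root, citation, document_id, self_uri, html_uri):
    """Prepend a hypothetical <metadata> block to an XML document root."""
    meta = ET.Element("metadata")
    for tag, value in [
        ("documentCitation", citation),  # hypothetical element name
        ("documentId", document_id),     # hypothetical element name
        ("selfUri", self_uri),           # persistent URI of the XML itself
        ("htmlUri", html_uri),           # URI of the matching HTML page
    ]:
        ET.SubElement(meta, tag).text = value
    root.insert(0, meta)
    return root


# Invented stand-in for a document from the XML collection.
doc = ET.fromstring("<document><p>Example paragraph.</p></document>")
add_provenance(
    doc,
    citation="PLACEHOLDER-CITATION",
    document_id="2010-18383",  # assumed from the URL used in this post
    self_uri="http://example.org/xml/2010-18383.xml",  # hypothetical
    html_uri="http://www.federalregister.gov/a/2010-18383",  # assumed
)
print(ET.tostring(doc, encoding="unicode"))
```

With such a block in place, a mashup could cite the document and link back to both representations without ever touching the HTML page.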

A fantastic aspect of the site is the ability to link to individual paragraphs in the documents.
Try for example http://www.federalregister.gov/a/2010-18383/p-12. This link is provided in the red ribbon to the right of the paragraph.
Mashups could potentially benefit from this feature. But unfortunately, these links are not present in the XML.
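
Because the links are missing from the XML, a mashup would have to rebuild them itself. The sketch below, in Python, assumes that every paragraph link follows the pattern of the single example above (`/a/<document number>/p-<paragraph index>`). That pattern is inferred from one URL and is not documented here; the function names are hypothetical.

```python
import re

# Base inferred from the example link in this post (an assumption).
BASE = "http://www.federalregister.gov/a"


def paragraph_url(document_number, paragraph):
    """Build a paragraph link following the pattern of the example link."""
    return f"{BASE}/{document_number}/p-{paragraph}"


def parse_paragraph_url(url):
    """Split a paragraph link back into (document_number, paragraph)."""
    match = re.fullmatch(re.escape(BASE) + r"/([^/]+)/p-(\d+)", url)
    if match is None:
        raise ValueError(f"not a paragraph link: {url}")
    return match.group(1), int(match.group(2))


print(paragraph_url("2010-18383", 12))
# prints http://www.federalregister.gov/a/2010-18383/p-12
```

The catch is that the paragraph index still has to be derived from the XML, and nothing guarantees that counting paragraphs in the XML gives the same numbers as the HTML page.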

Conclusion
If you want to add this data to the Linked Open Data cloud, or to create a mashup based on this data set, you have to screen-scrape the HTML page that accompanies each XML document.
This is a pity, because scraping amounts to reverse engineering the page, which is neither a reliable nor a stable solution.
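
To illustrate why scraping is fragile, here is a minimal sketch in Python using only the standard library. The HTML fragment is invented for this example: the real infobox markup is not shown in this post, so the `dl`/`dt`/`dd` structure, the class name, and the placeholder values are all assumptions.

```python
from html.parser import HTMLParser

# Invented markup standing in for the infobox; not the site's real HTML.
SAMPLE_HTML = """
<dl class="infobox">
  <dt>Document Citation</dt><dd>PLACEHOLDER-CITATION</dd>
  <dt>Document ID</dt><dd>PLACEHOLDER-ID</dd>
</dl>
"""


class InfoboxScraper(HTMLParser):
    """Collect <dt>/<dd> pairs into a dict of label -> value."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None    # "dt" or "dd" while inside one, else None
        self._label = None  # most recent <dt> text

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "dt":
            self._label = text
        elif self._tag == "dd" and self._label is not None:
            self.fields[self._label] = text


scraper = InfoboxScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.fields)
```

The scraper depends entirely on the tag structure it was written against. If the site switched the infobox to a table or renamed a label, the code would silently return an empty or incomplete dictionary.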

ESAIR keynote speech

Posted on 27-07-2010 by Maarten Marx | Uncategorized | No comments

Maarten Marx will give a keynote speech at the 2010 edition of ESAIR, the workshop on Exploiting Semantic Annotations for Information Retrieval, held during CIKM 2010.

Title: The Surplus Value of Semantic Annotations
Abstract
We compare the costs of semantic annotation of textual documents to its benefits for information processing tasks. Semantic annotation can improve the performance of retrieval tasks and facilitates an improved search experience through faceted search, focused retrieval, better document summaries, and result grouping.
Applications which summarize large collections of text or explain real world phenomena based on textual evidence may receive even more benefit from semantic annotations.

Semantic annotation creates surplus value if the annotated data can be used beyond any foreseen application, in particular by third parties who use your semantic markup to link your data to other data with similar markup.
We present a list of properties of the annotated data which optimize this surplus value. They are derived from the principle which states that annotation should facilitate the reuse of data in a mashup without information being lost or distorted.

For the Dutch House of Parliament we annotated the parliamentary proceedings based on this principle. Concrete examples from this data collection will illustrate the properties that enhance surplus value.

Blijkmeer final settlement

Posted on 16-07-2010 by Maarten Marx | Uncategorized | No comments

This page reports the latest developments concerning the remaining funds of Blijkmeer.

Current status

  • [2010-07-27] Minutes of the last OLV are now available. The payout has to wait until the end of the year (2010).
  • [2010-07-16] No minutes were taken at the last OLV. The treasurer will proceed with the payout and is collecting information.

Fill in your details below.