Peak explanation in parliamentary proceedings

Posted on 27-09-2012 by Maarten Marx | research

The political n-gram viewer created for 200 years of Dutch parliamentary proceedings can reveal fascinating patterns. Here are two example queries:

We ask for the occurrences of leading Dutch opinion weeklies and daily newspapers, respectively.

Clicking on the search terms in the box on the right brings more clarity to the spaghetti pictures. It reveals that every news source has its own peaks. For instance, “haagse post” has 268 hits in 1974, while the maximum number of hits in a year for all 5 newspapers was 48.

What happened?

A nice addition to the n-gram viewer would be a feature that explains such peaks.
A Google search for the terms “haagse post”+1974 reveals one possible explanation: a link to a 1974 news article in the archive of the Reformatorisch Dagblad, stating that the Dutch government wants to support the magazine “Haagse Post” financially.

Is this the explanation?

It would be nice to have a fully automatic, high-precision peak explanation mechanism, possibly using a historical newspaper corpus like that of the Reformatorisch Dagblad.
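The first step of any such mechanism is deciding which years count as a peak. Below is a minimal sketch of one way to do that, assuming the viewer's yearly hit counts are available as a plain dictionary. The function name `find_peak_years`, the `factor` and `min_hits` parameters, and the median-based threshold are all illustrative assumptions, not part of the n-gram viewer. Only the 1974 value is taken from the text above; the other counts are invented placeholders.

```python
from statistics import median


def find_peak_years(counts, factor=5.0, min_hits=10):
    """Return the years whose hit count is at least `factor` times the median.

    `counts` maps a year to the number of hits in that year. The median is
    used as the baseline because a single large peak barely moves it.
    """
    if not counts:
        return []
    baseline = median(counts.values())
    threshold = max(factor * baseline, min_hits)
    return sorted(year for year, hits in counts.items() if hits >= threshold)


# Hypothetical yearly counts for "haagse post": only 1974 (268 hits) comes
# from the post, every other number is a made-up placeholder.
haagse_post = {1971: 12, 1972: 9, 1973: 15, 1974: 268, 1975: 20, 1976: 11}

print(find_peak_years(haagse_post))
```

The threshold is deliberately crude. A real implementation would probably normalise by the total number of words spoken per year before looking for peaks.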

Using the KB newspaper archive to find explanations

The query NSB gives three clear peaks: in 1979, 1986 and 1988. A search on the terms “nsb kamer” in the KB newspaper archive seems to give a very clear indication of a possible explanation for the peak, even though the corpus for that period is limited to the communist daily and some other small newspapers.
One could try to create an explanation from the newspaper titles using named entity recognition, frequency counts, and date comparisons (when is the topic discussed in Parliament, and when do the corresponding newspaper articles peak in that year?).
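The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the titles and dates are invented placeholders, the helper names `candidate_entities` and `peak_month` are hypothetical, and counting capitalised words is a crude stand-in for real named entity recognition, not an actual NER system.

```python
import re
from collections import Counter


def candidate_entities(titles, query_terms=()):
    """Count capitalised words in titles as a rough proxy for named entities."""
    skip = {term.lower() for term in query_terms}
    counts = Counter()
    for title in titles:
        # Skip the first word: it is capitalised in any title.
        words = re.findall(r"[^\W\d_]+", title)[1:]
        for word in words:
            if word[0].isupper() and word.lower() not in skip:
                counts[word] += 1
    return counts


def peak_month(dates):
    """Return the 'YYYY-MM' with the most items, given ISO 'YYYY-MM-DD' dates."""
    by_month = Counter(date[:7] for date in dates)
    return by_month.most_common(1)[0][0] if by_month else None


# Invented placeholder data, not taken from any real archive.
titles = [
    "Debate on Alpha subsidy",
    "Minister defends Alpha plan",
    "Chamber votes on Beta",
]
article_dates = ["1979-03-01", "1979-03-15", "1979-05-02"]
debate_dates = ["1979-03-20", "1979-03-21", "1979-06-01"]

print(candidate_entities(titles).most_common(1))
print(peak_month(article_dates) == peak_month(debate_dates))
```

If the most frequent entities in the titles point at one event and the newspaper coverage peaks in the same month as the parliamentary debate, that event becomes a plausible candidate explanation. It is still a candidate that needs checking, not a confirmed cause.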


