Overview of Georgian passed laws

Posted on 28-11-2013 by Maarten Marx | parliament, TI

The Georgian Government has a convenient page for finding all passed laws.
TI Georgia created a text-scraper which collects all information from that page and turns it into a spreadsheet. The version of November 2013 is available as a Google Fusion Table.
For each law we also downloaded the accompanying file listing how each member voted and the final outcome, and turned these into two spreadsheets: outcomes, with one line per law, and votesperlaw.csv.gz and votesperlaw.xml.gz, with one line per vote of an MP.

The post below contains more information about this interesting dataset. Because we have aggregated data that was previously only available in separate files, we can do new analyses. For instance, we can count, for each MP, the number of votes at which he or she was present.

The analysis is rather crude and reflects the noise present in the data. We left the noise in on purpose.
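
As an illustration of the presence count, here is a minimal sketch in Python. The column names (`law_id`, `mp`, `vote`), the vote labels, and the sample rows are assumptions made for this example; the actual layout of votesperlaw.csv.gz may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample with one line per vote of an MP.
# Column names and vote labels are assumptions, not the real file layout.
SAMPLE = """law_id,mp,vote
1,MP A,yes
1,MP B,absent
2,MP A,no
2,MP B,yes
3,MP A,absent
3,MP B,yes
"""

def presence_counts(rows, absent_label="absent"):
    """Return {mp: (votes present, votes in total)} for each MP."""
    present = Counter()
    total = Counter()
    for row in rows:
        total[row["mp"]] += 1
        # Any vote other than the absent label counts as being present.
        if row["vote"] != absent_label:
            present[row["mp"]] += 1
    return {mp: (present[mp], total[mp]) for mp in total}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = presence_counts(rows)
for mp, (n_present, n_total) in sorted(counts.items()):
    print(f"{mp}: present at {n_present} of {n_total} votes")
```

For the real file, the rows could be read with `gzip.open(path, "rt")` instead of the in-memory sample. Noise in the data, such as inconsistent spellings of an MP's name, would show up as separate entries in this count.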

| read more…

What TI Georgia blogs about

Posted on 28-11-2013 by Maarten Marx | TI

Blogging is an important activity of Transparency International Georgia. The archive on the TI site goes back to February 2010 and contains (at the time of writing) 268 posts. According to Wikipedia:

Transparency International (TI) is a non-governmental organization that monitors and publicizes corporate and political corruption in international development.

The term corruption does indeed occur in about 10% (26) of the blog posts of TI Georgia.
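
A share like this can be computed with a simple document count. The sketch below is only an illustration: the three toy posts are invented, and matching the term case-insensitively as a whole word is an assumption, not necessarily how the figure above was obtained.

```python
import re

def share_containing(posts, term):
    """Return (number of posts containing term, fraction of all posts)."""
    # Whole-word, case-insensitive match; this matching rule is an assumption.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = sum(1 for text in posts if pattern.search(text))
    fraction = hits / len(posts) if posts else 0.0
    return hits, fraction

# Invented toy posts, not taken from the TI Georgia archive.
toy_posts = [
    "A report on corruption in public procurement.",
    "Notes on the new election code.",
    "Corruption risks in the judiciary.",
]

hits, fraction = share_containing(toy_posts, "corruption")
print(f"{hits} of {len(toy_posts)} posts ({fraction:.0%})")
```

With the numbers from the post, 26 of 268 posts is 9.7%, which rounds to the 10% quoted above.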

A remarkable finding is that the blog posts do not mention persons very frequently. The most frequent named entities are locations (in order: Georgia (1204), Tbilisi (364), Rustavi (though the TV channel is probably often meant), Zugdidi (89), Batumi (80)).
The most mentioned persons are Ivanishvili (59), Saakashvili (44), Chikovani (43, but this name refers to several different persons), and Baratashvili (36).

Summary in Words

(Image: word cloud of the blog posts)

Data and code

The blog posts can be collected from TI Georgia. The code, data, and word cloud files are available in this zip file.

English Georgian Parallel Corpus

Posted on 26-11-2013 by Maarten Marx | data, TI

We created a Georgian-English parallel corpus by crawling the Georgian news site http://civil.ge. This site contains over 26 thousand news stories in both English and Georgian. The first one is from November 2002.
Such parallel corpora are the raw material for automatic machine translation software like Google Translate.
The fact that Google Translate (at the time of writing) makes a mistake when translating საქართველოს (the genitive of “Sakartvelo”, the Georgian word for Georgia) shows that such parallel corpora are still useful.
All data mentioned in this blog post is available in a zip file (32M).
| read more…

Uniqueness of Georgian Names

Posted on 14-11-2013 by Maarten Marx | TI

We were told that Georgian names are often ambiguous, in the sense that there are many persons with the same combination of first name and last name. Here we investigate to what extent this is true. In a sample of over 20K Georgian persons we found that at least 91.3% are uniquely determined by their first and last name, and 98.4% if we add the date of birth as well.
We can thus conclude that Georgian names are quite good at uniquely identifying persons.
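
One way to measure this kind of uniqueness is to count how many records have a key that no other record shares. The following Python sketch uses invented toy records; the record layout and the exact definition of "uniquely determined" are assumptions for illustration and may differ from the analysis in the full post.

```python
from collections import Counter

# Invented toy records: (first name, last name, date of birth).
PEOPLE = [
    ("Giorgi", "Beridze", "1970-01-01"),
    ("Giorgi", "Beridze", "1982-05-17"),
    ("Nino", "Kapanadze", "1975-03-09"),
    ("Tamar", "Lomidze", "1990-11-30"),
]

def unique_fraction(records, key):
    """Fraction of records whose key occurs exactly once in the sample."""
    keys = [key(record) for record in records]
    freq = Counter(keys)
    unique = sum(1 for k in keys if freq[k] == 1)
    return unique / len(keys)

# Key 1: first and last name only. Key 2: name plus date of birth.
by_name = unique_fraction(PEOPLE, lambda r: (r[0], r[1]))
by_name_and_birth = unique_fraction(PEOPLE, lambda r: r)

print(f"unique by name: {by_name:.1%}")
print(f"unique by name and date of birth: {by_name_and_birth:.1%}")
```

In the toy sample the two records sharing a name are told apart by their date of birth, which is the effect the two percentages above describe.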

| read more…

Age difference in marriage in Georgian Parliament

Posted on 13-11-2013 by Maarten Marx | parliament, TI, xquery

To show the wealth of TI Georgia’s database of Asset Declarations of Public Officials, we created a spreadsheet containing all couples in which one partner works in the Parliament of Georgia. There are 239 such couples in the spreadsheet.

| read more…

Suffixes of Georgian last names

Posted on 12-11-2013 by Maarten Marx | TI

In a project for Transparency International Georgia we are creating a database of Asset Declarations of Georgian public officials. Besides a lot of valuable information about their income and relations to companies, such a large database also gives the opportunity to do fun linguistic research.

Here we report on the names occurring in this database. These are names of Georgian public officials and their reported relatives. Georgian names are simple: they always consist of two tokens, “firstname, surname”. There are 1604 different first names and 3883 different surnames in our database, coming from a total of 19522 different names.
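
Counts like these, and a first look at surname suffixes, can be sketched in a few lines of Python. The name list below is a toy sample in Latin transliteration, and taking the last three characters as the "suffix" is an assumption for illustration only, not the method used in the full post.

```python
from collections import Counter

# Toy sample of (first name, surname) pairs; not taken from the database.
NAMES = [
    ("Giorgi", "Beridze"),
    ("Nino", "Kapanadze"),
    ("Giorgi", "Gelashvili"),
    ("Nino", "Beridze"),
    ("Giorgi", "Beridze"),  # duplicate full name
]

full_names = set(NAMES)
first_names = {first for first, _ in full_names}
surnames = {last for _, last in full_names}

def suffix_counts(names, length=3):
    """Count the final `length` characters over a collection of surnames."""
    return Counter(name[-length:].lower() for name in names)

# Count suffixes over distinct surnames, so frequent families do not dominate.
suffixes = suffix_counts(surnames)

print(len(first_names), "first names,", len(surnames), "surnames,",
      len(full_names), "full names")
print(suffixes.most_common())
```

Counting over distinct surnames rather than over persons is a design choice; counting over persons would instead show how many people carry each suffix.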
| read more…