Household income in Georgian Asset Declarations

Posted on 24-12-2013 by Maarten Marx | parliament, TI

The Georgian Asset Declarations are a great source of open government data and a fantastic example of openness. Unfortunately, they are not available in an easily machine-readable format, only as PDF files. Still, with some work one can do useful analysis on them.
At TI Georgia, the income of public officials is an item of interest. In this post we report on our work to calculate the total household income of each public official.

In fact, we calculated how much each member of the household contributes to the total household income, expressed as a percentage. We calculated the income from the answers to the questions about “entrepreneurial activity” and “paid work”. See the Asset Declaration of Kakhaber Chikhradze for an example.

Kakhaber Chikhradze is a counterexample to the idea that Georgian men are macho: he works at the Ministry of Defense and earns a mere 6,300 GEL. His mother and wife earn 144K and 70K USD, respectively. Even his son earns more, with 16K USD. He is also the only one in the household without a car.
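The share calculation can be sketched in a few lines of Python. This is only an illustration, not the code used for the post: the function name `income_shares` is made up, and because the declaration mixes GEL and USD the sketch needs an exchange rate, for which `USD_TO_GEL` is an arbitrary placeholder value. The income figures are the ones quoted above.

```python
def income_shares(incomes):
    """Return each member's share of the total household income, in percent."""
    total = sum(incomes.values())
    if total == 0:
        # Avoid dividing by zero for households without reported income.
        return {member: 0.0 for member in incomes}
    return {member: 100.0 * amount / total for member, amount in incomes.items()}


# Placeholder exchange rate, for illustration only (an assumption, not a quoted rate).
USD_TO_GEL = 1.7

# All amounts converted to GEL before comparing them.
household = {
    "official": 6_300,
    "mother": 144_000 * USD_TO_GEL,
    "wife": 70_000 * USD_TO_GEL,
    "son": 16_000 * USD_TO_GEL,
}

shares = income_shares(household)
for member, percentage in shares.items():
    print(f"{member}: {percentage:.1f}%")
```

Converting everything to one currency first matters here: comparing the raw numbers would mix GEL and USD amounts.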

Georgian First Names

Posted on 05-12-2013 by Maarten Marx | data, TI

Ani Elchishvili from TI Georgia created a spreadsheet that specifies the gender of over 1500 Georgian first names. For 1144 of them, she also found a webpage (often on Wikipedia) about the name.
The data is available as a tab-separated CSV spreadsheet and also as an HTML table.


Overview of Georgian passed laws

Posted on 28-11-2013 by Maarten Marx | parliament, TI

The Georgian Government has a convenient page for finding all passed laws.
TI Georgia created a text-scraper which collects all information from that page and turns it into a spreadsheet. The version of November 2013 is available as a Google Fusion Table.
We also downloaded, for each law, the accompanying file that lists how each member voted and the final outcome. We turned these into two spreadsheets: outcomes, with one line for each law, and votesperlaw.csv.gz and votesperlaw.xml.gz, with one line for each vote of an MP.

The post below contains more information about this interesting dataset. Because we have aggregated data that was previously only available in separate files, we can do new analyses. For instance, we can count for each MP how many of the votes he or she was present at.
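Counting presence per MP from a one-line-per-vote table could look roughly like the sketch below. The column names (`law_id`, `mp`, `vote`), the `absent` label, and the sample rows are all assumptions made for illustration; the actual layout of votesperlaw.csv.gz may differ.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for the votes table: one line per vote of an MP.
# Column names and vote labels are assumptions, not the real file layout.
SAMPLE = """law_id,mp,vote
1,MP A,yes
1,MP B,yes
2,MP A,no
2,MP B,yes
3,MP A,absent
3,MP B,yes
"""


def count_presence(rows, absent_label="absent"):
    """Return {mp: (votes_present, votes_total)} for an iterable of row dicts."""
    present = Counter()
    total = Counter()
    for row in rows:
        total[row["mp"]] += 1
        if row["vote"] != absent_label:
            present[row["mp"]] += 1
    return {mp: (present[mp], total[mp]) for mp in total}


presence = count_presence(csv.DictReader(io.StringIO(SAMPLE)))
for mp, (n_present, n_votes) in sorted(presence.items()):
    print(f"{mp}: present at {n_present} of {n_votes} votes")
```

For the real file, the same function could be fed a `csv.DictReader` over `gzip.open(path, "rt")`, after adjusting the column names and the way absence is encoded.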

The analysis is rather crude and reflects the noise present in the data. We left the noise in on purpose.


What TI Georgia blogs about

Posted on 28-11-2013 by Maarten Marx | TI

Blogging is an important activity of Transparency International Georgia. The archive on the TI site goes back to February 2010 and contains (at the time of writing) 268 posts. According to Wikipedia:

Transparency International (TI) is a non-governmental organization that monitors and publicizes corporate and political corruption in international development.

The term corruption indeed occurs in about 10% (26) of the blog posts of TI Georgia.
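With 268 posts in the archive, 26 posts correspond to roughly 9.7%. A count like this could be obtained along the lines of the sketch below, which assumes the posts are available as plain-text strings and matches the term as a whole word, ignoring case. The function name and the sample posts are made up; the matching rule used for the figure above is not documented here and may have been different.

```python
import re


def posts_with_term(posts, term):
    """Return (number of posts containing term, percentage of all posts)."""
    # Whole-word, case-insensitive match; also matches inside "anti-corruption".
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    hits = sum(1 for text in posts if pattern.search(text))
    percentage = 100.0 * hits / len(posts) if posts else 0.0
    return hits, percentage


# Made-up post texts, for illustration only.
sample_posts = [
    "Corruption in public procurement",
    "New broadcasting rules",
    "An anti-corruption agency is proposed",
    "Election monitoring report",
]

hits, percentage = posts_with_term(sample_posts, "corruption")
print(f"{hits} of {len(sample_posts)} posts ({percentage:.1f}%)")
```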

A remarkable finding is that the blogs do not mention persons very frequently. The most frequently occurring named entities are locations (in order: Georgia (1204), Tbilisi (364), Rustavi (though probably the TV channel is often meant), Zugdidi (89), Batumi (80)).
The most mentioned persons are Ivanishvili (59), Saakashvili (44), Chikovani (43, but this name refers to several different persons), and Baratashvili (36).

Summary in Words

(Image: word cloud of the blog posts.)

Data and code

The blog posts can be collected from TI Georgia. The code, data, and word cloud files are available in this zip file.

English Georgian Parallel Corpus

Posted on 26-11-2013 by Maarten Marx | data, TI

We created a Georgian-English parallel corpus by crawling the Georgian news site http://civil.ge. This site contains over 26 thousand news stories in both English and Georgian. The first one is from November 2002.
Parallel corpora like this are the raw material for automatic machine translation software such as Google Translate.
The fact that Google Translate (at the time of writing) makes a mistake with translating საქართველოს (the genitive of “Sakartvelo”, the Georgian word for Georgia) shows that such parallel corpora are still useful.
All data mentioned in this blog post is available in a zip file (32M).

Uniqueness of Georgian Names

Posted on 14-11-2013 by Maarten Marx | TI

We were told that Georgian names are often ambiguous, in the sense that many persons share the same combination of first name and last name. Here we investigate to what extent this is true. In a sample of over 20K Georgian persons, we found that at least 91.3% are uniquely determined by their first and last name, and 98.4% if we add the date of birth as well.
We can thus conclude that Georgian names are quite good at uniquely identifying persons.
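A uniqueness percentage of this kind can be computed by counting how often each identifying key occurs and then taking the fraction of records whose key occurs exactly once. The sketch below illustrates the idea; the record layout (first name, last name, date of birth) and the sample persons are made up, and it is not the code behind the figures above.

```python
from collections import Counter


def unique_fraction(records, key):
    """Return the fraction of records whose key value occurs exactly once."""
    if not records:
        return 0.0
    counts = Counter(key(record) for record in records)
    unique = sum(1 for record in records if counts[key(record)] == 1)
    return unique / len(records)


# Made-up persons as (first name, last name, date of birth), for illustration only.
people = [
    ("Giorgi", "Beridze", "1970-01-01"),
    ("Giorgi", "Beridze", "1982-05-17"),
    ("Nino", "Kapanadze", "1975-03-09"),
    ("Davit", "Lomidze", "1968-11-30"),
]

by_name = unique_fraction(people, key=lambda p: (p[0], p[1]))
by_name_and_birth = unique_fraction(people, key=lambda p: p)

print(f"unique by name: {by_name:.1%}")
print(f"unique by name and date of birth: {by_name_and_birth:.1%}")
```

If the same person appears more than once in the sample, this count treats the copies as different persons, so it gives a lower bound on uniqueness.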


Age difference in marriage in Georgian Parliament

Posted on 13-11-2013 by Maarten Marx | parliament, TI, xquery

To show the wealth of TI Georgia’s database of Asset Declarations of Public Officials, we created a spreadsheet containing all couples of which one partner works in the Parliament of Georgia. There are 239 such couples in the spreadsheet.


Suffixes of Georgian last names

Posted on 12-11-2013 by Maarten Marx | TI

In a project for Transparency International Georgia we are creating a database of Asset Declarations of Georgian public officials. Besides a lot of valuable information about their income and relations to companies, such a large database also gives the opportunity to do fun linguistic research.

Here we report on the names occurring in this database. These are names of Georgian public officials and their reported relatives. Georgian names are simple: they always consist of two tokens, “firstname, surname”. There are 1604 different first names and 3883 different surnames in our database, coming from a total of 19522 different names.
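Counts like these, and a first look at surname endings, can be sketched as follows. The sketch assumes each name is a string of two whitespace-separated tokens and approximates a suffix by the last few characters of the surname. The function name, the `suffix_length` parameter, and the sample names are made up, and the names are written in Latin script purely for readability.

```python
from collections import Counter


def name_statistics(full_names, suffix_length=3):
    """Return (#distinct first names, #distinct surnames, Counter of surname endings)."""
    first_names = set()
    surnames = set()
    for name in set(full_names):  # each distinct full name counts once
        tokens = name.split()
        if len(tokens) != 2:
            continue  # skip anything that is not "firstname surname"
        first_names.add(tokens[0])
        surnames.add(tokens[1])
    # Count endings over distinct surnames, not over persons.
    suffixes = Counter(surname[-suffix_length:] for surname in surnames)
    return len(first_names), len(surnames), suffixes


# Made-up names, for illustration only.
sample_names = [
    "Giorgi Beridze",
    "Nino Beridze",
    "Giorgi Lomidze",
    "Nino Kapanadze",
    "Davit Gelashvili",
]

n_first, n_last, suffixes = name_statistics(sample_names)
print(n_first, "first names,", n_last, "surnames")
print(suffixes.most_common(2))
```

A fixed character length is a crude stand-in for a real suffix: an ending such as "shvili" would need `suffix_length=6`, so trying several lengths is one way to explore the data.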