Inequality in Australia & India: A Data Journalism Case Study

In an endeavour to learn R and explore visualisation, I thought I'd take a peek at an interesting dataset: the World Income Inequality Database (WIID), https://www.wider.unu.edu/project/wiid-world-income-inequality-database

Although the database's default map view is good enough, I wanted to study how the Gini coefficient (a measure of income inequality) changed for India and Australia over the period 1951-2010.
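For readers new to the measure, here is a minimal sketch of how a Gini coefficient can be computed from a vector of incomes, using the mean-absolute-difference definition G = sum_i sum_j |x_i - x_j| / (2 n^2 mean(x)). The function name `gini` and the toy incomes are hypothetical. WIID compiles estimates from many original sources, so this only illustrates the definition, not how any particular WIID figure was produced. The values later in this post are on a 0-100 scale, hence the `* 100`.

```r
# Gini coefficient via the mean absolute difference (illustrative sketch).
# G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x)), which lies in [0, 1).
gini <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0, all(x >= 0), mean(x) > 0)
  n <- length(x)
  # outer() builds the n x n matrix of pairwise differences x_i - x_j
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

# Hypothetical toy incomes, not WIID data
equal   <- c(10, 10, 10, 10)
unequal <- c(0, 0, 0, 10)

gini(equal) * 100    # 0: everyone earns the same
gini(unequal) * 100  # 75: one person earns everything (max for n = 4 is (n - 1) / n)
```

With only four people the maximum is 75 rather than 100; as the population grows, the maximum approaches 100.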

# Load the WIID export (assumes WIID.csv is in the working directory)
wiid <- read.csv("WIID.csv")

First, let's keep only the two countries we care about (India and Australia), along with just the few columns we need:

# Keep only Australia (AU) and India (IN), and only the columns we need
au_in_iid <- subset(wiid,
                    Countrycode2 == "AU" | Countrycode2 == "IN",
                    select = c("Countrycode2", "Year", "Gini"))
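The `|` condition works fine for two countries; if you later want to compare more, `%in%` scales better. A minimal sketch on a hypothetical toy data frame that mimics the three columns used here (all values are made up):

```r
# Hypothetical stand-in for the WIID columns used above (values are made up)
wiid_toy <- data.frame(
  Countrycode2 = c("AU", "IN", "US", "AU", "IN"),
  Year         = c(1960, 1960, 1960, 1961, 1961),
  Gini         = c(30, 35, 40, 31, 36),
  Source       = c("a", "b", "c", "a", "b")
)

# Add more two-letter country codes here to widen the comparison
countries <- c("AU", "IN")

# Same filter as the | version, but driven by a vector of codes
au_in_toy <- subset(wiid_toy,
                    Countrycode2 %in% countries,
                    select = c("Countrycode2", "Year", "Gini"))
au_in_toy
```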

Let's split the data by country and aggregate by Year, taking the mean Gini where a year has multiple observations:

au_iid <- subset(au_in_iid,  
                 Countrycode2 == "AU", 
                 select = c("Year", "Gini"))

# Some years have multiple observations, so take the mean Gini per Year
au_iid_aggr <- aggregate(Gini ~ Year, data = au_iid, mean)

in_iid <- subset(au_in_iid,  
                 Countrycode2 == "IN", 
                 select = c("Year", "Gini"))
in_iid_aggr <- aggregate(Gini ~ Year, data = in_iid, mean)  
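Splitting into two data frames is easy to follow; as an alternative sketch, `aggregate()` can group by country and year in a single call. The toy data below is hypothetical. Note that the formula interface of `aggregate()` drops rows with a missing `Gini` by default (`na.action = na.omit`), so a year whose only observations are missing simply disappears.

```r
# Hypothetical toy data: two AU observations in 1960, one missing IN value
toy <- data.frame(
  Countrycode2 = c("AU", "AU", "AU", "IN", "IN"),
  Year         = c(1960, 1960, 1961, 1960, 1961),
  Gini         = c(30, 32, 31, 36, NA)
)

# Mean Gini per country and year in one step (NA rows are dropped first)
toy_aggr <- aggregate(Gini ~ Countrycode2 + Year, data = toy, FUN = mean)
toy_aggr
```

From there, `subset(toy_aggr, Countrycode2 == "AU")` recovers a single country's series.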

Australia appears to have observations for more years than India. To compare the two over the same period (1951-2010), let's slice the Australian series:

# Keep only the years 1951-2010
au_iid_aggr <- au_iid_aggr[au_iid_aggr$Year >= 1951 & au_iid_aggr$Year < 2011,]
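Since `Year` holds whole years, `Year < 2011` is the same as `Year <= 2010`. A small helper with inclusive bounds makes the 1951-2010 intent explicit and lets the same filter be applied to either country. The helper `in_window()` and the toy series are hypothetical:

```r
# Hypothetical helper: keep rows whose Year lies in [from, to], inclusive
in_window <- function(df, from = 1951, to = 2010) {
  df[df$Year >= from & df$Year <= to, , drop = FALSE]
}

# Hypothetical toy series spanning more years than the window
toy_series <- data.frame(Year = 1940:2015, Gini = seq(25, 40, length.out = 76))

windowed <- in_window(toy_series)
range(windowed$Year)  # 1951 2010
```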

Now let's merge the two series back together by Year:

iid_merged <- merge(in_iid_aggr,  
                    au_iid_aggr, 
                    by = c("Year"), 
                    suffixes = c(".India", ".Australia"))

The merged table looks like this:

> head(iid_merged)
  Year Gini.India Gini.Australia
1 1951   36.23333          28.75  
2 1952   35.80000          28.80  
3 1953   35.35556          27.50  
4 1954   39.08750          27.75  
5 1955   37.52222          28.15  
6 1956   36.09000          27.75  
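One detail worth knowing: `merge()` performs an inner join by default, so only years present in both series end up in `iid_merged`. A small sketch with hypothetical toy series shows the difference when `all = TRUE` is used to keep unmatched years, filled with `NA`:

```r
# Hypothetical toy series with partly overlapping years
india     <- data.frame(Year = c(1951, 1952, 1953), Gini = c(36, 35, 35.5))
australia <- data.frame(Year = c(1952, 1953, 1954), Gini = c(28, 27.5, 27.8))

# Default: inner join, only 1952 and 1953 are kept
inner <- merge(india, australia, by = "Year",
               suffixes = c(".India", ".Australia"))

# all = TRUE: outer join, unmatched years are kept and filled with NA
outer_join <- merge(india, australia, by = "Year", all = TRUE,
                    suffixes = c(".India", ".Australia"))

inner
outer_join
```

Comparing `nrow()` of the merged result with each input is a quick way to see how many years were dropped.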

ggplot2 works best with data in long format (one row per observation), so let's melt the merged table:

library(reshape2)  
m <- melt(iid_merged, id.vars = c("Year"))  

The long-format result looks like this:

> head(m)
  Year   variable    value
1 1951 Gini.India 36.23333  
2 1952 Gini.India 35.80000  
3 1953 Gini.India 35.35556  
4 1954 Gini.India 39.08750  
5 1955 Gini.India 37.52222  
6 1956 Gini.India 36.09000  
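`melt()` turns the wide table (one Gini column per country) into the long one shown above (one row per year and country), which is what lets ggplot map `variable` to fill colour. If you would rather avoid the extra package, the same reshape can be sketched with base R's `stack()`; the toy values below are hypothetical:

```r
# Hypothetical wide table shaped like iid_merged
wide <- data.frame(Year           = c(1951, 1952),
                   Gini.India     = c(36.2, 35.8),
                   Gini.Australia = c(28.75, 28.8))

value_cols <- setdiff(names(wide), "Year")

# stack() returns columns 'values' and 'ind', one column's values after another
long <- stack(wide[value_cols])
long$Year <- rep(wide$Year, times = length(value_cols))

# Rename and reorder to match melt()'s Year / variable / value layout
names(long)[names(long) == "values"] <- "value"
names(long)[names(long) == "ind"]    <- "variable"
long <- long[, c("Year", "variable", "value")]
long
```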

We first tried a bar plot:

library(ggplot2)  
ggplot(data=m, aes(x = Year, y = value, fill = variable)) +  
  geom_bar(stat = "identity", position=position_dodge()) + 
  scale_fill_manual(values=c("gold2", "gold4"))
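As a side note, `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`. A sketch of the same dodged bar chart written with it, on hypothetical toy data in the long format produced by `melt()`:

```r
library(ggplot2)

# Hypothetical long-format data shaped like m
m_toy <- data.frame(
  Year     = rep(c(1951, 1952), times = 2),
  variable = rep(c("Gini.India", "Gini.Australia"), each = 2),
  value    = c(36.2, 35.8, 28.75, 28.8)
)

# geom_col() uses the y values directly as bar heights, like stat = "identity"
p <- ggplot(m_toy, aes(x = Year, y = value, fill = variable)) +
  geom_col(position = position_dodge()) +
  scale_fill_manual(values = c("gold2", "gold4"))
```

Call `print(p)` (or just type `p` at the console) to draw it.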

The bar plot is nothing great; in fact, it's confusing. Let's try something else: two area plots drawn one over the other, with a bit of alpha transparency so both stay visible:

ggplot(data=m, aes(x = Year, y = value, fill = variable, alpha = variable)) +
  # position = "identity" draws the two areas over each other instead of stacking or dodging them
  geom_area(position = "identity") +
  scale_fill_manual(values=c("gold2", "gold4")) +
  scale_alpha_manual(values=c(1, 0.5))

From this plot, income inequality in India seems to stay more or less the same, except for a big spike after 2000. For Australia, however, it increases steadily from 1974 to 2010.
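To put a number on a visual impression like this, one option (not something done in this post) is to fit a straight line to a Gini series over the period of interest and report the slope, alongside the share of year-on-year changes that are increases. A minimal sketch with a hypothetical helper `trend_summary()` and made-up toy data; since the yearly values mix sources with different definitions, treat such numbers as descriptive rather than as a formal test:

```r
# Hypothetical helper: linear trend (Gini points per decade) and share of
# year-on-year increases between two years, inclusive
trend_summary <- function(df, from, to) {
  sub <- df[df$Year >= from & df$Year <= to, ]
  sub <- sub[order(sub$Year), ]
  slope <- unname(coef(lm(Gini ~ Year, data = sub))["Year"])
  c(per_decade = slope * 10,
    share_up   = mean(diff(sub$Gini) > 0))
}

# Hypothetical toy series: flat until 1973, then rising 0.2 points a year
toy <- data.frame(Year = 1951:2010)
toy$Gini <- ifelse(toy$Year < 1974, 28, 28 + 0.2 * (toy$Year - 1973))

trend_summary(toy, from = 1974, to = 2010)
```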

After some illustration work in Sketch (having exported the plot to SVG):