Seasonal Maximum Temperature Of India: A Data Visualization Study

In line with the efforts of governments across the world, the Indian government is opening up more and more datasets to the public (https://data.gov.in). One such dataset covers temperature, and it seemed interesting enough to explore how the trend has developed.

https://data.gov.in/catalog/annual-and-seasonal-maximum-temperature-india

This time series is based on surface air temperature data (measured 1.2 m above ground level) from more than 350 stations spread over the country. The time series shows a warming over India in recent years.

temp <- read.csv('india-temp.csv')  
head(temp)

  YEAR ANNUAL JAN.FEB MAR.MAY JUN.SEP OCT.DEC
1 1901  28.96   23.27   31.46   31.27   27.25  
2 1902  29.22   25.75   31.76   31.09   26.49  
3 1903  28.47   24.24   30.71   30.92   26.26  
4 1904  28.49   23.62   30.95   30.67   26.40  
5 1905  28.30   22.25   30.00   31.33   26.57  
6 1906  28.73   23.03   31.11   30.86   27.29  

Let's melt the data into long format, using the tidyr package rather than the usual reshape2:

library(tidyr)  
temp_tidy <- temp %>%  
              gather(time, temperature, -YEAR)
head(temp_tidy)  
  YEAR   time temperature
1 1901 ANNUAL       28.96  
2 1902 ANNUAL       29.22  
3 1903 ANNUAL       28.47  
4 1904 ANNUAL       28.49  
5 1905 ANNUAL       28.30  
6 1906 ANNUAL       28.73  
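
In recent tidyr releases, gather() is documented as superseded by pivot_longer(). As a rough sketch of the same reshape, assuming tidyr 1.0.0 or later, here it is applied to a hypothetical three-row stand-in (temp_small) built from the first rows printed above:

```r
library(tidyr)

# Hypothetical stand-in for the first three rows of india-temp.csv
temp_small <- data.frame(
  YEAR    = c(1901, 1902, 1903),
  ANNUAL  = c(28.96, 29.22, 28.47),
  JAN.FEB = c(23.27, 25.75, 24.24),
  MAR.MAY = c(31.46, 31.76, 30.71),
  JUN.SEP = c(31.27, 31.09, 30.92),
  OCT.DEC = c(27.25, 26.49, 26.26)
)

# Same wide-to-long reshape as gather(time, temperature, -YEAR)
temp_small_tidy <- pivot_longer(
  temp_small,
  cols      = -YEAR,
  names_to  = "time",
  values_to = "temperature"
)

dim(temp_small_tidy)  # 15 rows (3 years x 5 columns), 3 columns
```

One difference to keep in mind: pivot_longer() keeps the rows of each year together, whereas gather() stacks one source column after another, so the first rows of the result will not all be ANNUAL.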

It makes sense to convert the time field to a factor (a categorical variable) with the levels in calendar order:

temp_tidy$time_f <- factor(temp_tidy$time,  
                           levels = c("ANNUAL", "JAN.FEB", "MAR.MAY", 
                                      "JUN.SEP", "OCT.DEC"))
temp_tidy$time <- NULL  
head(temp_tidy)  
  YEAR temperature time_f
1 1901       28.96 ANNUAL  
2 1902       29.22 ANNUAL  
3 1903       28.47 ANNUAL  
4 1904       28.49 ANNUAL  
5 1905       28.30 ANNUAL  
6 1906       28.73 ANNUAL  
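
The explicit levels argument matters because ggplot2 draws a discrete axis in level order. A small base-R sketch of the difference (the vector name seasons is only for illustration):

```r
seasons <- c("ANNUAL", "JAN.FEB", "MAR.MAY", "JUN.SEP", "OCT.DEC")

# Without explicit levels, factor() sorts the labels alphabetically,
# which puts JUN.SEP ahead of MAR.MAY
default_levels <- levels(factor(seasons))
default_levels

# With explicit levels, the calendar order is preserved
calendar_levels <- levels(factor(seasons, levels = seasons))
calendar_levels
```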

Now, what should the base visualization be: geom_line, geom_bar, or something else?
Even though temperature is a continuous variable, it has been discretized at the ANNUAL or seasonal level, so maybe a geom_tile will look good. Let's look at the seasonal levels first, as the aggregated ANNUAL one should be easy.

library(magrittr)  
library(dplyr)  
temp_seasonal <- temp_tidy %>%  
                    filter(as.character(time_f) != "ANNUAL")

So the plot starts along these lines:

temp_seasonal %>%  
  ggplot(aes(x = YEAR, y = time_f, fill = factor(temperature))) +
. . .

But what about colours? We can preview the ColorBrewer palettes (from the RColorBrewer package) via

library(RColorBrewer)  
display.brewer.all()  

Let's pick the Oranges palette. You can preview a specific palette with

display.brewer.pal(5, "Oranges")  

However, the selected palette has only 5 colours, while our temperature scale has many more distinct values:

length(unique(temp_seasonal$temperature))  
[1] 310

So how the hell do we interpolate 310 colours from just 5? R has a cool function to help us: colorRampPalette

pal <- colorRampPalette(rev(brewer.pal(5, "Oranges")))  
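# one colour per distinct temperature value, i.e. per level of factor(temperature)
color_palette_colors <- pal(length(unique(temp_seasonal$temperature)))  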

pal is our new palette function, from which we can "pick" any number of colours: pal(20) will give you 20 colours interpolated from the original Oranges palette. With the colour-palette problem resolved, let's finish the ggplot:
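
As a quick check of how this works, here is a minimal sketch, assuming RColorBrewer is installed; base_cols, pal_demo and twenty are names made up for the illustration:

```r
library(RColorBrewer)

base_cols <- brewer.pal(5, "Oranges")     # the 5 original colours
pal_demo  <- colorRampPalette(base_cols)  # returns a function of n

twenty <- pal_demo(20)  # 20 hex colours interpolated between the 5
length(twenty)
```

colorRampPalette() does not return colours itself. It returns a function, which is why pal can be called again with a different n.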

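library(ggplot2)  
library(ggthemes)  # provides theme_wsj()

# theme_change is the mini-theme defined below; define it before running this
temp_seasonal %>%  
  ggplot(aes(x = YEAR, y = time_f, 
             fill = factor(temperature))) +
  geom_tile() +
  scale_fill_manual(values = rev(color_palette_colors), name = "") +
  theme_wsj() +
  theme_change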

theme_change is a small custom theme that we apply on top of theme_wsj. It has to be defined before the plot code above is run:

theme_change <- theme(  
  plot.background = element_blank(),
  panel.grid.minor = element_blank(),
  panel.grid.major = element_blank(),
  panel.background = element_blank(),
  panel.border = element_blank(),
  axis.line = element_blank(),
  axis.title.x = element_blank(),
  legend.position = "none"
)
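
An alternative worth trying (not what this post does) is to leave temperature continuous and let ggplot2 interpolate the colours itself with scale_fill_gradientn(), which avoids a factor with hundreds of levels. A minimal sketch on a hypothetical toy data frame (toy, filled with random values rather than real measurements):

```r
library(ggplot2)
library(RColorBrewer)

# Hypothetical toy data in the same long format as temp_seasonal
season_levels <- c("JAN.FEB", "MAR.MAY", "JUN.SEP", "OCT.DEC")
toy <- expand.grid(
  YEAR   = 1901:1910,
  time_f = factor(season_levels, levels = season_levels)
)
set.seed(1)
# Random placeholder values, not real temperatures
toy$temperature <- round(runif(nrow(toy), min = 22, max = 32), 2)

# Continuous fill: ggplot2 interpolates between the 5 Oranges colours
p <- ggplot(toy, aes(x = YEAR, y = time_f, fill = temperature)) +
  geom_tile() +
  scale_fill_gradientn(colours = brewer.pal(5, "Oranges"), name = "")
```

Printing p draws the tiles. With a continuous scale, the tile colour is proportional to the temperature value, whereas the factor-based version spaces colours by rank among the distinct values.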

The result:

[Figure: tile plot of seasonal maximum temperature by year]

After some polish in Sketch

The plot suggests that the famous Indian summer (March - May) has been getting warmer over the years, more so than winter. Best time to visit India: Jan - Feb.