This document will demonstrate how to produce choropleth maps in R that can be customised in (almost) any way that you want. We’ll be mainly be using the ggplot2 package and as well as a few others.

1 Packages

First we want to load in the packages we’ll be using. readr and dplyr are for reading and manipulating data. rgdal is to read in our shapefiles, ggplot2 to plot. We can use classInt to define intervals in our data and add colours to our map using RcolorBrewer or viridis.

library(readr) # for reading and writing CSV files
library(dplyr) # for working with data
library(broom) #summarising data into a tibble
library(rgdal) # Reprojects data between co-ordinate systems (e.g. lat/long to BNG)
library(classInt) # defines intervals for data sets
library(ggplot2) # for plotting the map
library(RColorBrewer) # defines colour palettes
library(viridis) # more colour palettes

2 Load in Shapefiles and Data

We are going to be plotting England local authority data so will be using the corresponding shapefiles. We’ll load them using the rgdal package. Notice we don’t include the file extension:

LAs <- rgdal::readOGR("Shapefiles","England_LAUA_2016") 
## OGR data source with driver: ESRI Shapefile 
## Source: "\\virago\ts\LON3\IAVBULIM\Desktop\20181010_Choropleth_Maps_in_R\Shapefiles", layer: "England_LAUA_2016"
## with 326 features
## It has 21 fields

This loads in as a large spatial polygons data frame. We can use plot() to see what it looks like:

plot(LAs)

Next we’ll load in the data we’ll be plotting. This is just some dummy data I created:

dummy <- read_csv("Data/LA_dummy_data.csv")

head(dummy)
## # A tibble: 6 x 3
##   LA_Code   Local_Authority Percentage
##   <chr>     <chr>                <dbl>
## 1 E92000001 ENGLAND               80.2
## 2 E12000001 North East            45.2
## 3 E06000047 County Durham         88.1
## 4 E06000005 Darlington            77.0
## 5 E06000001 Hartlepool            16.0
## 6 E06000002 Middlesbrough         41.7

Now in order to plot the shapefile using ggplot2, we need to convert it from a spatial polygons data frame into a ggplot2-compatible data frame

tidy_LAs <- broom::tidy(LAs, region = "LAD16CD")

head(tidy_LAs)
## # A tibble: 6 x 7
##      long     lat order hole  piece group       id       
##     <dbl>   <dbl> <int> <lgl> <chr> <chr>       <chr>    
## 1 447097  537152      1 FALSE 1     E06000001.1 E06000001
## 2 447229. 537033.     2 FALSE 1     E06000001.1 E06000001
## 3 447281. 537120.     3 FALSE 1     E06000001.1 E06000001
## 4 447378. 537095.     4 FALSE 1     E06000001.1 E06000001
## 5 447455. 537024.     5 FALSE 1     E06000001.1 E06000001
## 6 447550. 537078.     6 FALSE 1     E06000001.1 E06000001

The region argument is the variable we are using to split up the areas. Here it is “LAD16CD” which is the Local Authority code. This is turned into an id variable.

Now we will join our data with the converted shapefile so that we can plot it:

map_data <- tidy_LAs %>% 
  left_join(dummy, by = c("id" = "LA_Code"))

head(map_data)
## # A tibble: 6 x 9
##      long    lat order hole  piece group  id    Local_Authority Percentage
##     <dbl>  <dbl> <int> <lgl> <chr> <chr>  <chr> <chr>                <dbl>
## 1 447097  5.37e5     1 FALSE 1     E0600~ E060~ Hartlepool            16.0
## 2 447229. 5.37e5     2 FALSE 1     E0600~ E060~ Hartlepool            16.0
## 3 447281. 5.37e5     3 FALSE 1     E0600~ E060~ Hartlepool            16.0
## 4 447378. 5.37e5     4 FALSE 1     E0600~ E060~ Hartlepool            16.0
## 5 447455. 5.37e5     5 FALSE 1     E0600~ E060~ Hartlepool            16.0
## 6 447550. 5.37e5     6 FALSE 1     E0600~ E060~ Hartlepool            16.0

3 Plot the map

Using geom_polygon, we’ll plot the latitude against the longitude and then colour the map by the “Percentage” variable:

map <- ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, fill = Percentage), 
               col = "black")
map

Since we’re plotting a polygon with multiple rows of data, we need to overide the default ‘group’ and map it to a variable that has a different value for each group. In this case, when we converted our shapefile, a ‘group’ variable was created.

Great, we have a map! We don’t even need to define a colour scale as it’s using the ggplot2 default continuous colour sclae.

It looks a bit distorted though. We can fix this by using coord_equal() to force a 1:1 aspect ratio

map <- map + coord_equal()

map

Okay, we have a good aspect ratio now, but we have a horrible grey background and axes that we don’t really want. We can use the theme() function to set these things to blank:

map + theme(panel.background = element_blank(), # get rid of grey background
        axis.line = element_blank(), # get rid of axes lines
        axis.title = element_blank(), # get rid of axes labels
        axis.text = element_blank(), # get rid of the axes text
        axis.ticks = element_blank()) # get rid of the tick marks

There are many elements we can use within the theme function. If there are standard things we want in all of our maps, we can create a function so that we don’t need to call of these elements each time. We’ll base this function on the built-in theme_minimal():

theme_map <- theme_minimal() +
    theme(
      text = element_text(family = "sans"), # Arial font
      axis.line = element_blank(),
      axis.text = element_blank(),
      axis.ticks = element_blank(),
      axis.title = element_blank(),
      panel.grid = element_blank(),
      plot.background = element_blank(), 
      panel.background = element_blank(), 
      legend.background = element_blank(),
      panel.border = element_blank()
    )

So now we can plot the map using out new theme:

map + theme_map

4 Customise your breaks

So now we have a good looking map. However the continuous scale isn’t very clear.

We can split the scale up to into discrete categories using the cut_number() function. This splits a numeric vector into intervals containing equal number of points. The first argument is the variable we are ‘cutting’ and the next argument is the number of intervals we want.

ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, fill = cut_number(Percentage, 5)), 
               col = "black") +
  coord_equal()+
  theme_map

Our colours now change to the default dicrete colour scale but we will look at how to change these a bit later.

If you want an even bigger of choice of breaks or you want to make your own, we can use classIntervals() function from the classInt package. This gives us commonly used methods for choosing univariate class intervals for mapping or other graphics purposes.

The different methods available are: “fixed”, “sd”, “equal”, “pretty”, “quantile”, “kmeans”, “hclust”, “bclust”, “fisher”, or “jenks”. We can use ?classIntervals to get a full description of the different methods.

Here we are going to use ‘jenks’ breaks. This is a data clustering method designed to determine the best arrangement of values into different classes by reducing the variance within classes and maximising the variance between classes. It’s been checked for consistency with ArcView and ArcGIS.

jenks_breaks <- classIntervals(dummy$Percentage, n=5, style = "jenks")

We can check our breaks like so:

jenks_breaks$brks 
## [1]  1.022535 20.296803 42.692727 62.109793 80.176022 99.800471

If you’re not sure whether the breaks you have created are right for your data, you can compare interval methods to one another. For example, let’s have a look at “equal” breaks:

equal_breaks <- classIntervals(dummy$Percentage, n=5, style = "equal")
equal_breaks$brks
## [1]  1.022535 20.778123 40.533710 60.289297 80.044884 99.800471

We can have a closer look distribution of the variable by plotting a quick histogram of the variable then use abline() in different colours to see the breaks we’ve defined.

vlines <- data.frame(
  jenks = jenks_breaks$brks,
  equal = equal_breaks$brks
) %>% tidyr::gather("method", "brks", jenks:equal)

ggplot(dummy, aes(x = Percentage))+
  geom_histogram(stat = "bin", binwidth = 10)+
  geom_vline(data = vlines, aes(xintercept = brks, color = method))+
  theme_bw()

Here we can see how much data falls into each interval and this may help us to make a better decision.

We’re going to go ahead and use “jenks” breaks for our map within the cut() rather than cut_number():

ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, 
                   fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks, 
                              include.lowest = TRUE)), 
               col = "black") +
  coord_equal()+
  theme_map +
  theme(legend.title = element_blank()) #adding this to temporarily get rid of the legend title

5 Customise the labels

Here we’re going to define custom labels for our categories in the ‘a to b’ format.

labels <- jenks_breaks$brks %>%
  round(0) %>%
  paste0(" to ", lead(.),"%") %>%
  head(-1)

labels
## [1] "1 to 20%"   "20 to 43%"  "43 to 62%"  "62 to 80%"  "80 to 100%"

Here we are taking each of the breaks, rounding them to the nearest whole number and concatenating with “to” and the next break, then removing the last label.

Now we can call our labels in the labels argument within cut():

ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, 
                   fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
                              labels = labels, include.lowest = TRUE)), 
               col = "black") +
  coord_equal()+
  theme_map +
  theme(legend.title = element_blank())

6 Add Colour

We’ll go through 2 different packages you can use to add colour to maps.

6.1 RColorBrewer

This package contains colour palettes to visualise your data. 3 categories of palettes:

  • qualitative palettes - different hues to visualise differences between classes

  • sequential palettes - progress from light to dark (good for interval data)

  • Diverging palettes are composed of darker colors of contrasting hues on the high and low extremes and lighter colors in the middle

More information at: http://moderndata.plot.ly/create-colorful-graphs-in-r-with-rcolorbrewer-and-plotly/

To see all available palettes:

display.brewer.all()

And to return information about palettes including whether suitable for those colourblind:

brewer.pal.info
##          maxcolors category colorblind
## BrBG            11      div       TRUE
## PiYG            11      div       TRUE
## PRGn            11      div       TRUE
## PuOr            11      div       TRUE
## RdBu            11      div       TRUE
## RdGy            11      div      FALSE
## RdYlBu          11      div       TRUE
## RdYlGn          11      div      FALSE
## Spectral        11      div      FALSE
## Accent           8     qual      FALSE
## Dark2            8     qual       TRUE
## Paired          12     qual       TRUE
## Pastel1          9     qual      FALSE
## Pastel2          8     qual      FALSE
## Set1             9     qual      FALSE
## Set2             8     qual       TRUE
## Set3            12     qual      FALSE
## Blues            9      seq       TRUE
## BuGn             9      seq       TRUE
## BuPu             9      seq       TRUE
## GnBu             9      seq       TRUE
## Greens           9      seq       TRUE
## Greys            9      seq       TRUE
## Oranges          9      seq       TRUE
## OrRd             9      seq       TRUE
## PuBu             9      seq       TRUE
## PuBuGn           9      seq       TRUE
## PuRd             9      seq       TRUE
## Purples          9      seq       TRUE
## RdPu             9      seq       TRUE
## Reds             9      seq       TRUE
## YlGn             9      seq       TRUE
## YlGnBu           9      seq       TRUE
## YlOrBr           9      seq       TRUE
## YlOrRd           9      seq       TRUE

We’ll use the function scale_fill_brewer() to call the palette. We can now also give the legend a suitable title.

ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, 
                   fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
                              labels = labels, include.lowest = TRUE)), 
               col = "black") +
  coord_equal()+
  theme_map +
  scale_fill_brewer(palette = "Greens", name = "Percentage")

6.2 viridis

This package brings to R, color scales created by Stéfan van der Walt and Nathaniel Smith for the Python matplotlib library. We can use the colour scales in this package to make plots that are “pretty, better represent your data, easier to read by those with colorblindness, and print well in grey scale”.

More information here: https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html

Here we’ll use the function scale_fill_viridis() to call the palette:

ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, 
                   fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
                              labels = labels, include.lowest = TRUE)), 
               col = "black") +
  coord_equal()+
  theme_map +
  scale_fill_viridis(option = "magma", discrete = TRUE, name = "Percentage")

But maybe you don’t want to use one of these lovely pre-defined palettes and you want to define the colours yourself. You can do this using scale_fill_manual() and define the colours as your values:

dft_colours <- c("#006853", #dark green
                 "#66a498", # green
                 "#d25f15", #orange
                 "#e49f73", #light orange
                 "#c99212") #yellow


ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, 
                   fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
                              labels = labels, include.lowest = TRUE)), 
               col = "black") +
  coord_equal()+
  theme_map +
  scale_fill_manual(values = dft_colours, name = "Percentage")

7 Customise even further….

There are a bunch of other things we can do to customise our map even further.

We can add a title and caption:

ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, 
                   fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
                              labels = labels, include.lowest = TRUE)), 
               col = "black") +
  coord_equal()+
  theme_map +
  scale_fill_manual(values = dft_colours, name = "Percentage")+
  labs(x = NULL, 
       y = NULL, 
       title = "Local Authorities in England", 
       subtitle = "2016 - 2017", 
       caption = "Department for Transport")

We can add to the theme to change the style of text:

ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, 
                   fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
                              labels = labels, include.lowest = TRUE)), 
               col = "black") +
  coord_equal()+
  theme_map +
  scale_fill_manual(values = dft_colours, name = "Percentage")+
  labs(x = NULL, 
       y = NULL, 
       title = "Local Authorities in England", 
       subtitle = "2016 - 2017", 
       caption = "Department for Transport")+
  theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 15, face = "italic"),
        plot.caption = element_text(size = 11),
        legend.title = element_text(size = 11),
        legend.text = element_text(size = 11))

We can do various things to the legend.

We can change the width and height of the legend categories:

ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, 
                   fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
                              labels = labels, include.lowest = TRUE)), 
               col = "black") +
  coord_equal()+
  theme_map +
  scale_fill_manual(values = dft_colours, name = "Percentage",
                    guide = guide_legend(keyheight = unit(4, units = "mm"),
                                         keywidth = unit(8, units = "mm")))+
  labs(x = NULL, 
       y = NULL, 
       title = "Local Authorities in England", 
       subtitle = "2016 - 2017", 
       caption = "Department for Transport")+
  theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 15, face = "italic"),
        plot.caption = element_text(size = 11),
        legend.title = element_text(size = 11),
        legend.text = element_text(size = 11))

We can make it horizontal and put it at the bottom:

ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, 
                   fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
                              labels = labels, include.lowest = TRUE)), 
               col = "black") +
  coord_equal()+
  theme_map +
  scale_fill_manual(values = dft_colours, name = "Percentage",
                    guide = guide_legend(direction = "horizontal"))+
  labs(x = NULL, 
       y = NULL, 
       title = "Local Authorities in England", 
       subtitle = "2016 - 2017", 
       caption = "Department for Transport")+
  theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 15, face = "italic"),
        plot.caption = element_text(size = 11),
        legend.title = element_text(size = 11),
        legend.text = element_text(size = 11),
        legend.position = "bottom")

We can change the position of the legend more precisely using coordinates within the theme:

map2 <- ggplot() + 
  geom_polygon(data = map_data, 
               aes(x = long, y = lat, group = group, 
                   fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
                              labels = labels, include.lowest = TRUE)), 
               col = "black") +
  coord_equal()+
  theme_map +
  scale_fill_manual(values = dft_colours, name = "Percentage",
                    guide = guide_legend(keyheight = unit(4, units = "mm"),
                                         keywidth = unit(8, units = "mm")))+
  labs(x = NULL, 
       y = NULL, 
       title = "Local Authorities in England", 
       subtitle = "2016 - 2017", 
       caption = "Department for Transport")+
  theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 15, face = "italic"),
        plot.caption = element_text(size = 11),
        legend.title = element_text(size = 11),
        legend.text = element_text(size = 11),
        legend.position = c(0.2,0.5))

map2

We can label the local authorities…

First summarise the data to get mean latitude and mean longitude for each LA

LA_labels <- map_data %>% 
  group_by(Local_Authority) %>% 
  summarise_at(vars(long, lat), mean)

head(LA_labels)
## # A tibble: 6 x 3
##   Local_Authority    long     lat
##   <chr>             <dbl>   <dbl>
## 1 Adur            519182. 106774.
## 2 Allerdale       322813. 535915.
## 3 Amber Valley    435629. 348353.
## 4 Arun            500759. 106234.
## 5 Ashfield        449089. 354446.
## 6 Ashford         596184. 139706.

Then we’ll add the labels using geom_text()

map2 + geom_text(data=LA_labels, aes(long, lat, label = Local_Authority), size=3, fontface="bold")

With 326 local authorities, plotting them all at once isn’t a great idea. Instead we can choose a subset of LA labels to plot:

LA_labels_selected <- LA_labels %>% 
  filter(Local_Authority %in% c("Manchester", "Shropshire", "York", "County Durham"))

map2 + geom_text(data=LA_labels_selected, aes(long, lat, label = Local_Authority), size=3, fontface="bold")

Still not very clear. Let’s make them into actual labels instead:

map2 + geom_label(data=LA_labels_selected, aes(long, lat, label = Local_Authority), size=3, fontface="bold")

The labels cover up the area on the map so lets make them more transparent with alpha:

# can change the transparency
map2 +
  geom_label(data=LA_labels_selected, aes(long, lat, label = Local_Authority), alpha= 0.7, size=2, fontface="bold")