This document will demonstrate how to produce choropleth maps in R that can be customised in (almost) any way that you want. We’ll be mainly be using the ggplot2 package and as well as a few others.
First we want to load in the packages we’ll be using. readr
and dplyr
are for reading and manipulating data. rgdal
is to read in our shapefiles, ggplot2
to plot. We can use classInt
to define intervals in our data and add colours to our map using RcolorBrewer
or viridis
.
library(readr) # for reading and writing CSV files
library(dplyr) # for working with data
library(broom) #summarising data into a tibble
library(rgdal) # Reprojects data between co-ordinate systems (e.g. lat/long to BNG)
library(classInt) # defines intervals for data sets
library(ggplot2) # for plotting the map
library(RColorBrewer) # defines colour palettes
library(viridis) # more colour palettes
We are going to be plotting England local authority data so will be using the corresponding shapefiles. We’ll load them using the rgdal
package. Notice we don’t include the file extension:
LAs <- rgdal::readOGR("Shapefiles","England_LAUA_2016")
## OGR data source with driver: ESRI Shapefile
## Source: "\\virago\ts\LON3\IAVBULIM\Desktop\20181010_Choropleth_Maps_in_R\Shapefiles", layer: "England_LAUA_2016"
## with 326 features
## It has 21 fields
This loads in as a large spatial polygons data frame. We can use plot()
to see what it looks like:
plot(LAs)
Next we’ll load in the data we’ll be plotting. This is just some dummy data I created:
dummy <- read_csv("Data/LA_dummy_data.csv")
head(dummy)
## # A tibble: 6 x 3
## LA_Code Local_Authority Percentage
## <chr> <chr> <dbl>
## 1 E92000001 ENGLAND 80.2
## 2 E12000001 North East 45.2
## 3 E06000047 County Durham 88.1
## 4 E06000005 Darlington 77.0
## 5 E06000001 Hartlepool 16.0
## 6 E06000002 Middlesbrough 41.7
Now in order to plot the shapefile using ggplot2, we need to convert it from a spatial polygons data frame into a ggplot2-compatible data frame
tidy_LAs <- broom::tidy(LAs, region = "LAD16CD")
head(tidy_LAs)
## # A tibble: 6 x 7
## long lat order hole piece group id
## <dbl> <dbl> <int> <lgl> <chr> <chr> <chr>
## 1 447097 537152 1 FALSE 1 E06000001.1 E06000001
## 2 447229. 537033. 2 FALSE 1 E06000001.1 E06000001
## 3 447281. 537120. 3 FALSE 1 E06000001.1 E06000001
## 4 447378. 537095. 4 FALSE 1 E06000001.1 E06000001
## 5 447455. 537024. 5 FALSE 1 E06000001.1 E06000001
## 6 447550. 537078. 6 FALSE 1 E06000001.1 E06000001
The region argument is the variable we are using to split up the areas. Here it is “LAD16CD” which is the Local Authority code. This is turned into an id variable.
Now we will join our data with the converted shapefile so that we can plot it:
map_data <- tidy_LAs %>%
left_join(dummy, by = c("id" = "LA_Code"))
head(map_data)
## # A tibble: 6 x 9
## long lat order hole piece group id Local_Authority Percentage
## <dbl> <dbl> <int> <lgl> <chr> <chr> <chr> <chr> <dbl>
## 1 447097 5.37e5 1 FALSE 1 E0600~ E060~ Hartlepool 16.0
## 2 447229. 5.37e5 2 FALSE 1 E0600~ E060~ Hartlepool 16.0
## 3 447281. 5.37e5 3 FALSE 1 E0600~ E060~ Hartlepool 16.0
## 4 447378. 5.37e5 4 FALSE 1 E0600~ E060~ Hartlepool 16.0
## 5 447455. 5.37e5 5 FALSE 1 E0600~ E060~ Hartlepool 16.0
## 6 447550. 5.37e5 6 FALSE 1 E0600~ E060~ Hartlepool 16.0
Using geom_polygon
, we’ll plot the latitude against the longitude and then colour the map by the “Percentage” variable:
map <- ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group, fill = Percentage),
col = "black")
map
Since we’re plotting a polygon with multiple rows of data, we need to overide the default ‘group’ and map it to a variable that has a different value for each group. In this case, when we converted our shapefile, a ‘group’ variable was created.
Great, we have a map! We don’t even need to define a colour scale as it’s using the ggplot2 default continuous colour sclae.
It looks a bit distorted though. We can fix this by using coord_equal()
to force a 1:1 aspect ratio
map <- map + coord_equal()
map
Okay, we have a good aspect ratio now, but we have a horrible grey background and axes that we don’t really want. We can use the theme()
function to set these things to blank:
map + theme(panel.background = element_blank(), # get rid of grey background
axis.line = element_blank(), # get rid of axes lines
axis.title = element_blank(), # get rid of axes labels
axis.text = element_blank(), # get rid of the axes text
axis.ticks = element_blank()) # get rid of the tick marks
There are many elements we can use within the theme function. If there are standard things we want in all of our maps, we can create a function so that we don’t need to call of these elements each time. We’ll base this function on the built-in theme_minimal()
:
theme_map <- theme_minimal() +
theme(
text = element_text(family = "sans"), # Arial font
axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
plot.background = element_blank(),
panel.background = element_blank(),
legend.background = element_blank(),
panel.border = element_blank()
)
So now we can plot the map using out new theme:
map + theme_map
So now we have a good looking map. However the continuous scale isn’t very clear.
We can split the scale up to into discrete categories using the cut_number()
function. This splits a numeric vector into intervals containing equal number of points. The first argument is the variable we are ‘cutting’ and the next argument is the number of intervals we want.
ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group, fill = cut_number(Percentage, 5)),
col = "black") +
coord_equal()+
theme_map
Our colours now change to the default dicrete colour scale but we will look at how to change these a bit later.
If you want an even bigger of choice of breaks or you want to make your own, we can use classIntervals()
function from the classInt package. This gives us commonly used methods for choosing univariate class intervals for mapping or other graphics purposes.
The different methods available are: “fixed”, “sd”, “equal”, “pretty”, “quantile”, “kmeans”, “hclust”, “bclust”, “fisher”, or “jenks”. We can use ?classIntervals
to get a full description of the different methods.
Here we are going to use ‘jenks’ breaks. This is a data clustering method designed to determine the best arrangement of values into different classes by reducing the variance within classes and maximising the variance between classes. It’s been checked for consistency with ArcView and ArcGIS.
jenks_breaks <- classIntervals(dummy$Percentage, n=5, style = "jenks")
We can check our breaks like so:
jenks_breaks$brks
## [1] 1.022535 20.296803 42.692727 62.109793 80.176022 99.800471
If you’re not sure whether the breaks you have created are right for your data, you can compare interval methods to one another. For example, let’s have a look at “equal” breaks:
equal_breaks <- classIntervals(dummy$Percentage, n=5, style = "equal")
equal_breaks$brks
## [1] 1.022535 20.778123 40.533710 60.289297 80.044884 99.800471
We can have a closer look distribution of the variable by plotting a quick histogram of the variable then use abline()
in different colours to see the breaks we’ve defined.
vlines <- data.frame(
jenks = jenks_breaks$brks,
equal = equal_breaks$brks
) %>% tidyr::gather("method", "brks", jenks:equal)
ggplot(dummy, aes(x = Percentage))+
geom_histogram(stat = "bin", binwidth = 10)+
geom_vline(data = vlines, aes(xintercept = brks, color = method))+
theme_bw()
Here we can see how much data falls into each interval and this may help us to make a better decision.
We’re going to go ahead and use “jenks” breaks for our map within the cut()
rather than cut_number()
:
ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group,
fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
include.lowest = TRUE)),
col = "black") +
coord_equal()+
theme_map +
theme(legend.title = element_blank()) #adding this to temporarily get rid of the legend title
Here we’re going to define custom labels for our categories in the ‘a to b’ format.
labels <- jenks_breaks$brks %>%
round(0) %>%
paste0(" to ", lead(.),"%") %>%
head(-1)
labels
## [1] "1 to 20%" "20 to 43%" "43 to 62%" "62 to 80%" "80 to 100%"
Here we are taking each of the breaks, rounding them to the nearest whole number and concatenating with “to” and the next break, then removing the last label.
Now we can call our labels in the labels argument within cut()
:
ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group,
fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
labels = labels, include.lowest = TRUE)),
col = "black") +
coord_equal()+
theme_map +
theme(legend.title = element_blank())
We’ll go through 2 different packages you can use to add colour to maps.
This package contains colour palettes to visualise your data. 3 categories of palettes:
qualitative palettes - different hues to visualise differences between classes
sequential palettes - progress from light to dark (good for interval data)
Diverging palettes are composed of darker colors of contrasting hues on the high and low extremes and lighter colors in the middle
More information at: http://moderndata.plot.ly/create-colorful-graphs-in-r-with-rcolorbrewer-and-plotly/
To see all available palettes:
display.brewer.all()
And to return information about palettes including whether suitable for those colourblind:
brewer.pal.info
## maxcolors category colorblind
## BrBG 11 div TRUE
## PiYG 11 div TRUE
## PRGn 11 div TRUE
## PuOr 11 div TRUE
## RdBu 11 div TRUE
## RdGy 11 div FALSE
## RdYlBu 11 div TRUE
## RdYlGn 11 div FALSE
## Spectral 11 div FALSE
## Accent 8 qual FALSE
## Dark2 8 qual TRUE
## Paired 12 qual TRUE
## Pastel1 9 qual FALSE
## Pastel2 8 qual FALSE
## Set1 9 qual FALSE
## Set2 8 qual TRUE
## Set3 12 qual FALSE
## Blues 9 seq TRUE
## BuGn 9 seq TRUE
## BuPu 9 seq TRUE
## GnBu 9 seq TRUE
## Greens 9 seq TRUE
## Greys 9 seq TRUE
## Oranges 9 seq TRUE
## OrRd 9 seq TRUE
## PuBu 9 seq TRUE
## PuBuGn 9 seq TRUE
## PuRd 9 seq TRUE
## Purples 9 seq TRUE
## RdPu 9 seq TRUE
## Reds 9 seq TRUE
## YlGn 9 seq TRUE
## YlGnBu 9 seq TRUE
## YlOrBr 9 seq TRUE
## YlOrRd 9 seq TRUE
We’ll use the function scale_fill_brewer()
to call the palette. We can now also give the legend a suitable title.
ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group,
fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
labels = labels, include.lowest = TRUE)),
col = "black") +
coord_equal()+
theme_map +
scale_fill_brewer(palette = "Greens", name = "Percentage")
This package brings to R, color scales created by Stéfan van der Walt and Nathaniel Smith for the Python matplotlib library. We can use the colour scales in this package to make plots that are “pretty, better represent your data, easier to read by those with colorblindness, and print well in grey scale”.
More information here: https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html
Here we’ll use the function scale_fill_viridis()
to call the palette:
ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group,
fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
labels = labels, include.lowest = TRUE)),
col = "black") +
coord_equal()+
theme_map +
scale_fill_viridis(option = "magma", discrete = TRUE, name = "Percentage")
But maybe you don’t want to use one of these lovely pre-defined palettes and you want to define the colours yourself. You can do this using scale_fill_manual()
and define the colours as your values:
dft_colours <- c("#006853", #dark green
"#66a498", # green
"#d25f15", #orange
"#e49f73", #light orange
"#c99212") #yellow
ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group,
fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
labels = labels, include.lowest = TRUE)),
col = "black") +
coord_equal()+
theme_map +
scale_fill_manual(values = dft_colours, name = "Percentage")
There are a bunch of other things we can do to customise our map even further.
We can add a title and caption:
ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group,
fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
labels = labels, include.lowest = TRUE)),
col = "black") +
coord_equal()+
theme_map +
scale_fill_manual(values = dft_colours, name = "Percentage")+
labs(x = NULL,
y = NULL,
title = "Local Authorities in England",
subtitle = "2016 - 2017",
caption = "Department for Transport")
We can add to the theme to change the style of text:
ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group,
fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
labels = labels, include.lowest = TRUE)),
col = "black") +
coord_equal()+
theme_map +
scale_fill_manual(values = dft_colours, name = "Percentage")+
labs(x = NULL,
y = NULL,
title = "Local Authorities in England",
subtitle = "2016 - 2017",
caption = "Department for Transport")+
theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 15, face = "italic"),
plot.caption = element_text(size = 11),
legend.title = element_text(size = 11),
legend.text = element_text(size = 11))
We can do various things to the legend.
We can change the width and height of the legend categories:
ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group,
fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
labels = labels, include.lowest = TRUE)),
col = "black") +
coord_equal()+
theme_map +
scale_fill_manual(values = dft_colours, name = "Percentage",
guide = guide_legend(keyheight = unit(4, units = "mm"),
keywidth = unit(8, units = "mm")))+
labs(x = NULL,
y = NULL,
title = "Local Authorities in England",
subtitle = "2016 - 2017",
caption = "Department for Transport")+
theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 15, face = "italic"),
plot.caption = element_text(size = 11),
legend.title = element_text(size = 11),
legend.text = element_text(size = 11))
We can make it horizontal and put it at the bottom:
ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group,
fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
labels = labels, include.lowest = TRUE)),
col = "black") +
coord_equal()+
theme_map +
scale_fill_manual(values = dft_colours, name = "Percentage",
guide = guide_legend(direction = "horizontal"))+
labs(x = NULL,
y = NULL,
title = "Local Authorities in England",
subtitle = "2016 - 2017",
caption = "Department for Transport")+
theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 15, face = "italic"),
plot.caption = element_text(size = 11),
legend.title = element_text(size = 11),
legend.text = element_text(size = 11),
legend.position = "bottom")
We can change the position of the legend more precisely using coordinates within the theme:
map2 <- ggplot() +
geom_polygon(data = map_data,
aes(x = long, y = lat, group = group,
fill = cut(x = Percentage, n = 5, breaks = jenks_breaks$brks,
labels = labels, include.lowest = TRUE)),
col = "black") +
coord_equal()+
theme_map +
scale_fill_manual(values = dft_colours, name = "Percentage",
guide = guide_legend(keyheight = unit(4, units = "mm"),
keywidth = unit(8, units = "mm")))+
labs(x = NULL,
y = NULL,
title = "Local Authorities in England",
subtitle = "2016 - 2017",
caption = "Department for Transport")+
theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 15, face = "italic"),
plot.caption = element_text(size = 11),
legend.title = element_text(size = 11),
legend.text = element_text(size = 11),
legend.position = c(0.2,0.5))
map2
We can label the local authorities…
First summarise the data to get mean latitude and mean longitude for each LA
LA_labels <- map_data %>%
group_by(Local_Authority) %>%
summarise_at(vars(long, lat), mean)
head(LA_labels)
## # A tibble: 6 x 3
## Local_Authority long lat
## <chr> <dbl> <dbl>
## 1 Adur 519182. 106774.
## 2 Allerdale 322813. 535915.
## 3 Amber Valley 435629. 348353.
## 4 Arun 500759. 106234.
## 5 Ashfield 449089. 354446.
## 6 Ashford 596184. 139706.
Then we’ll add the labels using geom_text()
map2 + geom_text(data=LA_labels, aes(long, lat, label = Local_Authority), size=3, fontface="bold")
With 326 local authorities, plotting them all at once isn’t a great idea. Instead we can choose a subset of LA labels to plot:
LA_labels_selected <- LA_labels %>%
filter(Local_Authority %in% c("Manchester", "Shropshire", "York", "County Durham"))
map2 + geom_text(data=LA_labels_selected, aes(long, lat, label = Local_Authority), size=3, fontface="bold")
Still not very clear. Let’s make them into actual labels instead:
map2 + geom_label(data=LA_labels_selected, aes(long, lat, label = Local_Authority), size=3, fontface="bold")
The labels cover up the area on the map so lets make them more transparent with alpha:
# can change the transparency
map2 +
geom_label(data=LA_labels_selected, aes(long, lat, label = Local_Authority), alpha= 0.7, size=2, fontface="bold")