countrycode standardizes country names, converts them into ~40 different coding schemes, and assigns region descriptors. Scroll down for more details or visit the countrycode CRAN page.
If you use countrycode in your research, we would be very grateful if you could cite our paper:
Arel-Bundock, Vincent, Nils Enevoldsen, and CJ Yetman (2018). countrycode: An R package to convert country names and country codes. Journal of Open Source Software, 3(28), 848, https://doi.org/10.21105/joss.00848
- Why countrycode?
- Installation
- Supported codes
- countrycode
- Convert a single name or code
- Vectors and data.frames
- Flags
- Country names in 600+ different languages and formats
- custom_dict: American states
- custom_dict: the ISOcodes package
- destination: Fallback codes
- nomatch: Fill in missing codes manually
- custom_match: Override default values
- warn: Silence warnings
- countryname: Convert country names from any language
- Custom conversion functions and "crosswalks"
- Contributions
Different data sources use different coding schemes to represent countries (e.g. CoW or ISO). This poses two main problems: (1) some of these coding schemes are less than intuitive, and (2) merging these data requires converting from one coding scheme to another, or from long country names to a coding scheme.
The countrycode function can convert to and from 40+ different country coding schemes, and to 600+ variants of country names in different languages and formats. It uses regular expressions to convert long country names (e.g. Sri Lanka) into any of those coding schemes or country names. It can also create new variables with various regional groupings.
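To illustrate the idea of regex-based name matching, here is a minimal base R sketch. It is not the package's implementation: the `toy_dict` patterns and the `toy_convert` helper are hypothetical and exist only for this example.

```r
# Toy dictionary: one regular expression per country.
# These patterns are illustrative assumptions, not the ones shipped with countrycode.
toy_dict <- data.frame(
  regex = c("sri.?lanka|ceylon", "algeri", "albani"),
  iso3c = c("LKA", "DZA", "ALB"),
  stringsAsFactors = FALSE
)

# Return the code of the single pattern that matches each name, or NA
# when zero or several patterns match.
toy_convert <- function(x, dict) {
  vapply(x, function(name) {
    hits <- which(vapply(dict$regex, grepl, logical(1),
                         x = name, ignore.case = TRUE, perl = TRUE))
    if (length(hits) == 1) dict$iso3c[hits] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

toy_convert(
  c("Sri Lanka", "Democratic Socialist Republic of Sri Lanka", "Ceylon", "Atlantis"),
  toy_dict
)
```

Because matching relies on patterns rather than exact strings, long official names and historical names can resolve to the same code, while unknown names yield NA.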
From the R console, type:
install.packages("countrycode")
To install the latest development version, you can use the remotes package:
library(remotes)
install_github('vincentarelbundock/countrycode')
To get an up-to-date list of supported country codes, install the package and type ?codelist. These include:
- 600+ variants of country names in different languages and formats.
- AR5
- Continent and region identifiers.
- Correlates of War (numeric and character)
- European Central Bank
- EUROCONTROL - The European Organisation for the Safety of Air Navigation
- Eurostat
- Federal Information Processing Standard (FIPS)
- Food and Agriculture Organization of the United Nations
- Global Administrative Unit Layers (GAUL)
- Geopolitical Entities, Names and Codes (GENC)
- Gleditsch & Ward (numeric and character)
- International Civil Aviation Organization
- International Monetary Fund
- International Olympic Committee
- ISO (2/3-character and numeric)
- Polity IV
- United Nations
- United Nations Procurement Division
- Varieties of Democracy
- World Bank
- World Values Survey
- Unicode symbols (flags)
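As a rough way to explore the supported codes from the console, the sketch below inspects the column names of the `codelist` data frame, which correspond to the names accepted by the `destination` argument. It assumes the package is installed, and the selection of columns shown is just an example.

```r
library(countrycode)

# Every column of `codelist` is a code or name variant
codes <- names(codelist)
length(codes)

# Check that a few schemes of interest are available
c("iso3c", "cown", "continent", "unicode.symbol") %in% codes

# Look at one country across several schemes
codelist[codelist$iso3c %in% "CAN",
         c("country.name.en", "iso2c", "iso3n", "cown", "continent")]
```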
Load library:
library(countrycode)
Convert single country codes:
# ISO to Correlates of War
countrycode('DZA', origin = 'iso3c', destination = 'cown')
[1] 615
# English to ISO
countrycode('Albania', origin = 'country.name', destination = 'iso3c')
[1] "ALB"
# German or Italian to Arabic
countrycode(c('Algerien', 'Albanien'), origin = 'country.name.de', destination = 'un.name.ar')
[1] "الجزائر" "ألبانيا"
countrycode(c('Moldavia', 'Stati Uniti'), origin = 'country.name.it', destination = 'un.name.ar')
[1] "جمهورية مولدوفا" "الولايات المتحدة الأمريكية"
> cowcodes <- c("ALG", "ALB", "UKG", "CAN", "USA")
> countrycode(cowcodes, origin = "cowc", destination = "iso3c")
[1] "DZA" "ALB" "GBR" "CAN" "USA"
Generate vectors and two data frames that share no common identifier (so the two data frames cannot be merged directly):
> isocodes <- c(12,8,826,124,840)
> var1 <- sample(1:500,5)
> var2 <- sample(1:500,5)
> df1 <- data.frame(cowcodes,var1)
> df2 <- data.frame(isocodes,var2)
Inspect the data:
> df1
cowcodes var1
1 ALG 71
2 ALB 427
3 UKG 180
4 CAN 21
5 USA 383
> df2
isocodes var2
1 12 238
2 8 329
3 826 463
4 124 437
5 840 26
Create a common variable with the iso3c code in each data frame, merge the data, and create a country identifier:
> df1$iso3c <- countrycode(df1$cowcodes, origin = "cowc", destination = "iso3c")
> df2$iso3c <- countrycode(df2$isocodes, origin = "iso3n", destination = "iso3c")
> df3 <- merge(df1, df2, by = "iso3c")
> df3$country <- countrycode(df3$iso3c, origin = "iso3c", destination = "country.name")
> df3
iso3c cowcodes var1 isocodes var2 country
1 ALB ALB 427 8 329 Albania
2 CAN CAN 21 124 437 Canada
3 DZA ALG 71 12 238 Algeria
4 GBR UKG 180 826 463 United Kingdom
5 USA USA 383 840 26 United States
countrycode can convert country names and codes to unicode flags. For example, we can use the gt package to draw a table with countries and their corresponding flags:
library(gt)
library(countrycode)
Countries <- c('Canada', 'Germany', 'Thailand', 'Algeria', 'Eritrea')
Flags <- countrycode(Countries, 'country.name', 'unicode.symbol')
dat <- data.frame(Countries, Flags)
gt(dat)
This produces a table that lists each country alongside its flag.
Note that embedding unicode characters in R graphics is possible, but it can be tricky. If your output looks like \U0001f1e6\U0001f1f6, then you could try feeding it to this function: utf8::utf8_print(). That should cover a lot of cases without dipping into the complexity of graphics devices. As a rule of thumb, if your output looks like □□□□ (boxes), things tend to get more complicated. In that case, you'll have to think about different output devices, file viewers, and/or file formats (e.g., 'SVG' or 'HTML').
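A short sketch of that workaround, assuming the utf8 package is installed. Whether the flag actually renders still depends on your console and font.

```r
library(countrycode)

flag <- countrycode("Canada", origin = "country.name", destination = "unicode.symbol")

# On some platforms this prints escape sequences instead of the flag
print(flag)

# utf8_print() tries to display the characters themselves
utf8::utf8_print(flag)

# A flag is a pair of regional indicator code points
utf8ToInt(flag)
```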
Since inserting unicode symbols into R graphics is not a countrycode-specific issue, we won't be able to offer any more support than this. Good luck!
The Unicode organisation hosts the CLDR project, which publishes many variants of country names. For each language/culture locale, there is a full set of names, plus possible 'alt-short' or 'alt-variant' variations of specific country names.
> countrycode('United States of America', origin = 'country.name', destination = 'cldr.name.en')
[1] "United States"
> countrycode('United States of America', origin = 'country.name', destination = 'cldr.short.en')
[1] "US"
To see a full list of country name variants available, inspect this data.frame:
> head(countrycode::cldr_examples)
Code Example
1 cldr.name.af Franse Suidelike Gebiede
2 cldr.name.agq TF
3 cldr.name.ak TF
4 cldr.name.am የፈረንሳይ ደቡባዊ ግዛቶች
5 cldr.name.ar الأقاليم الجنوبية الفرنسية
6 cldr.name.ar_ly الأقاليم الجنوبية الفرنسية
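As an illustration, the sketch below filters `cldr_examples` for codes that belong to one locale and then uses one of them as a destination. The French locale is an arbitrary choice for this example.

```r
library(countrycode)

# Find the CLDR codes that end in the French locale suffix
fr_codes <- grep("\\.fr$", cldr_examples$Code, value = TRUE)
fr_codes

# Use one of these codes as a destination
countrycode(c("Germany", "Algeria"), origin = "country.name", destination = "cldr.name.fr")
```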
Since version 0.19, countrycode accepts user-supplied dictionaries via the custom_dict argument. These dictionaries will override the built-in country code dictionary. For example, the countrycode GitHub repository includes a dictionary of regexes and abbreviations to work with US state names.
Load the library and download the custom dictionary data.frame:
library(countrycode)
url = "https://raw.githubusercontent.com/vincentarelbundock/countrycode/master/data/custom_dictionaries/us_states.csv"
state_dict = read.csv(url, stringsAsFactors=FALSE)
Convert:
countrycode('State of Alabama',
origin = 'state',
destination = 'abbreviation',
custom_dict = state_dict,
origin_regex = TRUE)
[1] "AL"
countrycode(c('MI', 'OH', 'Bad'), 'abbreviation', 'state', custom_dict=state_dict)
[1] "Michigan" "Ohio" NA
Note that if you use a custom dictionary with country codes, you could easily merge it into the countrycode::codelist or countrycode::codelist_panel to gain access to all other codes.
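One possible way to do such a merge is sketched below. The `my_dict` data frame and its `my_group` column are hypothetical and made up for illustration. Only a few built-in columns are kept to keep the example small.

```r
library(countrycode)

# Hypothetical user dictionary keyed on ISO3 codes
my_dict <- data.frame(
  iso3c    = c("DZA", "ALB", "CAN"),
  my_group = c("group_a", "group_b", "group_b"),
  stringsAsFactors = FALSE
)

# Keep a few built-in columns and join them on the shared iso3c key
builtin <- as.data.frame(codelist[, c("iso3c", "country.name.en", "continent")])
merged  <- merge(my_dict, builtin, by = "iso3c")

# The merged dictionary gives access to both the custom and the built-in columns
countrycode(c("DZA", "CAN"), origin = "iso3c", destination = "my_group", custom_dict = merged)
countrycode("ALB", origin = "iso3c", destination = "continent", custom_dict = merged)
```

The inner join keeps only the countries present in the custom dictionary, so the origin column stays free of duplicates and missing values.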
countrycode already supports ISO4217 (currencies) and ISO3166 (country codes). The ISOcodes package supplies other codes, including ISO15924 (language writing systems), ISO639 (languages), and ISO8859 (computer character encodings). Users can convert those codes using countrycode's custom_dict argument.
For example, the ISOcodes::ISO_639_2 dataframe includes 4 columns: Alpha_3_B, Alpha_3_T, Alpha_2, and Name. We can convert language names like this:
> countrycode('abk', 'Alpha_3_B', 'Name', custom_dict = ISOcodes::ISO_639_2)
[1] "Abkhazian"
The ISOcodes::ISO_8859 dataset is a 3-dimensional array where the second dimension represents the character encoding. We take the subset of ISO_8859_1 codes and convert the dict to a dataframe for use in countrycode's custom_dict argument:
library(ISOcodes)
dict