country converter

The country converter (coco) is a Python package to convert country names between different classifications and between different naming versions. Internally it uses regular expressions to match country names.

Installation

Download the package and add its folder to your Python path:

import sys

# Folder that contains country_converter.py; adjust to your local copy
_fd = r'S:\coco'
if _fd not in sys.path:
    sys.path.append(_fd)
del _fd

import country_converter as coco

The package depends on pandas; running the tests additionally requires py.test.

Usage

Basic usage

Convert various country names to some standard names:

import country_converter as coco
cc = coco.CountryConverter()

some_names = ['United Rep. of Tanzania', 'Cape Verde', 'Burma', 'Iran (Islamic Republic of)', 'Korea, Republic of', "Dem. People's Rep. of Korea"]

standard_names = cc.convert(names=some_names, src='regex', to='name_short')
print(standard_names)

This results in ['Tanzania', 'Cabo Verde', 'Myanmar', 'Iran', 'South Korea', 'North Korea'].
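To illustrate the idea behind the regular-expression matching, here is a minimal sketch. It is not the package's implementation: the patterns in TOY_PATTERNS and the helper to_short_name are made up for this example, and the expressions shipped with the package differ.

```python
import re

# Hypothetical, simplified patterns (assumption: not the package's own).
TOY_PATTERNS = {
    'Tanzania': r'tanzania',
    'Myanmar': r'myanmar|burma',
    # Matches "korea" only if no hint of the northern state is present.
    'South Korea': r'^(?!.*(dem|north|people)).*korea',
    'North Korea': r'korea.*people|dem.*korea|north.*korea',
}


def to_short_name(name):
    """Return all short names whose pattern matches the given name."""
    return [short for short, pattern in TOY_PATTERNS.items()
            if re.search(pattern, name, flags=re.IGNORECASE)]


for raw in ['United Rep. of Tanzania', 'Burma',
            'Korea, Republic of', "Dem. People's Rep. of Korea"]:
    print(raw, '->', to_short_name(raw))
```

A name is matched unambiguously when exactly one pattern hits, which is why the two Korea patterns are written to exclude each other.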

Convert between classification schemes:

iso3_codes = ['USA', 'VUT', 'TKL', 'AUT']
iso2_codes = cc.convert(names=iso3_codes, src='ISO3', to='ISO2')
print(iso2_codes)

This results in ['US', 'VU', 'TK', 'AT'].
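Conceptually, a conversion between two classification schemes is a lookup in a table with one row per country. The sketch below shows this with plain dictionaries. TOY_TABLE, convert_codes and the not_found parameter are hypothetical names used only for this illustration, not part of the package.

```python
# Toy table: one row per country, one key per classification scheme.
TOY_TABLE = [
    {'name_short': 'United States', 'ISO2': 'US', 'ISO3': 'USA'},
    {'name_short': 'Vanuatu', 'ISO2': 'VU', 'ISO3': 'VUT'},
    {'name_short': 'Tokelau', 'ISO2': 'TK', 'ISO3': 'TKL'},
    {'name_short': 'Austria', 'ISO2': 'AT', 'ISO3': 'AUT'},
]


def convert_codes(codes, src, to, not_found=None):
    """Map each code from scheme `src` to scheme `to` via the toy table."""
    lookup = {row[src]: row[to] for row in TOY_TABLE}
    return [lookup.get(code, not_found) for code in codes]


print(convert_codes(['USA', 'VUT', 'TKL', 'AUT'], src='ISO3', to='ISO2'))
print(convert_codes(['AT', 'XX'], src='ISO2', to='name_short'))
```

Codes that are missing from the table come back as the not_found value instead of raising an error.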

Internally, the data is stored in a pandas DataFrame, which can be accessed directly. For example, it can be used to filter countries by their membership in organisations (per year).

some_countries = ['Australia', 'Belgium', 'Brazil', 'Bulgaria', 'Cyprus', 'Czech Republic', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'India', 'Indonesia', 'Ireland', 'Italy', 'Japan', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Romania', 'Russia',  'Turkey', 'United Kingdom', 'United States']

oecd_since_1995 = cc.data[(cc.data.OECD >= 1995) & cc.data.name_short.isin(some_countries)].name_short
eu_until_1980 = cc.data[(cc.data.EU <= 1980) & cc.data.name_short.isin(some_countries)].name_short
print(oecd_since_1995)
print(eu_until_1980)
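The comparisons above suggest that the membership columns hold a year per country. The following sketch makes that assumption explicit on a small hand-written table: each membership column holds the year of accession and is empty (NaN) for non-members. The values are illustrative and not taken from the package's data file.

```python
import pandas as pd

# Assumption: membership columns store the accession year, NaN if not a member.
toy = pd.DataFrame({
    'name_short': ['Germany', 'Denmark', 'Czech Republic', 'Brazil'],
    'OECD': [1961, 1961, 1995, float('nan')],
    'EU': [1958, 1973, 2004, float('nan')],
})

# Comparisons with NaN evaluate to False, so non-members drop out.
joined_oecd_1995_or_later = toy[toy.OECD >= 1995].name_short.tolist()
joined_eu_by_1980 = toy[toy.EU <= 1980].name_short.tolist()

print(joined_oecd_1995_or_later)
print(joined_eu_by_1980)
```

Under this reading, `>=` selects countries that joined in or after the given year, and `<=` selects countries that were already members by that year.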

Some properties provide direct access to affiliations:

cc.EU28
cc.OECD

cc.EU27in('ISO3')

and the classification schemes available:

cc.valid_class

The regular expressions can also be used to match any list of countries to any other. For example:

match_these = ['norway', 'united_states', 'china', 'taiwan']
master_list = ['USA', 'The Swedish Kingdom', 'Norway is a Kingdom too', 'Peoples Republic of China', 'Republic of China']

matching_dict = coco.match(match_these, master_list)
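One way to picture such a matching is to map both lists onto a common standard name and then pair the entries that agree. The sketch below does this with a few made-up patterns. TOY_PATTERNS, standard_name and match_lists are hypothetical, and the structure returned by coco.match may differ from the dictionary built here.

```python
import re

# Hypothetical, simplified patterns (assumption: not the package's own).
TOY_PATTERNS = {
    'Norway': r'norway',
    'Sweden': r'swed',
    'United States': r'united.?states|^usa?$',
    'China': r'^(?!republic of china$).*china',
    'Taiwan': r'taiwan|^republic of china$',
}


def standard_name(name):
    """Return the single standard name matching `name`, else None."""
    hits = [std for std, pattern in TOY_PATTERNS.items()
            if re.search(pattern, name, flags=re.IGNORECASE)]
    return hits[0] if len(hits) == 1 else None


def match_lists(match_these, master_list):
    """Pair each entry of match_these with the master entry for the same country."""
    master_by_std = {}
    for entry in master_list:
        std = standard_name(entry)
        if std is not None:
            master_by_std[std] = entry
    return {name: master_by_std.get(standard_name(name)) for name in match_these}


pairs = match_lists(
    ['norway', 'united_states', 'china', 'taiwan'],
    ['USA', 'The Swedish Kingdom', 'Norway is a Kingdom too',
     'Peoples Republic of China', 'Republic of China'])
print(pairs)
```

Entries without a counterpart in the master list map to None in this sketch.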

See the IPython Notebook (country_converter_examples.ipynb) for more information.

Refining and Extending

The underlying raw data is a tab-separated file which is read into a pandas dataframe (available as attribute .data in the main class). Any column added to this dataframe can be used for all conversions. The tab-separated datafile is utf-8 encoded.
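As a rough illustration of this workflow, the sketch below reads a small tab-separated table into a pandas dataframe, adds a column and uses it as a conversion target. The table content and the column name my_region are invented for the example; the package's data file has more columns and rows.

```python
import io

import pandas as pd

# Stand-in for the tab-separated data file (assumption: toy content only).
TOY_TSV = (
    "name_short\tISO2\tISO3\n"
    "Austria\tAT\tAUT\n"
    "Vanuatu\tVU\tVUT\n"
)

data = pd.read_csv(io.StringIO(TOY_TSV), sep='\t')

# Add a custom classification as a new column (hypothetical name).
data['my_region'] = ['Europe', 'Oceania']

# Any pair of columns can now serve as source and target of a lookup.
iso3_to_region = data.set_index('ISO3')['my_region'].to_dict()
print(iso3_to_region)
```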

The included regular expressions were tested against names commonly found in various databases. If the expressions need to be updated, I recommend rerunning all tests (using the py.test package).

These tests check:

Do the short names uniquely match the regular expressions?
Do the official names uniquely match the regular expressions?
Do the alternative names tested so far still uniquely match the standard names?

To specify a new test set, add a tab-separated file with the headers "name_short" and "name_test". Each row holds one pair: the short name (as given in the main classification file) and the alternative name to be tested. If the file name starts with "test_regex_", the test functions recognise it automatically.
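The sketch below shows what such a test set could look like and how a uniqueness check over it might work. It is an illustration under assumptions: the patterns, the in-memory file content and the helper failing_pairs are hypothetical and do not reproduce the package's test functions.

```python
import csv
import io
import re

# Hypothetical, simplified patterns (assumption: not the package's own).
TOY_PATTERNS = {
    'Myanmar': r'myanmar|burma',
    'Cabo Verde': r'cabo.?verde|cape.?verde',
}

# Content of a test file such as "test_regex_example.txt" (made-up name).
TEST_FILE_CONTENT = (
    "name_short\tname_test\n"
    "Myanmar\tBurma\n"
    "Cabo Verde\tCape Verde\n"
)


def failing_pairs(handle):
    """Return rows whose test name does not match exactly the expected short name."""
    failures = []
    for row in csv.DictReader(handle, delimiter='\t'):
        hits = [short for short, pattern in TOY_PATTERNS.items()
                if re.search(pattern, row['name_test'], flags=re.IGNORECASE)]
        if hits != [row['name_short']]:
            failures.append((row['name_short'], row['name_test'], hits))
    return failures


print(failing_pairs(io.StringIO(TEST_FILE_CONTENT)))
```

An empty list means every alternative name matched its short name and no other.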

Classification schemes

Currently the following classification schemes are available:

ISO2 (ISO 3166-1 alpha-2)
ISO3 (ISO 3166-1 alpha-3)
ISO - numeric (ISO 3166-1 numeric)
UN numeric code (which largely follows ISO - numeric)
A standard or short name
The "official" name
Continent
UN region
EXIOBASE 1 classification
EXIOBASE 2 classification
WIOD classification
OECD membership (per year)
UN membership (per year)
EU membership (per year)

Data sources and further reading

Most of the underlying data can be found on Wikipedia; https://en.wikipedia.org/wiki/ISO_3166-1 is a good starting point. UN regions and codes are given on the web page of the United Nations Statistics Division (unstats). The EXIOBASE and WIOD classifications were extracted from the respective databases. The membership of OECD, UN and EU can be found on the web pages of the respective organisations.

Acknowledgements

This package was inspired by the R-package countrycode by Julian Hinz and its port to Python (pycountrycode) by Vincent Arel-Bundock; the regular expressions are mostly based on these packages.
