A pandas-based python package for programatically requesting data from the U.S. Bureau of Economic Analysis (BEA).
BEApy
is registered on PyPI, so just use pip to install:
pip install beapy
or
python3 -m pip install beapy
The BEA organizes its data into the following datasets:
- National Income & Product Accounts (NIPA)
- National Income Underlying Detail (NIUnderlyingDetail)
- Multinational Enterprises (MNE)
- Standard Fixed Assets (FixedAssets)
- International Transaction Accounts (ITA)
- International Investment Position (IIP)
- Input-Output Data (InputOutput)
- International Services Trade (IntlServTrade)
- GDP by Industry (GDPByIndustry)
- Regional Economic Data (Regional)
- Underlying GDP by Industry (UnderlyingGDPByIndustry)
The terms in parantheses above are the labels to use when accessing the corresponding dataset using BEApy
.
We first initialize an instance of the beapy.BEA
class, providing it with your personal BEA API key (request a free key here). Then provide the dataset name, table name, and data frequency:
>>> import beapy
>>> bea = beapy.BEA(key=your_personal_api_key)
>>> res = bea.data('nipa', tablename='t10101', frequency='a') # DataResponse
This would return a DataResponse
object that stores the annual data of table 't10101' in its .data
property, and the associated metadata in .metadata
, both of which are pandas
DataFrames. (See below for how to assign more intuitive, human-readable, names to the tables). The bea.data()
method's keywords necessary to construct a valid API call vary from dataset to dataset; the full list for each one can be retrieved using thebea.parameter_list(dataset)
method.
This second example retrieves monthly underlying data for 2015 & 2016:
>>> res = bea.data('underlying', tablename='u70205s', frequency='m', year=['2015', '2016'])
>>> res.data
ALLO03 AUTO35 AUTO40 ... UPR02 UPR03 UPR05
2015M01 45.83 3726400.0 3542200.0 ... 28471.0 26743.0 24139.0
2015M02 46.03 3619700.0 3481400.0 ... 28797.0 26946.0 24214.0
2015M03 45.77 3924700.0 3740500.0 ... 29038.0 26982.0 24199.0
... ... ... ... ... ... ... ...
2016M10 45.63 3382600.0 3173400.0 ... 29132.0 27052.0 24529.0
2016M11 44.85 3450200.0 3126100.0 ... 28635.0 26812.0 24411.0
2016M12 45.47 3447900.0 3216700.0 ... 28492.0 26897.0 24466.0
[24 rows x 44 columns]
Note the index is not formatted as a pandas
DatetimeIndex, as might be expected. This is because the BEA API can return data of multiple frequencies within the same request, formatting the periods as yyyy
, yyyyQq
, and yyyyMmm
for annual, quarterly, and monthly data, respectively. Casting these to Datetimes would mean, for example, '2000', '2000Q1', and '2000M01' would be indistinguishable. A pandas
PeriodIndex would not be able to hold the Periods of differing frequencies.
Each DataResponse
object also stores the metadata for the requested table. Using the same response as in the last example, we access the metadata using the .metadata
attribute:
>>> res.metadata
TableName SeriesCode ... CL_UNIT Notes
index ...
NSAT U70205S NSAT ... Level Table 7.2.5S. Auto and T...
NSAD U70205S NSAD ... Level Table 7.2.5S. Auto and T...
NSAF U70205S NSAF ... Level Table 7.2.5S. Auto and T...
... ... ... ... ... ...
UPR03 U70205S UPR03 ... Level Table 7.2.5S. Auto and T...
UPR05 U70205S UPR05 ... Level Table 7.2.5S. Auto and T...
CPA43 U70205S CPA43 ... Level Table 7.2.5S. Auto and T...
The information stored in this DataFrame varies from dataset to dataset. For this request, the metadata fields are
>>> res.metadata.columns
Index(['TableName', 'SeriesCode', 'LineNumber', 'LineDescription',
'METRIC_NAME', 'CL_UNIT', 'Notes'],
dtype='object')
A lot of the meta information relates to how the dataset is organized and displayed in the BEA tables. In this example, 'LineNumber'; in other examples, 'RowNumber', 'ColumnNumber', and the like. Other common ones are 'CL_UNIT' (level, percent change, etc.); 'METRIC_NAME' (Fisher Index, ratio, current dollars); and 'Notes', which are the footnotes of the BEA table.
The BEA doesn't seem to have a consistent term to identify series in a dataset table. The NIPA & NIUnderlyingDetail datasets use the term 'SeriesCode' to uniquely label a series in the data and metadata tables, whereas the IIP dataset uses 'TimeSeriesID'. The entries in these fields are used in the .metadata
index labels, and the .data
column labels.
The MNE dataset has a 'SeriesID' field, but it doesn't refer to a single variable in a table. Instead, the 'RowCode' & 'ColumnCode' are additionally needed to locate an individual series. In this case, the entries are concatenated together and separated by underscores to form the column/index labels. For example, a series in the MNE dataset with SeriesID = 5
, RowCode = 202
, and ColumnCode = 5400
will be uniquely identified in the metadata index and data columns by withthe label 5_202_5400
.
The fields that are used to label series in a dataset are stored in the .series_identifiers
property:
>>> res = bea.data('mne', ...)
>>> res.series_identifiers
['SeriesID', 'RowCode', 'ColumnCode']
BEA stores information about the datasets themselves as well, providing four different methods, which return three different subclasses of the BEAResponse
class.
The first metadata method simply retrieves the dataset names, along with short descriptions. The .dataset_list()
method of beapy.BEA
returns a DatasetListResponse
object, with a dictionary of those names and descriptions:
>>> import beapy
>>> bea = beapy.BEA(key=your_personal_api_key)
>>> res = bea.dataset_list()
>>> for name, descr in res.datasets.items():
>>> print(name, ': ', desc)
'NIPA' : 'Standard NIPA tables'
'NIUnderlyingDetail' : 'Standard NI underlying detail tables'
'MNE' : 'Multinational Enterprises'
...
The second method returns a ParameterListResponse
object, with a dictionary of names and summaries of the parameters that can define an API call. Those parameters differ between datasets though, so some care must be taken.
>>> res = bea.parameter_list('regional')
>>> for name, desc in res.parameters.items():
>>> print(name, ': ', desc)
'GeoFips' : {'ParameterDataType': 'string', 'ParameterDescription': ...}
'LineCode' : {'ParameterIsRequiredFlag': '1', 'MultipleAcceptedFlag': '0', ...}
'TableName' : {'ParameterDescription': 'Regional income or product table', ...}
'Year' : {'ParameterDataType': 'string', ...}
The third method returns the permissible values of a parameter:
>>> res = bea.parameter_values('intlservtrade', parameter='tradedirection')
>>> for k, v in res.parameters.items():
>>> print(k, ': ', v)
'Balance' : 'Balance'
'Exports' : 'Exports'
'Imports' : 'Imports'
'SupplementalIns' : 'Supplemental detail on insurance transactions'
The final metadata method retrieves the permissible values of a parameter (called the target parameter) based on the other provided parameters, which are provided as keyword arguments. For example,
>>> res = bea.filtered_parameter_values('regional', target='linecode', tablename='sainc1')
>>> for k, v in res.parameters.items():
>>> print(k, ': ', v)
'1' : '[SAINC1] Personal income'
'2' : '[SAINC1] Population'
'3' : '[SAINC1] Per capita personal income'
Multiple parameters can be filtered on at the same time; just provide them as additional keyword arguments.
res = bea.filtered_parameter_values('regional', target='year', tablename='cainc5n', geofips='01001')
This returns a list of the valid years in the Regional table 'cainc5n' with geographic code '01001'.
The beapy.BEA
class can be initialized with or without the BEA-provided key (go here to request your free API key). If the code isn't provided upon initialization, it is assumed your key has been saved with the built-in beapy.save_key()
method.
Before you initialize the BEA
class for the first time, use this method to record your key:
>>> import beapy
>>> beapy.save_key('YOUR_PERSONAL_API_KEY_FROM_BEA')
>>> bea = beapy.BEA()
The stored API key is automatically overwritten by further calls of the beapy.save_key()
method.
The table names of the 'NIPA', 'NIUnderlyingDetail', and 'FixedAssets' datasets are not particularly informative or nice to look at. The beapy
module provides methods that can be used to define custom table names that will be stored for future use.
Use beapy.define_table_name()
to create a custom reference:
>>> res = bea.data('underlying', tablename='auto_output', year='2016') # raises a BEAAPIError
>>> beapy.define_table_name(custom='auto_output', table_name='u70205s', dataset='underlying')
>>> res = bea.data('underlying', tablename='auto_output', year='2016') # no Error; data in table 'u70205s' is returned
A similar method is provided to define other names for the datasets
beapy.define_dataset_name(custom: str, dataset_name: str)