pandas-based Python package for programmatically retrieving data from the BEA

loganhotz/bea-dev

BEApy

A pandas-based Python package for programmatically requesting data from the U.S. Bureau of Economic Analysis (BEA).

Installation

BEApy is registered on PyPI, so just use pip to install:

pip install beapy

or

python3 -m pip install beapy

Requesting Economic Data

The BEA organizes its data into the following datasets:

  • National Income & Product Accounts (NIPA)
  • National Income Underlying Detail (NIUnderlyingDetail)
  • Multinational Enterprises (MNE)
  • Standard Fixed Assets (FixedAssets)
  • International Transaction Accounts (ITA)
  • International Investment Position (IIP)
  • Input-Output Data (InputOutput)
  • International Services Trade (IntlServTrade)
  • GDP by Industry (GDPByIndustry)
  • Regional Economic Data (Regional)
  • Underlying GDP by Industry (UnderlyingGDPByIndustry)

The terms in parentheses above are the labels to use when accessing the corresponding dataset using BEApy.

Dataset Data

We first initialize an instance of the beapy.BEA class with your personal BEA API key (available free of charge from the BEA), then pass the dataset name, table name, and data frequency to its data() method:

>>> import beapy

>>> bea = beapy.BEA(key=your_personal_api_key)
>>> res = bea.data('nipa', tablename='t10101', frequency='a') # DataResponse

This returns a DataResponse object that stores the annual data of table 't10101' in its .data property and the associated metadata in .metadata, both of which are pandas DataFrames. (See below for how to assign more intuitive, human-readable names to the tables.) The keyword arguments that bea.data() needs to construct a valid API call vary from dataset to dataset; the full list for each one can be retrieved using the bea.parameter_list(dataset) method.

This second example retrieves monthly underlying data for 2015 & 2016:

>>> res = bea.data('underlying', tablename='u70205s', frequency='m', year=['2015', '2016'])
>>> res.data

        ALLO03     AUTO35     AUTO40  ...    UPR02    UPR03    UPR05
2015M01  45.83  3726400.0  3542200.0  ...  28471.0  26743.0  24139.0
2015M02  46.03  3619700.0  3481400.0  ...  28797.0  26946.0  24214.0
2015M03  45.77  3924700.0  3740500.0  ...  29038.0  26982.0  24199.0
  ...     ...      ...        ...     ...    ...      ...      ...
2016M10  45.63  3382600.0  3173400.0  ...  29132.0  27052.0  24529.0
2016M11  44.85  3450200.0  3126100.0  ...  28635.0  26812.0  24411.0
2016M12  45.47  3447900.0  3216700.0  ...  28492.0  26897.0  24466.0
[24 rows x 44 columns]

Note that the index is not a pandas DatetimeIndex, as might be expected. The BEA API can return data of multiple frequencies within the same request, formatting the periods as yyyy, yyyyQq, and yyyyMmm for annual, quarterly, and monthly data, respectively. Casting these to datetimes would make '2000', '2000Q1', and '2000M01', for example, indistinguishable, and a pandas PeriodIndex cannot hold Periods of differing frequencies.
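
If a mixed-frequency index needs to be separated before further processing, the three label formats described above can be told apart with regular expressions. The following is a minimal standard-library sketch; parse_period and split_by_frequency are hypothetical helper names and are not part of BEApy.

```python
import re

# One pattern per period format described above.
_PERIOD_PATTERNS = {
    "A": re.compile(r"^(\d{4})$"),                  # yyyy
    "Q": re.compile(r"^(\d{4})Q([1-4])$"),          # yyyyQq
    "M": re.compile(r"^(\d{4})M(0[1-9]|1[0-2])$"),  # yyyyMmm
}


def parse_period(label):
    """Return (frequency, year, sub_period) for a BEA period label.

    sub_period is the quarter or month number, or None for annual labels.
    """
    for freq, pattern in _PERIOD_PATTERNS.items():
        match = pattern.match(label)
        if match:
            year = int(match.group(1))
            sub = int(match.group(2)) if freq != "A" else None
            return freq, year, sub
    raise ValueError(f"unrecognized period label: {label!r}")


def split_by_frequency(labels):
    """Group period labels by frequency, preserving their original order."""
    groups = {"A": [], "Q": [], "M": []}
    for label in labels:
        freq, _, _ = parse_period(label)
        groups[freq].append(label)
    return groups


mixed = ["2000", "2000Q1", "2000M01", "2000M02"]
print(split_by_frequency(mixed))
# {'A': ['2000'], 'Q': ['2000Q1'], 'M': ['2000M01', '2000M02']}
```

Each resulting group contains labels of a single frequency, so it could be used to select the matching rows of res.data and then be converted to a single-frequency index if desired.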

Dataset Metadata

Each DataResponse object also stores the metadata for the requested table. Using the same response as in the last example, we access the metadata using the .metadata attribute:

>>> res.metadata

       TableName SeriesCode  ...  CL_UNIT                        Notes
index                        ...
NSAT     U70205S       NSAT  ...    Level  Table 7.2.5S. Auto and T...
NSAD     U70205S       NSAD  ...    Level  Table 7.2.5S. Auto and T...
NSAF     U70205S       NSAF  ...    Level  Table 7.2.5S. Auto and T...
 ...       ...         ...   ...     ...              ...
UPR03    U70205S      UPR03  ...    Level  Table 7.2.5S. Auto and T...
UPR05    U70205S      UPR05  ...    Level  Table 7.2.5S. Auto and T...
CPA43    U70205S      CPA43  ...    Level  Table 7.2.5S. Auto and T...

The information stored in this DataFrame varies from dataset to dataset. For this request, the metadata fields are:

>>> res.metadata.columns

Index(['TableName', 'SeriesCode', 'LineNumber', 'LineDescription',
       'METRIC_NAME', 'CL_UNIT', 'Notes'],
      dtype='object')

Much of the metadata describes how the dataset is organized and displayed in the BEA tables: 'LineNumber' in this example, and 'RowNumber', 'ColumnNumber', and the like in others. Other common fields are 'CL_UNIT' (level, percent change, etc.), 'METRIC_NAME' (Fisher Index, ratio, current dollars), and 'Notes', which holds the footnotes of the BEA table.

Identifying Series

The BEA doesn't seem to have a consistent term to identify series in a dataset table. The NIPA & NIUnderlyingDetail datasets use the term 'SeriesCode' to uniquely label a series in the data and metadata tables, whereas the IIP dataset uses 'TimeSeriesID'. The entries in these fields are used in the .metadata index labels, and the .data column labels.

The MNE dataset has a 'SeriesID' field, but it doesn't refer to a single variable in a table. Instead, the 'RowCode' and 'ColumnCode' fields are also needed to locate an individual series. In this case, the entries are concatenated, separated by underscores, to form the column/index labels. For example, a series in the MNE dataset with SeriesID = 5, RowCode = 202, and ColumnCode = 5400 is uniquely identified in the metadata index and data columns by the label 5_202_5400.

The fields that are used to label series in a dataset are stored in the .series_identifiers property:

>>> res = bea.data('mne', ...)
>>> res.series_identifiers
['SeriesID', 'RowCode', 'ColumnCode']
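
The labelling rule described above can be sketched in a few lines of plain Python. This only illustrates the rule; it is not BEApy's own implementation, and series_label is a hypothetical helper name.

```python
def series_label(record, identifiers):
    """Join the identifier fields of one record with underscores."""
    return "_".join(str(record[field]) for field in identifiers)


# Identifier fields as reported by res.series_identifiers for the MNE dataset.
identifiers = ["SeriesID", "RowCode", "ColumnCode"]

# Values taken from the example in the text.
record = {"SeriesID": 5, "RowCode": 202, "ColumnCode": 5400}
print(series_label(record, identifiers))  # 5_202_5400

# With a single identifier field, the label is just that field's value.
print(series_label({"SeriesCode": "UPR03"}, ["SeriesCode"]))  # UPR03
```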

Requesting Metadata

The BEA API also serves information about the datasets themselves. The beapy.BEA class provides four methods for retrieving it, which return three different subclasses of the BEAResponse class.

Dataset List

The first metadata method simply retrieves the dataset names, along with short descriptions. The .dataset_list() method of beapy.BEA returns a DatasetListResponse object, with a dictionary of those names and descriptions:

>>> import beapy

>>> bea = beapy.BEA(key=your_personal_api_key)
>>> res = bea.dataset_list()
>>> for name, desc in res.datasets.items():
...     print(name, ': ', desc)

    'NIPA' :  'Standard NIPA tables'
    'NIUnderlyingDetail' :  'Standard NI underlying detail tables'
    'MNE' :  'Multinational Enterprises'
    ...

Parameter List

The second method returns a ParameterListResponse object, with a dictionary of names and summaries of the parameters that can define an API call. Those parameters differ between datasets though, so some care must be taken.

>>> res = bea.parameter_list('regional')
>>> for name, desc in res.parameters.items():
...     print(name, ': ', desc)

'GeoFips' :  {'ParameterDataType': 'string', 'ParameterDescription': ...}
'LineCode' :  {'ParameterIsRequiredFlag': '1', 'MultipleAcceptedFlag': '0', ...}
'TableName' :  {'ParameterDescription': 'Regional income or product table', ...}
'Year' :  {'ParameterDataType': 'string', ...}
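
Because each parameter summary is itself a dictionary, the required parameters of a dataset can be picked out by inspecting the 'ParameterIsRequiredFlag' entry shown in the output above. The sketch below assumes res.parameters behaves like a plain dict of dicts; required_parameters is a hypothetical helper, and the sample dictionary is a hand-written stand-in whose flag values are illustrative only, not real API output.

```python
def required_parameters(parameters):
    """Return the names of parameters whose required flag is set to '1'."""
    return [
        name
        for name, info in parameters.items()
        if info.get("ParameterIsRequiredFlag") == "1"
    ]


# Hand-written stand-in for res.parameters; the flag values are illustrative
# placeholders and do not describe the real Regional dataset.
sample = {
    "GeoFips": {"ParameterDataType": "string", "ParameterIsRequiredFlag": "1"},
    "LineCode": {"ParameterIsRequiredFlag": "1", "MultipleAcceptedFlag": "0"},
    "TableName": {"ParameterIsRequiredFlag": "1"},
    "Year": {"ParameterDataType": "string", "ParameterIsRequiredFlag": "0"},
}

print(required_parameters(sample))  # ['GeoFips', 'LineCode', 'TableName']
```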

Parameter Values

The third method returns the permissible values of a parameter:

>>> res = bea.parameter_values('intlservtrade', parameter='tradedirection')
>>> for k, v in res.parameters.items():
...     print(k, ': ', v)

'Balance' :  'Balance'
'Exports' :  'Exports'
'Imports' :  'Imports'
'SupplementalIns' :  'Supplemental detail on insurance transactions'
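
A list of permissible values like this one can be used to check a value before building a data request. The following is a minimal sketch that assumes the permissible values are available as a plain dictionary, as the printed output suggests; match_parameter_value is a hypothetical helper, not part of BEApy.

```python
def match_parameter_value(requested, allowed):
    """Return the permitted value matching `requested`, ignoring case."""
    lookup = {value.lower(): value for value in allowed}
    try:
        return lookup[requested.lower()]
    except KeyError:
        options = ", ".join(sorted(allowed))
        raise ValueError(
            f"{requested!r} is not permitted; choose from: {options}"
        ) from None


# Permissible values copied from the output above.
trade_directions = {
    "Balance": "Balance",
    "Exports": "Exports",
    "Imports": "Imports",
    "SupplementalIns": "Supplemental detail on insurance transactions",
}

print(match_parameter_value("exports", trade_directions))  # Exports
```

The helper only normalizes the spelling to the form listed by the API; whether the API itself accepts other spellings is not something this sketch assumes.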

Filtered Parameter Values

The final metadata method retrieves the permissible values of a parameter (called the target parameter) based on the other provided parameters, which are provided as keyword arguments. For example,

>>> res = bea.filtered_parameter_values('regional', target='linecode', tablename='sainc1')
>>> for k, v in res.parameters.items():
...     print(k, ': ', v)

'1' :  '[SAINC1] Personal income'
'2' :  '[SAINC1] Population'
'3' :  '[SAINC1] Per capita personal income'

Multiple parameters can be filtered on at the same time; just provide them as additional keyword arguments.

>>> res = bea.filtered_parameter_values('regional', target='year', tablename='cainc5n', geofips='01001')

This returns a list of the valid years in the Regional table 'cainc5n' with geographic code '01001'.

Saving your API Key

The beapy.BEA class can be initialized with or without the BEA-provided key (a free API key can be requested from the BEA). If the key isn't provided upon initialization, it is assumed to have been saved with the built-in beapy.save_key() method.

Before you initialize the BEA class for the first time, use this method to record your key:

>>> import beapy

>>> beapy.save_key('YOUR_PERSONAL_API_KEY_FROM_BEA')
>>> bea = beapy.BEA()

The stored API key is automatically overwritten by further calls of the beapy.save_key() method.

Custom Table Names

The table names of the 'NIPA', 'NIUnderlyingDetail', and 'FixedAssets' datasets are not particularly informative or nice to look at. The beapy module provides methods that can be used to define custom table names that will be stored for future use.

Use beapy.define_table_name() to create a custom reference:

>>> res = bea.data('underlying', tablename='auto_output', year='2016') # raises a BEAAPIError
>>> beapy.define_table_name(custom='auto_output', table_name='u70205s', dataset='underlying')
>>> res = bea.data('underlying', tablename='auto_output', year='2016') # no Error; data in table 'u70205s' is returned

A similar method is provided to define other names for the datasets:

beapy.define_dataset_name(custom: str, dataset_name: str)
