Return categorical dtypes from (JDBC)Backend to reduce memory usage #228

zikolach · 2019-11-28T09:41:01Z

Whenever possible, switch from object to categorical dtypes, as this can significantly reduce memory usage when more than 50% of the values in a column are repeats.
Pay extra attention here, because comparing dataframes (e.g. in tests) is sensitive to dtypes, including the order of the categories in a categorical dtype.
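A minimal sketch of the idea using plain pandas, not JDBCBackend itself. The helper name `to_categorical`, its `max_unique_ratio` parameter and the sample columns are hypothetical. The sketch reads the 50% rule of thumb as "convert an object column when fewer than half of its values are distinct":

```python
import numpy as np
import pandas as pd


def to_categorical(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `df` with low-cardinality
    object columns converted to the 'category' dtype.

    A column is converted when its share of distinct values is below
    `max_unique_ratio`, i.e. when most of its values are repeats.
    """
    out = df.copy()
    n = max(len(out), 1)  # avoid division by zero for empty frames
    for col in out.select_dtypes(include="object").columns:
        if out[col].nunique(dropna=False) / n < max_unique_ratio:
            out[col] = out[col].astype("category")
    return out


# Hypothetical backend output: 3 distinct regions, but 100 000 distinct ids.
n_rows = 100_000
regions = np.array(["World", "R11_AFR", "R11_NAM"], dtype=object)
df = pd.DataFrame(
    {
        "region": pd.Series(regions[np.arange(n_rows) % 3], dtype=object),
        "id": pd.Series([f"id{i}" for i in range(n_rows)], dtype=object),
    }
)

converted = to_categorical(df)
print(converted.dtypes)  # "region" becomes category, "id" stays object
print("object bytes:  ", df.memory_usage(deep=True).sum())
print("category bytes:", converted.memory_usage(deep=True).sum())
```

The categorical column stores each distinct string once, plus a small integer code per row. That is why the savings grow with the share of repeated values, while a column of mostly unique values (like `id`) gains nothing from conversion.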
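A small sketch of the pitfall in a test, using `pandas.testing.assert_frame_equal`. The wrapper `frames_equal` and the sample data are hypothetical. The same values compare as unequal when one frame uses object and the other categorical, and also when two categoricals list their categories in a different order:

```python
import pandas as pd
import pandas.testing as pdt


def frames_equal(left: pd.DataFrame, right: pd.DataFrame, **kwargs) -> bool:
    """Hypothetical wrapper: True if assert_frame_equal passes."""
    try:
        pdt.assert_frame_equal(left, right, **kwargs)
        return True
    except AssertionError:
        return False


obj = pd.DataFrame({"region": pd.Series(["World", "R11_AFR"], dtype=object)})
# Same values, two categorical dtypes that differ only in category order.
cat_a = obj.astype({"region": pd.CategoricalDtype(["R11_AFR", "World"])})
cat_b = obj.astype({"region": pd.CategoricalDtype(["World", "R11_AFR"])})

print(frames_equal(obj, cat_a))    # object vs. category: dtypes differ
print(frames_equal(cat_a, cat_b))  # category order differs

# One way to make such a test robust: normalise the category order first.
cat_b_fixed = cat_b.assign(
    region=cat_b["region"].cat.reorder_categories(["R11_AFR", "World"])
)
print(frames_equal(cat_a, cat_b_fixed))
```

Alternatively, `assert_frame_equal` accepts `check_categorical=False` to relax the categorical checks. Which approach fits depends on whether the category order is considered part of the Backend's contract.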
Here is an article providing more information about internal structures of dataframes in pandas.

khaeru · 2019-11-30T12:04:43Z

It should be decided whether this will be something that is:

1. allowed by (that is, not specified by, but compatible with) the Backend API, and implemented by JDBCBackend; or
2. specified as part of the Backend API, and then implemented by JDBCBackend.

The changes to the tests and documentation will differ depending on whether (1) or (2) is chosen.

khaeru changed the title ~~Use categorical dtypes to optimize memory itilization~~ Return categorical dtypes from (JDBC)Backend to reduce memory usage Nov 29, 2019

khaeru added the backend.jdbc Interaction with ixmp_source via JDBCBackend & JPype label Apr 27, 2020
