Skip to content

Fast PostgreSQL bulk inserts with Cython and binary copy

Notifications You must be signed in to change notification settings

altaurog/cpgcopy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cpgcopy

This project is not maintained. Please use pgcopy.

cpgcopy is a Cython implementation of pgcopy, a small system for very fast bulk insertion of data into a PostgreSQL database table using binary copy.

cpgcopy is about twice as fast as pgcopy. Network bandwith and other system limitations are much more likely to be your rate-limiting factor. In retrospect, I learned a nice bit about cython with this project, but ultimately found it a misplaced optimization effort.

Requirements

Cython and numpy must be installed to build the package. (Help distributing sources without Cython dependency would be greatly appreciated.) Additional run-time dependencies are:

cpgcopy can be built on recent linux systems (glibc version 2.9 or higher). No information is available regarding its use on other platforms.

nose is required to run the tests.

Basic use

cpgcopy provides facility for copying data from a pandas DataFrame to a table in a postgresql database using a CopyManager, which must be instantiated with a psycopg2 db connection, the table name, and an iterable indicating the names of the columns to be inserted. cpgcopy inspects the database to determine the datatypes of the columns.

For example:

import numpy as np
import pandas as pd
import psycopg2
from cpgcopy import CopyManager
cols = ('a', 'b', 'c')
df = pd.DataFrame(np.random.randn(500, 3), columns=cols)
conn = psycopg2.connect(database='weather_db')
mgr = CopyManager(conn, 'measurements_table', cols)
mgr.copy(df)

Supported datatypes

Currently the following PostgreSQL datatypes are supported:

  • bool
  • smallint
  • integer
  • bigint
  • real
  • double precision
  • char
  • varchar
  • date
  • timestamp
  • timestamp with time zone

Benchmarks

Below are simple benchmark results for 100000 records. This gives a general idea of the kind of speedup available with cpgcopy:

$ nosetests -c tests/benchmark.cfg
                     Benchmark:   0.35s
          ExecuteManyBenchmark:   7.86s
----------------------------------------------------------------------
Ran 2 tests in 9.085s

About

Fast PostgreSQL bulk inserts with Cython and binary copy

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages