Smart and awesome CSV utils

CSVs are awesome, yet they're pretty dumb. Let's make them smarter!

smartcsv is a Python utility to read and parse CSVs based on model definitions. Instead of just parsing the CSV into lists (like the builtin csv module), it lets you specify models with attribute names. On top of that, it adds validation, custom parsing, failure control and descriptive error messages.

Installation

pip install smartcsv

Usage

For a complete set of usage examples, check the test package (99% coverage).

The basic idea is to define a spec for the columns of your CSV. Assuming the following CSV file:

title,category,subcategory,currency,price,url,image_url
iPhone 5c blue,Phones,Smartphones,USD,399,https://apple.com/iphone,https://apple.com/iphone.jpg
iPad mini,Tablets,Apple,USD,699,https://apple.com/iphone,https://apple.com/iphone.jpg

First you need to define the spec for your columns. This is an example (the one used in tests):

COLUMNS_1 = [
    {'name': 'title', 'required': True},
    {'name': 'category', 'required': True},
    {'name': 'subcategory', 'required': False},
    {
        'name': 'currency',
        'required': True,
        'choices': CURRENCIES
    },
    {
        'name': 'price',
        'required': True,
        'validator': is_number
    },
    {
        'name': 'url',
        'required': True,
        'validator': lambda c: c.startswith('http')
    },
    {
        'name': 'image_url',
        'required': False,
        'validator': lambda c: c.startswith('http')
    },
]
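The spec above references two names it does not define: CURRENCIES and is_number. Below is a minimal sketch of what they could look like. Only the names come from the spec; the bodies are assumptions made for illustration, and the definitions used by the project's tests may differ.

```python
# Hypothetical helpers for the column spec above.
# The names come from the spec; the bodies are assumptions.
CURRENCIES = ('USD', 'ARS', 'JPY')  # example choices, adjust to your data


def is_number(value):
    """Return True if the raw CSV string parses as a number.

    Note that float() also accepts strings such as 'nan' and 'inf'.
    """
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True
```

Validators receive the raw string from the CSV cell, so is_number('399') is true while is_number('abc') and is_number('') are false.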

You can then use smartcsv to parse the CSV:

import smartcsv
with open('my-csv.csv', 'r') as f:
    reader = smartcsv.reader(f, columns=COLUMNS_1)
    for obj in reader:
        print(obj['title'])

smartcsv.reader uses the builtin csv module and accepts a dialect to use.
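As a sketch of what a dialect is, the snippet below registers one with the builtin csv module and parses a semicolon-delimited sample with csv.reader. The dialect name 'semicolon' and the sample data are made up for illustration. The commented-out last line assumes smartcsv.reader takes the dialect as a `dialect` keyword argument, which is not confirmed here.

```python
import csv
import io

# Register a dialect for semicolon-delimited files with the builtin csv module.
csv.register_dialect('semicolon', delimiter=';', quoting=csv.QUOTE_MINIMAL)

# In-memory stand-in for a real file.
data = io.StringIO('title;price\nSome Product;55.5\n')
rows = list(csv.reader(data, dialect='semicolon'))
print(rows)

# Assumption: smartcsv.reader accepts the same dialect through a keyword, e.g.
# reader = smartcsv.reader(f, dialect='semicolon', columns=COLUMNS_1)
```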

More advanced usage

Assuming a CSV with an error in the second row:

with open('my-csv.csv', 'r') as f:
    # fail_fast=False: invalid rows are recorded in reader.errors
    reader = smartcsv.reader(f, columns=COLUMNS_1, fail_fast=False)
    for obj in reader:
        print(obj['title'])

error_row = reader.errors['rows'][1]  # Second row has index = 1. Errors are 0-indexed.
print(error_row['row'])  # Print original row data
print(error_row['errors'].keys())  # currency  (the currency column)
print(error_row['errors']['currency'])  # Invalid currency... (nice error explanation)

You can also specify a max_failures parameter. The reader counts failures and raises an exception when that threshold is exceeded.
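To make the counting idea concrete, here is a library-independent sketch of a failure threshold. It is not smartcsv's implementation: validate_rows, TooManyFailures and is_price are hypothetical names, and it assumes "exceeded" means strictly more failures than max_failures.

```python
class TooManyFailures(Exception):
    """Hypothetical exception type; smartcsv's own exception may differ."""


def validate_rows(rows, validator, max_failures=None):
    """Yield valid rows, recording failures and enforcing a threshold."""
    failures = []
    for index, row in enumerate(rows):
        if validator(row):
            yield row
            continue
        failures.append(index)
        # Raise only once the failure count goes past the threshold.
        if max_failures is not None and len(failures) > max_failures:
            raise TooManyFailures(
                'failed rows %r exceed max_failures=%d' % (failures, max_failures))


def is_price(row):
    """Accept digit strings with at most one decimal point."""
    return row['price'].replace('.', '', 1).isdigit()


rows = [{'price': '399'}, {'price': 'abc'}, {'price': ''}, {'price': '10'}]

# Two failures do not exceed max_failures=2, so iteration completes.
valid = list(validate_rows(rows, is_price, max_failures=2))
print(valid)
```

With max_failures=1 the same input raises TooManyFailures on the third row, since that is the second failure.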

Strip white spaces

By default the strip_white_spaces option is set to True. Example:

sample.csv
title,price
   Some Product  ,  55.5

With the default setting, row['title'] will be "Some Product" and row['price'] will be "55.5" (surrounding spaces stripped).
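For comparison, the builtin csv module keeps surrounding whitespace as-is. The sketch below parses the same sample with csv.DictReader and strips the values by hand, which is roughly the cleanup this option saves you. It uses only the standard library and says nothing about how smartcsv does the stripping internally.

```python
import csv
import io

# In-memory copy of sample.csv from above.
sample = io.StringIO('title,price\n   Some Product  ,  55.5\n')

# csv.DictReader returns field values exactly as they appear in the file.
raw = next(csv.DictReader(sample))
print(repr(raw['title']))

# Stripping each value by hand gives the result described above.
stripped = {key: value.strip() for key, value in raw.items()}
print(stripped)
```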

Skip lines

sample.csv
GENERATED BY AWESOME SCRIPT
2014-08-12

title,price
Some Product,55.5

The first 3 lines don't contain any valuable data so we'll skip them.

reader = smartcsv.reader(f, columns=COLUMNS_1, fail_fast=False, skip_lines=3)
for obj in reader:
    print(obj['title'])
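As a point of reference, skipping leading lines with only the standard library could look like the sketch below. It consumes the first 3 lines of the sample with itertools.islice before handing the rest to csv.DictReader. This shows the idea behind skip_lines, not how smartcsv implements it.

```python
import csv
import io
from itertools import islice

# In-memory copy of sample.csv from above.
sample = io.StringIO(
    'GENERATED BY AWESOME SCRIPT\n'
    '2014-08-12\n'
    '\n'
    'title,price\n'
    'Some Product,55.5\n'
)

# Consume the first 3 physical lines, then parse the rest as usual.
for _ in islice(sample, 3):
    pass

rows = list(csv.DictReader(sample))
print(rows[0]['title'])
```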

Contributing

Fork, code, watch your tests pass, submit PR. To test:

$ python setup.py test  # Run tests in your venv
$ tox  # Make sure it passes in all versions.
