PyMet (WIP)

A Python API for consuming the Metropolitan Museum of Art's publicly available dataset.

Introduction

Finding reliable sources of painting images to work with is difficult. The quality of Google image search results is highly variable.

The Met

Thankfully, the Metropolitan Museum of Art in New York City has graciously made its works available for download and free use via its openaccess repo.

Unfortunately, they only provide a massive CSV (256 MB) without any instructions on how to download or use the collection.

How to download

Because of the size of the data file, you'll need to use git's Large File Storage extension to properly download the full collection.
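The Met does not document the download steps, so the following is only a sketch of one way to script them from Python. It assumes git and the git-lfs extension are already installed, and that the openaccess repo lives at the GitHub URL below. The function names and the target directory are hypothetical and not part of PyMet.

```python
import subprocess

# Assumption: the Met's openaccess repo is hosted at this URL.
OPENACCESS_REPO = "https://github.com/metmuseum/openaccess"


def lfs_clone_commands(repo_url=OPENACCESS_REPO, target_dir="openaccess"):
    """Return the git commands that fetch the repo along with its LFS files."""
    return [
        # Enable the LFS hooks for the current user.
        ["git", "lfs", "install"],
        # Clone the repo; LFS-tracked files are downloaded on checkout.
        ["git", "clone", repo_url, target_dir],
        # Fetch any LFS objects that are still pointer files.
        ["git", "-C", target_dir, "lfs", "pull"],
    ]


def download_collection(repo_url=OPENACCESS_REPO, target_dir="openaccess"):
    """Run each command in order, stopping at the first failure."""
    for command in lfs_clone_commands(repo_url, target_dir):
        subprocess.run(command, check=True)
```

Calling download_collection() should leave the full CSV in the target directory. Without LFS, the clone would contain only a small pointer file in place of the data.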

Update the CsvPaths constants

Once you have the dataset on your local machine, you'll need to update the file paths located in the constants module.

MetObjects.csv represents the full collection. MetPaintings.csv represents only those rows where Object Name == 'Painting'.
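As a rough illustration of how MetPaintings.csv could be derived from MetObjects.csv, here is a minimal sketch using only the standard library. The function name is hypothetical, and the assumption is that the raw file still uses the original "Object Name" header. This is not necessarily how PyMet builds the file.

```python
import csv


def write_paintings_csv(objects_path, paintings_path):
    """Copy rows whose "Object Name" is "Painting" to a new CSV; return the count."""
    count = 0
    # utf-8-sig drops a leading byte-order mark if the file has one.
    with open(objects_path, newline="", encoding="utf-8-sig") as src, \
            open(paintings_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" skips stray fields from malformed rows.
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, extrasaction="ignore"
        )
        writer.writeheader()
        for row in reader:
            if row["Object Name"] == "Painting":
                writer.writerow(row)
                count += 1
    return count
```

Streaming row by row keeps memory use low, which matters for a file of this size.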

The Collection

The full collection contains almost 500,000 rows of unique works. The data is not clean or uniform. Be wary of encoding issues if using Python 2. The column names were written to be human-readable (for example, they contain spaces), so they are awkward to access from data science libraries such as pandas.

A list of all the columns (after transformation) in the dataset can be found here.
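The exact column transformation is not spelled out here, so the following is only one plausible version: a small helper that turns a human-readable header into a snake_case identifier. The function name and the naming scheme are assumptions.

```python
import re


def normalize_column(name):
    """Turn a header such as "Object Name" into "object_name"."""
    name = name.strip().lower()
    # Collapse each run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9a-z]+", "_", name)
    return name.strip("_")
```

With pandas, a helper like this can be passed straight to DataFrame.rename(columns=normalize_column), after which a column can be read as an attribute (df.object_name) rather than df["Object Name"].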

MetPaintings

MetPaintings is an object intended to make all the paintings in the collection more accessible for study. For this reason, I have limited the columns contained in this dataset. It covers 6100 individual works in 750 mediums.
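Figures like the ones above can be checked with a short tally over the paintings file. This sketch assumes the file keeps the original "Medium" header; if the columns have been renamed, adjust the key. The function name is hypothetical.

```python
import csv
from collections import Counter


def medium_counts(paintings_path):
    """Count paintings per distinct value of the "Medium" column."""
    with open(paintings_path, newline="", encoding="utf-8-sig") as f:
        # The Counter consumes the reader before the file is closed.
        return Counter(row["Medium"] for row in csv.DictReader(f))
```

Here len(medium_counts(path)) gives the number of distinct mediums and sum(medium_counts(path).values()) gives the number of works.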

This is the primary object I will be developing and working with.

Incapsula

The Met uses the Incapsula service to block web scraping of its collection.

At first I tried using incapsula-cracker-py3 to handle my requests to their server, but this did not bypass Incapsula.

My next plan is to try using PhantomJS and Selenium to better impersonate a non-bot user.

PhantomJS

PhantomJS on Mac: brew cask install phantomjs

PhantomJS is deprecated.

Chrome WebDriver

I downloaded the Chrome WebDriver and found success using Selenium with this new driver. Because Selenium drives a real browser, the software doesn't get recognized as a bot.
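A minimal sketch of the Selenium approach might look like the following. It assumes the selenium package and a matching Chrome driver are installed. The object page URL pattern is an assumption about the Met's site, and both function names are hypothetical.

```python
# Assumption: object pages on the Met's site follow this URL pattern.
OBJECT_URL_TEMPLATE = "https://www.metmuseum.org/art/collection/search/{object_id}"


def object_url(object_id):
    """Build the (assumed) collection page URL for one object ID."""
    return OBJECT_URL_TEMPLATE.format(object_id=object_id)


def fetch_page_source(url):
    """Load a page in Chrome through Selenium and return the rendered HTML."""
    # Imported here so this module can be imported without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if the page load fails.
        driver.quit()
```

Starting a full browser for every request is slow, so a real crawler would likely reuse one driver across many pages.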

But this is pretty heavyweight for our purposes. I need to better define what my purpose is.
