Extract multimodal training data from Flickr WARC files

kingoflolz/flickr_warc

Flickr WARC to TFRecords

A script that extracts images and their corresponding metadata from the ArchiveTeam Flickr dump and writes them to TFRecords.

Compile instructions

RUSTFLAGS="-C target-cpu=native" cargo build --release
cp target/release/flickr_warc .

Usage instructions

# ./flickr_warc <input file> <output file>
# (run from the directory the binary was copied into)
./flickr_warc flickr_20190324074003_89733133.megawarc.warc.gz flickr_20190324074003_89733133.tfrecords
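
As a quick sanity check after a conversion, the output file can be walked record by record without TensorFlow. The following is a minimal sketch, assuming the tool writes the standard TFRecord framing (a little-endian uint64 length, a 4-byte masked CRC32C of the length, the payload, then a 4-byte masked CRC32C of the payload). The function names `iter_tfrecord_payloads` and `count_records` are hypothetical helpers, not part of this repository, and the sketch skips CRC verification. The README does not state how each payload is encoded, so the payloads are returned as raw bytes and not decoded.

```python
import struct
from typing import BinaryIO, Iterator


def iter_tfrecord_payloads(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw payload bytes of each record in a TFRecord stream.

    Assumes standard TFRecord framing. The two CRC fields are read and
    discarded, not verified.
    """
    while True:
        header = stream.read(12)  # 8-byte length + 4-byte CRC of the length
        if not header:
            return  # clean end of file
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # CRC of the payload, not checked here
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload


def count_records(path: str) -> int:
    """Count the records in a TFRecord file (hypothetical helper)."""
    with open(path, "rb") as f:
        return sum(1 for _ in iter_tfrecord_payloads(f))
```

For example, `count_records("flickr_20190324074003_89733133.tfrecords")` returns the number of records written for that archive, and a `ValueError` suggests the file was cut off mid-write.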
