Skip to content

Download and load Houston 2013 Dataset like a buildin function in scikit-learn.

License

Notifications You must be signed in to change notification settings

songyz2019/fetch_houston2013

Repository files navigation

English | 中文(简体)


This project is still working in progress


fetch_houston2013

Download and load Houston 2013 Dataset (2013 IEEE GRSS Data Fusion Contest) like a buildin function in scikit-learn.

  • Automaticlly download all needed files
  • Support caching
  • Verify checksums
  • Show copyright of the dataset
  • numpy format
  • PyTorch support
  • Use sparse matrix to representing ground truth, less memory usage and easier iteration
  • Faster loading than .mat
  • Need more testing before GA

screenshot

fetch_muufl

fetch_muufl dataset

Usage

  1. Install scikit-image
pip install scikit-image==0.22.0
pip install scipy
  1. copy fetch_houston2013.py to your project
  2. import and run
from fetch_houston2013 import fetch_houston2013
casi, lidar, train_y, test_y, info = fetch_houston2013()

PyTorch

  1. Install scikit-image
pip install scikit-image==0.22.0
pip install scipy
  1. copy Houston2013.py to your project
  2. import and run
from Houston2013 import Houston2013
trainset = DataLoader(Houston2013(train=True, patch_size=7), batch_size=32, shuffle=True)
testset  = DataLoader(Houston2013(train=False, patch_size=7))

Troubleshooting

Remove ~/scikit_learn_data to clean cache and try again.
We download dataset from official website and pastbin.com. Make sure you can access these websites.

Benchmark

In our simple test case, loading data is nearly 2x faster than traditional ways, thanks to the native code in imageio.

In [4]: from util.houton2013 import fetch_houston2013
   ...: from time import time
   ...: start_time = time()
   ...: fetch_houston2013()
   ...: print("fetch_houston2013 %.4f ms"  % (1000*(time() - start_time)))

fetch_houston2013 521.4026 ms

In [5]: from scipy.io import loadmat
   ...: start_time = time()
   ...: loadmat(os.path.expanduser('~/dataset/Houston2013/HSI.mat'))['HSI']
   ...: loadmat(os.path.expanduser('~/dataset/Houston2013/DSM.mat'))['DSM']
   ...: loadmat(os.path.expanduser('~/dataset/Houston2013/TR.mat'))['TR_map']
   ...: loadmat(os.path.expanduser('~/dataset/Houston2013/TE.mat'))['TE_map']
   ...: print("loadmat Time %.4f ms" % (1000*(time() - start_time)))

loadmat Time 979.6755 ms

Test environmetn: Ubuntu 22.04 on Intel(R) Core(TM) i9-10900X

This is totally meaningless!

Yes, comparing to training time, improving data loading time is not notable, but it is interesting.

TODO(Maybe)

Replace scikit-image with imageio Publish to pypi or provide whl

Credits

Houston2013 dataset: https://hyperspectral.ee.uh.edu/?page_id=459
paperswithcode: https://paperswithcode.com/dataset/houston
The 2013_IEEE_GRSS_DF_Contest_Samples_VA.txt in this repo is exported from original 2013_IEEE_GRSS_DF_Contest_Samples_VA.roi.

Note: If this data is used in any publication or presentation the following reference must be cited:
P. Gader, A. Zare, R. Close, J. Aitken, G. Tuell, “MUUFL Gulfport Hyperspectral and LiDAR Airborne Data Set,” University of Florida, Gainesville, FL, Tech. Rep. REP-2013-570, Oct. 2013.
If the scene labels are used in any publication or presentation, the following reference must be cited:
X. Du and A. Zare, “Technical Report: Scene Label Ground Truth Map for MUUFL Gulfport Data Set,” University of Florida, Gainesville, FL, Tech. Rep. 20170417, Apr. 2017. Available: http:https://ufdc.ufl.edu/IR00009711/00001.
If any of this scoring or detection code is used in any publication or presentation, the following reference must be cited:
T. Glenn, A. Zare, P. Gader, D. Dranishnikov. (2016). Bullwinkle: Scoring Code for Sub-pixel Targets (Version 1.0) [Software]. Available from https://github.com/GatorSense/MUUFLGulfport/.

License

AGPL-3.0-only

About

Download and load Houston 2013 Dataset like a buildin function in scikit-learn.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages