Clean list format
caesar0301 committed Jan 2, 2016
1 parent c5b4ac7 commit d2f8cb8
Showing 2 changed files with 60 additions and 57 deletions.
.travis.yml: 4 changes (3 additions, 1 deletion)
@@ -5,4 +5,6 @@ before_script:
- gem install awesome_bot
script:
- site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu
- awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,datamob.org,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov
- whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org
- site503=labrosa.ee.columbia.edu/millionsong,datamob.org
- awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503
README.rst: 113 changes (57 additions, 56 deletions)
@@ -36,7 +36,7 @@ Biology
* `MIT Cancer Genomics Data <http:https://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi>`_
* `NIH Microarray data <http:https://bit.do/VVW6>`_ or `FTP <ftp:https://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/>`_
* `OpenSNP genotypes data <https://opensnp.org/>`_
* `Pathguid: Protein-Protein Interactions Catalog <http:https://www.pathguide.org/>`_
* `Pathguid - Protein-Protein Interactions Catalog <http:https://www.pathguide.org/>`_
* `Protein Data Bank <http:https://www.rcsb.org/>`_
* `PubChem Project <https://pubchem.ncbi.nlm.nih.gov/>`_
* `PubGene (now Coremine Medical) <http:https://www.pubgene.org/>`_
@@ -132,20 +132,20 @@ Economics

* `American Economic Ass (AEA) <https://www.aeaweb.org/RFE/toc.php?show=complete>`_
* `EconData from UMD <http:https://inforumweb.umd.edu/econdata/econdata.html>`_
* `Economic Freedom of the World Data <http:https://www.freetheworld.com/datasets_efw.html>`_
* `Historical MacroEconomc Statistics <http:https://www.historicalstatistics.org/>`_
* `International Trade Statistics <http:https://www.econostatistics.co.za/>`_
* `Internet Product Code Database <http:https://www.upcdatabase.com/>`_
* `OpenCorporates Database of Companies in the World <https://opencorporates.com/>`_
* `Joint External Debt Data Hub <http:https://www.jedh.org/>`_
* `Jon Haveman International Trade Data Links <http:https://www.macalester.edu/research/economics/PAGE/HAVEMAN/Trade.Resources/TradeData.html>`_
* `OpenCorporates Database of Companies in the World <https://opencorporates.com/>`_
* `Our World in Data <http:https://ourworldindata.org/>`_
* `SciencesPo World Trade Gravity Datasets <http:https://econ.sciences-po.fr/thierry-mayer/data>`_
* `The Atlas of Economic Complexity <atlas.cid.harvard.edu>`_
* `The Observatory of Economic Complexity <atlas.media.mit.edu/en/>`_
* `The Center for International Data <cid.econ.ucdavis.edu>`_
* `The Observatory of Economic Complexity <atlas.media.mit.edu/en/>`_
* `UN Commodity Trade Statistics <comtrade.un.org/db/>`_
* `UN Human Development Reports <hdr.undp.org/en>`_
* `International Trade Statistics <http:https://www.econostatistics.co.za/>`_
* `Historical MacroEconomc Statistics <http:https://www.historicalstatistics.org/>`_
* `SciencesPo World Trade Gravity Datasets <http:https://econ.sciences-po.fr/thierry-mayer/data>`_
* `Jon Haveman International Trade Data Links <http:https://www.macalester.edu/research/economics/PAGE/HAVEMAN/Trade.Resources/TradeData.html>`_
* `Economic Freedom of the World Data <http:https://www.freetheworld.com/datasets_efw.html>`_
* `Our World in Data <http:https://ourworldindata.org/>`_


Energy
@@ -181,9 +181,9 @@ Finance
Geology
-------

* `Earth Models <http:https://www.earthmodels.org/>`_
* `Smithsonian Institution Global Volcano and Eruption Database <http:https://volcano.si.edu/>`_
* `USGS Earthquake Archives <http:https://earthquake.usgs.gov/earthquakes/search/>`_
* `Earth Models <http:https://www.earthmodels.org/>`_


GeoSpace/GIS
@@ -194,8 +194,10 @@ GeoSpace/GIS
* `EOSDIS - NASA's earth observing system data <http:https://sedac.ciesin.columbia.edu/data/sets/browse>`_
* `Factual Global Location Data <https://www.factual.com/>`_
* `Geo Spatial Data from ASU <http:https://geodacenter.asu.edu/datalist/>`_
* `Geo Wiki Project - Citizen-driven Environmental Monitoring <http:https://geo-wiki.org/>`_
* `GeoNames Worldwide <http:https://www.geonames.org/>`_
* `Global Administrative Areas Database (GADM) <http:https://www.gadm.org/>`_
* `International Institute for Systems Analysis - GIS Datasets <http:https://www.iiasa.ac.at/web/home/research/modelsData/Models--Tools--Data.en.html>`_
* `Landsat 8 on AWS <https://aws.amazon.com/public-data-sets/landsat/>`_
* `List of all countries in all languages <https://github.com/umpirsky/country-list>`_
* `Natural Earth - vectors and rasters of the world <http:https://www.naturalearthdata.com/>`_
@@ -205,19 +207,17 @@ GeoSpace/GIS
* `TIGER/Line - U.S. boundaries and roads <http:https://www.census.gov/geo/maps-data/data/tiger-line.html>`_
* `TwoFishes - Foursquare's coarse geocoder <https://github.com/foursquare/twofishes>`_
* `TZ Timezones shapfiles <http:https://efele.net/maps/tz/world/>`_
* `World countries in multiple formats <https://github.com/mledoze/countries>`_
* `International Institute for Systems Analysis - GIS Datasets <http:https://www.iiasa.ac.at/web/home/research/modelsData/Models--Tools--Data.en.html>`_
* `Geo Wiki Project - Citizen-driven Environmental Monitoring <http:https://geo-wiki.org/>`_
* `UN Environmental Data <http:https://geodata.grid.unep.ch/>`_
* `World countries in multiple formats <https://github.com/mledoze/countries>`_


Government
----------

* `Alberta, Province of Canada <http:https://open.alberta.ca>`_
* `Antwerp, Belgium <http:https://opendata.antwerpen.be/datasets>`_
* `Argentina <http:https://datos.argentina.gob.ar/>`_
* `Argentina (non official) <http:https://datar.noip.me/>`_
* `Argentina <http:https://datos.argentina.gob.ar/>`_
* `Austin, TX, US <https://data.austintexas.gov/>`_
* `Australia (abs.gov.au) <http:https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3301.02009?OpenDocument>`_
* `Australia (data.gov.au) <https://data.gov.au/>`_
@@ -231,6 +231,7 @@ Government
* `Canada <http:https://open.canada.ca/en?lang=En&n=5BCD274E-1>`_
* `Chicago <https://data.cityofchicago.org/>`_
* `Dallas Open Data <https://www.dallasopendata.com/>`_
* `DataBC - data from the Province of British Columbia <http:https://www.data.gov.bc.ca/>`_
* `Denver Open Data <http:https://data.denvergov.org//>`_
* `Durham, NC Open Data <https://opendurham.nc.gov/explore/>`_
* `Edmonton, AB, Canada <https://data.edmonton.ca/>`_
@@ -251,8 +252,8 @@ Government
* `Indian Government Data <https://data.gov.in/>`_
* `Indonesian Data Portal <http:https://data.go.id/>`_
* `Laval, QC, Canada <http:https://www.laval.ca/Pages/Fr/Citoyens/donnees.aspx>`_
* `London, ON, Canada <http:https://www.london.ca/city-hall/open-data/Pages/default.aspx>`_
* `London Datastore, UK <http:https://data.london.gov.uk/dataset>`_
* `London, ON, Canada <http:https://www.london.ca/city-hall/open-data/Pages/default.aspx>`_
* `Los Angeles Open Data <https://data.lacity.org/>`_
* `MassGIS, Massachusetts, U.S. <http:https://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/>`_
* `Mexico <http:https://catalogo.datos.gob.mx/dataset>`_
@@ -302,7 +303,6 @@ Government
* `Uruguay <https://catalogodatos.gub.uy/>`_
* `Vancouver, BC Open Data Catalog <http:https://data.vancouver.ca/datacatalogue/>`_
* `Victoria, BC, Canada <http:https://www.victoria.ca/EN/main/city/open-data-catalogue.html>`_
* `DataBC - data from the Province of British Columbia <http:https://www.data.gov.bc.ca/>`_


Healthcare
@@ -332,16 +332,11 @@ Image Processing
* `Indoor Scene Recognition <http:https://web.mit.edu/torralba/www/indoor.html>`_
* `International Affective Picture System, UFL <http:https://csea.phhp.ufl.edu/media/iapsmessage.html>`_
* `Massive Visual Memory Stimuli, MIT <http:https://cvcl.mit.edu/MM/stimuli.html>`_
* `Several Shape-from-Silhouette Datasets <http:https://kaiwolf.no-ip.org/3d-model-repository.html>`_
* `Stanford Dogs Dataset <http:https://vision.stanford.edu/aditya86/ImageNetDogs/>`_
* `SUN database, MIT <http:https://groups.csail.mit.edu/vision/SUN/hierarchy.html>`_
* `The Oxford-IIIT Pet Dataset <http:https://www.robots.ox.ac.uk/~vgg/data/pets/>`_
* `YouTube Faces Database <http:https://www.cs.tau.ac.il/~wolf/ytfaces/>`_
* `Several Shape-from-Silhouette Datasets <http:https://kaiwolf.no-ip.org/3d-model-repository.html>`_

Legal
----------------

* `Canadian Legal Information Institute <https://www.canlii.org/en/index.php>`_


Machine Learning
@@ -367,13 +362,13 @@ Machine Learning
Museums
-------

* `Canada Science and Technology Museums Corporation's Open Data <http:https://techno-science.ca/en/data.php>`_
* `Cooper-Hewitt's Collection Database <https://github.com/cooperhewitt/collection>`_
* `Minneapolis Institute of Arts metadata <https://github.com/artsmia/collection>`_
* `Natural History Museum (London) Data Portal <http:https://data.nhm.ac.uk/>`_
* `Rijksmuseum Historical Art Collection <https://www.rijksmuseum.nl/en/api>`_
* `Tate Collection metadata <https://github.com/tategallery/collection>`_
* `The Getty vocabularies <http:https://vocab.getty.edu>`_
* `Canada Science and Technology Museums Corporation's Open Data <http:https://techno-science.ca/en/data.php>`_


Natural Language
@@ -409,7 +404,7 @@ Physics


Psychology/Cognition
--------------
--------------------

* `OSU Cognitive Modeling Repository Datasets <http:https://www.cmr.osu.edu/browse/datasets>`_

@@ -449,69 +444,77 @@ Search Engines
* `DataMarket (Qlik) <https://datamarket.com/data/list/?q=all>`_
* `Harvard Dataverse Network of scientific data <https://dataverse.harvard.edu/>`_
* `ICPSR (UMICH) <http:https://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp>`_
* `Institute of Education Sciences <http:https://eric.ed.gov>`_
* `National Technical Reports Library <https://ntrl.ntis.gov/NTRL/login.xhtml>`_
* `Open Data Certificates (beta) <https://certificates.theodi.org/en/datasets>`_
* `OpenDataNetwork - A search engine of all Socrata powered data portals <http:https://www.opendatanetwork.com/>`_
* `Statista.com - statistics and Studies <http:https://www.statista.com/>`_
* `Institute of Education Sciences <http:https://eric.ed.gov>`_
* `National Technical Reports Library <https://ntrl.ntis.giv/NTRL>`_
* `Zenodo - An open dependable home for the long-tail of science <https://zenodo.org/collection/datasets>`_


Social Sciences
Social Networks
---------------

* `72 hours #gamergate scrape <http:https://waxy.org/random/misc/gamergate_tweets.csv>`_
* `72 hours #gamergate Twitter Scrape <http:https://waxy.org/random/misc/gamergate_tweets.csv>`_
* `Ancestry.com Forum Dataset over 10 years <http:https://www.cs.cmu.edu/~jelsas/data/ancestry.com/>`_
* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape <https://archive.org/details/twitter_cikm_2010>`_
* `CMU Enron Email of 150 users <http:https://www.cs.cmu.edu/~enron/>`_
* `EDRM Enron EMail of 151 users, hosted on S3 <https://aws.amazon.com/datasets/enron-email-data/>`_
* `Facebook Data Scrape (2005) <https://archive.org/details/oxford-2005-facebook-matrix>`_
* `Facebook Social Networks from LAW (since 2007) <http:https://law.di.unimi.it/datasets.php>`_
* `FBI Hate Crime 2013 - aggregated data <https://github.com/emorisse/FBI-Hate-Crime-Statistics/tree/master/2013>`_
* `Foursquare from UMN/Sarwat (2013) <https://archive.org/details/201309_foursquare_dataset_umn>`_
* `GDELT Global Events Database <http:https://gdeltproject.org/data.html>`_
* `General Social Survey (GSS) since 1972 <http:https://gss.norc.org>`_
* `GetGlue - users rating TV shows <http:https://getglue-data.s3.amazonaws.com/getglue_sample.tar.gz>`_
* `GitHub Collaboration Archive <https://www.githubarchive.org/>`_
* `Google Scholar citation relations <http:https://www3.cs.stonybrook.edu/~leman/data/gscholar.db>`_
* `MIT Reality Mining Dataset <http:https://realitycommons.media.mit.edu/realitymining.html>`_
* `Mobile Social Networks from UMASS <https://kdl.cs.umass.edu/display/public/Mobile+Social+Networks>`_
* `Network Twitter Data <http:https://snap.stanford.edu/data/higgs-twitter.html>`_
* `PewResearch Internet Survey Project <http:https://www.pewinternet.org/datasets/pages/2/>`_
* `PewResearch Society Data Collection <http:https://www.pewresearch.org/data/download-datasets/>`_
* `Political Polarity Data <http:https://www3.cs.stonybrook.edu/~leman/data/14-icwsm-political-polarity-data.zip>`_
* `Reddit Comments <https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/>`_
* `Skytrax' Air Travel Reviews Dataset <https://github.com/quankiquanki/skytrax-reviews-dataset>`_
* `Social Twitter Data <http:https://snap.stanford.edu/data/egonets-Twitter.html>`_
* `SourceForge.net Research Data <http:https://www3.nd.edu/~oss/Data/data.html>`_
* `StackExchange Data Explorer <http:https://data.stackexchange.com/help>`_
* `Texas Inmates Executed Since 1984 <http:https://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html>`_
* `Titanic Survival Data Set <https://github.com/caesar0301/awesome-public-datasets/tree/master/Datasets>`_
* `Twitter Data for Sentiment Analysis <http:https://help.sentiment140.com/for-students/>`_
* `Twitter Graph of entire Twitter site <http:https://an.kaist.ac.kr/traces/WWW2010.html>`_
* `Twitter Scrape Calufa May 2011 <http:https://archive.org/details/2011-05-calufa-twitter-sql>`_
* `UCB's Archive of Social Science Data (D-Lab) <http:https://ucdata.berkeley.edu/>`_
* `UCLA Social Sciences Data Archive <http:https://dataarchives.ss.ucla.edu/Home.DataPortals.htm>`_
* `UNIMI/LAW Social Network Datasets <http:https://law.di.unimi.it/datasets.php>`_
* `Universities Worldwide <http:https://univ.cc/>`_
* `UPJOHN for Labor Employment Research <http:https://www.upjohn.org/services/resources/employment-research-data-center>`_
* `Yahoo! Graph and Social Data <http:https://webscope.sandbox.yahoo.com/catalog.php?datatype=g>`_
* `Youtube Video Social Graph in 2007,2008 <http:https://netsg.cs.sfu.ca/youtubedata/>`_


Social Sciences
---------------

* `Canadian Legal Information Institute <https://www.canlii.org/en/index.php>`_
* `Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc <http:https://www.systemicpeace.org/>`_
* `Correlates of War Project <http:https://www.correlatesofwar.org/>`_
* `The MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste <http:https://nsd.uib.no>`_
* `Cryptome Conspiracy Theory Items <http:https://cryptome.org>`_
* `Datacards <http:https://datacards.org>`_
* `European Social Survey <www.europeansocialsurvey.org/data/>`_
* `FBI Hate Crime 2013 - aggregated data <https://github.com/emorisse/FBI-Hate-Crime-Statistics/tree/master/2013>`_
* `GDELT Global Events Database <http:https://gdeltproject.org/data.html>`_
* `General Social Survey (GSS) since 1972 <http:https://gss.norc.org>`_
* `General Social Survey <http:https://gss.norc.org/Get-The-Data>`_
* `German Social Survey <http:https://www.gesis.org/en/home/>`_
* `Global Religious Futures Project <http:https://www.globalreligiousfutures.org/>`_
* `Institute for Demographic Studies <http:https://www.ined.fr/en/>`_
* `UN Civil Society Database <http:https://esango.un.org/civilsociety/>`_
* `Terrorism Research and Analysis Consortium <http:https://www.trackingterrorism.org/>`_
* `Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc <http:https://www.systemicpeace.org/>`_
* `International Networks Archive <http:https://www.princeton.edu/~ina/>`_
* `Paul Hensel General International Data Page <http:https://www.paulhensel.org/dataintl.html>`_
* `James McGuire Cross National Data <http:https://jmcguire.faculty.wesleyan.edu/welcome/cross-national-data/>`_
* `International Studies Compendium Project <http:https://www.isacompendium.com/public/>`_
* `European Social Survey <www.europeansocialsurvey.org/data/>`_
* `General Social Survey <gss.norc.org/Get-The-Data>`_
* `International Social Survey Program ISSP <http:https://www.issp.org>`_
* `German Social Survey <http:https://www.gesis.org/en/home/>`_
* `International Studies Compendium Project <http:https://www.isacompendium.com/public/>`_
* `James McGuire Cross National Data <http:https://jmcguire.faculty.wesleyan.edu/welcome/cross-national-data/>`_
* `MIT Reality Mining Dataset <http:https://realitycommons.media.mit.edu/realitymining.html>`_
* `Paul Hensel General International Data Page <http:https://www.paulhensel.org/dataintl.html>`_
* `PewResearch Internet Survey Project <http:https://www.pewinternet.org/datasets/pages/2/>`_
* `PewResearch Society Data Collection <http:https://www.pewresearch.org/data/download-datasets/>`_
* `Political Polarity Data <http:https://www3.cs.stonybrook.edu/~leman/data/14-icwsm-political-polarity-data.zip>`_
* `StackExchange Data Explorer <http:https://data.stackexchange.com/help>`_
* `Terrorism Research and Analysis Consortium <http:https://www.trackingterrorism.org/>`_
* `Texas Inmates Executed Since 1984 <http:https://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html>`_
* `The MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste <http:https://nsd.uib.no>`_
* `Titanic Survival Data Set <https://github.com/caesar0301/awesome-public-datasets/tree/master/Datasets>`_
* `UCB's Archive of Social Science Data (D-Lab) <http:https://ucdata.berkeley.edu/>`_
* `UCLA Social Sciences Data Archive <http:https://dataarchives.ss.ucla.edu/Home.DataPortals.htm>`_
* `UN Civil Society Database <http:https://esango.un.org/civilsociety/>`_
* `Universities Worldwide <http:https://univ.cc/>`_
* `UPJOHN for Labor Employment Research <http:https://www.upjohn.org/services/resources/employment-research-data-center>`_


Sports
@@ -528,11 +531,11 @@ Sports
Time Series
-----------

* `Databanks International Cross National Time Series Data Archive <http:https://www.cntsdata.com>`_
* `Hard Drive Failure Rates <https://www.backblaze.com/hard-drive-test-data.html>`_
* `Heart Rate Time Series from MIT <http:https://ecg.mit.edu/time-series/>`_
* `Time Series Data Library (TSDL) from MU <https://datamarket.com/data/list/?q=provider:tsdl>`_
* `UC Riverside Time Series Dataset <http:https://www.cs.ucr.edu/~eamonn/time_series_data/>`_
* `Databanks International Cross National Time Series Data Archive <http:https://www.cntsdata.com>`_


Transportation
@@ -564,13 +567,11 @@ Transportation
Complementary Collections
-------------------------

* `Database of Scientific Code Contributions <https://mozillascience.org/collaborate>`_
* DataWrangling: `Some Datasets Available on the Web <http:https://www.datawrangling.com/some-datasets-available-on-the-web>`_
* Inside-r: `Finding Data on the Internet <http:https://www.inside-r.org/howto/finding-data-internet>`_
* OpenDataMonitor: `An overview of available open data resources in Europe <http:https://opendatamonitor.eu>`_
* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits <http:https://www.opendatanetwork.com/>`_
* Quora: `Where can I find large datasets open to the public? <http:https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public>`_
* RS.io: `100+ Interesting Data Sets for Statistics <http:https://rs.io/100-interesting-data-sets-for-statistics/>`_
* StaTrek: `Leveraging open data to understand urban lives <http:https://xiaming.me/posts/2014/10/23/leveraging-open-data-to-understand-urban-lives/>`_
* Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. <https://zenodo.org/collection/datasets>`_
* `Database of Scientific Code Contributions <https://mozillascience.org/collaborate>`_
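
Much of this diff moves existing entries into alphabetical order within their sections. A check along the following lines could flag entries that drift out of that order in later edits. It is a minimal, hypothetical sketch, not part of the repository or its Travis build: the helper names `sort_key` and `unsorted_entries` are illustrative, and the ordering key (link text with the trailing URL dropped, compared case-insensitively) is an assumption, since the commit does not say which key it follows. It may therefore disagree with a few of the placements above.

```python
import re

# In this README a section heading is a title line followed by a line of dashes.
UNDERLINE = re.compile(r"^-{3,}\s*$")
# Trailing "<url>_" part of an RST hyperlink, once backticks are removed.
TRAILING_URL = re.compile(r"\s*<[^>]*>_*\s*$")


def sort_key(entry: str) -> str:
    """Assumed ordering key: entry text without backticks or the trailing
    URL, compared case-insensitively."""
    text = entry.replace("`", "")
    return TRAILING_URL.sub("", text).casefold()


def unsorted_entries(rst: str) -> list[tuple[str, str, str]]:
    """Return (section, previous, current) for every bullet whose key sorts
    before the bullet directly above it in the same section."""
    problems = []
    section, prev = "", None
    lines = rst.splitlines()
    for i, line in enumerate(lines):
        if i > 0 and UNDERLINE.match(line):
            # A new heading starts a new list, so ordering restarts here.
            section, prev = lines[i - 1].strip(), None
        elif line.startswith("* "):
            entry = line[2:]
            if prev is not None and sort_key(entry) < sort_key(prev):
                problems.append((section, prev, entry))
            prev = entry
    return problems


if __name__ == "__main__":
    # An unsorted sample section written in the README's own bullet style.
    sample = """Geology
-------

* `Smithsonian Institution Global Volcano and Eruption Database <http://volcano.si.edu/>`_
* `USGS Earthquake Archives <http://earthquake.usgs.gov/earthquakes/search/>`_
* `Earth Models <http://www.earthmodels.org/>`_
"""
    for section, prev, cur in unsorted_entries(sample):
        print(f"{section}: {cur!r} should come before {prev!r}")
```

On the sample it reports a single problem: the Earth Models entry sorts before the USGS entry directly above it. Resetting the comparison at each heading keeps sections independent, so only within-section order is checked.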
