This is an umbrella bug report for a range of high-impact, long-standing bugs which prevent sharing, finding, and collaborating on documents and data.
See also @JeanFred's original tracking bug T44725, linked to over 200 other tickets, including hundreds of formats not covered here.
The problem:
- It is very hard to share data files or document files on Wikimedia projects. This limits the efficacy of community members, and of outreach to new communities (Who would join a community that rejects every format of their work?).
- In the rare cases where a data file (in the Commons:Data namespace) or a document (as PDF) is shared on Commons, it is hard to find, because the search interface hides it.
- Adding new format support is easy, and the highest-impact work we do, but inertia + lack of clear process has made it surprisingly rare. After two decades we have support for only 20 filetypes. The only media category where we support the most common formats (directly or via transcoding) is Images.
- Many filetypes that our communities depend on to run the projects and do daily research are blocked from upload to our wikis -- pushing people to use non-free services (such as Dropbox or Google) or non-public ones (such as email).
Issues blocking uploading data + documents:
0) Organize related tickets and data: Add FileTypes tag for new requests + analyses.
There are hundreds of file formats we should eventually support. The responses to dozens of past bugs on related issues have been hard to follow, in part because there is no single queue for this work.
The "how to add support for new filetypes" guide is a great start; a tag here will help. Work such as refining tabular-data support (in a new namespace, on Commons only) might also merit the tag.
- 0.1) Reopen tickets for gathering relevant data. T77796 - data on what unsupported filetypes are being uploaded
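As a rough illustration of the kind of data 0.1 asks for, the sketch below tallies extensions of attempted uploads that fall outside an allowlist. It is only a sketch: the function name `tally_unsupported`, the sample filenames, and the allowlist are hypothetical, and it does not reflect how upload logs are actually stored or how T77796 would be implemented.

```python
from collections import Counter
from pathlib import PurePosixPath


def tally_unsupported(filenames, supported):
    """Count attempted uploads per extension that is not in `supported`.

    `filenames` is any iterable of file names; `supported` is a set of
    lowercase extensions without the leading dot. Both are assumptions
    made for this example, not an existing log format or config.
    """
    counts = Counter()
    for name in filenames:
        # Normalise ".ODT" -> "odt"; names without an extension are skipped.
        ext = PurePosixPath(name).suffix.lower().lstrip(".")
        if ext and ext not in supported:
            counts[ext] += 1
    return counts


# Hypothetical sample data, for illustration only.
attempts = ["report.ODT", "budget.csv", "scan.pdf", "notes.odt", "README"]
allowlist = {"pdf", "djvu"}  # placeholder subset, not the real configuration
for ext, n in tally_unsupported(attempts, allowlist).most_common():
    print(ext, n)
```

A ranked tally like this would make it easier to decide which formats in sections 2 and 3 to prioritise.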
1) Add Data and Documents as categories in Commons search.
(Currently it has Images, Audio, Video, and Other Media -- add Documents and Data before Other Media)
- 1.1) Make these search .tab and .map datasets in the Data namespace - T252327
- 1.2) Allow searching Newfiles to filter by format. - T66768
2) Add upload support for essential document file formats
(Currently we support only PDF, arguably the least open of all of these, and DJVU)
- 2.1) Add .RDF support
- 2.2) Add .ODT support - T45154 (for all ODF formats)
- 2.3) Add .EPUB support - is there an underlying problem with zipped formats? T252250
- 2.4) Add .ODP support (presentations) - T45154 (for all ODF formats)
- 2.5) Review other OO formats - OASIS (T4089)
3) Add upload support for essential data file formats
(Currently we support NO STRUCTURED DATA FORMATS AT ALL, despite using them in every technical part of our workflow)
"There is currently a major issue with storing statistical data in Wikidata, which would be solved if we could upload the data to Commons as Tabular Data files." - @NavinoEvans on T181319
- 3.1) Add .CSV support - in use across Wikimedia, just not as files
- 3.2) Add .JSON support - in use on many other MW instances, see T68036
- 3.3) Add .XML support (see also MusicXML + LilyPond: T214023, T208494)
- 3.4) Add .SQLite support (widely used, selected as an archival format by the Library of Congress)
- 3.5) Add .ODS support - T45151
- 3.6) Update related conversations. How to deal with open datasets, Talk:Allowable file types