Warning: These projects are in the early scoping stages; do not use for anything other than amusement/frustration purposes
What is going on here?
rexcel
googlesheets
(update coming soon)
These read spreadsheets into a common intermediate format (see linen
below). Rather than reading just the data they read formatting and other metadata, unevaluated formulas, etc. They can either return a data.frame
or return a linen
workbook or worksheet object.
linen
is our general spreadsheet object.
It represents workbooks, worksheets, "views" into worksheets, and formatting information. It eventually will support a variety of common operations on spreadsheets.
cellranger
looks after referencing
Used through out all the other packages, cellranger holds references to regions within worksheets and workbooks.
jailbreakr
attempts to extract non-tabular data from spreadsheets.
We have collected all the Enron corpus in a repository on gitlab. These files can be accessed using remotefile, or you can clone the whole ~1.5GB of files.
We ran rexcel
over all 15871 xlsx files in the corpus and have stored linen objects in this gitlab repository. This will be useful (for us) for seeing how Excel is used in the wild.