Skip to content
/ csview Public

Interactive plot of any CSV file in your browser

Notifications You must be signed in to change notification settings

rixed/csview

Repository files navigation

CSVIEW

Simple, convenient, pretty, fast: pick all!

Command line: csview file1 file2 file3 will start an HTTP server on some port and open a browser to this page, that will display the CSV files.

CSV

Structure can be given as options. By default the first field will be assumed to be the X axis, and the plot type will be a time graph. In this mode the X axis column must be ascending (because csview does not sort the file and reads only a window of it).

A tmpdir is used to store aggregated views (can be either saved for future reuse or deleted at exit).

We need to be able to quickly scan the CSV, aka to read arrays of floats (unboxed in ocaml) from a csv file. We have two kind of reads:

  1. Block reads, when we will keep most of the values we read ; for which we just need to quickly locate the starting time t1. This poses no particular difficulty.

  2. Sparse reads, when most of the values will not be read ; the idea that we should read everything regardless just to compute on the fly the min/max/avg etc is wrong: shall we want to display those then another CSV, pre-aggregated, should be provided. Hum, but then the idea of a sparse read is wrong as well: the CSV should be pre-aggregated!

Let’s first imagine we really want to plot only 100 points at regular time interval from a file of 1M points. Basically we will poke in the file looking for the timestamps that we want, using the index we have. That’s 100 syncs.

A quick measurement shows that it takes ~0.2ms to seek randomly in a huge file on the SSD of my macbook. to get 100 points with a perfect index would therefore require 20ms which is still acceptable. We need this perfect index though.

What’s a perfect index?

  • it’s small enough that it fits in csview memory (it’s flushed on disk at exit unless instructed otherwise).

  • It can grow as we explore more of the file

  • We can very quickly know where it’s useful to grow it (ie. if we know the offset of a timestamp we can quickly learn what’s the closest timestamps we already know, and how far they are in the file)

  • As more useful information is added we can remove the less useful entries.

Useful has two possible definitions: good coverage of the file, or close to what we used to ask.

For a start, a mere binary tree (as in Set of timestamp * offset) would be good enough.

From the index we also learn the local average size of a tuple, allowing to bisect quicker to finally find the page we want. (note: we read the file by "page", in order to increase the chance that the desired timestamp is in there).

URL

Parameters: - file=name1,name2,…​ or all by default; - start=time or minimum of all files by default; - stop=time or maximum of all files by default; - n=nb points

Design

The backend generates the SVG directly. Everything else but that SVG is static content (that’s served from some files for easy customization). Therefore, we do not really need a lot of javascript, and js_of_ocaml might not even be required. Unless we add a lot of interactions with the graph, but so far we need:

  • scroll

  • zoom

  • pointed values

Plan A: Eliom

Pros: - Once the first version is done it’s easier to grow into a more complex app (add graph types, add filters, etc); - First grade web server;

Cons: - It’s bloated: Hard to install, to compile, it’s slower, and it will make simple things harder.

Plan B: Custom HTTP server, minimal JS.

Update: Done that! not particularly fast BTW, but contributed 2 useful libs (parsercombinator and net_codecs).

Pros: - Lean, fast, simple to customize;

Cons: - Have to build an HTTP server - quite easy, but some decoders maybe;

Still, tyxml can help to build statically correct HTML and_SVG. If too cumbersome, we can use owww.

Timeline

On csview:

  1. parsing the query: 5ms

  2. reading the CSV:

  3. computing the SVG:

  4. writing the SVG:

On the browser:

  1. reading the SVG:

  2. rendering the SVG:

Command line

cmdliner is not a good fit since this is not a multi-command program. Arg.parse should be simpler overall.

Scope of the options: per file (change only the settings for the files given after), per graph or global.

  • --separator: a char (default: auto!)

  • --x: index

  • --title: also set the name of the graph config, so more important than it seems;

  • --to-float followed by an index and the name of a formatter (string→float);

  • --y1, --y2 list of indices for values to report on the left and right axis;

  • --to-string followed by an index and the name of a labeler (float→string), used for ticks and legends;

  • --x-label, --y1-label, --y2-label for the axis;

  • --y1-stacked, --y2-stacked: booleans;

  • --annot or --y3: index of a field to use for annotations - values are supposed to be strings already suitable for display;

  • --color, --width, --filled, --opacity: index and name of a color, boolean or float;

  • --confdir, name of the directory to save/read configuration to/from (one per file for the file related settings and one per graph for the graph settings - which include the name of the file to use for that graph, and one global file for global settings such as the open-with below).

  • --no-browser: to avoid opening the browser automatically

  • --open-with: how exactly to launch the browser?

  • --confname: instead of using the filename to get previous configuration from confdir, use this alternate name. This is useful if for instance the actual file is a temporary file as in =(grep foo file.csv).

Filters

A way to say when to consider a y value depending on other fields.

Quite complex and doable on the CSV directly. Also, would likely disprove the assumption that a CSV line can be read from any block.

TODO: fix this limitation that a line must fit within a block, that is probably desirable anyway, and then we could filters.

data sources

Implement a quote character in addition to the field separator. Also: a way to use a column as factors: -x 1 -y 2 -factors 3,6 Requires to read all the lines for a given T not just one, still assuming proper ordering. Then the following parameters would apply to all fields created. The problem is we don’t know in advance how many factors we will have and have to scan the whole file first :-(

AutoSaving configuration

Use case: We devise the perfect command line to represent some data in some file. When the file is updated, we do not want to have to remember it. Ideally, the same configuration can be figured out from the config file name. But there are cases when we can devise several layout for a single CSV file, all worth saving. In that case using the alternate file name for saving allows to store several representations for the same file.

Also, a single graph may be composed of several files.

So what we really want to save are the graphs themselves. Saving also fields info on a per file conig can also make sense, though (so that we already know the formatters, for instance).

So we want to save:

  • given an (alternative) file name, the label, fmt, color, etc, of each field (a field array)

  • given the title of a graph, the file array and labels.

  • given the title of the page, the list of graphs. (for dashboards)

TODO: rewrite.

Also, a Chart.Source with label, color, stroke width, etc, used by Config.field, and passed in the fold iteratof instead of one parameter for color, another for label, etc…​

Legend and overlay

TODO: JS as in clopinet to highlight part of the graph when hovering its label in the legend.

Annotations

TODO

Examples

date,with,without
2001-01-01,392,1414
2001-01-02,381,762
2001-01-03,342,809
2001-01-04,374,792
2001-01-05,410,857
2001-01-06,443,847
2001-01-07,369,735
2001-01-08,385,772
2001-01-09,351,833
2001-01-10,380,857
2001-01-11,406,821
./csview test_data/chicago.arrests.csv --has-header \
  -x 1 --format 'date(%Y-%m-%d)' \
  -y 2 --filled -y 3 --filled \
  --show-0 --y-label 'Nbr of Cases' --stacked
chicago1

About

Interactive plot of any CSV file in your browser

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published