Hacker News

> Complicating matters, programmers can struggle to adapt standard version-control workflows to the fast, iterative nature of data exploration. As a result, crucial experimental details can be lost.

> "Somebody will say, oh, did you try this in the model, or did you try this analysis?" she says. Many times, the answer is yes, but because the analysis didn’t work, the code is deleted. "What you want to do during the meeting is just pull it back up and be like, oh, yeah, I did, and here’s why it didn’t work. And our tool lets you actually do that."

Errrrr... I'm slightly confused, as git tags already help with this?

`git tag -a <tag-name> -m "<full description of what this tag is all about>"`
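To make the "pull it back up during the meeting" scenario concrete, here's a self-contained sketch: tag the failed attempt, then retrieve the code and the note later. The file name, tag name, and message are all made up for illustration.

```shell
# Throwaway repo so the example is self-contained.
cd "$(mktemp -d)" && git init -q demo && cd demo

echo "analysis that did not pan out" > failed.py
git add failed.py
git -c user.name=a -c user.email=a@a commit -qm "attempt"
git tag -a failed-attempt -m "did not work: loss diverged (message is illustrative)"

git tag -n                  # list tags with their messages
git show failed-attempt     # show the tag annotation and the tagged commit
```

The annotated tag keeps both the code and the "why it didn't work" note reachable by name, even if the branch later moves on.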

Hi, one of the creators of Vizier here.

The difficulty with git in this use case is that it relies on the user to manually commit. If you try an analysis and decide to throw the code away before committing, too bad: there's no longer a record of it.

Vizier records everything. Every single change to the notebook, just like track changes in your word processor/spreadsheet of choice. Speaking from experience, it's really nice not to have to think about manually checkpointing... the history is just there.

Many existing version control tools help with this, but some challenges crop up when doing it with Jupyter notebooks:

1. Versioning notebooks in a semantically-useful way is difficult: think of it more like versioning binary data than source code. I know we struggle with this a fair amount.
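A minimal illustration of why notebook diffs are so noisy (the two notebook files here are stripped-down, hand-written JSON, not real Jupyter output): a one-character code edit also changes execution counts and stored outputs, so a line-based diff mixes semantic changes with churn.

```shell
# Two versions of the "same" one-cell notebook.
cd "$(mktemp -d)"
cat > v1.ipynb <<'EOF'
{"cells": [{"cell_type": "code", "execution_count": 1,
            "source": ["x = 1\n"], "outputs": []}]}
EOF
cat > v2.ipynb <<'EOF'
{"cells": [{"cell_type": "code", "execution_count": 7,
            "source": ["x = 2\n"],
            "outputs": [{"output_type": "stream", "text": ["2\n"]}]}]}
EOF

# The only semantic change is x = 1 -> x = 2, but execution_count and
# outputs are dragged into the diff too:
diff v1.ipynb v2.ipynb || true
```

Tools like nbdime exist precisely to diff and merge notebooks at the cell level instead of as raw JSON.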

2. There are many debates, some here on HN, about "best practices" for collaborative versioning (do you squash? rebase? etc.). These questions exist for notebook users too, and there's a lot of interest in developing tools that either make these decisions for you (are opinionated), integrate them more into the GUI nature of notebooks, or both.
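The squash trade-off mentioned above can be sketched in a throwaway repo (branch and file names are made up): squashing gives a tidy mainline, but the individual attempts' history is lost, which is precisely the tension for experiment tracking.

```shell
# Throwaway repo with three experimental commits on a side branch.
cd "$(mktemp -d)" && git init -q lab && cd lab
git -c user.name=a -c user.email=a@a commit -qm "initial" --allow-empty
base=$(git rev-parse --abbrev-ref HEAD)

git checkout -qb experiment
for i in 1 2 3; do
  echo "attempt $i" > run.txt
  git add run.txt
  git -c user.name=a -c user.email=a@a commit -qm "attempt $i"
done

git checkout -q "$base"
git merge -q --squash experiment     # folds the 3 attempts into 1 staged change
git -c user.name=a -c user.email=a@a commit -qm "experiment (squashed)"

git log --oneline    # mainline shows 2 commits; the per-attempt detail is gone
```

Whether that loss is a feature (clean history) or a bug (lost provenance) is exactly the kind of decision an opinionated notebook tool could make for you.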

Lots of people seem to state that git already solves most of the versioning issues for ML experiments. However, I have yet to find a good guide or example.

Just tagging commits and experiment runs does not seem like enough of a framework to manage large experiments.
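For context, about the furthest plain tags get you is encoding run metadata in structured tag names so they can be filtered later. The naming scheme below is ad hoc, not any standard, which rather supports the point that this isn't a framework.

```shell
# Throwaway repo; tag name and message encode hypothetical run metadata.
cd "$(mktemp -d)" && git init -q expdemo && cd expdemo
git -c user.name=a -c user.email=a@a commit -qm "init" --allow-empty

git tag -a "exp/lr-0.01/seed-42" -m "hypothetical run: tweaked learning rate"

git tag -l "exp/*"      # filter runs by the ad hoc naming convention
```

Anything beyond this, such as linking runs to datasets, parameters, and metrics, needs tooling layered on top of git rather than git alone.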
