The Future of RethinkDB

nodesocket · on Nov 3, 2017

I'm interested in the backchannel and business that happens when a company that raises multiple millions of dollars shuts down. Do the investors take over? Do the founders have to pay back investors? Are the founders personally liable to investors? This is something that is never really discussed, probably because it gets messy with lawyers.

EDIT: Just heard Mike say that CNCF acquired the IP and assets of RethinkDB for $20k? That can't be right? Tiny startups that generate $1k a month in recurring revenue sell for more than $20K. RethinkDB raised over $12M, right? What am I missing?

throwaway5752 · on Nov 4, 2017

The transaction was more of a donation. Companies get bought because the technology is strategically important or because the business is profitable (either the business model scales up, or business functions overlap and you can reduce expenses). I'd guess that no companies with enough money to pay sufficiently more than $20K though thought either of those were the case?

nodesocket · on Nov 4, 2017

> I'd guess that no companies with enough money to pay sufficiently more than $20K though thought either of those were the case?

It seems like an official hosted RethinkDB that included enterprise support could generate pretty nice MRR and take business from Compose.io (IBM). Shoot, wish I'd known, $20K for the IP and assets was a steal. Probably would have been exponentially more to a buyer who wanted to turn it commercial though.

throwaway5752 · on Nov 4, 2017

I don't know. Databases are a hard business (see Riak, FoundationDB). Costs money to add features and fix bugs (and really hard to find qualified folks that can do that) as well as hosting costs if you offer a service. We'll see what happens with some other entrants. I would read http://www.defmacro.org/2017/01/18/why-rethinkdb-failed.html regarding DBaaS, too.

nodesocket · on Nov 4, 2017

I've read it. Completely agree about RethinkDB specifically, which raised VC capital on the order of $12M. A cloud hosted solution is a very tough business, with thin margins when you have dozens of employees, $15-$25k a month Bay Area rent, and very high overhead. The numbers don't add up, and it's not the return investors are looking for.

Though, if you're bootstrapping with your own capital and grow it to something like $10K or $20K a month in MRR, that's a win. I'm all about bootstrapping SaaS companies and growing recurring revenue.

kbd · on Nov 4, 2017

The company wasn't bought, it went out of business. This transaction was just for the rights to the IP so that the open source project could continue under the same name.

macawfish · on Nov 3, 2017

I really enjoy RethinkDB. Partly, that's cause it's the first time I've really dug into a nosql database, and I'm loving the freedom of it... Being able to dump stuff straight from a websockets feed straight into the database is awesome! Also, I like the web interface. It's simple and shows me what I want to see with regard to performance and storage.

I haven't really gotten much into doing fancy queries or transforms or streams.

If someone could make a tool that let you use RethinkDB as a (more or less) direct back end for pandas... That would be killer

chrisabrams · on Nov 3, 2017

I do a lot of work with RethinkDB + Pandas. We should talk.

ccmonnett · on Nov 7, 2017

I use RDB heavily with some Pandas hanging off the side because we haven't integrated them well but looking to improve that. You can get in touch with me if you'd like - I use this handle pretty much everywhere (except reddit ;) ).

macawfish · on Nov 5, 2017

my contact info is at wondering.xyz/contact

chrisco255 · on Nov 3, 2017

Pandas?

w0m · on Nov 3, 2017

I assume they're doing some dataframe munging in Python.

http://pandas.pydata.org/

macawfish · on Nov 3, 2017

yup, and I love pandas! it's so amazing! it makes me want to do data analysis just for fun

dorfsmay · on Nov 3, 2017

I've had to convert 500+ CSV files into graphs... Pandas + seaborn were the most effective solution!

rishav_sharan · on Nov 3, 2017

Is there any update on horizon.io?

Its GitHub hasn't received any update in a very long while

andrewrothman · on Nov 3, 2017

I'm not sure if the pun was intended, but if not, it would've made an excellent joke.

jdoliner · on Nov 3, 2017

It's almost hard not to make that joke, horizon.io just lends itself to that so readily.

jewel777 · on Nov 3, 2017

Unfortunately, people don't always get jokes right away. Certain amounts of explanation can be necessary.

tarr11 · on Nov 3, 2017

What is the primary use case for rethinkdb (vs other databases) ?

traverseda · on Nov 3, 2017

There isn't one. But there also isn't really a primary use case for postgres, as compared to mysql. It's just a pretty pleasant to use nosql database.

Easy clustering, first-class changefeeds, a somewhat-confusing query system, and it runs in the current working dir by default, which means it's dead simple to set up for development. No fire-and-forget writes.

Reminds me a bit of firebase, but free and open-source. If you're in a position where changefeeds or nosql are important, you should probably give rethinkdb a look.

bpicolo · on Nov 3, 2017

That admin UI is the slickest in the biz tho

smnscu · on Nov 4, 2017

While not as powerful, I enjoy CockroachDB's UI too. And it's wire-protocol compatible with Postgres!

api · on Nov 3, 2017

We use it here (ZeroTier).

Pros:

- Very easy and robust clustering (Raft-based, automatic fail-over). This is huge for us.

- Streaming change feeds. This one is also huge. Makes any kind of real-time, reactive, or event driven programming very easy and IMHO is something that should exist in every database.

- It's kind of half SQL. It's a NoSQL document store but encourages a relational design and supports many relational queries.

- Rational and pretty easy to understand query language. It's much cleaner than Mongo.

- Easy to deploy and configure.

- It passed the Jepsen tests before Mongo did and overall has a solid history of not losing data.

Cons:

- It's a CPU hog, at least when compared with PostgreSQL.

- It's also an I/O hog, though we sponsored some improvements that are getting merged in the next version that will reduce this and also make table commit a configurable parameter. You'll be able to have fully and partially (long flush delay) in-memory tables for highly ephemeral data.

lucasjans · on Nov 6, 2017

Hey Adam - I'm very interested in your sponsored contributions towards reducing IOPS in RethinkDB. It's our biggest challenge, even though we're on SSDs. Our backfills after outages are especially long and painful. Are these updates something you're running safely in production today?

CodesInChaos · on Nov 3, 2017

How do you handle the lack of transactions / atomic updates affecting more than one document?

lackbeard · on Nov 3, 2017

If you're going to use a database that does not support these features then you should come up with a data model that does not rely on them.

For example, instead of applying a bank account transfer as a database transaction that debits one account record and credits another, you create a new transaction record (account transaction, not database transaction.) Then account balances are a sum over these transaction records.
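
A minimal sketch of that idea in plain Python, with no database involved. The `Transfer` record and the `balance` helper are hypothetical names made up for illustration; in a document store, each `Transfer` would be one inserted document and the sum would be a query over them.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable


@dataclass(frozen=True)
class Transfer:
    """One immutable ledger entry: money moving between two accounts."""
    source: str
    target: str
    amount: Decimal


def balance(account: str, ledger: Iterable[Transfer]) -> Decimal:
    """Derive a balance by summing every entry that touches the account."""
    total = Decimal("0")
    for entry in ledger:
        if entry.target == account:
            total += entry.amount  # money received
        if entry.source == account:
            total -= entry.amount  # money sent
    return total


# Each transfer is a single insert, so no multi-document update is needed.
ledger = [
    Transfer("opening", "alice", Decimal("100.00")),
    Transfer("alice", "bob", Decimal("30.00")),
]

print(balance("alice", ledger))
print(balance("bob", ledger))
```

The transfer either exists as one record or it doesn't, so there is no window where one account has been debited and the other not yet credited. The cost is that every balance read becomes an aggregation.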

api · on Nov 3, 2017

Our data model generally doesn't require this. We're actually okay with fewer guarantees than RethinkDB provides. AFAIK NoSQL stores in general are a bad choice if you need this. You should use a SQL database.

manigandham · on Nov 3, 2017

To be clear, nosql vs sql doesn't mean much - use the right type of database for the scenario: relational, document, graph, key/value, etc.

They all have various support for transactions with relational usually the most comprehensive.

traverseda · on Nov 3, 2017

Not OP, but I admit that's not a problem I've ever had. I've done a lot of webapps supporting data-science pipelines, and I've built some major components of those pipelines. It's not something I've felt the need for when I've used postgres.

What do you use that for?

phamilton · on Nov 4, 2017

Destructive data rollups (combining multiple rows and deleting the original rows) are made much simpler with transactions, especially if you do hybrid aggregations of short term data and long term data.

For example, you might store data with minute level granularity for the past 24 hours but only hourly for the past 30 days. If someone queries the past two days, you need to look at both those datasets. Then, every hour or so, you need to summarize an hour of minute level data, insert it into the hourly granularity table and then remove it from the minute granularity table. Meanwhile, you want to make sure any queries aren't going to double count that data after insertion but before removal.

This can be done without transactions in a few ways, but they require putting your replication and rollup logic and constraints into your reading code, rather than having it isolated to your roll up code. And your data model has to be tweaked to allow for some of these operations. And the complexity often results in double counting bugs (or bugs where the data is not counted at all).

There are solutions though. They just require a lot more hoops than starting a transaction, moving the data, committing the transaction.
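
One of those transaction-free approaches can be sketched with a watermark that the reading code has to respect. This is only an illustration under assumptions: `Store`, `total`, and `rollup_steps` are hypothetical names, the tables are in-memory dicts, queries are hour-aligned, and hours are rolled up in order.

```python
MINUTES_PER_HOUR = 60


class Store:
    """Hypothetical in-memory stand-in for two tables plus one metadata document."""

    def __init__(self):
        self.minute_rows = {}  # minute timestamp -> value
        self.hourly_rows = {}  # hour start (in minutes) -> summed value
        self.watermark = 0     # minutes before this are served from hourly_rows


def total(store, start, end):
    """Sum values in [start, end) without double counting rolled-up data."""
    wm = store.watermark  # read once so both sums use the same cut-off
    hourly = sum(v for h, v in store.hourly_rows.items()
                 if start <= h < end and h + MINUTES_PER_HOUR <= wm)
    minute = sum(v for m, v in store.minute_rows.items()
                 if start <= m < end and m >= wm)
    return hourly + minute


def rollup_steps(store, hour_start):
    """Roll up one hour as three separate writes, yielding after each one."""
    hour_end = hour_start + MINUTES_PER_HOUR
    # Step 1: write the summary. Readers still ignore it (watermark not moved).
    store.hourly_rows[hour_start] = sum(
        v for m, v in store.minute_rows.items() if hour_start <= m < hour_end)
    yield "summary written"
    # Step 2: move the watermark. This single write flips readers to the summary.
    store.watermark = hour_end
    yield "watermark advanced"
    # Step 3: delete the minute rows, which readers already ignore.
    for m in [m for m in store.minute_rows if hour_start <= m < hour_end]:
        del store.minute_rows[m]
    yield "minute rows deleted"


store = Store()
for minute in range(120):  # two hours of data, value 1 per minute
    store.minute_rows[minute] = 1

# Query the full range before the rollup and after every intermediate step.
seen = [total(store, 0, 120)]
for _ in rollup_steps(store, 0):
    seen.append(total(store, 0, 120))
print(seen)
```

The total stays the same at every step, but only because `total` knows about the watermark, which is exactly the "rollup logic in your reading code" cost described above. A real system would also need to delay step 3 for readers that fetched the old watermark and have not finished reading yet.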

chillydawg · on Nov 3, 2017

It's good for really easy changefeeds. E.g., in a multiplayer game scenario you might have several people touching rows in a table. Any client can construct a query filtering on their particular subset and trivially just say .Changes() and then get a fast, reasonably-robust changefeed with an image-then-deltas type interface. It's not particularly low latency and the latency is quite variable - so if your timing requirements are of the millisecond order, look elsewhere.
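
To make "image-then-deltas" concrete, here is a toy model in plain Python. It is not the RethinkDB driver: `ToyTable` and its methods are hypothetical, and only the `old_val` / `new_val` event shape is borrowed from RethinkDB's changefeeds. With the official Python driver the equivalent would be, roughly, a `changes(include_initial=True)` call on a filtered table query.

```python
class ToyTable:
    """Toy in-memory table; NOT the RethinkDB driver, just a similar event shape."""

    def __init__(self):
        self.rows = {}
        self.listeners = []

    def changes(self, predicate, callback):
        """Send the current matching rows (the image), then subscribe to deltas."""
        for row in self.rows.values():
            if predicate(row):
                callback({"new_val": dict(row)})
        self.listeners.append((predicate, callback))

    def upsert(self, row):
        """Insert or replace a row and notify every subscriber whose filter matches."""
        old = self.rows.get(row["id"])
        self.rows[row["id"]] = dict(row)
        for predicate, callback in self.listeners:
            old_match = old is not None and predicate(old)
            new_match = predicate(row)
            if old_match or new_match:
                callback({
                    "old_val": dict(old) if old_match else None,
                    "new_val": dict(row) if new_match else None,
                })


players = ToyTable()
players.upsert({"id": 1, "game": "a", "x": 0})
players.upsert({"id": 2, "game": "b", "x": 0})

# A client subscribes only to its own game: first the image, then deltas.
events = []
players.changes(lambda row: row["game"] == "a", events.append)
players.upsert({"id": 1, "game": "a", "x": 5})  # delivered as a delta
players.upsert({"id": 2, "game": "b", "x": 9})  # filtered out

for event in events:
    print(event)
```

The client never has to poll or diff: it renders the initial rows, then applies each delta as it arrives.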

traverseda · on Nov 3, 2017

Games and chat are always popular examples, but I think it's really valuable for data-science.

You can also "fill in" missing data. If you were writing a webscraper, you could make a service that looks for url objects without any content, and scrape them. Then make a service that looks for url objects with content but that is missing ML-filled in details, and have it fill those in.

It's pretty good for disparate teams with different sets of technology. One group doing document classification, another trying NLP, another using RNNs, etc.

There are a few times in my career where rethinkdb would have been a killer feature, especially with its well-documented language bindings.
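
The "fill in missing data" pattern above might look something like this, sketched in plain Python. Everything here is a stand-in: `fake_scrape` and `fake_classify` are hypothetical placeholders for a real fetcher and a real model, and `run_worker` loops over a list where a real service would presumably subscribe to a changefeed filtered on the missing field.

```python
def needs_content(doc):
    """Selector for the scraper: url documents that have no content yet."""
    return "content" not in doc


def needs_labels(doc):
    """Selector for the classifier: scraped documents that have no labels yet."""
    return "content" in doc and "labels" not in doc


def fake_scrape(url):
    """Hypothetical stand-in for a real HTTP fetch."""
    return "<html>page at " + url + "</html>"


def fake_classify(content):
    """Hypothetical stand-in for an ML model."""
    return ["has-html"] if "<html>" in content else []


def run_worker(docs, selector, fill):
    """Apply `fill` to every document the selector matches; return how many changed."""
    changed = 0
    for doc in docs:
        if selector(doc):
            doc.update(fill(doc))
            changed += 1
    return changed


docs = [{"url": "http://example.com/a"}, {"url": "http://example.com/b"}]

# Each worker only knows which field it is responsible for filling in.
scraped = run_worker(docs, needs_content,
                     lambda d: {"content": fake_scrape(d["url"])})
labelled = run_worker(docs, needs_labels,
                      lambda d: {"labels": fake_classify(d["content"])})
print(scraped, labelled)
```

Because each worker is defined only by "documents missing my field", separate teams can add stages in whatever language they like without coordinating beyond the document shape.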

jasondc · on Nov 3, 2017

Definitely valuable, as an aside, changefeeds are now in mongodb too: https://emptysqua.re/blog/driver-features-for-mongodb-3-6/#c...

traverseda · on Nov 3, 2017

and postgres!

_ps6d · on Nov 3, 2017

Do you mean using LISTEN/NOTIFY, or some other method?

williamstein · on Nov 3, 2017

Probably LISTEN/NOTIFY. I explain in detail how I migrated from RethinkDB to PostgreSQL using LISTEN/NOTIFY + tons of additional work here http://blog.sagemath.com/2017/02/09/rethinkdb-vs-postgres.ht...

I've continued working on this codebase, even pushing commits this week like this one: https://github.com/sagemathinc/cocalc/commit/c20a62446b6e43c...

_ps6d · on Nov 3, 2017

Haha, I actually just finished skimming through that exact blog post because it was one of the first things that came up when I searched for "postgresql changefeed" to see if there was some other functionality for doing it that I didn't know about.

I'll definitely need to go back and read it more thoroughly later and take a look through your code, thanks for the links.

chillydawg · on Nov 4, 2017

You could do it with pglogical, if you're feeling particularly adventurous.

slap_shot · on Nov 3, 2017

I think it's very tough to describe the "use case" for RethinkDB (I used it in production two years and have expressed that I don't think RethinkDB did exceedingly well at any particular use case to warrant being used over other solutions). It might be easier to list the differentiators in roughly the order they were introduced:

JSON document storage

ReQL Query Language

JOINs

Easy Deployment

Administrative UI for monitoring, sharding, querying data

Change feeds

High Availability

_yapn · on Nov 3, 2017

For me, the real draw was how easy it is to set up a cluster with high availability and automatic failover. Having used MySQL in production on a mission critical web app for years, failing over between data centers and setting up replication again every time there was some kind of hiccup got very old very fast.

nailer · on Nov 3, 2017

It's a place to store data.

- The data can be structured

- It can also have relations

- The query language is JS, and allows both 'shape of data' and functional queries

- There's live change feeds (which means the DB, being the source of truth, takes the role of initiating change messages)

- RethinkDB has an excellent reputation for being able to get the data back after you save it.

Basically it's like Mongo but not (insert adjective).

We've been using it in production for 2 years at CertSimple and have been very happy. Previous experience is Mongo, GAE data store, and various ORMs pointed at SQL. The docs are great, the defaults are safe, and doing new things is easy.

drdaeman · on Nov 3, 2017

> RethinkDB has an excellent reputation for being able to get the data back

Uh. It's not always true, although not in the sense you've meant it (no complaints about storage reliability).

I don't know if my setup is broken or I did something stupid, but for me rethinkdb-dump saturates the CPU to the max and the only thing that keeps the machine from choking to death with LA1 going over 100 is resource limit on the database container. Trying to back things up results in random connection drops and timeouts. I gave up on trying to back up the database online.

And that's a very small database (75GB on disk, 12GB as uncompressed JSON, 2.5GB as a tarball), on a reasonably powerful machine. It's a single node, though - I thought I'd "upgrade" to a cluster at some point but it's way too early.

overcast · on Nov 3, 2017

It's what MongoDB should have been. A relational document database, with changefeeds.

fastball · on Nov 3, 2017

https://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis?

chx · on Nov 3, 2017

That site is a gem: https://kkovacs.eu/cool-but-obscure-unix-tools

nailer · on Nov 3, 2017

Could someone with the time to listen please provide a summary?

adamb_ · on Nov 3, 2017

Listened to it last week -- The gist is that they've found a new home with the CNCF & The Linux Foundation, which bought the IP so that they could continue working on it publicly. Besides the database (which was always open source) this is especially important for parts of RethinkDB that were meant to be "enterprise-only", which the company was working on internally before they shut down. All in all the community support sounds strong, and after listening I decided to take another look at Rethink for my next project :)

dankohn1 · on Nov 3, 2017

Small edit: CNCF funded the transaction (to free the IP by relicensing under Apache-2.0) but the project is hosted by CNCF's parent, The Linux Foundation.

Disclosure: I'm executive director of CNCF and did the transaction. And, in case you're wondering, I'm thrilled that the community of people able to take advantage of the code is growing.

muramira · on Nov 4, 2017

Dude, really thank you for your hard work on this. Out of curiosity, how did you pull it off?

okramcivokram · on Nov 3, 2017

There's a transcript on the bottom of the page (on mobile).

saidmasoud · on Nov 3, 2017

From the transcript, in case anyone was confused:

s/bizzare/bazzar/g

sant0sk1 · on Nov 3, 2017

Thanks for pointing that out! Our transcriber usually does a great job, but he doesn't get 'em all right.

The awesome thing is that our transcripts are open source and somebody must've read your comment, because I just merged a PR fixing this.

https://github.com/thechangelog/transcripts/pull/13

Our site auto-updates the transcripts after a merge, so your comment is now outdated. :)

ComodoHacker · on Nov 3, 2017

That was me, and I've read the transcript. :)

atombender · on Nov 3, 2017

Or bazaar, rather.

amflare · on Nov 3, 2017

That's a lot of text...

Also https://xkcd.com/927/

bjt · on Nov 3, 2017

I don't think RethinkDB ever intended to unify and replace all existing databases.