Disclosure: I work on Google Cloud.

Cool work! I love seeing people pushing distributed storage.

IIUC though, you make a similar choice to Avere and others. You're treating the object store as a distributed block store [1]:

> In HopsFS-S3, we added configuration parameters to allow users to provide their Amazon S3 bucket to be used as the block data store. Similar to HopsFS, HopsFSS3 stores the small files, < 128 KB, associated with the file system’s metadata. For large files, > 128 KB, HopsFS-S3 will store the files in the user-provided bucket.

...

> HopsFSS3 implements variable-sized block storage to allow for any new appends to a file to be treated as new objects rather than overwriting existing objects

It's somewhat unclear to me, but I think the combination of these statements means "S3 is always treated as a block store, but sometimes the File == Variably-Sized-Block == Object." Is that right?

Using S3 / GCS / any object store as a block store with a different frontend is a fine choice for a dedicated client or for applications like HDFS-based ones. But it does mean you throw away interop with other services. For example, if your HDFS-speaking data pipeline produces a bunch of output and you want to read it via some tool that only speaks S3 (like something in SageMaker or whatever), you're kind of trapped.
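
To make the interop point concrete, here's a rough sketch (the bucket name and key layout are made up for illustration, not HopsFS-S3's actual scheme): when the bucket is used as a block store, a plain S3 client only sees opaque block objects, and the mapping back to file paths lives solely in the filesystem's metadata.

    import boto3

    s3 = boto3.client("s3")

    # What an S3-only consumer would like to do: read the pipeline's output by path.
    # s3.get_object(Bucket="my-data-lake", Key="pipeline/output/part-00000.parquet")

    # What it actually sees when the bucket is a block store: opaque block objects
    # whose mapping to file paths exists only in the filesystem's metadata layer.
    objects = s3.list_objects_v2(Bucket="my-data-lake", Prefix="blocks/")
    for obj in objects.get("Contents", []):
        print(obj["Key"])  # e.g. blocks/inode-4711/block-0, meaningless on its own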

It sounds like you're already prepared to support variably-sized chunks / blocks, so I'd encourage you to have a "transparent mode". So many users love things like s3fs, gcsfuse and so on, because even if they're slow, they preserve interop. That's why we haven't gone the "blocks" route in the GCS Connector for Hadoop: interop is too valuable.

P.S. I'd love to see which things get easier for you if you are also able to use GCS directly (or at least know you're relying on our stronger semantics). A while back we finally ripped out all the consistency cache stuff in the Hadoop Connector once we'd rolled out the Megastore => Spanner migration [2]. Being able to use Dual-Region buckets that are metadata consistent while actively running Hadoop workloads in two regions is kind of awesome.

[1] https://content.logicalclocks.com/hubfs/HopsFS-S3%20Extendin...

[2] https://cloud.google.com/blog/products/gcp/how-google-cloud-...




> It's somewhat unclear to me, but I think the combination of these statements means "S3 is always treated as a block store, but sometimes the File == Variably-Sized-Block == Object." Is that right?

If the file is "small" (under a configurable size, typically 128 KB), it is stored in the metadata layer, not on S3. Otherwise, if you write the file once in one session (and it is under the 5 TB object size limit in S3), there will be one object in S3 (variable size; blocks in HDFS are fixed size by default). However, if you append to the file, we add a new object (as a block) for the append.
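
A minimal sketch of that write path (the names, key layout, and in-memory metadata stand-in are assumptions for illustration, not HopsFS-S3 internals): small files go to the metadata layer, a large file written in one session becomes one variable-sized object, and each append adds a new block object instead of overwriting.

    import boto3

    SMALL_FILE_THRESHOLD = 128 * 1024  # 128 KB, configurable

    s3 = boto3.client("s3")
    metadata_store = {}  # stand-in for the database-backed metadata layer

    def write_file(bucket, inode_id, data):
        if len(data) < SMALL_FILE_THRESHOLD:
            # Small files are stored with the file system's metadata, not on S3.
            metadata_store[inode_id] = {"inline_data": data, "blocks": []}
        else:
            # One variable-sized block == one object for a single write session.
            key = f"blocks/{inode_id}/block-0"
            s3.put_object(Bucket=bucket, Key=key, Body=data)
            metadata_store[inode_id] = {"inline_data": None, "blocks": [key]}

    def append_file(bucket, inode_id, data):
        # Appends become new objects rather than overwrites of existing objects.
        entry = metadata_store[inode_id]
        key = f"blocks/{inode_id}/block-{len(entry['blocks'])}"
        s3.put_object(Bucket=bucket, Key=key, Body=data)
        entry["blocks"].append(key)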

We have a new version under development (working prototype) where we can rewrite, in the background, all the blocks of a single file as a single object and make that object readable via the S3 API. It will be released sometime next year. The idea is that you can mark directories as "S3 compatible" and only pay for rebalancing those as needed. You then have the choice of doing the rebalancing on demand or as a background task, prioritizing it, and so on. You know the tradeoffs. Yes, it would be easier to do this with GCS, but we did AWS and Azure first, as we feel GCS is more hostile to third-party vendors. The talks we have given at Google (to the Colossus team a couple of years ago, and to Google Cloud/AI - https://www.meetup.com/SF-Big-Analytics/discussions/57666504... ) are like black holes of information transfer.
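
For the curious, that kind of background rewrite can be done server-side in S3 with a multipart upload that copies each block as a part, roughly like this (function and key names are assumptions, not our actual implementation; note that UploadPartCopy requires every part except the last to be at least 5 MB):

    import boto3

    s3 = boto3.client("s3")

    def compact_to_single_object(bucket, block_keys, target_key):
        # Concatenate the file's block objects into one object readable via the S3 API.
        upload = s3.create_multipart_upload(Bucket=bucket, Key=target_key)
        parts = []
        for i, block_key in enumerate(block_keys, start=1):
            resp = s3.upload_part_copy(
                Bucket=bucket,
                Key=target_key,
                UploadId=upload["UploadId"],
                PartNumber=i,
                CopySource={"Bucket": bucket, "Key": block_key},
            )
            parts.append({"PartNumber": i, "ETag": resp["CopyPartResult"]["ETag"]})
        s3.complete_multipart_upload(
            Bucket=bucket,
            Key=target_key,
            UploadId=upload["UploadId"],
            MultipartUpload={"Parts": parts},
        )
        # The metadata layer would then point reads at target_key and the old
        # block objects could be garbage-collected (omitted here).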


Your upcoming flexibility sounds awesome. I assume many people would just mark the entire bucket as "compatible" to support arbitrary renames/mv of directories, but being able to say "keep this directory in compat mode" will be nice for people who use a single mega bucket and split it into teams / datasets at the top level.

I’m sorry if you’ve tried to talk to us and we’ve been unhelpful. I’d be happy to put you in touch with some GCS people specifically — the Colossus folks are multiple layers below, while the AI folks are multiple layers above. They were probably mostly not sure what to say!

We worked quite openly and frankly with the Twitter folks on our GCS connector [1]. I'd be happy to support doing the same with you. My contact info is in my profile.

(Though I’d definitely agree that we’ve also been surprisingly reticent to talk about Colossus, until recently the only public talk was some slides at FAST).

[1] https://cloud.google.com/blog/products/data-analytics/new-re...


> interop is too valuable

Good point. JuiceFS already provides this "transparent mode"; there it's called compatible mode.



