Can the goals of the protocol be achieved at the I/O library layer? #7
Replies: 2 comments 2 replies
@sharkinsspatial Thanks for taking the time to read through this, even if only briefly, and writing up your thoughts. I appreciate the detailed comments! This is definitely an important conversation to have. I don't want to discount the value of improving I/O client-side; I think there would continue to be value there even if CCRP becomes a thing. But let me see if I can respond to your points in a way that adequately conveys why I think client-side I/O optimizations can only get us so far.
I hope these responses help you see why I think we need to do more than just optimize client I/O. Let me know if I can unpack any of this more or if you have more thoughts. I really do appreciate having to defend this idea, because I'm still not sure if it is really a good one or just a mirage, so make me work for it. ❤️
Another proposed advantage of CCRP: it inherently supports hrefs that index into an array. With a dataset like a grid-aligned temporal stack of imagery or other data, you could have a STAC collection asset point to the dataset as a whole, and then items in the collection (individual images) could each have an asset pointing to the corresponding time slice in the stack. That presents an interface allowing the equivalent of searching into a zarr array, not just finding the array itself.
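As a purely hypothetical illustration of that pattern: a STAC item's asset could carry an href that addresses one time slice of the shared stack (the host, path, and `?t=` query syntax below are invented for the sketch, not part of any CCRP or STAC spec):

```json
{
  "type": "Feature",
  "id": "scene-2024-06-01",
  "assets": {
    "data": {
      "href": "https://ccrp.example.com/stacks/temperature.zarr/t2m?t=42",
      "title": "Time slice 42 of the shared temporal stack",
      "roles": ["data"]
    }
  }
}
```

The collection-level asset would point at the whole stack, while each item's asset href indexes into it.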
@jkeifer Very excited to see all the fast progress on this. I hadn't been tracking this work, but I was just referred to your blog PR on CNG cloudnativegeo/website-cloudnativegeo.org#88. I don't think I've had time to fully digest the concept, but a few things jumped out at me initially that might warrant some wider discussion. Again, these are after a single read-through of the very thorough blog post, so take my thoughts with a grain of salt 😆.
This problem seems more general than chunks. In my mind it is an I/O optimization problem: we want to dispatch adjacent HTTP range requests in a way that balances HTTP request overhead against optimal data transfer size. Though it is not exposed in an obvious way, `fsspec` has already pioneered this concept as an I/O library optimization through its use of `cat_ranges` and its configuration options, which allow merging adjacent requests within a specified overall size range. See https://developer.nvidia.com/blog/optimizing-access-to-parquet-data-with-fsspec/ for more details on optimizing Parquet reads. @kylebarron and I have had several discussions about implementing middleware for `obstore` which uses heuristics for "smart" adjacent range coalescing. The `obstore` middleware approach is attractive because it allows anyone to "bring their own" optimization configuration tuned for their specific use case, rather than trying to build a single solution for everyone.

The protocol docs describe CCRP as "A byte broker - returns raw, unprocessed chunks". I'm unclear on how the larger coalesced block of bytes returned by the protocol will interact with codec pipelines on the client side 🤔. I'll have to defer to others with more knowledge in this area, but my limited understanding is that some compression schemes are only valid when operating on the bytes of the originally compressed chunk and would not function on the larger coalesced stream. Whose responsibility will it be to split this larger coalesced "chunk" into its original constituents for codec processing? Will there need to be companion client implementations that handle disassembling data returned from the broker?
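To make the coalescing idea concrete, here is a minimal sketch of the kind of heuristic in question, plus the splitting step that the codec question implies someone must own. The function names, thresholds, and signatures are illustrative only; they are not fsspec's or obstore's actual API:

```python
def coalesce_ranges(ranges, max_gap=1024, max_block=8 * 1024 * 1024):
    """Merge adjacent byte ranges when the gap between them is small,
    capping the merged block size to bound transfer overhead.

    ranges: list of (start, end) tuples, end exclusive.
    Returns a list of merged (start, end) blocks to fetch in one request each.
    """
    if not ranges:
        return []
    ordered = sorted(ranges)
    merged = []
    start, end = ordered[0]
    for s, e in ordered[1:]:
        # Merge only if the wasted gap is small and the block stays bounded.
        if s - end <= max_gap and e - start <= max_block:
            end = max(end, e)
        else:
            merged.append((start, end))
            start, end = s, e
    merged.append((start, end))
    return merged


def split_block(block, ranges, block_start):
    """Slice one coalesced byte block back into per-chunk payloads so each
    original chunk can be decompressed by its own codec pipeline."""
    return [block[s - block_start:e - block_start] for s, e in ranges]
```

Whether `split_block` lives in the broker, in a companion client library, or in the I/O layer is exactly the open question above; the split itself is cheap, but someone has to know the original chunk boundaries.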
The biggest reason so many users embrace object storage for data storage and delivery is a simple one - "laziness" 😆. I personally would much rather delegate the responsibility of uptime, maintenance, and hyperscaling to the cloud providers. Any additional layer we introduce between object storage and the client is server infrastructure we need to pay for, maintain, and scale. The ability to defer scaling to the underlying object storage implementation is probably the biggest reason for the explosive growth of cloud distributions of scientific data.
In short I agree that the problems that CCRP is trying to address are widespread and painful, but I wonder if focusing on I/O client optimizations would allow us to address a wider range of these problematic cases with a simpler approach.