
Change UnixFS dag-pb protobuf field ordering #533

@achingbrain

The .proto definition of a PBNode is this (spec):

```protobuf
message PBNode {
  // refs to other objects
  repeated PBLink Links = 2;

  // opaque user data
  optional bytes Data = 1;
}
```

Implementations are expected to write the repeated Links messages to the output buffer first and then the Data field, even though the field numbers are ordered the opposite way (Data is 1, Links is 2). The protobuf spec does not disallow this, though some off-the-shelf encoders will write fields in field-number order.
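To make the byte-level ordering concrete, here is a minimal Python sketch that hand-encodes a PBNode with one PBLink using only the standard library. The helper names (`encode_varint`, `len_delimited`, `encode_link`, `encode_node`) are hypothetical and not part of any IPFS library; the PBLink field numbers (Hash = 1, Name = 2, Tsize = 3) are taken from the dag-pb spec. Each Links entry starts with tag byte `0x12` (field 2, wire type 2) and Data starts with `0x0a` (field 1, wire type 2), so the first byte of the node shows which ordering was used.

```python
# Minimal hand-rolled protobuf encoder for PBNode/PBLink, only to illustrate
# field order on the wire. Not a complete or validating protobuf implementation.

def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def len_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2): tag, length, payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload

def encode_link(cid: bytes, name: str, tsize: int) -> bytes:
    """PBLink: Hash = 1 (bytes), Name = 2 (string), Tsize = 3 (varint, wire type 0)."""
    return (
        len_delimited(1, cid)
        + len_delimited(2, name.encode("utf-8"))
        + encode_varint((3 << 3) | 0) + encode_varint(tsize)
    )

def encode_node(links: list[bytes], data: bytes, data_first: bool = False) -> bytes:
    """PBNode: Links = 2 (repeated), Data = 1.

    data_first=False produces the order dag-pb requires today (Links, then Data);
    data_first=True produces the order proposed in this issue.
    """
    links_part = b"".join(len_delimited(2, link) for link in links)
    data_part = len_delimited(1, data)
    return data_part + links_part if data_first else links_part + data_part

link = encode_link(b"placeholder-cid", "a.txt", 5)  # placeholder bytes, not a real CID
unixfs_dir = b"\x08\x01"  # UnixFS Data with Type = Directory (1)
current = encode_node([link], unixfs_dir)                    # starts with 0x12 (Links tag)
proposed = encode_node([link], unixfs_dir, data_first=True)  # starts with 0x0a (Data tag)
```

Both encodings contain exactly the same fields; only their position in the buffer differs, which is why a decoder that accepts fields in any order can read either one.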

When the PBNode represents a directory, the Links could be either flat directory entries or HAMT shard entries; the information needed to tell them apart (the UnixFS `Type`, either `Directory` or `HAMTShard`) is contained in the Data field.
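As an illustration, the following Python sketch reads the `Type` field (field 1, a varint enum) out of a UnixFS Data payload and reports how the node's Links should be interpreted. The enum values follow the UnixFS spec; the function names (`decode_varint`, `unixfs_type`, `link_kind`) are hypothetical helpers written for this example, and other UnixFS fields are simply skipped rather than validated.

```python
# Type enum values from the UnixFS Data message.
UNIXFS_TYPES = {0: "Raw", 1: "Directory", 2: "File", 3: "Metadata", 4: "Symlink", 5: "HAMTShard"}

def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a protobuf base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def unixfs_type(data: bytes) -> str:
    """Return the UnixFS Type name stored in a PBNode's Data field."""
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if field_number == 1 and wire_type == 0:
            value, _ = decode_varint(data, pos)
            return UNIXFS_TYPES.get(value, f"unknown({value})")
        # Skip any other field according to its wire type.
        if wire_type == 0:
            _, pos = decode_varint(data, pos)
        elif wire_type == 1:
            pos += 8
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            pos += length
        elif wire_type == 5:
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    raise ValueError("UnixFS Data has no Type field")

def link_kind(data: bytes) -> str:
    """Describe how a node's Links should be interpreted, based on its Data."""
    kind = unixfs_type(data)
    if kind == "Directory":
        return "flat directory entries"
    if kind == "HAMTShard":
        return "HAMT shard entries"
    return f"not a directory ({kind})"
```

For example, `link_kind(b"\x08\x01")` describes a flat directory, while a HAMT shard's Data (Type 5, plus `hashType` and `fanout` fields) yields "HAMT shard entries". The decision cannot be made until these bytes have been read.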

This means that when processing an incoming PBNode message, we typically have to read all of the Links first, and only then can we use the Data field to decide how to process them.

When performing a graph traversal, a streaming parser could process the PBNode message as it arrives and select the Link we wish to traverse through or resolve to. This is not currently possible: the Data field must be processed before any Link can be returned, so every Link that precedes it has to be buffered. This can add significant overhead when a node has many thousands of Links.
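The cost difference can be sketched in Python with a lazy field reader that walks a PBNode in wire order and stops as soon as it finds the requested entry. An in-memory byte string stands in for a network stream, and `find_link` reports how many Links had to be buffered before a decision was possible. Everything here is a hypothetical illustration under stated simplifications: links carry only a Name (no Hash), HAMT matching assumes a fanout of 256 (2-hex-character bucket prefixes) and does not follow sub-shards, and the helper encoder only handles small fields.

```python
from typing import Iterator, Optional, Union

def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a protobuf base-128 varint at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def fields(buf: bytes) -> Iterator[tuple[int, Union[int, bytes]]]:
    """Lazily yield (field_number, value) in wire order (wire types 0 and 2 only)."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
            yield number, value
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            yield number, buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")

def link_name(link: bytes) -> str:
    for number, value in fields(link):
        if number == 2:  # PBLink.Name
            return value.decode("utf-8")
    return ""

def is_hamt(data: bytes) -> bool:
    for number, value in fields(data):
        if number == 1:  # UnixFS Data.Type
            return value == 5  # 5 = HAMTShard
    return False

def matches(link: bytes, target: str, hamt: bool) -> bool:
    name = link_name(link)
    # Simplification: a 256-wide HAMT prefixes names with 2 hex characters.
    return name[2:] == target if hamt else name == target

def find_link(node: bytes, target: str) -> tuple[Optional[bytes], int]:
    """Return (matching PBLink or None, number of Links buffered before deciding)."""
    buffered: list[bytes] = []
    hamt: Optional[bool] = None  # unknown until the Data field has been read
    for number, value in fields(node):
        if number == 1:  # PBNode.Data
            hamt = is_hamt(value)
            for link in buffered:  # Links came first: only now can they be checked
                if matches(link, target, hamt):
                    return link, len(buffered)
        elif number == 2:  # PBNode.Links
            if hamt is None:
                buffered.append(value)  # cannot interpret it yet, so keep it
            elif matches(value, target, hamt):
                return value, len(buffered)  # Data came first: stop at the match
    return None, len(buffered)

def field(number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field; assumes number < 16 and len(payload) < 128."""
    return bytes([(number << 3) | 2, len(payload)]) + payload

# Hypothetical 1000-entry flat directory in both orderings.
links = [field(2, f"file{i}".encode()) for i in range(1000)]
dir_data = b"\x08\x01"  # UnixFS Data with Type = Directory (1)
links_first = b"".join(field(2, link) for link in links) + field(1, dir_data)
data_first = field(1, dir_data) + b"".join(field(2, link) for link in links)
```

With the current ordering, `find_link(links_first, "file3")` must hold all 1000 Links before it can answer; with Data first, `find_link(data_first, "file3")` buffers nothing and stops after the fourth Link without parsing the rest of the node.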

We should allow the Data field to be written first. This would enable the streaming use case and help IPFS code running in resource-constrained environments such as web browsers.

Since IPIP-499, we have CID profiles, which could provide an upgrade path for the network. This proposal could be part of a unixfs-v1-2026 profile.
