🎬🍿 Streams make the world go round 🌍 Episode 1 #273

Open
erwinkramer wants to merge 37 commits into dvsekhvalnov:master from erwinkramer:master

@erwinkramer erwinkramer commented Nov 7, 2025

Contributes to #3.

Encode and EncodeBytes now call the newly introduced EncodeStream, which is also directly callable as proven with the new EncodeStreamHS512 test. This means the entire encoding foundation is now streaming based.

New DecodeStream and VerifyStream methods are introduced. They build on the modified DecodeBytes (now called DecodeStream), which handles the input and output payload via streams.

Non-seekable streams are not supported for now; I only focused on making seekable payload streams work in a streaming manner.

I'm sure there is room for more improvements, please review.

Progress on solutions and their unit tests:

  • jose-jwt-net40.sln - ✅ compiles and tests run successfully. Requires VS2019 and a separate install of the NUnit extension, but the installer link for VS2019 is not easy to find.
  • jose-jwt-net46.sln - ✅ compiles and tests run successfully.
  • jose-jwt-net47.sln - ✅ compiles and tests run successfully.
  • jose-jwt.sln - ✅ compiles and tests run successfully.
  • GitHub Action - ✅ runs fine, see this run in my project. Some remarks:
    1. I made a minor change to get xunit working again: erwinkramer@caff3bf.
    2. It doesn't run the .NET 4.0 version, which is no longer easy to get running on GitHub images since they don't support it out of the box. That is unrelated to my PR (it also fails silently in your version).

@erwinkramer
Author

@dvsekhvalnov please let me know what you think.

@dvsekhvalnov
Owner

Cool, thanks for that! I'll try to take a look next week, after the weekend.

@erwinkramer erwinkramer marked this pull request as ready for review November 8, 2025 14:17
@dvsekhvalnov
Owner

@erwinkramer help me understand :)

So, we are trying to introduce support for encoding/verifying big detached payloads, right?

If so, it would be nice to see a practical test: let's say a payload of a couple hundred MB or a few GB which causes OOM with the current approach, but is handled with no issues using streaming.

@erwinkramer
Author

erwinkramer commented Nov 11, 2025

@dvsekhvalnov exactly, 2 things:

  1. for big payloads
  2. for systems that are streaming by nature, so you don't have to convert to byte[] or string, from stream.

I added an EncodeStreamHS512_GigabyteDetachedPayload test, and I previously added Base64UrlEncodingStream_BenchmarkBufferSizes, to showcase and prove that it works. The GigabyteDetachedPayload test shows that memory usage on the PC increases very little.

@dvsekhvalnov
Owner

Yeah, see the test, cool. What's an example of system streaming in nature that wants to do JWT on it?

Let me wrap my head around changes, may take a while and i'll pop here and there with random questions.

My usual concerns are:

  1. Interfaces backward compatibility (which seems fine here)
  2. Cross-platform testing with other libs (can do that once we are done with the PR).
  3. Security issues when introducing new code, like https://github.com/dvsekhvalnov/jose-jwt/pull/273/files#diff-c736f4b1a7613222fa397e4f4602583b0eee313cad0150046e09cd32d9369df4, where some damn clever security researcher can find exploitation vector.

@erwinkramer
Author

> Yeah, see the test, cool. What's an example of system streaming in nature that wants to do JWT on it?

Request and response bodies in asp.net core are streams by nature: https://learn.microsoft.com/en-us/aspnet/core/fundamentals/use-http-context?view=aspnetcore-9.0

> Let me wrap my head around changes, may take a while and i'll pop here and there with random questions.

Sure!

> 3. Security issues when introducing new code, like https://github.com/dvsekhvalnov/jose-jwt/pull/273/files#diff-c736f4b1a7613222fa397e4f4602583b0eee313cad0150046e09cd32d9369df4, where some damn clever security researcher can find exploitation vector.

You mean ConcatenatedStream.cs? Is there any way you think we can further test this?

@dvsekhvalnov
Owner

> You mean ConcatenatedStream.cs? Is there any way you think we can further test this?

Well, what I think is that I'd prefer to isolate the new streaming support from the existing non-streaming code, to limit the possible blast radius if any new vulnerabilities are found.

Let's start with new EncodeStream(..) https://github.com/dvsekhvalnov/jose-jwt/pull/273/files#diff-30ba6d4e57a14434031ed1cb13e5685bc6413b3f34f64eb0a38024717a431b7fR275 :

  1. don't think you need to delegate EncodeBytes() -> EncodeStream() or vice versa.
  2. the ideal would be to just extract the common code between them as is and delegate the changing part. Something like:

     public static string EncodeStream(Stream payload, object key, JwsAlgorithm algorithm, IDictionary<string, object> extraHeaders = null, JwtSettings settings = null, JwtOptions options = null) {
         return Sign(payload, key,...., new StreamingPayloadDelegate())
     }

     public static string EncodeBytes(byte[] payload, object key, JwsAlgorithm algorithm, IDictionary<string, object> extraHeaders = null, JwtSettings settings = null, JwtOptions options = null) {
         return Sign(payload, key,...., new BytePayloadDelegate())
     }

  3. the real difference is only here: https://github.com/dvsekhvalnov/jose-jwt/blob/master/jose-jwt/JWT.cs#L292 , right? Basically how to calculate the secured input + signature. The rest is the same.

The DecodeStream(...) set of methods may be a little bit fancier; I'll add some inline comments on the PR to discuss.

/// <exception cref="IntegrityException">if signature validation failed</exception>
/// <exception cref="EncryptionException">if JWT token can't be decrypted</exception>
/// <exception cref="InvalidAlgorithmException">if JWT signature, encryption or compression algorithm is not supported</exception>
public static Stream DecodeStream(string token, object key, JwsAlgorithm alg, JwtSettings settings = null, Stream payload = null)
Owner

Here is a question. What's the point of returning a Stream?

It is either:

  1. the payload from the token, which has already been loaded into memory and can trivially be wrapped in a MemoryStream with almost one line of code

  2. or essentially the same detached payload stream that was passed as an argument

Author

It's simply because all other public decode functions work that way: DecodeBytes and Decode both return the payload itself (whether from the body or from the token). It was rather confusing to me why that already happens in those existing functions. If you want, I can make this method return void, but that wouldn't make it more consistent.

Owner

Well, let's call it a decade-long SDK design error. I'd truly prefer to have an interface that deals only with byte[] and leaves everything else to the caller side. But it may be too late.

Does it make sense to have DecodeStream() at all? Looks like VerifyStream() should cover all possible use-cases.

@erwinkramer
Author

erwinkramer commented Nov 17, 2025

> don't think you need to delegate EncodeBytes() -> EncodeStream() or vice versa.

I attempted this earlier, but the Sign method also has to be either byte-based or stream-based, same as Serialize, and everything underlying these methods also accepts either a byte array or a stream, down to the algorithm functions. You'd have to make everything support both byte[] and Stream. Using streams for the whole library is easier, since going from byte[] to Stream is much less impactful than supporting both, and it doesn't even impact performance (MemoryStream wraps the existing array without copying it).
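
The no-copy behaviour of MemoryStream can be checked with a minimal sketch like the one below. It is illustrative only and not code from this PR; the class and method names are hypothetical. It wraps an array, mutates the array afterwards, and reads the change back through the stream, which would not be visible if the constructor had copied the buffer.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only (not part of jose-jwt).
public static class MemoryStreamSketch
{
    // Wraps an array in a read-only MemoryStream, mutates the array
    // afterwards, and returns the first byte the stream yields.
    public static int FirstByteAfterMutation()
    {
        byte[] payload = { 1, 2, 3 };

        // MemoryStream(byte[], bool) uses the given array as its backing buffer.
        using (var stream = new MemoryStream(payload, false))
        {
            payload[0] = 42;          // change the array after wrapping it
            return stream.ReadByte(); // the stream sees the change
        }
    }

    public static void Main()
    {
        Console.WriteLine(FirstByteAfterMutation()); // prints 42
    }
}
```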

@dvsekhvalnov
Owner

Do you have a previous version to compare?

I'm fine with the algorithms being Sign(Stream securedInput, object key), or with just adding that as an overload to the interface; that is the easy part.

But the only difference in implementation is the line:

byte[] signature = jwsAlgorithm.Sign(securedInput(headerBytes, payload, jwtOptions.EncodePayload), key);

// which is simply delegate that takes: (alg, header, payload, encode, key and return byte[] array), e.g.
// you either pass one version or another or can do something fancy with pattern matching too
// byte[] signature = _doSign(jwsAlgorithm, key, headerBytes, payload, jwtOptions.EncodePayload)
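
A rough sketch of the "only the signing step differs" idea, under stated assumptions: HMAC-SHA256, an unencoded (b64=false) payload, and hypothetical names (SignSketch, SignBytes, SignStream) that are not part of jose-jwt. The prefix stands in for ASCII(BASE64URL(header) + "."). Both variants hash the same byte sequence, so they should yield the same signature; only the way the payload is fed in changes.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch, not jose-jwt code.
public static class SignSketch
{
    // byte[] variant: the whole payload is already in memory.
    public static byte[] SignBytes(byte[] key, byte[] prefix, byte[] payload)
    {
        using (var hmac = new HMACSHA256(key))
        {
            hmac.TransformBlock(prefix, 0, prefix.Length, null, 0);
            hmac.TransformFinalBlock(payload, 0, payload.Length);
            return hmac.Hash;
        }
    }

    // Stream variant: the payload is hashed chunk by chunk,
    // so it never has to fit in memory at once.
    public static byte[] SignStream(byte[] key, byte[] prefix, Stream payload)
    {
        using (var hmac = new HMACSHA256(key))
        {
            hmac.TransformBlock(prefix, 0, prefix.Length, null, 0);

            var buffer = new byte[4096];
            int read;
            while ((read = payload.Read(buffer, 0, buffer.Length)) > 0)
            {
                hmac.TransformBlock(buffer, 0, read, null, 0);
            }

            hmac.TransformFinalBlock(buffer, 0, 0); // finish with an empty block
            return hmac.Hash;
        }
    }

    public static void Main()
    {
        byte[] key = Encoding.UTF8.GetBytes("0123456789abcdef0123456789abcdef");
        byte[] prefix = Encoding.ASCII.GetBytes("header."); // stand-in for BASE64URL(header) + "."

        // Payload larger than the read buffer, to exercise the chunked path.
        byte[] payload = new byte[10000];
        for (int i = 0; i < payload.Length; i++) payload[i] = (byte)(i % 251);

        string fromBytes = Convert.ToBase64String(SignBytes(key, prefix, payload));
        string fromStream = Convert.ToBase64String(SignStream(key, prefix, new MemoryStream(payload)));

        Console.WriteLine(fromBytes == fromStream); // prints True
    }
}
```

Either method could be passed as the delegate mentioned above, leaving the header construction and serialization shared between the two paths.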

@erwinkramer
Author

erwinkramer commented Nov 19, 2025

I have no previous version for you to compare.

> But the only difference in implementation is the line:
>
> byte[] signature = jwsAlgorithm.Sign(securedInput(headerBytes, payload, jwtOptions.EncodePayload), key);

The securedInput logic has been moved to the Serialize method; I had to do that in order to keep the stream handling simple. Are you suggesting I place the original Serialize method next to it, so that one works with a byte array? Please let me know which specific parts you don't want to be streaming. I assume you're fine with that method still returning a stream, and maybe even taking a stream as the input payload, but I guess you'd rather it not do the logic related to the Base64UrlEncodingStream and ConcatenatedStream classes, and do that with a byte array instead.

In the end, this Serialize method is used in both encoding and decoding (public) methods, so that still would require some effort to make all methods play nicely with either the byte array version and the stream version of Serialize.

@dvsekhvalnov
Owner

dvsekhvalnov commented Nov 20, 2025

If you mean Compact.Serialize(..), there is no problem with having an additional overload that takes stream input.

My motivation is simple: it is security. Encoding a detached payload is quite a narrow use case. There is nothing wrong with having streaming support for big payloads. But implementation-wise, streaming should be narrowed to that use case as well, and should not affect the rest of the flows, which have been pretty stable over the last years.

Ok, let me maybe just dump the original EncodeBytes(..) here with inline comments; I hope it makes things clearer:

       public static string EncodeBytes(.....)
// ==== that part stays, can be moved to internal '_encode(..)' methods as is ====
        {
            if (payload == null)
                throw new ArgumentNullException(nameof(payload));

            var jwtSettings = GetSettings(settings);
            var jwtOptions = options ?? JwtOptions.Default;

            var jwtHeader = new Dictionary<string, object> { { "alg", jwtSettings.JwsHeaderValue(algorithm) } };

            if (extraHeaders == null) //allow overload, but keep backward compatible defaults
            {
                extraHeaders = new Dictionary<string, object> { { "typ", "JWT" } };
            }

            if (!jwtOptions.EncodePayload)
            {
                jwtHeader["b64"] = false;
                jwtHeader["crit"] = Collections.Union(new[] { "b64" }, Dictionaries.Get<object>(extraHeaders, "crit"));
            }

            Dictionaries.Append(jwtHeader, extraHeaders);
            byte[] headerBytes = Encoding.UTF8.GetBytes(jwtSettings.JsonMapper.Serialize(jwtHeader));

            var jwsAlgorithm = jwtSettings.Jws(algorithm);

            if (jwsAlgorithm == null)
            {
                throw new JoseException(string.Format("Unsupported JWS algorithm requested: {0}", algorithm));
            }
// ==== end region ====

// ==== here we have real difference that can be abstracted to delegate or whatever pattern: 
//           it takes header and returns string
//           or as an option may be all the code that constructing 'headerBytes' can be extracted to its own method
            byte[] signature = jwsAlgorithm.Sign(securedInput(headerBytes, payload, jwtOptions.EncodePayload), key);

            byte[] payloadBytes = jwtOptions.DetachPayload ? new byte[0] : payload;

            return jwtOptions.EncodePayload
                ? Compact.Serialize(headerBytes, payloadBytes, signature)
                : Compact.Serialize(headerBytes, Encoding.UTF8.GetString(payloadBytes), signature);
        }

Also, be careful with base64 encoding; somebody asked for support for another encoding variant: #271

@erwinkramer
Author

> implementation wise streaming should be narrowed to given use case as well and not affecting rest of flows that were pretty stable last years.

I have no interest in placing the byte[] implementation separate from the streaming implementation, since I believe the foundational logic should be of just one type to prevent any more complexity. I might be wrong, and it might turn out not to be that complex, but then I'd like you or someone else to continue building on this.

> My motivation is simple, it is security. Encoding detached payload is quite narrow use-case.

If you consider the current approach potentially unsafe, then it would also be unsafe for the streaming methods if they were built completely separately and implemented in this library. So I consider the security motivation flawed. However, I do agree that it is risky at this stage since it's not battle-tested, and (if you ever change your mind and implement it as-is) I would consider making this a beta/preview release with a major version increment (let's say v6.0.0).

You can go ahead and take some parts and implement them side by side. I understand if you don't; in that case you can close this PR or leave it open for someone else to work on.

@dvsekhvalnov
Owner

@erwinkramer yeah, I can understand the frustration, sorry about that. But nevertheless, thanks for your contribution and great ideas, and let's keep it open. I'm interested to play around and see what we can make out of it.
