:bug: fix issue with gpt5codex and simple strategy by BLannoo · Pull Request #1153 · JetBrains/koog

BLannoo · 2025-11-20T07:50:56Z

Motivation and Context

When running codeagent, which uses OpenAIModels.Chat.GPT5Codex and strategy = singleRunStrategy(), we hit a sudden regression caused by the reasoning messages generated by Codex.

I tried many approaches, but this seems to be the only one that could fix the problem without causing issues with the already published article.

Options that were considered but would not be ideal are:

1. Changing the model (but our article talks about benchmarks for these models; rerunning the benchmarks would significantly delay publication).
2. Configuring the model to not use reasoning, which is only available on gpt5_1 (leading back to problem 1) and would also require significant changes to the code_agent from the just-published step 01 of our series.
3. Adapting singleRunModeStrategy to skip the reasoning message, but the tool call is already dropped at a lower abstraction level.
4. Adapting singleRunModeStrategy to make a new call when reasoning is returned, but this seems to lead to infinite loops.

Breaking Changes

Type of the changes

New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Tests improvement
Refactoring

Checklist

The pull request has a description of the proposed change
I read the Contributing Guidelines before opening the pull request
The pull request uses develop as the base branch
Tests for the changes have been added
All new and existing tests passed

Additional steps for pull requests adding a new feature

An issue describing the proposed change exists
The pull request includes a link to the issue
The change was discussed and approved in the issue
Docs have been added / updated

github-actions · 2025-11-20T08:15:30Z

Qodana for JVM

1192 new problems were found

Inspection name	Severity	Problems
`Check Kotlin and Java source code coverage`	🔶 Warning	1181
`Missing KDoc for public API declaration`	🔶 Warning	11

@@ Code coverage @@
+ 71% total lines covered
16420 lines analyzed, 11800 lines covered
# Calculated according to the filters of your coverage tool


aozherelyeva · 2025-11-20T11:02:09Z

+        message = "Use executeFirstNonReasoningResponse to skip initial Reasoning messages when present",
+        replaceWith = ReplaceWith("executeFirstNonReasoningResponse(prompt, tools)")
+    )
    protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>): Message.Response =


I'm not sure about deprecating this one. Could it be that some users would like to receive the first response even though it's reasoning?

I suggest we change the signature of protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>) to protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>, preferReasoning: Boolean = false) and then edit the body:

    protected suspend fun executeSingle(
        prompt: Prompt,
        tools: List<ToolDescriptor>,
        preferReasoning: Boolean = false
    ): Message.Response {
        return if (preferReasoning) {
            executeMultiple(prompt, tools).first()
        } else {
            val responses = executeMultiple(prompt, tools)
            responses.firstOrNull { it !is Message.Reasoning } ?: responses.first()
        }
    }

I would move the filtering into a lambda parameter.
Rename "preferReasoning" -> "excludeReasoning" for clarity.

Something like:

    protected suspend fun executeSingle(
        prompt: Prompt,
        tools: List<ToolDescriptor>,
        filter: (Message.Response) -> Boolean = { true }
    ): Message.Response = executeMultiple(prompt, tools).single(filter)

    protected suspend fun executeSingle(
        prompt: Prompt,
        tools: List<ToolDescriptor>,
        excludeReasoning: Boolean = false,
    ): Message.Response = executeSingle(
        prompt = prompt,
        tools = tools,
        filter = { !(it is Message.Reasoning && excludeReasoning) }
    )
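The two suggestions above pick a response in different ways when the executor returns several messages. The following is a minimal sketch of that difference, not the koog implementation: `Response`, `Reasoning`, `Assistant`, `ToolCall`, `firstNonReasoning`, and `singleMatching` are hypothetical stand-ins, and the functions are plain rather than `suspend` to keep the example dependency-free. It relies only on the documented behavior of Kotlin's `single(predicate)`, which throws when zero or more than one element matches.

```kotlin
// Stand-in types for illustration only; the real koog Message hierarchy is richer.
sealed interface Response
data class Reasoning(val text: String) : Response
data class Assistant(val text: String) : Response
data class ToolCall(val tool: String) : Response

// First non-reasoning response, falling back to the very first response
// when every response is a reasoning message.
fun firstNonReasoning(responses: List<Response>): Response =
    responses.firstOrNull { it !is Reasoning } ?: responses.first()

// Variant built on single(predicate): throws IllegalArgumentException when more
// than one element matches and NoSuchElementException when none does.
fun singleMatching(responses: List<Response>, filter: (Response) -> Boolean): Response =
    responses.single(filter)

fun main() {
    val responses = listOf(Reasoning("plan"), ToolCall("read_file"), Assistant("done"))

    // Tolerates several non-reasoning messages: picks the first one.
    println(firstNonReasoning(responses))

    // Falls back to the reasoning message when nothing else is available.
    println(firstNonReasoning(listOf(Reasoning("only reasoning"))))

    // Two messages match the filter, so single(...) fails here.
    val outcome = runCatching { singleMatching(responses) { it !is Reasoning } }
    println(outcome.isFailure)
}
```

Under these assumptions, a `single`-based variant is only safe when the caller can guarantee exactly one matching message per response list.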

kpavlov

How was it tested?

aozherelyeva · 2025-11-20T11:11:19Z

+        message = "Use executeFirstNonReasoningResponse to skip initial Reasoning messages when present",
+        replaceWith = ReplaceWith("executeFirstNonReasoningResponse(prompt, tools)")
+    )
    protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>): Message.Response =


Besides, could you please add a couple of unit tests for the update/new method in the same PR? Thanks!

EugeneTheDev · 2025-11-20T14:06:04Z

+     * [Message.Reasoning]. If all responses are reasoning messages, it will return the
+     * very first response as a fallback to preserve original behavior.
+     */
+    protected suspend fun executeFirstNonReasoningResponse(


execute... functions should belong to the prompt executor. In this particular case, it can be converted to an extension function on PromptExecutor, similar to executeStructured. In the LLM session, it should follow the same pattern as the request... methods, i.e., validate the session and delegate to the prompt executor.

I'm quite concerned about modifying existing functions to always skip reasoning messages by default; this kinda negates the purpose of reasoning message support. We already have onAssistantMessage, onToolCall, etc. to filter the correct message types in the strategy. Maybe it's better to update the existing built-in strategies with additional parameters, e.g. skipReasoningMessages or something like that (although I'm not sure this is an optimal solution either).
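The layering described in this comment (an extension function on the executor, with the session only validating and delegating) could look roughly like the sketch below. Everything here is an assumption made for illustration: this `PromptExecutor` is a one-method stand-in rather than the koog interface, `Session`, `executeFirstNonReasoning`, and `requestFirstNonReasoning` are hypothetical names, and the real functions would be `suspend`.

```kotlin
// Hypothetical stand-ins; the names mirror the discussion but are not the koog API.
sealed interface Response
data class Reasoning(val text: String) : Response
data class ToolCall(val tool: String) : Response

fun interface PromptExecutor {
    fun execute(prompt: String): List<Response>
}

// Extension on the executor: the selection heuristic lives next to execution.
fun PromptExecutor.executeFirstNonReasoning(prompt: String): Response {
    val responses = execute(prompt)
    return responses.firstOrNull { it !is Reasoning } ?: responses.first()
}

// Session layer: validate the session state, then delegate to the executor.
class Session(private val executor: PromptExecutor) {
    private var active = true

    fun close() {
        active = false
    }

    fun requestFirstNonReasoning(prompt: String): Response {
        check(active) { "Session is closed" }
        return executor.executeFirstNonReasoning(prompt)
    }
}

fun main() {
    val executor = PromptExecutor { listOf(Reasoning("thinking"), ToolCall("list_dir")) }
    val session = Session(executor)
    println(session.requestFirstNonReasoning("List the project files"))
}
```

Keeping the heuristic in an extension function leaves the session free of selection logic, which is the separation the comment asks for.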

devcrocod · 2025-11-21T15:56:20Z

@BLannoo, @EugeneTheDev
I looked into this more closely and noticed a few things.
A reasoning message is not a terminal message. When reasoning is enabled and we try to process those messages by adding them to the history, we end up in a loop. After sending the initial request, we get a response like [Reasoning, Tool.Call]. If we then take the reasoning message, append it to the history, and send the same initial request again, the model returns [Reasoning, Tool.Call] once more. The new reasoning message will be treated as new, even though the encrypted content is almost identical.

To avoid skipping reasoning while also preventing this loop, we need to always expect a list of messages in the response and take not the first but the last element.

If we simply skip reasoning, everything works as before, but we lose the benefits of reasoning on repeated calls.
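The loop described above can be reproduced with a toy simulation. This is a sketch under stated assumptions, not the koog strategy code: `fakeModel`, `runLoop`, and the message types are hypothetical, the fake model keeps returning `[Reasoning, ToolCall]` until a tool result appears in the history, and the whole response is stored in the history for simplicity.

```kotlin
// Toy simulation; fakeModel and the message types are hypothetical stand-ins.
sealed interface Msg
data class Reasoning(val text: String) : Msg
data class ToolCall(val tool: String) : Msg
data class ToolResult(val output: String) : Msg
data class Assistant(val text: String) : Msg

// Returns [Reasoning, ToolCall] until a tool result appears in the history.
fun fakeModel(history: List<Msg>): List<Msg> =
    if (history.any { it is ToolResult }) {
        listOf(Assistant("done"))
    } else {
        listOf(Reasoning("I should call a tool"), ToolCall("read_file"))
    }

// Runs the request loop and acts on the message chosen by `pick`.
// Returns the number of requests made, or null if maxRequests was exhausted.
fun runLoop(pick: (List<Msg>) -> Msg, maxRequests: Int = 5): Int? {
    val history = mutableListOf<Msg>()
    for (i in 1..maxRequests) {
        val response = fakeModel(history)
        history.addAll(response) // keep everything, including reasoning
        when (pick(response)) {
            is Assistant -> return i
            is ToolCall -> history.add(ToolResult("file contents"))
            else -> {} // reasoning is not terminal: nothing to act on, so ask again
        }
    }
    return null
}

fun main() {
    println(runLoop({ it.first() })) // null: acting on the reasoning message never progresses
    println(runLoop({ it.last() }))  // 2: the tool call runs, then the model answers
}
```

In this simulation, acting on the last element keeps the reasoning message in the history while still reaching a terminal answer.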

EugeneTheDev · 2025-11-22T16:30:34Z

So the best option would probably be to rework how we work with multiple messages and instead shift more towards a single message with content parts - with each message potentially consisting of text, tool calls, reasoning, etc. (as I already mentioned).

But since this is a more significant and breaking change, it wouldn't be wise to implement it right now. So the second best option I see is to always expect a list of messages, as @devcrocod said. This means removing all methods from the API that return only a single message and updating our built-in strategies accordingly. This is also a breaking change, but it won't change the semantics as much as the first option (which we can implement later). IMHO there's no universal clean way to always return only a single message by using some heuristics and picking only one from the list, so removing such APIs is probably cleaner and more honest. WDYT?
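To make the "single message with content parts" idea concrete, here is one possible shape for such a data model. It is purely illustrative: `ContentPart`, `TextPart`, `ReasoningPart`, `ToolCallPart`, and `ResponseMessage` are hypothetical names and do not describe any existing or planned koog types.

```kotlin
// Hypothetical data model; none of these types exist in koog under these names.
sealed interface ContentPart
data class TextPart(val text: String) : ContentPart
data class ReasoningPart(val summary: String) : ContentPart
data class ToolCallPart(val tool: String, val args: Map<String, String>) : ContentPart

// One response message carrying every part the model produced, in order.
data class ResponseMessage(val parts: List<ContentPart>) {
    val toolCalls: List<ToolCallPart>
        get() = parts.filterIsInstance<ToolCallPart>()

    val text: String
        get() = parts.filterIsInstance<TextPart>().joinToString("") { it.text }

    val hasReasoning: Boolean
        get() = parts.any { it is ReasoningPart }
}

fun main() {
    val message = ResponseMessage(
        listOf(
            ReasoningPart("inspect the file first"),
            ToolCallPart("read_file", mapOf("path" to "README.md")),
        )
    )
    println(message.toolCalls.map { it.tool })
    println(message.hasReasoning)
}
```

With a shape like this, callers never have to choose one message out of a list; they read the parts they care about from a single response.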

devcrocod · 2025-11-22T16:48:36Z

I’ll create a separate PR with the fix for this and another bug.
For now, I’m thinking of adding a flag in the request, as you suggested, that allows us to skip reasoning messages. We’ll still store them in the history, but only return either the assistant message or the tool call. This seems like the best approach at the moment. I’m also adding onReasoningMessage in case we want to apply conditional handling to those messages.
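A rough sketch of the flag-based approach from this comment follows. It is an assumption-laden illustration, not the code from the follow-up PR: `Conversation`, `request`, and the `skipReasoning` flag name are hypothetical, and the model is a plain function instead of a real executor.

```kotlin
// Hypothetical sketch; the flag name and types are assumptions, not the merged API.
sealed interface Msg
data class Reasoning(val text: String) : Msg
data class Assistant(val text: String) : Msg
data class ToolCall(val tool: String) : Msg

class Conversation(private val model: (List<Msg>) -> List<Msg>) {
    private val stored = mutableListOf<Msg>()

    // Read-only view of everything the model has returned so far.
    val history: List<Msg>
        get() = stored

    // Stores every returned message; optionally hides reasoning from the caller.
    fun request(skipReasoning: Boolean = true): Msg {
        val responses = model(stored)
        stored.addAll(responses)
        val visible = if (skipReasoning) responses.filter { it !is Reasoning } else responses
        // Fall back to the first response if only reasoning came back.
        return visible.firstOrNull() ?: responses.first()
    }
}

fun main() {
    val conversation = Conversation { listOf(Reasoning("plan"), ToolCall("read_file")) }
    println(conversation.request())    // the tool call is returned to the caller
    println(conversation.history.size) // the reasoning message is still stored
}
```

The point of the design is that the flag changes only what the caller sees; the history sent back to the model on later requests still contains the reasoning.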

related to #1153

## Motivation and Context

- fix reasoning message in nodeLLMRequest
- fix conditions on multiple requests
- add onReasoningMessage and onMultipleReasoningMessage

## Breaking Changes

None

---

#### Type of the changes

- [ ] New feature (non-breaking change which adds functionality)
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Tests improvement
- [x] Refactoring

#### Checklist

- [x] The pull request has a description of the proposed change
- [x] I read the [Contributing Guidelines](https://github.com/JetBrains/koog/blob/main/CONTRIBUTING.md) before opening the pull request
- [x] The pull request uses **`develop`** as the base branch
- [ ] Tests for the changes have been added
- [x] All new and existing tests passed

#### Additional Context

To add tests need to modify and refactor mock executor

BLannoo · 2025-12-04T09:07:26Z

This was fixed by an alternative PR.

🐛 fix issue with gpt5codex and simple strategy

9baea83

BLannoo requested a review from denis-domanskii November 20, 2025 07:51

BLannoo marked this pull request as ready for review November 20, 2025 08:48

BLannoo requested review from EugeneTheDev and Faanbaria November 20, 2025 08:49

aozherelyeva suggested changes Nov 20, 2025

aozherelyeva requested a review from kpavlov November 20, 2025 11:03

kpavlov reviewed Nov 20, 2025

aozherelyeva reviewed Nov 20, 2025

EugeneTheDev requested changes Nov 20, 2025

aozherelyeva requested a review from devcrocod November 20, 2025 15:43

devcrocod mentioned this pull request Nov 22, 2025

Fix reasoning message handling in strategy #1166

Merged

11 tasks

BLannoo closed this Dec 4, 2025

BLannoo wants to merge 1 commit into develop from codeagent/step-02-no-reasoning

5 participants
