🐛 fix issue with gpt5codex and simple strategy #1153

Closed
BLannoo wants to merge 1 commit into develop from codeagent/step-02-no-reasoning

@BLannoo
Contributor

@BLannoo BLannoo commented Nov 20, 2025

Motivation and Context

When running codeagent, which uses OpenAIModels.Chat.GPT5Codex and strategy = singleRunStrategy(), we hit a sudden regression caused by the reasoning messages that Codex generates.

I tried many approaches, but this seems to be the only one that could fix the problem without causing issues with the already published article.

Options that were considered but would not be ideal are:

  1. changing the model (but our article talks about benchmarks for these models, and rerunning the benchmarks would significantly delay publication)
  2. configuring the model to not use reasoning, which is only available on gpt5_1 (leading back to problem 1) and would also require significant changes to the code_agent from the just-published step 01 of our series
  3. adapting singleRunModeStrategy to skip the reasoning message, but the tool call is already dropped at a lower abstraction level
  4. adapting singleRunModeStrategy to make a new call when reasoning is returned, but this seems to lead to infinite loops

Breaking Changes


Type of the changes

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Tests improvement
  • Refactoring

Checklist

  • The pull request has a description of the proposed change
  • I read the Contributing Guidelines before opening the pull request
  • The pull request uses develop as the base branch
  • Tests for the changes have been added
  • All new and existing tests passed
Additional steps for pull requests adding a new feature
  • An issue describing the proposed change exists
  • The pull request includes a link to the issue
  • The change was discussed and approved in the issue
  • Docs have been added / updated

@github-actions

Qodana for JVM

1192 new problems were found

| Inspection name | Severity | Problems |
| --- | --- | --- |
| Check Kotlin and Java source code coverage | 🔶 Warning | 1181 |
| Missing KDoc for public API declaration | 🔶 Warning | 11 |
@@ Code coverage @@
+ 71% total lines covered
16420 lines analyzed, 11800 lines covered
# Calculated according to the filters of your coverage tool


@BLannoo BLannoo marked this pull request as ready for review November 20, 2025 08:48
message = "Use executeFirstNonReasoningResponse to skip initial Reasoning messages when present",
replaceWith = ReplaceWith("executeFirstNonReasoningResponse(prompt, tools)")
)
protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>): Message.Response =
Contributor

I'm not sure about deprecating this one: could it be that some users would like to receive the first response even if it is a reasoning message?

I suggest we change the signature of protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>) to protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>, preferReasoning: Boolean = false) and then edit the body:

protected suspend fun executeSingle(
    prompt: Prompt,
    tools: List<ToolDescriptor>,
    preferReasoning: Boolean = false
): Message.Response {
    return if (preferReasoning) {
        executeMultiple(prompt, tools).first()
    } else {
        val responses = executeMultiple(prompt, tools)
        responses.firstOrNull { it !is Message.Reasoning } ?: responses.first()
    }
}

Contributor

@kpavlov kpavlov Nov 20, 2025

I would move the filtering into a lambda parameter.
Rename "preferReasoning" -> "excludeReasoning" for clarity.

Something like:

protected suspend fun executeSingle(
    prompt: Prompt,
    tools: List<ToolDescriptor>,
    filter: (Message.Response) -> Boolean = { true }
): Message.Response =
    executeMultiple(prompt, tools).single(filter)

protected suspend fun executeSingle(
    prompt: Prompt,
    tools: List<ToolDescriptor>,
    excludeReasoning: Boolean = false,
): Message.Response = executeSingle(
    prompt = prompt,
    tools = tools,
    filter = { !(it is Message.Reasoning && excludeReasoning) }
)
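
To make the difference between the two suggestions above concrete, here is a minimal, self-contained sketch of the selection logic only. The `Response`, `Reasoning`, `Assistant`, and `ToolCall` types and both function names are hypothetical stand-ins, not the project's real classes. One thing worth noting: Kotlin's `single(predicate)` throws when more than one element matches, so with the default `{ true }` filter a `[Reasoning, Tool.Call]` response would fail; the sketch uses `firstOrNull(filter)` with a fallback instead. It also gives the Boolean-style variant its own name, which sidesteps any overload-resolution question between two functions that both have a defaulted third parameter.

```kotlin
// Hypothetical stand-ins for the real Message.Response hierarchy.
sealed interface Response { val content: String }
data class Reasoning(override val content: String) : Response
data class Assistant(override val content: String) : Response
data class ToolCall(override val content: String) : Response

// First response accepted by the filter; falls back to the very first
// response when nothing passes, preserving the original behavior.
fun firstMatchingOrFirst(
    responses: List<Response>,
    filter: (Response) -> Boolean = { true },
): Response = responses.firstOrNull(filter) ?: responses.first()

// Convenience wrapper with a distinct name (assumption: a separate name is
// acceptable in place of a second overload).
fun firstNonReasoningOrFirst(responses: List<Response>): Response =
    firstMatchingOrFirst(responses) { it !is Reasoning }

fun main() {
    val responses = listOf(Reasoning("plan"), ToolCall("read_file"))
    println(firstMatchingOrFirst(responses))      // Reasoning(content=plan)
    println(firstNonReasoningOrFirst(responses))  // ToolCall(content=read_file)
    // When every response is reasoning, the first one is returned as a fallback.
    println(firstNonReasoningOrFirst(listOf(Reasoning("only"))))  // Reasoning(content=only)
}
```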

@aozherelyeva aozherelyeva requested a review from kpavlov November 20, 2025 11:03
Contributor

@kpavlov kpavlov left a comment

How was it tested?

message = "Use executeFirstNonReasoningResponse to skip initial Reasoning messages when present",
replaceWith = ReplaceWith("executeFirstNonReasoningResponse(prompt, tools)")
)
protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>): Message.Response =
Contributor

Besides, could you please add a couple of unit tests for the updated/new method in the same PR? Thanks!

* [Message.Reasoning]. If all responses are reasoning messages, it will return the
* very first response as a fallback to preserve original behavior.
*/
protected suspend fun executeFirstNonReasoningResponse(
Collaborator

  1. execute... functions should belong to the prompt executor. In this particular case, it can be converted to an extension function on PromptExecutor, similar to executeStructured. In the LLM session it should follow the same pattern as the request... methods, i.e., validate the session and delegate to the prompt executor.
  2. I'm quite concerned about modifying existing functions to always skip reasoning messages by default; this kinda negates the purpose of reasoning message support. We already have onAssistantMessage, onToolCall, etc. to filter the correct message types in the strategy. Maybe it's better to update the existing built-in strategies with additional parameters, e.g. skipReasoningMessages or something like that (although I'm not sure this is an optimal solution either).

@devcrocod
Contributor

@BLannoo, @EugeneTheDev
I looked into this more closely and noticed a few things.
A reasoning message is not a terminal message. When reasoning is enabled and we try to process those messages by adding them to the history, we end up in a loop. After sending the initial request, we get a response like [Reasoning, Tool.Call]. If we then take the reasoning message, append it to the history, and send the same initial request again, the model returns [Reasoning, Tool.Call] once more. The new reasoning message will be treated as new, even though the encrypted content is almost identical.

To avoid skipping reasoning while also preventing this loop, we need to always expect a list of messages in the response and take not the first but the last element.

If we simply skip reasoning, everything works as before, but we lose the benefits of reasoning on repeated calls.
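
The loop described in this comment can be reproduced with a small simulation. This is only a sketch under stated assumptions: `fakeModel` is a hypothetical stub that keeps answering `[Reasoning, ToolCall]` until a tool result appears in the history, and the message types and `requestsUntilAnswer` are invented for illustration. Picking the first element never reaches the tool call, while picking the last element does.

```kotlin
// Hypothetical message types for the simulation.
sealed interface Msg
data class Reasoning(val text: String) : Msg
data class ToolCall(val tool: String) : Msg
data class ToolResult(val output: String) : Msg
data class Assistant(val text: String) : Msg

// Hypothetical model stub: it keeps planning the same tool call until a
// tool result is present in the history.
fun fakeModel(history: List<Msg>): List<Msg> =
    if (history.any { it is ToolResult }) listOf(Reasoning("done"), Assistant("finished"))
    else listOf(Reasoning("need the file"), ToolCall("read_file"))

// Returns the number of requests made before an Assistant message was
// picked, or null if that never happened within maxRequests.
fun requestsUntilAnswer(maxRequests: Int = 5, pick: (List<Msg>) -> Msg): Int? {
    val history = mutableListOf<Msg>()
    for (request in 1..maxRequests) {
        val responses = fakeModel(history)
        val picked = pick(responses)
        // Keep everything up to and including the picked message.
        history += responses.take(responses.indexOf(picked) + 1)
        when (picked) {
            is Assistant -> return request
            is ToolCall -> history += ToolResult("contents of the file")
            is Reasoning, is ToolResult -> Unit // nothing to execute; ask again
        }
    }
    return null
}

fun main() {
    println(requestsUntilAnswer { it.first() }) // null: only reasoning is ever handled
    println(requestsUntilAnswer { it.last() })  // 2: tool call on request 1, answer on request 2
}
```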

@EugeneTheDev
Collaborator

So the best option probably would be to rework how we work with multiple messages and instead shift more towards a single message with content parts - with each message potentially consisting of text, tool calls, reasoning, etc. (as I mentioned previously already).

But since this is a more significant and breaking change, it wouldn't be wise to implement it right now. So the second-best option I see is to always expect a list of messages, as @devcrocod said. This means removing all methods from the API that return only a single message and updating our built-in strategies accordingly. This is also a breaking change, but it won't change the semantics as much as the first option (which we can implement later). IMHO there's no universal clean way to always return only a single message by using some heuristics and picking only one from the list, so removing such APIs is probably cleaner and more honest. WDYT?
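
As a rough illustration of the first option (a single message made of content parts), the shape could look something like the sketch below. Every name here (`ContentPart`, `TextPart`, `ReasoningPart`, `ToolCallPart`, `ResponseMessage`) is hypothetical and not taken from the codebase; the point is only that callers would query one message for the parts they need instead of choosing one message from a list.

```kotlin
// Hypothetical content-part model; none of these names come from the project.
sealed interface ContentPart
data class TextPart(val text: String) : ContentPart
data class ReasoningPart(val summary: String) : ContentPart
data class ToolCallPart(val tool: String, val arguments: String) : ContentPart

// One response message carrying all parts in the order they were produced.
data class ResponseMessage(val parts: List<ContentPart>) {
    // Tool calls requested in this message, if any.
    val toolCalls: List<ToolCallPart>
        get() = parts.filterIsInstance<ToolCallPart>()

    // Concatenated visible text; reasoning is kept but not included here.
    val text: String
        get() = parts.filterIsInstance<TextPart>().joinToString("") { it.text }
}

fun main() {
    val message = ResponseMessage(
        listOf(
            ReasoningPart("need the file"),
            ToolCallPart("read_file", """{"path":"README.md"}"""),
        )
    )
    println(message.toolCalls.map { it.tool }) // [read_file]
    println(message.text.isEmpty())            // true
}
```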

@devcrocod
Contributor

I’ll create a separate PR with the fix for this and another bug.
For now, I’m thinking of adding a flag in the request, as you suggested, that allows us to skip reasoning messages. We’ll still store them in the history, but only return either the assistant message or the tool call. This seems like the best approach at the moment. I’m also adding onReasoningMessage in case we want to apply conditional handling to those messages.
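
A minimal sketch of the approach described in this comment, assuming a hypothetical `Session` class and a `skipReasoning` flag (the real names and defaults in the follow-up PR may differ): every response, reasoning included, is appended to the history, but the caller receives the first non-reasoning message. The fallback to the first response when everything is reasoning is an assumption of this sketch.

```kotlin
// Hypothetical message types and session; not the project's real classes.
sealed interface Msg
data class Reasoning(val text: String) : Msg
data class Assistant(val text: String) : Msg
data class ToolCall(val tool: String) : Msg

class Session(private val model: (List<Msg>) -> List<Msg>) {
    private val _history = mutableListOf<Msg>()
    val history: List<Msg> get() = _history

    // Sends one request. Reasoning is always recorded in the history;
    // skipReasoning only affects which message is returned.
    fun request(skipReasoning: Boolean): Msg {
        val responses = model(_history)
        _history += responses
        val candidates = if (skipReasoning) responses.filter { it !is Reasoning } else responses
        // Assumed fallback: if only reasoning came back, return the first response.
        return candidates.firstOrNull() ?: responses.first()
    }
}

fun main() {
    val session = Session { listOf(Reasoning("plan"), ToolCall("read_file")) }
    println(session.request(skipReasoning = true)) // ToolCall(tool=read_file)
    println(session.history.size)                  // 2: reasoning was kept in the history
}
```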

devcrocod added a commit that referenced this pull request Dec 1, 2025
related to #1153 


## Motivation and Context
- fix reasoning message in nodeLLMRequest
- fix conditions on multiple requests
- add onReasoningMessage and onMultipleReasoningMessage

## Breaking Changes
None

---

#### Type of the changes
- [ ] New feature (non-breaking change which adds functionality)
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Tests improvement
- [x] Refactoring

#### Checklist
- [x] The pull request has a description of the proposed change
- [x] I read the [Contributing
Guidelines](https://github.com/JetBrains/koog/blob/main/CONTRIBUTING.md)
before opening the pull request
- [x] The pull request uses **`develop`** as the base branch
- [ ] Tests for the changes have been added
- [x] All new and existing tests passed

#### Additional Context
Adding tests requires modifying and refactoring the mock executor.
serge-p7v pushed a commit that referenced this pull request Dec 2, 2025
@BLannoo
Contributor Author

BLannoo commented Dec 4, 2025

This was fixed by an alternative PR.

@BLannoo BLannoo closed this Dec 4, 2025
kpavlov pushed a commit that referenced this pull request Dec 5, 2025
sdubov pushed a commit that referenced this pull request Dec 5, 2025
vova-jb pushed a commit that referenced this pull request Jan 27, 2026
5 participants