🐛 fix issue with gpt5codex and simple strategy #1153
Conversation
Qodana for JVM: 1192 new problems were found
@@ Code coverage @@
+ 71% total lines covered
16420 lines analyzed, 11800 lines covered
# Calculated according to the filters of your coverage tool

☁️ View the detailed Qodana report
```kotlin
    message = "Use executeFirstNonReasoningResponse to skip initial Reasoning messages when present",
    replaceWith = ReplaceWith("executeFirstNonReasoningResponse(prompt, tools)")
)
protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>): Message.Response =
```
I'm not sure about deprecating this one – may it be that some users would like to receive the first response even though it's reasoning?

I suggest we change the signature of `protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>)` to `protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>, preferReasoning: Boolean = false)` and then edit the body:
```kotlin
protected suspend fun executeSingle(
    prompt: Prompt,
    tools: List<ToolDescriptor>,
    preferReasoning: Boolean = false
): Message.Response {
    return if (preferReasoning) {
        executeMultiple(prompt, tools).first()
    } else {
        val responses = executeMultiple(prompt, tools)
        responses.firstOrNull { it !is Message.Reasoning } ?: responses.first()
    }
}
```
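As a standalone illustration, the fallback logic proposed here can be modeled with stand-in types; `Msg`, `Reasoning`, and `Assistant` below are hypothetical names for this sketch, not Koog's real `Message` hierarchy:

```kotlin
// Stand-in response hierarchy for illustration only.
sealed interface Msg {
    data class Reasoning(val text: String) : Msg
    data class Assistant(val text: String) : Msg
}

// Mirrors the suggested body: skip reasoning messages unless preferReasoning
// is set, falling back to the very first response when every response is reasoning.
fun pickSingle(responses: List<Msg>, preferReasoning: Boolean = false): Msg =
    if (preferReasoning) responses.first()
    else responses.firstOrNull { it !is Msg.Reasoning } ?: responses.first()
```

With `listOf(Reasoning("t"), Assistant("a"))` the default picks the assistant message, `preferReasoning = true` keeps the old first-element behavior, and an all-reasoning list falls back to its first element.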
I would move the filtering to a lambda parameter, and rename `preferReasoning` -> `excludeReasoning` for clarity.
something like

```kotlin
protected suspend fun executeSingle(
    prompt: Prompt,
    tools: List<ToolDescriptor>,
    filter: (Message.Response) -> Boolean = { true }
): Message.Response =
    executeMultiple(prompt, tools).single(filter)

protected suspend fun executeSingle(
    prompt: Prompt,
    tools: List<ToolDescriptor>,
    excludeReasoning: Boolean = false,
): Message.Response = executeSingle(
    prompt = prompt,
    tools = tools,
    filter = { !(it is Message.Reasoning && excludeReasoning) }
)
```
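A self-contained sketch of this two-overload shape, using stand-in types (`Resp`, `pick` are hypothetical names). One deliberate deviation: it uses `first(filter)` rather than `single(filter)`, since `single` throws when more than one response matches the predicate, whereas the original semantics only need the first match:

```kotlin
// Stand-in response hierarchy for illustration only.
sealed interface Resp {
    data class Reasoning(val text: String) : Resp
    data class Text(val text: String) : Resp
}

// Generic lambda-based primitive: return the first response the filter accepts.
fun pick(responses: List<Resp>, filter: (Resp) -> Boolean = { true }): Resp =
    responses.first(filter)

// Boolean convenience overload delegating to the lambda one, as suggested.
fun pick(responses: List<Resp>, excludeReasoning: Boolean): Resp =
    pick(responses) { !(it is Resp.Reasoning && excludeReasoning) }
```

The delegation keeps one source of truth: the boolean flag is just a named, discoverable special case of the general filter.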
```kotlin
    message = "Use executeFirstNonReasoningResponse to skip initial Reasoning messages when present",
    replaceWith = ReplaceWith("executeFirstNonReasoningResponse(prompt, tools)")
)
protected suspend fun executeSingle(prompt: Prompt, tools: List<ToolDescriptor>): Message.Response =
```
Besides, could you please add a couple of unit tests for the update/new method in the same PR? Thanks!
```kotlin
 * [Message.Reasoning]. If all responses are reasoning messages, it will return the
 * very first response as a fallback to preserve original behavior.
 */
protected suspend fun executeFirstNonReasoningResponse(
```
- `execute...` functions should belong to the prompt executor. In this particular case, it can be converted to an extension function on `PromptExecutor`, similar to `executeStructured`. In the LLM session it should follow the pattern the `request...` methods follow, i.e., validate the session and delegate to the prompt executor.
- I'm quite concerned with modifying existing functions to always skip reasoning messages by default; this kinda negates the purpose of reasoning messages support. We already have `onAssistantMessage`, `onToolCall` etc. to filter correct message types in the strategy. Maybe it's better to update existing built-in strategies with additional parameters, e.g. `skipReasoningMessages` or something like that (although I'm not sure this is an optimal solution either).
@BLannoo, @EugeneTheDev To avoid skipping reasoning while also preventing this loop, we need to always expect a list of messages in the response and take not the first but the last element. If we simply skip reasoning, everything works as before, but we lose the benefits of reasoning on repeated calls.
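A minimal sketch of the take-the-last-element idea, with hypothetical stand-in types, assuming providers emit reasoning items before the final answer in the response list:

```kotlin
// Stand-in response hierarchy for illustration only.
sealed interface Out {
    data class Reasoning(val text: String) : Out
    data class Answer(val text: String) : Out
}

// Keep the whole list (so reasoning stays in history for repeated calls),
// but let the strategy act on the final element only.
fun finalResponse(responses: List<Out>): Out {
    require(responses.isNotEmpty()) { "model returned no responses" }
    return responses.last()
}
```

This keeps reasoning in the prompt history while the strategy's single-message hook still receives exactly one message per turn.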
So the best option would probably be to rework how we work with multiple messages and instead shift towards a single message with content parts – with each message potentially consisting of text, tool calls, reasoning, etc. (as I mentioned previously already). But since this is a more significant and breaking change, it wouldn't be wise to implement it right now. So the second best option I see is to always expect a list of messages, as @devcrocod said. This means removing all methods from the API that return only a single message and updating our built-in strategies accordingly. This is also a breaking change, but it won't change the semantics as much as the first option (which we can implement later). IMHO there's no universal clean way to always return only a single message by using some heuristics and picking only one from the list, so removing such APIs is probably cleaner and more honest. WDYT?
I'll create a separate PR with the fix for this and another bug.
related to #1153

## Motivation and Context

- fix reasoning message in nodeLLMRequest
- fix conditions on multiple requests
- add onReasoningMessage and onMultipleReasoningMessage

## Breaking Changes

None

---

#### Type of the changes

- [ ] New feature (non-breaking change which adds functionality)
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Tests improvement
- [x] Refactoring

#### Checklist

- [x] The pull request has a description of the proposed change
- [x] I read the [Contributing Guidelines](https://github.com/JetBrains/koog/blob/main/CONTRIBUTING.md) before opening the pull request
- [x] The pull request uses **`develop`** as the base branch
- [ ] Tests for the changes have been added
- [x] All new and existing tests passed

#### Additional Context

To add tests we need to modify and refactor the mock executor
Was fixed with an alternative PR.
Motivation and Context
When running a code agent which uses `OpenAIModels.Chat.GPT5Codex` and `strategy = singleRunStrategy()`, we have a sudden regression due to reasoning generated by Codex. I tried many approaches, but this seems to be the only one that could fix the problem without causing issues with the already published article.
Options that were considered but would not be ideal are:
Breaking Changes
Type of the changes
Checklist
`develop` as the base branch

Additional steps for pull requests adding a new feature