Skip to content

Automatically convert JSON numbers to integers/floats for ClickHouse#4201

Open
benwebber wants to merge 5 commits intoredpanda-data:mainfrom
benwebber:fix/4192
Open

Automatically convert JSON numbers to integers/floats for ClickHouse#4201
benwebber wants to merge 5 commits intoredpanda-data:mainfrom
benwebber:fix/4192

Conversation

@benwebber
Copy link
Copy Markdown
Contributor

The ClickHouse driver does not specifically handle JSON values. JSON types implement fmt.Stringer, so it uses that to format JSON values:

https://github.com/ClickHouse/clickhouse-go/blob/v2.43.0/bind.go#L316-L322

This is a significant footgun because it can cause ClickHouse to execute a query differently, often with poor performance (e.g., doing a table scan instead of efficiently using the primary key). Therefore users almost always need to convert Bloblang numbers to the equivalent ClickHouse types using the number manipulation methods.

This changeset adds a ClickHouse argsConverter function to automatically convert json.Number values to Int64 or Float64. If args_mapping does not contain any numeric values, it passes the slice through untouched.

The regression test added in 36d41e2 fails without the argConverter change. You can check out that commit to confirm it fails before succeeding after 60afaae.

I used Sonnet 4.6 to write the test code.

Closes: #4192

@benwebber
Copy link
Copy Markdown
Contributor Author

I will rework this on top of #4178.

@benwebber
Copy link
Copy Markdown
Contributor Author

The changes in #4178 convert values within containers to ClickHouse types on output (sql_insert). These changes cast the types on input (sql_raw/sql_select).

The new sql_insert argsConverter closure currently does not call bloblValuesToClickHouseValues to convert json.Number values to their appropriate column data types. The driver accepts any integer/float value. If that value is too large for the target column, the insert will fail loudly, prompting the user to cast the input value to the appropriate type. On the other hand, the sql_select issue is a silent footgun.

The difference is only one line (and some tests). Simply call bloblValuesToClickHouseValues at the top of the loop in newClickHouseInsertArgsConverter. I will defer to the maintainers here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql_select: Validate ClickHouse data types

1 participant