Commit c128c19

xiangfu0 and claude committed
Add first-class logical UUID type support

Adds UUID as a first-class Pinot data type backed by 16-byte BYTES storage with canonical lowercase RFC 4122 string as the external representation.

## Core type plumbing
- `FieldSpec.DataType.UUID(BYTES, 16, false, true)`: stored as BYTES, fixed 16 bytes, not numeric, sortable
- `PinotDataType.UUID`: conversions to/from STRING/BYTES, numeric ops throw as expected
- `DataSchema.ColumnDataType.UUID`: backed by BYTES internal, maps to Calcite `SqlTypeName.UUID`
- `Schema.validate()`: UUID and BIG_DECIMAL enforce SV-only constraint

## UuidUtils (pinot-spi)
- Canonical conversions: `toBytes()`, `toUUID()`, `toString()`
- `UuidKey` inner class: stores UUID as two `long` fields for O(1) hashCode/equals in hot groupby/join paths (avoids ByteArray allocation)
- `compare()` uses `Long.compareUnsigned` for correct byte-order UUID comparison vs. `ByteArray.compare()` signed bytes

## Query engine
- Predicate evaluators: Eq/NotEq/In/NotIn handle UUID via `BytesRaw*Evaluator` + `DataType.UUID`
- Group key generators: `NoDictionarySingleColumnGroupKeyGenerator`, `NoDictionaryMultiColumnGroupKeyGenerator` updated
- MSQE hot-path: `UuidToIdMap`, `OneUuidKeyGroupIdGenerator`, `UuidLookupTable`
- `RequestContextUtils.evaluateLiteralValue`: supports CAST/TO_UUID/UUID_TO_BYTES/BYTES_TO_UUID/UUID_TO_STRING/IS_UUID on predicate RHS literals; uses `BadQueryRequestException` for arity errors
- `CastTransformFunction`: CAST(col AS UUID) per-row support
- Scalar functions: `IS_UUID`, `TO_UUID`, `UUID_TO_BYTES`, `BYTES_TO_UUID`, `UUID_TO_STRING`

## Segment / realtime
- `MutableSegmentImpl`: UUID primary key normalization for upsert via `normalizePrimaryKeyValue()`
- `MutableNoDictColumnStatistics.isSorted()`: uses `UuidUtils.compare` (unsigned byte-order) instead of `ByteArray.compare` (signed) for UUID
- `BloomFilterCreator`: UUID stored as canonical string in bloom filter
- `RawValueBitmapInvertedIndexCreator`: UUID handled via stored BYTES

## Response encoding
- Arrow encoder: emits UUID string values in UUID columns
- JSON encoder: passes through UUID as string

## Input format
- Avro plugin: `logicalType: "uuid"` Avro schema field → UUID DataType; Pinot UUID field → Avro `{"type":"string","logicalType":"uuid"}`
- Rejects Avro ARRAY<uuid> with a clear error (UUID is SV-only)

## Tests
- `UuidTypeTest`, `UuidTypeRealtimeTest`: full integration tests for batch and realtime ingestion, predicate pushdown, group-by, CAST
- `UuidUpsertRealtimeTest`: upsert with UUID primary key
- Unit tests for UuidUtils, DataSchema, PinotDataType, RequestContextUtils, response encoders, Avro utils, predicate evaluators

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

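
The `UuidKey` and `Long.compareUnsigned` points in the commit message can be illustrated with a minimal sketch. The class `TwoLongUuidKey` and its methods are hypothetical and are not the `UuidKey` from this commit (its source is not shown on this page); the sketch only assumes that a UUID is held as two `long` halves, most significant half first.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration, not Pinot's UuidKey: a UUID held as two long fields.
final class TwoLongUuidKey implements Comparable<TwoLongUuidKey> {
  private final long msb;
  private final long lsb;

  TwoLongUuidKey(long msb, long lsb) {
    this.msb = msb;
    this.lsb = lsb;
  }

  static TwoLongUuidKey of(String canonical) {
    UUID uuid = UUID.fromString(canonical);
    return new TwoLongUuidKey(uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof TwoLongUuidKey)) {
      return false;
    }
    TwoLongUuidKey other = (TwoLongUuidKey) o;
    return msb == other.msb && lsb == other.lsb;
  }

  @Override
  public int hashCode() {
    // Constant-time hash over the two halves; no byte[] is allocated or scanned.
    return 31 * Long.hashCode(msb) + Long.hashCode(lsb);
  }

  @Override
  public int compareTo(TwoLongUuidKey other) {
    // Unsigned comparison of each half matches byte-by-byte unsigned ordering
    // of the 16-byte big-endian encoding.
    int cmp = Long.compareUnsigned(msb, other.msb);
    return cmp != 0 ? cmp : Long.compareUnsigned(lsb, other.lsb);
  }

  // Assigns dense group ids in first-seen order, as a group-by key map might.
  static int groupIdFor(Map<TwoLongUuidKey, Integer> ids, TwoLongUuidKey key) {
    Integer existing = ids.get(key);
    if (existing != null) {
      return existing;
    }
    int id = ids.size();
    ids.put(key, id);
    return id;
  }
}
```

With a signed comparison, a UUID starting with `ff` would sort before one starting with `00` because its most significant half is a negative `long`; the unsigned variant keeps the order consistent with the stored bytes.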
1 parent aa483d3 commit c128c19

116 files changed

Lines changed: 5104 additions & 294 deletions

README.md

Lines changed: 50 additions & 0 deletions
@@ -187,5 +187,55 @@ Check out [Pinot documentation](https://docs.pinot.apache.org/) for a complete d
 - [Pinot Architecture](https://docs.pinot.apache.org/basics/architecture)
 - [Pinot Query Language](https://docs.pinot.apache.org/users/user-guide-query/pinot-query-language)
 
+### UUID Logical Type
+
+Pinot supports a logical `UUID` type for single-value columns. In v1, Pinot stores `UUID` values using the existing
+16-byte `BYTES` representation, while schema definitions and query results use canonical lowercase RFC 4122 strings.
+
+Schema example:
+```json
+{
+  "schemaName": "events",
+  "dimensionFieldSpecs": [
+    {
+      "name": "eventId",
+      "dataType": "UUID"
+    }
+  ]
+}
+```
+
+Query example:
+```sql
+SELECT eventId
+FROM events
+WHERE eventId = CAST('550e8400-e29b-41d4-a716-446655440000' AS UUID)
+```
+
+UUID conversion helpers:
+```sql
+SELECT
+  TO_UUID('550E8400-E29B-41D4-A716-446655440000'),
+  UUID_TO_STRING(eventId),
+  UUID_TO_BYTES(eventId),
+  BYTES_TO_UUID(eventIdBytes),
+  IS_UUID(eventIdBytes)
+FROM events
+```
+
+Behavior notes:
+- Pinot accepts canonical RFC 4122 UUID strings in either upper or lower case on ingest and in functions/casts.
+- Pinot always renders `UUID` results as canonical lowercase strings.
+- `CAST(... AS UUID)` accepts canonical strings and 16-byte `BYTES` values.
+
+Migration notes:
+- Existing `BYTES` columns keep returning hex strings. Pinot only renders canonical UUID strings for columns declared as `UUID`.
+- Pinot does not support changing the data type of an existing column in place. To adopt `UUID` for existing
+  `STRING` or `BYTES` UUID-shaped data, create a new `UUID` column or a new table/schema and reingest/backfill the
+  data into it.
+- The `UUID` type itself does not require a segment or wire format bump in v1, but migration still requires rebuild or
+  reingest because schema type mutation is unsupported.
+- Multi-value UUID columns are not supported in v1.
+
 ## License
 Apache Pinot is under [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)
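
The README migration notes above say that existing `BYTES` columns keep returning hex strings, while `UUID` columns render canonical lowercase strings. For a backfill job, the mapping between the two renderings could be sketched as follows. The class `HexToUuidSketch` and the method `hexToCanonical` are hypothetical names and not part of Pinot; the sketch assumes the hex string encodes exactly 16 bytes, most significant byte first.

```java
import java.util.UUID;

// Hypothetical backfill helper, not part of Pinot.
final class HexToUuidSketch {
  // Converts a 32-character hex rendering of 16 bytes into the canonical
  // lowercase 8-4-4-4-12 UUID string.
  static String hexToCanonical(String hex) {
    if (hex == null || hex.length() != 32) {
      throw new IllegalArgumentException("Expected 32 hex characters");
    }
    // parseUnsignedLong accepts the full 64-bit range, including values whose top bit is set.
    long msb = Long.parseUnsignedLong(hex.substring(0, 16), 16);
    long lsb = Long.parseUnsignedLong(hex.substring(16), 16);
    // UUID.toString() always renders lowercase hex digits.
    return new UUID(msb, lsb).toString();
  }

  public static void main(String[] args) {
    System.out.println(hexToCanonical("550E8400E29B41D4A716446655440000"));
  }
}
```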

pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java

Lines changed: 7 additions & 0 deletions
@@ -23,6 +23,7 @@
 import java.util.Collection;
 import java.util.HashMap;
 import java.util.Map;
+import java.util.UUID;
 import javax.annotation.Nullable;
 import org.apache.calcite.rel.type.RelDataType;
 import org.apache.calcite.rel.type.RelDataTypeFactory;
@@ -52,6 +53,7 @@ private FunctionUtils() {
     put(Timestamp.class, PinotDataType.TIMESTAMP);
     put(String.class, PinotDataType.STRING);
     put(byte[].class, PinotDataType.BYTES);
+    put(UUID.class, PinotDataType.UUID);
     put(int[].class, PinotDataType.PRIMITIVE_INT_ARRAY);
     put(long[].class, PinotDataType.PRIMITIVE_LONG_ARRAY);
     put(float[].class, PinotDataType.PRIMITIVE_FLOAT_ARRAY);
@@ -75,6 +77,7 @@ private FunctionUtils() {
     put(Timestamp.class, PinotDataType.TIMESTAMP);
     put(String.class, PinotDataType.STRING);
     put(byte[].class, PinotDataType.BYTES);
+    put(UUID.class, PinotDataType.UUID);
     put(int[].class, PinotDataType.PRIMITIVE_INT_ARRAY);
     put(Integer[].class, PinotDataType.INTEGER_ARRAY);
     put(long[].class, PinotDataType.PRIMITIVE_LONG_ARRAY);
@@ -103,6 +106,7 @@ private FunctionUtils() {
     put(Timestamp.class, DataType.TIMESTAMP);
     put(String.class, DataType.STRING);
     put(byte[].class, DataType.BYTES);
+    put(UUID.class, DataType.UUID);
     put(int[].class, DataType.INT);
     put(long[].class, DataType.LONG);
     put(float[].class, DataType.FLOAT);
@@ -125,6 +129,7 @@ private FunctionUtils() {
     put(Timestamp.class, ColumnDataType.TIMESTAMP);
     put(String.class, ColumnDataType.STRING);
     put(byte[].class, ColumnDataType.BYTES);
+    put(UUID.class, ColumnDataType.UUID);
     put(int[].class, ColumnDataType.INT_ARRAY);
     put(long[].class, ColumnDataType.LONG_ARRAY);
     put(float[].class, ColumnDataType.FLOAT_ARRAY);
@@ -197,6 +202,8 @@ public static RelDataType getRelDataType(RelDataTypeFactory typeFactory, Class<?
       case STRING:
       case JSON:
         return typeFactory.createSqlType(SqlTypeName.VARCHAR);
+      case UUID:
+        return typeFactory.createSqlType(SqlTypeName.UUID);
       case BYTES:
         return typeFactory.createSqlType(SqlTypeName.VARBINARY);
       case INT_ARRAY:

pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java

Lines changed: 4 additions & 11 deletions
@@ -20,7 +20,6 @@
 
 import com.google.common.annotations.VisibleForTesting;
 import java.io.UnsupportedEncodingException;
-import java.nio.ByteBuffer;
 import java.nio.charset.StandardCharsets;
 import java.text.Normalizer;
 import java.util.Base64;
@@ -30,6 +29,7 @@
 import org.apache.pinot.common.utils.URIUtils;
 import org.apache.pinot.spi.annotations.ScalarFunction;
 import org.apache.pinot.spi.utils.JsonUtils;
+import org.apache.pinot.spi.utils.UuidUtils;
 
 
 /**
@@ -442,12 +442,8 @@ public static String fromAscii(byte[] input) {
   @ScalarFunction
   public static byte[] toUUIDBytes(String input) {
     try {
-      UUID uuid = UUID.fromString(input);
-      ByteBuffer bb = ByteBuffer.wrap(new byte[16]);
-      bb.putLong(uuid.getMostSignificantBits());
-      bb.putLong(uuid.getLeastSignificantBits());
-      return bb.array();
-    } catch (IllegalArgumentException e) {
+      return UuidUtils.toBytes(UUID.fromString(input));
+    } catch (Exception e) {
       return null;
     }
   }
@@ -459,10 +455,7 @@ public static byte[] toUUIDBytes(String input) {
    */
   @ScalarFunction
   public static String fromUUIDBytes(byte[] input) {
-    ByteBuffer bb = ByteBuffer.wrap(input);
-    long firstLong = bb.getLong();
-    long secondLong = bb.getLong();
-    return new UUID(firstLong, secondLong).toString();
+    return UuidUtils.toString(input);
   }
 
   /**
/**
Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.common.function.scalar.uuid;
+
+import java.util.List;
+import java.util.Set;
+import javax.annotation.Nullable;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.ReturnTypes;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.pinot.common.function.FunctionInfo;
+import org.apache.pinot.common.function.PinotScalarFunction;
+import org.apache.pinot.common.function.sql.PinotSqlFunction;
+import org.apache.pinot.common.utils.DataSchema.ColumnDataType;
+import org.apache.pinot.spi.annotations.ScalarFunction;
+import org.apache.pinot.spi.utils.UuidUtils;
+
+
+/**
+ * Polymorphic scalar function that validates string or bytes values as UUID inputs.
+ *
+ * <p>This implementation is stateless and thread-safe.
+ */
+@ScalarFunction(names = {"IS_UUID"})
+public class IsUuidScalarFunction implements PinotScalarFunction {
+  private static final FunctionInfo STRING_FUNCTION_INFO;
+  private static final FunctionInfo BYTES_FUNCTION_INFO;
+
+  static {
+    try {
+      STRING_FUNCTION_INFO =
+          new FunctionInfo(IsUuidScalarFunction.class.getMethod("isUuid", String.class), IsUuidScalarFunction.class,
+              true);
+      BYTES_FUNCTION_INFO =
+          new FunctionInfo(IsUuidScalarFunction.class.getMethod("isUuid", byte[].class), IsUuidScalarFunction.class,
+              true);
+    } catch (NoSuchMethodException e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  @Override
+  public String getName() {
+    return "IS_UUID";
+  }
+
+  @Override
+  public Set<String> getNames() {
+    return Set.of("IS_UUID", "ISUUID");
+  }
+
+  @Nullable
+  @Override
+  public PinotSqlFunction toPinotSqlFunction() {
+    return new PinotSqlFunction("IS_UUID", ReturnTypes.BOOLEAN,
+        OperandTypes.or(OperandTypes.family(List.of(SqlTypeFamily.CHARACTER)),
+            OperandTypes.family(List.of(SqlTypeFamily.BINARY))));
+  }
+
+  @Nullable
+  @Override
+  public FunctionInfo getFunctionInfo(ColumnDataType[] argumentTypes) {
+    if (argumentTypes.length != 1) {
+      return null;
+    }
+    switch (argumentTypes[0]) {
+      case STRING:
+        return STRING_FUNCTION_INFO;
+      case BYTES:
+        return BYTES_FUNCTION_INFO;
+      default:
+        return null;
+    }
+  }
+
+  @Nullable
+  @Override
+  public FunctionInfo getFunctionInfo(int numArguments) {
+    return numArguments == 1 ? STRING_FUNCTION_INFO : null;
+  }
+
+  public static boolean isUuid(String value) {
+    return UuidUtils.isUuid(value);
+  }
+
+  public static boolean isUuid(byte[] value) {
+    return UuidUtils.isUuid(value);
+  }
+}
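
`IsUuidScalarFunction` delegates to `UuidUtils.isUuid`, whose body is not shown on this page. As a rough sketch of what a strict check for the two overloads might look like, assuming "valid" means the canonical 8-4-4-4-12 hex form in either case for strings and exactly 16 bytes for binary input: the class `IsUuidSketch` and its regular expression are assumptions for illustration, not the commit's implementation.

```java
import java.util.regex.Pattern;

// Hypothetical validation sketch; the real UuidUtils.isUuid may apply different rules.
final class IsUuidSketch {
  // Canonical textual form: 8-4-4-4-12 hex digits, upper or lower case.
  private static final Pattern CANONICAL = Pattern.compile(
      "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

  static boolean isUuid(String value) {
    return value != null && CANONICAL.matcher(value).matches();
  }

  static boolean isUuid(byte[] value) {
    // Any 16-byte value decodes to some UUID, so only the length is checked.
    return value != null && value.length == 16;
  }

  public static void main(String[] args) {
    System.out.println(isUuid("550E8400-E29B-41D4-A716-446655440000"));
    System.out.println(isUuid(new byte[15]));
  }
}
```

An explicit pattern is used here instead of `UUID.fromString` because that parser is known to accept some non-canonical inputs, such as groups shorter than the canonical widths.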
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.common.function.scalar.uuid;
+
+import java.util.List;
+import java.util.Set;
+import java.util.UUID;
+import javax.annotation.Nullable;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.ReturnTypes;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.pinot.common.function.FunctionInfo;
+import org.apache.pinot.common.function.PinotScalarFunction;
+import org.apache.pinot.common.function.sql.PinotSqlFunction;
+import org.apache.pinot.common.utils.DataSchema.ColumnDataType;
+import org.apache.pinot.spi.annotations.ScalarFunction;
+import org.apache.pinot.spi.utils.UuidUtils;
+
+
+/**
+ * Polymorphic scalar function that converts string or bytes inputs into Pinot's logical UUID type.
+ *
+ * <p>This implementation is stateless and thread-safe.
+ */
+@ScalarFunction(names = {"TO_UUID"})
+public class ToUuidScalarFunction implements PinotScalarFunction {
+  private static final FunctionInfo STRING_FUNCTION_INFO;
+  private static final FunctionInfo BYTES_FUNCTION_INFO;
+
+  static {
+    try {
+      STRING_FUNCTION_INFO =
+          new FunctionInfo(ToUuidScalarFunction.class.getMethod("toUuid", String.class), ToUuidScalarFunction.class,
+              true);
+      BYTES_FUNCTION_INFO =
+          new FunctionInfo(ToUuidScalarFunction.class.getMethod("toUuid", byte[].class), ToUuidScalarFunction.class,
+              true);
+    } catch (NoSuchMethodException e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  @Override
+  public String getName() {
+    return "TO_UUID";
+  }
+
+  @Override
+  public Set<String> getNames() {
+    return Set.of("TO_UUID", "TOUUID");
+  }
+
+  @Nullable
+  @Override
+  public PinotSqlFunction toPinotSqlFunction() {
+    return new PinotSqlFunction("TO_UUID", ReturnTypes.explicit(org.apache.calcite.sql.type.SqlTypeName.UUID),
+        OperandTypes.or(OperandTypes.family(List.of(SqlTypeFamily.CHARACTER)),
+            OperandTypes.family(List.of(SqlTypeFamily.BINARY))));
+  }
+
+  @Nullable
+  @Override
+  public FunctionInfo getFunctionInfo(ColumnDataType[] argumentTypes) {
+    if (argumentTypes.length != 1) {
+      return null;
+    }
+    switch (argumentTypes[0]) {
+      case STRING:
+        return STRING_FUNCTION_INFO;
+      case BYTES:
+        return BYTES_FUNCTION_INFO;
+      default:
+        return null;
+    }
+  }
+
+  @Nullable
+  @Override
+  public FunctionInfo getFunctionInfo(int numArguments) {
+    return numArguments == 1 ? STRING_FUNCTION_INFO : null;
+  }
+
+  public static UUID toUuid(String value) {
+    return value != null ? UuidUtils.toUUID(value) : null;
+  }
+
+  public static UUID toUuid(byte[] value) {
+    return value != null ? UuidUtils.toUUID(value) : null;
+  }
+}
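
Both new scalar functions resolve one of two overloads by argument type, using `Class.getMethod` lookups cached in static fields. The dispatch pattern on its own, without Pinot's `FunctionInfo` and `ColumnDataType`, could be sketched like this. `OverloadDispatchSketch` and its `resolve` and `invoke` helpers are hypothetical names, and the conversions use plain `java.util.UUID` and `ByteBuffer` calls in place of `UuidUtils`.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical sketch of type-based overload dispatch; not Pinot's PinotScalarFunction API.
final class OverloadDispatchSketch {
  // Resolve each overload once, at class initialization, as the commit's static block does.
  private static final Method STRING_METHOD = lookup(String.class);
  private static final Method BYTES_METHOD = lookup(byte[].class);

  private static Method lookup(Class<?> argumentType) {
    try {
      return OverloadDispatchSketch.class.getMethod("toUuid", argumentType);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  public static UUID toUuid(String value) {
    return value != null ? UUID.fromString(value) : null;
  }

  public static UUID toUuid(byte[] value) {
    if (value == null) {
      return null;
    }
    ByteBuffer bb = ByteBuffer.wrap(value);
    long msb = bb.getLong();
    long lsb = bb.getLong();
    return new UUID(msb, lsb);
  }

  // Picks the overload matching the runtime argument type, or null if unsupported.
  static Method resolve(Object argument) {
    if (argument instanceof String) {
      return STRING_METHOD;
    }
    if (argument instanceof byte[]) {
      return BYTES_METHOD;
    }
    return null;
  }

  static UUID invoke(Object argument) {
    Method method = resolve(argument);
    if (method == null) {
      throw new IllegalArgumentException("Unsupported argument type");
    }
    try {
      return (UUID) method.invoke(null, argument);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Returning `null` for an unsupported type mirrors the `default: return null` branch in `getFunctionInfo(ColumnDataType[])`, which leaves the caller to report that no matching function exists.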
