Description π
Follow up on indexing proposal #164 after experimentation.
related #214 #215
Performance
Using Map seems to be causing a significant performance impact.
I think it is best if we create a specialized interface on top of the raw index IndexReader that avoids memory allocation as much as possible.
Combining Data Types
There is no way to combine string and number values causing redundant index/keys declarations. Consider the following:
let map: Map<string, { type: number, weight: number, count: number, tag: string}>
{
type: EditRequestType.CREATE_INDEX,
data: {
name: "data",
keys: [...map.values()],
values: [...map.values()].flatMap(({ type, weight, count, tag }) => [
type,
Math.ceil((weight.value ?? 0) * WEIGHT_ROUNDING_FACTOR),
count,
// tag // can't be combined with numbers
]),
end: Array.from({ length: map.size }).map(
(v, index) => (index + 1) * 3
),
},
},
{
type: EditRequestType.CREATE_INDEX,
data: {
name: "tags",
keys: [...map.values()],
values: [...map.values()].map(({ tag }) => tag),
},
},
Using Transferable Data Types
Should we expose a method getIndex(name) that retrieves the entire index?
Should we build a transferable ArrayBuffer and expose an interface on top of it for reading IndexReader so that it can be messaged back to the main thread without being cloned?
**code**
worker
getIndex(name: string): { data: { keys: ArrayBuffer; values: ArrayBuffer | null; keysLength: number; valuesLength: number; keyType: "string" | "number"; valueType: "string" | "number" | "none"; mode: IndexMode; }; transfers: ArrayBuffer[]; } | null {
const entry = this.resolve(name);
if (!entry) return null;
const { source, info } = entry;
const { mode, keyType, valueType, size } = info;
// ββ Keys ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
// Encode string keys as a UTF-8 byte buffer (null-terminated per entry)
// so the ArrayBuffer is transferable. Number keys are a plain Uint32Array.
let keysBuffer: ArrayBuffer;
let keysLength = size;
if (keyType === "number") {
const arr = new Uint32Array(size);
for (let i = 0; i < size; i++) arr[i] = this.readNumberKey(source, i) ?? 0;
keysBuffer = arr.buffer;
} else {
const encoder = new TextEncoder();
const parts: Uint8Array[] = [];
for (let i = 0; i < size; i++) {
parts.push(encoder.encode((this.readStringKey(source, i) ?? "") + "\0"));
}
const total = parts.reduce((n, p) => n + p.byteLength, 0);
const buf = new Uint8Array(total);
let offset = 0;
for (const p of parts) { buf.set(p, offset); offset += p.byteLength; }
keysBuffer = buf.buffer;
}
// ββ Values ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
let valuesBuffer: ArrayBuffer | null = null;
let valuesLength = 0;
if (mode === "oneToOne") {
if (valueType === "number") {
const arr = new Uint32Array(size);
for (let i = 0; i < size; i++) arr[i] = this.readNumberValueAt(source, i) ?? 0;
valuesBuffer = arr.buffer;
valuesLength = size;
} else if (valueType === "string") {
const encoder = new TextEncoder();
const parts: Uint8Array[] = [];
for (let i = 0; i < size; i++) {
parts.push(encoder.encode((this.readStringValueAt(source, i) ?? "") + "\0"));
}
const total = parts.reduce((n, p) => n + p.byteLength, 0);
const buf = new Uint8Array(total);
let offset = 0;
for (const p of parts) { buf.set(p, offset); offset += p.byteLength; }
valuesBuffer = buf.buffer;
valuesLength = size;
}
} else if (mode === "oneToNLinear" || mode === "oneToNNonLinear") {
if (valueType === "number") {
// Copy the entire flat value pool β sliceBounds offsets index into it.
const src = this.numberValuesArray(source);
const copy = src ? new Uint32Array(src) : new Uint32Array(0);
valuesBuffer = copy.buffer;
valuesLength = copy.length;
} else if (valueType === "string") {
const encoder = new TextEncoder();
const [, lastEnd] = this.sliceBounds(entry, size - 1);
const parts: Uint8Array[] = [];
for (let i = 0; i < lastEnd; i++) {
parts.push(encoder.encode((this.readStringValueAt(source, i) ?? "") + "\0"));
}
const total = parts.reduce((n, p) => n + p.byteLength, 0);
const buf = new Uint8Array(total);
let offset = 0;
for (const p of parts) { buf.set(p, offset); offset += p.byteLength; }
valuesBuffer = buf.buffer;
valuesLength = parts.length;
}
}
// ββ Transfer list ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
const transfers: ArrayBuffer[] = [keysBuffer];
if (valuesBuffer) transfers.push(valuesBuffer);
return {
data: { keys: keysBuffer, values: valuesBuffer, keysLength, valuesLength, keyType, valueType, mode },
transfers,
};
}
main thread
function decodeIndex(data: ReturnType<VirtualIndexesController["getIndex"]>["data"]) {
const { keys, values, keysLength, valuesLength, keyType, valueType, mode } = data;
const decoder = new TextDecoder();
// Decode keys
const decodedKeys: (string | number)[] = keyType === "number"
? Array.from(new Uint32Array(keys))
: decoder.decode(keys).split("\0").slice(0, keysLength);
// Decode values
const map = new Map<string | number, IndexEntry>();
if (mode === "keysOnly" || !values) {
for (const key of decodedKeys) map.set(key, null);
return map;
}
const decodedValues: (string | number)[] = valueType === "number"
? Array.from(new Uint32Array(values))
: decoder.decode(values).split("\0").slice(0, valuesLength);
for (let i = 0; i < decodedKeys.length; i++) {
map.set(decodedKeys[i], decodedValues[i] ?? null);
}
return map;
}
Index Validation
related #221
I have reached OOM and caused the browser to crash without any error messages due to an incorrect end vector that had values that exceeded the size of values.
Since the error is not transparent, and the construction of start/end is not straight forward, I would like to suggest validating start/end against the length of values, as well against as each other, throwing if out of bounds.
Description π
Follow up on indexing proposal #164 after experimentation.
related #214 #215
Performance
Using
Mapseems to be causing a significant performance impact.I think it is best if we create a specialized interface on top of the raw index
IndexReaderthat avoids memory allocation as much as possible.Combining Data Types
There is no way to combine string and number values causing redundant index/keys declarations. Consider the following:
Using Transferable Data Types
Should we expose a method
getIndex(name)that retrieves the entire index?Should we build a transferable ArrayBuffer and expose an interface on top of it for reading
IndexReaderso that it can be messaged back to the main thread without being cloned?**code**
worker
main thread
Index Validation
related #221
I have reached OOM and caused the browser to crash without any error messages due to an incorrect
endvector that had values that exceeded the size of values.Since the error is not transparent, and the construction of
start/endis not straight forward, I would like to suggest validatingstart/endagainst the length ofvalues, as well against as each other, throwing if out of bounds.