Compute the full Hamming-distance-based similarity, but only for Uint8Array pairs that look likely to meet a minimum target score at the halfway mark.
This speeds up comparisons of long arrays when high target scores are used.
A heuristic exits early at the halfway mark if too few matching bits have been found by then. The heuristic assumes that
matching bits are spread evenly across the array, which is typical for embeddings. The number of matching bits required at the halfway mark is
calculated as (targetScore * arrayLength / 2) * 0.9, where the factor of 0.9 allows for some unevenness in the
distribution of matching bits. In tests with real-world text embeddings, this heuristic produced
no false negatives compared with the exhaustive computeHammingSimilarity, and it doubled the speed of high-target-score searches.
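The halfway requirement is simple arithmetic. The sketch below assumes that arrayLength counts bits (8 per byte of the Uint8Array), which the description does not state explicitly; `halfwayThreshold` is a hypothetical name used only for illustration.

```typescript
// Hypothetical helper illustrating the documented formula; not part of the library's API.
// Assumption: arrayLength counts bits (8 per byte of the Uint8Array).
function halfwayThreshold(targetScore: number, arrayLengthInBits: number): number {
  // Matching bits expected in the first half if matches were perfectly even...
  const evenShare = (targetScore * arrayLengthInBits) / 2;
  // ...reduced by 10% to tolerate some unevenness in the distribution.
  return evenShare * 0.9;
}

// A 512-byte embedding has 4096 bits.
const required = halfwayThreshold(0.9, 512 * 8);
console.log(required.toFixed(2)); // "1658.88"
```

So with a target score of 0.9 on a 4,096-bit embedding, roughly 1,659 of the first 2,048 bits would need to match for the comparison to continue.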
When the heuristic exits early at the halfway mark, the returned score is zero rather than an estimate based on the bits
compared so far.
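A minimal TypeScript sketch of the early-exit idea follows. It is not the library's implementation: the name `hammingSimilarityWithEarlyExit` is hypothetical, the score is assumed to be the fraction of matching bits, and the halfway requirement is computed from the bits actually compared in the first half. For even byte lengths that equals (targetScore * arrayLength / 2) * 0.9 with arrayLength in bits.

```typescript
// Hypothetical sketch; not the library's actual implementation.
// Assumption: similarity = (number of matching bits) / (total number of bits).

// Count the set bits in a single byte.
function popcount8(x: number): number {
  let count = 0;
  while (x !== 0) {
    count += x & 1;
    x >>>= 1;
  }
  return count;
}

function hammingSimilarityWithEarlyExit(
  a: Uint8Array,
  b: Uint8Array,
  targetScore: number,
): number {
  if (a.length !== b.length) {
    throw new Error("Arrays must have the same length");
  }
  if (a.length === 0) {
    return 0; // nothing to compare (illustrative choice)
  }

  const halfway = Math.floor(a.length / 2);
  // Matching bits required after the first half, with 10% slack for unevenness.
  const required = targetScore * (halfway * 8) * 0.9;

  let matchingBits = 0;
  for (let i = 0; i < a.length; i++) {
    if (i === halfway && matchingBits < required) {
      return 0; // early exit: report zero, not a partial estimate
    }
    // XOR sets the differing bits; the remaining bits in the byte match.
    matchingBits += 8 - popcount8(a[i] ^ b[i]);
  }
  return matchingBits / (a.length * 8);
}

// Usage: the first halves agree completely, so the scan runs to the end.
const x = new Uint8Array([0xff, 0xff, 0x00, 0x00]);
const y = new Uint8Array([0xff, 0xff, 0xff, 0xff]);
console.log(hammingSimilarityWithEarlyExit(x, y, 0.5)); // 0.5
```

Checking only once, at a byte boundary, keeps the inner loop simple. Basing the requirement on the bits actually compared also keeps very short arrays from being rejected outright. A performance-oriented version might replace `popcount8` with a lookup table or word-sized popcount.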
Parameters
a: Uint8Array
First Uint8Array
b: Uint8Array
Second Uint8Array
targetScore: number
The minimum similarity score of interest, a number between 0 and 1
Returns number
A similarity score between 0 and 1 (0 if the comparison exited early)