Version 0.3 is a performance-focused release with `Float32` graph internals and a `Matrix` embedding format.
- Uniform embedding initialization: 7-14x faster, 3-10x less memory
- Embedding optimization: 8-10% faster due to better cache locality
- Graph construction: 15-28% less memory for simplicial set operations
- Overall fit/transform: Neutral to 7% faster, 3-20% less memory
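The embedding layout change below is one plausible contributor to these memory and cache-locality figures. A `Matrix{Float32}` keeps every coordinate in one contiguous buffer. A `Vector{Vector{Float32}}` instead allocates a separate array, with its own header, for each point and stores pointers to those arrays. The sketch below compares the two layouts on an illustrative random 2 × 10,000 embedding using `Base.summarysize`. Exact byte counts depend on the Julia version and platform, so no figures are quoted here.

```julia
# Compare the memory footprint of the two embedding layouts on illustrative data.
n_dims, n_points = 2, 10_000

# v0.3-style layout: one contiguous n_dims × n_points matrix (one column per point)
M = rand(Float32, n_dims, n_points)

# v0.2-style layout: a separately allocated vector for every point
V = [M[:, i] for i in 1:n_points]

bytes_matrix = Base.summarysize(M)
bytes_nested = Base.summarysize(V)

println("Matrix{Float32}:         ", bytes_matrix, " bytes")
println("Vector{Vector{Float32}}: ", bytes_nested, " bytes")
```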
- Embedding format changed: `result.embedding` is now a `Matrix{T}` of shape `(n_dims, n_points)` instead of `Vector{Vector{T}}`
  ```julia
  # Old (v0.2)
  result.embedding[i]     # Vector for point i
  # New (v0.3)
  result.embedding[:, i]  # Column for point i
  ```
- Graph edge weights are `Float32`: `result.graph` now has `eltype` of `Float32`
  - `SourceViewParams` and `SourceGlobalParams` now use fixed `Float32` fields (no longer type-parameterized)
  - `MembershipFnParams.a` and `MembershipFnParams.b` are now `Float32`
  - `smooth_knn_dists` returns `Float32` arrays for ρs and σs
  - `SMOOTH_K_TOLERANCE` changed to `1.0f-5`
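Downstream code written against v0.2 can usually be adapted to these changes with standard Julia idioms. The sketch below uses plain arrays in place of a real result object. `old_embedding` and `graph` are hypothetical stand-ins, and the sketch assumes that `result.graph` is a `SparseMatrixCSC`, which these notes do not state.

```julia
using SparseArrays

# Hypothetical v0.2-style embedding: one Vector{Float32} per point (2 dims, 5 points)
old_embedding = [Float32[i, 2i] for i in 1:5]

# v0.3-style layout: n_dims × n_points, so old embedding[i] becomes new embedding[:, i]
new_embedding = reduce(hcat, old_embedding)

# If downstream code still wants a vector of points, iterate over columns instead
points = collect(eachcol(new_embedding))

# Hypothetical stand-in for `result.graph`, assumed here to be a sparse matrix
graph = sparse([1, 2, 3], [2, 3, 1], Float32[0.5, 0.25, 1.0], 3, 3)

# Code that needs Float64 weights can convert explicitly; broadcasting keeps it sparse
graph64 = Float64.(graph)

# Float32 literals use the `f` exponent, as in the new SMOOTH_K_TOLERANCE value
tol = 1.0f-5
```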
Version 0.2 is a major redesign of UMAP.jl focused on generality, extensibility, and better integration with the Julia ecosystem. This release introduces breaking changes to the API and internal structure.
- Multi-view UMAP: Native support for multi-modal data through `NamedTuple` inputs
  ```julia
  data = (images=X_images, text=X_text)
  result = UMAP.fit(data, ...)  # advanced usage API
  ```
- Configuration System: Explicit configuration types for all algorithm stages
  - `NeighborParams` (with `DescentNeighbors` and `PrecomputedNeighbors` implementations)
  - `SourceViewParams` and `SourceGlobalParams` for input space control
  - `TargetParams` for embedding space configuration
  - `OptimizationParams` for SGD control
  - `UMAPConfig` bundling all parameters
- Result Objects: New result types that encapsulate all information
  - `UMAPResult` for fit results
  - `UMAPTransformResult` for transform results
  - Store configuration, intermediate results, and final embedding
- Flexible Data Input: Support for multiple input formats
  - Column-major matrices (as before)
  - Vectors of points
  - NamedTuples for multi-view data
- Extensibility Points:
  - Subtype `NeighborParams` for custom KNN search implementations
  - Subtype `AbstractInitialization` for custom embedding initialization
  - Multiple dispatch on manifold types for future manifold support
- Type-safe Initialization: Replace symbols with typed objects
  - `SpectralInitialization()` instead of `:spectral`
  - `UniformInitialization()` instead of `:random`
- Bandwidth Parameter: New `bandwidth` parameter for controlling smooth k-distance calculation
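In the reference UMAP procedure, each point gets a local offset ρ (the distance to its nearest neighbor) and a scale σ. σ is found by binary search so that the smoothed neighbor weights exp(-max(d - ρ, 0)/σ) sum to log2(k) times the bandwidth. A larger bandwidth therefore produces a larger σ and a wider, smoother neighborhood. The sketch below follows that reference procedure for a single point. It assumes that UMAP.jl's `bandwidth` plays the same role and that local connectivity is 1. The name `smooth_knn_dist_sketch`, its keywords, and the omission of a minimum-σ safeguard are illustrative choices, not UMAP.jl's `smooth_knn_dists`. The default tolerance mirrors the `1.0f-5` value of `SMOOTH_K_TOLERANCE` listed for v0.3.

```julia
# Sketch of smooth k-distance calibration for one point (reference-UMAP style).
# Assumptions: local connectivity of 1, no minimum-σ safeguard, illustrative names.
function smooth_knn_dist_sketch(dists::AbstractVector{Float32};
                                bandwidth::Float32 = 1f0,
                                tol::Float32 = 1f-5,
                                n_iter::Int = 64)
    k = length(dists)
    target = log2(Float32(k)) * bandwidth

    # ρ: distance to the nearest neighbor at a nonzero distance
    nonzero = filter(>(0f0), dists)
    ρ = isempty(nonzero) ? 0f0 : minimum(nonzero)

    # Binary search for σ; the weight sum grows monotonically with σ
    lo, hi, σ = 0f0, Inf32, 1f0
    for _ in 1:n_iter
        psum = sum(d -> exp(-max(d - ρ, 0f0) / σ), dists)
        abs(psum - target) < tol && break
        if psum > target
            hi = σ                              # σ too large: search below
            σ = (lo + hi) / 2
        else
            lo = σ                              # σ too small: search above
            σ = isinf(hi) ? 2σ : (lo + hi) / 2
        end
    end
    return ρ, σ
end

# Distances from one point to its k = 8 nearest neighbors (illustrative values)
dists = Float32[0.5, 0.8, 1.0, 1.3, 1.7, 2.0, 2.4, 3.0]
ρ, σ = smooth_knn_dist_sketch(dists)                               # target log2(8) = 3
ρ_wide, σ_wide = smooth_knn_dist_sketch(dists; bandwidth = 1.5f0)  # target 4.5
```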
- Function names: `umap()` and `UMAP_()` replaced with `UMAP.fit()`
  ```julia
  # Old (v0.1.11)
  embedding = umap(X, 2)
  model = UMAP_(X, 2)
  # New (v0.2)
  result = UMAP.fit(X, 2)
  embedding = result.embedding
  ```
- Export to Public: Functions now use `public` instead of `export`
  - Access as `UMAP.fit()` and `UMAP.transform()`
  - Or use `using UMAP: fit, transform` to bring them into scope
- Transform signature: `transform()` now takes a `UMAPResult` and automatically inherits its parameters
  ```julia
  # Old (v0.1.11)
  Q_embed = transform(model, Q; n_neighbors=15, min_dist=0.1, n_epochs=100)
  # New (v0.2)
  Q_result = UMAP.transform(result, Q)  # Parameters inherited
  Q_embed = Q_result.embedding
  ```
- Result types: `UMAP_` struct replaced with `UMAPResult`
  - Fields reorganized: `knns` and `dists` now in tuple `knns_dists`
  - Configuration stored in `config` field
  - Added `fs_sets` field for fuzzy simplicial sets
- Initialization parameter: Changed from `Symbol` to typed objects
  ```julia
  # Old
  UMAP_(X, 2; init=:spectral)
  # New
  UMAP.fit(X, 2; init=UMAP.SpectralInitialization())
  ```
- Removed direct `a` and `b` parameters: Now computed automatically from `min_dist` and `spread`
  - Advanced users can still customize via `MembershipFnParams` in the advanced API
- Learning rate in transform: Default transform behavior changed
  - Now uses 30 epochs (vs 100) and `learning_rate/4`
  - Better defaults for the transform use case
- Modular architecture: Split into multiple files
  - `config.jl`, `neighbors.jl`, `simplicial_sets.jl`, `embeddings.jl`, `optimize.jl`, `fit.jl`, `transform.jl`
- Dependency changes:
  - Added `Accessors.jl` dependency
  - Changed from `using` to `import` for most dependencies
  - `NearestNeighborDescent` imported as `NND`
- Embedding representation: Internal representation changed to `Vector{Vector}` for future manifold support
  - Users typically won't notice unless accessing internal structures
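For reference, UMAP models the membership strength between two embedded points at distance d as 1 / (1 + a·d^(2b)). The reference Python implementation chooses a and b by a least-squares fit to a target curve that equals 1 up to `min_dist` and decays as exp(-(d - min_dist)/spread) beyond it. The sketch below reproduces that fit with a brute-force grid search in place of a nonlinear least-squares solver. `fit_ab_sketch`, `membership`, and the grid bounds are hypothetical names and choices for illustration, and UMAP.jl's own fitting routine may differ in its details. The results are returned as `Float32` to match the v0.3 `MembershipFnParams` field types.

```julia
# Sketch: choose a, b so that 1 / (1 + a * d^(2b)) approximates the target curve
# defined by min_dist and spread. A brute-force grid search stands in for the
# nonlinear least-squares fit; widen `as`/`bs` for other min_dist/spread settings.
function fit_ab_sketch(min_dist::Real, spread::Real;
                       as = 1.0:0.005:2.5, bs = 0.6:0.005:1.2)
    # Sample distances on [0, 3 * spread], as the reference Python implementation does
    xs = range(0, 3 * spread; length = 300)
    # Target: 1 up to min_dist, then exponential decay with scale `spread`
    ys = [x < min_dist ? 1.0 : exp(-(x - min_dist) / spread) for x in xs]

    best_sse, best_a, best_b = Inf, NaN, NaN
    for a in as, b in bs
        # Sum of squared residuals between the smooth curve and the target
        sse = sum((1 / (1 + a * x^(2b)) - y)^2 for (x, y) in zip(xs, ys))
        if sse < best_sse
            best_sse, best_a, best_b = sse, a, b
        end
    end
    return Float32(best_a), Float32(best_b)
end

# Low-dimensional membership strength for two embedded points at distance d
membership(d, a, b) = 1 / (1 + a * d^(2b))

a, b = fit_ab_sketch(0.1, 1.0)
```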
Basic Usage:

```julia
# v0.1.11
using UMAP
embedding = umap(X, 2; n_neighbors=15, min_dist=0.1)

# v0.2
using UMAP: fit
result = fit(X, 2; n_neighbors=15, min_dist=0.1)
embedding = result.embedding
```

Fit and Transform:

```julia
# v0.1.11
model = UMAP_(X, 2; n_neighbors=15)
Q_embed = transform(model, Q; n_neighbors=15, n_epochs=100)

# v0.2
result = UMAP.fit(X, 2; n_neighbors=15)
Q_result = UMAP.transform(result, Q)
Q_embed = Q_result.embedding
```

Initialization:

```julia
# v0.1.11
model = UMAP_(X, 2; init=:random)

# v0.2
result = UMAP.fit(X, 2; init=UMAP.UniformInitialization())
```

Accessing KNN Information:

```julia
# v0.1.11
knns = model.knns
dists = model.dists

# v0.2
knns, dists = result.knns_dists
```

For complete migration guidance, see the Breaking Changes documentation.
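Replacing `init=:spectral` and `init=:random` with typed objects means the initialization strategy is selected by multiple dispatch. A new strategy is then a new subtype plus a method, rather than another branch in a symbol comparison. A misspelled type name also fails with an `UndefVarError` at the call site. The sketch below illustrates the pattern with hypothetical stand-ins (`ToyInitialization`, `ToyUniform`, `ToyGrid`, `initialize`). These are not UMAP.jl names, and these notes do not specify which method a real `AbstractInitialization` subtype must implement. The [-10, 10) range used for uniform initialization is an assumption borrowed from common UMAP implementations.

```julia
using Random

# Hypothetical stand-ins for an initialization type hierarchy (not UMAP.jl names)
abstract type ToyInitialization end

struct ToyUniform <: ToyInitialization end

# A user-defined strategy: evenly spaced points along the first axis
struct ToyGrid <: ToyInitialization
    spacing::Float32
end

# One method per strategy; dispatch on the first argument selects it
function initialize(::ToyUniform, n_dims::Integer, n_points::Integer;
                    rng::AbstractRNG = Random.default_rng())
    # Uniform coordinates in [-10, 10) (assumed range)
    return 20f0 .* rand(rng, Float32, n_dims, n_points) .- 10f0
end

function initialize(g::ToyGrid, n_dims::Integer, n_points::Integer;
                    rng::AbstractRNG = Random.default_rng())  # rng unused here
    X = zeros(Float32, n_dims, n_points)
    X[1, :] .= g.spacing .* (0:n_points-1)   # other dimensions stay at zero
    return X
end

Xu = initialize(ToyUniform(), 2, 50; rng = MersenneTwister(1))
Xg = initialize(ToyGrid(0.5f0), 2, 4)
```

Adding `ToyGrid` required no change to the existing `ToyUniform` method, which is the property the typed-object design is meant to provide.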
- Performance characteristics remain similar to v0.1.11
- Improved type stability through explicit configuration types
- Memory layout changes may affect cache performance in some cases
- Comprehensive architecture documentation on the landing page
- New examples demonstrating multi-view UMAP
- Detailed API reference for configuration types
- Loss function documentation with mathematical details
For release notes from v0.1.x releases, see the GitHub releases page.