Set-Similarity

This is a group project completed as part of my undergraduate research course. Due to university policy, I am not allowed to share the code publicly. Please contact me for further details.

Abstract

Set similarity is a pervasive concept in mathematics and computer science that is applied extensively in databases and data mining. Many measures of set similarity exist. In this paper we comparatively test the most notable and commonly used ones.

We evaluate set similarity measures by first translating the problem of set similarity from its abstract set-theoretic domain into a concrete graph theory problem and then exhaustively testing the most common measures. The results show that using the Tversky index as a similarity measure yields the best results. This measure depends on two coefficients, α and β, and we find their optimal values to be 1.0 and 0.01, respectively.
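For readers unfamiliar with the Tversky index, here is a minimal sketch in Python based on the standard textbook definition, |X ∩ Y| / (|X ∩ Y| + α|X − Y| + β|Y − X|). It is not the project's code, which cannot be shared. The function name `tversky_index`, the handling of a zero denominator, and the assumption that α weights the elements found only in the first set are illustrative choices, not details taken from the paper.

```python
def tversky_index(x, y, alpha=1.0, beta=0.01):
    """Tversky index of two collections, treated as sets.

    The defaults are the coefficient values reported in the abstract.
    Which set each coefficient weights is an assumption of this sketch.
    """
    x, y = set(x), set(y)
    common = len(x & y)   # |X ∩ Y|
    only_x = len(x - y)   # |X − Y|, weighted by alpha
    only_y = len(y - x)   # |Y − X|, weighted by beta
    denominator = common + alpha * only_x + beta * only_y
    if denominator == 0:
        # Assumed convention: identical (empty) sets score 1, anything else 0.
        return 1.0 if x == y else 0.0
    return common / denominator


a = {1, 2, 3, 4}
b = {3, 4, 5}

print(tversky_index(a, b))               # alpha=1.0, beta=0.01
print(tversky_index(b, a))               # differs: the index is asymmetric
print(tversky_index(a, b, 1.0, 1.0))     # alpha = beta = 1 gives the Jaccard index
print(tversky_index(a, b, 0.5, 0.5))     # alpha = beta = 0.5 gives the Dice coefficient
```

With α = 1.0 and β = 0.01, elements found only in the first set are penalized about a hundred times more heavily than elements found only in the second, so the score behaves almost like the fraction of the first set that is contained in the second.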
