This is a group project completed as part of my undergraduate research course. Due to university policy, I am not allowed to share the code publicly. Please contact me for further details.
Set similarity is a pervasive concept in mathematics and computer science, applied extensively in the database and data mining fields. There are many measures of set similarity. In this paper we comparatively test the most notable and commonly used ones.
We evaluate set similarity measures by first translating the problem of set similarity from its abstract set-theoretic domain to a concrete graph theory problem and then exhaustively testing the most common set similarity measures. The results show that using the Tversky index as a similarity measure yields the best results. This measure depends on two coefficients, α and β, and we find their optimal values to be 1.0 and 0.01, respectively.
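For readers unfamiliar with the measure, the Tversky index of two sets X and Y is usually defined as S(X, Y) = |X ∩ Y| / (|X ∩ Y| + α|X − Y| + β|Y − X|). The sketch below is a minimal illustration of that standard definition, not the project's code, which cannot be shared. The function name `tversky_index`, the handling of empty sets, and the sample sets are assumptions made for the example; only the default values α = 1.0 and β = 0.01 come from the results above.

```python
def tversky_index(x, y, alpha=1.0, beta=0.01):
    """Tversky index of two collections, treated as sets.

    S(X, Y) = |X & Y| / (|X & Y| + alpha * |X - Y| + beta * |Y - X|)

    The defaults are the coefficient values reported above. The function
    name and the empty-set convention are assumptions for illustration.
    """
    x, y = set(x), set(y)
    common = len(x & y)   # elements shared by both sets
    only_x = len(x - y)   # elements found only in x
    only_y = len(y - x)   # elements found only in y
    denom = common + alpha * only_x + beta * only_y
    if denom == 0:
        # Assumed convention: two empty sets count as identical,
        # any other zero-denominator case counts as no similarity.
        return 1.0 if not x and not y else 0.0
    return common / denom


# Hypothetical sample sets, not data from the project.
small = {1, 2}
large = {1, 2, 3, 4, 5}

# The measure is asymmetric when alpha != beta.
score_small_vs_large = tversky_index(small, large)
score_large_vs_small = tversky_index(large, small)

# Setting alpha = beta = 1 recovers the Jaccard index.
jaccard = tversky_index({1, 2, 3, 4}, {3, 4, 5}, alpha=1.0, beta=1.0)
```

With α = 1.0 and β = 0.01, elements that appear only in the first set are penalized heavily, while elements that appear only in the second set are almost ignored. In the sample above, `score_small_vs_large` is therefore close to 1 and `score_large_vs_small` is 0.4, so the argument order matters when using these coefficients.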