Locally sensitive hashing for hiearchical data structures

Locally sensitive hashing for hiearchical data structures

Locality Sensitivity Hashing (LSH) is a powerful approach to similarity (or distance) estimation, which exploits a family of randomized hash functions to map the similar data instances to the same buckets with a higher probability than the dissimilar ones. In other words, LSH algorithms produce comparable hash values for similar values unlike the cryptographic hash algorithms that produces entirely different outputs for similar values. Yet traditional LSH algorithms cannot preserve the structure information represented as relations between elements, which is useful to hash hierarchical data structures like trees and graphs.

JSON is a powerful text based format that supports hierarchical data structures. Using JSON format, we conveniently represent arrays and unordered maps hierarchically.

In this project, you will investigate the application of hierarchical LSH techniques to estimate the similarity between JSON documents.

Also see a recent survey on LSH techniques on the structured data: https://arxiv.org/pdf/2204.11209.pdf

Project Advisor: 

Doğan Ulus

Project Status: 

Project Year: 

2023
  • Spring

Contact us

Department of Computer Engineering, Boğaziçi University,
34342 Bebek, Istanbul, Turkey

  • Phone: +90 212 359 45 23/24
  • Fax: +90 212 2872461
 

Connect with us

We're on Social Networks. Follow us & get in touch.