:


Title: Similarity Search: A Web Perspective.

Abstract: Similarity search is the problem of preprocessing a database of N objects in such a way that given a query object, one can effectively determine its nearest neighbors in database. "Geometric near-neighbor access tree" data structure, an early work (1995) by Sergey Brin, is one of the most known solutions to this problem.

Similarity search is closely connected to many algorithmic problems in the web. In this talk we will focus on personalized news aggregation (searching for news articles that are most similar to the user's profile of interests).
and behavioral targeting (searching for the most relevant user for displaying a given advertisement).
Other web-applications include near-duplicate detection, (2) finding related pages, classiffcation/fitering tasks like detecting spam pages, spelling correction, sense disambiguation, computing co-occurrence similarity, social network analysis (e.g. suggesting new friends), recommendation systems.

We describe features that make web applications somewhat different from previously studied models. Thus we re-examine the formalization and the classical algorithms for similarity search. This leads us to new algorithms (we mention two of them) and numerous open problems in the field.