Suppose I have one document, and a second document contains a portion of it plus a small amount of extra text, such as advertisements. Can MinHash be used to keep only the most complete document? And if such partial documents are scattered across a large number of files, can MinHashLSH identify them as accurately as possible?
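To make the scenario concrete, here is a minimal sketch in plain Python. It does not use any MinHash library: it computes exact Jaccard similarity and containment on word shingles, since MinHash estimates Jaccard similarity rather than containment, and I suspect that difference matters for partial documents. The function names, the shingle size `k=3`, and the two sample texts are all made up for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text (k=3 is an arbitrary choice)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|: the quantity that MinHash estimates."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(part, whole):
    """|part ∩ whole| / |part|: the fraction of `part` that also appears in `whole`."""
    return len(part & whole) / len(part) if part else 0.0


# Hypothetical sample data: the second text reuses the start of the first
# and appends a short advertisement.
full_doc = "the quick brown fox jumps over the lazy dog near the quiet river bank at dawn"
partial_doc = "the quick brown fox jumps over the lazy dog buy cheap widgets now"

full_set = shingles(full_doc)
partial_set = shingles(partial_doc)

j = jaccard(full_set, partial_set)
c = containment(partial_set, full_set)

print(f"Jaccard similarity:            {j:.3f}")
print(f"Containment (partial in full): {c:.3f}")
```

In this toy case the containment of the partial document in the full one is noticeably higher than the Jaccard similarity of the pair, which is why I am unsure whether a Jaccard threshold alone would catch partial copies.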