Sometimes a document has properties that hold unstructured text, such as newspaper articles, blog posts, or book abstracts. An inverted index suits this kind of data: it is easy to build and is similar to the data structures search engines use.
Such an index can help with various complex search patterns, like common word detection, full-text searches, or document similarity searches that use metrics such as Hamming distance or L2 (Euclidean) distance. Inverted indexes are useful when the number of keywords is not too large and when the existing data is either totally immutable or rarely changed, but frequently searched.
Usually, the documents are "parents," and the words inside the document are "children." To build an inverted index, we invert this relation to make the words "parents" and documents "children":
Take all or a subset of the keywords from each document and pair each keyword with its document ID:
DocId1: keyword1
DocId1: keyword2
DocId1: keyword3
DocId2: keyword4
DocId2: keyword1
Reverse the order by taking each unique keyword and making a list of the documents where that keyword appears.
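The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a production design: the `pairs` list simply mirrors the sample listing, and the function name `build_inverted_index` is a hypothetical one chosen for this example. It keeps the index in an in-memory dictionary, whereas a real database would store it in its own structures.

```python
from collections import defaultdict

# Step 1: (document ID, keyword) pairs, mirroring the sample listing.
# These values are illustrative placeholders, not real data.
pairs = [
    ("DocId1", "keyword1"),
    ("DocId1", "keyword2"),
    ("DocId1", "keyword3"),
    ("DocId2", "keyword4"),
    ("DocId2", "keyword1"),
]


def build_inverted_index(pairs):
    """Step 2: map each unique keyword to the documents that contain it."""
    index = defaultdict(list)
    for doc_id, keyword in pairs:
        # Skip duplicates so each document is listed once per keyword.
        if doc_id not in index[keyword]:
            index[keyword].append(doc_id)
    return dict(index)


inverted_index = build_inverted_index(pairs)

for keyword, doc_ids in sorted(inverted_index.items()):
    print(f"{keyword}: {', '.join(doc_ids)}")
```

With the sample pairs, `keyword1` ends up pointing to both `DocId1` and `DocId2`, while every other keyword points to a single document. Finding all documents that contain a word then becomes a single dictionary lookup, for example `inverted_index.get("keyword1", [])`.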
Read more on the inverted index and other data modeling structures in my blog.
Yours,
Maria