sameersoi t1_iwjye93 wrote on November 16, 2022 at 4:33 AM

First thing that popped into my mind was using a tree structure. A quick Google search (“suffix tree spellcheck”) led to this helpful article below that proposes the use of BK-trees (a tree that essential sorts child noes by Levenshtein edit distance). Crucially when doing a look up “Time Complexity will be O(L1L2log n), here L1 is the average length of word in our dictionary and L2 is the length of misspelled.”

https://www.geeksforgeeks.org/bk-tree-introduction-implementation/amp/

Devinco001 OP t1_iwmvton wrote on November 16, 2022 at 8:37 PM

Yeah, I googled BK tree. The data structure is amazing and reduces a lot of computational time. While searching that, I found another algorithm, symspell. This is even faster with high accuracy but doesn't use levenshtein, so currently going to use that.

But BK trees are the go to method if using pure levenshtein and similar, more accurate, string distances. So will be comparing the accuracy of both algos to choose the better one. Thanks