Arun Chinnachamy

I am a chemical engineer from BITS-Pilani. Right now, I work as Technology Lead at MySmartPrice. This is just a place where I write about the things I work on and think about.

Nowadays, spell checking is an important search feature, whether on Google or on an e-commerce site. SOLR, the widely used Lucene-based search engine, ships with a built-in spell checker. If you want to perform spell checking with SOLR, you won't be disappointed with what it offers. Let's start our tutorial on how to configure the SOLR spellchecker and use it for spell correction and search suggestions.

If you are looking for how to install SOLR on Debian/Ubuntu, visit the SOLR Installation Guide.

About SOLR SpellChecker

Be aware that the spell checker in SOLR is not really a spell checker but a spell suggester. It works inline: suggestions come back in the same response as the search results, so you do not have to issue a separate request. Frankly, it should be named Spell Suggestions or Query Suggestions rather than Spell Checker. SOLR does not rerun the search for you, so if the query turns out to be misspelled, you need to query again with the corrected spelling. Still, this really helps in implementing features like "Do you mean XXX or YYY" suggestions for users.

SOLR Configuration for Spell Checker:

There is more than one way to implement the spell checker in SOLR. The two most important are:

  • Dictionary Based Spell checker
  • Index Based Spell checker

Index Based Spell checking in Solr

I find this to be very effective and useful. Solr uses one of the configured fields of the indexed documents as dictionary input and uses it for spell suggestions. The configuration is simple and requires very little effort.
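To make the idea of an index-based dictionary concrete, here is a minimal Python sketch. It is not how Solr implements it internally; it only illustrates the principle of collecting words from one field and ranking candidates by a normalised edit-distance score. The product names, the function names and the `accuracy` threshold (loosely modelled on the spellcheck.accuracy request parameter) are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the edit distance to a 0..1 score (1.0 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def build_dictionary(field_values):
    """Collect lowercased words from the values of one indexed field."""
    words = set()
    for value in field_values:
        words.update(value.lower().split())
    return words


def suggest(word, dictionary, accuracy=0.8):
    """Return dictionary words at least `accuracy` similar to `word`, best first."""
    word = word.lower()
    if word in dictionary:
        return []  # already a known word, nothing to suggest
    scored = [(similarity(word, candidate), candidate) for candidate in dictionary]
    matches = [(score, cand) for score, cand in scored if score >= accuracy]
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cand for _, cand in matches]


# Hypothetical product names standing in for the values of the indexed field.
products = ["Mobiles", "Television", "Phones", "Books", "Cameras"]
dictionary = build_dictionary(products)
print(suggest("teleision", dictionary))
```

In this toy version a word that already exists in the dictionary gets no suggestions, which mirrors the point that the quality of the suggestions depends entirely on what the chosen field contains.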

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">string</str> <!-- Replace with Field Type of your schema -->
   <lst name="spellchecker">
       <str name="name">default</str>
       <str name="field">productname_spell</str> <!-- Replace with field name as per your schema -->
       <str name="spellcheckIndexDir">./spellchecker</str>
       <str name="buildOnOptimize">true</str>
       <str name="buildOnCommit">true</str>
   </lst>
   <!-- a spellchecker that uses a different distance measure -->
   <lst name="spellchecker">
       <str name="name">jarowinkler</str> 
       <str name="field">spell</str>
       <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
       <str name="spellcheckIndexDir">./spellchecker2</str>
   </lst>

   <!-- a file based spell checker -->
   <lst name="spellchecker">
       <str name="classname">solr.FileBasedSpellChecker</str>
       <str name="name">file</str>
       <str name="sourceLocation">spellings.txt</str>
       <str name="characterEncoding">UTF-8</str>
       <str name="spellcheckIndexDir">./spellcheckerFile</str> <!-- Separate directory so it does not clash with the default spellchecker -->
   </lst>
</searchComponent>

The queryAnalyzerFieldType in the above configuration is important. Make sure the field productname_spell is of the same field type. The analyzers and filters you configure on that field type define what goes into the spell check dictionary.
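The effect of the field type on the dictionary can be pictured with a toy Python sketch. The two functions below are stand-ins, not Solr code: one mimics an untokenized string field and the other a simple whitespace-tokenized, lowercased text field. The field value is a made-up example.

```python
def analyze_as_string(value):
    """Mimic an untokenized field: the whole value is one dictionary entry."""
    return [value]


def analyze_as_text(value):
    """Mimic a simple tokenized, lowercased field: one entry per word."""
    return value.lower().split()


title = "Color Television"  # hypothetical field value
print(analyze_as_string(title))  # ['Color Television']
print(analyze_as_text(title))    # ['color', 'television']
```

With the first style, a misspelled single word has nothing close to match against, because the dictionary holds whole phrases. With the second, each word becomes its own dictionary entry.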

Now we need to set up the request handler that will serve the spell check queries. Create a new request handler in solrconfig.xml as described below. There is a good chance that you will need to change the values based on your requirements. If you want to use the file-based spell checker, just set spellcheck.dictionary to file.

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
        <str name="df">productname</str> <!--The default field for spell checking. -->
        <str name="spellcheck.dictionary">default</str> <!--default or file or jarowinkler as mentioned above. -->
        <str name="spellcheck">on</str>
        <str name="spellcheck.extendedResults">true</str> 
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.maxResultsForSuggest">5</str> 
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.collateExtendedResults">true</str> 
        <str name="spellcheck.maxCollationTries">10</str>
        <str name="spellcheck.maxCollations">5</str> 
    </lst>
    <arr name="last-components">
        <str>spellcheck</str>
    </arr>
</requestHandler>

Now reload the SOLR configuration of the core by visiting this URL:

http://{SOLR IP}:{SOLR PORT}/solr/admin/cores?action=RELOAD&core={CORE NAME}

If you do not have multiple cores, just restart the Tomcat service. If you know a better way to reload the configuration, please leave a comment.
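If you prefer to script the reload instead of opening the URL in a browser, something like the following Python sketch should work. It only wraps the RELOAD URL shown above. The host, port and core name are placeholders, and `reload_core` is a hypothetical helper that is defined here but not called.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def core_reload_url(host, port, core):
    """Build the core RELOAD URL shown above for one core."""
    params = urlencode({"action": "RELOAD", "core": core})
    return f"http://{host}:{port}/solr/admin/cores?{params}"


def reload_core(host, port, core, timeout=10):
    """Call the RELOAD URL and return the HTTP status code."""
    with urlopen(core_reload_url(host, port, core), timeout=timeout) as response:
        return response.status


# Placeholder host, port and core name; replace them with your own values.
print(core_reload_url("localhost", 8983, "products"))
```

Calling `reload_core("localhost", 8983, "products")` against a running instance should return 200 when the reload succeeds.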

Once the reload/restart completes without errors, index the documents and make sure they index without errors. If you are using MySQL and are not yet using DataImportHandler to import documents into SOLR, you should try importing MySQL into SOLR.

Now you can check that the Solr spellchecker works by visiting this URL:

http://{SOLR IP}:{SOLR PORT}/solr/{CORE NAME}/spell?spellcheck=true&qt=spellchecker&spellcheck.accuracy=0.8&spellcheck.collate=true&fl=*%2Cscore&spellcheck.extendedResults=true&q={YOUR QUERY}
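When the query comes from user input, it is safer to let code build and encode this URL. The Python sketch below assembles the same kind of request against the /spell handler. The host, port and core name are placeholders, the function name is made up, and the parameter set is only the subset used in this article, with extended results requested through the spellcheck.extendedResults parameter.

```python
from urllib.parse import urlencode


def spellcheck_url(host, port, core, query, accuracy=0.8):
    """Build a URL-encoded request for the /spell handler configured above."""
    params = urlencode({
        "q": query,
        "spellcheck": "true",
        "spellcheck.accuracy": accuracy,
        "spellcheck.collate": "true",
        "spellcheck.extendedResults": "true",
        "fl": "*,score",
    })
    return f"http://{host}:{port}/solr/{core}/spell?{params}"


# Placeholder host, port and core name; the space in the query gets encoded.
print(spellcheck_url("localhost", 8983, "products", "color teleision"))
```

Encoding the query this way takes care of spaces and special characters that would otherwise break the URL.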

If you have a single core, remove the CORE NAME from the URL. If you query with a misspelled word, SOLR returns a new spellcheck section with the suggested spelling. In my case, I have a few words like Mobiles, Television, Phones, Books, Cameras, etc. indexed in SOLR. When I execute,

http://{SOLR IP}:{SOLR PORT}/solr/{CORE NAME}/spell?spellcheck=true&qt=spellchecker&spellcheck.accuracy=0.8&spellcheck.collate=true&fl=*%2Cscore&spellcheck.extendedResults=true&q=color teleision

I get the following output with corrected spelling for television.

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
    </lst>
    <result name="response" numFound="0" start="0" maxScore="0.0"/>
    <lst name="spellcheck">
        <lst name="suggestions">
            <lst name="teleision">
                <int name="numFound">1</int>
                <int name="startOffset">7</int>
                <int name="endOffset">16</int>
                <int name="origFreq">0</int>
                <arr name="suggestion">
                    <lst>
                        <str name="word">television</str>
                        <int name="freq">1</int>
                    </lst>
                </arr>
            </lst>
            <bool name="correctlySpelled">false</bool>
            <str name="collation">color television</str>
        </lst>
    </lst>
</response>
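To turn a response like this into a "Do you mean" prompt, the application has to pull the collation out of the spellcheck section. Below is a minimal Python sketch using the standard library XML parser. It assumes the response has the same shape as the output above, and the exact layout may differ between Solr versions and settings. The function names and the shortened sample are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response above, used as sample input.
SAMPLE = """<response>
  <result name="response" numFound="0" start="0" maxScore="0.0"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="teleision">
        <int name="numFound">1</int>
        <arr name="suggestion">
          <lst>
            <str name="word">television</str>
            <int name="freq">1</int>
          </lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <str name="collation">color television</str>
    </lst>
  </lst>
</response>"""


def parse_spellcheck(xml_text):
    """Extract per-word suggestions, the spelled-correctly flag and the collation."""
    root = ET.fromstring(xml_text)
    result = {"correctly_spelled": None, "collation": None, "words": {}}
    suggestions = root.find("./lst[@name='spellcheck']/lst[@name='suggestions']")
    if suggestions is None:
        return result  # no spellcheck section in this response
    for node in suggestions:
        name = node.get("name")
        if node.tag == "bool" and name == "correctlySpelled":
            result["correctly_spelled"] = node.text == "true"
        elif node.tag == "str" and name == "collation":
            result["collation"] = node.text
        elif node.tag == "lst":
            words = node.findall("./arr[@name='suggestion']/lst/str[@name='word']")
            result["words"][name] = [w.text for w in words]
    return result


def did_you_mean(parsed):
    """Return a prompt for the user, or None when there is nothing to suggest."""
    if parsed["correctly_spelled"] is False and parsed["collation"]:
        return f"Did you mean: {parsed['collation']}?"
    return None


parsed = parse_spellcheck(SAMPLE)
print(parsed["words"])
print(did_you_mean(parsed))
```

The collation is the string to send back to SOLR as the new query if the user accepts the suggestion.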

Play around with the field analyzers and filters to control what gets indexed and to get the desired results. Visit the Solr SpellCheckComponent wiki page for more configuration details. Visit my other articles on SOLR Installation and SOLR DataImportHandler configuration.

Leave a comment if you find this useful. If you come across any errors, paste the error message. I will try my best to answer your queries.