Arun Chinnachamy bio photo

Arun Chinnachamy

I am a chemical Engineer from BITS-Pilani. Right now, I work as Technology Lead at MySmartPrice. This is just a place where I write about the things I work and think.

Email Twitter Facebook Github

Recently I wrote a blog post on How to install and configure SOLR. While I am planning to write series of posts on SOLR, I found it important to provide few extra details about Spell Checker, DataImportHandler and other similar areas which is not covered in the post. If you are like me who want to perform MySQL data import into SOLR as Documents, this post (Apache SOLR - SOLR MySQL Data Import) is for you.

If you want to configure Spell Checker in SOLR, visit my another post on Index Based Spellchecker in SOLR.

How to Import MySQL to SOLR?

Solr comes with DataImportHandler which provides a configuration driven way to load SOLR with the data imported from relational database or XML in both “full builds” and using incremental delta imports.

To configure DataImportHandler, Edit the solrconfig.xml file which can be found in /var/lib/tomcat6/solr/conf if you have not configured multiple cores, else go to the solrconfig.xml in the conf directory inside your Core instance Directory. Add the request handler into the configuration file.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
 <str name="config">data-config.xml</str>
</lst>
</requestHandler>

Now save and close the file. Next step is to create a file named data-config.xml and save it inside the conf directory along with solrconfig.xml. Add the following lines into the files.

<dataConfig>
 <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                     url="jdbc:mysql://{DB_HOST}/{DB_NAME}" 
                        user="{DB_USER}" password="{DB_PASS}"/>
 <document>
     <entity name="id" 
          query="select col1,col2,col3 from mytable">
        <field column="col1" name="solr_field1"/>
        <field column="col2" name="solr_field2"/>
        <field column="col3" name="solr_field3"/>
     </entity>
 </document>
</dataConfig>

This is the configuration for DataImportHandler in SOLR. Replace the values appropriately with Database credentials. In case of field section, you need to match the column name in the database table to the field name which you have configured in schema.xml of SOLR. In this case, it is assumed that the field names in your schema.xml are solr_field1, solr_field2 and solr_field3.

Last and final step is to copy the MySQL JDBC driver JAR file into the lib directory in your SOLR home or CORE instance directory. You can download the driver file from MySQL Download Page.

Once copied, you can run a full import by going to the URL

http://{SERVER IP}:{SERVER PORT}/solr/dataimport?command=full-import

if you have no cores else

http://{SERVER IP}:{SERVER PORT}/solr/{CORE NAME}/dataimport?command=full-import

If the server harasses you with errors, Check the latest log file of tomcat which in my case is under /var/lib/tomcat6/logs/. If you are not able to find the problem, feel free to post the error message. I will help if I can. Apart from full-import, You can also perform delta-imports.

Delta Import in Solr: Instead of full imports, you can perform delta imports if your data size is huge. Mention the query which will get the delta or new documents after the last import. This is quite useful if your index size is huge and takes lot of time.

For Example, look at the sample query below

<entity name="id" pk="ID" query="SELECT * FROM table1"
 deltaImportQuery="SELECT * FROM table1 WHERE id = '${dataimporter.delta.id}'"
 deltaQuery="SELECT id FROM table1 WHERE last_modified > '${dataimporter.last_index_time}'">
</entity>

Now that the SOLR MySQL data import is complete, there are other commands which will let you control the DataImportHandler. The important ones are listed below.

To start Delta imports hit the url, Without cores,

http://{SERVER IP}:{SERVER PORT}/solr/dataimport?command=delta-import

with cores,

http://{SERVER IP}:{SERVER PORT}/solr/{CORE NAME}/dataimport?command=delta-import

To know the status of the current command, Without cores,

http://{SERVER IP}:{SERVER PORT}/solr/dataimport

with cores,

http://{SERVER IP}:{SERVER PORT}/solr/{CORE NAME}/dataimport

To reload the data config file without restarting the SOLR, Without Cores,

http://{SERVER IP}:{SERVER PORT}/solr/dataimport?command=reload-config

With Cores,

http://{SERVER IP}:{SERVER PORT}/solr/{CORE NAME}/dataimport?command=reload-config

You can even abort the import, Without Cores,

http://{SERVER IP}:{SERVER PORT}/solr/dataimport?command=abort

With Cores,

http://{SERVER IP}:{SERVER PORT}/solr/{CORE NAME}/dataimport?command=abort

Please provide your comments if you find this useful or if you come across any issues.

[Updated on August 13, 2012] Error Debugging: If you come across the following error when configuring the SOLR with mysql,

Caused by: java.sql.SQLException: Illegal value for setFetchSize().

You can resolve the issue by adding batchSize=”-1” to your data source declaration like shown below.

<dataConfig>
 <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                     url="jdbc:mysql://{DB_HOST}/{DB_NAME}" 
                        user="{DB_USER}" password="{DB_PASS}" batchSize="-1"/>
 <document>
     <entity name="id" 
          query="select col1,col2,col3 from mytable">
        <field column="col1" name="solr_field1"/>
        <field column="col2" name="solr_field2"/>
        <field column="col3" name="solr_field3"/>
     </entity>
 </document>
</dataConfig>