Sunspot gem does not have multibyte string support?

Wojciech Ziniewicz
Stories imported from wordpress
Jun 10, 2013

I have an application that uses Sunspot with Solr as its search backend. It runs in two environments:

Production:

  • Solr version 3.6.1 instance on Tomcat
  • Tomcat 6 from debian repos (squeeze)
  • Java OpenJDK Runtime Environment (IcedTea6 1.8.13) (6b18–1.8.13–0+squeeze2)
  • rsolr (~> 1.0.7) gem
  • sunspot (1.3.3) gem

Development:

  • the sunspot bundled gem, for developers who want their own instance of the app, set up by:

I was reindexing 174,770 rows of a fairly complex multi-field Rails model (along with its relations, which are similarly complex):

[code language="ruby"]
searchable do
  text :number, :client_order_number, :customer_id, :id, :created_at, :updated_at, :type, :workflow_state, :status, :stock_status, :email
  text :boolean_fields, :customer_name
  text :human_status
  text :email_segments
  integer :number
  integer :priority_id
  integer :customer_id
  integer :id
  integer :status_order
  integer :digitizing_team_id
  string :email
  string :client_order_number
  string :type
  string :workflow_state
  string :status
  string :customer_dan_email_address
  string :asm_email_address
  string :stock_status
  string :customer_name
  time :created_at
  time :updated_at
  date :completed_date
  boolean :archived
  boolean :sent_to_digitizer
  string :inner_status
end
[/code]

...and apparently some non-standard or multibyte encoding in the data caused this error:

[code language="html"]
RSolr::Error::Http - 400 Bad Request
Error: <html><head><title>Apache Tomcat/6.0.35 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 400 - Unexpected end of input block in end tag
at [row,col {unknown-source}]: [1,52642]</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>Unexpected end of input block in end tag
at [row,col {unknown-source}]: [1,52642]</u></p><p><b>description</b> <u>The request sent by the client was syntactically incorrect (Unexpected end of input block in end tag
at [row,col {unknown-source}]: [1,52642]).</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.35</h3></body></html>
[/code]
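The position in the message ([1,52642]) means Solr's XML parser reached the end of the request body while still inside a closing tag, so the document arrived truncated. My guess (an assumption, not something I traced through the RSolr or Net::HTTP source) is that the body length was counted in characters rather than bytes somewhere along the way, which only goes wrong once multibyte characters are present. The difference is easy to see in plain Ruby. The sample string is just an illustration, and `ascii_only?` is a cheap way to spot which values carry such characters:

[code language="ruby"]
# encoding: utf-8
# Illustrative sample only: a UTF-8 string containing Polish (multibyte) letters
name = "Zażółć gęślą jaźń"

# String#length counts characters, String#bytesize counts bytes
p name.length     # => 17
p name.bytesize   # => 26 (each of the 9 Polish letters takes 2 bytes in UTF-8)

# Valid UTF-8, but not plain ASCII: the case where the two counts diverge
p name.valid_encoding?  # => true
p name.ascii_only?      # => false

# force_encoding only relabels the string: the bytes stay the same,
# but length now counts bytes, so it agrees with bytesize
binary = name.dup.force_encoding(Encoding::BINARY)
p binary.length == binary.bytesize  # => true
p binary.bytes == name.bytes        # => true
[/code]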

So the solution was to create an initializer (for example config/initializers/rsolr.rb) that wraps RSolr::Connection#execute using alias_method_chain and forces the request body to BINARY (ASCII-8BIT) encoding before it is sent. Voila:

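[code language="ruby"]
require 'rsolr'

# Force the raw request body to BINARY (ASCII-8BIT) before RSolr sends it to Solr
class RSolr::Connection
  def execute_with_binary_encoding(client, request_context)
    # force_encoding relabels the string in place; the bytes themselves are unchanged
    request_context[:data] = request_context[:data].force_encoding(Encoding::BINARY) if request_context[:data]
    execute_without_binary_encoding(client, request_context)
  end

  # ActiveSupport: #execute now calls #execute_with_binary_encoding,
  # and the original method stays reachable as #execute_without_binary_encoding
  alias_method_chain :execute, :binary_encoding
end
[/code]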
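alias_method_chain is an ActiveSupport helper that was deprecated in Rails 5.0 and removed in 5.1, so on newer stacks the same wrapper can be written with Ruby's Module#prepend. This is a hedged alternative sketch, not the patch I actually ran: the module name BinaryRequestBody and the DemoConnection stand-in are hypothetical, and it assumes RSolr::Connection#execute keeps the (client, request_context) signature used in the patch above.

[code language="ruby"]
# encoding: utf-8
# Hypothetical module name; wraps an #execute(client, request_context) method
module BinaryRequestBody
  def execute(client, request_context)
    data = request_context[:data]
    # String#b (Ruby 2.0+) returns a BINARY copy and leaves the caller's string untouched
    request_context[:data] = data.b if data.is_a?(String)
    super
  end
end

# Hypothetical stand-in for RSolr::Connection so the sketch runs without the gem;
# it simply returns the body it would have sent.
class DemoConnection
  prepend BinaryRequestBody

  def execute(_client, request_context)
    request_context[:data]
  end
end

# In the Rails initializer this would become (Ruby 2.1+, untested assumption):
#   RSolr::Connection.prepend(BinaryRequestBody)

body = '<add><doc><field name="customer_name">Łódź</field></doc></add>'
sent = DemoConnection.new.execute(nil, { data: body })
p sent.encoding == Encoding::BINARY  # => true
p sent.bytes == body.bytes           # => true
p body.encoding                      # => #<Encoding:UTF-8>
[/code]

Using String#b instead of calling force_encoding on the body in place also avoids mutating, or failing on, a frozen body string.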
