Static searches with Drupal and Lunr
First, I needed to figure out how to get data from Drupal into Lunr. Existing Drupal search modules like Search API provide a lot of customization for indexing, but they also rely on a Drupal backend being available at runtime. It felt awkward to throw users into the Search API ecosystem with the caveat that most of the functionality they get using core or Solr search would not be available. I eventually landed on using Views, with a custom display, row, and style plugin that output JSON in a very similar way to the “REST export” plugin. Views has so much built in for data transformation that it seemed like a natural fit for this use case, and it was already accessible to site builders.
The Views plugins I wrote output pages of arrays of objects, where every object property is either a field to index in Lunr or a field to be displayed to end users. A typical setup may index the title and body fields, then have an html field that contains the rendered HTML of an entity view mode. A ref field is automatically added in the format page:index, so that when Lunr returns search results there’s a unique key to use to retrieve the display field.
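To make the row format a little more concrete, here is a minimal TypeScript sketch of one such row and of the page:index ref. The interface name, the helper names, and the assumption that both parts of the ref are non-negative integers are mine for illustration; the module's real output may differ.

```typescript
// Hypothetical shape of one row, using the example fields named above.
interface LunrRow {
  ref: string;   // "page:index", added automatically
  title: string; // indexed field
  body: string;  // indexed field
  html: string;  // display field: rendered HTML of an entity view mode
}

// Build a ref from a page number and a row position within that page.
function makeRef(page: number, index: number): string {
  return `${page}:${index}`;
}

// Split a ref back into its parts; throws on anything that is not "digits:digits".
function parseRef(ref: string): { page: number; index: number } {
  const match = /^(\d+):(\d+)$/.exec(ref);
  if (match === null) {
    throw new Error(`Malformed ref: ${ref}`);
  }
  return { page: Number(match[1]), index: Number(match[2]) };
}

// Example row as it might appear on page 2, position 5 (values are made up).
const exampleRow: LunrRow = {
  ref: makeRef(2, 5),
  title: "Hello world",
  body: "Plain text that gets indexed.",
  html: "<article><h2>Hello world</h2></article>",
};
```

Because the ref encodes the page, a client that receives a ref from Lunr knows which display file to fetch and which entry in it to read.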
With the data source figured out, it’s time to index our content. In a simple Lunr setup, the client will load all the source documents, create an index, and then perform a search against that index every time a search session is started. For a simple implementation with few documents this could be acceptable, but since this is all happening on the client it doesn’t scale far. Luckily, Lunr allows indexes to be pre-built and exported to JSON, then loaded in the client without needing the client to load any source documents.
The settings required to configure the indexing behavior needed to be stored somewhere, so I created a custom entity type that is a combination of what would normally be separate server, index, and view entities. This custom entity, “Lunr search”, has a form that looks like this:
Now that the pieces were in place, it was time to write the real indexing behavior. Indexing currently works like this:
- Indexing is initiated by an administrator in the UI.
- A Lunr Builder instance is created with the settings from the Lunr search entity.
- Every view row is indexed as a Lunr document.
- The configured “Display field” is stored in a separate array of objects, which stores the ref field and the display field.
- When a page is finished, the display document is uploaded to the server.
- Additional requests are made to page through the view.
- When there are no other pages to index, the builder exports JSON that is uploaded to the server.
- If there are any other installed languages, the entire process is repeated for each language. This allows for language specific indexes to be loaded by the client.
If it sounds complicated, it’s because it really is! But having Drupal do this work beforehand means clients only have to load the compiled index to start searching.
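The per-page part of the indexing steps above can be sketched roughly as follows. This is not the module's actual code: `splitPage`, its parameters, and the `display` property name are hypothetical, and the sketch only shows how one page of view rows could be separated into documents for the index and entries for the display document.

```typescript
type Row = Record<string, string>;

interface DisplayEntry {
  ref: string;
  display: string;
}

interface SplitPage {
  indexDocs: Row[];              // what would be handed to the Lunr builder
  displayEntries: DisplayEntry[]; // what would be uploaded as the display document
}

// Separate one page of view rows into indexable documents and display entries.
function splitPage(
  page: number,
  rows: Row[],
  indexFields: string[],
  displayField: string,
): SplitPage {
  const indexDocs: Row[] = [];
  const displayEntries: DisplayEntry[] = [];
  rows.forEach((row, i) => {
    const ref = `${page}:${i}`;
    // Only the ref and the configured index fields go into the search index.
    const doc: Row = { ref };
    for (const field of indexFields) {
      doc[field] = row[field] ?? "";
    }
    indexDocs.push(doc);
    // The display field is kept out of the index and stored next to its ref.
    displayEntries.push({ ref, display: row[displayField] ?? "" });
  });
  return { indexDocs, displayEntries };
}
```

In a real run, each entry in `indexDocs` would be passed to the Lunr builder's `add` method, and the loop would repeat for every page of the view before the finished index is exported as JSON.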
But what about big indexes? A few people had asked me about performance and scale, and all my tests in the 100-1000 document range had worked great, but I wanted to see what practical limits Lunr had. I spun up a site and created 10,000 nodes, which indexed fine, but after the client downloaded the huge 35 MB index the main thread would noticeably lock up for a few seconds. Once loaded, searches were surprisingly fast, but the main performance issue was that first interaction.
With all the deep details out of the way, here’s how searching currently works:
- The user loads the Lunr search page.
- A web worker is created.
- The web worker loads the Lunr library, and makes an AJAX request for the index file and loads it.
- The user enters a search query, or a query string is provided.
- The window history is updated as well as the URL.
- A message is sent to the web worker.
- The worker uses Lunr to perform the query.
- If any field searches (aka facets) are present, another query is performed and the results are merged.
- The worker sends a message back to the client with the search results, which are just an array of reference (ref) IDs.
- The client makes additional AJAX calls based on the results to get the user-facing HTML from the display document JSON. Typically this is fewer than one call per result, as many results may be in the same document.
- The results are displayed to the user, along with pagers if needed.
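The reason the display lookups usually take fewer than one request per result is that refs sharing a page can be served by a single display document. Here is a small sketch of that grouping step; `groupRefsByPage` is a hypothetical helper of mine, not a function from the module, and it assumes refs follow the page:index format.

```typescript
// Group result refs by page so each display document is requested only once.
// Returns a map from page number to the row positions wanted from that page.
function groupRefsByPage(refs: string[]): Map<number, number[]> {
  const byPage = new Map<number, number[]>();
  for (const ref of refs) {
    const match = /^(\d+):(\d+)$/.exec(ref);
    if (match === null) {
      continue; // skip anything that is not in "page:index" form
    }
    const page = Number(match[1]);
    const indexes = byPage.get(page) ?? [];
    indexes.push(Number(match[2]));
    byPage.set(page, indexes);
  }
  return byPage;
}
```

For example, the refs 0:3, 2:1, and 0:7 need only two requests, one for page 0 and one for page 2. The order of results still has to come from the original ref array, since the map only says what to fetch.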
Phew! I feel like writing all this out makes the Drupal integration seem really complex, but I think that’s just because I’ve tried to provide something that’s actually viable as a replacement for core search. Running a Drupal static site means losing the backend, and it’s easy to take out-of-the-box features like search for granted. I’m just glad that this project worked out and is usable - now there’s an actual answer for “How do you do static search?” that doesn’t involve me hand-waving!
In conclusion, the Drupal Lunr integration is ready for testing and use - the project is called “Lunr search” and can be downloaded here. There are no composer dependencies and a default search is provided out of the box, so if you enable it and kick off indexing at "/admin/config/lunr_search/default/index", you should be able to do a test search at /search. I know I’ve talked a bit about static sites in this post, but I think the Lunr module is useful for all sorts of sites. Having your search be static means fewer uncacheable hits to your backend, and potentially a faster experience for users, so please try it out if you’re interested!