Understanding the ApacheSolr CCK API

In this article I will show you how a tiny bit of code can expose new fields and facets for searching with the ApacheSolr module and Acquia Search. Using Acquia Drupal, we’ll write an example module that takes the file type from CCK file and image fields and turns it into its own search field. As a result, we can filter search results by file type. This covers the case where you want, for example, to find a specific post that has a JPEG image, or all of the posts with PDFs that match a particular keyword.

To start you may want to download the PDF file of screenshots that trace all of the steps I took to set up Acquia Drupal, Acquia Search, and the custom module. The broad steps are to:

  1. Sign up for a free trial.
  2. Download and install Acquia Drupal.
  3. Follow along with the code examples below to create the example.module.
  4. Set up a content type to have image and file fields. Create some content and upload a variety of files.
  5. Run cron to make sure your content has been indexed.
  6. Enable the new filters and blocks that example.module creates.
  7. Search!

What are facets?

The ApacheSolr module is a revolution in Drupal search. It allows you to search for the most general keyword that applies to what you’re looking for, and then use the provided facet links to drill down to exactly the right content. An example would be searching for “Drupal search” on Drupal.org, then filtering by project and robertDouglass to get just the modules that I have written that deal with search. Facets, in other words, are the better version of advanced search forms, which, to be honest, suck.

What facets are available?

By default, ApacheSolr makes facets available for content type, author, language, taxonomy terms, and all CCK fields that are text fields with option widgets (select, radio, checkbox). This article assumes that you want some different facets, and you’re using CCK fields. The file field and image field both have some interesting information that would make a great facet - their file type. Every file you upload has a distinct type: pdf, png, gif, tiff, doc, and so forth. Wouldn’t it be nice to have this available as a facet? Of course!

What needs to happen to add new facets?

The ApacheSolr module comes equipped with an API for extending what gets indexed and how searching works. One of the important hooks in this API is hook_apachesolr_cck_field_mappings(). We’re going to write a module that implements this hook, and use it to tell ApacheSolr how to make facets out of the file type on file and image fields.

To do this, the hook only has to tell ApacheSolr three things:

  1. What data type should be used in the index.
  2. Which CCK widget types to look for during indexing.
  3. A callback function to use for extracting the data from the CCK field. We write this function ourselves.

The callback function that we write will then receive each node and each field name as they are being indexed. From that it must extract or generate whatever information interests us. In this case we’re just extracting the file type, which is already present in the field. We could, however, return any amount of data doing any arbitrary processing that we care to. See the code example below to understand the structure of the array that the callback has to return.
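As one illustration of "arbitrary processing", here is a minimal sketch of a callback that indexes the lowercased file extension instead of the stored file type. It is not part of the example module: the function name example_extension_callback(), the field name field_files, and the assumption that each field item carries a 'filename' key are all hypothetical. The check_plain() stand-in is there only so the sketch runs outside Drupal. The return value follows the same contract as the module's own callback, an array of arrays each keyed 'safe'.

```php
<?php
// Stand-in for Drupal's check_plain() so this sketch runs outside Drupal.
if (!function_exists('check_plain')) {
  function check_plain($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
  }
}

/**
 * Hypothetical alternative callback: index the lowercased file extension.
 *
 * Assumption: each field item is an array with a 'filename' key.
 */
function example_extension_callback($node, $fieldname) {
  $fields = array();
  // A node without this field, or with an empty one, yields no values.
  $items = (isset($node->$fieldname) && is_array($node->$fieldname)) ? $node->$fieldname : array();
  foreach ($items as $field) {
    if (empty($field['filename'])) {
      continue;
    }
    $extension = strtolower(pathinfo($field['filename'], PATHINFO_EXTENSION));
    // Skip files that have no extension at all.
    if ($extension !== '') {
      $fields[] = array('safe' => check_plain($extension));
    }
  }
  return $fields;
}

// Mock node with a hypothetical field name, for demonstration only.
$node = new stdClass();
$node->field_files = array(
  array('filename' => 'Report.PDF'),
  array('filename' => 'photo.jpeg'),
  array('filename' => 'README'),
);
$extensions = example_extension_callback($node, 'field_files');
print_r($extensions);
```

To try something like this in a real module, the 'callback' entry of the mapping would name this function instead of the one used in the example module.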

The example module implementing hook_apachesolr_cck_field_mappings()

The first step in writing any module is to create an .info file. Here’s ours:

; file example/example.info
name = example
description = Example module showing custom CCK facets.
core = 6.x

The next step is to create the module file. This is the example/example.module file:

<?php
/**
 * Implementation of hook_apachesolr_cck_field_mappings().
 */
function example_apachesolr_cck_field_mappings() {
  $mappings = array();
  // 'filefield' is the CCK field_type. Correlates to $field['field_type'].
  $mappings['filefield'] = array(
    // The callback function gets called at indexing time to get the values.
    'callback' => 'example_callback',
    // Common types are 'text', 'string', 'integer', 'double', 'float', 'date', 'boolean'.
    'index_type' => 'string',
    // These are the CCK widgets to which this mapping applies.
    // If we wanted to target images but not generic files, for example,
    // we could say 'filefield_widget' => FALSE.
    'widget_types' => array('filefield_widget' => TRUE, 'imagefield_widget' => TRUE),
  );
  return $mappings;
}

/**
 * A function that gets called during indexing.
 *
 * @param $node
 *   The current node being indexed.
 * @param $fieldname
 *   The name of the current field being indexed.
 *
 * @return
 *   An array of arrays. Each inner array is a value, and must be
 *   keyed 'safe' => $value.
 */
function example_callback($node, $fieldname) {
  $fields = array();
  foreach ($node->$fieldname as $field) {
    // In this case we are indexing the filemime type. While this technically
    // makes it possible that we could search for nodes based on the mime type
    // of their file fields, the real purpose is to have facet blocks during
    // searching.
    $fields[] = array('safe' => check_plain($field['filemime']));
  }
  return $fields;
}
?>
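The comment on 'widget_types' mentions targeting images but not generic files. A minimal sketch of that variant follows. It is meant as a replacement for the function of the same name in example.module, not an addition to it, and it relies on the 'filefield_widget' => FALSE behaviour exactly as that comment describes it; I have not verified it beyond that.

```php
<?php
/**
 * Variant of example_apachesolr_cck_field_mappings() for image fields only.
 *
 * Assumption: setting a widget type to FALSE excludes it, as the comment in
 * the original mapping suggests.
 */
function example_apachesolr_cck_field_mappings() {
  $mappings = array();
  $mappings['filefield'] = array(
    // Same callback and index type as the original mapping.
    'callback' => 'example_callback',
    'index_type' => 'string',
    // Generic file widgets are switched off; image widgets stay on.
    'widget_types' => array('filefield_widget' => FALSE, 'imagefield_widget' => TRUE),
  );
  return $mappings;
}

$mappings = example_apachesolr_cck_field_mappings();
print_r($mappings);
```

Since the mapping is consulted during indexing, existing content would presumably need to be reindexed before a change like this shows up in the facets.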

The example_apachesolr_cck_field_mappings() function returns an array that says “for any filefield CCK fields (this includes imagefields), use the function example_callback() while indexing, store the data as strings, and apply these instructions to filefield_widgets and imagefield_widgets”. The example_callback() function, which we named as the callback, gets called with the $node and $fieldname during indexing. We use that information to dig around and get the file type, which is found in $field['filemime']. Important note: the return value of the callback is an array of arrays. Each inner array has one key, 'safe', and that key’s value is the actual value we want indexed and used for faceting. The key is named 'safe' to remind you, as the developer, to sanitize the value and not allow any cross-site scripting.
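To make the shape of that return value concrete, here is a small standalone sketch that runs the callback against a mock node. The field name field_attachment and the sample values are hypothetical, and check_plain() is replaced by a stand-in so the script runs outside Drupal. The third item shows why the key is called 'safe': markup in the value comes back escaped.

```php
<?php
// Stand-in for Drupal's check_plain() so this sketch runs outside Drupal.
if (!function_exists('check_plain')) {
  function check_plain($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
  }
}

// Same logic as example_callback() in example.module.
function example_callback($node, $fieldname) {
  $fields = array();
  foreach ($node->$fieldname as $field) {
    $fields[] = array('safe' => check_plain($field['filemime']));
  }
  return $fields;
}

// Mock node; the field name and values are made up for illustration.
$node = new stdClass();
$node->field_attachment = array(
  array('filemime' => 'application/pdf'),
  array('filemime' => 'image/jpeg'),
  array('filemime' => '<b>x</b>'),
);

$values = example_callback($node, 'field_attachment');
print_r($values);
```

Each inner array holds exactly one sanitized value, so a field with three files produces three inner arrays.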

Below are screenshots of the return values of each of these functions for those who learn visually.

[Screenshot: mappings]

[Screenshot: filetype facets]

Results

Now when we search, two new facet blocks are available that let us drill down into the search results based on the type of files uploaded to each post. Not bad for 15 lines of code (excluding comments)!

[Screenshot: Searching using file type facets]

Attachment: apachesolr_cck.pdf (2.97 MB)
