I am a Permanent Member (or Community Ambassador) of the Drupal Association.
Drupal's file handling capabilities keep getting better. Beyond the core upload module, the filefield module for CCK has enabled us to build sites with all sorts of files: documents, images, music, videos, and so forth. Searching within these documents, however, has never been a common feature on Drupal sites. Some solutions have existed, particularly for extracting text from PDFs and common word processing documents. With Apache Solr, the attachments module, and an extension library called Tika, things can be much better. With Tika you can not only extract text from Microsoft Office, Open Office, and PDF documents, but also get text and metadata from images, songs, Flash movies, and zipped archives. Searching for this text is done as part of the normal Apache Solr driven site search.
How refreshing it is when the US Government does something so right that my chest swells with pride and my heart fills with hope! How exciting it is that Drupal and Acquia can play a role in bringing openness to government. I’m referring to the Open Government Directive (OGD), an instruction from the President of the United States to all federal agencies to drastically change the way that government talks to and shares information with the public.
The OGD is predicated on three principles: transparency, participation, and collaboration. Here I quote from the President’s website:
Transparency promotes accountability by providing the public with information about what the Government is doing. Participation allows members of the public to contribute ideas and expertise so that their government can make policies with the benefit of information that is widely dispersed in society. Collaboration improves the effectiveness of Government by encouraging partnerships and cooperation within the Federal Government, across levels of government, and between the Government and private institutions.
On top of these principles is a clarification of the baseline position that government agencies are to adopt, in particular regarding the Freedom of Information Act (FOIA). Government agencies are instructed to presume openness and disclosure.
These instructions also come with teeth. There are concrete steps to be taken, and deadlines. Hard deadlines that are coming up fast. Each agency was given 45 days to identify three sets of data that had never been released before, and to make them available on Data.gov. In the same time agencies are to appoint a high-level senior official to oversee the “quality and objectivity of, and internal controls over, the Federal spending information publicly disseminated through such public venues as USAspending.gov or other similar websites.” Those are big changes in a short time.
Within 60 days, agencies are to go even further, and launch portals on their websites, http://www.[agency].gov/open, which will keep the public informed of all their activity and efforts pertaining to the OGD. This also pertains to information and policies regarding the FOIA. There are also 60 and 120 day deadlines for the communication of frameworks for making Open Government systemic to the very DNA of government agencies. Changes include a “longer-term comprehensive strategy for Federal spending transparency”.
One particular goal, to be met in the first 45 days, summarizes the attitude towards technology and public participation:
[provide] a forum to share best practices on innovative ideas to promote participation and collaboration, including how to experiment with new technologies, take advantage of the expertise and insight of people both inside and outside the Federal Government, and form high-impact collaborations with researchers, the private sector, and civil society.
This is essentially saying that the government wants to collaborate with the public on the difficult issues facing us, and that modern technologies and tools (such as Drupal’s social publishing and collaboration tools) should be used where appropriate if they further the goal of fostering collaboration.
All of this is like fresh air. I work with open source software specifically because I believe in the value of transparency, participation, and collaboration. The Drupal project is a shining example of what humans can achieve when they work together in this way. I am also thrilled that Acquia is already deeply involved in helping government agencies realize these goals, and that the tool that many are looking to as an Open Government Directive enabler is Drupal. Acquia has a new Government JumpStart program, a whitepaper on Social Publishing for Government, and an exciting partnership with Carahsoft to guarantee that we can meet the urgent needs of agencies in the throes of change.
Read more about Acquia’s OGD offerings and partnerships. Dana Blankenhorn covered Acquia, Drupal and OGD on ZDNet. Dries and Kieran have both written about Drupal and OGD on Acquia.com.
Edit: I totally had the dates wrong for the Drupal Developer Days in Munich - they’re May 5-7 :P
The Drupal-Initiative e.V. is a German non-profit organization dedicated to driving the growth of Drupal in German-speaking countries. Yesterday was an important milestone for this group as we had our first ever planning sprint that was open to the public. The sprint was in Essen, in a neat location called the Unperfekthaus (the imperfect house), and we had 17 people from across Germany who came to help us plan, divide the work, take on responsibility, and breathe life into an organization which has until this point been the work of 5-6 individuals.
The history of the Drupal-Initiative starts in mid 2008 when members of the Cologne-Bonn Drupal Users Group decided to host a DrupalCamp in Cologne. We knew that we’d need some structures in place, such as a bank account, a website, and so forth. We also had our eyes on a bid to host the international DrupalCon in Germany at some point, so we thought that building a strong, formal entity, such as a non-profit organization, was the best way to achieve these goals.
Some months later, in January, 2009, we hosted DrupalCamp Cologne which saw over 200 people show up - a great success. Shortly after that we actually finished the legal proceedings, including becoming an e.V. (eingetragener Verein), and securing a license from Dries Buytaert to use the word “Drupal” in our name. Eventually we launched a membership program and have been quietly growing, with both individual and corporate memberships, ever since.
The planning sprint in Essen was the first time that we’d set out to involve our current members beyond the founding core, and to raise the level of awareness about what we do, and most importantly, how people can help. Some of the results of the sprint include:
Special thanks to Daniel Niehaus for taking the lead with organizing the sprint. When you meet Daniel, make sure to ask him why his nickname is Jack Plain - it’s a worthy story =)
Drush and Drush Make belong in every Drupal developer's toolkit. This is a make file that will build the following:
This demonstrates four different methods for downloading code, from three different sources.
To execute, install Drush, install Drush Make, then run the attached file like this:
drush make search.make
To specify an output directory, add the path as an extra parameter:
drush make search.make /var/log/www
To see what's going on, use the -v flag for verbose output:
drush -v make search.make
Here's the content of the make file:
core = "6.x"
projects[] = "drupal"
projects[] = "acquia_connector"
projects[apachesolr][download][type] = "cvs"
projects[apachesolr][download][module] = "contributions/modules/apachesolr"
projects[apachesolr][download][revision] = "DRUPAL-6--2"
projects[acquia_search][type] = "module"
projects[acquia_search][download][type] = "svn"
projects[acquia_search][download][url] = "https://svn.acquia.com/drupal/branches/1.x-6.x/modules/acquia/acquia_search/"
libraries[SolrPhpClient][download][type] = "get"
libraries[SolrPhpClient][download][url] = "http://solr-php-client.googlecode.com/files/SolrPhpClient.r22.2009-11-09.tgz"
libraries[SolrPhpClient][destination] = "modules/apachesolr"
On October 22, 2009, I gave a keynote presentation at a digital marketing conference in Brussels. After my speech I was interviewed by the organizers of the conference. Here is the video. Note that my grandmother is actually 88, and really is on Facebook.
This month's issue of the print magazine and website PHP User (in German) features an article on Drupal's CCK and Views modules. The article was written by Meinolf Droste of MDWP, an Acquia Silver Partner.
There are at least four magazines in Germany that sometimes feature Drupal in print articles. This puts Drupal onto the news stands in kiosks, grocery stores, and train stations throughout Germany. It's nice to see Acquia Partners taking such an active role in promoting Drupal. Great work!
Yesterday I gave a keynote presentation at the Digital Marketing First 09 trade show in Brussels, Belgium. Drupal was out in full force with four Belgian companies joining forces to make the conference a special Drupal-themed event. There were also a number of other companies present who are using Drupal.
To prepare for the event I made a micro-site that focuses on Drupal and interactive digital marketing (the theme of the conference). It features a directory of the companies that were present and some case studies about how Drupal is the ultimate integration platform for anyone who offers an online service or tool.
The DrupalVillage.be companies that were present:
Also present were:
Special thanks to ICanLocalize.com for translating a portion of the micro-site content into Dutch and French.
My presentation was very well attended. The slides are below. The takeaway for me was that Drupal is a great tool for people doing digital marketing, and Drupal people should be attending marketing conferences (and vice versa).
See the update at the bottom!
Drupal’s pagers are neat, and when they were first developed, were way ahead of their time. They also have a couple problems. One of them is scalability. When you’ve got 10,000,000 somethings, calculating how many pages there are so that you can skip to the last one is time consuming.
Another limitation is that the pager is designed to page over a database query. The Apache Solr Search module uses Drupal pagers to move through pages of search results that come from Solr. The pseudo code for getting this to work looks like this:
<?php
// What result do we want to start on?
$offset = $page * $number_per_page;
// How many search results are there in total?
$total = $result->get_total();
// Send a very simple database query to the pager system to trick it.
// The second argument is the per-page limit, not the offset.
pager_query("SELECT %d", $number_per_page, 0, NULL, $total);
// Magic happens here. A pager appears.
$output .= theme('pager');
?>
That’s great! But I recently had a case where it was impossible to tell how many results there are in the total set. What is really needed is the ability to advance the pager until there aren’t any more pages, but Drupal doesn’t support anything like this by default. Twitter does it, but Drupal… meh. It’s sneaky time!
<?php
// Remember: http://is.gd/3Sf9Z
$total = 0;
// Note that $page is zero based (page zero is the first page).
// Note also that count($results) is just one page's worth of results,
// not the entire possible set (which is impossible to calculate).
// If there are fewer results than what we want to show per page,
// we know we've come to the end of the result set, and don't need
// to show any more pages.
if (count($results) < $number_per_page) {
  $total = $number_per_page * ($page + 1);
}
// Otherwise, we want to tell the pager to give us yet another
// page to go to.
else {
  $total = $number_per_page * ($page + 2);
}
// Now the pager will either end where we are, or add one
// more page to the end. This way you can keep advancing one
// more page until there are no more results left.
pager_query("SELECT %d", $number_per_page, 0, NULL, $total);
$output .= theme('pager');
?>
This strategy could be applied to both of the problem cases I mentioned above. If you have a HUGE result set and need a pager, and don’t want to destroy your database, this is a viable technique. It also works if you’re getting your results from a source that can’t tell you how many results there are in total. And it’s sneaky. Enjoy.
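The arithmetic behind the trick can be pulled out into a small helper. This is only a minimal sketch: the function name sneaky_pager_total() is hypothetical (it is not part of Drupal or the Apache Solr module), and it assumes, as above, that $page is zero-based and that the result count covers the current page only.

```php
<?php
/**
 * Hypothetical helper: returns the fake "total" to hand to the pager.
 *
 * @param int $result_count    Results returned for the current page only.
 * @param int $number_per_page Results requested per page.
 * @param int $page            Zero-based current page number.
 */
function sneaky_pager_total($result_count, $number_per_page, $page) {
  // A short page means we have reached the end: stop at the current page.
  if ($result_count < $number_per_page) {
    return $number_per_page * ($page + 1);
  }
  // A full page means there may be more: advertise one extra page.
  return $number_per_page * ($page + 2);
}
```

With 10 results per page, a full third page (page 2) yields 40, so the pager offers a fourth page; a third page with only 7 results yields 30, so the pager ends where you are. One consequence worth knowing: if the real total happens to be an exact multiple of the page size, the last full page still advertises a next page, and that next page comes back empty.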
Update:
Instead of using a database query to manipulate the pager you can manipulate the globals instead:
$GLOBALS['pager_page_array'][] = 1; // what page you are on
$GLOBALS['pager_total'][] = 3; // total number of pages
$items_per_page = 50;
print theme('pager', NULL, $items_per_page);
Thanks Chx for the tip!
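The same "one more page until the results run out" idea can be expressed in terms of these globals. The following is a sketch under assumptions rather than tested Drupal code: open_ended_pager_state() is a hypothetical name, and the assignment to element 0 of the globals is shown only in comments because it needs a running Drupal 6 site.

```php
<?php
/**
 * Hypothetical helper: pager state for a result set of unknown size.
 * Returns array(current zero-based page, total number of pages).
 */
function open_ended_pager_state($result_count, $items_per_page, $page) {
  // Offer one page beyond the current one, unless this page came back short.
  $has_more = $result_count >= $items_per_page;
  $total_pages = $page + ($has_more ? 2 : 1);
  return array($page, $total_pages);
}

list($current, $pages) = open_ended_pager_state(50, 50, 1);
// Inside Drupal 6 these values would then feed the pager globals:
// $GLOBALS['pager_page_array'][0] = $current;
// $GLOBALS['pager_total'][0] = $pages;
```

A full page of 50 results on page 1 gives a current page of 1 and 3 pages in total, which matches the hard-coded values in the snippet above.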
For the last six months, Scott Reynolds has been keeping a big juicy secret. As the maintainer of the Apache Solr Views module, he knows just how cool the future of Drupal Search is going to be. His module, based on an idea and code from Thomas Seidl, lets you make custom searches against the Solr index the same way you currently make views against the MySQL database. Want to build a search that just includes videos and MP3s, and renders the results as a playlist? Or how about a search that is limited to the current user's images, displayed in a slideshow? How about a block that shows the latest results that contain the phrase "badgers are the new pony"? Well, even if you didn't want a block like that, with Views 3 and Apache Solr Views, you can have it.
Thomas Seidl's brilliant idea was that Views should be able to build "queries" against any data source, not just databases. Earl Miles agreed, and inaugurated the Views 3 branch by committing the patch by Thomas (with great help from Jeff Miccolis and others). With Views 3 I predict you'll be able to build Views using data from Flickr, or from RDF databases using SPARQL, or from the local file system, or from any other data source that has an API.
To test it all out I used the Acquia Drupal Stack to create a new site (I just love the stack's multisite functionality!). I then signed up for a trial Acquia Network subscription because I wanted to get my hands on 30 days of free Acquia Search (it's easier than setting up Solr myself). I then downloaded Views 3 and Apache Solr (DRUPAL-6--2, just for fun. DRUPAL-6--1 works, too). I had to get the Apache Solr Views module from CVS (Scott, make a devel release!). I put these in sites/all/modules so that they'd override the versions in the Acquia Drupal Stack.
The CVS command for getting Apache Solr Views:
$ cvs -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib \
    co -d apachesolr_views contributions/modules/apachesolr_views
I installed Apache Solr manually which means I also needed to get the SolrPhpClient library. Since I have Drush, and since Apache Solr DRUPAL-6--2 has Drush integration, I did it like this:
$ drush solr phpclient
I <3 Drush!
I then used FeedAPI to grab all sorts of content from Planet Drupal. I could have just as well used Drush and the Devel module to generate some content, but lorem ipsum gets mighty boring. Finally I used Drush to run cron and even did a search (from the command line!) to check that the content was in the index.
$ drush cron
$ # wait a few minutes for the search index to commit the changes...
$ drush solr search drupal
node/175 by admin (user/1) title: Agile and Scrum Videos
This is likely to become a pretty big collection of videos about Scrum and other other Agile based managements processes. (Drupal 5, Drupal 6, Drupal 7, Drupal Planet, Drupal Video)
...
node/1 by admin (user/1) title: Welcome to your new Acquia Drupal website!
If you are new to Drupal, follow these steps to set up your web site in minutes: Step 1 ... , forums, polls, tags, comments, ratings, and more. Acquia Drupal comes with many modules to power social publishing capabilities on your site. Hundreds of additional Drupal 6.x compatible modules ...
Now for the good stuff. When you make a new view in Views 3 you get asked what data source to use. Here you can see that I use the Apache Solr search index as a data source.
Then I added some fields. These are not the same fields that are available to node based views. They are specific to the underlying data source.
I also added a sort so that the results would be displayed according to the search score (keyword relevance).
In order to make this view seem like a "search" screen, it needs a search box, right? You get that by adding a search filter and exposing it. I could add more filters, too, like a filter to limit it to just one content type.
This shouldn't just be a copy of the normal search screen. The results should look different. To that end I told Views to render the results in a table.
Since we want this to be a page view it needs a path, and I went ahead and stuck it in the menu as well.
Finally, I want to be able to use Apache Solr's facet blocks along with the view. This is a three-step process.
It tastes great! Feast your eyes on this marvelous search screen.
The keyword search and the facet block interact seamlessly.
An interesting point to note is that there are no database queries used in retrieving the data or displaying it. No complex views query with lots of joins, and no node_load() calls for displaying the results. This method of querying Solr is just as efficient as using the normal Apache Solr search module.
To my mind, Views 3 and Apache Solr Views are the future of Solr search for Drupal. Even though they are both in heavy development, you can try them out and enjoy the great control you have over your search experience. There are many more handlers that need writing, too, so jump into the Apache Solr Views issue queue and help out. Since it all works with Acquia Drupal and Acquia Search, you can easily get up and running using an Acquia subscription. Enjoy!