1. Refactor search node_rank with hook_node_rank scoring factors: Node module's content search allows four different runtime scoring factors, including keyword relevance, recency and number of comments. This patch replaces the hardcoded scoring factors with a hook that lets any module inject similar scoring factors.
2. Path module should add URL alias to update index in nodeapi: Currently, the URL alias of a node plays no role in the keyword relevancy of Drupal search. This might be the #1 reason Google still beats us at searching Drupal.org.
3. Add spelling suggestions to the "no search results found" page: When no search results are found, this patch compares the query against the words in the search index and uses the Levenshtein algorithm to suggest a spelling that might be what you intended to search for.
4. Patch To Add User Profile Search: User search is useless in its current form. Being able to search user profiles would be a huge step forward in making search/user into something special.
5. search_index hardcodes boosts to HTML elements; should be configurable: <h1> gets 25 points, <a> gets 10, and <em> tags get 3. Wouldn't it be nice if this were configurable?
6. Exclude node types from search index: Sometimes you don't want certain content types to be indexed. This adds an administration configuration for that case.
7. Optional Exclusion of Taxonomy Vocabulary from Advanced Search: For those of you with HUGE taxonomy vocabularies, this will make the advanced search form usable again.
8. Indexing options for taxonomy: Administrator gains the ability to say how strongly taxonomy terms should weigh in the indexing process. Synonym support included.
9. Add scoring factor controls to advanced search form: The administrator can adjust runtime scoring factors on the site configuration -> search page. Why not let the end user decide how important each scoring factor should be by using the advanced search form?
10. Fix search index link handling for non-existent nodes: Esoteric bug with an RTBC patch (waiting for just one more review) that fixes the case when someone links to a not-yet-created node.
11. Showing result count and result range in search results: This is really a feature request for the pager. Why don't we have something like "Showing 10-20 from 500 results" on our search pages?
12. Replace "blue smurf" in no search results message: The quintessential bike shed argument. What two words should replace "blue smurf"?
Please help review and refine these patches. All of them need SimpleTest cases written as well.
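To make the idea in #3 concrete, here is a minimal sketch of a Levenshtein-based spelling suggestion. It is written in Python purely for illustration and is not the code from the patch; the function names `levenshtein` and `suggest`, the `max_distance` cutoff of 2, and the in-memory word list standing in for the search index are all assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def suggest(word, indexed_words, max_distance=2):
    """Return the closest indexed word within max_distance, or None.

    Exact matches are skipped: if the word were in the index,
    the search would not have come back empty.
    """
    best, best_distance = None, max_distance + 1
    for candidate in indexed_words:
        distance = levenshtein(word.lower(), candidate.lower())
        if 0 < distance < best_distance:
            best, best_distance = candidate, distance
    return best


if __name__ == "__main__":
    index = ["widgets", "drupal", "taxonomy"]  # hypothetical indexed words
    print(suggest("widgts", index))
```

Scanning every indexed word is fine for a sketch; a real index would probably need to narrow the candidates first, for example by word length or first letter.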





Re: Path module should add URL alias to update index in nodeapi
Thanks for your great work on improving search. I wasn't even aware of the patch that adds spelling suggestions to the "no search results found" page. Nice work!
I'm curious:
I'm unsure how this is relevant since very few of the nodes on drupal.org even have URL aliases. Can you explain?
Maybe "among the top reasons" would be more accurate
There are about 3,400 aliases on drupal.org, with a lot of them being project names. I think Google pays close attention to the aliases. More research is needed to prove my hunch, though.
-Rob
That would be interesting to
It would be interesting to learn more about #2. I would love to manage the Google crawler more efficiently. Uploading a sitemap to them certainly helps smaller sites minimize confusion about URL aliases; having a solution that does this on large sites would be a great step forward.
Re the H1, H2, etc. [#5 on your list]: I'm not sure it makes sense to implement that, since we are talking about a moving target here. If anything, it should be left to the editor. But again, keeping such a feature up to date would be time-consuming, not for the developer, but in keeping track of what Google's algorithms are doing in their weighting of H1, H2, etc.
Aliases in the index
#2 refers to the fact that when node/42 gets indexed by Drupal, the search index is unaware of its path alias. The alias might be "fancy-widgets", even if the word "widgets" doesn't appear in the body of the node anywhere. One would expect that searching Drupal for "widgets" would find this node, since its URL has the word "widgets" in it. Sadly, this is not the case. The patch in #2 fixes this.
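As a rough illustration of the idea (not the patch itself, which would live in Drupal's PHP code), the sketch below splits an alias into keywords and appends them to the text that gets indexed. The helper names `alias_keywords` and `text_to_index` are hypothetical.

```python
import re
from typing import List, Optional


def alias_keywords(alias: str) -> List[str]:
    """Split a path alias such as 'fancy-widgets' into lowercase keywords."""
    return [word for word in re.split(r"[^0-9a-z]+", alias.lower()) if word]


def text_to_index(body: str, alias: Optional[str]) -> str:
    """Return the text handed to the indexer: the body plus any alias keywords."""
    if not alias:
        return body
    return body + " " + " ".join(alias_keywords(alias))


if __name__ == "__main__":
    # The body never mentions "widgets", but the alias does.
    print(text_to_index("A page about gadgets.", "fancy-widgets"))
```

With the alias words in the indexed text, a search for "widgets" can match node/42 even though the body never uses the word.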
As for the values of H1, H2, etc., there is real value in making these configurable. We don't need to keep track of what Google's algorithms are doing; this is about Drupal site administrators being able to determine how their own search index behaves. Maybe the markup isn't supposed to play as big a role in the scoring on my site. Right now I can't turn the volume of H1 down at all; it is always 7X more important than <strong>.
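Here is a small sketch of what configurable tag weights could look like, again in Python for illustration only and not Drupal's actual indexer. The default weights for h1, a and em are the ones quoted in #5. Everything else is an assumption: the names `WeightedWordScorer` and `score_words`, a weight of 1 for text outside any listed tag, and the rule that nested tags use the largest enclosing weight.

```python
import re
from html.parser import HTMLParser

# Weights quoted in #5; other tags are left out rather than guessed.
DEFAULT_TAG_WEIGHTS = {"h1": 25, "a": 10, "em": 3}


class WeightedWordScorer(HTMLParser):
    """Adds up a per-word score based on the tags enclosing each word."""

    def __init__(self, tag_weights):
        super().__init__()
        self.tag_weights = tag_weights
        self.open_tags = []
        self.scores = {}

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching tag, tolerating sloppy markup.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        # Assumption: use the largest weight among enclosing tags, default 1.
        weight = max(
            (self.tag_weights.get(tag, 1) for tag in self.open_tags),
            default=1,
        )
        for word in re.findall(r"\w+", data.lower()):
            self.scores[word] = self.scores.get(word, 0) + weight


def score_words(html, overrides=None):
    """Score words in html, letting an administrator override tag weights."""
    weights = {**DEFAULT_TAG_WEIGHTS, **(overrides or {})}
    scorer = WeightedWordScorer(weights)
    scorer.feed(html)
    scorer.close()
    return scorer.scores


if __name__ == "__main__":
    page = "<h1>Widgets</h1><p>cheap <em>widgets</em></p>"
    print(score_words(page))             # built-in weights
    print(score_words(page, {"h1": 5}))  # H1 turned down by the admin
```

The point of the `overrides` argument is the one made above: the defaults stay as they are, but a site administrator can turn any individual tag up or down.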