This module enables search result grouping with Lucene, where hits with the same value in the specified single-valued group field are grouped together. For example, if you group by the author field, then all documents with the same value in the author field fall into a single group.
Grouping requires a number of inputs:
The implementation is two-pass: the first pass ({@link org.apache.lucene.search.grouping.FirstPassGroupingCollector}) gathers the top groups, and the second pass ({@link org.apache.lucene.search.grouping.SecondPassGroupingCollector}) gathers documents within those groups. If the search is costly to run you may want to use the {@link org.apache.lucene.search.CachingCollector} class, which caches hits and can (quickly) replay them for the second pass. This way you only run the query once, but you pay a RAM cost to (briefly) hold all hits. Results are returned as a {@link org.apache.lucene.search.grouping.TopGroups} instance.
Known limitations:
Typical usage looks like this (using the {@link org.apache.lucene.search.CachingCollector}):
FirstPassGroupingCollector c1 = new FirstPassGroupingCollector("author", groupSort, groupOffset+topNGroups);
boolean cacheScores = true;
double maxCacheRAMMB = 4.0;
CachingCollector cachedCollector = CachingCollector.create(c1, cacheScores, maxCacheRAMMB);
s.search(new TermQuery(new Term("content", searchTerm)), cachedCollector);
Collection topGroups = c1.getTopGroups(groupOffset, fillFields);
if (topGroups == null) {
// No groups matched
return;
}
boolean getScores = true;
boolean getMaxScores = true;
boolean fillFields = true;
SecondPassGroupingCollector c2 = new SecondPassGroupingCollector("author", topGroups, groupSort, docSort, docOffset+docsPerGroup, getScores, getMaxScores, fillFields);
//Optionally compute total group count
AllGroupsCollector allGroupsCollector = null;
if (requiredTotalGroupCount) {
allGroupsCollector = new AllGroupsCollector("author");
c2 = MultiCollector.wrap(c2, allGroupsCollector);
}
if (cachedCollector.isCached()) {
// Cache fit within maxCacheRAMMB, so we can replay it:
cachedCollector.replay(c2);
} else {
// Cache was too large; must re-execute query:
s.search(new TermQuery(new Term("content", searchTerm)), c2);
}
TopGroups groupsResult = c2.getTopGroups(docOffset);
if (requiredTotalGroupCount) {
groupResult = new TopGroups(groupsResult, allGroupsCollector.getGroupCount());
}
// Render groupsResult...