public class CmsSearchSimilarity extends org.apache.lucene.search.similarities.Similarity
Reduces the significance of the computeNorm(FieldInvertState) factor
for the CmsSearchField.FIELD_CONTENT field, while
keeping the Lucene default for all other fields.
This implementation was added because the default length norm appears to be heavily biased toward small documents. With the default, even if a term occurs the same number of times in two documents, the smaller document (containing fewer terms) can easily score 3x as high as the longer one. This implementation reduces the influence of a document's term count on the score.
Inspired by Chuck Williams' WikipediaSimilarity.
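The size of that bias can be checked with a little arithmetic. The sketch below compares two length norms for a 100-term and a 1000-term document. The first, 1 / sqrt(numTerms), is the length norm of Lucene's classic TF-IDF similarity; this page does not say which default the author measured, so treating it as "the default" is an assumption. The second, 3 / log10(1000 + numTerms), is a hypothetical log-dampened norm chosen only for illustration and is not taken from the CmsSearchSimilarity source. The class and method names are likewise made up for this example.

```java
// Illustrative only: neither formula is taken from the CmsSearchSimilarity source.
class LengthNormDemo {

    /** Classic TF-IDF style length norm: 1 / sqrt(numTerms). */
    static double classicNorm(int numTerms) {
        return 1.0 / Math.sqrt(Math.max(1, numTerms));
    }

    /** Hypothetical log-dampened norm (an assumption): 3 / log10(1000 + numTerms). */
    static double dampenedNorm(int numTerms) {
        return 3.0 / Math.log10(1000.0 + Math.max(1, numTerms));
    }

    public static void main(String[] args) {
        int shortDoc = 100;   // terms in the shorter document
        int longDoc = 1000;   // terms in the longer document

        // How many times larger the short document's norm is than the long one's.
        double classicRatio = classicNorm(shortDoc) / classicNorm(longDoc);
        double dampenedRatio = dampenedNorm(shortDoc) / dampenedNorm(longDoc);

        System.out.println("classic ratio:  " + classicRatio);
        System.out.println("dampened ratio: " + dampenedRatio);
    }
}
```

With the square-root norm the shorter document's factor is sqrt(10), a little over 3 times that of the longer document, which matches the "easily 3x" observation. With the dampened variant the two factors differ by less than 10 percent.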
| Constructor and Description |
|---|
| CmsSearchSimilarity(): Creates a new instance of the OpenCms search similarity. |
| Modifier and Type | Method and Description |
|---|---|
| long | computeNorm(org.apache.lucene.index.FieldInvertState state): Special implementation for "compute norm" to reduce the significance of this factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields. |
| boolean | getDiscountOverlaps(): Returns true iff overlap tokens are discounted from the document's length. |
| org.apache.lucene.search.similarities.Similarity.SimScorer | scorer(float boost, org.apache.lucene.search.CollectionStatistics collectionStats, org.apache.lucene.search.TermStatistics... termStats) |
| void | setDiscountOverlaps(boolean v): Sets whether overlap tokens (tokens with 0 position increment) are ignored when computing the norm. |
public CmsSearchSimilarity()

public final long computeNorm(org.apache.lucene.index.FieldInvertState state)
Special implementation for "compute norm" to reduce the significance of this factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
Overrides: computeNorm in class org.apache.lucene.search.similarities.Similarity

public boolean getDiscountOverlaps()
See Also: setDiscountOverlaps(boolean)

public org.apache.lucene.search.similarities.Similarity.SimScorer scorer(float boost, org.apache.lucene.search.CollectionStatistics collectionStats, org.apache.lucene.search.TermStatistics... termStats)
Overrides: scorer in class org.apache.lucene.search.similarities.Similarity
See Also: Similarity.scorer(float, org.apache.lucene.search.CollectionStatistics, org.apache.lucene.search.TermStatistics[])

public void setDiscountOverlaps(boolean v)
Parameters: v - if true, tokens with position increment 0 are ignored when computing the norm, otherwise they are not ignored.
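
To make the effect of the discountOverlaps flag concrete, here is a minimal sketch in plain Java. It assumes the field length is derived the way Lucene's FieldInvertState exposes it, as a total token count (getLength()) minus the count of tokens with position increment 0 (getNumOverlap()) when overlaps are discounted. The class OverlapLengthDemo and the helper effectiveLength are hypothetical names for illustration; this page does not show how CmsSearchSimilarity itself combines these values.

```java
// Illustrative sketch: a stand-alone model of "discount overlaps", not OpenCms code.
class OverlapLengthDemo {

    /**
     * Counts the tokens that contribute to a field's length.
     * positionIncrements holds one entry per token; 0 marks an overlap token
     * (for example a synonym stacked on the previous token).
     */
    static int effectiveLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = positionIncrements.length;
        if (!discountOverlaps) {
            return length; // every token counts, overlaps included
        }
        int overlaps = 0;
        for (int inc : positionIncrements) {
            if (inc == 0) {
                overlaps++;
            }
        }
        return length - overlaps;
    }

    public static void main(String[] args) {
        // "fast" (1), "quick" stacked as a synonym (0), "car" (1), "auto" stacked as a synonym (0)
        int[] increments = {1, 0, 1, 0};
        System.out.println(effectiveLength(increments, true));  // prints 2
        System.out.println(effectiveLength(increments, false)); // prints 4
    }
}
```

Discounting overlaps keeps synonym expansion at index time from making a document look longer, and therefore less relevant, than its original text warrants.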