added
terms: low-frequency terms are added to a required boolean clause and high-frequency terms are
added to an optional boolean clause. The optional clause is only executed if the required
"low-frequency" clause matches. In most cases, high-frequency terms are unlikely to significantly
contribute to the document score unless at least one of the low-frequency terms is matched. This
query can improve query execution times significantly if applicable.
CommonTermsQuery has several advantages over stopword filtering at index or query time
since a term can be "classified" based on the actual document frequency in the index and can
prevent slow queries even across domains without specialized stopword files.
Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e., all high-frequency terms must match for a document to match.
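The classification described above can be modeled in plain Java without Lucene on the classpath. This is an illustrative sketch, not the library's implementation: the class `TermClassifierSketch`, the `Plan` holder, and the already-resolved `maxDocFreq` cutoff are hypothetical names introduced here, and document frequencies are passed in as a map instead of being read from an index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only; none of these names exist in Lucene. */
public class TermClassifierSketch {

    /** Hypothetical holder for the two groups of terms. */
    public static final class Plan {
        public final List<String> requiredClauseTerms;
        public final List<String> optionalClauseTerms;

        Plan(List<String> required, List<String> optional) {
            this.requiredClauseTerms = required;
            this.optionalClauseTerms = optional;
        }
    }

    /**
     * Splits terms by document frequency. Terms whose frequency exceeds
     * maxDocFreq (an assumed, already-resolved absolute cutoff) are treated
     * as high-frequency.
     */
    public static Plan classify(Map<String, Integer> docFreqs, int maxDocFreq) {
        List<String> low = new ArrayList<>();
        List<String> high = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (e.getValue() > maxDocFreq) {
                high.add(e.getKey());
            } else {
                low.add(e.getKey());
            }
        }
        if (low.isEmpty()) {
            // Only high-frequency terms: model the plain conjunction,
            // where every term must match.
            return new Plan(high, new ArrayList<>());
        }
        // Otherwise low-frequency terms form the required clause and
        // high-frequency terms form the optional clause.
        return new Plan(low, high);
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 900);
        docFreqs.put("lucene", 12);
        Plan plan = classify(docFreqs, 100);
        System.out.println("required: " + plan.requiredClauseTerms);
        System.out.println("optional: " + plan.optionalClauseTerms);
    }
}
```

The sketch ignores the `Occur` settings and boosts; it only shows which group each term lands in.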
-
Field Summary
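Fields (Modifier and Type, Field)
protected float highFreqBoost
protected float highFreqMinNrShouldMatch
protected final BooleanClause.Occur highFreqOccur
protected float lowFreqBoost
protected float lowFreqMinNrShouldMatch
protected final BooleanClause.Occur lowFreqOccur
protected final float maxTermFrequency
protected final List<Term> terms
-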
Constructor Summary
Constructors
CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency): Creates a new CommonTermsQuery
-
Method Summary
Methods (Modifier and Type, Method: Description)
void add(Term term): Adds a term to the CommonTermsQuery
protected Query buildQuery(int maxDoc, TermStates[] contextArray, Term[] queryTerms)
protected int calcHighFreqMinimumNumberShouldMatch(int numOptional)
protected int calcLowFreqMinimumNumberShouldMatch(int numOptional)
void collectTermStates(IndexReader reader, List<LeafReaderContext> leaves, TermStates[] contextArray, Term[] queryTerms)
boolean equals(Object other): Override and implement query instance equivalence properly in a subclass.
private boolean equalsTo(CommonTermsQuery other)
float getHighFreqBoost(): Gets the boost used for high frequency terms.
float getHighFreqMinimumNumberShouldMatch(): Gets the minimum number of the optional high frequent BooleanClauses which must be satisfied.
BooleanClause.Occur getHighFreqOccur(): Gets the BooleanClause.Occur used for high frequency terms.
float getLowFreqBoost(): Gets the boost used for low frequency terms.
float getLowFreqMinimumNumberShouldMatch(): Gets the minimum number of the optional low frequent BooleanClauses which must be satisfied.
BooleanClause.Occur getLowFreqOccur(): Gets the BooleanClause.Occur used for low frequency terms.
float getMaxTermFrequency(): Gets the maximum threshold of a term's document frequency to be considered a low frequency term.
List<Term> getTerms(): Gets the list of terms.
int hashCode(): Override and implement query hash code properly in a subclass.
private final int minNrShouldMatch(float minNrShouldMatch, int numOptional)
protected Query newTermQuery(Term term, TermStates termStates): Builds a new TermQuery instance.
Query rewrite(IndexSearcher indexSearcher): Expert: called to re-write queries into primitive queries.
void setHighFreqMinimumNumberShouldMatch(float min): Specifies a minimum number of the high frequent optional BooleanClauses which must be satisfied in order to produce a match on the high frequency terms query part.
void setLowFreqMinimumNumberShouldMatch(float min): Specifies a minimum number of the low frequent optional BooleanClauses which must be satisfied in order to produce a match on the low frequency terms query part.
String toString(String field): Prints a query to a string, with field assumed to be the default field and omitted.
void visit(QueryVisitor visitor): Recurse through the query tree, visiting any child queries.
Methods inherited from class org.apache.lucene.search.Query
classHash, createWeight, rewrite, sameClassAs, toString
-
Field Details
-
terms
-
maxTermFrequency
protected final float maxTermFrequency -
lowFreqOccur
-
highFreqOccur
-
lowFreqBoost
protected float lowFreqBoost -
highFreqBoost
protected float highFreqBoost -
lowFreqMinNrShouldMatch
protected float lowFreqMinNrShouldMatch -
highFreqMinNrShouldMatch
protected float highFreqMinNrShouldMatch
-
-
Constructor Details
-
CommonTermsQuery
public CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency)
Creates a new CommonTermsQuery
Parameters:
highFreqOccur - BooleanClause.Occur used for high frequency terms
lowFreqOccur - BooleanClause.Occur used for low frequency terms
maxTermFrequency - a value in [0..1) (or an absolute number >= 1) representing the maximum threshold of a term's document frequency to be considered a low frequency term.
Throws:
IllegalArgumentException - if BooleanClause.Occur.MUST_NOT is passed as lowFreqOccur or highFreqOccur
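The two ways of reading maxTermFrequency (a fraction in [0..1) or an absolute count >= 1) can be illustrated with a small standalone helper. This is a sketch under assumptions, not the library's code: the class `FrequencyThresholdSketch` and method `isHighFrequency` are hypothetical, and both the rounding of the fractional cutoff (ceiling) and the strict greater-than comparison are assumptions made here.

```java
/** Illustrative model only; none of these names exist in Lucene. */
public class FrequencyThresholdSketch {

    /**
     * Decides whether a term counts as high-frequency.
     *
     * @param docFreq          number of documents containing the term
     * @param maxTermFrequency fraction in [0..1) of maxDoc, or absolute count >= 1
     * @param maxDoc           total number of documents in the index
     */
    public static boolean isHighFrequency(int docFreq, float maxTermFrequency, int maxDoc) {
        if (maxTermFrequency >= 1f) {
            // Absolute threshold: compare the document count directly.
            return docFreq > maxTermFrequency;
        }
        // Fractional threshold: scale by index size (ceiling is an assumption).
        int cutoff = (int) Math.ceil(maxTermFrequency * (float) maxDoc);
        return docFreq > cutoff;
    }

    public static void main(String[] args) {
        // 10% of a 1000-document index versus an absolute cutoff of 40 documents.
        System.out.println(isHighFrequency(150, 0.1f, 1000));
        System.out.println(isHighFrequency(30, 40f, 1000));
    }
}
```

A fractional threshold adapts to index size, while an absolute one stays fixed as the index grows.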
-
-
Method Details
-
add
Adds a term to the CommonTermsQuery
Parameters:
term - the term to add
-
rewrite
Description copied from class: Query
Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys.
Callers are expected to call rewrite multiple times if necessary, until the rewritten query is the same as the original query.
The rewrite process may be able to make use of IndexSearcher's executor and be executed in parallel if the executor is provided.
However, if any of the intermediary queries do not satisfy the new API, parallel rewrite is not possible for any subsequent sub-queries. To take advantage of this API, the entire query tree must override this method.
Overrides:
rewrite in class Query
Throws:
IOException
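The "call rewrite until the query stops changing" contract is a fixed-point loop. The following is a minimal sketch with stand-in types, assuming that "the same" means reference identity; `Rewritable`, `Layered`, and `rewriteFully` are hypothetical names and do not come from Lucene.

```java
/** Illustrative model only; none of these names exist in Lucene. */
public class RewriteLoopSketch {

    /** Hypothetical stand-in for a query that can rewrite itself. */
    public interface Rewritable {
        Rewritable rewrite();
    }

    /** Toy query: each rewrite peels one layer; at depth 0 it returns itself. */
    public static final class Layered implements Rewritable {
        public final int depth;

        public Layered(int depth) {
            this.depth = depth;
        }

        @Override
        public Rewritable rewrite() {
            return depth == 0 ? this : new Layered(depth - 1);
        }
    }

    /** Rewrites repeatedly until rewrite() returns the same instance. */
    public static Rewritable rewriteFully(Rewritable original) {
        Rewritable current = original;
        for (Rewritable next = current.rewrite(); next != current; next = current.rewrite()) {
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Layered primitive = (Layered) rewriteFully(new Layered(3));
        System.out.println("final depth: " + primitive.depth);
    }
}
```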
-
visit
Description copied from class: Query
Recurse through the query tree, visiting any child queries.
-
calcLowFreqMinimumNumberShouldMatch
protected int calcLowFreqMinimumNumberShouldMatch(int numOptional) -
calcHighFreqMinimumNumberShouldMatch
protected int calcHighFreqMinimumNumberShouldMatch(int numOptional) -
minNrShouldMatch
private final int minNrShouldMatch(float minNrShouldMatch, int numOptional) -
buildQuery
-
collectTermStates
public void collectTermStates(IndexReader reader, List<LeafReaderContext> leaves, TermStates[] contextArray, Term[] queryTerms) throws IOException
Throws:
IOException
-
setLowFreqMinimumNumberShouldMatch
public void setLowFreqMinimumNumberShouldMatch(float min)
Specifies a minimum number of the low frequent optional BooleanClauses which must be satisfied in order to produce a match on the low frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the low frequent clause, or a number >= 1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.
Parameters:
min - the number of optional clauses that must match
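How a float setting could be turned into a concrete clause count is sketched below. This is a hedged illustration rather than the library's implementation: the class `MinShouldMatchSketch` and method `resolve` are hypothetical, rounding the fractional case to the nearest integer is an assumption, and negative inputs are not handled.

```java
/** Illustrative model only; none of these names exist in Lucene. */
public class MinShouldMatchSketch {

    /**
     * Resolves a minimum-should-match setting against the number of optional clauses.
     *
     * @param minNrShouldMatch fraction in [0..1) or absolute count >= 1
     * @param numOptional      number of optional clauses actually present
     */
    public static int resolve(float minNrShouldMatch, int numOptional) {
        if (minNrShouldMatch >= 1.0f || minNrShouldMatch == 0.0f) {
            // Absolute count (or zero, meaning no optional clause is required).
            return (int) minNrShouldMatch;
        }
        // Fraction of the optional clauses (nearest-integer rounding is an assumption).
        return Math.round(minNrShouldMatch * numOptional);
    }

    public static void main(String[] args) {
        System.out.println(resolve(0.5f, 4)); // half of four optional clauses
        System.out.println(resolve(2f, 5));   // absolute count
    }
}
```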
-
getLowFreqMinimumNumberShouldMatch
public float getLowFreqMinimumNumberShouldMatch()
Gets the minimum number of the optional low frequent BooleanClauses which must be satisfied.
-
setHighFreqMinimumNumberShouldMatch
public void setHighFreqMinimumNumberShouldMatch(float min)
Specifies a minimum number of the high frequent optional BooleanClauses which must be satisfied in order to produce a match on the high frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the high frequent clause, or a number >= 1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.
Parameters:
min - the number of optional clauses that must match
-
getHighFreqMinimumNumberShouldMatch
public float getHighFreqMinimumNumberShouldMatch()
Gets the minimum number of the optional high frequent BooleanClauses which must be satisfied.
-
getTerms
Gets the list of terms. -
getMaxTermFrequency
public float getMaxTermFrequency()
Gets the maximum threshold of a term's document frequency to be considered a low frequency term.
-
getLowFreqOccur
Gets the BooleanClause.Occur used for low frequency terms. -
getHighFreqOccur
Gets the BooleanClause.Occur used for high frequency terms. -
getLowFreqBoost
public float getLowFreqBoost()
Gets the boost used for low frequency terms.
-
getHighFreqBoost
public float getHighFreqBoost()
Gets the boost used for high frequency terms.
-
toString
Description copied from class: Query
Prints a query to a string, with field assumed to be the default field and omitted.
-
hashCode
public int hashCode()
Description copied from class: Query
Override and implement query hash code properly in a subclass. This is required so that QueryCache works properly.
-
equals
Description copied from class: Query
Override and implement query instance equivalence properly in a subclass. This is required so that QueryCache works properly.
Typically a query will be equal to another only if it's an instance of the same class and its document-filtering properties are identical to those of the other instance. Utility methods are provided for certain repetitive code.
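The "same class and identical document-filtering properties" rule can be shown on a toy class. This is a generic Java sketch, not the equals/hashCode code of CommonTermsQuery: `EqualitySketch` and `ToyQuery` are hypothetical, and the two fields stand in for whatever properties a real query would compare.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Illustrative model only; none of these names exist in Lucene. */
public class EqualitySketch {

    /** Toy query with two properties that decide which documents it matches. */
    public static final class ToyQuery {
        private final List<String> terms;
        private final float maxTermFrequency;

        public ToyQuery(List<String> terms, float maxTermFrequency) {
            this.terms = new ArrayList<>(terms); // defensive copy
            this.maxTermFrequency = maxTermFrequency;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            // Same-class check first, then compare every filtering property.
            if (other == null || getClass() != other.getClass()) {
                return false;
            }
            ToyQuery that = (ToyQuery) other;
            return Float.floatToIntBits(maxTermFrequency) == Float.floatToIntBits(that.maxTermFrequency)
                    && terms.equals(that.terms);
        }

        @Override
        public int hashCode() {
            // Hash exactly the properties that equals() compares.
            return Objects.hash(getClass(), terms, Float.floatToIntBits(maxTermFrequency));
        }
    }

    public static void main(String[] args) {
        ToyQuery a = new ToyQuery(List.of("lucene", "the"), 0.1f);
        ToyQuery b = new ToyQuery(List.of("lucene", "the"), 0.1f);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Keeping equals and hashCode over the same set of properties is what lets two equal instances share one cache entry.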
-
equalsTo
-
newTermQuery
Builds a new TermQuery instance.
This is intended for subclasses that wish to customize the generated queries.
Parameters:
term - term
termStates - the TermStates to be used to create the low level term query. Can be null.
Returns:
new TermQuery instance
-