Class HyphenationCompoundWordTokenFilterFactory
java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenFilterFactory
org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilterFactory
- All Implemented Interfaces:
ResourceLoaderAware
public class HyphenationCompoundWordTokenFilterFactory
extends TokenFilterFactory
implements ResourceLoaderAware
Factory for HyphenationCompoundWordTokenFilter.
This factory accepts the following parameters:
- hyphenator (mandatory): path to the FOP XML hyphenation pattern. See http://offo.sourceforge.net/hyphenation/.
- encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
- dictionary (optional): dictionary of words. Defaults to no dictionary.
- minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
- minSubwordSize (optional): minimum length of subwords. Defaults to 2.
- maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
- onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.
<fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.HyphenationCompoundWordTokenFilterFactory" hyphenator="hyphenator.xml" encoding="UTF-8"
dictionary="dictionary.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="false"/>
</analyzer>
</fieldType>
- Since:
- 3.1.0
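The size and longest-match parameters listed above can be illustrated with a small, simplified sketch. It is not Lucene's implementation: it scans every character position against a dictionary instead of using hyphenation points from a FOP pattern file. The class name SubwordSketch, the method decompose, and the sample words are hypothetical. The sketch also assumes that onlyLongestMatch keeps the longest dictionary hit per start position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Lucene.
final class SubwordSketch {

    // Returns the dictionary subwords found in `word`, honoring the size limits.
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<>();
        // Words shorter than minWordSize are not decomposed at all.
        if (word.length() < minWordSize) {
            return subwords;
        }
        for (int start = 0; start + minSubwordSize <= word.length(); start++) {
            String longest = null;
            // Candidate lengths grow, so the last hit at this start is the longest.
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= word.length(); len++) {
                String candidate = word.substring(start, start + len);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate;
                } else {
                    subwords.add(candidate);
                }
            }
            if (longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("fuss", "fussball", "ball", "pumpe");
        // Prints [fuss, fussball, ball, pumpe]
        System.out.println(decompose("fussballpumpe", dict, 5, 2, 15, false));
        // Prints [fussball, ball, pumpe]
        System.out.println(decompose("fussballpumpe", dict, 5, 2, 15, true));
    }
}
```

With the sample dictionary, enabling onlyLongestMatch drops "fuss" because "fussball" starts at the same position and is longer. A word shorter than minWordSize, such as "ball" with the default of 5, yields no subwords.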
Field Summary
Fields
private final String dictFile
private CharArraySet dictionary
private final String encoding
private final String hypFile
private HyphenationTree hyphenator
private final int maxSubwordSize
private final int minSubwordSize
private final int minWordSize
static final String NAME: SPI name
private final boolean noOverlappingMatches
private final boolean noSubMatches
private final boolean onlyLongestMatch
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructors
HyphenationCompoundWordTokenFilterFactory(): Default ctor for compatibility with SPI
HyphenationCompoundWordTokenFilterFactory(Map<String,String> args): Creates a new HyphenationCompoundWordTokenFilterFactory
Method Summary
create(TokenStream input): Transform the specified input TokenStream
void inform(ResourceLoader loader): Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
-
Field Details
-
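NAME
public static final String NAME
SPI name
-
dictionary
private CharArraySet dictionary
-
hyphenator
private HyphenationTree hyphenator
-
dictFile
private final String dictFile
-
hypFile
private final String hypFile
-
encoding
private final String encoding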
-
minWordSize
private final int minWordSize
-
minSubwordSize
private final int minSubwordSize
-
maxSubwordSize
private final int maxSubwordSize
-
onlyLongestMatch
private final boolean onlyLongestMatch
-
noSubMatches
private final boolean noSubMatches
-
noOverlappingMatches
private final boolean noOverlappingMatches
-
-
Constructor Details
-
HyphenationCompoundWordTokenFilterFactory
public HyphenationCompoundWordTokenFilterFactory(Map<String,String> args)
Creates a new HyphenationCompoundWordTokenFilterFactory
-
HyphenationCompoundWordTokenFilterFactory
public HyphenationCompoundWordTokenFilterFactory()
Default ctor for compatibility with SPI
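As a rough illustration of how the documented parameters and their defaults might be read at construction time, here is a minimal sketch. It assumes the parameters arrive as string key/value pairs, as the attribute-style configuration on this page suggests. The class HyphenationArgsSketch is hypothetical and does not reproduce the factory's actual code. It covers only the parameters listed in the class description.

```java
import java.util.Map;

// Hypothetical holder for the documented parameters; not part of Lucene.
final class HyphenationArgsSketch {
    final String hyphenator;        // mandatory
    final String encoding;          // default "UTF-8"
    final String dictionary;        // null means no dictionary
    final int minWordSize;          // default 5
    final int minSubwordSize;       // default 2
    final int maxSubwordSize;       // default 15
    final boolean onlyLongestMatch; // default false

    HyphenationArgsSketch(Map<String, String> args) {
        hyphenator = args.get("hyphenator");
        if (hyphenator == null) {
            throw new IllegalArgumentException("Missing mandatory parameter: hyphenator");
        }
        encoding = args.getOrDefault("encoding", "UTF-8");
        dictionary = args.get("dictionary");
        minWordSize = Integer.parseInt(args.getOrDefault("minWordSize", "5"));
        minSubwordSize = Integer.parseInt(args.getOrDefault("minSubwordSize", "2"));
        maxSubwordSize = Integer.parseInt(args.getOrDefault("maxSubwordSize", "15"));
        onlyLongestMatch = Boolean.parseBoolean(args.getOrDefault("onlyLongestMatch", "false"));
    }

    public static void main(String[] argv) {
        HyphenationArgsSketch s =
            new HyphenationArgsSketch(Map.of("hyphenator", "hyphenator.xml", "minWordSize", "7"));
        System.out.println(s.encoding + " " + s.minWordSize + " " + s.maxSubwordSize);
    }
}
```

Supplying only hyphenator leaves every other value at its documented default, while omitting it fails fast with an IllegalArgumentException in this sketch.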
-
-
Method Details
-
inform
public void inform(ResourceLoader loader) throws IOException
Description copied from interface: ResourceLoaderAware
Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
- Specified by:
inform in interface ResourceLoaderAware
- Throws:
IOException
-
create
Description copied from class: TokenFilterFactory
Transform the specified input TokenStream
- Specified by:
create in class TokenFilterFactory
-