Package org.apache.lucene.analysis.ko
Class KoreanTokenizerFactory
java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenizerFactory
org.apache.lucene.analysis.ko.KoreanTokenizerFactory
- All Implemented Interfaces:
ResourceLoaderAware
Factory for KoreanTokenizer.
<fieldType name="text_ko" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KoreanTokenizerFactory"
      decompoundMode="discard"
      userDictionary="user.txt"
      userDictionaryEncoding="UTF-8"
      outputUnknownUnigrams="false"
      discardPunctuation="true"
    />
  </analyzer>
</fieldType>
Supports the following attributes:
- userDictionary: User dictionary path.
- userDictionaryEncoding: User dictionary encoding.
- decompoundMode: Decompound mode. One of 'none', 'discard', or 'mixed'. Default is 'discard'. See KoreanTokenizer.DecompoundMode.
- outputUnknownUnigrams: If true, outputs unigrams for unknown words.
- discardPunctuation: true if punctuation tokens should be dropped from the output.
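
The attribute list above can be read as a small parsing contract. The following is a minimal sketch of that contract using only the JDK; it is not Lucene's implementation. The class name KoreanTokenizerArgsSketch and its nested enum are hypothetical. The 'discard' default comes from the list above, while the boolean defaults (false and true) and the rejection of unknown keys are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in that mirrors the documented attributes; not part of Lucene.
final class KoreanTokenizerArgsSketch {
    // Mirrors the three documented decompoundMode values.
    public enum DecompoundMode { NONE, DISCARD, MIXED }

    public final String userDictionary;          // path, or null if not configured
    public final String userDictionaryEncoding;  // encoding name, or null if not configured
    public final DecompoundMode decompoundMode;
    public final boolean outputUnknownUnigrams;
    public final boolean discardPunctuation;

    public KoreanTokenizerArgsSketch(Map<String, String> args) {
        // Work on a copy so the caller's map is left untouched.
        Map<String, String> remaining = new HashMap<>(args);
        userDictionary = remaining.remove("userDictionary");
        userDictionaryEncoding = remaining.remove("userDictionaryEncoding");

        // Documented default is 'discard'; valueOf rejects anything outside the enum.
        String mode = remaining.remove("decompoundMode");
        decompoundMode = mode == null
                ? DecompoundMode.DISCARD
                : DecompoundMode.valueOf(mode.toUpperCase(Locale.ROOT));

        // Assumed defaults, chosen to match the example configuration above.
        outputUnknownUnigrams = parseBoolean(remaining.remove("outputUnknownUnigrams"), false);
        discardPunctuation = parseBoolean(remaining.remove("discardPunctuation"), true);

        // Assumption: leftover keys are treated as configuration errors.
        if (!remaining.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + remaining);
        }
    }

    private static boolean parseBoolean(String value, boolean defaultValue) {
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put("decompoundMode", "mixed");
        args.put("userDictionary", "user.txt");
        KoreanTokenizerArgsSketch parsed = new KoreanTokenizerArgsSketch(args);
        System.out.println(parsed.decompoundMode + " " + parsed.userDictionary);
    }
}
```

With the real class, the same key/value pairs would be passed to KoreanTokenizerFactory(Map<String, String> args); in a Solr schema they are the XML attributes of the tokenizer element.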
- Since:
- 7.4.0
-
Field Summary
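Fields (modifier and type, field, description):
- private static final String DECOMPOUND_MODE
- private static final String DISCARD_PUNCTUATION
- private final boolean discardPunctuation
- private final KoreanTokenizer.DecompoundMode mode
- static final String NAME: SPI name
- private static final String OUTPUT_UNKNOWN_UNIGRAMS
- private final boolean outputUnknownUnigrams
- private static final String USER_DICT_ENCODING
- private static final String USER_DICT_PATH
- private UserDictionary userDictionary
- private final String userDictionaryEncoding
- private final String userDictionaryPath
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory:
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
-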
Constructor Summary
Constructors (constructor, description):
- KoreanTokenizerFactory(): Default ctor for compatibility with SPI
- KoreanTokenizerFactory(Map<String, String> args): Creates a new KoreanTokenizerFactory
-
Method Summary
Methods (modifier and type, method, description):
- create(AttributeFactory factory): Creates a TokenStream of the specified input using the given AttributeFactory
- void inform(ResourceLoader loader): Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
Methods inherited from class org.apache.lucene.analysis.TokenizerFactory:
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory:
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
-
Field Details
-
NAME
SPI name
-
USER_DICT_PATH
-
USER_DICT_ENCODING
-
DECOMPOUND_MODE
-
OUTPUT_UNKNOWN_UNIGRAMS
-
DISCARD_PUNCTUATION
-
userDictionaryPath
-
userDictionaryEncoding
-
userDictionary
-
mode
-
outputUnknownUnigrams
private final boolean outputUnknownUnigrams
-
discardPunctuation
private final boolean discardPunctuation
-
-
Constructor Details
-
KoreanTokenizerFactory
KoreanTokenizerFactory(Map<String, String> args)
Creates a new KoreanTokenizerFactory
-
KoreanTokenizerFactory
public KoreanTokenizerFactory()
Default ctor for compatibility with SPI
-
-
Method Details
-
inform
Description copied from interface: ResourceLoaderAware
Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
- Specified by:
inform in interface ResourceLoaderAware
- Throws:
IOException
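
For readers wondering how userDictionary and userDictionaryEncoding might come together when a component is initialized with a loader, here is a rough JDK-only sketch of reading a dictionary stream with an explicit encoding. It is an illustration under stated assumptions, not the code this class runs: UserDictionaryReaderSketch and readEntries are hypothetical names, and the UTF-8 fallback, the strict decoding, and the skipping of blank lines and '#' comment lines are all assumptions.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the real factory delegates loading to a ResourceLoader.
final class UserDictionaryReaderSketch {

    // Reads non-empty, non-comment lines from the stream using the named encoding.
    // A null encoding falls back to UTF-8 (assumed default).
    public static List<String> readEntries(InputStream in, String encoding) {
        Charset charset = encoding == null ? StandardCharsets.UTF_8 : Charset.forName(encoding);
        // Fail on bytes that are invalid in the chosen encoding instead of
        // silently substituting replacement characters.
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<String> entries = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // Assumption: blank lines and lines starting with '#' carry no entry.
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;
                }
                entries.add(trimmed);
            }
        } catch (IOException e) {
            // Wrapped so callers in this sketch need no checked-exception handling.
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Sample content is illustrative only.
        byte[] bytes = "# sample entry\n세종시 세종 시\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readEntries(new ByteArrayInputStream(bytes), "UTF-8"));
    }
}
```

Strict decoding is a deliberate choice in this sketch: a dictionary saved in the wrong encoding then fails loudly at load time rather than producing entries that never match.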
-
create
Description copied from class: TokenizerFactory
Creates a TokenStream of the specified input using the given AttributeFactory
- Specified by:
create in class TokenizerFactory
-