Class UserDictionary
java.lang.Object
org.apache.lucene.analysis.ja.dict.UserDictionary
All Implemented Interfaces:
Dictionary
Class for building a User Dictionary. This class allows for custom segmentation of phrases.
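
As an illustration of what custom segmentation means in practice, the sketch below splits one phrase into user-defined segments and checks that the segments add up to the phrase. It is not this class's parser: the four-column entry layout (surface, space-separated segmentation, space-separated readings, part of speech) is an assumption that this page does not document, the names UserEntrySketch and segmentLengths are hypothetical, and the sample entry is a romanized stand-in for Japanese text.

```java
import java.util.Arrays;

class UserEntrySketch {
  // Assumed entry layout (not documented on this page):
  // surface,segmentation,readings,part-of-speech
  static int[] segmentLengths(String line) {
    String[] cols = line.split(",");
    if (cols.length != 4) {
      throw new IllegalArgumentException("expected 4 columns: " + line);
    }
    String[] segments = cols[1].split(" +");
    int[] lengths = new int[segments.length];
    int total = 0;
    for (int i = 0; i < segments.length; i++) {
      lengths[i] = segments[i].length();
      total += lengths[i];
    }
    // The segments must cover the surface form exactly.
    if (total != cols[0].length()) {
      throw new IllegalArgumentException("segmentation does not match surface: " + line);
    }
    return lengths;
  }

  public static void main(String[] args) {
    // Hypothetical entry, romanized for readability.
    String entry = "kansaikokusaikuukou,kansai kokusai kuukou,KANSAI KOKUSAI KUUKOU,custom-noun";
    System.out.println(Arrays.toString(segmentLengths(entry))); // prints [6, 7, 6]
  }
}
```

The per-segment lengths are the kind of information that the {wordId, position, length} triples described under lookup carry for each segment.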

Field Summary
Fields
Modifier and Type               Field
private static final int        CUSTOM_DICTIONARY_WORD_ID_OFFSET
private final String[]          data
private static final int[][]    EMPTY_RESULT
private final TokenInfoFST      fst
static final int                LEFT_ID
static final int                RIGHT_ID
private final int[][]           segmentations
static final int                WORD_COST
Fields inherited from interface org.apache.lucene.analysis.ja.dict.Dictionary
INTERNAL_SEPARATOR

Constructor Summary
Constructors

Method Summary
Modifier and Type       Method
    Description
private String[]        getAllFeaturesArray(int wordId)
String                  getBaseForm(int wordId, char[] surface, int off, int len)
    Get base form of word
private String          getFeature(int wordId, int... fields)
TokenInfoFST            getFST()
String                  getInflectionForm(int wordId)
    Get inflection form of tokens
String                  getInflectionType(int wordId)
    Get inflection type of tokens
int                     getLeftId(int wordId)
    Get left id of specified word
String                  getPartOfSpeech(int wordId)
    Get Part-Of-Speech of tokens
String                  getPronunciation(int wordId, char[] surface, int off, int len)
    Get pronunciation of tokens
String                  getReading(int wordId, char[] surface, int off, int len)
    Get reading of tokens
int                     getRightId(int wordId)
    Get right id of specified word
int                     getWordCost(int wordId)
    Get word cost of specified word
int[][]                 lookup(char[] chars, int off, int len)
    Lookup words in text
int[]                   lookupSegmentation(int phraseID)
static UserDictionary   open
private int[][]         toIndexArray(Map<Integer, int[]> input)
    Convert Map of index and wordIdAndLength to array of {wordId, index, length}

Field Details

fst
private final TokenInfoFST fst

segmentations
private final int[][] segmentations

data
private final String[] data

CUSTOM_DICTIONARY_WORD_ID_OFFSET
private static final int CUSTOM_DICTIONARY_WORD_ID_OFFSET

WORD_COST
public static final int WORD_COST

LEFT_ID
public static final int LEFT_ID

RIGHT_ID
public static final int RIGHT_ID

EMPTY_RESULT
private static final int[][] EMPTY_RESULT

Constructor Details

UserDictionary
Throws:
IOException

Method Details

open
Throws:
IOException

lookup
public int[][] lookup(char[] chars, int off, int len) throws IOException
Lookup words in text
Parameters:
chars - text
off - offset into text
len - length of text
Returns:
array of {wordId, position, length}
Throws:
IOException
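
One way to consume the returned triples is sketched below: each {wordId, position, length} entry is turned back into the matched surface text. The class and method names are hypothetical, the match values are made up for illustration rather than produced by lookup, and treating position as relative to off is an assumption that this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

class LookupResultSketch {
  // Rebuilds the surface text of each match from {wordId, position, length}.
  // Assumption: position is counted from off, not from the start of chars.
  static List<String> surfaces(char[] chars, int off, int[][] matches) {
    List<String> out = new ArrayList<>();
    for (int[] match : matches) {
      int position = match[1];
      int length = match[2];
      out.add(new String(chars, off + position, length));
    }
    return out;
  }

  public static void main(String[] args) {
    // Romanized stand-in for Japanese text; word IDs are invented.
    char[] text = "kansaikokusaikuukou".toCharArray();
    int[][] matches = {
      {100, 0, 6},
      {101, 6, 7},
      {102, 13, 6},
    };
    System.out.println(surfaces(text, 0, matches)); // prints [kansai, kokusai, kuukou]
  }
}
```

If position turns out to be an absolute index into chars, the only change needed is to drop the off term when slicing.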

getFST
public TokenInfoFST getFST()

toIndexArray
private int[][] toIndexArray(Map<Integer, int[]> input)
Convert Map of index and wordIdAndLength to array of {wordId, index, length}
Returns:
array of {wordId, index, length}
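
The conversion can be sketched as a standalone method. This is an illustration under assumptions, not the library's code: it assumes each map value is laid out as {firstWordId, length1, length2, ...}, that segments of one phrase receive consecutive word IDs, and that each segment starts where the previous one ends. The class name IndexArraySketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IndexArraySketch {
  // Key: start index of a matched phrase.
  // Value (assumed layout): {firstWordId, length1, length2, ...}.
  static int[][] toIndexArray(Map<Integer, int[]> input) {
    List<int[]> result = new ArrayList<>();
    for (Map.Entry<Integer, int[]> entry : input.entrySet()) {
      int[] wordIdAndLength = entry.getValue();
      int firstWordId = wordIdAndLength[0];
      int index = entry.getKey();
      for (int j = 1; j < wordIdAndLength.length; j++) {
        int length = wordIdAndLength[j];
        result.add(new int[] {firstWordId + j - 1, index, length});
        index += length; // the next segment starts where this one ends
      }
    }
    return result.toArray(new int[0][]);
  }

  public static void main(String[] args) {
    Map<Integer, int[]> input = new TreeMap<>();
    input.put(0, new int[] {100, 2, 2, 2}); // one phrase at index 0 with three 2-char segments
    for (int[] token : toIndexArray(input)) {
      System.out.println(Arrays.toString(token));
    }
    // prints [100, 0, 2], then [101, 2, 2], then [102, 4, 2]
  }
}
```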

lookupSegmentation
public int[] lookupSegmentation(int phraseID)

getLeftId
public int getLeftId(int wordId)
Description copied from interface: Dictionary
Get left id of specified word
Specified by:
getLeftId in interface Dictionary
Returns:
left id

getRightId
public int getRightId(int wordId)
Description copied from interface: Dictionary
Get right id of specified word
Specified by:
getRightId in interface Dictionary
Returns:
right id

getWordCost
public int getWordCost(int wordId)
Description copied from interface: Dictionary
Get word cost of specified word
Specified by:
getWordCost in interface Dictionary
Returns:
word's cost

getReading
public String getReading(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get reading of tokens
Specified by:
getReading in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Reading of the token

getPartOfSpeech
public String getPartOfSpeech(int wordId)
Description copied from interface: Dictionary
Get Part-Of-Speech of tokens
Specified by:
getPartOfSpeech in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Part-Of-Speech of the token

getBaseForm
public String getBaseForm(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get base form of word
Specified by:
getBaseForm in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Base form (only different for inflected words, otherwise null)

getPronunciation
public String getPronunciation(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get pronunciation of tokens
Specified by:
getPronunciation in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
Pronunciation of the token

getInflectionType
public String getInflectionType(int wordId)
Description copied from interface: Dictionary
Get inflection type of tokens
Specified by:
getInflectionType in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
inflection type, or null

getInflectionForm
public String getInflectionForm(int wordId)
Description copied from interface: Dictionary
Get inflection form of tokens
Specified by:
getInflectionForm in interface Dictionary
Parameters:
wordId - word ID of token
Returns:
inflection form, or null

getAllFeaturesArray
private String[] getAllFeaturesArray(int wordId)

getFeature
private String getFeature(int wordId, int... fields)