Package org.apache.lucene.analysis.icu
Class ICUNormalizer2CharFilter
java.lang.Object
java.io.Reader
org.apache.lucene.analysis.CharFilter
org.apache.lucene.analysis.charfilter.BaseCharFilter
org.apache.lucene.analysis.icu.ICUNormalizer2CharFilter
- All Implemented Interfaces:
Closeable, AutoCloseable, Readable
Normalize token text with ICU's Normalizer2.
Field Summary
Fields
Modifier and Type                                Field
private boolean                                  afterQuickCheckYes
private int                                      charCount
private int                                      checkedInputBoundary
private final StringBuilder                      inputBuffer
private boolean                                  inputFinished
private final com.ibm.icu.text.Normalizer2       normalizer
private final StringBuilder                      resultBuffer
private final CharacterUtils.CharacterBuffer     tmpBuffer
Fields inherited from class org.apache.lucene.analysis.CharFilter
input
Constructor Summary
Constructors
ICUNormalizer2CharFilter(Reader in)
    Create a new ICUNormalizer2CharFilter that combines NFKC normalization, case folding, and removal of default ignorables (NFKC_Casefold).
ICUNormalizer2CharFilter(Reader in, com.ibm.icu.text.Normalizer2 normalizer)
    Create a new ICUNormalizer2CharFilter with the specified Normalizer2.
ICUNormalizer2CharFilter(Reader in, com.ibm.icu.text.Normalizer2 normalizer, int bufferSize)
Method Summary
Modifier and Type     Method
private int           normalizeInputUpto(int length)
private int           outputFromResultBuffer(char[] cbuf, int begin, int len)
int                   read(char[] cbuf, int off, int len)
private int           readAndNormalizeFromInput()
private int           readFromInputWhileSpanQuickCheckYes()
private int           readFromIoNormalizeUptoBoundary()
private void          readInputToBuffer()
private void          recordOffsetDiff(int inputLength, int outputLength)
Methods inherited from class org.apache.lucene.analysis.charfilter.BaseCharFilter
addOffCorrectMap, correct, getLastCumulativeDiff
Methods inherited from class org.apache.lucene.analysis.CharFilter
close, correctOffset
Methods inherited from class java.io.Reader
mark, markSupported, nullReader, read, read, read, ready, reset, skip, transferTo
Field Details
normalizer
private final com.ibm.icu.text.Normalizer2 normalizer
inputBuffer
private final StringBuilder inputBuffer
resultBuffer
private final StringBuilder resultBuffer
inputFinished
private boolean inputFinished
afterQuickCheckYes
private boolean afterQuickCheckYes
checkedInputBoundary
private int checkedInputBoundary
charCount
private int charCount
tmpBuffer
private final CharacterUtils.CharacterBuffer tmpBuffer
Constructor Details
ICUNormalizer2CharFilter
public ICUNormalizer2CharFilter(Reader in)
Create a new ICUNormalizer2CharFilter that combines NFKC normalization, case folding, and removal of default ignorables (NFKC_Casefold).
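To give a feel for what NFKC normalization plus case folding does to text, here is a minimal sketch that uses only the JDK's java.text.Normalizer as a stand-in for ICU's Normalizer2. It is an approximation, not the NFKC_Casefold mapping itself: it does not remove default ignorable code points, and lowercasing is not identical to full Unicode case folding. The class NfkcCasefoldSketch and its approximate method are hypothetical names used for illustration only.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Lucene or ICU.
final class NfkcCasefoldSketch {
    private NfkcCasefoldSketch() {}

    /**
     * Rough JDK-only approximation of NFKC_Casefold: NFKC normalization
     * followed by locale-independent lowercasing. Default ignorable code
     * points are NOT removed, and lowercasing is not full case folding.
     */
    static String approximate(String text) {
        String nfkc = Normalizer.normalize(text, Normalizer.Form.NFKC);
        return nfkc.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // U+FB01 (LATIN SMALL LIGATURE FI) decomposes to "fi" under NFKC.
        System.out.println(approximate("\uFB01LE"));
        // U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A) maps to "A", then "a".
        System.out.println(approximate("\uFF21"));
    }
}
```

With these inputs the sketch yields "file" and "a", so a ligature or full-width variant would match its plain ASCII spelling.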
ICUNormalizer2CharFilter
public ICUNormalizer2CharFilter(Reader in, com.ibm.icu.text.Normalizer2 normalizer)
Create a new ICUNormalizer2CharFilter with the specified Normalizer2.
- Parameters:
in - text
normalizer - normalizer to use
ICUNormalizer2CharFilter
ICUNormalizer2CharFilter(Reader in, com.ibm.icu.text.Normalizer2 normalizer, int bufferSize)
Method Details
read
public int read(char[] cbuf, int off, int len) throws IOException
- Specified by:
read in class Reader
- Throws:
IOException
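The read(char[], int, int) contract for a normalizing reader can be sketched with JDK classes alone. The example below is a simplified illustration, not the Lucene implementation: EagerNormalizingReader and readAll are hypothetical names, java.text.Normalizer stands in for ICU's Normalizer2, and the whole input is normalized in one pass. The private helpers listed in the method summary (for example readFromIoNormalizeUptoBoundary and normalizeInputUpto) suggest that the real class works incrementally instead, and this sketch also does no offset tracking.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.Normalizer;

// Hypothetical class for illustration; not the Lucene implementation.
final class EagerNormalizingReader extends Reader {
    private final Reader in;
    private String normalized; // filled on the first read() call
    private int pos;           // next unread position in 'normalized'

    EagerNormalizingReader(Reader in) {
        this.in = in;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (normalized == null) {
            // Simplification: consume all input, then normalize once.
            StringBuilder sb = new StringBuilder();
            char[] tmp = new char[128];
            int n;
            while ((n = in.read(tmp, 0, tmp.length)) != -1) {
                sb.append(tmp, 0, n);
            }
            normalized = Normalizer.normalize(sb, Normalizer.Form.NFKC);
        }
        if (len == 0) {
            return 0;
        }
        if (pos >= normalized.length()) {
            return -1; // end of stream
        }
        int count = Math.min(len, normalized.length() - pos);
        normalized.getChars(pos, pos + count, cbuf, off);
        pos += count;
        return count;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Reads the normalized form of 'input' through a deliberately tiny buffer. */
    static String readAll(String input) {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[2]; // forces several read() calls
        try (Reader r = new EagerNormalizingReader(new StringReader(input))) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll("\uFB01le"));
    }
}
```

Normalizing everything at once keeps the sketch short, at the cost of holding the full input in memory; a buffered design avoids that but has to take care not to split the input in the middle of a sequence that normalizes as a unit.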
readInputToBuffer
private void readInputToBuffer() throws IOException
- Throws:
IOException
readAndNormalizeFromInput
private int readAndNormalizeFromInput()
readFromInputWhileSpanQuickCheckYes
private int readFromInputWhileSpanQuickCheckYes()
readFromIoNormalizeUptoBoundary
private int readFromIoNormalizeUptoBoundary()
normalizeInputUpto
private int normalizeInputUpto(int length)
recordOffsetDiff
private void recordOffsetDiff(int inputLength, int outputLength)
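The signature recordOffsetDiff(int inputLength, int outputLength), together with the inherited addOffCorrectMap, correct and getLastCumulativeDiff, suggests that the filter records how much normalization changes the text length so that offsets in the normalized output can be mapped back to the original input. The following is a simplified model of that idea under stated assumptions, not Lucene's code: OffsetCorrectionSketch, record and correct are hypothetical names, and the sketch only adds one correction entry at the end of each chunk, which is enough for chunks that shrink but too coarse for chunks that grow.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of cumulative offset correction.
final class OffsetCorrectionSketch {
    // Key: offset in the normalized output.
    // Value: cumulative (input - output) length difference from that offset on.
    private final TreeMap<Integer, Integer> corrections = new TreeMap<>();
    private int outputChars;    // normalized chars produced so far
    private int cumulativeDiff; // total (input - output) length so far

    /** Records one normalized chunk; a length change adds a correction entry. */
    void record(int inputLength, int outputLength) {
        outputChars += outputLength;
        if (inputLength != outputLength) {
            cumulativeDiff += inputLength - outputLength;
            corrections.put(outputChars, cumulativeDiff);
        }
    }

    /** Maps an offset in the normalized output back to the original input. */
    int correct(int outputOffset) {
        Map.Entry<Integer, Integer> e = corrections.floorEntry(outputOffset);
        return e == null ? outputOffset : outputOffset + e.getValue();
    }

    public static void main(String[] args) {
        // Input "e" + U+0301 + "x" (3 chars) composes to "é" + "x" (2 chars).
        OffsetCorrectionSketch map = new OffsetCorrectionSketch();
        map.record(2, 1); // "e" + combining acute -> one precomposed char
        map.record(1, 1); // "x" is unchanged
        System.out.println(map.correct(0)); // start of the composed char
        System.out.println(map.correct(1)); // start of "x" in the input
        System.out.println(map.correct(2)); // end of the input
    }
}
```

In this model the output offsets 0, 1 and 2 map to input offsets 0, 2 and 3, so a token covering "x" in the normalized text would highlight the right character in the original.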
outputFromResultBuffer
private int outputFromResultBuffer(char[] cbuf, int begin, int len)