Module org.apache.lucene.core
Class Lucene90CompressingTermVectorsWriter
java.lang.Object
org.apache.lucene.codecs.TermVectorsWriter
org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingTermVectorsWriter
- All Implemented Interfaces:
Closeable, AutoCloseable, Accountable
-
Nested Class Summary
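Nested Classes
Modifier and Type, Class, Description
private static class Lucene90CompressingTermVectorsWriter.CompressingTermVectorsSub
private class Lucene90CompressingTermVectorsWriter.DocData
    a pending doc
private class Lucene90CompressingTermVectorsWriter.FieldData
    a pending field
-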
Field Summary
Fields
Modifier and Type, Field, Description
(package private) static final boolean BULK_MERGE_ENABLED
(package private) static final String BULK_MERGE_ENABLED_SYSPROP
private final int chunkSize
private final CompressionMode compressionMode
private final Compressor compressor
(package private) static final int FLAGS_BITS
private FieldsIndexWriter indexWriter
private final BytesRef lastTerm
private int[] lengthsBuf
private final int maxDocsPerChunk
(package private) static final int META_VERSION_START
private IndexOutput metaStream
private long numChunks
private long numDirtyChunks
private long numDirtyDocs
private int numDocs
(package private) static final int OFFSETS
(package private) static final int PACKED_BLOCK_SIZE
private final ByteBuffersDataOutput payloadBytes
private int[] payloadLengthsBuf
(package private) static final int PAYLOADS
private final Deque<Lucene90CompressingTermVectorsWriter.DocData> pendingDocs
(package private) static final int POSITIONS
private int[] positionsBuf
private final ByteBuffersDataOutput scratchBuffer
private final String segment
private int[] startOffsetsBuf
private final ByteBuffersDataOutput termSuffixes
(package private) static final String VECTORS_EXTENSION
(package private) static final String VECTORS_INDEX_CODEC_NAME
(package private) static final String VECTORS_INDEX_EXTENSION
(package private) static final String VECTORS_META_EXTENSION
private IndexOutput vectorsStream
(package private) static final int VERSION_CURRENT
(package private) static final int VERSION_START
private final BlockPackedWriter writer
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
Constructors
Constructor, Description
Lucene90CompressingTermVectorsWriter(Directory directory, SegmentInfo si, String segmentSuffix, IOContext context, String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockShift)
    Sole constructor.
-
Method Summary
Modifier and Type, Method, Description
addDocData(int numVectorFields)
void addPosition(int position, int startOffset, int endOffset, BytesRef payload)
    Adds a term position and offsets.
void addProx
    Called by IndexWriter when writing new segments.
private boolean canPerformBulkMerge(MergeState mergeState, MatchingReaders matchingReaders, int readerIndex)
void close()
private void copyChunks(MergeState mergeState, Lucene90CompressingTermVectorsWriter.CompressingTermVectorsSub sub, int fromDocID, int toDocID)
void finish(int numDocs)
    Called before TermVectorsWriter.close(), passing in the number of documents that were written.
void finishDocument
    Called after a doc and all its fields have been added.
void finishField
    Called after a field and all its terms have been added.
private void flush(boolean force)
private int[] flushFieldNums
    Returns a sorted array containing unique field numbers.
private void flushFields(int totalFields, int[] fieldNums)
private void flushFlags(int totalFields, int[] fieldNums)
private int flushNumFields(int chunkDocs)
private void flushNumTerms(int totalFields)
private void flushOffsets(int[] fieldNums)
private void flushPayloadLengths
private void flushPositions
private void flushTermFreqs
private void flushTermLengths
getChildResources
    Returns nested resources of this class.
int merge(MergeState mergeState)
    Merges in the term vectors from the readers in mergeState.
long ramBytesUsed()
    Return the memory usage of this object in bytes.
void startDocument(int numVectorFields)
    Called before writing the term vectors of the document.
void startField(FieldInfo info, int numTerms, boolean positions, boolean offsets, boolean payloads)
    Called before writing the terms of the field.
void startTerm(BytesRef, int)
    Adds a term and its term frequency freq.
(package private) boolean tooDirty(Lucene90CompressingTermVectorsReader candidate)
    Returns true if we should recompress this reader, even though we could bulk merge compressed data.
private boolean triggerFlush()
Methods inherited from class org.apache.lucene.codecs.TermVectorsWriter
addAllDocVectors, finishTerm
-
Field Details
-
VECTORS_EXTENSION
-
VECTORS_INDEX_EXTENSION
-
VECTORS_META_EXTENSION
-
VECTORS_INDEX_CODEC_NAME
-
VERSION_START
static final int VERSION_START
-
VERSION_CURRENT
static final int VERSION_CURRENT
-
META_VERSION_START
static final int META_VERSION_START
-
PACKED_BLOCK_SIZE
static final int PACKED_BLOCK_SIZE
-
POSITIONS
static final int POSITIONS
-
OFFSETS
static final int OFFSETS
-
PAYLOADS
static final int PAYLOADS
-
FLAGS_BITS
static final int FLAGS_BITS
-
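The POSITIONS, OFFSETS, PAYLOADS and FLAGS_BITS constants suggest a per-field bitmask built from the three booleans passed to startField. The following is a minimal sketch under that assumption. The numeric values, the class name TermVectorFlagsSketch and the helper methods are illustrative assumptions, since this page does not list the constant values.

```java
// Illustrative sketch only: the constant values below are assumptions,
// not values documented on this page.
final class TermVectorFlagsSketch {
    static final int POSITIONS = 0x01; // assumed bit for positions
    static final int OFFSETS = 0x02;   // assumed bit for offsets
    static final int PAYLOADS = 0x04;  // assumed bit for payloads

    // Number of bits needed to store any combination of the three flags.
    static final int FLAGS_BITS =
        32 - Integer.numberOfLeadingZeros(POSITIONS | OFFSETS | PAYLOADS);

    // Packs the three per-field booleans into one small integer.
    static int encode(boolean positions, boolean offsets, boolean payloads) {
        return (positions ? POSITIONS : 0)
            | (offsets ? OFFSETS : 0)
            | (payloads ? PAYLOADS : 0);
    }

    // Tests a single flag in a packed value.
    static boolean has(int flags, int flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        int flags = encode(true, false, true);
        System.out.println("flags=" + flags + " bits=" + FLAGS_BITS);
        System.out.println("offsets enabled: " + has(flags, OFFSETS));
    }
}
```

Packing the flags this way keeps the per-field metadata small, which is presumably why a bit count is exposed alongside the flag constants.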
segment
-
indexWriter
-
metaStream
-
vectorsStream
-
compressionMode
-
compressor
-
chunkSize
private final int chunkSize
-
numChunks
private long numChunks
-
numDirtyChunks
private long numDirtyChunks
-
numDirtyDocs
private long numDirtyDocs
-
numDocs
private int numDocs
-
pendingDocs
-
curDoc
-
curField
-
lastTerm
-
positionsBuf
private int[] positionsBuf
-
startOffsetsBuf
private int[] startOffsetsBuf
-
lengthsBuf
private int[] lengthsBuf
-
payloadLengthsBuf
private int[] payloadLengthsBuf
-
termSuffixes
-
payloadBytes
-
writer
-
maxDocsPerChunk
private final int maxDocsPerChunk
-
scratchBuffer
-
BULK_MERGE_ENABLED_SYSPROP
-
BULK_MERGE_ENABLED
static final boolean BULK_MERGE_ENABLED
-
-
Constructor Details
-
Lucene90CompressingTermVectorsWriter
Lucene90CompressingTermVectorsWriter(Directory directory, SegmentInfo si, String segmentSuffix, IOContext context, String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockShift) throws IOException
Sole constructor.
- Throws:
IOException
-
-
Method Details
-
addDocData
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in class TermVectorsWriter
- Throws:
IOException
-
startDocument
Description copied from class: TermVectorsWriter
Called before writing the term vectors of the document. TermVectorsWriter.startField(FieldInfo, int, boolean, boolean, boolean) will be called numVectorFields times. Note that if term vectors are enabled, this is called even if the document has no vector fields; in this case numVectorFields will be zero.
- Specified by:
startDocument in class TermVectorsWriter
- Throws:
IOException
-
finishDocument
Description copied from class: TermVectorsWriter
Called after a doc and all its fields have been added.
- Overrides:
finishDocument in class TermVectorsWriter
- Throws:
IOException
-
startField
public void startField(FieldInfo info, int numTerms, boolean positions, boolean offsets, boolean payloads) throws IOException
Description copied from class: TermVectorsWriter
Called before writing the terms of the field. TermVectorsWriter.startTerm(BytesRef, int) will be called numTerms times.
- Specified by:
startField in class TermVectorsWriter
- Throws:
IOException
-
finishField
Description copied from class: TermVectorsWriter
Called after a field and all its terms have been added.
- Overrides:
finishField in class TermVectorsWriter
- Throws:
IOException
-
startTerm
Description copied from class: TermVectorsWriter
Adds a term and its term frequency freq. If this field has positions and/or offsets enabled, then TermVectorsWriter.addPosition(int, int, int, BytesRef) will be called freq times.
- Specified by:
startTerm in class TermVectorsWriter
- Throws:
IOException
-
addPosition
public void addPosition(int position, int startOffset, int endOffset, BytesRef payload) throws IOException
Description copied from class: TermVectorsWriter
Adds a term position and offsets.
- Specified by:
addPosition in class TermVectorsWriter
- Throws:
IOException
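The callbacks described above (startDocument, startField, startTerm, addPosition, then the matching finish calls) follow a nested order. The following is a minimal sketch of that order using a hypothetical recorder class, CallOrderSketch, that only logs the calls. It does not use Lucene types, and its simplified parameter lists are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder: mirrors the callback names on this page but only logs them.
final class CallOrderSketch {
    final List<String> calls = new ArrayList<>();

    void startDocument(int numVectorFields) { calls.add("startDocument(" + numVectorFields + ")"); }
    void startField(String field, int numTerms) { calls.add("startField(" + field + ", " + numTerms + ")"); }
    void startTerm(String term, int freq) { calls.add("startTerm(" + term + ", " + freq + ")"); }
    void addPosition(int position) { calls.add("addPosition(" + position + ")"); }
    void finishTerm() { calls.add("finishTerm()"); }
    void finishField() { calls.add("finishField()"); }
    void finishDocument() { calls.add("finishDocument()"); }
    void finish(int numDocs) { calls.add("finish(" + numDocs + ")"); }

    // Writes one document with a single field holding two terms.
    static List<String> writeOneDocument() {
        CallOrderSketch w = new CallOrderSketch();
        String[] terms = {"apple", "pear"};
        int[][] positions = {{0, 3}, {1}}; // positions[i].length is the frequency of terms[i]

        w.startDocument(1);                 // one vector field follows
        w.startField("body", terms.length); // startTerm is then called numTerms times
        for (int i = 0; i < terms.length; i++) {
            w.startTerm(terms[i], positions[i].length);
            for (int p : positions[i]) {
                w.addPosition(p);           // called freq times per term
            }
            w.finishTerm();
        }
        w.finishField();
        w.finishDocument();
        w.finish(1);                        // total number of documents written
        return w.calls;
    }

    public static void main(String[] args) {
        writeOneDocument().forEach(System.out::println);
    }
}
```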
-
triggerFlush
private boolean triggerFlush()
-
flush
- Throws:
IOException
-
flushNumFields
- Throws:
IOException
-
flushFieldNums
Returns a sorted array containing unique field numbers.
- Throws:
IOException
-
flushFields
- Throws:
IOException
-
flushFlags
- Throws:
IOException
-
flushNumTerms
- Throws:
IOException
-
flushTermLengths
- Throws:
IOException
-
flushTermFreqs
- Throws:
IOException
-
flushPositions
- Throws:
IOException
-
flushOffsets
- Throws:
IOException
-
flushPayloadLengths
- Throws:
IOException
-
finish
Description copied from class: TermVectorsWriter
Called before TermVectorsWriter.close(), passing in the number of documents that were written. Note that this is intentionally redundant (equivalent to the number of calls to TermVectorsWriter.startDocument(int)), but a Codec should check that this is the case to detect the JRE bug described in LUCENE-1282.
- Specified by:
finish in class TermVectorsWriter
- Throws:
IOException
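The redundancy check described for finish can be sketched as a counter compared against the argument. This is a minimal illustration with a hypothetical class, DocCountCheckSketch. The exception type and message are assumptions, not the behavior documented for this writer.

```java
// Hypothetical writer that only tracks the document count.
final class DocCountCheckSketch {
    private int numDocs; // incremented once per startDocument call

    void startDocument(int numVectorFields) {
        numDocs++;
    }

    // Fails loudly if the caller's count disagrees with the calls observed.
    void finish(int expectedDocs) {
        if (expectedDocs != numDocs) {
            throw new IllegalStateException(
                "Wrote " + numDocs + " docs, finish called with numDocs=" + expectedDocs);
        }
    }

    public static void main(String[] args) {
        DocCountCheckSketch w = new DocCountCheckSketch();
        w.startDocument(0); // a document with no vector fields still counts
        w.startDocument(2);
        w.finish(2);        // passes: two documents were started
        System.out.println("document count verified");
    }
}
```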
-
addProx
Description copied from class: TermVectorsWriter
Called by IndexWriter when writing new segments.
This is an expert API that allows the codec to consume positions and offsets directly from the indexer.
The default implementation calls TermVectorsWriter.addPosition(int, int, int, BytesRef), but subclasses can override this if they want to efficiently write all the positions, then all the offsets, for example.
NOTE: This API is extremely expert and subject to change or removal!!!
- Overrides:
addProx in class TermVectorsWriter
- Throws:
IOException
-
copyChunks
private void copyChunks(MergeState mergeState, Lucene90CompressingTermVectorsWriter.CompressingTermVectorsSub sub, int fromDocID, int toDocID) throws IOException
- Throws:
IOException
-
merge
Description copied from class: TermVectorsWriter
Merges in the term vectors from the readers in mergeState. The default implementation skips over deleted documents, and uses TermVectorsWriter.startDocument(int), TermVectorsWriter.startField(FieldInfo, int, boolean, boolean, boolean), TermVectorsWriter.startTerm(BytesRef, int), TermVectorsWriter.addPosition(int, int, int, BytesRef), and TermVectorsWriter.finish(int), returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc).
- Overrides:
merge in class TermVectorsWriter
- Throws:
IOException
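The default merge behavior described above (skip deleted documents, re-add the rest, return the count) can be sketched without Lucene types. In this illustration the Segment class, its live array and the string payloads are hypothetical stand-ins for the readers in mergeState. It shows only the document-by-document path, not bulk copying.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NaiveMergeSketch {
    // Hypothetical stand-in for one segment reader.
    static final class Segment {
        final String[] docs;  // one opaque term-vector payload per document
        final boolean[] live; // false marks a deleted document

        Segment(String[] docs, boolean[] live) {
            this.docs = docs;
            this.live = live;
        }
    }

    // Copies live documents in order and returns how many were written.
    static int merge(List<Segment> segments, List<String> out) {
        int written = 0;
        for (Segment s : segments) {
            for (int doc = 0; doc < s.docs.length; doc++) {
                if (!s.live[doc]) {
                    continue; // deleted documents are skipped
                }
                out.add(s.docs[doc]);
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        Segment a = new Segment(new String[] {"a0", "a1"}, new boolean[] {true, false});
        Segment b = new Segment(new String[] {"b0"}, new boolean[] {true});
        List<String> out = new ArrayList<>();
        int written = merge(Arrays.asList(a, b), out);
        System.out.println(written + " docs merged: " + out);
    }
}
```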
-
tooDirty
Returns true if we should recompress this reader, even though we could bulk merge compressed data.
The last chunk written for a segment is typically incomplete, so without recompressing, in some worst-case situations (e.g. frequent reopen with tiny flushes), the compression ratio can degrade over time. This is a safety switch.
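One way to read this safety switch is as a threshold on incomplete ("dirty") chunks, using counters like numDirtyChunks, numDirtyDocs and numDocs listed among the fields. The following is a minimal sketch, assuming two illustrative limits: a hard cap on dirty chunks and a cap on the share of documents stored in dirty chunks. Both thresholds and the class name DirtyChunkSketch are assumptions, not values stated on this page.

```java
// Illustrative heuristic only: the thresholds are assumptions.
final class DirtyChunkSketch {
    static final long MAX_DIRTY_CHUNKS = 1024;  // assumed hard limit
    static final long MAX_DIRTY_DOC_PERCENT = 1; // assumed share of docs in dirty chunks

    // Returns true when recompressing looks preferable to bulk copying.
    static boolean tooDirty(long numDirtyChunks, long numDirtyDocs, long numDocs) {
        return numDirtyChunks > MAX_DIRTY_CHUNKS
            || numDirtyDocs * 100 > numDocs * MAX_DIRTY_DOC_PERCENT;
    }

    public static void main(String[] args) {
        System.out.println(tooDirty(1, 5, 1000));  // small dirty share
        System.out.println(tooDirty(1, 50, 1000)); // large dirty share
    }
}
```

A check of this kind trades some merge speed for compression ratio: segments that pass are copied as compressed bytes, while segments that fail are rewritten so their partial chunks are consolidated.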
-
canPerformBulkMerge
private boolean canPerformBulkMerge(MergeState mergeState, MatchingReaders matchingReaders, int readerIndex)
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
-
getChildResources
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
-