Class STUniformSplitTermsWriter
- All Implemented Interfaces:
Closeable, AutoCloseable
Extends UniformSplitTermsWriter by sharing the terms of all fields in a single dictionary
and by writing all the fields of a term in the same block line.
The block file contains the term blocks for all fields. Each block line, for a single term, may hold the TermState of multiple fields. The block file also contains the fields metadata at the end of the file.
The dictionary file contains a single trie (FST bytes) for all fields.
This structure is well suited to indexes with many fields. In that case the shared-terms dictionary trie is much smaller than the combined per-field dictionaries.
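As a rough illustration of why sharing helps when many fields contain the same terms, the following stand-alone sketch compares the number of dictionary entries in a per-field layout with the number of lines in a shared layout. It does not use any Lucene classes: `SharedTermsLayoutSketch`, `perFieldEntries` and `sharedLines` are hypothetical names, plain sorted maps stand in for the block file and the FST dictionary, and an `Integer` count stands in for a per-field TermState.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative model only: sorted maps stand in for the real block file and FST dictionary. */
final class SharedTermsLayoutSketch {

  /** Per-field layout: one dictionary per field, so a term is stored once per field containing it. */
  static int perFieldEntries(Map<String, List<String>> termsByField) {
    int entries = 0;
    for (List<String> terms : termsByField.values()) {
      entries += new TreeSet<String>(terms).size();
    }
    return entries;
  }

  /** Shared layout: one line per distinct term, holding one state per field containing the term. */
  static TreeMap<String, TreeMap<String, Integer>> sharedLines(
      Map<String, List<String>> termsByField) {
    TreeMap<String, TreeMap<String, Integer>> lines = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : termsByField.entrySet()) {
      for (String term : entry.getValue()) {
        // The Integer is a placeholder for a per-field TermState (here: an occurrence count).
        lines
            .computeIfAbsent(term, t -> new TreeMap<String, Integer>())
            .merge(entry.getKey(), 1, Integer::sum);
      }
    }
    return lines;
  }

  public static void main(String[] args) {
    Map<String, List<String>> termsByField = new TreeMap<>();
    termsByField.put("title", Arrays.asList("lucene", "index"));
    termsByField.put("body", Arrays.asList("lucene", "index", "merge"));
    termsByField.put("tags", Arrays.asList("lucene"));
    System.out.println("per-field dictionary entries: " + perFieldEntries(termsByField));
    System.out.println("shared block lines: " + sharedLines(termsByField));
  }
}
```

In this toy model the shared layout stores each distinct term once, however many fields contain it, which is the property the shared-terms dictionary relies on.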
This FieldsConsumer requires a custom merge(MergeState, NormsProducer) method for efficiency. The regular merge would scan all the fields sequentially, which internally would scan the whole shared-terms dictionary as many times as there are fields. The custom merge instead scans the shared-terms dictionaries of all the segments to merge directly, so each dictionary is scanned only once, whatever the number of fields.
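The single-pass idea can be sketched with a priority queue of per-segment cursors: the cursors positioned on the same smallest term are grouped, their per-field postings are combined, and then only those cursors advance. This is a simplified stand-alone model, not the class's actual implementation; `SharedTermsMergeSketch` and `SegmentCursor` are hypothetical names, each segment is modelled as a sorted map from term to per-field document counts, and deleted documents and real postings are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Illustrative model only: each segment is a sorted map term -> (field -> doc count). */
final class SharedTermsMergeSketch {

  /** Cursor over one segment's shared-terms dictionary. */
  static final class SegmentCursor {
    private final Iterator<Map.Entry<String, Map<String, Integer>>> iterator;
    Map.Entry<String, Map<String, Integer>> current;

    SegmentCursor(TreeMap<String, Map<String, Integer>> segment) {
      this.iterator = segment.entrySet().iterator();
    }

    /** Moves to the next term; returns false when the segment is exhausted. */
    boolean next() {
      current = iterator.hasNext() ? iterator.next() : null;
      return current != null;
    }
  }

  /** Merges all segments while reading each segment dictionary exactly once. */
  static TreeMap<String, Map<String, Integer>> merge(
      List<TreeMap<String, Map<String, Integer>>> segments) {
    PriorityQueue<SegmentCursor> queue =
        new PriorityQueue<>(Comparator.comparing((SegmentCursor c) -> c.current.getKey()));
    for (TreeMap<String, Map<String, Integer>> segment : segments) {
      SegmentCursor cursor = new SegmentCursor(segment);
      if (cursor.next()) {
        queue.add(cursor);
      }
    }
    TreeMap<String, Map<String, Integer>> merged = new TreeMap<>();
    List<SegmentCursor> group = new ArrayList<>();
    while (!queue.isEmpty()) {
      // Group every cursor currently positioned on the smallest term.
      String term = queue.peek().current.getKey();
      group.clear();
      while (!queue.isEmpty() && queue.peek().current.getKey().equals(term)) {
        group.add(queue.poll());
      }
      // Combine the per-field entries of that term across the grouped segments.
      Map<String, Integer> fieldStates = new TreeMap<>();
      for (SegmentCursor cursor : group) {
        cursor.current.getValue().forEach((field, docs) -> fieldStates.merge(field, docs, Integer::sum));
      }
      merged.put(term, fieldStates);
      // Advance only the cursors that were consumed, then put them back in the queue.
      for (SegmentCursor cursor : group) {
        if (cursor.next()) {
          queue.add(cursor);
        }
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    TreeMap<String, Map<String, Integer>> first = new TreeMap<>();
    first.put("apple", Collections.singletonMap("title", 1));
    first.put("cat", Collections.singletonMap("body", 2));
    TreeMap<String, Map<String, Integer>> second = new TreeMap<>();
    second.put("apple", Collections.singletonMap("title", 2));
    second.put("dog", Collections.singletonMap("title", 1));
    System.out.println(merge(Arrays.asList(first, second)));
  }
}
```

Because a cursor leaves the queue before it advances, the ordering key of a queued cursor never changes, and the work is proportional to the total number of term entries rather than to that number multiplied by the number of fields.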
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
private static class
private class
private class
(package private) final class
private class
private static interface
private class
private class
-
Field Summary
Fields inherited from class org.apache.lucene.codecs.uniformsplit.UniformSplitTermsWriter
blockEncoder, blockOutput, DEFAULT_DELTA_NUM_LINES, DEFAULT_TARGET_NUM_BLOCK_LINES, deltaNumLines, dictionaryOutput, fieldInfos, fieldMetadataWriter, MAX_NUM_BLOCK_LINES, maxDoc, postingsWriter, targetNumBlockLines -
Constructor Summary
Constructors
Modifier, Constructor, Description
STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder)
protected STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder, FieldMetadata.Serializer fieldMetadataWriter, String codecName, int versionCurrent, String termsBlocksExtension, String dictionaryExtension)
STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, BlockEncoder blockEncoder)
-
Method Summary
Modifier and Type, Method, Description
private void combinePostingsPerField(BytesRef term, Map<String, STUniformSplitTermsWriter.MergingFieldTerms> fieldTermsMap, Map<String, List<STUniformSplitTermsWriter.SegmentPostings>> fieldPostingsMap, List<STUniformSplitTermsWriter.MergingFieldTerms> groupedFieldTerms)
private void combineSegmentsFields(List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> groupedSegmentTerms, Map<String, List<STUniformSplitTermsWriter.SegmentPostings>> fieldPostingsMap)
private List<FieldMetadata> createFieldMetadataList(Iterator<FieldInfo> fieldInfos, int maxDoc)
private STUniformSplitTermsWriter.TermIteratorQueue<STUniformSplitTermsWriter.FieldTerms> createFieldTermsQueue(Fields fields, List<FieldMetadata> fieldMetadataList)
private Map<String,STUniformSplitTermsWriter.MergingFieldTerms> createMergingFieldTermsMap(List<FieldMetadata> fieldMetadataList, int numSegments)
private STUniformSplitTermsWriter.TermIteratorQueue<STUniformSplitTermsWriter.SegmentTerms> createSegmentTermsQueue(List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> segmentTermsList)
private <T> void groupByTerm(STUniformSplitTermsWriter.TermIteratorQueue<T> termIteratorQueue, STUniformSplitTermsWriter.TermIterator<T> topTermIterator, List<STUniformSplitTermsWriter.TermIterator<T>> groupedTermIterators)
void merge(MergeState mergeState, NormsProducer normsProducer)
  Merges in the fields from the readers in mergeState.
private Collection<FieldMetadata> mergeSegments(MergeState mergeState, NormsProducer normsProducer, List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> segmentTermsList, STBlockWriter blockWriter, IndexDictionary.Builder dictionaryBuilder)
private <T> void nextTermForIterators(List<? extends STUniformSplitTermsWriter.TermIterator<T>> termIterators, STUniformSplitTermsWriter.TermIteratorQueue<T> termIteratorQueue)
void write(Fields fields, NormsProducer normsProducer)
  Write all fields, terms and postings.
protected void writeDictionary(int fieldsNumber, IndexDictionary.Builder dictionaryBuilder)
private int writeFieldMetadataList(Collection<FieldMetadata> fieldMetadataList)
private void writePostingLines(BytesRef term, List<? extends STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.FieldTerms>> groupedFieldTerms, NormsProducer normsProducer, List<FieldMetadataTermState> termStates)
private void writeSegment(STUniformSplitTermsWriter.SharedTermsWriter termsWriter)
  Writes the new segment with the provided STUniformSplitTermsWriter.SharedTermsWriter, which can be either a single segment writer, or a multiple segment merging writer.
private Collection<FieldMetadata> writeSingleSegment(Fields fields, NormsProducer normsProducer, STBlockWriter blockWriter, IndexDictionary.Builder dictionaryBuilder)
Methods inherited from class org.apache.lucene.codecs.uniformsplit.UniformSplitTermsWriter
close, validateSettings, writeDictionary, writeEncodedFieldsMetadata, writeFieldsMetadata, writeFieldTerms, writePostingLine, writeUnencodedFieldsMetadata
-
Constructor Details
-
STUniformSplitTermsWriter
public STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, BlockEncoder blockEncoder) throws IOException
- Throws:
IOException
-
STUniformSplitTermsWriter
public STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder) throws IOException
- Throws:
IOException
-
STUniformSplitTermsWriter
protected STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder, FieldMetadata.Serializer fieldMetadataWriter, String codecName, int versionCurrent, String termsBlocksExtension, String dictionaryExtension) throws IOException
- Throws:
IOException
-
-
Method Details
-
write
public void write(Fields fields, NormsProducer normsProducer) throws IOException
Description copied from class: FieldsConsumer
Write all fields, terms and postings. This is the "pull" API, allowing you to iterate more than once over the postings, somewhat analogous to using a DOM API to traverse an XML tree.
Notes:
- You must compute index statistics, including each Term's docFreq and totalTermFreq, as well as the summary sumTotalTermFreq, sumDocFreq and docCount.
- You must skip terms that have no docs and fields that have no terms, even though the provided Fields API will expose them; this typically requires lazily writing the field or term until you've actually seen the first term or document.
- The provided Fields instance is limited: you cannot call any methods that return statistics/counts; you cannot pass a non-null live docs when pulling docs/positions enums.
- Overrides:
write in class UniformSplitTermsWriter
- Throws:
IOException
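The notes above can be illustrated with a small stand-alone sketch of a pull-style writer that computes the statistics itself, skips terms without documents, and creates a field's output lazily so that fields without terms are never written. It is a simplified model under stated assumptions, not the Lucene API: `PullWriteSketch` and `FieldStats` are hypothetical names, and the input is modelled as nested maps (field, then term, then document ID to term frequency) instead of a Fields instance.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative model only; none of these types are Lucene classes. */
final class PullWriteSketch {

  /** Per-field summary statistics that the writer has to compute itself. */
  static final class FieldStats {
    final String field;
    int numTerms;
    long sumDocFreq;
    long sumTotalTermFreq;
    final BitSet docsSeen = new BitSet();

    FieldStats(String field) {
      this.field = field;
    }

    /** Number of distinct documents that have at least one term in this field. */
    int docCount() {
      return docsSeen.cardinality();
    }
  }

  /** Input model: field -> term -> (docId -> term frequency in that document). */
  static List<FieldStats> write(Map<String, Map<String, Map<Integer, Integer>>> fields) {
    List<FieldStats> written = new ArrayList<>();
    for (Map.Entry<String, Map<String, Map<Integer, Integer>>> fieldEntry : fields.entrySet()) {
      FieldStats stats = null; // created lazily, once a term with a document is seen
      for (Map.Entry<String, Map<Integer, Integer>> termEntry : fieldEntry.getValue().entrySet()) {
        int docFreq = 0;
        long totalTermFreq = 0;
        for (Map.Entry<Integer, Integer> posting : termEntry.getValue().entrySet()) {
          if (stats == null) {
            stats = new FieldStats(fieldEntry.getKey());
          }
          docFreq++;
          totalTermFreq += posting.getValue();
          stats.docsSeen.set(posting.getKey());
        }
        if (docFreq == 0) {
          continue; // term exposed by the input but without documents: skip it
        }
        stats.numTerms++;
        stats.sumDocFreq += docFreq;
        stats.sumTotalTermFreq += totalTermFreq;
      }
      if (stats != null) {
        written.add(stats); // fields that produced no term are never written
      }
    }
    return written;
  }

  public static void main(String[] args) {
    Map<Integer, Integer> postings = new TreeMap<>();
    postings.put(0, 2);
    postings.put(3, 1);
    Map<String, Map<Integer, Integer>> body = new TreeMap<>();
    body.put("lucene", postings);
    body.put("ghost", new TreeMap<Integer, Integer>()); // term with no documents
    Map<String, Map<String, Map<Integer, Integer>>> fields = new TreeMap<>();
    fields.put("body", body);
    fields.put("empty", new TreeMap<String, Map<Integer, Integer>>()); // field with no terms
    for (FieldStats stats : write(fields)) {
      System.out.println(stats.field + " terms=" + stats.numTerms + " docCount=" + stats.docCount());
    }
  }
}
```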
-
createFieldMetadataList
private List<FieldMetadata> createFieldMetadataList(Iterator<FieldInfo> fieldInfos, int maxDoc)
-
createFieldTermsQueue
private STUniformSplitTermsWriter.TermIteratorQueue<STUniformSplitTermsWriter.FieldTerms> createFieldTermsQueue(Fields fields, List<FieldMetadata> fieldMetadataList) throws IOException
- Throws:
IOException
-
writePostingLines
private void writePostingLines(BytesRef term, List<? extends STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.FieldTerms>> groupedFieldTerms, NormsProducer normsProducer, List<FieldMetadataTermState> termStates) throws IOException
- Throws:
IOException
-
writeFieldMetadataList
private int writeFieldMetadataList(Collection<FieldMetadata> fieldMetadataList) throws IOException
- Throws:
IOException
-
writeDictionary
protected void writeDictionary(int fieldsNumber, IndexDictionary.Builder dictionaryBuilder) throws IOException
- Throws:
IOException
-
merge
public void merge(MergeState mergeState, NormsProducer normsProducer) throws IOException
Description copied from class: FieldsConsumer
Merges in the fields from the readers in mergeState. The default implementation skips and maps around deleted documents, and calls FieldsConsumer.write(Fields,NormsProducer). Implementations can override this method for more sophisticated merging (bulk-byte copying, etc).
- Overrides:
merge in class FieldsConsumer
- Throws:
IOException
-
createMergingFieldTermsMap
private Map<String,STUniformSplitTermsWriter.MergingFieldTerms> createMergingFieldTermsMap(List<FieldMetadata> fieldMetadataList, int numSegments)
-
createSegmentTermsQueue
private STUniformSplitTermsWriter.TermIteratorQueue<STUniformSplitTermsWriter.SegmentTerms> createSegmentTermsQueue(List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> segmentTermsList) throws IOException
- Throws:
IOException
-
combineSegmentsFields
private void combineSegmentsFields(List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> groupedSegmentTerms, Map<String, List<STUniformSplitTermsWriter.SegmentPostings>> fieldPostingsMap)
-
combinePostingsPerField
private void combinePostingsPerField(BytesRef term, Map<String, STUniformSplitTermsWriter.MergingFieldTerms> fieldTermsMap, Map<String, List<STUniformSplitTermsWriter.SegmentPostings>> fieldPostingsMap, List<STUniformSplitTermsWriter.MergingFieldTerms> groupedFieldTerms)
-