java.lang.Object
org.apache.lucene.util.fst.Util
Static helper methods.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic classRepresents a path in TopNSearcher.static final classHolds a single input (IntsRef) + output, returned byshortestPaths().private static classCompares first by the provided comparator, and then tie breaks by path.input.static classUtility class to find top N shortest paths from start point(s).static final classHolds the results for a top N search usingUtil.TopNSearcher -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescription(package private) static <T> intbinarySearch(FST<T> fst, FST.Arc<T> arc, int targetLabel) Perform a binary search of Arcs encoded as a packed arrayprivate static voidEmit a single state in thedotlanguage.static <T> TLooks up the output for this input, or null if the input is not acceptedstatic <T> TLooks up the output for this input, or null if the input is not accepted.private static StringprintableLabel(int label) Ensures an arc's label is indeed printable (dot uses US-ASCII).static <T> FST.Arc<T> readCeilArc(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) Reads the first arc greater or equal than the given label into the provided arc in place and returns it iff found, otherwise returnnull.static <T> Util.TopResults<T> shortestPaths(FST<T> fst, FST.Arc<T> fromNode, T startOutput, Comparator<T> comparator, int topN, boolean allowEmptyString) Starting from node, find the top N min cost completions to a final node.static BytesReftoBytesRef(IntsRef input, BytesRefBuilder scratch) Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.static <T> voidDumps anFSTto a GraphViz'sdotlanguage description for visualization.static IntsReftoIntsRef(BytesRef input, IntsRefBuilder scratch) Just takes unsigned byte values from the BytesRef and converts into an IntsRef.static IntsReftoUTF16(CharSequence s, IntsRefBuilder scratch) Just maps each UTF16 unit (char) to the ints in an IntsRef.static IntsReftoUTF32(char[] s, int offset, int length, IntsRefBuilder scratch) Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.static IntsReftoUTF32(CharSequence s, IntsRefBuilder scratch) Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
-
Constructor Details
-
Util
private Util()
-
-
Method Details
-
get
Looks up the output for this input, or null if the input is not accepted.- Throws:
IOException
-
get
Looks up the output for this input, or null if the input is not accepted- Throws:
IOException
-
shortestPaths
public static <T> Util.TopResults<T> shortestPaths(FST<T> fst, FST.Arc<T> fromNode, T startOutput, Comparator<T> comparator, int topN, boolean allowEmptyString) throws IOException Starting from node, find the top N min cost completions to a final node.- Throws:
IOException
-
toDot
public static <T> void toDot(FST<T> fst, Writer out, boolean sameRank, boolean labelStates) throws IOException Dumps anFSTto a GraphViz'sdotlanguage description for visualization. Example of use:PrintWriter pw = new PrintWriter("out.dot"); Util.toDot(fst, pw, true, true); pw.close();and then, from command line:dot -Tpng -o out.png out.dot
Note: larger FSTs (a few thousand nodes) won't even render, don't bother.
- Parameters:
sameRank- Iftrue, the resultingdotfile will try to order states in layers of breadth-first traversal. This may mess up arcs, but makes the output FST's structure a bit clearer.labelStates- Iftruestates will have labels equal to their offsets in their binary format. Expands the graph considerably.- Throws:
IOException- See Also:
-
emitDotState
private static void emitDotState(Writer out, String name, String shape, String color, String label) throws IOException Emit a single state in thedotlanguage.- Throws:
IOException
-
printableLabel
Ensures an arc's label is indeed printable (dot uses US-ASCII). -
toUTF16
Just maps each UTF16 unit (char) to the ints in an IntsRef. -
toUTF32
Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it. -
toUTF32
Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it. -
toIntsRef
Just takes unsigned byte values from the BytesRef and converts into an IntsRef. -
toBytesRef
Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte. -
readCeilArc
public static <T> FST.Arc<T> readCeilArc(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException Reads the first arc greater or equal than the given label into the provided arc in place and returns it iff found, otherwise returnnull.- Parameters:
label- the label to ceil onfst- the fst to operate onfollow- the arc to follow reading the label fromarc- the arc to read into in placein- the fst'sFST.BytesReader- Throws:
IOException
-
binarySearch
Perform a binary search of Arcs encoded as a packed array- Type Parameters:
T- the output type of the FST- Parameters:
fst- the FST from which to readarc- the starting arc; sibling arcs greater than this will be searched. Usually the first arc in the array.targetLabel- the label to search for- Returns:
- the index of the Arc having the target label, or if no Arc has the matching label,
-1 - idx), whereidxis the index of the Arc with the next highest label, or the total number of arcs if the target label exceeds the maximum. - Throws:
IOException- when the FST reader does
-