Class MutableCodePointTrie

java.lang.Object
com.ibm.icu.util.CodePointMap
com.ibm.icu.util.MutableCodePointTrie
All Implemented Interfaces:
Cloneable, Iterable<CodePointMap.Range>

public final class MutableCodePointTrie extends CodePointMap implements Cloneable
Mutable Unicode code point trie. Fast map from Unicode code points (U+0000..U+10FFFF) to 32-bit integer values. For details see https://icu.unicode.org/design/struct/utrie

Setting values (especially ranges) and lookup is fast. The mutable trie is only somewhat space-efficient. It builds a compacted, immutable CodePointTrie.

This trie can be modified while iterating over its contents. For example, it is possible to merge its values with those from another set of ranges (e.g., another @{link CodePointMap}): Iterate over those source ranges; for each of them iterate over this trie; add the source value into the value of each trie range.

  • Field Details

    • MAX_UNICODE

      private static final int MAX_UNICODE
      See Also:
    • UNICODE_LIMIT

      private static final int UNICODE_LIMIT
      See Also:
    • BMP_LIMIT

      private static final int BMP_LIMIT
      See Also:
    • ASCII_LIMIT

      private static final int ASCII_LIMIT
      See Also:
    • I_LIMIT

      private static final int I_LIMIT
      See Also:
    • BMP_I_LIMIT

      private static final int BMP_I_LIMIT
      See Also:
    • ASCII_I_LIMIT

      private static final int ASCII_I_LIMIT
      See Also:
    • SMALL_DATA_BLOCKS_PER_BMP_BLOCK

      private static final int SMALL_DATA_BLOCKS_PER_BMP_BLOCK
      See Also:
    • ALL_SAME

      private static final byte ALL_SAME
      See Also:
    • MIXED

      private static final byte MIXED
      See Also:
    • SAME_AS

      private static final byte SAME_AS
      See Also:
    • INITIAL_DATA_LENGTH

      private static final int INITIAL_DATA_LENGTH
      Start with allocation of 16k data entries.
      See Also:
    • MEDIUM_DATA_LENGTH

      private static final int MEDIUM_DATA_LENGTH
      Grow about 8x each time.
      See Also:
    • MAX_DATA_LENGTH

      private static final int MAX_DATA_LENGTH
      Maximum length of the build-time data array. One entry per 0x110000 code points.
      See Also:
    • I3_NULL

      private static final byte I3_NULL
      See Also:
    • I3_BMP

      private static final byte I3_BMP
      See Also:
    • I3_16

      private static final byte I3_16
      See Also:
    • I3_18

      private static final byte I3_18
      See Also:
    • INDEX_3_18BIT_BLOCK_LENGTH

      private static final int INDEX_3_18BIT_BLOCK_LENGTH
      See Also:
    • index

      private int[] index
    • index3NullOffset

      private int index3NullOffset
    • data

      private int[] data
    • dataLength

      private int dataLength
    • dataNullOffset

      private int dataNullOffset
    • origInitialValue

      private int origInitialValue
    • initialValue

      private int initialValue
    • errorValue

      private int errorValue
    • highStart

      private int highStart
    • highValue

      private int highValue
    • index16

      private char[] index16
      Temporary array while building the final data.
    • flags

      private byte[] flags
  • Constructor Details

    • MutableCodePointTrie

      public MutableCodePointTrie(int initialValue, int errorValue)
      Constructs a mutable trie that initially maps each Unicode code point to the same value. It uses 32-bit data values until buildImmutable(com.ibm.icu.util.CodePointTrie.Type, com.ibm.icu.util.CodePointTrie.ValueWidth) is called. buildImmutable() takes a valueWidth parameter which determines the number of bits in the data value in the resulting CodePointTrie.
      Parameters:
      initialValue - the initial value that is set for all code points
      errorValue - the value for out-of-range code points and ill-formed UTF-8/16
  • Method Details

    • clone

      public MutableCodePointTrie clone()
      Clones this mutable trie.
      Overrides:
      clone in class Object
      Returns:
      the clone
    • fromCodePointMap

      public static MutableCodePointTrie fromCodePointMap(CodePointMap map)
      Creates a mutable trie with the same contents as the CodePointMap.
      Parameters:
      map - the source map or trie
      Returns:
      the mutable trie
    • clear

      private void clear()
    • get

      public int get(int c)
      Returns the value for a code point as stored in the map, with range checking. Returns an implementation-defined error value if c is not in the range 0..U+10FFFF.
      Specified by:
      get in class CodePointMap
      Parameters:
      c - the code point
      Returns:
      the map value, or an implementation-defined error value if the code point is not in the range 0..U+10FFFF
    • maybeFilterValue

      private static final int maybeFilterValue(int value, int initialValue, int nullValue, CodePointMap.ValueFilter filter)
    • getRange

      public boolean getRange(int start, CodePointMap.ValueFilter filter, CodePointMap.Range range)
      Sets the range object to a range of code points beginning with the start parameter. The range start is the same as the start input parameter (even if there are preceding code points that have the same value). The range end is the last code point such that all those from start to there have the same value. Returns false if start is not 0..U+10FFFF. Can be used to efficiently iterate over all same-value ranges in a map. (This is normally faster than iterating over code points and get()ting each value, but may be much slower than a data structure that stores ranges directly.)

      If the CodePointMap.ValueFilter parameter is not null, then the value to be delivered is passed through that filter, and the return value is the end of the range where all values are modified to the same actual value. The value is unchanged if that parameter is null.

      Example:

       int start = 0;
       CodePointMap.Range range = new CodePointMap.Range();
       while (map.getRange(start, null, range)) {
           int end = range.getEnd();
           int value = range.getValue();
           // Work with the range start..end and its value.
           start = end + 1;
       }
       

      The trie can be modified between calls to this function.

      Specified by:
      getRange in class CodePointMap
      Parameters:
      start - range start
      filter - an object that may modify the map data value, or null if the values from the map are to be used unmodified
      range - the range object that will be set to the code point range and value
      Returns:
      true if start is 0..U+10FFFF; otherwise no new range is fetched
    • writeBlock

      private void writeBlock(int block, int value)
    • set

      public void set(int c, int value)
      Sets a value for a code point.
      Parameters:
      c - the code point
      value - the value
    • fillBlock

      private void fillBlock(int block, int start, int limit, int value)
    • setRange

      public void setRange(int start, int end, int value)
      Sets a value for each code point [start..end]. Faster and more space-efficient than setting the value for each code point separately.
      Parameters:
      start - the first code point to get the value
      end - the last code point to get the value (inclusive)
      value - the value
    • buildImmutable

      public CodePointTrie buildImmutable(CodePointTrie.Type type, CodePointTrie.ValueWidth valueWidth)
      Compacts the data and builds an immutable CodePointTrie according to the parameters. After this, the mutable trie will be empty.

      The mutable trie stores 32-bit values until buildImmutable() is called. If values shorter than 32 bits are to be stored in the immutable trie, then the upper bits are discarded. For example, when the mutable trie contains values 0x81, -0x7f, and 0xa581, and the value width is 8 bits, then each of these is stored as 0x81 and the immutable trie will return that as an unsigned value. (Some implementations may want to make productive temporary use of the upper bits until buildImmutable() discards them.)

      Not every possible set of mappings can be built into a CodePointTrie, because of limitations resulting from speed and space optimizations. Every Unicode assigned character can be mapped to a unique value. Typical data yields data structures far smaller than the limitations.

      It is possible to construct extremely unusual mappings that exceed the data structure limits. In such a case this function will throw an exception.

      Parameters:
      type - selects the trie type
      valueWidth - selects the number of bits in a trie data value; if smaller than 32 bits, then the values stored in the trie will be truncated first
      See Also:
    • ensureHighStart

      private void ensureHighStart(int c)
    • allocDataBlock

      private int allocDataBlock(int blockLength)
    • getDataBlock

      private int getDataBlock(int i)
      No error checking for illegal arguments. The Java version always returns non-negative values.
    • maskValues

      private void maskValues(int mask)
    • equalBlocks

      private static boolean equalBlocks(int[] s, int si, int[] t, int ti, int length)
    • equalBlocks

      private static boolean equalBlocks(char[] s, int si, int[] t, int ti, int length)
    • equalBlocks

      private static boolean equalBlocks(char[] s, int si, char[] t, int ti, int length)
    • allValuesSameAs

      private static boolean allValuesSameAs(int[] p, int pi, int length, int value)
    • findSameBlock

      private static int findSameBlock(char[] p, int pStart, int length, char[] q, int qStart, int blockLength)
      Search for an identical block.
    • findAllSameBlock

      private static int findAllSameBlock(int[] p, int start, int limit, int value, int blockLength)
    • getOverlap

      private static int getOverlap(int[] p, int length, int[] q, int qStart, int blockLength)
      Look for maximum overlap of the beginning of the other block with the previous, adjacent block.
    • getOverlap

      private static int getOverlap(char[] p, int length, int[] q, int qStart, int blockLength)
    • getOverlap

      private static int getOverlap(char[] p, int length, char[] q, int qStart, int blockLength)
    • getAllSameOverlap

      private static int getAllSameOverlap(int[] p, int length, int value, int blockLength)
    • isStartOfSomeFastBlock

      private static boolean isStartOfSomeFastBlock(int dataOffset, int[] index, int fastILimit)
    • findHighStart

      private int findHighStart()
      Finds the start of the last range in the trie by enumerating backward. Indexes for code points higher than this will be omitted.
    • compactWholeDataBlocks

      private int compactWholeDataBlocks(int fastILimit, MutableCodePointTrie.AllSameBlocks allSameBlocks)
    • compactData

      private int compactData(int fastILimit, int[] newData, int dataNullIndex, MutableCodePointTrie.MixedBlocks mixedBlocks)
      Compacts a build-time trie. The compaction - removes blocks that are identical with earlier ones - overlaps each new non-duplicate block as much as possible with the previously-written one - works with fast-range data blocks whose length is a multiple of that of higher-code-point data blocks It does not try to find an optimal order of writing, deduplicating, and overlapping blocks.
    • compactIndex

      private int compactIndex(int fastILimit, MutableCodePointTrie.MixedBlocks mixedBlocks)
    • compactTrie

      private int compactTrie(int fastILimit)
    • build

      private CodePointTrie build(CodePointTrie.Type type, CodePointTrie.ValueWidth valueWidth)