Class UTF32Reader

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Readable

    public final class UTF32Reader
    extends Reader
    Since the JDK does not come with a UTF-32/UCS-4 decoder, this class implements a simple one.
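The core of such a decoder is combining four bytes into one code point, in the byte order selected by a flag such as isBigEndian. The following is a minimal sketch of that step only; the class name Utf32DecodeSketch and the method decode are hypothetical and are not part of this class.

```java
// Hypothetical helper, not part of UTF32Reader: illustrates the byte-to-code-point step.
final class Utf32DecodeSketch {

    /** Combines the four bytes starting at offset into a single 32-bit code point. */
    static int decode(byte[] buf, int offset, boolean bigEndian) {
        // Mask each byte to avoid sign extension when widening to int.
        int b0 = buf[offset] & 0xFF;
        int b1 = buf[offset + 1] & 0xFF;
        int b2 = buf[offset + 2] & 0xFF;
        int b3 = buf[offset + 3] & 0xFF;
        return bigEndian
                ? (b0 << 24) | (b1 << 16) | (b2 << 8) | b3
                : (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    public static void main(String[] args) {
        byte[] bigEndianBytes = {0x00, 0x01, (byte) 0xF6, 0x00};
        // prints 1f600
        System.out.println(Integer.toHexString(decode(bigEndianBytes, 0, true)));
    }
}
```

A real decoder would also have to reject values above 0x10FFFF and handle input whose length is not a multiple of four; the sketch leaves both out.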
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static char CONVERT_LSEP_TO
      In xml 1.1, LSEP behaves a bit like \n or \r.
      protected static char CONVERT_NEL_TO
      In xml 1.1, NEL (0x85) behaves much the way \n does (it can follow \r as part of the linefeed).
      protected boolean mBigEndian  
      protected byte[] mByteBuffer  
      protected int mByteBufferEnd
      Points to the end marker, that is, the position one after the last valid available byte.
      protected int mByteCount
      Total read byte count; used for error reporting purposes
      protected int mBytePtr
      Pointer to the next available byte (if any), iff less than mByteBufferEnd
      protected int mCharCount
      Total read character count; used for error reporting purposes
      protected ReaderConfig mConfig  
      protected char mSurrogate
      Although the input may use the full Unicode set, Java still uses 16-bit chars, so we may have to split high-order chars into surrogate pairs.
      protected char[] mTmpBuf  
      protected boolean mXml11  
      protected static char NULL_BYTE  
      protected static char NULL_CHAR  
    • Constructor Summary

      Constructors 
      Constructor Description
      UTF32Reader(ReaderConfig cfg, InputStream in, byte[] buf, int ptr, int len, boolean recycleBuffer, boolean isBigEndian)
    • Field Detail

      • mBigEndian

        protected final boolean mBigEndian
      • mXml11

        protected boolean mXml11
      • mSurrogate

        protected char mSurrogate
        Although the input may use the full Unicode set, Java still uses 16-bit chars, so we may have to split high-order chars into surrogate pairs.
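The split into a surrogate pair follows the standard UTF-16 arithmetic. The sketch below is illustrative only: SurrogateSplitSketch and split are hypothetical names, and whether the reader holds the second half in mSurrogate until the next read is an assumption, not something this page states.

```java
// Hypothetical helper, not part of UTF32Reader: standard UTF-16 surrogate arithmetic.
final class SurrogateSplitSketch {

    /** Returns one char for BMP code points, or a high/low surrogate pair above U+FFFF. */
    static char[] split(int codePoint) {
        if (codePoint < 0x10000) {
            return new char[] {(char) codePoint};
        }
        int v = codePoint - 0x10000;                // 20 significant bits remain
        char high = (char) (0xD800 + (v >> 10));    // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));   // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        char[] pair = split(0x1F600);
        // prints d83d de00
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1]));
    }
}
```

The JDK offers the same computation through Character.toChars(int), which is usually preferable outside a hand-written decoder.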
      • mCharCount

        protected int mCharCount
        Total read character count; used for error reporting purposes
      • mByteCount

        protected int mByteCount
        Total read byte count; used for error reporting purposes
      • CONVERT_NEL_TO

        protected static final char CONVERT_NEL_TO
        In xml 1.1, NEL (0x85) behaves much the way \n does (it can follow \r as part of the linefeed).
        See Also:
        Constant Field Values
      • CONVERT_LSEP_TO

        protected static final char CONVERT_LSEP_TO
        In xml 1.1, LSEP behaves a bit like \n or \r, so one of them has to be chosen as the result. \n is used, for simplicity.
        See Also:
        Constant Field Values
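The effect of the two conversion constants can be sketched as a per-character mapping. This is a simplified illustration under stated assumptions: the names Xml11LinefeedSketch and normalize are hypothetical, the NEL target is assumed to be \n (this page gives the choice only for LSEP), and two-character sequences such as \r followed by NEL are not collapsed here.

```java
// Hypothetical helper, not part of UTF32Reader: single-character linefeed mapping only.
final class Xml11LinefeedSketch {
    static final char NEL = (char) 0x85;     // next line
    static final char LSEP = (char) 0x2028;  // line separator

    /** Maps the xml 1.1 linefeed characters to \n; leaves everything else unchanged. */
    static char normalize(char c, boolean xml11) {
        if (xml11 && (c == NEL || c == LSEP)) {
            return '\n';  // assumption: both constants resolve to \n
        }
        return c;
    }
}
```

In xml 1.0 mode the sketch passes both characters through untouched, since they have no linefeed meaning there.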
      • mByteBuffer

        protected byte[] mByteBuffer
      • mBytePtr

        protected int mBytePtr
        Pointer to the next available byte (if any), iff less than mByteBufferEnd
      • mByteBufferEnd

        protected int mByteBufferEnd
        Points to the end marker, that is, the position one after the last valid available byte.
      • mTmpBuf

        protected char[] mTmpBuf
    • Constructor Detail

      • UTF32Reader

        public UTF32Reader(ReaderConfig cfg,
                           InputStream in,
                           byte[] buf,
                           int ptr,
                           int len,
                           boolean recycleBuffer,
                           boolean isBigEndian)
    • Method Detail

      • setXmlCompliancy

        public void setXmlCompliancy(int xmlVersion)
        Method that can be called to indicate the xml conformance used when reading content using this reader. Some of the character validity checks need to be done at reader level, and sometimes they depend on xml level (for example, xml 1.1 has new linefeeds and both more and less restricted characters).
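To illustrate how validity can depend on the xml level, the sketch below follows the Char productions of the XML 1.0 and 1.1 specifications for characters that appear literally in the input. It is not a description of the checks this reader performs: XmlCharCheckSketch and isAllowedLiteral are hypothetical names. In xml 1.1 the range 0x7F to 0x9F (except NEL, 0x85) may no longer appear literally, while control characters 0x1 to 0x1F become expressible through character references, which a byte-level reader never sees.

```java
// Hypothetical helper, not part of UTF32Reader: literal-character validity per XML version.
final class XmlCharCheckSketch {

    /** Returns true if the code point may appear literally (not as a character reference). */
    static boolean isAllowedLiteral(int c, boolean xml11) {
        if (c == 0x9 || c == 0xA || c == 0xD) {
            return true;                       // tab, LF, CR are always fine
        }
        if (c < 0x20) {
            return false;                      // other C0 controls (and negatives)
        }
        if (xml11 && c >= 0x7F && c <= 0x9F) {
            return c == 0x85;                  // restricted in 1.1, except NEL
        }
        if (c <= 0xD7FF) {
            return true;
        }
        if (c >= 0xE000 && c <= 0xFFFD) {
            return true;                       // skips surrogates, 0xFFFE and 0xFFFF
        }
        return c >= 0x10000 && c <= 0x10FFFF;
    }
}
```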
      • canModifyBuffer

        protected final boolean canModifyBuffer()
        Method that can be used to see if we can actually modify the underlying buffer. This is the case if we are managing the buffer, but not if it was just given to us.
      • read

        public int read()
                 throws IOException
        Although this method is implemented by the base class, and it should never be called by Woodstox code, let's still implement it a bit more efficiently, just in case.
        Overrides:
        read in class Reader
        Throws:
        IOException
      • readBytes

        protected final int readBytes()
                               throws IOException
        Method for reading as many bytes from the underlying stream as possible (that fit in the buffer), to the beginning of the buffer.
        Throws:
        IOException
      • readBytesAt

        protected final int readBytesAt(int offset)
                                 throws IOException
        Method for reading as many bytes from the underlying stream as possible (that fit in the buffer considering offset), to the specified offset.
        Returns:
        Number of bytes read, if any; -1 to indicate none available (that is, end of input)
        Throws:
        IOException
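The buffer-filling contract described for readBytes and readBytesAt can be sketched with a plain InputStream. This is a minimal sketch, not the actual implementation: BufferFillSketch, fillAt, and the fields buf and end are hypothetical stand-ins for the reader's internals, and setting end to offset plus the count is an assumption about how the end marker is maintained.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of UTF32Reader: fills a byte buffer from a given offset.
final class BufferFillSketch {
    private final InputStream in;
    final byte[] buf;
    int end;  // one past the last valid byte, like mByteBufferEnd

    BufferFillSketch(InputStream in, int bufferSize) {
        this.in = in;
        this.buf = new byte[bufferSize];
    }

    /** Reads as many bytes as fit after offset; returns the count, or -1 at end of input. */
    int fillAt(int offset) throws IOException {
        int count = in.read(buf, offset, buf.length - offset);
        if (count > 0) {
            end = offset + count;  // assumption: bytes before offset are still valid
        }
        return count;
    }
}
```

Reading at a non-zero offset is useful when a partial four-byte unit is left over at the end of the buffer: the leftover bytes can be moved to the front and the next read appended after them.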
      • freeBuffers

        public final void freeBuffers()
        This method should be called along with (or instead of) normal close. After calling this method, no further reads should be tried. Method will try to recycle read buffers (if any).
      • reportBounds

        protected void reportBounds(char[] cbuf,
                                    int start,
                                    int len)
                             throws IOException
        Throws:
        IOException
      • reportInvalidXml11

        protected void reportInvalidXml11(int value,
                                          int bytePos,
                                          int charPos)
                                   throws IOException
        Throws:
        IOException