Class BasicStreamReader

  • All Implemented Interfaces:
    InputConfigFlags, ParsingErrorMsgs, InputProblemReporter, StreamReaderImpl, XMLStreamConstants, XMLStreamReader, org.codehaus.stax2.DTDInfo, org.codehaus.stax2.LocationInfo, org.codehaus.stax2.typed.TypedXMLStreamReader, org.codehaus.stax2.validation.Validatable, org.codehaus.stax2.XMLStreamReader2
    Direct Known Subclasses:
    TypedStreamReader

    public abstract class BasicStreamReader
    extends StreamScanner
    implements StreamReaderImpl, org.codehaus.stax2.DTDInfo, org.codehaus.stax2.LocationInfo
    Partial implementation of XMLStreamReader2 consisting of all functionality other than the DTD-validation-specific parts and the Typed Access API (Stax2 v3.0), which are implemented in sub-classes.
    Author:
    Tatu Saloranta
    • Field Detail

      • MASK_GET_TEXT

        protected static final int MASK_GET_TEXT
        This mask covers all types for which the basic getText() method can be called.
        See Also:
        Constant Field Values
      • MASK_GET_TEXT_XXX

        protected static final int MASK_GET_TEXT_XXX
        This mask covers all types for which the extended getTextXxx methods can be called; this is a narrower set than the one for which getText() can be called. Specifically, DTD and ENTITY_REFERENCE types do not support these extended methods.
        See Also:
        Constant Field Values
      • MASK_GET_TEXT_WITH_WRITER

        protected static final int MASK_GET_TEXT_WITH_WRITER
        This mask is used with the Stax2 getText() method (the one that takes a Writer as an argument): it accepts an even wider range of event types.
        See Also:
        Constant Field Values
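        The masks above follow a one-bit-per-event-type pattern. The sketch below illustrates that pattern with the standard javax.xml.stream.XMLStreamConstants values. The exact bits set in BasicStreamReader's constants are listed under Constant Field Values and are not reproduced here, so both masks are assumptions chosen to match the descriptions (the narrower mask drops DTD and ENTITY_REFERENCE). The class EventMaskSketch is hypothetical.

```java
import javax.xml.stream.XMLStreamConstants;

// Hypothetical sketch: mask contents are assumptions based on the field
// descriptions, not the actual BasicStreamReader constant values.
final class EventMaskSketch {
    // One bit per event type: bit (1 << type) is set if a method is allowed.
    static final int TEXT_MASK =
            (1 << XMLStreamConstants.CHARACTERS)
            | (1 << XMLStreamConstants.CDATA)
            | (1 << XMLStreamConstants.SPACE)
            | (1 << XMLStreamConstants.COMMENT)
            | (1 << XMLStreamConstants.DTD)
            | (1 << XMLStreamConstants.ENTITY_REFERENCE);

    // Narrower set: the same types minus DTD and ENTITY_REFERENCE.
    static final int TEXT_XXX_MASK = TEXT_MASK
            & ~((1 << XMLStreamConstants.DTD) | (1 << XMLStreamConstants.ENTITY_REFERENCE));

    // A single AND tells whether the event type belongs to the set.
    static boolean allows(int mask, int eventType) {
        return (mask & (1 << eventType)) != 0;
    }

    public static void main(String[] args) {
        System.out.println(allows(TEXT_MASK, XMLStreamConstants.DTD));     // true
        System.out.println(allows(TEXT_XXX_MASK, XMLStreamConstants.DTD)); // false
    }
}
```

All StAX event type constants are small integers (below 32), so each fits in one bit of an int and a permission check costs a single AND.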
      • MASK_GET_ELEMENT_TEXT

        protected static final int MASK_GET_ELEMENT_TEXT
        See Also:
        Constant Field Values
      • sPrefixXml

        protected static final String sPrefixXml
      • sPrefixXmlns

        protected static final String sPrefixXmlns
      • mConfigFlags

        protected final int mConfigFlags
        Set of locally stored configuration flags
      • mCfgCoalesceText

        protected final boolean mCfgCoalesceText
      • mCfgReportTextAsChars

        protected final boolean mCfgReportTextAsChars
      • mCfgLazyParsing

        protected final boolean mCfgLazyParsing
      • mShortestTextSegment

        protected final int mShortestTextSegment
        Minimum number of characters the parser can return as a partial text segment, if it is not required to coalesce adjacent text segments.
      • mOwner

        protected final ReaderCreator mOwner
        Object to notify about shared stuff, such as symbol tables, as well as to query for additional config settings if necessary.
      • mDocStandalone

        protected int mDocStandalone
        Status of the "stand-aloneness" of the document; set to 'yes'/'no'/'unknown' based on whether there was an XML declaration and, if so, whether it had a standalone attribute.
      • mRootPrefix

        protected String mRootPrefix
        Prefix of root element, as dictated by DOCTYPE declaration; null if no DOCTYPE declaration, or no root prefix
      • mRootLName

        protected String mRootLName
        Local name of root element, as dictated by DOCTYPE declaration; null if no DOCTYPE declaration.
      • mDtdPublicId

        protected String mDtdPublicId
        Public id of the DTD, if one exists and has been parsed.
      • mDtdSystemId

        protected String mDtdSystemId
        System id of the DTD, if one exists and has been parsed.
      • mTextBuffer

        protected final TextBuffer mTextBuffer
        TextBuffer mostly used to collect non-element textual content (text, CDATA, comment content, pi data)
      • mElementStack

        protected final InputElementStack mElementStack
        Currently open element tree
      • mAttrCollector

        protected final AttributeCollector mAttrCollector
        Object that stores information about currently accessible attributes.
      • mStDoctypeFound

        protected boolean mStDoctypeFound
      • mTokenState

        protected int mTokenState
        State of the current token; one of the M_ constants.

        Initially set to fully tokenized, since the virtual START_DOCUMENT event is already fully known at this point (it was parsed by the bootstrapper).

      • mStTextThreshold

        protected final int mStTextThreshold
        Threshold value that defines tokenization state that needs to be achieved to "finish" current logical text segment (which may consist of adjacent CDATA and text segments; or be a complete physical segment; or just even a fragment of such a segment)
      • mCurrTextLength

        protected int mCurrTextLength
        Size of the current text for CDATA, CHARACTERS and WHITESPACE events. When segmenting, this records the size of all the segments, so we can track whether the text length has exceeded limits.
      • mStEmptyElem

        protected boolean mStEmptyElem
      • mParseState

        protected int mParseState
        Main parsing/tokenization state (STATE_xxx)
      • mCurrToken

        protected int mCurrToken
        Current state of the stream, i.e. the token value returned by getEventType(). Needs to be initialized to START_DOCUMENT, since that is the state the stream starts in.
      • mSecondaryToken

        protected int mSecondaryToken
        Additional information sometimes stored (when generating dummy events in multi-doc mode, for example) temporarily when mCurrToken is already populated.
      • mWsStatus

        protected int mWsStatus
        Status of current (text) token's "whitespaceness", that is, whether it is or is not all white space.
      • mValidateText

        protected boolean mValidateText
        Flag that indicates that textual content (CDATA, CHARACTERS) is to be validated within current element's scope. Enabled if one of validators returns XMLValidator.CONTENT_ALLOW_VALIDATABLE_TEXT, and will prevent lazy parsing of text.
      • mCheckIndentation

        protected int mCheckIndentation
        Counter used for determining whether we are to try to heuristically "intern" white space that seems to be used for indentation purposes
      • mPendingException

        protected XMLStreamException mPendingException
        Because the StAX API does not allow throwing stream exceptions from many methods for which Woodstox would need to throw one (especially getText and its variants), it may be necessary to delay throwing an exception until the next call to next(). If so, this variable holds the pending stream exception.
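        The deferred-exception pattern described for mPendingException can be sketched as follows. DeferredErrorSketch is a hypothetical class, not Woodstox code: its getText() cannot throw a checked XMLStreamException, so it records the problem and returns the content read so far, and the following call to next() throws the stored exception. The NUL-character check is only a stand-in for whatever error a real tokenizer would detect.

```java
import javax.xml.stream.XMLStreamException;

// Hypothetical sketch of deferring an error from an accessor to next().
final class DeferredErrorSketch {
    private final String[] rawTokens;            // token text, not yet checked
    private int index = -1;                      // position of the current token
    private XMLStreamException pendingException; // stored until the next next()

    DeferredErrorSketch(String... rawTokens) {
        this.rawTokens = rawTokens;
    }

    // Advances to the next token; a deferred problem is thrown here first.
    boolean next() throws XMLStreamException {
        if (pendingException != null) {
            XMLStreamException e = pendingException;
            pendingException = null;
            throw e;
        }
        index++;
        return index < rawTokens.length;
    }

    // Like StAX getText(), this cannot throw a checked exception, so a
    // problem found while finishing the token is recorded instead.
    String getText() {
        String raw = rawTokens[index];
        int bad = raw.indexOf('\0'); // stand-in for a real validity check
        if (bad < 0) {
            return raw;
        }
        if (pendingException == null) {
            pendingException = new XMLStreamException("Illegal character at offset " + bad);
        }
        return raw.substring(0, bad); // content read before the problem
    }

    public static void main(String[] args) {
        DeferredErrorSketch r = new DeferredErrorSketch("ok", "bad\0text");
        try {
            while (r.next()) {
                System.out.println(r.getText());
            }
        } catch (XMLStreamException e) {
            System.out.println("deferred: " + e.getMessage());
        }
    }
}
```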
      • mGeneralEntities

        protected Map<String, EntityDecl> mGeneralEntities
        Entities parsed from internal/external DTD subsets. Although it remains null for this class, extended classes make use of it; in addition, to be able to share some of the entity resolution code, the field is declared here even though it semantically belongs to the sub-class.
      • mVldContent

        protected int mVldContent
        Mode information needed at this level; mostly to check what kind of textual content (if any) is allowed in the current element context. Constants come from XMLValidator (like XMLValidator.CONTENT_ALLOW_VALIDATABLE_TEXT). Only used inside the element tree; ignored for prolog/epilog (which have straightforward static rules).
      • mReturnNullForDefaultNamespace

        protected boolean mReturnNullForDefaultNamespace
        Configuration from WstxInputProperties#RETURN_NULL_FOR_DEFAULT_NAMESPACE
        Since:
        4.1.2
    • Method Detail

      • getCharacterEncodingScheme

        public String getCharacterEncodingScheme()
        As per the StAX (1.0) specification, returns the encoding claimed by the XML declaration, if any; or null if no XML declaration was found.

        Note: method name is rather confusing (compare to getEncoding()).

        Specified by:
        getCharacterEncodingScheme in interface XMLStreamReader
      • getEncoding

        public String getEncoding()
        As per the StAX (1.0) specification, returns the encoding the parser determined, if it was able to figure it out. If not (there are cases where it cannot be determined, specifically when the reader is given a Reader), returns null.
        Specified by:
        getEncoding in interface XMLStreamReader
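        To compare the two values, the following sketch reads them at START_DOCUMENT through the standard StAX API. It assumes XMLInputFactory.newInstance() resolves to whichever StAX implementation is on the classpath (Woodstox if present, otherwise the JDK's built-in parser). What getEncoding() reports for byte input can differ between implementations, so only the declared encoding is relied upon; EncodingDemo is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class EncodingDemo {
    // Returns {declared encoding, detected encoding} at START_DOCUMENT.
    static String[] encodings(byte[] doc) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(doc));
        try {
            return new String[] {
                r.getCharacterEncodingScheme(), // from the XML declaration, or null
                r.getEncoding()                 // what the parser detected, or null
            };
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] doc = "<?xml version='1.0' encoding='ISO-8859-1'?><root/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        String[] enc = encodings(doc);
        System.out.println("declared=" + enc[0] + ", detected=" + enc[1]);
    }
}
```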
      • getElementText

        public String getElementText()
                              throws XMLStreamException
        From StAX specs:
        Reads the content of a text-only element; an exception is thrown if this is not a text-only element. Regardless of the value of javax.xml.stream.isCoalescing, this method always returns coalesced content.
        Precondition: the current event is START_ELEMENT.
        Postcondition: the current event is the corresponding END_ELEMENT.
        Specified by:
        getElementText in interface XMLStreamReader
        Throws:
        XMLStreamException
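        A minimal usage sketch with the standard StAX API follows; ElementTextDemo.textOf is a hypothetical helper. It positions the cursor on a START_ELEMENT, calls getElementText(), and relies on the contract quoted above: character data and CDATA are concatenated (per the StAX javadoc, comments and processing instructions are skipped), and a nested start tag makes the call throw an XMLStreamException.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class ElementTextDemo {
    // Returns the coalesced text of the first element named `name`, or null.
    static String textOf(String xml, String name) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && name.equals(r.getLocalName())) {
                    // Precondition met: the current event is START_ELEMENT.
                    String text = r.getElementText();
                    // Postcondition (checked only with assertions enabled):
                    // the cursor now sits on the matching END_ELEMENT.
                    assert r.getEventType() == XMLStreamConstants.END_ELEMENT;
                    return text;
                }
            }
            return null;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<person><name>Ada <![CDATA[&]]> <!-- note -->Lovelace</name></person>";
        System.out.println(textOf(xml, "name")); // Ada & Lovelace
    }
}
```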
      • getEventType

        public int getEventType()
        Returns the type of the last event returned, or START_DOCUMENT if no events have been explicitly returned yet.
        Specified by:
        getEventType in interface XMLStreamReader
      • getTextCharacters

        public int getTextCharacters(int sourceStart,
                                     char[] target,
                                     int targetStart,
                                     int len)
        Specified by:
        getTextCharacters in interface XMLStreamReader
      • isWhiteSpace

        public boolean isWhiteSpace()

        05-Apr-2004, TSa: Could try to determine the status when the text is actually read. That would prevent double reads, but would it slow down that single read enough that the net effect would be negative?

        Specified by:
        isWhiteSpace in interface XMLStreamReader
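        A common use of isWhiteSpace() is to drop indentation between elements. The sketch below (hypothetical helper WhitespaceFilter.significantText) keeps only CHARACTERS and CDATA events that contain at least one non-white-space character, using only the standard XMLStreamReader interface.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class WhitespaceFilter {
    // Concatenates text events, skipping those that are entirely white space
    // (typically indentation between elements).
    static String significantText(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder sb = new StringBuilder();
        try {
            while (r.hasNext()) {
                int type = r.next();
                boolean textual = type == XMLStreamConstants.CHARACTERS
                        || type == XMLStreamConstants.CDATA;
                if (textual && !r.isWhiteSpace()) {
                    sb.append(r.getText());
                }
            }
        } finally {
            r.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<list>\n  <item>one</item>\n  <item>two</item>\n</list>";
        System.out.println(significantText(xml)); // onetwo
    }
}
```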
      • close

        public void close()
                   throws XMLStreamException

        Note: as per the StAX 1.0 specification, this method does NOT close the underlying input source, unless the StAX2 property XMLInputFactory2.P_AUTO_CLOSE_INPUT is set to true.

        Specified by:
        close in interface XMLStreamReader
        Throws:
        XMLStreamException
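        Because close() does not, by default, close the underlying input, callers usually manage the stream themselves. A minimal sketch of that idiom with the standard StAX API, assuming the caller owns the stream (CloseIdiom.rootName is hypothetical). With Woodstox, setting XMLInputFactory2.P_AUTO_CLOSE_INPUT is the alternative mentioned above; it is not used here so the sketch runs with any StAX implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CloseIdiom {
    // Reads the local name of the root element. The stream is closed by
    // try-with-resources, because XMLStreamReader.close() is not required
    // to close it (and XMLStreamReader is not AutoCloseable).
    static String rootName(InputStream source) throws IOException, XMLStreamException {
        try (InputStream in = source) {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT) {
                        return r.getLocalName();
                    }
                }
                return null;
            } finally {
                r.close(); // frees parser resources only
            }
        }
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        byte[] doc = "<catalog><book/></catalog>".getBytes(StandardCharsets.UTF_8);
        System.out.println(rootName(new ByteArrayInputStream(doc))); // catalog
    }
}
```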
      • getFeature

        @Deprecated
        public Object getFeature(String name)
        Deprecated.
        Specified by:
        getFeature in interface org.codehaus.stax2.XMLStreamReader2
      • setFeature

        @Deprecated
        public void setFeature(String name,
                               Object value)
        Deprecated.
        Specified by:
        setFeature in interface org.codehaus.stax2.XMLStreamReader2
      • isPropertySupported

        public boolean isPropertySupported(String name)
        Specified by:
        isPropertySupported in interface org.codehaus.stax2.XMLStreamReader2
      • setProperty

        public boolean setProperty(String name,
                                   Object value)
        Specified by:
        setProperty in interface org.codehaus.stax2.XMLStreamReader2
        Parameters:
        name - Name of the property to set
        value - Value to set property to.
        Returns:
        True if the specified property was successfully set to the specified value; false if its value was not changed
      • getAttributeInfo

        public org.codehaus.stax2.AttributeInfo getAttributeInfo()
                                                          throws XMLStreamException
        Specified by:
        getAttributeInfo in interface org.codehaus.stax2.XMLStreamReader2
        Throws:
        XMLStreamException
      • getDTDInfo

        public org.codehaus.stax2.DTDInfo getDTDInfo()
                                              throws XMLStreamException
        Since this class implements DTDInfo, method can just return this.
        Specified by:
        getDTDInfo in interface org.codehaus.stax2.XMLStreamReader2
        Throws:
        XMLStreamException
      • getLocationInfo

        public final org.codehaus.stax2.LocationInfo getLocationInfo()
        Location information is always accessible for this reader.
        Specified by:
        getLocationInfo in interface org.codehaus.stax2.XMLStreamReader2
      • getText

        public int getText(Writer w,
                           boolean preserveContents)
                    throws IOException,
                           XMLStreamException
        Method similar to getText(), except that it uses the provided Writer to write all textual content. As a further optimization, it may also do true pass-through, possibly avoiding one temporary copy of the data.

        TODO: try to optimize to allow completely streaming pass-through; currently all data is still read into memory buffers before being written out.

        Specified by:
        getText in interface org.codehaus.stax2.XMLStreamReader2
        Parameters:
        w - Writer to use for writing textual contents
        preserveContents - If true, the reader has to preserve contents so that further calls to getText will return proper contents. If false, the reader is allowed to skip creating such copies: this can improve performance, but it also means that further calls to getText are not guaranteed to return meaningful data.
        Returns:
        Number of characters written to the writer
        Throws:
        IOException
        XMLStreamException
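        For readers that only implement the base StAX interface, a similar effect can be approximated with the standard getTextCharacters(int, char[], int, int) method, copying text to a Writer in fixed-size chunks. This is a hedged sketch, not how Woodstox implements getText(Writer, boolean); TextToWriter and its 8-character buffer are illustrative choices. Like the Stax2 method, copyText returns the number of characters written.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class TextToWriter {
    // Copies the current event's text to w in buf-sized chunks and returns
    // the number of characters written.
    static int copyText(XMLStreamReader r, Writer w, char[] buf)
            throws IOException, XMLStreamException {
        int total = 0;
        while (true) {
            int n = r.getTextCharacters(total, buf, 0, buf.length);
            w.write(buf, 0, n);
            total += n;
            if (n < buf.length) {
                return total; // fewer than requested: no more text in this event
            }
        }
    }

    // Writes the text of every CHARACTERS/CDATA/SPACE event to w.
    static int copyAllText(String xml, Writer w) throws IOException, XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        char[] buf = new char[8]; // deliberately small to show chunking
        int total = 0;
        try {
            while (r.hasNext()) {
                int type = r.next();
                if (type == XMLStreamConstants.CHARACTERS
                        || type == XMLStreamConstants.CDATA
                        || type == XMLStreamConstants.SPACE) {
                    total += copyText(r, w, buf);
                }
            }
        } finally {
            r.close();
        }
        return total;
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        StringWriter out = new StringWriter();
        int n = copyAllText("<a>hello, chunked world</a>", out);
        System.out.println(n + " chars: " + out); // 20 chars: hello, chunked world
    }
}
```

Unlike the Stax2 method, this approach never avoids the intermediate copy: every chunk passes through buf before reaching the Writer.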
      • getDepth

        public int getDepth()
        Specified by:
        getDepth in interface org.codehaus.stax2.XMLStreamReader2
        Returns:
        Number of open elements in the stack; 0 when parser is in prolog/epilog, 1 inside root element and so on.
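        Where only the standard XMLStreamReader interface is available, nesting depth can be tracked by counting START_ELEMENT and END_ELEMENT events. The hypothetical helper below returns the maximum nesting depth; its counter is 0 before the root element and 1 while directly inside it, in line with the getDepth() description.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class DepthTracker {
    // Returns the maximum element nesting depth of the document.
    static int maxDepth(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int depth = 0;
        int max = 0;
        try {
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        depth++;
                        max = Math.max(max, depth);
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        depth--;
                        break;
                    default:
                        break; // text, comments, etc. do not change the depth
                }
            }
        } finally {
            r.close();
        }
        return max;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(maxDepth("<a><b><c/></b><d/></a>")); // 3
    }
}
```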
      • isEmptyElement

        public boolean isEmptyElement()
                               throws XMLStreamException
        Specified by:
        isEmptyElement in interface org.codehaus.stax2.XMLStreamReader2
        Returns:
        True, if cursor points to a start or end element that is constructed from 'empty' element (ends with '/>'); false otherwise.
        Throws:
        XMLStreamException
      • getNonTransientNamespaceContext

        public NamespaceContext getNonTransientNamespaceContext()
        Specified by:
        getNonTransientNamespaceContext in interface org.codehaus.stax2.XMLStreamReader2
      • getPrefixedName

        public String getPrefixedName()
        Specified by:
        getPrefixedName in interface org.codehaus.stax2.XMLStreamReader2
      • closeCompletely

        public void closeCompletely()
                             throws XMLStreamException
        Specified by:
        closeCompletely in interface org.codehaus.stax2.XMLStreamReader2
        Throws:
        XMLStreamException
      • getProcessedDTD

        public Object getProcessedDTD()

        Note: DTD-handling sub-classes need to override this method.

        Specified by:
        getProcessedDTD in interface org.codehaus.stax2.DTDInfo
      • getDTDRootName

        public String getDTDRootName()
        Specified by:
        getDTDRootName in interface org.codehaus.stax2.DTDInfo
      • getDTDPublicId

        public String getDTDPublicId()
        Specified by:
        getDTDPublicId in interface org.codehaus.stax2.DTDInfo
      • getDTDSystemId

        public String getDTDSystemId()
        Specified by:
        getDTDSystemId in interface org.codehaus.stax2.DTDInfo
      • getDTDInternalSubset

        public String getDTDInternalSubset()
        Specified by:
        getDTDInternalSubset in interface org.codehaus.stax2.DTDInfo
        Returns:
        Internal subset portion of the DOCTYPE declaration, if any; empty String if none
      • getProcessedDTDSchema

        public org.codehaus.stax2.validation.DTDValidationSchema getProcessedDTDSchema()
        Sub-classes will override this method.
        Specified by:
        getProcessedDTDSchema in interface org.codehaus.stax2.DTDInfo
      • getStartingByteOffset

        public long getStartingByteOffset()
        Specified by:
        getStartingByteOffset in interface org.codehaus.stax2.LocationInfo
      • getStartingCharOffset

        public long getStartingCharOffset()
        Specified by:
        getStartingCharOffset in interface org.codehaus.stax2.LocationInfo
      • getEndingByteOffset

        public long getEndingByteOffset()
                                 throws XMLStreamException
        Specified by:
        getEndingByteOffset in interface org.codehaus.stax2.LocationInfo
        Throws:
        XMLStreamException
      • getEndingCharOffset

        public long getEndingCharOffset()
                                 throws XMLStreamException
        Specified by:
        getEndingCharOffset in interface org.codehaus.stax2.LocationInfo
        Throws:
        XMLStreamException
      • getLocation

        public final Location getLocation()
        Description copied from class: StreamScanner
        Returns location of last properly parsed token; as per StAX specs, apparently needs to be the end of current event, which is the same as the start of the following event (or EOF if that's next).
        Specified by:
        getLocation in interface InputProblemReporter
        Specified by:
        getLocation in interface org.codehaus.stax2.LocationInfo
        Specified by:
        getLocation in interface XMLStreamReader
        Specified by:
        getLocation in class StreamScanner
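        The following sketch reads the line number reported for a start tag through the standard javax.xml.stream.Location interface (LocationDemo.lineOf is hypothetical). Since the reported location may be the end of the current event rather than its start, only the line number of a start tag that fits on one line is used; column values are not relied upon.

```java
import java.io.StringReader;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class LocationDemo {
    // Returns the line number reported at the first start tag with the given
    // local name, or -1 if there is none.
    static int lineOf(String xml, String localName) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(r.getLocalName())) {
                    Location loc = r.getLocation(); // may point at the end of the event
                    return loc.getLineNumber();
                }
            }
            return -1;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root>\n  <a/>\n  <target>x</target>\n</root>";
        System.out.println(lineOf(xml, "target")); // 3
    }
}
```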
      • getEndLocation

        public final org.codehaus.stax2.XMLStreamLocation2 getEndLocation()
                                                                   throws XMLStreamException
        Specified by:
        getEndLocation in interface org.codehaus.stax2.LocationInfo
        Throws:
        XMLStreamException
      • validateAgainst

        public org.codehaus.stax2.validation.XMLValidator validateAgainst(org.codehaus.stax2.validation.XMLValidationSchema schema)
                                                                   throws XMLStreamException
        Specified by:
        validateAgainst in interface org.codehaus.stax2.validation.Validatable
        Throws:
        XMLStreamException
      • stopValidatingAgainst

        public org.codehaus.stax2.validation.XMLValidator stopValidatingAgainst(org.codehaus.stax2.validation.XMLValidationSchema schema)
                                                                         throws XMLStreamException
        Specified by:
        stopValidatingAgainst in interface org.codehaus.stax2.validation.Validatable
        Throws:
        XMLStreamException
      • stopValidatingAgainst

        public org.codehaus.stax2.validation.XMLValidator stopValidatingAgainst(org.codehaus.stax2.validation.XMLValidator validator)
                                                                         throws XMLStreamException
        Specified by:
        stopValidatingAgainst in interface org.codehaus.stax2.validation.Validatable
        Throws:
        XMLStreamException
      • setValidationProblemHandler

        public org.codehaus.stax2.validation.ValidationProblemHandler setValidationProblemHandler(org.codehaus.stax2.validation.ValidationProblemHandler h)
        Specified by:
        setValidationProblemHandler in interface org.codehaus.stax2.validation.Validatable
      • getInputElementStack

        public InputElementStack getInputElementStack()
        Method needed by classes (like stream writer implementations) that want efficient direct access to the element stack implementation.
        Specified by:
        getInputElementStack in interface StreamReaderImpl
      • getAttributeCollector

        public AttributeCollector getAttributeCollector()
        Method needed by classes (like stream writer implementations) that want to have efficient direct access to attribute collector Object, for optimal attribute name and value access.
        Specified by:
        getAttributeCollector in interface StreamReaderImpl
      • hasConfigFlags

        protected final boolean hasConfigFlags(int flags)
      • initValidation

        protected void initValidation()
                               throws XMLStreamException
        Method called right before the document root element is handled. The default implementation is empty; validating stream readers should override this method and do whatever initialization is necessary.
        Throws:
        XMLStreamException
      • handleMultiDocStart

        protected int handleMultiDocStart(int nextEvent)
        Method called when an event was encountered that indicates document boundary in multi-doc mode. Needs to trigger dummy END_DOCUMENT/START_DOCUMENT event combination, followed by the handling of the original event.
        Returns:
        Event type to return
      • skipEquals

        protected final char skipEquals(String name,
                                        String eofMsg)
                                 throws XMLStreamException
        Method that checks that the following input is of the form [S]* '=' [S]* (as per the XML specification, production #25). Will push back non-white-space characters as necessary, in case no equals character is encountered.
        Throws:
        XMLStreamException
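        The production referenced above is Eq ::= S? '=' S? (production [25] of XML 1.0), where S is production [3]. The hypothetical EqSketch below matches it over a String by index; instead of pushing characters back as described for skipEquals, it returns -1 and leaves the caller's position untouched when no '=' is found.

```java
// Hypothetical sketch of matching the XML 'Eq' production on a String.
final class EqSketch {
    // Production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
    static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Production [25]: Eq ::= S? '=' S?
    // Returns the index just past the match, or -1 if no '=' is found.
    static int skipEquals(CharSequence s, int pos) {
        int i = pos;
        while (i < s.length() && isXmlSpace(s.charAt(i))) {
            i++;
        }
        if (i >= s.length() || s.charAt(i) != '=') {
            return -1; // caller's position is unchanged; nothing to push back
        }
        i++;
        while (i < s.length() && isXmlSpace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String decl = "version = '1.0'";
        System.out.println(skipEquals(decl, 7)); // 10: index of the opening quote
    }
}
```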
      • parseQuoted

        protected final void parseQuoted(String name,
                                         char quoteChar,
                                         TextBuffer tbuf)
                                  throws XMLStreamException
        Method called to parse quoted XML declaration pseudo-attribute values. Works similarly to attribute value parsing, except that no entities can be included, and in general it need not be as strict (since the caller is to verify the contents). One exception is that we do check for linefeeds and '<' characters, since they generally indicate problems and are useful to catch early on (this can happen if a quote is missed, etc.).

        Note: since it'll be called at most 3 times per document, this method is not optimized too much.

        Throws:
        XMLStreamException
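        The checks described above can be sketched as a small standalone parser. QuotedValueSketch is hypothetical and works on a String rather than on Woodstox input buffers; it performs no entity expansion and fails fast on '<' or a line break, which usually means the closing quote was missed.

```java
// Hypothetical sketch: parse a quoted XML declaration pseudo-attribute value.
final class QuotedValueSketch {
    // s.charAt(pos) must be the opening quote (' or "). Returns the value
    // between the quotes; no entity expansion is performed.
    static String parseQuoted(String s, int pos) {
        char quote = s.charAt(pos);
        if (quote != '\'' && quote != '"') {
            throw new IllegalArgumentException("Expected a quote at offset " + pos);
        }
        StringBuilder value = new StringBuilder();
        for (int i = pos + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == quote) {
                return value.toString();
            }
            // Fail early: '<' or a line break usually means a missing quote.
            if (c == '<' || c == '\n' || c == '\r') {
                throw new IllegalArgumentException(
                        "Unexpected character at offset " + i + "; missing closing quote?");
            }
            value.append(c);
        }
        throw new IllegalArgumentException("Unexpected end of input in quoted value");
    }

    public static void main(String[] args) {
        String decl = "encoding=\"UTF-8\"";
        System.out.println(parseQuoted(decl, 9)); // UTF-8
    }
}
```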
      • finishDTD

        protected void finishDTD(boolean copyContents)
                          throws XMLStreamException
        This method is called to handle the remainder of the DOCTYPE declaration, essentially the optional internal subset. This class implements the basic "ignore it" functionality, but can optionally still store a copy of the contents in the read buffer.

        NOTE: Since this default implementation will be overridden by some sub-classes, make sure you do NOT change the method signature.

        Parameters:
        copyContents - If true, will copy the contents of the internal subset of the DOCTYPE declaration into the text buffer; if false, will completely ignore the subset (if one is found).
        Throws:
        XMLStreamException
      • readEndElem

        protected final void readEndElem()
                                  throws XMLStreamException
        Method called to completely read a close tag, and update element stack appropriately (including checking that tag matches etc).
        Throws:
        XMLStreamException
      • safeEnsureFinishToken

        protected void safeEnsureFinishToken()
      • safeFinishToken

        protected void safeFinishToken()
      • finishToken

        protected void finishToken(boolean deferErrors)
                            throws XMLStreamException
        Method called to read in the contents of the token completely, if not yet read. Generally called when the caller needs to access anything other than the basic token type (except for elements), such as text contents.
        Parameters:
        deferErrors - Flag to enable storing an exception to a variable, instead of immediately throwing it. If true, will just store the exception; if false, will not store, just throw.
        Throws:
        XMLStreamException
      • readCoalescedText

        protected void readCoalescedText(int currType,
                                         boolean deferErrors)
                                  throws XMLStreamException
        Method called to read the content of both the current CDATA/CHARACTERS event and all following consecutive events into the text buffer. At this point the current type is known, the prefix (for CDATA) has been skipped, and the initial consecutive contents (if any) have been read in.
        Parameters:
        deferErrors - Flag to enable storing an exception to a variable, instead of immediately throwing it. If true, will just store the exception; if false, will not store, just throw.
        Throws:
        XMLStreamException
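        From the API user's side, coalescing is requested with the standard XMLInputFactory.IS_COALESCING property. The sketch below (hypothetical helper CoalescingDemo.countTextEvents) counts text events for a mix of character data and a CDATA section; with coalescing enabled, a conforming parser reports the adjacent segments as a single CHARACTERS event.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CoalescingDemo {
    // Counts CHARACTERS/CDATA events and appends their text to out.
    static int countTextEvents(String xml, boolean coalesce, StringBuilder out)
            throws XMLStreamException {
        XMLInputFactory f = XMLInputFactory.newInstance();
        f.setProperty(XMLInputFactory.IS_COALESCING, Boolean.valueOf(coalesce));
        XMLStreamReader r = f.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (r.hasNext()) {
                int type = r.next();
                if (type == XMLStreamConstants.CHARACTERS
                        || type == XMLStreamConstants.CDATA) {
                    count++;
                    out.append(r.getText());
                }
            }
        } finally {
            r.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a>x<![CDATA[y]]>z</a>";
        StringBuilder text = new StringBuilder();
        int events = countTextEvents(xml, true, text);
        System.out.println(events + " event(s): " + text);
    }
}
```

Whether coalescing is on or off, the concatenated text is the same; only the number of events the caller sees changes.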
      • readCDataSecondary

        protected boolean readCDataSecondary(int shortestSegment)
                                      throws XMLStreamException
        Returns:
        True if the whole CData section was completely read (we hit the end marker); false if a shorter segment was returned.
        Throws:
        XMLStreamException
      • readTextSecondary

        protected final boolean readTextSecondary(int shortestSegment,
                                                  boolean deferErrors)
                                           throws XMLStreamException
        Parameters:
        deferErrors - Flag to enable storing an exception to a variable, instead of immediately throwing it. If true, will just store the exception; if false, will not store, just throw.
        Returns:
        True if the text segment was completely read ('<' was hit, or in non-entity-expanding mode, a non-char entity); false if it may still continue
        Throws:
        XMLStreamException
      • skipWS

        protected final boolean skipWS(char c)
                                throws XMLStreamException
        Method that will skip any white space from input source(s)
        Returns:
        true if at least one white space character was skipped; false if not (the character passed was not white space)
        Throws:
        XMLStreamException
      • handleGreedyEntityProblem

        protected void handleGreedyEntityProblem(WstxInputSource input)
                                          throws XMLStreamException
        This problem gets reported if an entity tries to expand to a close tag matching a start tag that did not come from the same entity (but from the parent).
        Throws:
        XMLStreamException
      • throwNotTextualOrElem

        protected void throwNotTextualOrElem(int type)
      • throwUnexpectedEOF

        protected void throwUnexpectedEOF()
                                   throws WstxException
        Method called when we get an EOF within content tree
        Throws:
        WstxException
      • _constructUnexpectedInTyped

        protected XMLStreamException _constructUnexpectedInTyped(int nextToken)
        Method called to report a problem with
      • _constructTypeException

        protected org.codehaus.stax2.typed.TypedXMLStreamException _constructTypeException(String msg,
                                                                                           String lexicalValue)
      • reportInvalidContent

        protected void reportInvalidContent(int evtType)
                                     throws XMLStreamException
        Stub method implemented by validating parsers, to report content that's not valid for current element context. Defined at this level since some such problems need to be caught at low-level; however, details of error reports are not needed here.
        Parameters:
        evtType - Type of event that contained unexpected content
        Throws:
        XMLStreamException