Implements an HTML/XHTML serializer supporting both DOM and SAX
pretty serializing. HTML/XHTML mode is determined in the
constructor. For usage instructions see Serializer.
If an output stream is used, the encoding is taken from the
output format (defaults to UTF-8). If a writer is used, make
sure the writer uses the same encoding (if applicable) as
specified in the output format.
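The encoding note can be illustrated with standard-library classes alone. The sketch below does not use the serializer; the class name EncodingMatchSketch and its encode helper are hypothetical. It only shows why a writer bound to one charset produces bytes that do not decode correctly under another, which is the mismatch the paragraph above warns about.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the serializer API.
final class EncodingMatchSketch {
    // Writes text through a Writer bound to the given charset and returns the raw bytes.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        // Decoding with the charset the writer really used recovers the text.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1).equals(text));
        // Decoding with a different, merely declared charset does not.
        System.out.println(new String(latin1, StandardCharsets.UTF_8).equals(text));
    }
}
```

If the output format declares one encoding while the writer uses another, a consumer that trusts the declaration ends up in the second situation.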
The serializer supports both DOM and SAX. DOM serializing is done
by calling serialize, and SAX serializing is done by firing
SAX events and using the serializer as a document handler.
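The SAX side of this can be sketched with the parser that ships with the JDK. The handler below is a stand-in, not the serializer itself: SaxHandlerSketch and roundTrip are hypothetical names, and the handler skips escaping, attributes and pretty printing. It only shows the shape of the interaction, where a parser fires events and the handler turns them back into markup.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a serializer used as a SAX handler.
final class SaxHandlerSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String rawName, Attributes attrs) {
        out.append('<').append(rawName).append('>');
    }

    @Override
    public void characters(char[] chars, int start, int length) {
        // No escaping here; a real serializer would escape markup characters.
        out.append(chars, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String rawName) {
        out.append("</").append(rawName).append('>');
    }

    // Parses the input and returns what the handler wrote for the events it received.
    static String roundTrip(String xml) {
        try {
            SaxHandlerSketch handler = new SaxHandlerSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<p>Hi <b>there</b></p>"));
    }
}
```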
If an I/O exception occurs while serializing, the serializer
will not throw an exception directly, but only throw it
at the end of serializing (either DOM or SAX's
org.xml.sax.DocumentHandler.endDocument).
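The deferred-exception behaviour described above can be sketched as follows. This is an illustration of the general pattern under stated assumptions, not the serializer's actual code: the class DeferredIOExceptionSketch, its write and finish methods, and the failingWriter helper are all hypothetical.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical illustration of "record the failure now, throw it at the end".
final class DeferredIOExceptionSketch {
    private final Writer out;
    private IOException firstFailure; // remembered instead of thrown immediately

    DeferredIOExceptionSketch(Writer out) {
        this.out = out;
    }

    // Writes text; a failure is recorded and later writes are skipped.
    void write(String text) {
        if (firstFailure != null) {
            return;
        }
        try {
            out.write(text);
        } catch (IOException e) {
            firstFailure = e;
        }
    }

    // Called once at the end of serializing; surfaces the recorded failure, if any.
    void finish() throws IOException {
        if (firstFailure != null) {
            throw firstFailure;
        }
        out.flush();
    }

    // A writer that always fails, used only to demonstrate the pattern.
    static Writer failingWriter() {
        return new Writer() {
            @Override
            public void write(char[] buf, int off, int len) throws IOException {
                throw new IOException("simulated write failure");
            }

            @Override
            public void flush() {
            }

            @Override
            public void close() {
            }
        };
    }
}
```

A caller would therefore see no exception from the individual write calls and should expect one only from the final call.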
For elements that are not specified as whitespace preserving,
the serializer will potentially break long text lines at space
boundaries, indent lines, and serialize elements on separate
lines. Line terminators will be regarded as spaces, and
spaces at beginning of line will be stripped.
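The whitespace rules above can be approximated by a small wrapping routine. This is a minimal sketch, assuming a fixed line width and a fixed indent string; LineWrapSketch and wrap are hypothetical names, and the real serializer takes its settings from the output format rather than from method arguments.

```java
// Hypothetical illustration of breaking text at space boundaries with indentation.
final class LineWrapSketch {
    // Treats every run of whitespace (including line terminators) as one space,
    // then breaks lines at space boundaries so they stay within width where possible.
    static String wrap(String text, int width, String indent) {
        String[] words = text.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        StringBuilder line = new StringBuilder(indent);
        boolean lineHasWord = false;
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            if (lineHasWord && line.length() + 1 + word.length() > width) {
                out.append(line).append('\n');
                line = new StringBuilder(indent);
                lineHasWord = false;
            }
            if (lineHasWord) {
                line.append(' ');
            }
            line.append(word);
            lineHasWord = true;
        }
        if (lineHasWord) {
            out.append(line);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("one two\nthree   four", 10, "  "));
    }
}
```

A word longer than the width is left on a line of its own, since the sketch breaks only at spaces.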
XHTML differs slightly from HTML:
- Element/attribute names are lower case and case matters
- Attributes must specify value, even if empty string
- Empty elements must have '/' in empty tag
- Contents of SCRIPT and STYLE elements serialized as CDATA
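The first three differences can be made concrete with a tiny tag writer. This is a sketch under assumptions, not the serializer's output routine: XhtmlDifferencesSketch and emptyTag are hypothetical names, attribute values are not escaped, and leaving names unchanged in HTML mode is an assumption of the sketch. CDATA handling for SCRIPT and STYLE is not covered.

```java
import java.util.Locale;

// Hypothetical illustration of how an empty element might differ between the two modes.
final class XhtmlDifferencesSketch {
    // Renders an empty element with at most one attribute (attr may be null).
    static String emptyTag(String name, String attr, String value, boolean xhtml) {
        StringBuilder sb = new StringBuilder("<");
        // XHTML: names are lower case. HTML: left as given (assumption of this sketch).
        sb.append(xhtml ? name.toLowerCase(Locale.ENGLISH) : name);
        if (attr != null) {
            sb.append(' ').append(xhtml ? attr.toLowerCase(Locale.ENGLISH) : attr);
            // XHTML: the attribute always carries a value, even an empty string.
            if (xhtml || (value != null && !value.isEmpty())) {
                sb.append("=\"").append(value == null ? "" : value).append('"');
            }
        }
        // XHTML: the empty tag carries a '/'.
        sb.append(xhtml ? " />" : ">");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(emptyTag("INPUT", "DISABLED", "", true));
        System.out.println(emptyTag("INPUT", "DISABLED", "", false));
    }
}
```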
HTMLSerializer
public HTMLSerializer()
Constructs a new serializer. The serializer cannot be used without
calling setOutputCharStream or setOutputByteStream first.
HTMLSerializer
public HTMLSerializer(OutputStream output,
OutputFormat format)
Constructs a new serializer that writes to the specified output
stream using the specified output format. If format is null,
a default output format is used.
output - The output stream to use
format - The output format to use, null for the default
HTMLSerializer
public HTMLSerializer(Writer writer,
OutputFormat format)
Constructs a new serializer that writes to the specified writer
using the specified output format. If format is null,
a default output format is used.
writer - The writer to use
format - The output format to use, null for the default
HTMLSerializer
protected HTMLSerializer(boolean xhtml,
OutputFormat format)
Constructs a new HTML/XHTML serializer depending on the value of
xhtml. The serializer cannot be used without calling
setOutputCharStream or setOutputByteStream first.
xhtml - True if XHTML serializing
HTMLSerializer
public HTMLSerializer(OutputFormat format)
Constructs a new serializer. The serializer cannot be used without
calling setOutputCharStream or setOutputByteStream first.
characters
protected void characters(String text)
throws IOException
Called to print the text contents in the prevailing element format.
Since this method is capable of printing text as CDATA, it is used
for that purpose as well. White space handling is determined by the
current element state. In addition, the output format can dictate
whether the text is printed as CDATA or unescaped.
- characters in class BaseMarkupSerializer
text - The text to print
characters
public void characters(char[] chars,
int start,
int length)
throws org.xml.sax.SAXException
- characters in class BaseMarkupSerializer
endElement
public void endElement(String tagName)
throws org.xml.sax.SAXException
- endElement in interface org.xml.sax.DocumentHandler
endElement
public void endElement(String namespaceURI,
String localName,
String rawName)
throws org.xml.sax.SAXException
- endElement in interface org.xml.sax.ContentHandler
endElementIO
public void endElementIO(String namespaceURI,
String localName,
String rawName)
throws IOException
escapeURI
protected String escapeURI(String uri)
getEntityRef
protected String getEntityRef(int ch)
Returns the suitable entity reference for this character value,
or null if no such entity exists. Calling this method with '&'
will return "&amp;".
- getEntityRef in class BaseMarkupSerializer
ch - Character value
Returns: character entity name, or null
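A character-to-entity lookup of this kind can be sketched with a small table. The sketch is illustrative only: EntityRefSketch, entityName and escape are hypothetical names, the table holds just a handful of entries, and the choice to return the bare name (with escape adding the '&' and ';') is an assumption of the sketch rather than a statement about the real method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of looking up an entity for a character value.
final class EntityRefSketch {
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        NAMES.put((int) '&', "amp");
        NAMES.put((int) '<', "lt");
        NAMES.put((int) '>', "gt");
        NAMES.put((int) '"', "quot");
        NAMES.put(0xA0, "nbsp");
    }

    // Returns the entity name for the character value, or null if none is known.
    static String entityName(int ch) {
        return NAMES.get(ch);
    }

    // Replaces every character that has a known entity with "&name;".
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String name = entityName(cp);
            if (name != null) {
                sb.append('&').append(name).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & c"));
    }
}
```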
serializeElement
protected void serializeElement(org.w3c.dom.Element elem)
throws IOException
Called to serialize a DOM element. Equivalent to calling
startElement, endElement and serializing everything
in between, but better optimized.
- serializeElement in class BaseMarkupSerializer
setOutputFormat
public void setOutputFormat(OutputFormat format)
Specifies an output format for this serializer. If the
serializer has already been associated with an output format,
it will switch to the new format. This method should not be
called while the serializer is in the process of serializing
a document.
- setOutputFormat in interface Serializer
- setOutputFormat in class BaseMarkupSerializer
format - The output format to use
setXHTMLNamespace
public void setXHTMLNamespace(String newNamespace)
startDocument
protected void startDocument(String rootTagName)
throws IOException
Called to serialize the document's DOCTYPE by the root element.
The document type declaration must name the root element,
but the root element is only known when that element is serialized,
and not at the start of the document.
This method will check whether it has been called before
(_started), will serialize the document type declaration, and
will serialize all pre-root comments and PIs that were accumulated
in the document (see serializePreRoot). Pre-root content will be
serialized even if this is not the first root element of the
document.
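The ordering problem described above (the DOCTYPE must name the root element, which is only known once that element arrives) can be sketched as follows. This is a simplified illustration, not the serializer's code: DeferredDoctypeSketch and its methods are hypothetical, only comments are buffered, and the DOCTYPE is written without public or system identifiers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of delaying the DOCTYPE until the root element is known.
final class DeferredDoctypeSketch {
    private final StringBuilder out = new StringBuilder();
    private final List<String> preRoot = new ArrayList<>();
    private boolean started; // true once the DOCTYPE has been written

    // Comments seen before the root element are buffered rather than written.
    void comment(String text) {
        String markup = "<!--" + text + "-->";
        if (!started) {
            preRoot.add(markup);
        } else {
            out.append(markup).append('\n');
        }
    }

    void startElement(String name) {
        if (!started) {
            // The root name is known only now, so the DOCTYPE is written here.
            out.append("<!DOCTYPE ").append(name).append(">\n");
            started = true;
        }
        // Buffered pre-root content is flushed regardless of whether this is the first root.
        for (String item : preRoot) {
            out.append(item).append('\n');
        }
        preRoot.clear();
        out.append('<').append(name).append(">\n");
    }

    String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        DeferredDoctypeSketch s = new DeferredDoctypeSketch();
        s.comment(" header ");
        s.startElement("html");
        System.out.print(s.result());
    }
}
```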
startElement
public void startElement(String namespaceURI,
String localName,
String rawName,
org.xml.sax.Attributes attrs)
throws org.xml.sax.SAXException
- startElement in interface org.xml.sax.ContentHandler
startElement
public void startElement(String tagName,
org.xml.sax.AttributeList attrs)
throws org.xml.sax.SAXException
- startElement in interface org.xml.sax.DocumentHandler