Clover coverage report - dom4j - 1.6.1
Coverage timestamp: Mon May 16 2005 14:23:01 GMT+01:00
file stats: LOC: 841   Methods: 32
NCLOC: 281   Classes: 2
 
 Source file      Conditionals  Statements  Methods   TOTAL
                  coverage      coverage    coverage  coverage
 HTMLWriter.java  29.5%         37%         37.5%     35.5%
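The close-tag rules described in the class Javadoc, and implemented by `writeEmptyElementClose` and `omitElementClose` in the listing, can be sketched as a standalone Java class. This is a hypothetical illustration, not part of dom4j: the class `OmitCloseSketch` and its methods are invented here. The non-omit branch assumes that `XMLWriter.writeEmptyElementClose` writes `/>` or `></name>` depending on `isExpandEmptyElements()`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch (not dom4j API) of HTMLWriter's empty-element close rules. */
public class OmitCloseSketch {
    // Mirrors the default names loaded by loadOmitElementCloseSet().
    static final Set<String> OMIT_CLOSE = new HashSet<>(Arrays.asList(
            "AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT",
            "LINK", "META", "P", "PARAM"));

    // Case-insensitive membership test: the name is upper-cased before lookup.
    // (Locale.ROOT is used here; the listing calls toUpperCase() with the default locale.)
    static boolean omitElementClose(String qualifiedName) {
        return OMIT_CLOSE.contains(qualifiedName.toUpperCase(Locale.ROOT));
    }

    /**
     * Returns the text written after the attributes of an empty element.
     * ASSUMPTION: the non-omit branch imitates XMLWriter.writeEmptyElementClose.
     */
    static String emptyElementClose(String qualifiedName, boolean xhtml, boolean expandEmpty) {
        if (omitElementClose(qualifiedName)) {
            // Special HTML single tags never expand: "<br />" in XHTML mode, "<br>" otherwise.
            return xhtml ? " />" : ">";
        }
        // All other elements follow the expand-empty-elements option.
        return expandEmpty ? "></" + qualifiedName + ">" : "/>";
    }

    public static void main(String[] args) {
        System.out.println("<br" + emptyElementClose("br", true, true));
        System.out.println("<br" + emptyElementClose("br", false, true));
        System.out.println("<div" + emptyElementClose("div", true, true));
        System.out.println("<div" + emptyElementClose("div", true, false));
    }
}
```

Running `main` prints `<br />`, `<br>`, `<div></div>` and `<div/>`, one per line. Note that the expansion option has no effect on the special single tags. This matches the rule in the class Javadoc that they "never expand".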
 1    /*
 2    * Copyright 2001-2005 (C) MetaStuff, Ltd. All Rights Reserved.
 3    *
 4    * This software is open source.
 5    * See the bottom of this file for the licence.
 6    */
 7   
 8    package org.dom4j.io;
 9   
 10    import java.io.IOException;
 11    import java.io.OutputStream;
 12    import java.io.StringWriter;
 13    import java.io.UnsupportedEncodingException;
 14    import java.io.Writer;
 15    import java.util.HashSet;
 16    import java.util.Iterator;
 17    import java.util.Set;
 18    import java.util.Stack;
 19   
 20    import org.dom4j.Document;
 21    import org.dom4j.DocumentHelper;
 22    import org.dom4j.Element;
 23    import org.dom4j.Entity;
 24    import org.dom4j.Node;
 25   
 26    import org.xml.sax.SAXException;
 27   
 28    /**
 29    * <p>
 30    * <code>HTMLWriter</code> takes a DOM4J tree and formats it to a stream as
 31    * HTML. This formatter is similar to XMLWriter, but it outputs the text of CDATA
 32    * and Entity sections rather than their serialised XML form; it has an
 33    * XHTML mode; it retains whitespace in certain elements such as &lt;PRE&gt;;
 34    * and it supports certain elements that have no corresponding close tag, such
 35    * as &lt;BR&gt; and &lt;P&gt;.
 36    * </p>
 37    *
 38    * <p>
 39    * The OutputFormat passed in to the constructor is checked for isXHTML() and
 40    * isExpandEmptyElements(). See {@link OutputFormat OutputFormat} for details.
 41    * Here are the rules for <b>this class</b> based on an OutputFormat, "format",
 42    * passed in to the constructor: <br/><br/>
 43    *
 44    * <ul>
 45    * <li>If an element is in {@link #getOmitElementCloseSet()
 46    * getOmitElementCloseSet}, then it is treated specially:
 47    *
 48    * <ul>
 49    * <li>It never expands, since some browsers treat this as two separate
 50    * Horizontal Rules: &lt;HR&gt;&lt;/HR&gt;</li>
 51    * <li>If {@link org.dom4j.io.OutputFormat#isXHTML() format.isXHTML()}, then
 52    * it has a space before the closing single-tag slash, since Netscape 4.x and
 53    * earlier treat &lt;HR /&gt; as an element named "HR" with an attribute named
 54    * "/", which is still better than how they handle &lt;hr/&gt;: they fail to
 55    * recognize it, reading it as an element named "HR/".</li>
 56    * </ul>
 57    *
 58    * </li>
 59    * <li>If {@link org.dom4j.io.OutputFormat#isXHTML() format.isXHTML()}, all
 60    * elements must have either a close element, or be a closed single tag.</li>
 61    * <li>If {@link org.dom4j.io.OutputFormat#isExpandEmptyElements()
 62    * format.isExpandEmptyElements()} is true, all elements are expanded except
 63    * as above.</li>
 64    * </ul>
 65    *
 66    * <b>Examples </b>
 67    * </p>
 68    *
 69    * <p>
 70    * </p>
 71    *
 72    * <p>
 73    * If isXHTML == true, CDATA sections look like this:
 74    *
 75    * <PRE>
 76    *
 77    * <b>&lt;myelement&gt;&lt;![CDATA[My data]]&gt;&lt;/myelement&gt; </b>
 78    *
 79    * </PRE>
 80    *
 81    * Otherwise, they look like this:
 82    *
 83    * <PRE>
 84    *
 85    * <b>&lt;myelement&gt;My data&lt;/myelement&gt; </b>
 86    *
 87    * </PRE>
 88    *
 89    * </p>
 90    *
 91    * <p>
 92    * Basically, {@link OutputFormat#isXHTML() OutputFormat.isXHTML()} ==
 93    * <code>true</code> will produce valid XML, while {@link
 94    * org.dom4j.io.OutputFormat#isExpandEmptyElements()
 95    * format.isExpandEmptyElements()} determines whether empty elements are
 96    * expanded if isXHTML is true, excepting the special HTML single tags.
 97    * </p>
 98    *
 99    * <p>
 100    * Also, HTMLWriter handles tags whose contents should be preformatted, that is,
 101    * whitespace-preserved. By default, this set includes the tags &lt;PRE&gt;,
 102    * &lt;SCRIPT&gt;, &lt;STYLE&gt;, and &lt;TEXTAREA&gt;, case insensitively. It
 103    * does not include &lt;IFRAME&gt;. Other tags, such as &lt;CODE&gt;,
 104    * &lt;KBD&gt;, &lt;TT&gt;, and &lt;VAR&gt;, are rendered in a different
 105    * font by most browsers but don't preserve whitespace, so they don't
 106    * appear in the default list. HTML Comments are always whitespace-preserved.
 107    * However, the parser you use may store comment text with linefeed-only (\n)
 108    * line endings even if your platform uses a different line.separator, and
 109    * HTMLWriter outputs Comment nodes exactly as the DOM is set up by the parser.
 110    * See examples and discussion here: {@link #setPreformattedTags(java.util.Set)
 111    * setPreformattedTags}
 112    * </p>
 113    *
 114    * <p>
 115    * <b>Examples </b>
 116    * </p>
 117    * <blockquote>
 118    * <p>
 119    * <b>Pretty Printing </b>
 120    * </p>
 121    *
 122    * <p>
 123    * This example shows how to pretty print a string containing a well-formed HTML
 124    * document to a string. You can also just call the static methods of this
 125    * class: <br>
 126    * {@link #prettyPrintHTML(String) prettyPrintHTML(String)} or <br>
 127    * {@link #prettyPrintHTML(String,boolean,boolean,boolean,boolean)
 128    * prettyPrintHTML(String,boolean,boolean,boolean,boolean)} or, <br>
 129    * {@link #prettyPrintXHTML(String) prettyPrintXHTML(String)} for XHTML (note
 130    * the X)
 131    * </p>
 132    *
 133    * <pre>
 134    * String testPrettyPrint(String html) throws IOException, DocumentException {
 135    * StringWriter sw = new StringWriter();
 136    * OutputFormat format = OutputFormat.createPrettyPrint();
 137    * // These are the default values for createPrettyPrint,
 138    * // so you needn't set them:
 139    * // format.setNewlines(true);
 140    * // format.setTrimText(true);
 141    * format.setXHTML(true);
 142    * HTMLWriter writer = new HTMLWriter(sw, format);
 143    * Document document = DocumentHelper.parseText(html);
 144    * writer.write(document);
 145    * writer.flush();
 146    * return sw.toString();
 147    * }
 148    * </pre>
 149    *
 150    * <p>
 151    * This example shows how to create a "squeezed" document that will still
 152    * work in browsers whose line length is limited. No extra whitespace is
 153    * output except a newline after every 20 tags and whitespace required by
 154    * {@link #setPreformattedTags(java.util.Set) setPreformattedTags}.
 155    * </p>
 156    *
 157    * <pre>
 158    * String testCrunch(String html) throws IOException, DocumentException {
 159    * StringWriter sw = new StringWriter();
 160    * OutputFormat format = OutputFormat.createPrettyPrint();
 161    * format.setNewlines(false);
 162    * format.setTrimText(true);
 163    * format.setIndent(&quot;&quot;);
 164    * format.setXHTML(true);
 165    * format.setExpandEmptyElements(false);
 166    * format.setNewLineAfterNTags(20);
 167    * org.dom4j.io.HTMLWriter writer = new HTMLWriter(sw, format);
 168    * org.dom4j.Document document = DocumentHelper.parseText(html);
 169    * writer.write(document);
 170    * writer.flush();
 171    * return sw.toString();
 172    * }
 173    * </pre>
 174    *
 175    * </blockquote>
 176    *
 177    * @author <a href="mailto:james.strachan@metastuff.com">James Strachan </a>
 178    * @author Laramie Crocker
 179    * @version $Revision: 1.21 $
 180    */
 181    public class HTMLWriter extends XMLWriter {
 182    private static String lineSeparator = System.getProperty("line.separator");
 183   
 184    protected static final HashSet DEFAULT_PREFORMATTED_TAGS;
 185   
 186    static {
 187    // If you change this list, update the javadoc examples above in the
 188    // class javadoc, in writeElement, and in setPreformattedTags().
 189  1 DEFAULT_PREFORMATTED_TAGS = new HashSet();
 190  1 DEFAULT_PREFORMATTED_TAGS.add("PRE");
 191  1 DEFAULT_PREFORMATTED_TAGS.add("SCRIPT");
 192  1 DEFAULT_PREFORMATTED_TAGS.add("STYLE");
 193  1 DEFAULT_PREFORMATTED_TAGS.add("TEXTAREA");
 194    }
 195   
 196    protected static final OutputFormat DEFAULT_HTML_FORMAT;
 197   
 198    static {
 199  1 DEFAULT_HTML_FORMAT = new OutputFormat(" ", true);
 200  1 DEFAULT_HTML_FORMAT.setTrimText(true);
 201  1 DEFAULT_HTML_FORMAT.setSuppressDeclaration(true);
 202    }
 203   
 204    private Stack formatStack = new Stack();
 205   
 206    private String lastText = "";
 207   
 208    private int tagsOuput = 0;
 209   
 210    // legal values are 0+, but -1 signifies lazy initialization.
 211    private int newLineAfterNTags = -1;
 212   
 213    private HashSet preformattedTags = DEFAULT_PREFORMATTED_TAGS;
 214   
 215    /**
 216    * Used to store the qualified element names which should have no close
 217    * element tag
 218    */
 219    private HashSet omitElementCloseSet;
 220   
 221  1 public HTMLWriter(Writer writer) {
 222  1 super(writer, DEFAULT_HTML_FORMAT);
 223    }
 224   
 225  5 public HTMLWriter(Writer writer, OutputFormat format) {
 226  5 super(writer, format);
 227    }
 228   
 229  0 public HTMLWriter() throws UnsupportedEncodingException {
 230  0 super(DEFAULT_HTML_FORMAT);
 231    }
 232   
 233  0 public HTMLWriter(OutputFormat format) throws UnsupportedEncodingException {
 234  0 super(format);
 235    }
 236   
 237  0 public HTMLWriter(OutputStream out) throws UnsupportedEncodingException {
 238  0 super(out, DEFAULT_HTML_FORMAT);
 239    }
 240   
 241  0 public HTMLWriter(OutputStream out, OutputFormat format)
 242    throws UnsupportedEncodingException {
 243  0 super(out, format);
 244    }
 245   
 246  0 public void startCDATA() throws SAXException {
 247    }
 248   
 249  0 public void endCDATA() throws SAXException {
 250    }
 251   
 252    // Overridden methods
 253    // writeCDATA checks isXHTML() so you get the CDATA brackets if you want them.
 254  1 protected void writeCDATA(String text) throws IOException {
 255    // XXX: Should we escape entities?
 256    // writer.write( escapeElementEntities( text ) );
 257  1 if (getOutputFormat().isXHTML()) {
 258  0 super.writeCDATA(text);
 259    } else {
 260  1 writer.write(text);
 261    }
 262   
 263  1 lastOutputNodeType = Node.CDATA_SECTION_NODE;
 264    }
 265   
 266  0 protected void writeEntity(Entity entity) throws IOException {
 267  0 writer.write(entity.getText());
 268  0 lastOutputNodeType = Node.ENTITY_REFERENCE_NODE;
 269    }
 270   
 271  3 protected void writeDeclaration() throws IOException {
 272    }
 273   
 274  4 protected void writeString(String text) throws IOException {
 275    /*
 276    * DOM stores \n at the end of text nodes that are newlines. This is
 277    * significant if we are in a PRE section. However, we only want to
 278    * output the system line.separator, not \n. This is a little brittle,
 279    * but this function appears to be called with these line separators as a
 280    * separate TEXT_NODE. If we are in a preformatted section, output the
 281    * right line.separator; otherwise drop it. If the text is anything other
 282    * than a single \n character, delegate to the superclass to output it.
 283    *
 284    * Also, we store the last text that was not a \n since it may be used
 285    * by writeElement in this class to line up preformatted tags.
 286    */
 287  4 if (text.equals("\n")) {
 288  0 if (!formatStack.empty()) {
 289  0 super.writeString(lineSeparator);
 290    }
 291   
 292  0 return;
 293    }
 294   
 295  4 lastText = text;
 296   
 297  4 if (formatStack.empty()) {
 298  4 super.writeString(text.trim());
 299    } else {
 300  0 super.writeString(text);
 301    }
 302    }
 303   
 304    /**
 305    * Overridden method that omits the close tag for certain element names, to
 306    * avoid weird behaviour in browsers up to version 5.x.
 307    *
 308    * @param qualifiedName
 309    * the qualified name of the element being closed
 310    *
 311    * @throws IOException
 312    * if writing to the underlying writer fails
 313    */
 314  0 protected void writeClose(String qualifiedName) throws IOException {
 315  0 if (!omitElementClose(qualifiedName)) {
 316  0 super.writeClose(qualifiedName);
 317    }
 318    }
 319   
 320  2 protected void writeEmptyElementClose(String qualifiedName)
 321    throws IOException {
 322  2 if (getOutputFormat().isXHTML()) {
 323    // xhtml, always check with format object whether to expand or not.
 324  0 if (omitElementClose(qualifiedName)) {
 325    // it was a special omit tag, do it the XHTML way: "<br />",
 326    // ignoring the expansion option, since <br></br> is OK XML,
 327    // but produces twice the linefeeds desired in the browser.
 328    // The space before the close slash is for Netscape 4.7;
 329    // other browsers accept it as well.
 330  0 writer.write(" />");
 331    } else {
 332  0 super.writeEmptyElementClose(qualifiedName);
 333    }
 334    } else {
 335    // html, not xhtml
 336  2 if (omitElementClose(qualifiedName)) {
 337    // it was a special omit tag, do it the old html way: "<br>".
 338  1 writer.write(">");
 339    } else {
 340    // it was NOT a special omit tag, check with format object
 341    // whether to expand or not.
 342  1 super.writeEmptyElementClose(qualifiedName);
 343    }
 344    }
 345    }
 346   
 347  2 protected boolean omitElementClose(String qualifiedName) {
 348  2 return internalGetOmitElementCloseSet().contains(
 349    qualifiedName.toUpperCase());
 350    }
 351   
 352  2 private HashSet internalGetOmitElementCloseSet() {
 353  2 if (omitElementCloseSet == null) {
 354  2 omitElementCloseSet = new HashSet();
 355  2 loadOmitElementCloseSet(omitElementCloseSet);
 356    }
 357   
 358  2 return omitElementCloseSet;
 359    }
 360   
 361    // If you change this, change the javadoc for getOmitElementCloseSet.
 362  2 protected void loadOmitElementCloseSet(Set set) {
 363  2 set.add("AREA");
 364  2 set.add("BASE");
 365  2 set.add("BR");
 366  2 set.add("COL");
 367  2 set.add("HR");
 368  2 set.add("IMG");
 369  2 set.add("INPUT");
 370  2 set.add("LINK");
 371  2 set.add("META");
 372  2 set.add("P");
 373  2 set.add("PARAM");
 374    }
 375   
 376    // let the people see the set, but not modify it.
 377   
 378    /**
 379    * A clone of the Set of elements whose close tags can be omitted. By
 380    * default it contains "AREA", "BASE", "BR", "COL", "HR", "IMG", "INPUT",
 381    * "LINK", "META", "P", "PARAM"
 382    *
 383    * @return A clone of the Set.
 384    */
 385  0 public Set getOmitElementCloseSet() {
 386  0 return (Set) (internalGetOmitElementCloseSet().clone());
 387    }
 388   
 389    /**
 390    * To use the empty set, pass an empty Set, or null:
 391    *
 392    * <pre>
 393    *
 394    *
 395    * setOmitElementCloseSet(new HashSet());
 396    * or
 397    * setOmitElementCloseSet(null);
 398    *
 399    *
 400    * </pre>
 401    *
 402    * @param newSet
 403    * the element names whose close tags to omit (stored upper-cased), or null
 404    */
 405  0 public void setOmitElementCloseSet(Set newSet) {
 406    // resets, and safely empties it out if newSet is null.
 407  0 omitElementCloseSet = new HashSet();
 408   
 409  0 if (newSet != null) {
 410  0 omitElementCloseSet = new HashSet();
 411   
 412  0 Object aTag;
 413  0 Iterator iter = newSet.iterator();
 414   
 415  0 while (iter.hasNext()) {
 416  0 aTag = iter.next();
 417   
 418  0 if (aTag != null) {
 419  0 omitElementCloseSet.add(aTag.toString().toUpperCase());
 420    }
 421    }
 422    }
 423    }
 424   
 425    /**
 426    * @see #setPreformattedTags(java.util.Set) setPreformattedTags
 427    */
 428  0 public Set getPreformattedTags() {
 429  0 return (Set) (preformattedTags.clone());
 430    }
 431   
 432    /**
 433    * <p>
 434    * Override the default set, which includes PRE, SCRIPT, STYLE, and
 435    * TEXTAREA, case insensitively.
 436    * </p>
 437    *
 438    * <p>
 439    * <b>Setting Preformatted Tags </b>
 440    * </p>
 441    *
 442    * <p>
 443    * Pass in a Set of Strings, one for each tag name that should be treated
 444    * like a PRE tag. You may pass in null or an empty Set to assign the empty
 445    * set, in which case no tags will be treated as preformatted, except that
 446    * HTML Comments will continue to be preformatted. If a tag is included in
 447    * the set of preformatted tags, all whitespace within the tag will be
 448    * preserved, including whitespace on the same line preceding the close tag.
 449    * This will generally make the close tag not line up with the start tag,
 450    * but it preserves the intention of the whitespace within the tag.
 451    * </p>
 452    *
 453    * <p>
 454    * The browser considers leading whitespace before the close tag to be
 455    * significant, but leading whitespace before the open tag to be
 456    * insignificant. For example, if the HTML author doesn't put the close
 457    * TEXTAREA tag flush to the left margin, then the TEXTAREA control in the
 458    * browser will have spaces on the last line inside the control. This may be
 459    * the HTML author's intent. Similarly, in a PRE, the browser treats a close
 460    * PRE tag flush with the left margin differently from one with leading
 461    * whitespace. Again, this must be left up to the HTML author.
 462    * </p>
 463    *
 464    * <p>
 465    * <b>Examples </b>
 466    * </p>
 467    * <blockquote>
 468    * <p>
 469    * Here is an example of how you can use setPreformattedTags to add IFRAME
 470    * to the default set of preformatted tags, given an instance of this class
 471    * named myHTMLWriter:
 472    *
 473    * <pre>
 474    * Set current = myHTMLWriter.getPreformattedTags();
 475    * current.add(&quot;IFRAME&quot;);
 476    * myHTMLWriter.setPreformattedTags(current);
 477    *
 478    * // The set is now PRE, SCRIPT, STYLE, TEXTAREA, IFRAME
 479    *
 480    *
 481    * </pre>
 482    *
 483    * Similarly, you can simply replace it with your own:
 484    *
 485    * <pre>
 486    *
 487    *
 488    * HashSet newset = new HashSet();
 489    * newset.add(&quot;PRE&quot;);
 490    * newset.add(&quot;TEXTAREA&quot;);
 491    * myHTMLWriter.setPreformattedTags(newset);
 492    *
 493    * // The set is now {PRE, TEXTAREA}
 494    *
 495    *
 496    * &