com.cc.framework.util.parser
Class HtmlParser

java.lang.Object
  extended by com.cc.framework.util.parser.HtmlParser
Direct Known Subclasses:
JspParser

public class HtmlParser
extends java.lang.Object

A very simple HTML parser

Author:
P001002

Nested Class Summary
private static class HtmlParser.Attribute
          A single Attribute definition
private static class HtmlParser.AttributesImpl
          The attribute collection of an HTML tag
 
Field Summary
private  java.util.Stack elements
          The element stack (Stack<String>) the parser uses to track well-formed syntax
private  EntityMapper entityMapper
          The object to map entities
private  HtmlHandler handler
          The handler that implements the callback methods the parser will call during processing of the document
protected  int pos
          The current processing position (index)
protected  int processed
          The position (index) to which the document is processed
private  char[] source
          The document's source code
private  boolean validate
          This flag tells the parser to check whether the document is well-formed
 
Constructor Summary
HtmlParser()
          Constructor
 
Method Summary
protected  boolean eos()
           
protected  HtmlHandler getHandler()
           
protected  char[] getSource()
           
protected  boolean isIdentifierChar(char c)
          Returns true when the given character is a valid identifier character
 boolean isValidating()
           
protected  boolean isWhitespaceChar(char c)
          Returns true when the given character is a valid whitespace character
protected  boolean match(char value)
           
protected  boolean match(java.lang.String value)
           
 HtmlHandler parse(java.lang.String html, HtmlHandler handler)
          Parses the given HTML code
protected  HtmlAttributes parseAttributes()
          Parses the attributes of a tag
protected  java.lang.String parseAttributeValue()
          Parses an attribute value.
protected  java.lang.String parseIdentifier(boolean namespacePrefix)
          Parses an identifier
protected  void process()
           
protected  void processChars()
          Processes CDATA
protected  void processComment()
          Processes a comment
protected  void processElement()
          Valid formats are <_name_attr_=_"value"_[/]> <_name_attr_=_'value'_[/]> <_name_attr_=_value_[/]> <_name_attr_[/]> <_/_name_attr_>
protected  void processEntity()
          Processes an HTML entity.
protected  void processWhitespace()
           
protected  void reset()
          Resets the internal state of the parser
 void setValidating(boolean validate)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

entityMapper

private EntityMapper entityMapper
The object to map entities


validate

private boolean validate
This flag tells the parser to check whether the document is well-formed


source

private char[] source
The document's source code


processed

protected int processed
The position (index) to which the document is processed


pos

protected int pos
The current processing position (index)


handler

private HtmlHandler handler
The handler that implements the callback methods the parser will call during processing of the document


elements

private java.util.Stack elements
The element stack (Stack<String>) the parser uses to track well-formed syntax
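
As an illustration of how an element stack can track well-formed syntax, the sketch below pushes each opening tag name and pops it again on the matching closing tag. The class and method names (ElementStackSketch, open, close, isBalanced) are hypothetical and not part of HtmlParser; the real field is private and may be used differently.

```java
import java.util.Stack;

// Hypothetical sketch; not the actual HtmlParser implementation.
final class ElementStackSketch {
    // Names of the elements that are currently open, innermost on top.
    private final Stack<String> elements = new Stack<>();

    // Records an opening tag such as <div>.
    void open(String name) {
        elements.push(name);
    }

    // Handles a closing tag such as </div>. Returns false when the name does
    // not match the innermost open element, i.e. the input is not well-formed.
    boolean close(String name) {
        if (elements.isEmpty() || !elements.peek().equals(name)) {
            return false;
        }
        elements.pop();
        return true;
    }

    // True when every opened element has been closed again.
    boolean isBalanced() {
        return elements.isEmpty();
    }
}
```

A validating parser could call close for every end tag and report an error as soon as it returns false, then check isBalanced once the end of the document is reached.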

Constructor Detail

HtmlParser

public HtmlParser()
Constructor

Method Detail

reset

protected void reset()
Resets the internal state of the parser


getHandler

protected HtmlHandler getHandler()

getSource

protected char[] getSource()
Returns:
the source

eos

protected boolean eos()

match

protected boolean match(char value)

match

protected boolean match(java.lang.String value)

parse

public HtmlHandler parse(java.lang.String html,
                         HtmlHandler handler)
Parses the given HTML code

Parameters:
html - the HTML code to parse
handler - The handler that implements the callback methods the parser will call during processing of the document
Returns:
the handler instance
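
The callback methods of HtmlHandler are not listed on this page, so the following sketch uses stand-in types to show the general shape of a callback-driven parse call that hands back the handler it was given. SketchHandler, SketchParser, RecordingHandler and all of their methods are hypothetical; only the parse(html, handler) pattern is taken from the signature above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HtmlHandler; the real callbacks may differ.
interface SketchHandler {
    void startElement(String name);
    void endElement(String name);
    void characters(String text);
}

// Hypothetical stand-in parser: understands only <name>, </name> and text.
final class SketchParser {
    // Walks the input once, fires callbacks, and returns the same handler.
    static SketchHandler parse(String html, SketchHandler handler) {
        int pos = 0;
        while (pos < html.length()) {
            if (html.charAt(pos) == '<') {
                int end = html.indexOf('>', pos);
                if (end < 0) {
                    // Unterminated tag: report the remainder as text and stop.
                    handler.characters(html.substring(pos));
                    break;
                }
                String tag = html.substring(pos + 1, end).trim();
                if (tag.startsWith("/")) {
                    handler.endElement(tag.substring(1).trim());
                } else {
                    handler.startElement(tag);
                }
                pos = end + 1;
            } else {
                int next = html.indexOf('<', pos);
                if (next < 0) {
                    next = html.length();
                }
                handler.characters(html.substring(pos, next));
                pos = next;
            }
        }
        return handler;
    }
}

// Example handler that records the events it receives, in order.
final class RecordingHandler implements SketchHandler {
    final List<String> events = new ArrayList<>();

    public void startElement(String name) { events.add("start:" + name); }
    public void endElement(String name) { events.add("end:" + name); }
    public void characters(String text) { events.add("text:" + text); }
}
```

Returning the handler lets a caller create the handler, parse, and read the collected result in one expression.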

process

protected void process()

isWhitespaceChar

protected boolean isWhitespaceChar(char c)
Returns true when the given character is a valid whitespace character

Parameters:
c - the character to test
Returns:
true if the character is a valid whitespace character

isIdentifierChar

protected boolean isIdentifierChar(char c)
Returns true when the given character is a valid identifier character

Parameters:
c - the character to test
Returns:
true if the character is a valid identifier character
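
This page does not state which characters the two predicates accept. The sketch below shows one plausible pair of character classes; the chosen sets (space, tab, CR and LF as whitespace; letters, digits, '_' and '-' as identifier characters) are assumptions, and CharClassSketch is a hypothetical name.

```java
// Hypothetical sketch; the character sets are assumptions, not documented behaviour.
final class CharClassSketch {
    // Assumed whitespace set: space, tab, carriage return, line feed.
    static boolean isWhitespaceChar(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Assumed identifier set: letters, digits, underscore and hyphen.
    static boolean isIdentifierChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-';
    }
}
```

Because both methods are protected in HtmlParser, a subclass such as JspParser could override them to accept a different character set.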

isValidating

public boolean isValidating()
Returns:
true if the parser checks that the document is well-formed

setValidating

public void setValidating(boolean validate)
Parameters:
validate - true to make the parser check that the document is well-formed

processEntity

protected void processEntity()
Processes an HTML entity. The entity syntax is:


processChars

protected void processChars()
Processes CDATA


processComment

protected void processComment()
Processes a comment


processElement

protected void processElement()
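Valid formats are
<_name_attr_=_"value"_[/]>
<_name_attr_=_'value'_[/]>
<_name_attr_=_value_[/]>
<_name_attr_[/]>
<_/_name_attr_>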


parseAttributes

protected HtmlAttributes parseAttributes()
Parses the attributes of a tag

Returns:
attribute collection

parseAttributeValue

protected java.lang.String parseAttributeValue()
Parses an attribute value. Valid formats are

Returns:
identifier or null when no identifier could be found
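
The element formats listed for processElement show three attribute value styles: double-quoted, single-quoted and unquoted. The sketch below reads one value in any of these styles from a character array with a moving position. AttributeValueSketch is a hypothetical class, and the rule that an unquoted value ends at whitespace or '>' is an assumption.

```java
// Hypothetical sketch; not the actual HtmlParser implementation.
final class AttributeValueSketch {
    private final char[] source;
    private int pos; // current processing position (index)

    AttributeValueSketch(String text) {
        this.source = text.toCharArray();
    }

    // Returns the value starting at the current position, or null when
    // no value could be found.
    String parseAttributeValue() {
        if (pos >= source.length) {
            return null;
        }
        char quote = source[pos];
        if (quote == '"' || quote == '\'') {
            // Quoted value: read up to the matching closing quote.
            int start = pos + 1;
            int end = start;
            while (end < source.length && source[end] != quote) {
                end++;
            }
            if (end >= source.length) {
                return null; // closing quote is missing
            }
            pos = end + 1;
            return new String(source, start, end - start);
        }
        // Unquoted value: assumed to end at whitespace or '>'.
        int start = pos;
        while (pos < source.length
                && !Character.isWhitespace(source[pos])
                && source[pos] != '>') {
            pos++;
        }
        return pos > start ? new String(source, start, pos - start) : null;
    }
}
```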

parseIdentifier

protected java.lang.String parseIdentifier(boolean namespacePrefix)
Parses an identifier

Parameters:
namespacePrefix - indicates that there may be a namespace prefix
Returns:
identifier or null when no identifier could be found
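
To make the effect of the namespacePrefix flag concrete, the sketch below reads an identifier and, when the flag is set, accepts a single ':' between a prefix and a local name. IdentifierSketch is a hypothetical class; the identifier character set and the treatment of ':' are assumptions rather than documented behaviour.

```java
// Hypothetical sketch; not the actual HtmlParser implementation.
final class IdentifierSketch {
    private final char[] source;
    private int pos; // current processing position (index)

    IdentifierSketch(String text) {
        this.source = text.toCharArray();
    }

    // Assumed identifier set: letters, digits, underscore and hyphen.
    private static boolean isIdentifierChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-';
    }

    // Returns the identifier at the current position, or null when no
    // identifier could be found.
    String parseIdentifier(boolean namespacePrefix) {
        int start = pos;
        boolean colonSeen = false;
        while (pos < source.length) {
            char c = source[pos];
            if (isIdentifierChar(c)) {
                pos++;
            } else if (c == ':' && namespacePrefix && !colonSeen && pos > start
                    && pos + 1 < source.length && isIdentifierChar(source[pos + 1])) {
                // Accept one ':' that separates a prefix from a local name.
                colonSeen = true;
                pos++;
            } else {
                break;
            }
        }
        return pos > start ? new String(source, start, pos - start) : null;
    }
}
```

With this sketch, the input html:form yields html:form when namespacePrefix is true and html when it is false.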

processWhitespace

protected void processWhitespace()


Copyright © 2000-2005 SCC Informationssysteme GmbH. All Rights Reserved.