websphinx
Class HTMLParser

java.lang.Object
  |
  +--websphinx.HTMLParser

public class HTMLParser
extends java.lang.Object

HTML parser. Parses an input stream or String and converts it to a sequence of Tags and a tree of Elements. HTMLParser is used by Page to parse pages.
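
As an illustration of what "a sequence of Tags" refers to, the following minimal sketch lists the raw tags of a string in document order. It uses only java.util.regex and is not how HTMLParser is implemented; the class TagSequenceSketch and its tags method are hypothetical names introduced for this example, and the pattern is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of websphinx: collects the raw start and
// end tags of a string in the order they appear.
class TagSequenceSketch {
    // Matches a start or end tag such as <p>, <a href="x">, or </b>.
    // Naive on purpose: comments, scripts, and '>' inside attribute
    // values are not handled.
    private static final Pattern TAG = Pattern.compile("</?[A-Za-z][^>]*>");

    static List<String> tags(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the four tags of the sample in document order.
        System.out.println(tags("<p>Hello <b>world</b></p>"));
    }
}
```

A real parser additionally builds the tree of Elements by pairing each start tag with its matching end tag; the sketch stops at the flat sequence.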


Constructor Summary
HTMLParser()
          Make an HTMLParser.
HTMLParser(websphinx.DownloadParameters dp)
          Make an HTMLParser which retrieves pages using the specified download parameters.
 
Method Summary
 void dontParse(websphinx.Page page, java.io.InputStream stream)
          Download an input stream without parsing it.
 void dontParse(websphinx.Page page, java.io.Reader stream)
          Download an input stream without parsing it.
static void main(java.lang.String[] args)
           
 void parse(websphinx.Page page, java.io.InputStream stream)
          Parse an input stream.
 void parse(websphinx.Page page, java.io.Reader stream)
          Parse an input stream.
 void parse(websphinx.Page page, java.lang.String content)
          Parse a string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HTMLParser

public HTMLParser()
Make an HTMLParser.


HTMLParser

public HTMLParser(websphinx.DownloadParameters dp)
Make an HTMLParser which retrieves pages using the specified download parameters. Pages larger than dp.getMaxPageSize() are rejected by parse() with an IOException.

Parameters:
dp - download parameters used during parsing
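
The size limit described above can be pictured with the following sketch, which reads from a Reader and fails with an IOException once a cap is exceeded. It uses only java.io. The class SizeCapSketch, the method readCapped, and the choice to count characters are assumptions made for this example; this page does not state the unit of getMaxPageSize() or the point at which HTMLParser performs the check.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of websphinx: reads a stream fully but
// gives up as soon as the content grows past a fixed cap.
class SizeCapSketch {
    // maxChars is an assumed unit (characters) chosen for this sketch.
    static String readCapped(Reader in, int maxChars) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
            if (sb.length() > maxChars) {
                throw new IOException("page larger than " + maxChars + " characters");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readCapped(new StringReader("<p>small page</p>"), 100));
        try {
            readCapped(new StringReader("<p>too large for the cap</p>"), 5);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Callers of parse() should therefore be prepared to catch IOException for oversized pages as well as for ordinary read failures.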

Method Detail

parse

public void parse(websphinx.Page page,
                  java.io.InputStream stream)
           throws java.io.IOException
Parse an input stream.

Parameters:
page - Page to receive parsed HTML
stream - input stream containing the HTML to parse
Throws:
java.io.IOException

parse

public void parse(websphinx.Page page,
                  java.io.Reader stream)
           throws java.io.IOException
Parse an input stream.

Parameters:
page - Page to receive parsed HTML
stream - reader supplying the HTML to parse
Throws:
java.io.IOException

parse

public void parse(websphinx.Page page,
                  java.lang.String content)
           throws java.io.IOException
Parse a string.

Parameters:
page - Page to receive parsed HTML
content - String containing HTML
Throws:
java.io.IOException
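
The three parse overloads accept the same HTML in three forms: bytes (InputStream), characters (Reader), and a String. The sketch below, using only java.io, shows one String presented in each of the other two forms and read back unchanged. The class InputFormsSketch and its methods are hypothetical, and UTF-8 is an assumption: this page does not say which charset the InputStream overload applies, nor whether parse(Page, String) delegates to the Reader overload.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of websphinx: adapts a String to the
// Reader and InputStream forms that the parse overloads accept.
class InputFormsSketch {
    static Reader asReader(String content) {
        return new StringReader(content);
    }

    // UTF-8 is an assumed encoding for this sketch.
    static InputStream asStream(String content) {
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }

    // Reads every character from the reader into a String.
    static String drain(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<p>caf\u00e9</p>";
        String viaReader = drain(asReader(html));
        String viaStream = drain(new InputStreamReader(asStream(html), StandardCharsets.UTF_8));
        System.out.println(viaReader.equals(html) && viaStream.equals(html));
    }
}
```

When the byte form is used, the decoding charset has to match the one the bytes were written with; the Reader and String forms avoid that question because they already carry characters.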

dontParse

public void dontParse(websphinx.Page page,
                      java.io.InputStream stream)
               throws java.io.IOException
Download an input stream without parsing it.

Parameters:
page - Page to receive the downloaded content
stream - input stream to download
Throws:
java.io.IOException

dontParse

public void dontParse(websphinx.Page page,
                      java.io.Reader stream)
               throws java.io.IOException
Download an input stream without parsing it.

Parameters:
page - Page to receive the downloaded content
stream - reader to download from
Throws:
java.io.IOException

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception