A B C D E F G H I K L M N O P Q R S T U V W X

A

A - Static variable in class websphinx.Tag
Commonly useful tag names.
ABBREV - Static variable in class websphinx.Tag
 
ACRONYM - Static variable in class websphinx.Tag
 
addCrawlListener(CrawlListener) - Method in class websphinx.Crawler
Adds a listener to the set of CrawlListeners for this crawler.
addLinkListener(LinkListener) - Method in class websphinx.Crawler
Adds a listener to the set of LinkListeners for this crawler.
ADDRESS - Static variable in class websphinx.Tag
 
allMatches(Region) - Method in class websphinx.Pattern
 
allMatches(String) - Method in class websphinx.Pattern
 
ALREADY_VISITED - Static variable in class websphinx.LinkEvent
Link has already been visited during the crawl, so it was skipped.
APPLET - Static variable in class websphinx.Tag
 
AREA - Static variable in class websphinx.Tag
 

B

B - Static variable in class websphinx.Tag
 
BASE - Static variable in class websphinx.Tag
 
BASEFONT - Static variable in class websphinx.Tag
 
BDO - Static variable in class websphinx.Tag
 
BGSOUND - Static variable in class websphinx.Tag
 
BIG - Static variable in class websphinx.Tag
 
BLINK - Static variable in class websphinx.Tag
 
BLOCKQUOTE - Static variable in class websphinx.Tag
 
BODY - Static variable in class websphinx.Tag
 
BR - Static variable in class websphinx.Tag
 

C

CAPTION - Static variable in class websphinx.Tag
 
CENTER - Static variable in class websphinx.Tag
 
changeAcceptedMIMETypes(String) - Method in class websphinx.DownloadParameters
Change accepted MIME types.
changeCrawlTimeout(int) - Method in class websphinx.DownloadParameters
Change timeout value.
changeDownloadTimeout(int) - Method in class websphinx.DownloadParameters
Change download timeout value.
changeInteractive(boolean) - Method in class websphinx.DownloadParameters
Change interactive flag.
changeMaxPageSize(int) - Method in class websphinx.DownloadParameters
Change maximum page size.
changeMaxThreads(int) - Method in class websphinx.DownloadParameters
Set maximum threads.
changeObeyRobotExclusion(boolean) - Method in class websphinx.DownloadParameters
Change obey-robot-exclusion flag.
changeUseCaches(boolean) - Method in class websphinx.DownloadParameters
Change use-caches flag.
changeUserAgent(String) - Method in class websphinx.DownloadParameters
Change User-agent field used in HTTP requests.
child - Variable in class websphinx.Element
 
CITE - Static variable in class websphinx.Tag
 
clear() - Method in class websphinx.RobotExclusion
Clear the cache of robots.txt entries.
CLEARED - Static variable in class websphinx.CrawlEvent
Crawler's state was cleared.
cleared(CrawlEvent) - Method in interface websphinx.CrawlListener
Notify that the crawler's state was cleared.
cleared(CrawlEvent) - Method in class websphinx.EventLog
Notify that the crawler's state was cleared.
clearVisited() - Method in class websphinx.Crawler
Clear the set of visited links.
clone() - Method in class websphinx.DownloadParameters
Clone a DownloadParameters object.
CODE - Static variable in class websphinx.Tag
 
COL - Static variable in class websphinx.Tag
 
COLGROUP - Static variable in class websphinx.Tag
 
COMMENT - Static variable in class websphinx.Tag
 
countHTMLAttributes() - Method in class websphinx.Tag
Get number of HTML attributes on this tag.
crawled(LinkEvent) - Method in class websphinx.EventLog
Notify that a link event occurred.
crawled(LinkEvent) - Method in interface websphinx.LinkListener
Notify that an event occurred on a link.
Crawler - class websphinx.Crawler.
Web crawler.
Crawler() - Constructor for class websphinx.Crawler
Make a new Crawler.
CrawlEvent - class websphinx.CrawlEvent.
Crawling event.
CrawlEvent(Crawler, int) - Constructor for class websphinx.CrawlEvent
Make a CrawlEvent.
CrawlListener - interface websphinx.CrawlListener.
Crawl event listener.

D

DD - Static variable in class websphinx.Tag
 
DEL - Static variable in class websphinx.Tag
 
deleteAllTempFiles() - Method in class websphinx.SecurityPolicy
 
DFN - Static variable in class websphinx.Tag
 
DIR - Static variable in class websphinx.Tag
 
disallowed(URL) - Method in class websphinx.RobotExclusion
Check whether a URL is disallowed by robots.txt.
discardContent() - Method in class websphinx.Link
Eliminate all references to page content.
discardContent() - Method in class websphinx.Page
Unlock the page's content (allowing it to be garbage-collected, to save space during a Web crawl).
disconnect() - Method in class websphinx.Link
Disconnect this link from its downloaded page (throwing away the page).
DIV - Static variable in class websphinx.Tag
 
DL - Static variable in class websphinx.Tag
 
dontParse(Page, InputStream) - Method in class websphinx.HTMLParser
Download an input stream without parsing it.
dontParse(Page, Reader) - Method in class websphinx.HTMLParser
Download an input stream without parsing it.
download(HTMLParser) - Method in class websphinx.Page
 
DOWNLOADED - Static variable in class websphinx.LinkEvent
Link has been retrieved
DownloadParameters - class websphinx.DownloadParameters.
Download parameters.
DownloadParameters() - Constructor for class websphinx.DownloadParameters
Make a DownloadParameters object with default settings.
DT - Static variable in class websphinx.Tag
 

E

Element - class websphinx.Element.
Element in an HTML page.
Element(Tag, int) - Constructor for class websphinx.Element
Make an Element from a start tag and an end position.
Element(Tag, Tag) - Constructor for class websphinx.Element
Make an Element from a start tag and end tag.
EM - Static variable in class websphinx.Tag
 
EMBED - Static variable in class websphinx.Tag
 
end - Variable in class websphinx.Region
 
endTag - Variable in class websphinx.Element
 
enumerateHTMLAttributes() - Method in class websphinx.Element
Enumerate the HTML attributes found on this tag.
enumerateHTMLAttributes() - Method in class websphinx.Tag
Enumerate the HTML attributes found on this tag.
enumerateObjectLabels() - Method in class websphinx.Region
Enumerate the labels of the region.
equals(Object) - Method in class websphinx.Regexp
 
equals(Object) - Method in class websphinx.Tagexp
 
equals(Object) - Method in class websphinx.Wildcard
 
ERROR - Static variable in class websphinx.LinkEvent
An error occurred in retrieving the page.
escape(String) - Static method in class websphinx.Regexp
 
escape(String) - Static method in class websphinx.Wildcard
 
EventLog - class websphinx.EventLog.
Crawling monitor that writes messages to standard output or a file.
EventLog() - Constructor for class websphinx.EventLog
Make an EventLog that writes to standard output.
EventLog(OutputStream) - Constructor for class websphinx.EventLog
Make an EventLog that writes to a stream.
EventLog(String) - Constructor for class websphinx.EventLog
Make an EventLog that writes to a file.
eventName - Static variable in class websphinx.LinkEvent
Map from id code (RETRIEVING) to name ("retrieving")
expand(Page) - Method in class websphinx.Crawler
Expand the crawl from a page.

F

FileToURL(File) - Static method in class websphinx.Link
Convert a local filename to a URL.
findEnd(Region[], int) - Static method in class websphinx.Region
Finds a region that ends at or after a given position.
findNext() - Method in class websphinx.PatternMatcher
 
findStart(Region[], int) - Static method in class websphinx.Region
Finds a region that starts at or after a given position.
FONT - Static variable in class websphinx.Tag
 
Form - class websphinx.Form.
<FORM> element in an HTML page.
FORM - Static variable in class websphinx.Tag
 
Form(Tag, Tag, URL) - Constructor for class websphinx.Form
Make a Form from a start tag and end tag and a base URL (for relative references).
FormButton - class websphinx.FormButton.
Button element in an HTML form -- for example, <INPUT TYPE=submit> or <INPUT TYPE=image>.
FormButton(Tag, Tag, Form) - Constructor for class websphinx.FormButton
Make a FormButton from a start tag and end tag and its containing form.
found(Region) - Method in class websphinx.Pattern
 
found(String) - Method in class websphinx.Pattern
 
FRAME - Static variable in class websphinx.Tag
 
FRAMESET - Static variable in class websphinx.Tag
 

G

GET - Static variable in class websphinx.Link
Use the HTTP GET method to download this link.
getAcceptedMIMETypes() - Method in class websphinx.DownloadParameters
Get accepted MIME types.
getActiveThreads() - Method in class websphinx.Crawler
Get number of threads currently working.
getBase() - Method in class websphinx.Page
Get the base URL, relative to which the page's links were interpreted.
getChild() - Method in class websphinx.Element
Get element's first child.
getContent() - Method in class websphinx.Page
Get the content of the page.
getContentEncoding() - Method in class websphinx.Page
Get content encoding of page.
getContentType() - Method in class websphinx.Page
Get MIME type of page.
getCrawler() - Method in class websphinx.CrawlEvent
Get crawler that generated the event
getCrawler() - Method in class websphinx.LinkEvent
Get crawler that generated the event
getCrawlTimeout() - Method in class websphinx.DownloadParameters
Get timeout on entire crawl.
getDepth() - Method in class websphinx.Link
Get depth of link in crawl.
getDepth() - Method in class websphinx.Page
Get depth of page in crawl.
getDirectory() - Method in class websphinx.Link
Get the directory part of the link, like "/home/dir/".
getDirectoryURL() - Method in class websphinx.Link
Get the URL of a page's directory.
getDirectoryURL(URL) - Static method in class websphinx.Link
Get the URL of a page's directory.
getDownloadParameters() - Method in class websphinx.Crawler
Get download parameters (such as number of threads, timeouts, maximum page size, etc.)
getDownloadParameters() - Method in class websphinx.Link
Get the download parameters used for this link.
getDownloadTimeout() - Method in class websphinx.DownloadParameters
Get download timeout value.
getElement() - Method in class websphinx.Tag
Get element to which this tag is the start or end tag.
getElements() - Method in class websphinx.Page
Get the HTML elements in the page.
getEnd() - Method in class websphinx.Region
Gets offset after end of region.
getEndTag() - Method in class websphinx.Element
Get end tag.
getException() - Method in class websphinx.LinkEvent
Get exception related to this event.
getExpiration() - Method in class websphinx.Page
Get expiration date of page.
getField(String) - Method in class websphinx.Region
Get a named subregion.
getFieldNames() - Method in class websphinx.Pattern
 
getFieldNames() - Method in class websphinx.Regexp
 
getFields(String) - Method in class websphinx.Region
Get a set of named subregions.
getFile() - Method in class websphinx.Link
Get the information part of the link, like "/home/dir/index.html?query".
getFilename() - Method in class websphinx.Link
Get the filename part of the link, like "index.html".
getForm() - Method in class websphinx.FormButton
Get the form.
getHost() - Method in class websphinx.Link
Get the hostname of the link, like "www.cs.cmu.edu".
getHTMLAttribute(String) - Method in class websphinx.Element
Get an HTML attribute's value.
getHTMLAttribute(String) - Method in class websphinx.Tag
Get an HTML attribute's value.
getHTMLAttribute(String, String) - Method in class websphinx.Element
Get an HTML attribute's value, with a default value if it doesn't exist.
getHTMLAttribute(String, String) - Method in class websphinx.Tag
Get an HTML attribute's value, with a default value if it doesn't exist.
getHTMLAttributes() - Method in class websphinx.Tag
Get all the HTML attributes found on this tag.
getID() - Method in class websphinx.CrawlEvent
Get event id.
getID() - Method in class websphinx.LinkEvent
Get event id
getIgnoreVisitedLinks() - Method in class websphinx.Crawler
Get ignore-visited-links flag.
getInteractive() - Method in class websphinx.DownloadParameters
Get interactive flag.
getLabel(String) - Method in class websphinx.Region
Get a label's value.
getLabel(String, String) - Method in class websphinx.Region
Get a label's value.
getLastModified() - Method in class websphinx.Page
Get last-modified date of page.
getLength() - Method in class websphinx.Region
Gets length of the region.
getLink() - Method in class websphinx.LinkEvent
Get link to which this event occurred.
getLinks() - Method in class websphinx.Page
Get the links found in the page.
getMaxDepth() - Method in class websphinx.Crawler
Get maximum depth.
getMaxPageSize() - Method in class websphinx.DownloadParameters
Get maximum page size.
getMaxThreads() - Method in class websphinx.DownloadParameters
Get maximum threads.
getMethod() - Method in class websphinx.FormButton
Get the method used to access this link.
getMethod() - Method in class websphinx.Form
Get the method used to access this link.
getMethod() - Method in class websphinx.Link
Get the method used to access this link.
getName() - Method in class websphinx.Crawler
Get human-readable name of crawler.
getName() - Method in class websphinx.LinkEvent
Get event name (string equivalent to its ID)
getNext() - Method in class websphinx.Element
Return next element in an inorder walk of the tree, assuming this element and its children have been visited.
getNumericLabel(String, Number) - Method in class websphinx.Region
Get a label's value as a number.
getObeyRobotExclusion() - Method in class websphinx.DownloadParameters
Get obey-robot-exclusion flag.
getObjectLabel(String) - Method in class websphinx.Region
Get an object-valued label.
getObjectLabels() - Method in class websphinx.Region
Get a String containing the labels of the region.
getOnlyNetworkEvents() - Method in class websphinx.EventLog
Test whether logger prints only network-related LinkEvents.
getOrigin() - Method in class websphinx.Page
Get the Link that points to this page.
getPage() - Method in class websphinx.Link
Get the downloaded page to which the link points.
getPagesLeft() - Method in class websphinx.Crawler
Get number of pages left to be visited.
getPagesVisited() - Method in class websphinx.Crawler
Get number of pages visited.
getPageURL() - Method in class websphinx.Link
Get the URL of a page, omitting any anchor reference (like #ref).
getPageURL(URL) - Static method in class websphinx.Link
Get the URL of a page, omitting any anchor reference (like #ref).
getParent() - Method in class websphinx.Element
Get element's parent.
getParentURL() - Method in class websphinx.Link
Get the URL of a page's parent directory.
getParentURL(URL) - Static method in class websphinx.Link
Get the URL of a page's parent directory.
getPolicy() - Static method in class websphinx.SecurityPolicy
 
getPort() - Method in class websphinx.Link
Get the port number of the link.
getPriority() - Method in class websphinx.Link
Get the priority of the link in the crawl.
getProtocol() - Method in class websphinx.Link
Get the network protocol of the link, like "ftp" or "http".
getQuery() - Method in class websphinx.Link
Get the query part of the link, like "?query".
getRef() - Method in class websphinx.Link
Get the anchor reference of the link, like "#ref".
getResponseCode() - Method in class websphinx.Page
Get response code returned by the Web server.
getResponseMessage() - Method in class websphinx.Page
Get response message returned by the Web server.
getRoot() - Method in class websphinx.Crawler
 
getRootElement() - Method in class websphinx.Page
Get the root HTML element of the page.
getRootElement() - Method in class websphinx.Region
Get the root HTML element of the region.
getServiceURL() - Method in class websphinx.Link
Get the URL of a Web service, omitting any query or anchor reference.
getServiceURL(URL) - Static method in class websphinx.Link
Get the URL of a Web service, omitting any query or anchor reference.
getSibling() - Method in class websphinx.Element
Get element's next sibling.
getSource() - Method in class websphinx.Region
Gets page containing the region.
getStart() - Method in class websphinx.Region
Gets starting offset of region in page content.
getStartTag() - Method in class websphinx.Element
Get start tag.
getStatus() - Method in class websphinx.Link
Get the status of the link.
getTagName() - Method in class websphinx.Element
Get tag name.
getTagName() - Method in class websphinx.Tag
Get tag name.
getTags() - Method in class websphinx.Page
Get the tag sequence of the page.
getTemporaryDirectory() - Method in class websphinx.SecurityPolicy
 
getTitle() - Method in class websphinx.Page
Get the title of the page.
getTokens() - Method in class websphinx.Page
Get the token sequence of the page.
getURL() - Method in class websphinx.FormButton
Get the URL.
getURL() - Method in class websphinx.Link
Get the URL.
getURL() - Method in class websphinx.Page
Get the URL.
getUseCaches() - Method in class websphinx.DownloadParameters
Get use-caches flag.
getUserAgent() - Method in class websphinx.DownloadParameters
Get User-agent header used in HTTP requests.
getWords() - Method in class websphinx.Page
Get the words in the page.
groups - Static variable in class websphinx.Pattern
 

H

H1 - Static variable in class websphinx.Tag
 
H2 - Static variable in class websphinx.Tag
 
H3 - Static variable in class websphinx.Tag
 
H4 - Static variable in class websphinx.Tag
 
H5 - Static variable in class websphinx.Tag
 
H6 - Static variable in class websphinx.Tag
 
hasAllLabels(String) - Method in class websphinx.Region
Test if all of several labels are set.
hasAllLabels(String[]) - Method in class websphinx.Region
Test if all of several labels are set.
hasAnyLabels(String) - Method in class websphinx.Region
Test if one or more of several labels are set.
hasAnyLabels(String[]) - Method in class websphinx.Region
Test if one or more of several labels are set.
hasContent() - Method in class websphinx.Page
Test if page content is available.
hasHTMLAttribute(String) - Method in class websphinx.Element
Test if tag has an HTML attribute.
hasHTMLAttribute(String) - Method in class websphinx.Tag
Test if tag has an HTML attribute.
hasLabel(String) - Method in class websphinx.Region
Test if a label is set.
hasMoreElements() - Method in class websphinx.PatternMatcher
 
HEAD - Static variable in class websphinx.Tag
 
HR - Static variable in class websphinx.Tag
 
HTML - Static variable in class websphinx.Tag
 
HTMLParser - class websphinx.HTMLParser.
HTML parser.
HTMLParser() - Constructor for class websphinx.HTMLParser
Make an HTMLParser.
HTMLParser(DownloadParameters) - Constructor for class websphinx.HTMLParser
Make an HTMLParser which retrieves pages using the specified download parameters.

I

I - Static variable in class websphinx.Tag
 
IMG - Static variable in class websphinx.Tag
 
INPUT - Static variable in class websphinx.Tag
 
isBlockTag() - Method in class websphinx.Tag
Test if tag is a block-level tag.
isBodyTag() - Method in class websphinx.Tag
Test if tag belongs in the <BODY> element.
isEndTag() - Method in class websphinx.Tag
Test if tag is an end tag.
isFlowTag() - Method in class websphinx.Tag
Test if tag is a flow-level tag.
isHeadTag() - Method in class websphinx.Tag
Test if tag belongs in the <HEAD> element.
isHTML() - Method in class websphinx.Page
Test whether page is HTML.
isImage() - Method in class websphinx.Page
 
ISINDEX - Static variable in class websphinx.Tag
 
isParsed() - Method in class websphinx.Page
Test whether page has been parsed.
isStartTag() - Method in class websphinx.Tag
Test if tag is a start tag.

K

KBD - Static variable in class websphinx.Tag
 
keepContent() - Method in class websphinx.Page
Lock the page's content (to prevent it from being discarded).

L

LI - Static variable in class websphinx.Tag
 
Link - class websphinx.Link.
Link to a Web page.
LINK - Static variable in class websphinx.Tag
 
Link(File) - Constructor for class websphinx.Link
Make a Link from a File.
Link(String) - Constructor for class websphinx.Link
Make a Link from a string URL.
Link(Tag, Tag, URL) - Constructor for class websphinx.Link
Make a Link from a start tag and end tag and a base URL (for relative references).
Link(URL) - Constructor for class websphinx.Link
Make a Link from a URL.
LinkEvent - class websphinx.LinkEvent.
Link event.
LinkEvent(Crawler, int, Link) - Constructor for class websphinx.LinkEvent
Make a LinkEvent.
LinkEvent(Crawler, int, Link, Throwable) - Constructor for class websphinx.LinkEvent
Make a LinkEvent for an error.
LinkListener - interface websphinx.LinkListener.
Link event listener.
LISTING - Static variable in class websphinx.Tag
 

M

main(String[]) - Static method in class websphinx.HTMLParser
 
main(String[]) - Static method in class websphinx.Page
 
main(String[]) - Static method in class websphinx.Regexp
 
main(String[]) - Static method in class websphinx.RobotExclusion
 
main(String[]) - Static method in class websphinx.Tagexp
 
main(String[]) - Static method in class websphinx.Wildcard
 
makeDir(File) - Method in class websphinx.SecurityPolicy
 
makeQuery() - Method in class websphinx.Form
Construct the query that would be submitted if the form's SUBMIT button were pressed.
makeQuery(FormButton) - Method in class websphinx.Form
Construct the query that would be submitted if the specified button were pressed.
makeTemporaryFile(String, String) - Method in class websphinx.SecurityPolicy
 
MAP - Static variable in class websphinx.Tag
 
markVisited(Link) - Method in class websphinx.Crawler
Register a link as visited by recording the CRC32 value of its URL.
MARQUEE - Static variable in class websphinx.Tag
 
match(Region) - Method in class websphinx.Pattern
 
match(Region) - Method in class websphinx.Regexp
 
match(Region) - Method in class websphinx.Tagexp
 
MAX_LENGTH - Static variable in class websphinx.Tag
Length of longest tag name.
MENU - Static variable in class websphinx.Tag
 
META - Static variable in class websphinx.Tag
 
monitor(Crawler) - Static method in class websphinx.EventLog
Create an EventLog that prints to standard error and attach it to a crawler.

N

names - Variable in class websphinx.Region
 
nextElement() - Method in class websphinx.PatternMatcher
 
NEXTID - Static variable in class websphinx.Tag
 
nextMatch() - Method in class websphinx.PatternMatcher
 
NOBR - Static variable in class websphinx.Tag
 
NOEMBED - Static variable in class websphinx.Tag
 
NOFRAMES - Static variable in class websphinx.Tag
 
NONE - Static variable in class websphinx.LinkEvent
No event has occurred on this link yet.

O

OBJECT - Static variable in class websphinx.Tag
 
OL - Static variable in class websphinx.Tag
 
oneMatch(Region) - Method in class websphinx.Pattern
 
oneMatch(String) - Method in class websphinx.Pattern
 
openConnection(Link) - Method in class websphinx.SecurityPolicy
 
openConnection(URL) - Method in class websphinx.SecurityPolicy
 
OPTION - Static variable in class websphinx.Tag
 

P

P - Static variable in class websphinx.Tag
 
Page - class websphinx.Page.
A Web page.
Page(Link) - Constructor for class websphinx.Page
Make a Page by downloading and parsing a Link.
Page(Link, HTMLParser) - Constructor for class websphinx.Page
Make a Page by downloading a Link.
Page(String) - Constructor for class websphinx.Page
Make a Page from a string of content.
Page(URL, String) - Constructor for class websphinx.Page
Make a Page from a URL and a string of HTML.
Page(URL, String, HTMLParser) - Constructor for class websphinx.Page
Make a Page from a URL and a string of HTML.
PARAM - Static variable in class websphinx.Tag
 
parent - Variable in class websphinx.Element
 
parse(HTMLParser) - Method in class websphinx.Page
Parse the page.
parse(Page, InputStream) - Method in class websphinx.HTMLParser
Parse an input stream.
parse(Page, Reader) - Method in class websphinx.HTMLParser
Parse an input stream.
parse(Page, String) - Method in class websphinx.HTMLParser
Parse a string.
Pattern - class websphinx.Pattern.
Base class for pattern matchers.
Pattern() - Constructor for class websphinx.Pattern
 
PatternMatcher - class websphinx.PatternMatcher.
 
PatternMatcher() - Constructor for class websphinx.PatternMatcher
 
PAUSED - Static variable in class websphinx.CrawlEvent
Crawler was paused.
paused(CrawlEvent) - Method in interface websphinx.CrawlListener
Notify that the crawler was paused.
paused(CrawlEvent) - Method in class websphinx.EventLog
Notify that the crawler paused.
PLAINTEXT - Static variable in class websphinx.Tag
 
POST - Static variable in class websphinx.Link
Use the HTTP POST method to access this link.
PRE - Static variable in class websphinx.Tag
 
printStatus(PrintStream) - Method in class websphinx.Crawler
Print current status

Q

QUEUED - Static variable in class websphinx.LinkEvent
Link was accepted by walk() and is waiting to be downloaded

R

readFile(File) - Method in class websphinx.SecurityPolicy
 
readWriteFile(File) - Method in class websphinx.SecurityPolicy
 
Regexp - class websphinx.Regexp.
 
Regexp(String) - Constructor for class websphinx.Regexp
 
Region - class websphinx.Region.
Region of an HTML page.
Region(Page, int, int) - Constructor for class websphinx.Region
Makes a Region.
Region(Region) - Constructor for class websphinx.Region
Makes a Region by copying another region's parameters.
relativeTo(URL, String) - Static method in class websphinx.Link
 
relativeTo(URL, URL) - Static method in class websphinx.Link
 
removeHTMLAttribute(String) - Method in class websphinx.Tag
Copy this tag, removing an HTML attribute.
removeLabel(String) - Method in class websphinx.Region
Remove a label.
replaceHref(String) - Method in class websphinx.Link
Copy the link's start tag, replacing the URL.
replaceHTMLAttribute(String) - Method in class websphinx.Tag
Copy this tag, setting an HTML attribute's value to TRUE.
replaceHTMLAttribute(String, String) - Method in class websphinx.Tag
Copy this tag, setting an HTML attribute's value.
RETRIEVING - Static variable in class websphinx.LinkEvent
Link is being retrieved
RobotExclusion - class websphinx.RobotExclusion.
 
RobotExclusion(String) - Constructor for class websphinx.RobotExclusion
Make a RobotExclusion object.
run() - Method in class websphinx.Crawler
Start crawling.
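
The RobotExclusion entries above (clear(), disallowed(URL)) describe checking a URL against cached robots.txt rules. The following is only a rough sketch of that idea, not the websphinx implementation: the class RobotRulesSketch and its methods are hypothetical, it handles only records for "User-agent: *", and it treats each Disallow value as a plain path prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; not part of the websphinx package.
class RobotRulesSketch {
    private final List<String> disallowedPrefixes = new ArrayList<>();

    // Collect Disallow prefixes from records whose User-agent line is "*".
    RobotRulesSketch(String robotsTxt) {
        boolean applies = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            // Drop trailing comments, then surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                applies = value.equals("*");
            } else if (field.equals("disallow") && applies && !value.isEmpty()) {
                disallowedPrefixes.add(value);
            }
        }
    }

    // A path is disallowed if it starts with any collected prefix.
    boolean disallowed(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

A real checker would also match the crawler's own User-agent name and handle records that list several agents; this sketch deliberately leaves those cases out.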

S

SAMP - Static variable in class websphinx.Tag
 
SCRIPT - Static variable in class websphinx.Tag
 
SecurityPolicy - class websphinx.SecurityPolicy.
 
SecurityPolicy() - Constructor for class websphinx.SecurityPolicy
 
SELECT - Static variable in class websphinx.Tag
 
sendLinkEvent(Link, int) - Method in class websphinx.Crawler
Send a LinkEvent to all LinkListeners registered with this crawler.
sendLinkEvent(Link, int, Throwable) - Method in class websphinx.Crawler
Send an exceptional LinkEvent to all LinkListeners registered with this crawler.
setContentEncoding(String) - Method in class websphinx.Page
Set content encoding of page.
setContentType(String) - Method in class websphinx.Page
Set MIME type of page.
setDepth(int) - Method in class websphinx.Link
 
setDownloadParameters(DownloadParameters) - Method in class websphinx.Crawler
Set download parameters (such as number of threads, timeouts, maximum page size, etc.)
setDownloadParameters(DownloadParameters) - Method in class websphinx.Link
Set the download parameters used for this link.
setExpiration(long) - Method in class websphinx.Page
Set expiration date of page.
setField(String, Region) - Method in class websphinx.Region
Name a subregion (by setting a label to point to it).
setFields(String, Region[]) - Method in class websphinx.Region
Name a set of subregions (by pointing a label to them).
setHostRoot(String) - Method in class websphinx.Crawler
Set the host name of the root so that the crawler visits only Web sites in the root's family.
setIgnoreVisitedLinks(boolean) - Method in class websphinx.Crawler
Set ignore-visited-links flag.
setLabel(String) - Method in class websphinx.Region
Set a label on the region.
setLabel(String, String) - Method in class websphinx.Region
Set a string-valued label.
setLastModified(long) - Method in class websphinx.Page
Set last-modified date of page.
setMaxDepth(int) - Method in class websphinx.Crawler
Set maximum depth.
setName(String) - Method in class websphinx.Crawler
Set human-readable name of crawler.
setObjectLabel(String, Object) - Method in class websphinx.Region
Set an object-valued label.
setOnlyNetworkEvents(boolean) - Method in class websphinx.EventLog
Set whether logger prints only network-related LinkEvents.
setPage(Page) - Method in class websphinx.Link
Set the page corresponding to this link.
setPriority(float) - Method in class websphinx.Link
Set the priority of the link in the crawl.
setRoot(Link) - Method in class websphinx.Crawler
Set starting point of crawl as a single Link.
setStatus(int) - Method in class websphinx.Link
Set the status of the link.
setText(String) - Method in class websphinx.Link
Set the tagless-text representation of this region.
shouldVisit(Link) - Method in class websphinx.Crawler
Callback for testing whether a link should be traversed.
sibling - Variable in class websphinx.Element
 
SKIPPED - Static variable in class websphinx.LinkEvent
Link was rejected by shouldVisit()
SMALL - Static variable in class websphinx.Tag
 
source - Variable in class websphinx.Region
 
SPACER - Static variable in class websphinx.Tag
 
span(Region) - Method in class websphinx.Region
Makes a new Region containing two regions.
start - Variable in class websphinx.Region
 
STARTED - Static variable in class websphinx.CrawlEvent
Crawler started.
started(CrawlEvent) - Method in interface websphinx.CrawlListener
Notify that the crawler started.
started(CrawlEvent) - Method in class websphinx.EventLog
Notify that the crawler started.
startTag - Variable in class websphinx.Element
 
stop() - Method in class websphinx.Crawler
Stop crawling.
STOPPED - Static variable in class websphinx.CrawlEvent
Crawler ran out of links to crawl
stopped(CrawlEvent) - Method in interface websphinx.CrawlListener
Notify that the crawler ran out of links to crawl
stopped(CrawlEvent) - Method in class websphinx.EventLog
Notify that the crawler has stopped.
STRIKE - Static variable in class websphinx.Tag
 
STRONG - Static variable in class websphinx.Tag
 
STYLE - Static variable in class websphinx.Tag
 
SUB - Static variable in class websphinx.Tag
 
submit(Link) - Method in class websphinx.Crawler
Puts a link into the crawling queue.
substringCanonicalTags(int, int) - Method in class websphinx.Page
Get canonicalized HTML tags found in a region.
substringContent(int, int) - Method in class websphinx.Page
Get raw content found in a region.
substringHTML(int, int) - Method in class websphinx.Page
Get HTML found in a region.
substringTags(int, int) - Method in class websphinx.Page
Get HTML tags found in a region.
substringText(int, int) - Method in class websphinx.Page
Get tagless text found in a region.
SUP - Static variable in class websphinx.Tag
 

T

TABLE - Static variable in class websphinx.Tag
 
Tag - class websphinx.Tag.
Tag in an HTML page.
Tag(Page, int, int, String, boolean) - Constructor for class websphinx.Tag
Make a Tag.
Tagexp - class websphinx.Tagexp.
Tag pattern.
Tagexp(String) - Constructor for class websphinx.Tagexp
 
TD - Static variable in class websphinx.Tag
 
Text - class websphinx.Text.
Tagless text regions on an HTML page.
Text(Page, int, int, String) - Constructor for class websphinx.Text
Make a Text.
TEXTAREA - Static variable in class websphinx.Tag
 
TH - Static variable in class websphinx.Tag
 
TIMED_OUT - Static variable in class websphinx.CrawlEvent
Crawler timeout expired.
timedOut(CrawlEvent) - Method in interface websphinx.CrawlListener
Notify that the crawler timed out.
timedOut(CrawlEvent) - Method in class websphinx.EventLog
Notify that the crawler timed out.
TITLE - Static variable in class websphinx.Tag
 
toDescription() - Method in class websphinx.Link
Generate a human-readable description of the link.
toDescription() - Method in class websphinx.Page
Generate a human-readable description of the page.
toHTML() - Method in class websphinx.Region
Converts the region to HTML.
toHTMLAttributeName(String) - Static method in class websphinx.Tag
Convert a String to an HTML attribute name.
TOO_DEEP - Static variable in class websphinx.LinkEvent
Link was accepted by walk() but exceeds the maximum depth from the start set.
toRegexp(String) - Static method in class websphinx.Tagexp
 
toRegexp(String) - Static method in class websphinx.Wildcard
 
toString() - Method in class websphinx.Crawler
Convert the crawler to a String.
toString() - Method in class websphinx.LinkEvent
Convert this event to a String describing it.
toString() - Method in class websphinx.Page
Get page containing the region.
toString() - Method in class websphinx.Pattern
Return a string representation of the pattern.
toString() - Method in class websphinx.Regexp
 
toString() - Method in class websphinx.Region
Gets region as raw content.
toString() - Method in class websphinx.Tagexp
 
toString() - Method in class websphinx.Wildcard
 
toTagName(String) - Static method in class websphinx.Tag
Convert a String to a tag name.
toTags() - Method in class websphinx.Region
Converts the region to HTML tags with no text.
toText() - Method in class websphinx.Link
Convert the region to tagless text.
toText() - Method in class websphinx.Region
Converts the region to tagless text.
toText() - Method in class websphinx.Text
Returns the region's tagless text
toURL() - Method in class websphinx.Link
Convert the link's URL to a String
toURL() - Method in class websphinx.Page
Convert the link's URL to a String
toURLDelimiters(String) - Static method in class websphinx.Link
 
TR - Static variable in class websphinx.Tag
 
TRUE - Static variable in class websphinx.Region
Default value for labels set with setLabel(name).
TT - Static variable in class websphinx.Tag
 

U

U - Static variable in class websphinx.Tag
 
UL - Static variable in class websphinx.Tag
 
url - Variable in class websphinx.Link
 
urlFromHref(Tag, URL) - Method in class websphinx.FormButton
Construct the URL for this button, from its start tag and a base URL (for relative references).
urlFromHref(Tag, URL) - Method in class websphinx.Form
Construct the URL for this form, from its start tag and a base URL (for relative references).
urlFromHref(Tag, URL) - Method in class websphinx.Link
Construct the URL for a link element, from its start tag and a base URL (for relative references).
URLToFile(URL) - Static method in class websphinx.Link
Convert a file: URL to a filename appropriate to the current system platform.
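
Several entries in this index mention a base URL "for relative references" and name the parts of a URL (protocol, host, port, file, query, ref). As a hedged illustration of those terms, the sketch below uses only the standard java.net classes; the class RelativeUrlSketch and its resolve helper are hypothetical and are not part of websphinx, and the websphinx.Link getters may format their results differently.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical illustration; not part of the websphinx package.
class RelativeUrlSketch {

    // Resolve an href against a base URL, as is done for a relative link.
    static URL resolve(String base, String href) {
        try {
            return URI.create(base).resolve(href).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + href, e);
        }
    }

    public static void main(String[] args) {
        URL url = resolve("http://www.cs.cmu.edu/home/dir/index.html",
                          "../other.html?query#ref");
        System.out.println(url);               // http://www.cs.cmu.edu/home/other.html?query#ref
        System.out.println(url.getProtocol()); // http
        System.out.println(url.getHost());     // www.cs.cmu.edu
        System.out.println(url.getPort());     // -1 (no explicit port)
        System.out.println(url.getFile());     // /home/other.html?query
        System.out.println(url.getQuery());    // query
        System.out.println(url.getRef());      // ref
    }
}
```

Note that java.net.URL returns the query and ref without their leading "?" and "#" characters.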

V

VAR - Static variable in class websphinx.Tag
 
VISITED - Static variable in class websphinx.LinkEvent
Link has been thoroughly processed by crawler
visited(Link) - Method in class websphinx.Crawler
Test whether the page corresponding to a link has been visited (or queued for visiting).
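
The markVisited(Link) and visited(Link) entries suggest that the crawler remembers visited links by a CRC32 value of each URL. The sketch below shows one way such a visited set could work, using java.util.zip.CRC32 from the standard library. The class VisitedSetSketch is hypothetical and takes plain URL strings; it is not the websphinx.Crawler code.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.CRC32;

// Hypothetical illustration; not part of the websphinx package.
class VisitedSetSketch {
    // Stores one 32-bit checksum per URL instead of the full URL string.
    private final Set<Long> checksums = new HashSet<>();

    // Compute the CRC32 checksum of a URL's UTF-8 bytes.
    static long checksum(String url) {
        CRC32 crc = new CRC32();
        crc.update(url.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Record that a URL has been visited.
    void markVisited(String url) {
        checksums.add(checksum(url));
    }

    // Test whether a URL (or another URL with the same checksum) was recorded.
    boolean visited(String url) {
        return checksums.contains(checksum(url));
    }

    // Forget all recorded URLs.
    void clearVisited() {
        checksums.clear();
    }
}
```

Storing checksums keeps memory use small during a large crawl, at the cost of occasional false positives when two different URLs share a checksum.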

W

WBR - Static variable in class websphinx.Tag
 
websphinx - package websphinx
 
Wildcard - class websphinx.Wildcard.
Wildcard pattern.
Wildcard(String) - Constructor for class websphinx.Wildcard
 
writeFile(File, boolean) - Method in class websphinx.SecurityPolicy
 

X

XMP - Static variable in class websphinx.Tag
 
