A
B
C
D
E
F
G
H
I
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
A
A
- Static variable in class websphinx.
Tag
Commonly useful tag names.
ABBREV
- Static variable in class websphinx.
Tag
ACRONYM
- Static variable in class websphinx.
Tag
addCrawlListener(CrawlListener)
- Method in class websphinx.
Crawler
Adds a listener to the set of CrawlListeners for this crawler.
addLinkListener(LinkListener)
- Method in class websphinx.
Crawler
Adds a listener to the set of LinkListeners for this crawler.
ADDRESS
- Static variable in class websphinx.
Tag
allMatches(Region)
- Method in class websphinx.
Pattern
allMatches(String)
- Method in class websphinx.
Pattern
ALREADY_VISITED
- Static variable in class websphinx.
LinkEvent
Link has already been visited during the crawl, so it was skipped.
APPLET
- Static variable in class websphinx.
Tag
AREA
- Static variable in class websphinx.
Tag
B
B
- Static variable in class websphinx.
Tag
BASE
- Static variable in class websphinx.
Tag
BASEFONT
- Static variable in class websphinx.
Tag
BDO
- Static variable in class websphinx.
Tag
BGSOUND
- Static variable in class websphinx.
Tag
BIG
- Static variable in class websphinx.
Tag
BLINK
- Static variable in class websphinx.
Tag
BLOCKQUOTE
- Static variable in class websphinx.
Tag
BODY
- Static variable in class websphinx.
Tag
BR
- Static variable in class websphinx.
Tag
C
CAPTION
- Static variable in class websphinx.
Tag
CENTER
- Static variable in class websphinx.
Tag
changeAcceptedMIMETypes(String)
- Method in class websphinx.
DownloadParameters
Change accepted MIME types.
changeCrawlTimeout(int)
- Method in class websphinx.
DownloadParameters
Change timeout value.
changeDownloadTimeout(int)
- Method in class websphinx.
DownloadParameters
Change download timeout value.
changeInteractive(boolean)
- Method in class websphinx.
DownloadParameters
Change interactive flag.
changeMaxPageSize(int)
- Method in class websphinx.
DownloadParameters
Change maximum page size.
changeMaxThreads(int)
- Method in class websphinx.
DownloadParameters
Set maximum threads.
changeObeyRobotExclusion(boolean)
- Method in class websphinx.
DownloadParameters
Change obey-robot-exclusion flag.
changeUseCaches(boolean)
- Method in class websphinx.
DownloadParameters
Change use-caches flag.
changeUserAgent(String)
- Method in class websphinx.
DownloadParameters
Change User-agent field used in HTTP requests.
child
- Variable in class websphinx.
Element
CITE
- Static variable in class websphinx.
Tag
clear()
- Method in class websphinx.
RobotExclusion
Clear the cache of robots.txt entries.
CLEARED
- Static variable in class websphinx.
CrawlEvent
Crawler's state was cleared.
cleared(CrawlEvent)
- Method in interface websphinx.
CrawlListener
Notify that the crawler's state was cleared.
cleared(CrawlEvent)
- Method in class websphinx.
EventLog
Notify that the crawler's state was cleared.
clearVisited()
- Method in class websphinx.
Crawler
Clear the set of visited links.
clone()
- Method in class websphinx.
DownloadParameters
Clone a DownloadParameters object.
CODE
- Static variable in class websphinx.
Tag
COL
- Static variable in class websphinx.
Tag
COLGROUP
- Static variable in class websphinx.
Tag
COMMENT
- Static variable in class websphinx.
Tag
countHTMLAttributes()
- Method in class websphinx.
Tag
Get number of HTML attributes on this tag.
crawled(LinkEvent)
- Method in class websphinx.
EventLog
Notify that a link event occurred.
crawled(LinkEvent)
- Method in interface websphinx.
LinkListener
Notify that an event occurred on a link.
Crawler
- class websphinx.
Crawler
.
Web crawler.
Crawler()
- Constructor for class websphinx.
Crawler
Make a new Crawler.
CrawlEvent
- class websphinx.
CrawlEvent
.
Crawling event.
CrawlEvent(Crawler, int)
- Constructor for class websphinx.
CrawlEvent
Make a CrawlEvent.
CrawlListener
- interface websphinx.
CrawlListener
.
Crawl event listener.
D
DD
- Static variable in class websphinx.
Tag
DEL
- Static variable in class websphinx.
Tag
deleteAllTempFiles()
- Method in class websphinx.
SecurityPolicy
DFN
- Static variable in class websphinx.
Tag
DIR
- Static variable in class websphinx.
Tag
disallowed(URL)
- Method in class websphinx.
RobotExclusion
Check whether a URL is disallowed by robots.txt.
discardContent()
- Method in class websphinx.
Link
Eliminate all references to page content.
discardContent()
- Method in class websphinx.
Page
Unlock the page's content (allowing it to be garbage-collected, to save space during a Web crawl).
disconnect()
- Method in class websphinx.
Link
Disconnect this link from its downloaded page (throwing away the page).
DIV
- Static variable in class websphinx.
Tag
DL
- Static variable in class websphinx.
Tag
dontParse(Page, InputStream)
- Method in class websphinx.
HTMLParser
Download an input stream without parsing it.
dontParse(Page, Reader)
- Method in class websphinx.
HTMLParser
Download an input stream without parsing it.
download(HTMLParser)
- Method in class websphinx.
Page
DOWNLOADED
- Static variable in class websphinx.
LinkEvent
Link has been retrieved
DownloadParameters
- class websphinx.
DownloadParameters
.
Download parameters.
DownloadParameters()
- Constructor for class websphinx.
DownloadParameters
Make a DownloadParameters object with default settings.
DT
- Static variable in class websphinx.
Tag
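The `disallowed(URL)` entry for `RobotExclusion` above checks a URL against robots.txt. A minimal sketch of that kind of check follows, using only the JDK. The class `RobotRules` and its methods are hypothetical and not part of websphinx. The sketch assumes simple prefix matching and merges the rules for `*` with those for a matching agent, which is a simplification of the robots.txt convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not a websphinx class): prefix-based robots.txt rules. */
final class RobotRules {
    private final List<String> disallowedPrefixes = new ArrayList<>();

    /** Collects Disallow prefixes from records that apply to userAgent or to "*". */
    RobotRules(String robotsTxt, String userAgent) {
        String agent = userAgent.toLowerCase(Locale.ROOT);
        boolean applies = false;       // does the current record apply to us?
        boolean lastWasAgent = false;  // consecutive User-agent lines share one record
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw.replaceFirst("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                boolean match = value.equals("*")
                        || (!value.isEmpty() && agent.contains(value.toLowerCase(Locale.ROOT)));
                applies = lastWasAgent ? (applies || match) : match;
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                if (field.equals("disallow") && applies && !value.isEmpty()) {
                    disallowedPrefixes.add(value);
                }
            }
        }
    }

    /** Returns true if the URL path starts with any collected Disallow prefix. */
    boolean disallowed(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

A crawler would typically build one such rule set per host and cache it, which is presumably what the `clear()` entry for `RobotExclusion` refers to when it mentions a cache of robots.txt entries.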
E
Element
- class websphinx.
Element
.
Element in an HTML page.
Element(Tag, int)
- Constructor for class websphinx.
Element
Make an Element from a start tag and an end position.
Element(Tag, Tag)
- Constructor for class websphinx.
Element
Make an Element from a start tag and end tag.
EM
- Static variable in class websphinx.
Tag
EMBED
- Static variable in class websphinx.
Tag
end
- Variable in class websphinx.
Region
endTag
- Variable in class websphinx.
Element
enumerateHTMLAttributes()
- Method in class websphinx.
Element
Enumerate the HTML attributes found on this tag.
enumerateHTMLAttributes()
- Method in class websphinx.
Tag
Enumerate the HTML attributes found on this tag.
enumerateObjectLabels()
- Method in class websphinx.
Region
Enumerate the labels of the region.
equals(Object)
- Method in class websphinx.
Regexp
equals(Object)
- Method in class websphinx.
Tagexp
equals(Object)
- Method in class websphinx.
Wildcard
ERROR
- Static variable in class websphinx.
LinkEvent
An error occurred in retrieving the page.
escape(String)
- Static method in class websphinx.
Regexp
escape(String)
- Static method in class websphinx.
Wildcard
EventLog
- class websphinx.
EventLog
.
Crawling monitor that writes messages to standard output or a file.
EventLog()
- Constructor for class websphinx.
EventLog
Make an EventLog that writes to standard output.
EventLog(OutputStream)
- Constructor for class websphinx.
EventLog
Make an EventLog that writes to a stream.
EventLog(String)
- Constructor for class websphinx.
EventLog
Make an EventLog that writes to a file.
eventName
- Static variable in class websphinx.
LinkEvent
Map from id code (RETRIEVING) to name ("retrieving")
expand(Page)
- Method in class websphinx.
Crawler
Expand the crawl from a page.
F
FileToURL(File)
- Static method in class websphinx.
Link
Convert a local filename to a URL.
findEnd(Region[], int)
- Static method in class websphinx.
Region
Finds a region that ends at or after a given position.
findNext()
- Method in class websphinx.
PatternMatcher
findStart(Region[], int)
- Static method in class websphinx.
Region
Finds a region that starts at or after a given position.
FONT
- Static variable in class websphinx.
Tag
Form
- class websphinx.
Form
.
<FORM> element in an HTML page.
FORM
- Static variable in class websphinx.
Tag
Form(Tag, Tag, URL)
- Constructor for class websphinx.
Form
Make a Form from a start tag and end tag and a base URL (for relative references).
FormButton
- class websphinx.
FormButton
.
Button element in an HTML form -- for example, <INPUT TYPE=submit> or <INPUT TYPE=image>.
FormButton(Tag, Tag, Form)
- Constructor for class websphinx.
FormButton
Make a FormButton from a start tag and end tag and its containing form.
found(Region)
- Method in class websphinx.
Pattern
found(String)
- Method in class websphinx.
Pattern
FRAME
- Static variable in class websphinx.
Tag
FRAMESET
- Static variable in class websphinx.
Tag
G
GET
- Static variable in class websphinx.
Link
Use the HTTP GET method to download this link.
getAcceptedMIMETypes()
- Method in class websphinx.
DownloadParameters
Get accepted MIME types.
getActiveThreads()
- Method in class websphinx.
Crawler
Get number of threads currently working.
getBase()
- Method in class websphinx.
Page
Get the base URL, relative to which the page's links were interpreted.
getChild()
- Method in class websphinx.
Element
Get element's first child.
getContent()
- Method in class websphinx.
Page
Get the content of the page.
getContentEncoding()
- Method in class websphinx.
Page
Get content encoding of page.
getContentType()
- Method in class websphinx.
Page
Get MIME type of page.
getCrawler()
- Method in class websphinx.
CrawlEvent
Get crawler that generated the event
getCrawler()
- Method in class websphinx.
LinkEvent
Get crawler that generated the event
getCrawlTimeout()
- Method in class websphinx.
DownloadParameters
Get timeout on entire crawl.
getDepth()
- Method in class websphinx.
Link
Get depth of link in crawl.
getDepth()
- Method in class websphinx.
Page
Get depth of page in crawl.
getDirectory()
- Method in class websphinx.
Link
Get the directory part of the link, like "/home/dir/".
getDirectoryURL()
- Method in class websphinx.
Link
Get the URL of a page's directory.
getDirectoryURL(URL)
- Static method in class websphinx.
Link
Get the URL of a page's directory.
getDownloadParameters()
- Method in class websphinx.
Crawler
Get download parameters (such as number of threads, timeouts, maximum page size, etc.)
getDownloadParameters()
- Method in class websphinx.
Link
Get the download parameters used for this link.
getDownloadTimeout()
- Method in class websphinx.
DownloadParameters
Get download timeout value.
getElement()
- Method in class websphinx.
Tag
Get element to which this tag is the start or end tag.
getElements()
- Method in class websphinx.
Page
Get the HTML elements in the page.
getEnd()
- Method in class websphinx.
Region
Gets offset after end of region.
getEndTag()
- Method in class websphinx.
Element
Get end tag.
getException()
- Method in class websphinx.
LinkEvent
Get exception related to this event.
getExpiration()
- Method in class websphinx.
Page
Get expiration date of page.
getField(String)
- Method in class websphinx.
Region
Get a named subregion.
getFieldNames()
- Method in class websphinx.
Pattern
getFieldNames()
- Method in class websphinx.
Regexp
getFields(String)
- Method in class websphinx.
Region
Get a set of named subregions.
getFile()
- Method in class websphinx.
Link
Get the information part of the link, like "/home/dir/index.html?query".
getFilename()
- Method in class websphinx.
Link
Get the filename part of the link, like "index.html".
getForm()
- Method in class websphinx.
FormButton
Get the form.
getHost()
- Method in class websphinx.
Link
Get the hostname of the link, like "www.cs.cmu.edu".
getHTMLAttribute(String)
- Method in class websphinx.
Element
Get an HTML attribute's value.
getHTMLAttribute(String)
- Method in class websphinx.
Tag
Get an HTML attribute's value.
getHTMLAttribute(String, String)
- Method in class websphinx.
Element
Get an HTML attribute's value, with a default value if it doesn't exist.
getHTMLAttribute(String, String)
- Method in class websphinx.
Tag
Get an HTML attribute's value, with a default value if it doesn't exist.
getHTMLAttributes()
- Method in class websphinx.
Tag
Get all the HTML attributes found on this tag.
getID()
- Method in class websphinx.
CrawlEvent
Get event id.
getID()
- Method in class websphinx.
LinkEvent
Get event id
getIgnoreVisitedLinks()
- Method in class websphinx.
Crawler
Get ignore-visited-links flag.
getInteractive()
- Method in class websphinx.
DownloadParameters
Get interactive flag.
getLabel(String)
- Method in class websphinx.
Region
Get a label's value.
getLabel(String, String)
- Method in class websphinx.
Region
Get a label's value.
getLastModified()
- Method in class websphinx.
Page
Get last-modified date of page.
getLength()
- Method in class websphinx.
Region
Gets length of the region.
getLink()
- Method in class websphinx.
LinkEvent
Get link to which this event occurred.
getLinks()
- Method in class websphinx.
Page
Get the links found in the page.
getMaxDepth()
- Method in class websphinx.
Crawler
Get maximum depth.
getMaxPageSize()
- Method in class websphinx.
DownloadParameters
Get maximum page size.
getMaxThreads()
- Method in class websphinx.
DownloadParameters
Get maximum threads.
getMethod()
- Method in class websphinx.
FormButton
Get the method used to access this link.
getMethod()
- Method in class websphinx.
Form
Get the method used to access this link.
getMethod()
- Method in class websphinx.
Link
Get the method used to access this link.
getName()
- Method in class websphinx.
Crawler
Get human-readable name of crawler.
getName()
- Method in class websphinx.
LinkEvent
Get event name (string equivalent to its ID)
getNext()
- Method in class websphinx.
Element
Return next element in an inorder walk of the tree, assuming this element and its children have been visited.
getNumericLabel(String, Number)
- Method in class websphinx.
Region
Get a label's value as a number.
getObeyRobotExclusion()
- Method in class websphinx.
DownloadParameters
Get obey-robot-exclusion flag.
getObjectLabel(String)
- Method in class websphinx.
Region
Get an object-valued label.
getObjectLabels()
- Method in class websphinx.
Region
Get a String containing the labels of the region.
getOnlyNetworkEvents()
- Method in class websphinx.
EventLog
Test whether logger prints only network-related LinkEvents.
getOrigin()
- Method in class websphinx.
Page
Get the Link that points to this page.
getPage()
- Method in class websphinx.
Link
Get the downloaded page to which the link points.
getPagesLeft()
- Method in class websphinx.
Crawler
Get number of pages left to be visited.
getPagesVisited()
- Method in class websphinx.
Crawler
Get number of pages visited.
getPageURL()
- Method in class websphinx.
Link
Get the URL of a page, omitting any anchor reference (like #ref).
getPageURL(URL)
- Static method in class websphinx.
Link
Get the URL of a page, omitting any anchor reference (like #ref).
getParent()
- Method in class websphinx.
Element
Get element's parent.
getParentURL()
- Method in class websphinx.
Link
Get the URL of a page's parent directory.
getParentURL(URL)
- Static method in class websphinx.
Link
Get the URL of a page's parent directory.
getPolicy()
- Static method in class websphinx.
SecurityPolicy
getPort()
- Method in class websphinx.
Link
Get the port number of the link.
getPriority()
- Method in class websphinx.
Link
Get the priority of the link in the crawl.
getProtocol()
- Method in class websphinx.
Link
Get the network protocol of the link, like "ftp" or "http".
getQuery()
- Method in class websphinx.
Link
Get the query part of the link, like "?query".
getRef()
- Method in class websphinx.
Link
Get the anchor reference of the link, like "#ref".
getResponseCode()
- Method in class websphinx.
Page
Get response code returned by the Web server.
getResponseMessage()
- Method in class websphinx.
Page
Get response message returned by the Web server.
getRoot()
- Method in class websphinx.
Crawler
getRootElement()
- Method in class websphinx.
Page
Get the root HTML element of the page.
getRootElement()
- Method in class websphinx.
Region
Get the root HTML element of the region.
getServiceURL()
- Method in class websphinx.
Link
Get the URL of a Web service, omitting any query or anchor reference.
getServiceURL(URL)
- Static method in class websphinx.
Link
Get the URL of a Web service, omitting any query or anchor reference.
getSibling()
- Method in class websphinx.
Element
Get element's next sibling.
getSource()
- Method in class websphinx.
Region
Gets page containing the region.
getStart()
- Method in class websphinx.
Region
Gets starting offset of region in page content.
getStartTag()
- Method in class websphinx.
Element
Get start tag.
getStatus()
- Method in class websphinx.
Link
Get the status of the link.
getTagName()
- Method in class websphinx.
Element
Get tag name.
getTagName()
- Method in class websphinx.
Tag
Get tag name.
getTags()
- Method in class websphinx.
Page
Get the tag sequence of the page.
getTemporaryDirectory()
- Method in class websphinx.
SecurityPolicy
getTitle()
- Method in class websphinx.
Page
Get the title of the page.
getTokens()
- Method in class websphinx.
Page
Get the token sequence of the page.
getURL()
- Method in class websphinx.
FormButton
Get the URL.
getURL()
- Method in class websphinx.
Link
Get the URL.
getURL()
- Method in class websphinx.
Page
Get the URL.
getUseCaches()
- Method in class websphinx.
DownloadParameters
Get use-caches flag.
getUserAgent()
- Method in class websphinx.
DownloadParameters
Get User-agent header used in HTTP requests.
getWords()
- Method in class websphinx.
Page
Get the words in the page.
groups
- Static variable in class websphinx.
Pattern
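Several `Link` entries above (`getPageURL`, `getServiceURL`, `getDirectory`, `getFilename`) describe how a URL is split into parts. The sketch below reproduces those descriptions with `java.net.URI` and plain string handling. `UrlParts` and its method names are hypothetical and not part of websphinx, and the real accessors may treat edge cases differently.

```java
import java.net.URI;

/** Hypothetical helper (not a websphinx class): URL parts as the Link entries describe them. */
final class UrlParts {
    private UrlParts() {}

    /** URL without the anchor reference (like #ref); compare getPageURL. */
    static String pageUrl(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    /** URL without query or anchor reference; compare getServiceURL. */
    static String serviceUrl(String url) {
        String page = pageUrl(url);
        int question = page.indexOf('?');
        return question < 0 ? page : page.substring(0, question);
    }

    /** Directory part of the path, like "/home/dir/"; compare getDirectory. */
    static String directory(String url) {
        String path = path(url);
        return path.substring(0, path.lastIndexOf('/') + 1);
    }

    /** Filename part of the path, like "index.html"; compare getFilename. */
    static String filename(String url) {
        String path = path(url);
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Path component, or "" if the URI has none. Throws IllegalArgumentException on a malformed URL. */
    private static String path(String url) {
        String p = URI.create(serviceUrl(url)).getPath();
        return p == null ? "" : p;
    }
}
```

For the URL `http://www.cs.cmu.edu/home/dir/index.html?query#ref`, `pageUrl` drops only `#ref`, `serviceUrl` also drops `?query`, `directory` yields `/home/dir/`, and `filename` yields `index.html`.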
H
H1
- Static variable in class websphinx.
Tag
H2
- Static variable in class websphinx.
Tag
H3
- Static variable in class websphinx.
Tag
H4
- Static variable in class websphinx.
Tag
H5
- Static variable in class websphinx.
Tag
H6
- Static variable in class websphinx.
Tag
hasAllLabels(String)
- Method in class websphinx.
Region
Test if all of several labels are set.
hasAllLabels(String[])
- Method in class websphinx.
Region
Test if all of several labels are set.
hasAnyLabels(String)
- Method in class websphinx.
Region
Test if one or more of several labels are set.
hasAnyLabels(String[])
- Method in class websphinx.
Region
Test if one or more of several labels are set.
hasContent()
- Method in class websphinx.
Page
Test if page content is available.
hasHTMLAttribute(String)
- Method in class websphinx.
Element
Test if tag has an HTML attribute.
hasHTMLAttribute(String)
- Method in class websphinx.
Tag
Test if tag has an HTML attribute.
hasLabel(String)
- Method in class websphinx.
Region
Test if a label is set.
hasMoreElements()
- Method in class websphinx.
PatternMatcher
HEAD
- Static variable in class websphinx.
Tag
HR
- Static variable in class websphinx.
Tag
HTML
- Static variable in class websphinx.
Tag
HTMLParser
- class websphinx.
HTMLParser
.
HTML parser.
HTMLParser()
- Constructor for class websphinx.
HTMLParser
Make an HTMLParser.
HTMLParser(DownloadParameters)
- Constructor for class websphinx.
HTMLParser
Make an HTMLParser which retrieves pages using the specified download parameters.
I
I
- Static variable in class websphinx.
Tag
IMG
- Static variable in class websphinx.
Tag
INPUT
- Static variable in class websphinx.
Tag
isBlockTag()
- Method in class websphinx.
Tag
Test if tag is a block-level tag.
isBodyTag()
- Method in class websphinx.
Tag
Test if tag belongs in the <BODY> element.
isEndTag()
- Method in class websphinx.
Tag
Test if tag is an end tag.
isFlowTag()
- Method in class websphinx.
Tag
Test if tag is a flow-level tag.
isHeadTag()
- Method in class websphinx.
Tag
Test if tag belongs in the <HEAD> element.
isHTML()
- Method in class websphinx.
Page
Test whether page is HTML.
isImage()
- Method in class websphinx.
Page
ISINDEX
- Static variable in class websphinx.
Tag
isParsed()
- Method in class websphinx.
Page
Test whether page has been parsed.
isStartTag()
- Method in class websphinx.
Tag
Test if tag is a start tag.
K
KBD
- Static variable in class websphinx.
Tag
keepContent()
- Method in class websphinx.
Page
Lock the page's content (to prevent it from being discarded).
L
LI
- Static variable in class websphinx.
Tag
Link
- class websphinx.
Link
.
Link to a Web page.
LINK
- Static variable in class websphinx.
Tag
Link(File)
- Constructor for class websphinx.
Link
Make a Link from a File.
Link(String)
- Constructor for class websphinx.
Link
Make a Link from a string URL.
Link(Tag, Tag, URL)
- Constructor for class websphinx.
Link
Make a Link from a start tag and end tag and a base URL (for relative references).
Link(URL)
- Constructor for class websphinx.
Link
Make a Link from a URL.
LinkEvent
- class websphinx.
LinkEvent
.
Link event.
LinkEvent(Crawler, int, Link)
- Constructor for class websphinx.
LinkEvent
Make a LinkEvent.
LinkEvent(Crawler, int, Link, Throwable)
- Constructor for class websphinx.
LinkEvent
Make a LinkEvent for an error.
LinkListener
- interface websphinx.
LinkListener
.
Link event listener.
LISTING
- Static variable in class websphinx.
Tag
M
main(String[])
- Static method in class websphinx.
HTMLParser
main(String[])
- Static method in class websphinx.
Page
main(String[])
- Static method in class websphinx.
Regexp
main(String[])
- Static method in class websphinx.
RobotExclusion
main(String[])
- Static method in class websphinx.
Tagexp
main(String[])
- Static method in class websphinx.
Wildcard
makeDir(File)
- Method in class websphinx.
SecurityPolicy
makeQuery()
- Method in class websphinx.
Form
Construct the query that would be submitted if the form's SUBMIT button were pressed.
makeQuery(FormButton)
- Method in class websphinx.
Form
Construct the query that would be submitted if the specified button were pressed.
makeTemporaryFile(String, String)
- Method in class websphinx.
SecurityPolicy
MAP
- Static variable in class websphinx.
Tag
markVisited(Link)
- Method in class websphinx.
Crawler
Register that the CRC32 value of the link's URL has been visited.
MARQUEE
- Static variable in class websphinx.
Tag
match(Region)
- Method in class websphinx.
Pattern
match(Region)
- Method in class websphinx.
Regexp
match(Region)
- Method in class websphinx.
Tagexp
MAX_LENGTH
- Static variable in class websphinx.
Tag
Length of longest tag name.
MENU
- Static variable in class websphinx.
Tag
META
- Static variable in class websphinx.
Tag
monitor(Crawler)
- Static method in class websphinx.
EventLog
Create an EventLog that prints to standard error and attach it to a crawler.
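The `markVisited(Link)` entry above says the crawler records a CRC32 value of a link's URL. A minimal sketch of a visited set built that way follows, using `java.util.zip.CRC32` from the JDK. The class `VisitedSet` is hypothetical and not part of websphinx, and hashing the URL's UTF-8 bytes is an assumption. The index does not say which form of the URL is checksummed.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.CRC32;

/** Hypothetical helper (not a websphinx class): remembers URLs by their CRC32 checksum. */
final class VisitedSet {
    private final Set<Long> checksums = new HashSet<>();

    /** CRC32 of the URL's UTF-8 bytes (the encoding is an assumption of this sketch). */
    static long checksum(String url) {
        CRC32 crc = new CRC32();
        crc.update(url.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Records the URL; returns false if its checksum was already present. */
    boolean markVisited(String url) {
        return checksums.add(checksum(url));
    }

    /** Tests whether a URL with the same checksum has been recorded. */
    boolean visited(String url) {
        return checksums.contains(checksum(url));
    }

    /** Forgets all recorded checksums; compare clearVisited(). */
    void clear() {
        checksums.clear();
    }
}
```

Storing a 32-bit checksum instead of the full URL saves memory on a large crawl. The cost is that two different URLs can collide, in which case the second is wrongly reported as visited.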
N
names
- Variable in class websphinx.
Region
nextElement()
- Method in class websphinx.
PatternMatcher
NEXTID
- Static variable in class websphinx.
Tag
nextMatch()
- Method in class websphinx.
PatternMatcher
NOBR
- Static variable in class websphinx.
Tag
NOEMBED
- Static variable in class websphinx.
Tag
NOFRAMES
- Static variable in class websphinx.
Tag
NONE
- Static variable in class websphinx.
LinkEvent
No event has occurred on this link yet.
O
OBJECT
- Static variable in class websphinx.
Tag
OL
- Static variable in class websphinx.
Tag
oneMatch(Region)
- Method in class websphinx.
Pattern
oneMatch(String)
- Method in class websphinx.
Pattern
openConnection(Link)
- Method in class websphinx.
SecurityPolicy
openConnection(URL)
- Method in class websphinx.
SecurityPolicy
OPTION
- Static variable in class websphinx.
Tag
P
P
- Static variable in class websphinx.
Tag
Page
- class websphinx.
Page
.
A Web page.
Page(Link)
- Constructor for class websphinx.
Page
Make a Page by downloading and parsing a Link.
Page(Link, HTMLParser)
- Constructor for class websphinx.
Page
Make a Page by downloading a Link.
Page(String)
- Constructor for class websphinx.
Page
Make a Page from a string of content.
Page(URL, String)
- Constructor for class websphinx.
Page
Make a Page from a URL and a string of HTML.
Page(URL, String, HTMLParser)
- Constructor for class websphinx.
Page
Make a Page from a URL and a string of HTML.
PARAM
- Static variable in class websphinx.
Tag
parent
- Variable in class websphinx.
Element
parse(HTMLParser)
- Method in class websphinx.
Page
Parse the page.
parse(Page, InputStream)
- Method in class websphinx.
HTMLParser
Parse an input stream.
parse(Page, Reader)
- Method in class websphinx.
HTMLParser
Parse an input stream.
parse(Page, String)
- Method in class websphinx.
HTMLParser
Parse a string.
Pattern
- class websphinx.
Pattern
.
Base class for pattern matchers.
Pattern()
- Constructor for class websphinx.
Pattern
PatternMatcher
- class websphinx.
PatternMatcher
.
PatternMatcher()
- Constructor for class websphinx.
PatternMatcher
PAUSED
- Static variable in class websphinx.
CrawlEvent
Crawler was paused.
paused(CrawlEvent)
- Method in interface websphinx.
CrawlListener
Notify that the crawler was paused.
paused(CrawlEvent)
- Method in class websphinx.
EventLog
Notify that the crawler paused.
PLAINTEXT
- Static variable in class websphinx.
Tag
POST
- Static variable in class websphinx.
Link
Use the HTTP POST method to access this link.
PRE
- Static variable in class websphinx.
Tag
printStatus(PrintStream)
- Method in class websphinx.
Crawler
Print current status
Q
QUEUED
- Static variable in class websphinx.
LinkEvent
Link was accepted by walk() and is waiting to be downloaded
R
readFile(File)
- Method in class websphinx.
SecurityPolicy
readWriteFile(File)
- Method in class websphinx.
SecurityPolicy
Regexp
- class websphinx.
Regexp
.
Regexp(String)
- Constructor for class websphinx.
Regexp
Region
- class websphinx.
Region
.
Region of an HTML page.
Region(Page, int, int)
- Constructor for class websphinx.
Region
Makes a Region.
Region(Region)
- Constructor for class websphinx.
Region
Makes a Region by copying another region's parameters.
relativeTo(URL, String)
- Static method in class websphinx.
Link
relativeTo(URL, URL)
- Static method in class websphinx.
Link
removeHTMLAttribute(String)
- Method in class websphinx.
Tag
Copy this tag, removing an HTML attribute.
removeLabel(String)
- Method in class websphinx.
Region
Remove a label.
replaceHref(String)
- Method in class websphinx.
Link
Copy the link's start tag, replacing the URL.
replaceHTMLAttribute(String)
- Method in class websphinx.
Tag
Copy this tag, setting an HTML attribute's value to TRUE.
replaceHTMLAttribute(String, String)
- Method in class websphinx.
Tag
Copy this tag, setting an HTML attribute's value.
RETRIEVING
- Static variable in class websphinx.
LinkEvent
Link is being retrieved
RobotExclusion
- class websphinx.
RobotExclusion
.
RobotExclusion(String)
- Constructor for class websphinx.
RobotExclusion
Make a RobotExclusion object.
run()
- Method in class websphinx.
Crawler
Start crawling.
S
SAMP
- Static variable in class websphinx.
Tag
SCRIPT
- Static variable in class websphinx.
Tag
SecurityPolicy
- class websphinx.
SecurityPolicy
.
SecurityPolicy()
- Constructor for class websphinx.
SecurityPolicy
SELECT
- Static variable in class websphinx.
Tag
sendLinkEvent(Link, int)
- Method in class websphinx.
Crawler
Send a LinkEvent to all LinkListeners registered with this crawler.
sendLinkEvent(Link, int, Throwable)
- Method in class websphinx.
Crawler
Send an exceptional LinkEvent to all LinkListeners registered with this crawler.
setContentEncoding(String)
- Method in class websphinx.
Page
Set content encoding of page.
setContentType(String)
- Method in class websphinx.
Page
Set MIME type of page.
setDepth(int)
- Method in class websphinx.
Link
setDownloadParameters(DownloadParameters)
- Method in class websphinx.
Crawler
Set download parameters (such as number of threads, timeouts, maximum page size, etc.)
setDownloadParameters(DownloadParameters)
- Method in class websphinx.
Link
Set the download parameters used for this link.
setExpiration(long)
- Method in class websphinx.
Page
Set expiration date of page.
setField(String, Region)
- Method in class websphinx.
Region
Name a subregion (by setting a label to point to it).
setFields(String, Region[])
- Method in class websphinx.
Region
Name a set of subregions (by pointing a label to them).
setHostRoot(String)
- Method in class websphinx.
Crawler
Set the host name of the root so that the crawler visits only web sites in the root's family.
setIgnoreVisitedLinks(boolean)
- Method in class websphinx.
Crawler
Set ignore-visited-links flag.
setLabel(String)
- Method in class websphinx.
Region
Set a label on the region.
setLabel(String, String)
- Method in class websphinx.
Region
Set a string-valued label.
setLastModified(long)
- Method in class websphinx.
Page
Set last-modified date of page.
setMaxDepth(int)
- Method in class websphinx.
Crawler
Set maximum depth.
setName(String)
- Method in class websphinx.
Crawler
Set human-readable name of crawler.
setObjectLabel(String, Object)
- Method in class websphinx.
Region
Set an object-valued label.
setOnlyNetworkEvents(boolean)
- Method in class websphinx.
EventLog
Set whether logger prints only network-related LinkEvents.
setPage(Page)
- Method in class websphinx.
Link
Set the page corresponding to this link.
setPriority(float)
- Method in class websphinx.
Link
Set the priority of the link in the crawl.
setRoot(Link)
- Method in class websphinx.
Crawler
Set starting point of crawl as a single Link.
setStatus(int)
- Method in class websphinx.
Link
Set the status of the link.
setText(String)
- Method in class websphinx.
Link
Set the tagless-text representation of this region.
shouldVisit(Link)
- Method in class websphinx.
Crawler
Callback for testing whether a link should be traversed.
sibling
- Variable in class websphinx.
Element
SKIPPED
- Static variable in class websphinx.
LinkEvent
Link was rejected by shouldVisit()
SMALL
- Static variable in class websphinx.
Tag
source
- Variable in class websphinx.
Region
SPACER
- Static variable in class websphinx.
Tag
span(Region)
- Method in class websphinx.
Region
Makes a new Region containing two regions.
start
- Variable in class websphinx.
Region
STARTED
- Static variable in class websphinx.
CrawlEvent
Crawler started.
started(CrawlEvent)
- Method in interface websphinx.
CrawlListener
Notify that the crawler started.
started(CrawlEvent)
- Method in class websphinx.
EventLog
Notify that the crawler started.
startTag
- Variable in class websphinx.
Element
stop()
- Method in class websphinx.
Crawler
Stop crawling.
STOPPED
- Static variable in class websphinx.
CrawlEvent
Crawler ran out of links to crawl
stopped(CrawlEvent)
- Method in interface websphinx.
CrawlListener
Notify that the crawler ran out of links to crawl
stopped(CrawlEvent)
- Method in class websphinx.
EventLog
Notify that the crawler has stopped.
STRIKE
- Static variable in class websphinx.
Tag
STRONG
- Static variable in class websphinx.
Tag
STYLE
- Static variable in class websphinx.
Tag
SUB
- Static variable in class websphinx.
Tag
submit(Link)
- Method in class websphinx.
Crawler
Puts a link into the crawling queue.
substringCanonicalTags(int, int)
- Method in class websphinx.
Page
Get canonicalized HTML tags found in a region.
substringContent(int, int)
- Method in class websphinx.
Page
Get raw content found in a region.
substringHTML(int, int)
- Method in class websphinx.
Page
Get HTML found in a region.
substringTags(int, int)
- Method in class websphinx.
Page
Get HTML tags found in a region.
substringText(int, int)
- Method in class websphinx.
Page
Get tagless text found in a region.
SUP
- Static variable in class websphinx.
Tag
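The `Crawler` entries `setRoot`, `setMaxDepth`, `shouldVisit` and `submit`, together with the `LinkEvent` constants `QUEUED`, `SKIPPED`, `TOO_DEEP`, `ALREADY_VISITED` and `VISITED`, suggest a crawl loop along the following lines. This is a sketch over an in-memory link graph, not the websphinx implementation. The class `CrawlSketch`, the breadth-first order, and the order in which the checks are applied are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch (not a websphinx class): depth-limited breadth-first crawl. */
final class CrawlSketch {
    private CrawlSketch() {}

    /**
     * Crawls the graph from root and returns one log line per decision,
     * labelled with the name of the corresponding LinkEvent constant.
     */
    static List<String> crawl(Map<String, List<String>> links, String root,
                              int maxDepth, Predicate<String> shouldVisit) {
        List<String> log = new ArrayList<>();
        Map<String, Integer> depthOf = new HashMap<>(); // doubles as the visited set
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(root, 0);
        queue.add(root);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            int depth = depthOf.get(page);
            log.add("VISITED " + page);
            for (String target : links.getOrDefault(page, Collections.emptyList())) {
                if (!shouldVisit.test(target)) {
                    log.add("SKIPPED " + target);         // rejected by the callback
                } else if (depthOf.containsKey(target)) {
                    log.add("ALREADY_VISITED " + target); // seen earlier in this crawl
                } else if (depth + 1 > maxDepth) {
                    log.add("TOO_DEEP " + target);        // beyond the depth limit
                } else {
                    depthOf.put(target, depth + 1);
                    queue.add(target);
                    log.add("QUEUED " + target);
                }
            }
        }
        return log;
    }
}
```

Passing the link test as a `Predicate` mirrors the role of the `shouldVisit(Link)` callback: the traversal logic stays fixed while the caller decides which links are worth following.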
T
TABLE
- Static variable in class websphinx.
Tag
Tag
- class websphinx.
Tag
.
Tag in an HTML page.
Tag(Page, int, int, String, boolean)
- Constructor for class websphinx.
Tag
Make a Tag.
Tagexp
- class websphinx.
Tagexp
.
Tag pattern.
Tagexp(String)
- Constructor for class websphinx.
Tagexp
TD
- Static variable in class websphinx.
Tag
Text
- class websphinx.
Text
.
Tagless text regions on an HTML page.
Text(Page, int, int, String)
- Constructor for class websphinx.
Text
Make a Text.
TEXTAREA
- Static variable in class websphinx.
Tag
TH
- Static variable in class websphinx.
Tag
TIMED_OUT
- Static variable in class websphinx.
CrawlEvent
Crawler timeout expired.
timedOut(CrawlEvent)
- Method in interface websphinx.
CrawlListener
Notify that the crawler timed out.
timedOut(CrawlEvent)
- Method in class websphinx.
EventLog
Notify that the crawler timed out.
TITLE
- Static variable in class websphinx.
Tag
toDescription()
- Method in class websphinx.
Link
Generate a human-readable description of the link.
toDescription()
- Method in class websphinx.
Page
Generate a human-readable description of the page.
toHTML()
- Method in class websphinx.
Region
Converts the region to HTML, e.g.
toHTMLAttributeName(String)
- Static method in class websphinx.
Tag
Convert a String to an HTML attribute name.
TOO_DEEP
- Static variable in class websphinx.
LinkEvent
Link was accepted by walk() but exceeds the maximum depth from the start set.
toRegexp(String)
- Static method in class websphinx.
Tagexp
toRegexp(String)
- Static method in class websphinx.
Wildcard
toString()
- Method in class websphinx.
Crawler
Convert the crawler to a String.
toString()
- Method in class websphinx.
LinkEvent
Convert this event to a String describing it.
toString()
- Method in class websphinx.
Page
Get page containing the region.
toString()
- Method in class websphinx.
Pattern
Return a string representation of the pattern.
toString()
- Method in class websphinx.
Regexp
toString()
- Method in class websphinx.
Region
Gets region as raw content.
toString()
- Method in class websphinx.
Tagexp
toString()
- Method in class websphinx.
Wildcard
toTagName(String)
- Static method in class websphinx.
Tag
Convert a String to a tag name.
toTags()
- Method in class websphinx.
Region
Converts the region to HTML tags with no text, e.g.
toText()
- Method in class websphinx.
Link
Convert the region to tagless text.
toText()
- Method in class websphinx.
Region
Converts the region to tagless text, e.g.
toText()
- Method in class websphinx.
Text
Returns the region's tagless text
toURL()
- Method in class websphinx.
Link
Convert the link's URL to a String
toURL()
- Method in class websphinx.
Page
Convert the link's URL to a String
toURLDelimiters(String)
- Static method in class websphinx.
Link
TR
- Static variable in class websphinx.
Tag
TRUE
- Static variable in class websphinx.
Region
Default value for labels set with setLabel (name).
TT
- Static variable in class websphinx.
Tag
U
U
- Static variable in class websphinx.
Tag
UL
- Static variable in class websphinx.
Tag
url
- Variable in class websphinx.
Link
urlFromHref(Tag, URL)
- Method in class websphinx.
FormButton
Construct the URL for this button, from its start tag and a base URL (for relative references).
urlFromHref(Tag, URL)
- Method in class websphinx.
Form
Construct the URL for this form, from its start tag and a base URL (for relative references).
urlFromHref(Tag, URL)
- Method in class websphinx.
Link
Construct the URL for a link element, from its start tag and a base URL (for relative references).
URLToFile(URL)
- Static method in class websphinx.
Link
Convert a file: URL to a filename appropriate to the current system platform.
V
VAR
- Static variable in class websphinx.
Tag
VISITED
- Static variable in class websphinx.
LinkEvent
Link has been thoroughly processed by crawler
visited(Link)
- Method in class websphinx.
Crawler
Test whether the page corresponding to a link has been visited (or queued for visiting).
W
WBR
- Static variable in class websphinx.
Tag
websphinx
- package websphinx
Wildcard
- class websphinx.
Wildcard
.
Wildcard pattern.
Wildcard(String)
- Constructor for class websphinx.
Wildcard
writeFile(File, boolean)
- Method in class websphinx.
SecurityPolicy
X
XMP
- Static variable in class websphinx.
Tag