No one controls what’s published on the WWW ... it is totally decentralized
To find out what is published, search engines crawl the Web
* Two parts
  • Crawler visits Web pages, building an index of the content
  • Query processor checks user requests against the index, reports on known pages
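The two parts can be illustrated with a small sketch. This is a toy model under stated assumptions, not how any real search engine is built: the "Web" is a hypothetical in-memory dictionary (`TOY_WEB`) rather than pages fetched over HTTP, the URLs are made up, and the names `crawl` and `query` are chosen here for illustration. The index is a simple map from each word to the set of pages that contain it.

```python
import re
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over the network instead.
TOY_WEB = {
    "a.example/home": ("welcome to the home page", ["a.example/news", "b.example/cats"]),
    "a.example/news": ("news about search engines", ["a.example/home"]),
    "b.example/cats": ("pictures of cats and more cats", []),
    "c.example/orphan": ("a page no one links to", []),
}


def crawl(seed, web):
    """Visit pages reachable from seed and build an index: word -> set of URLs."""
    index = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # link to a page that does not exist
        text, links = web[url]
        # Record every word on this page in the index.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
        # Follow links to pages not yet seen.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def query(index, request):
    """Return the known pages that contain every word of the request."""
    words = re.findall(r"[a-z0-9]+", request.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


index = crawl("a.example/home", TOY_WEB)
print(query(index, "cats"))
print(query(index, "search engines"))
print(query(index, "page"))
```

In this sketch the page `c.example/orphan` contains the word "page", but nothing links to it, so the crawler never visits it and the query processor cannot report it. The query processor only knows what the crawler has indexed.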