FIT
100
Assignment 1:
Searching the Web
(or, Finding what you want, and no more!)
Link
to and read the sections on Search Engine Math and Boolean Searching
at the Search Engine Watch website. You
do not have to read the information that is linked off of each of these pages,
only the content at that URL. Review the
Search Engine Features page to help in your search. There are also two required readings
concerning copyright, the web and fair use.
Copyright will be an important issue to consider as we move into Project
1.
Search
Engine Math:
http://www.searchenginewatch.com/facts/math.html
Boolean
Searching:
http://www.searchenginewatch.com/facts/boolean.html
Copyright,
Public Domain and the Web:
http://www.copyrightwebsite.com/info/publicDomain/publicDomain.asp
Fair Use:
http://www.copyrightwebsite.com/info/fairUse/fairUse.asp
Many of you have done a fair
amount of browsing and searching on the WWW.
But have you ever thought about how and where to search in such a way
that you get only those sites you want and no more? Constructing a search that does exactly that
is very difficult, if not impossible.
However, you can learn to search the Web in a way that brings back a
smaller set of “hits” (web pages that match your requirements), and improve the
chances that these hits are more relevant than not.
So, what exactly IS a Search
Engine and why do I care?
A search engine is really
just a program, or series of programs, that is designed to try and help users
find useful information on the Web. A
search engine consists of several components: a crawler (a program), a query
processor (another program that deals with user requests), a user interface
where a user enters terms for the query processor, and an index database. The basic idea is that a search engine takes
terms that you enter and tries to match those terms with documents out on the
Web that are most relevant.
Seems simple, doesn’t
it? Yes, it seems simple… but
relevance is hard for a program to determine when it doesn’t “know” the person
doing the search. This is an exercise
for you to see both the ease and difficulty of searching for information on the
web.
·
To use basic search strategies in a search engine
and bring back sites with information on a topic.
·
Learn to find the best search method for a
particular search engine.
·
To develop systematic and precise search skills.
Below is a list of some
available search engines. There are many
more search engines out there.
Google: http://www.google.com/
Uses link popularity as a
way to rank a web site. If 50 different
sites link to one other site, this is a good indicator that it is a relevant
page for the topic it covers.
WiseNut: http://www.wisenut.com/
A newer search engine and
possible rival to Google. Wisenut
uses an algorithm similar to Google's PageRank.
All
The Web: http://alltheweb.com/
Also a newer search engine.
Allows searches for MP3s, videos, pictures.
AltaVista: http://av.com/
One of the oldest search engines
around. Allows searches just on images
and other formats. Also has a translate
feature.
Vivisimo!: http://vivisimo.com/
Groups results from a search
into hierarchical sets of categories (they show as folders).
Some search engines use a
directory structure to organize human-indexed web sites by subject. These indexers decide which category is most
appropriate for a web site and place it there:
Yahoo!: http://www.yahoo.com/
Directory setup.
Provides email, news, etc.
List of Search Engines by
function:
http://www.searchenginewatch.com/links/
A useful page with lists of major
and specialized search engines.
More resources on Search
Engines:
The Virtual
Chase
http://www.virtualchase.com/Search_Engines/
1. Go to http://yahoo.com and use the categories/directories
to find the web site on Computational Geometry.
A few of the Yahoo
categories
A. What is the
most logical starting point?
B. What is the URL
of the site you found on Computational Geometry?
C. What is the
full path in the directory to get to the Yahoo! page that lists this site? For example, one path in Yahoo! to get to the
Now, go back to the start
page at Yahoo [just click on the Yahoo! logo at the top left of the page]
and try to search for the same website using the search box at the top of the
page.
D. How did you
construct the search? Write the entire
search box contents exactly as you enter them in the search box.
E. Where did the Computational
Geometry site come up in the first page of results? List the page number and result number.
2. Using a search
engine from the list above, (not the Yahoo Directory), search for
information about the riots that broke out in
A. What search
engine did you use?
B. How did you
construct your search? Write the entire
search box contents exactly as you enter them in the search box.
C. How many sites
pertaining to the riots in
D. Try 2 other
search strategies using the same search engine (see the readings on Boolean and
Math searching if you have questions). List
them both here:
E. Look at the
search that appears to be most effective - in other words, the search that gave
you more sites you considered relevant to the topic in the first 2 pages.
Why
do you believe your results were better with this search?
3. Use a search
engine you haven’t tested yet. Look at
the help sheet at Search Engine
Watch
[ http://www.searchenginewatch.com/facts/ataglance.html ] or find the help page for
the search engine to see what search operators to use.
Now, find a site dedicated to the victims of the terrorist attack in September,
2001.
A. What is the
URL?
B. How did you
construct your search? Write the entire
search box contents exactly as you enter them in the search box.
C. Did you adjust
your search at all during this process?
Show the different searches used to finally arrive at the site you
chose.
·
the search engine used
·
the series of queries you used to find the answer
·
an explanation of your search strategy, identifying which
of the above query refinement strategies you applied and why
·
the URL of the page where you found the answer to the
question
The first question shows you
appropriate answers to guide your work:
A.
IBM produced the first hard disk drive. Find a page on an official IBM web site
describing this disk. When was it made,
what was it called, and how much storage space did it offer?
(1) http://altavista.com
(2) search on term IBM to figure out IBM’s domain name, which is ibm.com
(3) search on terms +site:ibm.com +“hard disk” for pages on IBM’s web
site(s) with phrase “hard disk” in them
(4) refine search on terms +site:ibm.com +"hard disk" +first”
with additional keyword “first” to narrow search answer (“1956, 305 RAMAC, 5
MB”) found on second hit:
http://www.storage.ibm.com/hdd/firsts/n1956.htm
B.
Who is often called the “Father
of Computing” and when did he live?
C.
The ENIAC is often called the first general-purpose
computer, though some question this distinction. In kilograms, how much did ENIAC weigh?
D.
What was ENIAC designed to be used for?
E.
Who is considered the father of the concept of hyptertext?
Images and other files and
content on the Internet are protected in the same way as print materials and
photographs. Use of digital images for
purposes of alteration and display on the Internet has limited coverage under
the conditions of fair use. [http://www.templetons.com/brad/copymyths.html].
Public Domain [http://www.copyrightwebsite.com/info/publicDomain/publicDomain.asp] items are those in which
the copyright has been lost, has expired, or the author of the work makes no
copyright claims to reproductions or enhancements of the work.
When
Do Items Become Public Domain?
If you use an image of a
person for reasons of making a profit, you are responsible for obtaining
permission from the person or their heirs.
If you use a trademark image, you must also get permission.
5. Using the
Search Engine Math you read about, construct a search to find sites that
contain images in the public domain. Use Google for
this first search.
A. How did you
construct your search? Write the entire
search box contents exactly as you enter them in the search box.
6. Do that same
search across in AltaVista and Vivismo as well (or 2
other search engines of your choice).
Compare your top 10 hits.
A. What 2 search
engines are you using?
B. Are the top 10
returns the same in each Search Engine?
C. Do all top 10
sites state, very clearly, that they DO contains images in the Public
Domain-free for anyone to use?
7. Try changing
the search technique and see if you get different results.
A. How did you
construct your new search? Write the
entire search box contents exactly as you enter them in the search box.
B. How do your
results change?
C. Can you say,
without any doubt, that all pages returned contain images within the public
domain? What information from the site
leads you to believe this?
8. Now do a
search for sites that contain images free of copyright.
A. What search
engine are you using?
B. How did you
construct the search? Write the entire
search box contents exactly as you enter them in the search box.
C. Of the top 10
results back, how many sites [give the number] were actually very plainly
“copyright free”? How can you tell?
9. Search for
images of your favorite city on the web.
Alta Vista has a way to just search for image media on the web.
A. Give the URLs
to your 2 favorite images from the list.
B. What other
search engines have this same feature? Find
2 and list them.
10. Find an image
of the
Give the URL of the ORIGINAL image location (not just the search engine page
that listed it).
What is the copyright information about the use of this image on your personal
web site? (You may need to search the site a bit to find this)
11. Search for
images you would like to use in a website of misinformation (Project 1) and
save them. (You will not turn in the URL
for your image, this is just to start you on your image search for the
project.)
NOTE: Make sure that any image
you select is in the Public Domain OR the copyright policy on the site where
you find it states that you are allowed to use it for non-commercial
purposes!!!!! If there is no copyright
policy evident, then try to contact the site owner and explain why you would
like to use the image.
You must provide adequate proof why you are allowed to use and manipulate any
images that you find.