Buzzword Soup

Acronym and Buzzword Database

[M.D. 11/9/97]

A word is not a crystal, transparent and unchanging; it is the skin of living thought and changes from day to day as does the air around us. -Oliver Wendell Holmes (quoted in WordaDay 11/8/97).

Have you ever been cruising through a technical article and suddenly been stumped by a term that the author uses without defining? Lots of buzzwords and acronyms are so new that there isn't anywhere to go look them up.

I would like to have a database where people could look up current buzzwords and acronyms and find out where they came from. The queries in this system would be words that the user wants to look up. It would be a type of bibliographic database, more than a simple list. For example, if the user typed in RAID, it would not just come back and say "Redundant Array of Inexpensive Disks"; it would say how it knows: cite an article or publication (name, authors, page number, etc.), quote the actual text of that publication, and ideally provide a hypertext link. Furthermore, it might give a whole list of other such citations. And it might offer alternative definitions, such as "Random Array of Integrated Disks," with citations for each.
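One possible shape for such a database is sketched below using Python's built-in sqlite3 module. The table and column names (term, source, definition, citation) and the helper functions are assumptions made for illustration, not a settled design; the point is only that each definition is tied to one or more citations, and each citation carries the quoted text and a pointer to its source.

```python
import sqlite3

# Hypothetical schema: one term has many definitions,
# and each definition has many citations to sources.
SCHEMA = """
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE source (
    source_id   INTEGER PRIMARY KEY,
    publication TEXT NOT NULL,
    authors     TEXT,
    pub_date    TEXT,
    url         TEXT
);
CREATE TABLE definition (
    definition_id INTEGER PRIMARY KEY,
    term_id       INTEGER NOT NULL REFERENCES term(term_id),
    expansion     TEXT NOT NULL,
    UNIQUE (term_id, expansion)
);
CREATE TABLE citation (
    citation_id   INTEGER PRIMARY KEY,
    definition_id INTEGER NOT NULL REFERENCES definition(definition_id),
    source_id     INTEGER NOT NULL REFERENCES source(source_id),
    page          TEXT,
    quote         TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Create the tables in a new database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_citation(conn, term, expansion, publication, quote,
                 page=None, url=None, pub_date=None):
    """Record one cited definition, creating the term and definition if needed."""
    conn.execute("INSERT OR IGNORE INTO term(name) VALUES (?)", (term,))
    term_id = conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (term,)).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO definition(term_id, expansion) VALUES (?, ?)",
        (term_id, expansion))
    definition_id = conn.execute(
        "SELECT definition_id FROM definition WHERE term_id = ? AND expansion = ?",
        (term_id, expansion)).fetchone()[0]
    source_id = conn.execute(
        "INSERT INTO source(publication, pub_date, url) VALUES (?, ?, ?)",
        (publication, pub_date, url)).lastrowid
    conn.execute(
        "INSERT INTO citation(definition_id, source_id, page, quote) "
        "VALUES (?, ?, ?, ?)",
        (definition_id, source_id, page, quote))


def lookup(conn, term):
    """Return (expansion, publication, page, quote, url) rows for a term."""
    return conn.execute(
        """SELECT d.expansion, s.publication, c.page, c.quote, s.url
           FROM term t
           JOIN definition d ON d.term_id = t.term_id
           JOIN citation c ON c.definition_id = d.definition_id
           JOIN source s ON s.source_id = c.source_id
           WHERE t.name = ? COLLATE NOCASE
           ORDER BY d.expansion""",
        (term,)).fetchall()
```

A query for RAID would then return one row per citation, so two competing expansions show up as separate rows, each with its own quoted evidence. The ordering here is alphabetical only as a placeholder.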

There could be many answers to a query. The answers should be given best-first: the definition of RAID that is most authoritative ought to be listed first. Maybe this notion of "authority" would be based on date (recency) or on the source (BYTE magazine might be more authoritative than People magazine).
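As a rough sketch of best-first ordering, the score below multiplies a per-source weight by a recency factor that halves every two years. The weights, the half-life, and the names (SOURCE_WEIGHT, authority, rank) are all made up for illustration; how authority should really be measured is an open design question.

```python
from dataclasses import dataclass
from datetime import date

# Made-up weights, for illustration only; unknown sources get DEFAULT_WEIGHT.
SOURCE_WEIGHT = {"BYTE": 0.9, "People": 0.2}
DEFAULT_WEIGHT = 0.5


@dataclass(frozen=True)
class Citation:
    expansion: str
    source: str
    published: date


def authority(citation, today, half_life_days=730):
    """Source weight scaled by a recency factor that halves every half_life_days."""
    age_days = max((today - citation.published).days, 0)
    recency = 0.5 ** (age_days / half_life_days)
    return SOURCE_WEIGHT.get(citation.source, DEFAULT_WEIGHT) * recency


def rank(citations, today):
    """Return the citations ordered from most to least authoritative."""
    return sorted(citations, key=lambda c: authority(c, today), reverse=True)
```

With these numbers an older BYTE citation can still outrank a fresh People citation, which matches the intuition that the source should usually matter more than a few months of age.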

Ultimately, I would like the system to read and digest articles on its own. So a user who discovers an article with lots of good buzzwords in it could tell the system where to find the article, and then the data in the article could be extracted automatically and added to the database. It might be wise not to make the database update fully automatic, but to put the new data in a "hold" status until some human (database administrator) reviews it and says "OK." Otherwise malicious users might insert spurious information.
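The extract-then-hold idea might look something like the sketch below. It uses one simple heuristic, which is an assumption and certainly not the only way to do it: look for a parenthesized all-caps token such as "(RAID)" and walk backward through the preceding words, matching their initials against the acronym. The names extract_candidates, expansion_before, ReviewQueue, and the STOPWORDS list are hypothetical.

```python
import re

ACRONYM_IN_PARENS = re.compile(r"\(([A-Z]{2,10})\)")
STOPWORDS = {"of", "and", "the", "for", "in", "on"}


def expansion_before(words, acronym):
    """Walk backward through words, matching initials from right to left.

    Stopwords inside the phrase are kept but do not consume a letter.
    Returns the matched phrase, or None if the initials do not line up.
    """
    needed = list(acronym)
    taken = []
    for word in reversed(words):
        if not needed:
            break
        if word[0].upper() == needed[-1]:
            needed.pop()
            taken.append(word)
        elif word.lower() in STOPWORDS and taken:
            taken.append(word)
        else:
            return None
    if needed:
        return None
    return " ".join(reversed(taken))


def extract_candidates(text):
    """Find 'long form (ACRONYM)' patterns in text as (acronym, expansion) pairs."""
    found = []
    for match in ACRONYM_IN_PARENS.finditer(text):
        acronym = match.group(1)
        words = re.findall(r"[A-Za-z][A-Za-z-]*", text[:match.start()])
        expansion = expansion_before(words, acronym)
        if expansion is not None:
            found.append((acronym, expansion))
    return found


class ReviewQueue:
    """Extracted entries wait in 'pending' until an administrator decides."""

    def __init__(self):
        self.pending = []
        self.approved = []

    def submit(self, text, source):
        """Extract candidates from text and hold them for review."""
        for acronym, expansion in extract_candidates(text):
            self.pending.append(
                {"term": acronym, "expansion": expansion, "source": source})
        return len(self.pending)

    def approve(self, index):
        """Move one pending entry into the approved set."""
        self.approved.append(self.pending.pop(index))

    def reject(self, index):
        """Discard one pending entry."""
        self.pending.pop(index)
```

Nothing reaches the approved list without an explicit approve call, so a spurious submission costs the administrator a rejection but never shows up in query results.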

Ultimately there would be web crawlers that go out looking for specific terms, or for likely sources to extract terms from. Existing library database interfaces might be good sources for the crawlers to mine, as might the Web sites of high-tech companies.

Queries that can't be answered ought to be remembered, so that the system can watch for them in sources that it reads later.
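A minimal sketch of that bookkeeping, assuming single-word terms and a hypothetical WatchList class: each failed lookup is counted, and every newly read text is checked against the terms still being watched.

```python
import re
from collections import Counter


class WatchList:
    """Remembers terms that users asked about but the database could not answer."""

    def __init__(self):
        self.misses = Counter()

    def record_miss(self, term):
        """Count one unanswered query for this term (case-insensitive)."""
        self.misses[term.upper()] += 1

    def found_in(self, text):
        """Return the watched terms that appear as whole tokens in text."""
        tokens = {t.upper()
                  for t in re.findall(r"[A-Za-z0-9][A-Za-z0-9-]*", text)}
        return sorted(term for term in self.misses if term in tokens)

    def resolve(self, term):
        """Stop watching a term once a definition has been added for it."""
        self.misses.pop(term.upper(), None)
```

The miss counts could also tell the administrator which terms are most in demand. Multi-word buzzwords would need a different matching step than the token lookup used here.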

Ultimately the interface should be an HTML form.

Now, a lot of this is probably beyond the scope of a one-month project. So for purposes of this course, the emphasis would be on the database design and implementation, populating it with enough real data for test purposes (maybe from existing bibliographies or from text lists of acronym definitions, etc.), and demonstrating it with a relatively simple user interface.