BIO 190—Scientific Communication I

Finding resources on the World Wide Web

The World Wide Web has been called the “largest library in the world”, but it is also the worst-organized. As in a physical library, new items are always being added, but items are also being removed, and other items are having their contents changed. An advantage of the Web is that its content is available from anywhere: you don’t have to go to a single location and read a physical book or journal. A disadvantage is that the content is much harder to find. (For more information about the Web and finding biology resources, check out the assignment from BIO 256.)

There are two basic approaches to locating information on the Web. The first is often called a subject directory and the second a keyword index. A subject directory is put together by people, who evaluate each web page and categorize it by subject. There may be a whole team of people involved, as at Yahoo!, or a single person, such as Steve Wolf at CSUBioWeb. A keyword index, such as Google, is a searchable list of words that occur in web pages. It is assembled by a computer program and requires some skill on your part to formulate a search, but it includes far more pages than a subject directory.
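To make the idea of a keyword index concrete, here is a minimal sketch in Python. It is only an illustration, not how Google or any real index is built: the page names and texts in `PAGES` are made up, and the functions `build_index` and `lookup` are hypothetical names chosen for this example.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a spider would fetch from the Web.
PAGES = {
    "page1.html": "Protease inhibitor therapy for HIV",
    "page2.html": "Lymphoma is a cancer of the lymphatic system",
    "page3.html": "A protease is an enzyme that breaks down protein",
}

def build_index(pages):
    """Map each lowercase word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def lookup(index, word):
    """Return the pages containing the word, sorted by name."""
    return sorted(index.get(word.lower(), set()))

index = build_index(PAGES)
print(lookup(index, "protease"))   # ['page1.html', 'page3.html']
print(lookup(index, "lymphoma"))   # ['page2.html']
print(lookup(index, "horoscope"))  # []
```

Notice that the program never judges what a page is about; it only records which words occur where. That is why a keyword index can cover so many pages, and also why it cannot tell a good page from a bad one.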

The table below will help you to compare them:

| Feature | Subject Directory | Keyword Index |
| --- | --- | --- |
| Creator | People, who choose to include links to web pages based on their own, sometimes expert, judgment of content. | Computer programs, often called “spiders” or “robots”, that go from page to page making an index of all the words. |
| Arrangement | By subject; often hierarchic. | Random access. |
| Ease of finding information if you aren’t familiar with the subject | Moderate to high, depending on how good a job the creators do. | Low. |
| Likelihood of including every relevant page | Low. | High, if you have chosen your search terms well. |
| Likelihood of every page being relevant | Moderate to high, depending on how good a job the creators do. | Low; even with good search terms, there will always be irrelevant pages. |
| Quality of the pages recovered | Moderate to high, depending on how good a job the creators do. | Low to high; there is no quality control at all. |

Metasearch engines provide an alternative approach to keyword indexes. A metasearch engine sends a query to several different keyword indexes and then combines the results.
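The following Python sketch shows one way a metasearch engine might merge results. Everything in it is an assumption made for illustration: `engine_a` and `engine_b` are stand-ins that return fixed lists of made-up addresses rather than contacting real indexes, and the merging rule (pages found by more engines come first) is just one plausible choice, not the method of any particular metasearch service.

```python
# Hypothetical keyword indexes: each takes a query and returns a ranked list
# of page addresses. Real engines would search the Web; these return fixed lists.
def engine_a(query):
    return ["a.example/1", "shared.example/x", "a.example/2"]

def engine_b(query):
    return ["shared.example/x", "b.example/1"]

def metasearch(query, engines):
    """Send the query to every engine and merge the ranked lists.

    Pages returned by more engines come first; ties are broken by the best
    rank the page received from any engine, then alphabetically.
    """
    votes = {}
    best_rank = {}
    for engine in engines:
        for rank, url in enumerate(engine(query)):
            votes[url] = votes.get(url, 0) + 1
            best_rank[url] = min(best_rank.get(url, rank), rank)
    return sorted(votes, key=lambda u: (-votes[u], best_rank[u], u))

results = metasearch("protease inhibitor", [engine_a, engine_b])
print(results)
# ['shared.example/x', 'a.example/1', 'b.example/1', 'a.example/2']
```

The page that both engines returned appears once, at the top, which is the main benefit of a metasearch: duplicates are folded together and agreement between indexes is treated as a sign of relevance.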

If you are looking for information about a subject you are not familiar with, it is best to start with a subject directory. Yahoo! is still probably the best, although Open Directory Project and Google Web Directory are strong contenders. Don’t use the built-in “search” feature, which does a keyword search on either the subject directory or the entire Web. Instead, look under Science or Health in the subject hierarchy. Try different approaches; sites about a subject might be categorized in more than one place.

One of the big advantages of Yahoo! (by the way, the “!” is part of its name) is that it provides links to other, specialized subject directories, many of which are assembled by individuals with far more expertise in a specific field than anyone at Yahoo!. Go to these for more specific information; it’s like using the bibliography of a review paper.

Once you have run through all the possibilities with subject directories, it’s time to try the keyword indexes or metasearch engines. The key to using these successfully is careful choice of search words. Typing in something like “AIDS” or “cancer” will get hundreds of thousands or millions of “hits”, few of which will be useful. On the other hand, “protease inhibitor” or “lymphoma” will give far fewer hits, but they are more likely to be useful. Sometimes a term will be so obscure that you will get nothing at all.

Most indexes let you enter a phrase in quotes (protease inhibitor would give every page with either word, but “protease inhibitor” would only give pages where they appear as a phrase). Also, a plus sign before a word tells the index that the word must appear on the page, and a minus sign says that it must not (for example, “Vitamin C” +cancer -horoscope would be more likely to give pages about the use of Vitamin C in treating cancer, and less likely to point to pages about dietary supplements for people born under the astrological sign of Cancer).
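The rules for quotes, plus signs, and minus signs can be sketched as a small Python program. This is a simplified model written for this example, not the actual behavior of any specific index: the function names `normalize`, `parse_query`, and `matches` are hypothetical, the sample page texts are made up, and the code expects straight quotation marks (") around phrases.

```python
import re

def normalize(text):
    """Lowercase the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def parse_query(query):
    """Split a query into quoted phrases, +required, -excluded, and plain words."""
    phrases = [normalize(p) for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]+"', " ", query)  # what is left once phrases are removed
    required, excluded, optional = [], [], []
    for token in rest.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return phrases, required, excluded, optional

def matches(page_text, query):
    """Decide whether a page satisfies the query under the simplified rules."""
    text = normalize(page_text)
    words = set(text.split())
    phrases, required, excluded, optional = parse_query(query)
    if any(w in words for w in excluded):
        return False
    if not all(w in words for w in required):
        return False
    # Pad with spaces so a phrase only matches on whole-word boundaries.
    if not all(f" {p} " in f" {text} " for p in phrases):
        return False
    if not phrases and not required:
        # Plain words only: a page with any one of them is a hit.
        return any(w in words for w in optional)
    return True

query = '"Vitamin C" +cancer -horoscope'
print(matches("High-dose vitamin C as a cancer treatment: a review", query))  # True
print(matches("Your Cancer horoscope: take vitamin C this month", query))     # False
print(matches("Protease structure", "protease inhibitor"))                    # True
print(matches("Protease structure", '"protease inhibitor"'))                  # False
```

The last two lines show the difference quotes make: without them, a page containing either word is a hit; with them, both words must appear together as a phrase.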

Virtually all the indexes have “advanced search” options. These work differently from one index to another, but you will always get better results if you take the time to figure out the advanced search (all of them have help).

Every year, new search engines are introduced, and older ones are changed or sometimes eliminated. Currently, my favorite is Google, but it’s nice to have lists of other search engines, to keep track of what’s new and to try alternatives if your favorite isn’t giving you the results you need. Here are some places to look:

Citation: Clark, Curtis. 2001. BIO 190 - Finding resources on the World Wide Web. California State Polytechnic University, Pomona, /~jcclark/classes/bio190/web.html.


These are official class materials of BIO 190 as taught at California State Polytechnic University, Pomona, by Curtis Clark. They are subject to change without notice to anyone but students currently enrolled in the class.

Summer Quarter, 2001
© 2001 by Curtis Clark