SEARCH ENGINE BASICS
A search engine maintains a database of resources collected from the
Internet by an automated "crawling" process, and lets users search that
database with queries.
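As a rough illustration of the crawling process, the sketch below follows
links from page to page and stores each page it reaches. It is a minimal
sketch, not how any particular search engine works: the PAGES dictionary is a
made-up stand-in for the Internet, and the names crawl and LinkParser are
assumptions introduced for this example. A real crawler would fetch pages over
the network.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": address -> HTML source (illustration only).
PAGES = {
    "a.example": '<title>Home</title><a href="b.example">B</a><a href="c.example">C</a>',
    "b.example": '<title>About</title><a href="a.example">A</a>',
    "c.example": "<title>Contact</title>",
}


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, pages):
    """Visit pages breadth-first from `start`; return address -> HTML."""
    database = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        # Skip pages already stored and addresses that do not exist.
        if url in database or url not in pages:
            continue
        database[url] = pages[url]
        parser = LinkParser()
        parser.feed(pages[url])
        queue.extend(parser.links)
    return database


database = crawl("a.example", PAGES)
```

Starting from a single address, the crawl reaches every page that is linked,
directly or indirectly, from the starting page. Pages that nothing links to
are never found.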
How does a search engine work?
Words or phrases you enter in the search box are matched to resources in the
search engine's database that contain your terms. These are then
automatically sorted by their probable relevance and presented with the most
"relevant" sites appearing first.
How search results are organized
Once a search engine has used your search terms to gather "hits" from
its database, it lists or "ranks" the resulting sites in order of its own
estimation of their relevance. The procedures and factors used to create
this ranking are often company secrets, so understanding exactly why one hit
is listed higher than another is difficult.
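The hit-gathering step described above is commonly built on an inverted
index, a table that maps each word to the pages containing it. The sketch
below is one simple way to do this, under stated assumptions: the function
names build_index and search are hypothetical, words are split on whitespace
only, and a page counts as a hit only if it contains every search term. It
does not rank the hits.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first term's pages, then narrow down with each further term.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Made-up sample pages for illustration.
documents = {
    "a.example": "how a search engine works",
    "b.example": "engine repair for cars",
    "c.example": "search tips for any search engine",
}
index = build_index(documents)
hits = search(index, "search engine")
```

Here hits holds a.example and c.example. The page b.example is left out
because it contains "engine" but not "search".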
The following is a survey of some of the factors search engines use to
automatically sort web sites for presentation to the user.
Relevance Prediction
Currently, search engines predict relevance based on two sets of factors:
those based on a site's content and those external to the site.
Factors based on a web site's content
- Word frequency (How often do the search terms occur in a page relative to
  the rest of the text?)
- Location of search terms in the document (Are they in the title? Are they
  near the top of the page?)
- Relational clustering (How many pages in the site contain the search
  terms?)
- The site's design (Does it use frames? How fast does it load?)
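To make the first two content factors concrete, the sketch below scores a
page by word frequency and by the location of the search terms. Everything
specific in it is an assumption made for illustration: the function name
content_score, the bonus values, and the choice of the first 50 words as "near
the top" are invented here. Actual ranking formulas are generally not
published.

```python
def content_score(title, body, terms, title_bonus=2.0, top_bonus=1.0, top_words=50):
    """Score a page for the given terms using content-based factors only.

    The weights are arbitrary illustration values, not real engine settings.
    """
    words = body.lower().split()
    title_words = set(title.lower().split())
    top = set(words[:top_words])  # words counted as "near the top of the page"

    score = 0.0
    for term in (t.lower() for t in terms):
        # Word frequency: occurrences relative to the total amount of text.
        if words:
            score += words.count(term) / len(words)
        # Location: reward terms found in the title or near the top.
        if term in title_words:
            score += title_bonus
        if term in top:
            score += top_bonus
    return score


focused = content_score("Search engines", "search engines crawl the web", ["search"])
passing = content_score("Cooking", "recipes that mention search once at the end", ["search"])
```

With these made-up weights, the page that carries the term in its title scores
higher (focused) than the page that only mentions it in passing (passing).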
Factors external to the site
- Link popularity -- Sites with more links pointing to them are prioritized.
- Click popularity -- Sites visited more often are prioritized.
- "Sector" popularity -- Sites visited by certain demographic or social
  groups are prioritized (Note: This system requires user-provided
  information).
- Business alliances among services -- Results from a partner search service
  are ranked higher.
- Pay-for-placement rankings -- Site owners pay for high rankings.
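Of the external factors above, link popularity is the simplest to illustrate.
The sketch below counts how many other pages link to each page and orders a
list of hits by that count. It is a simplified assumption, not a description
of any real engine: the names link_popularity and rank are hypothetical, every
link counts equally, and a page's links to itself are ignored.

```python
from collections import Counter


def link_popularity(links):
    """Count inbound links per page.

    `links` maps each page to the list of pages it links to.
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


def rank(hits, links):
    """Order hits by inbound link count, most linked first; ties by name."""
    popularity = link_popularity(links)
    return sorted(hits, key=lambda page: (-popularity[page], page))


# Made-up link structure for illustration.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["c.example"],
}
ordered = rank(["a.example", "b.example", "c.example"], links)
```

In this example c.example comes first because two other pages link to it,
followed by b.example with one inbound link and a.example with none.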