Monday, January 1, 2007

02. Glean a Snapshot of Google in Time

Google Zeitgeist provides a weekly, monthly, and yearly overview of what the Web was interested in.

Turning to Google itself for a definition of zeitgeist (define:zeitgeist), there's consensus that it refers to "the spirit of the times." And Google Zeitgeist (http://www.google.com/press/zeitgeist.html) is just that: a mirror that the Web (according to Google) holds up to us, providing a snapshot of the week, month, or year that was.

A typical weekly Google Zeitgeist, shown in Figure 1-8, lists the top 15 gaining queries.

Figure 1-8. The week's top 15 gaining queries

It takes only a few moments of visiting Google Zeitgeist before you're itching to go back a little further in time: the week your second child was born, the month during which the Olympics were held, the year you graduated from high school. Click the Archive link to choose any year from the Google Zeitgeist Archive and display links such as those shown in Figure 1-9 for every week, month, and year since January 2001.

Weekly Zeitgeist updates actually started in June 2001, at the same time the monthlies switched from PDF to HTML format. In August 2005, Google stopped listing declining queries and started listing 5 more of the top gaining queries, bringing the total to 15.


Figure 1-9. The Zeitgeist Archive pages, displaying weekly, monthly, and year-end reports dating back to 2001

Monthly reports provide some information about Google News queries and Google Image Search queries, and you can find monthly reports for countries around the world by clicking the Zeitgeist Around the World link on the front page. Year-end reports provide even more detail with trend graphs.

While Google Zeitgeist's statistics aren't earth-shattering (e.g., searches for iraq more than doubled on March 19, 2003, the date that Operation Iraqi Freedom began; imagine that!), it does provide a snapshot of what the world in aggregate found interesting enough to look up.

See Also

  • If Google Zeitgeist piques your interest, you might also try the Yahoo! Buzz Index (http://buzz.yahoo.com), a similar collection of statistics about popular Yahoo! searches: the day's top movers (overall and by various Yahoo! categories), the most viewed and emailed Yahoo! news items, and a market trend-like chart (click the View Complete Chart... link associated with any of the buzz listings on the front page) of leaders and movers, according to buzz score (http://help.yahoo.com/help/us/buzz/#buzz-04).

  • Google Trends (http://www.google.com/trends) is a new product from Google Labs that graphs the mentions of words or phrases over time. Type in two words separated by a comma to get a quick visual sense of their relative popularity. For example, "Google, Yahoo" shows you which search engine is mentioned more across time, regions, news stories, and languages.

01. Browse the Google Directory

Google's Web Search indexes billions of pages, which means it isn't suitable for all searches. When you have a search that you can't narrow down (for example, if you're looking for information on a person about whom you know nothing), billions of pages will get very frustrating very quickly.

But you don't have to limit your searches to the Web. Google also has a searchable subject index, the Google Directory, at http://directory.google.com. Rather than indexing the entirety of billions of pages, the directory describes sites, indexing about five million URLs. This makes it a much better place to search for general topics.

Does Google spend time building a searchable subject index in addition to a full-text index? No, Google bases its directory on the Open Directory Project data at http://dmoz.org/. Unlike the results at the standard Google Web Search, the collection of URLs at the Open Directory Project is gathered and maintained by a group of human volunteers rather than automatic algorithms, but Google does add some of its own Googlish magic to it.

As you can see in Figure 1-6, the front of the site is organized into several topics. To find what you're looking for, you can either do a keyword search or drill down through the hierarchies of subjects.

Figure 1-6. The Google Directory

Beside most listings, as shown in Figure 1-7, you'll see a green bar. The green bar is an approximate indicator of the site's PageRank in the Google search engine. (Not every listing in the Google Directory has a corresponding PageRank in the Google web index.) Web sites are listed in the default order of Google PageRank, but you also have the option to list them in alphabetical order.

Figure 1-7. Individual listings under Science > Physics > Quantum Mechanics > People > Feynman, Richard

One thing you'll notice about the Google Directory is how the annotations and other information vary between categories. This is because the information in the directory is maintained by a small army of thousands of volunteers who are each responsible for one or more categories. For the most part, annotation is pretty good.

Searching Versus Browsing

There are two different kinds of shoppers, and they illustrate the difference between searching and browsing. Some shoppers know exactly what they're after, and they want to find a store with the item, locate the item, and purchase it as quickly as possible. As with a web search, it helps to know a bit about what you're looking for if this is your style.

Other shoppers want to explore a particular store, see what the store offers, and choose an item if the right one comes along. This style of browsing is suited for people who want to get a larger survey of items in a particular category before they necessarily know what they're looking for.

If you were interested in looking at sites about child psychology, you might try a search at http://search.google.com with the query child psychology. You would find thousands of sites in the search results, along with news articles about child psychology, college papers about the topic, and even pages that mention the terms child and psychology without relating to the topic. But browsing the Child Psychology category in the Google Directory (http://directory.google.com/Top/Science/Social_Sciences/Psychology/Child_Psychology/) gives you hundreds of links selected by Open Directory volunteers as being relevant to the topic.
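The category address above follows a visible pattern: the hierarchy of subjects becomes the path, with spaces replaced by underscores. As a rough sketch, and assuming that pattern holds for other categories (it is inferred from this one example, and `category_url` is a hypothetical helper name), a category URL can be assembled like this:

```python
from urllib.parse import quote

# Base address taken from the Child Psychology example above.
BASE = "http://directory.google.com/Top/"

def category_url(path):
    """Build a directory URL from a list of category names.

    Assumption: spaces in category names become underscores,
    as in the single example URL shown in the text.
    """
    segments = [quote(name.replace(" ", "_")) for name in path]
    return BASE + "/".join(segments) + "/"

print(category_url(["Science", "Social Sciences", "Psychology", "Child Psychology"]))
```

Run as written, this prints the same Child Psychology address quoted in the paragraph above.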

There are still times when you need to search the directory, and Google has provided a couple of ways to accomplish this.

Searching the Google Directory

Because the Google Directory is a far smaller collection of URLs, ideal for more general searching, it does not have the various complicated special syntaxes for searching that the Web Search does. However, there are a couple of special syntaxes that you should know about:


intitle:

Just like the Google web special syntax, intitle: restricts the query word search to the title of a page.


inurl:

inurl: restricts the query word search to the URL of a page.
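To make the two syntaxes concrete, here is a minimal sketch that composes a query string using them and URL-encodes it for use in a link. The helper name `directory_query` and the sample terms are illustrative assumptions; the sketch deliberately leaves out the address of the search form, which is not given here.

```python
from urllib.parse import quote_plus

def directory_query(terms, intitle=None, inurl=None):
    """Join plain keywords with optional intitle: and inurl: restrictions."""
    parts = list(terms)
    if intitle:
        parts.append(f"intitle:{intitle}")  # match only in page titles
    if inurl:
        parts.append(f"inurl:{inurl}")      # match only in page URLs
    return " ".join(parts)

# Hypothetical example terms, chosen only for illustration.
query = directory_query(["wodehouse"], intitle="jeeves")
print(query)              # wodehouse intitle:jeeves
print(quote_plus(query))  # wodehouse+intitle%3Ajeeves
```

The encoded form is what would appear after a `q=`-style parameter in a search URL; the plain form is what you would type into the search box.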

When you're searching on Google's web index, your overwhelming concern is probably how to reduce your list of search results to something manageable. With that in mind, you might start with the narrowest possible search.

That's a reasonable strategy for the web index, but because you have a narrower pool of sites in the Google Directory, you want that search to be more general.

For example, say you were looking for information on author P. G. Wodehouse. A simple search on P. G. Wodehouse in Google's web index gets you over 1,100,000 results, possibly compelling you to immediately narrow your search. But doing the same search in the Google Directory returns only 176 results. You might consider that a manageable number of results, or you might want to carefully narrow your results further.

The Directory is also good for searching for events. A Google web search for Korean War will find over 24 million results, while searching the Google Directory will find just over 138,000. This is a case where you will probably need to narrow your search. Use general words indicating what kind of information you want: timeline, for example, or archives, or lesson plans. Don't narrow your search with names or locations; that's not the best way to use the Google Directory.
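The narrowing advice above can be sketched as a small helper that pairs a topic with general words describing the kind of information wanted. The function name `narrow_by_kind` is hypothetical, and the default words are simply the three mentioned in the text.

```python
def narrow_by_kind(topic, kinds=("timeline", "archives", "lesson plans")):
    """Return one query per general word, each appended to the topic."""
    return [f"{topic} {kind}" for kind in kinds]

for query in narrow_by_kind("Korean War"):
    print(query)
```

Each resulting query keeps the topic broad while stating the type of resource, rather than adding names or locations.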