News Blog
The official blog from the team at Google News
Google News Comes Back For More
Monday, January 25, 2010
Posted by Andy Golding and Kiran Gunda, Software Engineers
If you read news online, you've probably noticed that articles aren't static. They often change over time to reflect typo fixes, shifts in emphasis, new information or corrections of previous mistakes. Sometimes they even switch URLs, or become unavailable after a certain period of time. For a human reader who gets through a few dozen articles a day at most, this is no big deal.
But if you happen to be, say, a news search engine that crawls hundreds of articles at thousands of sites every minute, this presents a unique set of challenges. How do you balance looking for new content against the need to update older content? How can you make sure the content is fresh, and that you don't link to dead pages or display headlines the publisher has since changed?
To deal with these issues, Google News has implemented a recrawl feature that lets us focus on getting the newest articles while still ensuring that we're displaying the most up-to-date information. From the moment we discover a new article, we keep revisiting it to look for changes. Since we've noticed that most changes to articles occur just after they're published, we revisit articles most frequently in the first day after we've found them. After that, we visit them less often. In some cases, we'll even revisit articles we had trouble crawling the first time around. Either way, we try hard to present users with the freshest news. (We bet whoever wrote "Dewey Defeats Truman" wishes they had recrawl!)
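To make the idea concrete, here is a minimal sketch of how a revisit schedule like the one described above could look. It is only an illustration: the function name `next_recrawl` and every interval below are assumptions chosen for the example, not the values or the code Google News actually uses.

```python
from datetime import datetime, timedelta

# All intervals are hypothetical; the post does not state real values.
FIRST_DAY = timedelta(days=1)             # window with the most frequent revisits
FIRST_DAY_INTERVAL = timedelta(hours=1)   # assumed gap between revisits on day one
LATER_INTERVAL = timedelta(hours=12)      # assumed gap once the article is older
RETRY_INTERVAL = timedelta(minutes=10)    # assumed gap after a failed crawl


def next_recrawl(discovered_at, last_crawl_at, last_crawl_ok=True):
    """Return the time at which an article should be visited again."""
    if not last_crawl_ok:
        # Articles that were hard to crawl get another attempt soon.
        return last_crawl_at + RETRY_INTERVAL
    age = last_crawl_at - discovered_at
    # Revisit often during the first day after discovery, less often afterwards.
    interval = FIRST_DAY_INTERVAL if age < FIRST_DAY else LATER_INTERVAL
    return last_crawl_at + interval
```

With these assumed numbers, an article discovered and crawled at 9:00 would be scheduled again for 10:00, while the same article crawled two days later would wait twelve hours before its next visit.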
For readers, this feature is intended to reduce the number of outdated headlines and dead links you might find. And for publishers, rest assured that we'll be back to find your latest stories and updates as soon as we can.