Google Newspaper Archive (news.google.com)
124 points by jervisfm on March 16, 2014 | hide | past | favorite | 26 comments


I've done a lot of genealogy work for my family name and have used http://newspaperarchive.com/ extensively. In comparing a few quick searches, the Google Newspaper Archive is not even comparable. We're talking 2 irrelevant hits for Google Newspaper Archive vs. thousands of relevant hits for newspaperarchive.com.

And newspaperarchive.com only has a fraction of the newspapers in the country within their records. There is definitely a lot of room for improvement in this space because it's such a large task.


You know you are comparing apples to oranges? Google's archive is free and open.


> You know you are comparing apples to oranges? Google's archive is free and open.

I'm not comparing Apples and Oranges. At worst, what I'm doing is comparing store-bought Oranges in very good condition to free Oranges that fell off a truck two weeks ago, that you could pick up off the side of the road, and that are halfway to rotten.


This is incredible! The search function works well, which means they've OCR'd the papers. Is there a way of grabbing this text? I've not seen anything obvious.

Also the "link to this article" doesn't seem to work for me, although the search had taken me to the article just fine.


The search/OCR seems patchy. I tried a few (presumably) unique phrases from some and the article wasn't found.

For example with:

http://news.google.com/newspapers?nid=PQY3Tb_h0-cC&dat=19111...

I tried:

"marshalling that unspeakable parade" (wonderful phrase!)

another dull and listless session

cattle prices high granby quebec

and various other phrases from the home page both with and without quotes. Nothing returned the edition in question.


Holy shit, this is awesome. Lots of papers. LOTS. Even local papers. And the resolution is good!


Just give Google 5 years to shut down the service.


I was under the impression that it had been shut down.

They announced in 2011 that they would no longer add new content to the archive (the last new addition had been in 2009), which I took as indicating a future phase-out: http://www.pcmag.com/article2/0,2817,2385664,00.asp

Then in 2013 they removed the archive-search functionality from Google News (these newspapers used to come up in Google News if you chose "archive" or an old enough date range). The direct URL to the archive search also stopped working a bit after that; if you go to http://news.google.com/archivesearch you now just get redirected to the main Google News homepage.

I assumed that was the completion of the phase-out, and that it was no longer available. Didn't know about the new URL. Cool that it's still online. I hope they plan to restart/reintegrate the project at some point, but at least leaving it in a frozen-but-accessible state is still useful.


5 years of free access to a massive archive of high quality newspaper scans (with decent OCR and a search)? What bastards!


Google may shut it down, but it won't throw away the data. Nobody would do this.


Talk to the Archive Team about that...


Is there a way to download all of it?


I'd like to know this too. I can't find a way. Maybe I'll write a script to scrape it.


Oh god, shut up already. You start the service and keep it going indefinitely then.


Here he is, folks. I was just about to open a book on how long before the kneejerk Google shutdown guy arrived.


Just checked it out and unfortunately landed on an edition of The Times from 1804. The paper was filled with classifieds announcing rewards for returning lost slaves, and the casual manner of those ads made me lose my appetite for browsing further... very different times they were...


Does anyone know how to submit a newspaper to this archive? I have all 51 editions of a now closed newspaper in PDF format and it would be lovely to find them a home here...


They announced in 2011 that they were no longer going to be updating the archive, so I would guess there isn't a way to do so: http://www.pcmag.com/article2/0,2817,2385664,00.asp

You might try archive.org? https://archive.org/details/newspapers


Please submit to the Internet Archive!


Thanks all for the suggestions. I shall do just that!


If you haven't contacted them yet, archive.org might be interested in hosting a copy.


Slightly related, but does anyone know where to get the equivalent of news.google.com or news.yahoo.com, but with more than 30 days of history? Ideally several years' worth.

Lexis/Nexis appears to only cover print news, and their articles aren't timestamped.


Google News used to do that (and this newspaper archive was part of it, along with some others), but they seem to have pivoted towards only current news. Not entirely sure why. During the time that it had that functionality, I often found it useful.


The Google logo at the top appears misaligned for me. Also when I click it, it redirects to a 404. Nonetheless very cool archive.


I wonder if there is an easy way of downloading this and OCRing it. I would love to use this as training material for some ML algos.


Jesus, this is fantastic. As others have pointed out, the OCR isn't so hot, but you should be able to nab topics and names.



