Downloading more than 20 years of The New York Times

Articles from 1987 to the present are available without a subscription, and the paper's copyright notice is friendly to web scraping:

“… you may download material from The New York Times on the Web (one machine readable copy and one print copy per page) for your personal, noncommercial use only.”

Why waste the opportunity to download these articles then?


Please read their terms of service here.
Please subscribe to The New York Times here.

Next time, I’ll modify the code so you can download articles from some other major online newspaper.

2 thoughts on “Downloading more than 20 years of The New York Times”

  1. With a few small edits, you could add the ability to download comments/commenter info. It seems that although the Community API has been deprecated, it still works fine.


    • Although the Article Search API won’t download the full text, it might be useful for finding abstracts and keywords for the articles. My point is that the API is a great idea if you want to get some additional data.
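To make that point concrete, here is a rough Python sketch of pulling abstracts and keywords through the Article Search API. It is not the code from the post. The endpoint URL, the `api-key` parameter, and the response layout (`response.docs` with `web_url`, `abstract`, and `keywords` entries) are assumptions based on version 2 of the API and should be checked against the current documentation. The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the NYT Article Search API (v2); verify before use.
SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key, page=0):
    """Return the request URL for one page of search results."""
    params = {"q": query, "page": page, "api-key": api_key}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def extract_metadata(payload):
    """Pull URL, abstract and keyword values out of a decoded JSON response.

    Assumes the layout payload["response"]["docs"], where each doc may carry
    "web_url", "abstract" and a "keywords" list of {"name", "value"} dicts.
    Missing fields become None (or an empty list for keywords).
    """
    records = []
    for doc in payload.get("response", {}).get("docs", []):
        records.append({
            "url": doc.get("web_url"),
            "abstract": doc.get("abstract"),
            "keywords": [kw.get("value") for kw in doc.get("keywords", [])],
        })
    return records


def search(query, api_key, page=0):
    """Fetch one page of results (needs network access and a valid key)."""
    with urllib.request.urlopen(build_search_url(query, api_key, page)) as resp:
        return extract_metadata(json.load(resp))
```

Calling `search("vatican", "YOUR_KEY")` would return a list of small dictionaries, one per article, that can be joined to the downloaded full text by URL. The parsing is kept separate from the network call so it can be tried on a saved response without spending API requests.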

