Data Is Plural – Structured Archive

jsvine · on May 30, 2019

Hi, author/creator here. Very neat to see this on HN, thanks. The spreadsheet is a byproduct of the weekly newsletter I publish: https://tinyletter.com/data-is-plural

The spreadsheet simply contains the text and links from each newsletter edition ... but in a tabular format. (One advantage to the newsletter over the spreadsheet: the links make a bit more sense, since they're associated with specific anchor text.) The "non-structured" archive of previous newsletters can be found here: https://tinyletter.com/data-is-plural/archive

Happy to answer any Qs!

danso · on May 30, 2019

I'm giving a talk on data journalism to journalists next week and I was already going to recommend they sign up for the newsletter. But I'm glad to be reminded how you track its content with a spreadsheet, which means I can mention you again when I talk about creative, useful use cases for spreadsheets.

cateye · on May 30, 2019

Nice resources.

Made me think that there is still a gap between web publishing tools (like MediaWiki, WordPress, Drupal) and online spreadsheets.

HTML pages are easier to open, and the layout options can provide better usability, but spreadsheets are more convenient to maintain.

There is also a danger that big companies like Google or Microsoft dominate the publishing tools and govern the content.

arthurdenture · on May 30, 2019

Origin: https://tinyletter.com/data-is-plural

arafalov · on May 30, 2019

I like that meta-dataset. Actual source (newsletters) is at: https://tinyletter.com/data-is-plural

I even used it, as well as one of the mentioned datasets (puppies! kittens!), to explain Apache Solr search engine features at ApacheCon 2018.

So, if you want to play with it, you may find my presentation useful: https://www.slideshare.net/arafalov/from-content-to-search-s...

Examples:
* Slide 45: how to find entries with maximum links
* Slide 59: merle puppy!

The matching GitHub repo is: https://github.com/arafalov/solr-apachecon2018-presentation (including the final configuration, if you are not into learning Solr step by step)

I also have an idea of taking datasets from there one by one and doing a series of blog posts on how to actually get them into Apache Solr and showcase advanced search functions. But it is a lot of effort to prepare, and I am not sure how many people would actually find it interesting.

jjuel · on May 30, 2019

Shouldn't it be 'Data are plural' then?

kelnos · on May 30, 2019

No. The verb modifies a sort of implied subject. Read it as "The word 'data' is plural", and it makes more sense.

Put another way, using a more standardly pluralized word, you would say "puppies is plural", not "puppies are plural".

akincisor · on May 30, 2019

Can we get a canonical url for this? It's a useful resource, but in its current form might disappear at any time.

jsvine · on May 30, 2019

That's an interesting suggestion, thanks. Certainly worth looking into. In the meantime, you can consider the newsletter's landing page to be a canonical URL of sorts — it will always link to the structured archive: https://tinyletter.com/data-is-plural

crazygringo · on May 30, 2019

Just open it in regular Docs form (replace /htmlview... with /edit):

https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4jucl...

and go to "File > Download as..." and pick your desired format!
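For anyone who wants to script that URL swap, here is a minimal Python sketch. It assumes the link ends in an /htmlview segment as described above; the function name and the SHEET_ID placeholder are made up for illustration and are not the real ID from this thread.

```python
import re


def to_edit_url(htmlview_url: str) -> str:
    """Swap the trailing /htmlview... part of a Google Sheets URL for /edit."""
    # Everything from "/htmlview" to the end of the string (query string,
    # fragment) is dropped and replaced with "/edit".
    edit_url, replaced = re.subn(r"/htmlview.*$", "/edit", htmlview_url)
    if replaced == 0:
        raise ValueError("URL does not contain an /htmlview segment")
    return edit_url


# SHEET_ID is a placeholder, not the actual spreadsheet ID.
example = "https://docs.google.com/spreadsheets/d/SHEET_ID/htmlview?usp=sharing"
print(to_edit_url(example))
# prints https://docs.google.com/spreadsheets/d/SHEET_ID/edit
```

It only rewrites the string; it does not check that the sheet exists or is shared publicly.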

mbauman · on May 30, 2019

It's a little funny that someone so invested in public and open datasets would embed their own data in a Google Spreadsheet.

danso · on May 30, 2019

Serious question: what do you think would be the better alternative? Especially when factoring in ease-of-upkeep (for the creator), convenient and familiar interface for the majority of users, Google's generally good uptime and server performance, and that a Google Sheet set to public access is not closed by a reasonable definition of that word.

I was going to quip about how I am annoyed that Google removed download-as-a-CSV as a URL endpoint, but it appears I misheard about this because /export?format=csv still turns the Google Sheet URL into a direct download link:

https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4jucl...
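A rough Python sketch of building that /export?format=csv link from any Google Sheets URL follows. The regular expression for the ID, the function name, and the SHEET_ID placeholder are my own assumptions for illustration, not anything documented in this thread.

```python
import re

# Assumption: the spreadsheet ID is the path segment right after
# "/spreadsheets/d/" and is made of letters, digits, "-" and "_".
SHEET_ID_PATTERN = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def csv_export_url(sheet_url: str) -> str:
    """Build the /export?format=csv link for a Google Sheets URL."""
    match = SHEET_ID_PATTERN.search(sheet_url)
    if match is None:
        raise ValueError("could not find a spreadsheet ID in the URL")
    sheet_id = match.group(1)
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv"


# SHEET_ID is a placeholder, not the actual spreadsheet ID.
print(csv_export_url("https://docs.google.com/spreadsheets/d/SHEET_ID/htmlview"))
# prints https://docs.google.com/spreadsheets/d/SHEET_ID/export?format=csv
```

The sketch makes no network request; the resulting link can be handed to whatever HTTP client or CSV reader you prefer, provided the sheet is shared publicly.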

mbauman · on May 31, 2019

I personally didn't find it easy to consume, and others here didn't either (see the confusion about what it is, asking for context, etc). The links aren't clickable, there are multiple newline-separated links in a single cell, etc.

I'm not just objecting to the use of Google; part of an open dataset is its ease of use.

As for alternatives, I agree, it's hard to rival Google Sheets in terms of the creator's time. But again, for someone invested in curating open datasets, I'd hope for a bit of time invested in curating their own data, even if it remains within Google Sheets.
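To illustrate the point about multiple newline-separated links in one cell, here is a small Python sketch that splits such cells into one row per link after a CSV download. The column names ("headline", "links") and the sample row are made up for the example and are not taken from the actual spreadsheet.

```python
import csv
import io


def explode_links(rows, link_field="links"):
    """Yield one row per link for cells holding newline-separated URLs."""
    for row in rows:
        # Split the cell on line breaks and drop blank entries.
        links = [part.strip() for part in row[link_field].splitlines() if part.strip()]
        # Note: a row whose link cell is empty yields nothing.
        for link in links:
            yield {**row, link_field: link}


# Made-up sample data; the real sheet's columns may differ.
sample_csv = (
    "headline,links\n"
    '"Example entry","https://example.com/a\nhttps://example.com/b"\n'
)

rows = list(csv.DictReader(io.StringIO(sample_csv)))
tidy = list(explode_links(rows))
for row in tidy:
    print(row["headline"], row["links"])
```

With one link per row, the data is easier to filter, count, or turn into clickable links in another tool.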

skadamat · on May 30, 2019

Super neat! This is a compilation of the datasets sent out in Jeremy Singer-Vine's Data Is Plural newsletter: https://tinyletter.com/data-is-plural

mcphage · on May 30, 2019

What is the context for this?

TimSchumann · on May 30, 2019

It looks like a curated list of various open, free-to-use datasets.

I agree though, while useful, it seems like this should be embedded in the context of a short write-up somewhere. Or, at least, have a title that's descriptive as opposed to what looks like a vague attempt at branding.

dangwu · on May 30, 2019

I've gone insane. I thought this was a bunch of links supporting the argument that the word 'data' is plural.

TimSchumann · on May 30, 2019

It's just poor marketing/branding to anyone not in the know. Much like the name of the current site we're on, though maybe that's what they're going for.

Titles like this are absolutely useless. I shouldn't have to click on a link and spend 45 seconds reading spreadsheet entries to figure out what it is I'm reading about.

danso · on May 30, 2019

FWIW, the Google Sheet has a "Notes" tab which explains the spreadsheet and its content: https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4jucl...

weatherlight · on May 30, 2019

This made me laugh out loud, thank you.

jsvine · on May 30, 2019

Thanks for flagging this confusion. It's helpful to hear how this is perceived. Short answer: The spreadsheet is, indeed, embedded in a broader context. (And has a pointer to that context in the "Notes" tab.) Slightly longer explanation in my main comment on this thread.

programmertote · on May 30, 2019

In the 'Notes' sheet, it says we can create a copy of this Google Sheet, but I am having a hard time finding the menu to do that. Has anyone had success downloading this sheet as CSV or something? If so, I'd like to know how. Thank you.

danso · on May 30, 2019

The submitted URL is the endpoint to view the GSheet as HTML. The canonical URL for the spreadsheet contains the usual menu options and interactive features:

https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4jucl...

ianmcgowan · on May 30, 2019

File, Make a Copy. Or File, Download to get it in Excel and other formats...

1f60c · on May 30, 2019

It just redirects to a Google Support article ¯\_(ツ)_/¯

crazygringo · on May 30, 2019

Usually a cookie problem. Log out of your Google account, clear your cookies, and then it should work.

isa · on May 30, 2019

I am honestly surprised that Beyonce isn't the top hit for Houston.