Data Is Plural – Structured Archive (docs.google.com)
120 points by joshdance on May 30, 2019 | hide | past | favorite | 27 comments



Hi, author/creator here. Very neat to see this on HN, thanks. The spreadsheet is a byproduct of the weekly newsletter I publish: https://tinyletter.com/data-is-plural

The spreadsheet simply contains the text and links from each newsletter edition ... but in a tabular format. (One advantage to the newsletter over the spreadsheet: the links make a bit more sense, since they're associated with specific anchor text.) The "non-structured" archive of previous newsletters can be found here: https://tinyletter.com/data-is-plural/archive

Happy to answer any Qs!


I'm giving a talk on data journalism to journalists next week and I was already going to recommend they sign up for the newsletter. But I'm glad to be reminded how you track its content with a spreadsheet, which means I can mention you again when I talk about creative, useful use cases for spreadsheets.


Nice resources.

Made me think that there is still a gap between web publishing tools (like MediaWiki, WordPress, Drupal) and online spreadsheets.

HTML pages are easier to open, and their layout options can provide better usability, but spreadsheets are more convenient to maintain.

There is also a danger that big companies like Google or Microsoft dominate the publishing tools and govern the content.



I like that meta-dataset. Actual source (newsletters) is at: https://tinyletter.com/data-is-plural

I even used the archive itself, as well as one of the datasets it mentions (puppies! kittens!), to explain Apache Solr search engine features at ApacheCon 2018.

So, if you want to play with it, you may find my presentation useful: https://www.slideshare.net/arafalov/from-content-to-search-s...

Examples:
* Slide 45: how to find entries with maximum links
* Slide 59: merle puppy!

The matching GitHub repo is: https://github.com/arafalov/solr-apachecon2018-presentation (including the final configuration, if you are not into learning Solr step by step)

I also have an idea of taking datasets from there one by one and doing a series of blog posts on how to actually get them into Apache Solr and showcase advanced search functions. But it is a lot of effort to prepare and I am not sure how many people would actually find it interesting.


Shouldn't it be 'Data are plural' then?


No. The verb modifies a sort of implied subject. Read it as "The word 'data' is plural", and it makes more sense.

Put another way, using a more standardly-pluralized word, you would say "puppies is plural", not "puppies are plural".


Can we get a canonical url for this? It's a useful resource, but in its current form might disappear at any time.


That's an interesting suggestion, thanks. Certainly worth looking into. In the meantime, you can consider the newsletter's landing page to be a canonical URL of sorts — it will always link to the structured archive: https://tinyletter.com/data-is-plural


Just open it in regular Docs form (replace /htmlview... with /edit):

https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4jucl...

and go to "File > Download as..." and pick your desired format!


It's a little funny that someone so invested in public and open datasets would embed their own data in a Google Spreadsheet.


Serious question: what do you think would be the better alternative? Especially when factoring in ease-of-upkeep (for the creator), convenient and familiar interface for the majority of users, Google's generally good uptime and server performance, and that a Google Sheet set to public access is not closed by a reasonable definition of that word.

I was going to quip about how I am annoyed that Google removed download-as-a-CSV as a URL endpoint, but it appears I misheard about this because /export?format=csv still turns the Google Sheet URL into a direct download link:

https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4jucl...
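
For anyone who wants to script this, here is a minimal Python sketch of the URL rewrite described above. The function name `csv_export_url` is made up for illustration, and the optional `gid` parameter (for picking a specific tab) is an assumption about the export endpoint rather than something confirmed in this thread. The sheet ID in the link above is truncated, so the usage example relies on a placeholder ID instead.

```python
import re
from typing import Optional


def csv_export_url(sheet_url: str, gid: Optional[str] = None) -> str:
    """Rewrite a Google Sheets URL (/htmlview, /edit, ...) to /export?format=csv.

    Hypothetical helper: only the /export?format=csv suffix comes from the
    thread; the gid query parameter is an assumption.
    """
    # The spreadsheet ID is the path segment right after /spreadsheets/d/
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", sheet_url)
    if match is None:
        raise ValueError(f"not a Google Sheets URL: {sheet_url!r}")
    url = (
        "https://docs.google.com/spreadsheets/d/"
        f"{match.group(1)}/export?format=csv"
    )
    if gid is not None:
        url += f"&gid={gid}"
    return url


if __name__ == "__main__":
    # PLACEHOLDER_ID stands in for a real spreadsheet ID.
    print(csv_export_url(
        "https://docs.google.com/spreadsheets/d/PLACEHOLDER_ID/htmlview"
    ))
```

The resulting URL can then be handed to any HTTP client or pasted into a browser to download the CSV.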


I personally didn't find it easy to consume, and others here didn't either (see the confusion about what it is, asking for context, etc). The links aren't clickable, there are multiple newline-separated links in a single cell, etc.

I'm not just objecting to the use of Google; part of an open dataset is its ease of use.

As for alternatives, I agree, it's hard to rival Google Sheets in terms of the creator's time. But again, for someone invested in curating open datasets, I'd hope for a bit of time invested in curating their own data — even if it remains within Google sheets.
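
As a stopgap for the multi-link cells mentioned above, a consumer of the CSV export could split them apart on their own side. The following is only a rough Python sketch: the helper name `explode_links` and the column name `links` are assumptions for illustration, not taken from the actual spreadsheet.

```python
import csv
import io


def explode_links(csv_text: str, links_column: str = "links") -> list:
    """Return one row per link for cells holding newline-separated links.

    The column name is hypothetical; pass the real header if it differs.
    Rows whose link cell is empty are dropped.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A quoted CSV cell may contain embedded newlines, one link per line.
        links = [
            link.strip()
            for link in row[links_column].splitlines()
            if link.strip()
        ]
        for link in links:
            new_row = dict(row)
            new_row[links_column] = link
            rows.append(new_row)
    return rows


if __name__ == "__main__":
    # Made-up sample data, not real entries from the archive.
    sample = (
        "headline,links\n"
        'Puppies,"https://a.example/1\nhttps://a.example/2"\n'
        "Kittens,https://b.example/1\n"
    )
    for row in explode_links(sample):
        print(row["headline"], row["links"])
```

Each output row keeps the other columns unchanged, so the result can be written back out with `csv.DictWriter` if a one-link-per-row file is wanted.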


Super neat! This is a compilation of datasets sent out from Jeremy Singer-Vine's Data Is Plural newsletter - https://tinyletter.com/data-is-plural


What is the context for this?


It looks like a curated list of various open, free to use data sets.

I agree though, while useful, it seems like this should be embedded in the context of a short write-up somewhere. Or, at least, have a title that's descriptive as opposed to what looks like a vague attempt at branding.


I've gone insane. I thought this was a bunch of links supporting the argument that the word 'data' is plural.


It's just poor marketing/branding to anyone not in the know. Much like the name of the current site we're on, though maybe that's what they're going for.

Titles like this are absolutely useless. I shouldn't have to click on a link and spend 45 seconds reading spreadsheet entries to figure out what it is I'm reading about.


FWIW, the Google Sheet has a "Notes" tab which explains the spreadsheet and its content: https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4jucl...


this made me laugh out loud, thank you.


Thanks for flagging this confusion. It's helpful to hear how this is perceived. Short answer: The spreadsheet is, indeed, embedded in a broader context. (And has a pointer to that context in the "Notes" tab.) Slightly longer explanation in my main comment on this thread.


In the 'Notes' sheet, it says we can create a copy of this Google Sheet, but I am having a hard time finding the menu to do that. Has anyone had success downloading this sheet as CSV or something? If so, I'd like to know how. Thank you.


The submitted URL is the endpoint to view the GSheet as HTML. The canonical URL for the spreadsheet contains the usual menu options and interactive features:

https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4jucl...


File, Make a Copy. Or File, Download to get it in Excel and other formats...


It just redirects to a Google Support article ¯\_(ツ)_/¯


Usually a cookie problem. Log out of your Google account, clear your cookies, and then it should work.


I am honestly surprised that Beyonce isn't the top hit for Houston.



