Scoop: A Glimpse Into the NYTimes CMS (nytimes.com)
190 points by edavis on June 17, 2014 | 71 comments


Context: I'm a former copy editor with experience working for digital-only publications (e.g. Forbes.com) as well as print-driven magazines and newspapers.

One of the things that's difficult when designing a CMS that works for both digital and print is that there are far fewer space constraints online than in print, and you want to ultimately generate an article that works in both formats. (Oh, and on mobile, and in a condensed version in a sister publication, and an expanded version for the wire.)

For hard news, the inverted pyramid format comes in handy -- if you don't have enough space in the print edition, you just lop off the last few paragraphs -- but for things like op/eds and magazine-style pieces, that doesn't always work.

What I'd love to see is a CMS (and, more fundamentally, a way of representing the underlying data) in which writers and editors can designate certain paragraphs or sentences or phrases as more important than others, so that even a story with a complex format can be dynamically "scaled," sort of like what web designers do with media queries, or what image editors do with seam carving.
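
A minimal sketch of what that underlying data might look like, assuming each chunk of copy carries a numeric priority. The `Chunk` type, the priority scale, the `scale` function and the sample sentences are all hypothetical, made up here for illustration rather than taken from any real CMS:

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    priority: int  # hypothetical scale: 0 = essential, larger = cut sooner


def scale(chunks: list[Chunk], max_words: int) -> list[str]:
    """Drop the least important chunks until the story fits max_words."""
    kept = list(range(len(chunks)))

    def word_count() -> int:
        return sum(len(chunks[i].text.split()) for i in kept)

    while word_count() > max_words:
        # Only non-essential chunks may go; prefer the highest priority
        # number, and among ties the chunk that comes latest in the story.
        droppable = [i for i in kept if chunks[i].priority > 0]
        if not droppable:
            break  # only essential text is left; it cannot shrink further
        victim = max(droppable, key=lambda i: (chunks[i].priority, i))
        kept.remove(victim)
    return [chunks[i].text for i in kept]


# Invented sample copy, in story order.
story = [
    Chunk("The council approved the budget on Tuesday.", 0),
    Chunk("The vote followed three hours of debate.", 2),
    Chunk("Opponents said the cuts fall hardest on libraries.", 1),
    Chunk("A similar plan failed last year.", 3),
]

for budget in (100, 15, 5):
    print(budget, scale(story, budget))
```

The surviving chunks always stay in their original order, so this is closer to a per-chunk inverted pyramid than to real seam carving. It does nothing about the sentences that stop making sense once a neighbor is gone.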


I was once involved in developing a system like the one you describe, and it turns out it's a really hard problem, because for most non-trivial articles you want to do more than just skip a paragraph: you cut an explanation here, which means you need another word there and possibly that sentence doesn't make sense anymore so you'd rewrite that... and as a result, as a copy editor, you very quickly lose track of the flow of a story if there's two or more different variants all merged into one.

You mention elsewhere that "It potentially saves copy editors a lot of work." but I think you'd have to be a pretty darn good copy editor to keep multiple versions of a story in your head at the same time and ensure all of them keep their flow and coherence. Perhaps it really just is faster to read and edit any and all versions of a story separately.

What you're describing could work, but only if people changed the way they wrote to make it easy to cut things out without having to rewrite other things. But then you're back to something like the inverted pyramid.

What we ended up doing instead was to give authors the ability to "fork" stories into different editions that are, from that moment on, edited completely independently from each other, but behind the scenes keep track of which parts of the story are still the same, so you can provide intelligent diffs, notifications and potentially even merges like "In the web edition it talks about a guy named Jon but the print edition refers to John -- which is it?" without changing the experience for writers and editors too much.
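
For what it's worth, the "which parts are still the same" bookkeeping can be roughed out with Python's standard `difflib`. This is only a sketch under my own assumptions (word-level comparison, two editions held as plain strings, invented sample sentences), not how the system described above worked:

```python
import difflib


def diverged_words(web: str, print_: str) -> list[tuple[str, str]]:
    """Return (web, print) word runs where two forked editions differ."""
    a, b = web.split(), print_.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side means a pure insert or delete.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


# Invented sample editions of the same sentence.
web_edition = "Police said Jon Smith left the building at noon."
print_edition = "Police said John Smith left at noon."

for web_text, print_text in diverged_words(web_edition, print_edition):
    print(f"web: {web_text!r}  print: {print_text!r}")
```

A pair like `Jon` / `John` is the kind of divergence worth turning into a notification, while a pure deletion is probably just a deliberate trim.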


Finally, a role for the DFW-inspired footnote-heavy writing style!

Edit: to be more constructive, it doesn't seem like this would be hard to represent in the underlying data:

    ...
    <p>omnia haec, quaecumque feret uoluntas
    caelitum, temptare simul parati,
    pauca nuntiate meae puellae
    non bona dicta.</p>

    <p>cum suis uiuat ualeatque moechis,
    quos simul complexa tenet trecentos,
    nullum amans uere, sed identidem omnium
    ilia rumpens;</p>

    <aside>
    The next part is usually taken at face value, but, really? Try
    reading it in the most insincere and melodramatic way possible and
    it becomes quite funny. His love is cut down like a flower by the
    plow? Seriously? Have you read the rest of Catullus's poetry?
    Have you read the rest of this poem? The last verse was about how
    many lovers his ex girlfriend is sleeping with (300 at once).
    </aside>

    <p>nec meum respectet, ut ante, amorem,
    qui illius culpa cecidit uelut prati
    ultimi flos, praetereunte postquam
    tactus aratro est</p>

I'm sure there are much better ways to do it than that; that seems like a workable starting point, though.

But I'm really surprised that newspapers and magazines really present different versions of the same story to digital readers vs. print readers.


NPR has a system called COPE (Create Once Publish Everywhere). It used to be COPA (Create Once Publish Anywhere), but they modified the data structure again. The system takes a DBA kind of approach to defining your "model", i.e. byline, really really short abstract, etc. It's really tech agnostic.


Assuming such a tool existed, would writers be ready to use it? Is there much precedent for writing "scalable copy" to the level of detail you describe? Seems like it would be pretty difficult and the necessary magic (e.g. in the example you posted, cutting out the "Well, " would require capitalization of "yes") could lead to unexpected results and awkward flow.


Would writers be ready to use it? Not at first. Well, at least not the older writers who are still resentful that they have to write with a digital audience in mind. But those writers are dropping like flies.

Is there much precedent for writing scalable copy? Yes! I used to have to cut wire stories to fit. Let's say 25 newspapers run the same wire story, each with different length requirements. That's 25 editors who have to make different cuts, and some might in haste cut sections that are actually pretty important. Some writers give cues to the editors -- "cut this graf if necessary" -- to avoid that happening.

Is there much NEED for scalable copy? Yes! It potentially saves copy editors a lot of work. And if the "magic" can be automated, it saves a lot of time, too!


Interesting idea but I'm skeptical that any writer would ever be able to adopt something so "technical" (for lack of a better word). Writers just don't think like that and I don't think they ever will. It's a wall between creative types and tech types that is here to stay, imho.


I'm a writer and I think "like that". I know many other writers that do, as well.

And it's not surprising, because writing itself is a technology in a traditional sense. Further, it's more "tech" oriented (in the more contemporary sense) than you give it credit for.

It's very close to coding: It's structural, admits of procedure, copy reuse, partakes of historical design advancements, adapts to tech around it, etc. Fuzzier, maybe. But try and do it well, sometime. The edges are harder than you might think.

(And, as an exasperated aside: Ugh!! There is no wall between creative types and tech types! No such types exist in the first place. Only weird, sticky prejudices.)


You have a point, but so does he. How many nonsensical comments have you seen online that are obviously the result of failing to proofread from start to finish after editing? You move a sentence from here to there, or rewrite or remove one, and the flow is broken, and it no longer makes sense. Writing and reading are--at least, within a single piece--linear activities. Creating a system for removing or rearranging sentences or paragraphs creates an exponentially increasing number of potential combinations, each of which would need to be proofread completely by a human or an advanced AI. Otherwise you'd end up with nonsensical articles and dumbfounded readers, and then the whole system would be thrown out.
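
To put a number on "exponentially increasing": if every optional chunk can be kept or cut independently, k optional chunks give 2**k renderings to proofread. A small sketch, assuming a story is just a list of (text, optional) pairs, which is my own simplification:

```python
from itertools import product


def variants(chunks: list[tuple[str, bool]]) -> list[str]:
    """Enumerate every rendering of a story whose chunks may be optional.

    Each chunk is (text, optional). Optional chunks can be kept or cut
    independently, so k optional chunks yield 2**k variants.
    """
    # A required chunk has one choice; an optional one has two (text or "").
    choices = [(text, "") if optional else (text,) for text, optional in chunks]
    return [" ".join(part for part in combo if part) for combo in product(*choices)]


story = [("Lede.", False), ("Background.", True), ("Quote.", True), ("Kicker.", True)]
print(len(variants(story)))  # 3 optional chunks -> 8 variants
print(2 ** 10)               # 10 optional chunks -> 1024 variants
```

Strict priority ordering (always cut the least important chunk first) shrinks this to k + 1 variants, which is roughly the inverted pyramid again.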


I think you're definitely an exception. I work with large editorial staffs every single day and haven't come across a single person who I think would feel comfortable crafting a "dynamic" story that is susceptible to losing and gaining chunks of copy depending on how much space there is. It would be a total sea change for how most editorial shops operate. They are so detail oriented, and to them, every word counts and has a distinct purpose within the greater text. Basically, if it could be cut, it would have already been cut (either by the author or by a managing editor).

To have a whole section arbitrarily cut from the version of an article that user X reads on their iPad vs user Y on a big screen laptop would be met with skepticism at best, outright horror at worst.

I am a writer as well and I can't imagine writing like that or how such a system could possibly work from a technical perspective, much less a UI perspective.

PS. I didn't mean to say that creative people can't be technical or vice-versa, just that writing tends to be a creative-oriented job while programming is obviously a technical-oriented job. Obviously there is overlap between the two and many people excel at both. Sorry if it came out sounding like I believe there is a creative|technical binary.


> a way of representing the underlying data in which writers and editors can designate certain paragraphs or sentences or phrases as more important than others

Doesn't HTML do this? <h1> is more important than <h2>, etc.


<p><span class="cut-me-first">Well, </span>yes, but <span class="essential">in only the broadest sense possible.</span> <span class="cut-me-next">For the purposes of an op-ed piece <span class="nice-to-have">or a feature story</span>, though, you'd need to have much more control.</span></p>

<p class="normal-priority">Actually, this comment illustrates that such a system might not be as easy as you'd think. Some natural-language processing/generation would come in handy to ensure the following <num-of-bullets> things:</p>

    * proper capitalization
    * subject/verb agreement
    * proper punctuation
    <span class="cut-me-first">* number agreement</span>
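
A rough sketch of the first bullet only, using Python's standard `html.parser`. The class names come from the markup above; the `Cutter` and `render` names and the naive "capitalize after a sentence ender" rule are my own assumptions, and nothing here attempts subject/verb or number agreement:

```python
import re
from html.parser import HTMLParser


class Cutter(HTMLParser):
    """Collect text, skipping any <span> whose class is in `drop`."""

    def __init__(self, drop: set[str]):
        super().__init__()
        self.drop = drop
        self.stack: list[bool] = []  # one entry per open span: is it cut?
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            cls = dict(attrs).get("class") or ""
            self.stack.append(cls in self.drop)

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not any(self.stack):  # not inside any span that is being cut
            self.parts.append(data)


def render(markup: str, drop: set[str]) -> str:
    parser = Cutter(drop)
    parser.feed(markup)
    parser.close()
    text = re.sub(r"\s+", " ", "".join(parser.parts)).strip()
    # Re-capitalize the first letter of the text and after sentence enders.
    return re.sub(
        r"(^|[.!?] )([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


markup = (
    '<p><span class="cut-me-first">Well, </span>yes, but '
    '<span class="essential">in only the broadest sense possible.</span> '
    '<span class="cut-me-next">For the purposes of an op-ed piece '
    '<span class="nice-to-have">or a feature story</span>, though, '
    "you'd need to have much more control.</span></p>"
)

print(render(markup, {"cut-me-first", "cut-me-next"}))
print(render(markup, {"nice-to-have"}))
```

Cutting only `nice-to-have` leaves a stray space before the comma ("piece , though"), which is a fair illustration of why the punctuation bullet is on the list.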


Aren't these the job of a human editor? (or is this the reason why my e-newspapers have so many copy editing errors?)


No. Or rather; that would be pretty horrible html from a semantic standpoint -- and perhaps more importantly, it would need parsing into text anyway -- so might as well be more explicitly different from html.

That said, html5 is pretty free in terms of what you put in the document -- personally I'd probably prefer a "weight" or "weight-group" attribute applied to elements and/or set on span-elements. Weight might be a float between 0.0 and 1.0 (pick one to be most important, maybe let "heavy" weights "sink to the bottom", and "fall off"... ?) -- the idea of a "weight-group" would be to group headers and/or phrases/paragraphs to be dropped together -- eg for this comment, you might want to drop all references to "weight-group" -- if you wanted a shorter, simpler version.
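
Something like this, as a sketch. I'm assuming here that 0.0 is the most important and that heavier text falls off first, and that a group goes as a unit as soon as any of its members is too heavy; `Span`, `shorten` and the sample sentences are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    text: str
    weight: float = 0.0          # assumed: 0.0 = most important, 1.0 = first to go
    group: Optional[str] = None  # hypothetical "weight-group" label


def shorten(spans: list[Span], cutoff: float) -> str:
    """Keep spans with weight <= cutoff; a group falls off as a unit."""
    # If any member of a group is too heavy, the whole group is dropped.
    dropped_groups = {s.group for s in spans if s.group and s.weight > cutoff}
    kept = [
        s.text
        for s in spans
        if s.weight <= cutoff and s.group not in dropped_groups
    ]
    return " ".join(kept)


# Invented sample spans loosely modeled on this comment.
spans = [
    Span("Give elements a weight attribute."),
    Span("A float between 0.0 and 1.0 would do.", 0.4),
    Span("Groups let related text drop together.", 0.3, "weight-group"),
    Span("For example, every mention of groups.", 0.8, "weight-group"),
]

for cutoff in (1.0, 0.5, 0.1):
    print(cutoff, shorten(spans, cutoff))
```

At a cutoff of 0.5 the third span would survive on its own weight, but it goes anyway because its group-mate is too heavy. That is the whole point of grouping.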

Too far down this path and you're square in natural language processing territory -- but it should be easy to mock up a web app that lets the author/editor preview how different versions would render/look -- and allow adding/removing tags/weights to the text(s) as appropriate.

Perhaps one useful extension would be auto-promoting a paragraph and/or (depending on format/space) part of a paragraph, or just a header -- to be used as a "deck" and/or summary (for eg rss feeds) etc?


You're missing the mark of this scenario — the headings are all more important than paragraphs, but maybe a summary or "background info" paragraph is clearly less integral than its nearby paragraphs, or maybe the second half of an intro feels a bit "fluffy" but still adds. If you have space constraints, they'd be the first priority to go.


Those are for headers. Hence the "h". Not for paragraphs or sentences.


The CMS is in a renaissance period with Wordpress, Joomla, Drupal and the like falling out of favor.

I believe the CMS is bifurcating into two specialized directions.

Several online publishers are coming out and describing their new, home-grown custom CMS. The features are rich and provide robust, innovative tools across the long-form content lifecycle: writing, editing, and publication. There is special attention to collaboration.

On the other hand, more and more website developers align themselves with the goals and properties of static site generators. SSGs are best suited to what I call "malleable" websites.

Thus, I think the way to think about this CMS renaissance is that the traditional CMS tried (and failed) to optimize for both long-form content and the malleable website. As a result, people are sick of trying to patch the traditional CMS with plugin after plugin and instead are simply crafting their own.


I would humbly suggest that static site generators are not nearly as popular as you might think if you only view the CMS market through the eyes of the HN crowd.

SSGs are hugely popular among nerds and practically unknown among everybody else. The result is a kind of distortion where, to nerds, it looks like SSGs are about to take over the world -- "all my friends use them!" But step outside that tight network of people and they are more or less completely off the radar.


I agree - I find the idea of static site generators (SSGs) very appealing, but they are not user-friendly to set up - unless you happen to be technically-minded. Some tell-tale signs that SSGs are made by programmers for other programmers: command line-installation and configuration, a liking for markdown (and a dislike for WYSIWYG).

Just to be clear, I'm not knocking any of this. The fact that people have put their own time and effort into building SSGs and then generously open sourced them is pretty awesome.

In my view, the audience for SSGs, whether intentional or not, is mostly other programmers. What would it take to make an SSG appeal to a broader set of users?


I'd agree with your assessment. We're testing an SSG with an international website to see what the experience is like. We've traditionally used Drupal, although we use almost zero of the CMS-functionality of Drupal (hence why we were looking at SSGs). We're using Hexo for this deploy. Here's what we've found:

- For complex web pages with a lot of design, we have to put most of the content into .jade files - mainly because of the style of our webpages (long-single-page style with a lot of sections). If we were doing a more traditional page structure, then we could have markdown files. This is "ok" but requires content authors to get into .jade files which can be a little intimidating. (If anyone has any thoughts about how to get around this, I'm open to hearing them.)

- We're storing the files in Github, but have now figured out that if we want a distributed team to work on this, we'll need something like Jenkins to auto-deploy to a staging server on commit. The problem is that the first thing people want to do once they've made an edit is check the page to see if it worked. This is trivial in WP or Drupal, but without Jenkins in place, I don't see an obvious way to do it unless we have the content folks run Hexo locally (which I'd like to avoid).

- Huge improvement in the speed of the site vs. the Drupal website. Not shocking given the move to static files, but there is also a reduction in javascript - some of which was coming from Drupal.

- Nice to have all content under source control.

All in all, it's a bit of a mixed bag at the moment - the SSG is supposed to be a simplification of our stack, but now we have to run Jenkins to manage deploy (probably not a bad thing but no one here is an expert in it), and our content people are finding it a bit intimidating.

My worry: we set up all this stuff, and then someone key on the engineering team leaves and we're left with an overly complex stack vs. just going with Wordpress.


> What would it take to make an SSG appeal to a broader set of users?

I don't think the SSG, as they largely exist today, will ever gain widespread usage by mainstream users.

But one thing that could drive greater adoption would be to develop an aesthetically pleasing web front-end that lets users write content using familiar WYSIWYG tools, but save the content to local files instead of a database. From there, the process would be largely the same as it is now: transform this directory of lightly marked up plain text files into raw HTML.

So, to answer your question: Make it more like Wordpress.
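
The last step really is small. A bare-bones sketch of "directory of plain text files in, raw HTML out", assuming a made-up convention where the first block of each `.txt` file is the title and blank lines separate paragraphs (no markdown, no templates):

```python
import html
from pathlib import Path


def build(src: Path, out: Path) -> list[Path]:
    """Render every .txt file in src to an .html file in out."""
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(src.glob("*.txt")):
        blocks = source.read_text(encoding="utf-8").strip().split("\n\n")
        # Assumed convention: first block is the title, the rest are paragraphs.
        title, paragraphs = blocks[0], blocks[1:]
        body = "\n".join(f"<p>{html.escape(p.strip())}</p>" for p in paragraphs)
        page = (
            "<!doctype html>\n"
            f"<title>{html.escape(title)}</title>\n"
            f"<h1>{html.escape(title)}</h1>\n"
            f"{body}\n"
        )
        target = out / (source.stem + ".html")
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical directory names; adjust to taste.
    if Path("content").is_dir():
        for page in build(Path("content"), Path("site")):
            print("wrote", page)
```

The hard part is everything in front of it: the friendly editor that saves those files without the user ever seeing a directory.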


There is no doubt that SSGs will "grow up" and appear to gain features like Wordpress. The difference will be that SSGs are born out of loosely coupled tools in a toolchain ecosystem. Over time, I believe the successful SSGs will have a decidedly Unix flavor to them. Take http://www.metalsmith.io/ as an example.


SSGs are fundamentally flawed. The web is moving fast away from being a static page-centric medium to a dynamic content-centric one.


I think the future of "SSGs" is that they're going to end up as programs running on the server that watch data from various feeds - databases, RSS feeds, APIs - and generate output from those. Basically, a streaming processor framework. The output would likely be in a "semi-baked" format so that they can contain processing instructions that are executed at request time, or otherwise lazily.

These won't be static site generators any more, and will lose out on the "you can throw it up on Github Pages" benefit, but they'll be far more powerful, and it'll be easier to develop dynamic, data-driven websites.
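
A toy version of "semi-baked", assuming the processing instructions are nothing more than `$placeholders` in Python's standard `string.Template`. The `bake` and `serve` names and the sample page are made up:

```python
from string import Template


def bake(template: str, feed_data: dict[str, str]) -> str:
    """Build step: fill in everything already known from the feeds.

    safe_substitute leaves unknown $placeholders untouched, so they
    survive as instructions for request time.
    """
    return Template(template).safe_substitute(feed_data)


def serve(baked: str, request_data: dict[str, str]) -> str:
    """Request step: resolve whatever the build step left behind."""
    return Template(baked).substitute(request_data)


page = "<h1>$headline</h1><p>Hello, $reader. $story_count stories today.</p>"
baked = bake(page, {"headline": "Morning Edition", "story_count": "12"})

print(baked)
print(serve(baked, {"reader": "Ann"}))
```

One obvious limitation: feed data containing a literal `$` would be misread at request time, so a real system would need proper escaping or a separate syntax for the deferred parts.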


Here's that aesthetically pleasing web front-end for SSGs, though it's tied into GitHub: http://prose.io/


> SSGs are hugely popular among nerds and practically unknown among everybody else.

Depends on your definition of SSG. I'd consider an app such as RapidWeaver[1] to be a SSG. They aren't targeting "HN nerds."

[1] http://realmacsoftware.com/rapidweaver


Ah, don't forget that HN-types determine the future direction of products, services, and tools. If that weren't true, what are we all doing here?


The market determines the future directions of products, services, and tools. While HN-types deliver them, by-the-numbers, they usually fail.


I don't know about Joomla but Wordpress and Drupal are hardly falling out of favor; they're actually both growing quite rapidly.

Developing a custom CMS makes sense for exactly one type of company: a publisher whose entire business depends on their CMS. For companies in that situation, developing a custom CMS is a capital investment in their core product.

Everyone else will save time, money, and security headaches by building on a popular open source platform. Heck even Twitter uses Drupal to power their developer site.


Ironically, I've heard the argument that, for the malleable website case, most people will save time, money, and security headaches by avoiding the traditional CMS.

My point is that the traditional CMS is falling out of favor as the default solution for content on the Web. Need a blog? Wordpress is increasingly NOT the solution. As developers, we are trending away from the one-size-fits-all CMS and into specialized categories of CMS.

The CMS renaissance is due to two factors:

1) A proliferation of great tools emerging into toolchain ecosystems. Node's Grunt and Gulp are fine examples. This is largely different than the traditional CMS because...

2) Developers are increasingly wanting to gain more control over our craft.


No way, building a custom CMS for a site which would have always been a good candidate for an off-the-shelf CMS is just as bad of an idea today as it ever has been.

> As developers, we are trending away from the one-size-fits-all CMS and into specialized categories of CMS.

Most of the sites which would typically get built on Drupal (and especially Wordpress) aren't owned by developers. Further, these owners probably don't even have a developer on staff or even on a maintenance retainer. Many of these sites are set up by relatively non-technical users. Many others are set up by a developer the owner has paid, and then that developer is out of the picture and the site is run by the owner.

Even if you are a start-up building your own web app, it still often doesn't make sense to build your own blogging system. Rather, you would get your first version out the door ASAP, throw your blog up on a sub-domain running Wordpress and then maybe down the road figure out something different.

Do what you want for your own site. But when you are doing something for clients who don't have a tech staff, it's not about what the developer wants.


I agree that building a custom CMS is not the right answer in most circumstances. I'm not advocating building a custom CMS. My position is that what has "always been a good candidate for an off-the-shelf CMS" is evolving. The traditional CMS will be losing ground at the high and low ends. On the high-end, long-form content organizations striving to differentiate are building their own systems focused on collaboration and content lifecycle features and, on the low-end, static site generators for custom websites. There is a middle, but the middle, over time, appears to be shrinking.


None of the metrics I've seen outside of Acquia marketing materials suggest Drupal is growing. Developer engagement plateaued a couple years ago if Google keyword searches and module issue queues are anything to go by. The number of purely Drupal dev shops has plummeted over the last few years. Job postings (locally and elsewhere) appear to be limited to well-established large enterprise and .edu, and regional shops that service these markets.

Wordpress, on the other hand, appears to be on fire.


Drupal and Wordpress are growing in different ways. Drupal is growing into more complex, higher-cost-per-project jobs that would have gone to big ticket closed source CMS platforms a few years ago. And Wordpress is growing into the mid-size jobs that Drupal and Joomla used to be the only option for.

By any metric of public popularity, Wordpress is growing a lot faster than Drupal. Drupal's growth is into the enterprise arena which is not as public.

The number of purely Drupal shops is dropping because pretty much every agency of note has added Drupal to their stable of solutions. It's not an esoteric differentiator anymore--which is actually a form of growth for the platform.


The number of Drupal core contributors has shot up from the hundreds to the thousands over the past couple of years.


Wordpress is not falling out of favor. If anything it's eating the lunch of a lot of previously custom or nonexistent CMS solutions. For small business, it's just getting started.


Heh. You might find my CMS (http://www.webhook.com) funny then. It's a CMS based on static-site-generator concepts... it deploys static websites for example and is easy to use and malleable. However it still has a firebase backend and a traditional /cms/ page with all the modern trappings you'd expect.

So I kind of went for both :)


Ooohh. I like where your head is with webhook.com. (How did you get that domain!) I would have backed your campaign had I been exposed to it. Any way to join late?

Alright... hear me out... thoughts on combining webhook.com with Atom as a single download/install. There are custom Atom packages that provide UI to achieve what you are doing through the CLI. Preview locally, too.


I bought the domain along with webhook.org awhile ago. They were surprisingly inexpensive. If you'd like to get in, just send me an email dave@webhook.com and reference this post.

Atom stuff sounds like a neat idea, but I'd rather not tie us to any specific editor. Obviously people can write whatever they want though. We plan on open sourcing most of the code.


Any news regarding hosting this yourself? I remember this being asked a few times, but I'm curious if your stance has changed since.


It's still roughly the same answer. Likely later this year. Certainly not from lack of want! More just because you'd have to sign up / install so many different services: firebase, elastic search, app engine for image resizing, mailgun or something for emails. There's just a lot of pieces.


Do you have any data that actually indicates a move away from off the shelf cms-es?

There is always a case for building a custom CMS and some companies go that route. That has been the case since the WP/Joomla/Drupals of the world came into existence. I don't see a trend though, in fact I think all 3 of those are growing.

For some companies a custom CMS just makes the most sense. It doesn't really reflect anything on the off-the-shelf cms-es.

And static site generators are not exactly taking off like crazy either, in terms of growth.

Choice is always good though. In that respect, I agree that there is a renaissance. There are lots of great ways to run a site now. There are some nice looking CMS platforms and the popular platforms continue to mature as well. It's good stuff all around.


OS tools such as WP, Drupal, et al aren't even remotely in the same class of app as Scoop, or those built by various other online pubs (exa: Salon), nor are they "new", some have roots going back five and ten plus years in the respective org.

The conclusion is way off base and ill informed; early digital pubs recognized immediately that OS CMS systems and blogware weren't going to cut it and brewed their own tech (a few of which were subjected to attempted productization, which inevitably failed, exa: vignette storyserver), and nothing's really changed in that regard.

know your history before you declare a renaissance, mm-kay


Take any normal CMS, cache the html, and you have a static site generator.


Exactly. My favorite static site generator is Varnish.


> The CMS is in a renaissance period with Wordpress, Joomla, Drupal and the like falling out of favor.

I think Wordpress is still going to stay in the game. It still has a road ahead of it. I do think that the big newsrooms are going to expand past its limiting multiple-people collaboration features into a new CMS, but most blogs will want to use WP.


Are there any google docs based CMS's ? Seems like you could offload the editing, saving, tracking changes to google docs - a platform many people seem familiar with now - and then keep the actual CMS to a bare minimum. Have it import the text & spit out a static page even.


Bangor Daily News does all of their editing in Google Docs. They then feed things into WordPress and InDesign, but I imagine you could feed it into pretty much anywhere.

http://dev.bangordailynews.com/2011/06/16/marrying-google-do... http://toc.oreilly.com/2011/06/google-docs-wordpress-indesig...


That is a very interesting idea, but since NYTimes opensourced their ICE editor, I am not sure what Google Docs would provide.


Transparent version control, tracking of changes, collaborative editing, familiar interface, familiar workflow (just drop finished articles into this directory to be available to the CMS)


This appears to be unrelated to the Kuro5hin derived CMS named Scoop?


I was always under the impression that they're Django/Python. Can anyone confirm?


No.

We use a pretty normal ("boring") Java stack with Spring, Hibernate and Jersey. Some of the older components use Struts + JSPs whereas the newer components use Backbone (and related libraries).


Which database? A JDBC one?


scoop uses mysql for its managed repository. we also use mongodb for our published repository that serves our apis to everybody else.


The NYT devs consist of several dev groups and a mix of stacks...some which are business-facing/product, and others that are in editorial (i.e. interactive news graphics). On the editorial side, they are one of the few in the news business that were using Ruby/Rails (the vast majority of newsrooms use Python/Django). There's also obviously a primarily-JavaScript group.


See krishy's comment above. It looks like they do not use Ruby/Rails for the CMS for editors, but use Java instead.


Right...I should've clarified...by "editorial", I meant the team that focuses on the public facing news projects, such as Derek Willis's campaign finance apps and utilities (http://itemizer.herokuapp.com/)...I don't think they do a lot of monolithic-type Rails apps currently, though, at least compared to client-side-heavy projects


It looks surprisingly nice for an in-house app. Usually there's no point in focusing on web design for these kinds of apps since the public won't be able to see them anyway.


Very interesting and cool article, although I wonder what the point of this is? Just a show and tell? Doesn't seem like they're open sourcing it.


Like any other company's engineering blog, a post like this can help with recruiting and it's also a nice way for the team to summarize and share what they built.


It's also relevant to the NYT's goal of surviving the digital shift. Giving the world a peek under the hood isn't just an interesting story, it's brand-building for a publication that wants to be seen as modern.


Open sourcing it is a good suggestion and one that some have made previously. Open sourcing it requires a fair amount of work to remove company-specific, internal stuff and make it easy for others to install, upgrade and maintain it on their own. That fair amount of work hasn't yet made it high on a priority list, but let us hope it will :-)



I know ICE is on it. That's just a VERY small part of the entire CMS, however. Why the downvotes?


I didn't downvote you, but vague comments idly questioning the point of the article usually don't do well on HN.


Oh, I should have written my post better.

It should be known that I love behind-the-scenes more than the movies, and I love these peeks more than anything else. Just was genuinely curious why NYTimes would take the time to show us. I don't think it would help any with the subscriptions/circulation numbers but I am probably wrong.


I could see this as a premium wordpress service successfully charging $15 - 25 a month for bloggers.


I don't, and it would be too expensive even at $15 imo. But even if something like this had a market there's the problem of the editor, as it uses contenteditable which is at best inconsistent across browsers and at worst completely broken. That's also the reason Google Docs dropped it and started doing editing the hard way. As long as you know who uses your software and you can enforce the use of a given browser (as is the case with the writers for the NYT) there's no problem with contenteditable, but for a general solution you need to drop it.


ice plugin works for tinymce as well as contenteditable http://nytimes.github.io/ice/demo/


Looks like Drupal with a few node_hooks to me



