I honestly don't see a problem here. They decided to change a backend library for a non-essential system in their product. Most services don't ask for permission or make announcements when they make changes like this.
The approach seemed to be, if things break, people will report it and we’ll fix it.
While this may not be the best approach, the number of languages supported is too high for a person to check each one manually. Generally, I imagine they wouldn't expect a change like this to break anything significant.
[people] use it as a portfolio. [..] To suddenly doink the appearance of people’s portfolios is unfortunate.
It is very unlikely that syntax highlighting errors in GitHub will affect someone's chances of getting a job.
Sure, this switch could cause some issues but they don't seem to be severe enough to kick up a fuss over.
I beg to differ. Github pushes its code rendering as far as possible, including providing widgets to embed code snippets into your blog. It is also the _main content they display all the time_. That's not "non-essential".
Just switching the library and breaking things on a whim is problematic.
Also, the number of languages may be 316 (including some oddballs like "Unified Parallel C") but that's still a feasible number to check, at least for major, obvious breakages. And for people who do use Unified Parallel C, adequate highlighting might be exactly the reason to choose that platform to write their blog on, instead of writing a custom highlighter for prism.js.
Sorry, if your business is code and you decide to support 316 languages, expect people to hold you to that promise.
That said: errors happen. But that isn't a reason to give them a pass, just a reason not to put too much weight on such things. It doesn't break the platform at large, but it terribly inconveniences some users, and they are quite right to be upset, too.
The other half being providing a service they find worthwhile that doesn't change at every whim.
Taken to the other extreme, a constantly breaking Github doesn't have any value proposition.
Promises are not always explicit and not at all related to laws.
The next Transformers being something to look forward to is a promise, but that doesn't mean not liking it is in any way wrong or that I could sue someone for it. I can choose to leave the movie and never go see one again. The producers of Transformers certainly owe me nothing, but they also cannot tell me how to feel about their handling of the material.
Github promises the most awesome code hosting around. That promise means very different things to different people. For some of them you can live up to it, for some you cannot, and it's perfectly fine for the latter to feel let down. Calling all those people "entitled" is, quite frankly, insulting.
As I said: presenting this like it is the end of the world is overreaching, but it is a valid complaint and a valid sentiment. Saying that this is the most important thing on Github because it happens to be specifically your problem is entitlement.
Finally, it isn't true that you are bound only by things that are spelled out. The law many people cite so often has the concept of Good Faith: http://en.wikipedia.org/wiki/Good_faith_%28law%29 and similar fun things that extend beyond the letter of a contract. So Github _does_ owe me beyond their ToS.
(I appreciate that this is probably not a case covered by this)
It's a common sentiment in these circles that only the rules written in the contract are the ones that count, when nothing could be further from the truth; it varies widely from jurisdiction to jurisdiction.
Agreed. A business relationship is more like a personal relationship than some technical people realize. Pointing at the contract is a failure condition.
nobody's talking about legal contracts. the issue is one of good will. the promise github has made is that they would provide all kinds of hosting (git, gists, pages, whatnot). people move from other solutions to github because github's promise (again, not a legal contract) is "we're better".
the other half of the difficulty of running a business comes from making tacit promises and then acting annoyed that the other side holds you to them.
I didn't expect my blog post to be on the front page of HN. Here's a TL;DR summary:
For many languages this is a significant and distracting degradation in the presentation.
I could understand GitHub removing highlighting completely because they feel speed is the overriding priority. That would be even faster than what they're doing now. Languages would look "plain" instead of "wrong". Not my first choice, but a reasonable choice.
The situation now is that they've replaced a library that had been handling highlighting thoroughly with a variety of text-editor lexers that mostly do not. People like me, who already contributed to Pygments, aren't feeling motivated to do all this over again for no good reason. So it seems likely the lexers will remain poor for quite a long time. Which is unfortunate.
Finally, at the time I wrote my blog post, I was speculating about the motivation because GitHub hadn't explained why, yet. Someone later did explain ("because speed") in the issue thread.
Are the majority of languages now broken, or just a small niche subset that doesn't see much use compared to Ruby, Python, etc.?
If only a few minor languages that hardly anyone uses, compared to the site as a whole, might need some fixing, this still seems like a win from Github's side of things, since per the graph the change did in fact significantly improve render times.
Which is sad, because the TextMate lexer design is really really awful. Mostly undocumented. Lots of oniguruma-specific regexes used in the syntaxes. Inefficient beyond comprehension.
For instance, TM syntaxes can legally have recursion loops in them, which TextMate will cut so that the app doesn't spin into infinite recursion. But the precise way that it does this is a mystery.
The pygments design is better for static syntax highlighting.
Point of curiosity: Chocolat is compatible with TextMate syntax files, IIRC. Was going with TM syntax purely a pragmatic choice? Is it not so bad for in-editor syntax highlighting? It seems like virtually every text editor that hit the market -- or whatever one would say for free programs like Atom -- after TextMate adopted TM syntax files. (While BBEdit's comparative inflexibility in syntax highlighting, even in the new BBEdit 11 format, irks me, it's hard not to notice that it's a much better performer on giant files.)
There are two types of syntax highlighting: static and dynamic. Static is like GitHub/Gist/Pastebin, dynamic is like Atom/TM/Sublime. Static highlights the file straight through, and the result can be cached indefinitely. Dynamic highlighting in a text editor parses as little of the document as is theoretically possible, in response to an insertion, deletion or replacement.
For static there's tons of choices. Pygments, prism.js, GeSHi for PHP, etc. Any idiot can write a static highlighting system. But none of these can be used in-editor.
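To make the "static is easy" point concrete, here's a minimal Pygments sketch (illustrative only; the snippet and the caching note are mine, not GitHub's actual pipeline): lex the file once, emit HTML, cache forever.

```python
# Static highlighting with Pygments: lex the whole file in one pass and
# emit HTML. Since the blob never changes, the output can be cached forever.
from pygments import highlight
from pygments.lexers import get_lexer_by_name
from pygments.formatters import HtmlFormatter

source = 'def greet(name):\n    return "hello " + name\n'
lexer = get_lexer_by_name("python")
html = highlight(source, lexer, HtmlFormatter())
print(html)  # <div class="highlight"><pre>...</pre></div>
```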
For dynamic highlighting, there is only one game in town and that's tmbundles. Only TextMate has support for the hundreds of languages in existence, including the new ones that pop up each day.
I would love to replace tmbundles. I know just how to implement it. But the problem is, who is going to write all the long-tail language support? VHDL, Pascal, GAP, AtScript, Julia, ...
- - -
Interesting you mention BBEdit. I have a test file I call "the behemoth" which consists of a python file with 32000 copies of this:
""" """
The challenge is to insert """ at the top of the file and see how the text editor cries in pain. It's torture to a syntax highlighter.
To pass, the editor must:
1. Load the file quickly.
2. Have smooth scrolling inside the file, even after making the change.
3. Color the quotes properly through the end of the file, before and after.
To my knowledge BBEdit is the only editor to pass the test. Emacs is a good 2nd place.
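If you want to reproduce the test, generating the behemoth is trivial (a sketch based on the description above; the filename is arbitrary):

```python
# Build "the behemoth": a Python file containing 32000 copies of `""" """`.
# Open it in your editor, insert a stray `"""` at the very top, and watch
# the highlighter re-lex every line below it.
with open("behemoth.py", "w") as f:
    for _ in range(32000):
        f.write('""" """\n')
```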
The rule of thumb is that most languages are rarely used (usage follows a long tail), so if you support only the most-used languages, you necessarily drop support for most languages.
Then, depending on exactly how popular a language must be to be supported, you could end up breaking language support for quite a lot of them.
It's not that people necessarily choose to host portfolios on github, but more that employers will treat whatever's on there under your name as a portfolio regardless.
This sort of thing feeds my paranoia about GitHub being a giant single point of failure in the open-source world.
I know the argument: Someone, somewhere has a copy of each repo checked out, so we (the nebulous "we") could reconstruct everything from the diaspora of ".git" directories.
It just bothers me to think how dependent OSS has become upon GitHub.
I don't think it's paranoia. I think it factually is a single point of failure, which is a bad thing(tm). And GMail is a single point of failure for my (and my company's) e-mail, and that bothers me too, despite the fact that I'm not doing anything about it :)
On one project I worked on, someone created a bug: "system will stop working with the end of the world on <last popular end-of-world prophecy>" (yes, it was a joke)
Google is a giant single point of failure. Others would rush to fill the vacuum, but it'd be a brutal few months (years) while we all relied upon Bing.
Is there any GitHub-esque outfit waiting in the wings that provides free OSS hosting?
I can really recommend the open source GitLab for self-hosting repos. I have been running it on an EC2 micro instance for 5-10 people for years without a single problem.
They do: https://gitlab.com/gitlab-org/gitlab-ce/. They maintain a GitHub mirror to allow for easier contribution. GitHub is more than just version control software; it's a community.
We've been happily using GitLab for months now, can't recommend it highly enough. If there was an open source bug tracking/ticket app that played nicely with it I'd switch all of our teams over to it sooner rather than later.
GitLab is great for lots of projects; especially internal ones. The inability to control your splash page makes it less than ideal for OSS projects trying to be welcoming to newbies though. (When you navigate to a project page, instead of displaying the readme, it just shows recent commits, etc, which is nice for people working on the project but completely irrelevant to end users.)
There's also SourceForge, which hosts SVN, Git, Hg, etc. and is the largest host of FLOSS applications (GitHub is the largest host of FLOSS components and web code).
I'm not sure if it's still the case, but SourceForge used to have an approval process to get a new project / repo set up, which defeats a lot of the utility of GitHub for being able to store whatever projects you want to toss up there.
"To create a new project you simply register at SourceForge and then submit a new project request. Most projects are approved immediately, and you'll typically get an email notifying you of the approval in ~ 24 hours "
It might have happened between when I'd originally registered and the present. It does make sense as SourceForge was providing free FTP uploads for binaries at the time and the automated tools to handle abuse were pretty poor. Lots of folks were looking to abuse it for bandwidth and malware distribution. The tools are better now, which is one reason Github finally added binary releases last year.
I really like bitbucket. They don't have as many features, but free private repos, unlimited academic repos, means I'm not actually a GitHub user, except for the OSS projects that insist on GitHub.
True, but it's anything but the end of the world if GitHub vanished from the face of the Earth right now. Everything would be running full steam within a week.
There is team- and org-level authZ and authN data at github that is needed by companies working on cross-company projects. That data is not stored in code repos.
I disagree. Many OSS projects rely upon GitHub, especially projects that package or distribute other projects. Sure, everyone would eventually work out their own hosting, but it'd be a nightmare of changing URLs and uneven servers in the interim. I think a week would be extraordinarily optimistic.
If you have an external hard-drive backup of your laptop, that's 2 points of failure, right? If someone else has 10 external hard-drives that they keep in different places, that's 10 points, yes? But what stops you from calling all those hard-drives "a giant single point of failure"? If all of them are destroyed, the data is lost.
I just don't get these arguments... The chance of all GitHub data being lost is probably lower than that of BitBucket and SourceForge combined.
By "failure," it never occurred to me to mean a technical or operations failure.
I'm thinking: shut down due to the business model not working, or some other business-model variant, such as gradually getting intolerably crappy like SourceForge. Cash flow isn't great; let's introduce "sponsored" code. Etc.
You're right about the 10 hard drives: if you bought identical ones at the same time, it's likely they'll all fail within a short time span. I've recently heard about the "3-2-1" rule for backups:
- 3 copies of your data (1 original + 2 copies).
- 2 types of media (e.g. external HD and DVD).
- 1 offsite location (e.g. bank vault, your parents' house, etc.)
Of course it's not a golden rule, but it can prevent catastrophic failures.
It's organizationally a single point of failure. The company could easily decide to shut down access to all of its servers, and everyone relying on github for issue tracking, etc. would be screwed.
It's not limited to open source, but commercial enterprises pay for GitHub's services, so at least there's some sort of service agreement. OSS depends upon GitHub's largesse (which is extensive). However, if GitHub changes its mind (deciding, for example to focus on selling code-editing software), or if something goes wrong with the commercial side, then OSS is left high and dry.
Would a P2P version of GitHub make sense? Each outward-facing local repo would provide a GitHub-like interface for browsing code, submitting PRs etc., and would in addition route repo search requests to neighboring repos using something like a Gnutella protocol.
I don't think that would work well... Github has too much momentum. But I think open protocols, and folding more of the functionality that has been proven out at Github into the core SCM, could have a very positive impact.
To start with, pull requests could be implemented in the main git tool. They're no longer experimental, and many, if not most, git users rely on them in some form or another (Github, Gitlab and BitBucket all support them). Folding them into the core would just standardize all the implementations.
It would also be good to define protocols for collaboration. Off the top of my head, that could mean a fork:// browser protocol that would allow a BitBucket user to fork a Github repository and seamlessly submit pull requests back upstream. Some of that is possible today, but there would be some new requirements around federated authentication to enable this (i.e. how to allow a user who is registered with a different service to create a pull request).
If the mechanics of interoperability are standardized, people will develop competitors to Github and things will get more decentralized. But, as mentioned above, Github has a lot of momentum and no incentive to cooperate with other providers. The only way they're really forced to work with others is if these changes and standards are coming from the core Git project.
Should there be a limit on the size of any organization, to avoid this big bus factor? Like hierarchies starting to be needed after a group reaches 150 people.
Well, the current Internet pendulum has swung to nearly the polar opposite of federated services, much to the dismay of my inner software-libre graybeard.
Even a duopoly (!) would be preferable to a single vendor.
The initial drafts of CSS 2.1, on the other hand, were published in 2002, yet it was 2011 by the time it became a full Recommendation, despite having been in use for a long period by that stage. CSS3 colors (i.e. rgba() etc.) also became a full Recommendation on the same day.
> Two months earlier, in March of 1998, CSS 2 had become a W3C Proposed Recommendation (PR) which meant that it was considered "done" and was simply awaiting a procedural W3C member review and vote. For all practical purposes, nothing else was going to get fixed in CSS 2. There wasn't a Candidate Recommendation (CR) phase back then, as evidenced by the fact that no-one (including my Tasman team as part of Microsoft Internet Explorer 5 for the Macintosh) was able to implement CSS 2 as specified. The problems in CSS 2 were far more severe than mere errata - we had to develop a full revision to fix it.
How hard is it for people to switch, if they want/have to? They might be dependent on GitHub issues. That might be preventable by using some issue tracker that can actually be put under git (like Ditz), though I guess it makes collaboration with non-owners of the repository more difficult. Other than network effect, what else is it that binds people to GitHub?
F# highlighting is also now totally gone with this change. Code is highlighted with some random lexer that doesn't even understand // comments. Rather frustrating, given that a lot of F# development is centered around GH, and GH themselves use F# in a couple of places.
Browsing the issues list, this isn't just "fringe" languages, either. Perl, PHP, Go, and Clojure all appear to have regressed to some degree.
I don't know about pygments, but my experience with writing a custom highlighter for Sublime Text (aka TextMate, Atom, and what Github seems to use now) was that it is not really a good and reliable system for highlighting.
It is really easy to highlight simple things (keywords, numbers, ...). However, when it comes to more complex scenarios (e.g. where the type of a word depends on the previous one), the single-line regex-based mechanism shows its weakness. Because of that, many language-support plugins will yield wrong results when you start to split things like function declarations over several lines, even though that's perfectly legal in the languages.
Some things can be worked around with the start/end regexes, but nesting those multiple levels deep can get quite awkward, and I don't think they were designed for anything beyond braces and multiline comments.
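For contrast, here's roughly how a stateful lexer copes with this in Pygments (a minimal sketch; the toy language, its rules, and the `ToyLexer` name are invented for illustration). Explicit states carry context across line breaks, so a construct split over several lines still lexes correctly:

```python
# A stateful Pygments lexer: the lexer pushes and pops named states, so
# context (e.g. "inside a multiline comment") survives across line breaks.
from pygments.lexer import RegexLexer
from pygments.token import Comment, Keyword, Name, Text

class ToyLexer(RegexLexer):
    name = "Toy"

    tokens = {
        "root": [
            (r"/\*", Comment.Multiline, "comment"),  # push the comment state
            (r"\bfn\b", Keyword, "funcname"),        # next word names a function
            (r"\w+", Name),
            (r"\s+", Text),
            (r".", Text),
        ],
        # Stay in this state, across any number of lines, until */ closes it.
        "comment": [
            (r"[^*]+", Comment.Multiline),
            (r"\*/", Comment.Multiline, "#pop"),
            (r"\*", Comment.Multiline),
        ],
        # The token type here depends on the previous word ("fn"), which a
        # single-line, context-free regex pass cannot express cleanly.
        "funcname": [
            (r"\s+", Text),
            (r"\w+", Name.Function, "#pop"),
        ],
    }
```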
Therefore I don't know if Github's move here is a really good choice. However, I think their main motivation might be that this file format already has such a big ecosystem thanks to TextMate, Sublime and Atom, and that the parser is fast, so they went for it.
(Generic identifier looks the same as class identifier looks the same as a property in an object literal looks the same as a string. I guess maybe orange is just the color used left of an equal or colon, but in that case the string color should be different.)
I assume the new syntax highlighter is way, way faster than pygments. It's written in C++ rather than Python. (Atom uses the same grammar format, but a Node implementation.)
>By using TextMate grammars we also get some nice features like highlighting SQL inside Ruby heredocs. But the main motivation was improving performance.
That's the nice thing about actual software with actual versions that you actually install: it doesn't change out from under you at someone else's whim. No sane person would use a "Cloud" C compiler. Of course, GitHub is just a mashup of online backup and Facebook, so it doesn't matter if it breaks.
It's not like they broke git, or some actual important interface. They stopped syntax highlighting on non mainstream languages until new lexers can be written. What on earth would you as a customer have done if they had said, "syntax highlighting in the browser for <your-favorite-language> is going to go away in a week"? Are you going to migrate away from github? Or shrug and say, "bummer"?
Github did the most efficient thing: break it and let the people who actually care fix it.
I don't think it is necessary for Github to provide choice for this sort of thing. It isn't a ground-breaking, massive change to their main product. To me, it seems like an unnecessary choice that would only create confusion and make their product more difficult to navigate and use.
Dear downvoters, have you heard of the Streisand effect? This thread now has the most downvotes (-14) of the many comments I've posted on HN. There is nothing factually incorrect in my posts. Why does this question deserve five downvotes?
> Why not give existing users the choice of migration and timing?
People are downvoting you because you're bikeshedding like a madman. The post I'm replying to will soon be downvoted too, because you're whining about getting downvoted.
This situation is very different from the Streisand effect. In the Streisand effect, a single party is trying to silence everyone else. Here everyone is trying to silence you.
They are nearly opposite situations. A minority trying to silence a majority vs a majority trying to silence a minority.
Downvoting is (afaik) for not meeting the norms of polite / intelligent discourse, as opposed to "we disagree with your opinion". There are opinions which, if expressed, breach our taboos - the usual range of racism, sexism, etc. - but this is certainly not the case here.
So no, I think downvotes for a comment that is polite, intelligent, on topic and part of a discussion was the wrong use of downvotes.
I still don't agree with it, of course :-) User choice is a bugger on cloud services, especially where they are shelling out to run the pygments lexer.
On top of which is probably the interesting issue that there is no longer a binary on/off for most features. A/B testing, feature toggles, and staged rollouts all mean we never quite know which version of a service we are running.
But those cultural norms are also only strong in a small subset of users. Plenty of people vote to agree or disagree, and this was acknowledged by pg. "Don't downvote to disagree" is something brought over from Reddit, and there is some strong resistance to it among some long-term users (here longer than six years).
It's worth noting that even syntax highlighting on common languages like Java is currently messed up on Github. I hope it's fixed eventually, but kinda lame to take something that wasn't broken and break it.
GitHub: a product so close to our hearts that even extremely well meaning changes that really do help in many cases can sting sometimes.
I normally wouldn't understand this type of thing (others say they don't see the problem and it's quite clear where they are coming from), but in a way I _do_ see the author's point of view. When you build something people really care about, any change, no matter how minor, has the opportunity to impact someone. That's why we all build things, isn't it?
I understand the author's exasperation, but the post is surely filled with loaded words and wild speculations. It may get the message across, but this is not the way to start a conversation.
Rendering of reStructuredText disappeared in GitHub Enterprise ~6 months ago... I wonder if the reasoning was the same. More limited resources in the GHE VM environment?
Racket is a fringe language. Github has about as many Prolog repositories as Racket repositories.
If Racket syntax highlighting was causing performance issues that were noticeable to Github, performance must have really sucked. Why should Github let Racket drag down its capacity?
Is that the message an open-source vendor wants to send to developers: your code is welcome, unless it's a new and unproven "fringe" language?
How would a "fringe" developer feel about that vendor's brand after their language has become successful? Would they like/trust/recommend the vendor to upcoming developers?
"""pygments.rb had an interesting history of trying to use a Python library in Ruby on a high-traffic web site. They had tried various approaches. The final approach — piping to a long-running Python process — seemed to work well. """
That doesn't fill me with a lot of confidence in this thing.
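For what it's worth, the "long-running process" trick is more mundane than it sounds. A rough sketch of the calling side (not pygments.rb's actual code; `worker.py` and the JSON-lines protocol here are hypothetical):

```python
# Keep one highlighter process alive and pipe requests over stdin/stdout,
# instead of paying Python interpreter startup cost on every request.
import json
import subprocess

# worker.py (hypothetical) loops over stdin lines, highlights each payload
# with Pygments, and prints one JSON result per line.
worker = subprocess.Popen(
    ["python", "worker.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def highlight(source: str, lexer_name: str) -> str:
    request = json.dumps({"code": source, "lexer": lexer_name})
    worker.stdin.write(request + "\n")
    worker.stdin.flush()
    return json.loads(worker.stdout.readline())["html"]
```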
It is still difficult for me to express my opinion of this move without simply resorting to strings of profanity. Frankly, I suspect Atom has more to do with it than anything.
Why is Atom more strategically important to Github than existing developers? Is there an economic belief that Atom will reach some larger base of untapped developers who don't know how to use other editors?
If a vendor cannot be trusted to host your content without regression, why would you trust that vendor to supply your mission-critical editor?
No, but the article highlights that maybe they are using Github's traction to get a better syntax highlighter for Atom. As in, instead of waiting for a large community of Atom users to contribute Atom syntax highlighting, they simply push the same engine onto Github and let that community take care of the syntax highlighting they actually care about.
If this were the case, it would be slightly evil, but very clever business-wise. But we don't know, so all this is just a small conspiracy theory.