Twenty years is nothing (deprogrammaticaipsum.com)
225 points by ingve on March 4, 2024 | 285 comments


Young devs won't believe it, but until the early 2000s, most companies used source control systems that forced you to LOCK any file on a centralized repo before you were allowed to modify it (MS's SourceSafe, ClearCase, Perforce)... This was often the case even if you just wanted to touch a file locally for some experimentation... insane.

So, at a customer site (while I was doing consulting for them), I couldn't stand it anymore (I had been using CVS myself since 1997). So I installed SVN to work on their project and showed it to the team. I was called an "irresponsible engineer"... "Modifying files without locking is crazy! You don't do that on real engineering teams!"

The open-source world was at least 10 years ahead.


Young devs won't believe it, but in the late 1990s, quite a number of software companies didn’t use source control at all. You just copied source files from/to a central location, and from time to time made a version_x.y.z copy of the source directory.


I briefly worked for a company as recently as 2010 that didn't use source control at all, and didn't even do the "copy source files and version their file names" practice. Whenever they had to cut a release for an angry customer, they'd just find an engineer whose local copy actually built successfully, bump the version number in the code, build on his workstation, and release that. CEO didn't want his software engineers doing anything but writing code, including setting up source control, tooling, automation, tests, documentation...


this rings so painfully true for me. Except it was in 2008


Devs who haven't worked in industrial controls won't believe it, but the overwhelming majority of critical OT software still doesn't.


Because the lunatics who wrote the software turned it into a stupid contraption where the source files are mummified in xml and entombed in a zip file or some other proprietary binary container. PLC coding standards should be taken out back one by one and shot twice in the head.


As a Rockwell user, I can't agree more. But check out Copia Automation if you want to see an improvement to this problem.


What's OT?


Operational Technology.

It's basically IT but for industrial operations instead of commercial.

https://www.cisco.com/c/en/us/solutions/internet-of-things/w...


Thanks a lot!


Also production servers with /etc full of files appended with .bak, .bak.oct, .bak.incident, .bak.goddamnit2


I spend way too much time thinking back on how many things like this I've left around


You may be the only one. I wish the vendors I deal with spent even a nanosecond thinking about leaving stuff like that around. (Let alone multi-gigabyte copies of production databases, long-since obsoleted patches, etc.)


Or possibly RCS, which was kinda-OK for config files.

https://en.m.wikipedia.org/wiki/Revision_Control_System


I remember these times. But diff tools were still used in the '90s even when no SCM was available?


I see we worked at the same place.


True story, I was around in the 1990s so I was aware of those practices. Really advanced people would have a scheduled job running every hour (or on some basis) to keep an old copy just in case it was needed, usually a zip or tar file. So, a really crude versioning system.

Fast forward to 2008, and I was working on a project with an older person who would do this: use a scheduler to create a zip version. One of the younger guys I worked with said it was a crazy system and didn't understand why someone would do that. I told him it was an advanced practice just 10 years ago. Basically, people did a lot of stuff that would seem crazy now given the tools we have today.
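
For illustration, the whole "scheduled zip as versioning" trick is only a few lines. A crude sketch (hypothetical paths, Python used purely for illustration) that could be run hourly from cron or a Windows scheduled task:

    # Hypothetical sketch of the "scheduled archive as crude versioning" job.
    # Not anyone's actual script; paths are made up.
    import shutil, time
    from pathlib import Path

    SRC = Path("C:/projects/app/src")   # assumed source tree
    DEST = Path("D:/backups/app")       # assumed backup location

    def snapshot():
        DEST.mkdir(parents=True, exist_ok=True)
        stamp = time.strftime("%Y%m%d-%H%M%S")
        # make_archive appends ".zip" to the base name itself
        shutil.make_archive(str(DEST / f"src-{stamp}"), "zip", SRC)

    if __name__ == "__main__":
        snapshot()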


> I told him it was an advanced practice just 10 years ago

This is a problem that can affect everyone, especially those working in more isolated areas. You come up with a great workflow in year X that's 5 years ahead of the industry, and it works, so you never progress.

Meanwhile the industry catches up, solving the problem another way, but you never learn because it's a step backwards, until it isn't, and then it's a massive learning curve.

You have to actively keep up with what the industry is doing and be willing to adopt the new ways of working even when they are in some ways a step backwards.


Goldman Sachs set up continuous integration (with automated testing etc.) back in the late 1990s / early 2000s for the ecosystem of their in-house language 'Slang'. Great system, way ahead of its time. (The language was also pretty neat for its time. You can tell that the people who came up with it had a Lisp background. Think of it as a worse version of today's Python. And probably on par or ahead of 1990s Python.)

In any case, because of when they built that Continuous Integration system, it was all jerry-rigged on top of CVS. Problem is, they were still using that system for Slang 15 years later.


The last I heard of companies using the naming convention version XXXXXX.extension, where each X is a feature flag, was in 2023. Bringing source control into practice was a controversial idea. No, I didn't work there.


Both of you are right. Right kind of right :).


Young devs won't believe it but that is still the case today in parts of my company


I remember this. It pushed the team I was on to adopt CVS (this is before SVN).


Same. This was in 2003.


Ha, that's still better than editing files directly on a shared drive.


My first job in 2012 was still working this way....


Old dev here--that SourceSafe behavior could be disabled entirely ("Allow multiple checkouts" in VSS admin), and you could undo other people's checkouts. It wasn't nearly as bad as people today make it out to be. The merge conflict UI was really nice, actually. Back then, as it is today, the bigger issue was the choices companies made in setting up their environments and the rules they set for their developers, rather than the shortcomings of the tool.


If people knew about it, sure.

I was tasked with 'finishing up' the work of someone else who was away on vacation. All his code was in SourceSafe... locked to just him. And he was away for another 9 days with no cell phone or email (not really a big thing in 1999) and... I was immediately getting pestering emails from the PM: "When is this getting finished? Ben said it was nearly done, just needed a few more bits." There was pretty much nothing I could do. VSS was something he'd set up on a server that only he had a password to, and... no one had thought to have him coordinate with anyone before he left on vacation for 2 weeks.

That was one of the first times I felt "everyone else just thinks we're interchangeable cogs". Possibly the first time. If one of the PMs took vacation, or one of the accounting folks, there was some defined handoff process "to keep things running smoothly". No one seemed to give a toss about "devs" in that case.


SourceSafe was rather nice with its integration too. Right from the GUI (amazing!). The downside to SourceSafe was its tendency to randomly corrupt things and then you get to deal with that for the rest of the day instead of working on things.

Locking had some nice side effects. The devs talked to each other. Things where someone was getting in the way all the time showed where the code needed to be abstracted better. In practice it was not that big of a deal (all in 1995).


I worked on a project that used SourceSafe for several years. Never recall any corruption.

What was really different from today though is that most developers did NOT run the application locally or in an individual environment. They all worked together in a common dev environment. It was either too difficult or impossible to run copies of the application locally. So that is why file locking was used. You had an entire team of developers simultaneously working on the same set of files.


That was not my exp at all. We ran out of our local source dir. Think my manager would have had a fit if the checkout failed to build on his machine. We tried to keep our dev machines as close to what an end user was going to end up with and keep all the devs in sync with each other. So if an issue came up you could have a few people look at it.

You could also 'break the lock' locally then fix it up later (though this was discouraged). It also kind of forced you to make sure what you checked in built. As you would have 5 other devs coming over to ask why you broke the build. It was a manual process. Most of that sort of thing is handled by CI/CD type systems now. You can break the build on your branch. But no merging up of that junk...

We used a rudimentary CI/CD to enforce 'always builds and runs'. Someone wrote a bit of script that would check it out at 5PM and kick the build on a 'blessed' machine. If that failed the people who checked it in were on point to fix it the next morning.
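
Roughly the shape of that script, sketched here in Python with placeholder commands (the actual checkout and build invocations from that era aren't known, so they're hypothetical):

    # Sketch of the 5 PM "blessed machine" check. The checkout and build
    # commands are placeholders, not the team's real ones.
    import subprocess, sys, time

    STEPS = [
        ("checkout", ["your-vcs-get-latest"]),   # placeholder command
        ("build",    ["your-build-command"]),    # placeholder command
    ]

    def nightly():
        for name, cmd in STEPS:
            try:
                result = subprocess.run(cmd, capture_output=True, text=True)
            except FileNotFoundError:
                print(f"{time.ctime()}: {name} command not found: {cmd}")
                return 1
            if result.returncode != 0:
                print(f"{time.ctime()}: {name} FAILED")
                print(result.stderr)
                return 1   # whoever checked in last fixes it in the morning
        print(f"{time.ctime()}: checkout and build OK")
        return 0

    if __name__ == "__main__":
        sys.exit(nightly())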

It was not better or worse. It was just different and more manual. Manual though means steps get forgotten or skipped.


When I was in high school I did an internship at a family friend's ASP.NET shop; we used VSS and I once got a call from an irate coworker who shouted me down for minutes, until I started crying, over keeping a file locked after I wasn't using it. I wasn't the one who had left the file locked this time; I'd done it in the past and he assumed it was my fault again.

sure do wish someone had checked those boxes for him!

Sometimes it's still a surprise to me that I decided this is the career I still wanted to pursue after that experience, but I haven't been shouted at like that since then.


We were a small team and loved Visual SourceSafe. I also wondered about his claim that he couldn't download a copy - yes, he was unaware you could allow multiple checkouts.

We were forced to migrate to SVN which I remember as being a pain compared to VSS.

The worst was our Git migration. At the time it was not an easy integration with Visual Studio. The process of migrating history was a challenge. We lost control of the process - suddenly Git and corporate defined the process, which for a small team was overkill and burdensome. The builds became laborious and time consuming.

This was years ago. I'm sure they've worked all that out now.


> Back then, as it is today, the bigger issue was the choices companies made in setting up their environments and the rules they set for their developers, rather than the shortcomings of the tool.

I partially agree. But: one of the really nice things about distributed version control systems like git is that the guys running the servers can't make rules about what you are doing locally.


> It wasn't nearly as bad as people today make it out to be

This flies in the face of my personal experience (first 7 years in industry was before Git adoption), and also in the face of what most developers at the time said.

I mean, just look at Joel Spolsky, one of the most popular developer-bloggers, who also founded StackOverflow, Trello and other things. IIRC, he wrote a blog post calling the invention of DVCSs the most important innovation of the last decade, then he put his money where his mouth was by creating a new product specifically built around the idea of DVCSs. (Though he made a "bad bet" by focusing on Mercurial instead of Git, IIRC.)


The open source world seems to always be decades ahead of "most companies" on every single dimension.

Just look at all the craziness about being able to manage software dependencies by publishing different microservices versions, or all the emphasis on software testing without any other kind of correctness verification getting the spotlight.

Oh, and those are the advanced companies out there. Most of them aren't even there yet.


visual studio's debugger is miles ahead of gdb and has been for 30 years


...package managers to (un)install software. Red Hat Linux has had RPM since at least 1998.


We’ve already come full circle.

Nobody uninstalls packages from a Docker container…

With immutable designs, we’re just reinstalling the whole OS on each update…just a bit more efficiently thanks to cheap storage.


Slight tangent:

Back in the primordial olden days, people used to patch binaries. These days, we just compile them from scratch, when we want to make a change.

So it's not too surprising that people apply the same approach to the whole OS installation.

Mutation is hard to get right!


With ephemeral NixOS, I'm reinstalling the whole OS (sort of) on every boot!


> Young devs won't believe it

Sure they would. They believe anything. I made up some nonsense one day that bytes weren't always 8 bits and they believed that shit.


Umm, how to break it to you...


I suspect you are joking, but bytes weren't always 8-bit long, were they?


I think the OP was being ironic.


They were, if you were born in 1993?


Was this ironic? Bytes weren't always 8 bits.


Baudot enters the chat... 8)


12-bit words enter the chat


> The open-source world was at least 10 years ahead.

I'm not sure it is the win you're suggesting. I had brief stints with CVS and TBH it really sucked.

Luckily, when I started coding seriously (which coincided with my first job), SVN had just reached 1.0 and it was OK. This was 2004 already.

IMHO, the whole situation was fixed only after Linus decided he had enough of the crap and wrote his own.


CVS certainly sucked by today's standards, and SVN was indeed a big improvement, and Git much more so.

All I wanted to stress is that the "lock-before-you-can-touch" approach was the norm in the enterprise world at the time. IMHO, this fact alone made SVN (even CVS) better than all the commercial products which I used at the time.

When some of the "enterprise" world started adopting SVN and commercial products started relaxing the locking mandate, the open-source world had already embarked on the decentralized model (git, darcs, arch, mercurial, etc.).


I used ClearCase quite a bit (and other (ir)Rational tools) at a previous job. It sucked pretty hard. But, it didn't _require_ you to lock files. It _strongly encouraged_ you to, and made working with hijacked files a pain in the ass, though. In practice, this was largely worked around with branches. And, branches from branches... Kind of mimicking the distributed workflow we know with git, with more branches.

But, yeah, ClearCase sucked pretty hard and was frustratingly slow. Also, too easy to miss adding files to source control and really annoying interface to _find_ view-private files. 0/10, would not use again unless I had a gun to my head.


At the time we thought our tools were good, and really meant it.

Now it's different, we think our new tools are good, and we really mean it. But it's different.


> I'm not sure it is the win you're suggesting. I had brief stints with CVS and TBH it really sucked.

What had you used that was better? CVS seemed like quite a step forward compared to RCS and the other available options.


The point I was originally trying to make was that they all sucked before SVN came out.

Especially when compared with the options we have now, I think it's slightly misleading to say OSS solutions were in any real way better back then...


Hello Bitkeeper, hello Mercurial!


That wasn't the best bit. Try having someone in the organization leave without checking back in his or her code before they left. Guess whose IT department is getting a phone call to manually break the locks. ;)


Or the guy who really just needs it:

"Hey, Alex. I saw you've head BasePage.aspx locked since last month. Mind if I work on it for a bit?"

"I still need it."

"Any idea when you'll be done with it?"

"No."

This is what the waterfall model was made for.


Even better: You could set your system clock to a future date, check in some code, roll your clock back, and leave the company. Version Control Time Bomb!


I was at Intel in the late 80's through the 90's. We used RCS for chip design (storing IHDL and validation test suites), and SourceSafe for any group writing windows device drivers (MASM 6.1 anybody?).

You reminded me of locking the centralized repo, I completely forgot about the headaches that caused! I'm now recalling how painful source safe was. I wonder if trauma caused me to forget.

Eventually everything migrated to SVN as part of the Linux migration, including windows desktop revision control. I was so used to the SVN/CVS patterns that I strongly resisted git later on after I'd left. I'd say it took me a good two years of grumbling before I finally opened my mind to git, and in hindsight it's brilliant. The biggest thing that I had trouble letting go of was having a monotonically increasing version number of the repo. That felt ... safe. I mean, now you just have to be liberal with tags.

I agree with the OP, git couldn't work 20 years ago on desktops because drive sizes were too small. Sure it could work on distributed computing, but AFS drives were slow, and NFS quotas were teensy. Git only worked because drive sizes exploded.


I've been using perforce continuously since 2000. It is rare to lock files. The only case I'm aware of are things that are binary blobs to perforce, eg, if you open a windows .doc file to edit it. But 99.9% of the time I'm operating on text files of one kind or another and perforce doesn't lock it; it is the usual merge and then sync/resolve if anyone has changed the file since you opened it.


Seems that CI was also nonexistent then.

QA was one guy doing manual testing with a checklist.


QA was the dev doing manual testing with one eye closed...


Young devs won't believe it, but until the early 2000s...

...a lot of companies didn't use any form of source control at all. :)


In the late 2000s I worked at a lab that used subversion, which I also found annoying after getting used to git. My hack was to use git-svn - it let you locally use git, and eventually sync your work back to svn. Worked like a charm. The rest of the lab soon got sold on git afterwards, though.


Perforce is still in wide use today in certain types of organizations.

In all the configurations I'm familiar with, one can open a file for modification without locking it, and the only effect it has on others is that they will see that you are modifying it in one of your workspaces.


I worked on a system with more or less this workflow…

In 2018


"There are two interesting contenders worthy of mention: Pijul, written in Rust"

Why is it that every time something is written in Rust, it needs to be mentioned, as if the language of choice magically made it much better?

Version control is just a lot of bookkeeping, so I doubt possible added performance (outside some well-defined algorithms) can matter much.


I can answer that question, as both the author of Pijul and as someone most Rust zealots usually don't like very much.

Nothing magical in Rust; at the time it was the only language that worked seamlessly on Windows and Linux, and made it easy to manipulate on-disk data structures, which is what makes Pijul fast. The new take on version control that Pijul brings can be roughly described as seeing "files with conflicts" as a CRDT. It was therefore crucial to be able to manipulate complicated data structures without loading anything, and Rust seemed to let us do that without having to do C++ and spend our time debugging allocation issues in our tricky algorithms, while never being really confident that we were done.

The library I wrote for this, called Sanakirja, is actually faster than the fastest C equivalent, but this isn't because "Rust is magical", far from it.


Is there a good video introduction to Pijul?

Is there an online tutorial where you are given small tasks to do in Pijul?

I'm curious, but not willing to read in-depth documentation just yet :)


Because mentioning it works. As somebody who uses many different languages professionally, I immediately assume projects written in Rust are more reliable, more robust, and more efficient. My own projects are higher quality when I can use Rust to implement them, despite having much more experience with most of my other languages.

As a user experimenting with tools, ones written in Rust have justified these prejudices. They simply tend to be better.

It's an effective advertisement. I'm a little confused why there's always someone who brings it up. It's not like programming language makes absolutely no difference. If it's written in C or C++ and it's relatively young, I'll expect it to be unstable and to have a clunky and unpleasant interface. If it's in Go, I'll expect much better in stability, but with a funky CLI. If it's Python, I know it uses argparse, so at least the CLI will be consistent. If it's Rust, I expect it to be fast, to have very few or no panics, to have no memory issues, to have zero null pointer problems, and to use clap for a very consistent, predictable, and good looking CLI.

The language has built a good reputation; I'm shocked that commenters on HN have somehow avoided picking up on that and are perpetually surprised that it has a real audience.


I personally find the mention of the implementation language as part of a project description to be slightly interesting/informative, or at worst benign. Unlike the inevitable comments complaining about it.

Mercurial, written in Python, definitely experienced performance problems. Hilariously, the first search result I found when searching for "mercurial performance python" is https://wiki.mercurial-scm.org/OxidationPlan, which discusses reimplementing parts of Mercurial in — you guessed it — Rust.


Thank you! And informative.

Some days are just like that. Your synapses trigger on something because of historical reasons.


> Your synapses trigger on something because of historical reasons.

When you read a product description written by someone who hopes to draw positive attention to the product, it'll usually draw attention to its most important and distinctive features first.

For example, if you look up Stripe, you'll get "Stripe | Financial Infrastructure for the Internet - Stripe powers online and in-person payment processing and financial solutions for businesses of all sizes."

If you look up Uber you'll get "Get a ride in minutes. Or become a driver and earn money on your schedule." Clang? "The Clang project provides a language front-end and tooling infrastructure for languages in the C language family". Ubuntu? "Ubuntu is the modern, open source operating system on Linux for the enterprise server, desktop, cloud, and IoT."

When the place you usually see the product's best feature says "written in rust" it sounds like they're damning it with faint praise.

Of course, not everyone follows the 'best feature first' pattern. For example, when asked about Stripe the first thing Wikipedia tells you is that it's dual-headquartered in South San Francisco, California, United States and Dublin, Ireland.


Wait, nobody in the Pijul team brags about being in Rust. If Pijul were written today it would probably be in Zig, for many reasons.


I'm fascinated by this. I like both Rust and Zig, but I would have guessed that for the kind of complex data structure manipulation required for CRDTs and other clever algorithms Pijul employs, the degree of reliable correctness Rust gives you would win out. Are you doing something (like entity-component models?) that disables much of Rust's smarts? Or is it just that much more pleasant to program in Zig?


I've done very little Zig, so this may not be very informed. Indeed we don't use Rust intrinsically that much. With associated type constructors, Rust would be the best language (I know of) to write Pijul in. Without it, most of the polymorphism uses macros anyway. Zig or Rust would make little difference for Sanakirja I believe, and this is where the entire CRDT thing is done.


As somebody who's interested in both languages and follows Zig development relatively closely, I'm interested in these many reasons. Can you give a few examples of Zig features or other reasons that would be useful for Pijul over Rust?


It's mostly about Rust adding lots of features I am not interested in, and not adding the ones I need for the project. Sanakirja was hard to write in Rust, not a single concept of the language matched what I needed, in the end I had to write tons of macros, and the API is hard to use. Zig would have probably made it more natural from the beginning.

There are other things related to the community/zealots/Mozilla/Rust foundation, but I'm not sure this is the proper place.

Edit: Git zealots are worse than Rust zealots, I attribute this to Git being "harder to learn" (i.e. never really does what people think it does) than Rust.


But implementation language is also predictive of the values and style likely to be embraced by a project. I found my current job (at Square) because in 2015, I got on the Atlanta 404 Slack, and asked “Who in Atlanta is doing interesting things in Go?” For Go, that has diluted over time as Go has become more of a default choice, but in 2015, it was highly predictive of a certain sort of work and a certain mindset.

And, in some domains (version control is one of them; type checkers/linters/bundlers/compilers are another), it has proven to be extremely important.


That's the criticism I make of all these services and software that use the privacy argument.

Most of the time, privacy is not hard; you just have to not do stuff (tracking, data collection, etc.), and maybe use some crypto library. But before telling me what you don't do, what do you do? For example, when I see a search engine claiming privacy, I think "yet another Bing proxy"; now, what does your search engine have over the other Bing proxies? Better results? Performance? A specific niche? Better UI? That's what I want to know.


No worries. And sorry for my unnecessarily sharp response! :-)


I believe this is twofold:

1. Unless you abuse Rust horribly, it is both memory-safe and fast (which to me implies a degree of quality, unlike Python which makes me more... apprehensive in using tools written in it)

2. Rust is still not a major language but has an enthusiastic community. Rustaceans like to spread the word!


> apprehensive in using tools written in it)

Yes. I have that too; mostly due to the very low quality of code ‘hacked’ in the language. It is the (not so) new BASIC and everyone and their sister are launching things in it that should be considered ‘weekend side hacks’ at most, but are announced as the next best thing.


3. Someone who deliberately chooses to use Rust is someone who cares enough about the program being correct to put up with the borrow checker and all that entails. This increases my expectation of the quality of the resulting program.

4. The resulting binary will be a static self-contained compiled binary. This means a) no cluttering up your environment with dependencies b) fast start-up time


Go also does the latter to be fair.

If it weren't for Rust, I'd use Go. But Rust's development tools are incredible with rust-analyzer, clippy, etc.


The one point on the list that Go doesn't automatically bring is #3. All the others are there.

Also 5) it will keep working over time.

I know that if it's in Javascript, it will take a while to make it run at all; in Ruby I've spent too long reading your site, it's already incompatible; in Python it will break in a couple of years; and in PHP it will keep doing what it promises, but I'll have to work again and again to keep it only doing what it promises.


Well yes, and I'm also more likely to install a random golang tool than a random python tool. :-)


> 1. Unless you abuse Rust horribly, it is both memory-safe and fast

The fact that you don't mention computational complexity as the primary concern here tells me that there is something wrong with programming as of 2024. And not necessarily with Rust per se but maybe all the other languages.

Why should humans bother with anything other than choosing the right algorithm for the task? And possibly not even that. It should be "combine X with Y such that I get Z"

Rhetorical question. I know I open myself up to many counterarguments, so please choose wisely to keep the discussion alive.

And maybe it is just as simple that Rust is the answer to that question?


There are times when the right algorithm dominates, and you end up using something written in a language that doesn't fit your expertise/deployment system/preferences/etc. because it's the only thing that implements a particular thing well. There are a lot of great tools like Redpanda (C++) where there isn't a memory-safe alternative filling the same niche.

But as a professional programmer for 25 years now, I'd say the fraction of our job that boils down to finding, designing, or implementing the right algorithm is sadly tiny. It's all the messy details that end up soaking our time and attention. One way to increase the amount that algorithms matter is to work at greater scale, which is basically why I moved from "programming at some corporation that does something else" to Silicon Valley technology companies.

The other important detail is that different languages allow for very different kinds of refactoring. I've found the Oxide and Friends podcast to be possibly the most powerful vote of confidence in Rust that I've ever seen in this regard. In their most recent episode -- https://oxide.computer/podcasts/oxide-and-friends/1734108 -- they discuss how Rust allowed them to make sweeping and rapid structural changes to their storage code that they wished they could have been able to do in the underlying layer -- ZFS. (This from folks who helped create ZFS or were at least adjacent.)

If your language allows you to make sweeping refactors safely, it enables you to mutate your code towards better algorithms for the task.

Also, it can give you more time to focus on the algorithms. If I, a novice C++ programmer, were allowed to actually use C++ in anger, I'd spend all my time trying to figure out memory safety and ownership problems, not focus on the algorithms. I'm simply not willing to put in the multiple years of hard work it would take to be able to program C++ safely. But I'd be willing to let myself, a novice Rust programmer, use Rust. It might not be pretty, and the borrow checker and I might come to blows, but it wouldn't be dangerous.

(fwiw, I'm really a Go programmer. But I've lurked many programming languages and their communities during their early phases (Go, Rust, Zig, Roc, Perl6), and it's hard to argue that Rust is _really_ good at certain things.)


That's fair criticism. I was writing this under the premise that you get the basics right and don't have needless garbage between you and CPU execution.

There's also the fact that Rust tends to draw those who are already proficient with computational knowledge. Whereas the typical Python programmer is just a poor dud that never had any formal education in accessing computer resources.


So if i loop back to the original message "Written in Rust", what they mean to convey is that "this is done by smart people" ;)


You jest, but if we are talking average it would not be hyperbole.

But what I want to add to my last point: most performance pitfalls I have seen are not due to choosing the wrong algorithms, but to bad implementations driven by messy code and low maintainability. Python and JS are absolutely among the worst of the bunch, and weak dynamic typing is a terrible burden to maintain and scale.

Mind you, it depends on which domain we are talking about. If you are working close to the metal and you are using something like PyTorch, your performance is indeed determined by choosing the right algorithms and language overhead is less significant in comparison. But most software is just moving things around in memory and do a little networking.


Reminds me of an old joke among auto enthusiasts: "How do you know if someone drives a manual transmission? They'll tell you".


The most profound dilemma in life is a vegan, rock-climbing, manual-driving, Crossfitter trying to decide what to tell you first.


In Sweden she would tell you she is winter swimming.


Only a kid may consider it worthy to mention driving a manual. Manual transmission is the natural state of things. If you can't drive manual, you can not drive.


Are you from Europe?


Yes, I am. Does it matter?


If it's closed source, it makes no difference what the implementation language is. But if it's open source, it matters because part of deciding which open source tools to use is considering whether you would or wouldn't be comfortable fixing issues or contributing features yourself. Rust in particular seems to have developed into a net positive signal in this consideration. But it's also useful information for people who see it as a drawback!


To prove that at least a single piece of novel software was ever written in it. It's an improvement over the decade we had here and elsewhere of people asking devs to rewrite their software in it; a phenomenon big enough to have its own acronym (RIIR).


> Why is it that every time something is written in Rust, it needs to be mentioned

Correspondingly:

Why is it that every time it is mentioned that something is written in Rust, someone has to ask why it is mentioned?

Two answers for your question:

  * Because Rust has some special qualities, and evangelism is real.

  * Because we, as the developer audience, afford some additional consideration to choices made by the developers of the open source tools we use. Sometimes because we contemplate making changes, or at least reading the code to understand behaviour better.


No good answers to my question, unfortunately.


> "so I doubt possible added performance can matter much."

From the linked "Mozilla moving Firefox to Git" blog post:

> "a Mercurial and a Git mirror of the Firefox source repository [...] I think I just wanted the speed that Mercurial was not providing. Not that Mercurial was that slow, but the difference between a couple seconds and a couple hundred milliseconds was a significant enough difference in user experience for me to prefer Git"

> "as time was passing and the size and depth of the Mercurial repository was growing, these tools were showing their limits and were too slow for my taste, especially for the initial clone."

> "My user experience with Mercurial had reached a level of dissatisfaction that was enough for me to decide to take that script from a couple years prior and make it work for incremental updates. [...] You could get a full clone of mozilla-central in less than half an hour, when at the time, other similar tools would take more than 10 hours (needless to say, it's even worse now)."

> "the repository growth didn't take a break, and the Try repository was becoming an increasing pain, with push times quite often exceeding 10 minutes."

And in the comments:

> "There is at least one currently very active (but private) repository that is (much) larger than Mozilla’s at Jane Street. We (Octobus) are working with them to improve a lot of aspects of Mercurial’s speed."


The ancestor of Pijul is Darcs, whose main weak point was slowness. Git is very simple (simplistic) in its design (in use it is another story); patch-based VC isn't.


I think the algorithms and data structures in Pijul have been carefully designed to be fast over a decade, and one of the original points of departure was making high-level decisions to facilitate performance. I know performance optimization in a lazy language is a special art, but I don't think it's Haskell vs Rust that makes Pijul fast. I'm sure the author can weigh in here.


Yes indeed, we would not have started Pijul without the hope that at least theoretically, patch-based designs could be faster than snapshots. "Patch-based is slow" without any other argument is not a very informed claim, Pijul is actually faster than Git in some cases (in fact Pijul is faster where it matters most IMHO: large files and/or large repos, conflicts and blames). Not because we're better at tweaking C code (we're definitely not!), but because we designed our datastructures like theorists, and only then looked at how (and whether!) to implement things. One advantage we had over Linus is that we had no time pressure: we could well use Darcs, Git, or Mercurial to write Pijul initially (we used Darcs, actually), and it didn't matter much if we failed.

It took a little bit of work to get that down to actual fast code, for example I had to write my own key-value store, which wasn't a particularly pleasant experience, and I don't think any existing programming language could have helped, it would have required a full linear logic type system. But at least now that thing (Sanakirja) exists, is more generic, and modular than any storage library I know (I've used it to implement ropes, r trees, radix trees…), and its key-value store is faster than the fastest C equivalent (LMDB).

Could we do the same in Haskell or OCaml? As much as I like these two languages, I don't think I could have written Sanakirja in a garbage-collected language, mostly because Sanakirja is generic in its underlying storage layer: it could be mmap, a compressed file, an entire block device in Unix, an io_uring buffer ring, or something else. And the notion of ownership of the objects in the dictionary is absolutely crucial: Sanakirja allows you to fork a key-value store efficiently, so one question is, what should happen when your code drops a reference to an object from the kv store? what if you're deleting the last fork of a table containing that object? are these two the same thing? Having to explain these to a GC would have been hard I think.

I wouldn't have done it in C/C++ either, because it would have taken forever to debug (it already did take a long time), and even C++-style polymorphism (templates) isn't enough for the use of Sanakirja we have in Pijul.

Remember the "poop" paper about mmap for databases, right? Well, guess what: having a generic key-value store implementation allowed me to benchmark their claims, and actually compare congestion, throughput, and speed between mmap and io_uring. Conclusion: mmap rocks, actually.


Wow, thanks for the detailed and informative comments!

Did you write up the mmap results as a paper? Sounds like it would be quite useful. (Is this the paper you're referring to? https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf)


One of the reasons git got popular is that it's FAST.

One of the reasons is that it's in C.


Because language is everything. Nothing sets the context like language, whether a spoken language or a computer language. You would never understand the "true" meaning of words without knowing the language.


Maybe because it's also an attempt at marketing as well.


because it has nothing to offer so it appeals to a certain audience. there, i said it.

p.s., i know it's good stuff. but still.


What comes after Git?

Nothing. I just don't see it; text-based software development is at its best currently. The only new thing nowadays is AI that helps you figure out or speed up creating code as text. AI models are all about text-based languages; all the low-code or image-based programming is not a good base for AI models - generating images is obviously not going to produce block diagrams for working software as well as text generation does.

So text manipulation is going to stay. We might have additional tools like AI to create/process text more quickly, and Git is already the best model for keeping history of textual changes. And yes, we want that history, because we still need to be in control of complex systems that are described by a textual representation in whatever language is used.

People who find reading and writing a tiring nuisance that should go away are not going to change that. Even if Neuralink happens to be everywhere, I don't think it has a way to write information back to the brain, so I think we are stuck with reading for quite some time. While images are much more information-dense, text can be very precise, and one cannot describe a complex system with images/feelings only.


> Git is already the best model for keeping history of textual changes

No it’s not. Git isn’t a model of textual changes. It’s just a model of textual snapshots connected over time.

If I were building a replacement for git, it would store character by character changes instead - though for that to work properly you’d want editor integration. Then the same tool could act as both a version control system and as a platform for realtime collaborative editing (via CRDTs). You can still have commits - they’re essentially just tags. And I’d keep the concept of branches.
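
A minimal sketch of the shape such a store could take - an append-only, per-keystroke op log where a "commit" is just a named pointer into the log. This is hypothetical Python, not Diamond Types or any real CRDT library; a real one would use stable IDs rather than raw positions so concurrent edits can merge:

    # Hypothetical sketch: an append-only character-level edit log with
    # commits as named tags. Not Diamond Types or any real CRDT library;
    # a real CRDT uses stable IDs instead of positions so edits can merge.
    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Op:
        author: str
        kind: str        # "ins" or "del"
        pos: int         # position in the document at the time of the edit
        char: str = ""   # inserted character; empty for deletes

    @dataclass
    class EditLog:
        ops: list = field(default_factory=list)
        commits: dict = field(default_factory=dict)  # tag -> index into ops

        def record(self, op: Op):
            self.ops.append(op)

        def commit(self, tag: str):
            # a "commit" is just a named pointer into the op log
            self.commits[tag] = len(self.ops)

        def snapshot(self, tag: str) -> str:
            # replay the log up to the tagged point to materialize the file
            buf = []
            for op in self.ops[: self.commits[tag]]:
                if op.kind == "ins":
                    buf.insert(op.pos, op.char)
                else:
                    del buf[op.pos]
            return "".join(buf)

    log = EditLog()
    for i, c in enumerate("hello"):
        log.record(Op("alice", "ins", i, c))
    log.commit("v1")
    log.record(Op("alice", "del", 4))   # delete the trailing "o"
    log.commit("v2")
    assert log.snapshot("v1") == "hello" and log.snapshot("v2") == "hell"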

Most of the algorithmic work has already been done - at least from the CS research side. Combine my work in diamond types with some of Martin Kleppmann’s work on BFT CRDTs & automerge. There’s a few more tricks needed. For example, you don’t want a hash on disk per keystroke. I’ve figured out a nice answer for that but haven’t written it up yet.

Deleting data is an unsolved problem. For example, if someone accidentally pastes in a password you need a way to prune it from the history. And async merges probably want to emit conflicts - which nobody has done on top of CRDTs yet as far as I know.

If I were building it, I’d also throw in a simple crdt based data store, synced over the same protocol. Then we could also store GitHub issues and other stuff inside the repository itself (a la fossil). CRDTs resolve conflicts better than git ever will, because they have access to more information than git does about what happened.

It’d be a pretty cool model. Definitely worthy as a successor to git in my opinion.


> If I were building a replacement for git...

The biggest problem with git is anything that isn't just text (e.g. binary files). What's needed is a source control system that understands binary file formats well enough that it is capable of producing a user (developer) friendly diff.

For example, if git understood PNG you could do a diff on two versions of that PNG and it would show you that e.g. a color was changed or a TM symbol was added or similar.
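
Tooling can already do a rough version of this outside git; here is a sketch with Pillow (hypothetical file names, and it only reports resizes and the changed pixel region, nothing as clever as "a TM symbol was added"):

    # Hypothetical sketch of an image-aware diff using Pillow. It reports
    # dimension changes and the bounding box of changed pixels; a real tool
    # could go further (palette/metadata changes, perceptual summaries...).
    from PIL import Image, ImageChops

    def describe_image_diff(old_path: str, new_path: str) -> str:
        old, new = Image.open(old_path), Image.open(new_path)
        if old.size != new.size:
            return f"resized {old.size} -> {new.size}"
        diff = ImageChops.difference(old.convert("RGBA"), new.convert("RGBA"))
        box = diff.getbbox()  # None if the two images are pixel-identical
        if box is None:
            return "no visible change"
        return f"pixels changed in region {box}"

    # e.g. print(describe_image_diff("logo_old.png", "logo_new.png"))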

I think git will be with us for a long time but whatever replaces it will be able to handle binary diffs in an intelligent way. Probably with AI-generated summaries.

Even then it might still be git... Just an improved version.


Agreed -- the problem with binary files is twofold; one is a storage issue and the other is a presentation issue. Both are solvable without having to violate git's fundamental data structures.

If a png is stored in a repository and changed, then logically the repository contains two snapshots identified by hash.

When a user tool (or git itself) wants to display the diff, it can be as creative as it wants in figuring out how to render that diff, including understanding file formats.

When git itself wants to store the two images, it has to do so with bitwise fidelity. Nonetheless, as with text files, it is totally possible for it to represent it internally as a delta-compressed view of the file, which, for an image file format, might be more compactly represented than by looking at the binary difference between the files. So long as this can be done deterministically and reproducibly, this is something that can make storage of large files more compact over time.
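
As a toy illustration of "store the new snapshot as a reproducible delta against the old one": zlib accepts a preset dictionary, so the previous version can serve as that dictionary. This is just the concept, not git's actual packfile delta encoding:

    # Toy sketch: store a new snapshot as a delta against the previous one by
    # using the old bytes as a zlib preset dictionary. Deterministic for a
    # given (old, new) pair; not git's actual packfile delta format.
    import zlib

    def encode_delta(old: bytes, new: bytes) -> bytes:
        c = zlib.compressobj(level=9, zdict=old)
        return c.compress(new) + c.flush()

    def decode_delta(old: bytes, delta: bytes) -> bytes:
        d = zlib.decompressobj(zdict=old)
        return d.decompress(delta) + d.flush()

    v1 = b"line one\nline two\nline three\n" * 50
    v2 = v1 + b"one new line appended in the second snapshot\n"
    delta = encode_delta(v1, v2)
    assert decode_delta(v1, delta) == v2   # bitwise-faithful reconstruction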


Character by character changes is almost certainly too low level. I understand wanting symbol by symbol. Heck, I even understand pulling it up the abstraction ladder and look at it at the semantic level. Having a diff of the form, "variable name changed, conditional added, etc" looks magical. I recall seeing some good demos of tools that could do this.

Unfortunately, none of these efforts have ever panned out. I cannot claim that they are unable to work, but they have not done so yet. And I'm not clear on why we think they will.

From evidence I have seen, there is no magic data structure that can help resolve conflicts. Pretty much period. The only algorithm that always works is to bounce it back to a user. With LLMs gaining in power, I suppose you could lean on one of them for the first non-mechanical pass?

Which sucks, because quite frankly CRDT and friends are fun data structures. But, they make the easy and trivial cases much much harder. Which is why a series of snapshots is far more effective. Those snapshots are not for the convenience of the tool tracking changes. They are for the convenience of every other tool that you are using.


> From evidence I have seen, there is no magic data structure that can help resolve conflicts. Pretty much period.

CRDTs (and similar algorithms) can do a whole lot better than git does today. They simply have more information to work with about what changed, and where. Git has to reconstruct that from re-diffing snapshots - which is always a lossy process.

It hasn’t been done because until recently (the last few years), CRDTs were considered too slow for practical use. It’s just recently that we as a species figured out how to make them fast and lightweight.


I feel you are slightly cheating on this, though. I'll explain. If you are only storing snapshots, I worry that you actually cannot do better. If you are willing to store the snapshots as either a list of operations from the last state, or you can decompose state changes to something coherent, then you can maybe do better.

That is, the cheat, I feel, is that most of what anyone does with anything from source control, is to work with the snapshot. Building a diff between trees is surprisingly rarely done. Yes, you see the reconstruction of one every time you run `git log -p`, but by and large, that is not what you work with. (And this is ignoring that you often want to see the diff between two tree heads...)

Now, you seem to be pushing to change this and to have people always work with the CRDT for everything they do? Does this not mean that the source control now becomes something that every editing tool in use also has to work with?

Aside, I'm also not at all clear that CRDTs have gotten to the point that they can do octopus merges at a reliable rate. More, I'm not clear on what merges actually look like with CRDTs, in a source control sense. Happy to see any reading, if you have a link.


I don’t understand how I’m “cheating” - at what? Character by character editing traces can be made incredibly efficient with some tweaks. Is that what you’re getting at? In the example traces I’ve gathered, the inserted characters themselves still make up the majority of the resulting file size. That is, files on disk are almost always smaller than 2 bytes per keystroke. And we can always reproduce a document at any snapshot state - quite easily, too.

I don’t have a link at the moment, but I’ve ported editing traces from git (by re-running diff match patch) and used CRDTs to do octopus style merges. They merge fine - just as you’d expect. (As I said earlier, the hard problem is adding back any conflict markers so humans can eyeball the result).

There’s nothing special in doing branches. You just only merge the set of edits (crdt messages) that show up in any particular branch’s history.


Yes, that is what I was getting at. The cheat is that none of that information is available in the snapshots. And we use the snapshot, as that is what all of the other tooling creates and works with.

The fact that most people using git don't refer to the snapshots as a thing is pretty much how invisible that is to all of our tooling. It isn't a special thing, it is the thing.

If you are pushing for a state-based CRDT that can be reconstructed from snapshots, then you aren't needing to change git as a whole, but only the diffing tool? If you want all diffs to be based on the edits users did, you aren't wanting to change git as much as all of the tooling users use. No?


Logically, yes, git is a list of snapshots over time.

There are two issues that you touch on here, though. One is data storage. There is nothing stopping git from storing those snapshots as deltas from previous snapshots, so long as it is able to reproduce the tree at any point in time with bitwise fidelity.

Already, internally, git will store deltas to compress the store instead of storing content for each changed file.

And at display time, it can make inferences about what has changed between two snapshots. So over time history gets "better" as it tracks file and line moves better just by making better heuristics around how to present changes to the user.

The main defect you point out is the immutability of history, especially when it comes to deleting history. I don't really have an answer for that, and I'm not sure that this is a solvable problem when you have a distributed version control system -- no matter how much you try to purge something, someone somewhere can have a copy of the unpurged repository so the idea of a definitive "delete" sounds incompatible with a distributed vcs.


I can see why in your head that may look like a successor to git, but the... news... be it good news or bad news for you... is that everything you've described is actually almost entirely orthogonal to git.

When researching the question of "Why did X occur?", I absolutely, positively, beyond a shadow of a doubt, do not want to filter through literally hours of keystroke-by-keystroke data. I need some sort of indication as to what states are worth looking at. Those are commits. Whatever further thing you may want to say about them, like "well, whenever the tests all pass we'll add a checkpoint", well, you can do that today. If you don't, it's either because you haven't thought of it, or the resulting multiplicity of commits is already too much to deal with.

Moreover, just having a CRDT history of keystrokes is an inadequate data structure for many things people do in git all the time. Merging a branch based on keystrokes is crazy; all you end up with is more opportunities for conflicts than the current system has. You think more information is good, I suspect in practice it would actually be bad. Git has no problem merging things where five people were working on a given repo at the same physical time, but on different things; a CRDT model would have to do weird things to have a sensible merged view after that, because a CRDT system is concerned with creating the final document; a source control system needs a human-meaningful history as well. (You can't just interleave their work, because the result moment-by-moment view would never make any sense. You can keep track by a branching mechanism where you can only go down one of their work path, but the natural implementation of that would have to not have the other branches available either. The problems go on; you actually end up with a much less flexible approach to slicing & dicing code without a lot more work, which is quite likely PhD-level work, assuming it's even possible.) Rebasing, if you are OK with it, effectively breaks CRDTs as you observe (it's the same basic issue as deleting something). CRDTs may effectively handle offline work getting merged back into the "main" online view but what CRDT setup is designed for browsing through history, and potentially pulling things back out of it?

I think if you actually tried to manifest this as the root level of a version control system you'd rapidly find it is full of problems, both theoretical and practical.

However, you could adjoin these capabilities to something like Git (or Fossil or Mercurial or whatever, I don't much care which). It's just instead of trying to build a single CRDT history that goes all the way back to the beginning of the repo, you adjoin a CRDT representation to commits, but allow each commit to serve as the base of a series of CRDT transactions on its own terms. So a merged or rebased commit could still be traced back to its original, which would contain all your information that you're looking for, but the general manipulations of commits would still look like a current system. IIRC, git allows you to stick arbitrary metadata onto commits which can themselves be blobs, so if you have an editing system that can emit CRDT information from an edit stream, you could prototype this right now with a bit of wrapping around Git (or, again, any other source control that has that capability) that adds the CRDT stream as metadata to the commit. Get a feel for how it works, see if it is as useful as you think, and then if it is, you can go forth to prove the rest of my message wrong by trying to build something where it's the base abstraction instead of an adjunct.

I would also encourage you with the fact that Linus wrote the first, useful version of git very quickly. A slick, polished source control system is a big task, but a source control system, one that mostly works, may ignore some corner cases, doesn't have any handling for symlinks, punts on file system case differences, probably has O(n^2) algorithms that break if you try to stuff it full of gigabytes, is covered with disclaimers to store no real data in it yet because it may break, etc., is actually not too hard. Proving my skepticism wrong via prototype is not necessarily that difficult.


As I said above, I imagine keeping commits. Think of them like checkpoints in a race. Any typed character will be between two commits - so git log and git blame can still work just fine. They just operate on a different data structure internally.

Merging is also fine. How do you think CRDTs work? You know merging is the entire point, right?

I agree that the system needs to handle both synchronous edits (peer programming) and asynchronous edits (the kind of editing we do now with git). CRDTs can handle separate concurrent branches just fine. It’s just none of the existing crdt libraries support branches.

You’re right that rebase is weird in CRDTs, but I think we can still build it if we need to. Rebase is mostly used in git to work around one of git’s design flaws: Git is never sure if commits should represent the actual history of changes, or the semantic history of merged tickets. But a better system could just separate those ideas anyway.

Yes, it is a PhD-level problem right now. But so was diff-match-patch a few years ago. I'm an expert in this field. And I believe we now know how to solve all the sub-problems here - all but one: how do we actively generate conflicts from asynchronous edits in a CRDT? That needs to be figured out first.


Nice idea

What if I'm trying to fix a bug, and so testing a file, while someone else's edits are being applied to it? That would be painful.

CRDT would be great in many ways, but I think for programming I would most often want to reduce the variables, and not see other people's edits until commit/merge

Would CRDT still be useful in that context?


Yeah. In the tool I'm imagining, you can just keep each edit in a separate branch and merge when you want to, just like you do now. The data on disk can be a CRDT - but that doesn't mean other users' changes need to be merged as soon as they're made.


> I need some sort of indication as to what states are worth looking at. Those are commits.

I would welcome some tool that could guess/infer where some commit points should be, even if I didn't explicitly commit at those points. I often tend to jump about between files when debugging/tracing a problem, making some changes here and there, then... when it's fixed... I don't always have a good frame of mind to commit as cleanly as I'd like. Something that could try to group changes together - by function, or type of change, etc - and propose those commits would be helpful. I've not tried some of the coding AI tools in that respect - maybe they do that now? I've only ever read about "they can write good commit messages for you!" as a selling point.


Man, just a polite friendly advice, whatever message you are trying to convey would be 10x more efficient if you cut word count to maybe 10%. Few want to read small novels in discussions.


I don’t see how character by character changes are an improvement.

The same with handling image files.

Both of these "downsides" don't make sense to me. I make a set of changes that makes sense; if a character change makes sense to be historically relevant, I make a commit. Yes, it is a lot of work, but I make the choice of what is relevant and what is not and put it in a commit. (Changing text layout and whitespace can be relevant on its own ;))

On the other hand I fail to see how bit level changes of an image makes a difference. Image is all or nothing the same executable file. Patching parts is 99% of time not relevant or simply not something people do or care about.


Why?


> GIT is already best model for keeping history of textual changes

Git doesn't even keep history of changes, just snapshots before and after the changes. A very common problem is viewing history of a single file as it gets modified and renamed - this information just isn't stored. It's common for tools to show file history incorrectly - i.e. a previous version is removed and the new one (with small changes) magically appears.


This. Git is not an Edit history. Another surprising (at least to me) thing is that you can't add empty directories (without placeholder files) : https://stackoverflow.com/questions/115983/how-do-i-add-an-e...
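
(The usual workaround is a placeholder file, conventionally named something like .gitkeep; git itself attaches no meaning to the name, and the directory here is made up:)

    mkdir -p build/artifacts           # hypothetical directory we want tracked while empty
    touch build/artifacts/.gitkeep     # conventional placeholder; any filename works
    git add build/artifacts/.gitkeep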


This is not an uncommon behavior in source code control systems, e.g. both Perforce and Mercurial behave like Git too.

E.g. see https://portal.perforce.com/s/article/3447 for Perforce.


Yet, it's wrong.

Just the fact that people keep using placeholder files should be enough to convince anybody. But if you want to use git for software development, well, almost all development conventions mandate directories that are intentionally kept empty, or that must exist before their contents do. I've never seen anybody settle on a convention that doesn't.


> almost all development conventions mandate directory intentionally kept empty or to exist before their contents exist

Never heard about this, nor can I imagine what purpose it could serve.


As an aside, aren't folders in *nix also files ? So how did this happen with git, written by Linus ??


Object stores like S3 work similarly, the entire path is to an object, and unless there is data at a path, the path doesn't exist. And you don't need to create the prefix before storing something with that prefix. That the tooling abstracts that a way to make it look more like a filesystem is a layer on top.


My pet peeve in enterprise development is that files grow into monstrous god objects: tens of thousands of lines long. There is no way to track that a single file was split into multiple files; they are all successors of it. When I go to a split-off file, I want to see the history and blame of the method, not just "brand new, created yesterday". This has led to the "pattern" of never moving anything, because you will get blamed for it in the marginalia and it will be up to you to pull the old version and find the real author.


Have you tried `tig`? I can't remember trying out exactly this, but I wouldn't be surprised if it has better support than `git` for this kind of thing?


I would argue the reason that git is the answer going forward is that what you describe is a UX issue, not a data structure issue.

There's nothing stopping a UI tool from figuring out that a file was moved or split into multiple files -- it just has to do so from looking at the state before and the state after. Git has gotten better at this over the years and keeps improving, so newer versions of git will be able to look at existing git repositories and synthesize better views of the history.
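
For example, stock git can already do some of this inference on demand (the path here is made up):

    git log --follow -- src/parser.c    # follow one file's history across renames
    git blame -C -C -C src/parser.c     # attribute lines that were moved or copied in from other files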


> Git doesn't even keep history of changes, just snapshots before and after the changes.

This seems like splitting hairs given you can trivially derive one from the other.


Git doesn't do stacked commits well (you want to rebase all downstream local branches when you modify an earlier commit) and doesn't do commit revisions well (tracking changes to earlier commits).

Git hews too closely to immutable history, which is in tension with the simultaneous goals of code review and async development.

If code review is blocking for merges and results in comments that require altering commits, how to you continue work without waiting for review to complete?

You stack changes. But the problem of propagating review-driven edits through to your stacked changes remains.

Gerrit tries to fix the above on the server side by requiring every commit to be marked with a change id so edits to commits can be tracked. But it's not a great experience on the client side.
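
For what it's worth, newer versions of git (2.38 or so, if I remember right) can at least automate the "rebase all downstream local branches" part; a sketch with made-up branch names:

    # part-1 <- part-2 <- part-3 is a stack of local branches
    git switch part-3
    git rebase -i --update-refs main              # rewriting an early commit also moves part-1 and part-2
    git config --global rebase.updateRefs true    # or make that the default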


> Git hews too closely to immutable history

Is it a bad thing? How can you have a consistent decentralized system without an immutable history?

If you alter a commit, you create a branch. It may not be explicit, it may be called another name, but there is no way around it: an altered copy of any kind is effectively a branch. In the same way, if I take your modifications to put them in my own workspace, then it is effectively a merge, no matter what you call it.

And as long as you allow clones or even just backups, the history is immutable whether you like it or not. Changing a commit on your server won't change it on everyone's computers and tape backups; the real history is there, somewhere, and no review is going to change that.

Git simply makes all that explicit. The history is immutable (it is effectively a blockchain), and every change is a branch.

Note: while the git history is immutable, the branch names are not, so you can simulate rewriting history by creating a new branch with the altered commit, and giving it the original branch name. The original branch will then be nameless, and may be removed from the server if garbage collection is active, but it won't make it disappear as it can still be present in clones.
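
For example, a minimal sketch: what looks like "rewriting" the tip of a branch is really just pointing the name at a new commit:

    git switch main
    git commit --amend -m "better message"     # creates a brand new commit; the old one is untouched
    git push --force-with-lease origin main    # only the branch name moves; the old commit lives on
                                               # in reflogs, clones and backups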

As for change ids, it is not about commits, it is about tickets, something that git doesn't take care of, but it can be done at a higher level. Git only tracks changes, it doesn't care about your workflow. This is what Gerrit, GitHub, GitLab, Bitbucket/Jira, etc... are for.


The link between revisions of a commit is not explicit. Actually immutable history is fine, as long as all history is captured. But git doesn't capture all history, instead making revisions to an earlier commit is done by "rewriting" history and orphaning the older commits, instead of referencing them as earlier revisions of a persistent mutable-until-committed unit of change.

The only git-native way to capture revisions is to use separate commits for every edit, and avoid squashing on merge. But then you have a history full of work-in-progress commits, which isn't great either - especially not great for git bisect. And separate commits still don't help with continuing work asynchronously while waiting for review to complete - you still need rebases, or merges (yet more weird history)...


You can make a separate commit for each edit and do it in its own branch; when you are done, you make a merge commit in the release branch. The release branch would then be made only of merge commits, with a clean history, keeping git-bisect possible.

Git is a little quirky with that kind of workflow because when you are making a merge commit, you don't really know which parent is which. There is a convention, however: the first parent is the branch being merged onto, and the others are the ones you merge from. Commands like git-bisect and git-log come with the --first-parent option to avoid having your history polluted by broken work-in-progress commits. As for your asynchronous work, same thing: you can use merge commits, making sure the first parent is your work branch, and use the --first-parent option to make sense of that mess.
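
Concretely, that workflow is just something like (branch names made up):

    git switch -c feature main
    # ...messy work-in-progress commits...
    git switch main
    git merge --no-ff feature           # first parent = previous main, second parent = feature tip
    git log --first-parent              # clean history of merge commits only
    git bisect start --first-parent     # bisect over merges, skipping the WIP commits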

The weird history problem is just a UI problem, which is actually the worst part of git. Its core is solid, but its UI, especially the command line is a mess.


Git doesn't do a lot of things well. The UX is pretty damn horrible (even linus admitted this) and it's had some pretty terrible defaults.

It's become kind of a cultural symbol for "developerness" though. This shifted the balance of shame. If you don't "get git" that isn't seen as a git problem. That's seen as a you problem. "You're" not a proper developer. This gives it a pass for sucking.

As such, the only people who can criticize it are people who know it back to front. Most other people (especially learners) will assume that if they don't get it that the problem is with them.


> "As such, the only people who can criticize it are people who know it back to front. Anyone else will be told that they're not using it properly."

...they said, in a thread full of people criticising different aspects of Git without proving their git-expert credentials nor being accused of being the problem themselves.

I'm sure that as with pretty much any widely used thing there are both people who criticise it unfairly because they don't know how to use it for what they're trying to do, and also fans who over-defend it blaming users for not knowing every single little detail and sometimes blaming the user for a problem with Git. But it doesn't seem from what I've seen generally that it's a problem for anyone to criticise Git's flaws, just that there often isn't an answer to "well, what better alternative is there?".


I always preferred mercurial to git. I held out a long time using it but it became clear over time that the benefits of using it were outweighed by the network effects that accrued to git simply because everybody was using it.

HN tends to be a bit more open and less conservative than the average dev team.

The thing about horrible UXes is that you get used to them. Then once you know it back to front that becomes an impediment to switching. There's an entire generation of developers now who have grown up never having known anything other than git.


Yeah, mercurial is better than git, but not that much better. It's not so hard to learn both well, and be happy when you have a chance to use mercurial, while remaining content when you're using git instead.


Not that much better is true, but every time I see tech that is worse win because it had better marketing it makes me sad and a little angry.

It was quite irritating to invest time not only in figuring out which was better initially (only to discover that this didn't matter), but to invest time learning the better one and then to lose all that investment in memorizing commands/keystrokes/concepts.

It's true that most of the concepts map over, so it's way easier than picking it up from scratch, but it's still a hefty chunk of effort, and the annoyance of feeling like I shouldn't have to do this made it harder.

Mongo makes me rage for the same reason (but way worse). I know I could learn about its equivalents of indexes and foreign keys and materialized views or whatever but I honestly really just want it to fuck off and die and get replaced by postgres in every single situation where it is used. It is a genuinely shittier technology that grew popular on the strength of marketing (image) rather than quality and that is why I'm forced to learn it.


I don't think "better marketing" is the main thing going on. I think the main thing is network effects. For git specifically, I think having a giant project right out of the gate gave its initial "network" of people who knew how to use it such a huge head start that it was always going to be difficult for anything else to catch up, at least in open source, where network effects matter a lot.

I do think the story with MongoDB in particular has a lot to do with marketing, but IMO the broader story is that objects (/ "documents") are actually a more natural fit for modeling the storage of application state in oltp databases than are relational databases (though relational databases are certainly unmatched for analytical use cases). I'd say that the popularity of MongoDB despite the relative weakness of its implementation of that insight has been more annoyingly detrimental to the object/document store approach than to the relational DB approach. That is, the backlash to "MongoDB doesn't work very well" has been "just use postgresql" rather than "we need a better document database implementation".

Coincidentally, just yesterday I was looking into pg's object store capabilities in order to suggest "use pg as a hybrid relational / document store" as a potential compromise solution to get to consensus in a relational vs. document-store discussion. But I found it pretty lacking, frankly. It's nice that jsonb exists, but its capabilities seem quite limited. For instance, it seems I can't use json-schema to define the schema of a jsonb column? (I did conclude that I need to do more research before rejecting this out of hand, but my initial research was not as promising as I had hoped.)


Using mercurial/sapling at work is really nice


To be fair, most of the criticism upthread embeds the criticizer's credentials just by the fact that those problems are very hard to notice.

The thing that you can't easily criticize is that the UI isn't very clear and gives no confidence that what you are doing is actually correct. -- But I imagine at this point, it's safe :)

Other VCS out there do have better UIs, with concepts that are more clearly told apart (or not confusingly separated, like tags from commits). But none is so much better that it's worth abandoning everything and copying them.


I really hate rebases.

Maybe, with a little compression, we have enough disk space to simply do both.

After squashing and rebasing, some normally-hidden data in the commit lets a client resurrect the original, highly-detailed commit graph including every "uh forgot to commit on Friday afternoon, dunno what this was but it's important" commit.

(Of course you can always redact sensitive data, as long as you control and trust every hard disk the data is on)


Rebasing can do a lot more things than just squashing.

I personally hate seeing dozens of semantically meaningless pull-merge commits ("merged branch main into main"). I don't care when you pulled or what effect it had on your local history. When you push, push a clean linear history starting from where other people left off. If you spend too much time working independently on a "branch" (your version of main diverging from everyone else's version of main), then you're not collaborating and you're not continuously integrating your work.
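
(The usual way to avoid generating those merge commits, for what it's worth:)

    git config --global pull.rebase true    # replay local commits on top of upstream instead of merging
    git pull --rebase                       # or do it per invocation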

I do like the idea of being able to compress the history you're squashing, though. There can be a lot of trivial commits, but depending on workflow, squash-merge can hide subtle things like the reason you made a very important one line change.


> some normally-hidden data in the commit lets a client resurrect the original, highly-detailed commit graph including every "uh forgot to commit on Friday afternoon, dunno what this was but it's important

Is that supposed to be a good or even useful thing?


There are in-band and out-of-band ways of communicating information that matters to your collaborators (including future-you) but not to the compiler.

Commit messages (out-of-band) tie this information to the specific time when the change was made. They also don't clutter the code up.

Comments (in-band) tie this information to the specific place where the change was made. If overused, they make the code harder to read.

Commits can get lost in the noise, though blame and bisect can help find them. Comments can get stale and cause more confusion than assistance.

Honestly, if we spent more time reviewing and revising each other's code than just writing it, then comments (combined with refactoring for readability) would probably be the better solution. But that would slow down development. If you're going fast or lean, then relying on commit messages is better than nothing at all or fired-and-forgotten comments.


It's one of those things I rarely need, but really need when I do need it.

Commit numbers are an API and it's frustrating when I write something like "Fixed in commit xyz" or "Broken in commit xyz" or "Tested manually in xyz" and then after a rebase (Not a private rebase I asked for, but part of the CI routine), my comment makes no sense and Github says "We can't find that commit". You can paste it in manually but the context is lost, it's not in `git log --graph` anymore.


Graphite attempts to make stacked diffs more palatable, built on top of git. Never used it myself but have a colleague who swears by it.


Ooof. There are lots of things git does well (distributed development), things it doesn't do very well that were done better by (some) other systems before git was even built (automated merges), and things git just doesn't bother with doing well (looking at that command line syntax.)

There's a LOT of room for improvement, but there's no profit incentive for improvement, and let's be honest, git is very nice at its price point, so we're likely at a local maximum for a while. Not the worst situation, but the world could be better.

Disclosure: I spent the 90s working on a different source code management system. People have _opinions_ on what makes a great SCM, and it's definitely not 'one size fits all'.


> There's a LOT of room for improvement, but there's no profit incentive for improvement, and let's be honest, git is very nice at its price point, so we're likely at a local maximum for a while.

There was no profit incentive for creating Git. It still got created and became popular.

It's just that Git is far closer to good enough than what came before it - many of those were true horrors (I'm looking at you ClearCase).


> There was no profit incentive for creating Git. It still got created and became popular.

That's not the whole story.

There was lots of profit incentive behind Github. And it was Github that won, not Git. Git was carried along as a parasite.


Text is too broad a category for software version control to be convenient. I would much prefer version controlling a subset of text that correctly parses to a domain-specific AST. Imagine how much better and more informative the diffs, conflict resolution, and blames would be. Also imagine moving code across files, e.g. by splitting them, and actually having your version control understand that. You could probably push bisects and the like to new levels.


Git's model of snapshot-only storage actually facilitates this quite well. All you need is a custom difftool and a custom mergetool which understand syntax. You also get graceful fallback to textual diffs when your cozy dev environment inevitably encounters the real world.
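
For example, you can already plug a structural diff in for viewing; this sketch assumes difftastic (the difft binary) is installed, and the file pattern and driver name are made up:

    # use a syntax-aware tool for every diff you view (storage stays plain snapshots)
    git config --global diff.external difft

    # or scope it to particular file types via a diff driver
    echo '*.rs diff=structural' >> .gitattributes
    git config diff.structural.command difft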


If someone took the time to build such a tool and put a price tag on it and made it intuitive they would probably make a lot of money.


One could probably build an MVP for this by using tree-sitter parsing, re-injecting identifiers if that is wanted, and then committing the results to git via a commit hook.
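
Purely a sketch of that idea, assuming the tree-sitter CLI and a C grammar are installed and configured:

    #!/bin/sh
    # .git/hooks/pre-commit (sketch): store a parse-tree dump alongside each staged C file
    for f in $(git diff --cached --name-only --diff-filter=AM -- '*.c'); do
        tree-sitter parse "$f" > "$f.ast"    # S-expression dump of the syntax tree
        git add "$f.ast"
    done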


At the same time, text is too narrow a category to cover many project needs. There are lots of projects where being able to manage and version GBs of various binary data is vital. Git defaulting to everything being plain text misses a lot of special cases where a more specialised tool might help, but on the whole it is probably a good balance and 'good enough' for the vast majority of use cases.


That being said, if you want to build a git 'competitor' one of these two avenues can probably offer plenty of interesting opportunities.


Perhaps you're talking about vastly different systems, but FWIW I'm personally hoping that Pijul matures and sees some adoption.

I can see how people who are into the simplicity of the git on-disk format don't mind the downsides, but I personally hate spending time handling concurrency-related conflicts that the system could have handled for me (some of which are probably because I or my coworkers did not do the right incantations in the right order).


Thinking about large parts of the game development world still using Perforce (or even SVN!) because they frequently need to commit lots of binary asset files to their repos... I guess text is not the only thing that runs the world.

Really there is still no real alternative to Perforce when it comes to handling binary files. Previously PlasticSCM was a good contender, but Unity bought it for their own version control system and it seems they've stopped developing the original software (which is a shame...).


Pijul handles binary files natively, using a number of mathematical tricks removing the need for extra layers (layers like LFS). If you're interested come talk to us!


I use SVN for game dev and love it. It just works.


A mod for a game I help develop uses SVN still. Granted, the Mod started development around 2010, but for games it seems like it works well


Git LFS isn’t a real alternative?

https://git-lfs.com/


No. (with a large period)

I've tried using it myself for just simple tasks and it was pretty unusable - lots of weird bugs and unintuitive behavior... Can't really imagine how painful it would be for a mid-to-large size gamedev team.


Another down-side is that it forces you into having secondary server components. You can't bare-host git-with-LFS over ssh with just some directories on disk—you'll need Github, Gitlab, Gitea, or whatever, with LFS support (or that one "alpha quality do not use!" Python[?] project on Github that's supposed to handle LFS)


I store images, executable and DLLs in SVN without any issues. Most of the them are smaller than 10MB though.


Isn't Git LFS enough for those cases?


Is keeping history of textual changes better than (also) keeping history of semantic changes? We focus so much on the representation (text in a programming language) that we might miss capturing intentions (when clear).

I love GIT and GIT is great for what it does, but most of the time I feel that what I need from a history is different than just text changes. Two simple examples: white-space changes and variable name changes could be ignored for some cases (of course, not for all, but that's the trick).

Such changes might be built on top of git (and some are today), but at some point new requirements might mean a different system.


Keeping the "intention" of the changes in the system only works if you actually bother to record or annotate the intention while making the commit.

If your idea is to use heuristics to automatically infer intention during commit time and record it into the system, then obviously you can also do that in git after the fact by comparing two commits and running the same heuristic.

In most cases (except file move?), I doubt the intention of the changes are so easily/cleanly captured. You'll probably need to have some deep IDE integration (eg. so it knows you actually renamed all your variables using the refactor tool). I can't see that having widespread application at all.


Simple thing that gets lost: someone adds a very long comment on one line (maybe they have word wrapping on); someone else reformats the comment onto multiple lines; the diff will be a mess even if there is no textual difference in the actual comment.

One thing to investigate would be to include AST (abstract syntax tree) concepts in the source control or tools (although for comments that might not solve the issue). Renaming a variable, moving a function around and other cases would be caught by an AST-aware source control system, but they are not today.


At least for whitespace changes git should have you covered

    --ignore-space-at-eol
        Ignore changes in whitespace at EOL.

    -b, --ignore-space-change
        Ignore changes in amount of whitespace. This ignores whitespace at
        line end, and considers all other sequences of one or more
        whitespace characters to be equivalent.

    -w, --ignore-all-space
        Ignore whitespace when comparing lines. This ignores differences
        even if one line has whitespace where the other line has none.

    --ignore-blank-lines


I will play devil's advocate.

What about whitespace-sensitive languages like Python (that's one of the things I hate about the language)?

Diff also gets easily confused, because its context window is pretty small.


I'm repeating my comment in a sibling thread, but I think it's worth repeating (paraphrasing) -

(A) If the thing you have in mind can be inferred by looking at two commits, then you don't have to record the intention of the changes into the version control system because you can compute it later when needed.

(B) If the thing you have in mind can't be reliably inferred by looking at two commits, then you need some other way to tell the version control system about your intention.

For example, if you're just re-indenting a python source file, are you going to:

1. expect the system to automatically/heuristically realize you're just re-indenting it --> see (A)

2. explicitly tell the system that you've reindented it when making the commit --> are you sure you can be bothered to do that?

3. Consistently and exclusively use a specialized IDE that records all your actions and transforms it to the corresponding intentions as recognized by the version control system?


There are alternate diff algorithms that can be selected even within standard git diff. It's just stuck on the traditional one as the default.
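
e.g.:

    git diff --diff-algorithm=patience
    git diff --diff-algorithm=histogram
    git config --global diff.algorithm histogram    # make it the default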


Git as a data structure is ok, but I wish we could eliminate PRs. There is too much overhead in managing those, code reviews, branch rebases.

A better way would allow you to push your changes as stacks of commits. No branches. Just push, and have a CI that automatically builds, tests, merges and deploys your commits. And everything that didn't pass gets handed back to you. Instead of change reviews, do pairing and code reading sessions.


Git is not ideal for a rebase workflow (such as when maintaining forks of software that you want to keep updated with upstream). Also I would love an option to "fold" commits into larger ones when viewing, as currently I have to choose between a clean history and small, self-contained commits. Bisecting and ending up on some refactoring commit with a twenty thousand line diff is a nightmare, but so is browsing the commit graph of 200 merge commits.

The underlying graph model seems to be underappreciated by the upper layers of Git, which instead try to pretend that history is linear. Lots of useful graph operations like longest path between vertices are missing. Rebasing an entire isolated sub-graph should be possible too.


Maybe more of an evolution, but it would be nice if git could understand the context of the code: big refactors, that file B was made from parts of file A and C, etc.

A CRDT type thing where your intentions are stored would be cool. The C means conflict free… just imagine!


In Pijul the head is a CRDT. Having used it for years to develop itself, I can definitely imagine!


It would be cool if Git got good at dealing with media and large files. There's other tools for it, but Git is powerful and I'm comfortable with it.


I feel like large file storage can still be improved upon.


A proper interface to GIT. Believe it or not, before git we developed software just fine and spent close to zero time using or learning whatever tool was being used for source control. GIT requires the end user to know way too much about its internal implementation, and this has leaked out into the interface. It's half done.


Two alternatives - Wikipedia models and Google Docs style models

Wikipedia - collab code and it just keeps everybody's edits and you can branch wherever.

Google Docs - inline marking and writing all over the code. Color segments for different users, etc...


Git and vim editing are two of our most used software, both text-based, and both would be greatly improved if they were AST-aware!


Neovim is now AST-aware with Treesitter, and my, is it ever an improvement. You have an editor fully aware of core programming constructs like function arguments, loops, conditionals, functions, classes, etc.


For the latter, LSP support in neovim is primordial but exciting.


> What comes after GIT? Nothing. I just don't see it

I would argue this is a lack of imagination more than attaining a global maximum.


Something that's easy to use.


”24 years ago, Joel [Spolsky] was one of the first influencers of the burgeoning field of software engineering.”

I got a laugh out of that line. Hard to think that you could be “one of the first” in 2000! I guess we all have different perspectives of time.


One of the first influencers, not the first random nobody


dijkstra would like a word! (as long as that word isn't "goto")


I don't think he'd appreciate being described as an "influencer", though.


Why do you think Spolsky would ;)


Dijkstra's blog posts were handwritten with a fountain pen and distributed by being photocopied. :)


What a hipster. Probably drove around in an old Volkswagen bus too.


He's a computer scientist, not a software engineer.


there wasn't a lot of difference back then; the foundations of both fields were being laid by the same people


> There are two interesting contenders worthy of mention: Pijul [and] Fossil

Also worth mentioning is Jujutsu [0], a VCS out of Google with a git-compatible backend and a more ergonomic interface [1].

[0]: https://github.com/martinvonz/jj

[1]: What if we could replace Git, Jujutsu might give us a shot https://v5.chriskrycho.com/essays/jj-init/


I am personally surprised that TFA didn't mention either jj or Sapling [0], given its emphasis on how both Git and svn were made to be backwards compatible!

[0] https://github.com/facebook/sapling


Not the same category of tools: Pijul and Fossil have radically different designs, whereas jj is a Git frontend, and Sapling a Mercurial fork.


It is sobering to realize that in twenty years, Github will probably be gone, and no one will be using Git anymore. I have used so many different version control systems, starting with PVCS in the 90s. Subversion, CVS, Mercurial, and others I can't remember the names of! I hate Git, but everything I've ever done is living in Github now. Where will it be in twenty years?


Fossil is really interesting, and Mercurial is probably one of the better git-competing version control tools, but git is too ingrained to be replaced.

The only thing that I can see ever replacing git is something that is fully backwards compatible with git, insanely intuitive, and that possibly extends git or replaces it. But again, the key thing would be 100% support for regular git repositories: if people enjoy using it with existing git codebases, that's step 1. I'm not sure if anybody is even trying to engineer such a thing, and let's say they had 100% backwards compatibility with git, what would you change about it to migrate into? Do you keep the same exact underlying git but make the commands more ergonomic somehow? Or are there alternative approaches to how git stores code that could be more efficient somehow?

It takes someone making something better but also compatible with the dominant offering.

Sidenote: I had a coworker who worked with people using SVN but kept a git repo as well so he could more easily revert code and experiment. I forget his approach, but it seemed to work; I assume the git repo was a layer above the SVN directory. This goes back to what I am saying though: even though it's a little different, he was able to satisfy the needs of the client with their tooling while still using tooling that he's productive on.


Jujutsu is along the lines of what you describe: https://github.com/martinvonz/jj

You can drop it in and work seamlessly from git repos


Git-svn is very mature and feature rich. I've used it a couple of times to good effect. Little need for it these days though.

https://git-scm.com/docs/git-svn
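
A basic round trip looks something like this (a sketch; the URL is made up, and -s assumes the standard trunk/branches/tags layout):

    git svn clone -s https://svn.example.com/project
    cd project
    # hack away with normal local git commits...
    git svn rebase     # fetch new SVN revisions and replay local commits on top
    git svn dcommit    # push each local git commit back to SVN as its own revision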


I'm not sure if he used this or not, I just know he had both an SVN repo (went to client) and a git repo (for himself).


I want the reverse: svn-git. That would be a real relief to so many developers.


Er, git-svn is bi-directional? Targeted at allowing a developer to have a local git workflow that syncs up nicely with a central svn server? Is that the svn-git you're after?


No, the exact reverse: hide away the brain-damaging GIT database system behind a proxy, so that I can (again!) concentrate on my job as a developer. Instead I have to spend at least one day every week as a "version management administrator" just to keep GIT from falling apart because of weird internal errors. I will never understand why a plain user must deal with the internal data structures (like db indices) of some tool on a day-to-day basis.


A good reason to try to rely only on the things that are "Git" as opposed to "GitHub"!

(And it's totally possible GitHub will still be there in 20 years - we're still all using Windows and macOS and Unix, aren't we?)


I got curious and asked Google when Git and Github were released.

Git: April 7th, 2005

Github: April 10th, 2008.

So Git will be 20 years old next year, and Github is only about four years and a month short of the same milestone.

That's staying power I am hesitant to just handwave away.

Also, I am reminded I am an ancient relic. Has it really been that long? Damnit.


Git (and almost GitHub) are closer to the release of the first version of Linux than today.


Probably VCS based on ASTs instead of text changes directly.

Or it'll all just be done by LLMs then.


> It is sobering to realize that in twenty years, Github will probably be gone, and no one will be using Git anymore.

It'll be interesting to see if this plays out, or if git (and github) have reached some sort of local maximum in VCS where it does enough for most people that there's not much benefit to moving to a new tool.

Of course there might be some massive leap in VCS technology, but it'd be hard to predict.


Got a good chuckle out of https://git-man-page-generator.lokaltog.ne .

Even after using git for years now (and having been forced to use CVS in the 2000s), I feel that git is an evolutionary plateau. A really good one. However, I wish some features of the tightly integrated IDEs from e.g. Rational could see a revival.


Looks like you lost the last character of the URL. Should be:

https://git-man-page-generator.lokaltog.net

Worth correcting to share that site.


Seeing Rational Rose, ClearCase, etc. mentioned without expletives is novel! My impression of that family of tools was not favorable. They worked… ok at first. But were slow, and if things got messed up, recovery was very difficult. Also, I once found an entire install of Perl down in the directory structure of a Rational product!

I'm not sure I used a pure IDE from Rational though, so maybe that was better?


Good article, but I might be misunderstanding this one bit:

> These days, we are used to cloning an entire project on our computer, after which we can safely plug it off the network and continue writing software in a completely disconnected way. This simple paradigm was utterly and completely unthinkable 20 years ago.

I was doing this with CVS or SVN over 20 years ago. (For "https://www.neilvandyke.org/linux-thinkpad-560e/".)


You can't commit locally with either CVS or SVN, if I remember correctly?

If the premise is "I can work on this locally", then you could do this since forever by just having a local directory...


You can't, and it was one of the major pain-in-the-ass points that let git and Mercurial eat their lunches. Also, developers "locking" entire subdirectories in SVN and blocking anyone else from merging into them; bonus points when they forgot about it and went on vacation.

Merges in SVN and CVS were also very painful to do. After being bitten by it, every single place I worked at had to create protocols and guidelines on how often to merge/reconcile code to avoid ending up in an eventual merge hell.

VSS, CVS and SVN were ok to use when there were no other choices, Perforce was really good but very expensive, git and hg felt like the future when they showed up.


>Also developers "locking" entire subdirectories in SVN and blocking any others to merge into, bonus point when they forgot it and went on vacation.

Subversion?

I used subversion for an entire decade at one job and never ran into this, in fact, I did not even know it was possible in subversion.

When it happened with VSS before that we just got an admin to unlock the file(s) on the server.

>Merges in SVN and CVS were also very painful to do. After being bitten by it, every single place I worked at had to create protocols and guidelines on how often to merge/reconcile code to not waddle in a eventual merge hell.

I've just never had any issues with merging. shrug.

>VSS, CVS and SVN were ok to use when there were no other choices

I've used all three professionally and VSS never felt "okay" to use: it was very slow, super clunky, and prone to data loss and corruption. I would back up my files before committing to VSS. You also simply couldn't view the history of a folder if it was really big - the operation would just never complete. Basically, using VSS was just terrifying.


Well, we should be a little fair to cvs and svn. The C in cvs stands for "concurrent", as in the absence of locks. That's how they differed from competing products. Lockfiles existed, but were intended to be mostly informative, as they could easily be forced by anyone with write access, and were intended for special situations.

The merge logic was pretty much identical for git and svn. The reason git is so much more advanced now is that it has improved a lot while svn doesn't see that much development anymore.

I agree that git and hg felt like the future, and bazaar and arcs before them, but mostly because they represented another workflow very different from other products, not because of some specific technical reason. Version control is a network effect and you don't see the gains until everyone agrees on a common workflow. The rest is an implementation detail. OpenBSD is famous for having a strong culture around their cvs, for example, and it works for them.


VSS was awful if you had a high-latency link to the server. VSS used a very chatty protocol, and was impossibly slow over (say) a transatlantic frame-relay link. I had to update our local copy of the repo overnight; sometimes the update just failed, and my team was dead in the water for several hours.

A local SVN repository worked miles better. But house policy was VSS.


For the thankfully very short period that I ever used SVN, I remember it being massively painful if the network was down. The local processes would just hang and then your entire workspace was pretty much unusable. Couple this with a single-core 32bit machine and less than 1G RAM and you're pretty much stuffed for any sort of productivity.


My first job was at HP in the late 90s and I remember that "the merge" was an event for which significant time was reserved. It was heavily planned. I don't recall the system we used, though.


IIRC it wasn't just commits. You couldn't diff to previous versions offline either.


You can't commit to a server, obviously, but you can commit to a local directory. You would have needed to set one up in advance, however. Checkouts are sparse, in git terminology.

Commit here approximates to a push in git terminology. The intended use is a little bit different. Normally you wouldn't have committed until you were really done with the work (and all reviews etc) since there is really no concept of a rebase. So a commit is really something you "commit" to. You work on the patch instead.

Some of the difference is from the tools, but some of this is also due to the difference between a patch based vcs and content addressable one.


How did you "commit locally" with Subversion twenty years ago? And if "local commits" were a thing, what was Svk for?


svk is a tool to do exactly this. You can always copy the actual repo, you have read access to it after all. Something like svk vastly simplifies this workflow and can keep your local copy up to date.

Keep in mind that the intended workflow with svn is to perfect a patch locally and commit when you are done. So something like svk is considered a special case, not an integral part of the workflow. git actively encourages you to split your work into separate commits using rebase.

So there is a big difference in intended use, perhaps not as much in technical ability. Linus would likely have gone mad had he forced his workflow on svn.


> These days, we are used to cloning an entire project on our computer, after which we can safely plug it off the network and continue writing software in a completely disconnected way. ... And guess what: your local repository also contains the full history of every single change ever made to your project

This feature also sets a hard limit on the size of your project, its files, and its history: it's limited by what fits on the smallest developer's laptop.

Most of the time this doesn't matter, but alternative (often terrible) VCS still exist in industries where the files are much larger such as game development, video production, and chip design.


How do you commit to CVS when you're off the network?


I don’t think you can. But you can still work on your project and commit when you get back online.

Even 20 years ago, you could reasonably assume regular online connections.


Yeah, twenty years ago, it was quite important to already have everything on board that you might possibly need, for when the modems didn't work.

I don't know what happened here, with this amnesia - twenty years ago, my CD stack was vital for some things - if only for finding code samples in the comp.lang.c archives that I long had on rotation in the multi-CD selector.

Thirty years ago, you had to have books. And magazines.

Forty years ago, a guru friend.


I don't think any software that kept the full history of a repository locally would be usable 20 years ago, at all, simply because hard disks were measured in 100's of MB back then - just a few projects with full history could fill up the average disk.


I don’t think it would have been unusable. The first distributed version control systems are more than 20 years old. Plus, the size of hard drives was in the 10s of GBs [0] then and a lot of large projects have repositories that are a few GBs or less. [1]

So I think keeping the entire version history on a local hard drive was doable 20 years ago, but I think you’re ultimately right that keeping history locally would have been a less appealing solution.

[0]: https://www.tomshardware.com/reviews/15-years-of-hard-drive-...

[1]: This is just based on what I happen to have checked out locally, but it includes Rust and CPython.


This was a time when "cloud" was still a decade or so in the future for the huge majority of people. You had to keep all your data locally, including all your music and photos. There's no way anyone would want to keep full histories of their repositories locally, competing with their mp3s.


Did you have your own custom script for cloning an entire CVS repo?

I don't remember CVS having one out of the box, and just checking out a working copy via `pserver:` would leave you with a tree you couldn't even `diff` in a disconnected state IIRC.

...but also, if you create a local clone of the whole repo, and then commit a series of patches to it, how did you even manage to keep track of which ones weren't in upstream? I mean, keeping track of patches with the external `cvsps`/`quilt` tools wasn't monstrously difficult (even if it wasn't especially easy, either), but keeping track of which patch sets were in one repo but not another?


This article has made me feel decidedly old, given that I've used all of the repositories listed at one time or another.

Of all words I could utter to induce fear and an instant panic attack in any technologist: "Microsoft Visual SourceSafe"


"Rational Clearcase"


I started using that with the IBM version of Eclipse IDE (what was that called??). I remember having to wait for people to come back from vacation to unlock a file they were editing otherwise no one could change it :D.


I much prefer fossil to git in almost every metric. I use fossil for all my stuff. Much easier to use, imo, and provides a lot of useful features out of the box.

Caveat: I'm not a dev.


I've spent ages trying to get an RSS feed for changes to a particular file (on the main branch) out of fossil. I can't tell if I'm misunderstanding some fundamental organizing principle, or if RSS feeds from searches/filtered paths is just so far off the important path that it's missing functionality.

Help appreciated!

Sadly, though, realizing it got iffy when you strayed from the well-trodden path hurt my opinion of fossil a bit.

[Edit] My use case is that I ported pikchr (and lemon) to Go and want to be able to notice changes to both. Like everything from the SQLite folks, pikchr is amazing.


I used it for all my hobby projects created in the last three years. I still use Git for everything else. Fossil is nice. I can't imagine using it in a big project with many developers, but it was never designed for that. As long as Fossil is around I will probably keep using it for all my projects. I still push some code to GitHub to share with others. They do not have to know that I use a simpler tool for development.


Yeah I like that it can export, so to speak, to git.

I think it should be ok for bigger projects, I mean sqlite uses it, but I've no knowledge about whether it's considered big or not other than Sqlite being on every device (more or less).


20 years ago, I was using Visual Source Safe. The amount of time lost to corruption from internal VSS bugs, files deleted forever, SMB issues (one of them being people editing files off the share directly), its abysmal branching and pinning strategies, truly awful performance, etc. [0] was considerable. We tried TFS as soon as that was available and got rid of it in less than a week in favor of SVN. That was a considerable improvement until about 2010, when we got sick of constant merge headaches, poor performance, and the awkwardness of trying to jump between branches. People slam git a lot, but the number of incidents resulting in lost work or bad merges went from fairly routine to maybe a scant handful in the past 14 years, most of which could have been avoided by protecting mainline branches and tags from junior devs or just following good daily source control routines.

[0]: https://developsense.com/visual-sourcesafe-version-control-u...


SVN is still fine in 2024. I run a 1-man software business and see no need to take the time and effort to move to Git. I can see the benefits for a distributed team though.


an aside: de programmaticā ipsā is the correct latin (macrons can be elided just fine). i wonder why people would reach for latin but not bother to do even the irreducible minimum amount of work to ensure that they got the expression right.


> (As a side note, I started my professional career as a software developer in 1997, and nope, we did not use source control, not even CVS.

When I started, in 1994 we did use source control. We zipped up our changes and sent them to The Guy Who Merged them into the code-base using a commercial version of emacs, that nobody else in the company used. That's source control, right?


OMG, I realized while reading this that (1) I haven’t heard CVS or Subversion mentioned in a long time and (2) Spolsky probably needs an introduction at this point. Both of these point to the fact that I’m getting old!

I had read somewhere that Linus coded the first version of git on an airplane ride.


I know 20 year old college grads who work with Subversion in large firms - I guess Europe really is the old world in many ways.


Man, I forgot all about SLM (“slime”). That thing was so awful. At one point the Windows team had to start using “merge windows” to make any progress, like the networking team gets exclusive SCM access to merge their code between 8 and noon on Tuesdays.


In mentioning the myriad of git UX "porcelain" offerings, an important historical point is missed: Torvalds couldn't just implement git due to the BitKeeper license.

So Linus essentially wrote a data model, and others implemented the logic. The UX was always a train wreck because there was no UX.

Focusing on the data, then the logic, and punting on the presentation layer has proven a unique and pervasive way to build a piece of software.

One is skeptical that this can be repeated for much of anything else. The reality that this originated as a linux kernel support tool is probably another factor in its rise.


I've used just about all of the above (and a couple that weren't mentioned, like Apple's MPW Projector, and Voodoo).

Git knocks them all into a cocked hat. Linus changed the world (again).

The one thing that I would like to see in Git is true support for fractional repos (setting up a "window" into a Git repo that only includes a subset of files). SourceSafe (another one that wasn't as bad as the comment mentioned) had one neat trick: you could create "aggregate" views, encompassing files from different repos.


I don't know, I personally have a slight preference toward hg, the UI feels a bit nicer.


HG has an infinitely better UX than git, right up until the exact second you screwed up something and committed it to the core repo, and now you have to figure out how to undo history in a system that doesn't want you to.


This kinda exists now with the newer partial clone options: https://github.blog/2020-12-21-get-up-to-speed-with-partial-...

(Edited because submitted too early :D)


Not sure that would work.

For example, in all of my repos, I have massive test code. It usually far surpasses the implementation code.

It would be nice to be able to clone just the one 300-line implementation file, as opposed to the several thousand lines of test code.

Also, the "aggregate" workspace thing that SourceSafe used would be a great help for a lot of dependency hell issues.


Well, the idea would be that you clone just the files you want with a partial clone, and git fetches the rest for you on the fly as you need them.
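
Roughly (a sketch; the repo URL and path are made up):

    git clone --filter=blob:none --no-checkout https://example.com/big-repo.git
    cd big-repo
    git sparse-checkout set --cone src/one-module    # only materialize this subtree
    git checkout main                                # missing blobs are fetched on demand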


CVS was a vast improvement over its predecessors. RCS was an improvement over SCCS. Networked drives were an improvement over floppies with sneakernet. Been there, used them all and more.

Twenty years is nothing.


What I really don't understand is why nobody has ever bothered to decentralize Pull Request and Issue data, just like the source code.

Why can't I pull all the PRs from a git forge to grep through them locally?

Well, maybe someone did try to figure that out, and perhaps even succeeded; then I don't understand why it never became popular.


TIL that Firefox was on Mercurial until last year.


I don't like git. I appreciate that it's great in certain very distributed environments; but I'm still using SVN, because my projects are not distributed, and I can do without the opacity of git.

The biggest whoopsie with SVN, in my experience, was "externals". That was both a bodge and a footgun, and got me into trouble more than once.


The thing is, even if the git command is superseded, I feel like it will remain the underlying tool. If someone wrote a better interface to git, I would still expect it to be git underneath.

Just look at the concept of an MR: it's so widely used these days that we forget it was not always a given, and that many projects did and do use git over email.


I don’t think Sourcesafe was mentioned. I have some nostalgia for that monster.

Actually it was perfect for solo use. It was the team use (on a network share of course) where it would get corrupted. You had to make regular backups!


I absolutely LOVED Sourcesafe! As someone working on competing products, Sourcesafe's tendency to (very occasionally) eat a repo basically bought my house.


Yet twenty years is also easily 1/4 of our entire lives. We have such an incredibly short lifespan.


git is amazing; it has a myriad of ways to play with the data it stores. I can't see myself changing from it, as it's been the only one of the many source control systems I have tried that managed to work in all the use cases I had for it. I use it even on webservers.


Git rebase is still hard though.


Git rebase always seemed psychotic to me, like you're intentionally removing important historical information about what branch changes came from so that you can make the diagram of your git database a little prettier.


> intentionally removing important historical information

What makes that information important?

> about what branch changes came from

Is it even "what branch"? Most of the time I'm rebasing my changes from one commit on `main` onto a slightly newer commit on `main`, and when they're finally ready to be reviewed/pushed publicly, I'll rebase (and fix) them onto the newest commit on `main`.

And I'm generally not doing it so the commit graph looks prettier. I'm doing it so that the changes are as self-contained and as easy to review (either now, or at some point in the future) as possible, because the merge commit is as simple as possible. (If it's even necessary.) If a merge commit involves a bunch of differences from both parents, that can be tricky for people to reason about.
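
That day-to-day flow is just something like:

    git fetch origin
    git rebase origin/main      # replay local commits onto the newest main
    git rebase -i origin/main   # or interactively, to squash fixups before pushing for review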


The biggest problem with git is that it is entirely a distributed version control system. In git you are always the client and the server. Your clone, normally, contains the entire history of development and all branches. In a fully distributed development model like Linux, this is a good design.

However, most development is done at least partially centralized where you have a real "source of truth" and everything else is derivative/subordinate to that. In that environment, there is little reason to have the entire history of everything on your machine since you can always call out to the real source. It is the clone having everything that causes problems with binary files and large code repositories since a clone must download the entire universe unless you are very careful. Companies work around this limitation by doing clever virtual filesystem hacks that act as-if you downloaded the whole thing but actually only demand load in things as needed allowing them to transparently work with large repositories without the crippling download/storage costs of every random laptop actually holding the universe.

This does not mean we should go back to centralized version control systems where everybody is only a client. In that model you must reach out to the source of truth for everything and every branch only exists if the source has it. No, what we need is a system where you are a client and a subordinate (up to full/peer) server. You pull a subordinate server with as much, temporally and spatially, of the true server as you want. You can then operate as you can in git creating branches and as many local commits as you please with your subordinate server tracking those changes. You can then synchronize the real server with the sub-server as you do in regular old git. The logic should all be basically the same, since sub-server synchronization is actually a simpler, degenerate case of the synchronization of true peers which git does normally.

You can also create a chain of sub-servers, which are basically mirrors of varying completeness, to support whatever degree of centralization or distribution you want. Not a central system, but also not fully distributed; a hierarchy as deep or flat as you want, supporting both extremes in one model while incurring minimal structural differences or complexities above the currently used fully distributed model.

This provides the development advantages of a monorepo while retaining the infrastructure advantages of the multirepo. You can just create spatial sub-servers for each "logical" repo that the primary development team on that component pulls from normally. If you need to reach outside you pull either multiple sub-servers or the parent. The advantage you get over multirepos is that your sub-servers know they are actually one whole, so pulling multiple "repos" and simultaneous commits will not result in bizarre history tearing or figuring out how to synchronize the disjoint histories, which are some of the major advantages of a monorepo.

You can handle binary assets or files easily because you just set the main development sub-server to not include the assets and their history. Maybe you have a test sub-server that when pulled only pulls the current version of the assets and nothing more so you can build the newest version. Not like you need the entire history to build with every random old version of the assets. But again, by having a unified version control for the assets and the code you can know if your local server is out-of-date while still working totally locally until you need to synchronize.

While you are at it I guess you can fix the git UX as well, but the real prize is improving the model.


You remember CVS? Hell yeah, and SVN too ;) Fuck, am I getting old...


projrc3.zip

I know people who still make zip/tarball copies of source trees because they don’t trust git, etc.


> Only wimps use version control: real men just merge patches from their email inbox, upload release tarballs on ftp, and let the rest of the world mirror it ;)

-- Linus Torvalds

(not really, but almost)


Meanwhile, other people trust git so much that they treat it more as a tool for doing an incremental back-up than as a distributed VCS (git commit -a -m x && git push).

I must admit I have occasionally used git as a tool for transferring files (https://xkcd.com/949/).


I am actually doing exactly that to manage my personal KeePass database, down to using "x" as the commit message; it works reasonably well since GitHub allows you to download files with just HTTPS.


Eh, there is nothing particularly wrong with

    #!/bin/sh -e
    # Create a timestamped tarball of the current directory in its parent directory.
    TARGET=../$(basename "$(pwd)")@$(date --utc +'%Y-%m-%dT%H:%M:%SZ').tar.gz
    tar czf "$TARGET" .
You can even add a fourth line to upload it somewhere automatically...



