"BuildXL runs 30,000+ builds per day on monorepo codebases up to a half-terabyte in size with a half-million process executions per build... You may find our technology useful if you face similar issues of scale."
I know this wasn't supposed to be a humorous announcement but I couldn't help laughing out loud at that! Kudos to the managers at Microsoft who now seem to be asking "Why not?" instead of "Why should we?" when the topic of releasing code to the community is raised.
You'd be surprised at the volume of code a smaller company can produce.
Former employer was a big C++ shop in finance. Of around 1000 employees, roughly 3/4 were developers. They definitely could take advantage of something like this. I don't know how many hundreds of millions of LOC they have between C++ and later C#, but I was responsible for around 3 million alone (largely generated). A full coordinated firm-wide rebuild could take weeks.
I've a similar story or three of finance companies that in just ten years produced enormous amounts of legacy code. It's really not that hard. Solaris was enormous too, with just 2k devs for all the time I was there, and about 30-40 years of history, depending on how you count it. If your 1k developers each write 1Kloc/year on average, then after a decade you can expect to have 10Mloc, but since a lot of code will be forked external open source (or even not forked, but just imported to freeze at a particular version, or for some other reason) you might find your devs building and looking after many tens more Mloc than that. If you hire lots of 5x and 10x engineers, that too leads to a sizeable increment.
There are many many companies out there that have huge megarepos.
Ok but, what's the byte size of 10MLoc, and how many process executions per build - since these were actually the metrics used. My experience is that lines of code don't actually take up that much space.
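For a rough feel of the byte-size question, here is a back-of-envelope sketch. The 40 bytes per line figure is purely an assumption for illustration (it is not from the announcement or from anyone in this thread), and the function names are made up:

```python
# Back-of-envelope: how big is N lines of source text?
# ASSUMPTION: ~40 bytes per line on average. Real codebases vary a lot.
AVG_BYTES_PER_LINE = 40


def source_size_bytes(lines: int, avg_bytes_per_line: int = AVG_BYTES_PER_LINE) -> int:
    """Estimated size in bytes of `lines` lines of source text."""
    return lines * avg_bytes_per_line


def lines_for_size(size_bytes: int, avg_bytes_per_line: int = AVG_BYTES_PER_LINE) -> int:
    """How many lines of source text would fill `size_bytes`."""
    return size_bytes // avg_bytes_per_line


ten_mloc = source_size_bytes(10_000_000)
half_terabyte = 500_000_000_000  # "half-terabyte" from the quoted announcement

print(f"10 MLoc is roughly {ten_mloc / 1e6:.0f} MB of text")
print(f"Half a terabyte would be roughly {lines_for_size(half_terabyte) / 1e9:.1f} billion lines")
```

Under that assumption, 10 MLoc of plain source is a few hundred megabytes, so a half-terabyte repo presumably holds a lot more than hand-written source text.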
Depends largely on how the code is structured with C++.
There's the number of compilation units within a lib vs overall. Typically you can parallelize within a module, but not externally unless you have some smarts.
Edit: I use module in this sense as a producible result, not the future language concept of modules.
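To make the "some smarts" part concrete, here is a minimal sketch of scheduling across modules with a dependency graph. The module names and dependencies are invented for illustration, and this is not how BuildXL or any particular build tool does it; it only uses the standard library `graphlib` (Python 3.9+):

```python
from graphlib import TopologicalSorter


def build_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group modules into waves; every module in a wave has all of its
    dependencies built in earlier waves, so a wave can build in parallel."""
    sorter = TopologicalSorter(deps)  # maps module -> modules it depends on
    sorter.prepare()                  # raises CycleError on circular deps
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # modules whose deps are all done
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as built
    return waves


# Hypothetical libraries: "net" and "db" both depend on "core",
# and "app" depends on both of them.
deps = {
    "core": set(),
    "net": {"core"},
    "db": {"core"},
    "app": {"net", "db"},
}

for number, wave in enumerate(build_waves(deps), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Without the graph you build one module at a time and only parallelize the compilation units inside it; with it, "net" and "db" (and all their compilation units) can go at once.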
We (the developers) were tasked with enormous responsibilities. I was back office, but responsible for managing a client/server for all non-security reference data. There were easily over 200 data objects modelled. No direct DB access was allowed except for the owning service. Although it ended up around 3M LOC, it wasn't as bad as it seems, because only about 10% was manually written code. A lot was generated C++ (and aside: I was able to write a C++ wrapper around a generic API that exposed some 300 types through a home-grown reflection-esque API, and the Python wrapper never had to be updated because it could use this reflection and run-time code generation to generate a strongly typed API at runtime; to my knowledge, the Python code has remained unchanged since about 2006, when I first wrote it, despite the underlying C++ API changing constantly - I get occasional updates from former coworkers).
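For anyone curious what that kind of runtime generation can look like, here is a minimal sketch in Python. It is not the original code: `SCHEMA` is a hypothetical stand-in for whatever a reflection API would report, and `make_class` and `build_api` are made-up names.

```python
# ASSUMPTION: SCHEMA stands in for metadata returned by a reflection API.
# The type names and fields are invented for illustration.
SCHEMA = {
    "Currency": {"code": str, "decimals": int},
    "Country": {"iso": str, "name": str},
}


def make_class(name: str, fields: dict[str, type]) -> type:
    """Create a class at runtime whose constructor checks field names and types."""

    def __init__(self, **kwargs):
        if set(kwargs) != set(fields):
            raise TypeError(f"{name} expects exactly the fields {sorted(fields)}")
        for field, expected in fields.items():
            value = kwargs[field]
            if not isinstance(value, expected):
                raise TypeError(f"{name}.{field} must be {expected.__name__}")
            setattr(self, field, value)

    # __slots__ stops callers from setting attributes the schema does not know.
    return type(name, (), {"__slots__": tuple(fields), "__init__": __init__})


def build_api(schema: dict[str, dict[str, type]]) -> dict[str, type]:
    """Generate one class per type described in the schema."""
    return {name: make_class(name, fields) for name, fields in schema.items()}


api = build_api(SCHEMA)
usd = api["Currency"](code="USD", decimals=2)
print(type(usd).__name__, usd.code, usd.decimals)
```

The point is that the wrapper code never names a concrete type: when the schema changes, the generated classes change with it and the wrapper stays as it is.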
The big problems were dependency management and scale. At least at the time I was there, neither was handled well.
Scale was a problem because of tight coupling between libs. Upgrade a core lib? Everyone had to rebuild. Want to upgrade a 3rd party dependency? A firm-wide rebuild that took a minimum of 2 weeks. It was a mess. We were supposed to be client/server to minimize dependencies, but we coupled our clients to our servers so tightly that we just exacerbated the problem. A few of us could handle multiple client versions with a single server, but most couldn't. Don't recommend.
I agree with the other commenters. You'd be surprised how much code companies you've never heard of have. I used to work for Axway, an enterprise middleware vendor. They had at least 3 products that I knew about with multi-million LOC codebases, in Java and C and various other languages.
I wouldn't be surprised if they had at least 50 million LOC in total. Actually, now that I think of it, 100 million was more likely. And that was almost 10 years ago...
I think the GP is really taking a jab at MS's history of being anti-open source, a stance that has done an about-face in the last 5 years or so.
I mean we had Ballmer under whom you'd likely never have seen anything (or only trivial, non monetizable things) open sourced and likely no Linux support.
Now under Satya, we have MS open sourcing lots of projects and embracing Linux as a first class citizen. Probably far from a complete list of Linux support, but off the top of my head I can name: VS Code, building for Linux via VS proper, CMake support, WSL, SQL Server on Linux, SQL Server ODBC drivers, .NET Core and Linux support in Azure.
Even if projects aren't completely open, such as WSL and conhost, they at least have GitHub projects for bug tracking that allow end-users to interact directly with the teams responsible for those projects. Personally, I've filed several issues against WSL and I've gotten fairly quick responses, and resolutions usually appear in a few weeks' time (I'm in the Fast Insider ring at home).
I'm pretty impressed with WSL. I've just moved back to Windows (Dell XPS) after ~13 years using macOS, and in many ways WSL is nicer than what I used to do before with juggling Homebrew packages and Parallels VMs depending on the demands of a particular task. Between that, PowerShell, and installing stuff via Chocolatey, I feel right at home.
But without things like WSL and native OpenSSH, I doubt I would even have looked; I would have stuck with macOS forever or just gone native Ubuntu.
Yes, it's on my list to check out! I just haven't yet found something I wanted to install that wasn't available in Chocolatey; Scoop's forte appears to be CLI utilities, and I haven't gotten as far into that realm.