Catching use-after-move C++ bugs with Clang's consumed annotations (awesomekling.github.io)
175 points by awesomekling on July 10, 2019 | 188 comments



Perhaps the future of software isn't "rewrite everything in Rust", but instead we end up annotating existing C/C++ with borrow checking information. This would be similar to how there is a push in JavaScript to add TypeScript annotations everywhere. It gives you the flexibility of the existing language and ecosystem while still allowing you to use new techniques in programming languages.


My problem with C++ isn't the lack of a borrow checker - that's the feature I like least in Rust (I know it's their core design goal, but frankly the inconvenience and limitations it imposes don't seem worth it for my use case, and then there's the compile times).

C++'s lack of modules and package management, on the other hand, is a huge PITA, and I'm not optimistic that either of those, bolted on so late into the language's lifecycle, will provide a useful solution.

It's a pity D took it too far in the other direction with GC and runtime - I really could use a C with classes and modules.


I recently found out[2] about DasBetterC[1] and it seems like a super convincing way to get just C with classes and modules, as you say. Essentially, it removes the runtime and all the features that rely on it.

[1]: https://dlang.org/blog/the-d-and-c-series/#betterC [2]: https://theartofmachinery.com/2019/04/05/d_as_c_replacement....


Modules are coming in C++20, with vcpkg and conan becoming the two most used package managers.


They were coming in C++17 too.


Back in 2015, when work on C++14 was still going on, it was already known that modules probably wouldn't make it into C++17. The situation is very different now.

https://botondballo.wordpress.com/2015/06/05/trip-report-c-s...


The set of ISO C++20 features is pretty much frozen after the upcoming Cologne meeting, and modules are in.


So the aforementioned death-on-arrival [1] was premature? Are there really implementations which end up faster as a result?

[1] https://vector-of-bool.github.io/2019/01/27/modules-doa.html


I think Conan has real potential when it comes to package management.

Some examples of what can be done with it: https://gitlab.com/xadix/xonan

It is kind of like Gentoo Portage or Nixpkgs, and can be used to manage your toolchain as well.


I don't get Conan. Portage can already create a Linux root in your homedir (see the Prefix project) and has a huge package repo. What am I missing?


Portage doesn't work across Windows, QNX, IBM i, IBM z, Android, iOS, macOS, mbed, RTOS, ClearPath, Aix, HP-UX, INTEGRITY, FreeBSD, NetBSD, OpenBSD, PS4, Switch, XBox, Tizen,...


In addition to what the other response said, it also lacked binary caching, last I checked.


I find vcpkg to be a much more well thought out solution, or at least did at the time I was doing research into suggesting one for neovim during that discussion. It was clearly developed by a team that’s been around the proverbial dependency management block a few times, and offered more accommodations for working with existing codebases than Conan did.


I could be wrong, but conan's "killer feature" is its integration with traditional artifact management solutions. Using it with Nexus (which also takes care of PyPI, npm, and basically every other binary package type under the sun) is fairly smooth sailing.

vcpkg, on the other hand, as far as I know only exports to nuget and compressed archives. Nuget is great and all if you're purely in MS land, but otherwise... not so much. And with compressed archives, you're kinda on your own w.r.t versioning and so on. That being said, I'm much more familiar with conan, so please correct me if I'm wrong.

Additionally, conan, being configurable via normal Python code (for better or worse), can really hack together pretty much any codebase (I've used it with autotools, MSBuild, CMake, and even Xcode).

I will definitely agree that the vcpkg team does ultimately seem to be more experienced, and it's a more polished tool (the conan docs are... lacking in areas, and updating conan will occasionally break things). It'll be interesting to see the direction both tools go in the future, as I would really like cross-platform C++ dependency management to stop being a total PITA.


> C++ lack of modules and package management

What exactly does "modules" mean to you? JS modules, Ruby modules, Java modules, Python modules? Those are substantially different ideas. C++ namespaces are Ruby modules, sans inheritance and runtime representation.

> package management

Does every language now require its own bespoke package management? npm, gem, pip, CPAN, OPAM, composer, maven, ivy, NuGet, Cabal, Cargo, CocoaPods, Quicklisp, LuaRocks, CTAN, Anaconda, etc.


Not to mention you can get software that is unlicensed on npm, and god knows what gets pulled in when you use cargo or any of the smaller ones. This is a gigantic clusterfuck for those who are interested in using free software that encourages the user to be aware of the importance of free software.


crates.io (for Cargo) requires that every package include an SPDX identifier, and there are tools that can show you what licenses are used in your project.


Yes, it does, because most of us want to support a single package format, not one per OS.

Get every OS to agree on a universal package manager and then we may get rid of the language-specific ones that work across all OSes.


Language-based systems are very limiting.

Want to use OpenSSL in your Rust project?

Cargo won't/can't get you what you need.



All the ones I use are multi-language while being independent of the underlying OS; using OpenSSL isn't a problem.


D is perfectly usable as a C with classes and modules if you use the -betterC flag, with optional lifetime checking if you want it (@safe).


Just use the package management of the OS distribution.

And with GUIs moving to the web, it all just gets installed into some kind of Docker image anyway.


In fact, Sutter (et al.) are working on lifetimes in the CppCoreGuidelines [1]. I built their clang tree and, without bothering to RTFM, tried the warning ("-Wlifetime") out on a pile of C++ code. I naively assumed it might be a generally-useful warning that's simply not ready to be introduced upstream. That's not the case AFAICT. What I suppose I would've learned from RTFM is that the profile specified by the guidelines is sorta like an opt-in 'dialect' in which you annotate/specify lifetime information. Without it, there are lots of spurious findings. Either that, or the codebase I tried it on isn't as good as I thought it was.

Here are a couple of interesting examples of failure modes -Wlifetime can detect, on godbolt[2][3].

I watched a video [5] a while back on the Guidelines Support Library (GSL) [4] and it seemed like a really interesting concept. I think it's a valuable idea and I'd love to see popular C++ projects leveraging it.

I'm a card-carrying RESF member† (but have a day job w/mostly C++). Don't RIIR for one thing, RIIR to get all the things. Cargo is the sleeper hit of Rust. Hygienic macros and more!

[1] https://github.com/isocpp/CppCoreGuidelines/blob/master/docs...

[2] https://godbolt.org/z/dymV_C

[3] https://godbolt.org/z/_midIP

[4] http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#...

[5] CppCon 2017: Kate Gregory “10 Core Guidelines You Need to Start Using Now” https://www.youtube.com/watch?v=XkDEzfpdcSg

† Rust Evangelism Strike Force -- maybe we really should have cards


Rewriting stable software in Rust is a very bad idea. Especially in open source, where we don't have enough maintainers, it ends up hurting the ecosystem. A certain prominent GNOME maintainer has been trying this recently and breaking things along the way:

https://gitlab.gnome.org/GNOME/librsvg/issues/456

Apparently, GObject Introspection doesn't even work in Rust, yet it is expected to be a perfectly valid replacement:

https://github.com/gtk-rs/gir-files/issues/35

Please don't be like these people! Keep the stable software we have, and maybe write new software in Rust.


Your argument is not compelling: you've described the drawback without considering the benefits. What if that prominent maintainer hadn't made the mistake (a mistake attributable to poorly captured design or requirements, not to the implementation)? Is there something intrinsic to Rust that would lead them to this error? (No, I think not.)

> Please don't be like these people! Keep the stable software we have, and maybe write new software in Rust.

I say: spend your time how you see fit. I love Open Source software, I love GNU/linux and GNOME. But IMO it's always been about scratching an itch first and commitment to a cause second.


It's certainly up to them if they want to rewrite everything in Rust. I'm just saying the net benefit does not exist for stable software like librsvg or bzip2.

> Is there something intrinsic to Rust that would lead them to this error? (no, I think not).

There is nothing intrinsic to Rust that creates this kind of problem, besides it being a different language than the project was originally written in. I'm sure Go, Swift, Haskell, Java, or any other language (with the possible exception of C++) would have similar issues. No other language community is quite as arrogant as the Rust community though.


> I'm just saying the net benefit just does not exist for stable software like librsvg or bzip2.

Are you sure?

Both of those libraries are, in fact, poster children to be rewritten in Rust as they have to eat untrusted input.

If I offered $20 on "I can find a use-after-free or undefined behavior bug somewhere in those libraries" I'm pretty sure nobody would take that bet.


To be fair, that is not everywhere.

The design team always has nice conversations with me, in spite of my schizophrenic view of C++ vs Rust (I like both languages and see some negative issues in both), and there are places with joint community events between C++ and Rust.


> Please don't be like these people!

As the maintainers of the software shouldn't they have the prerogative to maintain the software in the best way they see fit?

Bugs happen while refactoring, regardless of how the refactoring is done.


They do, but that doesn't mean the decision is wise. The point is you are creating more bugs than you could ever possibly fix in this kind of refactor.


And those bugs get fixed, and the software ends up better.

I've contributed to two successful "rewrite it in Rust" projects now: Stylo and WebRender. Both of them ended up fixing long-standing bugs in the previous implementation that were difficult to address in the old codebase, but a new clean approach offered a nice opportunity to fix them.


I also err on the side of never re-writing things, and pros and cons must always be weighed on a case by case basis. It's not possible to say in general that re-writing software in a language not prone to many important classes of bugs would "create more bugs than you could ever possibly fix." We won't know the decision is wise or unwise until it's attempted and studied maybe even a few times over.


If you're rewriting into a language like C++, this may very well be true. However rewriting in Rust in my experience yields far fewer bugs, and the ones that do surface are usually simple to fix. If a project truly is "stable" - there's very little effort involved in maintaining it, then yeah, it doesn't make sense to rewrite it. But if it's plagued by bugs and is painful to maintain, you'd be much better off rewriting it in Rust, assuming you're familiar with the existing project's limitations.


Rewriting stable, network-facing, C or C++ software in Rust is a very good idea, because that way that software will require less ongoing maintenance to avoid security problems in the future.


We don’t know yet how much Rust software will cost in ongoing maintenance to avoid security problems.

The unsafe keyword disables many safety features. It has already caused security issues: https://medium.com/@shnatsel/how-rusts-standard-library-was-... That particular one was fixed long ago, but unsafe is used a lot in libraries, both standard and third-party crates. The reasons include native interop (e.g. OS kernel calls), language limitations (e.g. in collections and other data structures), and sometimes performance optimizations.


Yes, we do. Empirically, there have been far fewer memory safety problems in the Rust components of, for example, Firefox than there have been in the C++ components.

That particular memory safety issue was not a security issue, because it did not cause any problems in actual software. Rust is very conservative about issuing CVEs for any issue which could conceivably be a security problem. (The equivalent would be if C++ the language issued a CVE because the language lets you clear an array while you're iterating over it. The CVEs for exploitable problems that this kind of thing enables are for the vulnerable software, letting C++ off the hook.)

In my opinion, Rust is too conservative and shouldn't even call those things security issues until they affect real products, because it leads to misunderstandings like this.


It’s safer than C++, sure, but half the other languages out there are safer too. Java doesn’t have unsafe code at all, not even in the standard library, and native code is very hard to interop with. Same with JS & TS.

When people move from C++ to another language for more safety, Rust is not their only option.

A couple of decades ago everything was written in C or C++. Desktop apps, mobile apps, web servers (cgi-bin, then COM objects for classic ASP).

People wanted higher-level, easier-to-use, faster-to-compile, and safer languages. Java was the first popular one, then C#, then the rest followed. C++ is continuing to lose the market it once had. 15 years ago, people generally stopped using C++ for web servers. Some still do for unusual requirements, but most people don’t. 10 years ago, most people stopped programming mobile apps in C++ (I did, before that time, for WinCE and PalmOS); now it’s mostly managed languages there. Recently the same happened for desktop apps: no one likes Electron, but it does the job, and people do use the software. Embedded and videogames have already started the transition, IMO.

Moving from C++ to better languages is a long-running trend; the industry has been doing it for decades now. I don’t think Rust is a universally good option for that. It has issues with usability, the labor market, libraries, platform support, and community. It's a good option for niche stuff, like some components of a web browser or a bare-metal hypervisor, but that’s it.


> Java doesn’t have unsafe code at all, not even in standard library, and it’s very hard to interop with.

Isn’t all JNI code in Java considered unsafe?

Then there is http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot...


JNI is what I meant by "hard to interop with".

That second thing is fine I think, the docs say "Although the class and all methods are public, use of this class is limited because only trusted code can obtain instances of it."

Similar policies exist in .NET and they work well IMO: you can compile a .NET DLL with /unsafe, but most people leave the default options, and then both the compiler and the runtime verify that it's true. The other way works too: set the VM up for partial trust and you won't be able to access native memory at all, except memory in managed arrays and structures.


But this isn’t different in Rust. unsafe is generally reserved for low-level features. In the trust-dns libraries there’s not a single line of unsafe code, whereas the Postgres extension library I’ve been working on with some others has a ton of unsafe for FFI with the C. Even so, the intention is that most users will never have to write unsafe.

Just because of the presence of unsafe (which still guarantees more in Rust than C), doesn’t mean that the entire codebase is unsafe.

This is a nitpick that is used to malign Rust, but really it’s the freedom to implement functionality at as low or high a level as you want, without cumbersome interfaces like JNI or even the JVM-internal features Java relies on.


The .NET or Java sandbox is extremely hard to escape unless you have a door open. It's borderline impossible. Their bytecode is simple, so the runtime has a chance to check which classes are created and which pointers are accessed. It then has complete supervision over the program as it runs.

Escaping Rust's sandbox is a short keyword away. That keyword is used all over the ecosystem, in both standard and third-party libraries, to implement these lower-level features or to overcome language limitations. You can't disable it, because you'd lose vectors and strings.

> really it’s a freedom to implement as low or high level functionality as you want

I'm a C++ and C# developer. I often use both in the same software, and I think my spread of abstraction levels is wider. On the lower level I have SIMD intrinsics, simple but very fast unsafe data structures, and manual control over RAM layout. On the higher level I have LINQ, async-await (which has been working for a decade now), and reflection.

> with out cumbersome interfaces like JNI

Here it's not too bad. C-style interop, i.e. [DllImport], just works, even on ARM Linux where it loads .so libraries. On Windows, COM interop allows exporting C++ objects from C DLLs, exposing .NET objects to C++, or moving ownership either way. Some boilerplate is required on Linux to expose OO APIs without IUnknown.


> Escaping Rust's sandbox is a short keyword away. That keyword is used all over the ecosystem, in both standard and third-party libraries, to implement these lower-level features or to overcome language limitations. You can't disable it, because you'd lose vectors and strings.

There is no meaningful difference between the JNI and the HotSpot VM and the Rust unsafe keyword. The HotSpot VM is hundreds of thousands of lines of unsafe C++ code. It doesn't get a pass because it's the VM.

Rust uses unsafe to implement things like hash tables and lists, primitives that would otherwise be implemented in unsafe code in the compiler, because it was easier to write code than to write code that generates code. I actually hand-coded early versions of Vec in LLVM IR in the compiler. It was a miserable experience! Moving the implementation to unsafe code instead of raw LLVM IR made it easier to maintain, which made it safer.


> There is no meaningful difference between the JNI and the HotSpot VM and the Rust unsafe keyword

JNI: used rarely. You can opt out, and the runtime will enforce the policy, disabling all unsafe code. Most third-party libraries don't use or require it.

JVM: identical prebuilt binaries are installed by millions of users. The developer is a for-profit company whose existence depends on the security of their product. Only a small number of people can actually change the unsafe code. You can't opt out; that would disable Java.

Rust unsafe: unsafe code is used in half of all libraries, first- and third-party, i.e. authored by a very large group of people. There's no single binary; potential bugs are spread across the entire ecosystem. Test coverage varies greatly: Vec from the stdlib is used by everyone and is very likely fine, but there's a long tail of unsafe code not used much in practice, or at all. You can't opt out; that would disable Rust.

The difference is quantitative, in my estimation of the risks. I don't rule out that someone has deliberately placed a security-critical bug in a JVM through some real-life means (blackmail, bribery, etc.), but other people are IMO much more likely to notice that than a similar security bug in some obscure Rust crate. Same with honest human errors.


> Rust unsafe: unsafe code is used in half of all libraries, first- and third-party, i.e. authored by a very large group of people. There's no single binary; potential bugs are spread across the entire ecosystem. Test coverage varies greatly: Vec from the stdlib is used by everyone and is very likely fine, but there's a long tail of unsafe code not used much in practice, or at all. You can't opt out; that would disable Rust.

I'm glad you agree that unsafe code in the standard Rust library is fine.

Concern over unsafe code in dependencies is a legitimate concern, but it's one that we have tooling for, such as cargo-geiger. With that tooling, you can opt out of unsafe code in dependencies in Rust just as you can in Java. (Note that unsafe code isn't the biggest potential problem with dependencies, though. A malicious dependency is a serious issue, regardless of whether it does pointer arithmetic or not.)

Besides, this is basically splitting hairs. Empirically, we know that memory safety problems are incredibly rare in Rust programs compared to those in C++ programs. Maybe Rust programs have slightly more memory safety problems than Java programs do, or vice versa. The difference isn't a meaningful one.


> I'm glad you agree that unsafe code in the standard Rust library is fine.

I don't. I have no doubts about the Vec class because it's used a lot, not because it's in the library.

I'm pretty sure the standard library also has that long tail of barely tested unsafe code.

I recently looked at the sources of the stdsimd crate; it has hundreds of unsafe functions doing transmute. I code SIMD intrinsics a lot in C++ and I know they're probably OK. Apparently it wasn't even for pointer math; the only goal of the unsafe was to work around a language limitation (they're pure functions without side effects).

> A malicious dependency is a serious issue

I never found any, but I found bugs in my dependencies many times. Not fun either.

> memory safety problems are incredibly rare in Rust programs compared to those in C++ programs

I agree; quite expected. The BC works two ways: it checks ownership at compile time, and it also raises the entry barrier. Both help with memory safety. The observations don't tell us which effect contributes more.


There are JVMs written in Java, and in fact that is what Project Metropolis is all about: take code from GraalVM and increasingly replace the C++ in OpenJDK with more Java code, including defining a so-called "System Java" subset to be AOT-compiled by SubstrateVM.


You still have to deal with both the #1 and #2 worst ideas in computer science — nullability and inheritance.


You don't have to.

C# has value types, including user-defined ones, and they are not nullable.

If you don't like inheritance, don't use it. I sometimes write pages of code without any non-static class. When doing OO, if you want polymorphism but don't want inheritance, use interfaces. They are not base classes; you can implement many of them, or implement one on a value type.


> C++ is continuing to lose the market it once had.

This is an amusing remark. The number of C++ programmers is growing faster than ever, C++ discussion boards are busier than ever, attendance at C++ conferences is higher than ever, and the growth rate is increasing. The number of C++ programmers is growing at a far, far greater rate than that of Rust programmers.

It's the difference between fads and real trends. C++ is where the hard work gets done.

The language had a renaissance in 2011, and is no longer what it was. Moving to a better language is easier than ever, because C++ is that language. C++14 is better than C++11, C++17 is better than C++14, and C++20 will be way better than C++17.

Talk about safety is misplaced. Modern C++ is as safe as any language you can name. Of course there are (literally!) many billions of lines of old code, most of it nothing to write home about. But it's overwhelmingly easier and safer to modernize it than to rewrite it. Save rewriting for C, where it makes the most difference; then, rewrite it incrementally in C++, keeping it fully functional at all times.

The safest code is code you don't have to write at all, because it's in a fast, powerful, well-tested library you can just call. C++, overwhelmingly more than other languages, is focused on enabling powerful libraries you won't be tempted to bypass, for speed or because it might be a PITA to use. As a direct consequence of that focus, we have the libraries.

Talk about Java or C# as "better" languages makes me laugh. The number of bugs is proportional to the lines of code. Such infamously prolix languages provide fertile soil for bugs, even neglecting their poorer library support.

Rust is an interesting modern language that is relatively good for writing libraries in. It is a sound choice anywhere that the alternative would be C, Go, or Java, provided tool maturity and coder availability are not serious concerns. It is a somewhat risky choice, because it is still far from clear whether it will still be growing in five years, but so far so good.


> Modern C++ is as safe as any language you can name

It absolutely isn't. C++ will always have the looming spectre of undefined behaviour.

I've seen repeated instances of undefined behaviour in respected textbooks: initializing a struct by zeroing it out with memset. It'll probably continue to work fine on MSVC/x64, but presumably the (very knowledgeable!) author didn't realise that C++ doesn't guarantee a null pointer is represented as all-zero bits. Or perhaps they didn't care, and were fine with introducing undefined behaviour into their example code (the kind of sloppiness that demonstrates the value of safe languages).

I suppose you could argue that isn't modern C++, but that would be weak sauce given that we're discussing the safety of the language.


> When people move from C++ to another language for more safety, Rust is not their only option.

That's arguably missing the point. For a very long time, managed, GC-based languages like Java or Go were the only option for workable memory safety! (Yes, I'm ignoring a lot of stuff because it's practically irrelevant. No need to tell me about ATS and the like, thank you so much.) Cyclone and now Rust have changed that, which makes for a real paradigm shift in the industry, at least potentially. Rust even contributes meaningfully to thread safety by getting rid of simple data races, which are a huge can of worms not just in C/C++ but managed languages as well!


> which are a huge can of worms not just in C/C++ but managed languages as well

Multithreading with mutable shared state is harder in general. That's language-agnostic.

But it's easier to do correctly the way managed languages do it. They don't usually create threads; they use a thread pool and post messages to each other. This is language-agnostic too - I've tried doing the same in C++ and it works, e.g. https://github.com/Const-me/vis_avs_dx/blob/master/avs_dx/Dx... but with a GC and async-await it's much easier.

> which makes for a real paradigm shift in the industry, at least potentially

I think for most people, getting stuff done is more important than shifting paradigms.

Rust is less productive than managed languages (GC helps, BC interferes), compiles slowly, and is slightly less safe.

For network people, golang is quite popular already, and .NET Core wants in there too. Game developers use too many C++ libraries, don't need the security (console games are sandboxed or run under a hardware hypervisor), their most painful C++ issue is build time, and Rust is even worse there. Low-level embedded, driver, and Linux kernel developers need C: too much integration and too many weird target platforms. HPC and ML people need manual SIMD and the corresponding libraries. CUDA people need nVidia support.

Open a job board, type C++, and press search. Do you see many positions, besides those listed above, where people are still writing C++?


"Mutable shared state" is precisely what the Rust compiler checks for. Mutable state (in sequential code), no problem. Shared immutable data, OK. But as soon as you inadvertently mingle both the compiler will notice, and demand that you deal with the issue (by explicitly serializing access, and/or by using lightweight runtime checks that ensure you're not "sharing" anything that shouldn't be - this is what happens when you use Rust's 'interior mutability' facilities).

And because these places are explicitly marked in the code, it's easier to audit them for bugs, same as with 'unsafe'. "Posting messages" is just another way of sharing non-mutable data, Rust has facilities to help you do that as well.

By the way, this is why the "borrowck interferes" quip is only true if you haven't internalized the things BC checks for. Once you have done so, you understand how it helps. The slow compiler is an issue unfortunately, but that only affects code generation - the static checks Rust does are quick and easy. Besides, "if it compiles it works", right?


> is precisely what the Rust compiler checks for.

Mutable shared state is required in many real-life cases. CPUs suffer a heavy penalty for random RAM reads, while random writes are nearly free (especially when using MOVNT* instructions to bypass the caches). When working on computational code you want to optimize for read access. This often means write patterns are random, i.e. you can't slice arrays per compute thread.

> because these places are explicitly marked in the code, it's easier to audit them for bugs

C# has very good OOP support. All class members are private by default. Wrap that mutable shared state in a class with a thread-safe public API and it will be equally easy to audit.


> But it's easier to do correctly the way managed languages do. They don't usually create threads, they use thread pool and post messages to each other.

C# certainly never worked like that. `System.Threading.Thread` was the backbone of its concurrency model, together with everyone’s favorite auto/manual reset events, mutexes, and semaphores. While with .NET Core there are some libraries providing channels seeing adoption, the majority of the decline of threads in both legacy and Core .NET is from the move to async everything, which really obviates most of the need the casual developer has for dealing with concurrency primitives (at the cost of being absolutely dumbstruck when the abstraction leaks, obviously).

Redmond did an incredible job updating basically every single API in the BCL for async (contrast to the pathetically slow going in the world of nodejs).


> C# certainly never worked like that.

https://docs.microsoft.com/en-us/dotnet/api/system.threading... scroll to the bottom and you'll see it dates from .NET 1.1. That's 16 years ago. In the modern world, Task.Run does approximately the same thing.

> together with everyone’s favorite auto/manual reset events, mutexes, and semaphores.

When I'm not sending messages, I prefer the lock() keyword, syntactic sugar over the Monitor class. Unlike events and semaphores, it's not a wrapper around an OS sync primitive but is implemented at the VM level. The Pulse and TryEnter APIs are nice.

> incredible job updating basically every single API in the BCL for async

The important job was done much earlier than that. Before .NET 1.1, MS decided the fundamental IO class, Stream, needed an async API, and exposed BeginRead/EndRead and the rest of them. That's how they updated to async-await so fast: the new methods are thin wrappers over an underlying async API that had been in the standard library for years already, tested and debugged. Few people used it before because the callback hell was bad.


> The important job was done much earlier then that. Before .NET 1.1 MS decided the fundamental IO class, Stream, needs async API, exposed BeginRead/EndRead, and the rest of them.

I think the reasoning for this goes back even further. Windows NT already offered a set of comprehensive async APIs in terms of IO completion ports and overlapped operations. The .NET APIs are merely wrappers over those. They were too good to omit.


> The majority of the decline of threads in both legacy and Core .NET is from the move to async everything, which really obviates most of the need the casual developer has for dealing with concurrency primitives

I would argue that async/await in languages like C# (and Kotlin - and even Rust!) makes things even harder for the casual developer. Since async functions are scheduled between a variety of threads, developers must now keep track of 2 different concurrency units: Tasks and Threads. Depending on where continuations are called (which is quite complex in C#'s world of SynchronizationContexts and TaskSchedulers) the rules are slightly different. And misusing synchronization can have the effect of anything between race conditions (as in normal threaded systems) and starved systems (because the programmer blocks the whole scheduler - which is unfortunately a quite common issue in async/await code).

In my opinion the simple languages are the ones which don't offer 2 worlds. Singlethreaded models (like in Javascript and Dart) are the simplest, since they offer only one model since the cooperative runtime prevents lots of races right from the start. The slightly less simple thing is only offering a good threaded model, which is what Go does with a lot of success. Here we at least only have to learn the old fashioned synchronization primitives, plus some new ones (Channels). Multithreaded languages with support for async/await are absolutely awesome tools! But I think mastering them requires more understanding compared to languages without those features.


I agree that there are many other fine languages that people should be considering.


> Java doesn’t have unsafe code at all, not even in standard library, and it’s very hard to interop with.

Look at Android system libraries. They use JNI all over the place.


You mean like using unsafe all over the place in many third party Rust libraries?

Also Android is a poster child that Google hiring practices don't lead to the best quality of code implementation.


> You mean like using unsafe all over the place in many third party Rust libraries?

Yes?

I don't think Rust is particularly more, or less, memory safe in practice than Java. They're both memory-safe languages.


Ah sorry, I misunderstood your remark then.


It's a very good idea because you don't have to hire or retrain the people to perform the work and maintain it.

Instead you can just talk about how it's a good idea.


Microsoft security unit also thinks that way.

https://www.youtube.com/watch?v=PjbGojjnBZQ

https://github.com/microsoft/MSRC-Security-Research/tree/mas...

"Future development should be a mix of C#, Rust and constrained C++."

Which is also a reason why they aren't that bothered into keeping plain old C around on VC++, beyond ISO C++ requirements.


Microsoft say a lot of things. I'll consider listening when they will (re)write their stuff in Rust.

TBH what MS does development-wise is irrelevant for me, since I gave up on their platforms a long time ago. But if they put their money where their mouth is, it might give Rust the push it needs.

Wouldn't count on it though, MS can't even figure out what own development story to support for more than a couple of years.


Then start listening.

Azure IoT Edge is written in C# and Rust, Visual Studio Code uses Rust for its search, the Web framework Actix is written by Microsoft employees and used internally at Azure, they are helping port Firefox to HoloLens and that included hiring Rust devs.


First of all, thanks for letting me know which Rust projects MS are directly or indirectly involved in.

It's more than I expected, but only because my expectations were very low.

Do you consider that list in any way impressive or representative for the MS organization? I mean writing the search function of their hobby project text editor in language X is not exactly a ringing endorsement. Neither is anything which has IoT in its name.


Microsoft security advisor for new projects is C#, Rust and C++ with Core Guidelines profile.

So yes, I find it relevant that an IT giant actually cares to spend money on Rust, regardless of how much.

Rome wasn't built in a day.


Thanks for needlessly personalizing the argument, but I've worked on two "rewrite-it-in-Rust" projects that were successes, shipping right now: Stylo and WebRender.


Third parties pressuring the people doing the work to do more work to satisfy the criteria of said third parties => not ok.

1st parties deciding to rewrite their projects, like in your two examples => sure, it's their projects.

There is a special case when new projects are started in C or very security-critical projects are started in C++.

Here public interest should take precedence - it's reasonable to criticize the creators of such a project.


Rewriting stable software is a bad idea, but not because we have too few maintainers. Indeed, the way of dealing with the "too few maintainers" problem is to bite the bullet and start aggressively paying down technical debt so that the ecosystem can (1) become more sustainable going forward, given the same amount of maintainers, and (2) attract more people to the job of being a maintainer, by easing some of the hardest parts of that job. Rewriting stuff in Rust is one way - though a highly risky, perhaps even extreme way - of paying down technical debt.


Rewriting is not like paying down technical debt, it's declaring bankruptcy and trying to start anew from the ashes.

Or depending how it's done, like abandoning the debt and starting a new life on a tropical island under an alias.


Far from seeing this as a downside, I see it as an upside.

Gtk support on OS X has been absymal, forever. If this helps that, I'm all for it, thanks.


The value of the borrow checker in Rust comes from its systemic ability to eliminate an entire class of bugs.

The value of these kinds of smart C++/compiler features comes from the ability to eliminate some instances of bugs.

Which is great and all, but I don't see all of the great masses of C++ code and libraries being rewritten or annotated to use these new features.

Which will sadly leave the impact of these developments partial.


While partially using annotations only allows you to eliminate /instances/ of bugs, if you completely apply the annotations then it can allow you to eliminate the entire class of bugs (theoretically). Consider nullability in Java.

> I don't see that all of the great masses of Cpp code and libraries would be rewritten or annotated to use these new features.

It's much more likely and tractable that C++ libraries will be annotated than that they'll be rewritten from scratch in Rust.
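For the use-after-move case from the article, the annotations in question are Clang's consumed-typestate attributes, checked with -Wconsumed. A minimal sketch, adapted from the Clang attribute documentation; the OwnInt class is hypothetical, and the macros keep the code compiling on compilers without these attributes:

```cpp
// Compile with: clang++ -std=c++17 -Wconsumed (attributes are inert otherwise)
#if defined(__clang__)
#define CONSUMABLE(state)     [[clang::consumable(state)]]
#define CALLABLE_WHEN(state)  [[clang::callable_when(state)]]
#define SET_TYPESTATE(state)  [[clang::set_typestate(state)]]
#else
#define CONSUMABLE(state)
#define CALLABLE_WHEN(state)
#define SET_TYPESTATE(state)
#endif

class CONSUMABLE(unconsumed) OwnInt {
public:
    explicit OwnInt(int v) : ptr(new int(v)) {}
    ~OwnInt() { delete ptr; }

    // Calling value() on a consumed (e.g. released) object warns under -Wconsumed.
    CALLABLE_WHEN("unconsumed")
    int value() const { return *ptr; }

    // Marks the object as consumed; the caller now owns the pointer.
    SET_TYPESTATE(consumed)
    int* release() { int* p = ptr; ptr = nullptr; return p; }

private:
    int* ptr;
};
```

Annotating a widely used library type like this once gives every caller the static check for free, which is the "annotate rather than rewrite" argument in a nutshell.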


For Python there are tools that automatically add static-typing hints. One could imagine the same to add most of the annotations for existing C++ code, such that only some of the code must be manually annotated. For critical codebases one could also imagine a linter rule that insists on full coverage.

But sure, it is somewhat of an uphill struggle. Still a worthy goal, as I don't think we will get rid of the many massively popular C++ softwares anytime soon.


I think we'll c/c++ increasingly use those (in addition to the other wonderful tooling it has). I think Rust still has a strong future though - it has safety by default, and I really like it apart from the borrow checker.

That said, I like this push for more programmatic checks.


> we'll c/c++ increasingly use those

We'll c/c++ increasingly use

We'll see c++ increasingly use

My brain. It's too early for puns. So good though.


I'll just act like I meant to make that pun. I guess it was too early for me too. Good catch.


I certainly hope that's where we're headed. The more we can tell the compiler about our code, the better. It might also be interesting to provide hints like "this int should only ever be in the range 1-30" :)


"this int should only ever be in the range 1-30"

That's in all the Pascal-family languages - Pascal, Modula, Ada.
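A minimal sketch of what such a range-restricted integer could look like in C++ today (the Ranged class name is made up): constants are rejected at compile time, since a throw in a constant expression fails to compile, and other values are checked at runtime, roughly mirroring Ada's "type Small is range 1 .. 30;".

```cpp
#include <stdexcept>

template <int Lo, int Hi>
class Ranged {
public:
    constexpr Ranged(int v) : value(v) {
        // In a constexpr context, reaching this throw is a compile error.
        if (v < Lo || v > Hi)
            throw std::out_of_range("value outside range");
    }
    constexpr int get() const { return value; }
private:
    int value;
};

constexpr Ranged<1, 30> ok{7};      // fine
// constexpr Ranged<1, 30> bad{42}; // fails to compile
```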

"Perhaps the future of software isn't "rewrite everything in Rust", but instead we end up annotating existing C/C++ with borrow checking information."

Trying to retrofit this to C++ runs into the problem that too often you're lying to the language to get something done. Like extracting a raw pointer to be sent to a system call. You have to break a lot of existing code to make this work. Which means a new, safer C++-like language. There are about five of those to choose from, none of which are used much.

Rust was right to do a clean break. But then they just got too weird, with their own brand of functional programming and their own brand of a type system.


I'd love to hear your thoughts on Rust having an in-house type system and functional programming paradigm. It doesn't seem all that different than comparable era languages like Swift (maybe through inspiration) or Haskell. Macros, are IMO, where things go a little wonky but once you learn you can actually trust the compiler it's easy to let go.


That's it exactly. If you're into Haskell or Swift, Rust isn't that strange. To a C or C++ or Go programmer, Rust is very strange.

Rust has some brilliant ideas, but it's too hard. Go is rather dumb, but good for getting server-side applications running.


Surprised to see that you put Swift in the same bucket as Haskell, since Swift is to me the prime example of successfully balancing power with usability in language design.

This is where languages like Haskell, Rust and others fail: the programming language nerds take over and design a language for themselves, making it weirder and weirder as time passes and new arcane features are added.


As a C++ programmer dabbling in Rust I don’t find it strange. Many of the core ideas were motivated by C++. But I also like Haskell too :)


> Trying to retrofit this to C++ runs into the problem that too often you're lying to the language to get something done.

you mean like the gigabytes of rust code just put in unsafe {} blocks or doing panic! when they don't know what to do with the error ? :p


This stuff has been out there for a long time, but it's never become popular outside of safety-critical or high-integrity systems. Examples include:

Microsoft's C/C++ annotations: https://docs.microsoft.com/en-us/visualstudio/code-quality/u...

Ada's ranges: https://en.wikibooks.org/wiki/Ada_Programming/Types/range

Fun stuff to learn about in theory, but, unless you're working at a place where formal verification is part of the culture, good luck selling these tools to upper management.

More resources discussed here: http://highscalability.com/blog/2018/9/19/how-do-you-explain...


Ada's ranges are runtime checked right? How would you do this in a compiler?


Ada checks all arithmetic - the compiler can elide bounds checks if it can statically verify that no out of range error can happen, but if it can’t verify statically you get bounds checks at runtime.


a little of both iirc


you mean a contract https://en.cppreference.com/w/cpp/language/attributes/contra... c++20 has this, barring something unexpected happening in ISO.


Oh that looks awesome. I’ll definitely be using that once available, thanks for bringing it up!


Is this a baby version of dependent types?


I believe contracts are checked at runtime, so not really.


Contracts come in several forms. Some are checked at compile time. Some are checked at run time. Some the check is too expensive to actually run but is left for documentation. Compilers are expected to have a switch to turn off run time checking for production (just like assert)
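As an analogy (not the proposed contract syntax itself): assert() already behaves like the runtime-checked mode described here, with checks running in debug builds and -DNDEBUG acting as the switch that turns them off for production.

```cpp
#include <cassert>

// Precondition expressed as a runtime check, disabled entirely under -DNDEBUG.
int divide(int a, int b) {
    assert(b != 0 && "precondition: divisor must be nonzero");
    return a / b;
}
```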


Nothing in C++20 contracts requires any checking to be done at compile time. The only options currently required are runtime checking and no checking.


Contracts were designed so that they can be checked statically.

Pedantically you are correct: checking contracts is expected to be too expensive to actually do as part of a build. C++ expects that static analysis will check contracts as well (static analysis might take 10 hours to check what compiles in 10 minutes)


C++ already had a level of dependent types for a long time.

    template<int N>
    class T {};

    constexpr int f() { return 3; }

    constexpr int N = f();
    T<N> var;


As someone who just re-wrote his project (from python) in rust I will say that it has been an incredibly rewarding and pleasant experience.

C++ is fine, but most projects are riddled with ugly macros and #defines for features / platform specific code. Rust solves this in a fairly elegant way. Also, not having a package ecosystem in C/C++ is frustrating as hell.

Cargo was the thing that pushed me over the hump to learn rust instead of just opting for c++, and it has paid off.

It's not quite as ergonomic as go, but overall it has won me over and is my new favorite language. I'm excited to see how the story for rust plays out over the next 5 years :)


Have you ever written a Rust macro? They aren't exactly an example of beauty compared to C and C++ ones.

Cargo has gotten definitely better, but lack of support for binary libraries is a pain point.


"C++" (C++ people would argue this is not a language issue and therefore out of scope for them, hence the quotes. I'm not implying this is a bad decision to make, simply saying it's the argument I've read before) doesn't really support binary libraries well either. Name mangling, object layout, exception handling, dynamic dispatch are all things that different compilers are not guaranteed to implement in the same way. There's a survey of some of the implementations of these features here: https://www.agner.org/optimize/calling_conventions.pdf , although it is a few years old now. This is to say nothing of differing implementations of functionality between different versions of the same compiler, which is of course possible.

Most binary C++ libraries I've dealt with either require you use the same compiler, or export a C interface through extern "C".

Rust doesn't really support shared objects from Rust to Rust at all, from what I can tell (caveat that I've never actually tried it). There's no stable ABI (https://github.com/rust-lang/rfcs/issues/600), and therefore the only way to go is via a C interface. So Rust is more or less in the same place, just with fewer compilers.


Fact is, they do exist, and in what concerns Apple and Microsoft platforms, we use them all the time.


C and C++ cannot add borrow checking soundly while retaining any semblance of compatibility with the ecosystem. Consider what you would do with global variables, just to name one of many, many issues.


Why not? It's all done when you don't pass around raw references and pointers. The only hole that's left is use-after-move.


Virtually all C++ in existence uses raw references and pointers. You can't even call a method without using one (the "this" pointer).


The plus points of Rust aren't just borrow checking, but also the removal of tons of cruft and the benefit of modern tooling and language design.


This is especially frustrating as the whole concept of moved values in C++ was introduced fairly recently in C++11. They did such a poor job of it that they introduced this whole new class of use-after-move bugs that should never have existed in the first place. Now we need annotations to make sure the new feature they half-assed a few years ago works the way it was supposed to? It appears the C++ working group is firing out new bugs faster than third-party teams can patch them.

IMO it's time to accept C++ is a failed state, and move on. Luckily there are compatibility options to help you use C/C++ libraries in Rust.


Evolutionary improvement of mature tools instead of jumping on bandwagons is not very street, yo. Greenfield development or GTFO.


With C++ even if your project follows this, there is no way to enforce this across your project's dependencies & wider eco-system.

With Rust, its entire ecosystem (however nascent) is subjected to the same strict rules. This is a big plus as the Rust ecosystem matures functionality-wise.


My project doesn’t have this issue ;) https://github.com/SerenityOS/serenity


This looks great! I'm going to try this out over the weekend.


Wait, we enforce borrow checking rules on SOS?


At the moment it only shows up as an on-screen warning in my Qt Creator thanks to its clang integration. The default Serenity toolchain is using GCC-8.3.0 which doesn't support this trick at all, so we don't get any compile-time enforcement. :( I keep thinking about switching over to a clang toolchain to be able to do more stuff like this.


On the other hand the Rust ecosystem is insignificant (size and adoption wise) compared to the C++ ecosystem, so there's that...


Rust is also fairly nascent. If/when more people use it for serious projects, it will mature and get more features. I've been meaning to start learning it, but life gets in the way.


In my opinion, it's hit the inflection point at Facebook. The new shiny, Libra, is written mostly in Rust.


Sure you can. Just reject any dependencies that don't pass your own linter. By switching to rust, you are doing the same thing anyway.


But with Rust you will get a whole ecosystem working. In your method you will get 0 dependencies working, no?


Same as the Rust ecosystem a few years ago.


And not the same as the Rust ecosystem today.


If the rust people took your attitude a few years ago, where would they be today?


They would hopefully have realized that Rust is a much better language than C++ and not taken that attitude.


Is the assumption here that you're recompiling and statically linking all of your dependencies?


That's one of the benefits of designing the language such that using it normally forces people to share valuable information about programs without them knowing it.

A somewhat related example is how Emacs was built to support introspective programming, where nearly anything can be inspected to provide a docstring. This is both invaluable when extending the editor and only possible because Emacs established conventions early on about documenting public objects. Though it isn't obligatory to document things, the audience Emacs appeals to seemed to keep doing so through force of momentum over the years. I find this momentum incredibly important to have.

The problem is you can't just shoehorn this way of thinking onto any arbitrary language/programming environment, because the issue of dependencies following whatever code annotations they have arises. For C++ it's hopeless to imagine we can expect these annotation benefits to be universal because using the language didn't obligate adding them in some way, so the vast majority of people didn't. The author mentions there aren't even any real-world codebases using them. Every corner of the system/language has to be designed from the beginning with annotation in mind.

I've been pining over the same issue as I'm trying to design a "living system" that supports user extension. I'm still wondering if I'm missing anything that could help with introspection, is not detrimental to user experience and can only be added in the nascent stage the project is in. Once all the conventions are in place, everything will have to be built on top of them no matter how imperfect they are.


Why isn't a moved-from object always considered "consumed"?


std::move is lower level than that. It is a primitive that you can use to implement whatever lifecycle semantics you want.

To see what I mean, in the examples in the article, the author has to tell the compiler all sorts of things that are hard to infer. For example, the object comes into being in the “unconsumed” state (what if I instantiate it with a nullptr? Can I even do that?), and a prerequisite to dereferencing it is that it is “unconsumed”.

It is plausible that there would someday be a set(foo&&) method that had “unconsumed, consumed, unknown” as a prerequisite, and always brought the type into the “unconsumed” state.

In practice, you shouldn’t be spraying these annotations all over a code base. They should be in library methods that are reused frequently, so that you get a lot of static analysis benefit from a small number of manual annotations.


Because it's still a legal and valid object according to the standard.

A moved from vector might get reused to store new things for example.
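For example, a short sketch of that reuse, using only operations that have no preconditions on a valid-but-unspecified object:

```cpp
#include <cassert>
#include <utility>
#include <vector>

void reuse_after_move() {
    std::vector<int> v{1, 2, 3};
    std::vector<int> w = std::move(v);  // v is now valid but unspecified
    assert(w.size() == 3);

    v.clear();        // no preconditions: always safe on a valid object
    v.push_back(42);  // the moved-from vector now stores new things
    assert(v.size() == 1 && v[0] == 42);
}
```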


Are you sure? I thought a moved from object was left in an indeterminate state and further use was undefined behavior. Move constructors and move assignment operators that zero out the rhs members try to prevent duplicate references from an object likely to be destructed soon, which might lead to dangling pointers and then use after free in lhs. Can someone cite the spec and prove me wrong, please?


A move constructor is supposed to leave the donor object in a valid state. All STL objects with one will do so.

If you write a poor move constructor that doesn’t do this you will get exactly what you asked for. However the compiler has no way to know you half assed it and must support a legal valid case.

See this stack overflow: https://stackoverflow.com/questions/9168823/reusing-a-moved-...

To quote office space, “Your moved from object only needs to be destructible. But some people choose to allow more and we encourage that.”


"if (x = y)" is also a legal valid use case (it sets x to y, and then tests whether it's nonzero), but that doesn't stop compilers from complaining if you write it, on the grounds that it's most likely a mistake. They should do the same with use-after-move. There could then be some workaround or annotation to disable the warning if you really did want a use-after-move, as there is for the warning I mentioned (adding extra parentheses).


Not quite the standard, but [1] says under the "Notes" section:

>Unless otherwise specified, all standard library objects that have been moved from are placed in a valid but unspecified state. That is, only the functions without preconditions, such as the assignment operator, can be safely used on the object after it was moved from

[1] https://en.cppreference.com/w/cpp/utility/move


The standard says you must leave the object in a valid state for the destructor to be called, but many STL objects provide stronger guarantees for the state of objects after a move operation. For example, std::vector<>'s move constructor guarantees that the moved-from vector will be `empty()`[1]

[1]: https://en.cppreference.com/w/cpp/container/vector/vector


> The standard says you must leave the object in a valid state for the destructor to be called

Note that Rust also uses a bit of a hack - a remnant of the typestate machinery, actually - to deal with cases where a drop implementation (equivalent to a non-default destructor) must be called on something that may or may not have been moved from, in a way that cannot be determined at compile-time. They limit it to that case though, it's a purely-internal detail and does not affect the reference-level ("standard") description of the language.


std::move itself doesn't define what state the object ends up in. The object's type defines that. So it is both undefined (by std::move) and well-defined (by the concrete type). The standard simply requires a moved-from object to still have a valid destructor, but types are free to (and do!) allow for more than that to still be valid.

The default move constructor does not zero out the rhs. It simply does a std::move on each of the fields, which for all the primitive types is simply a copy, including for bare pointers. For example this:

    #include <cstdio>
    #include <utility>

    struct Foo { int a; void* b; };

    int main() {
        Foo f{ 20, (void*) 0xBADBEEF };
        Foo f2 = std::move(f);
        printf("f[%d, %p], f2[%d, %p]\n", f.a, f.b, f2.a, f2.b);
        return 0;
    }
will print this: f[20, 0xbadbeef], f2[20, 0xbadbeef]

There is no active zero-ing or "empty" state at any point unless the specific class defines that its move does that. unique_ptr defines that get() will be nullptr after-move, for example, but that's something unique_ptr itself is specifically defining.


Because move is just a cast to an rvalue reference. It doesn't do anything. It just gives type level hooks for operator overloading.
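For reference, a conforming std::move is essentially just this cast (simplified; my_move is a stand-in name so it doesn't collide with the standard library):

```cpp
#include <type_traits>

// "Move" does nothing at runtime: it only changes the value category,
// so overload resolution can pick rvalue-reference overloads.
template <typename T>
constexpr std::remove_reference_t<T>&& my_move(T&& t) noexcept {
    return static_cast<std::remove_reference_t<T>&&>(t);
}
```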


Semantically, it should be, but, on account of the way C++ does things, at some point it is very likely to be destroyed, and whenever that happens, you want it to be in a state such that the running of its destructor has no consequences for the rest of the program.


> Once you std::move an object and pass it to someone who takes an rvalue-reference, your local object may very well end up in an invalid state.

As far as I remember, move constructors/assignments must leave the moved-from object in a valid state - just that the standard doesn't say anything about what that state is for standard classes.

Also, I have seen code where some kind of "result" can be moved from an object and then re-generated from scratch with different inputs. In that case it was perfectly valid to use it after move. But that's nitpicking, anyways.


Yes, valid, but unspecified. Typically what I've seen is that a move operation is effectively a swap between the source and destination. I've also seen where a move leaves the source in effectively a default constructed state.


I believe the standard says that move does "valid but unspecified" for standard library objects, but does not generally guarantee that moved from objects must be valid.


The destructor gets called on them. They need to be valid enough for that.


Hm, technically I don't think this would be required. Take for instance:

  auto no_destroy = new MoveNoDestroy();

  MoveNoDestroy moved_into;

  moved_into = std::move(*no_destroy);
Wouldn't call the destructor of the moved-from MoveNoDestroy, since the heap object is never deleted.


Yes, if you leak resources, their destructors won't be called. That really has nothing at all to do with move semantics though. I think the important point is that move semantics don't alter the lifetime management of the moved-from object.


Valid for the purpose of calling the destructor. That’s the only requirement.

The actual semantic state of the object may not make sense.


Looks like a lightweight borrow checker, although I wonder how well it fares in places where lifetimes are difficult to track. Or is there a way to annotate methods with this information as well?


To me, it looked more like Rust's move semantics: in Rust, when an object is moved it's "consumed" and cannot be used anymore. The borrow checker is for when the object is not "consumed", only temporarily borrowed by some other code.


I'm using the term "borrow checker" to encompass Rust's whole memory model, but yes: this only seems to provide information about when something's been "moved" (or "dropped") rather than "borrowed" in the temporary sense.


Or maybe a heavyweight one. I note that the clang annotations refer to "typestate", which was a mechanism that existed in primordial Rust, but which was dropped because it was unnecessarily complicated:

http://pcwalton.github.io/2012/12/26/typestate-is-dead.html

Although to be fair, clang's version is a highly limited particular case of typestate, with one axis of state with three values, rather than a general heavyweight typestate mechanism.


Very nice!


Cool idea.


I've been writing C++ for 21 years now (started when I was 14). To be honest, I have never seen a solid case where move semantics provided added value (in terms of code readability and maintainability) over just passing object references as function parameters.

That big ugly object that would get copied on function return -- just create it before the function call and pass it in as a reference! No copy required.


Then it sounds like you haven't worked much in system development with many classes that have identity semantics (encapsulate system resources like processes, locks, mutexes, threads, etc) trying to write highly performant code while being typesafe (turn invariant violations into compile time errors). If you did you'd find out that using identity semantics objects is a PITA compared to value semantics ones.

For example, how do you cleanly create a factory function? A pretty simple thing. You'd return a pointer to a dynamically allocated object? But how do you guarantee that the caller doesn't just discard that return value or doesn't forget to delete it? Also this forces dynamic allocation for that object and adds an indirection to access, even when the caller might not want that.

Move semantics allow you to either make your resource wrapping objects movable (so you return them "by value" as value semantics objects but they get moved) or to use something like std::unique_ptr to wrap the returned dynamically allocated object. The former has the advantage that it gives the caller complete flexibility where to store the object (ex. it can store it locally on the stack or as a member) and avoids a pointer indirection.

Similar issues exist for producing copyable but expensive to copy objects (ex. containers). Move semantics allows for a typesafe/clean way to return them from factory functions while not having to worry about lifetime and performance.
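A sketch of the two factory styles described above, using a hypothetical move-only File wrapper (the int is just a stand-in for a real OS handle):

```cpp
#include <memory>

class File {
public:
    explicit File(int fd) : fd_(fd) {}
    File(File&& other) noexcept : fd_(other.fd_) { other.fd_ = -1; }
    File& operator=(File&& other) noexcept {
        fd_ = other.fd_; other.fd_ = -1; return *this;
    }
    File(const File&) = delete;            // identity semantics: no copies
    File& operator=(const File&) = delete;
    int fd() const { return fd_; }
private:
    int fd_;
};

// Movable value: the caller decides where it lives (stack, member, container)
// and cannot silently lose ownership of the handle.
File open_file() { return File(3); }

// Alternatively, wrap a heap allocation when indirection is actually wanted.
std::unique_ptr<File> open_file_heap() { return std::make_unique<File>(3); }
```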


Can you explain what you mean by Identity Semantics?

Believe me, I would dearly like to learn something new that can benefit me in my daily work. But unfortunately I can't make sense of your examples as they stand so it would be great if you could provide some example code to illustrate your point.

I do not know about your background or experience in C++ and you may very well have valid points (hence my request for code examples). However I get to read a lot of code of inexperienced programmers in my job who use && references and std::move "because that's how you do it in a modern way" and then run into crap like use after move and then wonder what the hell went wrong...


Resources referenced by an index. Examples were provided in the post you replied to. Threads, processes, file handles, mutexes, etc. Or in an embedded context could be memory addresses for control registers etc. Many of these require functions to acquire and release, and wrapping that integer into a moveable class allows its ownership and lifetime to be carefully and safely managed.


C++ has a few major flaws with respect to move and copy semantics. The biggest one is that copy semantics are default and silent: it requires less work to copy something than it does to use it by reference, and there is no visual indication if the value is being copied or accessed via reference. This means that it is way too easy to accidentally copy large objects (such as std::vector) without realizing you're copying them.

Most newer languages have realized that implicit copy semantics is usually a bad thing, and duplicating objects requires explicitly saying that you're duplicating them (such as calling a .clone() method). Of course, some types are value types, where copy is cheap and better than reference, but such mechanisms are opt-in. C++ automatically generates these mechanisms, and gives you nasty error messages when you opt-out of default copy.

Move semantics are almost always better, but in C++, with its historical baggage of opt-out-of-implicit-copy-semantics, it means that constructing move-only types requires a lot of excessive calls to std::move. Compilers do a good job of telling you when you put one too many calls to std::move in, but the code is definitely verbose compared to Rust, to the point that it tends to strongly weigh against actually using C++'s ability to annotate move-only methods. Furthermore, without something like the mechanism in the blogpost here, compilers don't give any indication of API misuse, so you can't leverage move-only types to construct safe-by-construction APIs.

This is something I've been tripping over a lot recently, as I have a type system where calling most methods makes the original object unusable.
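The silent-copy pitfall can be shown in a few lines: copying and referencing look almost identical at the call site, and the expensive one is the default.

```cpp
#include <cassert>
#include <vector>

void silent_copy_demo() {
    std::vector<int> big(1'000'000, 1);

    auto copy = big;   // deep-copies a million ints, with no visual cue
    auto& ref = big;   // the cheap by-reference form needs the extra '&'

    assert(copy.data() != big.data());  // distinct buffer: a real copy happened
    assert(ref.data() == big.data());   // same buffer: just another name
}
```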


Copy semantics are not a flaw, they're a feature because they always work reliably.

If move semantics were the default, we'd either get more use-after-frees or require the awkwardness of the borrow checker to have a semblance of sanity.


I appreciate your reply. But frankly, why would someone "accidentally" copy something by value?

I don't mean to sound arrogant but if someone is writing C++ at a level where he/she does accidental copies then that's a very clear sign to me to stay the hell away from move semantics and other advanced features.

I personally find things being copied by default a nice feature. It's more consistent than "PODs by value, objects by reference" such as in Java.


> But frankly, why would someone "accidentally" copy something by value?

Well; https://groups.google.com/a/chromium.org/forum/#!msg/chromiu...


There is no such thing as a bad programmer, only bad tooling. C++ is bad tooling.


A craftsman doesn’t blame his tools.


A great craft recognizes how important good tools are.


The horrors I've seen in academic code speak otherwise.


> The biggest one is that copy semantics are default and silent

This is not true. The rules are simple: rule of three, rule of five, or the recommended one, rule of zero [1]. If you define a copy constructor, the compiler won't generate a move constructor. If you define a move constructor explicitly, the compiler won't generate a copy constructor silently. You have total control over how the language behaves.

So in summary, either stick to the rule of 5 or the rule of zero and you won't be surprised. If you don't mind expensive copies, rule of 3 is sufficient.

[1] https://en.cppreference.com/w/cpp/language/rule_of_three


My point is that doing nothing causes copy semantics to be generated by default. It requires some action (specifying move constructors, for example) to override the silent generation of copy semantics. In other words, if you're not explicitly thinking about whether copy semantics are appropriate (or even legal!), then C++ decides for you that they are. That's not a good default.


Again not true! If you do nothing, the compiler generates both copy constructors and move constructors. If you pass an rvalue, the move constructor is used. If you pass an lvalue, the copy constructor is called. The compiler doesn't do anything silently.


Again, you're not understanding, or perhaps you're assuming that users have a much more thorough understanding of the precise semantics of C++ than they usually do.

It generally takes extra effort to force the compiler to use a reference or the move constructor instead of the copy constructor. For example, "for (auto x : container)" will usually involve a copy constructor where a reference might have sufficed. And because copy constructors are generated (unless you take extra effort), there can often be no indication that the most natural form of the construction is actually less efficient than originally desired.

Yes, if you're aware of the gotchas, you can avoid them, but it is consistently extra effort that has to be put into doing so, and slipping up and missing the efforts in a few key places means you won't even be alerted that you might have forgotten something.


I understand your point, but the language is evolving to be more powerful and expressive. There were certain rules in the past, and there are some new rules in C++11 and beyond.

With regard to your example, that's why you need to use universal references like "for (auto&& x : container)" [1]. It's just a new way of doing things, but it's hardly ambiguous.

[1] https://isocpp.org/blog/2012/11/universal-references-in-c11-...


> I've been writing C++ for 21 years now (started when I was 14).

So? As far as I know, that could be 1 year of experience repeated 20 times.

In contrast to your experience, I missed move semantics since the early 2000s. It’s not just to avoid copies.

How about the ability to avoid pointers, aliasing, and allocation? Move semantics afford more than just avoiding a copy.


Please. May I ask that you provide some code examples?


std::vector<std::unique_ptr<SomeThing>>

That works thanks entirely to move semantics. Do you dispute the usefulness of such a container?

> That big ugly object that would get copied on function return -- just create it before the function call and pass it in as a reference! No copy required.

Or just return it without a std::move and it still won't be copied. In fact, 'return std::move(any_local_value)' is not just silly, it's bordering on wrong, as it defeats some RVO optimizations (and returning a T&& to a local is actually wrong, as it's returning a reference to a local). That's not what the purpose or use of move semantics is at all.


Finally, an answer with some code!

> Do you dispute the usefulness of such a container?

Sort of. I agree it can be useful in some cases but in reality the added value over vector<shared_ptr<SomeThing>> or just vector<SomeThingPtr> (both of which do not require move semantics) is way too small IMO to justify the butchering of the language with std::move and rvalue references.

At the end of the day to me std::move is like const_cast. It's a quick fix that you can use to sort out a situation, but you shouldn't be proud of it because if you require it, chances are your design is messed up in a more fundamental way and the actual fix is sorting that out.


> vector<shared_ptr<SomeThing>>

Also uses move semantics. It's not _required_ here, no, but it highly benefits from it. Huge efficiency gain to avoid all the atomic refs & unrefs during resize.

> vector<SomeThingPtr>

This is error-prone and a leak risk. You just made the code buggy and worse, kinda like using const_cast.

> At the end of the day to me std::move is like const_cast. It's a quick fix that you can use to sort out a situation, but you shouldn't be proud of it because if you require it, chances are your design is messed up in a more fundamental way and the actual fix is sorting that out.

No it's pretty much the exact opposite. Avoiding std::move or insisting you don't need it is like scattering const_cast everywhere. You're avoiding a thing that helps you categorically avoid bugs (like unique_ptr does) in favor of just winging it and hoping for the best.


> This is error-prone and a leak risk. You just made the code buggy

Well, that entirely depends on your skill and your overall design. But again, there's still vector<shared_ptr<SomeThing>> to the rescue if we need it.

> Avoiding std::move or insisting you don't need it is like scattering const_cast everywhere.

Strawman. I never said that. I said if you require std::move in a certain situation, have another look at your overall design. Again, same as const_cast.


> Well, that entirely depends on your skill and your overall design.

If your argument against std::move & unique_ptr is "I don't make mistakes" then your ego is vastly too large to be a C++ developer.

The design doesn't change. unique_ptr just solidifies the design such that the compiler can help enforce the design.

> But again, there's still vector<shared_ptr<SomeThing>> to the rescue if we need it.

If you want to insist that all heap allocations are always shared_ptr you can certainly do that. Personally I'd consider that dreadfully slow & expensive, and at that point you're far, far better off just using a GC'd language instead.

> I said if you require std::move in a certain situation, have another look at your overall design.

std::unique_ptr single-handedly proves this wrong.


> If your argument against std::move & unique_ptr is "I don't make mistakes" then your ego is vastly too large to be a C++ developer.

Please stop with the straw men and polemics, and allow your ego to consider the opinion of someone who is, in all likelihood, more experienced in shipping products written in C++.

> If you want to insist all heap allocates are always shared_ptr you can certainly do that. Personally I'd consider that dreadfully slow & expensive

Oh well. I have a feeling we're not going to agree. Maybe let's discuss again in a few years. Maybe I'll have tried std::move then and found that it is somewhat useful in certain special situations.

And maybe you'll have realized that the "dreadfully slow & expensive" shared_ptr doesn't make a difference in terms of end product quality in, I'd wager to say, 99.99% of situations.

> at that point you're far, far better off just using a GC'd language instead.

Uuhhm, no.


References mean the optimizer cannot rule out aliasing, which is kind of a big deal.


Move semantics is the language support for shallow copies. If you never use shallow copies, fine! But a lot of code bases use shallow copies, and it's a valuable tool.


Move semantics moves objects, it doesn’t copy them.

For example, if you have a class C containing a pointer P to some data, where C’s destructor frees that pointer:

- a shallow copy of object O would return an object O2, containing that same pointer P, and leave O unmodified.

- a move of O to O2 would (1) make O2 contain P, but also would update O to no longer have that pointer P (it has to make that change, as destructing O at a later time shouldn’t free P anymore)

(1) yes, an implementation could also copy the data or increase a reference count, or, possibly, a zillion different things without running into problems.


Shallow copy is the general concept. Move semantics is a special case where unique ownership is enforced. Here is a simple example of the idea:

  Widget::Widget(Widget&& w) {
    this->ptr = w.ptr; // shallow copy
    w.ptr = nullptr; // nullify the source
  }


You could use:

  auto* p2 = std::exchange(p1, nullptr);


No, you do shallow copies by using pointers or references. No move required.


Who owns the pointer or reference and what, if anything, enforces that?


You can structure your code so that you don't need it - sure. But that doesn't mean that there aren't use cases where this makes managing lifetimes much clearer.

RVO can often sort out your object return problem just fine. But lifetimes are more subtle. Sometime move semantics can significantly clean up your overall design.


You see the ROI when your big ugly object has a bunch of heap allocated data. You won't see any benefit when the object has everything stored by value.


That doesn't work for composable functions or fluent API designs.


Moving out of functions simplifies dependencies and arguments of calling a function. Ownership control simplifies architecture.

All of Eigen was originally based on expression templates, which are very complex. That technique / hack is not necessary if you can move data structures. You no longer have to do elaborate workarounds to avoid copying temporaries.


> That technique / hack is not necessary if you can move data structures. You no longer have to do elaborate workarounds to avoid copying temporaries.

AFAIK Eigen uses these scary template things to save RAM bandwidth.

When you write x=a+b+c for large vectors, lazy evaluation will produce code that sequentially loads a, b, c, computes the sum in a register, stores to x.

If you use move semantics and temporary variables instead, the code will be compiled into auto tmp = a+b; x=tmp+c;. This computes the complete tmp variable, stores it in RAM, then loads it again. For simple things like sums or component-wise multiplication of long vectors, RAM is way slower than the ALUs.



