Experimenting with Rust in Chromium

riwsky · on June 22, 2022

Even though it'd be a while before this really affects the Chrome codebase, it's a real testament to how well Rust nails the safe-but-low-level niche. Google does not lack resources to tool (or staff!) a C++ codebase correctly, nor does it lack resources to build languages[1] targeting these specific problems; that they'd consider Rust isn't just because "it's there".

[1] https://github.com/google/wuffs

cletus · on June 22, 2022

Story time.

I worked at Google years ago and there was a presentation once done on some optimizations done on Chrome performance. This is probably 10 years ago now.

So C++ has std::string of course. C libraries however use "const char ", which has lots of problems. The C++ designers allowed you to avoid friction here by allowing you to pass a std::string to a function expending a const char . Technically, this is an operator method.

It was discovered that the Omnibar in Chrome went through many layers of translations between std::string and const char *, back and forth, such that there were approximately 25,000 string copies per keypress in the Omnibar.

So my point is that even with a ton of resources writing good, efficient and performant C++ is still nontrivial. And that's really the point of Rust (well, one of them).

imron · on June 22, 2022

> The C++ designers allowed you to avoid friction here by allowing you to pass a std::string to a function expending a const char . Technically, this is an operator method.

It's the other way round. You can pass a const char* to a function expecting a std::string. Passing a std::string to a function expecting const char* will generate a compile error.

You need to call c_str() on the std::string if you want to pass it as a parameter to a function expecting a const char*.

gary_0 · on June 22, 2022

I'm not sure if this is what OP was referring to, but in the ancient past before the STL was fully standardized, some implementations had an `operator const char*` in std::string to allow implicit conversions.

tialaramex · on June 22, 2022

> the ancient past before the STL was fully standardized

Specifically the "ancient past" here is prior to C++ 11 when C++ decided now it wanted to actually define how its string type works because C++ 98 and C++ 03 strings are both even more dangerous than most things are in C++ and had to be put out of their misery.

jcelerier · on June 22, 2022

> because C++ 98 and C++ 03 strings are both even more dangerous

... how so ? they were just CoW which is actually I think the better choice most of the time... now there are copies all over the place

tialaramex · on June 22, 2022

The C++ API lets you take references into the string. These, understandably, are lightweight, no C++ programmer would expect a reference into the sixth character of a string s[5] to be expensive to make or carry about, but they're mutable and as a result of being lightweight they are not reference counted...

So I've got the string "IR Baboon big star of cartoon" and I take references into it, which are cheap and then you use your C++ 98 copy constructor to get another string, which of course also says "IR Baboon big star of cartoon", when you took it -- and then I scrawl "I AM Weasel" on top of my string using my reference and now your string was changed because it was COW.

If you liked COW for this purpose Rust has std::borrow::Cow which is a smart pointer with similar flavour, Cow<T> is a sum type that's either a thing you own T (and thus you could modify it) or it's a reference, perhaps &T (and thus you can't modify it) but which promises you could get an owned thing (e.g. for strings by deep-copying the string) if you need one. Methods that would be OK to call on the immutable reference (e.g. asking how many times an ASCII digit appears in the string) work on Cow<T> and if you find you need to mutate it (maybe in a rare case) you can ask the Cow for the mutable version, if it already had the owned version you get that, if not it will make one for you.

Rust's traits kick in here, Cow<T> requires T: ToOwned, which is a trait saying "I can make an immutable reference to T into a new thing T you own", obviously types you shouldn't do that to simply do not implement ToOwned and so you can't make a Cow of those types. The standard library provides in particular an implementation of ToOwned for &str which makes Strings from it.

jcelerier · on June 22, 2022

> The C++ API lets you take references into the string

> and then I scrawl "I AM Weasel" on top of my string using my reference and now your string was changed because it was COW.

I mean, that's the point of references... no ? If I wanted a different object I'd make a copy.

Like, even with just one string, without any CoW, your post makes it sound like you'd be surprised than if you had:

    void set_some_config(const char*);
    char* get_some_config();

    std::string s = "foo";
    set_some_config(&s[1]);
    s = "bar";
    get_some_config();

you'd get "ar" in get_some_config().

tialaramex · on June 22, 2022

> If I wanted a different object I'd make a copy.

In the explanation I posted, you do make a copy to get a different object "you use your C++ 98 copy constructor to get another string".

The problem happens because both strings share the same bytes to represent the text "IR Baboon big star of cartoon" as part of the COW optimisation. But my reference can scribble on this shared text.

I don't see how your get_some_config is similar at all. Notice that with C++ 11 strings, the copy constructor gives you a deep copy of that "IR Baboon" text and so my references can't smash your string.

jeffbee · on June 22, 2022

CoW strings with atomic reference counting was definitely the wrong choice for a multi-core universe. The performance penalty is way too high. If you need that semantic there are other ways to get it.

jsmith45 · on June 22, 2022

> Performance pentalty way too high?

Is a single atomic increment really that expensive? I mean we are not even talking about a full memory barrier here, just the atomic increment's implied acquire and release on the single variable. Other operations not dependent on a subsequent read could still be re-ordered in both directions.

And also keep in mind that the alternative was copying the whole string instead. Which means both heap memory allocation (which is often pretty expensive, even with per-core heaps), plus the actual copying. Unless a platform has a terrible implementation of atomic increment, or you have a std::string that is frequently getting copied on multiple cores (so as to have meaningful contention), I would have expected the actual copying implementation to be slower. But I'm not super familiar with the timings of these things, so i certainly could be mistaken.

My understanding was that the change was more for about being able to set proper bounds on some operations, ensuring .c_str() is O(1), and not O(n) sometimes, and similarly with string writes, etc.

jeffbee · on June 22, 2022

Copying short strings does not necessarily involve an allocation in implementations using short string optimization. Shooting down the cache line in a remote CPU that happens to have used a frequently-used string recently is absurdly expensive by comparison.

jsmith45 · on June 22, 2022

The COW and short string optimizations are not mutually exclusive. If we assume short string optimization is implemented both before and after, then we are back to comparing the atomic increment to allocation. And different allocation approaches can make the cost of heap allocation differ quite substantially. I'd fully expect that some allocation approaches are cheaper than the cache line invalidation from atomic increment, but some others that tend involve a lot of pointer chasing can be rather costly.

Certainly plenty of widely copied strings are short strings, so a COW implementation that lacks the short-string optimization could very easily be a bad bottleneck for multi-core compute.

jeffbee · on June 22, 2022

You have accurately described the GNU CoW string :-)

My impression through the fog of history is that what happened was a really clever GNU person with little foresight and no access to an SMP system implemented std::string with CoW. Its performance in practice was so poor that the standard committee intentionally changed the standard to make it an illegal implementation, thereby eradicating the GNU CoW string. There was no higher principled logic.

jcelerier · on June 22, 2022

Yet more recent benchmarks show that there are pretty important use cases where CoW string can be faster:

https://blogs.msmvps.com/gdicanio/2016/07/09/is-copy-on-writ...

https://oribenshir.github.io/afternoon_rusting/blog/copy-on-...

Also, the point of that was to improve multithreading of string: I think this very idea is problematic. I've written at this point hundreds of thousands of line of C++, and the number of times where strings are really, by design, supposed to be shared across threads is honestly counted on the fingers of one hand, just like e.g. justification for using Arc over Rc in rust. 99% of string handling is done as some GUI work on the main thread or as part of some task processing done in some network thread, which stays in that thread.

jeffbee · on June 22, 2022

Clearly there's a frontier where the cost situation begins to favor the CoW approach, and I think authors should consciously choose whether they want a CoW string or not based on their use-case, but that goes against the idea of std::string as a jack-of-all-trades. Personally I don't really like std::string as a concept. It overlaps with too many other concepts. It is just vector<char> or std::unique_ptr<char> with SSO? The latter is nice in cases where you want std::string to adopt or release existing memory. Or do you want something like absl::Cord, which is like the old GNU CoW string but with even more stunts under the hood?

imron · on June 22, 2022

Perhaps, but in the in the ancient past before STL was standardized, Chrome didn't exist.

10 years ago (when the parent mentioned they were still at Google) c++11 was already out.

tialaramex · on June 22, 2022

While it's true that the Standard Template Library is truly a "long time" ago, being a 1990s project, the poster's phrase "before STL was standardized" actually refers to C++ 98 and C++ 03 where the C++ standards don't specify std::string internals.

Originally C++ doesn't have a string type, the C++ 98 standard does standardize a string type but it's only loosely specified. Most implementations do something "clever" which it turns out is a bad idea (this is a recurring theme in C++. Only in C++ 11 does the standard say OK, we'll prescribe how the string class actually works, making it more complicated but hopefully avoiding the worst problems.

Chrome was launched in 2008, and much of its internal structure was far older having incorporated work by Mozilla and Apple.

bluejekyll · on June 22, 2022

> work by Mozilla and Apple

Don’t forget the origin of WebKit, KHTML from the KDE folks.

southerntofu · on June 22, 2022

Ten years ago was 2012. C++11 came out in 2011. Do you believe a big codebase like Chrome would be converted to C++11 less than one year after the spec was published? I find that unlikely but i never worked on such a big codebase so i wouldn't know.

CJefferson · on June 22, 2022

The STL was mostly standardised in C++98, 1998. There were additions in C++03 and C++11, but nothing like removing this type of overload.

Long running systems were still using pre-standardisation libraries for string up to 2012 however, so you may well have come across such projects.

tialaramex · on June 22, 2022

> nothing like removing this type of overload.

Bzzt. C++ 11 completely overhauls how std::string is defined.

imron · on June 22, 2022

> C++11 came out in 2011

C++0x was a thing, with varying levels of support from all major compilers, for years before C++11 was finally ratified.

In the context of my original comment though, no matter how dated the code base, I think it unlikely that Chrome was using any variant of std::string that had an implicit conversion operator for const char* such that string could be passed as a parameter to a function taking const char* without needing to call c_str().

ncmncm · on June 22, 2022

Ja, they have no excuse. They mostly just don't care. And why should they? Google cares nothing for them.

steveklabnik · on June 22, 2022

HN comments on this story: https://news.ycombinator.com/item?id=8704318

blub · on June 22, 2022

Based on the code changes made at that time, it seemed that Chrome developers didn’t know how to write performant C++ code.

Those were not difficult to understand C++ features either, but basic ones which were very well known by then. I remember reading Bulka & Mayhew’s Efficient C++ (published in 2000) which mentioned the importance of avoiding copies, calling reserve and many other techniques.

So your point is wrong. Not copying strings all over the place, calling reserve, not creating temporary containers are junior-level C++ skills.

ncmncm · on June 22, 2022

Yes.

In two words, Google programmers are, as a rule, vastly overrated. They can maybe rope in 100,000 cores on one query, but nothing in their recruiting selects for good coding habits.

Anybody coding C++ in this day'n'age and getting use-after faults needs to go to the back of the line. They will certainly succeed in writing new Rust code that is as bad as their old C++ code. (Note: deadlocks are officially "safe".)

Recently Google made a big push to change the std::string constructor from a null char* to yield the empty string, instead of honestly segfaulting. That failed. They had a half-baked (and hellish, for users) async/await design they tried to put up as worth delaying the whole feature into 2023. That failed.

pornel · on June 22, 2022

So where are those mythical "good C++ programmers"? I keep hearing that if only you find them, your C++ will be secure. But so far nobody has found them. Not Google, not Microsoft, not Mozilla.

Rust succeeds, because it does not rely on programmers writing bug-free code. Bad Rust code is not as dangerous as bad C++ code.

BTW: deadlocks are not exploitable for RCE, and are quite easy to debug compared to data races and heap sprays.

ncmncm · on June 22, 2022

There is a very great deal of good C++ online. Google and Microsoft are handicapped by their need to hire in huge numbers, and must take who they can get.

mkj · on June 22, 2022

I don't normally want to be personal, but ncmncm you chime in with this thread each time. Can you give an example of what you yourself have written that would live up to your standards? I'm curious what type of thing you're talking about.

fractalb · on June 22, 2022

If all the dependent C libraries are replaced with C++ versions, then the no.of translations will become zero?

woodruffw · on June 22, 2022

Nominally, yes. But conversions between C strings and `std::string` are just a small corner of the problem: C++ makes it very easy to accidentally call copying constructors and perform nontrivial copies when doing e.g. implicit argument conversion.

jeffbee · on June 22, 2022

Yep, another serious performance problem (also at google, not in chromium) was caused by inaccurate declaration of lambda arguments in an STL algorithm call … ie std::something(begin, end, [](std::pair<foo, bar> foobar) -> bool {}). The actual type (iterating over an unordered map, I believe) would have been const foo, but the compiler correctly concluded it could implicitly create a pair of foo,bar by copy from pair of const foo,bar. These were the days before such arguments could be declared “auto” which would have avoided the problem.

You have to think very very carefully about every line and character in C++ to figure out what it’s doing. Sometimes the easiest way to review it is to compile it and read the assembly.

zaarn · on June 22, 2022

Conversions that create copies are always explicit in Rust (unless it's a copy type, which strings aren't). Conversion between the string types is at minimum taking the borrow of it and then taking a copy of the borrow is once again explicit. You can also cheaply use a CoW wrapper to get a no-copy string passed around into plenty of places.

The point is, with rust you have more options to enforce no-copy through the type system.

mseepgood · on June 22, 2022

How would Rust help here? Isn't it famous for having too many string types?

imron · on June 22, 2022

It's the 'too many string types' that helps.

With C++, if you have char*'s (because you don't need to own the memory) and you pass it to a function that takes a const std::string& (because it also doesn't want to own the memory), then there will still be an implicit conversion to a temporary std::string (involving an allocation) despite neither the caller or the callee needing to own any memory.

With Rust, if you have a &str (because you don't need to own the memory) and you pass it to any function that takes a String (or even the unidiomatic &String), then you will get a compile error. There won't be any implicit conversion of types and therefore no implicit allocation. If you really want to pass it, you need to explicitly convert it, making the cost of the allocation explicit.

Rust's "too many strings" model says "there are many different ways in which you can use string-like objects, each with their own performance tradeoffs. Know which one you want to use in your code or I won't compile".

mwcampbell · on June 22, 2022

This discussion is making me wonder if windows-rs [1], the crate with official Rust bindings for all Windows APIs, is doing something that's not idiomatic Rust. Specifically, for any Windows API function that takes a UTF-16 string as a parameter, the signature for that parameter is something like "impl IntoParam<PCWSTR>". The crate then implements that trait for String and &str, so you can pass a normal Rust UTF-8 string (even a string literal), and it'll be automatically converted to a freshly-allocated, null-terminated UTF-16 string (which gets freed after the function call). That seems like it could lead to the same thoughtless inefficiency as in the story about the Chrome omnibox.

[1]: https://github.com/microsoft/windows-rs

infogulch · on June 22, 2022

Well that will be necessary until windows gets UTF-8 APIs. Probably not soon. Until then there are various optimizations you can do, like caching the UTF-16 conversion alongside the UTF-8 string (good for calling OS APIs frequently with with long-lived strings), allocating temporary UTF-16 conversions on the stack (good for infrequent calls with strings up to a certain size), or storing raw UTF-16 strings as opaque bytes in Rust memory (good for providing strings back to the OS that you got from the OS).

You should try to avoid calling OS APIs in general and cache the results as much as possible. Who knows what the performance characteristics are of an API that has to serve 7 layers of historical OSes simultaneously. Unless you're directly interfacing with the kernel you shouldn't expect much. Omnibar-like layered calls between your app and the OS are a worst-case scenario regardless of conversions.

estebank · on June 22, 2022

winapi does support UTF-8 on recent versions:

https://docs.microsoft.com/en-us/windows/apps/design/globali...

infogulch · on June 22, 2022

Very interesting I wasn't aware. After glancing over that doc, it looks like they smuggle UTF-8 in through the -A variant windows APIs [1] by explicitly setting the CP_UTF8 codepage in an application manifest. I wonder if this actually uses UTF-8 internally to service the API call or if it just manually converts strings to wide form and calls the -W variant on the windows side instead of making you do it on the app side. If the latter it may be better to avoid this feature so you don't close the door on potential optimizations like I mentioned above.

[1]: Windows has two variants of many API calls with either -A or -W suffix, where the -A suffix is for strings formatted as 1-byte ASCII (or a specified codepage) and the -W suffix is for strings formatted as 2-byte UTF-16 (kinda). Example: DlgDirListA / DlgDirListW, https://docs.microsoft.com/en-us/windows/win32/api/winuser/n...

bluejekyll · on June 22, 2022

That might hide it from the caller, but the function that receives that IntoParam type will still need to explicitly call the conversion function.

mwcampbell · on June 22, 2022

Yes, and all those receiving functions are auto-generated as part of windows-rs.

flohofwoe · on June 22, 2022

It would most likely suffer from similar problems when interacting with the C and C++ APIs in the rest of Chrome though (e.g. what to do if you have a Rust String, but the other side wants a const ref to a C++ std::string).

imron · on June 22, 2022

Use a CxxString: https://cxx.rs/binding/cxxstring.html

At some point there will need to be an allocation when crossing Rust -> C++ boundary because Rust strings are not null-terminated.

The difference Rust makes is that unlike C++, it is always explicit when the allocation occurs.

IshKebab · on June 22, 2022

Unfortunately the restrictions mentioned on that page make it quite a pain to use in practice.

throwaway894345 · on June 22, 2022

My minor, unpolished grievance with Rust's approach is that you have to do this for all kinds of types (e.g., Path vs PathBuf). It's tedious to have to write these pairs all the time, along with all of the trait implementations and so on. It almost feels like it would be nice if the type system could allow us to write `String` or `PathBuf` and automatically generate the corresponding `str` or `Path` types.

josefx · on June 22, 2022

> With C++, if you have char*'s (because you don't need to own the memory)

If you are using C strings in C++ you are either doing something incredibly low level or don't care about performance at all. C strings require strlen calls or something equivalent for basic operations and you can easily run into code with exploding runtime if you aren't extremely careful.

saagarjha · on June 22, 2022

> If you are using C strings in C++ you are either doing something incredibly low level or don't care about performance at all.

…or interoperating with C code?

sseagull · on June 22, 2022

But the temporary copy only happens going from const char * to std::string, so the C code would have to be calling C++ code.

std::string to const char doesn’t (usually?) require copying.

hoseja · on June 22, 2022

psst... std::string_view

imron · on June 22, 2022

A step in the right direction if you have a compiler with c++17 support.

Note: chrome only supported c++17 features in Dec 2021 [0], and whether std::string_view is allowed to be used is still 'to be determined'.

0: https://chromium.googlesource.com/chromium/src/+/HEAD/styleg...

cmrdporcupine · on June 22, 2022

A variant of what is becoming string_view in the standard has existed within Google's codebase(s) for years. I don't recall using it much in Chromium when I worked in there, but it's all over Google3 and is now in absl (Google's open sourcing of some of its base c++ components).

Chromium has "string_piece": https://chromium.googlesource.com/chromium/src/base/+/refs/h... which is at least 9 years old (was moved from elsewhere in the repo into base/ then)

imron · on June 22, 2022

The point is not that c++ can't do this (I also have code that does this dating back over 10 years), it's that despite having code to do string_view/string_piece, Chromium was still performing 25,000 allocations per keystroke in its Omnibox because c++ has other common ways to represent "constant string owned by someone else", and there are hidden performance issues that will trip up even experienced programmers when mixing these ways incorrectly.

Despite having better options available (either in the standard library or custom code), the less optimal ways still get used.

Rust had the benefit of learning from c++'s mistakes and separated the concepts of owned vs unowned strings in to separate types with explicit conversions required whenever an allocation would occur. This was baked in to the language from the beginning and so you don't get a mix of different types in signatures to convey the concept of pointing to a slice of a string owned by someone else, you just have &str.

Even if you get fancy with your interface and do things like AsRef<str>, there's still no concern about implicit or hidden allocations. Any time you need to own the memory (either for yourself or to pass in to another function) you need to do so explicitly and you end up with a different type (much to the chagrin and confusion of newcomers to the language).

C++ is trying to correct its mistakes also, but not everyone can use those latest features and even if they can, the mistakes still have to be left in for compatibility reasons.

seritools · on June 22, 2022

Ah yes, the one that still just wraps a raw pointer in the end

https://github.com/isocpp/CppCoreGuidelines/issues/1038

andreidd · on June 22, 2022

And then someone will convert a std::string_view to a const char* and things will explode...

tialaramex · on June 22, 2022

A mixture of culture and technology.

Technologically, Rust's only built-in string type, &str, is a reference to a string slice - that is, you can't change it (the reference isn't mutable) and it is both a pointer to the start of some UTF-8 and the length of the UTF-8.

What encoding? Always UTF-8. Only UTF-8. Not "Well, it's kinda UTF-8 but..." it's just always UTF-8. This moves the burden to a single place, your text decoding code, to do things correctly, and great news - the entire world is moving to UTF-8, so you're on a downhill gradient where every week this works better without you lifting a finger.

That reference knowing the length is brilliant. Trimming whitespace off a string? You can just make another immutable reference to the smaller trimmed string. Zero copies. Slicing a URL up into components? You can do that too, zero copies. And yet it's all memory safe.

Now, Chromium is not some raw firmware for a $1 micro-controller, so it has library types like Rust's alloc::string::String (you can just name it "String" in normal Rust code but that is its full name) which, as its presence in alloc suggests, is an allocating String type, you can concatenate them, you can make them by formatting a bunch of other variables, the default ones are empty, the data goes on your heap and so on. But, String is AsRef<str> which means if what you've got is a String, and what you're doing is calling a function that wants &str Rust is OK with that and it costs nothing at runtime. Why? Because that &str is just two of the elements of the String type you had, the pointer into the heap and the length, it's easy.

Rust has lots of other types for stuff like Foreign Interfaces, like CStr and CString (for the C-style NUL-terminated array of bytes which might be text) but your pure Rust code shouldn't care about those, often it can say (unsafely) "Look, the C++ promises this is UTF-8, we'll take their word for it" or "I only need it to have bytes in it, let's make [u8] and we're done".

Culturally, Rust programmers write &str when that would do. There's a strong cultural pressure not to write String when you really mean &str, and the compiler won't let you write &str if you needed String. So this results in less thunking of the sort complained about in C++

cowmoo728 · on June 22, 2022

In C++ when I see `DoX(y)` I have to worry every time about temporary lifetimes, copy vs move operator, and a bunch of other things that are easy to miss during code review. It is so easy to accidentally copy large strings around many times in a performance critical loop.

Rust makes all of that easier to see during code review. It is very explicit about these things.

I'm a Google employee working on chromium and chromeOS and have been asking internally about rust support for over a year now, so it's exciting that it's making progress.

_nalply · on June 22, 2022

It's a complicated but well-thought out system which tends to avoid copies by making them explicit in the source code and preferring taking references or slices which are cheap operations.

The string slice for example is an Unicode-capable view into bytes of the string (immutably pre-compiled static bytes in the binary, bytes of fixed length on the stack or heap-allocated). The aliasing rules are enforced by the compiler, so it is safe to throw around pointers and sizes and not to worry about buffer overflows, as long as it compiles.

vlovich123 · on June 22, 2022

Everyone’s trying to justify the response when the honest truth is that no, Rust doesn’t solve the problem of abstraction layer impedance mismatches causing ownership to be dropped only to be reacquired at the next level. On a sufficiently large/complicated code base, the problem will arise.

As others have mentioned, various kinds of string types are baked into the language which makes it ergonomic to do “the right thing” from the get go, but hard to say. I would be skeptical of claims that it would make a difference, especially in the interim where you now have an added impedance mismatch with C++, Rust, C.

tialaramex · on June 23, 2022

> Rust doesn’t solve the problem of abstraction layer impedance mismatches causing ownership to be dropped only to be reacquired at the next level

The expression of what C++ does here in Rust is awkward and, I think, nobody has proposed it because you'd never write that. Basically C++ char* is a raw pointer. Rust does have those, but you'd never use them in this context.

What you would use is either the borrowed slice reference &str or the owning String type, but in both cases we have an owned object and there's our crucial difference. If you've got the owned String, and I needed an owned String, I should ask for your owned String, and we're done.

In C++ "dropping" ownership as you describe is no big deal, the C++ design doesn't care, but in Rust if you actually drop(foo) it's gone. The references to it can't out-live that, if it's gone then they're gone. If you write code that gives away references and then tries to drop the thing they're references to, Rust will object that this is nonsense, because it is nonsense, you need to ensure those references are gone before dropping the thing they refer to.

As a result I feel you're greatly under-estimating the ergonomic difference.

vlovich123 · on June 23, 2022

> In C++ "dropping" ownership as you describe is no big deal, the C++ design doesn't care, but in Rust if you actually drop(foo) it's gone

I think you’ve built a straw man of my argument and then argued with that.

Clearly I meant that it seems possible that a sufficient complicated call stack could still be set up to jump between needing the owned String type and the borrowed &str type. That’s what I meant by dropping ownership as that’s what’s happening in the c++ code when you go between char*/string (the API is dropping its need for ownership). The argument of “ If you've got the owned String, and I needed an owned String, I should ask for your owned String, and we're done” is weak because that same argument would apply to C++ code and yet the code still ended up that way when you pasted together components in a very large code base. Now maybe it’s a bit simpler because you have string, string&, const string&, and const char* and doing that antipattern that happened in C++ just wouldn’t be ergonomic in Rust. Maybe. But that feels like a very thin argument and not “this is impossible in Rust”.

tialaramex · on June 23, 2022

I am definitely not arguing that it's impossible but my experiences with Rust lead me to think you've significantly underestimated how important those ergonomics are.

The "I should ask for your owned String" argument does not apply equally well in C++ because of a crucial design infelicity in C++. Your caller may well not have an owned std::string.

In C++ raw char pointers are totally a thing. Because std::string is a late addition (if you learned C++ in the early 1990s a "string" class was maybe an interesting exercise, not a library type) the string literals aren't a built-in string type, and much of the API isn't shaped for such a type either.

Now, the effect is those are (sometimes) owning pointers, it is possible I own some C++ string in this sense, and all I have is a pointer into it. If I give you that pointer, it's not because I didn't give you the owned string, that pointer is my owned string. You want a std::string and there's no reason I would have one at all.

You can mutate these strings, but of course you can't extend them because you've got no way to know how to communicate with the allocator, maybe they live on the stack, or in a private heap. At the time this seemed like a good idea, today we don't think so.

vlovich123 · on June 24, 2022

How is this solved in Rust though? The only thing Rust solves is that you don't accidentally hang onto a bad reference, but I'm failing to see how &str/String is meaningfully different from char/string since the same issue applies (char = borrowed, string = owned). My Rust is rusty so apologies for the pun & any syntax errors. Let's say you have the following:

    // some process managed by team a
    fn caller1(...) {
       String s;
       level1Callee(&s);
    }

    // some process managed by team b
    fn caller2(s: &str) {
       level1Callee(s);
    }

    // library 1 by team c
    fn level1Callee(iDontNeedOwnershipOrDoI: &str) {
       level2Callee(iDontNeedOwnershipOrDoI.to_owned())
    }

    // library 1 by team d
    fn level2Callee(iNeedStrongOwnershipBecauseIMutateTheStringAnyway: String) {
       level3Callee(&iNeedStrongOwnershipBecauseIMutateTheStringAnyway)
    }

This is roughly what happened in Chrome as I understand it (except multiple times because of independent libraries that didn't notice that they probably should have just made a copy to begin with). Let's pretend the codebase had been written originally in Rust. How does Rust avoid this problem from coming up? This didn't happen in Chrome because of ownership. It came up organically because of years of refactoring obfuscated things. For example, level2Callee started out not needing strong ownership but then started calling a library that did (refactoring a complex codebase is very hard & time consuming). Rinse & repeat after many years. Now maybe Rust tooling is better able to point out the unnecessary acquiring/dropping of the strings but that seems unlikely - the problem is statically very difficult to lint around.

tialaramex · on June 27, 2022

Chromium had a few problems, they're public so we can go read the changes made and look at the context.

In a lot of places in Chromium the impedance choices are arbitrary. You have a raw pointer but need a string or vice versa. Ownership isn't the problem, so in Rust you literally just always choose &str for these APIs and pay nothing. A team who design their API taking &String in this situation get the same treatment as a team who name all their types Data1, Data2, Data3 and so on. Somebody senior fetches the water spray, "No. Bad programmer".

You might be outraged, surely C++ programmers also never get this wrong. But nope, happens all the time as Chromium illustrates.

Added: One cause that shows up in my review is this:

C++ strings know how long they are. The raw pointer does not. As a result if we have lots of people asking if their thing is "some text" it's tempting to demand they give us a string, since if the string's length isn't 9 we don't need to look at the text itself. It's an optimisation! Rust's &str knows how long it is.

vlovich123 · on June 27, 2022

Again I’m failing to see the distinction. Why did that team need a string? Presumably it’s mutating right? &mut str would let you mutate the existing characters (similar to char) but it doesn’t give you permission to resize (since doing so obviously might involve a reallocation and change underlying pointers referenced elsewhere).
> A team who design their API taking &String in this situation get the same treatment as a team who name all their types Data1, Data2, Data3 and so on. Somebody senior fetches the water spray, "No. Bad programmer". You might be outraged, surely C++ programmers also never get this wrong. But nope, happens all the time as Chromium illustrates.

So the thrust of your claim is that Rust programmers are better. The more generous interpretation I’ll read here is that Rust has stronger conventions here. Even still. I don’t see it. The problems are still the same. I think there’s a hyper focusing on ownership when it has nothing to do here. In c++ Const Char (and now string_view) == &str. Chrome developers were going from &str to String and back many times. If Rust had some way to convert &mut str to String without a copy if none was needed, then I think that might apply.

tialaramex · on June 27, 2022

>Again I’m failing to see the distinction. Why did that team need a string? Presumably it’s mutating right?

As a Rust programmer this is what I'd expect, because that's what alloc::string::String is for but this is not how std::string is used in C++ & especially not how it was used at that time

So I spent some more time reading. Beyond signifying ownership std::string has other properties that the raw pointer char * does not have which influence the C++ programmer and especially the enthusiastic but perhaps less experienced C++ programmer

1. It's a real C++ type whereas char * is left over from C

2. Unlike char * the std::string remembers the length of the string which also speeds up equality comparison (we know "classification" != "class" from the length before we even look at the text data)

3. std::string has a "Small String Optimisation" which will be emphasised repeatedly to you by C++ gurus. This means small local strings don't need heap space which is good. So... you should use std::string?

Now. If we compare &str and alloc::string::String:

1. Both Rust types, not left over from some prior language

2. Both know how long they are

3. Neither has "Small string optimisation". You can do this trick (oh boy and how) in Rust, but Rust's standard library intentionally does not provide it and nobody's public APIs expose such a thing.

> &mut str would let you mutate the existing characters (similar to char) but it doesn’t give you permission to resize (since doing so obviously might involve a reallocation and change underlying pointers referenced elsewhere).

Yes it would, and this exists but I've rarely seen it put to any use.

> The more generous interpretation I’ll read here is that Rust has stronger conventions here.

Ultimately yes, the conventions are stronger, a cultural difference. You can go look for yourself, at both the sprawling vastness of public Rust library and Chromium's own APIs in that era which often take string despite having no interest in ownership.

Chromium has a map (of configuration parameters) inside it, in which the keys are std::string. If you understand this as an owning object for mutation that sounds insane but if you just think it's a convenient object that knows how long the text is and keeps shorter text out of the heap it's awesome. Right?

Of course, you can't just compare a char * to a std::string, the map has no idea that would be possible, so you make a std::string from your char * and compare that. Don't worry there are only a few dozen configuration parameters to check, what do you mean this hash lookup now incurs a heap allocation ?

secondcoming · on June 22, 2022

That’s just poor software. A poor Rust dev would be one who clones everything.

seritools · on June 22, 2022

`.clone()` is visible right in your code, though, unlike the implicit conversion/constructor magic.

sseagull · on June 22, 2022

But you could write a C++ class with a manual .copy() method and a deleted copy constructor (and other conversion constructors not implemented).

Doing that seems a bit easier than integrating a whole new language into your project (although I know Rust would have other benefits).

TheCoelacanth · on June 22, 2022

You could do that if you are the author of std::string. Unfortunately, very few of us are.

Gigachad · on June 22, 2022

For me, the fact that the Linux project has decided that Rust will be included in the kernel was the point Rust has become a success. It's pretty clear at this point that Rust will be around for decades and start to replace the well established C space.

flohofwoe · on June 22, 2022

> the Linux project has decided that Rust will be included in the kernel

Isn't this a bit oversimplified? IIRC the Rust support was only started as an option for writing device drivers in Rust, not actual kernel code, and even this relatively simple use case looks like a herculean effort which required changes to feed back into Rust - which is a good thing of course, but currently it looks like the Rust ecosystem benefits more from that project than the Linux ecosystem ;)

tialaramex · on June 22, 2022

Firstly yes, not all Linux targets are supported Rust targets, as a result if you were to rewrite Linux memory management code in Rust you can't ship that without breaking some Linux platforms. Over time it is expected that: Linux will stop caring about some very, very old targets; Rust will gain support for some more targets (especially those that aren't very, very old) and so this issue goes away. Meanwhile drivers cannot run on all targets so them being in Rust doesn't change whether your target can run Linux.

Rust for Linux was driven by existing Linux developers. So they obviously think that's great news for Linux not just for Rust. One reason is: Nobody else was delivering a viable way forward. If there were ten of these safe low-level languages and by doing nothing Linux could be sure one would pick "Make ourselves suitable for Linux" as its goal then they could just sit back. Instead Rust is the only game in town, so it's either adjust Rust [ e.g. Rust for Linux alloc crate doesn't have the concatenating + operator on String like your userspace Rust does because that's the sort of thing which makes Linus angry ] or risk not having any safe path forward for the foreseeable future.

And yes, ultimately whether Rust for Linux goes into the Linus tree at some point is always ultimately a decision for Linus Torvalds. It's just that he made friendly noises about it, and there are people putting the work in, so it's like for dozens of other prospective Linux features, we can assume it will land but we shouldn't assume when.

erlend_sh · on June 22, 2022

Rust For The Kernel Could Possibly Be Merged For Linux 5.20: https://www.phoronix.com/scan.php?page=news_item&px=Rust-For...

IshKebab · on June 22, 2022

Initially for device drivers. Rust isn't precluded from other parts of the kernel forever. It's just that device drivers are an obvious starting point.

bluejekyll · on June 22, 2022

> Rust is not yet available on all Chromium platforms (just Linux and Android for now)

The beginning of this sentence didn’t surprise me, but the fact that it’s just Linux and Android did. Rust supports macOS and Windows really well, so I wonder what the gap is here?

> Facilities and tooling in Rust are not as rich as other languages yet.

Is this meant in the context of Chromium?

cmrdporcupine · on June 22, 2022

The amount of things that have to happen for a modification to the build environment for a target is more than just that the tooling exists and is supported on those platforms.

Picking just one piece: Consider that Chromium builds happen in a distributed build farm. There's multiple variants of this (goma, rbe). I'd imagine those systems would have to be modified to support the Rust toolchain for that target.

And it looks like this work is built around making GN/ninja support Rust, just using cargo directly. So that's what they likely mean by "not as rich as other languages yet."

flohofwoe · on June 22, 2022

AFAIK it took years for the Chrome code base to allow a different C++ compiler than MSVC for the Windows build (e.g. Clang is now supported too). It's not surprising that they don't provide Rust support for all platforms right from the start.

ntoskrnl · on June 22, 2022

It was bound to happen eventually. Big projects like Chromium tend to move slowly so it will probably be a few years before any Rust code ends up in a shipped release. But this is a great start!

keyle · on June 22, 2022

Chrome is indeed big, however calling them slow is incorrect.

In fact there are articles of people burning out with the release process of Chrome, left behind by the sheer speed at which the project is moving forward.

summerlight · on June 22, 2022

This is so true. There were tens of Chromium based browsers which had a good niche market interests but eventually failed to keep up with Chromium's super fast development pace. I'm pretty sure that the Edge team spends lots of their eng time solely on rebasing and one of the key factor for a successful Chromium based-browser project is to manage clean separation of your domain code and Chromium codebase. Otherwise, your team member will be quickly burnt out and leave the team.

ntoskrnl · on June 22, 2022

You're right. I meant to say big projects tend to be conservative with adopting new technologies. I didn't mean to imply anything about development pace.

legolas2412 · on June 22, 2022

Can you give examples for the articles? Very interested in knowing

jupp0r · on June 22, 2022

I don’t think Chrome is moving slowly. They are on the very edge of web technology. They are for example relatively fast at adopting modern C++ standards (compared to other big code bases). [1]

[1] https://chromium.googlesource.com/chromium/src/+/HEAD/styleg...

nikeee · on June 22, 2022

> The language, at least for now, is Rust.

What are the alternatives in the context of Chromium development as a replacement for C++?

pjmlp · on June 22, 2022

https://www.chromium.org/Home/chromium-security/memory-safet...

See "Using safer languages anywhere applicable".

lordofgibbons · on June 22, 2022

Interesting that Java, Swift, and Javascript are listed there but not Go. I wonder why

pjmlp · on June 22, 2022

Because they are relevant on the context of Android, Apple OS, and ChromeOS respectively, as the main OS languages alongside C and C++.

gitgud · on June 22, 2022

It's funny that it was created in Mozzila and first used in the [1] servo browser, and now is getting adopted by Mozzila's rival Chromium

[1] https://en.m.wikipedia.org/wiki/Rust_(programming_language)

staticassertion · on June 22, 2022

I hope this works out. A memory safe browser would be a huge win for security.

brabel · on June 22, 2022

Are there many exploits caused by memory issues in Chromium?

staticassertion · on June 22, 2022

Yes, I believe literally every single 'in the wild' exploit has abused memory unsafety, as well as hundreds of vulnerabilities every year.

brabel · on June 22, 2022

That's strange... if they'd written the code in a memory safe language, say, Java... there wouldn't be any vulnerability?

I don't know... I see plenty of vulnerabilities in the Java world, no memory unsafety needed.

UncleMeat · on June 22, 2022

Vulns exist in Java applications. Logic bugs can open all sorts of doors to exploitation. But empirically we observe that a huge portion of real vulnerabilities in applications written in C or C++ are memory errors. We've spent decades trying to get people to write C and C++ applications without these errors and utterly failed.

A browser written in a memory safe language won't be free from vulns but it will be free from a huge class of recurring and very serious vulns.

pornel · on June 22, 2022

Yes, if they used Java then a whole class of use-after-free exploits wouldn't be possible.

Security is not a binary yes/no. There are many many ways in which programs can be insecure. Eliminating one class of bugs helps reduce total amount of issues and their severity.

It's like car crashes: cars with seatbelts, airbags, and auto-braking systems kill way fewer people than they used to, but are still deadly.

throwaway894345 · on June 22, 2022

Security isn't a binary, and the vulnerabilities that are possible in a memory safe language are a subset of those possible in a memory unsafe language. We want to minimize the number of possible vulnerabilities.

Yes, the above is a bit oversimplified: most memory-safe languages have an "unsafe" escape hatch, so technically these vulnerabilities are possible; however, these escape hatches are rarely used, explicitly opted-into, and clearly demarcated in the source code such that the number of vulnerabilities in "memory safe" languages is far smaller than "memory unsafe" languages.

brabel · on June 23, 2022

I was not the one claiming security is binary. The opposite of that (I happen to work in security)... people just casually claiming that 100% of the vulnerabilities in Chromium are due to memory unsafety are implicitly claiming that none of them were logic bugs and what follows is that had the same code been written in Java, Rust, even Python for that matter, none of those vulnerabilities would've been possible - which means there would be no vulnerabilities at all!? I would expect people commenting here to have basic understanding of propositional logic.

staticassertion · on June 23, 2022

100% of exploited vulnerabilities leveraged memory unsafety. 70% of vulnerabilities discovered are from memory unsafety.

dgb23 · on June 22, 2022

A security related bug often comes down to violating the principle of least power either in a technical way by introducing a leak that can be exploited with crafted input, or via design, where a human participant is assumed to be trustworthy.

I think wider memory safety, SQL injection and things like log4j are related to the former. Some aggregation of data that should be dumb and restricted is given too much trust, so data can be lifted to code and code is too powerful. In essence they are all similar, even though we don't use the same technical terms for each of them.

And yes, if a given programming environment restricts a class of operations, it is given less power so the attack surface is qualitatively smaller. Languages that restrict memory management are an example. Another one would be file/disk access, network access and so on.

staticassertion · on June 22, 2022

The question was if the exploits were due to memory safety. The answer is, yes, 100% of them are due to memory safety.

As for Java, quite a lot of the exploits against it (when it was a browser plugin) were in fact memory safety issues in the VM. But more recently what we see are serialization issues, which Rust also does not have.

imron · on June 22, 2022

About 70% of all exploits: https://www.chromium.org/Home/chromium-security/memory-safet...

staticassertion · on June 22, 2022

Just to clarify, exploit generally refers to software that takes advantage of vulnerabilities, vulnerabilities are the flaws themselves. As in, one writes code to "exploit" a "vulnerability".

70% of all vulnerabilities reported to them are memory safety issues. To my knowledge, 100% of exploits against Chrome in the wild leverage memory unsafety.

j-krieger · on June 22, 2022

How anyone can read a survey like this and still argue that the benefits of Rust (or any language with the ownership model) don't outweigh the risks / negative aspects is beyond me.

yjftsjthsd-h · on June 22, 2022

Oh, that's easy; all you have to do is argue that it wasn't a representative sample. Just because 70% of security problems in Chromium are memory-safety problems doesn't mean that arbitrary project X has the same proportion or risks. Chromium is a very specific kind of application (network client that almost exclusively talks to untrusted servers, does media decoding, large, runs as an application, long-running), so it's plausible that its issues are unique.

However, it gets a lot harder to argue that it's just Chromium when Microsoft found the same thing: https://www.zdnet.com/article/microsoft-70-percent-of-all-se... At that point, the strongest argument shifts from "Chromium is the outlier" to "my code is the outlier". And that's... possible to defend (ex. the OpenBSD folks have a track record that says they can write safe C), but certainly harder.

staticassertion · on June 22, 2022

TBH it's only easy if you're ignorant. It's quite obvious to anyone who's informed or educated at all that using Rust would address major security issues. It sucks that people require so much convincing.

comment500 · on June 22, 2022

One caveat here is that many of the vulns used in the wild are in V8 and related to JIT code generation. Unfortunately rewriting in Rust can't really help with this.

staticassertion · on June 22, 2022

Yes, that's a really good point. The vulnerabilities are across a number of components (skia, for example, is another big one) but memory safety in VMs is particularly tricky. Still, my personal, somewhat unfounded belief is that Rust has a lot of potential there.

staticassertion · on June 22, 2022

Most developers understand virtually nothing about security.

imron · on June 22, 2022

Good point!

blub · on June 22, 2022

I have a hard time imagining a more accidentally complex piece of software than a web browser written in C++ and in Rust.

All the chaos of the so-called web standards, decades of accumulated C++ complexity and the eccentricity and burgeoning complexity of Rust on top. Getting assigned to such a project must be akin to punishment.

dblohm7 · on June 22, 2022

Not really. That’s a typical workday for many Gecko developers.

Some useful guidelines made it pretty straightforward:

https://wiki.mozilla.org/Oxidation

saagarjha · on June 22, 2022

In chaos some see order.