
loftsy · 2024-04-03T11:08:18 1712142498

I took a look at the diff linked in the article with code that "we are all running". The top of the diff certainly looks interesting. They remove the bounds check in dict_put() and add a safe version dict_put_safe().

This kind of change is difficult to make without mistakes because it silently changes the assumptions made when code calling dict_put() was originally written. ALL call sites would need to be audited to ensure they are not overflowing the dictionary size.

The diff I am referring to is here:

https://git.tukaani.org/?p=xz.git;a=commitdiff;h=de5c5e41764...

justinsaccount · 2024-04-03T12:56:59 1712149019

Also because the 'safe' version only checks

  dict->pos == dict->limit

and not

  dict->pos >= dict->limit

if you can get one call of dict_put somewhere to pass the limit, all later calls of dict_put_safe will happily overwrite memory and not actually be safe.
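
To make the == versus >= point concrete, here is a minimal sketch in C. It is a toy model, not the liblzma code: struct toy_dict, toy_put, toy_put_safe_eq, toy_put_safe_ge and toy_bytes_past_limit are all made-up names, and the backing array is deliberately larger than limit so the demo never writes outside its own storage. Whether the real dict_put can ever leave pos beyond limit is a separate question; the sketch only shows what would follow if it could.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified dictionary; NOT the liblzma struct. */
struct toy_dict {
    uint8_t buf[16]; /* backing storage, deliberately larger than limit */
    size_t pos;      /* next write index */
    size_t limit;    /* writes are supposed to stop here */
};

/* Unchecked write: the caller must guarantee pos < limit. */
static void toy_put(struct toy_dict *d, uint8_t byte)
{
    d->buf[d->pos++] = byte;
}

/* Guard with an equality test only. Returns true when it refuses to write. */
static bool toy_put_safe_eq(struct toy_dict *d, uint8_t byte)
{
    if (d->pos == d->limit)
        return true;
    toy_put(d, byte);
    return false;
}

/* Guard with >=, which still refuses once pos has slipped past limit. */
static bool toy_put_safe_ge(struct toy_dict *d, uint8_t byte)
{
    if (d->pos >= d->limit)
        return true;
    toy_put(d, byte);
    return false;
}

/*
 * Pushes pos one past limit with unchecked writes, then counts how many
 * further bytes the given guard accepts. The loop is capped at the end of
 * buf so the demo itself stays in bounds.
 */
static size_t toy_bytes_past_limit(bool (*put_safe)(struct toy_dict *, uint8_t))
{
    struct toy_dict d = { .pos = 0, .limit = 4 };

    for (size_t i = 0; i < d.limit + 1; i++)
        toy_put(&d, 0xAA);
    assert(d.pos == d.limit + 1);

    size_t accepted = 0;
    while (d.pos < sizeof d.buf && !put_safe(&d, 0xBB))
        accepted++;
    return accepted;
}
```

In this model toy_bytes_past_limit(toy_put_safe_eq) returns 11, meaning every remaining slot of the 16-byte array gets written even though limit is 4, while toy_bytes_past_limit(toy_put_safe_ge) returns 0.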

Calzifer · 2024-04-04T08:22:55 1712218975

No, because dict_put will update the limit value if the new pos exceeds it.

justinsaccount · 2024-04-04T11:55:16 1712231716

I don't see anything like what you are describing. What line exactly are you talking about?

ahartmetz · 2024-04-03T21:04:19 1712178259

Wow, that is 1000% obviously malicious

Matumio · 2024-04-03T12:26:36 1712147196

Agree, nice catch. Also, there are many other opportunities in this patch to hide memory safety bugs.

This is the kind of optimization I might have done in C 10 years ago. But coming back from Rust, I wouldn't consider it any more. Rust, despite its focus on performance, will simply not allow it (without major acrobatics). And you can usually find a way to make the compiler optimize it out of the critical path.

kmfpl · 2024-04-03T11:36:25 1712144185

I agree, this looks extremely sketchy. Especially because the code is just writing a fully controlled byte in the buffer and incrementing its index.

This would give you a controlled relative write primitive if you can repeatedly call this function in a loop and go out of bounds (OOB).

liendolucas · 2024-04-03T12:13:16 1712146396

I think at this point it is clear that everybody has to assume that XZ is completely rotten and can no longer be trusted. Is XZ easy to replace with some other compression tool? Or has it been so widely adopted that it is going to take a huge effort to move away from it?

dralley · 2024-04-03T12:39:30 1712147970

There is no reason to assume that. Even if you assume every commit since Jia became a maintainer is malicious, the version from 3 years ago is perfectly fine.

Zstd has a number of benefits over Xz that may warrant its use as a replacement of the latter, and this will likely be a motivating factor to do so. But calling it entirely rotten is going way too far IMO

mmd45 · 2024-04-03T13:19:34 1712150374

There is an interesting argument to be made that pre-JT xz code is probably pretty secure due to the fact that the threat actors would have already audited the code for existing exploits prior to exerting effort to subvert it.

tripflag · 2024-04-03T13:16:04 1712150164

I always use "zstd --long=31 -T0 -19" to compress disk images, since that is a use case where it generally offers vastly superior compression to xz, deduplicating across bigger distances.

XZ offers slightly better compression on average, but decompression is far slower than Zstd.

dralley · 2024-04-03T13:19:57 1712150397

IIRC memory consumption is generally worse for Zstd at comparable levels of compression. Which, these days, is generally fine, but my point is you can't thoughtlessly substitute the two.

liendolucas · 2024-04-03T12:48:32 1712148512

What keeps ringing in my head is the "." that was found that invalidates compilation. I personally don't buy it (but that is just my opinion).

dralley · 2024-04-03T13:08:45 1712149725

What do you mean "don't buy it"?

liendolucas · 2024-04-03T14:22:05 1712154125

My bad. I thought that the person who made that commit was someone other than JT. I can't delete the comment or self-downvote it.

kzrdude · 2024-04-03T12:30:17 1712147417

Huge effort, because it is the default .deb compressor in Debian for example

rthnbgrredf · 2024-04-03T12:36:03 1712147763

Arch Linux already replaced it with zstd in 2020. It's doable for the next major release of Debian.

kzrdude · 2024-04-03T13:43:53 1712151833

Certainly, but we need an xz decompressor to read the current Debian repo versions for decades to come, when they are oldstable or archived.

formerly_proven · 2024-04-03T14:36:48 1712155008

Decoding is easy.

logro · 2024-04-03T18:09:52 1712167792

This is 100% the work of either a malicious or a novice coder. And we surely know it's not the latter.

If you need an unsafe call, you add a dict_put_unsafe(). That again should of course be rejected in a code review.
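
A rough sketch in C of that naming convention, assuming a toy dictionary rather than the real liblzma one: the default call keeps its bounds check, the unchecked variant carries "unsafe" in its name, and a bulk helper checks the available room once before the loop so the hot path still avoids a per-byte test. All names here (struct toy_dict, toy_dict_put, toy_dict_put_unsafe, toy_dict_put_n) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy dictionary; NOT the liblzma struct. */
struct toy_dict {
    uint8_t *buf;  /* caller-provided storage of at least limit bytes */
    size_t pos;    /* next write index */
    size_t limit;  /* one past the last writable index */
};

/* Default entry point: keeps the bounds check. Returns false when full. */
static bool toy_dict_put(struct toy_dict *d, uint8_t byte)
{
    if (d->pos >= d->limit)
        return false;
    d->buf[d->pos++] = byte;
    return true;
}

/* Opt-in unchecked variant. Precondition: d->pos < d->limit. */
static void toy_dict_put_unsafe(struct toy_dict *d, uint8_t byte)
{
    d->buf[d->pos++] = byte;
}

/*
 * Writes n bytes or nothing. The room check happens once, before the loop,
 * which is what makes the unchecked calls inside the loop valid.
 */
static bool toy_dict_put_n(struct toy_dict *d, const uint8_t *src, size_t n)
{
    if (d->pos > d->limit || n > d->limit - d->pos)
        return false;
    for (size_t i = 0; i < n; i++)
        toy_dict_put_unsafe(d, src[i]);
    return true;
}
```

With this layout an existing caller of the checked function keeps the guarantee it was written against, and a reviewer only has to audit the few places where the "unsafe" name appears.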

loftsy · 2024-03-18T12:02:58 1710763378

It's not enough to write a law on principles alone. It must be clear and practical to comply with, and it must be clear how it will be enforced. The EU should not have created a situation where the most practical solution for thousands of companies is a cookie banner.

lesuorac · 2024-03-18T13:56:22 1710770182

Eh, I think people have the wrong take-away from all of this.

Imagine if the banner said "This website is known to the state of California to cause cancer". Would you keep visiting the site?

Like if every time you went to the bar, the bouncer asked "Hey, can I punch you in the face?". Would you keep going to that bar?

As annoying as the banners are, they actually aren't annoying enough to change mass-behavior.

loftsy · on Nov 14, 2023

Why is a Debian developer a better gatekeeper against supply chain attacks than the developer of a popular Rust library?

For me the alternative to Crates.io/npm/PyPi is a "platform release" like Android where a bunch of stuff is signed off by a corporation.

palata · on Nov 14, 2023

I would assume that the Debian _maintainer_ cares about what is being shipped with Debian. And for each dependency (transitive or not), the Debian maintainer has to make sure that it is shipped by Debian, and therefore has to not only know about the dependency, but also to make a package out of it (if it is not already being shipped by Debian).

Whereas the developer of a popular Rust library most likely just added some dependencies that were convenient, and doesn't know the full story of the transitive dependencies.

To me it is much more likely that somebody at Debian _saw_ the dependency being shipped than an arbitrary Rust developer, who probably does not even know how many dependencies are being shipped with their program.

dathinab · on Nov 14, 2023

because it often takes so long to do any updates that people just give up on updating in time, whether that is for bug fixes or for injecting supply chain attacks /s/j

yjftsjthsd-h · on Nov 14, 2023

Well if nothing else, it's the difference between one person being able to ship something vs needing at least one person to review before shipping.

throwaway322321 · on Nov 14, 2023

Debian developers go through a long step-by-step process to get into that position: volunteering as a package maintainer first, getting a Debian developer to advocate for you, cross-checking people's passports, and doing interviews/exams to become a DD.

DDs review the libraries they package. All uploads are personally signed to keep people accountable.

In contrast, opening a GitHub account and developing a library can be done anonymously, and there have been supply chain attacks done this way.

riku_iki · on Nov 15, 2023

maybe adaptation: debian has higher adaptation than rust, meaning more people will start screaming if something is going wrong.

codys · on Nov 16, 2023

I'm not sure how one would define "adaptation" (presumably meaning "adoption", ie: how widely used something is) in a way that could have debian have a higher adoption than rust: rust ships in programs within many linux distributions and other operating systems (mac os and windows). rust is used in widely used programs/kernels (Firefox, Chrome, Linux itself, Windows) that span further than debian itself.

All this is to say: a large number of debian users are likely using rust, and an even larger number of non-debian users are likely using rust.

riku_iki · on Nov 16, 2023

> rust is used in widely used programs/kernels (Firefox, Chrome, Linux itself, Windows)

those few narrow components likely have good support.

loftsy · on Feb 2, 2023

I've written some Erlang and found the syntax awkward. One example of this is that variables can only be bound once. This means you end up writing:

  Val = SomeFunc(),
  Val1 = Func2(Val),
  Val2 = Func3(Val1).

Yes I know you can use the functional style to minimise having to do this but you still end up seeing this pattern in real code.

Kubernetes has done a good job of replacing OTP.

That said I think Erlang is still an interesting language to learn just because of how different it is.

davidjfelix · on Feb 2, 2023

> Kubernetes has done a good job of replacing OTP.

I feel like this statement alone demonstrates a wild misunderstanding of what OTP does and is. Kubernetes is at best a replacement for VM-level scheduling, but without a slew of additional functionality and application-level interaction with the kubernetes runtime, individual servers are basically running blind in a container, totally unable to do anything that OTP provides to gen_servers for free.

codesnik · on Feb 2, 2023

I personally don't find this binding once behaviour problematic in any way, but Elixir specifically addresses this (it creates another variable with the same name, though)

holsee · on Feb 2, 2023

I wish they didn't tbh, it makes me sad.

Not allowing rebinding does provide a little bit of encouragement towards factoring things better, but I have seen manys a `X1` `X2` in my time :(

weatherlight · on Feb 2, 2023

It's a bit of a smell, but good Elixir code doesn't have to worry about rebinding for bindings in the same scope.

José Valim explains why he made that design decision here: https://groups.google.com/g/elixir-lang-talk/c/w83lKZs4YS8/m...

    I will post the same answer from before: I like the explicitness of pattern 
    matching. In Elixir, as soon as I see ^foo, I know I am matching. If there is 
    no ^, I know the previous value regardless if it there is one or not, will be 
    discarded. If pattern matching is not explicit, I always need to know if a 
    variable was previously defined or not to know what is going to happen. To me 
    this behaviour is non negotiable. In my experience, it is more likely to run 
    into accidental matches than into accidental rebindings.

    Another possible limitation to the suggestion above can be related to macros. 
    Let's suppose you have a macro that stores a value in a hygienic variable:

    defmacro do_something(a, b) do
      quote do
        var!(hello, Hygienic) = calculate_something(unquote(a), unquote(b))
      end
    end

    Someone may call this macro multiple times but we always care about the last 
    value of hello. If we make a distinction in between being assigned once and then 
    multiple times, the macro simply won't work. Or the macro developer would need to 
    work around this behaviour by inspecting the environment or we would need to 
    provide some sort of functionality that does it for you. Maybe this behaviour 
    could be built-in in var! or maybe we'd need to introduce something like:

    defmacro do_something(a, b) do
      quote do
        set!(hello, Hygienic) = calculate_something(unquote(a), unquote(b))
      end
    end

    We can see how this is getting complex:

    1. Use = to define variables
    2. Use ^ for pattern matching
    3. Use := for rebinding
    4. Use set! in macros when you don't care if a variable was previously defined or not

    It is also interesting to point out that the := operator won't work for rebinding 
    inside clauses. For example, how would you make the rebinding below explicit?

    x = true
    case false do
      x -> :ok
    end

    Of course, there are interesting consequences for making a distinction in between 
    defining and rebinding a variable by introducing something like the := operator. 
    We could have better control of the scope to provide better warnings or even make 
    it easier to implement imperative for loops:

    x = 0

    for i <- 1..5 do
      x := x + i
    end

    x #=> 15

    But the fact Elixir have different ways for variables to be introduced in the 
    scope, adding more rules can make the overall system very complex.

holsee · on Feb 2, 2023

Sounds like a good old foldl may or may not have brightened up your day (with the "functional style" you mentioned) e.g.

  Pipeline = fun(Functions, Initial) ->
    lists:foldl(
      fun(Function, Acc) ->
        Function(Acc)
      end,
      Initial,
      Functions
    )
  end,
  Initial = SomeFunc(),
  Functions = [Func0, Func1, Func2, Func3],
  Result = Pipeline(Functions, Initial).

At a certain point I grew to not mind the punctuations `; , .`

I often miss the language itself given how verbose things are in Elixir by comparison.

The language ecosystem is pretty good, with LFE for the Lispers https://lfe.io/ and https://gleam.run/ for the OCaml inclined.

weatherlight · on Feb 2, 2023

seems more complicated than the following elixir code.

    val
    |> some_func()
    |> func2()
    |> func3()

holsee · on Feb 7, 2023

I agree, or a nice `with` clause.

When strict simplicity is not the ultimate goal, dare I say leaning into the monads is the way...

kaptainscarlet · on Feb 2, 2023

Why do you need the extra variables when you can pass function expressions into other function calls?

loftsy · on Feb 2, 2023

It just turns out you can't always do that in a real codebase. For example see here:

  https://github.com/apache/couchdb/blob/23efd8e5b1aa96ef01640fec03a5fedc945ba8b9/src/couch_mrview/src/couch_mrview_http.erl#L228

loftsy · on Jan 12, 2023

As the engineering manager you should add an automated code formatter check to your CI. I think clang-format will do this for C. Doing this simply removes this whole class of conflict from your team.

This is a program that you run which will deterministically re-format your whole codebase. The CI checks on a PR should not pass until the PR diff matches what the formatting program says is the correct formatting.

loftsy · on Aug 1, 2022

Plyable | Graduate Research Engineer, Full Stack Engineer | Oxford, UK | Full-time | ONSITE

At Plyable we are automating the composites manufacturing process. We have started by automatically pricing and designing molds from a part design. We make parts which go on superyachts, electric cars and aeroplanes.

We have found a good product/market fit and have the ambition to take our automation to the next level.

See https://www.plyable.com/careers for more information.

loftsy · on June 21, 2022

An "isolate cloud" is exactly how Google App Engine worked 10 years ago. It was a good idea then and is still a good idea now.

loftsy · on Feb 26, 2021

I haven't worried about SSH keys in years since moving everything to GCP or AWS. One less thing to worry about.

loftsy · on May 1, 2019

Mouldbox | Frontend | Oxford, UK | Onsite | https://mouldbox.com

Mouldbox are adding automation to the composites manufacturing process. Our technology helps to make moulds for parts which go on superyachts, electric cars and aeroplanes. Founded in 2018, Mouldbox raised venture capital funding in 2019 and have received government grants. We are a team of 6 and expect to double our headcount in the next year.

Feel free to reach out directly to me if you have any questions (I'm the CTO), a.lofts@mouldbox.com

* All open roles: https://mouldbox.workable.com/

loftsy · on Dec 2, 2017

Try out dart or typescript.

spraak · on Dec 2, 2017

But those just transpile to JS as well and likely suffer the same problems.