Show HN: Socket – Secure your JavaScript supply chain (socket.dev)
133 points by feross on March 1, 2022 | 42 comments
Excited to share the project I've been working on for the past 7 months!

We've seen nearly weekly attacks against the open source software supply chain. I saw the seeds of this trend start in the mid-2010s as an open source maintainer and I've watched it only get worse over the years. I finally decided to try to solve this problem.

Socket is taking an entirely new approach to one of the hardest problems in security in a stagnant part of the industry that has historically been obsessed with just reporting on known vulnerabilities. Unlike other scanning tools, Socket actually analyzes the package code to characterize the package's behavior. This way, Socket can detect when packages use security-relevant platform capabilities, such as the network, filesystem, or shell.
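
As a rough illustration of the capability detection described above (not Socket's actual implementation, which isn't shown in this thread), a static check can look for imports of the Node.js built-in modules tied to each capability. The module-to-capability table and the regex matching are assumptions made for this sketch; a real analyzer would parse the code rather than pattern-match it.

```python
import re

# Hypothetical mapping from Node.js built-in modules to capabilities.
CAPABILITY_MODULES = {
    "network": {"http", "https", "net", "dgram", "dns", "tls"},
    "filesystem": {"fs", "fs/promises"},
    "shell": {"child_process"},
}

# Matches require('x'), require("x"), and the `from 'x'` part of an import,
# with an optional "node:" prefix on the module name.
IMPORT_RE = re.compile(r"""(?:require\(\s*|from\s+)['"](?:node:)?([^'"]+)['"]""")


def detect_capabilities(source: str) -> set[str]:
    """Return the capabilities whose modules are imported in the source text."""
    imported = set(IMPORT_RE.findall(source))
    return {cap for cap, mods in CAPABILITY_MODULES.items() if imported & mods}
```

Running this over every file in a package and taking the union would give a package-level capability list. Dynamic `require` calls and obfuscated code would slip past a check this simple.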

You can search for any npm package and see issues that we've flagged for each package. We look for 70 issues (full list here: https://socket.dev/npm/issue) and we put those into a Package Health score. See these examples:

https://socket.dev/npm/package/left-pad

https://socket.dev/npm/package/lodash

Socket looks for indicators present in all of the recent npm supply chain attacks. We're proactively auditing every package on npm to flag these issues.

Separately, we have a GitHub app that you can install. It detects typosquat attacks and leaves a comment on your pull request to let you know you might have installed the wrong package. We're currently working to enable it to leave comments for more of the package issues that we can detect, but we want to get the UX really good on that first, so we've released it and labeled it "beta".

Happy to answer questions.



I would honestly completely ditch the Socket "ratings". React = 70 - what the hell does that mean? "quality = 82, license = 70" where did you get those numbers? They are basically made-up.

There are certain ways you can objectively determine the security of a package:

- If it or one of its dependencies has been compromised

- If it and its dependencies are made by a trusted org (e.g. I would trust something from Microsoft more than something from a random developer)

- If it and its dependencies have been audited and verified (this could even be a quality audit, or something like the Elm ecosystem)

- If it and its dependencies are made by you, you can be pretty confident that nobody has tampered with them

Similarly there are ways to determine the quality (e.g. code coverage, usage, whether it has tests). And of course you can see the license.

But state these things directly, don't try to summarize them with numbers. Because ultimately those numbers are almost meaningless and make your tool look fake.


I think this is a valid criticism of using scores to summarize a complex subject, and I agree that we can definitely do a better job on this.

What's interesting is that arbitrary, unit-less scores seem to work well in products like Lighthouse [1] and SSL Labs [2] even though they're also "basically made up" metrics. I'm curious if anyone has ideas about why this may be the case.

To your point about just stating things directly, we do that too. See https://socket.dev/npm/package/left-pad/issues, for example.

[1]: https://developers.google.com/web/tools/lighthouse [2]: https://www.ssllabs.com/ssltest/index.html


SSL Labs works because letter grades, like most low granularity ordinal measures, clearly read as qualitative and approximate rather than having false precision. No one is saying things can't be in aggregate better or worse. Lighthouse is closer to nonsense, though the category and subcategory explanations at least seem transparent enough.


I work as a security engineer for a benefits provider. This stuff has been top of mind for me, the rest of my team, and our CISO.

I love the direction you're going with this. Would love to hear more about:

1. Status. Is this a beta product or do you have enough here to sell a service?

2. Roadmap. What are you building out next? What are core capabilities you see being added to Socket?

3. Pricing. How much does it cost? What's your revenue model?

Depending on how you've put this together you may be addressing a serious need in this space that I've only previously seen addressed by stuff like Deno (where you set application capabilities prior to execution) or via virtualization.

Contact details in my profile. If your product does what it says on the tin I'd love to give it a spin.


1. We are in beta right now! We have the data and analysis, and are working hard to make it easy to integrate project specific views into it.

2. https://socket.dev/roadmap See https://socket.dev/npm/issue for analysis we have developed on a per-package basis. We are working on integrating these with our GitHub App to provide custom tailored project views into this data. More info here https://socket.dev/blog/inside-node-modules

3. It will be a paid product, but is free while it is in beta. We plan on keeping it free for open source.

We are looking into providing support for the Deno ecosystem down the road as well. The capabilities stuff they have is super great, but you lose all of that benefit for every dependency as soon as you turn it off, so we think there is probably room for this kind of analysis there. Hopefully Socket can provide a signal similar to what --allow-net provides, but for all of npm!


Why does Socket flag npm packages with "MIT" in the license field as non-FSF approved, such as lodash [1]? The FSF seems to prefer to call the MIT license "Expat" [2], but Expat's SPDX identifier is literally "MIT".

[1] https://socket.dev/npm/package/lodash/files/4.17.21/package....

[2] https://directory.fsf.org/wiki/License:Expat


I may be able to speak on this specific issue! (I had to get this license approved by legal recently) Lodash is MIT, but contains a non-standard clause referencing the fact that it’s based on Underscore.js. Legal said this was totally fine, but as usual: my lawyer is not your lawyer ;)


Non-FSF approved licenses are just flagged as informational – they shouldn't detract from the score. But perhaps that bit of info just isn't that interesting to most people.

If you find more examples like this, please email them to me (see bio) so we can fix them. We appreciate the feedback!


It looks like having an MIT license gives you a score of 70. It seems wrong that the most commonly used license on the platform is poor and it makes it hard to trust your scoring on anything else.


Great idea! As an aside, I can't help but immediately comment on the name, which kinda seems like a semi-poor choice given that socket.io is a very popular javascript websocket library and your product doesn't seem to have anything to do with sockets or websockets. (I can definitely get really wanting to use socket.dev after snagging such an obviously cool-to-have domain though.)

Also "left-pad Supply Chain Security: 0", lmao.


Unmaintained: Package has not been updated in more than a year and may be unmaintained. Problems with the package may go unaddressed.

Packages can be finished and do everything they set out to do. The fact they haven't been updated in a while can be a measure of high quality as much as it can be a sign of low quality.


> Socket is built by a team of open source maintainers with over 1 billion monthly downloads. Everyone on the Socket team is an open source maintainer.

So is Socket.dev open source? On your GitHub (https://github.com/SocketDev) I cannot find the GitHub App source code. So being such proponents of open source, why don't you release it as open source?


We want to open source as much of Socket as we can. However, our first priority is to build a sustainable company. This whole party ends with us losing our jobs if we fail to make this profitable.

As a maintainer I know how much time it takes to run an open source project beyond just writing code. There's triaging issues, reviewing PRs, answering questions, adding support for alternative operating systems/runtimes, and more. At this precise moment we just don't have that capacity, although I look forward to the day when we'll be able to share more of the code publicly.


I don't know how installing a GitHub app to give someone access to a company's repositories improves security. Might as well require people to enter an email address to learn what your product is.


You're right that introducing a GitHub app comes with certain risks, but "security" isn't a monolithic concept. For many people, installing a security tool like Socket to prevent supply chain attacks will on-net improve their security posture.

Also, for the record, the Socket GitHub app was designed to only read package manifest files such as `package.json`, `package-lock.json`, and `yarn.lock`. It doesn't read other files, and it definitely doesn't send those files to remote servers.
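
For readers wondering what "only read package manifest files" could look like in practice, here is a minimal sketch of an allowlist by file name. It is an assumption about the general approach, not the app's real code; the function name `manifest_files` is made up for the example.

```python
from pathlib import PurePosixPath

# The three manifest names mentioned above.
MANIFEST_NAMES = {"package.json", "package-lock.json", "yarn.lock"}


def manifest_files(changed_paths):
    """Keep only paths whose final component is a known manifest file name."""
    return [p for p in changed_paths if PurePosixPath(p).name in MANIFEST_NAMES]
```

Matching on the exact base name means a file such as `docs/package.json.md` is ignored, while manifests in nested workspaces are still picked up.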


I think it would be awesome if Socket could be used through an auditable GitHub Action where developers can manually send over just their package*.json files.


We agree. It's on our roadmap.


What is the name of the animated visualization on the front page? I thought it was static, but it is an amazing way to display such a dense and dry subject. 10/10


The visualization is a custom component that our own Mik Lysenko (of https://0fps.net fame) made. I agree it's a super useful way to visualize package data.


Congrats on the launch.

There is both a large need for improvements in npm supply chain security and a market willing to pay for them.

Concerns:

1) npm Open-Source Terms condition four [1] states:

You may access and use data about the security of Packages, such as vulnerability reports, audit status reports, and supplementary security documentation, only for your own personal or internal business purposes. You may not provide others access to, copies of, or use of npm data about the security of Packages, directly or as part of other products or services.

This statement seems vague enough to potentially include your use case. It also seems to include what snyk, jfrog xray, sonatype, and white-source do, so maybe this is not an issue.

2) It appears that this will be an open-core business. What capabilities are you willing to provide in the free/community edition and under which licenses?

3) The website doesn't show pricing. Can you provide details on this?

Questions:

1) What are your thoughts on using reproducible builds[2] plus Diverse Double-Compiling (DDC)[3] on the dependency graph to ensure build artifacts originate from known git repositories? Disclosure, I've been working on this for a few months now.

2) Where do you run your analysis? AWS and DigitalOcean have terms that prevent running high risk code.

3) Do you have examples of previous attacks and how your tooling would handle them?

Best of luck.

[1] https://docs.npmjs.com/policies/open-source-terms#conditions [2] https://reproducible-builds.org/ [3] https://dwheeler.com/trusting-trust/


Concern 1) I wasn't aware of this clause. Given how widespread the use of "npm data" is by the community I can't imagine they want to actually enforce this. But good to know.

2 and 3) We're still figuring out the business model, but here's our current plan: Package search and Package Health Scores are free for everyone to use through our website https://socket.dev.

Socket integrations, such as the GitHub App, are free for open source repositories forever. For private repositories, Socket is free while we're in beta, but we'll eventually charge something like ~$20/developer/month for private repos. We're still working out pricing but our #1 aim is to keep it affordable so everyone can get protected.

Question 1) I love this idea! This is something the team is already talking about. We want Socket to report reproducible builds and use them as a positive signal, as well as highlight them as a badge on the package page. For npm packages, lots of them probably already have reproducible builds that we can check by just running `npm install; npm build; npm pack`. I need to think more about DDC and how that would fit in. Perhaps we can chat about it sometime?
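
A sketch of the comparison step in such a check, assuming the locally packed tarball is available as bytes and the published metadata carries an SRI-style `sha512-...` integrity string (the format npm uses in its `dist.integrity` field). The function names are hypothetical, and this only works when the build is byte-for-byte deterministic.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Return an SRI-style integrity string: 'sha512-' plus the base64 digest."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_published(local_tarball: bytes, published_integrity: str) -> bool:
    """True if the locally built tarball hashes to the published integrity value."""
    return sri_sha512(local_tarball) == published_integrity
```

A mismatch does not prove tampering by itself, since timestamps or file ordering inside the tarball can differ between builds. It only says the artifact could not be reproduced as-is.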

2) We're currently doing static analysis, so not actually running the code. Our dynamic analysis isn't ready yet so we'll cross that bridge when we get there.

3) All of the issues that Socket detects were picked with previous npm supply chain attacks in mind. You can see a list of packages npm removed for security reasons here: https://socket.dev/npm/category/removed. When you view any of these, we show the results of our security analysis. Here is a removed package I just picked at random to give you an idea:

https://socket.dev/npm/package/netlify-swag/files/1.2.0/inde...


feross (creator of this project and submitter of this thread) didn't mention his background/experience/pedigree but I can point out that he knows his shit. Check out his resume: https://feross.org/resume

I became aware of his existence after I realized he was clever enough to set up a bot to watch new posts on https://web.dev (I used to be content lead for that site) and automatically post every new thing here on HN


I think this is a good project, but I also feel like this problem really needs to be solved by NPM, especially since in many cases it is solved pretty trivially.

The single biggest reason that NPM is so vulnerable to supply-chain attacks is that running `npm install` or `npm update` will always install the absolute latest semver-compatible version, regardless of how long that version has been published or any other criteria.

Suppose those two NPM commands instead had flags like --conservative, --default or --canary (OK, I suck at naming, but you get the idea) where, for example:

1. --conservative installs the most recent version that has been released for at least 2 weeks or has at least 10000 downloads.

2. --default only requires the most recent version to have been released for at least 1 day or to have at least 1000 downloads.

3. --canary would give you the existing behavior, where you get the most recent semver-compatible version even if it's only a second old.

The idea being that, depending on the risk level of your project, you could use a different flag. Furthermore, security researchers and companies like Snyk could install packages in --canary mode in sandboxes to look for malware.
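
The selection rule behind those three flags can be sketched as follows. The flags do not exist in npm; the `Release` type, the `POLICIES` table and `pick_release` are names invented for this example, with the thresholds taken from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Release:
    version: str
    published: datetime
    downloads: int


# (minimum age, minimum downloads); a release qualifies if it meets either one.
POLICIES = {
    "conservative": (timedelta(weeks=2), 10_000),
    "default": (timedelta(days=1), 1_000),
    "canary": (timedelta(0), 0),
}


def pick_release(releases, policy, now):
    """Return the newest release allowed by the policy, or None.

    `releases` is assumed to be already filtered to semver-compatible
    versions and sorted from oldest to newest.
    """
    min_age, min_downloads = POLICIES[policy]
    for release in reversed(releases):
        old_enough = now - release.published >= min_age
        popular_enough = release.downloads >= min_downloads
        if old_enough or popular_enough:
            return release
    return None
```

With a release published an hour ago, `canary` would pick it, while `default` and `conservative` would fall back to older versions until the new one has aged or accumulated downloads.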

The fact is that despite all the (deserved) handwringing about NPM supply chain attacks like with ua-parser-js and the faker/color fiasco, most of these issues are discovered very quickly. Minor changes to NPM would provide a huge amount of additional security (for those who need it) with extremely little cost.


You can change the npm registry URL to be a Socket-managed one, which would not let you install compromised packages - wouldn't that work?


Hey team, great to see someone tackle this project, and the idea to scan for new potential issues based on the code that changed is cool.

My initial impression when visiting the website, though, is that I can't understand how it works exactly. For example, the learn more page at https://socket.dev/integrations says Socket helps improve the posture, but it would be good to explain (perhaps with screenshots of an actual example?) how it catches an issue.


I'm actually working on that page right now hah.

Right now the integration is fairly slim. It just does typosquat warnings when you install a package that has a similar name to a more common package. The warning comes in the form of a comment in PRs that include additions of packages that meet this criterion.
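
One common way to express "similar name to a more common package" is edit distance against a list of popular names. The sketch below uses plain Levenshtein distance and a hypothetical `possible_typosquat` helper; the heuristic Socket actually uses is not described in this thread and is likely different.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def possible_typosquat(name, popular_names, max_distance=1):
    """Return the popular package this name may be imitating, or None."""
    if name in popular_names:
        return None  # It is the popular package itself.
    for popular in popular_names:
        if edit_distance(name, popular) <= max_distance:
            return popular
    return None
```

The `max_distance` threshold is the tuning knob: raising it catches more typos but also produces more false positives on legitimately similar names.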

We have a bunch of other detections listed here: https://socket.dev/npm/issue which have not been added to the GitHub App yet, but are available on a per-package basis for manual research at the moment. Over the next few release cycles we will be adding additional issue checks and warnings to the GitHub integration so that you can get a warning when dependencies add new capabilities, add suspicious things like analytics or install scripts, add unknown publishers to their maintainer list, or start publishing binary or obfuscated code. These will be automatically turned on and rolled out as we determine them to be not too noisy and provide interesting signals.
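
To make one of those checks concrete, here is a sketch that flags install scripts added between two versions of a package by comparing the `scripts` section of each `package.json`. The hook names `preinstall`, `install` and `postinstall` are npm lifecycle scripts; the function names are hypothetical and this is not the GitHub App's code.

```python
import json

# npm lifecycle hooks that run automatically when a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_scripts(package_json_text: str) -> dict:
    """Return the install-time scripts declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


def newly_added_install_scripts(old_text: str, new_text: str) -> dict:
    """Return install hooks present in the new version but not in the old one."""
    old, new = install_scripts(old_text), install_scripts(new_text)
    return {hook: cmd for hook, cmd in new.items() if hook not in old}
```

The same diffing shape would apply to the other signals mentioned, such as new capabilities or new maintainers: compute a set per version and report what appeared.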


Thanks, glad you like our approach!

Sorry about the confusing page. We're still working on some of the pages on the site. You might find these links more informative:

- Launch post: https://socket.dev/blog/introducing-socket

- What's Really Going On Inside Your node_modules Folder? https://socket.dev/blog/inside-node-modules

- And maybe even the launch Twitter thread: https://twitter.com/feross/status/1498676284590800903


Maybe a minor defect. I see many packages that receive the error unclear license because the bot cannot find the file LICENSE in the package when the file exists with lowercase name “license”.

https://socket.dev/npm/issue/unclearLicense
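
A case-insensitive lookup along these lines would avoid the false positive. This is a sketch of one possible fix, not Socket's code; the accepted spellings and the `find_license_files` name are assumptions.

```python
import re

# Accepts LICENSE, license, Licence, LICENSE.md, license.txt, and so on.
LICENSE_RE = re.compile(r"^licen[sc]e(\.[a-z0-9]+)?$", re.IGNORECASE)


def find_license_files(file_names):
    """Return the top-level file names that look like a license file."""
    return [name for name in file_names if LICENSE_RE.match(name)]
```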


Thanks, we’ll fix that.


As with all end products where you need safety, all the dependencies (3rd party or not) you use should be treated as your own code. All the dependencies should be committed to SCM and reviewed before you deploy something into production, without any “automatic version bumping” magic that a package manager à la npm would otherwise do.
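
As a small aid for that workflow, the sketch below lists dependencies whose version spec in `package.json` is a range rather than an exact version, since ranges are what allow automatic bumps. The regex for "exact version" and the function name are assumptions for illustration; a lockfile is another way to pin versions and is not considered here.

```python
import json
import re

# An exact version such as 4.17.21 or 1.0.0-beta.1, with no range operators.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$")


def unpinned_dependencies(package_json_text: str) -> dict:
    """Return dependencies whose version spec is not an exact version."""
    manifest = json.loads(package_json_text)
    unpinned = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                unpinned[name] = spec
    return unpinned
```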


Sure, but the startup that does this is going to lose, because usually a given risky dependency isn't actually going to turn out to be malicious, and reviewing all foreign dependencies as if they were your own code is wildly time consuming.


Listen to the Changelog podcast featuring OP and focused on Socket, https://changelog.com/podcast/482.


Is there any plan to introduce human review into the process? I imagine a company that had a human review a diff of each package release prior to approval could make a ton of money from the enterprise market.


Bug report: the typo-squatting check seems overly aggressive. I checked a module with a name like "foo-engine", and Socket shows a warning that this may be a typo for "engine.io".


Thanks for the feedback, we'll look into it.


The "weekly downloads" graphs look a little silly right now because they all go to zero at the right side of the graph. Might be worth only showing complete weeks or something like that
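
A sketch of the "only show complete weeks" idea, assuming the data arrives as a mapping from calendar day to download count and that weeks run Monday to Sunday. Both assumptions and the function name are mine, not details of how the site builds its graphs.

```python
from datetime import date, timedelta


def complete_weekly_totals(daily_downloads, today):
    """Sum daily counts into Monday-to-Sunday weeks, dropping the week still in progress."""
    current_week_start = today - timedelta(days=today.weekday())
    totals = {}
    for day, count in daily_downloads.items():
        week_start = day - timedelta(days=day.weekday())
        if week_start < current_week_start:
            totals[week_start] = totals.get(week_start, 0) + count
    return dict(sorted(totals.items()))
```

The first week in the data can also be partial if the series starts midweek, so a fuller version might drop that one too.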


Great suggestion. We'll fix it.


What is a supply chain attack when it comes to packages? I'd at least include that in the FAQ as I couldn't figure it out.


Thanks for the feedback.

You might enjoy this blog post which explains supply chain attacks at length and starts off with a concrete example of a recent supply chain attack (the ua-parser-js attack): https://socket.dev/blog/inside-node-modules


I like the fact that Firebase has a 0 for Quality and 67 for its supply chain security.


Is Socket open source? Can I run it locally from the command-line?


Congrats on the launch





