
It's a BSD-licensed binary blob. There's no source code provided.


Wow! That is so weird.

It's freeware under an open source license. Really misleading.

It looks like something you should stay away from unless you need it REALLY badly. It's a proprietary product with unknown pricing and no indication of what their plans are.

Does the fact that the binary is BSD licensed allow reverse-engineering?


> Redistribution and use in source and binary forms, with or without modification, are permitted

Reversing and re-compiling should count as modification?


How can you tell?


I mean, based on the claims and the benchmarks, it seems to provide massive speedups to a very popular tool.

How would you define "quality" in this context?


High quality code isn't just code that performs well when executed, but also code that is readable, understandable, and maintainable. You can't judge code quality by looking at the compiled result just because it works well.


That's certainly one opinion about it.

One could also say that quality is related to the functional output.


> One could also say that quality is related to the functional output.

Right, I said nothing that contradicts that ("High quality code isn't just code that performs well when executed, but also ..."). High quality functional output is a necessary condition, but it isn't sufficient to determine whether code is high quality.


Sure, I guess it depends on what matters to you or to your evaluation criteria.

My point was that it's all subjective in the end.


It's not really subjective if you're at all reasonable about it.

Imagine writing a very good program, running it through an obfuscator, and throwing away the original code. Is the obfuscated code "high quality code" now, because the output of the compilation still works as before?


Again it depends what you mean by "high quality code".

Do you mean how well it was written, or do you mean how well it performs? Or do both matter? Equally, or one more/less than the other?

It probably depends on whether you're the developer taking over the codebase, or the customer running the code in production.

Take video games. A lot of it is messy spaghetti C++ code, not modular or well structured, full of hacks and manual optimizations, to get the best possible performance out of the available hardware.

It might be impossible to parse or maintain, but it does the job about as well as possible, which is really all that matters to the end user. I would call that high quality code.

So again, subjective...


Written so that it's easy to maintain, well tested, correct in its handling of edge cases, easy to debug, and easy to iterate on.


It's worse than that. LLMs have been tuned carefully to mostly produce output that will be inoffensive in a corporate environment. This isn't an unbiased sampling.


True for consumer products like ChatGPT, but there are plenty of models that are not censored. https://huggingface.co/models?sort=trending&search=uncensore...


No. The censoring has already been done systematically by tech corporations at the behest of political agents that have power over them.

You only have to look at opinions about covid policies to realize you won't get a good representation, because those opinions will be deemed "misinformation" by powers with a vested interest in that being the case. Increasingly, criticism of government policy can be conflated with some sort of crime that is entirely up to the interpretation of some government institution, so people self-censor, companies censor just in case, and the Overton window gets narrower.

LLMs are awesome but they will only represent what they're trained on and what they're trained on only represents what's allowed to be in the mainstream discourse.


> LLMs are awesome but they will only represent what they're trained on and what they're trained on only represents what's allowed to be in the mainstream discourse.

I don't think this is a description of LLM censorship though, especially in light of the fact that many LLMs are fine-tuned for the explicit purpose of censoring responses the model would otherwise generate. Contrasting uncensored models with censored ones yields objectively uncensored results.


It could be interesting if used with many different LLMs at once.


> Raymond Chen has written in the past about this problem

That would be a citation. Do you have a link?


Three random Explorer examples:

https://devblogs.microsoft.com/oldnewthing/20230911-00/?p=10...

https://devblogs.microsoft.com/oldnewthing/20230324-00/?p=10...

https://devblogs.microsoft.com/oldnewthing/20220613-00/?p=10...

For code injection into applications that don't load third-party DLLs as plugins, see, e.g., Microsoft's (unsupported) toolkit for runtime API interception:

https://github.com/microsoft/Detours


You install TortoiseSVN or something similar, look at the explorer.exe process or any process that uses a standard "Open File" widget, and you will see some DLLs from the utility loaded into the process. (Easy to see with Process Explorer from Sysinternals.)


I think TortoiseSVN and friends are "just" shell extensions, though, which is an officially supported concept, even if that means that potentially any random software using the standard file dialogs ends up loading your DLL, too.


The server side efficiency problem is running advertising.


Some of the brightest minds of our generation have been wasted on optimizing how to deliver personalized advertisements in front of as many eyeballs as possible, and the true tragedy is that they didn't have a more efficient language than JavaScript to do it.


It really doesn't matter. Advertisement is the debt collection of software: it's the shit nobody wants to do. As a result these people tend to make more and be pampered, yet the output is generally still garbage and the developers are still depressed. Compare that with gaming, where the developers are worked to death and generally underpaid, but they love what they do and have a great time building amazing things.


It’s funny you mention this, because I just visited space.com for the black hole article (no adblocker on this device) and I could feel the device becoming warm and literally lost 3% on my battery level.


Most programs require a setup phase, where they want a great deal of access to the environment to set up the resources they use, and a steady state phase, where they need very little access beyond pre-opened file descriptors.

An externally imposed sandboxing feature can be useful for namespacing, but is necessarily less restrictive than pledge and unveil. For example, in steady state on OpenBSD, most programs can't even read their own configs.


> An externally imposed sandboxing feature can be useful for namespacing, but is necessarily less restrictive than pledge and unveil.

An externally imposed sandboxing feature isn't necessarily less restrictive than pledge and unveil at all, although I'm curious why you think that is the case.

To say nothing of the fact that pledge and unveil are wholly dependent on developer opt-in.

A good sandboxing solution should be robust and not require the cooperation of the programs that require sandboxing.


Those are two different kinds of sandboxing. One protects the application from itself: "here is what I use; if I try anything else, then it's a bug".

The sandboxing you're talking about protects the system from the application. You really need both.

re: restrictiveness

With external sandboxing, you need to restrict to a common denominator of all the states in which you expect to observe the application. An internal sandbox can adjust itself as it goes.


> Those are two different kinds of sandboxing.

I'm arguing that only one example is sandboxing; the other imposes limitations but doesn't meet the definition of sandboxing.


> An externally imposed sandboxing feature isn't necessarily less restrictive than pledge and unveil at all, although I'm curious why you think that is the case.

With pledge, you can read config files, open a file handle to your logs, and then completely drop the ability to open() files at all for the remaining lifetime of the program. How would you do that from outside the program?
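
A minimal sketch of that phase split (the paths are illustrative; this assumes OpenBSD, where a pledge violation kills the process):

    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* setup phase: opening files is still permitted */
        FILE *cfg = fopen("/etc/myprog.conf", "r");
        FILE *log = fopen("/var/log/myprog.log", "a");
        /* ... parse cfg, then ... */
        if (cfg != NULL)
            fclose(cfg);
        /* steady state: keep only "stdio"; open() of new paths is gone */
        if (pledge("stdio", NULL) == -1)
            err(1, "pledge");
        /* already-open descriptors keep working under "stdio" */
        if (log != NULL)
            fputs("running with open() revoked\n", log);
        return 0;
    }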


By monitoring and intercepting what the program is doing?


I guess technically you could write a dynamic policy that... I guess you'd give it a list of files that the program can access exactly once? But that seems difficult and brittle. Is anyone actually doing that?


No, I suppose not, but then, is there really an advantage to doing so?

I'm much more concerned with blocking write and execute access than I am about a potential hacker being able to read the config files of the program they leveraged to get a shell.

I think it's a good approach and part of defense in depth, but if we're comparing approaches I'll take the former every time.


> No, I suppose not, but then, is there really an advantage to doing so?

A massive one; not reading the config is just a consequence of dropping all file system access once you're done with the initial program setup. You can also set up listening sockets, and then stop the program from making any new network connections, even after a compromise. And so on, with most resources. There are a lot of resources that tend to be needed to set up a program. There tend to be very few needed once the program is running.

How would you block all file system access after a program is finished reading its config files, the files they include, and the shared libraries and plugins scattered around the file system? How would you turn off all network access after the listening socket is established?

With Pledge, it's trivial:

    load_config();
    load_plugins();
    open_sockets();
    if (pledge("stdio", NULL) == -1)  /* "stdio" keeps I/O on existing fds; everything else is dropped */
        err(1, "pledge");
    run();
Pledge makes everything you're talking about trivial; the only thing that's needed is for the program to opt in to security with a single pledge call. You don't need to micromanage the permissions and what the program is doing to drop privileges from the outside, with all of the race conditions and fragility that implies.


> There are a lot of resources that tend to be needed to set up a program. There tend to be very few needed once the program is running.

I think this is true for simple programs, and less true the more complex a program is.

What about programs that due to their nature need to frequently make new network connections, or to periodically check config files?

> How would you block all file system access after a program is finished reading its config files, the files they include, and the shared libraries and plugins scattered around the file system? How would you turn off all network access after the listening socket is established?

This could be done with seccomp, although it would be more work than it is to use pledge (although a pledge 'port' also exists); it could also be done with things like SELinux.

> Pledge makes everything you're talking about work trivially,

I was talking about more complete sandboxing, and pledge doesn't allow for that. Pledge is substantially more limited in scope.

> the only thing that's needed is for the program to opt in

That's actually a pretty big issue. If all the software you want to use is in the ports tree I guess it's fine, but what about untrusted or complex code? Say, running an instance of Oracle, or a torrent program that by its nature constantly needs to make network connections and write/read different files? Pledge is little help in these cases, and is especially ineffective as a means of sandboxing such applications.


> I think this is true for simple programs, and less true the more complex a program is.

Chrome and Firefox have both been successfully pledged and unveiled. What programs more complex than them are you considering?


I gave examples at the end of my previous reply.


> Say, running an instance of Oracle, or a torrent program that by its nature constantly needs to make network connections and write/read different files?

Yes, those seem relatively simple to pledge (source availability aside); there are a lot of permissions that they should be able to drop once they decide on, say, where the database lives or what files they're saving to. It gets even better if you're willing to privsep the torrent program, though that could take some refactoring.

Note that you can trivially do a looser sandbox around unmodified processes using exec pledges and unveil, even for proprietary code. These kinds of sandboxes need to be permissive, though, since they're not aware of program phases. So they're not nearly as tight as a sandbox written by the developer with knowledge about expected program behavior.
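
As a sketch of what such a wrapper might look like (every path and promise string here is an assumption; the real set depends on the program and on what the dynamic linker needs):

    #include <err.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        if (argc < 2)
            errx(1, "usage: sandbox /path/to/prog [args]");
        /* expose only what the child should be able to see */
        if (unveil(argv[1], "x") == -1 ||                 /* the binary itself */
            unveil("/usr/libexec", "rx") == -1 ||         /* ld.so */
            unveil("/usr/lib", "r") == -1 ||              /* shared libraries */
            unveil("/home/user/Downloads", "rwc") == -1)  /* its working area */
            err(1, "unveil");
        if (unveil(NULL, NULL) == -1)  /* lock the list; it survives execve */
            err(1, "unveil lock");
        /* first argument: our promises; second: exec pledges for the child */
        if (pledge("stdio exec", "stdio rpath wpath cpath inet dns prot_exec") == -1)
            err(1, "pledge");
        execv(argv[1], argv + 1);
        err(1, "execv");
    }
The child runs unmodified, but it inherits the locked unveil list and the exec pledges, so under these assumed promise sets it can't start new programs or touch anything outside the unveiled paths.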


> It gets even better if you're willing to privsep the torrent program, though that could take some refactoring.

Now you're talking about modifying the code substantially, which is out of scope of the thought experiment.

Pledge can't really help with the torrent program since it needs to make new network connections and write and read arbitrary files constantly. Unless as you say, you substantially modify the code.

If substantially modifying the code is off the table, can you give an example of how pledge can prevent an attacker leveraging an RCE in the torrent program? To what extent would they be restricted? You can't say, limit execution to only certain files/libraries or restrict the ability to delete or overwrite files, right?

> Note that you can trivially do a looser sandbox around unmodified processes using exec pledges and unveil, even for proprietary code. These kinds of sandboxes need to be permissive,

Yeah, I wouldn't consider that to be a sandbox. Imposing limitations on a program isn't by itself a sandbox, nor is every instance of doing so sandboxing.


> Pledge can't really help with the torrent program since it needs to make new network connections and write and read arbitrary files constantly. Unless as you say, you substantially modify the code.

Unveil helps with the "arbitrary files" part. There's a reason Linux is cloning that interface with landlock.
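
Roughly (a sketch; the directory is an assumption), once the unveil list is locked, paths outside it stop resolving at all:

    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* only this subtree stays visible: read, write, create */
        if (unveil("/home/user/Downloads", "rwc") == -1)
            err(1, "unveil");
        if (unveil(NULL, NULL) == -1)  /* lock: no further unveils allowed */
            err(1, "unveil");
        /* anything outside the unveiled tree now fails with ENOENT */
        if (fopen("/etc/passwd", "r") == NULL)
            warn("fopen /etc/passwd");
        return 0;
    }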


> Unveil helps with the "arbitrary files" part.

How? The torrent program needs read and write access to create whatever files it needs to, which can't be predicted ahead of time.

Imagine a worst case scenario for an RCE in a torrent program, and then what is your best case scenario for pledge and unveil being able to confine an attacker?

Because I'm pretty sure it would be a lot less restrictive than what proper sandboxing can provide.

> There's a reason Linux is cloning that interface with landlock.

Sure, because it has advantages as part of defense in depth. I never said it was useless or without value.

Besides that, from memory landlock actually preceded unveil, having started development in 2016, so I don't know that it's fair to say Linux is cloning anything if they had a solution first.


> How? The torrent program needs read and write access to create whatever files it needs to, which can't be predicted ahead of time.

The same way it was handled in Firefox, for example; unveil the output dir. At least my torrent program doesn't shit files all throughout my file system. Maybe yours does?


I meant arbitrary files within the dir. Not including any other dirs/files it has to read. So basically, it's marginally more effective than a chroot, without any real granularity.

Besides, you avoided the hard question:

Imagine a worst case scenario for an RCE in a torrent program, and then what is your best case scenario for pledge and unveil being able to confine an attacker?

Because I'm pretty sure it would be a lot less restrictive than what proper sandboxing can provide.


> Imagine a worst case scenario for an RCE in a torrent program, and then what is your best case scenario for pledge and unveil being able to confine an attacker?

Preventing exfiltration of any data outside of the downloads dir. Preventing execution of new programs. Preventing inspection, tracing, and signaling of existing ones. Preventing mmap of writable executable memory for shell code. And preventing pivoting exploits using system interfaces like vulnerable sysctls, large subsystems like drm, and so on.

This much can be done without touching the program code, or even the binary, at all, using unveil and exec pledges.

If you're willing to refactor the code a bit, you can also prevent new sockets from being opened and new addresses from being listened on if the code doing networking is isolated from the code doing disk I/O.


> Preventing exfiltration of any data outside of the downloads dir.

Except for all the data it needs access to. I'm not so sure torrent programs will continue to function correctly if they can't re-read their config file; in my experience most want access to a temp directory, the ability to run a few external applications like rar or zip, etc. Most torrent programs need access to more than just the directory where downloads end up when complete.

> Preventing execution of new programs.

This gets spicy if the torrent program is written in an interpreted language like Python, no?

I honestly don't have much faith in how far unveil/pledge can restrict in this scenario, but as a result of this discussion I now have an OBSD box again so I can test and play around with it.

> If you're willing to refactor the code a bit

That's beyond the scope of the question. It's bad enough that there is no mechanism to sandbox binaries where you don't have access to the code; talking about rewriting programs to solve the issue is some Kobayashi Maru nonsense.


Oh, look who's here, Ori!! How is gefs going, and will we have it in the next 9front release?


With small projects, websites tend to be updated infrequently. On the other hand, there seem to have been 11 commits to the main project repo today: https://groups.google.com/a/hardenedbsd.org/g/src-commits-al...


These are all, or mostly all, just automated syncs from upstream FreeBSD.


In some ways, it seems surprising that avoiding open/read/write syscalls and caching in user space is only 35% faster.


My hypothesis is that the scenarios the SQLite team has set up don't particularly exploit the fact that the SQLite metrics become more favorable the more files are accessed.


I took a look at the code, and the file based code is basically the worst case for the file system, accessing about as many files as it can.

Additionally, it's doing a lot of unnecessary work. It's using buffered I/O to do a full file read, and throwing away the buffer immediately, which is an expensive way to add an unnecessary memcpy. It's opening by full path instead of openat, which incurs a path lookup penalty.

I think the file based code can be quite a bit faster, if you put a bit of effort in.

I think in serious use, the gap would narrow, not grow.
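
A sketch of the cheaper pattern (names and layout are assumptions): openat skips the repeated full-path walk, and reading into a caller-owned, reused buffer avoids stdio's extra copy:

    #include <fcntl.h>
    #include <unistd.h>

    /* read one small file from an already-open directory fd into a reused buffer */
    ssize_t read_blob(int dirfd, const char *name, char *buf, size_t cap) {
        int fd = openat(dirfd, name, O_RDONLY);  /* relative lookup, no full path walk */
        if (fd == -1)
            return -1;
        ssize_t n = read(fd, buf, cap);  /* straight into the caller's buffer, no extra memcpy */
        close(fd);
        return n;
    }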


Git does both. When you create a commit, it stores a full (zipped) copy of the object, without any deltas.

Periodically (I believe it used to be every thousand commits, though I'm not sure what the heuristic is today), git will take the loose objects and compress them into a pack.

The full blob format is how objects are manipulated by git internally: objects need to be extracted from the blob, with all deltas applied, before anything useful can be done with them.

It's also worth noting that accessing a deltified object is slow (O(n) in the number of deltas), so the length of the delta chain is limited. Because deltification is really just a compression format, it doesn't matter how or where the deltas are done -- the trivial "no deltas" option will work just fine if you want to implement that.

You can trivially verify this by creating commits and looking in '.git/objects/*' for loose objects, running 'git repack', and then looking in '.git/objects/pack' for the deltified packs.


Note: this breaks if you want to pass struct literals:

   plot((myfoo){x,y})
Macros will take the struct literals as multiple parameters:

    plot(
      .arg0=(myfoo){x,
      .arg1=y}
    )
C macros are best left unused when possible.


Nope! In general, that can be a problem, but not for this specific technique:

  $ cpp -P
  void plot(float xlo, float xhi, float ylo, float yhi, float xinc, float yinc);
  struct plot_a { float xlo, xhi, ylo, yhi, xinc, yinc; };
  static inline void plot_i(struct plot_a _a) {
      // inline thunk to allow arguments to go in registers
      plot(_a.xlo, _a.xhi, _a.ylo, _a.yhi, _a.xinc, _a.yinc);
  }
  #define plot(...) (plot_i((struct plot_a){ __VA_ARGS__ }))
  
  plot((myfoo){x,y})
  plot(.yinc=(myfoo){x,y})
  ^D
  [...]
  (plot_i((struct plot_a){ (myfoo){x,y} }))
  (plot_i((struct plot_a){ .yinc=(myfoo){x,y} }))
You could argue this is excessively clever, but when you need it, you really need it, so it could deserve known idiom status in the right situation.


> You could argue this is excessively clever, but when you need it, you really need it

I've probably written millions of lines of C so far, and I don't think I have ever needed it.


The usual workarounds are a stateful API (e.g. Cairo, OpenGL, or Windows GDI), passing a structure explicitly (oodles of examples in Win32, e.g. RegisterClass or GetOpenFileName), or twiddling an object that's actually just a structure dressed up in accessor methods (IFileOpenDialog).

There could be reasons to use one of those still (e.g. extensibility while keeping a compatible ABI, as in setsockopt, pthread_attr_*, or arguably posix_spawnattr_*). But sometimes you really do need a finite, well-known but just plain large number of parameters that mostly have reasonable defaults. Old-style 2D APIs to draw and/or stroke a shape (or even just a rectangle) are the classic example. Plotting libraries (in all languages) are also prone to this. It does seem like these situations are mostly endemic to specific application areas—graphics of all kinds first of all—but that doesn’t make them not exist.

If you don’t want to use function-like macros for anything ever even if this particular one works, that’s a valid position. But it does work, it does solve a real problem, and it is less awkward at the use site than the alternatives.


With large numbers of parameters, it's almost always more readable to use a config struct, especially since you often want to collect configuration from multiple sources, and incrementally initializing a struct supports that.
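
A minimal sketch of that pattern (all names here are illustrative):

    #include <stdio.h>

    struct plot_cfg {
        float xlo, xhi, ylo, yhi, xinc, yinc;
    };

    static void plot_cfgd(const struct plot_cfg *c) {
        printf("x:[%g,%g] y:[%g,%g]\n", c->xlo, c->xhi, c->ylo, c->yhi);
    }

    int main(void) {
        /* designated initializers name the fields; unmentioned ones default to 0 */
        struct plot_cfg cfg = { .xlo = 0, .xhi = 10, .xinc = 0.1f, .yinc = 0.1f };
        cfg.yhi = 5;  /* later sources (flags, env, config file) can layer on incrementally */
        plot_cfgd(&cfg);
        return 0;
    }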


> C macros are best left unused when possible.

Blame the committee for failing to specify an obvious and widely-demanded feature like named parameters.

The only explanation is that the people in charge of the language don't write much code.


There are a lot of syntactic sugar improvements the committee could make that they simply refuse to. Named parameters and better function pointer syntax are compile-time fixes that would have zero runtime cost, yet it's 2024 and we've hardly budged from ANSI C.


Exactly. I actually think default parameters are hazardous without named-parameter support. When they added one, IMO they should have added the other as well, so that you can specify exactly which non-default parameters you're passing.


I think this is more an appeasement of the C++ committee because they don't like the order of evaluation to be ambiguous when constructors with side effects come into play. Witness how they completely gimped the primary utility of designated initializers with the requirement to have the fields in order.


Or they do, and they don't want to learn new things, since "everything can be done the old way anyway".


Say that to the Linux kernel, or to any embedded system.


I've written both kernel code and embedded systems. It's easier to maintain the code when the preprocessor is avoided.

