It's freeware under an open-source license. Really misleading.
It looks like something you should stay away from unless you need it REALLY badly. It's a proprietary product with unknown pricing and no indication of what their plans are.
Does the fact that the binary is BSD licensed allow reverse-engineering?
High quality code isn't just code that performs well when executed, but also code that is readable, understandable and maintainable. You can't conclude that code is high quality from the compiled result just because that result works well.
> One could also say that quality is related to the functional output.
Right, I said nothing that contradicts that ("High quality code isn't just code that performs well when executed, but also ..."). High quality functional output is a necessary requirement, but it isn't sufficient to determine if code is high quality.
It's not really subjective if you're at all reasonable about it.
Imagine writing a very good program, running it through an obfuscator, and throwing away the original code. Is the obfuscated code "high quality code" now, because the output of the compilation still works as before?
Again it depends what you mean by "high quality code".
Do you mean how well it was written, or do you mean how well it performs? Or do both matter? Equally, or one more/less than the other?
It probably depends on whether you're the developer taking over the codebase, or the customer running the code in production.
Take video games. A lot of that code is messy spaghetti C++, neither modular nor well structured, full of hacks and manual optimizations to squeeze the best possible performance out of the available hardware.
It might be impossible to parse or maintain, but it does the job about as well as possible, which is really all that matters to the end user. I would call that high quality code.
It's worse than that. LLMs have been tuned carefully to mostly produce output that will be inoffensive in a corporate environment. This isn't an unbiased sampling.
No. The censoring has already been done systematically by tech corporations at the behest of political agents that have power over them.
You only have to look at opinions about COVID policies to realize you won't get a good representation, because opinions will be deemed "misinformation" by the powers with a vested interest in that being the case. Increasingly, criticism of government policy can be conflated with some sort of crime whose definition is entirely up to the interpretation of some government institution, so people self-censor, companies censor just in case, and the Overton window gets narrower.
LLMs are awesome but they will only represent what they're trained on and what they're trained on only represents what's allowed to be in the mainstream discourse.
> LLMs are awesome but they will only represent what they're trained on and what they're trained on only represents what's allowed to be in the mainstream discourse.
I don't think this is a description of LLM censorship, though, especially in light of the fact that many LLMs are fine-tuned for the explicit purpose of censoring responses the model could otherwise generate. Contrasting uncensored models with censored ones shows that the uncensored ones objectively produce uncensored results.
For code injection into applications that don't load third-party DLLs as plugins, see, e.g., Microsoft's (unsupported) toolkit for runtime API interception:
Install TortoiseSVN or something similar, then look at the explorer.exe process, or any process that uses a standard "Open File" dialog, and you will see a DLL from the utility loaded into it. (Easy to see with Process Explorer from Sysinternals.)
I think TortoiseSVN and its ilk are "just" shell extensions, though, which is an officially supported concept, even if it means that potentially any random software using the standard file dialogs ends up loading your DLL, too.
Some of the brightest minds of our generation have been wasted on optimizing how to put personalized advertisements in front of as many eyeballs as possible, and the true tragedy is that they didn't have a more efficient language than JavaScript to do it.
It really doesn't matter. Advertising is the debt collection of software: it's the shit nobody wants to do. As a result, these people tend to make more and be pampered, yet the output is generally still garbage and the developers are still depressed. Compare that with gaming, where the developers are worked to death and generally underpaid, but they love what they do and have a great time building amazing things.
It’s funny you mention this, because I just visited space.com for the black hole article (no adblocker on this device), and I could feel the device getting warm; it literally lost 3% of its battery.
Most programs require a setup phase, where they want a great deal of access to the environment to set up the resources they use, and a steady state phase, where they need very little access beyond pre-opened file descriptors.
An externally imposed sandboxing feature can be useful for namespacing, but is necessarily less restrictive than pledge and unveil. For example, in steady state on OpenBSD, most programs can't even read their own configs.
> An externally imposed sandboxing feature can be useful for namespacing, but is necessarily less restrictive than pledge and unveil.
An externally imposed sandboxing feature isn't necessarily less restrictive than pledge and unveil at all, although I'm curious why you think that is the case.
To say nothing of the fact that pledge and unveil are wholly dependent on developer opt-in.
A good sandboxing solution should be robust and not require the cooperation of the programs that require sandboxing.
Those are two different kinds of sandboxing. One protects the application from itself: "here is what I use; if I try anything else, it's a bug".
The sandboxing you're talking about protects the system from the application. You really need both.
re: restrictiveness
With external sandboxing, you have to restrict the application to the common denominator of every state you expect to see it in. An internal sandbox can tighten itself as it goes.
> An externally imposed sandboxing feature isn't necessarily less restrictive than pledge and unveil at all, although I'm curious why you think that is the case.
With pledge, you can read config files, open a file handle to your logs, and then completely drop the ability to open() files at all for the remaining lifetime of the program. How would you do that from outside the program?
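For concreteness, a minimal C sketch of that pattern, assuming OpenBSD's pledge(2); `setup_service` and `enter_steady_state` are hypothetical names. The config is read and the log opened during setup, then the process pledges down to "stdio", which keeps already-open descriptors usable while making later attempts to open files fatal (see pledge(2) for the exact rules). On other systems the pledge call is compiled out, so the sketch just runs unrestricted.

```c
#include <stdio.h>
#ifdef __OpenBSD__
#include <unistd.h> /* pledge(2) */
#endif

/* Drop to the steady-state promise set. With only "stdio", descriptors
 * that are already open stay usable, but later attempts to open files get
 * the process killed by the kernel. On non-OpenBSD systems this sketch
 * does nothing and reports success. */
static int enter_steady_state(void) {
#ifdef __OpenBSD__
    return pledge("stdio", NULL);
#else
    return 0;
#endif
}

/* Setup phase: read the config and open the log while the file system is
 * still reachable, then give up the ability to open anything else.
 * `config` receives at most config_cap - 1 bytes plus a NUL, so
 * config_cap must be at least 1. Returns the open log stream, or NULL. */
FILE *setup_service(const char *config_path, const char *log_path,
                    char *config, size_t config_cap) {
    FILE *cf = fopen(config_path, "r");
    if (cf == NULL)
        return NULL;
    size_t n = fread(config, 1, config_cap - 1, cf);
    config[n] = '\0';
    fclose(cf);

    FILE *log_fp = fopen(log_path, "a");
    if (log_fp == NULL)
        return NULL;

    if (enter_steady_state() == -1) {
        fclose(log_fp);
        return NULL;
    }
    return log_fp; /* writing to this stream still works after the pledge */
}
```

In a real daemon the promise string would also carry whatever the steady state still needs, e.g. "inet" if it keeps calling accept(2) on a TCP listening socket.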
I guess technically you could write a dynamic policy that... I guess you'd give it a list of files that the program can access exactly once? But that seems difficult and brittle. Is anyone actually doing that?
No, I suppose not, but then, is there really an advantage to doing so?
I'm much more concerned with blocking write and execute access than I am about a potential hacker being able to read the config files of the program they leveraged to get a shell.
I think it's a good approach and part of defense in depth, but if we're comparing approaches, I'll take the former every time.
> No, I suppose not, but then, is there really an advantage to doing so?
A massive one; not reading the config is just a consequence of dropping all file system access once you're done with the initial program setup. You can also set up listening sockets, and then stop the program from making any new network connections, even after a compromise. And so on, with most resources. There are a lot of resources that tend to be needed to set up a program. There tend to be very few needed once the program is running.
How would you block all file system access after a program is finished reading its config files, the files they include, and the shared libraries and plugins scattered around the file system? How would you turn off all network access after the listening socket is established?
Pledge makes everything you're talking about work trivially, the only thing that's needed is for the program to opt in to security with one line of code. You don't need to micromanage the permissions and what the program is doing to drop privileges from the outside, with all of the race conditions and fragility that implies.
> There are a lot of resources that tend to be needed to set up a program. There tend to be very few needed once the program is running.
I think this is true for simple programs, and less true the more complex a program is.
What about programs that due to their nature need to frequently make new network connections, or to periodically check config files?
> How would you block all file system access after a program is finished reading its config files, the files they include, and the shared libraries and plugins scattered around the file system? How would you turn off all network access after the listening socket is established?
This could be done with seccomp, although it would take more work than using pledge (a pledge 'port' also exists). It could also be done with something like SELinux.
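Roughly what the seccomp route looks like, as a Linux-only sketch: the hypothetical `demo_seccomp_strict` opens a file during setup, switches on seccomp strict mode with prctl(2), keeps reading the descriptor it already holds, and is killed when it tries to open anything new. Strict mode is the bluntest option (only read, write, _exit and sigreturn are allowed); a real service would instead install a seccomp-BPF filter listing the syscalls it still needs, and writing that filter is the extra work compared to a one-line pledge.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/seccomp.h>

/* Forks a child that (1) opens `path` during setup, (2) enters seccomp
 * strict mode, (3) reads from the descriptor it already holds, and
 * (4) tries to open `path` again. Step 4 is not on the strict-mode
 * allowlist, so the kernel kills the child with SIGKILL.
 * Returns the signal that killed the child, 0 if it exited normally,
 * or -1 if fork/waitpid failed. */
int demo_seccomp_strict(const char *path) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDONLY);      /* setup phase: still allowed */
        if (fd < 0)
            _exit(2);
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(3);
        char c;
        ssize_t r = read(fd, &c, 1);        /* pre-opened fd: allowed */
        (void)r;
        int again = open(path, O_RDONLY);   /* forbidden: SIGKILL here */
        (void)again;
        _exit(0);                           /* not reached */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling `demo_seccomp_strict("/dev/null")` should report SIGKILL.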
> Pledge makes everything you're talking about work trivially,
I was talking about more complete sandboxing, and pledge doesn't allow for that. Pledge is substantially more limited in scope.
> the only thing that's needed is for the program to opt in
That's actually a pretty big issue. If all the software you want to use is in the ports tree, I guess it's fine, but what about untrusted or complex code? Say, running an instance of Oracle, or a torrent program that by its nature constantly needs to make network connections and write/read different files? Pledge is of little help in these cases, and especially ineffective as any attempt at sandboxing such applications.
> Say, running an instance of Oracle, or a torrent program that by its nature constantly needs to make network connections and write/read different files?
Yes, those seem relatively simple to pledge (source availability aside); there are a lot of permissions that they should be able to drop once they decide on, say, where the database lives or what files they're saving to. It gets even better if you're willing to privsep the torrent program, though that could take some refactoring.
Note that you can trivially do a looser sandbox around unmodified processes using exec pledges and unveil, even for proprietary code. These kinds of sandboxes need to be permissive, though, since they're not aware of program phases. So they're not nearly as tight as a sandbox written by the developer with knowledge about expected program behavior.
> It gets even better if you're willing to privsep the torrent program, though that could take some refactoring.
Now you're talking about modifying the code substantially which is out of scope of the thought experiment.
Pledge can't really help with the torrent program since it needs to make new network connections and write and read arbitrary files constantly. Unless as you say, you substantially modify the code.
If substantially modifying the code is off the table, can you give an example of how pledge can prevent an attacker from leveraging an RCE in the torrent program? To what extent would they be restricted? You can't, say, limit execution to only certain files/libraries or restrict the ability to delete or overwrite files, right?
> Note that you can trivially do a looser sandbox around unmodified processes using exec pledges and unveil, even for proprietary code. These kinds of sandboxes need to be permissive,
Yeah, I wouldn't consider that to be a sandbox. Imposing limitations on a program isn't by itself a sandbox, nor is every instance of doing so sandboxing.
> Pledge can't really help with the torrent program since it needs to make new network connections and write and read arbitrary files constantly. Unless as you say, you substantially modify the code.
Unveil helps with the "arbitrary files" part. There's a reason Linux is cloning that interface with landlock.
How? The torrent program needs read and write access to create whatever files it needs to, which can't be predicted ahead of time.
Imagine a worst case scenario for an RCE in a torrent program, and then what is your best case scenario for pledge and unveil being able to confine an attacker?
Because I'm pretty sure it would be a lot less restrictive than what proper sandboxing can provide.
> There's a reason Linux is cloning that interface with landlock.
Sure, because it has advantages as part of defense in depth. I never said it was useless or without value.
Besides that, from memory, Landlock actually preceded unveil, having started development in 2016, so I don't know that it's fair to say Linux is cloning anything if they had a solution first.
> How? The torrent program needs read and write access to create whatever files it needs to, which can't be predicted ahead of time.
The same way it was handled in Firefox, for example; unveil the output dir. At least my torrent program doesn't shit files all throughout my file system. Maybe yours does?
I meant arbitrary files within the dir. Not including any other dirs/files it has to read. So basically, it's marginally more effective than a chroot, without any real granularity.
Besides, you avoided the hard question:
Imagine a worst case scenario for an RCE in a torrent program, and then what is your best case scenario for pledge and unveil being able to confine an attacker?
Because I'm pretty sure it would be a lot less restrictive than what proper sandboxing can provide.
> Imagine a worst case scenario for an RCE in a torrent program, and then what is your best case scenario for pledge and unveil being able to confine an attacker?
Preventing exfiltration of any data outside of the downloads dir. Preventing execution of new programs. Preventing inspection, tracing, and signaling of existing ones. Preventing mmap of writable executable memory for shell code. And preventing pivoting exploits using system interfaces like vulnerable sysctls, large subsystems like drm, and so on.
This much can be done without touching the program code, or even binary, at all, using unveil and exec pledges.
If you're willing to refactor the code a bit, you can also prevent new sockets from being opened and new addresses from being listened on if the code doing networking is isolated from the code doing disk I/O.
> Preventing exfiltration of any data outside of the downloads dir.
Except for all the data it needs access to. I'm not so sure torrent programs will continue to function correctly if they can't re-read their config file; in my experience most want access to a temp directory, the ability to run a few external applications like rar or zip, etc. Most torrent programs need access to more than just the directory where downloads end up when complete.
> Preventing execution of new programs.
This gets spicy if the torrent program is written in an interpreted language like python, no?
I honestly don't have much faith in how far unveil/pledge can restrict in this scenario, but as a result of this discussion I now have an OBSD box again so I can test and play around with it.
> If you're willing to refactor the code a bit
That's beyond the scope of the question. It's bad enough that there is no mechanism to sandbox binaries when you don't have access to the code; talking about rewriting programs to solve the issue is some Kobayashi Maru nonsense.
My hypothesis is that the scenarios the SQLite team has setup don’t particularly exploit the fact that the SQLite metrics become more favorable the more files are accessed.
I took a look at the code, and the file based code is basically the worst case for the file system, accessing about as many files as it can.
Additionally, it's doing a lot of unnecessary work. It's using buffered I/O to do a full file read, and throwing away the buffer immediately, which is an expensive way to add an unnecessary memcpy. It's opening by full path instead of openat, which incurs a path lookup penalty.
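A sketch of the two fixes described above, assuming each blob lives in its own file inside one directory that is opened once up front; `read_blob_at` is a hypothetical helper, not code from the SQLite benchmark. It resolves a single path component with openat(2) and reads straight into the buffer it returns, so there is no stdio buffer and no extra memcpy.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reads the whole file `name`, resolved relative to the already-open
 * directory descriptor `dirfd`, into a new malloc'd buffer. Only one path
 * component is looked up per call, and the data is copied once, from the
 * kernel directly into the buffer handed back to the caller.
 * On success stores the byte count in *len; returns NULL on error. */
char *read_blob_at(int dirfd, const char *name, size_t *len) {
    int fd = openat(dirfd, name, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return NULL;
    }
    size_t size = (size_t)st.st_size;
    char *buf = malloc(size > 0 ? size : 1);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }
    size_t got = 0;
    while (got < size) {
        ssize_t r = read(fd, buf + got, size - got);
        if (r < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            free(buf);
            close(fd);
            return NULL;
        }
        if (r == 0)
            break; /* file shrank after fstat */
        got += (size_t)r;
    }
    close(fd);
    *len = got;
    return buf;
}
```

Callers open the directory once with `open(dir, O_RDONLY | O_DIRECTORY)` and reuse that descriptor for every lookup.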
I think the file based code can be quite a bit faster, if you put a bit of effort in.
I think in serious use, the gap would narrow, not grow.
Git does both. When you create a commit, it stores a full (zlib-compressed) copy of each new object, without any deltas.
Periodically (I believe it used to be every thousand commits, though I'm not sure what the heuristic is today), git will take the loose objects and compress them into a pack.
The full blob format is how objects are manipulated by git internally: to do anything useful, the objects need to be extracted from the blob, with all deltas applied, before anything can be done with them.
It's also worth noting that accessing a deltified object is slow (O(n) in the number of deltas), so the length of the delta chain is limited. Because deltification is really just a compression format, it doesn't matter how or where the deltas are done; the trivial "no deltas" option will work just fine if you want to implement that.
You can trivially verify this by creating commits and looking in '.git/objects/*' for loose objects, running 'git repack', and then looking in '.git/objects/pack' for the deltified packs.
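For reference, the loose-object layout that this check exposes: git hashes and zlib-compresses a header of the form `<type> <size>` plus a NUL byte, followed by the content, and stores the result under `.git/objects/`, using the first two hex digits of the object id as a directory name. This C sketch only builds the header and the path; hashing and compression are left out so it needs nothing beyond the standard library, and the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Builds the header git prepends to an object's content before hashing
 * and zlib-compressing it as a loose object: "<type> <decimal size>"
 * followed by a NUL byte. Returns the header length including that NUL,
 * or 0 if `out` is too small. */
size_t loose_object_header(const char *type, size_t content_len,
                           char *out, size_t cap) {
    int n = snprintf(out, cap, "%s %zu", type, content_len);
    if (n < 0 || (size_t)n + 1 > cap)
        return 0;
    return (size_t)n + 1; /* snprintf has already written the NUL */
}

/* Maps a 40-hex-digit object id to its loose-object path: the first two
 * hex digits name the fan-out directory, the remaining 38 the file.
 * Returns 0 on success, -1 if the id is malformed or `out` is too small. */
int loose_object_path(const char *hex_id, char *out, size_t cap) {
    if (strlen(hex_id) != 40)
        return -1;
    int n = snprintf(out, cap, ".git/objects/%.2s/%s", hex_id, hex_id + 2);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

`git cat-file -t <id>` and `git cat-file -s <id>` report the type and size that appear in that header.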
The usual workarounds are a stateful API (e.g. Cairo, OpenGL, or Windows GDI), passing a structure explicitly (oodles of examples in Win32, e.g. RegisterClass or GetOpenFileName), or twiddling an object that’s actually just a structure dressed up in accessor methods (IOpenFileDialog).
There could be reasons to use one of those still (e.g. extensibility while keeping a compatible ABI, as in setsockopt, pthread_attr_*, or arguably posix_spawnattr_*). But sometimes you really do need a finite, well-known but just plain large number of parameters that mostly have reasonable defaults. Old-style 2D APIs to draw and/or stroke a shape (or even just a rectangle) are the classic example. Plotting libraries (in all languages) are also prone to this. It does seem like these situations are mostly endemic to specific application areas—graphics of all kinds first of all—but that doesn’t make them not exist.
If you don’t want to use function-like macros for anything ever even if this particular one works, that’s a valid position. But it does work, it does solve a real problem, and it is less awkward at the use site than the alternatives.
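The macro under discussion isn't reproduced here, but a common form of the trick pairs a parameter struct with a variadic macro that places the caller's designated initializers after the defaults inside a compound literal. The sketch assumes a hypothetical rectangle-stroking API (`struct rect_stroke`, `stroke_rect_impl`, `stroke_rect`); the body only computes perimeter times line width so the effect of defaults and overrides can be checked.

```c
/* Hypothetical stroke parameters; every field has a usable default. */
struct rect_stroke {
    double x, y, w, h;
    double line_width;
    unsigned rgb;
    int dashed;
};

/* The real function takes the whole struct by value. Here it returns the
 * "ink" a stroke would use (perimeter times line width), purely so the
 * effect of defaults and overrides is observable. */
double stroke_rect_impl(struct rect_stroke a) {
    return 2.0 * (a.w + a.h) * a.line_width;
}

/* Defaults come first in the compound literal; designators supplied by
 * the caller come later and override them, because a later designated
 * initializer for the same member replaces an earlier one. Members not
 * mentioned at all are zero-initialized. */
#define stroke_rect(...) \
    stroke_rect_impl((struct rect_stroke){ .line_width = 1.0, .rgb = 0x000000u, __VA_ARGS__ })
```

A call site then reads `stroke_rect(.w = 10, .h = 5, .line_width = 2.0)`, in any field order. GCC's `-Woverride-init` (enabled by `-Wextra`) will typically warn whenever a call overrides a default, so code using this pattern usually disables that warning.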
With large numbers of parameters, it's almost always more readable to use a config struct. Especially since often, you want to collect configuration from multiple sources, and incrementally initializing a struct that way is helpful.
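A minimal sketch of that incremental style, with a hypothetical `struct plot_config`: start from defaults, then feed each configuration source through the same setter in increasing order of precedence (config file lines first, command-line arguments last).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical plot configuration; each field has a sensible default. */
struct plot_config {
    int width, height;
    const char *title;
    int grid;
};

struct plot_config plot_config_defaults(void) {
    return (struct plot_config){ .width = 640, .height = 480, .title = "", .grid = 0 };
}

/* Applies one "key=value" setting on top of whatever is already in `c`.
 * Every source (config file lines, environment, argv) can go through
 * this, lowest precedence first. `title` points into `kv`, so `kv` must
 * outlive `c`. Returns 0 on success, -1 for a malformed or unknown key. */
int plot_config_apply(struct plot_config *c, const char *kv) {
    const char *eq = strchr(kv, '=');
    if (eq == NULL)
        return -1;
    size_t klen = (size_t)(eq - kv);
    const char *val = eq + 1;
    if (klen == 5 && strncmp(kv, "width", 5) == 0)
        c->width = atoi(val);
    else if (klen == 6 && strncmp(kv, "height", 6) == 0)
        c->height = atoi(val);
    else if (klen == 5 && strncmp(kv, "title", 5) == 0)
        c->title = val;
    else if (klen == 4 && strncmp(kv, "grid", 4) == 0)
        c->grid = atoi(val) != 0;
    else
        return -1;
    return 0;
}
```

Because `plot_config_apply` only touches the key it is given, adding another source such as the environment is just one more loop in the caller.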
There are a lot of syntactic-sugar improvements the committee could make but simply refuses to. Named parameters and a cleaner function pointer syntax would be compile-time fixes with zero runtime cost, yet it's 2024 and we've hardly budged from ANSI C.
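To illustrate the function pointer complaint: the C standard declares `signal` as `void (*signal(int sig, void (*func)(int)))(int);`, and the only relief the language offers today is a typedef. A small sketch with hypothetical names, declaring the same kind of function both ways:

```c
/* Both declarations describe a function that takes an operator character
 * and returns a pointer to a function of type int (int, int). */

/* Raw declarator syntax: the return type wraps around the parameter list. */
int (*pick_op_raw(char op))(int, int);

/* The usual workaround: name the pointer type with a typedef first. */
typedef int (*binop)(int, int);
binop pick_op(char op);

static int op_add(int a, int b) { return a + b; }
static int op_mul(int a, int b) { return a * b; }

int (*pick_op_raw(char op))(int, int) {
    return op == '*' ? op_mul : op_add;
}

binop pick_op(char op) {
    return pick_op_raw(op); /* both spellings denote the same type */
}
```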
Exactly. I actually think default parameters are hazardous without named-parameter support. When they added one, IMO they should have added the other as well, so that you can specify exactly which non-default parameters you're passing.
I think this is more an appeasement of the C++ committee because they don't like the order of evaluation to be ambiguous when constructors with side effects come into play. Witness how they completely gimped the primary utility of designated initializers with the requirement to have the fields in order.