It could be improved even more, performance-wise. The potentially expensive modulo could be avoided entirely with an if statement. Or, if you only allow powers of 2 for the capacity, you can replace it with bitwise ops.
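A minimal sketch of the power-of-2 variant (the class and field names here are mine, just for illustration):

    class RingBuffer:
        """Fixed-capacity ring buffer; capacity must be a power of two."""

        def __init__(self, capacity):
            assert capacity > 0 and capacity & (capacity - 1) == 0
            self.buf = [None] * capacity
            self.mask = capacity - 1   # e.g. capacity 8 -> mask 0b111
            self.head = 0

        def push(self, item):
            # Bitwise AND wraps the index without a modulo (or a branch).
            self.buf[self.head & self.mask] = item
            self.head += 1

The if-statement variant is simply "index += 1" followed by "if index == capacity: index = 0", which also avoids the division at the cost of a branch.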
I refuse to believe they didn’t notice a key difference in the phones - the iPhone 16 will feel much, much snappier. Games aside, just opening any old app will be much faster. I notice this every time I upgrade, even when it’s just going up 3 versions, not 8.
When I start a lot of projects and only get halfway through them, I feel overwhelmed and frustrated by all the loose ends. I like to see projects through to release if I think they’re worth it, but that also requires a bit of self-imposed discipline. I don’t think there’s any shame in having a lot of half-finished projects, but I find more happiness in pushing myself to finish at least some of them.
> I feel overwhelmed and frustrated by all the loose ends.
I think the article is trying to help with exactly that: it's telling people to give themselves permission to abandon half-finished projects without guilt or frustration - so that they won't be immobilized by expecting those feelings every time they think about dabbling in something new.
I ask myself the question: could this be of constructive use to other people? If so, there's a moral obligation to keep at it, despite the drudgery that comes with it.
The spec is very large, not particularly well written, and is not “total” (in the sense that AMD64 and IA32e and other x86-64 flavors are all subtly different). There are a lot of ways to get it wrong; even XED (the reference decoder from Intel) has bugs.
If I remember correctly, the Intel SDM alone is over 3,000 pages long.
lol, no. For one, Capstone has a lot of bugs (it uses some old version of LLVM as its base), but the whole question of how to decode things is complicated because there are a lot of pitfalls and inconsistencies that different disassemblers handle differently. And what the hardware does is a different question entirely: it may not match the spec, or even other processors with the same ISA.
Moving Linux and its associated projects to GitHub would definitely help. I don't buy the arguments that GitHub isn't suitable for it, e.g. that it doesn't allow each sub-project to have its own area. Things like labels help a lot to separate issues and PRs based on which sub-project they belong to.
Totally. Microsoft is surely a benevolent entity to host the Linux project. They've proven to be excellent stewards of open source with their Copilot project.
ASAN also checks for memory leaks like Valgrind does; the main difference between the tools is whether you can recompile all of the libraries to get compiler support for detection, or whether binary instrumentation is the better fit (https://github.com/google/sanitizers/wiki/AddressSanitizerLe...)
When building something that I want to run on both CPU and GPU, depending on what's available, I've found it much easier to use PyTorch than some combination of NumPy and CuPy. I don't have to fiddle around with globally replacing numpy.* with cupy.*, and PyTorch has very nearly all the functions that those libraries have.
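For example, the same code can target either device just by switching the device argument (a small sketch of my own, nothing fancy):

    import torch

    # Use the GPU when one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1, device=device)

    # The same linear-algebra calls run unchanged on either device.
    x = torch.linalg.solve(a @ a.T + 1024.0 * torch.eye(1024, device=device), b)
    print(x.device)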
Interesting. Any links to examples or docs on how to use PyTorch as a general linear algebra library for this purpose? Like a “SciPy to PyTorch” transition guide if I want to do the same?
This looks similar to Triton; I wonder what it does differently. In any case, for any of these libraries, it would be awesome if they could output object files with PTX or SASS code. Then the result could be linked into a binary instead of needing a Python environment to run it.
Warp outputs its intermediate CUDA (GPU) or C++ (CPU) source files, which can be compiled and linked into a binary. Here is an old example of mine calling Warp kernels from C++: https://github.com/erwincoumans/warp_cpp
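For context, a Warp kernel is just annotated Python; a minimal sketch (kernel and array names are mine) of the kind of source Warp translates into those CUDA/C++ files:

    import warp as wp

    wp.init()

    @wp.kernel
    def add_scalar(a: wp.array(dtype=float), s: float):
        i = wp.tid()        # one thread per array element
        a[i] = a[i] + s

    n = 1024
    a = wp.zeros(n, dtype=float, device="cuda")
    wp.launch(add_scalar, dim=n, inputs=[a, 2.0], device="cuda")
    print(a.numpy()[:4])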
Triton offers broad GPU support for writing high throughput kernels. Some higher level ML/AI tools, such as PyTorch, can use Triton internally. I don’t know off the top of my head if any simulation libraries do.
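To give a flavor of it, the canonical Triton vector-add looks roughly like this (essentially the first tutorial example, lightly abbreviated):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)                     # which block this program instance handles
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements                     # guard the tail of the array
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    n = 98432
    x = torch.rand(n, device="cuda")
    y = torch.rand(n, device="cuda")
    out = torch.empty_like(x)
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)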