It could be improved even more, performance-wise. The potentially expensive modulo could be avoided entirely with an if statement. Or, if you only allow powers of 2 for the capacity, you can replace it with bitwise ops.
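A minimal sketch of the power-of-2 variant (the class and field names here are mine, just for illustration):

    class RingBuffer:
        """Fixed-capacity ring buffer; capacity must be a power of two."""

        def __init__(self, capacity):
            assert capacity > 0 and capacity & (capacity - 1) == 0
            self.buf = [None] * capacity
            self.mask = capacity - 1   # e.g. capacity 8 -> mask 0b111
            self.head = 0

        def push(self, item):
            # Bitwise AND wraps the index without a modulo (or a branch).
            self.buf[self.head & self.mask] = item
            self.head += 1

The if-statement variant is simply "index += 1" followed by "if index == capacity: index = 0", which also avoids the division at the cost of a branch.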
I refuse to believe they didn’t notice a key difference in the phones - the iPhone 16 will feel much, much snappier. Games aside, just opening any old app will be much faster. I notice this every time I upgrade, even when it’s just going up 3 versions, not 8.
When I start a lot of projects and only get halfway through them, I feel overwhelmed and frustrated by all the loose ends. I like to see projects through to release if I think they’re worth it, but that also requires a bit of self-imposed discipline. I don’t think there’s any shame in having a lot of half-finished projects, but I find more happiness in pushing myself to finish at least some of them.
> I feel overwhelmed and frustrated by all the loose ends.
I think the article is trying to help with exactly that: it's telling people to give themselves permission to abandon half-finished projects without guilt or frustration - so that they won't be immobilized by expecting those feelings every time they think about dabbling in something new.
I ask myself the question: could this be of constructive use to other people? If so, there's a moral obligation to keep at it, despite the drudgery that comes with it.
The spec is very large, not particularly well written, and is not “total” (in the sense that AMD64 and IA32e and other x86-64 flavors are all subtly different). There are a lot of ways to get it wrong; even XED (the reference decoder from Intel) has bugs.
If I remember correctly, the Intel SDM alone is over 3,000 pages long.
lol, no. For one, Capstone has a lot of bugs (it uses some old version of LLVM as its base), but the whole question of how to decode things is complicated because there are a lot of pitfalls and inconsistencies that different disassemblers handle differently. And what the hardware does is a different question entirely: it may not match the spec, or even other processors with the same ISA.
Moving Linux and its associated projects to GitHub would definitely help. I don't buy the arguments that GitHub isn't suitable for it, e.g. that it doesn't allow each sub-project to have its own area. Things like labels help a lot to separate issues and PRs based on which sub-project they belong to.
Totally. Microsoft is surely a benevolent entity to host the Linux project. They've proven to be excellent stewards of open source with their Copilot project.
ASAN also checks for memory leaks like Valgrind does; the main difference between the tools is whether you can recompile all of the libraries to get compiler support for detection, or whether binary instrumentation is the better fit (https://github.com/google/sanitizers/wiki/AddressSanitizerLe...)
When building something that I want to run on both CPU and GPU, depending on what's available, I've found it much easier to use PyTorch than some combination of NumPy and CuPy. I don't have to fiddle around with globally replacing numpy.* with cupy.*, and PyTorch has very nearly all the functions that those libraries have.
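For example, the same code can target either device just by switching the device argument (a small sketch of my own, nothing fancy):

    import torch

    # Use the GPU when one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1, device=device)

    # The same linear-algebra calls run unchanged on either device.
    x = torch.linalg.solve(a @ a.T + 1024.0 * torch.eye(1024, device=device), b)
    print(x.device)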
Interesting. Any links to examples or docs on how to use PyTorch as a general linear algebra library for this purpose? Like a “SciPy to PyTorch” transition guide if I want to do the same?
This looks similar to Triton; I wonder what it does differently. In any case, for any of these libraries, it would be awesome if they could output object files with PTX or SASS code. Then the result could be linked into a binary instead of needing a Python environment to run it.
Warp outputs its intermediate CUDA (GPU) or C++ (CPU) source files, which can be compiled and linked into a binary. Here is an old example of mine calling Warp kernels from C++: https://github.com/erwincoumans/warp_cpp
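For context, a Warp kernel is just annotated Python; a minimal sketch (kernel and array names are mine) of the kind of source Warp translates into those CUDA/C++ files:

    import warp as wp

    wp.init()

    @wp.kernel
    def add_scalar(a: wp.array(dtype=float), s: float):
        i = wp.tid()        # one thread per array element
        a[i] = a[i] + s

    n = 1024
    a = wp.zeros(n, dtype=float, device="cuda")
    wp.launch(add_scalar, dim=n, inputs=[a, 2.0], device="cuda")
    print(a.numpy()[:4])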
Triton offers broad GPU support for writing high throughput kernels. Some higher level ML/AI tools, such as PyTorch, can use Triton internally. I don’t know off the top of my head if any simulation libraries do.
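To give a flavor of it, the canonical Triton vector-add looks roughly like this (essentially the first tutorial example, lightly abbreviated):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)                     # which block this program instance handles
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements                     # guard the tail of the array
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    n = 98432
    x = torch.rand(n, device="cuda")
    y = torch.rand(n, device="cuda")
    out = torch.empty_like(x)
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)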