Hacker News | new | past | comments | ask | show | jobs | submit | pilooch's comments

Losing the mental map is the number one issue for me. I wonder if there could be a way to keep track of it, even at a high level. Keeping the ability to dig in is crucial.

Spend time reviewing outputs like a tech lead does when managing multiple developers. That's the upgrade you just got in your career: you are now bound by how many "team members" you can manage at a single time. I'm grateful to live in such a time.

The code is the mental map. Orchestra conductors read and follow the sheet music as well. They don't let random people come in and mess with it. Neither do film directors with their scripts and their plans.

Hello, very interested in the scrollback! I've used mosh for 10+ years and it still runs my 100+ open terminals to this day! Would love to try your alternative.

Awesome! I’ll post it to HN once I have the repo up and the code is in a halfway decent state. Look forward to your feedback!

Exactly. For real-time applications (VTO, simulators, ...), i.e. 60+ FPS, diffusion can't be used efficiently. The gap is still there AFAIK. One lead has been to distill DPMs into GANs, though I'm not sure this works for GANs that are small enough for real time.
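A back-of-envelope check makes the gap concrete. The 5 ms per forward pass and the 20 denoising steps below are illustrative assumptions, not measured numbers; the point is only that a multi-step sampler multiplies whatever the single-pass cost is:

```python
# Frame budget at 60 FPS vs. single-pass (GAN) and multi-step (diffusion) sampling.
# per_step_ms is a hypothetical per-forward-pass latency, not a benchmark.

frame_budget_ms = 1000 / 60            # ~16.7 ms available per frame
per_step_ms = 5.0                      # assumed cost of one generator/denoiser pass
gan_passes, diffusion_steps = 1, 20    # GAN: one pass; diffusion: e.g. 20 sampler steps

gan_latency = gan_passes * per_step_ms
diffusion_latency = diffusion_steps * per_step_ms

print(f"frame budget: {frame_budget_ms:.1f} ms")
print(f"GAN: {gan_latency:.1f} ms (fits: {gan_latency <= frame_budget_ms})")
print(f"diffusion: {diffusion_latency:.1f} ms (fits: {diffusion_latency <= frame_budget_ms})")
```

Step distillation attacks exactly the `diffusion_steps` factor, which is why distilling into a one-pass GAN-like student is attractive for real time.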

I mean, it is really hard to push diffusion models down in size, so that alone makes the speed part hard. I'm not sure diffusion can ever truly win the speed race, at least without additional context like breadth of generation. But isn't that the thing? The best model is only the best in a given context?

I think the weirdest thing in ML has always been acting like there's an objectively better model and no context is needed.


Is this a bit similar to what TensorRT does, but in a more open manner?

Because they'd never hire, but subcontract down to the bone.

Plenty of large companies only hire union contractors for electrical, mechanical, and plumbing systems (aka skilled trades, or trades that can damage a building if installation is done poorly)

I'd be interested in which implementation of D3PM was used (and failed). Diffusion models are more data-efficient than their AR LLM counterparts but less compute-efficient at training time, so it'd be interesting to know whether, with more time to converge, the diffusion approach does succeed. I guess I'll try :)
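For readers unfamiliar with D3PM, the absorbing-state ("masking") variant of its forward process can be sketched in a few lines. This is a toy illustration, not any particular implementation from the article:

```python
import random

MASK = "<mask>"

def forward_mask(tokens, t, T, rng):
    """Absorbing-state forward process (D3PM-style): by step t, each token
    has independently been replaced by MASK with probability t / T.
    The reverse model is then trained to undo this corruption."""
    p = t / T
    return [MASK if rng.random() < p else tok for tok in tokens]

rng = random.Random(0)
seq = ["the", "cat", "sat", "on", "the", "mat"]
print(forward_mask(seq, t=3, T=10, rng=rng))   # roughly 30% of tokens masked
print(forward_mask(seq, t=10, T=10, rng=rng))  # fully masked at t = T
```

Unlike AR training, every position can be supervised at every noise level, which is one intuition for the data-efficiency claim.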


True, but modern models such as gemma3 (with pan & scan) and other tricks such as training at multiple resolutions do alleviate these issues.

An interesting property of the gemma3 family is that increasing the input image size actually does not increase processing memory requirements, because a second-stage encoder compresses it into fixed-size tokens. Very neat in practice.
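A quick sketch of why this matters. The patch size and fixed token count below are illustrative assumptions, not Gemma 3's actual configuration; the contrast is between naive ViT tokenization (quadratic in resolution) and a fixed-output second-stage encoder:

```python
def vit_patch_tokens(size_px, patch=14):
    """Naive ViT tokenization: token count grows quadratically with image size."""
    return (size_px // patch) ** 2

def resampled_tokens(size_px, fixed=256):
    """A second-stage encoder/resampler compresses to a fixed token count,
    so the LLM-side sequence length no longer depends on input resolution."""
    return fixed

for size in (224, 448, 896):
    print(f"{size}px  naive: {vit_patch_tokens(size):>4} tokens  "
          f"resampled: {resampled_tokens(size)} tokens")
```

Since attention memory scales with sequence length, holding the vision token count fixed keeps the LLM-side memory footprint flat as resolution grows.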


Some colleagues and I implemented exactly this six months ago for a French gov agency.

It's open source and available here: https://github.com/jolibrain/colette

It's not our primary business so it's just lying there and we don't advertise it much, but it works, with some tweaks needed to get it really efficient.

The true genius though is that the whole thing can be made fully differentiable, unlocking the ability to fine-tune the viz RAG on targeted datasets.

The layout model can also be customized for fine grained document understanding.


You don't have a license in your repository top-level. That means that nobody who takes licensing at all seriously can use your stuff, even just for reference.


Good catch, will add it tomorrow. License is Apache2.


They do have: https://github.com/jolibrain/colette/blob/main/pyproject.tom...

I agree it's better to have the full licence at top level, but is there a legal reason why this would be inadequate?
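For context, a license declared in project metadata looks something like the following (a hypothetical PEP 621-style fragment; the exact form in the repo may differ):

```toml
[project]
name = "colette"
# Either reference the license file...
license = { file = "LICENSE" }
# ...or declare it inline by SPDX expression:
# license = { text = "Apache-2.0" }
```

The metadata field tells packaging tools what the license is, but a top-level LICENSE file carrying the full text is what most legal reviews and the Apache-2.0 license itself expect to accompany distribution.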


Standard practice now is to just have an LLM read the whole repo and write a new original version in a different language. It’s code laundering.


Great, thanks for sharing your code. Could you please add a license so I and others can understand if we're able to use it?


Yeah the fine tuning is definitely the best part.

Often, the blocker becomes high quality eval sets (which I guess always is the blocker).


AlphaEvolve and similar systems based on MAP-Elites + DL/LLMs + RL appear to be one of the promising paths.

Setting up the MAP-Elites dimensions may still be problem-specific, but this could be learned in an unsupervised way, at least partially.

The way I see LLMs is as a search space of tokens that manipulate broad concepts within a complex and not-so-smooth manifold. These concepts can be refined within other spaces (pixel space, physical spaces, ...).
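A minimal MAP-Elites loop makes the "dimensions are problem-specific" point concrete: the behavior grid below (sign quadrants of a toy 2-D genome) is hand-chosen, which is exactly the part one would like to learn. This is a toy sketch, not AlphaEvolve's actual machinery:

```python
import random

def evaluate(x):
    """Toy problem: fitness plus a hand-designed behavior descriptor."""
    fitness = -(x[0] ** 2 + x[1] ** 2)    # maximize (peak at the origin)
    descriptor = (x[0] > 0, x[1] > 0)     # 2x2 behavior grid: sign quadrant
    return fitness, descriptor

def map_elites(iterations=2000, seed=0):
    rng = random.Random(seed)
    archive = {}  # descriptor cell -> (fitness, genome): one elite per cell
    for _ in range(iterations):
        if archive:
            # select a random elite and mutate it
            _, parent = archive[rng.choice(list(archive))]
            x = [g + rng.gauss(0, 0.3) for g in parent]
        else:
            x = [rng.uniform(-2, 2), rng.uniform(-2, 2)]
        f, d = evaluate(x)
        if d not in archive or f > archive[d][0]:
            archive[d] = (f, x)  # keep only the best genome per cell
    return archive

archive = map_elites()
print(f"{len(archive)} of 4 behavior cells filled")
```

In LLM-driven variants, the mutation step is replaced by prompting a model to rewrite an elite program, while the archive structure stays the same.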


This model is fully compatible with anything previously done with gemma3. I just passed it to one of my VLM fine-tuning scripts and it started without issues (HF transformers code). On a single GPU with LoRA, the E4B model takes 18 GB of VRAM at batch size 1, where gemma3-4B was 21 GB. Nice one from DeepMind; the gemma3 family tops the open-weights VLMs.


Fix: it's the E2B
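Part of why LoRA keeps the VRAM figures above so low is the trainable-parameter arithmetic. The matrix dimension and rank below are illustrative assumptions, not Gemma's actual shapes:

```python
def lora_params(d_in, d_out, rank):
    """Parameters LoRA adds to one frozen weight matrix:
    a down-projection A (d_in x r) plus an up-projection B (r x d_out)."""
    return rank * (d_in + d_out)

def full_params(d_in, d_out):
    """Parameters trained when fine-tuning the full matrix."""
    return d_in * d_out

# Hypothetical numbers: a 4096x4096 projection adapted with rank-16 LoRA.
d, r = 4096, 16
print(f"full matrix: {full_params(d, d):,} params")
print(f"LoRA r={r}:  {lora_params(d, d, r):,} params "
      f"({lora_params(d, d, r) / full_params(d, d):.2%} of full)")
```

Only the adapter parameters need optimizer state and gradients, which is why single-GPU fine-tuning at batch size 1 stays within a ~20 GB budget for models this size.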

