More

GistNoesis · 2025-01-22T16:35:47 1737563747

>full of holes

On the geometry side from the theoretical point of view you can repair meshes, [1], by inferring a signed or unsigned distance field from your existing mesh, then you contour this distance field.

If you like the distance field approach, there are also research work [2], to estimate neural unsigned distance fields directly, (kind of a similar way as Gaussian splats).

[1] https://github.com/nzfeng/signed-heat-3d [it works but it's research code, so buggy, not user friendly, and mostly on toy problems because complexity explode very quickly when using a grid the number of cells grows as a n^3, and then they solve a sparse linear system on top (so total complexity bounded by n^6), but tolerating approximations and writing things properly practical complexity should be on par with methods like finite element method in Computational Fluid Dynamics.

[2] https://virtualhumans.mpi-inf.mpg.de/ndf/

GistNoesis · 2025-01-16T09:48:12 1737020892

You can have look at Keenan Crane's repulsive curves https://www.cs.cmu.edu/~kmcrane/Projects/RepulsiveCurves/ind... (He also has various other interesting geometry projects if you have some time to wander https://www.cs.cmu.edu/~kmcrane/).

GistNoesis · 2025-01-07T09:14:07 1736241247

Playing chess with strings to build datasets for text generation.

I want to share this quick win.

The other day I was asking myself some theoretical chess questions, and wanted to answer them programmatically and needed to build some custom chess datasets for that.

I needed the chess basic routines, like getting the next legal moves, displaying the board, and some rudimentary position scores. I contemplated writing from scratch. I contemplated using some library. But instead I settled for a higher level choice : interfacing with Stockfish game engine over a text interface.

There is something called UCI, which stands for Universal Chess Interface, ( https://official-stockfish.github.io/docs/stockfish-wiki/UCI... ), to use it you start a new stockfish process and write and read from the standard inputs.

So instead of writing bug prone routines to check the validity of board positions, it turn the basic routines into a simple wrapper of parsing task to read and write UCI protocol to use a battle tested engine.

A chess position state is simply defined as a vector<string> representing the sequence of moves. Moves are string in long algebraic notation.

This architectural decision allows for very quick (LLM-powered development) prototyping.

namespace bp = boost::process; bp::ipstream is; bp::opstream os;

bp::child c("../Stockfish/src/stockfish", bp::std_in < os, bp::std_out > is);

void displayBoard( const vector<string> & moveSeq, bp::ipstream& is, bp::opstream& os );

void getLegalMoves( const vector<string> & moveSeq, vector<string>& legalMoves, bp::ipstream& is, bp::opstream& os );

void getTopKMoveAndScoreAtDepthFromPosition(const vector<string> & moveSeq,int K, int D, vector<pair<string,int> >& topkmoves, bp::ipstream& is, bp::opstream& os , bool debug = false);

void displayBoard( const vector<string> & moveSeq, bp::ipstream& is, bp::opstream& os ) {

os << "position startpos moves";

for( int i = 0 ; i < moveSeq.size() ; i++)

{

  os << " " << moveSeq[i];

}

os << endl;

os << "d" << endl;

os << "isready" << endl;

string line;

while (getline(is, line)) { if (!line.compare(0, 7, "readyok")) break; cout << line << endl; }

}

You get the gist...

GistNoesis · 2024-12-24T06:45:17 1735022717

And to protect you from it, you can use the following lucky clover charm (polar plot r=cos(2theta) ): https://www.wolframalpha.com/input?i=+plot+r%3Dcos%282theta%... whose perimeter can also define a constant 4*E(-3) ~ 4 * 2.4221

https://www.wolframalpha.com/input?i=plot+r%3Dcos%282theta%2...

GistNoesis · 2024-12-16T12:42:18 1734352938

Here OpenERV use a push-pull ventilation design where air direction is reversed every 30s. This allows energy recuperation and dispense connecting the inlet and the outlet to each other, as each ventilation port alternate role simultaneously.

The alternative design is a counter-flow heat exchanger. Using 3d printing and gyroids it seems possible to build quite compact ones. (metal 3d printed heat exchanger for helicopter https://www.youtube.com/watch?v=1qifd3yn9S0 )

3d-printing a counter-flowing heat-exchanger seems interesting but maybe there are some molding issues that need to be taken care of (maybe HEPA filters on the inside in/outlet are sufficient).

The main advantage of the heat-exchanger solution is that you won't need specific electronic control and can reuse the standard fans for controlled ventilation, but there is more thermally isolated piping required (and the pipes are quite big (~10cm diameter) because they need to move a lot of air even if the fans are weak).

The push-pull system is harder to DIY because most of the off-the shelf fans can't be reversed easily (and 3d printed fans are noisy and inefficient).

GistNoesis · 2024-12-16T09:22:40 1734340960

I find the idea of reversing the air flow direction every 30s simpler to understand than two counter-flowing pipe side by side.

Imagine a pipe filled with 3 metallic grid sections (such that the air temperature in the section will equalize with the metal temperature) separated by plastic grids (such that the heat isn't conducted through the metal), and you push air alternatively from one hot side at 20°C to a cold side at 0°C for 30s and in the other direction for 30s.

For symmetry reason, the pipe will passively (we don't count the energy required to move the air) have a gradient of temperature from the hot side to the cold side. The first section will be ~15°C, the second ~ 10°C, the third ~5°C. (Each section temperature is the temporal average of the temperature of the air flowing from previous sections : so because air switch direction, it means it's the average of left and right sections.)

From the point of view of the house, you only lose energy from the first section of the pipe which will be more like 15°C rather than 0°C.

GistNoesis · 2024-12-10T15:56:50 1733846210

Here is a visualization I made of the tree calculus rules as a pattern matching on binary trees https://github.com/unrealwill/tree-calculus-visualizer

GistNoesis · 2024-12-10T09:16:57 1733822217

I think the main problem people encounter understanding this thing is just parsing the expression tree for the rules.

It isn't stated but all trees in question are binary trees.

Even though the expression tree syntax is really easy E := t | E E

The mental gymnastic needed to visualize that the string

not = t ( t (t t) (t t t)) t correspond to the only tree drawn in https://treecalcul.us/specification/

is quite overwhelming.

The left-associative notation to remove unneeded parenthesis makes it even harder.

It could be explained so much better if the author made the 10 pictures corresponding to the transformation rules of the trees. Eventually highlighting the subtrees a, b, c, in the corresponding color before after.

Brains are used to pattern matching images but not abstractly defined syntax unless you have been trained in grammar theory.

GistNoesis · 2024-12-09T18:49:20 1733770160

(2020) https://arxiv.org/abs/2010.11929 : an image is worth 16x16 words transformers for image recognition at scale

(2021) https://arxiv.org/abs/2103.13915 : An Image is Worth 16x16 Words, What is a Video Worth?

(2024) https://arxiv.org/abs/2406.07550 : An Image is Worth 32 Tokens for Reconstruction and Generation

dartos · 2024-12-09T18:59:43 1733770783

Those are indeed 3 papers.

GistNoesis · 2024-12-09T19:53:43 1733774023

Yes in a nutshell they explain that you can express a picture or a video with relatively few discrete information.

First paper is the most famous and prompted a lot of research to using text generation tools in the image generation domain : 256 "words" for an image, Second paper is 24 reference image per minutes of video, Third paper is a refinement of the first saying you only need 32 "tokens". I'll let you multiply the numbers.

In kind of the same way as a who's who game, where you can identify any human on earth with ~32bits of information.

The corollary being that contrary to what parent is telling there is no theoretical obstacle to obtaining a video from a textual description.

dartos · 2024-12-09T20:19:30 1733775570

I think something is getting lost in translation.

These papers, from my quick skim (tho I did read the first one fully years ago,) seem to show that some images and to an extent video can be generated from discrete tokens, but does not show that exact images nor that any image can be.

For instance, what combination of tokens must I put in to get _exactly_ Mona Lisa or starry night? (Tho these might be very well represented in the data set. Maybe a lesser known image would be a better example)

As I understand, OC was saying that they can’t produce what they want with any degree of precision since there’s no way to encode that information in discrete tokens.

GistNoesis · 2024-12-09T20:46:35 1733777195

If you want to know what tokens you want to obtain _exactly_ Mona Lisa, or any other image, you take the image and put it through your image tokenizer aka encode it, and if you have the sequence of token you can decode it to an image.

VQ-VAE (Vector Quantised-Variational AutoEncoder), (2017) https://arxiv.org/abs/1711.00937

The whole encoding-decoding process is reversible, and you only lose some imperceptible "details", the process can be either trained with a L2Loss, or a perceptual loss depending what you value.

The point being that images which occurs naturally are not really information rich and can be compressed a lot by neural networks of a few GB that have seen billions of pictures. With that strong prior, aka common knowledge, we can indeed paint with words.

dartos · 2024-12-09T20:56:32 1733777792

Maybe I’m not able to articulate my thought well enough.

Taking an existing image and reversing the process to get the tokens that led to it then redoing that doesn’t seem the same as inserting token to get a precise novel image.

Especially since, as you said, we’d lose some details, it suggests that not all images can be perfectly described and recreated.

I suppose I’ll need to play around with some of those techniques.

GistNoesis · 2024-12-09T22:59:12 1733785152

After encoding the models are usually cascaded either with a LLM or a diffusion model.

Natural Image-> Sequence of token, but not all possible sequence of token will be reachable. Like plenty of letters put together form non-sensical words.

Sequence of token -> Natural Image : if the initial sequence of token is unsensical the Natural image will be garbage.

So usually you then modelize the sequence of token so that it produce sensical sequences of token, like you would with a LLM, and you use the LLM to generate more tokens. It also gives you a natural interface to control the generation of token. You can express with words what modifications to the image you should do. Which will allow you to find the golden sequence of token which correspond to the mona-lisa by dialoguing with the LLM, which has been trained to translate from english to visual-word sequence.

Alternatively instead of a LLM you can use a diffusion model, the visual words usually are continuous, but you can displace them iteratively with text using things like "controlnet" (stable diffusion).

GistNoesis · 2024-12-06T08:56:58 1733475418

>But it can’t do the essential work of complexity management

I think this is the "closing the loop" ( https://en.wikipedia.org/wiki/Control_loop#Open-loop_and_clo... ) moment for coding AI.

All pieces are there, we just need to decide to do it. Today's AI are able to produce an increasing tangled mess of code. But it's also able to reorganize the code. It's also capable of writing test code, and assess the quality of the code. It's also capable to make architectural decision.

Today's AI code, is more like a Frankenstein's composition. But with the right prompt OODA loop and quality assessment rigor, it boils down to just having to sort and clean the junk pile faster than you produce it.

Once you have a coherent unified codebase, things get fast quickly, capabilities grows exponentially with the number of lines of code. Think of things like Julia Language or Wolfram Language.

Once you have a well written library or package, you are more than 95% there and you almost don't need AI to do the things you want to do.

qazpot · 2024-12-06T10:04:24 1733479464

> I think this is the "closing the loop" ( https://en.wikipedia.org/wiki/Control_loop#Open-loop_and_clo... ) moment for coding AI.

> All pieces are there, we just need to decide to do it.

Another silver bullet.

GistNoesis · 2024-12-06T10:27:25 1733480845

There is a huge gap in performance and reliability in control systems between open-loop and closed-loop.

You've got to bite the bullet at one point and make the transition from open-loop to closed-loop. There is a compute cost associated to it, and there is also a tuning cost, so it's not all silver lining.

rafaelmn · 2024-12-06T10:50:25 1733482225

>Once you have a coherent unified codebase, things get fast quickly, capabilities grows exponentially with the number of lines of code. Think of things like Julia Language or Wolfram Language.

>Once you have a well written library or package, you are more than 95% there and you almost don't need AI to do the things you want to do.

That's an idealistic view. Packages are leaky abstractions that make assumptions for you. Even stuff like base language libraries - there are plenty of scenarios where people avoid them - they work for 9x% of cases but there are cases where they don't - and this is the most fundamental primitive in a language. Even languages are leaky abstractions with their own assumptions and implications.

And these are the abstractions we had decades of experience writing, across the entire industry, and for fairly fundamental stuff. Expecting that level of quality in higher level layers is just not realistic.

I mean just go look at ERP software (vomit warning) - and that industry is worth billions.