Just curious, how does program analysis on real-world programs actually circumvent the halting problem or Rice’s theorem? In the real world, do we only ever statically analyze a special subset of all programs?
The main strategy is to build sound but imperfect analyzers, i.e. analyzers that never produce false negatives but that may produce some false positives. See §1.6 in [1], a simplified version of the classic Principles of Program Analysis. Good analyzers are practical, rarely raising false positives on well-written programs: the Astrée analyzer reported zero false positives on the 1 MLOC fly-by-wire A380 code.
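To make that trade-off concrete, here's a minimal sketch (a toy interval domain and toy program of my own, nothing from [1] or Astrée) of how a sound analysis ends up with a false positive: it over-approximates at join points and forgets correlations between branches.

```python
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def join(self, other):
        # Over-approximate the union of the two intervals.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def add(self, k):
        return Interval(self.lo + k, self.hi + k)

    def contains(self, v):
        return self.lo <= v <= self.hi

# Program under analysis:
#     if flag: x = 0
#     else:    x = 1
#     if flag: x = x + 1
#     y = 10 / x        # x is always 1 here, so this is actually safe
# The analysis joins both branches at each merge point and forgets
# that the two `if flag` tests are correlated.

x = Interval(0, 0).join(Interval(1, 1))  # after first if: x in [0, 1]
x = x.join(x.add(1))                     # after second if: x in [0, 2]

if x.contains(0):
    # Sound: any real division by zero would be reported. But this
    # particular report is a false positive -- x is always 1 here.
    print("possible division by zero at `y = 10 / x`")
```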
Another, complementary strategy is to avoid Turing-complete constructs as much as possible, i.e. use DSLs with restricted semantics. That way, advanced semantic properties such as termination become provable.
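Here's a minimal sketch of the idea (a made-up three-op DSL, not any real one): the only loop construct is `repeat`, whose trip count is fixed before the body runs, so every program in the language terminates by construction.

```python
def run(program, env):
    for op in program:
        if op[0] == "set":            # ("set", name, value)
            _, name, value = op
            env[name] = value
        elif op[0] == "add":          # ("add", name, amount)
            _, name, amount = op
            env[name] += amount
        elif op[0] == "repeat":       # ("repeat", count, body)
            _, count, body = op
            for _ in range(count):    # the only loop: bounded up front
                run(body, env)
    return env

env = run([
    ("set", "count", 0),
    ("repeat", 10, [("add", "count", 1)]),
], {})
print(env["count"])  # 10 -- and termination was never in question
```

Even nested `repeat`s are fine: the total work is bounded by the product of the (fixed) trip counts, so a termination checker for this language is trivial.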
The halting problem only applies in the general case. I can trivially tell you that `while(1)` will never halt.
There are many examples of programs whose halting behavior is not known (the Collatz conjecture, for example), but many others where program analysis works just fine.
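For the concrete case: the following loop is a complete, tiny program, yet whether it terminates for every positive `n` is exactly the open Collatz conjecture.

```python
def collatz_steps(n):
    # Whether this loop terminates for every positive integer n is
    # precisely the open Collatz conjecture.
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print(collatz_steps(27))  # 111 -- it halts for every n anyone has tried
```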
If you write a program whose behavior is Collatz-like (say, in some states it queues up more work for itself, and in other states it completes work from the queue, and you believe that in general it should always ultimately complete the queue), it is actually useful to have a static analyzer tell you ‘it’s not entirely clear that this code will ever terminate’.
You can make the analyzer happy by adding a max iteration limit or a recursion depth limit or something to make sure it fails out rather than looping forever.
Which is probably a good idea anyway, if you’re running code that you can’t mathematically prove will always complete.
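A minimal sketch of that pattern (the queue, the handler, and the cap are my own toy example): bound the loop so the worst case is a clean failure rather than an infinite loop, which also makes the termination argument trivial for an analyzer.

```python
from collections import deque

MAX_ITERATIONS = 1_000_000  # arbitrary cap; tune it to your workload

def drain(queue, handle):
    # handle(item) may enqueue more work. We *believe* the queue always
    # empties eventually, but can't prove it, so we fail out cleanly
    # instead of potentially looping forever.
    for _ in range(MAX_ITERATIONS):
        if not queue:
            return
        for new_item in handle(queue.popleft()):
            queue.append(new_item)
    raise RuntimeError("work queue did not drain within the iteration cap")

# A Collatz-like handler: some items fan out into bigger work, others
# shrink; whether the drain always finishes is not obvious up front.
def handle(n):
    if n <= 1:
        return []
    return [3 * n + 1] if n % 2 else [n // 2]

drain(deque([27]), handle)  # completes well under the cap
```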
In the real world, if program analysis hits a blocker like this, you tweak stuff until it doesn't. The top-level post is correct in that, while the theory applies to the general case, the data and programs we actually use are not completely random/general: lots of properties are baked in as a consequence of their being real-world, physical entities.
The commercial static analyzers I've seen generate both false positives (bogus issues that just aren't there) and false negatives (e.g., missing an unconditional out-of-bounds pointer write that happens whenever that particular piece of C code is executed). Some of that comes with the territory, because commonly used languages and their libraries are underspecified and commonly used at the boundaries of what is specified. And these tools must always produce some result, even if they cannot parse (or see) the entire code base.
Usually, when people say “static analysis”, they accept unsoundness and the use of heuristics. Otherwise, they call the tool a type checker or a verifier. Such tools may run into the theoretical issues you mentioned; for them, the solution is to change the program until it compiles (or verifies) in a reasonable amount of time.
You don't need an infinite tape to make a finite state machine that never halts. As Legend2440 pointed out upthread, `while(1)` is a simple finite state machine that never halts.
Could you expand or provide a link to a good resource for me to understand this?
If the judge program has to answer yes/no on termination and the program given is `while True: continue`, I guess the argument is that in the finite case you could in principle just enumerate all programs that don't terminate and identify them as such?
In principle, you can enumerate all possible memory states of the system and determine what the next memory state would be from each one (including multiple possible next states if you account for things like interrupts).
Then you treeshake the unreachable parts of that directed graph from the start state, and look for closed loops in what remains.
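A minimal sketch of exactly that (the state-graph encoding and names are my own toy assumptions): depth-first search from the start state visits only reachable states, which is the treeshake, and any back edge it finds is a reachable closed loop, i.e. a possible non-halting execution.

```python
def may_not_halt(successors, start):
    # successors maps each state to its possible next states.
    # DFS from `start` only -- unreachable states are never visited.
    # A gray->gray edge is a back edge, i.e. a reachable closed loop.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(state):
        color[state] = GRAY
        for nxt in successors.get(state, []):
            c = color.get(nxt, WHITE)
            if c == GRAY:
                return True
            if c == WHITE and dfs(nxt):
                return True
        color[state] = BLACK
        return False

    return dfs(start)

# `while(1)` as a one-state machine that steps back to itself:
print(may_not_halt({"loop": ["loop"]}, "loop"))        # True

# Nondeterministic branching (say, an interrupt) that always halts:
print(may_not_halt({"a": ["b", "c"], "b": ["done"],
                    "c": ["done"], "done": []}, "a"))  # False
```

This is decidable precisely because the state space is finite; it's the infinite tape of the theoretical model that makes the general problem undecidable.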
The theoretical halting problem is required to return a yes/no answer, whereas in the real world it's actually really valuable to get back a "maybe doesn't halt" answer, so you can then iterate specifically on those "here be dragons" areas of your system.