For a long time now I've wanted to make the leap to dependently typed programming, but I was never sure which language to invest in - they all seemed either very focused on just proofs (Coq, Lean) or relatively far from Haskell in terms of maturity (Agda, Idris).
I went through Software Foundations [0] (Coq), which was fun and interesting, but I can't say I ever really applied what I learned there in software (I did get more comfortable with induction proofs).
You mention Lean alongside Agda and Idris - is Lean usable as a general-purpose language? I've been curious about Lean, but I got the impression it steps away from Haskell's legacy in terms of syntax and the like (unlike Agda and Idris), so I was concerned it would be a large investment and wouldn't add much to what I've learned from Coq.
I'd love any insights on useful ways for a working engineer to learn more about dependent types today.

[0] https://softwarefoundations.cis.upenn.edu/
When I last looked into Lean, I was highly unimpressed, even for doing math proofs. There's no way I'd invest in it as a general-purpose language.
Idris at least does state that they want people building real programs with it and don't want it to just be a research language.
For dependent types, I myself am skeptical about languages trying to continuously push more and more stuff into types. I am not certain that such efforts are a net positive for writing good software. By their very definition, the more typed a language gets, the fewer programs it can represent. That obviously reduces the number of buggy programs, but it also reduces the number of non-buggy programs you can implement. Highly typed languages force more and more effort into pre-compile time, and you will often find yourself trying to fit a problem into the chains of the type system.
Rather, I think reasonably multi-paradigm languages like F# are the sweet spot: just enough strict typing and a functional core to get you going for most of your program, but with classes and imperative programming allowed when those paradigms are appropriate.
I think the way to go to write better software is better tooling and ergonomics. I don't think type systems are going to magically save us.
> By their very definition, the more typed a language gets, the fewer programs it can represent. That obviously reduces the number of buggy programs, but it also reduces the number of non-buggy programs you can implement.
While I generally share your skepticism, I think this is quite wrong. A good part of the point of advanced type systems is to make more complex problems possible while still being well typed. For example, in C, if you want a function whose return type is tied to an input argument's type, you either use void* and casts (no type safety), or you don't write that function. In languages with even slightly more advanced type systems, you can write that function and still get full type safety.
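To make that concrete, here's a minimal Haskell sketch (the `Request` type and `handle` function are invented for illustration): a mildly fancy feature like GADTs is enough to tie the return type to the argument, with full type safety, where C would need void* and casts.

```haskell
{-# LANGUAGE GADTs #-}

-- A request whose constructor fixes the type of the response.
data Request a where
  GetName :: Request String
  GetAge  :: Request Int

-- The return type is tied to the argument's type, with full type safety.
handle :: Request a -> a
handle GetName = "Ada"
handle GetAge  = 36

main :: IO ()
main = do
  putStrLn (handle GetName)   -- the compiler knows this is a String
  print (handle GetAge + 1)   -- ...and that this is an Int
```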
Even more advanced type systems achieve the same thing: they let you take programs that a simpler type system could only accept unchecked and make them safe. In standard Haskell, for example, you can't write a Monad and actually have the compiler check that it respects the Monad laws - the compiler just assumes that any type implementing the right shape of functions will behave as a monad. With dependent types, you can actually enforce that functions designed to work with monads only apply to types that actually respect the monad laws.
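For instance (a hedged Haskell sketch, with a made-up `Dropper` type): GHC will happily accept a Monad instance that obviously breaks the laws, because the laws live only in documentation. A language with dependent types can instead ask for the laws as part of the interface - Lean 4, for example, has a separate LawfulMonad class whose instances carry exactly those proofs.

```haskell
-- A "Monad" that type-checks but violates the monad laws: (>>=) throws
-- away the continuation, so `pure x >>= f` is not `f x`. GHC cannot
-- object, because the laws are not expressed in the types.
newtype Dropper a = Dropper (Maybe a)

instance Functor Dropper where
  fmap f (Dropper m) = Dropper (fmap f m)

instance Applicative Dropper where
  pure x = Dropper (Just x)
  Dropper f <*> Dropper x = Dropper (f <*> x)

instance Monad Dropper where
  Dropper _ >>= _ = Dropper Nothing   -- law-breaking, but perfectly well typed
```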
The trade-off with very complex type systems is different, in my opinion: after some point, you start duplicating your program's logic, once in the implementation code and again in the type signatures. For example, if you want to specify that a sort function actually sorts the input list, you might find that the type specification ends up not much shorter than the actual code of the function. And apart from raw effort, this means that your type specifications start being large enough that they have their own bugs.
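To make the "spec as long as the code" point concrete, here's roughly what the signature of a verified sort could look like in Lean 4 (a hedged sketch: it assumes Mathlib for List.Sorted and List.Perm, the name `verifiedSort` is made up, and the body is left as `sorry`):

```lean
import Mathlib  -- assumed here for List.Sorted and List.Perm, among much else

-- The type alone already says "the result is sorted and is a permutation
-- of the input", which is most of what an insertion sort's code says too.
def verifiedSort (xs : List Nat) :
    { ys : List Nat // ys.Sorted (· ≤ ·) ∧ ys.Perm xs } :=
  sorry  -- the implementation and its correctness proof would go here
```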
I think GP's point was that most[1] programs that can be represented will fail to please the programmer or his principals. The act of programming is navigating the state space of all possible programs and somehow finding one that has the desired properties and also doesn't otherwise suck. When viewed through that lens, a type system preventing most programs from being represented is a good thing, since odds are every single program it prevents is one that is unpleasant or otherwise sucks.
[1] of the countably infinite possible programs, virtually all
That would make sense if writing a program were like randomly drawing a program from a pot of programs.
If instead I have a good idea of what I want to write, the type system may either guide me towards the solution or hinder me. It usually hinders me; I don't need a type system to guide me, but I do like a type system that can check for trivial errors (oh, I meant to pass a list of numbers, not just a single number).
> For example, if you want to specify that a sort function actually sorts the input list, you might find that the type specification ends up not much shorter than the actual code of the function. And apart from raw effort, this means that your type specifications start being large enough that they have their own bugs.
Not to mention the tools to debug complex type errors are generally much less mature than the tools to debug runtime errors.
But even so, I think we could still benefit from going a little further towards the "proof" end of the type system spectrum in most cases. I don't think anyone really wants to deal with Coq and similar, but having used a language with dependent types for integers and vector lengths, it's really nice to be able to say stuff like "this integer is in the range [0, 8)" and then have it catch errors when you pass it to a function that expects [0, 3) or whatever.
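You can approximate that in plain GHC with a type-level bound and a smart constructor (a hedged sketch: `Below`, `mkBelow`, and `pick3` are made-up names, and a real dependently typed language would let the compiler discharge the range check statically rather than at construction time):

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables, TypeApplications #-}

import Data.Proxy (Proxy (..))
import GHC.TypeNats (KnownNat, Nat, natVal)

-- An integer known to lie in [0, n); the bound lives in the type.
newtype Below (n :: Nat) = Below Integer
  deriving Show

-- Runtime-checked construction: the only way to obtain a `Below n`.
mkBelow :: forall n. KnownNat n => Integer -> Maybe (Below n)
mkBelow i
  | 0 <= i && i < fromIntegral (natVal (Proxy @n)) = Just (Below i)
  | otherwise = Nothing

-- Demands an index below 3. Passing a `Below 8` here is a compile-time
-- type error, which is exactly the [0, 8) vs [0, 3) mismatch above.
pick3 :: Below 3 -> String
pick3 (Below i) = ["a", "b", "c"] !! fromIntegral i

main :: IO ()
main = case mkBelow @3 1 of
  Just i  -> putStrLn (pick3 i)
  Nothing -> putStrLn "out of range"
```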
> When I last looked into Lean, I was highly unimpressed, even for doing math proofs.
I remember exploring different proof assistants for the first time in the 2000s. Back then, only people with a background in logic were involved, and most of the proofs that were formalized as showcases were of textbook results from the 19th century at most, or some combinatorial stuff like the four-color theorem.
I believe Voevodsky was one of the first prominent non-logicians to become interested in proof assistants, using Coq around 2010. Nowadays, several mathematicians coming from algebraic geometry, number theory, etc. are promoting formal proofs, and it seems like most of them have chosen Lean. I don't know whether this is because Lean is somehow better suited for working mathematicians, or if it was simply a random personal preference among people who got enthusiastic about this stuff and started advertising it to their colleagues.
I am not familiar with every proof assistant out there, but many of them are a very hard sell for mathematicians and lack a comprehensive math library. Lean seems to be one of the few exceptions.
> When I last looked into Lean, I was highly unimpressed, even for doing math proofs. There's no way I'd invest in it as a general-purpose language.
Can you elaborate? I am using Lean as a general-purpose language, writing simple little programs, so I have not encountered the deeper parts of the runtime etc. I'd like to see some criticism/different perspectives from more experienced people.
Lean aims to be a general-purpose language, but I haven't seen people actually write HTTP servers in it. If Leo de Moura really wants it to be general purpose, what does the concurrent runtime look like? To my knowledge, there isn't one.
That's why I've been writing an HTTP server in Idris2 instead. Here's a todo list demo app[1] and a hello world demo[2]. The advantage of Idris is that it compiles to e.g. Racket, a high level language with a concurrent runtime you can bind to from Idris.
It's also interesting how languages don't need their own hosting (e.g. Hackage) any more. Idris packages are just listed in a TOML file[3] (like Stackage) but still hosted on GitHub. No need for versions, just use git commit hashes. It's all experimental anyway.
There are tasks, which are implemented as part of the runtime, and they appear to plan to integrate libuv in the future. Some of the runtime seems fairly easy to hack on, and it has somewhat nice ways of interoperating with C, C++, and Rust.
I don't have much experience with Haskell, but one of the worst experiences has been Stack's compile-time dependency on GitHub. GitHub rate limits you and builds take forever.
That's interesting. Could you say more? This is something that we (speaking as part of the Haskell community) should fix. As far as I know Stack/Stackage should pick up packages from Hackage. What does it use GitHub for?
And a few more. The "fix" is having Stack impersonate the user (https://github.com/commercialhaskell/stack/pull/6036) and authenticate to the API. This unblocks progress, but this is really a design bug and not something I think people should emulate.
Every other language I've used allows you to build code without authenticating to a remote service.
> Every other language I've used allows you to build code without authenticating to a remote service.
Sure, the problem here wasn't "building". It was downloading a package template (which one doesn't tend to do 60 times per hour). I agree packages shouldn't be fetched from GitHub.
> and this comment suggests it wasn't GitHub doing the rate limiting after all
That comment is from someone other than the ticket filer; that person was seeing a different issue even after sending the GitHub token. It was this second issue that wasn't caused by GitHub rate limiting -- the original one was.
> It was downloading a package template (which one doesn't tend to do 60 times per hour).
I've personally had Stack-related GitHub API rate limiting delay builds by at least an hour due to extreme slowness. So whatever the rate limits are, Stack occasionally hits them.
This is not related to Lean or Haskell. I'm just wondering why, when people are curious about a new general-purpose language, the first thing they test is an HTTP server.
Lean can be used to write software [0]. I dare say that may even be the intended use for Lean 4. Work on porting mathlib to Lean 4 is far along, and the mathematicians using it will certainly continue to do so. But there is room for other software written in Lean 4 as well.
However...
it's nowhere near ready for production use. They don't care about maintaining backwards compatibility. They are more focused on getting the language itself right than on helping people build and maintain software written in it, at least for the foreseeable future. If you do build things in it, you're working on shifting ground.
But it has a lot of potential. The C code generated by Lean 4 is good. Though that's another trade-off: compiling to C is another source of "quirks."
One reason I took an interest in Idris (and lately Roc, although it's even less mature) is the promise of a functional language that's usable for solving problems today, with all the latest thinking on writing good code baked in already, compiling to a single binary (something I always envied about Go, although unfortunately it's Go). There simply isn't much out there yet in the space of "pure functional language with only immutable values and compile-time type checking that builds a single fast binary, plus some neat developer-friendly features/ideas such as dependent types, Roc's tags, or pattern matching with destructuring" (this rules out OCaml, for example, despite its maturity). You get a lot of that, but not all of it, with other options (OCaml, Elixir/Erlang, Haskell... though those three offer a far larger library of ready-to-import software at this point). Haskell did manage to teach everyone who cares about these things that managing side effects and keeping track of "purity" is important.
But it's frankly still early days and we're still far from nirvana; Rust is starting to show some warts (despite still being a massive improvement over C from a safety perspective), and people are looking around for what's next.
One barely-touched thing is that pure functional, fully immutable languages make compiler optimizations possible (such as caching the guaranteed result of an operation, a guarantee you simply can't get elsewhere) that have been off the table until now. (Roc is trying to go there, from what I can tell, and I'm here for it! Presumably Rust already does some of this, as long as you stick to its functional constructs, which I hear is hard sometimes.)
> Rust is starting to show some warts (despite still being a massive improvement over C from a safety perspective)
The way it seems to me is that actually Rust aims to be an improvement over C++ rather than C. (And Zig aims to be an improvement over C rather than C++.)
The major downsides of both will be the same as their reference point: Rust will eventually be too complicated for anyone to understand while still not being really safe (and the complexity then comes back to bite you one more time). Zig will be easy to understand and use but too unsafe to use for important applications (at least once people start really caring about software in important applications).
Both of these will be fairly niche, because compiling to a single binary just isn't that important, as elegant as it might be.
Zig is, right now, being used for high-assurance systems where correctness is a terminal value, and it provides many tools and affordances to assist in doing so. It isn't a good choice for projects which give lip-service to correctness, but for ones which actually mean it, and are willing and able to put in the effort and expense to achieve it, it's an excellent choice. I'm willing to gloss that domain as "important applications", but perhaps you meant something different by that term.
> Zig is, right now, being used for high-assurance systems
I'm not convinced this is telling us very much. I was talking about software that, e.g., causes deaths when it fails. But regardless of what level of assurance one looks at, most "high assurance" systems continue to be built using C. The very few that use Zig surely chose it primarily due to compatibility with C, with the hope that it's safer than C (a low bar). Maybe in some cases a wish to play around with new toys also played a role in the decision, though in "important applications" I'd like to hope that's rare.
In the end, we'd have to look at harm done due to bugs. For C, I'd say the record is abysmal. For Zig, it's way too early to look at this.
My judgement above is mostly based on Zig not making it at all hard to violate memory safety and the analogy with C. Needless to say, Zig is better than C in this respect and that's a good thing. If your argument is something like "memory safety doesn't really matter, even for critical applications", we'll just not agree on this.
I think you are wrong in a funny way. Memory safety and memory leaks and stuff don't always matter.
This sparked an interesting memory for me. I was once working with a customer who was producing on-board software for a missile. In my analysis of the code, I pointed out that they had a number of problems with storage leaks. Imagine my surprise when the customer's chief software engineer said "Of course it leaks". He went on to point out that they had calculated the amount of memory the application would leak in the total possible flight time for the missile and then doubled that number. They added this much additional memory to the hardware to "support" the leaks. Since the missile will explode when it hits its target or at the end of its flight, the ultimate in garbage collection is performed without programmer intervention.
While a funny example, you are very wrong. High-assurance systems have a specification and a bazillion tests because memory safety is an insanely tiny problem in the sea of problems they face. [That is why Frama-C and friends are preferred over Rust, Ada, ATS, and whatever else exists.]
Correctness of the spec and perfectly following the spec are far more important.
Zig allows correct code to feel easier to write compared to C. That's why it was chosen for TigerBeetle, and that's why, if the community wants, it will have an insanely bright future in high-assurance systems.
As a Frama-C developer (more precisely, a developer of the deductive verification tool), I'd say that formal proof of programs (especially proof that a program conforms to its specification) would be significantly easier on Rust. The main reason is the handling of memory aliasing, which is a nightmare in C and generates formulas that kill SMT solvers. The consequence is that ongoing development tends to target something that has a lot in common with Rust: we try to assume memory separation most of the time and check that it actually holds at function calls, but it is harder to do this than with a type system.
> the handling of memory aliasing, which is a nightmare in C
Zig has some aliasing issues which are to-date unsolved. The core and community are keenly aware of them, and they'll be solved or greatly ameliorated before a 1.0 release. It's why TigerBeetle has a coding standard which requires all container types to be passed by constant pointer, not reference.
It isn't ideal, but I think it's reasonable to withhold judgement on a pre-1.0 language for some sorts of known issues which are as yet unaddressed. It's worth taking the time to find an excellent solution to the problem, rather than brute-force it or regress one of the several features which combine into the danger zone here.
If you're curious or interested in the details, look up "Attack of the Killer Features" on YouTube. Given your background, I'm sure the community would value your feedback on the various issues tracking this.
> If your argument is something like "memory safety doesn't really matter, even for critical applications"
Not at all, not even close. What I will say is "memory safety can be achieved by policy as well as by construction, and indeed, can only be achieved by construction through policy".
Let's break that down. Rust and Go are two examples of languages generally referred to as memory-safe. Rust achieves this by construction, through the borrow checker; Go achieves it by construction, through the garbage collector.
If there are bugs in the borrow checker, or the garbage collector, then the result is no longer memory-safe. That assurance can only be achieved by policy: the garbage collector and the borrow checker must be correct.
TigerBeetle, the program I linked to, achieves memory safety with a different policy. All allocation is performed once, at startup. After this, the replica's memory use is static. This, if correct, is memory-safe.
Zig makes this practical, because all allocation in the standard library is performed using the allocation interface: any function which allocates receives an Allocator as an argument. Libraries can violate that policy, but they generally don't, and TigerBeetle is a zero-dependency program, so that's not a concern for them. Other languages where it maybe isn't immediately obvious if something goes on a heap? Not so easy to achieve a memory policy like that one.
So this:
> Zig not making it at all hard to violate memory safety
Is irrelevant. What's needed in high-assurance systems is the practical ability to create memory safety by policy, and Zig provides a difference-in-class in this matter compared to C.
> most "high assurance" systems continue to be built using C
Yes, other than your scare quotes, this is correct. Programs like SQLite are memory safe by construction and exhaustive testing; you're welcome to try and land a CVE if you disagree. Every few years someone gets a little one, maybe you'll be the lucky next player. Or you could try your luck at QEMU.
My premise is that Zig makes it massively easier to achieve this, through numerous choices which add up: the allocator interface, built-in idiomatic leak detection, a null-safe type system, slices and arrays carrying bounds, and much else. It has `std.testing.checkAllAllocationFailures`, which can be used to verify that a function doesn't leak or otherwise misuse memory even if any one allocation anywhere downstack of a function call fails. You might want to compare this with what happens if a Rust function fails to allocate.
Basically you're taking the lessons of C, with all of its history and warts, and trying to apply them, without due consideration, to Zig. That's a mistake.
Lean definitely intends to be usable as a general-purpose language someday, but I think the bulk of the people involved are more focused on automated theorem proving. The Lean FRO [0] has funds to guide development of the language, and they are planning to carve out a niche for stuff that requires formal verification. I'd say in terms of general-purpose programming it fits into the category of being "relatively far from Haskell in terms of maturity".
I took that course as well, and for me, the big takeaway wasn't that I specifically want to use Coq for anything practical, but the idea that you can actually do quite a lot with a non-Turing complete language. Realizing that constraints in a language can be an asset rather than a limitation is something that I think isn't as widely understood as it should be.