Note that we do sibling call optimization per LLVM, and you can use trampolines like Clojure does. Sibling call optimization, most notably, includes all self-calls. This means that calls to the currently executing function in tail position will be tail call optimized, if there are no destructors in scope.
The biggest problem here is ABIs: you have to use Pascal calling conventions instead of C calling conventions. It's also difficult when you have destructors: there are not many tail call positions when destructors are in scope.
I'd vote getting to use RAII is a better win than TCO.
And really it's by definition not a tail call when you have to run destructors after a "tail" call, and it's tricky/annoying to express that destructors should be run prior to a tail call.
Wouldn't it be possible to have some sort of macro able to "manually" convert calls in a recursive style into loops? Something along the lines of Clojure's `loop/recur` but less tricky (and maybe with a single "word")?
Self recursion is only a particular case, and you can often represent it with loops or folding patterns as you are imagining. However, the really cool use for tail call elimination is when you are calling other functions, since you can't rewrite the code without breaking encapsulation. Off the top of my head, one important example of this is continuation passing style (you need TCO to turn the continuations into real "gotos").
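To make the CPS point concrete, here is a minimal C sketch (my own names, and it assumes the compiler eliminates tail calls, which C does not guarantee): both the recursive step and the final "return" are calls in tail position, so without elimination every step burns a stack frame.

    typedef void (*cont)(long result);

    /* CPS-style sum: instead of returning, pass the result to k. */
    void sum_cps(long n, long acc, cont k) {
        if (n == 0) {
            k(acc);                    /* tail call to the continuation */
            return;
        }
        sum_cps(n - 1, acc + n, k);    /* tail call to self */
    }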
That would be useless. I believe the point of proper tail calls is that the set of functions that may be called from a particular call site is not closed. The callee may be a parameter of the caller, for example in combinator-style libraries. You can't convert that in advance.
How about the other two calling conventions that support tail call optimization? (cc 10 and cc11)
Do those have the same performance penalty as fastcall + tailcallopt?
Please, for the sake of $DIETY, stop calling it "Tail Call Optimization". It is not an optimization, and the wrong term causes lots of useless discussions and misunderstanding. Call it "Tail Call Elimination", because that's what it is.
With finite memory (that is, in the real world), a program that relies on tail recursion will blow up if TCE is not implemented even though it can run forever with TCE. Thus, it changes actual program results (rather than just runtime speed or amount of memory required), and cannot be considered just an "optimization", since it is functionally required.
And to future-language designers: Please, for the sake of $DIETY2, make the syntax for guaranteed-eliminated-tail-call different than a regular call: It's really a bad idea that a function such as:
    def f(x):
        ...
        return 0+f(x-1)
cannot in general be TCE'd (adding 0 to an improper floating-point denormal will make it proper), whereas
    def f(x):
        ...
        return f(x-1)
can. I suggest "chain y" instead of "return y" so that the compiler can verify that a TCE can indeed happen.
> cannot be considered just an "optimization", since it is functionally required
Here is a C program:
    #include <stdlib.h>

    int main(void) { while (1) malloc(64); }
With clang, when compiled with -O0, this consumes all the memory on my system. When compiled with -O2, it runs forever in constant memory, because clang has optimized out the call to malloc. If I replace the call with, say, 'malloc((size_t)(-1) >> 1)', it even changes the output of the program.
Here is another example:
    int factorial(long long x) { return x <= 1 ? 1 : x * factorial(x-1); }
With gcc, when compiled with -O2, the factorial function can correctly compute the factorial of a very large number. When compiled with -O0, it cannot.
I would call both of these optimizations, even though someone may rely on either for correctness. So I don't think tail calls are unique in this respect.
> make the syntax for guaranteed-eliminated-tail-call different than a regular call
Both of those programs are playing with C's infamously broad undefined behavior rules, the first for a non-terminating loop and the second for using too much stack (when called with a sufficiently large number). -O2 being nice to you is a coincidence, not something you should rely on.
In a language which prefers diagnosing programmer error over punting both should be errors.
In the finite world beagle3 talks about, don't all optimizations potentially qualify as "functionally required"? If the finite world you're operating in can only run a program that requires at most N steps ("steps" being deliberately vague "computing resources," like CPU cycles or memory usage), then any optimization that makes an (N+1)-step program complete in N steps qualifies as a "functional requirement."
The reason people are so interested in TCO/TCE is not just that someone might rely on them for correctness, which as you point out is true of any optimization, but rather that they are so predictable that you can omit looping constructs from your core language if you have them.
> even though someone may rely on either for correctness.
That someone is playing with fire. It's the kind of optimization that might be removed in a hotfix because it turns out to break something else.
It's even worse than relying on things like byte order -- and if you rely on them, you are definitely outside the realm of "C language", and into the realm of "Specifically LLVM v.37x patchlevel 2, on an SSE4 architecture"
If you worship food, that is. But you're right, people calling proper tail calls "an optimization" should be shot. Those who survive the subsequent generation scavenging should be shot again. :-)
Or follow Alef (http://www.cs.bell-labs.com/who/rsc/thread/ug.pdf) and use "become". The Alef become statement is a kind of generalized tail-call form because you can use it to drop the current stack frame from anywhere, not just in tail-position.
As a historical note, this was also the original plan for Rust, where the presence of the `be` keyword rather than the `return` keyword would indicate tail-calling.
If I remember correctly, the new version of Tcl will have special syntax to enforce TCE. Unfortunately, I did not bookmark the relevant online discussion.
Clojure also doesn't support TCO, and from what I can tell the explicit loop/recur is sufficient, which also has the advantage that it is very clear (i.e., the compiler will tell you) when TCO is not being invoked. With implicit TCO you might think you're getting the benefit, only to have a stack blow up unexpectedly at runtime with an edge-case input.
Especially with destructors. Whenever you have a destructor in the frame, you're not in tail-call position. This means that tail-call positions in Rust are going to be few anyway.
The problem is that TCO in the sense we're talking about isn't really an "optimization". It's a language feature that removes the need for explicit looping constructs by allowing you to write them as higher-order functions instead. This only works if the guy writing the higher-order function can prove that the compiler will successfully "optimize away" the tail call, or if he doesn't care whether the program dies with a stack overflow. If the success of TCO is dependent on which conservative approximation the compiler is using for escape and liveness analysis, then people who care about their programs continuing to run will demand explicit looping constructs anyway.
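For illustration, here is roughly what such a higher-order looping function looks like, as a hedged C sketch (names invented here): it is only usable as a general-purpose loop if the author can count on the tail call being eliminated.

    #include <stdio.h>

    /* A "loop" as a higher-order function: run body(i) for i = n..1. */
    void repeat(long n, void (*body)(long)) {
        if (n == 0) return;
        body(n);
        repeat(n - 1, body);   /* tail call: constant stack only with TCE */
    }

    void noop(long i) { (void)i; }

    int main(void) {
        repeat(100000000L, noop);   /* fine with TCE; stack overflow without */
        printf("done\n");
        return 0;
    }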
There are three separate inventions in Scheme that work this way. Closures give you objects without an object construct, tail-call optimization gives you loops without looping constructs, and call/cc gives you threads, exceptions, and backtracking without thread, exception, or backtracking constructs.
(I mention Scheme because all three of these were, as far as I can tell, introduced in Scheme, and only later adopted by other functional languages like ML, although to be fair, TCO at least falls out naturally from combinator-graph reduction.)
In a sense, Scheme is sort of like a functional assembly language: there are lots of object systems, looping constructs, threads, and exception systems in Scheme, and they aren't compatible with each other. It's sort of like the situation with linked lists in C, where every library has its own linked-list type.
It's exactly the opposite of assembly language in another way, though. By making object instantiation and population implicit, both the compiler and the maintenance programmer have to do extra work to figure out what's an object and what's not, and what the object's fields are. The same is true of loops, threads, and exceptions. In assembly, instead, you have the problem of things being too explicit, thus losing the signal in the noise.
That's my take, anyway. I haven't spent that much time programming in either Scheme or assembly, although I did write an almost-Scheme compiler targeting assembly called Ur-Scheme.
The "assembly language" for ML would actually be the lambda calculus. Scheme isn't really defined unless you're talking about some standard, and the semantics of the language are probably just defined in English.
Hmm, it sounds like you're trying to argue with me, but I'm not clear on what you're saying — some quick responses — hope this helps:
① I did not claim Scheme was an "assembly language" for ML. I said that programming in, reading programs in, and compiling programs in Scheme was like programming in, reading programs in, and compiling programs in assembly language in some specific ways, and very unlike it in others. The relationship between Scheme and ML is that some of the central insights of Scheme were adopted by ML.
② ML defines evaluation order. The lambda calculus does not. Typical ML implementations compile to the assembly languages of actual processors. Can you clarify?
③ I think Scheme is sufficiently well defined for this discussion — it's a family of languages originating in some papers by Sussman and Steele in the 1970s, and continuing through the current R7RS work, including a number of compilers. Several of the standards define the semantics of the language symbolically, not just in English.
That's correct. It should be possible to have an explicit guarantee of TCO while maintaining the ability to write things that aren't trivially reducible to loops...
No, Scala's @tailrec annotation only guarantees that calls that can trivially be compiled to loops in fact are. You can't do better on a standard JVM without jumping through hoops (or, rather, trampolines).
Direct mutual recursion is somewhat rare (but useful for encoding state machines, for instance). However, using continuation passing style (and tail calls to a continuation argument) to prevent stack overflow is a common idiom in some functional languages (e.g. when mapping over a binary tree).
Erlang uses mutual TCO everywhere. All actors/processes are state machines with the function they're in being their current state. To switch states, you just tail-call a new function (which doesn't have to be from the same module you were in.)
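A hedged C sketch of that style (the parity example is mine, not Erlang's): each state is a function and each transition is a tail call, so the machine runs in constant stack only if those calls are eliminated.

    #include <stdio.h>

    void even_state(unsigned n);

    /* Two mutually tail-calling states deciding parity. */
    void odd_state(unsigned n) {
        if (n == 0) { puts("odd");  return; }
        even_state(n - 1);          /* state transition = tail call */
    }

    void even_state(unsigned n) {
        if (n == 0) { puts("even"); return; }
        odd_state(n - 1);           /* state transition = tail call */
    }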
It seems like part of the decision was reached because of limitations in LLVM[1]. This makes me a bit uneasy, but I guess it has to follow C to meet its goals.
No they can't, at least not without compromising on several points laid out in TFA. Mainly these two:
- Tail calls also "play badly" with assumptions in C tools, including platform ABIs and dynamic linking.
- Tail calls require a calling convention that is a performance hit relative to the C convention.
See other posts in this discussion for examples of the impacts of some alternate calling convention choices.
2) Some of the times they can do it, they do it when they control both the caller and the callee. In that case you can violate the ABI to your heart's content.
3) I don't really see TCE as an issue in Rust, since they seem to be going the RAII route and, as others have commented, RAII and TCE don't play well together (for the same reason a call at the end of a dynamic binding can't be TCE'd in common-lisp).
What is tail call optimization? How does it usually work? There's an implementation example on the Wikipedia page that answers this a little bit.
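The gist, as a minimal sketch of my own (not the Wikipedia code): when a call is the very last thing a function does, the caller's frame is dead, so the compiler can reuse it and turn the call into a jump. The two functions below are equivalent after that transformation.

    /* Tail-recursive: the recursive call is the last thing that happens. */
    long sum_to(long n, long acc) {
        if (n == 0) return acc;
        return sum_to(n - 1, acc + n);   /* becomes a jump under TCO */
    }

    /* What a TCO'ing compiler effectively turns it into: */
    long sum_to_loop(long n, long acc) {
        while (n != 0) { acc += n; n -= 1; }
        return acc;
    }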
And, of course, "Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO" by Guy Steele. Available here: http://library.readscheme.org/page1.html
I'm puzzled: if Rust and Clojure won't support TCO and they are both cool, why is there such bitter complaining every time it comes up that Python won't support TCO?
I can't speak for Clojure, but Rust's rationale for omitting TCO is based on sound engineering issues (as articulated) whereas Guido's sole reason for omitting it (as far as I've seen) is that you lose stack trace information, which is considered something of a red herring (as you can always selectively disable TCO for debugging, and there are ways of retaining stack information anyway.) Note that the original article here is quite disappointed that they can't include TCO, because it would be a valuable tool and allow for different, efficient styles of programming.
I can't answer fully for Clojure's reasons for not supporting TCO, but it will at least be fundamentally rooted in the fact that the JVM, as a platform, does NOT support TCO.
Scala is usually the language brought up as a counter argument here, but Scala has the same limitations as Clojure - the JVM can't do the TCO. Scala's compiler tries, with certain types of tail calls, to optimize via (I believe) a trampoline – it reduces those calls into a loop, so they are no longer a function call.
But, only certain types of calls (specifically, the last line of code in a recursive function must be a call to the hosting function) can be TCO in Scala. There is an annotation, scala.annotation.tailrec, which can be placed above a method you want to tail recurse.
@tailrec does not, however, "force" the compiler to do TCO – it simply forces compilation to fail if the method in question cannot be Tail Call Optimized. It's a developer hook for saying "I realize the compiler tries to do TCO where possible, but I require this method to be trampolined". If it can't be, you'll get a compilation error with details on why the TCO failed.
Yes, the x86 instruction that implements TCO is called JMP. The JVM doesn't have an equivalent instruction; you can only jump around inside a single method, not to the beginning of another method.
The JVM also doesn't (I'm fairly sure) have an indirect JMP instruction, which means you can't compile a polymorphic source-language method call into a JMP inside of a single generated method. Instead you have to use a call to a method. That means that tail-calls to functions passed as parameters (rather than functions that can be statically bound at compile time) can't use JMP.
I'm talking about the bytecode instructions standardized in the "JVM" specification. The "JVM" has to implement the semantics of the instructions found in your bytecode program, or your program won't run. Your bytecode program can only use bytecodes defined in the "JVM" specification, or it won't run on the "JVM".
Because the security manager needs to be able to introspect the stack. Avian isn't a JVM; it's "designed to provide a useful subset of Java's features" and can run some Java code.
It's actually possible to implement the stack inspection in a tail-call-friendly way (along the lines of Racket's continuation marks IIRC), though AFAIK nobody does it.
> Because the security manager needs to be able to introspect the stack.
Not allowing certain “optimizations” in security-sensitive contexts is perfectly fine. In fact, this is exactly what Avian, the CLR (and pretty much everyone else) is doing.
> Avian isn't a JVM; it's "designed to provide a useful subset of Java's features" and can run some Java code.
Now you're talking about legal aspects. Frankly, I'm not interested in discussing those.
Avian is a JVM for all practical purposes. If you disagree, please provide a test-case which runs on HotSpot but not on Avian.
JVM interop. You can't call a Scala function from Java unless it's somehow addressable as a Java method, so Scala implements functions in terms of Java methods. The JVM, for better or worse, is largely built around Java's object model - you can do your own weird stuff, but you won't easily be able to interop with Java code or benefit from HotSpot's optimizations.
Amusingly, "losing stack trace information" is among the reasons the JVM doesn't do TCO -- it needs to keep it around for the security manager to do its thing. Its especially annoying there because references are held by higher stack frames and prevent GC, and the destructor excuse isn't present.
> it needs to keep it around for the security manager to do its thing
Haha. As if the JVM needed proper tail calls to get owned every other week. :-)
Anyway, the discussion is pretty much moot, because "JVM" doesn't really refer to a specific implementation. If one cares about proper tail calls, use an implementation which supports it.
A debugging switch that, when enabled, turns a valid program into an invalid one is madness.
Python has decided to place very conservative requirements on what an implementation's stack must be capable of. This might indicate a certain lack of ambition but it's a justifiable engineering tradeoff.
There are better alternatives to said debugging switch for examining the stack of tail-call-optimized programs; cf. Clements and Felleisen's 2004 paper, "A Tail-Recursive Machine with Stack Inspection." The debugging switch is merely the simplest naïve option.
Essentially, your engineering tradeoff boils down to, "It's easier (for implementers) to omit TCO." That is actually a very reasonable and rational position, as simpler implementations are less likely to have bugs and have a lower creation cost. However, I still hold that the advantages of TCO outweigh the concern about its implementation. (Also, I do not believe that Python does place conservative requirements on an implementation's stack, as a conformant Python implementation must also implement generators; I regard this as roughly on par with TCO's complexity.)
Complexity and stack space usage are two very different things. CPython does indeed place (very!) conservative limits on stack depth, and it also implements generators. I haven't looked at the generator implementation, but it has a number of limits that make it easy to implement — the main one is that generators don't have their own stacks, so can't yield from inside a function they call, the way e.g. Lua coroutines or (effectively) Icon generators can.
I'm sorry, I made something of a terminological error. When I said 'complexity', I was referring to complexity of implementation, not algorithmic time or space complexity. My intention was to assert that implementing generators and implementing tail-call optimization are both endeavors of roughly the same order of engineering difficulty (as the mechanism used to implement TCO is quite simple) and not to compare the stack space required by each.
I also should clarify that at this point, it's likely not reasonable to expect Guido or the Python community to ever turn around and decide to implement TCO, as the current engineering effort required to add TCO to all active implementations of Python is much larger in comparison to the engineering required to add it at an earlier stage. I regard this as a failure of the language design from the start, but having made their decision I don't fault the Python community for sticking to it.
I was responding to your "I do not believe that Python does place conservative requirements on an implementation's stack", in response to bcoates's statement that it does, nothing about complexity. I was just trying to clarify (my interpretation of) what he said.
I believe that both C# iterator-functions and python generators are designed with enough limits that they can be implemented as syntax sugar: you lift the generator function's locals into an object and decompose the function body into a state-machine.
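A hedged C sketch of that lowering (the struct and names are invented for illustration): the generator's locals become fields, and the "yield" becomes a saved resume state.

    #include <stdbool.h>

    /* Equivalent of the Python generator:
         def gen(n):
             i = 0
             while i < n:
                 yield i
                 i += 1
       usage: range_gen g = { 0, 10, 0 }; int x;
              while (range_next(&g, &x)) { ... } */
    typedef struct {
        int i, n;      /* the generator's locals, lifted into an object */
        int state;     /* 0 = not started, 1 = suspended at the yield */
    } range_gen;

    bool range_next(range_gen *g, int *out) {
        if (g->state == 1) g->i++;   /* resume: finish the loop body */
        else               g->i = 0; /* first call: run the prologue */
        if (g->i < g->n) {
            *out = g->i;             /* the "yield" */
            g->state = 1;
            return true;
        }
        return false;                /* generator exhausted */
    }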
-Werror isn't a "debugging switch". It should be enabled at all times, for development builds and production builds.
If compilation warnings are given, then the responsible thing to do is to assume the program is invalid, at least in certain situations. Fix the code, avoid the warnings completely, and that's that.
There are remarkably few cases where compilation warnings can or should be ignored.
> Rust is supposed to be as fast as C/C++ and TCO will slow it down.
TCO doesn't slow programs down. If anything, it might even make them a tiny bit faster. But speed isn't the reason you'd want TCO: you want it as a stack optimization so that tail-recursive functions use constant stack space. Furthermore, although the C specification places no TCO requirement on implementors, it's actually a very reasonable optimization even in C. GCC actually does do TCO in some cases (I'm using 4.5.3, and tail-call elimination occurs with -O2 and higher).
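A quick way to see this for yourself (hypothetical file name; behavior depends on your GCC version and target):

    /* tail.c -- compile with "gcc -O2 -S tail.c" and the recursive
       call appears as a jmp in tail.s; at -O0 it stays a call. */
    long spin(long n) {
        return n == 0 ? 0 : spin(n - 1);   /* tail call */
    }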
> Clojure can't have TCO because the Java Virtual Machine doesn't support it. So they have a workaround (trampolines).
This is true. There was a time when TCO support was slated for Java 7. That time has come and gone; unofficially, it looks like it may make an appearance in Java 9 at the earliest.
> Python doesn't support TCO because... it screws up stack traces?
That's the cited reason, but it's horseshit. As others have pointed out on this thread, it's feasible to simply disable TCO when doing debugging if you need a stack trace. Even fancier, there are algorithms to recover the stack even after the calls have been eliminated (see Lua for example).
"TCO doesn't slow programs down. If anything, it might even make them a tiny bit faster."
Right, but I think they were saying that, in this case, because of dependencies out of the Rust team's control, they would have to use a different, slower calling convention to support proper tail calls.
Well, according to a quick perusal of the LLVM documentation, the only LLVM calling convention that allows TCO on all platforms seems to be fastcc, whose main purpose is to have calls that are as fast as possible, at the cost of not being compatible with the slower, but more common, calling conventions of C.
http://llvm.org/docs/LangRef.html#calling-conventions
In that case you can clone the function. Internal linkage calls can use the fastcc calling convention while you leave either a whole clone with the non-fastcc calling convention or a non-fastcc wrapper of the fastcc function for calls from external linkage.
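In C terms the shape would be something like this (regparm is a gcc/x86-32 attribute standing in here for a faster internal convention; it's purely illustrative, since the real mechanism discussed above is LLVM's fastcc):

    /* Internal clone: a faster, tail-call-friendly convention. */
    static __attribute__((regparm(3))) long sum_fast(long n, long acc) {
        return n == 0 ? acc : sum_fast(n - 1, acc + n);  /* internal tail call */
    }

    /* Exported wrapper: plain C ABI for external linkage. */
    long sum(long n) {
        return sum_fast(n, 0);
    }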
If I understand it correctly, the Rust issue is that tail calls are expensive in the context of LLVM and the calling convention that they are using. MLton (like Manticore, which I work on) uses a different code generation backend and calling convention entirely. And neither of us worry about "destructors" or, in general, things that happen at every return point.
Though, interestingly, in the context of a CPS-converted program, destructors would just be the next continuation called on the way to calling the procedure previously "returned to." Under our (Manticore) calling convention, we would just jump to each of those continuations and wouldn't have to grow stack (though in our case, it's keeping around heap frames, as we're not stack-based), and would have the same next allocation and instruction-level behavior as a tail call.
That said, if you think tail calls are hard to debug, CPS-converted programs would destroy most people's will to live. No stack backtraces, except in debug modes, with 5-10x performance penalties.
If you don't have any interesting control operators like call/cc, prompt/control, etc. then it should be possible to track stack-like control patterns after CPS conversion through the compiler and through DWARF (or a custom debugging format). Unless I am missing something it seems that the control operators are the problem, not CPS.
Certainly, that's true! As I mentioned in my other far-too-wordy response, though, if you just CPS convert and run a program, it will be far too slow (5x the funcalls != 5x the fun...). So, it's the combination of CPS conversion and later optimizations --- paired with whole-program compilation --- that make it so hard to debug.
I never thought tail calls were hard to debug, but can you keep text offsets to the original tokens through the CPS transform? Usually with typed languages that's enough information to figure out what went wrong, or at least what to log.
Also, I'm not sure why the entire library needs the same calling convention when it's such bad form to expose all of your functions anyway.
You can keep the original tokens. The challenge with debugging such programs is really the integration with the rest of the optimizations. CPS transformation turns your program into a ton of tiny functions. Then, any reasonable optimizer will do a huge amount of inlining. Attempting to do trace-based debugging bounces you from one subexpression in one function to another and so forth. Especially when you consider that Manticore (like MLton) is a whole-program optimizing compiler, you also get little chunks of library functions we've duplicated like map, foldr, etc.
Worse, we liberally remove dead code, including unused arguments, branches of conditionals we can guarantee are never taken, etc. And since it's a research compiler, there isn't really a "-O0". You can turn off individual optimization passes, but guessing what combination led to something getting inlined or not requires some careful study.
As to why they've chosen what they have with LLVM, I don't know enough to judge. Doing something like ML compilers do, with a custom calling convention internally and then a C convention externally, is pretty expensive, and ends up putting a massive penalty on C calls (oh, you want to call C? let me move the GC pointer out of the way, set up a fake stack frame for you, etc.). Further, doing it makes register allocation more of a challenge. Many ML compilers like to "pin" certain registers with their own values (e.g., the GC local heap limit pointer) so that we can write custom little blobs of assembly that get emitted in the right places without worrying about substituting in what the _real_ heap limit pointer is, etc. That pinning interacts badly with LLVM, as you can see from the Haskell/LLVM master's thesis: they basically had to give up on it, add their pinned values as extra arguments, and stop emitting code that relied on them being in a sane, typical place.
Hope that helps! I've been thinking about these problems and talking only with other people who do the same for so many years I'm starting to forget which parts are and aren't obvious (or even published/documented).
I guess another problem would be that people often complain about how slow these kinds of compilers can be, and if you keep around enough to reconstruct good error messages it would probably make it that much slower.
Most of the time in whole-program compilers such as Manticore and MLton (> 60%, though it depends on the particular program) is spent in the combination of parsing the input source files and running GCC over the generated assembly to produce the binary.
Many compilers (including ours) did not keep around that extra information because even in 2007 (when we started) RAM was a bit scarce. The cost associated with keeping that info around between phases is primarily in the extra working set hit and the resulting GC and especially memory paging issues. Given that it's not unreasonable to expect people to have > 1GB of available physical RAM for the compilation process these days, that's something we should consider changing. Though at this point threading that info through the compiler is probably a couple weeks of dedicated effort to get working correctly.
Nice! I have had envious eyes for Manticore as a modern replacement for Concurrent ML. Are you guys intending for it to be available for general use or more of a research language?
It's still a research language today. Honestly, we would need to make a fairly major push --- which would freeze all research work and publications --- to get to a point where it is ready for general use. We're working on some grant proposals along those lines, because frankly without NSF support (in the context of making it ready for wider use in classes such as CMU's new intro curriculum), it's unlikely we can justify the massive amount of work required to take it from something that PL experts can use to something that, say, MLton or SML/NJ users could pick up and use.
I hate to sound mercenary, but frankly if we can't significantly increase the project from the current number of developers (myself and two part-time undergrads), it's difficult to see how we can make it more generally available. Especially when I lose a couple of months of work every time we double the number of processors in a server-class machine, as there's always either a GC bug or some scalability issue still lurking in the runtime...
Python doesn't optimize any tail calls at all. Rust optimizes sibling calls, which includes all self-tail calls. It's only tail calls to a different function that Rust doesn't eliminate.
Who says they are both cool? HN isn't a single person. It is quite possible that group A thinks rust and clojure are "cool", and group B thinks that languages without TCO (including python, clojure and rust) are bad.
The Y Combinator is something important in the theory of functional programming. It takes one function as argument and applies it to itself ad infinitum. Together with lazy evaluation this is how you can implement recursion in a language which does not natively support it, namely (lazy) lambda calculus.
Actually, TCO and Y are not really related with each other apart from the fact that you usually learn about them in the same course about functional programming.
If you write your loops with the Y-combinator, you need tail-call elimination to keep them from overflowing the stack when they run for enough iterations, because the way your function f goes on to the next iteration of itself is by calling the (Y f) that was passed as its first argument.
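In C clothing, a loose sketch of that shape (C has no first-class Y, so the (Y f) argument is modeled as a struct handle; all names are mine): the body advances to its next iteration by tail-calling through the handle it was given, and that is exactly the call that must be eliminated.

    struct self;
    typedef long (*body_fn)(struct self *s, long n, long acc);
    struct self { body_fn body; };   /* stands in for the (Y f) argument */

    long sum_body(struct self *s, long n, long acc) {
        if (n == 0) return acc;
        return s->body(s, n - 1, acc + n);  /* next iteration = tail call through the handle */
    }

    long run(body_fn f, long n) {    /* plays the role of Y */
        struct self s = { f };
        return f(&s, n, 0);
    }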
You can write a lambda calculus interpreter which does beta-reduction. In this case, there is no stack which could overflow. Just a lambda expression, which is reduced to another expression.
The Y combinator is a way to express recursion in lambda calculus. Tail call optimization (turning certain function calls into GOTOs) is the traditional way to keep from overflowing the control stack in languages based on lambda calculus.
The problem seems to be that you can't do tail calls with the C calling convention.
For example, let's say a 0-argument function tail calls a 1-argument one: the 1-argument function expects an argument on the stack, so the 0-argument function must push one.
However, when the 1-argument function returns, the argument will still be on the stack, because with the C calling convention the caller removes arguments from the stack.
But the caller called a 0-argument function, so he won't remove any argument, and thus the stack is now misaligned due to the argument left over there, which will crash the program soon.
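Traced out in code (cdecl on x86-32; hypothetical functions):

    int g(int x);        /* callee expects one argument on the stack */

    int f(void) {        /* f itself takes no arguments */
        return g(42);    /* as a tail call this would become "push 42; jmp g".
                            g then returns directly to f's caller, which
                            believes it called a 0-argument function and pops
                            nothing, so the 42 is stranded and the stack
                            pointer is off by one slot. */
    }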
However, just switching to the Pascal/stdcall convention, where the callee removes arguments from the stack, should just work; it might be slightly slower, but on all modern architectures (i.e. those that aren't x86-32) parameters are going to be passed in registers anyway for most functions, so it shouldn't matter.
The problem with that is that non-varargs functions cannot be called with a varargs prototype; this is an issue for K&R C, which allows calling undeclared functions, but isn't an issue for Rust.
So, I'm not quite sure why Rust doesn't just switch calling conventions and support tail calls.
Seriously though, why not just use a rec keyword and then disallow any of the things that don't play nicely when you're in the recursive function? If you really wanted to be cool you could put those things right into the type system.
As they mentioned in the original post, a big problem is that TCO does not play nice with two other important features they want.
1. deterministic destructors that run at the end of functions (therefore making things that look like tail calls not actually tail calls) and
2. binary compatibility with C and C++ libraries and tools (they say that tail recursion doesn't let you use the C calling conventions that these tools and libraries expect you to use)
There is no point in allowing tail recursion in restricted contexts if you can't use these restricted functions to do the sort of stuff Rust was actually made to do.
> 1. deterministic destructors that run at the end of functions (therefore making things that look like tail calls not actually tail calls) and
Couldn't owned boxes (~) passed through the jump transfer their ownership to the next frame (so they'll be destructed by whoever finally does return), and everything else just get destructed before the jump?
> binary compatibility with C and C++ libraries and tools (they say that tail recursion doesn't let you use the C calling conventions that these tools and libraries expect you to use)
I've read variations of this comment about Rust C++ compatibility a few times, but haven't managed to find a source. Any references you could point me to?
AFAIK there's no plan to support any sort of C++ interop natively in Rust. However, it should work just fine if your C++ code exposes a C-compatible interface. Servo has to be able to call into SpiderMonkey somehow.
g++ vs itself, or g++ vs msvc? g++ uses Itanium C++ ABI on at least linux/mac/mingw, and the mangling and vtable layouts seem to be the same (I'm very interested in any information to the contrary - though I realize vtables are only part of the story).
Itanic may be an exception, but there have been at least two backwards-incompatible C++ ABI changes in G++ over the years. Casting to superclass in the presence of multiple inheritance and exception handling have been among the changes. Maybe they should change vtables too, I don't know.
Figured as much - it would be hard to take any language seriously without C interop. But that's a long way from "C++ binary compatibility." I've heard this several times, so I was hoping it might be true.
In any case, I'm not sure that any official Rust source has ever tried to claim "C++ binary compatibility". Perhaps this language is a result of people misinterpreting the fact that Rust lays out structs in memory in the same fashion as C and C++.
at what point did people start to use the word "pragmatic" for "opinionated"? (And I've heard it a lot lately...) I'm OK with opinionated technologies, but stop calling them "pragmatic". When the implementer of a language or technology makes a choice that restricts how its users can do some things, it makes an opinionated decision, that just happens to be pragmatic in the use-context he has thought of, but may not be pragmatic at all for the use-context that some technology user imagines. Like a language feature that may be hard-to-impossible to implement, but that, when done properly (and it only has to be done once), would simplify the lives of a large group of programmers that just want to do things in a certain way.
>at what point did people start to use the word "pragmatic" for "opinionated"?
I don't think people are doing that. It's just that being pragmatic also makes you opinionated by necessity.
Actually that would be a very good definition: being pragmatic means being "opinionated by necessity" (as opposed to opinionated by other concerns, e.g. pureness, advancing the state of the art, etc.). So being pragmatic is a subset of being opinionated, not a synonym.
>when the implementer of a language or technology makes a choice that restricts how its users can do some things, it makes an opinionated decision, that just happens to be pragmatic in the use-context he has thought of, but may not be pragmatic at all for the use-context that some technology user imagines
A language has thousands (millions) of users and every one can imagine whatever use-context. Not catering to all of them doesn't make you opinionated: merely pragmatic. Nobody has the time and the means to cater to all possible use-contexts. And we all know that adding too many things can ruin a language, over-complicate it and such. So that's another pragmatic concern.
>like a language feature that may be hard-to-impossible to implement but that when done properly (and it only has to be done once) would simplify the lives of a large group of programmers that just want to do things in a certain way.
Sounds like a very horrible feature to build into a language. Especially when you are starting out. If it's "hard-to-impossible to implement" then you are wasting tons of resources to something that might very well not pan out in the end.
To reject this is not opinionated in the bad way. It's merely being pragmatic again, ie understanding that you have to prioritize things, and not go on a wild goose chase.
All words are empty until we put meaning on them. Go and read what they are trying to achieve and how, and then they won't be empty anymore.
>Usable and pragmatic are purely contextual. Is rust faster than Haskell?
It's not highly optimized yet, as it's in a pre-alpha stage. But its goal is to be faster than Haskell, and close to C/C++/Ada speed.
Usable and pragmatic means that it should work for their goals, which are very real and tangible themselves: to use it as a compiled language to create a fast, parallel and secure web browser engine.
They don't want to pile on academic concepts, programming features, or compiler tricks just to be "cool", "interesting", or "cutting edge". They don't even care if they would be "nice to have". They care about what helps their goals, and what can be implemented without overcomplicating things.
If Rust cannot become a language in which it's able to write a fast (faster than the currently available), parallel (more parallel than the currently available) and safe (safer than the currently available) browser engine, then it would have failed on its targets.
My point above about Go (but put somewhat trollishly, I admit) was that if Rust forsakes the things which make functional programming great, what about it is compelling? I was under the impression that it was supposed to be the functional systems programming alternative.
I think you're mixing up tail recursion with general recursion. Rust absolutely supports recursion: the implementation uses segmented stacks to allow for stacks to grow dynamically. Tail call support is about not having the stack grow but only in the subset of recursive calls that are in tail position.
So support for tail calls is not about safety but expressiveness, and Rust chooses to express iterative algorithms through other means, such as the iteration protocol that's built on top of higher-order functions.
EDIT: Maybe I'm misreading you and you only mean the imperative nature of loops forces you to use mutation. Rust does not eschew mutation altogether. Even as a functional programmer myself, I'd argue pretty emphatically that mutation, particularly of local variables, is not a grave safety concern.
What do you mean? Combinators are great (even if implemented imperatively) until you really care about performance and isn't that what this rust discussion is all about.
I mean that 'safety' means a wide variety of things, and you're conflating the PLT feature with the more general reason why that feature was needed.
For example, above, you mention how TCO lets you use first-class functions to help you eliminate mutability. It's the elimination of mutability that's the safety here. Rust allows you to eliminate mutability through other means.
Their specific performance-related target is to have predictable, low worst-case latencies. In throughput-style loads, they might well lose to Haskell most of the time.
For some use cases (games, web browsers, etc) the first goal is much more important than the second.
Not having TCO might lead to a much bigger problem than not hitting a performance target for large numbers of loop iterations. I'd rather have a program that is slightly slower and that works correctly for all input than one that has a built-in (but invisible and hard to test for) hard limit.
I assume you mean that without TCO the stack might overflow for certain inputs and it is easy to miss those cases during testing.
Is that actually a real problem? I never hear C/C++/Java/Python/Ruby/Javascript/D/Go/Clojure programmers complain about such bugs. On a Linux/Windows/OS X system the stacks are big enough to practically never hit such bugs. Whenever I get a stack overflow, I coded an infinite loop, which is actually easier to find without TCO, because the program is terminated.
C/C++ etc don't have that problem because they have loops.
TCO is a way to turn recursion into a loop. If every situation where you would elegantly use recursion gave you real recursion and all its overhead, you'd indeed end up overflowing the stack. For a C/C++ etc. program it would not matter whether your loop iterator ran to 10 or 10,000,000,000; the program would just run a little longer. The Rust/Clojure program (coded up without knowledge of the underlying mechanisms, so 'naively') would likely run out of memory. With TCO such a naive implementation would run just fine. 10,000,000,000 is merely large, not infinite, and should, all other things being equal, not come with a huge memory penalty if all you're doing is writing elegant code in the most applicable idiom for a certain language.
I can't speak for Clojure, but the idiomatic Rust code that I've seen does tend toward C-style iteration (though that iteration is discreetly accomplished via closures and higher-order functions) rather than Lisp-style recursion. So I wouldn't expect recursive memory exhaustion to be a common concern in idiomatic Rust, though that's obviously no consolation if you were hoping to code in a recursive style.
Though perhaps you'll be happy to hear that Rust stacks are growable (Go-style) rather than fixed, so stacks are typically small and running out of stack is (afaict) theoretically less of a concern than in C/C++.
Yes, but that may very well be because it does not do TCO... What's idiomatic in a language is usually the result of how efficiently that language implements certain constructs.
Right, because any destructor would break your ability to do TCO assuming that it runs before the tail call... So as soon as you use an object mechanism that will invoke a destructor at some point your ability to use TCO is gone.
Interesting, isn't there a possible transformation where you would be able to move the constructor/destructor pairs out of the generated loop construct without breaking the function?
After all, a tail call is just a fancy goto under the hood.