Forth is kind of weak at dealing with value types of unknown size. For example, suppose you're writing a cross-compiler, and an address on the target machine might take one or two cells on the host machine depending on the host bitness. Now suppose you need to accept a (host-machine) cell and a target-machine address (e.g. an opcode and an immediate) and manipulate them a bit. Trivial in C, definitely possible in Forth, but supremely annoying and the elegance kind of falls apart.
Basically dead. The main motivation would be to make it easier to use variably modified types in function parameters, where the (length) identifier is declared after the variably modified type, as in
> void foo(int a[m][m], int m)
Currently you can only do:
> void foo(int m, int a[m][m])
The holy grail is being able to update the prototypes of functions like snprintf to something like:
> int snprintf(char buf[bufsiz], size_t bufsiz, const char *, ...);
However, array pointer decay means that foo above is actually:
> void foo(int (*a)[m], int m)
Likewise, the snprintf example above would be little different than the current definition.
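A minimal sketch of that decay, assuming a C99 compiler with VLA support (the name row_bytes is made up for illustration): the outer dimension is dropped from the parameter's type, and only the row type still carries m.

```c
#include <assert.h>
#include <stddef.h>

/* After adjustment this is the same function type as the definition below. */
size_t row_bytes(int m, int (*a)[m]);

/* Size parameter first, so `m` is in scope when `a` is declared. */
size_t row_bytes(int m, int a[m][m])
{
    /* `a` is really `int (*a)[m]`: the outer [m] is gone, the row type is not. */
    assert(sizeof a[0] == sizeof(int) * (size_t)m);
    return sizeof a[0];
}
```

Calling row_bytes(3, grid) with an int grid[3][3] should return 3 * sizeof(int); nothing in the parameter's type records that there are 3 rows.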
There's related syntax, like
> void foo(int m, int a[static m])
But a is still just a pointer, and while it can help some static analyzers to detect mismatched buffer size arguments at the call site, the extent of the analysis is very limited as decay semantics effectively prevent tracing the propagation of buffer sizes across call chains, even statically.
There's no active proposal at the moment to make it possible to pass VM arrays (or rather, array references) directly to functions--you can only pass pointers to VM array types. That actually works (sizeof *a == sizeof (int) * m when declaring int (*a)[m] in the prototype), but the code in the function body becomes very stilted with all the syntactical dereferencing--and it's just syntactical as the same code is generated for a function parameter of `int (*a)[m]` as for `int *a` (underneath it's the same pointer value rather than an extra level of memory indirection). There are older proposals but they all lost steam because there aren't any existing implementation examples in any major production C compilers. Without that ability, the value of forward declarations is greatly diminished. Because passing VM array types to functions already requires significant refactoring, most of WG14 felt it wasn't worth the risk of adopting GCC's syntax when everybody could (and should?) just start declaring size parameters before their respective buffer parameters in new code.
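Roughly what the pointer-to-VM-array style looks like, as a sketch assuming C99 VLA support (sum_vm is a hypothetical name, not from any proposal):

```c
#include <assert.h>
#include <stddef.h>

/* `a` points to a whole array of `m` ints, so the length travels with the type. */
long sum_vm(size_t m, int (*a)[m])
{
    assert(sizeof *a == sizeof(int) * m);   /* the identity mentioned above */

    long total = 0;
    for (size_t i = 0; i < sizeof *a / sizeof **a; i++)
        total += (*a)[i];                   /* the extra syntactic dereference */
    return total;
}
```

The caller has to pass the address of the array, e.g. sum_vm(4, &v) for an int v[4], which is part of the refactoring cost.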
I hope it is not "basically" dead. I just resubmitted it at the request of several people.
And yes, for new APIs you could just change the order, but it does help also with legacy APIs. It does even when not using pointers to arrays: https://godbolt.org/z/TM5Mn95qK (I agree that new APIs should pass a pointer to a VLA).
(edited because I am agreeing with most of what you said)
> everybody could (and should?) just start declaring size parameters before their respective buffer parameters in new code
I know that was a common opinion pre-C23, but it feels like the committee trying to reshape the world to their desires (and their designs). It's a longstanding convention that C APIs accept (address, length) pairs in that order. So changing that will already get you a score of -4 on the Hard to Misuse List[1], for "Follow common convention and you'll get it wrong". (The sole old exception in the standard is the signature of main(), but that's somewhat vindicated by the fact that nobody really needs to call main(); there is a new exception in the standard in the form of Meneide's conversion APIs[2], which I seriously dislike for that reason.)
The reason I was asking is that 'uecker said it was requested at the committee draft stage for C23 by some of the national standards orgs. That's already ancient history of course, but I hoped the idea itself was still alive, specifically because I don't want to end up in the world where half of C APIs are (address, length) and half are (length, address), when the former is one of the few C conventions most everyone agrees on currently.
You can also integrate with AddressSanitizer to some extent, look into[1] ASAN_{UN,}POISON_MEMORY_REGION from <sanitizer/asan_interface.h> or the underlying intrinsics __asan_{un,}poison_memory_region.
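A rough sketch of how those macros might be used in a bump allocator. The arena itself (arena_buf, arena_reset, arena_alloc) is invented for illustration, and the fallback definitions are an assumption for toolchains that don't ship the header; without -fsanitize=address the macros do nothing.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<sanitizer/asan_interface.h>)
#    include <sanitizer/asan_interface.h>
#  endif
#endif
#ifndef ASAN_POISON_MEMORY_REGION   /* header unavailable: fall back to no-ops */
#  define ASAN_POISON_MEMORY_REGION(addr, size)   ((void)(addr), (void)(size))
#  define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#endif

enum { ARENA_SIZE = 4096, ARENA_ALIGN = 16 };

static _Alignas(ARENA_ALIGN) unsigned char arena_buf[ARENA_SIZE];
static size_t arena_used;

/* Drop every allocation and mark the whole arena as off limits. */
void arena_reset(void)
{
    arena_used = 0;
    ASAN_POISON_MEMORY_REGION(arena_buf, sizeof arena_buf);
}

/* Bump-allocate `size` bytes; only those bytes become accessible again. */
void *arena_alloc(size_t size)
{
    assert(arena_used <= ARENA_SIZE);
    size_t rounded = (size + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (rounded < size || rounded > ARENA_SIZE - arena_used)
        return NULL;                        /* overflow or out of space */

    void *p = arena_buf + arena_used;
    arena_used += rounded;
    ASAN_UNPOISON_MEMORY_REGION(p, size);   /* padding stays poisoned */
    return p;
}
```

Call arena_reset() once before the first allocation so the arena starts out poisoned. Under ASan, touching memory past an allocation or after a reset should then be reported instead of silently working.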
(Not a lawyer.) To have rights to a trademark, you have to use it in, well, trade. It’s not enough for a term to refer to a specific thing in normal usage, you must have a widely recognized claim on that thing. It should be in the customer’s interest that your thing not be confusable with thing-alikes that others may offer, specifically by having an exclusive right to be sold as the thing. And Oracle demonstrably does not deal in “JavaScript” any more than many many other companies and individuals do.
The only JavaScript offering from Oracle that I know of is GraalVM[0]. It's funny though - they use "JavaScript" and "ECMAScript" interchangeably in their docs. They call it "A high-performance embeddable JavaScript runtime for Java" but then tout it as "ECMAScript Compliant", basically acknowledging that JavaScript is defined by ECMAScript specs and the terms mean the same thing.
Them calling it ECMAScript in some instances means that it follows the actual ECMA spec for ECMAScript (what everyone calls JavaScript historically). Them calling it JavaScript implies it could be their flavor, or something like Node and not necessarily strictly ECMAScript, at least that'd be the reason I'd use it interchangeably.
This is why I think Deno has a solid chance here. Sun may have filed for the trademark, but it’s not clear to me how much it has been used by Oracle. I also think this is why this step is likely the beginning of litigation, not the end. With Oracle not voluntarily withdrawing the trademark, it allows the rest of the process to invalidate the trademark to begin.
Not only that, but other companies' not-technically-JavaScript products are widely known as JavaScript, and have been for close to three decades now (Microsoft's JScript was released in 1996).
The relationship between those two things is more like the relationship between burgers and fries: two things that some people may think belong together, but having a claim on one surely has no connection to having a claim on the other.
Or like saying McDonald's shouldn't have a trademark on "Big Mac" outside of beef burgers, because they don't regularly deal in "Big Macs" made out of chicken.
And though Sun was undeniably more worthy of sympathy than Oracle is, Sun’s original claim on the trademark seems just as bogus as Oracle’s current one.
Maybe. I'm not a lawyer, let alone an IP lawyer, but Netscape creating a programming language called "JavaScript" seems like the kind of thing that would be likely to cause confusion in the marketplace. Netscape explicitly chose the name to latch on to the popularity of Java at the time. It doesn't seem unreasonable to me for the Sun of 1997 to want to protect their interest in the Java name by licensing it to Netscape but controlling the underlying trademark.
Netscape didn't just try to "latch onto" the popularity of Java.
Netscape _and_ Sun, together, called it JavaScript. The point was that the renamed language had rudimentary bindings that you could use to connect functionality in an HTML page with the applets embedded in it (which were effectively silo'd in HotJava)
I mean, yes, it’s a reasonable thing for Sun to want, but I (also not a lawyer) don’t see how it’s within the USPTO’s purview to grant. Sun/Oracle can claim others’ use of “JavaScript” is confusing others with regard to what “Java” is—indeed as you say this seems to have been Netscape’s explicit intent with the name—but that means Sun/Oracle may have some control over the use of the term “JavaScript” through their very real dealings in Java, not their nonexistent ones in JavaScript. You don’t get to squat trademarks, you have to use them. (Unless you have enough money to outlast any challenger in a legal battle. I guess we’ll see how that goes.)
On the contrary, I think Sun's claim back in the 90s was pretty strong. Only one company was making JavaScript and they were doing so under license.
Fast forward to today and there are many people making "JavaScript" that don't seem to have a relationship to Oracle, and nobody has been trying to defend the mark for a long time. Seems pretty genericized to me.
“Hygiene” in the context of macro systems refers to the user’s code and the macro’s inserted code being unable to capture each other’s variables (either at all or without explicit action on part of the macro author). If, say, you’re writing a macro and your generated code declares a variable called ‘x’ for its own purposes, you most probably don’t want that variable to interfere with a chunk of user’s code you received that uses an ‘x’ from an enclosing scope, even if naïvely the user’s ‘x’ is shadowed by the macro’s ‘x’ at the insertion point of the chunk.
It’s possible but tedious and error-prone to avoid this problem by hand by generating unique identifier names for all macro-defined runtime variables (this usually goes by the Lisp name GENSYM). But what you actually want, arguably, is an extended notion of lexical scope where it also applies to the macro’s text and macro user’s program as written instead of the macroexpanded output, so the macro’s and user’s variables can’t interfere with each other simply because they appear in completely different places of the program—again, as written, not as macroexpanded. That’s possible to implement, and many Scheme implementations do it for example, but it’s tricky. And it becomes less clear-cut what this even means when the macro is allowed to peer into the user’s code and change pieces inside.
(Sorry for the lack of examples; I don’t know enough to write one in Zig, and I’m not sure giving one in Scheme would be helpful.)
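For what it's worth, here is a rough illustration in C rather than Zig or Scheme, since the C preprocessor is a familiar non-hygienic macro system. All names are made up for the sketch; the second macro is the by-hand renaming workaround, not real hygiene.

```c
#include <assert.h>

/* Unhygienic: the macro's loop counter `i` is visible to the caller's `expr`. */
#define SUM3_UNHYGIENIC(out, expr)                  \
    do {                                            \
        int i;                                      \
        (out) = 0;                                  \
        for (i = 0; i < 3; i++) (out) += (expr);    \
    } while (0)

/* Manual "gensym": pick a name the caller is unlikely to use. */
#define SUM3_RENAMED(out, expr)                                     \
    do {                                                            \
        int sum3_counter_;                                          \
        (out) = 0;                                                  \
        for (sum3_counter_ = 0; sum3_counter_ < 3; sum3_counter_++) \
            (out) += (expr);                                        \
    } while (0)

int sum3_captured(int i)
{
    int total;
    SUM3_UNHYGIENIC(total, i);  /* `i` here resolves to the macro's counter */
    return total;               /* 0 + 1 + 2, whatever the argument was */
}

int sum3_renamed(int i)
{
    int total;
    SUM3_RENAMED(total, i);     /* `i` is the parameter, as the caller intended */
    return total;               /* 3 * i */
}

void hygiene_demo(void)
{
    assert(sum3_captured(10) == 3);
    assert(sum3_renamed(10) == 30);
}
```

A hygienic system would make the first macro behave like the second without the author having to invent an unlikely name.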
Zig comptime is not a macro system, and you can't really generate code in a way that makes hygiene a thing to worry about (there is no AST manipulation, you can't "create variables"). The only sort of codegen you can do is via explicit conditionals (switch, if) or loops conditioned on compile-time accessible values.
That's still powerful; you could probably build a compile-time ABNF parser, for example.
Surely there's a way to generate code by manipulating an AST structure? Is there some reason this can't be done in Zig or is it just that no one has bothered?
Doing it this way is more verbose but sidesteps all hygiene issues.
Zig lets you inspect type info (including, eg, function signatures), but it doesn't give you raw access to the AST. There's no way to access the ast of the body of the function. As highlighted by view 0 in my article, I consider this a good thing. Zig code can be read without consideration for which pieces are comptime or not, something that heavy AST manipulation loses.
Though, if you really wanted to do stupid things, you could use @embedFile to load a Zig source file, then use the Zig compiler's tokenizer/ast parser (which are in the standard library) to parse that file into an AST. Don't do that, but you could.
Zig disallows ALL shadowing (basically variable name collisions where in the absence of the second variable declaration the first declaration would be reachable by the same identifier name).
Generating a text file via a writer with the intent to compile it as source code is no worse in Zig than it is in any other language out there. If that's what you want to do with your life, go ahead.
Because that’s the original problem statement for Algol! As in, we’ve been publishing all that pseudocode for quite a bit and it seems like conventions have emerged, let’s formalize them and program in that. ... People were markedly more naïve back then, but then that’s sometimes what it takes.
That’s usually something your ABI will describe in fairly precise terms, though if (as in your example) you want non-naturally-aligned fields, you may indeed want to both use a packed struct and prepare for alignment faults on less-tolerant architectures.
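A small sketch of the difference, assuming GCC or Clang (`__attribute__((packed))` is a compiler extension, not standard C) and a made-up two-field record:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout chosen by the ABI: `value` lands on a naturally aligned offset. */
struct natural_rec { uint8_t tag; uint32_t value; };

/* Packed layout: no padding, so `value` sits at an odd offset. */
struct __attribute__((packed)) packed_rec { uint8_t tag; uint32_t value; };

static_assert(offsetof(struct natural_rec, value) % _Alignof(uint32_t) == 0,
              "ABI layout keeps value naturally aligned");
static_assert(offsetof(struct packed_rec, value) == 1,
              "packed layout removes the padding");

/* Read the misaligned field from raw bytes without an unaligned load. */
uint32_t read_packed_value(const unsigned char *bytes)
{
    uint32_t v;
    memcpy(&v, bytes + offsetof(struct packed_rec, value), sizeof v);
    return v;
}
```

Going through memcpy, rather than taking a uint32_t pointer into the packed struct, is one way to stay safe on architectures that fault on misaligned accesses.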
Rec-descent absolutely does work (source: wrote a working parser), but it's a bit annoying to make it build a syntax tree. Making it build a prefix encoding of the declaration is much easier.
If you have to, you can make a syntax tree work, too, but you'll have to thread through the declarator parser a double (triple?) pointer for the hole in the tree where the next part goes, and at the end plug it with the specifiers-and-qualifiers part you parsed before that. Or at least that's what I had working before I gave up and switched to a prefix code. It'd probably pay to check the LCC book (Fraser & Hanson, A Retargetable C Compiler: Design and Implementation) to see what they do there, because their type representation is in fact a hash-consed tree. (The LuaJIT FFI parser needs some sort of magic fix-up pass at the end that I didn't much like, but I don't remember the specifics.)
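To make the prefix-encoding idea concrete, here is a toy sketch in the spirit of the classic K&R dcl exercise, not the parser described above. It assumes a single-word base type, requires a declared name (no abstract declarators), and copies array sizes and parameter lists verbatim; describe and the other names are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct parser {
    const char *s;      /* remaining input */
    char name[32];      /* declared identifier */
    char out[256];      /* prefix description built so far */
};

static void skip_ws(struct parser *p)
{
    while (isspace((unsigned char)*p->s)) p->s++;
}

static void emit(struct parser *p, const char *text, size_t n)
{
    size_t used = strlen(p->out);
    size_t room = sizeof p->out - 1 - used;
    if (n > room) n = room;             /* truncate rather than overflow */
    memcpy(p->out + used, text, n);
    p->out[used + n] = '\0';
}

static void word(struct parser *p, char *dst, size_t cap)
{
    size_t n = 0;
    skip_ws(p);
    while ((isalnum((unsigned char)*p->s) || *p->s == '_') && n < cap - 1)
        dst[n++] = *p->s++;
    dst[n] = '\0';
}

static void declarator(struct parser *p);

static void direct_declarator(struct parser *p)
{
    skip_ws(p);
    if (*p->s == '(') {                 /* grouping parentheses */
        p->s++;
        declarator(p);
        skip_ws(p);
        if (*p->s == ')') p->s++;
    } else {
        word(p, p->name, sizeof p->name);
    }
    for (;;) {                          /* suffixes bind tighter than '*' */
        skip_ws(p);
        if (*p->s != '[' && *p->s != '(') break;
        char open = *p->s, close = (open == '[') ? ']' : ')';
        const char *start = ++p->s;
        for (int depth = 1; *p->s; p->s++) {
            if (*p->s == open) depth++;
            else if (*p->s == close && --depth == 0) break;
        }
        const char *head = (open == '[') ? "array[" : "function(";
        const char *tail = (open == '[') ? "] of " : ") returning ";
        emit(p, head, strlen(head));
        emit(p, start, (size_t)(p->s - start));
        emit(p, tail, strlen(tail));
        if (*p->s == close) p->s++;
    }
}

static void declarator(struct parser *p)
{
    int stars = 0;
    skip_ws(p);
    while (*p->s == '*') { stars++; p->s++; skip_ws(p); }
    direct_declarator(p);
    while (stars-- > 0) emit(p, "pointer to ", strlen("pointer to "));
}

/* Writes "name: <prefix description><base type>" into out. */
int describe(const char *decl, char *out, size_t outsz)
{
    struct parser p = { .s = decl };
    char base[32];
    assert(decl != NULL && out != NULL && outsz > 0);
    word(&p, base, sizeof base);
    declarator(&p);
    return snprintf(out, outsz, "%s: %s%s", p.name, p.out, base);
}
```

For 'int (*(p[5]))(char)' this should produce "p: array[5] of pointer to function(char) returning int". The output comes out outermost-constructor first without any tree, because each piece is emitted on the way back out of the recursion.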
I think the most straightforward recursive-descent approach is just to parse the expression part of the type as an expression and then turn it inside out to get the type. So if you have e.g. 'int (*(p[5]))(char)', then your parse tree is something like
declaration
  int
  function call
    *
      []
        p
        5
    char
and you need to turn that inside out to get something like
array
  pointer to
    function
      int
      (char)
This way you have a simple expression parser combined with a simple recursive tree transformation function. (The tree transformation took about 50 LOC when I implemented it for a toy parser.)
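A sketch of what such an inside-out step might look like, not the toy parser's actual code. It assumes the declarator expression is a simple chain of unary nodes (so a loop stands in for the recursion) and that the caller supplies a pool with one type node per operator; every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum expr_kind { EXPR_NAME, EXPR_DEREF, EXPR_INDEX, EXPR_CALL };
struct expr {
    enum expr_kind kind;
    const char *text;            /* name, array size, or parameter list */
    const struct expr *operand;  /* NULL for EXPR_NAME */
};

enum type_kind { TYPE_BASE, TYPE_POINTER, TYPE_ARRAY, TYPE_FUNCTION };
struct type {
    enum type_kind kind;
    const char *text;
    const struct type *inner;    /* element, pointee, or return type */
};

/* Walk from the outermost operator down to the name. Each operator wraps
 * the type built so far, so the outermost operator ends up innermost. */
const struct type *invert(const struct expr *e, const struct type *base,
                          struct type *pool, const char **name)
{
    const struct type *t = base;
    assert(e != NULL && base != NULL && pool != NULL && name != NULL);
    for (; e->kind != EXPR_NAME; e = e->operand) {
        pool->kind = e->kind == EXPR_DEREF ? TYPE_POINTER
                   : e->kind == EXPR_INDEX ? TYPE_ARRAY
                   : TYPE_FUNCTION;
        pool->text = e->text;
        pool->inner = t;
        t = pool++;
    }
    *name = e->text;
    return t;
}
```

Fed the call/deref/index chain for 'int (*(p[5]))(char)' with base type int, the result should read array, then pointer, then function, then int from the outside in.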
> Rec-descent absolutely does work (source: wrote a working parser), but it's a bit annoying to make it build a syntax tree.
Yeah, it required too much backtracking and state snap-shotting and resets, and I couldn't figure out a decent way of reporting good errors.
Thanks for the references. The code I have now is pretty elegant and functional, so I'm not in the mood of diving back into it. But if I ever need to change it, I'll take a look.