> warp is currently able to preprocess many source files, one after the other, in a single command. [...] warp is set up/torn down only once for the entire set of files, rather than once for each source file.
I'd like to learn more about this. I spend a fair amount of time building on HPC systems. Frustratingly, compiling on a $100M computer is typically 50x slower than compiling on a laptop due to the atrocious metadata performance of the shared file system. Configuration is even worse because there is typically little or no parallelism. Moving my source tree to a fast local disk barely helps so long as system and library headers continue to reside on the slow filesystem. A compiler system that transparently caches file accesses across an entire project build would save computational scientists an enormous amount of time building on these systems.
This is why I asked Walter about use of asynchronous idioms in Warp. It ought to help even if there is no parallelism involved. It is perhaps a TODO. Walter would be able to elaborate.
So your build speed is limited by metadata file accesses to network storage? Your network/storage must be really bad then. How about mirroring all the needed headers locally before building? You could establish that as a makefile rule.
Could you describe in a bit more detail how the ranges-and-algorithms style applies to Warp? What are the major algorithms that you are gluing together? What is the high-level design of Warp?
> how the ranges-and-algorithms style applies to Warp?
First off, a text preprocessor is a classic filter program, and ranges-and-algorithms is a classic filter program solution. Hence, if that didn't work out well for Warp's design, that would have been a massive failure.
The classic preprocessor design, however, is to split the source text up into preprocessing tokens, process the tokens, and then reconstitute output text from the token stream.
Warp doesn't work like that. It deals with everything as text, and desperately tries to avoid tokenizing anything. The ranges used are all ranges of text in various stages of being preprocessed. A major effort is made to minimize any state kept around, and to avoid doing memory allocation as much as possible.
Warp doesn't use many classic algorithms. They're all custom ones built by carefully examining the Standard's description of how it should work.
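(For anyone not familiar with the D idiom: below is a minimal, made-up sketch of the ranges-and-algorithms style, not Warp's actual code. Each stage is a lazy range wrapping the previous one, so the text is only walked once and nothing is materialized until the end.)

    import std.algorithm : filter, map;
    import std.array : array;
    import std.stdio : writeln;
    import std.string : stripRight;

    void main()
    {
        // Hypothetical pipeline: each stage is a lazy range over lines of text.
        auto source = ["int x = 1;   ", "", "   int y = 2;"];

        auto output = source
            .map!(line => line.stripRight)      // one stage: trim trailing blanks
            .filter!(line => line.length != 0)  // another stage: drop empty lines
            .array;                             // only here is memory allocated

        writeln(output);
    }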
So is the main data processing pipeline broken into several decoupled stages ("algorithms") that pass buffer pointer/length pairs between them ("ranges")? Sorry if I'm misunderstanding, I'm not very familiar with these terms as they are used in D.
What are the main stages/algorithms that constitute the processing pipeline?
I'm mainly trying to understand the high-level roadmap/design enough that I can use the source code itself to answer my more detailed questions. :)
The main stages correspond roughly to the "translation phases" described in the Standard. If you're familiar with those, such as \ line splicing, the source code will make more sense. There are also things like "Rescanning and further replacement" mentioned in the spec that are implemented, for example, by macroExpand. I tried to stick with terminology used in the Standard.
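(For readers who haven't looked at the translation phases: here's a rough, hypothetical sketch of what phase 2, the \ line splicing, does. It's written as a plain D function rather than in Warp's range style, and the names are invented for the example.)

    import std.algorithm : endsWith;
    import std.stdio : writeln;

    // Sketch of translation phase 2: a backslash at the end of a physical
    // line splices it onto the next, forming one logical line before any
    // directive or macro processing happens.
    string[] spliceLines(string[] physicalLines)
    {
        string[] logicalLines;
        string pending;
        foreach (line; physicalLines)
        {
            if (line.endsWith("\\"))
                pending ~= line[0 .. $ - 1];   // drop the backslash, keep accumulating
            else
            {
                logicalLines ~= pending ~ line;
                pending = null;
            }
        }
        if (pending.length)
            logicalLines ~= pending;           // file ended mid-splice
        return logicalLines;
    }

    void main()
    {
        auto src = ["#define MAX(a, b) \\",
                    "    ((a) > (b) ? (a) : (b))",
                    "int x;"];
        writeln(spliceLines(src));
    }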
Since Warp is meant to be a drop-in replacement for GCC's cpp, do you plan to include a traditional (pre-standard) preprocessing mode? This has been a source of some agony in FreeBSD, preventing clang's cpp from fully replacing GCC cpp on some ports (mostly X11 related due to imake). So far the partial solution has been ucpp.
It seems that it's not so much about pre-1989 "code," but rather Makefiles. ANSI C preprocessors like yours will convert tabs to spaces, and Make doesn't like that. It seems reasonable to let this use case be handled the way it always has been: by using an old tool that still works.
The other major area of difference seems to be that pre-ANSI people used foo/**/bar to paste tokens (whereas now we use ##). If we're talking about C, that's easy to update; apparently some Haskell folks can't do that for their own reasons. Again, it's a use case which is not preprocessing of C or C++, so it seems OK to ignore it if you're implementing a C preprocessor (as opposed to a generic macro expander usable with Makefiles and Haskell).
I wrote a 'make' program in the 80's (I still use it http://www.digitalmars.com/ctg/make.html) and it has a macro processor in it, but it is not like the old C preprocessors.
Warp does support the obsolete gcc-style varargs, but it discards other obsolete practices.
In any case, I haven't seen any Makefiles big enough to benefit from faster preprocessing.
According to your bio you're a lapsed ME, like myself. Do you think not coming from a CS background gives you an opportunity for novel solutions in compiler development work?
[My first real project used Zortech C++ for OS/2, thanks for the fond memories...]
I think not coming from a CS background has resulted in me reinventing the wheel on several occasions. (I thought I'd had a brilliant insight, only to be told it was well known.)
I'm credited with a few innovations (like NRVO), but for the most part I just put together novel combinations of other people's ideas :-)
On a related topic, could you tell us about coroutines/fibres in D: how are they implemented? Can they be called from C? Are they used in D's standard library? Examples of a few notable use cases (I guess one would be Vibe.d) and of asynchronous idioms in D would be welcome.
Since that's a lot of questions, pointing to documents would be fine too.
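(Not speaking for Walter, but for anyone curious in the meantime, here's a minimal sketch of D's core.thread.Fiber, which I believe is what Vibe.d builds on. Scheduling is purely cooperative: call() switches into the fiber, Fiber.yield() switches back out.)

    import core.thread : Fiber;
    import std.stdio : writeln;

    void main()
    {
        // A fiber owns its own stack; call() resumes it, Fiber.yield()
        // suspends it and hands control back to the caller.
        auto producer = new Fiber({
            foreach (i; 0 .. 3)
            {
                writeln("produced ", i);
                Fiber.yield();
            }
        });

        while (producer.state != Fiber.State.TERM)
            producer.call();
    }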
Do you employ any detection of boost preprocessor library use, so that e.g. BOOST_PP_FOREACH gets processed as a literal loop rather than a large expansion?
Nope. Boost is just another bucket of chum as far as Warp is concerned. I did use Boost extensively to validate correct operation of Warp and profile execution speed.
Hard problems for C++ compilers are keeping track of where a piece of code originated and providing good diagnostics in code expanded from macros. Can warp help with this or was speed such a paramount goal that the rest of the tool-chain cannot provide good diagnostics as soon as macros are involved?
Will Warp's design allow modes to be added, where it can be used as a preprocessor replacement for compilers other than gcc (msvc, clang)? Would you accept such patches? I know mcpp (the only other standalone C preprocessor I've seen) tried to do this.
Other than the processing speed, is there anything about how Warp operates that would make it easier to implement a distributed build system? It seems like the preprocessor can sometimes play a role in whether builds are deterministic enough to be successfully cacheable (__FILE__ macros and such).
Warp only predefines the macros required by the Standard. All the other macros that cpp predefines are passed to Warp on the command line via a script file.
The same should work for any other compiler, provided they don't use preprocessors with custom behavior.
I don't see any reason why Warp cannot be used in a distributed build system.
I find it a bit weird that the top comment in this story fawns over several star programmers, but ignores the guy who actually wrote the tool that the story is about.
Those are the greats today, and with a talent pool like that it's likely that many future greats will be created there. You have to do more than write good code to be legendary.
Maybe Facebook will end up being the Xerox PARC of our modern times.
There was a time at Xerox PARC when you couldn't throw a book without hitting someone who was or would become an important pioneer.
I'm not trying to discredit what Facebook is doing, or the people there, but it'll take time to build up to that level of talent. Having a few remarkable individuals is a great start. Having an entire department filled with them is going to be hard work.
Nothing compares to Bell Labs. Google is an amazing software and networking company; however, the majority of its contributions are in the field of Computer Science, specifically in distributed computing and networking.
Bell Labs made significant contributions to Physics (solid-state and optics), Chemistry, Electronics, Computer Science (operating systems, graphics, speech recognition, networking), Materials Science, Communications (here they pretty much invented an entire field of study) and the design of so many everyday things (they had an entire team that focused on just the design of your telephone wire) that it would be impossible to imagine what the world would be like if they didn't exist.
If Google didn't exist, search would still have been solved, maybe a decade or so later. If Claude Shannon hadn't worked at Bell Labs when he did, it could well be the case that we wouldn't have come up with such an elegant theory of information even today, and therefore the world would look nothing like it does.
EDIT: Unsure why my parent comment was deleted, but I was suggesting that trading cards of famous programmers with their accomplishments on the back would be fun(ny).
But Warp is designed to be a drop-in replacement. It doesn't produce char-by-char exactly the same output as cpp, as the whitespace differs, and the decisions about when/where to produce linemarker records are different.
But the output is functionally identical (any differences are bugs in either Warp or cpp).
That fits well within my use of the term "reimplementation." I appreciate the specificity of your response, though - always best to answer every possible question when you're not sure which is meant.
Top-of-trunk clang from http://llvm.org/apt/ takes ~2.1 seconds on my machine to preprocess the file to /dev/null. warp (compiled with -O4 -frelease -fno-bounds-check) takes ~2.8 seconds.
So this test case is faster with clang even without precompiled headers. It is hard to make a benchmark for clang's precompiled headers because the AST is lazy-loaded from the PCH. You would have to actually have code use values from the header.
EDIT: I forgot another advantage of clang: the preprocessing and compilation and assembly are all done within the same process, eliminating process creation overhead.
When they're usable for C++, there should be little reason to use any special preprocessor, since every header file (not just a static common set) need only be compiled once into a binary format rather than being included into N source files. It can't happen soon enough for me... but as of recently, they're still very broken, so I'm still waiting.
Clang modules for C++ are going nowhere as long as the relevant WG21 working group isn't proposing anything.
While it seems that everybody has almost the same idea of how modules should look as part of the language, almost no one can agree on how they should be specified, or what part of a module should actually be specified at all.
I was under the impression that clang is not waiting for standardization to implement this stuff (indeed, C frameworks on OS X already ship module.map files). Am I wrong?
AFAIK the main reason for implementing modules for C++ was to show that it can be done, which makes standardization of a major feature much more likely.
As long as there is only one compiler and no guarantee that you will not have to change everything once again as standardization is complete no one is going to touch a large cross-platform codebase and add module support.
Well, if the performance gain is enough, I certainly would. (The codebase I'm thinking of is not huge, but large enough that I think modules would make a significant compile speed difference.) I think that with the current implementation, if your header files are sane (no depending on previously included files) you can autogenerate a module map file, one module per header, and have it just work. But I'm not sure if there are any wrinkles, because in the current state all I was able to achieve was clang crashing.
Sure, but is that using PCHs? And how much of that time is preprocessing? I don't think the parent question is whether clang can compile any project in less than 3 hours. I think the interesting question is if warp is more compelling than clang from a preprocessing perspective.
I mean, I do too, but they take even longer with gcc. I can remember when compiling could take a full day. These metrics mean nothing without a comparison. Clang is hands down the fastest C/C++ compiler I've ever used.
The compile speed difference between a modern GCC and Clang/LLVM is nowhere near that dramatic; the comparison you linked is against GCC 4.2.1 and GCC 4.0, which are 7 and 9 years old respectively.
I really only meant to refer to the -fsyntax-only portion, which is all that matters when it comes to C preprocessor performance. I know GCC's codegen has steadily improved, but I'm not aware of any work to speed up the parser.
Which other languages use the C/C++ preprocessor? I've seen it used for generating data or other source code when a preprocessor comes in handy, but never as a full-fledged component of another language.
I also think that a modern language that uses includes instead of modules is just outright insane.
Objective-C introduced `#import`, which guards against double inclusion by default. I would suspect any performance gains are tempered in a "pure" ObjC project.
The Glasgow Haskell Compiler has a language pragma for running CPP before compilation. I've seen it used a few times, for #ifdef compatibility between Windows and Linux for low-level stuff. I don't think anyone would ever want to use #include or macros, so the issue of speed is not very interesting in that context.
Could you substantiate this claim? I don't necessarily not believe you, but it's an easy claim to make without any quantifiable argument. It's also hard to test against, e.g., clang, where preprocessing is performed in the same process as the compiler and assembler, and the performance of the preprocessor alone is really less interesting than total compile time.
Let me clarify: could you substantiate the performance claims? I have no doubt it's a correct preprocessor, but I do have doubts about performance claims in the context of the compilation of a project. cpp (and the rest of the GNU compiler toolchain) itself is a pretty terrible example of a well-written program, but clang is a well-written, holistic, performance-centered program, and it would strike me as difficult for an isolated preprocessor to improve significantly on clang's built-in one.
Or, to put it another way: you speak compellingly about both the algorithmic side and the constant-factor side (no tokenizing), but you don't speak about "real world" performance when interacting with many levels of caches, processes, and filesystems. Could you speak to this at all?
Warp is written in D, by the guy who created the language. It doesn't seem weird that Andrei would want to talk about that (especially considering the fact that I assume he's very much interested in seeing D in use in more places).
I had the same question about how it compares to clang though. I suppose now that it's open source, someone can do some testing.
" I assume he's very much interested in seeing D in use in more places"
Yes, I think you are correct, and this is what I was getting at. It seems like a propaganda exercise to promote D; you only have to look at the end of the article to see this: "And join the D language community for the D Conference 2014 on May 21-23 in Menlo Park, CA."
For the record, I am very much aware of who Andrei is and the influence he has had on certain languages.
I'd be curious to see how clang does, too. What matters to us is that warp is easy to get into, so we can easily adapt it to our build system (in particular, multithreaded preprocessing that saves on opening the same included files multiple times).
I don't post numbers anymore because I'd always wind up in arguments with people who simply didn't believe them, or thought I'd unfairly manipulated them, or cherry-picked the test cases, whatever. Hence I encourage you to run your own numbers.
I had some trouble using the Ubuntu 13 packages for gdc, so I downloaded the latest binaries available from the gdc project, as recommended by the readme.
Using that to compile warp with gdc, with the flags it suggests (-release is not recognized by gdc, -O3 is), I get a warp that works.
For including every file in /usr/include/boost/*.hpp in one .cc file (which produces roughly 16 megabytes of C++ code), we get:
[dannyb@mainserver 12:40:56] ~ :) $ time gcc -E e.cc >f
In file included from e.cc:101:0:
/usr/include/boost/spirit.hpp:18:4: warning: #warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp" [-Wcpp]
# warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
^
gcc -E e.cc > f 3.18s user 0.25s system 97% cpu 3.528 total
[dannyb@mainserver 12:40:51] ~ :) $ time clang -E e.cc >f
In file included from e.cc:101:
/usr/include/boost/spirit.hpp:18:4: warning: "This header is deprecated. Please use: boost/spirit/include/classic.hpp" [-W#warnings]
# warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
^
1 warning generated.
clang -E e.cc > f 1.42s user 0.14s system 93% cpu 1.657 total
/usr/include/boost/spirit.hpp(18) : warning: "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
./warp/fwarpdrive_gcc4_8_1 -I/usr/include -I/usr/include/c++/4.8 e.cc 2.88s user 0.06s system 95% cpu 3.080 total
I've repeated these timings 10 times, and they are within 0.5% of these numbers each time.
I've also tried this on a large C++ project I have that generates about 200 meg of preprocessed source (which I can't share, sadly) and got similar relative timings. I also tried it on some smaller projects.
Based on the data I have so far, clang blows warp out of the water by a factor of 2 in most cases I've tried.
The above tests include stdout IO, but the relative numbers are the same without it:
[dannyb@mainserver 12:48:24] ~ :( $ time gcc -E e.cc -o f
In file included from e.cc:101:0:
/usr/include/boost/spirit.hpp:18:4: warning: #warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp" [-Wcpp]
# warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
^
gcc -E e.cc -o f 3.14s user 0.27s system 99% cpu 3.418 total
[dannyb@mainserver 12:48:33] ~ :) $ time clang -E e.cc -o f
In file included from e.cc:101:
/usr/include/boost/spirit.hpp:18:4: warning: "This header is deprecated. Please use: boost/spirit/include/classic.hpp" [-W#warnings]
# warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
^
1 warning generated.
clang -E e.cc -o f 1.41s user 0.13s system 94% cpu 1.631 total
[dannyb@mainserver 12:48:40] ~ :) $
(I reordered this one to make the timings in the same order as they were before)
[dannyb@mainserver 12:47:38] ~ :( $ time ./warp/fwarpdrive_gcc4_8_1 -o f -I/usr/include -I/usr/include/c++/4.8 -I/usr/include/x86_64-linux-gnu/c++/4.8 -I/usr/include/x86_64-linux-gnu -I/usr/lib/gcc/x86_64-linux-gnu/4.8/include/ -I/usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed/ e.cc
/usr/include/boost/spirit.hpp(18) : warning: "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
./warp/fwarpdrive_gcc4_8_1 -o f -I/usr/include -I/usr/include/c++/4.8 e.c 2.93s user 0.02s system 99% cpu 2.953 total
gcc: 3.14s user 0.27s system 99% cpu 3.418 total
clang: 1.41s user 0.13s system 94% cpu 1.631 total
warp: 2.93s user 0.02s system 99% cpu 2.953 total
warp (rebuilt with the recommended build settings): 2.31s
Because of different #define's, Warp may take a very different path through header files than other preprocessors do. In fact, Warp doesn't have any predefined macros other than the ones required by the Standard. Hence, to use it with cpp or clang's preprocessor, it needs to be driven with a command that -D defines each macro.
There's a command (I forget at the moment what it is) that will tell cpp to list all its predefined macros. It's quite a few. You'll need to do that for clang to get an equivalent list, then drive Warp with that.
You'll be able to tell if it is taking the same path or not by using a diff on the outputs that ignores whitespace differences.
The reason Warp doesn't predefine all that stuff is because every install of gcc has a different list, and it's completely impractical to try and keep up with all that.
I did, in fact, use warpdrive, which uses those predefines, as you can see in the commands.
I'm also familiar with the inner workings of LLVM and GCC (having hacked a lot on both), and I generated the list of include paths I used with warpdrive (emulating gcc 4.8.1) to be exactly the ones GCC 4.8.1 on my system uses.
I also verified the preprocessed output is "sane" in each case, as per diff.
Thanks for doing this. I have read that clang uses some SIMD instructions to speed this up, and I don't know how much that contributes. Warp doesn't use any inline assembler.
And, as your numbers show, suggesting the change in compiler flags was entirely justified.
I didn't check to see if the instructions exist, but possibly :)
You do start to hit two issues, though, as you increase the size of the skips:
1. Alignment
2. If the average block comment/line is < 64 characters, you may lose more time performing the instruction and then counting the trailing zeros in the result to find the place it ended.
I have no numbers to back up whether this matters, of course :)
AVX-512 does not seem to have PMOVMSKB, which is how I assume it is being done with SSE2. There are other ways to skin that cat, but it's unclear whether they have any advantage over using AVX2 with VPMOVMSKB.
A comparison against VC++ would also be interesting. I believe VC++ is commonly used on Windows, which I hear is a somewhat popular platform. (I suppose warp might not run on Windows though.)
In fact I developed Warp on Windows. It compiles and works fine. Warp source code is completely portable between Linux and Windows, with the following exceptions:
1. wchar_t is ushort on Windows, uint on Linux.
2. Warp uses a slightly modified file reader from the one in the D standard library, customized to the platform.
You know of a project that takes 3 hours of time to preprocess with clang?
I have serious doubts. Overall compilation time is kind of irrelevant to this discussion, because Warp is just a preprocessor.
While you can get some speedup over gcc by replacing the preprocessor, preprocessing is usually only a 0.2-0.5 fraction of overall compilation time, depending on the size of the file, so even an infinitely fast preprocessor would cap the overall win at roughly 1.25-2x.
I expect the gains warp gets over gcc overall from preprocessing to be similar to those clang gets over gcc overall from preprocessing.
(Though it depends on the size of files being compiled, etc).
Most companies that want actually fast overall compilation and have the resources build caching and distributed compilation infrastructure (Google, Facebook).
As mentioned, if warp were really so much faster than clang's preprocessor that it mattered, clang would be fixed :)
What is this even supposed to mean? Is the majority of that time spent preprocessing? How do other compilers perform? Your anecdote alone is absolutely useless.
Preprocessing with clang would be very tricky in a system that uses gcc for the compilation proper (which we do). Clang adds its own predefined #defines, which may change the resulting code.
Sure, if you are stuck with GCC, and have no plans to move, it makes sense to improve GCC's preprocessor for a number of reasons (warp produces cacheable artifacts, etc).
But it's actually not that tricky, since you can just change the defines it makes (after all, if you are maintaining your own toolchain, you are maintaining your own toolchain).
In fact, you are already doing it with warp to emulate GCC's defines.
Suffice to say, we've done it before to provide clang diagnostics but build with GCC.
Yes, and no offense, but while other large companies have folks working on fixing this in public, I don't see Facebook trying.
This is really not meant as a dig (really!), but more of a "why I figured Facebook was not trying to make a transition". The companies trying to do so are contributing heavily to LLVM to make that transition :)
You can undefine predefined macros in clang using -U. I took a little time to write some commands that should produce something close to the right defines/undefines to let one compiler's preprocessor masquerade as another compiler's preprocessor:
I appreciate this piece of feedback. I've been consciously trying to keep the article interesting technically and to avoid it being construed as an advertisement for D, to the extent that a couple of coworkers and at least one other colleague (http://goo.gl/QZ5ELn) were unclear about warp being written in D at all. I'll do my best to tone things down further in the future. There's plenty of exciting stuff going on, and it's not worth alienating people.