If you have control over the build system, clang can generate that directly nowadays: add -MJ path to CFLAGS (or the equivalent). This saves a JSON fragment to path (use a separate path for each compiled file); you can then concatenate all the fragments into a compilation database.
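A minimal sketch of the stitching step (file names are made up; each -MJ fragment is a single `{ ... },` entry, so wrapping them in brackets and dropping the trailing comma yields a valid JSON array):

```sh
clang -MJ main.o.json -Wall -c -o main.o main.c
clang -MJ util.o.json -Wall -c -o util.o util.c
# prepend "[", turn the final fragment's trailing comma into "]"
sed -e '1s/^/[/' -e '$s/,$/]/' *.o.json > compile_commands.json
```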
I use this approach because on macOS, with SIP, Bear no longer works as it is forbidden from injecting its hooks into the system-protected clang shim. There may be other solutions.
(I haven't explored the space in a few years. They've historically fared poorly with embedded targets where one file may be compiled many different ways for targeting different platforms)
If you would like to make similar hooks work again on macOS, check out this guide: https://firebuild.com/setup-macos .
Firebuild uses a similar technique to explore the process tree and create a report with CPU utilization highlighted in a graphical process tree.
In general, you need to run the build for compilation databases to actually work anyway. Consider the following cases:
- headers generated during the build: tools may find the wrong header or fail when the header isn't found
- generated sources: does the source file in the database even exist yet?
For those cases, one still needs to consider the "out of date" situation, where a previous build left an old version around. That's something tools can't really detect today.
Beyond that, modules present an issue. The command line doesn't have any indication of what `import foo;` means to the TU in question (and in CMake's case, the file which does tell it that information is build-time generated anyways). Even if it did, the BMI may be in an incompatible format (e.g., using GCC for the build, but using `clang-analyzer` on the database) so the tool needs to go and make its own BMI. I gave a talk at CppCon about "build databases" which is basically "compilation databases, but consider what it means to support modules". (The video isn't available yet.)
I'm also working on getting a paper on them through ISO C++'s SG15 (Tooling) for proper specification.
You really need build system support. For example, Ninja can generate the file without actually doing the build because it knows everything it _would_ do.
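Concretely, Ninja's compdb tool dumps the database straight from the build graph without running anything (depending on your Ninja version, you may need to name the compile rules explicitly, e.g. `ninja -t compdb cc cxx`):

```sh
# no compilation happens; Ninja just prints what it would run
ninja -C build -t compdb > compile_commands.json
```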
Is this really a major problem? Is there any tool that supports generating the database without compiling? I think the answer to both is "no". In fact I can't think of a reasonable case where it would even be a minor problem.
> In fact I can't think of a reasonable case where it would even be a minor problem.
On my last project, a build took 2+ hours on a 128-core Threadripper. It wasn't every day that the lowest-level stuff changed, but it was probably once a sprint or so. Waiting 2 hours for the compilation database to be ready isn't tenable. Rider and Visual Studio could generate a usable model of the project in 2-3 minutes.
OK that does sound like an actual problem. But I think you would only be in a position to need to do it that way if you don't use CMake, Ninja, or one of those other tools. Bear works the way it does because it's kind of a hack. I have only wanted to use it for Make projects. If the problem is big enough for you and constant enough then you can switch tools to avoid having to compile. It's just an uncommon problem IMO.
CMake does, though I think you only have to run the configure step, not an actual build, to get the database. It shouldn't need a build in any case, as it is capable of generating Ninja build files.
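Concretely, with the Makefile or Ninja generators the file appears in the build directory at configure/generate time, before anything is compiled:

```sh
cmake -S . -B build -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
# build/compile_commands.json already exists at this point
```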
Almost nobody is writing Ninja files by hand. If you have to write something along those lines by hand, Makefiles would make more sense than Ninja. If Ninja does support exporting commands, it's a use case that doesn't matter because almost everyone uses CMake-generated Ninja files.
The most obvious example is the build system of Chromium, which uses gn. They do have a wrapper script to actually generate the compile_commands.json file, but it's just a thin wrapper around Ninja's built-in capabilities. (I'm not sure that CMake's isn't as well, on that note; I'm pretty sure its ability to build the database depends on which generator you're using.)
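If I remember right, gn can also ask for it directly at generate time (flag spelling from memory, so treat this as a sketch; the Chromium wrapper script is effectively `ninja -t compdb` underneath):

```sh
# generate build files and export compile_commands.json in one go
gn gen out/Default --export-compile-commands
```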
I frequently use this to make clangd work well. With CMake-based projects, you can just set CMAKE_EXPORT_COMPILE_COMMANDS, then create a symlink from compile_commands.json in your source directory to the copy in the CMake build directory. (If you control the CMake file, you can even set this by default and make CMake create the symlink automatically.) To keep the symlink out of Git, you can add it to .gitignore of course, or, if you don't control that, the lesser-known local equivalent, .git/info/exclude.
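Something like this, assuming a build directory named build/:

```sh
# CMake drops the database in the build tree; clangd looks in the source root
ln -s build/compile_commands.json compile_commands.json
# keep the symlink out of Git without touching the shared .gitignore
echo compile_commands.json >> .git/info/exclude
```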
I use this often in combination with direnv and Nix flakes for a good developer experience. (Note that if you need to keep the Nix flake VCS-ignored, you'll need to explicitly tell direnv not to use a Git fetcher for Nix, using something like `use flake path://$PWD`, though this isn't needed if you can just re-use a Nixpkgs expression, e.g. `use flake nixpkgs#wineWow64Packages.wine-unstable` or so.)
One thing that sucks is that it doesn't seem to be easy to handle cross-compilation stuff. Wine is a particularly challenging case, as many of the binaries are now cross-compiled with MinGW. It still provides better completion than nothing, but I do wish I could get it to be perfect.
When using Nix with MinGW I struggled even harder because Nix's MinGW is even weirder... But with enough mangling, you can even convince clangd on Linux to give somewhat decent completions for a MinGW compilation.
One obvious disadvantage is that you kind of need a full rebuild to get the compilation database to work correctly. At least in my experience, running bear on a partial build doesn't seem to work in an additive fashion, though maybe I was holding it wrong.
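For what it's worth, Bear 3.x does have an append mode that is supposed to merge entries from an incremental build into an existing database; I can't vouch for how well it handles stale entries:

```sh
# merge entries from a partial rebuild into the existing database
bear --append --output compile_commands.json -- make -j8
```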
Working with embedded, I've had mixed success with bear: old environments that can't run the latest version, LD_PRELOAD messing with vendor toolchains.
These days I've found my peace just writing my own Python scripts to parse verbose build logs. You only need to extract the filename, base directory, compiler, and arguments.
Sometimes you're lucky and the build system supports a dry run, so you don't even have to run a real build to get the logs.
It's way less invasive, there's no need to ask devops for additional tools, and you can adapt it to any build system.
Whenever I start with a new toolchain, I spend a couple of hours tweaking the parser and I'm good to go.
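The same idea in a rough shell/jq sketch rather than Python (assumes one compiler invocation per log line; a real parser needs toolchain-specific tweaks, and lines where no source file is found are silently dropped):

```sh
# pull compile lines out of the log, emit one JSON entry per line,
# then slurp the entries into the final array
grep -E '(gcc|g\+\+|clang[^ ]*) .* -c ' build.log |
  jq -R --arg dir "$PWD" '{
    directory: $dir,
    command: .,
    file: capture(" (?<f>[^ ]+\\.(c|cc|cpp|cxx))\\b").f
  }' | jq -s '.' > compile_commands.json
```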
The compilation database is essential for editors these days. CMake can generate it directly, but not all code uses CMake; in that case I just use compiledb (which is inactive, though there is a newer compiledb-go), then bear. Somehow bear did not work as well for me as compiledb, so compiledb (now compiledb-go) is my main tool now.
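compiledb works by wrapping make and parsing its output; if your version has the no-build flag, you don't even need a real compile:

```sh
# -n asks make for a dry run, so compiledb parses the commands
# make would execute instead of actually building anything
compiledb -n make
```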