vup's comments

vup · on April 14, 2022

My understanding is that rasterization also happens per tile.

One case where this can cause performance problems is when you want to read the output of a previous render pass. To read arbitrary parts of that output, the output buffer probably needs to be copied out of tile memory into memory that can hold the whole buffer at once. It also means that all tiles of the previous render pass need to finish before the next pass can start, which limits how much work can be done in parallel.
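To make the parallelism point concrete, here is a toy scheduling sketch. It is only a model under loud assumptions: every tile costs one time step per pass, there is a fixed pool of workers, and the cost of copying out of tile memory is ignored entirely. The function name `makespan` and its parameters are made up for illustration and do not correspond to any real GPU API.

```python
def makespan(num_tiles: int, workers: int, barrier: bool) -> int:
    """Time steps needed to run two render passes over `num_tiles` tiles.

    barrier=False: pass-2 tile i only reads pass-1 tile i, so it may start
                   as soon as that one tile is done.
    barrier=True:  pass 2 reads arbitrary parts of the pass-1 output, so no
                   pass-2 tile may start until every pass-1 tile is done.
    """
    done1 = 0  # pass-1 tiles finished so far (processed in index order)
    done2 = 0  # pass-2 tiles finished so far
    steps = 0
    while done2 < num_tiles:
        slots = workers
        # Pass-1 tiles get workers first.
        run1 = min(slots, num_tiles - done1)
        slots -= run1
        # Pass-2 tiles whose inputs were ready at the start of this step.
        if barrier:
            ready2 = num_tiles - done2 if done1 == num_tiles else 0
        else:
            ready2 = done1 - done2
        run2 = min(slots, ready2)
        done1 += run1
        done2 += run2
        steps += 1
    return steps


if __name__ == "__main__":
    for barrier in (False, True):
        print(f"barrier={barrier}: {makespan(10, 4, barrier)} steps")
```

In this model the barrier leaves workers idle at the end of the first pass, because the leftover slots cannot be filled with second-pass tiles. On real hardware the extra memory traffic is probably the larger cost, which this sketch does not capture.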

vup · on April 3, 2022

As mentioned in the blog post, guix already does something similar by patching glibc to support a per-application loader cache [1]. shrinkwrap also seems to increase the binary size a lot: for emacs, 6.6 MB without shrinkwrap versus 13 MB after running it.

Comparing performance, the guix approach also seems to be a bit faster:

  hyperfine -m 100 -w 5 -n shrinkwrap './emacs --version'  -n guix 'emacs --version'
  Benchmark #1: shrinkwrap
    Time (mean ± σ):      68.8 ms ±  13.7 ms    [User: 45.8 ms, System: 22.9 ms]
    Range (min … max):    52.8 ms … 106.3 ms    100 runs
  
  Benchmark #2: guix
    Time (mean ± σ):      56.7 ms ±  12.3 ms    [User: 34.9 ms, System: 21.7 ms]
    Range (min … max):    42.7 ms …  84.6 ms    100 runs
  
  Summary
    'guix' ran
      1.21 ± 0.36 times faster than 'shrinkwrap'

[1] https://guix.gnu.org/en/blog/2021/taming-the-stat-storm-with...
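For anyone wondering where the "1.21 ± 0.36" in the summary comes from, the sketch below redoes the arithmetic from the printed means and standard deviations. It assumes the summary is the ratio of the means with first-order error propagation for a quotient, which reproduces the printed figures here. The inputs are the rounded values from the output above, so the last digits could differ from what hyperfine computes internally. The helper name `relative_speed` is made up for illustration.

```python
import math


def relative_speed(mean_slow: float, sd_slow: float,
                   mean_fast: float, sd_fast: float) -> tuple[float, float]:
    """Ratio of two means and its propagated standard deviation.

    Uses the first-order formula for a quotient:
    sd(ratio) = ratio * sqrt((sd_a / a)^2 + (sd_b / b)^2)
    """
    ratio = mean_slow / mean_fast
    ratio_sd = ratio * math.sqrt((sd_slow / mean_slow) ** 2 +
                                 (sd_fast / mean_fast) ** 2)
    return ratio, ratio_sd


if __name__ == "__main__":
    # Values in ms, copied from the benchmark output above.
    ratio, ratio_sd = relative_speed(68.8, 13.7, 56.7, 12.3)
    print(f"{ratio:.2f} ± {ratio_sd:.2f}")
```

Note that the ± 0.36 is larger than the 0.21 gap itself, so the measured difference sits within the run-to-run noise of this benchmark.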

setheron · on April 3, 2022

That's interesting. I'll have to think about (or profile) why the ld.so.cache approach is faster.

Thanks for the profile.