CERN is a heavy user of Ceph, with about 100PB of data across CephFS, object stores (used as a backend for S3), and block storage (mostly used for VMs). CVMFS (https://cernvm.cern.ch/fs/) is used to distribute the software stacks of the LHC experiments across the WLCG (Worldwide LHC Computing Grid), and is backed by S3 on Ceph for its storage needs. Physics data, however, is stored on EOS (https://eos.web.cern.ch), and CERN just recently crossed the 1EB mark of raw disk storage managed by EOS. EOS is also the storage solution for CERNBox (https://cernbox.web.cern.ch/), which holds user data. Data analyses use ROOT and read the data remotely from EOS with XRootD (https://github.com/xrootd/xrootd), as EOS is itself built on top of XRootD. XRootD is very efficient at reading data across the network compared to other solutions, and it is also used by experiments beyond high energy physics, for example by LSST in its clustered database, Qserv (https://qserv.lsst.io).
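To give an idea of what that remote access looks like from the analysis side, here is a minimal sketch of a ROOT macro reading a file over the XRootD protocol; the URL and the tree name "Events" are placeholders, not a real dataset:

    // read_remote.C -- sketch of reading from EOS over XRootD with ROOT.
    // The URL and tree name below are placeholders, not a real dataset.
    #include "TFile.h"
    #include "TTree.h"
    #include <memory>

    void read_remote() {
       // TFile::Open understands root:// URLs and fetches data via XRootD,
       // so only the byte ranges actually requested travel over the network.
       std::unique_ptr<TFile> f{TFile::Open("root://eospublic.cern.ch//eos/path/to/file.root")};
       if (!f || f->IsZombie()) return;
       if (auto *tree = f->Get<TTree>("Events")) tree->Print();
    }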
I think this is good advice overall. I wrote a CMake script that does most of the heavy lifting for XRootD (see https://news.ycombinator.com/item?id=39657703). The CI is then just a couple of lines: one to install the dependencies using the packaging tools, and another to call that script. So don't underestimate the convenience that packaging can give you when installing dependencies.
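For a rough idea, the job boils down to something like the lines below (a sketch, not the actual XRootD setup; the spec file and script names are placeholders):

    # Hypothetical CI steps; file names are placeholders.
    dnf builddep -y xrootd.spec   # install build dependencies via the packaging tools
    ctest -V -S ci.cmake          # configure, build, and test via the CMake script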
I was just taking a look and couldn't help but notice the switch statement in your operator[], which likely causes a lot of unnecessary bad speculation at runtime:
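The pattern is roughly the following (a sketch modeled after CLHEP's Hep3Vector, not the exact code):

    // Sketch of the pattern (modeled after CLHEP's Hep3Vector, not the exact code).
    class Vec3 {
       double dx, dy, dz;
    public:
       double operator[](int i) const {
          // Every access goes through a branch the CPU has to predict.
          switch (i) {
             case 0: return dx;
             case 1: return dy;
             case 2: return dz;
             default: return 0.0; // error handling omitted
          }
       }
    };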
Many people believe the C++ compiler will magically optimize the switch away, but in some cases, as in the CLHEP-style example above, it doesn't happen, so you end up with bad performance.
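A branch-free alternative is to keep the components in contiguous storage and index into it directly; again just a sketch:

    // Branch-free alternative (sketch): contiguous storage, direct indexing.
    #include <array>

    class Vec3 {
       std::array<double, 3> v{};
    public:
       double operator[](int i) const {
          // Compiles down to a single indexed load, nothing to mispredict.
          return v[i];
       }
    };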
It was a nice guest post on the website about Eclipse, but most people just use gdb. It is now possible to step through ROOT macros with gdb by exporting CLING_DEBUG=1. See https://indico.jlab.org/event/459/contributions/11563/
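The workflow looks roughly like this (a sketch; mymacro.C and the line number are placeholders):

    $ export CLING_DEBUG=1            # make cling emit debug info for JIT'ed macro code
    $ gdb --args root.exe -b -q mymacro.C
    (gdb) set breakpoint pending on   # the macro is only compiled at runtime
    (gdb) break mymacro.C:10          # hypothetical line inside the macro
    (gdb) run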
> In general, managed tools will give you stronger governance and access controls compared to open source solutions. For businesses dealing with sensitive data that requires a robust security model, commercial solutions may be worth investing in, as they can provide an added layer of reassurance and a stronger audit trail.
There are definitely open source solutions capable of managing vast amounts of data securely. The storage group at CERN develops EOS (a distributed filesystem based on the XRootD framework) and CERNBox, which puts a nice web interface on top of it. See https://github.com/xrootd/xrootd and https://github.com/cern-eos/eos for more information. See also https://techweekstorage.web.cern.ch, a recent event we held together with CS3 at CERN.
Not only that, open source and proprietary software both generally handle the common case well, because otherwise nobody would use them.
It's when you start doing something outside the norm that you notice a difference. Neither of them will be perfect when you're the first person trying to do something with the software, but for proprietary software that's game over, because you can't fix it yourself.
Your options are to use something off the shelf and end up with a brittle, janky setup, or to use open source and end up with a brittle, janky setup that is at least more customized to your workflows... It's a tradeoff, though: all the hosting and security work that comes with open source can be a huge time sink.
You don't actually have to do any of that work if you don't want to. Half the open source software companies have that as their business model -- you can take the code and do it yourself or you can buy a support contract and they do it for you. But then you can make your own modifications even if you're paying someone to handle the rest of it.