
Sorry to say that your previous "evangelism" shows. It would be very interesting to hear what "unbelievably powerful" means (as in facts and numbers) compared to what a hyperscaler can provide: e.g. SPEC for CPU, GFLOPS, tensor ops for inference, latency, and so on. Last I checked, the consolidation story that claimed factors of savings used underutilized, ancient Intel boxes as its baseline.

"feature-full" is also quite a stretch when it comes to software when most modern software needs to be ported or run in a Linux LPAR.

In addition, I'd question that lack of easy access is the main reason for the lack of adoption. It might be part of the problem, but the other problem is that the thing is a niche product, and IBM doesn't seem to be particularly successful at explaining either what the niche is or how to extend it.




Being able to carve out a couple of spare cores as a new partition, then move a running partition onto it to replace the memory and even the processor of a faulty partition with zero downtime, is great.

Consider that partitions are completely isolated from each other. Not some pesky soft isolation either; it's all done in hardware. In practice, every partition in a mainframe is a separate, logically isolated mainframe.

Mainframes are built for different kinds of workloads. They are not cloud machines. They are batch machines with 6 or more nines of uptime.

In my current job, a mainframe would be useless. However, for mission-critical core services which need predictable latencies (bank transaction engines, big central databases, etc.), I'll take a mainframe all day, every day.


> Consider that partitions are completely isolated from each other. Not some pesky soft isolation either; it's all done in hardware. In practice, every partition in a mainframe is a separate, logically isolated mainframe.

"Completely isolated in hardware" as in isolated by a software hypervisor (PR/SM) that doesn't have a whole lot of hardware support?

> Mainframes are built for different kinds of workloads. They are not cloud machines. They are batch machines with 6 or more nines of uptime.

That's kind of the point. What exactly is the niche, i.e. which new customer with exactly what software and latency requirements would switch from their current system to a z? IBM won't tell you, beyond buzzword bingo.


> buzzword bingo

I'm guessing that many of the terms in this thread are buzzwords for many of us reading it :), since relatively few people work on mainframes.

I had not heard of the term buzzword bingo before, so I googled it:

https://en.m.wikipedia.org/wiki/Buzzword_bingo

Interesting. It aligns with my corporate experience in a few cases.




Wow. She nails his guff at the end, with one single word.

I'm nominating this video for the next Oscars, Grammys, or whatever the heck the movie awards are called (I don't keep track of such stuff). Bet it'll win gold in the under-1-minute category ;)


That campaign probably got a truckload of awards at Cannes. I was working at Ogilvy back then, and IBM was a huge client of ours.


Oh, wow. I have read his book, Ogilvy on Advertising. Found it interesting, to say the least.


Read chapter 3 here: https://www.redbooks.ibm.com/redbooks/pdfs/sg248233.pdf#page...

They’ve spent the last 60 years developing and refining everything for extreme levels of uptime and high availability. It’s truly unique to these mainframes because IBM controls every aspect of it. The closest equivalent of that kind of vertical integration is your MacBook Pro.

I highly doubt there are any brand new mainframe customers these days. But many of the biggest companies you know and use every single day have tons of workloads that will never move off of it.


I don't think I need to read this. The uptime and several nines of high availability are attainable with z/OS and a Parallel Sysplex. No doubt systems like these are used by banks and others in the wild.

But it doesn't say anything about "unbelievably powerful" or "feature-full", as claimed in the OP?

If the niche customer claim is "will never move off of it because nobody ever wants to touch millions of lines of COBOL", then that's fine. It's likely sane from a business perspective to keep using them as long as the maintenance burden is manageable. Luckily, managers in those more conservative companies consider full rewrites dangerous, and rightly so. But to claim otherwise (i.e. unbelievably powerful and thus for everybody), we need to see numbers.


I’d say that 15+ years ago they were very powerful relative to other solutions on the market at the time. But I agree that’s no longer the case when compared to modern server racks.

I pointed to that Redbook about HA features since it’s a major differentiator of mainframes even today, but these document other features as well:

https://www.redbooks.ibm.com/redbooks/pdfs/sg248950.pdf#page...

https://www.redbooks.ibm.com/redbooks/pdfs/sg246366.pdf#page...

For every chapter, you can find one or more Redbooks that do a deeper dive.

IBM has taken input from the biggest companies in the world over many decades to run as many of their workloads as possible. They’ve honed everything in their software and hardware to do so, down to developing specific CPU instructions to support specific use cases. If all of this doesn’t convey an extremely feature-rich system, I’d like to hear why you think otherwise.


> "Completely isolated in hardware" as in isolated by a software hypervisor (PR/SM) that doesn't have a whole lot of hardware support?

Are you saying the GP's statement is outright false?


It is extremely optimistic, to put it mildly. You can definitely get inter-LPAR noise. I've seen it firsthand on systems I've worked on.

Moreover, the norm for zLinux systems is to isolate guests with z/VM, not LPARs. LPARs are more likely to be boundaries for chunkier workload definitions (that advice may have changed since I last cracked open a Redbook). z/VM offers similar isolation to what you'd see with KVM, VMware, or other hardware-assisted hypervisors.


What do you even mean by "predictable latencies"? AFAIK that needs an RTOS.


An RTOS defines a stricter latency envelope than a mainframe, almost into the "deterministic latency" range.

For example, you can say "this operation will take at most 3ms" in an RTOS, and it'll never exceed that number. If you're running on a fixed-frequency system, you can even say "I expect an answer in 2.8ms", all the time, every time.

In a mainframe this is a bit more relaxed. You can say that latency is <3ms 99% of the time, <3.1ms 99.999% of the time, and <3.5ms 100% of the time.

IOW, you never have to think "somebody is running a heavy load on the host, and I slowed down because of them".
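
To make that concrete, here's a toy sketch in Python (made-up numbers, nothing mainframe-specific) of how you'd characterize such an envelope from latency samples:

    import random

    # Pretend these are measured transaction latencies in ms: a tight
    # distribution with a small tail, roughly as described above.
    samples = sorted(random.gauss(2.9, 0.05) for _ in range(1_000_000))

    def percentile(xs, p):
        # Nearest-rank percentile over an already-sorted list.
        return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]

    for p in (99.0, 99.999, 100.0):
        print(f"p{p}: {percentile(samples, p):.2f} ms")

The envelope is the set of (percentile, bound) pairs; an RTOS collapses it to a single hard bound.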


How is the mainframe's 100% number different from the upper boundary that RTOSes offer? It's not 100%, is it?


What I tried to say is: a mainframe might answer late, but rarely, and only very slightly. An RTOS doesn't deviate from that number in either direction; it's neither early nor late.

Some RTOSes don't even consider late answers acceptable.


Yeah exactly. My arm64 Linux box also might answer late, but rarely.


That's true, but that jitter envelope gets bigger as the load increases. For a mainframe, the jitter envelope is much narrower, and on a competent RTOS it's basically zero.


Last I checked, the Telum chips had some pretty impressive numbers for memory bandwidth, even compared to Nvidia.


Telum's accelerator is attached to the L2/L3, so you get that bandwidth as long as the model fits into it. After that you go to main memory, and you really don't want to compare DDR4 modules with RAIM against the HBM3 or the like that a current compute GPU uses. Latency might be closer, but you mentioned bandwidth.


IIRC, with 32 MB per core (8 cores per die; a Telum package has two dies), one socket has 512 MB of cache for 16 cores. L1 and L2 are in-core, but L3 is on-die unused L2 from other cores, and L4 is unused L3 on the other die.

I am not sure if that continues off-package, but if it does, a drawer with 4 packages of 16 cores each will have 2 GB of off-chip cache, and a full-sized Z with 5 drawers would have 10 GB of off-drawer cache (at this level it's probably not much faster than same-drawer memory).

As for the RAIM, I think it's safe to assume it has a very wide path (n-1 modules) to the sockets, and that aggregate bandwidth will not leave the 4 sockets starved (even if latencies suffer because of the redundancy). And you can replace a defective memory module on a running computer without it having to pause.
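
If you want to check my math, here's the arithmetic as a quick Python sketch (same per-core and per-die figures as above; the drawer count depends on the model):

    per_core_l2 = 32                      # MB of (virtual) cache per core
    per_chip    = per_core_l2 * 8         # 8 cores per die    -> 256 MB
    per_socket  = per_chip * 2            # 2 dies per package -> 512 MB
    per_drawer  = per_socket * 4          # 4 packages per drawer -> 2048 MB
    print(per_drawer * 4, per_drawer * 5) # 8192 MB (4 drawers), 10240 MB (5)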


Some more details for people who know about these things, emphasis mine:

2 chips make a dual-chip module with 512MB of cache, 4 dual-chip modules make a drawer with 2GB of cache, and a 4-drawer system with 32 chips makes 8GB of cache.

"The L3 and L4 virtual design across the fabric provides 50% more cache per core, with improved latencies. The idea is that software and microcode still see two separate caches. These caches are shared and distributed across all eight cores via a 320 GB/s ring and across the integrated fabric. Horizontal cache persistence should further reduce cache misses as well. Specifically, when a cache line is ejected, the system looks for available cache capacity on other caches, first on the chip and then even across the 32-chip fabric."[0]

"The accelerator itself delivers an aggregate of over 6 TFLOPS of 16-bit floating-point throughput per chip to scale up to roughly 200 TFLOPS per system. 1024 processor tiles in a systolic array make up the matrix array, and 256 fp16/32 tiles make up the accelerator for computing activations and include built-in functions for RELU, tanH, and log. The platform also provides enterprise-class availability and security, as one should expect in a Z, with virtualization, error checking/recovery, and memory protection mechanisms. While 6 TFLOPS does not sound impressive, keep in mind that this accelerator is optimized for transaction processing. Most data are in floating-point and are highly structured, unlike in voice or image processing. Consequently, we believe this accelerator will deliver adequate performance and is undoubtedly much faster than offloading to another GPU-equipped server or running the inference on a Z core. The latency of off-platform inference can cause transactions to time out, and inference does not complete"[0]

"Intelligent Prefetcher and Write-Back – 120+ GB/s read bandwidth to internal scratchpad – 80+ GB/s store bandwidth – Multi-zone scratchpad for concurrent data load, execution and write-back Intelligent Data Mover and Formatter – 600+ GB/s bandwidth – Format and prepare data on the fly for compute and write-back"[1]

[0] https://cambrian-ai.com/wp-content/uploads/edd/2021/08/IBM-T...

[1] https://hc33.hotchips.org/assets/program/conference/day1/HC2...
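
As a sanity check, the "roughly 200 TFLOPS per system" figure from [0] lines up with the chip counts above:

    chips_per_system = 4 * 4 * 2    # 4 drawers x 4 modules x 2 chips = 32
    tflops_per_chip  = 6            # fp16, per the Hot Chips material
    print(chips_per_system * tflops_per_chip)   # 192, i.e. ~200 TFLOPS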


That's not a lot compared to 8xA100


A100s are definitely still the best, but IBM's offering isn't a joke. Depending on what you're trying to do, it might be a good way to go.


We'd probably need to see pricing for IBM's offering, because it's possible it'll be eye-wateringly high compared to buying even A100s.


TBH, I don't really think they compete in anything like similar markets.

You buy a DGX A100, or a cluster of them, for training and running large deep learning models (or for doing "traditional" HPC).

IBM's solution is more of a small inference engine that is part of the CPU, so you don't need to move your data off-chip when doing a little bit of inferencing as part of some other workflow. I don't work with mainframes, so I could be talking out of my behind, but maybe something like DL-assisted fraud detection as part of processing bank transactions?
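
As a toy illustration of why on-chip inference would matter there (hypothetical names, not any real mainframe or Telum API): the fraud score has to come back inside the transaction's latency budget, which is easy for an on-die call and risky for a network round trip to a GPU box:

    import time

    TX_BUDGET_MS = 5.0   # hypothetical per-transaction latency budget

    def fraud_score(tx):
        # Stand-in for the on-die accelerator call; the point is that it
        # stays in the microsecond range instead of a network round trip.
        return 0.02

    def process(tx):
        start = time.perf_counter()
        score = fraud_score(tx)
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > TX_BUDGET_MS:
            raise TimeoutError("scoring blew the transaction budget")
        return "declined" if score > 0.9 else "approved"

    print(process({"amount": 120.0}))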


> TBH, I don't really think they compete in anything like similar markets.

Yes.


It doesn't. Mainframe terminology is so obfuscated that it's impossible to tell for sure, but it sounds like less memory bandwidth than Genoa.


They are tailored to traditional mainframe workloads (IBM does a lot of hardware/software co-design in the mainframe lineup), so I wouldn't expect a mainframe to be designed for generic cloud hyperscaler workloads.

In any case, I have played with their LinuxONE Community Cloud service (running on the previous-gen z15) and it's very fast. The impression I get is that it doesn't need to wait for IO. There is a ton of very clever engineering on those machines and the z16 is a technological wonder.

https://www.anandtech.com/show/16924/did-ibm-just-preview-th...


Check out this youtube video: https://www.youtube.com/watch?v=ouAG4vXFORc&pp=ygUMbWFpbmZyY...

It does a reasonably good job of comparing mainframes to a regular server setup. It's not about SPEC or GFLOPS, etc.



