>Presumably Intel's shrinking Q1 2021 Data Center revenues are partly a result of this.
It was both AMD and ARM.
There are many workloads where Graviton2 offers an immediate cost/performance advantage. AWS charges per vCPU, which is one thread on Intel/AMD and one core on ARM. So you get a ~30% performance improvement along with ~30% lower cost by using the ARM Graviton series. Most adopters have reported a total cost reduction of around 50%. For those running hundreds if not thousands of EC2 instances that fit that workload profile, this is too much saving to pass up.
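To make the arithmetic explicit, here is a back-of-the-envelope sketch using only the approximate figures above (illustrative numbers, not actual AWS price-sheet values):

    #include <stdio.h>

    int main(void) {
        /* Illustrative figures from the comment above, not real AWS pricing:
         * a Graviton2 vCPU priced ~30% lower while doing ~30% more work
         * than an x86 vCPU (one SMT thread). */
        double relative_price      = 0.70;  /* price per vCPU-hour vs. x86 */
        double relative_throughput = 1.30;  /* work done per vCPU vs. x86 */

        /* Cost to get the same amount of work done */
        double relative_cost = relative_price / relative_throughput;
        printf("cost per unit of work: %.0f%% of x86 (~%.0f%% saving)\n",
               relative_cost * 100.0, (1.0 - relative_cost) * 100.0);
        /* prints: cost per unit of work: 54% of x86 (~46% saving) */
        return 0;
    }

That lands close to the ~50% total reduction reported above.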
Many SaaS companies running on EC2 have mentioned their success with it on Twitter and in various other places.
Worth pointing out: this is with Amazon installing as many Graviton2 chips as they can get from TSMC.
A few months ago on HN I wrote [1] about how half of Intel's DC market will be gone in a few years' time.
Edit: Another point worth mentioning: this is as much of a threat to medium and smaller-size clouds like Linode and DO, which don't have access to ARM (yet). And even when they get it, Amazon has the cost advantage of building its own chips instead of buying them from a company (Ampere).
Linode and DO could always offer a physical x86 core instead of a virtual SMT core. It would cut into margins somewhat, but maybe Intel and AMD would be more willing to discount when they have to play defense. I think one problem for the x86 guys is that because the demand for chips far exceeds supply, they’re still doing “fine” or even “well” right now. So the threat from ARM may still be perceived on mostly an intellectual level instead of provoking the necessary visceral survival response.
I believe Intel couldn't have imagined how easily their biggest customers could turn into their biggest competitors overnight.
Even a decade ago that would've been unthinkable, but today making a cookie-cutter SoC is relatively easy because nearly everything can be taken off the shelf.
Production costs, though... sub-10nm mask set costs completely rule out anything resembling a startup competing in this area.
I think 65nm was the last golden opportunity to jump on the departing train. It was still possible to ship a cookie-cutter chip for under $1m; now... no way.
Now the semiconductor industry is basically Airbus vs. Boeing.
Startups can absolutely compete here. There is sufficient capital to fund chip design (integration) and it is relatively low risk. We are going to see a huge number of Arm and RISC-V solutions on the market 14 months from now.
A few RISC-V SBCs are already on the market. I suspect RISC-V will come to dominate the IoT/Edge space in the next few years before graduating to other market segments.
IoT/Edge deployments are less standardised than other computing workloads. Developers and integrators in this area already expect to deal with a lot of bother when working with new chips. Also, the margins on these devices are usually razor thin, so the potential savings from not paying ARM licensing fees would be more appreciated.
Finally, RISC-V's modular approach allows for a greater level of flexibility and innovation, which will allow manufacturers to further differentiate and gain a competitive advantage. This is especially relevant for IoT/Edge solutions where thermal and power budgets are heavily constrained.
Ampere basically started as a re-labeled X-Gene from Applied Micro, which goes back to the 40nm days. And they had quite some cash to start with: their backer is the Carlyle Group, the biggest LBO shop in the world.
Nuvia basically never intended to compete with Intel or AMD head-on. Their $30m stash would've been just enough for a single "leap of faith" tapeout on a generation-old node, and a year of life support after.
They were aiming for a quick sell from the start too.
Depends on your definition of startup I guess. Certainly seems to be enough capital available.
I definitely don't agree with the premise that it's now Boeing vs. Airbus (certainly less so than it was a few years ago, when x86 was the only game in town).
Do you actually know how much a sub-10nm mask set costs? There’s a lot of speculation from people who don’t have access to those numbers. Those who do are bound by NDAs.
I do hear figures in the single-digit millions for relatively small tapeouts.
Back in the 65nm and 40nm days, big tapeouts were already costing high six figures in masks alone.
And... masks are not even the most expensive item in the sign-off costs these days.
Specialist verification, outsourced synthesis, layout, analog, physical, test, and other specialist services will easily cost more than the mask set for <40nm.
I would not be surprised if tier 1 fabless companies already spend $10m+ per design just on these.
You are absolutely correct that design costs swamp mask costs by far. For 7 nm, it costs more than $271 million for design alone (EDA, verification, synthesis, layout, sign-off, etc) [1], and that’s a cheaper one. Industry reports say $650-810 million for a big 5 nm chip.
They're the cheapest EC2 instance type, so they're very attractive to small scale deployments like side projects, personal sites etc. (basically anything that can run on one or two small nodes) where budget is a major concern. The t4g.micro is in the free tier as well, so that'll help.
I host a few very low traffic sites & I'm in the process of switching from a basic DO Droplet to a pair of low-end Gravitons. Will save me money and give better peak performance for my workloads.
> switching from a basic DO Droplet to a pair of low-end Gravitons. Will save me money and give better peak performance for my workloads.
I'm having trouble figuring this out - a t4g.micro is $6/month, before any storage or data transfer costs. The roughly equivalent DO offering is $5/month, inclusive of 25GB SSD and 1TB transfer. Even with a reserved instance discount and significantly less than 1TB outbound transfer, DO seems likely to be cheaper.
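For anyone who wants to redo the comparison for their own usage, a rough sketch of the AWS side (the EBS and egress rates here are assumptions from memory, so check the current price sheet before trusting the output):

    #include <stdio.h>

    int main(void) {
        /* Assumed unit prices -- verify against current AWS pricing */
        double t4g_micro_month  = 6.00;   /* on-demand t4g.micro, from the comment above */
        double ebs_per_gb_month = 0.10;   /* gp2 EBS, assumed */
        double egress_per_gb    = 0.09;   /* data transfer out, assumed */

        double storage_gb  = 25.0;   /* to match the droplet's included 25GB SSD */
        double transfer_gb = 50.0;   /* example: a low-traffic site */

        double aws_total = t4g_micro_month
                         + storage_gb * ebs_per_gb_month
                         + transfer_gb * egress_per_gb;
        double do_total = 5.00;      /* droplet price, storage and 1TB transfer included */

        printf("AWS t4g.micro: ~$%.2f/mo vs. DO droplet: $%.2f/mo\n", aws_total, do_total);
        /* ~$13.00 vs. $5.00 for this example usage */
        return 0;
    }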
Maybe, but it would take a _lot_ of people moving small deployments (where by definition the savings would be small, especially relative to the fixed costs of getting things to work on Arm) in a relatively short space of time to have this impact - so I'm sceptical (and if that is what's happening, then it must be very easy to move to Arm, which I'm also sceptical of).
More likely some very big customers (peer comment mentions Twitter) moving to Graviton2 for cost savings.
Graviton might be the top and/or default choice in their management console when you create an EC2 instance. That would swing things pretty quickly for all the free tier folks.
Yes... edited my edit. I'm pretty sure there was no radio button for some time; you would have had to scroll into the other choices to get a Graviton instance.
The company I work for has migrated hundreds of heavily-utilised Elasticsearch and Storm nodes to Graviton. No performance issues, pure cost saving. We’re working on the rest of our systems now. We’re going to save hundreds of thousands of dollars over the next few years.
"instance additions" also doesn't take instance size/performance into account. If ARM-based instances are overall smaller, that'd allow more of them, distorting the numbers...
Percentage of compute power would be cool to know here.
I wouldn't be surprised if AWS is using Graviton2 pretty heavily for internal processes as well, stuff like control planes for the major services like S3, SQS, SNS, etc...
I know many users. Basically any workload that isn't tied to x86 and is cost-sensitive can benefit from moving to ARM instances. Database instances are good candidates, and big data workloads as well.
Funny coincidence of timing: the current post #2 (GCC 11.1 released) adds support for the CPUs mentioned here (this is currently post #4):
AArch64 & arm
A number of new CPUs are supported through arguments to the -mcpu and -mtune options in both the arm and aarch64 backends (GCC identifiers in parentheses):
Arm Cortex-A78 (cortex-a78).
Arm Cortex-A78AE (cortex-a78ae).
Arm Cortex-A78C (cortex-a78c).
Arm Cortex-X1 (cortex-x1).
Arm Neoverse V1 (neoverse-v1).
Arm Neoverse N2 (neoverse-n2).
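For reference, a minimal sketch of how those new identifiers get used in practice (flags shown in comments; exact support depends on your GCC 11 build):

    /* example.c -- on an AArch64 host with GCC 11, e.g.:
     *
     *   gcc-11 -O2 -mcpu=neoverse-n2 example.c -o example    (tune for and target N2)
     *   gcc-11 -O2 -mtune=neoverse-v1 example.c -o example   (tune only, keep a generic ISA baseline)
     *   gcc-11 -O2 -mcpu=native example.c -o example         (let GCC detect the host CPU)
     */
    #include <stdio.h>

    int main(void) {
        puts("built with CPU-specific tuning");
        return 0;
    }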
Good to see work going into this at the proper time. (Not that that was much of a problem for CPU cores in recent times. Still not a matter of course, though.)
These tunings will only be used if you compile stuff yourself with -march=native (or specifying one particular model). Most software out there would be compiled with generic non-tuned optimizations. The tuning is rarely a huge deal though.
- when you have a particularly CPU-intensive application, you'd hopefully compile it to target your system
- the cloud providers can just do a custom Debian/Ubuntu/... build for their zillions of identical systems
- the library loading mechanism on Linux is slowly getting support for having multiple compile variants of a library packaged into different subdirectories of /lib (e.g. "/usr/lib64/tls/haswell/x86_64")
Also I was mostly trying to point out as a positive how well the interaction is working there between ARM and the GCC project. I wish it were like this for other types of silicon.
(CPU vendors all seem to be getting this right, and GPUs are slowly getting there, but much other silicon is horrible… e.g. wifi chips)
That is not entirely true. Binaries in the packaging systems might not be compiled for the most recent atomic instructions which can really affect performance.
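As a concrete illustration, a minimal sketch assuming GCC on AArch64 (flags and instruction names noted in comments):

    #include <stdatomic.h>

    /* Compiled generically this becomes an ldxr/stxr retry loop; built with
     * something like
     *   gcc -O2 -march=armv8-a+lse -c atomics_example.c
     * it can use the single LSE instruction ldaddal instead, and
     * -moutline-atomics gives a runtime-dispatched middle ground. */
    long bump(_Atomic long *counter) {
        return atomic_fetch_add(counter, 1);
    }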
Well, that – yeah. But it doesn't strictly have anything to do with the actual CPU model specific tuning that the news was about, only in that setting a specific CPU in -march (-mtune would not do it!) would imply the features. Typically though you'd just do -march=armv8-a+the+desired+features for that like the first post you linked does.
Really the important piece for making distribution binaries not suck is ifuncs/multiversioning. But library and app authors currently are required to deliberately use them. Which is fine for manual optimizations that use intrinsics or assembly (and e.g. standard library atomics) but I'm not sure any compiler currently would automatically just do that for autovectorization.
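To make the "deliberately use them" part concrete, here is a minimal sketch of GCC function multiversioning via the target_clones attribute (an x86-64 example, since AArch64 coverage of this mechanism was still patchy at the time):

    #include <stdio.h>

    /* GCC emits one clone per listed target plus the "default" fallback,
     * and an ifunc resolver picks the best one when the binary is loaded.
     * Requires GCC on a glibc system with ifunc support. */
    __attribute__((target_clones("avx2", "sse4.2", "default")))
    long sum(const int *a, long n) {
        long s = 0;
        for (long i = 0; i < n; i++)
            s += a[i];          /* the avx2 clone can autovectorize this more widely */
        return s;
    }

    int main(void) {
        int data[1000];
        for (int i = 0; i < 1000; i++)
            data[i] = i;
        printf("%ld\n", sum(data, 1000));   /* prints 499500 */
        return 0;
    }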
Apple hasn't seemed interested historically. And the Nuvia folks left Apple to found their company explicitly because they thought an M1 style CPU core would do well in servers but Apple wasn't interested in doing that.
It's not that Apple will sell server chips; it's that developers can work locally on ARM, which makes it easier to deploy to servers. Linus Torvalds had a quote about this...
"""
And the only way that changes is if you end up saying "look, you can deploy more cheaply on an ARM box, and here's the development box you can do your work on".
"""
It’s wild to consider that my next computer (an ARM M1 Mac) will be compiling code for mobile (ARM) and the cloud (ARM). I wonder if we’ll ever see AMD release a competitive ARM chip and join the bandwagon.
Personally, I don’t see many server admins choosing to pay the Apple Tax to get M1 into their data center. I don’t see how the watt/performance ratio could pay off that kind of tax.
I did not mean to imply that the actual M1 will be used in data centers. Apple is quite popular among developers, and it's also a trendsetter, which will probably lead other computer manufacturers to adopt ARM for personal computers. So having more people use ARM on their personal computers will lead to more ARM adoption in the data center.
I believe interpreting statistics from those surveys in this way isn't fair. There are many developers around the world, but the pattern of value/money generation among them is not uniform; in other words, a small percentage of developers work for the companies that pay the largest share of server bills, and the penetration rate of macOS devices among developers at top companies is probably higher than average.
(I'm not implying that developers who work on non-macOS devices create less value, because your device has almost nothing to do with your impact. I'm just talking about a trend and a possible misinterpretation of the data.)
The OP wasn't suggesting Apple M1 chips in the data centre, but rather that Apple M1 chips in developer workstations will disrupt the inertia of x64 dev –> x64 prod. It will be easier for developers to choose ARM in production when their local box is ARM.
AArch64 is load-store + fixed-instruction-length, which is basically what "RISC" has come to mean in the modern day. X86 in 2001 was already… not that :)
Not really, because the variable length instructions have consequences - mostly good ones because they fit in memory better.
Also, the complex memory operands can be executed directly because you can add more ALUs inside the load/store unit. ARM also has more types of memory operands than a traditional RISC (which was just whatever MIPS did).
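A small illustration of what that means for codegen (the assembly shown in comments is typical gcc -O2 output and may vary by compiler version and flags):

    /* Add the value at *p into x. Compiles standalone with: gcc -O2 -c memop.c */
    long add_from_memory(long x, const long *p) {
        return x + *p;
    }

    /* Typical output:
     *
     * x86-64 -- the load folds into the ALU instruction as a memory operand:
     *     movq  %rdi, %rax
     *     addq  (%rsi), %rax
     *     ret
     *
     * AArch64 -- load-store architecture, so the load is a separate instruction:
     *     ldr   x1, [x1]
     *     add   x0, x0, x1
     *     ret
     */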
The upside to variable length instructions is that they are on average shorter so you can fit more into your limited cache and you make better use of your RAM bandwidth.
The downside is that your decoder gets way more complex. By having a simpler decoder Apple instead has more of them (8 wide decode) and a big reorder buffer to keep them filled.
Supposedly Apple solved the downside by simply throwing lots of cache at the problem and putting the RAM on-chip.
I'm not a CPU guy and this is what I've gathered from various discussions so I'm happy to be corrected.
In most cases, yes, but it doesn't get rid of the complexity for compiler backends that can't directly target the real instruction sets Intel uses and have to target the compatibility shim layer instead.
I hope that ARM servers with reasonable specs won't be exclusive to AWS and the other hyperscalers. For example, it would be nice if OVH would offer ARM-based dedicated servers.
V1 = a slightly tweaked ARM Cortex-X1 (the core used in the Snapdragon 888) with SVE, on 7nm, aiming at ~4W per core.
N2 = a new ARMv9 Cortex with SVE2, ~40% IPC improvement over N1 (or ~10% below V1), on 5nm, aiming at ~2W per core, with a similar die size to N1. (I fully expect Amazon to go 128-core with their N2 Graviton.)
So in case anyone is wondering, no, it is not Apple M1 level. Not anywhere close.
CMN-700 = more cores and support for memory partitioning, which is important for VMs.
The cores don't serve the same purpose as the M1 cores. The M1 is optimized for single-thread performance at the cost of die size (and a bit of power). I don't have exact numbers, but say the Apple M1 core takes 1.5x the die area of N2; then you'd get better performance by putting in 1.5x the number of N2 cores.
Yes. The M1 / A14 is also a 5W+ core, so a different set of trade-offs.
I mentioned the M1 because it is the question that always comes up and people keep banging on about it. I wish more tech sites would simply point this out, since they have the reach. But it is obvious that neither Apple nor ARM has any interest in their parts being named and compared in this way, and I guess tech sites won't do it and risk harming those relationships.
It's an apples-and-oranges comparison with Apple's M1 (server vs. consumer chips), but it does hint at what's possible with the next-generation ARM Cortex "X2" cores, which could appear in next year's flagship smartphones and laptops. A 30-40% IPC jump, partly due to moving to a 5nm fabrication process, is huge.
Given the right implementation, namely squeezing in more big cores than the current 1-3-4 configuration, it could close the gap with Apple considerably.
Process node changes generally don't do anything for IPC - that generally comes from microarchitecture improvements - so I doubt the move to 5nm has anything to do with the IPC gain.
I agree with that - but if you take an unchanged core and manufacture it at a different node, then you won't see a change in IPC, which in my book makes it questionable to attribute IPC gains to the process node.
I wonder why they didn't use AWS-wide numbers rather than just EC2. I would have thought EC2 would lag in the transition while AWS services would make the switch quickly.
Because EC2 represents more realistic market adoption: it's more important to know whether you can run the software of your choice on ARM than whether Amazon can develop a service on an ARM stack.
> 49% of AWS EC2 instance additions in 2020 are based on Graviton2
Surprised at this level of Graviton2 adoption in AWS at this stage. Any clues as to who is using these instances?
Edit: Presumably Intel's shrinking Q1 2021 Data Center revenues are partly a result of this.