I gotta give AMD massive credit... while I have mainly used Intel in my setups, AMD has really pushed their offerings.
First, I remember everyone driving up the price of AMD video cards just because of Bitcoin mining.
Second, they got their hardware into the PS4 and Xbox One.
Now an 8-core ARM CPU... although I find the clock speed (2GHz) kinda underwhelming, AMD's pricing would still entice me to buy two for the price of one Intel i7.
How about also comparing the power usage? At 25 watts, you can get three of these for the power budget of one six-core Intel CPU; so you're at 24 cores at 2 GHz vs 6 cores at 3 GHz (and probably still at a lower price). The GHz can't really be directly compared, though. Even comparing GHz across Intel product generations isn't useful. I have a 3.16 GHz Core 2 Duo in my desktop that I think (I haven't really benchmarked, but I've run a Litecoin miner on both for testing) does about half the work of the 3.2 GHz i7 in my laptop.
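To put very rough numbers on that (a back-of-envelope sketch in C; the IPC figures are purely illustrative assumptions, not measurements):

    /* Back-of-envelope: aggregate throughput ~ chips * cores * clock * IPC.
       The IPC values here are made-up placeholders to show why GHz alone is meaningless. */
    #include <stdio.h>

    int main(void) {
        double arm_parts = 3 * 8 * 2.0e9 * 1.0;  /* three 8-core ARM chips at 2 GHz, assumed IPC 1.0 */
        double intel     = 1 * 6 * 3.0e9 * 2.0;  /* one 6-core Intel chip at 3 GHz, assumed IPC 2.0  */
        printf("ARM aggregate:   %.0f G instructions/s\n", arm_parts / 1e9);  /* 48 */
        printf("Intel aggregate: %.0f G instructions/s\n", intel / 1e9);      /* 36 */
        return 0;
    }

Swap in different IPC assumptions and the ranking flips, which is exactly the point: core count, clock, and per-clock work all have to be considered together.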
All that said, I have a 16 core AMD server in colo that is running at about 3% usage across all CPUs, and yet it is slow as hell because the disk subsystem can't keep up (replacing the spinning disks with SSDs as we speak). The reality is that CPU is not the bottleneck in the vast majority of web service applications. Memory and disk subsystems are the bottlenecks in every system I manage.
So, I love the idea of a low-power, low-cost CPU that still packs enough of a punch to work in virtualized environments. Dedicating one of these cores to each of your VMs would be pretty nice, I think.
The Avoton Atom C2750 which is already out now is also 8 cores, but at 2.4GHz and with the entire SoC at 20W. It's supposed to have comparable performance to a Xeon E5520 quad-core/8-thread 2.26GHz CPU from 3 generations back, or about half the performance of a current quad-core/8-thread Xeon E3 1230v3. And it supports virtualization extensions.
I agree that I/O and not the CPU is usually the bottleneck though.
That's pretty impressive, actually. I just built a new server, and was comparing 80W and 95W packages. I didn't realize Intel had sorted out power so effectively that they could compete with ARM architecture. (Though, to be fair, this new ARM from AMD is a beast...ARM's grown up and Intel has shrunk down.)
Three generations back is still plenty fast for me. I'm running a Core 2 Duo on my desktop, as mentioned, and I don't see any reason to upgrade. I can't imagine needing vastly more in a web server.
I think it'd be interesting to see how these two product lines stack up on all the variables: cost, performance, power under load, etc.
The AMD might have more I/O bandwidth - it ships with dual 10GbE, while the C2750 ships with 4x2.5GbE (usually 1Gb unless on a backplane) - although the C2750 has PCIe too, so who knows about total bandwidth.
I think you mean 'tick' for a die shrink, which is Denverton, and is supposed to have "more cores and more of everything". 'tock' refers to a new microarchitecture, and the details on the generation that will follow Denverton haven't been announced yet.
There's also going to be the Broadwell SoC, which will fit somewhere between Denverton and E3 v4's.
Memory / Cache subsystems are generally where ARM chips fall down (in terms of throughput), and Intel has several patents on cache hierarchies that AMD is licensed to use.
So it's technically possible that AMD could build ARM chips that are a lot more competitive (FLOPS-wise) with Intel than other manufacturers can.
The efficiency of a memory hierarchy doesn't factor into raw FLOPS throughput at all. Rather, it affects your ability to bring real data into registers and get useful work done.
What sort of issues with ARM memory/cache do you have in mind? These systems have been sufficiently powerful to keep the ALUs saturated on compute-heavy tasks on all recent ARM microarchitectures with which I am familiar.
Of course it does in real life - unless you're working on very small amounts of data, cache-level latencies (where Intel chips - non-Atom at any rate - generally have much lower latencies), cache prefetchers, and branch prediction units (where Intel is generally 5-6 years ahead) can make the difference between the FP units being constantly busy and regularly stalling while waiting for data.
In mainstream raw-flops workloads (things like lapack), a correct implementation re-uses the data from each load many times such that the FPUs are not “stalled waiting for data”. Unless the software implementation is terrible, the memory hierarchy does not pose a significant bottleneck for these tasks, and even older ARM designs like Cortex-A9 can achieve > 75% FPU utilization, comparable to x86 cores.
There are more specialized HPC workloads (sparse matrix computation, for example) where gather and scatter operations are critical, and the efficiency of the memory hierarchy comes much more into play (but in these cases even current x86 designs are stalled waiting for data). There are also streaming workloads (which you seem to reference) where you have O(1) FPU ops per data element, which stress raw memory throughput and prefetch. However, one doesn't typically use these to make a general claim about which core is "more competitive (FLOPS-wise)", precisely because they are so dependent on other parts of the system.
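To make the reuse argument concrete, here's a rough arithmetic-intensity estimate for a dense float64 matmul (the matrix size is an arbitrary example):

    /* Dense matmul (the LAPACK-style case above): flops grow as n^3 while the data
       touched grows as n^2, so a well-blocked implementation reuses each loaded value
       many times. n is an arbitrary example size. */
    #include <stdio.h>

    int main(void) {
        double n = 2000.0;
        double flops = 2.0 * n * n * n;        /* multiply-adds for C = A*B */
        double bytes = 3.0 * n * n * 8.0;      /* A, B, C stored as float64 */
        printf("flops per byte: %.1f\n", flops / bytes);  /* ~167 for n = 2000 */
        return 0;
    }

At that kind of reuse ratio, even a modest cache keeps the FPUs fed.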
What "real-world usage" are you talking about, specifically?
EDIT: looking at your comment history, you seem to be focused on VFX tasks, which tend to be entirely bound by memory hierarchy; even on x86 the FPU spends most of its time waiting for data. For a workload like that, you absolutely want to buy the beefiest cache/memory system you can, but that shouldn't be confused with a processor being more competitive "FLOPS-wise".
I don't know why you guys are arguing; he wouldn't have much use for this ARM part. Where this will excel is in CRUD web apps. The ARM part is there just to shuttle data between the 10GbE and the 128GB of RAM.
Well, Intel does sell a 4-core Xeon at 1.8/2.8 GHz with 25 W TDP [1], and between the hyperthreading and higher IPC you've mostly made up for the fewer cores relative to this ARM part.
An i7 will auto-boost its clock speed (if it's cool enough); it could actually be running at 3.9GHz during your mining. That won't account for doing twice the work, so there are still some architectural improvements, but clock speed isn't the whole story.
Clock speed is only relevant when comparing the same CPU.
I have been having this argument since the 1980s in the school playground, when kids with Spectrums would claim their 4MHz Z80s were faster than 2MHz 6502s...
And the QL-inspired Spectrum+ and 128 remain some of the most beautiful 8-bit computers ever built. I love that keyboard.
Commodores and Ataris of the time had very clever ideas about expansion. The intelligent peripherals are a brilliant idea and we should have done more of that.
> although I find the clock speed (2GHz) kinda underwhelming
Remember that clock speed is not necessarily everything. AMD may be able to get more work done with those 2GHz than a Snapdragon 800 might, using similar or less energy.
I lost that poor little machine when I moved across the country; not sure where it could have ended up. Now my only "system to screw around with" is my RasPi.
The first PC I had with X installed was a 486SX/25MHz with 20 MB of RAM. On the rare occasion that I "needed" to use X, I would start it, do whatever I needed to, kill X, and return to the console.
I still spend most of my time in a terminal window, but it's on a machine with ~1120x as much CPU horsepower and ~1638x as much RAM. Just calculating that made me go "wow" and realize how far we've come in the last 20 years or so.
I remember doing some intro comp.sci homework on a hand-me-down Pentium 2 laptop. It sort of managed to run X11 (in 8-bit greyscale!) -- but the JDK took literally 5 minutes to start up (compilation time was effectively bounded by the start-up time: it took 5 minutes until I had my .class files, or my list of syntax errors...).
I did a similar thing. My P2-233 (at the time this was the top end) blew a PSU, and I did my engineering writeup on a P66 (pulled from a campus skip) with 32MB of RAM using heirloom nroff, eqn, pic and vi on FreeBSD, because it was all I had access to and you couldn't compile TeX on that box in a week. I have a lot of respect for the troff family of tools and still use them now.
Interesting question. I think the problem was the 4 (or 8?) MB of RAM, and javac swapping out while loading. I can't recall that I did any compiling with gcc on that laptop -- at the time gcj certainly wasn't mature enough. I don't think I tried jikes either, for some reason -- can't remember why now.
You compiled it? Why? Why not compile on a different architecture that's faster, or use distcc to make a compile farm? The last times I used Gentoo I just used the packages; it took no longer to install than an RPM-based distribution.
>Why not compile on a different architecture that's faster, or use distcc to make a compile farm? The last times I used Gentoo I just used the packages; it took no longer to install than an RPM-based distribution.
Well this was 10 years ago. I was 17 or so and I didn't know about distcc. I was young, dewy-eyed, and a huge noob.
Back when I was touring colleges, building Gentoo from Stage 1 seemed to be all the rage. I tried on a PIII as well. Then the power went out after a day and I never tried again. While it might have been a good experience, ultimately I don't think I missed out on all that much.
> Second, they got their hardware into the PS4 and Xbox One.
Does not mean much. The APU used in these consoles is very weak compared to what we have in mid-to-high-end PC graphics cards. They were obviously chosen for their price, not for their performance (which is MEH at best).
It's approximately equal to a 7870 (the PS4 variant), which is anything but "weak". Indeed, it's what's in my play PC, and it plays any current game with ease. As to AMD generally, well, they have by far the leading compute cards and hold their own in the high end, so certainly not "meh".
Well, the performance on 1080p games has been less than stellar so far compared to my gaming PC. Laughable, even, since my PC's graphics card is already almost 2 years old. These consoles are really a joke, and they are making money on the hardware this time around, which tells you how cheap they are.
The games at launch are always notoriously not representative of the full capabilities of the console. Compare any game from the launch of the previous generation to games that are being released now. There is a huge difference.
Also, I'd like to see your source for the statement that they are making money. I'm sure they aren't as huge loss leaders as they were in the past, but I'd be surprised to know if they are making money on them. If Microsoft could cut the MSRP, they would have.
I totally agree...this time around Sony and Microsoft really were under inventory constraints to launch for the holiday season....
If Microsoft could have, I think they would've offered their subsidized consoles (like they did with the 360) and launched with Halo as their premier title....
(This will probably get a lot of hate.) Then again, PlayStation actually decided to focus on video games this time, and the Xbox One is trying to be a super Roku.
> Also, I'd like to see your source for the statement that they are making money
This was clearly mentioned by Sony before the launch of the console. As for the Xbox One, Microsoft also mentioned that if someone buys one game with the system, they are already making a profit overall. They are NOT selling their systems cheap compared to the previous generations, in terms of what's inside the box. I'll have to find the quotes, but I'm sure you can find them as well if you have 5 minutes.
> Compare any game from the launch of the previous generation to games that are being released now. There is a huge difference.
My point is that when the Xbox 360 came out, for example, the games running on it at launch were very impressive versus what PCs could do at the time. In this generation, watching the launch games on PS4 and Xbox One just feels MEH at best. They are way too late to the game for the power they are packing.
> The games at launch are always notoriously not representative of the full capabilities of the console
Sorry, but let me call BS on this one. We're not talking about PS3-style hardware here, where the architecture was unknown and new and developers had to learn a lot. The PS4 and Xbox One are using basically PC hardware under the hood, and the learning curve should be close to zero for most developers involved.
That's a bold claim. Care to name what you are using in your gaming PC? Two years old, I would guess an HD6870 or equivalent; that's about half the performance of the PS4/XB1 GPU, which is on the 7870/R9 270X/660Ti level. Add an 8-core CPU and 8GB of extremely fast GDDR5 to the mix and I highly doubt Sony/MS are making any profit on the hardware right now.
And that's just a third-party evaluation. I have no doubt Sony gets way better hardware price deals when ordering millions of parts through their purchasing contracts, so of course they are making some kind of profit on each PS4 sold. They actually want to avoid what put them in the red with the PS3 sales.
> IHS iSuppli, having totted up the parts costs, has come up with a total of $372, with an estimated $9 labor cost bringing it to $381 – $18 below the recommended retail price.
So the retail price is $399, out of which retailers make how much? $199? Tell me again how Sony is making a profit.
I don't feel any console can really compare to a custom-built gaming PC... for a while I was hosting my website and an ffmpeg compression server on my gaming rig.
In regards to the latest consoles, I have a PS4... and while most "anticipated" games are still on preorder, I don't think it's fair yet to call the hardware a joke; I don't feel the studios have taken full advantage of it. Now, the game offerings right now... yeah, I agree, but I think the studios really built their games (at least the ones I play: COD: Ghosts and Battlefield 4) for previous-gen consoles.
I am interested to see how the new Halo plays, though... I just can't get past the fact that the Xbox One requires Kinect to be... ummm, connected to play.
The language that you're using (weak, joke, etc) gives the impression that you aren't really evaluating from a rational perspective, but instead have chosen a side.
Why? What's the point of doing that?
Though it is an interesting segue from this article about AMD supporting ARM chips (one of the biggest problems facing data centers is power density and, with that, heat removal) -- the PS4 has a total power consumption of 137W. That's the eight cores, GPU, 8GB of GDDR5, Blu-ray player... the entirety of the box consumes 137W.
The Nvidia GTX 770, undoubtedly a higher-performance card, not only costs almost as much as a PS4, but on its own consumes significantly more power than the PS4. That's excluding the rest of the box around it. That just isn't tenable for a living room gaming machine, which is exactly why consoles represent a necessary compromise. Similarly, several of the most interesting Steam boxes have GPUs significantly less powerful than the PS4's 7870, because you just can't drop a 450W device in the living room and call it a day.
> The Nvidia GTX 770, undoubtedly a higher-performance card, not only costs almost as much as a PS4, but on its own consumes significantly more power than the PS4.
I don't care if it costs more. It's available. Hardware better than the PS4 in consoles is NOT. That's the whole point. If I want better hardware than the PS4, I'll have to wait another 10 years for another console cycle to come around. No way. In 2 years' time the PS4 and Xbox One will be extremely weak even compared with low-end PC gaming rigs.
And you bet I'll want a high-performing card (no matter what it costs) when the VR headsets come around (whether Oculus Rift or Valve's).
> Similarly, several of the most interesting Steam boxes have GPUs significantly less powerful than the PS4's 7870, because you just can't drop a 450W device in the living room and call it a day.
Then these makers are missing the point. It will be more interesting to build your own Steam Machine and install SteamOS on it, then, if no one is willing to put power in the living room. The PC market is not the console market.
At 2GHz they can probably produce large quantities. It sounds like they went for a "safe" fabrication process so that the initial rollout is as defect-free as it can be.
I think reusing the Opteron name is really not a good idea, since now there'll be x86 Opterons and ARM Opterons. Maybe Apteron would've been a better choice...
Fair point, and some excellent alternatives have been mooted.
I think AMD are positioning this for the server market (though I would love a nice cheap desktop with this chip). With that, they are leveraging their only real asset - branding - and I feel it will not water down the Opteron range but help it live on, given they are moving away from the x86 area.
Maybe they should call it "Letniuoywercs" I don't know how to pronounce it either :-)
I agree that re-using the name is bad mojo from a marketing perspective; too many people will be caught off guard by the lack of compatibility with the x86 chips.
Unless non-x86 desktop PCs become mainstream in the consumer market, I don't think there is much risk of confusion. The channels will be completely different and I don't think it will be easy to buy a motherboard with an Opteron A anytime soon.
At $400 I assume you're thinking of something like an Intel X520-DA2 plus optics. If you can tolerate the power, you can do an 82599 with dual PHYs for more like $150-200.
Obviously I have no idea about the network controller or its SDK/driver support.
I was thinking of solutions for tens of gigabits on today's x86 boxes. Dollars and power are both a budget, so it's all a tradeoff.
WRT the Opteron A1100, yes, I could see your point. Something like a box of A1100 blades plugged via 802.3ap into a common backplane, a Trident chip there, and then a bunch of (Q)SFP+ northbound. A couple hundred gigabits for around 150 watts of networking.
When I see the A1100 I think of an I/O node with tens of SATA disks attached. In that case I'm only getting 10 or 20 per rack. A backplane makes less sense to me; running two DAC PHYs per box to a ToR switch I could see.
DPDK is Intel's turf, and PF_RING only supports igb/ixgbe/e1000 drivers.
For out-of-the-box usage, looking at Netmap or Linux PACKET_MMAP (though the latter is not entirely zero-copy) should be possible.
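For the PACKET_MMAP route, a minimal TPACKET_V1 receive-ring sketch looks roughly like this (error handling and the frame-processing loop are omitted, and "eth0" is a placeholder interface name):

    /* Minimal PACKET_MMAP (TPACKET_V1) RX ring sketch - a rough illustration only. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/mman.h>
    #include <arpa/inet.h>
    #include <linux/if_packet.h>
    #include <linux/if_ether.h>
    #include <net/if.h>

    int main(void) {
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

        /* The kernel fills these frames directly; userspace reads them from the
           shared ring instead of copying a packet per recv() call. */
        struct tpacket_req req = {
            .tp_block_size = 1 << 22,                     /* 4 MB blocks */
            .tp_block_nr   = 16,
            .tp_frame_size = 1 << 11,                     /* 2 KB frames */
            .tp_frame_nr   = (1 << 22) / (1 << 11) * 16,
        };
        setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

        size_t ring_len = (size_t)req.tp_block_size * req.tp_block_nr;
        unsigned char *ring = mmap(NULL, ring_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        /* Bind to a specific interface ("eth0" is a placeholder). */
        struct sockaddr_ll ll = {0};
        ll.sll_family   = AF_PACKET;
        ll.sll_protocol = htons(ETH_P_ALL);
        ll.sll_ifindex  = if_nametoindex("eth0");
        bind(fd, (struct sockaddr *)&ll, sizeof(ll));

        /* Check the first frame's status word; TP_STATUS_USER means the kernel handed it to us. */
        struct tpacket_hdr *hdr = (struct tpacket_hdr *)ring;
        if (hdr->tp_status & TP_STATUS_USER)
            printf("got a %u-byte frame\n", (unsigned)hdr->tp_len);
        return 0;
    }

A real receiver would loop over the frames and hand each one back to the kernel by resetting tp_status after processing.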
I don't think that's "sweet"; I think that's a bad decision. What if I don't need two of those per 10 ARM cores? Now I'm just paying for gates I don't need.
What if you don't need AES encryption? Now you're just paying for gates you don't need. What if you don't need SIMD instructions? Now you're just paying for gates you don't need.
It doesn't matter. Modern processors have the complete opposite problem. It isn't that transistors are expensive, it's that they're so cheap you end up with too many and they generate too much heat. If you can stick a block on there which 60% of your customers can use and the other 40% can shut off to leave more headroom for frequency scaling, it's a win.
Also, the number of gates you need for a network controller is small.
You haven't been paying for the silicon for a while; when you buy a chip you're really paying for the design and it's cheaper to design one chip with all the features people might need.
It's much cheaper to make a chip that both you and the other 99% of the people (that need 2 ethernet ports/CPU) need than making a specific chip for each market.
Then you buy a different chip if they're not cost effective.
As an entry into the server market, this sounds awesome; for things like storage aggregation and VM migration, gigabit Ethernet is becoming a real bottleneck for many applications as core counts have gone up.
In server farms, who cares if the processor costs $400 or $500. Energy usage is the biggest cost over time, and this silicon (presumably) isn't powered if it's not used.
This is a microserver, designed to connect I/O-bound resources to each other. Imagine a cache like Squid running on this thing. Imagine multiple RAID-0 SSD drives on one side, and 20Gbps going out through the network.
This is NOT a computationally difficult task. For computationally difficult tasks, you have 8-core $2000 E5 Xeons (which get more and more efficient the bigger the workload you have).
However, filling your datacenter with $2000 Xeons so that they can spend 0.01% of their CPU power copying data from SSD drives to the network is a waste of money and energy.
The A1100 looks like it will be a solution in the growing microserver space. As Facebook and Google scale, they have learned that a large subset of their datacenters are I/O bound and that they're grossly overspending on CPU power.
Big CPUs -> Big TDPs -> higher energy costs.
This machine is designed with big I/O throughput (multiple 10GbE and 8 SATA ports on-chip), with the barest minimum CPU possible to save on energy costs.
The upcoming competitors in this market are HP Moonshot (Intel Atoms), the AMD Opteron A1100, and... that's about it. Calxeda's Boston Viridis has died, so that is one less competitor in this niche.
It's a relief that AMD has a performance/watt alternative to Bulldozer. I sure hope they can stay in business so I have someone to buy hardware from that doesn't fuse off features to screw us out of a 65% margin.
Either way, I'm hoping ARM64 will trickle up from iThingies to the desktop so I can buy a CPU with Virtual MMIO without paying an extra hundred bucks.
Maybe their desktop and server CPUs aren't so hot right now, but I don't think you have to worry about AMD for a while. All of the current generation of consoles have AMD GPUs, those GPUs are on the same die as an AMD CPU for two of the three (PS4 and Xbone), AMD GPUs remain competitive with NVidia's offerings, and they seem to be winning mindshare with their lower-level Mantle graphics API.
EDIT: And the whole Bitcoin-mining thing (or Litecoin/Dogecoin mining thing, these days), as mentioned in another thread.
Neither sha256 (bitcoin) nor scrypt (litecoin) mining uses floating point operations.
AMD cards are faster because they are built with more, simpler "cores", compared to Nvidia's fewer, more complex "cores". Mining benefits from the increased parallelization and doesn't benefit from Nvidia's "fancy" "cores". For sha256 mining, AMD cards gain an additional advantage by supporting a bit rotation instruction that Nvidia cards do not.
They just mentioned bitcoin mining and didn't go into any detail.
> Does AMD GPU hardware have a general advantage mining?
Yes. Disclaimer: I don't mine anything, am not that familiar with how Bitcoin works, and mostly heard about this stuff on the grapevine, so some of this is probably off. Corrections are welcome, if there are any cryptocurrency experts lurking.
At first, Bitcoin mining was done on CPUs. It uses SHA-256 for hashing, which (relatively speaking) isn't that difficult to compute. At some point, someone developed a GPU implementation for it, after which GPU mining quickly overtook CPU mining in terms of cost efficiency. The problem is easily parallelizable (just run a separate hashing procedure on each of the many processing elements in the average GPU), making it a better fit for GPUs than CPUs.
The short reason[1] that AMD GPUs are better than NVidia GPUs for this purpose is that while NVidia's GPUs use more powerful but less numerous processing elements, AMD GPUs use less powerful but more numerous processing elements (a processing element is basically marketing speak for each of the dozens or hundreds of small, specialized "CPUs" that make up a GPU). For mining Bitcoins, the extra capabilities of the NVidia processing elements basically go to waste, while AMD's cards, with more but individually less power-hungry elements, give you both "more hash for the dollar" and "more hash per megawatt-hour." This caused a huge spike in the value of ATI cards, completely unrelated to demand for PC gaming.
However, because this was a trivially parallelizable problem and there is big money at stake, miners came up with FPGA-based solutions (and later ASICs) dedicated to the purpose of mining Bitcoins, which in turn took the Bitcoin mining throne from GPUs. As I understand it, at this point mining Bitcoin with a GPU is a net-negative, and you need an ASIC farm to actually make anything off the operation.
At another point, Litecoin came around, and one of its design goals was to be practical to mine only on CPUs, so that Bitcoin miners could make use of the underutilized CPU in their PC-based mining setups. It used the scrypt algorithm in place of SHA-256 for this purpose, which was designed to be computationally expensive and difficult to implement practically on FPGAs or ASICs (and therefore more resistant against brute-force password hashing attacks), in particular by requiring a lot more memory than it would make sense to allocate per hashing unit on dedicated hardware. Unexpectedly, someone came up with a performant GPU implementation for that as well, giving AMD GPUs back the throne for mining (of Litecoins, and the Litecoin-derived Dogecoin). At this point, there is no sign of a cost-effective FPGA or ASIC-based scrypt mining device, so it looks like things will remain that way for a while.
[1] The gory details here, including something I wasn't aware of until now: NVidia GPUs lack an instruction for a certain operation necessary for SHA-2 hashing that costs them a couple of instructions each loop to emulate. AMD GPUs do have an instruction for it, so this automatically gave them another advantage over NVidia GPUs for mining purposes. Not sure if this applies to scrypt as well, but I'd guess so, since it was derived from SHA-2.
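For reference, the operation in question is the 32-bit rotate-right that SHA-256 leans on heavily; here's a generic C sketch (not GPU code) of what has to be emulated with two shifts and an OR when there's no native rotate instruction:

    /* SHA-256's rotate-right on 32-bit words. With a native rotate (or bit-align)
       instruction this is one operation; emulated, it costs shift + shift + or. */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t rotr32(uint32_t x, unsigned n) {
        return (x >> n) | (x << (32 - n));   /* assumes 0 < n < 32 */
    }

    int main(void) {
        /* e.g. SHA-256's Sigma0 function is built from three such rotations */
        uint32_t a = 0x6a09e667u;            /* one of SHA-256's initial hash words */
        uint32_t sigma0 = rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22);
        printf("Sigma0 = %08x\n", sigma0);
        return 0;
    }

The inner loop does dozens of these per round, so saving a couple of instructions on each one adds up.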
Does the ARM architecture have anything like the nested page tables in recent x86-64 chips? Or is that an orthogonal processor feature that is not required (or forbidden) in a particular implementation of ARM/x86-64?
To make a real entrance into the server market, I would expect good virtualization support to be nearly a requirement.
It has support for virtualization. There is a two-stage translation where the first stage handles the guest operating system's mappings and the second stage handles the hypervisor's mappings. Both stages have nested page tables.
Also, there is an IOMMU implementation for supporting virtualization of I/O. For example, the IOMMU and CPU MMU page table mappings are synchronized, such that a DMA controller also adheres to the page table mappings set up for the CPU.
A recent post revealed some security problems using firewire (and a few other technologies) related to DMA[1]. Would the IOMMU features you're talking about prevent that problem?
Right. DMA creates security holes because it does not sit behind an MMU. It can change the memory of any guest OS. That means any OS or code that can program the DMA controller can bypass security. IOMMU prevents that, because all IO devices sit behind this MMU.
You can have this protection, but then face programming issues if the IOMMU and the CPU MMU use different page tables - you have to update both. The ARM IOMMU is designed so that it is automatically in sync with the CPU tables.
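Conceptually it works something like this - a toy sketch, not real kernel or hardware code; walk_stage1/walk_stage2 are hypothetical stand-ins for the hardware table walks:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-ins: real hardware walks multi-level page tables; here we just add fixed offsets. */
    static uint64_t walk_stage1(uint64_t guest_va) { return guest_va + 0x10000000ULL; } /* VA -> IPA, guest-controlled      */
    static uint64_t walk_stage2(uint64_t ipa)      { return ipa      + 0x80000000ULL; } /* IPA -> PA, hypervisor-controlled */

    /* A guest virtual address goes through both stages before reaching RAM. */
    static uint64_t cpu_translate(uint64_t guest_va) {
        return walk_stage2(walk_stage1(guest_va));
    }

    /* The SMMU (ARM's IOMMU) runs device DMA addresses through the same stage-2 tables,
       so a DMA engine can't reach physical memory the hypervisor hasn't mapped for that guest. */
    static uint64_t dma_translate(uint64_t device_addr) {
        return walk_stage2(device_addr);
    }

    int main(void) {
        printf("CPU: VA 0x%llx -> PA 0x%llx\n", 0x1000ULL, (unsigned long long)cpu_translate(0x1000));
        printf("DMA: IPA 0x%llx -> PA 0x%llx\n", 0x10001000ULL, (unsigned long long)dma_translate(0x10001000));
        return 0;
    }

The key point is that stage 2 is the single place the hypervisor has to update, and both the CPU and devices are forced through it.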
I don't know anything about chips, but I know the ARM architecture has been around for decades. Why is it hot again? I get the point of using it in smartphones and tablets, but why should servers use ARM?
It's hot because the tablet and smartphone market exploded, creating a demand for high-performance, low-power cores that could be flexibly integrated into SoCs with other parts. With that new market came volume, and in the processor market, volume is important. The reason x86 overtook RISC architectures is that the high volume of x86 chips generated revenues that allowed for massive capital investment in x86 designs. The tablet and phone market is driving a similar process for ARM chips. Right now, there are at least three very well-funded lines of ARM microarchitectures: Qualcomm's, Apple's, and ARM's. It's been a long time since a non-x86 platform got that kind of investment.
You are right that smartphones have driven demand for higher-performance and hence more expensive / higher-margin ARM CPUs.
But overall ARM volume has been far higher than x86 volume for a long time even excluding all smartphones and tablets.
Most of our x86 servers at work have more ARM CPUs on them than they have x86 cores (most of the hard drives have controllers with ARM CPUs - some of them multi-core, etc.). You'll also find it all over the place, from washing machines to set-top boxes to microwaves. You find ARM cores in some SD cards, even.
I believe the projected number of cores for ARM last year was around 3 billion. I doubt x86 passed 500 million, which also means that both MIPS and PPC are competing with x86 for second place in number of cores for 32-bit+ CPUs. (On the 16-bit-or-below end you also have surprises like 6502 derivatives shipping in ludicrous volumes.)
So x86 has been "hot" in the market for main CPUs in devices consumers recognise as computers, and has been by far the most profitable architecture for a long time. Outside of that, though, it's at best in second place in total volume, and in most non-computer markets it's more likely to place 3rd to 5th in volume.
That, and because historically, the ARM instruction set has always been the gold-standard power-efficient 32-bit architecture. Especially in thumb mode.
Open-source software and the extreme efficiency goals of data centers make ARM an interesting alternative to x86 now.
If you look at it from the perspective of expressiveness, the CISC-ness of the x86 ISA also allows far more opportunity for hardware-level enhancements than a RISC-style one: the code density is higher, meaning better cache usage and less memory bandwidth needed (especially with multiple cores), and there are still a lot of relatively complex instructions with the potential to be made even faster. RISC came from a time when memory was fast relative to the CPU and the bottleneck was instruction execution inside the CPU, but now it's the opposite: memory bandwidth and latency are becoming the bottleneck. There's only so much you can do to speed up an ARM core without adding new instructions.
It's pretty weird to think that x86 gives Intel any advantage over ARM.
Let's see: x86 code density is horrible for a CISC; there is hardly any advantage over ARM, which does great being a RISC. Also remember that the memory bandwidth is primarily a problem for data, but not code. ARM64 is a brand new ISA; it's the x86 ISA that is a relic from the times when processors were programmed with microcode. Intel is doing a great job of handling all this baggage, but to claim that the ISA gives Intel an advantage is ridiculous.
And finally, Linus has been an Intel fanboy since day one. Go read the USENET archives to find out. He received quite a bit of critique because the first versions of Linux were not portable but tied to i386.
x86 code density may not be optimal but it's better than regular ARM - only thumb-mode can beat it, and just barely.
> Also remember that the memory bandwidth is primarily a problem for data, but not code
RISCs, by design, need to bring the data into the processor for processing; but I see things like http://en.wikipedia.org/wiki/Computational_RAM being more widely used in the future, where the computation is brought to the data, and this becomes much easier to fit to a CISC like the x86 with its ability to operate on data in memory directly with a single instruction. Currently this is done with implicit reads/writes, but what I'm saying is that the hardware can then optimise these instructions however it likes.
The underlying principle is that breaking down complex operations into a series of simpler ones is easy, combining a series of simpler operations into a complex one, once hardware can handle doing the complex one faster, is much harder. x86 lagged behind in performance at the beginning because of a sequential microsequencer, but once Intel figured out how to parallelise that with the P6, they leapt ahead.
Linus being an Intel fanboy has nothing to do with whether x86 has an advantage or not. But even if you look at cross-CPU benchmarks like SPEC, x86 is consistently at the top of per-thread per-GHz performance, beating out the SPARCs and POWERs, and those are high performance, very expensive RISCs. I'd really like to see whether AMD's ARMs can do better than that.
I already start to feel like a grandpa talking about 8080 and x86 processors. The next generation may not even remember what x86 is. The movie The Last Mimzy predicted Intel would stay around until the far future, when they are able to fabricate self-assembling smart chips.
Same reason that x86 became hot, really. There are these newfangled PCs/smartphones that are providing ridiculous volumes that create network effects and defray design expenses. Back in the day the idea of x86 in a server was crazy, but they were able to break into the server market from the bottom and mostly consumed it. The same might happen with ARM, or it might not, since Intel is in a better position than the RISC vendors were, with its near monopoly giving it access to phenomenal engineering resources.
> Intel is in a better position than the RISC vendors were
Actually, Intel might be in a worse position with respect to vendor lock-in. I'm guessing a lot of early servers' lower layers like OS, webserver, etc. were proprietary; convincing the vendor to support x86 would have been a hard sell; and porting your application to an x86 environment was difficult.
All of these things would have had a tendency to lock people into their existing hosting choices.
Nowadays most servers run mostly / completely FOSS (at the lower layers) that can be easily ported to ARM. I'd imagine porting code to x86 from VAX or DEC or mainframe or whatever, was a lot more painful than porting PHP, Django or Ruby web apps to ARM today.
Of course, Intel does have deeper pockets and much of the desktop market, and may well be able to use that to keep ARM in check despite the fact that switching CPU architectures is probably much easier for website owners today than it was when Intel was trying to break into the server market.
X86 chips (especially Intel's) are leagues ahead in terms of performance per watt.
In fact, not only regarding performance per watt, but also performance per dollar. It's just that ARM designs for lowest power consumption while Intel/AMD design for maximum performance.
The original choice of ARM for mobile and x86 for desktop is basically a historical accident.
The differences between modern ARM CPUs and modern x86 have less to do with the ISA itself and more to do with the way ARM CPUs have been designed to be low-power for decades and have worked their way up the performance scale, while x86 has been designed for performance and has only lately been emphasizing low power. These lead to different design points.
Because everything today is about the heat generated by computation. In a phone, it wastes the battery and is unpleasant for the user. In the datacentre, heat determines how much computation you can do in the volume of space you have, and how much you have to spend on cooling systems (the running of which is expensive too). So datacentre operators that already have a building are facing a choice: get a new building, or make better use of the one they have.
ARM cores are typically slower in absolute terms than Intel cores, but at a given level of power, you can run more of them.
Because there isn't any type of x86 processor that beats a comparable ARM processor for efficiency. If you could make an efficient x86 processor Atom would be it, and it's less efficient than ARM.
The x86 ISA fundamentally takes more silicon to implement than ARM. More gates = more power.
Everything Intel sells today clobbers any currently-marketed ARM chip on per-unit-energy computation performed. The race is not even close. ARM is only of interest if you are constrained by something other than compute (phones) or you don't know how to program and you are wasting most of the performance of your Xeons. The latter category contains nearly the entire enterprise software market and most other programmers as well.
Or, your program is entirely constrained by IO so most of the power of Xeon is wasted, while you still have to pay the premium for it.
This chip is interesting not because of the CPU core in it, but because it has two presumably fast 10GbE interfaces and the possibility of a large amount of RAM in a cheap-ish chip.
There's another variable to throw into the mix: all gates are not created equal. A 28nm gate (this new processor) takes a lot more power than a 22nm gate (new Intel processors).
Do you have a source for any of this? x86 is much more powerful than ARM per watt, being exponentially faster at most math. I've never had anyone seriously propose that ARM is more efficient than x86 at anything other than not pulling watts from a Li-ion battery.
Can you elaborate what you mean by "exponentially"?
For ARMv7 vs x86, yes, x86 just destroys ARMv7 (Cortex A15 etc.) in double (float64) performance.
While I do think x86 is still faster vs ARMv8, the gap is likely much less per GHz, because ARMv8 Neon now supports doubles much like SSE. Of course Haswell has wider AVX (256-bit) and ability to issue two 256-bit wide FMAs per cycle (16 float64 ops). Cortex A57 can handle just 1/4th of that, 4 FMA float64 ops per cycle.
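Putting those per-cycle numbers into peak figures - the core counts and clocks below are just example assumptions, not products being compared head-to-head:

    /* Peak float64 throughput = cores * clock * FLOPs/cycle, using the per-cycle numbers
       above (16 for Haswell, 4 for Cortex-A57); core counts and clocks are example picks. */
    #include <stdio.h>

    int main(void) {
        double haswell = 4 * 3.5e9 * 16;   /* hypothetical 4-core Haswell at 3.5 GHz */
        double a57     = 8 * 2.0e9 * 4;    /* 8-core Cortex-A57 at 2 GHz (like the A1100) */
        printf("Haswell peak:    %.0f GFLOPS\n", haswell / 1e9);  /* 224 */
        printf("Cortex-A57 peak: %.0f GFLOPS\n", a57 / 1e9);      /* 64  */
        return 0;
    }
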
That said, low-to-mid-level servers are not really doing much number crunching. They're all about branchy code such as business logic, encoding/decoding, etc., or waiting for I/O to complete.
So why would you care about math in a low end server CPU if it's not being used anyways?
Maximizing density while still keeping everything within operating temperatures is one of the toughest parts of data center operations. Not to mention the cost of all that waste heat. Many tasks are not CPU-bound, so the ARM is plenty good for them.
I don't see AMD's play here. What value do they add with a processor that they don't design, don't fab, and can't produce in the kind of volume tablet and phone chips get produced?
It's possible that this is their first foray into ARM, with a core license to get a feel for it, and their next iteration will be with an architecture license (with which they can add some design value).
I don't understand your comment. "Doesn't design and doesn't fab" describes every ARM licensee. Also, this chip isn't going anywhere near phones or tablets. Did you look at the specifications?
The other ARM licensees have a hook: Qualcomm integrates with its LTE basebands, Apple builds a phone around it, etc. The phone/tablet angle is important because the high volumes in those markets help justify big design teams at Qualcomm and Apple.
System-level design, integration with GPGPU. In spite of being dwarfed by Intel in the x86 space, I would imagine AMD has the engineering bandwidth to compete with the current ARM-based designers - Qualcomm and Samsung.
AMD has actually donated a nice amount of engineer-hours to enable ImageMagick to use OpenCL. And the difference is massive.
So if you have a web service that allows users to upload images and then subsequently processes them, it's cost-efficient to purchase a few AMD APUs instead of a massive Intel Xeon server.
There are a lot of other things to do, but ImageMagick is something that has support for it right now.
Where is the market for this, apart from Facebook (Open Compute Project)? Is it set to compete with CPUs like the Xeon E3-1220L series? Will it end up in HP's Moonshot? I thought that bigger boxes with virtualization would be more economical for most uses than closets full of low-power CPUs.
Perhaps I/O is the key here: N of these A1100 CPUs can easily saturate N x 2 x 10GbE, while a single box with 64+ cores probably cannot push 16 x 10GbE.
The developer board runs Fedora. Any workload that does not depend on a specific CPU architecture (mostly everything but Windows) should run on it. The dev board is there to make it possible for developers to fine-tune their implementations so they run well on the new platform.
Will server makers buy it? That remains to be seen.
Making a dev board available (let's hope it's also cheap enough to make hobbyists buy it) is rather clever. Without software tuned for it, the chip could fail on the market like Sun's Niagara and Intel's Itanium did.
We're getting close to the age where you can buy an off-the-shelf AMD desktop machine with Linux, a good graphics card, and the same performance as an x86-64.
While support varies, it's pretty common for Linux programs to be written with an eye to portability. It's really up to the distro to package things up nicely. Most distros that decide to support ARM have many packages in the repos, just as easy to install as on x86. You can search here for packages in the ARM port of Arch Linux: http://archlinuxarm.org/packages and here's a list of ARM packages from Debian: http://packages.debian.org/squeeze/armel/
I have a feeling history is going to repeat itself. Statements like "AMD believes that it will be the leader of this ARM Server market" reminds me of the DRAM boom-and-bust from 2006-2009. The new (old) hot technology froths the market into a frenzy and semiconductor fabs start rushing to get a slice of the action.
That's an... odd perspective. AMD64 was a huge threat to Intel. It dominated the then flagship P4 and completely destroyed Intel's attempt to force us into an "Itanium everywhere" future.
The fact it didn't leave AMD in charge of the x86-compatible market owes more to Intel's superb Israeli chip design team being able to pull Intel out of the frying pan.
In order to take significant market share, AMD will either need to offer superior price/performance and/or superior performance/watt. Both are unlikely because at 28nm, by the time the A1100 hits the market it will effectively be two manufacturing processes behind Intel's 14nm. Manufacturing process size makes a big difference in production costs, as well as in power efficiency. The microarchitecture won't even be a factor with that great a disparity.
Especially with AMDs ability to throw GPU hardware on the CPU. ARM eats the bottom end on a $/Watt model, and the GPU eats the top end on a raw performance model.
Optics are a big chunk also. A new controller with a DAC GbE PHY is probably more like 7-8W. They only need to do the controller on this chip. A couple of watts for the PHY are part of the motherboard's budget.
While this seems to be targeted for small low power web servers, I really want low powered, cool and low temperature laptops. Laptops with hot intel processors are cooking my body if I actually keep my laptop on my lap.
Modern processors draw very little power at idle; it's only at full load that they reach TDP. If your laptop is constantly doing something with its CPU then it will obviously get much warmer than one that's just idling. I have one with a Core Duo and it barely gets warm if I'm just reading static webpages and editing text, but really heats up if I'm gaming or watching videos.
I have the late 2013 MacBook Air with the new Haswell Core i5 CPU and it's the first time I can put my notebook on my lap for as long as I want without getting a heat stroke.
As long as I don't do compute-intensive stuff, like playing games or visiting websites that overuse JavaScript, the MBA runs really cool. Cooler than my body temperature.
-- typed on my MBA, lying on the couch, having it placed on my belly ;)
Wasn't it someone at Facebook who remarked that they would be interested in ARM CPUs once the frequency was > 2.5GHz? Also, it seems that Google has a bunch of PA Semi guys, so them working on an ARM clone isn't so far-fetched...
At this point, hardware crypto engines are a "must have" feature for many purchasers. Their desire for performance outweighs concerns about malicious actors backdooring the crypto engine.
And if the crypto engine is compromised, how little/great of a leap is it to believe there is microcode to backdoor a general OS or crypto library?
These aren't unusual features in SoCs. Of course, if you're concerned about backdoors you don't have to invoke these engines; at least with a software implementation you control the code.
While that's true, and correct output is required for decryption to work at all, there are more types of tampering one can do than changing the output ciphertext - usually involving storing the key or leaking data somehow.
Assuming they've already compromised the crypto bits of the chip there's nothing to gain in avoiding them since the non-crypto bits could just as well have the same compromises. Might as well just take the time/energy savings.
Tampering with the RNG probably provides the best value for an attacker, and is harder to detect.
That is a pretty great thing, as far as I am concerned. Custom ASICs for routine server tasks that would otherwise clog up a general purpose core? I'll take it.