Can anyone explain why open-source projects embrace CUDA over OpenCL? As I understand it, OpenCL is a more generic API that could potentially be used with both CPUs and GPUs.
There are probably a lot of factors. I worked on CUDA code for around a year, and used to understand the landscape pretty well, but if I were to start a high-performance computing project today I'd probably take my lumps and go with OpenCL. There would be a lot of lumps.
Firstly, CUDA is just more mature; there is a very large and well-established set of libraries for a lot of common operations, there is a decent sized community, and Nvidia even produces specialized hardware (Tesla cards) designed just for CUDA.
Second, all that generic-ness of OpenCL doesn't come for free. With Nvidia, you're working with just one architecture: CUDA cards. Optimizing your kernels is much easier. OpenCL is just generically parallel, so you could end up with any sort of crazy heterogeneous high-performance computing environment to fiddle with (any number of CPUs with different chipsets and any number of GPUs with different chipsets).
I haven't used OpenCL myself, but almost purely anecdotally I have heard many people say that CUDA is often slightly faster[1] and the code is easier to write.
TL;DR: CUDA sacrifices flexibility for ease of development and performance gains. OpenCL wants to be everything for everyone, and comes with the typical burdens.
[1]: Maybe this is a result of OpenCL being more generic and so harder to optimize.
I've been working on a rather large computation library using OpenCL. OpenCL is useful for providing an abstraction over multiple device types. If you are only interested in producing highly-tuned parallel code to execute on NVidia hardware, I suggest sticking to CUDA for the above reasons.
I utilised the OpenCL programming interface to write code that would run the same kernel functions on CPU and/or GPU devices (using heuristics to trade-off latency/throughput) which is something that is not possible afaik using the CUDA toolchain.
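To illustrate the kind of latency/throughput heuristic I mean, here is a minimal sketch in plain Python. It is not my library's code: the `Device` fields, the cost model, and every number in it are hypothetical placeholders. It only shows the shape of the decision: small jobs stay on the CPU because launch and transfer overhead dominate, and large jobs go to the GPU because throughput dominates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical cost model for one compute device."""
    name: str
    launch_overhead_s: float    # fixed cost per kernel launch (assumed)
    transfer_s_per_byte: float  # host<->device copy cost, 0 for the CPU (assumed)
    items_per_s: float          # sustained throughput (assumed)


def estimated_time(dev, n_items, bytes_per_item):
    """Estimated wall time: fixed latency + data transfer + compute."""
    return (dev.launch_overhead_s
            + dev.transfer_s_per_byte * n_items * bytes_per_item
            + n_items / dev.items_per_s)


def pick_device(devices, n_items, bytes_per_item):
    """Choose the device with the lowest estimated time for this job."""
    return min(devices, key=lambda d: estimated_time(d, n_items, bytes_per_item))


# Made-up figures, chosen only to show the crossover between devices.
CPU = Device("cpu", launch_overhead_s=5e-6, transfer_s_per_byte=0.0, items_per_s=2e8)
GPU = Device("gpu", launch_overhead_s=5e-5, transfer_s_per_byte=2e-10, items_per_s=1e10)

if __name__ == "__main__":
    for n in (1_000, 10_000_000):
        print(n, "->", pick_device([CPU, GPU], n, bytes_per_item=4).name)
```

In a real system the constants would come from measuring each device, not from guesses like these.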
FYI regarding highly-tuned code -- An ex ATI/AMD GPU core designer told me that the price you pay for writing optimized code in OpenCL versus the device specific assembler is roughly 3x. Something to keep in mind if you're targeting a large enough system with OpenCL and you find spots that can't be pushed any faster.
Unlike previous versions, OpenCL 2.0 has been shown to be only about 30%[1] slower than CUDA and can approach comparable performance given enough optimisation.
Since I am working on code generation of Kernels to perform dynamic tasks, I can't afford to write at the lowest level available. (I'm accelerating Python/Ruby routines though so OpenCL gives a significant bonus without much pain at all.)
> Nvidia is in the slow process of eventually discontinuing further CUDA support, and it is recommended to write new code in OpenCL only.
[Citation needed]
Their OpenCL support is still limited to v1.1 (released in 2010), while just a few months ago they released a new major version of CUDA with tons of features nowhere to be seen in (any vendor's) OpenCL.
Furthermore, there is CUDA support for Python[1], Matlab[2] and F#[3], as well as parallel device debuggers (TotalView, Allinea) and profilers (NVIDIA). OpenCL has a long way to go to catch up, if it ever does (because there might be a better standard coming further down the line).
On the contrary, I'd argue that it's not true in specific cases.
"Because it's hard" is a cop-out.
"It's too hard to accomplish given constraint [X]" where X is a deadline, financial constraints, or other real/tangible resource limitations might be one thing. But if you're working on your own timeline on some sort of open-source project, or there is nothing external preventing you from acquiring the expertise/resources to conquer the hard problem, then "Because it's hard" is an absolutely shitty excuse to not do something.
I suggest you read "It's too hard" when written by other developers as, "It's too hard [given that I spend N hours a week on this and would rather actually accomplish something in the next two months than learn the 'right' API]." Or, "It's too hard [given various constraints that I'm not going to explain to you but are valid to me.]" It'll save you having to give speeches about shitty excuses.
That said, if it makes sense for your project, make it happen! :)
Even if the long-term goal is more portable GPU support, it still makes some sense to get a CUDA implementation up first if it is easier to get to. It allows real-world testing sooner, and they can always go to OpenCL later once they know more.
Just out of curiosity: how often did you see that happen (not only related to GPUs, but technology decisions overall)? In my (little) experience the change at a later moment will not happen. Most of the time because the management has a new idea/project which you have to attend to.
It's often a great excuse to do something else instead. If you can't get what you need done without the more difficult option, sure do it. But there's no sense in going down the harder path needlessly.
I'm not trying to convince you you don't need it or shouldn't do it, I was looking for a datapoint about what you find valuable in OpenCL.
Well, the portability can be a killer feature. I've been writing quite a bit of OpenCL code lately. I have an AMD GPU, so CUDA is a non-starter. I'll eventually replace the AMD card with an NVIDIA one, so it won't be as big of a problem, but my OpenCL code will still be fine then.
You're preaching to the choir. Looks like NEC funded this.
NVidia seems to be the preferred hardware for institutions/big companies. I'm not sure if this is because NVidia's architecture is better for supercomputers or if they're simply better at marketing to those types of customers
NVidia funds a lot of academics in my space, and I've found academia to be very anti-open source for those reasons, which amuses me greatly.
Case in point, Matlab. Why is this taught in a world with Python/Numpy/MatplotLib?
Matlab seems like inertia/culture to me: it's the longtime de-facto standard in engineering. Since it's what everyone uses, it's got packages for everything, and papers will often come with prototype Matlab implementations. Roughly like the cultural position R holds in statistics. Matlab's hold on engineering is also bolstered by its widespread use in industry: students want to learn it, because it's what their future employers use, and professors / research scientists like to use it because it's what their industrial collaborators use.
In my area of CS (artificial intelligence) it seems considerably less popular. I don't really remember how to use it, since the last time I used it seriously was in some engineering (but not CS) courses in undergrad.
In addition, the MathWorks have so far managed not to screw up too badly and are keeping Matlab up to date. (They are definitely quite nice as an employer.)
Could be; we don't use Matlab much in my own research area, so recent change could've happened under my radar. When I've occasionally had contact with engineers in industry, though, Matlab still seemed to be everywhere. The most recent two examples were someone doing DSP, and someone doing mechanical engineering, and both had all their stuff built on top of Matlab+Simulink.
Matlab knows how to control numerical precision and many algorithms produce the best results when running on that platform.
In the world of electrical engineering Matlab can do things that other packages can't.
From personal experience, I have spent many hours looking at the results of an atan2 function in C++ and Matlab and trying to get them to agree. After a day of work I was able to get them to agree by precisely controlling the rounding modes and using my own atan2 function. This was not fun, and I would rather give somebody 1k to take care of it for me.
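To give a feel for what "getting them to agree" means, here is a rough Python sketch that measures how far two atan2 implementations disagree, in units in the last place (ULPs). It is not the C++/Matlab code I was fighting with: `atan2_via_atan` is a hypothetical stand-in for "some other implementation", and the sample grid is arbitrary.

```python
import math
import struct


def _ordered_bits(x):
    """Map a double to an integer so that adjacent doubles differ by exactly 1."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # Negative doubles are sign-magnitude; flip them so ordering is monotonic.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a, b):
    """Number of representable doubles between a and b (0 means bit-identical,
    treating +0.0 and -0.0 as equal)."""
    return abs(_ordered_bits(a) - _ordered_bits(b))


def atan2_via_atan(y, x):
    """Hypothetical alternative atan2 built from atan plus a quadrant fix-up.
    Assumes (y, x) != (0, 0)."""
    if x > 0:
        return math.atan(y / x)
    if x < 0:
        return math.atan(y / x) + (math.pi if y >= 0 else -math.pi)
    return math.copysign(math.pi / 2, y)


def worst_ulp_error(points):
    """Largest ULP disagreement between math.atan2 and the alternative."""
    return max(ulp_distance(math.atan2(y, x), atan2_via_atan(y, x))
               for y, x in points)


# Arbitrary sample grid, excluding the origin.
SAMPLE = [(y / 7, x / 7) for y in range(-20, 21) for x in range(-20, 21)
          if (x, y) != (0, 0)]

if __name__ == "__main__":
    print("worst disagreement in ULPs:", worst_ulp_error(SAMPLE))
```

Any nonzero result here is the kind of last-bit mismatch that takes a day to chase down across two platforms.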
I'm saying there is a systematic advantage to using proprietary technologies in academic research (companies have money, so you can write a grant and they will pay you $). Case in point, look at apps coming out of academia and you'll see a lot of WindowsPhone. It is because Microsoft gives away a ton of free phones (I have one on my desk at this moment) and Azure time.
Ok, that tracks better than it being something nVidia in particular did.
I don't see a serious problem with the scenario you describe, though. You're not really describing a hostile scenario, just an affinity for commercial software.
There is a bit of a problem of course; I find a lot of papers that describe how to do things with commercial technology that isn't in the budget. That hasn't been insurmountable for me in any way, but maybe others have had more serious problems with it.
The free alternatives are not so good for beginners. One of MATLAB's main strengths is the embedded editor + REPL, whereas with the Python/Numpy/Matplotlib stack, there are just too many moving parts. The MATLAB environment can be emulated with an IPython notebook or Emacs, but I don't believe it is easy enough for beginners.
Are you aware of Sage [1]? It's a Python-based, batteries-included, integrated maths system. It is actually more popular than Matlab around the lab here. Incidentally, we don't get much funding from corporations (but yes, we have licenses for Matlab, Maple and Mathematica for everyone, just somehow Sage is more popular).
One reason why lots of things typically use CUDA is because NVidia makes datacenter rackmounted GPU gear. So there is no need to run generically if that is the only available "production" hardware.
The fun part of parallel programming is getting things running on your GPU, parallelizing the algorithm, then tuning and optimizing the code. This is easier, faster, and more pleasant in CUDA with its mature tools and ecosystem. That's why open source projects often use CUDA.
The advantage of OpenCL is that it runs on more platforms (not just NVIDIA). The problem is that it's more complicated and more of a headache.
My advice to programmers is to start with CUDA and play around with your problem for a while. Time spent learning how GPUs work, what kinds of operations are efficient, and how to parallelize algorithms is not wasted if you switch to OpenCL later. Once you've made some progress then make an informed decision about whether you want to go to production with CUDA or OpenCL.
Because in bitcoin mining, it's always ATI/AMD and OpenCL because ATI cards are like ten times faster than Nvidia cards. This is because of architecture differences.
Does it not affect this postgres table-scanning task? I wonder if they did any benchmarks.
AMD's advantage in Bitcoin mining was purely due to an architectural quirk: their shader cores supported bitwise rotation, but Nvidia's didn't. Bitwise rotation is a rare instruction outside of certain crypto algorithms (like SHA256!), so this really means very little for general-purpose performance.
"AMD's advantage in Bitcoin mining was purely due to an architectural quirk"
False. I authored a Bitcoin miner utilizing this quirk (bit_align). I was also the first to leverage another instruction exclusive to AMD (bfi_int): https://bitcointalk.org/?topic=2949 bit_align "only" gave AMD a 1.7x advantage over Nvidia. The biggest perf gains (2x-3x!) came from the fact AMD has more execution units: https://en.bitcoin.it/wiki/Why_a_GPU_mines_faster_than_a_CPU... (I also authored this section of the wiki).
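For readers who have not seen why a bit-select style instruction matters here, a small Python sketch follows. It assumes the instruction is applied to SHA-256's Ch ("choose") function, which is one known fit for this kind of operation. The `bitselect` helper and its operand order are illustrative only and are not the actual bfi_int encoding.

```python
import random

MASK32 = 0xFFFFFFFF


def ch_reference(x, y, z):
    """SHA-256 Ch as defined in the standard: an AND, a NOT, an AND and an XOR."""
    return ((x & y) ^ (~x & z)) & MASK32


def bitselect(mask, a, b):
    """Per bit: take the bit from a where mask is 1, otherwise from b.
    Hardware with a bit-select instruction can do this in a single operation."""
    return ((a & mask) | (b & ~mask)) & MASK32


def ch_bitselect(x, y, z):
    """Ch is exactly 'use x as a mask to choose between y and z'."""
    return bitselect(x, y, z)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        x, y, z = (rng.getrandbits(32) for _ in range(3))
        assert ch_reference(x, y, z) == ch_bitselect(x, y, z)
    print("Ch matches bit-select on 1000 random inputs")
```

Ch runs once per round of the compression function, so collapsing it from several logical operations into one adds up quickly in a mining loop.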
This is because the hashing algorithm is highly dependent on the integer rotate-right instruction. AMD implements it in 1 clock cycle, Nvidia in 3. So it's a special case.
NVidia cards do not efficiently implement bit rotation, while AMD cards do, and it happens to be the core part of the SHA algorithm used for bitcoin. In general for an arbitrary task they're fairly close in performance.
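To make the rotation point concrete, here is a minimal Python sketch of the 32-bit rotate-right and the four SHA-256 functions built on it. The rotation amounts are the ones from the SHA-256 specification; the function names are my own. Without a native rotate instruction, each `rotr32` costs two shifts and an OR, and the two big-sigma functions alone need six rotations per round.

```python
MASK32 = 0xFFFFFFFF


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits.
    Emulated here as two shifts plus an OR, as on hardware with no rotate."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def big_sigma0(x):
    """Used once per compression round (on working variable a)."""
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22)


def big_sigma1(x):
    """Used once per compression round (on working variable e)."""
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25)


def small_sigma0(x):
    """Used in the message schedule."""
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)


def small_sigma1(x):
    """Used in the message schedule."""
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)


if __name__ == "__main__":
    print(hex(rotr32(0x12345678, 8)))
    print(hex(big_sigma0(1)))
```

Bitcoin hashes each candidate header with SHA-256 twice, so a threefold difference in rotate cost shows up directly in the hash rate.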
AMD cards dominate Nvidia in pretty much all password hash bruteforcing algos, even those that do not rely on bit rotation (bit_align). See http://golubev.com/gpuest.htm for example. It is true that another instruction helps in more cases (bfi_int which I talked about at http://blog.zorinaq.com/?e=43) but in general, AMD cards have a lot more raw integer and floating point compute resources (execution units) than Nvidia cards.
The fastest Nvidia showing is the Tesla S2070, which is an $18k server with 8 GPUs! It can just barely keep up with a slightly overclocked single-GPU HD7970.
I'm not a miner, so correct me if I'm wrong, but that seems like a bit of an apples-to-oranges comparison. Tesla cards (and the servers designed around them) are intended for specific use cases: mission-critical enterprise solutions and scientific HPC. As a result, they run slower processor and memory speeds in comparison to nVidia's own consumer products, use ECC memory, and are optimized for double-precision over single-precision performance. Mining with a Tesla is like gaming with a Quadro card.
There is nothing mission critical or 'enterprise' about Tesla/Fermi cards. You can crash them and lock up your whole machine. Even if you can reboot the OS the card may not respond and the rebooted OS won't see it, we sometimes have to physically shut the machine down to reset the Nvidia card. Nvidia is still a gaming company at heart and it's going to take a while for them to adjust to providing equipment that is meant to be reliable and not just fast.
There are matrix multiplication routines developed for CUDA. With OpenCL you would have to do everything from the ground up. So Nvidia gave everyone a head start for numerical computation, and that edge has snowballed ever since.
No, they don't. They implement ECC by re-purposing some of the existing RAM to hold the parity data. Enabling ECC reduces the usable amount of memory, and also can hurt performance. It offers some improved reliability, but it's nothing like a real server-grade memory system.
CUDA is a lot more mature and easier to program for. OpenCL likely has the future, but it takes more work to set it up and if you have Nvidia cards it is harder to optimize.
So is Nvidia. Both companies produce absolutely horrible drivers. Not just for Linux either: tons of bluescreens, crashes and other Windows instability issues are video driver bugs. That is what happens when the sole concern is speed, and stability is totally ignored.
I have to agree with the grandparent. AMD's graphics drivers for Linux are a complete disaster. NVidia's are only a partial disaster. And sometimes that's the best you get.
From my experience porting CUDA code to OpenCL code, CUDA is much cleaner and more succinct since it is able to assume a lot about the underlying hardware.