"Intel has just published a news release on its website stating that Jim Keller has resigned from the company, effective immediately, due to personal reasons.
"Intel’s press release today states that Jim Keller is leaving the position on June 11th ( 2020 ) due to personal reasons. However, he will remain with the company as a consultant for six months in order to assist with the transition." [1]
Exactly six months later he took a new job. Some may want to look back at their comments on the subject. [2] [3]
Still waiting for the story between Jim and Gerard. [4]
This deserves a post of its own. It's a must-watch for anyone interested in CPU architecture. The clarity with which he talks about some of the complex problems in CPU design is brilliant. The interviewer does a decent job of making it palatable for a larger audience.
My provisional answer is that I'd host a conversation between experts.
For example, I'm now very curious about the Apple Silicon M1 with respect to Java's Memory Model and Project Loom (structured concurrency). But I don't know nearly enough to even ask smart questions, much less understand the answers.
So my dream future perfect interview would have Ron Pressler, Doug Lea, and one or two people really smart about M1 (the only name I know is Dan Luu) sit around and chat it up.
I'd ask them open ended questions, like "What's new and different?" "What happens next?" "What are you excited about?"
The conversations would likely happen over multiple sessions and different mediums, because the experts would share and ask each other things, which would prompt follow-ups.
As podcast host, I'd try to be a catalyst and remove myself from the convo as much as possible. I can't think of any examples or role models. While I'm a huge fan of Ezra Klein and Adam Gordon Bell (Corecursive), I'm not confident I could lean in like they do.
One tactic both Lex and AGB do really well is prompt their guests to explicitly define jargon. I suspect that some of the perceptions of Lex's ignorance are him trying to make topics more accessible. E.g., working close to the metal with AI, I'm quite confident Lex knows about branch prediction.
I think you are right about Lex and also thanks for the compliment!
Even if you know about branch prediction, asking the guest to explain it, maybe even pretending not to know it, is a great way to have concepts introduced and make things more approachable.
Lex wouldn't be as popular as he is if he didn't have a good sense of the level of knowledge his ideal listener has about the subject.
I don’t see a problem with that. Lex’s podcast is aimed at a general technical audience of different backgrounds, so he often asks questions on behalf of listeners who are technical but might not be experts in the field. To be fair, machine learning and CPU design are vastly different fields with little overlap.
I mean, I have a PhD and had no idea what branch prediction was until I listened to that podcast.
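(For anyone else in that boat: here's a rough sketch of the effect branch prediction has, as a hypothetical micro-benchmark in C++; exact numbers vary by CPU.)

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Summing values above a threshold: on shuffled data the branch is
    // unpredictable and mispredictions dominate; once the data is sorted,
    // the predictor learns the pattern and the same loop runs much faster.
    long long sum_above(const std::vector<int>& v, int threshold) {
        long long s = 0;
        for (int x : v)
            if (x > threshold) s += x;  // the branch in question
        return s;
    }

    int main() {
        std::vector<int> v(1 << 24);
        std::mt19937 rng(42);
        for (auto& x : v) x = rng() % 256;

        auto time_it = [&](const char* label) {
            auto t0 = std::chrono::steady_clock::now();
            volatile long long s = sum_above(v, 128);
            (void)s;
            auto t1 = std::chrono::steady_clock::now();
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
            std::printf("%s: %lld ms\n", label, (long long)ms.count());
        };

        time_it("shuffled");             // many mispredictions
        std::sort(v.begin(), v.end());
        time_it("sorted");               // typically several times faster
    }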
This was one of the most impressive things I've ever seen. The sheer mind-puzzle exploration of "well... if we had a CPU the size of the sun, here's why it still wouldn't work". The guy seems unbelievably brilliant.
He also had the best answer to Lex's meaning of life question (last few minutes of the interview). Really made me stop and re-listen. It's very rare for someone on a podcast to think about every word they say.
Thank you. This is definitely one of the best interviews I have read. The precision of the statements and the usage of analogies to explain the topics is astonishing.
I often find it hard to believe that a single person can make much of a difference in problem domains as intricate as chip design, but in his case the evidence is overwhelming. It also goes to show what a shit show Intel has become, since even he was not able to right that ship. I think the CPU space will be all ARM and RISC-V ten years from now, and since Intel never really managed to become a dominant player in any other (relevant) field, they are pretty much done for.
Part of it may also be the situation the company is in, and its mindset, when Keller is hired.
There's no arguing that Keller is a smart guy, but he doesn't design an entire CPU architecture himself. If you're desperate and say "Okay, we are hiring the smartest guy we can find to build our new CPU, and giving him everything he needs to make it happen", then perhaps you get AMD64, Zen, or the A4 and A5. If you try to just dump a smart guy into a team as just another engineer, maybe you get nothing, like Intel.
Perhaps AMD, who already knew him, just gave Keller everything he needed to build a team that could deliver on a new architecture, even when he's no longer there. Same with Apple. Intel, on the other hand, may have been unwilling to grant Keller the same level of autonomy and control. Then it also makes sense that he would leave Intel for personal reasons, those being: "I can't work here, they won't let me do my job".
One of the tendencies of shrinking companies is exacerbated executive infighting.
If the company is growing, there are new X-of-Y positions to move up to.
If the company is stable or shrinking, people start watching out for their own careers with knives out.
AMD possibly avoided this because of size & realization of what needed to be done. Intel's too big & old: I would be very surprised if they weren't much more internally resistant to that sort of change.
And you can only deal with colleagues throwing up brick wall after brick wall on every bit of minutiae for so long, at least if you're talented enough to have other options.
This is an absolutely on-point observation about company dynamics that a large number of people in the tech industry have never had to experience. It's why growth is so critical.
Past a certain size threshold, the organizational and social dynamics of human relationships seem to be the predominant factor in getting anything done. I.e., there's a limit to how big of a headwind we can cope with.
It is hard to believe. However, from my own experience: execution, decision making, vision and goals, people, alignment, setbacks. It's a very complex mixture, and the more people there are, the more important a driver is.
PS: Nice nickname. I read the book by Th. Mann over a period of 7 years, I think.
If you have someone in charge who knows how to run things and help people do their jobs better while having a vision of the future you have a much better chance of success.
I learned an important lesson at my first job out of school: high-quality tech people are more common than the person who can effectively lead them. It was a tough lesson for me because I had spent my entire academic career striving to become a top-notch engineer. Don't confuse a leader with a manager. They might be the same, they might not.
>Also goes to show what a shit show Intel has become since even he was not able to right that ship.
I think we are already starting to see the fruits of his work. Intel doesn't need Jim Keller for CPU uArch design. Intel has had their uArch roadmap ready, and it would have been the best in the industry if it weren't for the 10nm delay. They also have work in the pipeline, all held back by their process node.
Jim described it in one of his interviews (sorry, I spent 10 minutes but couldn't find the source, so I may be remembering it wrong): don't let the process node hold back your chip design, something he had experience with at AMD and Apple. Be flexible enough to back-port your design as a Plan B should anything happen, where previously Intel just kept waiting for the process guys to fix it. That in itself is a huge workflow change. It is hard to imagine the amount of work required to push this through, especially with all the internal politics at Intel.
And Intel is at least looking at TSMC and alternative paths for some of their product lineup now (the gaming-focused, large-die-size GPU). Whether that is decided or not is unclear. But at least we have Rocket Lake launching soon, which is sort of a half-baked Willow Cove (used in Tiger Lake) ported back to 14nm on desktop. And we have Sapphire Rapids as well as other product roadmaps hinting at multiple nodes (shown in investor meeting notes). That at least shows Intel has changed their internal design process to be flexible enough in case of another 10nm-like fiasco. And I think Jim Keller deserves some credit for this transition.
That said, of course, having a flexible design still doesn't fix their problem if TSMC is two years ahead of Intel in leading-edge node design, volume, and cost. And as I have repeatedly stated, Intel's problem is not design but their business model. It would not surprise me if TSMC shipped more 5nm wafers last year (2020) than Intel's entire 10nm production history since 2017.
Honest question: are we sure he didn't make a difference? The dude usually shows up, does the work with the team, and leaves... only a year, two, or more later do we see the results.
There's a very big delay between finalizing a design and actually etching said design in silicon, and bringing a design to silicon of course involves a lot of non-trivial work. This is kind of why Intel had the whole tick-tock thing: while the current design is being put into silicon, the design team can work on the next iteration. It's also why AMD could be very confident about their next Zen iteration being a lot faster when they were releasing Zen 2.
I only know the name of one prominent individual within the CPU space that is not a CEO or major scientist; and that's Jim Keller.
Why? Because I never ever see any articles talking about any other interesting employees. Every single time it's Jim Keller.
I'm sure he's good; his interview with Lex Fridman shows that he's knowledgeable and creative. But there's no way he's as singular a major force as the media portrays him to be.
Just going by the interview posted by another commenter, it seems to me a big reason is that he seems to enjoy the "people challenge" about as much as the technical challenge.
One argument is perhaps that bad management can be stifling and it can be hard to achieve good outcomes under bad management. The semiconductor space is perhaps difficult because you have very long lead times and the cost of each iteration is high: if you have different parts of the organisation pulling in different directions, you're unlikely to have a good outcome, and iterating to unify that direction is very difficult.
Novel development in a complex problem space isn't something one can just throw manpower at and expect progress. I'm sure that a certain amount of his fame is due to being a famous architect, as name recognition always compounds. There's no way he's more influential than the rest of the industry combined (shoulders of giants and whatnot), but I would be hard pressed to find press recognition of other hardware engineers. Still, he is undoubtedly exceptional.
For example, one could round up as many scientists as they could find in 1900, but there is no number that would guarantee the progress made in theoretical physics by someone like Einstein alone.
Worse than that, I wonder if the trouble at Intel (e.g. the inability to develop post-14nm chips, plus one insane instruction set extension after another -- I wonder if the point of AMX is to have a big die area that is mostly unused and doesn't need to be cooled) isn't something that people like him are running from, but rather something they are going to bring with them wherever they wind up.
>one insane instruction set extension after another
You're probably going to see a whole lot more of this sort of thing given the limits of process scaling. Keeping things simple and backward compatible made sense when you could just throw more transistors at the problem. Now you're seeing more and more specialized circuitry that software people are just going to have to deal with.
I am not against new instructions. At first blush the new JavaScript instruction in ARM might seem like a boondoggle, but it is a simple arithmetic operation.
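(Presumably the instruction meant here is ARMv8.3's FJCVTZS. A rough C++ model of what it does in a single instruction, based on my reading of JS's ToInt32, so treat it as a sketch:)

    #include <cmath>
    #include <cstdint>

    // Approximate model of JavaScript's ToInt32, which ARMv8.3's FJCVTZS
    // performs in one instruction: truncate toward zero, wrap modulo 2^32,
    // and map NaN/infinity to 0.
    int32_t js_to_int32(double d) {
        if (!std::isfinite(d)) return 0;
        double t = std::trunc(d);               // round toward zero
        double m = std::fmod(t, 4294967296.0);  // wrap modulo 2^32, sign kept
        // Reinterpret the low 32 bits as signed (two's complement assumed).
        return (int32_t)(uint32_t)(int64_t)m;
    }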
Compare that to the non-scalable SIMD instructions, which mean you have to rewrite your code to take advantage of them, with the result that people don't bother to use them at all.
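(To make that rewrite cost concrete, a sketch with a hypothetical array-summing function, assuming AVX2: the scalar loop is portable, while the intrinsics version is welded to one 8-lane ISA.)

    #include <immintrin.h>  // AVX2 intrinsics
    #include <cstddef>

    // Plain scalar loop: the compiler may or may not vectorize it.
    float sum_scalar(const float* x, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; ++i) s += x[i];
        return s;
    }

    // Hand-written AVX2 version: tied to a fixed 8-lane width, so it
    // has to be rewritten again for AVX-512, NEON, SVE, ...
    float sum_avx2(const float* x, size_t n) {
        __m256 acc = _mm256_setzero_ps();
        size_t i = 0;
        for (; i + 8 <= n; i += 8)
            acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));
        float lanes[8];
        _mm256_storeu_ps(lanes, acc);
        float s = lanes[0] + lanes[1] + lanes[2] + lanes[3]
                + lanes[4] + lanes[5] + lanes[6] + lanes[7];
        for (; i < n; ++i) s += x[i];  // scalar tail
        return s;
    }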
AMX allocates a huge die area to GEMM functionality that gets used a lot less in real numerics than you'd gather from reading a linear algebra textbook.
There are other approaches to the problems the industry faces than "fill up the die with registers that will never be used"; Nvidia and Apple are going that way, and that is why they are succeeding while Intel is failing.
As I understand it, Apple have a direct equivalent to Intel's AMX as an undocumented instruction set on their new Apple Silicon laptop processors, it just took a while for people to figure it out because the whole thing was hidden behind an acceleration library that is implemented very differently on Intel-based Macs.
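(Assuming the library in question is Apple's Accelerate framework, which is my assumption: the point is that callers only ever see a portable BLAS call, never the AMX instructions behind it.)

    #include <Accelerate/Accelerate.h>  // macOS only

    // Multiply a 2x3 matrix A by a 3x2 matrix B into a 2x2 matrix C.
    // The identical cblas_sgemm call runs on every Mac; whether it is
    // backed by the undocumented AMX units is invisible to the caller.
    int main() {
        float A[6] = {1, 2, 3, 4, 5, 6};
        float B[6] = {7, 8, 9, 10, 11, 12};
        float C[4] = {0};
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 3,      // m, n, k
                    1.0f, A, 3,   // alpha, A, lda
                    B, 2,         // B, ldb
                    0.0f, C, 2);  // beta, C, ldc
        return 0;
    }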
> find it hard to believe that a single person can make much of a difference ... goes to show what a shit show Intel has become since even he was not able to right that ship.
These statements almost seem contradictory. What if instead of "not being able to right that ship", it is instead an example to the contrary?
Intel started out with memory. They never left, and they are by far at the cutting edge of memory tech. In particular, Optane NVM DIMMs are so fast they basically define a new layer in the performance/cache hierarchy. Intel might see their focus shift over time away from CPUs toward chalcogenide-based persistent memory, where they seem to have held the lead for some time now.
Isn't 80+% of Intel's revenue from CPUs, with some additional percentage from mobile chips? So while they may produce memory, it's almost irrelevant as far as current revenue goes.
Nobody does that stuff alone. I think a more accurate description would be that he seems to be a force multiplier for the team he leads, which can have a bigger impact than any singular engineering feat.
If there are only a handful of people capable of doing this, or even just one, then 10x or 10000x doesn't make sense, because that is a productivity metric. 10,000 normal engineers wouldn't be able to do it, just like 10 juniors can't do the job of one senior.
IMHO, the correct way to look at it is to compare impact/results.
A 10x engineer might not get 10x done, but their work will be 10x better in a combination of ways: quality, maintainability, speed, portability, extendability, etc. Hopefully the ways in which their work is better fits the priorities of the organization.
That’s the only way you can really call someone an N-xer compared to a productive individual contributor.
Someone like Jim Keller is a big multiplier at a higher level. People today understand the value an executive like Steve Jobs brings, but usually there’s debate on the value before it becomes clear a few years later.
You can get a sense of what it would be like to work with him from this interview with Lex Fridman[1]. I'm also curious what it would be like to work with him, since it sounds like he is full of himself[3]. But I believe he has earned the right to be, given all he has achieved[3].
The guy has also read a couple of books a week[4] for the last few decades.
I watched that interview half a year ago, and came away with the impression of a curious humble guy who was interested in digging deep below the surface and working on big advances.
I think you meant to refer to [2], and that part of the interview is Lex Fridman interrupting him when he was trying to make a deep point about how to think about things.
It's a bit corny that Reddit and HN go over the top about this guy like he's the centre of the universe, just because he's the only player they know the name of and could point out in a photo.
I think it's a timing thing. There was an earlier post with the Anandtech article late yesterday, but because so much of HN is in NA timezones it was likely smothered by the morning.
> The chips also operate on the assumption that future software will involve programmers giving high-level directions while artificially intelligent computers write much of the nitty gritty code required to implement those human ideas.
It'll turn out more like trying to teach someone to do a trick, with all the misunderstandings and frustrations to boot.
Mind you, the same thing is already done to software devs: the product manager tries to explain what they want, and designers and devs interpret it in a certain way.
About 6 months ago my job required that I finally get my hands dirty writing x86 assembly. It's my first real foray into assembly coding.
There are a few aspects of it that I'm really enjoying:
- I can now actually understand the disassembled code that I see during debugging. This includes recognizing some of the assembly patterns that appear because of ABI requirements and/or common programming idioms.
- I'm becoming comfortable with a programming idiom that I've never really used in the past: registers, flags, various kinds of memory addressing.
- It helps my understanding of compilers' lower levels / backends, and the related problems: register allocation, instruction selection, etc.
- It provides a clear path for my first attempt at writing JIT code (using Xbyak[0]).
So as Richard Feynman might have said, it's great fun!
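(For the JIT-curious, a minimal Xbyak sketch, my own toy example rather than anything from the parent; it assumes the x86-64 System V ABI.)

    #include <xbyak/xbyak.h>

    // Emit a function equivalent to: int add2(int a, int b) { return a + b; }
    // System V AMD64 ABI: the first two integer args arrive in edi/esi,
    // and the return value goes in eax.
    struct AddGen : Xbyak::CodeGenerator {
        AddGen() {
            mov(eax, edi);
            add(eax, esi);
            ret();
        }
    };

    int main() {
        AddGen gen;
        auto add2 = gen.getCode<int (*)(int, int)>();
        return add2(40, 2) == 42 ? 0 : 1;  // exit code 0 on success
    }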
This page has some details of Tenstorrent's current chips [1]. From a brief look, it seems to be a manycore SIMD design focused on tensor ops. Apparently they also have some sort of compression scheme to boost memory bandwidth.
I don't notice anything in particular that stands out vs. the many other AI chips people are making, at first glance. But I'm far from an expert. There are several other technical videos on their YouTube channel as well: https://www.youtube.com/channel/UC7041p6DlAh0r4_Fnlk10pQ
Thanks for these! My assumption is that hiring Keller means they plan to gain an advantage through world-class execution rather than some crazy architectural leap of faith.
Watching the video, I think turning each tensor into packets is quite clever, as you get the ability to send them around a network and organise layers, data manipulation, layer transformations, and compression all as part of the stack.
I'm pretty surprised no one has actually exposed the actor model for parallelising neural networks; it seems it would work quite well and would allow you to have a layer per node (or many other split configurations). Maybe data locality would be an issue with actor-based approaches. They seem to be solving this at a lower level, but with less knowledge of the actual parallelism in the software.
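(A toy sketch of what layer-per-actor might look like: entirely hypothetical, with threads and mailboxes standing in for a real actor runtime, and a scalar multiply standing in for the layer math.)

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    using Tensor = std::vector<float>;

    // A toy mailbox: the only way the actors communicate.
    class Mailbox {
        std::queue<Tensor> q_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void send(Tensor t) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(t)); }
            cv_.notify_one();
        }
        Tensor receive() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return !q_.empty(); });
            Tensor t = std::move(q_.front());
            q_.pop();
            return t;
        }
    };

    // One actor per layer: owns its state, reads its inbox, writes downstream.
    void layer_actor(Mailbox& in, Mailbox& out, float weight, int n_msgs) {
        for (int i = 0; i < n_msgs; ++i) {
            Tensor t = in.receive();
            for (float& x : t) x *= weight;  // stand-in for the real layer math
            out.send(std::move(t));
        }
    }

    int main() {
        const int n_msgs = 4;
        Mailbox input, hidden, output;
        std::thread l1(layer_actor, std::ref(input), std::ref(hidden), 2.0f, n_msgs);
        std::thread l2(layer_actor, std::ref(hidden), std::ref(output), 0.5f, n_msgs);
        for (int i = 0; i < n_msgs; ++i) input.send(Tensor(8, 1.0f));
        for (int i = 0; i < n_msgs; ++i) output.receive();  // drain the results
        l1.join();
        l2.join();
    }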
Their marketing material states: "Facilitating machines to go beyond pattern recognition and into cause-and-effect learning".
I wonder what they are referring to. Are they accelerating what SHAP's GradientExplainer [1] does? (namely: crafting inputs at a specific layer, propagating forward to see the influence on class prediction, and sort of backpropagating to pixels) Or is it about something more related to Judea Pearl's work on causality?
Good question. I also predicted in 2015 that Donald Trump could become president, since he was giving public speeches about it before running. So yeah, I was right. Also, the chances were 50-50, since there are only 2 parties. So I predicted it, and he became president.
I think the same goes for this guy: Jim Keller quit Intel and could join another company sooner or later, and it won't be a former one, likely a startup. We are geniuses.
A lot of people in this thread are asking questions about Jim Keller single-handed contributions.
Could it be possible he's "famous for being famous"?
E.g., some might say Jeff Dean (Google) fits this mold a bit, whereas Sanjay Ghemawat (Google) has contributed arguably just as much if not more, but is mentioned radically less than Jeff.
"Intel’s press release today states that Jim Keller is leaving the position on June 11th ( 2020 ) due to personal reasons. However, he will remain with the company as a consultant for six months in order to assist with the transition." [1]
Exactly six months later he took a new job. Some may want to look back at their comment on the subject. [2] [3]
Still waiting for the Story between Jim and Gerard.[4]
[1] https://www.anandtech.com/show/15846/jim-keller-resigns-from...
[2] https://news.ycombinator.com/item?id=23496083
[3] https://news.ycombinator.com/item?id=23493046
[4] https://news.ycombinator.com/item?id=23497336