Fast dev tools are awesome and I am glad the TS team is thinking deeply about dev experience, as always!
One trade off is if the code for TS is no longer written in TS, that means the core team won’t be dogfooding TS day in and day out anymore, which might hurt devx in the long run. This is one of the failure modes that hurt Flow (written in OCaml), IMO. Curious how the team is thinking about this.
Hey bcherny! Yes, dog-fooding (self-hosting) has definitely been a huge part in making TypeScript's development experience as good as it is. The upside is the breadth of tests and infrastructure we've already put together to watch out for regressions. Still, to supplement this I think we will definitely be leaning a lot on developer feedback and will need to write more TypeScript that may not be in a compiler or language service codebase. :D
Interesting! This sounds like a surprisingly hard problem to me, from what I've seen of other infra teams.
Does that mean more "support rotations" for TS compiler engineers on GitHub? Are there full-stack TS apps that the TS team owns that ownership can be spread around more? Will the TS team do more rotations onto other teams at MSFT?
Ultimately the solution has to be breaking the browser monopoly on JS, via performance parity of WASM or some other route, so that developers can dogfood in performant languages instead across all their tooling, front end, and back end.
First, this thread and article have nothing to do with language and/or application execution performance. It is only about the tsc compiler execution time.
Second, JavaScript already executes quickly. Aside from arithmetic operations it has now reached performance parity with Java, and highly optimized JavaScript (typed arrays and an understanding of how arrays and objects are accessed in memory) can come within 1.5x the execution speed of C++. At this point the slowness of JavaScript comes from things other than code execution, such as garbage collection, unnecessary framework code bloat, and poorly written code.
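To make the typed-array point concrete, here is a rough sketch of my own (not a benchmark, and exact speedups depend on the engine): a Float64Array is one flat block of doubles that the engine can iterate with tight, cache-friendly loops, while an array of objects forces pointer chasing and hidden-class checks on every access.

    // Contiguous numeric storage: the engine can iterate this with a tight loop.
    const n = 1_000_000;
    const flat = new Float64Array(n).fill(1.5);
    let flatSum = 0;
    for (let i = 0; i < n; i++) flatSum += flat[i];

    // Boxed objects: every element is a separate heap allocation with its own
    // hidden class, so the same loop does far more work per element.
    const boxed = Array.from({ length: n }, () => ({ value: 1.5 }));
    let boxedSum = 0;
    for (let i = 0; i < n; i++) boxedSum += boxed[i].value;

    console.log(flatSum, boxedSum); // both 1500000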
That being said, it isn't realistic to expect significantly faster execution times by replacing JavaScript with a WASM runtime. This is even more true once you consider that many performance problems with JavaScript in the wild are human problems more than technology problems.
Third, WASM has nothing to do with JavaScript, according to its originators and maintainers. WASM was never created to compete with, replace, modify, or influence JavaScript. WASM was created as a language-agnostic Flash replacement running in a sandbox. And because WASM executes in its own sandbox, the cost of replacing an existing runtime is high: the JavaScript runtime is already there, while shipping a WASM runtime is more akin to installing a desktop application on first run.
How do you reconcile this view with the fact that the TypeScript team rewrote the compiler in Go and it got 10x faster? Do you think they could have kept it in TypeScript and achieved similar performance, but didn't for some reason?
This was touched on in the video a little bit—essentially, the TypeScript codebase has a lot of polymorphic function calls, and so is generally hard to JIT optimize. JS to Go therefore yielded a direct ~3.5x improvement.
The rest of the 10x comes from multi-threading, which wasn't possible to do in a simple way in the JS compiler (efficient multithreading while writing idiomatic code is hard in JS).
JavaScript is very fast for single-threaded programs with monomorphic functions, but in the TypeScript compiler's case, the polymorphic functions and opportunity for parallelization mean that Go is substantially faster while keeping the same overall program structure.
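To illustrate what "polymorphic" means at the JIT level, here is a contrived TypeScript sketch of my own (not code from the compiler): a function that accepts a union of several object shapes, like an AST walker, keeps its call sites from specializing to a single hidden class, which is roughly the situation tsc is in.

    // Contrived example: three node shapes flowing through one function means
    // the engine's inline caches at `node.kind`, `node.value`, etc. go
    // polymorphic and stay on a slower path.
    interface NumberNode { kind: "number"; value: number }
    interface AddNode { kind: "add"; left: Node; right: Node }
    interface NegNode { kind: "neg"; operand: Node }
    type Node = NumberNode | AddNode | NegNode;

    function evaluate(node: Node): number {
      switch (node.kind) {
        case "number": return node.value;
        case "add": return evaluate(node.left) + evaluate(node.right);
        case "neg": return -evaluate(node.operand);
      }
    }

    console.log(evaluate({
      kind: "add",
      left: { kind: "number", value: 2 },
      right: { kind: "neg", operand: { kind: "number", value: 5 } },
    })); // -3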
I have no idea about the details of their test cases. If they had used an even faster language like Cobol or Fortran maybe they could have gotten it 1,000,000x faster.
What I do know is that some people complain about long compile times in their code that can last up to 10 minutes. I had a personal application that was greater than 60k lines of code and the tsc compiler would compile it in about 13 seconds on my super old computer. SWC would compile it in about 2.5 seconds. This tells me the far greater opportunity for performance improvement is not in modifying the compiler but in modifying the application instance.
Are you looking for non-browser performance such as 3d? I see no case that another language is going to bring performance to the DOM. You'd have to be rendering straight to canvas/webgl for me to believe any of this.
The issue with Flow is that it's slow, flaky and has shifted the entire paradigm multiple times making version upgrades nearly impossible without also updating your dependencies, IF your dependencies adopted the new flow version as well. Otherwise you're SOL.
As a result the amount of libraries that ship flow types has absolutely dwindled over the years, and now typescript has completely taken over.
Our experience is the opposite: we have a pretty large Flow-typed code base and can do a full check in <100ms. When we converted to TS (we decided not to merge it) we saw TypeScript take multiple minutes. It's worth checking out LTI and how typing the boundaries enables Flow to parallelize and give very precise error messages compared to TS. The third-party lib support is however basically dead, except the latest versions of Flow are starting to enable ingestion of TS types, so that's interesting.
They should write a typescript-to-go transpiler (in typescript) , so that they can write their compiler in typescript and use typescript to transpile it to go.
Thanks everyone for all your questions! The team and I are signing off. Please drop any other bugs or feature requests here: https://github.com/anthropics/claude-code. Thanks and happy coding!
Hi everyone! Boris from the Claude Code team here. @eschluntz, @catherinewu, @wolffiex, @bdr and I will be around for the next hour or so and we'll do our best to answer your questions about the product.
One thing I would love to have fixed - I type in a prompt, the model produces 90% or even 100% of the answer, and then shows an error that the system is at capacity and can't produce an answer. And then the response that has already been provided is removed! Please just make it where I can still have access to the response that has been provided, even if it is incomplete.
I've made the extension, but I haven't been able to test it (hence I'd rather not release it). I use Claude daily, but I haven't bumped into the situation yet where the generated output would disappear.
To me it doesn’t look like a bug. I believe it is an intended “feature” pushed from high management - a dark pattern to make plebs pay for an answer that has overflowed the quota.
The biggest complaint I (and several others) have is that we continuously hit the limit via the UI after even just a few intensive queries. Of course, we can use the console API, but then we lose ability to have things like Projects, etc.
Do you foresee these limitations increasing anytime soon?
Quick Edit: Just wanted to also say thank you for all your hard work, Claude has been phenomenal.
I'm sure many of us would gladly pay more to get 3-5x the limit.
And I'm also sure that you're working on it, but some kind of auto-summarization of facts to reduce the context in order to avoid penalizing long threads would be sweet.
I don't know if your internal users are dogfooding the product that has user limits, so you may not have had this feedback - it makes me irritable/stressed to know that I'm running up close to the limit without having gotten to the bottom of a bug. I don't think stress response in your users is a desirable thing :).
It takes time to grow capacity to meet growing revenue/usage. As parent is saying, if you are in a growth market at time T with capacity X, you would rather have more people using it even if that means they can each use less.
The problem with the API is that it, as it says in the documentation, could cost $100/hr.
I would pay $50/mo or something to be able to have reasonable use of Claude Code in a limited (but not as limited) way as through the web UI, but all of these coding tools seem to work only with the API and are therefore either too expensive or too limited.
> The problem with the API is that it, as it says in the documentation, could cost $100/hr.
I've used https://github.com/cline/cline to get a similar workflow to their Claude Code demo, and yes it's amazing how quickly the token counts add up. Claude seems to have capacity issues so I'm guessing they decided to charge a premium for what they can serve up.
+1 on the too expensive or too limited sentiment. I subscribed to Claude for quite a while but got frustrated the few times I would use it heavily I'd get stuck due to the rate limits.
I could stomach a $20-$50 subscription for something like 3.7 that I could use a lot when coding, and not worry about hitting limits (or I suspect being pushed on to a quantized/smaller model when used too much).
Claude Code does caching well fwiw. Looking at my costs after a few code sessions (totaling $6 or so), the vast majority is cache read, which is great to see. Without caching it'd be wildly more expensive.
Like $5+ was cache read ($0.05/MTok vs $3/MTok), so it would have cost $300+
I paid for it for a while, but I kept running out of usage limits right in the middle of work every day. I'd end up pasting the context into ChatGPT to continue. It was so frustrating, especially because I really liked it and used it a lot.
It became such an anti-pattern that I stopped paying. Now, when people ask me which one to use, I always say I like Claude more than others, but I don’t recommend using it in a professional setting.
The gateway is integrated directly into our chat (https://glama.ai/chat). So you can use most of the things that you are used to having with Claude. And if anything is missing, just let me know and I will prioritize it. If you check our Discord, I have a decent track record of being receptive to feedback and quickly turning around features.
Long term, Glama's focus is predominantly on MCPs, but chat, gateway and LLM routing is integral to the greater vision.
I would love feedback if you are going to give it a try: frank@glama.ai
The issue isn't API limits, but web UI limits. We can always get around the web interface's limits by using the claude API directly but then you need to have some other interface...
The API still has limits. Even if you are on the highest tier, you will quickly run into those limits when using coding assistants.
The value proposition of Glama is that it combines UI and API.
While everyone focuses on either one or the other, I've been splitting my time equally working on both.
Glama UI would not win against Anthropic if we were to compare them by the number of features. However, the components that I developed were created with craft and love.
You have access to:
* Switching models between OpenAI, Anthropic, etc.
* Side-by-side conversations
* Full-text search of all your conversations
* LaTeX, Mermaid, and rich-text editing integration
Ok, but that's not the issue the parent was mentioning. I've never hit API limits but, like the original comment mentioned, I too constantly hit the web interface limits particularly when discussing relatively large modules.
Your chat idea is a little similar to Abacus AI. I wish you had a similarly affordable monthly plan for chat only, but your UI seems much better. I may give it a try!
Who is glama.ai though? Could not find company info on the site, the Frank name writing the blog posts seems to be an alias for Popeye the sailor. Am I missing something there? How can a user vet the company?
As another commenter in this thread said, we are just a 'frontend wrapper' around other people's services. Therefore, it is not particularly difficult to add models that are already supported by other providers.
The benefit of using our wrapper is that you can use a single API key and get a single bill for all your AI usage; you don't need to hack together your own logic for routing requests between different providers, failovers, keeping track of costs, worrying about what happens if a provider goes down, etc.
The market at the moment is hugely fragmented, with many providers unstable, constantly shifting prices, etc. The benefit of a router is that you don't need to worry about those things.
Scaling infrastructure to handle billions of tokens is no joke.
I believe they are approaching 1 trillion tokens per week.
Glama is way smaller. We only recently crossed 10bn tokens per day.
However, I have invested a lot more into UX/UI of that chat itself, i.e. while OpenRouter is entirely focused on API gateway (which is working for them), I am going for a hybrid approach.
The market is big enough for both projects to co-exist.
this is also my problem, I've only used the UI with the $20 subscription, can I use the same subscription to use the CLI? I'm afraid it's like AWS API billing where there is no limit to how much I can use and then I get a surprise bill
It is API billing like AWS - you pay for what you use. Every time you exit a session we print the cost, and in the middle of a session you can do /cost to see your cost so far that session!
What I really want (as a current Pro subscriber) is a subscription tier ("Ultimate" at ~$120/month ?) that gives me priority access to the usual chat interface, but _also_ a bunch of API credits that would ensure Claude and I can code together for most of the average working month (reasonable estimate would be 4 hours a day, 15 days a month).
i.e I'd like my chat and API usage to be all included under a flat-rate subscription.
Currently Pro doesn't give me any API credits to use with coding assistants (Claude Code included?), which is completely disjointed. And do I still need to be a business to use the API?
Honestly, Claude is so good, just please take my money and make it easy to do the above!
I don’t think you need to be a business to use the API? At least I’m fairly certain I’m using it in a personal capacity. You are never going to hit $120/month even with full-time usage (no guarantees of course, but I get to like $40/month).
$1500 is 100 million output tokens, or 500 million input tokens for Claude 3.7.
The entire LOTR trilogy is ~.55 million tokens (1,200 pages, published).
If you are sending and receiving the text equivalent of several hundred copies of the LOTR trilogy every week, I don't think you are actually using AI for anything useful, or you are providing far too much context.
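For reference, the arithmetic above lines up with the published Claude 3.7 Sonnet API pricing of roughly $3 per million input tokens and $15 per million output tokens (my assumption about the prices being used):

    $1500 / ($15 per 1M)  = 100M output tokens
    $1500 / ($3 per 1M)   = 500M input tokens
    500M / ~0.55M per trilogy ≈ 900 copies of the trilogy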
You can do this yourself. Anyone can buy API credits. I literally just did this with my personal credit card using my gmail based account earlier today.
1. Subscribe to Claude Pro for $20 month
2. Separately, Buy $100 worth of API credits.
Now you have a Claude "ultimate" subscription where the credits roll over as an added bonus.
As someone who only uses the APIs, and not the subscription services for AI, I can tell you that $100 is A LOT of usage. Quite frankly, I've never used anywhere close to $20 in a month which is why I don't subscribe. I mostly just use text though, so if you do a lot of image generation that can add up quickly
I don't think you can generate images with claude. just asked it for pink elephant: "I can't generate images directly, but I can create an SVG representation of a pink elephant for you." And it did it :)
But I still hit limits. I use Claudemind with JetBrains stuff and there is a max on input tokens (I believe); I am ‘tier 2’ but it doesn't look like I can go past this without an enterprise agreement
Claude is my go-to llm for everything. Sounds corny, but it's literally expanding the circle of what I can reasonably learn, many times over. Right now I'm attempting to read old philosophical texts (without any background in similar disciplines), and without Claude's help to explain the dense language in simpler terms, discuss its ideas, give me historical context, explain why it was written this or that way, and compare it against newer ideas, I would've given up many times.
At work I use it many times daily in development. Its concise mode is a breath of fresh air compared to any other llm I've tried. It has helped me find bugs in foreign code bases, explained the tech stack to me, and written bash scripts, saving me dozens of hours of work and many nerves. It generally lets me reach places I wouldn't otherwise due to time constraints and nerves.
The only nitpick is that the service reliability is a bit worse than others, forcing me sometimes to switch to others. This is probably a hard to answer question, but are there plans to improve that?
I'm in the middle of a particularly nasty refactor of some legacy React component code (hasn't been touched in 6 years, old class based pattern, tons of methods, why, oh, why did we do XYZ) at work and have been using Aider for the last few days and have been hitting a wall. I've been digging through Aider's source code on Github to pull out prompts and try to write my own little helper script.
So, perfect timing on this release for me! I decided to install Claude Code and it is making short work of this. I love the interface. I love the personality ("Ruminating", "Schlepping", etc).
Just an all around fantastic job!
(This makes me especially bummed that I really messed up my OA a while back for you guys. I'll try again in a few months!)
Just started playing with the command-line tool. First reaction (after using it for 5 minutes): I've been using `aider` as a daily driver, with Claude 3.5, for a while now. One of the things I appreciate about aider is that it tells you how much each query cost, and what your total cost is this session. This makes it low-key easy to keep tabs on the cost of what I'm doing. Any chance you could add that to claude-code?
I'd also love to have it in a language that can be compiled, like golang or rust, but I recognize a rewrite might be more effort than it's worth. (Although maybe less with claude code to help you?)
EDIT: OK, 10 minutes in, and it seems to have major issues doing basic patches to my Golang code; the most recent thing it did was add a line with incorrect indentation, then try three times to update it with the correct indentation, getting "String to replace not found in file" each time. Aider with Claude 3.5 does this really well -- not sure what the confounding issue is here, but it might be worth taking a look at their prompt & patch format to see how they do it.
One of the silver bullets of Claude, in the context of coding, is that it does NOT use RAG when you use it via the web interface. Sure, you burn your tokens, but the model sees everything and this lets it reply in a much better way. Is Claude Code doing the same and just doing document-level RAG, so that if a document is relevant and it fits, the whole document is put inside the context window? I really hope so! Also, this means that splitting large code bases into manageable file sizes will make more and more sense. Another Q: is the context size of Sonnet 3.7 the same as 3.5's? Btw, thank you so much for Claude Sonnet; in recent months it changed the way I work and I'm able to do a lot more now.
Right -- Claude Code doesn't use RAG currently. In our testing we found that agentic search out-performed RAG for the kinds of things people use Code for.
Since the Claude Code docs suggest installing Ripgrep, my guess is that they mean that Claude Code often runs searches to find snippets to improve in the context.
I would argue that this is still RAG. There's a common misconception (or at least I think it's a misconception) that RAG only counts if you used vector search - I like to expand the definition of RAG to include non-vector search (like Ripgrep in this case), or any other technique where you use Retrieval techniques to Augment the Generation phase.
I agree that retrieval can take many forms besides vector search, but do we really want to call it RAG if the model is directing the search using a tool call? That seems like an important distinction to me, and the name "agentic search" makes a lot more sense IMHO.
Yes, I think that's RAG. It's Retrieval Augmented Generation - you're retrieving content to augment the generation.
Who cares if you used vector search for the retrieval?
The best vector retrieval implementations are already switching to a hybrid between vector and FTS, because it turns out BM25 etc is still a better algorithm for a lot of use-cases.
"Agentic search" makes much less sense to me because the term "agentic" is so incredibly vague.
I think it depends who "you" is. In classic RAG the search mechanism is preordained, the search is done up front and the results handed to the model pre-baked. I'd interpret "agentic search" as anything where the model has potentially a collection of search tools that it can decide how to use best for a given query, so the search algorithm, the query, and the number of searches are all under its own control.
Exactly. Was the extra information pushed to the model as part of the query? It’s RAG. Did the model pull the extra information in via a tool call? Agentic search.
rag is an acronym with a pinned meaning now. just like the word drone. drone didn't really mean drone, but drone means drone now. no amount of complaining will fix it. :[
1. RAG: A simple model looks at the question, pulls up some associated data into the context, and hopes that it helps.
2. Self-RAG: The model "intentionally"/agentically triggers a lookup for some topic. This can be via traditional RAG or just string search, e.g. grep.
3. Full Context: Just jam everything into the context window. The model uses its attention mechanism to pick out the parts it needs. Best but most expensive of the three, especially with repeated queries.
Aider uses kind of a hybrid of 2 and 3: you specify files that go in the context, but Aider also uses Tree-Sitter to get a map of the entire codebase, i.e. function headers, class definitions, etc., which is provided in full. On that basis, the model can then request additional files to be added to the context.
I'm still not sure I get the difference between 1 and 2. What is "pulls up some associated data into the context" vs ""intentionally"/agentically triggers a lookup for some topic"?
1. Tends to use embeddings with a similarity search. Sometimes called "retrieval". This is faster, but similarity search doesn't always work quite as well as you might want it to.
2. Instead lets the agent decide what to bring into context by using tools on the codebase. Since the tools used are fast enough, this gives you effectively "verified answers" so long as the agent didn't screw up its inputs to the tool (which will happen, most likely).
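Here is a minimal TypeScript sketch of the distinction, with stand-in helpers (embedSearch, callModel, and the TOOL/RESULT convention are hypothetical, not any real API):

    // (1) Classic RAG: retrieval happens once, up front, outside the model's control.
    async function classicRag(
      question: string,
      embedSearch: (q: string) => Promise<string>,
      callModel: (prompt: string) => Promise<string>,
    ): Promise<string> {
      const context = await embedSearch(question); // similarity search, pre-baked
      return callModel(`Context:\n${context}\n\nQuestion: ${question}`);
    }

    // (2) Agentic search: the model decides which tool to call, with what query,
    // and whether to keep searching, in a loop.
    async function agenticSearch(
      question: string,
      tools: Record<string, (args: string) => Promise<string>>,
      callModel: (prompt: string) => Promise<string>,
    ): Promise<string> {
      let transcript = question;
      for (let step = 0; step < 5; step++) {
        const reply = await callModel(transcript);
        const call = reply.match(/^TOOL (\w+): (.*)$/m); // e.g. "TOOL grep: parseConfig"
        if (!call) return reply;                         // no tool call: final answer
        const [, name, args] = call;
        transcript += `\n${reply}\nRESULT: ${await tools[name](args)}`;
      }
      return transcript; // give up after a few rounds
    }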
Does it make sense to use vector search for code? It's more for vague text. In code, relevant parts can be found by exact name match (in most cases; the two methods aren't exclusive).
Vector search for code can be quite interesting - I've used it for things like "find me code that downloads stuff" and it's worked well. I think text search is usually better for code though.
Been a long-time casual user — i.e. happy to fix my code by asking questions and copy/pasting individual snippets via the chat interface. Decided to give the `claude` terminal tool a run and have to admit it looks like a fantastic tool.
Haven't tried to build a modern JS web app in years — it took the claude tool just a few minutes of prompting to convert and refactor an old clunky tool into a proper project structure, and using svelte and vite and tailwind (which I haven't built with before). Trying to learn how to even scaffold a modern app has felt daunting and this eliminates 99% of that friction.
One funny quirk: I asked it to build a test suite (I know zilch about JS testing frameworks, so it picked vitest for me) for the newly refactored app. I noticed that 3 of the 20 tests failed and so I asked it to run vitest for itself and fix the failing things. 2 minutes later, and now 7 tests were failing...
Which is very funny to me, but also not a big deal. Again, it's such a chore to research test libs and then set things up to their conventions. That the claude tool built a very usable scaffold that I can then edit and iterate on is such a huge benefit by itself, I don't need (nor desire) the AI to be complete turnkey solution.
Anthropic is back and cementing its place as the creator of the best coding models—bravo!
With Claude Code, the goal is clearly to take a slice of Cursor and its competitors' market share. I expected this to happen eventually.
The app layer has barely any moat, so any successful app with the potential to generate significant revenue will eventually be absorbed by foundation model companies in their quest for growth and profits.
I think an argument could be reasonably made that the app layer is the only moat. It’s more likely Anthropic eventually has to acquire Cursor to cement a position here than they out-compete it. Where, why, what brand and what product customers swipe their credit cards for matters — a lot.
(1) That's a big if. It requires building a team specialized in delivering what Cursor has already delivered, which is no small task. There are probably only a handful of engineers on the planet who have, or can be incentivized to develop, the product intuition the Cursor founders have already developed in the market. And even then: suppose I'm an aspiring engineer / PM at Anthropic. Why would I choose to spend all of my creative energy copying what somebody else is doing for the same pay I'd get working on something greenfield, or more interesting to me, or more likely to get me a promotion?
(2) It's not clear to me that users (or developers) actually behave this way in practice. Engineering is a bit of a cargo cult. Cursor got popular because it was good but it also got popular because it got popular.
In my opinion you're vastly overestimating how much of a moat Cursor has. In broad strokes, it builds an index of your repo for easier referencing and then adds some handy UI hooks so you can talk to the model; there really isn't that much more going on. Yes, the autocomplete is nice at times, but it's at best like pair programming with a new hire. Every big player in the AI space could replicate what they've done; it's only a matter of whether they consider it worth the investment given how fast the whole field is moving.
If Zed gets its agentic editing mode in, I’m moving away from Cursor again. I’m only with them because they currently have the best experience there. Their moat is zero, and I’d much rather use purely API models than a Cursor subscription.
> It requires building a team specialized in delivering what Cursor has already delivered which is no small task.
There are several AIDEs out there, and based on working with Cursor, VS Code, and Windsurf there doesn't seem to be much of a difference (although I like Windsurf best). What moat does Cursor have?
Just chiming in to say that AIDEs (Artificial Intelligence Development Environments, I suppose) is such a good term for these new tools imo.
It's one thing to retrofit LLMs into existing tools but I'm more curious how this new space will develop as time goes on. Already stuff like the Warp terminal is pretty useful in day to day use.
Who knows, maybe this time next year we'll see more people programming by voice input instead of typing. Something akin to Talon Voice supercharged by a local LLM hopefully.
And Typescript simply doesn't work for me. I have tried uninstalling extensions. It is always "Initializing". I reload windows, etc. It eventually might get there, I can't tell what's going on. At the moment, AI is not worth the trade-off of no Typescript support.
My entire company of 100+ engineers is using cursor on multiple large typescript repos with zero issues. Must be some kind of local setup issue on your end, it definitely works just fine. In fact I've seen consistently more useful / less junky results from using LLMs for code with typescript than any other language, particularly when cursor's "shadow workspace" option is enabled.
They do actually have custom models for autocomplete (which requires very low latency) and applying edits from the LLM (which turns out to require another LLM step, as they can’t reliably output perfect diffs)
I wonder if they will offer competitive request counts against Cursor. Right now, at least for me, the biggest downside to Claude is how fast I blow through the limits (Pro) and hit a wall.
At least with Cursor, I can use all "premium" 500 completions and either buy more, or be patient for throttled responses.
Reread the blog post, and I suspect Cursor will remain much more competitive on pricing! No specifics, but likely far exceeding typical Cursor costs for a typical developer. Maybe it's worth it, though? Look forward to trying.
>Claude Code consumes tokens for each interaction. Typical usage costs range from $5-10 per developer per day, but can exceed $100 per hour during intensive use.
hi! I've been using Claude Code in a very complementary way to my IDE, and one of the reasons we chose the terminal is because you can open it up inside whichever IDE you want!
Hi Boris, love working with Claude! I do have a question—is there a plan to have Claude 3.5 Sonnet (or even 3.7!) made available on ca-central-1 for Amazon Bedrock anytime soon? My company is based in Canada and we deal with customer information that is required to stay within Canada, and the most recent model from Anthropic we have available to us is Claude 3.
A minor ChatGPT feature I miss with Claude is temporary chats. I use ChatGPT for a lot of random one-off questions and don’t want them filling up my chat history with so many conversations.
Will check out Claude Code soon, but in the meantime one unrelated other feature request:
Moving existing chats into a project. I have a number of old-ish but super-useful and valuable chats (that are superficially unrelated) that I would like to bring together in a project.
I really want to try your AI models, but "You must have a valid phone number to use Anthropic's services." is a show-stopper for me.
It's the only mainstream AI service that requests this information. After a string of security lapses by many of your competitors, I have zero faith in the ability of a "fast moving" AI-focused company to keep my PII data secure.
It's a phone number. It's probably been bought / sold a few times already.
Unless you're on the level of Edward Snowden, I wouldn't worry about it. But maybe your sense of privacy is more valuable than the outcome you'd get from Claude. That's fine too.
It's my phone number... linked to my Google identity... linked to every submitted user prompt... linked to my source code.
There's also been a spate of AI companies rushing to release products and having "oops" moments where they leaked customer chats or whatever.
They're not run like a FAANG, they don't have the same security pedigree, and they generally don't have any real guarantee of privacy.
So yes, my privacy is more valuable.
Conversely: Why is my non-privacy so valuable to Anthropic? Do they plan on selling my data? Maybe not now... but when funding gets a bit tight? Do they plan on selling my information to the likes of Cambridge Analytica? Not just superficial metadata, but also an AI-summarised history of my questions?
The best thing to do would be not to ask. But they are asking.
It's an anti abuse method. A valid phone number will always have a cost for spammers/multi accounters to obtain in mass, but will have no cost for the desired user base (the assumption is that every worthwhile user already has a phone).
Captchas are trivially broken and you can get access to millions of residential IP addresses, but phone numbers (especially if you filter out VOIP providers) still have a cost.
I pay for a number from voip.ms and use sms forwarding. Its very cheap and it works on telegram as well which seemed fairly strict at detecting most voips.
Does the fact that it's so ungodly expensive and highly rate limited kind of prove the modern point that AI actually uses tons of water and electricity per prompt? People are used to streaming YouTube while they sleep, and it's hard to think of other web technology this intensive. OpenAI is hostile to this subject. Does Claude have plans to tackle this?
Is there / are you planning a way to set $ limits per API key? Far as I can tell the "Spend limits" are currently per-org only which seems problematic.
Hi! I’ve been using Claude for macOS and iOS coding for a while, and it’s mostly great, but it’s always using deprecated APIs, even if I instruct it not to. It will correct the mistake if I ask it to, but then in later iterations, it will sometimes switch back to using a deprecated API. It also produces a lot of code that just doesn’t compile, so a lot of time is spent fixing the made up or deprecated APIs.
Awesome to see a new Claude model - since 3.5 it's been my go-to for all code-related tasks.
I'd really like to use Claude Code in some of my projects vs just sharing snippets via the UI, but I'm curious how doing this from our source directory might affect our IP, including NDAs, trade secret protections, prior disclosure rules on (future) patents, open source licensing restrictions re: redistribution, etc.
Hi Boris et al, can you comment on increased conversation lengths or limits through the UI? I didn't see that mentioned in the blog post, but it is a continued major concern of $20/month Claude.ai users. Is this an issue that should be fixed now or still waiting on a larger deployment via Amazon or something? If not now, when can users expect the conversation length limitations will be increased?
It would be great if we could upgrade API rate limits. I've tried "contacting sales" a few times and never received a response.
edit: note that my team mostly hits rate limits using things like aider and goose. 80k input tokens is not enough when in a flow, and I would love to experiment with a multi-agent workflow using Claude
Now that the world's gotten used to the existence of AI, any hope on removing the guardrails on Claude? I don't need it to answer "How do I make meth", but I would like to not have to social engineer my prompts. I'd like it to just write the code I asked for and not judge me on how ethical the code might be.
Eg Claude will refuse to write code to wget a website and parse the html if you ask it to scrape your ex girlfriend's Instagram profile, for ethical and tos reasons, but if you phrase the request differently, it'll happily go off and generate code that does that exact thing.
Asking it to scrape my ex girlfriend's Instagram profile is just a stand in for other times I've hit a problem where I've had to social engineer my way past those guard rails, but does having those guard rails really provide value on a professional level?
Not having headlines like "Claude Gives Stalker Instructions" has a significant value to their business I would wager.
I'm very much in favour of removing the guardrails but I understand why they're in place. The problem is attribution. You can teach yourself how to engage in all manner of dark deeds with a library or wikipedia or a search engine and some time, but any resulting public outcry is usually diffuse or targeted at the sources rather than the service. When Claude or GPT or Stable Diffusion are used to generate something judged offensive, the outcry becomes an existential threat to the provider.
Unless Cursor had agreed to an exclusivity agreement with Anthropic, Anthropic was (and still is) at risk of Cursor moving to a different provider or using their middleman position to train/distill their own model that competes with Anthropic.
honestly, is this something that Anthropic should be worried about? you could ask the same question about all the startups that were destroyed by OpenAI.
Claude Code is a research preview -- it's more rough, lets you see model errors directly, etc. so it's not as polished as something like Cline. Personally I use all of the above. Engineers here at Anthropic also tend to use Claude Code alongside IDEs like Cursor.
Thanks for the product! Glad to hear the (so called) "safety" is being walked back on, previously Claude has been feeling a little like it is treating me as a child, excited to try it out now.
In the console, TPM limit for 3.7 is not shown (I'm tier 4). Does it mean there is no limit, or is it just pending and is "variable" until you set it to some value?
We set the Claude Code rate limits to be usable as a daily driver. We expect hitting rate limits for synchronous usage to be uncommon. Since this is a research preview, we recommend you start small as you try the product though.
Sorry, I completely missed you're from the Code team. I was actually asking about the vanilla API. Any insights into those limits? It's still missing the TPM number in the console.
Your footnote 3 seems to imply that the low number for o1 and Grok3 is without parallelism, but I don't think it's publicly known whether they use internal parallelism? So perhaps the low number already uses parallelism, while the high number uses even more parallelism?
Also, curious if you have any intuition as to why the no-parallelism number for AIME with Claude (61.3%) is quite low (e.g., relative to R1 87.3% -- assuming it is an apples to apples comparison)?
Awesome work, Claude is amazingly good at writing code that is pretty much plug and play.
Could you speak at all about potential IDE integrations? An integration into Jetbrains IDEs would be super useful - I imagine being able to highlight a bit of code and having a plugin check the code graph to see dependencies, tests etc that might be affected by a change.
Copying and pasting code constantly is starting to seem a bit primitive.
Part of our vision is that because Claude Code is just in the terminal, you can bring it into any IDE (or server) you want! Obviously that has tradeoffs of not having a full GUI of the IDE though
Thanks, I wasn't aware of the Model Context Protocol!
For anyone interested - you can extend Claude's functionality by allowing it to run commands via a local "MCP server" (e.g. make code commits, create files, retrieve third party library code etc).
Then when you're running Claude it asks for permission to run a specific tool inside your usual Claude UI.
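For the curious, here is a rough sketch of what a tiny local MCP server can look like, based on my reading of the @modelcontextprotocol/sdk TypeScript package; treat the import paths and helper signatures as assumptions and check the SDK docs for the exact current API.

    // Illustrative sketch only: exposes a single "read_file" tool over stdio.
    // The client (e.g. Claude Desktop) asks the user for permission before
    // each invocation, as described above.
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";
    import { promises as fs } from "node:fs";

    const server = new McpServer({ name: "local-files", version: "0.1.0" });

    server.tool(
      "read_file",
      { path: z.string() },
      async ({ path }) => ({
        content: [{ type: "text" as const, text: await fs.readFile(path, "utf8") }],
      })
    );

    await server.connect(new StdioServerTransport());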
I'm not affiliated with Anthropic, but it seems like doing this would commoditize Claude (the AIaaS). Hosted AI providers are doing all they can to move away from being interchangeable commodities; it's not good for Anthropic's revenue for users to be able to easily swap out the backend of Claude Code for a local Ollama backend, or a cheaper hosted DeepSeek. Open sourcing Claude Code would make this option 1 or 2 forks/PRs away.
> It's not hard to make, its a relatively simple CLI tool so there's no moat
There are similar open source CLI tools that predate Claude Code. It's reasonable to assume Anthropic chose not to contribute to those projects for reasons other than complexity, and charitably Anthropic likely has plans for differentiating features.
> Also, the minified source code is available
The redistribution license - or lack thereof - will be the stumbling block to directly reusing code authored by Anthropic without authorization.
What do I need to do to get unbanned? I have filled in the provided Google Docs form 3-4 times to no avail. I got banned almost immediately after joining. My best guess is that I got banned because I used a VPN. https://news.ycombinator.com/item?id=40808815
Is there a way to always accept certain commands across sessions? Specifically for things like reading or updating files I don't want to have to approve that each time I open a new repl.
Also, is there a way to switch models between 3.5-sonnet and 3.5-sonnet-thinking? Got the initial impression that the thinking model is using an excessive amount of tokens on first use.
When you are prompted to accept a bash command, we should be giving you the option to not ask again. If you're not seeing that for a specific bash command, would you mind running /bug or filing an issue on Github? https://github.com/anthropics/claude-code/issues
Thinking and not thinking is actually the same model! The model thinks automatically when you ask it to. If you don't explicitly ask it to think, it won't use thinking.
with Claude coder, how does history work?
I used it with my account, ran out of credit then switched to a work account but there was no chat history or other saved context of the work that had been done.
I logged back in with my account to try copy it but it was gone.
A bit off topic but I wanted to let you know that anthropic is currently in violation of EU Directive 98/6/EC:
> The selling price and the unit price must be indicated in an unambiguous, easily identifiable and clearly legible manner for all products offered by traders to consumers (i.e. the final price should include value added tax and all other taxes).
I wanted to see what the annual plan would cost as it was just displaying €170+VAT, and when I clicked the upgrade button to find out (I checked everywhere on the page) then I was automatically subscribed without any confirmation and without ever seeing the final price before the transaction was completed.
The bottle caps are a joke, but how can anyone in their right mind be against transparent pricing?
You think it's acceptable that a company say the price is €170+vat and then after the transaction is complete they inform you that the actual price was €206.50?
No, not OK. In this case, the recourse in the US is simple: contact the company, and when refused a refund, cancel the charge on your credit card with a couple of simple clicks in the app.
Hi Boris! Thank you for your work on Claude! My one pet peeve with Claude specifically, if I may: I might be working on a Svelte codebase and Claude will happily ignore that context and provide React code. I understand why, but I’d love to see much less of a deep reliance on React for front-end code generation.
When I first started using Cursor the default behavior was for Claude to make a suggestion in the chat, and if the user agreed with it, they could click apply or cut and paste the part of it they wanted to use in their larger project. Now it seems the default behavior is for Claude to start writing files to the current working directory without regard for app structure or context (e.g., for config files that are defined elsewhere, Claude likes to create another copy). Why change the default to this? I could be wrong but I would guess most devs would want to review changes to their repo first.
Cursor has two LLM interaction modes, chat and composer. The chat does what you described first and composer can create/edit/delete files directly.
Have you checked which mode you're on? It should be a tab above your chat window.
> We’ve also improved the coding experience on Claude.ai. Our GitHub integration is now available on all Claude plans—enabling developers to connect their code repositories directly to Claude
The https://claude.ai/ integration is read-only. Basically you OAuth with GitHub and now you can select a repository, then select files or directories within it to add to either a Claude Project or to an individual prompt.
Claude Code can run commands including "git" commands, so it can create a branch, commit code to that branch and push that branch to GitHub - at which point you can create a PR.
hey guys! i was wondering why you chose to build Claude code via CLI when many popular choices like cursor and windsurf fork VScode. do you envision the future of Claude code to abstract away the codebase entirely?
We wanted to bring the model to people where they are without having to commit to a specific tool or radically change their workflows. We also wanted to make a way that lets people experience the model’s coding abilities as directly as possible. This has tradeoffs: it uses a lot of tokens, and is rough (eg. it shows you tool errors and model weirdness), but it also gives you a lot of power and feels pretty awesome to use.
I'm curious why there are no results for the "Claude 3.7 Extended Thinking" on SWE-Bench and Agentic tool use.
Are you finding that extended thinking helps a lot when the whole problem can be posed in the prompt, but that it isn't a major benefit for agentic tasks?
It would be a bit surprising, but it would also mirror my experiences, and the benchmarks which show Claude 3.5 being better at agentic tasks and SWE tasks than all other models, despite not being a reasoning model.
From the release you say: "[..] in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs."
Can you tell us more about the trade-offs here?
Also, are you using synthetic data for improving the responses here, or are you purely leveraging data from usage/partner's usage?
I recently attempted to use the Google Drive integration but didn't follow through with connecting because Claude wanted access to my entire Google Drive. I understand this simplifies the user experience and reduces time to ship, but is there any way the team can add "reduce the access scope of the Google Drive integration" to your backlog? Thank you!
Also, I just caught the new Github integration. Awesome.
Small UX suggestion, but could you make submission of prompt via URL parameter work? It used to be possible via https://claude.ai/new?q={query}, but that stopped working. It works for ChatGPT, Grok, and DeepSeek. With Claude you have to go and manually click the submit button.
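For anyone who wants to wire this up themselves, the link format described above is just a query parameter; a quick sketch (whether claude.ai still honors ?q= is exactly the open question here):

    // Build a prefilled-prompt link; ?q= is the parameter mentioned above.
    const prompt = "Summarize the tradeoffs of RAG vs agentic search";
    const url = `https://claude.ai/new?q=${encodeURIComponent(prompt)}`;
    console.log(url);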
You still need to know what good code looks like to use these tools. If you go forward in your career trusting the output of LLMs without the skills to evaluate the correctness, style, functionality of that code then you will have problems.
People still write low level machine code today, despite compilers having existed for 70+ (?) years.
We'll always need full-stack humans who understand everything down to the electrons even in the age of insane automation that we're entering.
Could not agree more! I have 20+ years experience and use Cursor/Sonnet daily. It saves huge amounts of time.
But I can’t imagine this tool in the hands of someone who does not have a solid understanding of programming.
You need to understand when to push back and why. It’s like doing mini code reviews all the time. LLMs are very convincing and will happily generate garbage with the utmost authority.
+1 to this. There has never been a better time to learn to code - the learning curve is being shaved down by these new LLM-based tools, and the amount of value people with programming literacy can produce is going up by an order of magnitude.
People who know both coding and LLMs will be a whole lot more attractive to hire to build software than people who just know LLMs for many years to come.
Can you just make a blog post on this explaining your thesis in detail? It's hard for me not to see non-technical "vibe coding" [0] sidelining everyone in the industry except for the most senior of senior devs/PMs.
I will give a slightly more pessimistic answer. Someone studying CS right now probably expects to work in this profession for 30-40 years until retirement, expects it to keep paying much more than the average salary for most devs anywhere (not only elite devs or those in the US), and expects it to stay easy to find such a job or switch employers.
I think the best period for software devs will be gone in a few years. Knowing how to code and fix things will still be important, but it will be more important to also be a jack-of-many-trades who provides more value: know a little about SEO, have good design taste and be able to tweak a simple design, have good taste in how to organize code, and have better soft skills for managing or educating less tech-savvy staff.
Another option is to specialize in a currently difficult subfield - robotics, ML, CUDA, Rust - and try to be that elite dev, with the expectation that you may have to move to SV or another such tech hub.
The best general recommendation I would give right now (especially to someone who is not from the US) to someone currently studying is to use the time you have now, with not much responsibility, to make some product that can provide semi-passive income on a monthly basis ($5k-$10k) and drag yourself out of this rat race. Even if you don't succeed, or the revenue stream eventually runs out, you will learn the other skills that will matter more later if you want to be employed (SEO, code & design taste, marketing, soft skills).
Because most likely this window of opportunity will only last the next few years, in a similar way to how the best window for mobile apps was the first ~2 years after the App Store launched.
I would love to make "side revenue", but frankly I am awful at practical idea generation. I'm not a founder type I think, maybe a technical co-founder I guess.
Moving a function or class? Yes. But moving arbitrary lines of code into their own function in a new module is still a PITA, particularly when the lines of code are not consecutive.
Hi there. There are lots of phrases/patterns that Claude always uses when writing and it was very frustrating with 3.5. I can see with 3.7 those persist.
Is there any way for me to contact you and show those so you can hopefully address them?
Can you give some insight into how you chose the reply limit length? It seems to cut off many useful programs that are 80%-90% done and if the limit were just a little higher it would be a source of extraordinary benefit.
in 1055 lines. But eventually it couldn't improve on it anymore, ChatGPT couldn't modify it at my request so that inline elements would be on the same line.
If you want to run it just download it and rename it to .py, I like Anaconda as an environment, after reading the code you can install the required libraries with:
then run the browser from the Anaconda prompt by just writing "python " followed by the name of the file.
2.
I tried to continue to improve the program with Claude, so that in-line elements would be on the same line.
I performed these reproducible steps:
1. copied the code and pasted it into a Claude chat window with ctrl-v. This keeps it in the chat as paste.
2. Gave it the prompt "This complete web browser works but doesn't lay out inline elements inline, it puts them all on a new line, can you fix it so inline elements are inline?"
It spit out code until it hit section 8 out of 9 which is 70% of the way through and gave the error message "Claude hit the max length for a message and has paused its response. You can write Continue to keep the chat going". Screenshot:
So I wrote "Continue" and it stops when it is 90% of the way done.
Again it got stuck at 90% of the way done, second screenshot in the above album.
So I wrote "Continue" again.
It just gave an answer but it never finished the program. There's no app entry point in the program; it completely omitted the rest of the main class itself and the callback to call it, which would be like:
    def run(self):
        self.root.mainloop()

###############################################################################
# main
###############################################################################
if __name__ == "__main__":
    sys.setrecursionlimit(10**6)
    app = ToyBrowser()
    app.run()
so it only output a half-finished program. It explained that it was finished.
I tried telling it "you didn't finish the program, output the rest of it" but doing so just got it stuck rewriting it without finishing it. Again it said it ran into the limit, again I said Continue, and again it didn't finish it.
The program itself is only 1055 lines, it should be able to output that much.
You don't want all that code in one file anyway. Have Claude write the code as several modules. You'll put each module in its own file and then you can import functions and classes from one module to another. Claude can walk you through it.
Congrats on the launch! You said it's an important tool for you (Claude Code); how does this fit in with Co-Pilot, Cursor, etc.? Do you/your teammates only rely on Claude Code? What do you reach for for different tasks?
Claude Code is super popular internally at Anthropic. Most engineers like to use it together with an IDE like Cursor, Windsurf, VS Code, Zed, Xcode, etc. Personally I usually start most coding tasks in Code, then move to an IDE for finishing touches.
Are there plans to add a web search function over some core websites (SO, API docs)? Competitors have it, and in my experience this provides very good grounding for coding tasks (far fewer hallucinated API functions).
I tried signing up to use Claude about 6 months ago and ran into an error on the signup page. For some reason this completely locked me out from signing up since a phone number was tied to the login. I have submitted requests to get removed from this blacklist and heard nothing. The times I have tried to reach out on Twitter were never responded to. Has the customer support improved in the last 6 months?
I don't want use the product after having a bad experience. If they cannot create a sign up page without it breaking for me why would I want to use this service? Things happen and bugs can occur, but the amount of effort I have put in to resolve the issue outweighs the alternatives that I have had no issues using.
Hello! Member of the API team here. We're unable to find issues with the /v1/models endpoint—can you share more details about your request? Feel free to email me at suzanne@anthropic.com. Thank you!
If you can reproduce the issue with the other API key, I'd also love to debug this! Feel free to share the curl -vv output (excluding the key) with the Anthropic email address in my profile
Have you seen https://mycoder.ai? Seems quite similar. It was my own invention and it seems that you guys are thinking along similar lines - incredibly similar lines.
It seems very very similar. I open sourced the code to MyCoder here: https://github.com/drivecore/mycoder I'll compare them. Off hand I think both CodeBuff and Claude Coder are missing the web debugging tools I added to MyCoder.
Folks, let me tell you, AI is a big league player, it's a real winner, believe me. Nobody knows more about AI than I do, and I can tell you, it's going to be huge, just huge. The advancements we're seeing in AI are tremendous, the best, the greatest, the most fantastic. People are saying it's going to change the world, and I'm telling you, they're right, it's going to be yuge. AI is a game-changer, a real champion, and we're going to make America great again with the help of this incredible technology, mark my words.
The blog post also talks about how privacy is preserved in more concrete terms:
> These four steps are powered entirely by Claude, not by human analysts. This is part of our privacy-first design of Clio, with multiple layers to create “defense in depth.” For example, Claude is instructed to extract relevant information from conversations while omitting private details. We also have a minimum threshold for the number of unique users or conversations, so that low-frequency topics (which might be specific to individuals) aren’t inadvertently exposed. As a final check, Claude verifies that cluster summaries don’t contain any overly specific or identifying information before they’re displayed to the human user.
Ink is awesome, but it does have some rough edges. For example, want to absolutely position something on screen? Not possible. The UI also tends to flicker when you’re doing anything complicated, in a way that can be hard to debug.
This! I hit both of these issues. I tried building a solitaire game using ink. I couldn’t stack and offset the cards on top one another so they took up a lot of screen real estate. Then when I had re-renders the screen would flicker. Other than that it was a joy to use. I think I even tried flipping the solitaire board horizontal but that felt too weird.
How are you measuring productivity? And is the effect you see in A/B tests statistically significant? Both of these were challenging to do at Meta, even with many thousands of engineers — curious what worked for you.
This article, and all the articles like it, are missing most of the puzzle.
Models don’t just compete on capability. Over the last year we’ve seen models and vendors differentiate along a number of lines in addition to capability:
- Safety
- UX
- Multi-modality
- Reliability
- Embeddability
And much more. Customers care about capability, but that’s like saying car owners care about horsepower — it’s a part of the choice but not the only piece.
One somewhat obsessive customer here: I pay for and use Claude, ChatGPT, Gemini, Perplexity, and one or two others.
The UX differences among the models are indeed becoming clearer and more important. Claude’s Artifacts and Projects are really handy as is ChatGPT’s Advanced Voice mode. Perplexity is great when I need a summary of recent events. Google isn’t charging for it yet, but NotebookLM is very useful in its own way as well.
When I test the underlying models directly, it’s hard for me to be sure which is better for my purposes. But those add-on features make a clear differentiation between the providers, and I can easily see consumers choosing one or another based on them.
I haven’t been following recent developments in the companies’ APIs, but I imagine that they are trying to differentiate themselves there as well.
To me, the vast majority of "consumers" as in B2C only care about price, specifically free. Pro and enterprise customers may be more focused on the capabilities you listed, but the B2C crowd is vastly in the free tier only space when it comes to GenAI.
The best solution I have found for the problem of rewarding prevention, is having leaders repeatedly tell their people that prevention is important. And when performance review season comes around, folks that do preventative work can ask for quotes from people more senior that were close to the work to speak to its impact.
YMMV, and this may not work in every org. It did work reasonably well in a number of technical orgs I was in.
Once you phrase it that way, I start to think it might be literally impossible. How did Aslan put it? "We're never told what would have happened." It's basically the same problem as predicting the future, just starting from a time in the past, with even less grounding in reality. Impossible, right?
Measurement is important, but it isn't everything. If it's your only hammer, you're going to have... exactly the bad time our society is currently having, I guess.
This seems like a place where viewing things as a Bayesian instead of a frequentist could help. We only have one outcome that actually occurred, but that doesn't necessarily mean we can't reason about likelihoods of alternative scenarios. I'm not saying I think it's worth trying to be as granular as "reduced risk of an outage by 10% in Q3" or something like that, but it seems a bit too extreme to assume that we don't know _anything_ about what might have happened.
If we didn't have any intuition at all about potential risks, how could we be taking actions that we think reduce risk in the first place? We (hopefully!) don't try to numerically quantify how productive an engineer is in raw numbers like "number of lines of code modified" or "number of commits merged", but that doesn't mean we treat it as impossible for experienced engineers to judge the performance of their subordinates. I honestly don't see this as that much harder to measure than the type of engineering work that does already get rewarded; the reason it doesn't get measured as much is because it isn't valued as much, not the reverse.
It’s important to supplement metrics with estimates, stories about the real world impact of the work, quotes from others close to the work. The less measurable, the more you need to lean on those other tools. Assuming 75% of reactive work can be measured and 10% of preventative work can with any degree of certainty, you’ll reach for those other tools more often for the latter.
Yeah, obviously you have to try to predict the effects of your disaster prevention measures, and if you can use quantitative methods so much the better, but that's not "measurement". Not even close, and especially not in the sense that the people who want to count lines of code yearn for.
And yes, those people are still out there. Some of them have learned their lesson about LoC in particular, but they haven't lost their craving for simple (often simplistic) metrics.
The difference with "regular" engineering is that working features are in fact quite tangible, even if their exact costs aren't. You put a ticket on the board for a button, or a bridge, and eventually you can push the button or drive on the bridge. Not so for prevented disasters. By the time a disaster becomes tangible, it's usually too late.
Yep, I don't pretend to believe that "software engineering" is actually "engineering" in the traditional sense. I still think that security concerns in software engineering lie somewhere on the spectrum between "real engineering" and "impossible to meaningfully reason about".
As for the people who want to find a metric as simple as lines of code, I can't imagine we ever find something that will satisfy them, so I'm not particularly dissuaded by the idea of people looking for that not being happy with the measures that I think would be effective. To me, this is a classic case of not letting the perfect be the enemy of the good, and I think that it would be unfortunate for us as an industry if we only considered ways of measuring risk prevention that are 100% quantitative.
> As for the people who want to find a metric as simple as lines of code, I can't imagine we ever find something that will satisfy them
The problem with these people is that they're "satisfied" with the illusion of certainty, until things blow up. We agree on the fundamentals re measurement, but you have to watch out for that effect. Every metric is a chance for someone to implicitly decide that it's the ultimate truth of reality.