Show HN: Axilla – Open-source TypeScript framework for LLM apps (github.com/axilla-io)
161 points by nichochar on Aug 7, 2023 | 40 comments
Hi HN, we are Nick and Ben, creators of Axilla - an open-source TypeScript framework for developing LLM applications. It's in the early stages, but you can use it today: we've already published two modules and have more coming soon.

Ben and I met while working at Cruise on the ML platform for self-driving cars. We spent many years there and learned the hard way that shipping AI is not quite the same as shipping regular code. There are many parts of the ML lifecycle, e.g., mining, processing, and labeling data, and training, evaluating, and deploying models. Although none of these steps is rocket science, most of the inefficiencies tend to come from integrating them. At Cruise, we built an integrated framework that sped up shipping models to the car by 80%.

With the explosion of generative AI, we are seeing software teams building applications and features with the same inefficiencies we experienced at Cruise.

This got us excited about building an opinionated, end-to-end platform. We started building in Python but quickly noticed that most of the teams we talked to weren't using Python, but instead building in TypeScript. This is because most teams are not training their own models, but rather using foundation models served by third parties over HTTP, like OpenAI and Anthropic, or even OSS ones from Hugging Face.

Because of this, we've decided to build Axilla as a TypeScript-first library.

Our goal is to build a modular framework that can be adopted incrementally yet benefits from full integration. For example, the production responses coming from the LLM should be able to be sent — with all necessary metadata — to the eval module or the labeling tooling.

So far, we've shipped two modules that are available to use today on npm:

* *axgen*: focused on RAG-type workflows. Useful if you want to ingest data, get the embeddings, store them in a vector store, and then do similarity-search retrieval. It's how you give LLMs memory or more context about private data sources (see the sketch after this list).

* *axeval*: a lightweight evaluation library that feels like Jest (so, like unit tests). In our experience, evaluation should be really easy to set up, to encourage continuous quality monitoring and to slowly build ground-truth datasets of edge cases that can be used for regression testing and fine-tuning.
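To make that concrete, here is a minimal sketch of the kind of RAG loop axgen automates, written against the raw OpenAI embeddings endpoint with a naive in-memory store. This is illustrative only (the helper names are ours), not axgen's actual API:

```ts
// Hedged sketch: embed documents via OpenAI's embeddings endpoint,
// keep vectors in memory, and retrieve the best match by cosine similarity.
type Embedded = { text: string; vector: number[] };

async function embed(texts: string[]): Promise<number[][]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "text-embedding-ada-002", input: texts }),
  });
  const json = await res.json();
  return json.data.map((d: { embedding: number[] }) => d.embedding);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function retrieveMostSimilar(query: string, docs: string[]): Promise<string> {
  // Ingest: embed every document and keep it alongside its vector.
  const store: Embedded[] = (await embed(docs)).map((vector, i) => ({ text: docs[i], vector }));
  const [q] = await embed([query]);
  // Highest cosine similarity wins; this is the "retrieval" in RAG.
  return store.sort((a, b) => cosine(q, b.vector) - cosine(q, a.vector))[0].text;
}
```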

We are working on a serving module and a data processing one next and would love to hear what functionality you need us to prioritize!

We built an open-source demo UI to help you explore the framework: https://github.com/axilla-io/demo-ui

And here's a video of Nicholas walking through the UI that gives an idea of what axgen can do: https://www.loom.com/share/458f9b6679b740f0a5c78a33fffee3dc

We'd love to hear your feedback on the framework. You can let us know here, create an issue on the GitHub repo, or send me an email at nicholas@axilla.io

And of course, contributions welcome!




Hey Nick and Ben, congrats on the launch! I really like that you're going the TS way by default. I personally think there will be more AI Engineers (devs building LLM apps/agents) working in TS than in Python.

I wanted to ask if you accept PRs for integrations?

I'm a co-founder of E2B [0]. We give private sandboxed cloud envs to any agent. We're building two things:

- [1] Agent Protocol - it's an open protocol that defines how to communicate with an agent. The current goal is to make benchmarking agents simple (it's used for example by folks at AutoGPT and other popular agents)

- [2] SDK that gives your agent a cloud environment (currently in early access)

Would love to figure out how to integrate these two into Axilla if it makes sense to you. What would be the best way to connect?

[0] https://e2b.dev/

[1] https://github.com/e2b-dev/agent-protocol

[2] https://github.com/e2b-dev/rest-api (we built our ChatGPT plugin with it, for example: https://github.com/e2b-dev/chatgpt-plugin)


We're very open to contributions; I am interested in what the integration would look like.

Do you want to email me at nicholas@axilla.io? We can get into the details.


Thanks! Just sent you an email (vasek@e2b.dev)


I'm really excited about E2B. Axilla looks great too! :)


Thank you Will :)


We use GPT-4 pretty heavily in a TypeScript project, but have noticed lag from the TS versions of popular libraries (OpenAI's npm lib, LangChain TS, etc.).

This framework is exciting to see. Even though Python is the "language of AI", most foundation models just sit behind an HTTP endpoint, making the web (and thus JS/TS) a perfect fit, as you've called out.

It'd be neat to see a caching layer (maybe with an API similar to evals?) that can be a drop-in for production workflows where the responses are somewhat deterministic.


Glad to hear this; indeed, we think there's opportunity for some more cutting-edge tooling in the TS ecosystem.

We absolutely want to add a caching layer. Actually, we think middleware is where a lot of the value of the framework will come from: it enables a whole bunch of features, e.g., sending errors to datasets for labeling, caching, user throttling, analytics, A/B tests, ...
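To sketch what we mean by middleware (illustrative shape only, not Axilla's actual API), a completion function can be wrapped by composable layers:

```ts
// Hedged sketch of composable LLM middleware; all names here are
// illustrative, not Axilla's real interface.
type Complete = (prompt: string) => Promise<string>;
type Middleware = (next: Complete) => Complete;

// Caching: reuse responses for identical prompts.
const withCache = (cache = new Map<string, string>()): Middleware =>
  (next) => async (prompt) => {
    const hit = cache.get(prompt);
    if (hit !== undefined) return hit;
    const out = await next(prompt);
    cache.set(prompt, out);
    return out;
  };

// Logging: record prompt/response pairs, e.g. to feed eval datasets later.
const withLogging = (log: (p: string, r: string) => void): Middleware =>
  (next) => async (prompt) => {
    const out = await next(prompt);
    log(prompt, out);
    return out;
  };

// Compose layers around a base completion function.
const compose = (base: Complete, ...mws: Middleware[]): Complete =>
  mws.reduceRight((acc, mw) => mw(acc), base);

// Usage: const complete = compose(callModel, withCache(), withLogging(console.log));
```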

We're likely going to build the serving module next which will cover this.


Very cool. To be completely candid, we just hit OpenAI directly, no 3rd party libs involved at the moment (just fetch).

We're open to trying more TS-focused libraries, but definitely more hesitant after our initial experiences with other libs. The less magic the better (no hidden prompts, etc.).
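For readers following along, the "just fetch" approach looks roughly like this (the gpt-4 model choice and the error handling are our assumptions, not the commenter's exact code):

```ts
// Minimal direct call to OpenAI's chat completions endpoint, no wrapper libs.
async function chat(prompt: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`OpenAI error: ${res.status}`);
  const json = await res.json();
  return json.choices[0].message.content;
}
```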


Amazing! I am working on projects where we use LLMs and TypeScript, and besides langchain-js, which can only be described as bloatware, I can never find anything and find myself reinventing the wheel most of the time.


Why do you think langchainjs is bloatware?


I've come to the conclusion that anything that "abstracts" the OpenAI complete/chat-complete API call is just bad practice, and to stay away from the entire framework, with the exception of Microsoft Guidance. Just because you can, doesn't mean you should. And if you do abstract the completion API, then it must either reduce friction or increase capabilities over just calling OpenAI with HTTP fetch/axios - which Microsoft Guidance does.


Yes, we largely agree with you on that. The APIs are high-level enough that wrapping really doesn't add much value in many circumstances.

We chose to do this for our first module to take a stab at integrating RAG pipelines in a coherent manner, but we don't plan on following this pattern in all modules within our framework. There is possibly one exception here, which is that an interface that allows composable middleware for things like logging, error handling, or redirecting of requests may justify wrapping in some places.

The next steps for us involve lower-level functionality. One need we see again and again is more robust data extraction and processing. Most people we talk to who use other community projects (e.g., langchain or llama) find that data loading and chunking are among the most valuable parts of those libraries. We agree, but would like more robust functionality for these tasks, so this is one thing we're working towards next.
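As a rough illustration of the chunking step mentioned above, here is a naive character-based sliding window (a real implementation would be token-aware and respect document structure):

```ts
// Naive sliding-window chunker for RAG ingestion; sizes are in characters.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  // Step forward by (size - overlap) so adjacent chunks share context.
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}
```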

Beyond that, we're working on infrastructure. Easy model serving from Node (for OSS or proprietary models), monitoring, and pipelines for fine-tuning based on production inference results.


Yea, learning how to use the core API directly should be the focus for any engineer. Lots of frameworks built on top of LLMs are being made very quickly, each with their own philosophy. It's a good time to stay with the fundamentals as much as possible until the dust settles. Learning LangChain will take you less than a day if you have the fundamentals, so don't worry about not staying up to date.

Now's the time to learn how LLMs work from the ground up, not to be a framework chaser (watch Karpathy's GPT-from-scratch video and read through Hugging Face's LLM documentation, from RLHF to PEFT fine-tuning).


> with the exception of microsoft guidance.

why?


Hi, I checked out the demo and it looks very promising. As someone who is not very familiar with AI development, I feel a bit puzzled looking at the code examples. If I use the lib and it sends textual prompts based on some templates, can I be certain that the AI outputs will be well structured and contain the right information? Would it be possible to build an AI model with a lower-level, programmable interface...?


My two cents, which you are free to ignore (and I almost implore you to ignore): I'm sure this is useful, but as someone who works at an AV company and is building with these new generative AI tools... your pseudo-YC story intro kind of puts me off even wanting to look at the library, because it sets you up as grifters.

The only overlap between what you were doing at Cruise and the problems people building off a REST API wrapper for an LLM run into are things that all software being pushed into a production environment runs into - high-level things like "let's not introduce a regression".

I think if you're talking to investors who don't know better, go for it. But if you're posting for technical folks, some of them will be completely put off the moment you try to imply working in MLOps at an AV company makes you any more suited to implement RAG than any suitably experienced engineer who's messed around with embeddings for a month.


Thanks for the feedback!

The lesson that we learned at Cruise is that the tough thing when shipping AI software is closing the data loop and integrating all of the steps of the ML lifecycle together, so if you only look at the RAG workflow, I actually agree with you.

The vision for Axilla is that all of the modules interoperate with each other naturally. This means that your production data gets logged such that the datasets can be sent for data processing, labeling, or added to regression test suites. This way, production and development workflows are tied together.

In terms of RAG: how do you test your RAG workflow in an automated way? A lot of people are building these workflows today, but from our customer conversations, nearly none of them are testing them or monitoring their performance automatically in production, because most evaluation frameworks don't integrate naturally with document retrieval.
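As a hedged sketch of what automated RAG testing could look like, assuming Jest globals and hypothetical retrieve/answer helpers (not necessarily axeval's real API):

```ts
// Hypothetical jest-style regression suite for a RAG pipeline.
declare function retrieve(query: string): Promise<string[]>;             // vector-store lookup (assumed)
declare function answer(query: string, docs: string[]): Promise<string>; // LLM call with context (assumed)

const cases = [
  { query: "What is our refund window?", mustContain: "30 days" },
  { query: "Who founded the company?", mustContain: "Nick" },
];

describe("RAG regression suite", () => {
  for (const c of cases) {
    it(`answers: ${c.query}`, async () => {
      const docs = await retrieve(c.query);
      const response = await answer(c.query, docs);
      // Cheap string assertion; semantic similarity or LLM grading could slot in here.
      expect(response).toContain(c.mustContain);
    });
  }
});
```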

We have a way to go before the framework delivers on its full potential, but we still feel that it's in a useful enough shape for people to use it and contribute today, which is why we open-sourced it while we keep building it.


I just don't see many people shipping AI software right now in the way AI applies to AVs.

But I'm also giving you my two cents as someone who's not only building in the space, but sitting next to * a lot of the people who will use something like this, and there's some real fatigue building around the flood of tooling for "LLMOps".

* figuratively and literally: checking my past events after the sibling comment mentioned your YC connection, even you and I have been to at least one mutual AI event

At the end of the day I get that as a startup you need to weave stories sometimes: If this was a Launch HN I wouldn't have bothered with my comment and that's kind of what my "ignore this" intro is getting at.

But we went from chatbots to selling shovels in a gold rush as the default AI play in the last couple of months. Most builders will take any excuse to assume any given tool is just another rushed shovel. So you don't want to invite the mental friction of "ML vs AI": at most, I'd mention you're former coworkers from Cruise and let the people amenable to that connection make it themselves. For the rest of us, even knowing two former coworkers are working on something is enough to build some confidence in its staying power.


They are a YC company.


I'm gonna be that guy who will probably show up sooner or later anyway, but... I can't imagine performance can compete with other languages? What were your findings or experience with that?

Still, I'm a huge fan of TypeScript and will give it a try anyway :)


Hey Chris, can you further qualify "performance"?

Before I share some thoughts on this, let me just say that our primary motivators for Axilla have much more to do with bringing better AI tooling to an otherwise flourishing ecosystem rather than shaving milliseconds off an arbitrary task or request. Given that, I'm not sure how fruitful a performance discussion will be.

If by performance you meant maturity of third party packages for AI-related functionality, then yes JS/TS is lacking. This is what is motivating us :). We want better tooling for AI applications in TS.

If you're referring to performance for CPU-bound tasks, then yes, JS would not be as good as lower-level languages like Rust or Go. If you're referring to JS compared to Python, then I don't know how true that is. Python doesn't have a great concurrency story either (at least not today). JS may be single-threaded for the most part, but with web workers and WASM (+ WebGPU!), we now have tools at our disposal for dramatically speeding up CPU-bound tasks while not blocking the main thread. Assuming we get the interfaces right, we can swap out a subset of the implementation with a WASM-based implementation later if justified.
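For example, a CPU-bound scoring loop can be moved off the main thread with Node's built-in worker_threads (a rough sketch; the dot-product scoring is just a stand-in workload):

```ts
// Offload CPU-bound vector scoring to a worker so the event loop stays free.
import { Worker, isMainThread, parentPort, workerData } from "node:worker_threads";

if (isMainThread) {
  const worker = new Worker(new URL(import.meta.url), {
    workerData: { query: [0.1, 0.9], vectors: [[0.2, 0.8], [0.9, 0.1]] },
  });
  worker.on("message", (scores: number[]) => console.log("scores:", scores));
} else {
  const { query, vectors } = workerData as { query: number[]; vectors: number[][] };
  // Dot product of the query against each stored vector.
  const scores = vectors.map((v) => v.reduce((dot, x, i) => dot + x * query[i], 0));
  parentPort!.postMessage(scores);
}
```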

There is nothing about Python the language that makes it especially well-suited for AI/ML-related functionality. It is just the language whose ecosystem has the most maturity when it comes to that functionality. We hope to chip away at that over time.


I'm no expert in actual ML implementations, but I was under the impression that Python ML (e.g., TensorFlow) is actually C/C++-based under the hood. I just meant I can't imagine the V8 engine can be as performant for all that matrix math in those models.

But now that I'm looking at the actual code samples, I'm not even sure JavaScript is doing any of the actual heavy lifting (I see you use OpenAI's embeddings), so this tool is more of the glue connecting all the parts? Again, I'm out of my wheelhouse here.


Ahh yes, right now we're operating at a higher-level of the stack.

That said, we are investigating serving from Node and possibly on edge devices with WebGPU. For serving from Node, it would be similar to what you describe with TensorFlow compiling down to C/C++. There are various backends for frameworks like TensorFlow, PyTorch, etc., and those backends are often C/C++. We would bridge this lower-level code to Node through, e.g., Node-API (https://nodejs.org/api/n-api.html) or use frameworks like ONNX / ONNX Runtime.
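As a small sketch of what Node-side serving could look like with the onnxruntime-node package (the model path and tensor shapes are placeholders):

```ts
// Hedged sketch: run an ONNX model from Node via ONNX Runtime's bindings.
import * as ort from "onnxruntime-node";

async function main() {
  const session = await ort.InferenceSession.create("./model.onnx"); // placeholder path
  // A 1x4 float32 input; real shapes depend on the model's signature.
  const input = new ort.Tensor("float32", Float32Array.from([1, 2, 3, 4]), [1, 4]);
  const results = await session.run({ [session.inputNames[0]]: input });
  console.log(results[session.outputNames[0]].data);
}

main().catch(console.error);
```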


What a horrible name. Why? Just why? You put so much love into something, and then give it a name that is so bad?


Yeah, I'd reconsider this too.

If this sounds too opinionated so far, I'll give you a fact: as a Spanish speaker, this is also a little too close to the word for armpit, and I'd not name my project "Armpit".


It is the Latin for armpit, and related to the (obsolete) English "oxter" for the same.


I guess that you are not a fan of "put.io" either.


Congrats on building the library! I’ve recently been playing around with the js implementation of langchain, and I’m excited for there to be more high quality typescript support here.


Feedback: "axila" in Spanish means armpit.


Hi Nick and Ben, I looked at the demo. Great job! I'll be trying it soon!


"Axilla" in Portuguese means "armpit" (it's spelt "axila"). I like the name more because of this. Congrats on the launch! As a developer who's been working a lot with TypeScript and LLMs, I'll definitely take a look.


It’s also the name of a great Phish song!


Two great Phish songs!


"Axillary" in English also means "of the armpit".


Same in Spanish.


I think that's also the more "technical" term for armpit in English, so probably not an accident: https://en.wikipedia.org/wiki/Axilla



In Spanish too. The usual term is "sobaco", which comes from sub-brachium, "under the arm".


Didn't know that! In Portuguese, we use the word "suvaco".


Same in English. (At least, in medicine.)



