Hi all, at work I get a lot of questions about the state of the art in open source language models, and how to build chatbots on top of your own data.
I made a 100% open source knowledge-grounded chatbot that lets you ask questions and chat with the Transformers docs. Powered by Flan-UL2 (which I've anecdotally found to be the most performant commercially licensed open source instruction-tuned LLM), LangChain, Instructor Embeddings (state of the art in vector embeddings), and FAISS.
You can clone the space and play around with your own data, clone the repo locally, and take every line of code for your own projects.
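If you're curious how the pieces fit together, here's a minimal sketch of the knowledge-grounded flow. The real space uses Instructor embeddings, FAISS, and Flan-UL2; below, a toy bag-of-words embedding and brute-force cosine search stand in for all three (every name here is illustrative, not the actual repo code), so the retrieve-then-prompt logic is visible end to end:

```python
# Toy retrieval-augmented QA: embed doc chunks, find the nearest chunk
# to the query, and build a grounded prompt for the LLM.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stands in for Instructor)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Brute-force nearest-neighbor search (stands in for FAISS)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Ground the LLM by restricting it to the retrieved context."""
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Pipelines wrap a tokenizer and a model for easy inference.",
    "Trainer handles the training loop, logging, and checkpoints.",
]
prompt = build_prompt("How do I run inference with a pipeline?", docs)
```

In the actual space, `embed` is an Instructor model, `retrieve` is a FAISS index lookup, and the prompt is sent to Flan-UL2 — but the grounding pattern is exactly this.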
Hugging Face internally has a lot of very strong believers (including the exec team) in an open source + decentralized approach to AI as the only safe and productive way forward. HF is arguably the only place to find resources for open source DL that are both deep and broad, and transformers/diffusers are the libraries du jour of open DL.
Compute is really important for AI, and having a cloud provider align itself with an organization that is genuinely trying to be "Open" AI is, I think, a positive step forward.
Not disagreeing with your first point, but as I see it - this only enables HF to continue to spend money on making good open source tooling + doing research on open alternatives to closed source LLMs (something sorely needed), while opening them up to more enterprise customers who primarily use AWS.
FWIW I am an ML engineer there (maybe should have disclosed earlier) and I feel pretty optimistic about the opportunities this will enable for the open source community. Maybe with the visibility into the closed door discussions I have a more positive attitude, or maybe I'm being naive.
>Not disagreeing with your first point, but as I see it - this only enables HF to continue to spend money on making good open source tooling + doing research on open alternatives to closed source LLMs (something sorely needed), while opening them up to more enterprise customers who primarily use AWS.
This is the response/justification every time. The issue is that it never goes that way.
>Time will tell!
It always does. What it's shown me is that one step onto the slippery slope is enough to abandon hope for the project. Every good and useful open source project I've seen go this way inevitably turns its back on its customers and outlives its usefulness.
My name is Eno - today I'm launching Pet Portrait AI. We generate 40 custom pet portraits using deep learning (Stable Diffusion + Dreambooth) in a variety of styles. The pictures come in standard (1024x1024) and high resolution (2048x2048). The photos are great for social media, posters, custom gifts, etc.
In the backend, when you upload your photos we fine-tune a custom model based on Stable Diffusion (currently the RunwayML 1.5 weights) using the Dreambooth technique. We then generate over 100 candidate images, which we filter down to 40 quality images. We do this filtering by hand for now to ensure the quality of each order, but in the future we'd like to build a custom classifier that captures our "eye for quality" and automatically selects the best generations.
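The planned automatic filter boils down to a score-and-keep-top-k step. Here's a minimal sketch of that selection stage — `score_image` is a hypothetical placeholder for the future quality classifier (nothing here is the actual service code):

```python
# Sketch of the "eye for quality" filter: score each candidate generation
# with some quality model, then keep the top k for the customer.
from typing import Callable

def filter_generations(images: list[str],
                       score_image: Callable[[str], float],
                       keep: int = 40) -> list[str]:
    """Rank candidate generations by predicted quality, keep the best `keep`."""
    ranked = sorted(images, key=score_image, reverse=True)
    return ranked[:keep]

# Usage with a dummy scorer standing in for the trained classifier:
scores = {"img_a": 0.9, "img_b": 0.2, "img_c": 0.7}
best = filter_generations(list(scores), scores.get, keep=2)
```

Swapping the dummy scorer for a small classifier trained on our hand-picked accepts/rejects is the whole idea — the pipeline around it stays the same.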
This was a really fun service to build out, all feedback welcome!
I am a research affiliate with the Galileo Project, and I just want to suggest to anyone who is skeptical about our goals to visit the website (https://projects.iq.harvard.edu/galileo/home), in particular the ground rules and FAQ sections, to see how we are attempting to establish a methodology for rigorously addressing the question of ETC technology within our solar system. This is a question that can be approached from many directions, and because there is little public data available we have no priors suggesting that any one direction is "more likely" than the others. Thus, to be as rigorous as possible, we are assessing as many possibilities as we can within budgetary constraints and standard scientific practices.
As for the notion of UFO/UAP flying around: for over 70 years in the United States there have been reports of unidentified aerial phenomena, of varying degrees of quality and provenance. In the 1940s there was general public acceptance that UFOs represented physical objects, but confidence and reporting fell off quickly after that. I will not get into the nuances of the public discourse on UFOs in America, but it is safe to say that it is one of the more interesting histories of science. In the last 5 years there has been an absolute sea change in government and academic interest in this topic, mainly fueled by recent admissions by the Department of Defense of the reality of UAP confirmed by multiple sensor systems. Within the project, we do not have definitive beliefs about the nature of UAP and instead simply seek to corroborate the data.
The team is a wonderful array of multi-disciplinary scientists from all walks of life, with credentials akin to those of any major scientific endeavor. I urge you to investigate why so many people are interested in this question, and to dispel any preconceived notions of what is "possible" within the context of science. Truth is objective, and so is data. Only time will tell whether this whole thing was a misdirection or a dead end, but we should appreciate that it is still possible to ask hard questions about the world we live in today and to receive funding to answer them.
This is interesting and if the experimental evidence confirms this hypothesis, it bodes well for our future. A universe where we can interact with spacetime via engineering is one that allows for a lot of creative freedom. They also have another interesting article claiming that the imaginary structure of QM is the result of stochastic optimization on spacetimes: https://www.nature.com/articles/s41598-019-56357-3
Maybe the UAPs really are just secret warp drive tech we made 20 or 30 years ago.
> This is just a teaser. We will be able to generate images, sound, anything at will, with natural language. The holodeck is about to become real in our lifetimes.
Does anyone have any similar resources for other forms of media generated via natural language inputs?