Hacker News

The RAG CLI from LlamaIndex lets you do it 100% locally when used with Ollama or llama.cpp instead of OpenAI.

https://docs.llamaindex.ai/en/stable/getting_started/starter...




and at some point (https://github.com/ggerganov/llama.cpp/issues/7444) you will be able to use Phi-3-vision https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

but for now you will have to use Python.

You can try it here https://ai.azure.com/explore/models/Phi-3-vision-128k-instru... to get an idea of its OCR + QA abilities


Does the llamaindex PDF indexer correctly deal with multi-column PDFs? Most I've seen don't, and you get very odd results because of this.
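For intuition on why most indexers get this wrong: a naive extractor sorts text blocks top-to-bottom across the whole page, which interleaves the two columns. A toy sketch in plain Python (the coordinates and the `column_gap` heuristic are made up for illustration, not from any particular library):

```python
# Blocks as (x0, y0, text), the way a PDF extractor might report them.
blocks = [
    (50, 100, "Left col, line 1"),
    (300, 100, "Right col, line 1"),
    (50, 120, "Left col, line 2"),
    (300, 120, "Right col, line 2"),
]

# Naive: sort purely by vertical position -> columns interleave.
naive = [t for _, _, t in sorted(blocks, key=lambda b: (b[1], b[0]))]

# Column-aware: bucket blocks by horizontal position first,
# then read each column top to bottom.
def by_columns(blocks, column_gap=150):
    columns = {}
    for x, y, text in blocks:
        columns.setdefault(x // column_gap, []).append((y, text))
    return [t for _, col in sorted(columns.items())
            for _, t in sorted(col)]

print(naive)              # left/right lines interleaved
print(by_columns(blocks)) # each column read in full
```

Real layouts (varying column widths, figures, footnotes) are much messier than this, which is why so many extractors get it wrong.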


I've made quite good conversions from PDF to Markdown with https://github.com/VikParuchuri/marker . It's slow but worth a shot. Markdown should be easily parseable by a RAG pipeline.

I'm trying to get a similar system set up on my computer.
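Markdown's heading structure is what makes it RAG-friendly: you can split on headings to get self-contained chunks. A minimal standard-library sketch (the splitting heuristic is my own, not part of Marker):

```python
import re

def chunk_markdown(md: str) -> list[str]:
    """Split a Markdown document into chunks, one per heading section."""
    chunks, current = [], []
    for line in md.splitlines():
        # Start a new chunk whenever an ATX heading begins.
        if re.match(r"^#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

doc = "# Intro\nSome text.\n\n## Details\nMore text.\n"
for chunk in chunk_markdown(doc):
    print(repr(chunk))
# -> '# Intro\nSome text.'
# -> '## Details\nMore text.'
```

Each chunk then goes into the vector store as one document, with the heading kept as context.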


This looks worth exploring, so thanks. The author has done a bunch of work beyond what PyMuPDF does on multicolumn layouts.


Locally you can choose pypdf or MuPDF, which are good but not perfect. If you can send your data online, LlamaParse is quite good.


Pulling the text out of the PDFs correctly is its own independent problem.



https://milvus.io/docs/integrate_with_llamaindex.md

Pretty easy to run locally and lightweight with Milvus Lite and LlamaIndex.


LlamaIndex has a horrible API, very poor docs, and is constantly changing. I do not recommend it.


Any alternative?


Vanilla python
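To be fair, the core retrieval step really is small enough to write without a framework. A toy retriever in pure standard-library Python, where crude bag-of-words counts stand in for real embeddings:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Crude bag-of-words 'embedding'; real systems use a vector model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "ollama runs large language models locally",
    "milvus is a vector database",
    "marker converts pdf files to markdown",
]
print(retrieve("convert a pdf to markdown", docs))
# -> ['marker converts pdf files to markdown']
```

Swap `embed` for an actual embedding model and feed the top-k chunks to the LLM as context, and you have the skeleton of a RAG pipeline.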


So your solution to “I don’t like flying [specific airline]” would be “how about a big pile of aluminum and some jet fuel”?


LOL `papichulo`? What a character!


Haha, the first word that came to my mind.



