
hey folks, Priyansh here. I just put together a list of all the tools I could find that are OpenAI-endpoint compatible, so you can take your existing OpenAI-based project and add RAG functionality.

if I missed any tools, feel free to add them below


Hey HN,

We’ve just rolled out an OpenAI-Compatible Endpoint at CustomGPT.ai that should make it super easy to try Retrieval-Augmented Generation (RAG) in your existing OpenAI-based code.

Now, hundreds of tools in the OpenAI ecosystem can add RAG capabilities with minimal changes.

Docs here - https://docs.customgpt.ai/reference/customgptai-openai-sdk-c...

All you do is: 1. Swap your api_key for your CustomGPT one. 2. Change the base_url to our endpoint. And that's it.

You can keep using your OpenAI Python SDK code. Under the hood, we handle context retrieval from your project knowledge sources before generating a final answer.

We support the chat.completions endpoint with the same request/response structure. If you call an unsupported endpoint, we return a 404 or 501.

This opens up the entire ecosystem of OpenAI-compatible tools, frameworks, and services for your RAG workflows. Everything else—conversation format, message handling, etc.—remains the same.

Check out a quick Python snippet:

    from openai import OpenAI

    client = OpenAI(
        api_key="CUSTOMGPT_API_KEY",
        base_url="https://app.customgpt.ai/api/v1/projects/{project_id}/",
    )

    response = client.chat.completions.create(
        model="gpt-4",  # we'll ignore the model param and use your project's default
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who are you?"},
        ],
    )

    print(response.choices[0].message.content)

We ignore certain OpenAI parameters like model and temperature; otherwise, your code runs pretty much the same.
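If you want to handle the unsupported-endpoint case explicitly in client code, here's a rough sketch. The embeddings call is just an example of an endpoint that may not be supported, and the exact exception mapping is an assumption; catching openai.APIStatusError covers both 404 and 501 responses.

    from openai import OpenAI, APIStatusError

    client = OpenAI(
        api_key="CUSTOMGPT_API_KEY",
        base_url="https://app.customgpt.ai/api/v1/projects/{project_id}/",
    )

    try:
        # embeddings is used here only as an example of a possibly unsupported
        # endpoint; chat.completions is the documented one above
        client.embeddings.create(model="text-embedding-3-small", input="hello")
    except APIStatusError as err:  # raised for non-2xx responses, incl. 404/501
        print(f"Unsupported endpoint ({err.status_code}), falling back to chat.completions")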

We built this because we kept hearing people say, “I’d like to try CustomGPT.ai for better context retrieval, but I already have so much code in the OpenAI ecosystem.” Hopefully this bridges the gap. Feedback and PRs are welcome. Let us know how it goes!

Hope this helps folks who’ve been on the fence about trying RAG but don’t want to break everything they already have running!

If you have any questions regarding the implementation, please ask below.


It gives me immense joy to announce that OLake (olake.io/docs) now supports Postgres as a data source. Now you can:

1. Sync Postgres -> AWS S3 in Parquet format [JSON dump and Level 1 flattened data supported]
2. Sync Postgres -> Iceberg [schema evolution support for Iceberg coming in a few days]
3. Sync Postgres -> local file storage, with everything we support for S3.

Want to test it out locally? Sync Postgres via OLake -> MinIO (using a JDBC catalog) -> query with any engine that supports Iceberg V2 tables [doc launching pretty soon].
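For the query step, here's a minimal PySpark sketch of reading an Iceberg V2 table from MinIO through a JDBC catalog. The catalog name, JDBC URI, credentials, and table name are placeholders (not OLake defaults); point them at whatever your OLake sync wrote.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("query-iceberg-on-minio")
        .config("spark.jars.packages",
                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,"
                "org.apache.iceberg:iceberg-aws-bundle:1.5.2,"
                "org.postgresql:postgresql:42.7.3")  # JDBC driver for the catalog DB
        # Register an Iceberg catalog backed by a JDBC database
        # (add spark.sql.catalog.lake.jdbc.user / .jdbc.password if your catalog DB needs auth)
        .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lake.catalog-impl", "org.apache.iceberg.jdbc.JdbcCatalog")
        .config("spark.sql.catalog.lake.uri", "jdbc:postgresql://localhost:5432/iceberg_catalog")
        .config("spark.sql.catalog.lake.warehouse", "s3://warehouse/")
        # Point Iceberg's S3 client at MinIO instead of AWS
        # (access keys come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars)
        .config("spark.sql.catalog.lake.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .config("spark.sql.catalog.lake.s3.endpoint", "http://localhost:9000")
        .config("spark.sql.catalog.lake.s3.path-style-access", "true")
        .getOrCreate()
    )

    spark.sql("SELECT * FROM lake.public.orders LIMIT 10").show()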

We talked with hundreds of DEs to understand their pain points, and the two major issues were replicating Postgres and MySQL to a lakehouse format. They asked, we delivered.

Our source connectors and writers are independent, meaning that once the Apache Iceberg writer lands (PR: https://github.com/datazip-inc/olake/pull/113), any new connector will be able to write to Iceberg with minimal changes!



If you have ever run TBs of analytics data through MySQL, how did you do it?


When building OLake, our goal was simple: the fastest DB-to-data-lakehouse (Apache Iceberg to start) data pipeline.

Check out the GitHub repository for OLake - https://github.com/datazip-inc/olake

Over time, many of us who’ve worked with data pipelines have dealt with the toil of building one-off ETL scripts, battling performance bottlenecks, or worrying about vendor lock-in.

With OLake, we wanted a clean, open-source solution that solves these problems in a straightforward, high-performing manner.

In this blog, I’m going to walk you through the architecture of OLake—how we capture data from MongoDB, push it into S3 in Apache Iceberg format or other data Lakehouse formats, and handle everything from schema evolution to high-volume parallel loads.
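To make "high-volume parallel loads" a bit more concrete before the deep dive: one common approach is to split a collection into non-overlapping _id ranges and let several workers read them concurrently. A toy Python illustration (generic, not OLake's actual code; connection string and names are made up):

    from concurrent.futures import ThreadPoolExecutor
    from pymongo import MongoClient

    coll = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

    # Pick boundary _ids so each worker scans a contiguous range.
    # (A real pipeline would sample boundaries rather than list every _id.)
    n_workers = 4
    ids = [d["_id"] for d in coll.find({}, {"_id": 1}).sort("_id", 1)]
    step = max(1, len(ids) // n_workers)
    bounds = ids[::step] + [None]

    def read_range(lo, hi):
        query = {"_id": {"$gte": lo}} if hi is None else {"_id": {"$gte": lo, "$lt": hi}}
        return list(coll.find(query))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        chunks = list(pool.map(read_range, bounds[:-1], bounds[1:]))

    print(sum(len(c) for c in chunks), "documents read across", len(chunks), "ranges")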


wrote a blog on all the ways to flatten a nested JSON key, let me know what you think
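For anyone skimming, here is a small Python sketch of one common variant (one-level flattening with dotted keys, deeper structures kept as JSON strings); the post walks through the other approaches:

    import json

    def flatten_level1(doc: dict) -> dict:
        """Flatten one level of nesting; deeper values stay as JSON strings."""
        flat = {}
        for key, value in doc.items():
            if isinstance(value, dict):
                for sub_key, sub_value in value.items():
                    flat[f"{key}.{sub_key}"] = (
                        json.dumps(sub_value) if isinstance(sub_value, (dict, list)) else sub_value
                    )
            elif isinstance(value, list):
                flat[key] = json.dumps(value)
            else:
                flat[key] = value
        return flat

    print(flatten_level1({"id": 1, "user": {"name": "a", "address": {"city": "x"}}}))
    # {'id': 1, 'user.name': 'a', 'user.address': '{"city": "x"}'}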


wrote a blog on handling changing data types while ingesting JSON data (from MongoDB or other NoSQL DBs) into a SQL environment, and how current ETL tools handle such changes
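As a toy illustration of the type-drift problem (generic, not how any particular ETL tool implements it): widen a column's inferred type when incoming documents disagree, instead of failing the load.

    # Widest-compatible-type promotion: bool -> int -> float -> str.
    PROMOTION_ORDER = [bool, int, float, str]

    def promote(current_type, value):
        value_type = type(value)
        if current_type is None:
            return value_type
        if current_type is value_type:
            return current_type
        # Pick whichever of the two types is "wider"; fall back to str.
        for t in reversed(PROMOTION_ORDER):
            if t in (current_type, value_type):
                return t
        return str

    docs = [{"price": 10}, {"price": 10.5}, {"price": "N/A"}]
    inferred = None
    for doc in docs:
        inferred = promote(inferred, doc["price"])
    print(inferred)  # <class 'str'> -> the column lands as TEXT/VARCHAR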


wrote a blog on it, might be helpful to the community.


Datazip claims to be the cheapest data ingestion and transformation platform in terms of value per $1 (a complete full-stack data engineering platform). Any users here?

We are thinking of migrating from Snowflake. Redshift and BigQuery seem good too, but I had to ask HN.

