The headline doesn't make much sense. There's still a database; what changes is simply where it's stored when in use. "Torrent Website w/o DB Server" might work better.
That aside, would this method scale? 135,000 torrents doesn't seem comprehensive, so I would expect real world use to have many more. Maybe a different SQLite db for different categories?
As the SQLite file is being downloaded via IPFS, it wouldn't be an HTTP Range request. Does IPFS have an equivalent?
(My guess would be yes, as I believe it breaks up the file for distribution, but I have no idea if it's exposed in the ipfs.js API)
EDIT:
After a quick scan of the docs, I think you can do this (but I certainly don't know enough). At least with the Mutable Files API, which "is a virtual file system on top of IPFS that exposes a Unix like API", you can provide an `offset` and `length` to `ipfs.files.read(path, [options])`. I don't know whether that then translates to only downloading that part of the file from IPFS or not.
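For what it's worth, here's a rough, untested sketch of what that call might look like with js-ipfs; the path, offset and length are made up, and whether it limits what actually gets fetched from the network is exactly the open question:

```js
// Untested sketch: read one 4 KiB slice of a SQLite db that was copied
// into MFS. Path and offsets are invented for illustration.
import { create } from 'ipfs-core'

const ipfs = await create()

const chunks = []
for await (const chunk of ipfs.files.read('/torrents.sqlite', {
  offset: 4096, // byte offset of the page we want
  length: 4096  // read a single page
})) {
  chunks.push(chunk)
}
const page = Buffer.concat(chunks)
// ipfs.cat(cid, { offset, length }) accepts the same options if the file
// isn't in MFS.
```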
Continuing to look into this: the SQLite range-requests trick was implemented in this project [0].
It turns out that project was inspired by a few others [1], two of which [2][3] implement a VFS for SQLite on top of BitTorrent, doing exactly what's suggested here but with a BitTorrent-hosted file rather than IPFS. They only download the parts of the SQLite file needed when querying.
It doesn't have to be range requests; you could split the file instead of depending on them. Essentially it's the same as the chunking instructions for hosts that have a maximum file size, as demonstrated here: https://github.com/phiresky/sql.js-httpvfs/blob/master/creat...
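For illustration, a minimal sketch of that chunking step in Node (filename and chunk size are arbitrary):

```js
// Split the .sqlite file into fixed-size pieces so a host with a maximum
// file size can serve it; the client then requests whole chunks instead
// of byte ranges.
import { open, writeFile } from 'node:fs/promises'

async function splitDb(path, chunkSize = 10 * 1024 * 1024) {
  const fh = await open(path)
  const { size } = await fh.stat()
  for (let part = 0, offset = 0; offset < size; part++, offset += chunkSize) {
    const buf = Buffer.alloc(Math.min(chunkSize, size - offset))
    await fh.read(buf, 0, buf.length, offset)       // read one chunk
    await writeFile(`${path}.${String(part).padStart(3, '0')}`, buf)
  }
  await fh.close()
}

await splitDb('torrents.sqlite')
```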
Exactly the thought I had reading this! Combining these projects would be awesome, proper semantic web! Plus it would mean near-zero loading times as the db size scaled (although ultimately some peers would still need the full database).
One unmentioned con: no updates, so no new torrents can be added (or rather, updates require re-deployment of the full new .sqlite db, together with a new website). I think there's space for a decentralized database format: something with immutable rows (not the whole db), ranges and search, indexes, etc. Maybe something like this exists already?
libtorrent has it, and webtorrent has been working on it, but development of that feature seems to have stalled. That's a real shame, because webtorrent can be used directly from the browser, so it would have been very practical to use a single dependency and get pretty much the same functionality as the article describes.
My current project can expose a SQLite database over torrent, wrapped by the Chrome cache filesystem.
The torrent is Merkle-tree based, and for each change to a SQLite memory page it also updates the Merkle tree, giving you a different torrent infohash at the end.
So, as databases are published together with the applications, the initial torrent info is the same, so there's a swarm of initial peers. Once the database gets updated and the torrent "turns into another", the solution is to use the RPC API (which my solution provides) over the initial swarm of peers, and to use some sort of distributed algorithm like Raft or gossip to define how to deal with subsequent changes across the peers.
The important part is that all the changed torrents from all the peers are available to every peer in the network, and with an RPC working as an abstract interface to organize them, it will work for a lot of different scenarios.
> That's interesting. Are you aware of the absurd-sql project?
No, thanks for pointing that out; I'll take a look at it.
> Also, how do peers find each other?
The idea is to use torrents as a common shareable resource, given that the peers have the same interest in that torrent. Let's say the torrent works as a "meta-database" with just enough immutable metadata, giving the developers of that application a list of peers that will have the same RPC service you designed, with an interface that suits your purpose according to your application's goals (a distributed YouTube, for instance). And given you know and designed that API yourself, whatever you want from each node, say a piece of data, be it a file or a database key-value range, you can ask your API for it, combining the torrent peers and whatever distribution combo you need.
It's badly documented, given I'm putting the final touches on it before a proper launch, but the "documentation" of the storage layer is planned for today, given how important it is to the whole thing.
The first thing I tried to get right was the storage layer, and the capacity to use mutable SQLite databases over torrent (together with files, which are simpler given they are meant to be immutable).
Given there's no proper doc yet, I can point to the source.
One part is a modified Chrome cache storage layer (which is the real underlying disk storage) that abstracts the file and database storage which, from the BitTorrent layer's perspective, is on disk.
The other is the main abstraction, which may be a "fileset" (a collection of files, as in a torrent) or a dataset/database; in the latter case you can get the SQLite db handle from the torrent object and deal with it as a normal database.
I've taken care to enable a key-value store over the SQLite btree, so both forms of database are possible: a key-value store and a normal SQL database.
Key-value stores are important for the distributed case, where you may want nodes to hold partial data and abstract a SQL layer (or whatever) on top of the distributed nodes, which is a better solution for distributed storage.
For database distribution, a 64k SQLite memory page maps to a torrent piece of the same size, which can be synchronized to other peers that know what the root of the Merkle tree of the given database is (you can use the RPC layer to coordinate this, or use the BitTorrent DHT, updating your slot with the new Merkle root).
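To make the mapping concrete, here's a hypothetical sketch (not the project's actual code), assuming 64 KiB SQLite pages that line up 1:1 with 64 KiB torrent pieces:

```js
// When a page changes, rehash its leaf and fold the leaves back up to a
// new Merkle root; the new root is what peers compare to detect a new
// version of the database.
import { createHash } from 'node:crypto'

const sha256 = (buf) => createHash('sha256').update(buf).digest()

// SQLite pages are 1-based, torrent pieces 0-based.
const pieceForPage = (pageNumber) => pageNumber - 1

function merkleRoot(leafHashes) {
  let level = leafHashes.slice()
  while (level.length > 1) {
    const next = []
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i] // duplicate last node on odd levels
      next.push(sha256(Buffer.concat([level[i], right])))
    }
    level = next
  }
  return level[0]
}

// After writing page n:
//   leaves[pieceForPage(n)] = sha256(newPageBytes)
//   const newRoot = merkleRoot(leaves)  // -> new "infohash" for the database
```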
BTW, this is what I'm using to distribute the applications: the DHT points to a "database torrent" which in turn is an index to other file and database torrents. The application owner has write permission over that particular DHT slot and can change the root Merkle hash anytime the application itself changes (a new version, or some of the assets).
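The DHT-slot mechanism sounds like BEP 44 mutable items. A rough, illustrative sketch of that primitive using the bittorrent-dht library (not this project's code; the payload is a placeholder):

```js
// The key pair is the "write permission": only its holder can update the
// slot, and each update bumps `seq` and stores the new Merkle root.
import DHT from 'bittorrent-dht'
import ed from 'ed25519-supercop'

const dht = new DHT({ verify: ed.verify })
const keypair = ed.createKeyPair(ed.createSeed())

const opts = {
  k: keypair.publicKey,
  seq: 1,                                  // bump on every new version
  v: Buffer.from('<new merkle root>'),     // placeholder payload
  sign: (buf) => ed.sign(buf, keypair.publicKey, keypair.secretKey)
}

dht.put(opts, (err, hash) => {
  // `hash` is the slot address peers look up to find the latest root
  if (err) console.error(err)
  else console.log('published at', hash.toString('hex'))
})
```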
What I'm finishing right now is exactly the higher-level layers that automate this whole thing and make it work "under the hood", without the users or application distributors needing to understand how it is implemented.
Of course, developers will have access to iterating over the peers of a given torrent to group them over RPC interfaces, and the idea is also to give access to the DHT, so solutions that need access to a mutable, ever-changing Merkle root are also possible (even though the torrent-based solution covers most if not all of the ground). The idea is to give devs the tools to be creative about their solutions.
(The solution is a whole browser-based UI/web-application platform that uses torrents and the torrent DHT for the p2p storage layer.)
Thanks, super interesting. When I get time to work on my side project again I'll try it out. I have a serverless frontend app and have been thinking about a way to allow syncing of data between users. I had been looking at WebRTC, but the need for a STUN server makes the UX ugly. Having a torrent (and/or IPFS) either be the direct data layer and/or at least serve as a common point to sync WebRTC info seems like a good approach.
Yes, it can even serve to expose the other WebRTC APIs that are meant for media: you can have an initial RPC bootstrap interface that calls your application service, which in turn is programmed by you to use the WebRTC API to stream audio and video from that peer (or use the data stream if you want).
Good luck with this, I've been reading all the various HN posts on this topic and you seem to be the closest to the holy grail.
Counter to Moxie's argument[0], I think a lot more people would be willing to host a server if all they had to do to host one was seed a torrent. We already know many people are willing to do that.
I think this would be interesting if the application itself was just a Docker container that could output to a browser, similar to many other local-hosted approaches[1].
> Good luck with this, I've been reading all the various HN posts on this topic and you seem to be the closest to the holy grail.
Thanks, it's nice to hear that after all those years working on it. I keep going by understanding how important it is for all of us to get out of this rigged game, where the cloud-computing (plus web-client) rulers, the FAANGs, are controlling where we are heading. A future where we (hackers) have fewer options technologically, without enough freedom to program services that can ignore the mothership and create the same sort of services relying only on peers, is a goal being pursued by them.
> I think this would be interesting if the application itself was just a Docker container that could output to a browser, similar to many other local-hosted approaches[1].
What my solution does is give the application a "service" process that works like a daemon for the application, able to resolve route requests both locally (over IPC) and remotely (over RPC).
The routes are meant to serve HTML or any other content to the UI application (which is also another process, akin to the Chrome renderer).
But the fact that every application may have an ever-running process means that in some cases it can provide other kinds of services. For instance:
You can create a PostgreSQL wrapper application where you present the database's SQL API over RPC as a service, and manage the PostgreSQL instance with your service process.
The same works for Docker or QEMU, for instance. Your application can be Docker-based and be deployed only on Linux.
Your applications (the service and the UI process) are native, by the way, and even the UI application controls the web layer (rendering, layout, etc.) natively, talking directly to it (with a first SDK enabled over Swift for now).
This architecture will allow you to do those things, and it presents a way for users to interact with your solution in a UX that is packaged and developed together with the "backend".
With this whole thing you program the frontend and the backend as a whole, in the same programming language.
Isn't this just a log (or a stream, like Kafka or Kinesis)? In fact you might even be able to say every database already has this ;) (binlog, oplog, etc)
> not the whole db
If rows are immutable, what part of the db is left that isn't immutable? If rows were immutable, doesn't that imply that any existing "ranges, searches, indexes, etc" are immutable too?
If you're going to the effort of making a decentralized database, why not also decompose all of these parts from one another? There's no reason the tables (logs), indexes, search, etc. have to live in the same database; they could be spun off as completely different parts. Basically, something that indexes logs, and then something else that takes those indexes and makes them searchable (a minimal sketch below).
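A toy version of that decomposition, with all names invented: an append-only log as the "table", a separate indexer, and a search function over the index:

```js
// The log is immutable and append-only; the index is a separate,
// rebuildable structure derived from it.
const log = []            // the "table": append-only rows
const index = new Map()   // word -> Set of log offsets

function append(record) {
  const offset = log.push(record) - 1
  for (const word of record.title.toLowerCase().split(/\W+/)) {
    if (!word) continue
    if (!index.has(word)) index.set(word, new Set())
    index.get(word).add(offset)
  }
  return offset
}

function search(word) {
  const hits = index.get(word.toLowerCase()) ?? new Set()
  return [...hits].map((offset) => log[offset])
}

append({ title: 'Ubuntu 20.04 ISO', infohash: 'abc...' })
console.log(search('ubuntu'))
```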
In fact this is all the rage right now with centralized databases... all of the work being done on streaming systems seems to be an effort to decompose and invert databases... everything old is new again.
Anyways, I agree with the idea and don't know enough about decentralized systems to really understand why such a distributed database can't be, or hasn't already been, built.
There should be a separation between the data and the actual website that does the reading. I once thought about something similar (not for torrents), but having to download the whole database (300MB and more) was a bad idea for UX.
So what I propose is this: there is a central source of truth hosted on IPFS via IPNS, or via mutable torrents. The data format can be whatever is most efficient to update and store, not what's easiest to read and query (an event log is fine). Then there is a reader application, like an RSS reader. The reader can be a desktop application, or it can be a centrally hosted web app that keeps a conventional SQL database behind the scenes. The idea is that there would be multiple reader providers, just as there are competing email providers.
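A hedged sketch of such a reader, assuming the published data is an event log and using better-sqlite3 for the local store (event shape and names are made up):

```js
// Fold the fetched event log into a local SQL database the reader can
// query normally; where the log comes from (IPNS, mutable torrent) is
// orthogonal to this part.
import Database from 'better-sqlite3'

const db = new Database('reader.db')
db.exec(`CREATE TABLE IF NOT EXISTS torrents (
  infohash TEXT PRIMARY KEY, title TEXT, size INTEGER, removed INTEGER DEFAULT 0
)`)

const upsert = db.prepare(
  'INSERT OR REPLACE INTO torrents (infohash, title, size) VALUES (?, ?, ?)')
const remove = db.prepare('UPDATE torrents SET removed = 1 WHERE infohash = ?')

function applyEvent(event) {
  if (event.type === 'add') upsert.run(event.infohash, event.title, event.size)
  else if (event.type === 'remove') remove.run(event.infohash)
}

applyEvent({ type: 'add', infohash: 'abc123', title: 'Some ISO', size: 1024 })
```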
IPFS requires "pinning" content for it to exist, meaning you host it; whoever pins content becomes liable for its distribution, and if no one pins it, it disappears. That's why you can't freely host things like child porn on IPFS.
No one is going to pin encrypted data they don't understand. And IPFS doesn't get users to seed random files; you only seed stuff you downloaded. So really it's not much different from hosting it on a personal HTTP server, except that any downloaders can also host.
> No one is going to pin encrypted data they do not understand.
Select users who have the decryption key (communicated out-of-band) might be willing to pin the encrypted data for the benefit of others who also have the key. There are also for-pay pinning services that probably wouldn't care whether the data was encrypted. And whether it's pinned or not, as long as it's been downloaded the data remains available to be shared (at least until it's GC'd).
Ok, but you could do that on literally any platform. I could encrypt data and stick it on Google Drive or AWS and the hosts wouldn't have any clue what it was.
Sure, but doing it through IPFS means that you aren't dependent on a particular centralized service. If you put the encrypted file on Google Drive or AWS and then later take it down, the link breaks. Or someone might edit the file so that the link no longer gives the same content. On IPFS the original link continues to work as long as anyone has the file, and the content is immutable, so there are some advantages compared to plain HTTP.
Love this website too. But in this case the database and search engine are hosted on the server; it only uses IPFS to distribute a copy of a folder, ready out of the box (db included), to be hosted as another instance (a backup).
You actually should be able to access it using IPFS at /ipns/torrent-paradise.ml or /ipns/12D3KooWB3GY1u6zMLqnf3MJ8zhX3SS1oBj7VXk3xp6sJJiFGZXp, but it seems that clicking search throws a JS error :/
Also, the assumptions here kind of remind me of domain fronting: https://www.zdnet.com/article/amazons-aws-latest-to-give-up-.... Basically, it's assuming IPFS will protect it from the authorities, but what might actually happen is that it makes IPFS a target of the authorities. Now, that probably won't happen here because the authorities don't actually care that much about torrenting/piracy, but the error is still there.
The "rules for the government to follow" are the laws, and these aren't being made up, but rather pre-exist (safe harbour laws) with some pretty strong backers (goog et al)
> The "rules for the government to follow" are the laws, and these aren't being made up, but rather pre-exist (safe harbour laws) with some pretty strong backers (goog et al)
That's true, but to elaborate on that quote a little bit: you can still make the mistake that it describes by making up your own interpretations of "the laws," rather than making up rules out of whole cloth. That's extremely common (especially with constitutional law), and probably the most frequent way of making that mistake.
Exactly. I forget where I just saw this. It seemed like a particularly absurd example, but the examples are numerous. Torrent sites don’t have the copyrighted content either, but they get shut down all the time.
But the torrents themselves tend to stay seeded... if you stick the site itself on IPFS and anyone in any jurisdiction can "seed" it, then it's going to be hard to kill by court order.
These projects, while technically impressive and interesting, I'm sure wouldn't make much of a difference to copyright holders. I can imagine something like this:
- Please take down our IP
- I can't because I am not hosting it
- But you are displaying it on your website, take your website down or filter what your website displays to avoid showing copyrighted material
And that's the end of it. It doesn't matter where the data is stored; it does matter, a lot, where the data is displayed. If your website displays illegal content, you're facilitating access, so you're breaching the law.
And, honestly, I'm not sure there's an easy solution to that other than using onion or I2P sites, which are only marginally safer against copyright takedown requests because, due to their very nature, they are obscure and difficult to access.
You can do that, and I like the idea. But for now torrents are the go-to method for sharing large pieces of data, and utilizing IPFS to move the centralized aspect of torrents into a decentralized space is a good idea.
I wonder why the author ignored the option of compression in the post.
Even with simple gzip DEFLATE compression, those 10MB of plain text could get as small as a 1MB archive, possibly smaller, meaning that in a compressed 10MB payload you could fit much, much more than 135K records.
It isn't 10MB of plain text, though; it's 10MB of binary SQLite database. I agree that compression would be useful here, but I don't think a simple gzip DEFLATE would be.
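If anyone wants to try it themselves, a quick way to check with Node's built-in zlib (the filename is made up, assuming you have the db locally):

```js
// Compare plain DEFLATE (gzip) against Brotli on the raw SQLite file.
import { readFile } from 'node:fs/promises'
import { gzipSync, brotliCompressSync } from 'node:zlib'

const db = await readFile('torrents.sqlite')
console.log('original :', db.length)
console.log('gzip     :', gzipSync(db).length)
console.log('brotli   :', brotliCompressSync(db).length)
```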
I was curious so I compressed that torrent db with a few different methods:
The 10MB estimated size came from [100 bytes per row] * [100k rows].
50 of the bytes per row were "description", which should compress well (2-3x, I'd guess).
40 bytes per row were the IPFS ID/hash, IIUC. I assumed this is like a Git hash, 40 hex chars, which is really just 20 bytes of entropy.
He also estimated 14 bytes for the size (stored as a string representation of a decimal integer, up to 1e15 - 1, or ~1PB?). That's about 50 bits, or 6-7 bytes, as a binary integer. Sizes wouldn't be uniformly distributed, though, so it would compress to even fewer bytes.
So if SQLite were smart (or one gzips the whole db file, like you did), it makes sense that a factor of 2 or so is reclaimable.
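For anyone checking the arithmetic, a tiny back-of-the-envelope script using the numbers from the estimate above (the 2.5x description ratio is an assumption):

```js
// 100k rows; 50 bytes of description (~2.5x compressible), a 40-hex-char
// hash (really 20 bytes), and a ~14-byte decimal size (~7 bytes binary).
const rows = 100_000
const asStored   = rows * (50 + 40 + 14)            // ~10.4 MB
const compressed = rows * (50 / 2.5 + 40 / 2 + 7)   // ~4.7 MB
console.log({ asStored, compressed, ratio: (asStored / compressed).toFixed(2) })
// ratio comes out around 2.2, i.e. the "factor of 2 or so" above
```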
There are already something like 10 crypto projects implementing that. The most prominent is probably BitTorrent Token (BTT).
https://www.bittorrent.com/token/btt/
Reading just the title, I was expecting storing/serving magnet links over DNS, as someone did a few years back, if I remember correctly (just as a joke/PoC, not as a serious solution).
I'm pretty sure it was posted on HN, but I can't find the link anymore.
You have "publish ids" which you control using private keys.. like a dns... so you can have your main page address known and all the other posts linked from that page
It has a reputation for being slow if you're not using the DNS-based version of the protocol (DNSLink). Though the experimental IPNS-over-PubSub interface helps if both sites are using it.
I knew I could do it, I just wanted to make a demo of the whole thing to see if it would work in practice. But yes, there's space for a lot of improvements, like compression and lazy loading with SQLite (mentioned in another comment).