The headline doesn't make much sense. There's still a database; what changes is simply where it's stored when in use. "Torrent Website w/o DB Server" might work better.
That aside, would this method scale? 135,000 torrents doesn't seem comprehensive, so I would expect real world use to have many more. Maybe a different SQLite db for different categories?
As the SQLite file is being downloaded via IPFS, it wouldn't be an HTTP Range request. Does IPFS have an equivalent?
(My guess would be yes, as I believe it breaks up the file for distribution, but I have no idea if it's exposed in the ipfs.js API)
EDIT:
After a quick scan of the docs, I think you can do this (but I certainly don't know enough). At least with the Mutable Files API, which "is a virtual file system on top of IPFS that exposes a Unix like API", you can provide an `offset` and `length` to `ipfs.files.read(path, [options])`. I don't know whether that then translates to only downloading that part of the file from IPFS or not.
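For what it's worth, here's a rough, untested sketch of what that call might look like with js-ipfs; the path, offset and length are made up, and whether it limits what actually gets fetched from the network is exactly the open question:

```js
// Untested sketch: read one 4 KiB slice of a SQLite db that was copied
// into MFS. Path and offsets are invented for illustration.
import { create } from 'ipfs-core'

const ipfs = await create()

const chunks = []
for await (const chunk of ipfs.files.read('/torrents.sqlite', {
  offset: 4096, // byte offset of the page we want
  length: 4096  // read a single page
})) {
  chunks.push(chunk)
}
const page = Buffer.concat(chunks)
// ipfs.cat(cid, { offset, length }) accepts the same options if the file
// isn't in MFS.
```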
Continuing to look into this: the SQLite range-requests trick was implemented in this project [0].
It turns out that project was inspired by a few others [1], two of which [2][3] implement a VFS for SQLite on top of BitTorrent, doing exactly what's suggested here but with a BitTorrent-hosted file rather than IPFS. They only download the parts of the SQLite file needed when querying.
It doesn't have to be range requests; you could split the file instead of depending on them. Essentially it's the same as the chunking instructions for hosts that have a maximum file size, as demonstrated here: https://github.com/phiresky/sql.js-httpvfs/blob/master/creat...
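For illustration, a minimal sketch of that chunking step in Node (filename and chunk size are arbitrary):

```js
// Split the .sqlite file into fixed-size pieces so a host with a maximum
// file size can serve it; the client then requests whole chunks instead
// of byte ranges.
import { open, writeFile } from 'node:fs/promises'

async function splitDb(path, chunkSize = 10 * 1024 * 1024) {
  const fh = await open(path)
  const { size } = await fh.stat()
  for (let part = 0, offset = 0; offset < size; part++, offset += chunkSize) {
    const buf = Buffer.alloc(Math.min(chunkSize, size - offset))
    await fh.read(buf, 0, buf.length, offset)       // read one chunk
    await writeFile(`${path}.${String(part).padStart(3, '0')}`, buf)
  }
  await fh.close()
}

await splitDb('torrents.sqlite')
```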
Exactly the thought I had reading this! Combining these projects would be awesome, proper semantic web! Plus it would mean near-zero loading times as the db size scaled (although ultimately some peers would still need the full database).
One unmentioned con: no updates, so no new torrents can be added (or rather, updates require re-deployment of the full new .sqlite db, together with a new website). I think there's space for a decentralized database format: something with immutable rows (not the whole db), ranges and search, indexes, etc. Maybe something like this exists already?
libtorrent has it, and webtorrent has been working on it, but development of that feature seems to have stalled. That's a real shame, because webtorrent can be used directly from the browser, so it would have been very practical to use a single dependency and get pretty much the same functionality as the article describes.
My current project can expose a SQLite database over torrent, wrapped by the Chrome cache filesystem.
The torrent is Merkle-tree based, and for each change to a SQLite memory page it also updates the Merkle tree, giving you a different torrent infohash at the end.
So, as databases are published together with the applications, the initial torrent info is the same, so there's a swarm of initial peers. Once the database gets updated and the torrent "turns into another", the solution is to use the RPC API (which my solution provides) over the initial swarm of peers, and to use some sort of distributed algorithm like Raft or gossip to define how to deal with subsequent changes across the peers.
The important part is that all the changed torrents from all the peers are available to every peer in the network, and with an RPC working as an abstract interface to organize them, it will work for a lot of different scenarios.
> That's interesting. Are you aware of the absurd-sql project?
No, thanks for pointing that out; I'll take a look at it.
> Also, how do peers find each other?
The idea is to use torrents as a common shareable resource, given that the peers have the same interest in that torrent. Let's say the torrent works as a "meta-database" with just enough immutable metadata, giving the developers of that application a list of peers that will have the same RPC service you designed, with an interface that suits your purpose according to your application's goals (a distributed YouTube, for instance). And given you know and designed that API yourself, whatever you want from each node, say a piece of data, be it a file or a database key-value range, you can ask your API for it, combining the torrent peers and whatever distribution combo you need.
It's badly documented, given I'm putting the final touches on it before a proper launch, but the "documentation" of the storage layer is planned for today, given how important it is to the whole thing.
The first thing I tried to get right was the storage layer, and the capacity to use mutable SQLite databases over torrent (together with files, which are simpler given they are meant to be immutable).
Given there's no proper doc yet, I can point to the source.
One part is a modified Chrome cache storage layer (which is the real underlying disk storage) that abstracts the file and database storage which, from the BitTorrent layer's perspective, is on disk.
The other is the main abstraction, which may be a "fileset" (a collection of files, as in a torrent) or a dataset/database; in the latter case you can get the SQLite db handle from the torrent object and deal with it as a normal database.
I've taken care to enable a key-value store over the SQLite btree, so both forms of database are possible: a key-value store and a normal SQL database.
Key-value stores are important for the distributed case, where you may want nodes to hold partial data and abstract a SQL layer (or whatever) on top of the distributed nodes, which is a better solution for distributed storage.
For database distribution, a 64k SQLite memory page maps to a torrent piece of the same size, which can be synchronized to other peers that know what the root of the Merkle tree of the given database is (you can use the RPC layer to coordinate this, or use the BitTorrent DHT, updating your slot with the new Merkle root).
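To make the mapping concrete, here's a hypothetical sketch (not the project's actual code), assuming 64 KiB SQLite pages that line up 1:1 with 64 KiB torrent pieces:

```js
// When a page changes, rehash its leaf and fold the leaves back up to a
// new Merkle root; the new root is what peers compare to detect a new
// version of the database.
import { createHash } from 'node:crypto'

const sha256 = (buf) => createHash('sha256').update(buf).digest()

// SQLite pages are 1-based, torrent pieces 0-based.
const pieceForPage = (pageNumber) => pageNumber - 1

function merkleRoot(leafHashes) {
  let level = leafHashes.slice()
  while (level.length > 1) {
    const next = []
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i] // duplicate last node on odd levels
      next.push(sha256(Buffer.concat([level[i], right])))
    }
    level = next
  }
  return level[0]
}

// After writing page n:
//   leaves[pieceForPage(n)] = sha256(newPageBytes)
//   const newRoot = merkleRoot(leaves)  // -> new "infohash" for the database
```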
BTW, this is what I'm using to distribute the applications: the DHT points to a "database torrent" which in turn is an index to other file and database torrents. The application owner has write permission over that particular DHT slot and can change the root Merkle hash anytime the application itself changes (a new version, or some of the assets).
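The DHT-slot mechanism sounds like BEP 44 mutable items. A rough, illustrative sketch of that primitive using the bittorrent-dht library (not this project's code; the payload is a placeholder):

```js
// The key pair is the "write permission": only its holder can update the
// slot, and each update bumps `seq` and stores the new Merkle root.
import DHT from 'bittorrent-dht'
import ed from 'ed25519-supercop'

const dht = new DHT({ verify: ed.verify })
const keypair = ed.createKeyPair(ed.createSeed())

const opts = {
  k: keypair.publicKey,
  seq: 1,                                  // bump on every new version
  v: Buffer.from('<new merkle root>'),     // placeholder payload
  sign: (buf) => ed.sign(buf, keypair.publicKey, keypair.secretKey)
}

dht.put(opts, (err, hash) => {
  // `hash` is the slot address peers look up to find the latest root
  if (err) console.error(err)
  else console.log('published at', hash.toString('hex'))
})
```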
What I'm finishing right now is exactly the higher-level layers that automate this whole thing and make it work "under the hood", without the users or application distributors needing to understand how it is implemented.
Of course, developers will have access to iterating over the peers of a given torrent to group them over RPC interfaces, and the idea is also to give access to the DHT, so solutions that need access to a mutable, ever-changing Merkle root are also possible (even though the torrent-based solution covers most if not all of the ground). The idea is to give devs the tools to be creative about their solutions.
(The solution is a whole browser-based UI/web-application platform that uses torrents and the torrent DHT for the p2p storage layer.)
Thanks, super interesting. When I get time to work on my side project again I'll try it out. I have a serverless frontend app and have been thinking about a way to allow syncing of data between users. I had been looking at WebRTC, but the need for a STUN server makes the UX ugly. Having a torrent (and/or IPFS) either be the direct data layer and/or at least serve as a common point to sync WebRTC info seems like a good approach.
Yes, it can even serve to expose the other WebRTC APIs that are meant for media: you can have an initial RPC bootstrap interface that calls your application service, which in turn is programmed by you to use the WebRTC API to stream audio and video from that peer (or use the data stream if you want).
Good luck with this, I've been reading all the various HN posts on this topic and you seem to be the closest to the holy grail.
Counter to Moxie's argument[0], I think a lot more people would be willing to host a server if all they had to do to host one was seed a torrent. We already know many people are willing to do that.
I think this would be interesting if the application itself was just a Docker container that could output to a browser, similar to many other local-hosted approaches[1].
> Good luck with this, I've been reading all the various HN posts on this topic and you seem to be the closest to the holy grail.
Thanks, it's nice to hear that after all those years working on it. I keep going by understanding how important it is for all of us to get out of this rigged game, where the cloud-computing (plus web-client) rulers, the FAANGs, are controlling where we are heading. A future where we (hackers) have fewer options technologically, without enough freedom to program services that can ignore the mothership and create the same sort of services relying only on peers, is a goal being pursued by them.
> I think this would be interesting if the application itself was just a Docker container that could output to a browser, similar to many other local-hosted approaches[1].
What my solution does is give the application a "service" process that works like a daemon for the application, able to resolve route requests both locally (over IPC) and remotely (over RPC).
The routes are meant to serve HTML or any other content to the UI application (which is also another process, akin to the Chrome renderer).
But the fact that every application may have an ever-running process means that in some cases it can provide other kinds of services. For instance:
You can create a PostgreSQL wrapper application where you present the database's SQL API over RPC as a service, and manage the PostgreSQL instance with your service process.
The same works for Docker or QEMU, for instance. Your application can be Docker-based and be deployed only on Linux.
Your applications (the service and the UI process) are native, by the way, and even the UI application controls the web layer (rendering, layout, etc.) natively, talking directly to it (with a first SDK enabled over Swift for now).
This architecture will allow you to do those things, and it presents a way for users to interact with your solution in a UX that is packaged and developed together with the "backend".
With this whole thing you program the frontend and the backend as a whole, in the same programming language.
Isn't this just a log (or a stream, like Kafka or Kinesis)? In fact you might even be able to say every database already has this ;) (binlog, oplog, etc)
> not the whole db
If rows are immutable, what part of the db is left that isn't immutable? If rows were immutable, doesn't that imply that any existing "ranges, searches, indexes, etc" are immutable too?
If you're going to the effort of making a decentralized database, why not also decompose all of these parts from one another? There's no reason the tables (logs), indexes, search, etc. have to live in the same database; they could be spun off as completely different parts. Basically, something that indexes logs, and then something else that takes those indexes and makes them searchable (a minimal sketch below).
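A toy version of that decomposition, with all names invented: an append-only log as the "table", a separate indexer, and a search function over the index:

```js
// The log is immutable and append-only; the index is a separate,
// rebuildable structure derived from it.
const log = []            // the "table": append-only rows
const index = new Map()   // word -> Set of log offsets

function append(record) {
  const offset = log.push(record) - 1
  for (const word of record.title.toLowerCase().split(/\W+/)) {
    if (!word) continue
    if (!index.has(word)) index.set(word, new Set())
    index.get(word).add(offset)
  }
  return offset
}

function search(word) {
  const hits = index.get(word.toLowerCase()) ?? new Set()
  return [...hits].map((offset) => log[offset])
}

append({ title: 'Ubuntu 20.04 ISO', infohash: 'abc...' })
console.log(search('ubuntu'))
```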
In fact this is all the rage right now with centralized databases... all of the work being done on streaming systems seems to be an effort to decompose and invert databases... everything old is new again.
Anyways, I agree with the idea and don't know enough about decentralized systems to really understand why such a distributed database can't be, or hasn't already been, built.
There should be a separation between the data and the actual website that does the reading. I once thought about something similar (not for torrents), but having to download the whole database (300MB and more) was a bad idea for UX.
So what I propose is this: there is a central source of truth hosted on IPFS via IPNS, or via mutable torrents. The data format can be whatever is most efficient to update and store, not what's easiest to read and query (an event log is fine). Then there is a reader application, like an RSS reader. The reader can be a desktop application, or it can be a centrally hosted web app that keeps a conventional SQL database behind the scenes. The idea is that there would be multiple reader providers, just as there are competing email providers.
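A hedged sketch of such a reader, assuming the published data is an event log and using better-sqlite3 for the local store (event shape and names are made up):

```js
// Fold the fetched event log into a local SQL database the reader can
// query normally; where the log comes from (IPNS, mutable torrent) is
// orthogonal to this part.
import Database from 'better-sqlite3'

const db = new Database('reader.db')
db.exec(`CREATE TABLE IF NOT EXISTS torrents (
  infohash TEXT PRIMARY KEY, title TEXT, size INTEGER, removed INTEGER DEFAULT 0
)`)

const upsert = db.prepare(
  'INSERT OR REPLACE INTO torrents (infohash, title, size) VALUES (?, ?, ?)')
const remove = db.prepare('UPDATE torrents SET removed = 1 WHERE infohash = ?')

function applyEvent(event) {
  if (event.type === 'add') upsert.run(event.infohash, event.title, event.size)
  else if (event.type === 'remove') remove.run(event.infohash)
}

applyEvent({ type: 'add', infohash: 'abc123', title: 'Some ISO', size: 1024 })
```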
IPFS requires "pinning" content for it to exist, meaning you host it; whoever pins content becomes liable for its distribution, and if no one pins it, it disappears. That's why you can't freely host things like child porn on IPFS.
No one is going to pin encrypted data they don't understand. And IPFS doesn't get users to seed random files; you only seed stuff you downloaded. So really it's not much different from hosting it on a personal HTTP server, except that any downloaders can also host.
> No one is going to pin encrypted data they do not understand.
Select users who have the decryption key (communicated out-of-band) might be willing to pin the encrypted data for the benefit of others who also have the key. There are also for-pay pinning services that probably wouldn't care whether the data was encrypted. And whether it's pinned or not, as long as it's been downloaded the data remains available to be shared (at least until it's GC'd).
Ok, but you could do that on literally any platform. I could encrypt data and stick it on Google Drive or AWS and the hosts wouldn't have any clue what it was.
Sure, but doing it through IPFS means that you aren't dependent on a particular centralized service. If you put the encrypted file on Google Drive or AWS and then later take it down, the link breaks. Or someone might edit the file so that the link no longer gives the same content. On IPFS the original link continues to work as long as anyone has the file, and the content is immutable, so there are some advantages compared to plain HTTP.
Love this website too. But in this case the database and search engine are hosted on the server; it only uses IPFS to distribute a copy of a folder, ready out of the box (db included), to be hosted as another instance (a backup).
You actually should be able to access it using IPFS at /ipns/torrent-paradise.ml or /ipns/12D3KooWB3GY1u6zMLqnf3MJ8zhX3SS1oBj7VXk3xp6sJJiFGZXp, but it seems that clicking search throws a JS error :/
Also, the assumptions here kind of remind me of domain fronting: https://www.zdnet.com/article/amazons-aws-latest-to-give-up-.... Basically, it's assuming IPFS will protect it from the authorities, but what might actually happen is that it makes IPFS a target of the authorities. Now, that probably won't happen here because the authorities don't actually care that much about torrenting/piracy, but the error is still there.
The "rules for the government to follow" are the laws, and these aren't being made up, but rather pre-exist (safe harbour laws) with some pretty strong backers (goog et al)
> The "rules for the government to follow" are the laws, and these aren't being made up, but rather pre-exist (safe harbour laws) with some pretty strong backers (goog et al)
That's true, but to elaborate on that quote a little bit: you can still make the mistake that it describes by making up your own interpretations of "the laws," rather than making up rules out of whole cloth. That's extremely common (especially with constitutional law), and probably the most frequent way of making that mistake.
Exactly. I forget where I just saw this. It seemed like a particularly absurd example, but the examples are numerous. Torrent sites don’t have the copyrighted content either, but they get shut down all the time.
But the torrents themselves tend to stay seeded... if you stick the site itself on IPFS and anyone in any jurisdiction can "seed" it, then it's going to be hard to kill by court order.
These projects, while technically impressive and interesting, I'm sure wouldn't make much of a difference to copyright holders. I can imagine something like this:
- Please take down our IP
- I can't because I am not hosting it
- But you are displaying it on your website, take your website down or filter what your website displays to avoid showing copyrighted material
And that's the end of it. It doesn't matter where the data is stored; it does matter, a lot, where the data is displayed. If your website displays illegal content, you're facilitating access, so you're breaching the law.
And, honestly, I'm not sure there's an easy solution to that other than using onion or I2P sites, which are only marginally safer against copyright takedown requests because, due to their very nature, they are obscure and difficult to access.
You can do that, and I like the idea. But for now torrents are the go-to method for sharing large pieces of data, and utilizing IPFS to move the centralized aspect of torrents into a decentralized space is a good idea.
I wonder why the author ignored the option of compression in the post.
Even with simple gzip DEFLATE compression, those 10MB of plain text could get as small as a 1MB archive, possibly smaller, meaning that in a compressed 10MB payload you could fit much, much more than 135K records.
It isn't 10MB of plain text, though; it's 10MB of binary SQLite database. I agree that compression would be useful here, but I don't think a simple gzip DEFLATE would be.
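If anyone wants to try it themselves, a quick way to check with Node's built-in zlib (the filename is made up, assuming you have the db locally):

```js
// Compare plain DEFLATE (gzip) against Brotli on the raw SQLite file.
import { readFile } from 'node:fs/promises'
import { gzipSync, brotliCompressSync } from 'node:zlib'

const db = await readFile('torrents.sqlite')
console.log('original :', db.length)
console.log('gzip     :', gzipSync(db).length)
console.log('brotli   :', brotliCompressSync(db).length)
```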
I was curious so I compressed that torrent db with a few different methods:
The 10MB estimated size came from [100 bytes per row] * [100k rows].
50 of the bytes per row were "description", which should compress well (2-3x, I'd guess).
40 bytes per row were the IPFS ID/hash, IIUC. I assumed this is like a Git hash, 40 hex chars, which is really just 20 bytes of entropy.
He also estimated 14 bytes for the size (stored as a string representation of a decimal integer, up to 1e15 - 1, or ~1PB?). That's about 50 bits, or 6-7 bytes, as a binary integer. Sizes wouldn't be uniformly distributed, though, so it would compress to even fewer bytes.
So if SQLite were smart (or one gzips the whole db file, like you did), it makes sense that a factor of 2 or so is reclaimable.
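For anyone checking the arithmetic, a tiny back-of-the-envelope script using the numbers from the estimate above (the 2.5x description ratio is an assumption):

```js
// 100k rows; 50 bytes of description (~2.5x compressible), a 40-hex-char
// hash (really 20 bytes), and a ~14-byte decimal size (~7 bytes binary).
const rows = 100_000
const asStored   = rows * (50 + 40 + 14)            // ~10.4 MB
const compressed = rows * (50 / 2.5 + 40 / 2 + 7)   // ~4.7 MB
console.log({ asStored, compressed, ratio: (asStored / compressed).toFixed(2) })
// ratio comes out around 2.2, i.e. the "factor of 2 or so" above
```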
There are already something like 10 crypto projects implementing that. The most prominent is probably BitTorrent Token (BTT).
https://www.bittorrent.com/token/btt/
Reading just the title, I was expecting storing/serving magnet links over DNS, as someone did a few years back, if I remember correctly (just as a joke/PoC, not as a serious solution).
I'm pretty sure it was posted on HN, but I can't find the link anymore.
You have "publish ids" which you control using private keys.. like a dns... so you can have your main page address known and all the other posts linked from that page
It has a reputation for being slow if you're not using the DNS-based version of the protocol (DNSLink). Though the experimental IPNS-over-PubSub interface helps if both sites are using it.
I knew I could do it, I just wanted to make a demo of the whole thing to see if it would work in practice. But yes, there's space for a lot of improvements, like compression and lazy loading with SQLite (mentioned in another comment).