Full Ethereum blockchain now available as a BigQuery public dataset (cloud.google.com)
133 points by matt2000 on Aug 31, 2018 | hide | past | favorite | 44 comments



It's not that difficult, just a bit annoying, to get the full Ethereum and Bitcoin blockchains running locally and set up database-style access to them. You will probably spend more time waiting for the download to finish than actually writing code. Bitcoin is a little harder than Ethereum because of format changes, which might require using an older version (at least that's what I did).


It's not trivial, that's for sure. And it is not download times that kill you, it is recomputing that is really slow, and you need it if you want full history. You really want to have SSDs. The export itself is quite fast though.


Yes, an SSD is needed for sure. I just called it a download because that's when bitcoind/geth do their validation/sync.


Sure, just wanted to make it clear, since download of that 1TB (?) could be much faster.


But the problem is that using BigQuery is not cheap even when it is public.


And you still have to implement your own "verify against actual blockchain node" function yourself before actually trusting any query results.
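A rough sketch of what such a spot check could look like, in Python. It uses the standard `eth_getBlockByNumber` JSON-RPC method; the row field names (`number`, `hash`, `transaction_count`) are an assumption about the shape of an exported block row, and `rpc_url` is whatever node you happen to trust.

```python
import json
import urllib.request


def build_block_request(number: int) -> dict:
    """JSON-RPC payload asking a node for one block (transaction hashes only)."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getBlockByNumber",
        "params": [hex(number), False],
    }


def fetch_block(rpc_url: str, number: int) -> dict:
    """Fetch a block from a node you control. rpc_url is supplied by the caller."""
    payload = json.dumps(build_block_request(number)).encode()
    req = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]


def row_matches_node(row: dict, node_block: dict) -> bool:
    """Compare an exported row with the node's answer.

    The row keys used here are assumed, not taken from the dataset schema.
    """
    return (
        int(node_block["number"], 16) == row["number"]
        and node_block["hash"].lower() == row["hash"].lower()
        and len(node_block["transactions"]) == row["transaction_count"]
    )
```

Checking a random sample of blocks this way does not prove the whole dataset is right, but it catches gross export errors cheaply.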


Is the purpose not more for analysis than validation?


I don't follow cryptocurrency news, but the example query of cost transferred sums is interesting. What the heck happened around July 2nd?


There was a Chinese decentralized exchange that (foolishly) decided they would include ERC20 coins for trading according to which had the most transfers from unique accounts to the exchange (I believe?). This incentivised individuals who wanted their coins included to distribute them among as many sock puppet accounts as possible and transfer them separately to the exchange. The resulting transaction traffic had a non-negligible impact on network capacity. In cases of load like that (or many individuals with strong incentives for their transactions to get in) people pay higher fees to increase the probability that their transaction is included in the next block.


A bunch of hungover Canadians?


In case anyone is curious about the size of the Ethereum blockchain (which the linked article doesn't even mention but seems to inevitably come up in comments), I took some measurements recently: https://twitter.com/shazow/status/1004506114392838146

tl;dr: ~65GB of bandwidth is required to download all the necessary data. The default node indexing and denormalization of this data takes around 100GB after compacting.


For practical interaction with the network this can be drastically reduced by using light sync, of course.


Wonder how they decide whether or not a contract is ERC20 or ERC721 compliant to set the `is_erc20` and `is_erc721` values in the `contracts` table.


Probably by checking if contract supports the mandatory functions.

(I did some eth blockchain analytics a while ago)
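One way that kind of check might work, as a heuristic sketch only (not necessarily what the dataset does): look in the deployed bytecode for a `PUSH4` of each mandatory ERC20 function selector. The selectors below are the well-known 4-byte values for the standard signatures; the function name `looks_like_erc20` is made up for illustration.

```python
# First 4 bytes of keccak256(signature) for the mandatory ERC20 functions.
ERC20_SELECTORS = {
    "totalSupply()": "18160ddd",
    "balanceOf(address)": "70a08231",
    "transfer(address,uint256)": "a9059cbb",
    "transferFrom(address,address,uint256)": "23b872dd",
    "approve(address,uint256)": "095ea7b3",
    "allowance(address,address)": "dd62ed3e",
}

PUSH4 = b"\x63"  # EVM opcode that pushes the next 4 bytes onto the stack


def looks_like_erc20(bytecode_hex: str) -> bool:
    """Return True if every mandatory selector appears after a PUSH4 opcode."""
    code = bytecode_hex.lower()
    if code.startswith("0x"):
        code = code[2:]
    raw = bytes.fromhex(code)
    return all(
        PUSH4 + bytes.fromhex(selector) in raw
        for selector in ERC20_SELECTORS.values()
    )
```

This can misfire in both directions: proxy contracts that delegate every call will look non-compliant, and a contract can contain the right selectors without implementing the expected behaviour.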


Reads like a good April Fools' Day joke.


haha, the size of the Ethereum blockchain is making it centralized. Fewer and fewer nodes can download and keep up with it.


>haha, the size of the Ethereum blockchain is making it centralized

This is to be expected. Centralization and hierarchies are the fundamental ways to deal with complexity, which is why all non-hierarchical systems are doomed if they grow too large. That's as true for physical or biological systems as it is for markets and currencies, which is why the whole decentralised crypto dream is a fool's errand.


Crude statements like this paper over so much of the reasoning and detail around the purpose of blockchains. It's so much more than a single dimensional configuration slider between centralized and decentralized.


I don't mean this as an attack or to be argumentative but could you elaborate on why it's so much more than a single dimensional configuration slider?

A comment with a statement or assertion and no reasoning is very difficult to discuss.


I could create a long list, but to start, it's not directly controlled by a single individual or organization. The mechanism used to assert leverage is different than typical centralized systems. It's not a contract or a special governmentally assigned right or physical intimidation that "centralizes" the network. It's simply computing resources. But unlike other centralized systems, anyone anywhere with enough of those resources could assert the same leverage on the network as anyone else. Furthermore, because of its distributed nature, even if someone had a majority of computing resources mining the network, the protocol is designed to disincentivize participants from cheating as it would destroy the value of the network. And they couldn't even tamper with the ledger, as public key cryptography is used to sign transactions, so they could only stop transactions from happening. It is also fully visible and transparent.

This is fundamentally different from your typical centralized database sitting behind physical walls with physical security and governmental protection that a single entity could mutate and obfuscate at will.


Technically the Ethereum network is majority owned by Vitalik Buterin and the folks who paid 0.50 USD (likely much less, if paid in BTC mined earlier).

72,009,99 of the 101,684,297 Ether means the Ethereum network has a worse Gini coefficient than North Korea or really any fiat-based economy.

See also: "Quantifying decentralization"

https://news.earn.com/quantifying-decentralization-e39db233c...
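For anyone who wants to check a Gini claim against actual balances, here is a minimal sketch of the standard formula in Python. The input is just a list of non-negative holdings; where you get the balances from (and how you group addresses into owners) is left open and matters a lot for the result.

```python
def gini(balances):
    """Gini coefficient of non-negative holdings: 0 = equal, near 1 = concentrated."""
    values = sorted(balances)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero balance")
    # Rank-weighted sum over the sorted values, with ranks starting at 1.
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

With four holders where one owns everything, `gini([0, 0, 0, 1])` gives 0.75, and with equal balances it gives 0.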


>likely much less, if paid in BTC mined earlier).

You realize that bitcoin is money, right?

It's like saying "You can afford to sell me your house for $10k today, after all you only bought it for $5k back in 1910".

When the BTC was spent at the time of the ETH crowdsale... the person doing the buying compared options of:

A) Selling BTC at prevailing prices

B) Doing nothing

C) Spending BTC on ETH at prevailing prices

Another analogy:

Because you bought a car today for 30k, funded by the 150 shares of AAPL you sold just now... means you only really paid 3k for that car because you bought AAPL shares cheaper in the 1990s.

I'm just pointing out the absurdity of the statement and the significant misunderstanding about the nature of capital.

Then going on to talk about the Gini coefficient as though it means anything at all seals the deal that you are out of your depth and just trying to sound smart.


You realize Bitcoin/Ethereum are simply computer software right?

And you missed the essential caveat: approximately 4.11% of Bitcoin addresses control 96.53% of all BTC in circulation. This is a conservative estimate, as anyone familiar with how Bitcoin addresses and wallets work would know that one user is likely controlling many addresses.

A perfect example of this: a few days ago 1933phfhK3ZgFQNLGSDXvqCn32k2buXY8a created a script to subdivide 111,114 BTC into several hundred addresses, from 60,000 / 30,000 / 20,000 / 10,000 / 5,000 / 500 and then to 100 BTC accounts, over the course of a few hours. [2] Following the movement from there now leads to recent deposits into the Binance and Bitfinex wallets.

And most importantly, on the computer science behind Ethereum: in any DLT network with an adversarial threat model it's impossible to create a smart contract with any functionality relying on external data inputs (betting on the outcome of a sports game, or tracking any real-world data input) within the network, as there's no way to validate the authenticity of that data unless a trusted 3rd party is designated, at which point the network becomes useless. Not to mention the question of why anyone would want to use a token with such a wildly fluctuating market price, and whose supply is controlled by a small userbase of oligarchs.

Additionally, the entire cryptocoin market has an Achilles' heel: Tether and Bitfinex are widely suspected of counterfeiting approximately $4,000,000,000 USD (by producing USDT for free anytime they want) [3] [4] [5]

[1] https://www.sec.gov/rules/sro/cboebzx/2018/34-83520.pdf

[2] https://www.reddit.com/r/Bitcoin/comments/9bfnff/near_1b_are...

[3] https://medium.com/@bitfinexed/latest

[4] https://blog.chainalysis.com/reports/tether-aug

[5] https://www.bloomberg.com/news/articles/2018-08-24/not-even-...


I think "centralization" is pretty ambiguous. Is Bitcoin centralized? Full nodes can be run on home PCs, so verification is decentralized, but only ASIC mining is profitable, so mining has become highly centralized. The Lightning Network adds additional roles, like payment hubs and watchtowers, with varying degrees of centralization.

So we can't characterize Bitcoin as a whole as centralized or decentralized. We should talk about specific roles, and the various attacks that could be performed by coordinated members within each role. Like we could talk about the possibility of Bitmain coordinating a 51% double spend attack, or the possibility of a major LN hub bribing watchtowers to ignore foul play.


How are markets centralized, exactly? If centralization and hierarchies were the answer to all complexity, then the feudal system would still be relevant.

Markets are complex, organic things, with a lot of players.


Really? Funny to think we are anything but feudal. Markets may have a large number of players, but the number of players that matter is much smaller.


I'm not even sure what you mean by that, but if you think we have anything remotely resembling a feudal economy, I invite you to do more research into history.


That's why planned economies work so well, right?


Centrally planned economies do not work well, but the focus here lies on the word 'central', not 'planned'. All economies are planned to a significant degree, albeit in corporations (which internally do plan rather than rely on market mechanisms). The equivalent of cryptocurrencies in the economic sphere would be turning every company into an army of independent individual contractors whose only tool is the legal contract. This would lead to unmanageable complexity and overhead that would be unacceptable and very soon end in people naturally organizing in competent hierarchies, producing the dreaded institutions and middlemen that cryptocurrency tries to do away with.

Read for this very question Coase's essay, The Nature of the Firm.


I've read Coase. The equivalent of cryptocurrencies in the economic sphere would be exactly what cryptocurrencies are: partially decentralized. Firms form due to the transaction costs of decentralization. Mining pools form due to the transaction costs of decentralization.


> Mining pools form due to the transaction costs of decentralization.

Better explained as a result of the mining protocol being winner-takes-all: whoever wins is granted write access authority (for guessing an increasingly large random number). By pooling everyone's lottery tickets together, participants agree to let the pool operator take a small sum of each successful guess from the pool (assuming the pool fairly pays out participants).

"decentralized" except not.


Variance is a transaction cost.
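To put a number on that, a back-of-the-envelope sketch in Python. It assumes each block is an independent lottery that a miner wins with probability equal to their hashrate share, so blocks found over a period follow a binomial distribution. The shares and block count in the usage note are made-up illustration values.

```python
import math


def relative_std(share: float, blocks: int) -> float:
    """Standard deviation of blocks found, divided by the expected number found.

    share:  fraction of total hashrate (0 < share <= 1)
    blocks: number of blocks mined by the whole network in the period
    """
    if not 0 < share <= 1 or blocks <= 0:
        raise ValueError("share must be in (0, 1] and blocks must be positive")
    # Binomial(blocks, share): mean = blocks * share,
    # variance = blocks * share * (1 - share).
    return math.sqrt((1 - share) / (blocks * share))
```

Comparing `relative_std(1e-6, 200_000)` with `relative_std(0.1, 200_000)` shows how much noisier a tiny solo miner's income is than a large pool's. Under a proportional payout (ignoring fees) a pool member inherits the pool's lower relative variance, which is what they pay the operator for.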


Obviously there is a happy gray between completely black and completely white.


Fewer full nodes can download and keep up. But lightweight Ethereum clients[1] shouldn’t have an issue.

[1] https://github.com/ethereum/wiki/wiki/Light-client-protocol


People running light clients should also be running their own full nodes, otherwise the shortage of peers will only get worse https://github.com/ethereum/go-ethereum/issues/15454


Ultimately, all full nodes will also be light clients--that is the nature of the upcoming sharding changes.

It does not make sense to have a full node for every light client. A full node is perfectly capable of safely serving hundreds or thousands of light clients.

Also there are several initiatives towards incentivizing more light clients. I am working on one: https://vipnode.org/


I wasn't implying a full node for every light client, of course that doesn't make sense. People using light clients in their products, like a wallet or DApp, are probably adding tens to thousands of light clients. When you add that load to the network, you should also deploy a few full nodes, and probably add them as static peers if you want the application to work well.


> Also there are several initiatives towards incentivizing more light clients. I am working on one: https://vipnode.org/

This looks cool, I may have to give it a try!


This is why the Bitcoin blockchain is more concerned with keeping block sizes as small and efficient as possible than with adding fancy smart contract features that just add bloat to the blockchain.


Ethereum's block sizes are quite small, about 25kb per block right now.

Actually, block sizes are not limited by bytes like in bitcoin, but by 'gas'. This gas limit can be dynamically adjusted by miner votes, and the way the incentives work, it keeps the blocks not too big, but also not too small.

One feature of Ethereum is that it automatically discourages mining centralization using the 'uncle rewards' system. When the blocks increase in size (and thus put pressure on centralization as you noted), the uncle rate increases too, which is undesirable for miner profits. If the uncle rate gets too high, it is in miners' interest to vote down the block 'gas' limit, which ensures all the blocks can propagate around the network fairly.
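A simplified sketch of how the gas limit "vote" plays out per block, in Python. As I understand the protocol rule, each block's gas limit may differ from its parent's by less than 1/1024 of the parent's limit, with a floor of 5000; the miner's preferred value (`miner_target` here, a name made up for this example) is their vote. Real clients use more involved logic to pick the target, so treat this as an illustration only.

```python
GAS_LIMIT_BOUND_DIVISOR = 1024  # max per-block change is parent_limit / 1024
MIN_GAS_LIMIT = 5000            # protocol floor for the block gas limit


def next_gas_limit(parent_limit: int, miner_target: int) -> int:
    """Move the gas limit toward the miner's target by at most the allowed step."""
    # The change must be strictly less than parent_limit // 1024.
    max_step = max(parent_limit // GAS_LIMIT_BOUND_DIVISOR - 1, 0)
    delta = miner_target - parent_limit
    delta = max(-max_step, min(max_step, delta))
    return max(parent_limit + delta, MIN_GAS_LIMIT)
```

Because each block can only nudge the limit by about 0.1%, a sustained change needs many consecutive blocks voting the same way, which is why it behaves like a vote weighted by hashrate.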


I didn't even know this. That's a great example of how interesting you can get with the incentives system, makes me so excited for the future.


Compare the size of the Bitcoin blockchain and the Ethereum blockchain.

Now see when Bitcoin was launched and see when Ethereum was launched.

Facts speak for themselves.


The public access to application data showcased here seems to strengthen my suspicion that Ethereum will spawn an era of machine learning innovation. It seems analogous to what happened with TCP/IP: TCP/IP lets everyone connect 1-1, while Ethereum lets everyone access and analyze entire application datasets. Of course it is possible to provide universal access to an application dataset with TCP/IP, but this is costly (effort, money) and incentives don't always align; organizations of today often work to prevent access to application datasets. The Ethereum future, on the other hand, is exciting: organizations having an incentive to share application datasets, making money with innovative analysis / applications of the datasets.


The incentive you speak of to share application datasets - is that a Layer 2 application on Ethereum where a token would be used for accessing data? Intriguing idea but not sure I completely follow what you are describing.



