Hacker News

> Parquet is the opposite of simple. Even when good libraries are available (which they usually aren't), it is painful to read a Parquet file. Try reading a Parquet file using Java and the Apache Parquet lib, for example.

I skimmed their docs a bit: https://parquet.apache.org/docs/

I would not look forward to implementing that.

It all seems rather complex and, even worse, not all that well described. I suppose all the information is technically there, but it's really not a well-designed, well-written specification that's easy to implement. The documentation seems like an afterthought.

This is probably why good libraries are rare.



Thanks for the link. I couldn't even get past this part:

> Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem

Nope and nope.


Yeah, it all seems very application-specific. I mean, for starters, columnar storage isn't really appropriate for a lot of data. That's perfectly fine! Nothing wrong with any of this. It just means it's not a great candidate for a general, application-agnostic data-exchange format.
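
To make the columnar point concrete, here's a rough sketch in plain Python. This is not Parquet's actual on-disk layout, just the general row-vs-column idea. The records and the helper names (`to_columns`, `get_record`) are made up for illustration, and it assumes every row has the same keys.

```python
# Hypothetical records, purely for illustration.
rows = [
    {"id": 1, "name": "ada", "score": 9.5},
    {"id": 2, "name": "bob", "score": 7.0},
    {"id": 3, "name": "cy", "score": 8.25},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same keys.
    """
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def get_record(columns, index):
    """Rebuild one record, which means touching every column."""
    return {key: values[index] for key, values in columns.items()}


columns = to_columns(rows)

# Aggregating one field only needs a single column list.
total = sum(columns["score"])

# Fetching one whole record needs one lookup per column.
record = get_record(columns, 1)
```

Scanning one field across many records is where the columnar layout helps. If your workload is mostly "give me record N" or "append one record", you pay for the pivot on every access, which is roughly why it's an awkward fit for a generic exchange format.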



