pxx on June 9, 2021 | on: Text Classification by Data Compression
Aren't the block sizes too small? gzip uses 64k block sizes and it seems like the compressed sizes are several times larger.
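A rough sketch of the measurement being questioned here, assuming the usual compression-based setup the reply below alludes to (test text appended to the very end of each class corpus, classes compared by how much the compressed size grows). The corpus dictionary, function names, and scoring rule are illustrative assumptions, not the article's exact code:

    import gzip

    def gz_size(data: bytes) -> int:
        """Length of the gzip-compressed byte string."""
        return len(gzip.compress(data))

    def append_score(corpus: str, test: str) -> int:
        """Extra bytes needed to encode the test text after the corpus.

        Smaller is better: if the test text resembles the corpus, gzip can
        reuse recent matches, so the compressed size grows less.
        """
        corpus_b = corpus.encode("utf-8")
        test_b = test.encode("utf-8")
        return gz_size(corpus_b + test_b) - gz_size(corpus_b)

    def classify(corpora: dict, test: str) -> str:
        """Pick the class whose corpus yields the smallest size increase."""
        return min(corpora, key=lambda label: append_score(corpora[label], test))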
w-m on June 9, 2021
How about interleaving the test data then, instead of appending it to the very end? For gzip, if the block size is 64k (another comment says 32k?), split the corpus text into 32k blocks, and interleave it with 32k blocks of the test set.
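A rough sketch of that interleaving idea, assuming 32 KiB chunks and a scoring rule analogous to the append-and-compare one above (both are assumptions, not something the article specifies). The motivation is that DEFLATE back-references only reach 32 KiB back, so placing each test chunk right after corpus material keeps that corpus inside the window the compressor can actually use:

    import gzip
    from itertools import zip_longest

    BLOCK = 32 * 1024  # 32 KiB chunks, matching DEFLATE's match window

    def chunks(data: bytes, size: int = BLOCK) -> list:
        """Split a byte string into fixed-size chunks."""
        return [data[i:i + size] for i in range(0, len(data), size)]

    def interleave(corpus_b: bytes, test_b: bytes) -> bytes:
        """Alternate 32 KiB chunks of corpus and test text; whichever
        runs out first is padded with empty chunks."""
        pairs = zip_longest(chunks(corpus_b), chunks(test_b), fillvalue=b"")
        return b"".join(c + t for c, t in pairs)

    def interleaved_score(corpus: str, test: str) -> int:
        """Size increase from weaving the test text into the corpus.

        The same test text is woven into every class corpus, so a smaller
        increase means that corpus context helped gzip more.
        """
        corpus_b = corpus.encode("utf-8")
        test_b = test.encode("utf-8")
        return (len(gzip.compress(interleave(corpus_b, test_b)))
                - len(gzip.compress(corpus_b)))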