I am releasing JFK-TELL, a dataset I generated by extracting text from the scann...

ggm · 2025-04-09T07:23:07

Who is independently error-checking this? Given the depth of conspiracy theorizing around these documents, surely machine-driven OCR with no independent validation is going to "feed the beast" more than intended?

farhanhubble · 2025-04-09T07:54:03

The intention is that it can be used for all kinds of research, including evaluating how well LLM-based OCR works. It shouldn't be too difficult to check the original PDF to substantiate any "fact" anyone wants to quote from the dataset.

I'm more interested in seeing if people can find new insights, questions, or inconsistencies by reviewing the documents at scale, automatically.