I'm working at a company that processes data through multiple distinct stages, and I'm struggling to figure out what tooling to use for versioning and maintaining an auditable history of changes.
I'd be interested to hear about firsthand experiences with how you version production data, use it safely during testing or experimentation, and maintain audit trails.
Data Version Control - https://news.ycombinator.com/item?id=41888937 - Oct 2024 (52 comments)
Data Version Control - https://news.ycombinator.com/item?id=33047634 - Oct 2022 (59 comments)
Oxen.ai: Fast Unstructured Data Version Control - https://news.ycombinator.com/item?id=34831547 - Feb 2023 (63 comments)
Show HN: Oxen.ai – Fast Unstructured Data Version Control - https://news.ycombinator.com/item?id=34825056 - Feb 2023 (5 comments)
Ask HN: How do you version your data? - https://news.ycombinator.com/item?id=13683539 - Feb 2017 (55 comments)
With regard to tooling, https://github.com/pachyderm/pachyderm may satisfy this use case.