Web archiving is a solved problem: you record the website in an interactive environment [1], and everything that happens on-screen is saved in a single file in an open [2] and standardized [3] file format authored by the Internet Archive and endorsed by the Library of Congress for preservation [4].
You can store the resulting WARC file wherever, be it on S3 or under your pillow.
As an archivist, I urge everybody here not to reinvent the wheel, please.
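To show why a WARC file is easy to keep and move around, here is a minimal stdlib-only sketch of a single record, assuming WARC/1.1 and an uncompressed file. A record is a version line, a block of plain-text headers, a blank line, the captured bytes, and two trailing CRLFs. The function names are hypothetical, and real tools such as Webrecorder [1] write these records for you (typically full request/response pairs, often gzip-compressed), so use them rather than this.

```python
import uuid
from datetime import datetime, timezone


def make_warc_resource_record(target_uri: str, payload: bytes,
                              content_type: str = "text/html") -> bytes:
    """Hypothetical helper: build one uncompressed WARC/1.1 'resource' record."""
    headers = [
        ("WARC-Type", "resource"),
        # Every record needs a globally unique ID; a UUID URN is a common choice.
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # Capture time in UTC, ISO 8601 form.
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        # Length of the content block in bytes, not characters.
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.1\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # The content block is followed by two CRLFs that end the record.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def parse_warc_headers(record: bytes) -> dict[str, str]:
    """Hypothetical helper: read the named header fields of one record."""
    head, _, _ = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    # Skip the "WARC/1.1" version line; split each field at the first ": ".
    return dict(line.split(": ", 1) for line in lines[1:])
```

Appending such records to a file opened in binary mode gives you a WARC file, which you can then store wherever you like.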
The technical side is a solved problem; the legal side, not so much.
You practically cannot preserve something and make it available if the copyright holders don't want you to. If the copyright situation is complicated, you bear the risk.
You could say that this is how it is supposed to be, but that is not how it works in the non-digital realm. You could argue that we need something like digital monument protection, where artifacts can be preserved against the copyright holders' will.
[1] https://webrecorder.net/
[2] https://en.m.wikipedia.org/wiki/WARC_(file_format)
[3] https://www.iso.org/standard/68004.html
[4] https://www.loc.gov/preservation/digital/formats/fdd/fdd0002...