Pandoc + Zotero + better-bibtex user here. This is indeed nice, thanks. One possible problem I see (but I am not sure what could be done to solve it) is that often metadata are incomplete or partially wrong, and I need to fix the resulting bibtex entry by hand. This would be difficult with your tool, I think. Have you thought about this?
Each citation is added to a formatted JSON file when it is first seen; after that it stays the same. This means you can fix the citation data by hand by just editing the JSON file. You should include the JSON file in your version control. An example is here [1].
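For illustration, a hand-fixed cache entry might look roughly like this. The outer structure (keyed by source URL) is my assumption about how the tool lays out its cache; the inner fields follow the standard CSL-JSON item schema, which is what CSL citation data looks like generally:

```json
{
  "https://example.com/some-paper": {
    "csl": {
      "id": "doe2019",
      "type": "article-journal",
      "title": "Hand-Corrected Title Goes Here",
      "author": [{ "family": "Doe", "given": "Jane" }],
      "issued": { "date-parts": [[2019]] }
    }
  }
}
```

Since the file never changes after the initial fetch, an edit like fixing the title or author list sticks, and the diff shows up in version control like any other change.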
Of course this might not be as comfortable as a GUI editor like Zotero, but I think I still prefer making small tweaks to a JSON file over the problems I've had in the past with Zotero's exporting. The auto-export didn't always work consistently, and Zotero has many features I don't need at all that make the whole thing more complex and fragile. Even collections are somewhat confusing to work with - I've often added citations to the wrong collection because it always adds to the currently open one, and then had to spend time finding them again. You also can't tell which references are actually used and which aren't.
And I pretty much always gave up as soon as someone else needed to work on the same paper, which led to managing the .bib file by hand anyway.
Thanks for this. Just a quick follow-up: if I use the same reference in different papers, do I need to make those changes we were discussing in each per-paper JSON file, or is there a way to make them once and for all?
Well, you can use the same JSON file for multiple documents.
By default it will be stored in the directory from which pandoc is invoked, but you can also change that by setting `url2cite-cache: filename` in the markdown frontmatter or in the pandoc invocation: `pandoc -M url2cite-cache=../bla.json --filter ...`.
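So a shared setup could look something like this in the markdown frontmatter, pointing several papers at one cache (the path and filename here are just an illustration):

```markdown
---
title: My Paper
url2cite-cache: ../shared/citations.json
---
```

Any manual fix you make to `../shared/citations.json` then applies to every document that points at it, and the file can live in one version-controlled location.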
> All citation data is cached (permanently) as bibtex as well as CSL to citation-cache.json. This is both to improve performance and to make sure references stay the same forever after the initial fetch, as well as to avoid problems if the API might be down in the future. This also means that errors in the citation data can be fixed manually, although if you find you need to do a lot of manual tweaking you might again be better off with Zotero.
What happens if the link content dies? Zotero pulls down a full-text copy of the document and saves it, IIRC, and I no longer trust URLs on the web to stay up for any period of time (or to remain immutable at that address).
That's one problem with relying on automatically extracted information - sometimes it's not really what you want. In this case that's just what GitHub puts in the og:description tag for large(?) repos, probably to make it appear that way on Google. Of course I could fix it for this instance, but then it wouldn't accurately represent what you can expect...
The relevant code that extracts this is a Zotero Translator [1], so that's what would have to be changed to fix it.
The floating footnotes need to be way more opaque; they're unreadable as is.
Reverse citations would also be an interesting feature to have - on gwern.net, I use https://ricon.dev/ to provide reverse citation links, which search on the DOI (if specified) or the title (if there's no DOI) of links.