Our "text" isn't actually text, but rather 32-byte bytea values (SHA256 hashes)....

indigo945 · on Aug 30, 2023

Well, hashes aren't strings; they are binary blobs, often represented as hex strings. Storing them inline as bytea may give better performance than moving them all into one humongous lookup table, even though it uses somewhat more disk space (if values really do repeat that often).

derefr · on Aug 30, 2023

I'm not sure you read what I wrote correctly. The technique is called "string interning" regardless of what exactly you're interning. In our case, we have a table assigning IDs to 32-byte bytea values.
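As a rough illustration of the kind of interning table described above, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for Postgres (so the column type is `BLOB` rather than `BYTEA`), and the table name `interned_hashes` and helper names `make_db` and `intern_hash` are hypothetical, not anything from the actual schema.

```python
import hashlib
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical interning table."""
    conn = sqlite3.connect(":memory:")
    # Each distinct 32-byte value is stored once and assigned a small integer ID.
    conn.execute(
        "CREATE TABLE interned_hashes ("
        " id INTEGER PRIMARY KEY,"
        " hash BLOB NOT NULL UNIQUE)"
    )
    return conn


def intern_hash(conn: sqlite3.Connection, digest: bytes) -> int:
    """Return the ID for digest, inserting it first if it is not yet known."""
    # The UNIQUE constraint makes a repeated insert a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO interned_hashes (hash) VALUES (?)", (digest,)
    )
    row = conn.execute(
        "SELECT id FROM interned_hashes WHERE hash = ?", (digest,)
    ).fetchone()
    return row[0]


conn = make_db()
digest = hashlib.sha256(b"example payload").digest()  # 32 raw bytes
first_id = intern_hash(conn, digest)
second_id = intern_hash(conn, digest)  # same value, so same ID
```

Other tables would then reference the integer ID instead of repeating the 32-byte value, which is where the space saving comes from when values repeat often.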

(Also, to be pedantic, a Postgres BYTEA is a string; it's just what a programming language would call a "raw string", i.e. a string of bytes. Postgres TEXT, meanwhile, is a string of characters, required to be valid in a given character encoding. The PG TEXT and BYTEA types are the same ADT with the same set of applicable operations. Of course, this gives the lie to the name "BYTEA" [i.e. "byte array"]: it's definitely not an array ADT of any kind. If it were, then you'd be able to use PG's array operations on it!)
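The byte-string versus character-string distinction has a loose analogue in Python's `bytes` and `str` types, sketched below. This is only an analogy and says nothing about how Postgres implements either type; the helper `is_valid_utf8` is a hypothetical name introduced for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Report whether data could be stored as UTF-8 encoded character text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A byte string accepts any byte values; there is no encoding to satisfy.
raw = bytes.fromhex("ff00fe")

# A character string must be valid in its encoding, so not every byte
# sequence can be turned into one.
raw_is_text = is_valid_utf8(raw)

# Both kinds of string support the same sort of operations:
# concatenation, slicing, length, and substring search.
joined_bytes = b"ab" + b"cd"
joined_text = "ab" + "cd"
```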