It does a Fourier analysis of sections of the song, and puts the results in a database. A Fourier analysis yields what frequencies make up a waveform along with their amplitudes, so it is very compact.
Taking the DTFT of a signal yields exactly the same amount of information, so it's not really more compact. Shazam used a spectrogram (which contains more data than the original signal) and searched it for peaks to create a fingerprint.
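To make the information-content point concrete: the DFT (the finite, sampled version of the DTFT that actually gets computed) turns N real samples into N complex values, and the inverse DFT gives the samples back exactly, so nothing is discarded. A spectrogram built from overlapping frames has even more numbers than the signal has samples. This is a minimal pure-Python sketch; the naive O(N²) DFT, the test signal, and the window/hop sizes are just illustrative choices.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT: recovers the original samples from the spectrum."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def spectrogram_size(num_samples, window, hop):
    """Independent real values in a spectrogram of overlapping frames.

    Each frame of `window` real samples has a DFT fixed by `window` independent
    real numbers (conjugate symmetry), so hop < window means more values than samples.
    """
    frames = 1 + (num_samples - window) // hop
    return frames * window

# Illustrative test signal: two sinusoids, 64 samples.
signal = [math.sin(2 * math.pi * 3 * t / 64) + 0.5 * math.sin(2 * math.pi * 10 * t / 64)
          for t in range(64)]

spectrum = dft(signal)
recovered = idft(spectrum)
max_error = max(abs(r.real - s) for r, s in zip(recovered, signal))

print(len(signal), len(spectrum))              # 64 64
print(max_error < 1e-9)                        # True
print(spectrogram_size(64, window=16, hop=8))  # 112
```

The round trip only shows the transform is lossless; any compactness has to come from what you later choose to keep.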
It's not the analysis that is compact, but the fingerprint derived from it.
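Roughly how a compact fingerprint can be derived from a spectrogram, loosely following the peak-pairing idea that has been publicly described for Shazam. This is a simplified sketch, not Shazam's actual code: keeping only the single loudest bin per frame, the window/hop sizes, and the `fan_out` pairing parameter are all my assumptions.

```python
import cmath
import math

def magnitude_spectrogram(samples, window=64, hop=32):
    """Magnitudes of the non-negative-frequency DFT bins of each Hann-windowed frame."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / window) for i in range(window)]
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = [samples[start + i] * hann[i] for i in range(window)]
        bins = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / window)
                        for t in range(window)))
                for k in range(window // 2 + 1)]
        frames.append(bins)
    return frames

def peaks(spectrogram):
    """Simplified peak picking: the loudest bin of each frame, as (frame, bin)."""
    return [(i, max(range(len(bins)), key=bins.__getitem__))
            for i, bins in enumerate(spectrogram)]

def fingerprint(peak_list, fan_out=3):
    """Pair each peak with the next `fan_out` peaks: ((bin1, bin2, frame_delta), anchor_frame)."""
    hashes = []
    for i, (t1, f1) in enumerate(peak_list):
        for t2, f2 in peak_list[i + 1 : i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes

# Illustrative signal: a tone at 8 cycles per 64 samples, switching to 20 halfway.
signal = [math.sin(2 * math.pi * (8 if t < 256 else 20) * t / 64) for t in range(512)]

peak_list = peaks(magnitude_spectrogram(signal))
hashes = fingerprint(peak_list)

print(peak_list[0], peak_list[-1])  # (0, 8) (14, 20)
print(len(hashes))                  # 39
```

The 512 samples shrink to 39 small hashes, and each hash is a relationship between two peaks rather than raw spectral data, which is what makes lookup in a database practical.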
I know it contains the same information, but it makes it easy to discard the low-amplitude frequencies and the frequencies that our ears cannot hear or are not very sensitive to.
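A sketch of the kind of discarding described here: take the spectrum, drop bins that are much quieter than the loudest one, and drop bins outside an audible band. The 10% relative threshold and the 20 Hz to 20 kHz band are assumed values chosen for illustration; they are a crude stand-in for real perceptual weighting, not a model of hearing.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def significant_bins(samples, sample_rate, rel_threshold=0.1, band=(20.0, 20000.0)):
    """Return (frequency_hz, magnitude) for bins that are loud enough and inside `band`.

    Only non-negative frequencies are examined, since for real input the upper
    half of the DFT mirrors the lower half. Everything else is discarded.
    """
    n = len(samples)
    mags = [abs(c) for c in dft(samples)[: n // 2 + 1]]
    loudest = max(mags)
    kept = []
    for k, m in enumerate(mags):
        freq = k * sample_rate / n
        if m >= rel_threshold * loudest and band[0] <= freq <= band[1]:
            kept.append((freq, m))
    return kept

# Illustrative input at 8 kHz: a DC offset (0 Hz, outside the band), a loud
# 1 kHz tone, and a very quiet 2 kHz tone (below the relative threshold).
rate, n = 8000, 256
samples = [0.3
           + math.sin(2 * math.pi * 1000 * t / rate)
           + 0.02 * math.sin(2 * math.pi * 2000 * t / rate)
           for t in range(n)]

kept = significant_bins(samples, rate)
print(len(kept), kept[0][0])  # 1 1000.0
```

Of the 129 non-negative-frequency bins, only the 1 kHz component survives, which is where the compactness actually comes from.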