You gotta be in the in-crowd to understand that this paper, like so many others, is one of those dumb post hoc analogy/metaphor papers. These are the papers where the authors just ran a bunch of experiments (i.e., just ran the training script over and over) and formulated a hypothesis empirically. Of course, in order to lend the hypothesis some credibility, they have to make an allusion to something formal/mathematical:
> Fourier features -- dimensions in the hidden state that represent numbers via a set of features sparse in the frequency domain
Brilliant and very rigorous!