For a lot of details about FFTW checkout my interview RCE 73 of the FFTW authors.
FFTW is probably the most popular DFT library ever. What many don't realize is that there are many ways to calculate an FFT depending on the number of samples. FFTW addresses this by creating a plan and using dynamic code generation to try to quickly calculate an optimal method.
What is not realized is that the number of samples should be kept multiples of small primes, 2,3,5 etc. Calculating an FFT where the number of samples is a single large prime is the worse case.
So in this sample code going from 999983, a prime number, to 999984, who's prime factors are 2 x 2 x 2 x 2 x 3 x 83 x 251, resulted in a performance improvement of about 3x.
Performance isn't always simple.