trnfft: the NeuronCore is the unit of parallelism
v0.20 ships multi-NeuronCore batch-split FFT. Dispatching B transforms across C NeuronCores gives each core B/C of them: no shared state, no communication, and linear scaling until the core count reaches the batch size (C = B), after which extra cores have nothing to do. The architecture made this obvious. Getting the dispatch infrastructure right was less so.
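To make the B/C arithmetic concrete, here is a minimal sketch of a batch split in plain Python. It is not trnfft's API: `split_batch`, `naive_dft`, and `batched_fft` are hypothetical names, worker threads stand in for NeuronCores, and a naive O(n²) DFT stands in for the real kernel. The assumption is that each core receives one contiguous slice of the batch and that results are reassembled in the original order.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def split_batch(batch_size, num_cores):
    """Return contiguous (start, stop) ranges, one per core that gets work.

    Hypothetical helper: the first (batch_size % num_cores) cores take one
    extra transform, so chunk sizes differ by at most one.
    """
    base, extra = divmod(batch_size, num_cores)
    ranges, start = [], 0
    for core in range(num_cores):
        size = base + (1 if core < extra else 0)
        if size == 0:
            break  # more cores than transforms: remaining cores stay idle
        ranges.append((start, start + size))
        start += size
    return ranges


def naive_dft(x):
    """Stand-in for the per-core FFT kernel (naive DFT, illustration only)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def batched_fft(batch, num_cores):
    """Transform every signal in `batch`, one chunk per simulated core."""
    ranges = split_batch(len(batch), num_cores)

    def run_chunk(bounds):
        start, stop = bounds
        # Each chunk touches only its own slice: no shared state.
        return [naive_dft(x) for x in batch[start:stop]]

    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        chunks = list(pool.map(run_chunk, ranges))  # map preserves order
    return [y for chunk in chunks for y in chunk]
```

With B = 8 and C = 4, `split_batch(8, 4)` yields four ranges of two transforms each. With B = 3 and C = 8 it yields only three ranges, which is the point where adding cores stops helping.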