Skip to content

trnsolver

trnsolver v0.10.0: Schur decomposition and the architecture of two-stage algorithms

v0.10.0 ships schur(A)A = Q @ T @ Q.T, T quasi-upper-triangular — closing the last open item in trnsolver's factorization table. The implementation uses Householder-QR throughout, and the reason that works on Trainium is the same reason the earlier Jacobi path for eigh eventually had to grow a second stage: two-stage algorithms present uniform-shape kernel calls, and uniform shapes keep the NEFF cache warm.

trnsolver: Jacobi for Trainium — when the hardware inverts the algorithm choice

Symmetric eigh on Trainium wants Jacobi, not Householder-QR — even though Householder has the better asymptotic FLOP count. The inversion doesn't sit in the arithmetic; it sits in the 128-partition Tensor Engine tile and NKI 0.3.0's per-traced-graph compile model. Phase 1 is the simulator-validated correctness gate; hardware-perf numbers land in Phase 3, once the Tensor Engine reformulation and multi-core sharding are in.