trntensor v0.12.0: the last NotImplementedError — completing the sharding contract
v0.12.0 closes the last NotImplementedError on the CPU-testable sharding surface. The _execute_sharded function could already handle two kinds of sharded operand: output-parallel (the sharding dimension maps to an output index) and reduce-parallel (the sharding dimension maps to a contracted index). A single einsum containing both kinds, however, raised NotImplementedError("Mixed…"). v0.12.0 replaces that raise with a nested dispatch loop. The architectural story is that Trainium's multi-chip topology named this loop structure before the code existed to implement it.
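To make the mixed case concrete, here is a minimal NumPy sketch of one way a nested dispatch loop can combine the two kinds of sharding. It is not trntensor's code: the function name mixed_sharded_matmul, the fixed "ij,jk->ik" contraction, and the representation of shards as plain lists of arrays are all assumptions made for illustration. The outer loop walks the output-parallel shards (whose results are concatenated) and the inner loop walks the reduce-parallel shards (whose partial results are summed).

```python
import numpy as np


def mixed_sharded_matmul(a_shards, b_shards):
    """Hypothetical sketch of "ij,jk->ik" with mixed sharding.

    a_shards: A split along i, an output index   (output-parallel)
    b_shards: B split along j, a contracted index (reduce-parallel)
    """
    out_blocks = []
    for a in a_shards:  # outer loop: one output block per output-parallel shard
        acc = None
        j_start = 0
        for b in b_shards:  # inner loop: accumulate over reduce-parallel shards
            j_stop = j_start + b.shape[0]
            # Slice A's columns so they line up with this shard of B's rows.
            partial = np.einsum("ij,jk->ik", a[:, j_start:j_stop], b)
            acc = partial if acc is None else acc + partial
            j_start = j_stop
        out_blocks.append(acc)
    # Output-parallel results are stitched together along the sharded output index.
    return np.concatenate(out_blocks, axis=0)


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 8))
B = rng.standard_normal((8, 5))

a_shards = np.array_split(A, 3, axis=0)  # 3 shards along i
b_shards = np.array_split(B, 2, axis=0)  # 2 shards along j

result = mixed_sharded_matmul(a_shards, b_shards)
reference = np.einsum("ij,jk->ik", A, B)
```

In this sketch the sharded result agrees with the unsharded einsum up to floating-point rounding. The real _execute_sharded may order the loops differently or reduce across devices rather than in a Python accumulator.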