NKI Backend

The NKI dispatch layer controls whether RNG operations run on the native Trainium GpSimd engine (Philox 4×32) or fall back to torch.Generator.

Backend selection

import trnrand

trnrand.set_backend("auto")     # NKI on Trainium, PyTorch elsewhere (default)
trnrand.set_backend("pytorch")  # force PyTorch fallback
trnrand.set_backend("nki")      # force NKI (requires neuronxcc)

trnrand.HAS_NKI is True when nki>=0.3.0 is importable. trnrand.get_backend() returns the active backend name.
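
As a rough illustration of the three modes above, the resolution logic might look like the sketch below. `resolve_backend` and its `has_nki` argument are hypothetical names invented for this example, not part of the trnrand API, and the sketch assumes NKI availability is the only input to the "auto" decision.

```python
def resolve_backend(requested: str, has_nki: bool) -> str:
    """Map a requested backend name to the backend that would run.

    Hypothetical helper for illustration only; trnrand's real logic may differ.
    """
    if requested not in ("auto", "pytorch", "nki"):
        raise ValueError(f"unknown backend: {requested!r}")
    if requested == "auto":
        # Assumption: "auto" prefers NKI whenever it is importable.
        return "nki" if has_nki else "pytorch"
    if requested == "nki" and not has_nki:
        # Forcing NKI without the toolchain is treated as an error here.
        raise RuntimeError("backend 'nki' requested but NKI is not importable")
    return requested


print(resolve_backend("auto", has_nki=False))
```

With trnrand installed, the equivalent check would use the documented names: call `trnrand.set_backend(...)` and read back `trnrand.get_backend()`.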

Environment variables

Env var	Effect
`TRNRAND_USE_SIMULATOR=1`	Dispatch routes kernels through `nki.simulate(kernel)(numpy_args)` on CPU, bypassing `torch_xla` and the hardware. Use it for fast correctness iteration; no NEFF compile is needed.
`TRNRAND_REQUIRE_NKI=1`	Kernel-path failures re-raise instead of silently falling back to PyTorch. Used by the validation suite to catch silent kernel breakage.
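
The fallback behaviour that `TRNRAND_REQUIRE_NKI` controls can be sketched as a small wrapper. This is a minimal illustration, not the code in trnrand: `dispatch`, `_flag`, and the `env` parameter are hypothetical names, and the kernel and fallback are stand-in callables.

```python
import os


def _flag(name, env=None):
    """Return True when the environment variable is set to "1"."""
    env = os.environ if env is None else env
    return env.get(name, "0") == "1"


def dispatch(kernel_fn, fallback_fn, *args, env=None):
    """Try the kernel path; fall back unless TRNRAND_REQUIRE_NKI=1."""
    try:
        return kernel_fn(*args)
    except Exception:
        if _flag("TRNRAND_REQUIRE_NKI", env):
            raise  # surface the kernel failure instead of hiding it
        return fallback_fn(*args)


def broken_kernel(n):
    raise RuntimeError("kernel failed")  # stand-in for a kernel-path failure


def torch_fallback(n):
    return [0.0] * n  # stand-in for the PyTorch path


print(dispatch(broken_kernel, torch_fallback, 3, env={}))
```

With the flag unset, the failure is swallowed and the fallback result is returned, which is why a validation suite would set it.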

See Developing NKI kernels for the simulator vs hardware workflow.

Philox kernel

The NKI Philox kernel lives in trnrand/nki/dispatch.py. The strategy:

Counter-based: (counter, key) → output, with no shared state across tiles.
Each tile gets a disjoint counter range and runs the multiply-XOR rounds on the GpSimd engine.
The same counter-based design underlies cuRAND's Philox4_32_10 generator and JAX's default Threefry generator.
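
The strategy above can be sketched in plain Python. This is a CPU reference illustration, not the kernel in trnrand/nki/dispatch.py: the round function and constants follow the published Philox 4x32 definition from the Random123 library, while `tile_counters`, `generate_tile`, and the way the block index is packed into the counter are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF
M0, M1 = 0xD2511F53, 0xCD9E8D57  # Philox 4x32 round multipliers
W0, W1 = 0x9E3779B9, 0xBB67AE85  # Weyl constants for the key schedule


def philox4x32(counter, key, rounds=10):
    """Return four 32-bit words for one (counter, key) pair."""
    c0, c1, c2, c3 = counter
    k0, k1 = key
    for r in range(rounds):
        if r > 0:
            # Bump the key between rounds; the first round uses it unchanged.
            k0 = (k0 + W0) & MASK32
            k1 = (k1 + W1) & MASK32
        p0 = M0 * c0  # 32x32 -> 64-bit products
        p1 = M1 * c2
        hi0, lo0 = p0 >> 32, p0 & MASK32
        hi1, lo1 = p1 >> 32, p1 & MASK32
        # Multiply-XOR round: mix high halves with neighbours and the key.
        c0, c1, c2, c3 = hi1 ^ c1 ^ k0, lo1, hi0 ^ c3 ^ k1, lo0
    return (c0, c1, c2, c3)


def tile_counters(tile_index, blocks_per_tile):
    """Counters for one tile: a contiguous range disjoint from other tiles."""
    start = tile_index * blocks_per_tile
    return [
        (n & MASK32, (n >> 32) & MASK32, 0, 0)  # assumed counter packing
        for n in range(start, start + blocks_per_tile)
    ]


def generate_tile(tile_index, blocks_per_tile, key):
    """Generate 4 * blocks_per_tile words for one tile, independent of others."""
    words = []
    for ctr in tile_counters(tile_index, blocks_per_tile):
        words.extend(philox4x32(ctr, key))
    return words


tile0 = generate_tile(0, blocks_per_tile=4, key=(1, 2))
tile1 = generate_tile(1, blocks_per_tile=4, key=(1, 2))
print(len(tile0), len(tile1))
```

Because each output depends only on its own (counter, key) pair, tiles can be generated in any order or in parallel and still reproduce the same stream.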

Status: migrated to the NKI 0.3.0 namespace. The Philox kernel compiles and executes on trn1 hardware, but its output has an algorithmic bug that is under investigation (see #1 / #26). Box-Muller passes in the simulator but hits a trn1 compile restriction (NCC_IBIR605) that does not apply to trn2+. All trnrand.* generation falls back to torch.Generator by default until the kernel ships.