Three transformer architectures converge to the same number. Why? | Manifund