Well, to be honest, this is more or less the expected behavior. The GPU crashed in an unrecoverable way, so we ended up in an endless loop, trying to reset the hardware over and over again every time userspace made a command submission. This is unfortunate, but a kernel crash is always possible, and that is exactly why there are usually dedicated microcontrollers for temperature and fan control.