Turns out I spoke too soon. The GPU still hangs, but Linux is better at recovering. There are still "GPU hang (ring 0 stalled for more than ...)" messages in dmesg.