https://bugzilla.kernel.org/show_bug.cgi?id=207673
--- Comment #2 from phileimer (phil@jpmr.org) ---
I can give more information about the over-temperature problem:
* if I keep the 120C limit, the card runs at power level 3 until the driver crashes
* with the limit set to 100C, the driver drops to power level 2 after a small overshoot (the temperature reaches 103/104C)
* once at power level 2, the temperature stabilizes around 96C
* to test further, I decreased the case fan speed, and then, even with the 100C limit, the card continues to run at power level 2 until the driver crashes around 112C
So there seem to be 2 problems:
* the default 120C limit is clearly too high, at least for this board/chip
* the temperature limit is used to go from PWL 3 to PWL 2, but there is no further decrease to a lower PWL (1 or 0) as a safety measure
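
To illustrate the second point, here is a minimal sketch of the kind of stepped fallback I would expect. It is not the actual driver code: the function name, the one-step-per-poll policy and the sample readings are all assumptions made up for illustration. The only things taken from the observations above are the 100C limit and the power levels 0 to 3.

```python
# Hypothetical sketch, not the real driver logic.
# Assumption: power levels run from 0 (lowest) to 3 (highest).
LOWEST_LEVEL = 0


def next_power_level(level: int, temp_c: float, limit_c: float) -> int:
    """Return the power level to use for the next polling interval.

    While the temperature is at or above the limit, keep stepping down
    one level per poll until the lowest level is reached, instead of
    stopping after a single step.
    """
    if temp_c >= limit_c and level > LOWEST_LEVEL:
        return level - 1
    return level


# Made-up sequence of readings for a card that keeps heating up.
level = 3
for temp_c in (104, 108, 112):
    level = next_power_level(level, temp_c, limit_c=100)
    print(f"{temp_c}C -> power level {level}")
```

With a policy like this, a card that stays above the limit would end up at PWL 0 rather than sitting at PWL 2 until the crash.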