At the risk of sending things off in the wrong direction, my first thought is some kind of data caching oddity when reading GRBM_STATUS on POWER hardware. If bit 31 were always 1 and the other bits were behaving normally, then the idea of being stuck at 100% load would make more sense, but bit 31 stuck at 1 with all the other bits stuck at 0 seems really odd.
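
To make the distinction concrete, here is a rough sketch of how a series of GRBM_STATUS readings could be sorted into the two cases described above. It assumes the register is 32 bits wide and that the readings have already been captured somehow; the function name `classify_samples` and the sample values are made up for illustration and are not taken from any driver or tool.

```python
# Hypothetical helper: sorts captured 32-bit register readings into the
# cases discussed above. Nothing here touches real hardware.
BIT31 = 1 << 31
LOW_MASK = BIT31 - 1  # bits 0..30


def classify_samples(samples):
    """Describe how bit 31 and the remaining bits behave across samples."""
    if not samples:
        return "no data"
    if not all(s & BIT31 for s in samples):
        return "bit 31 not stuck at 1"
    # Collect the distinct values seen in bits 0..30.
    low_values = {s & LOW_MASK for s in samples}
    if low_values == {0}:
        # The odd case: bit 31 set, every other bit zero in every reading.
        return "bit 31 set, all other bits zero"
    if len(low_values) == 1:
        return "bit 31 set, other bits constant"
    # The more plausible "really busy" case: other bits keep changing.
    return "bit 31 set, other bits changing"


if __name__ == "__main__":
    # Made-up readings, not real register dumps.
    print(classify_samples([0x80000000, 0x80000000, 0x80000000]))
    print(classify_samples([0x80003028, 0x80000A00, 0x80003028]))
```

If every reading comes back as exactly 0x80000000, that would point more toward a stale or bogus read than toward a genuinely busy GPU.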