On Thu, Jul 29, 2021 at 1:49 PM Rob Clark robdclark@gmail.com wrote:
On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly caleb.connolly@linaro.org wrote:
On 29/07/2021 21:24, Rob Clark wrote:
On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly caleb.connolly@linaro.org wrote:
Hi Rob,
I've done some more testing! It looks like before that patch ("drm/msm: Devfreq tuning") the GPU would never get above the second frequency in the OPP table (342MHz) (at least, not in glxgears). With the patch applied it would more aggressively jump up to the max frequency which seems to be unstable at the default regulator voltages.
*ohh*, yeah, ok, that would explain it
Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 0.988v (instead of the stock 0.516v) makes the GPU stable at the higher frequencies.
Applying this patch reverts the behaviour, and the GPU never goes above 342MHz in glxgears, losing ~30% performance in glxgears.
I think (?) that enabling CPR support would be the proper solution to this - that would ensure that the regulators run at the voltage the hardware needs to be stable.
Is hacking the voltage higher (although ideally not quite that high) an acceptable short term solution until we have CPR? Or would it be safer to just not make use of the higher frequencies on a630 for now?
tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is already on CC and I added sboyd, maybe one of them knows better.
In the short term, removing the higher problematic OPPs from dts might be a better option than this patch (which I'm dropping), since there is nothing stopping other workloads from hitting higher OPPs.
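A rough, untested sketch of what that could look like in a board-level dts. The `gpu_opp_table` label and the `opp-<hz>` node names are assumptions about how sdm845.dtsi names things, and which OPPs actually need to go has not been established in this thread, so the two nodes below are placeholders:

```dts
/* Hypothetical board-level override: drop the top GPU OPPs so devfreq
 * can never select them. Label and node names are assumed, not checked
 * against the tree. */
&gpu_opp_table {
	/delete-node/ opp-710000000;
	/delete-node/ opp-675000000;
};
```

Doing this per board keeps the higher OPPs available on devices that have not shown the instability.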
Oh yeah that sounds like a more sensible workaround than mine.
I'm slightly curious why I didn't have problems at higher OPPs on my c630 laptop (sdm850)
Perhaps you won the silicon lottery - iirc sdm850 is binned for higher clocks out of the factory.
Would it be best to drop the OPPs for all devices? Or just those affected? I guess it's possible another c630 might crash where yours doesn't?
I've not heard any reports of similar issues from the handful of other folks with c630's on #aarch64-laptops.. but I can't really say if that is luck or not.
Maybe just remove it for affected devices? But I'll defer to Bjorn.
Just as another datapoint, I was just marveling at how suddenly smooth the UI was performing on db845c and Caleb pointed me at the "drm/msm: Devfreq tuning" patch as the likely cause of the improvement, and mid-discussion my board crashed into USB crash mode:

[ 146.157696][ C0] adreno 5000000.gpu: CP | AHB bus error
[ 146.163303][ C0] adreno 5000000.gpu: CP | AHB bus error
[ 146.168837][ C0] adreno 5000000.gpu: RBBM | ATB bus overflow
[ 146.174960][ C0] adreno 5000000.gpu: CP | HW fault | status=0x00000000
[ 146.181917][ C0] adreno 5000000.gpu: CP | AHB bus error
[ 146.187547][ C0] adreno 5000000.gpu: CP illegal instruction error
[ 146.194009][ C0] adreno 5000000.gpu: CP | AHB bus error
[ 146.308909][ T9] Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
[ 146.317150][ T9] Modules linked in:
[ 146.320941][ T9] CPU: 3 PID: 9 Comm: kworker/u16:1 Tainted: G W 5.14.0-mainline-06795-g42b258c2275c #24
[ 146.331974][ T9] Hardware name: Thundercomm Dragonboar
Format: Log Type - Time(microsec) - Message - Optional Info
Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
S - QC_IMAGE_VERSION_STRING=BOOT.XF.2.0-00371-SDM845LZB-1
S - IMAGE_VARIANT_STRING=SDM845LA
S - OEM_IMAGE_VERSION_STRING=TSBJ-FA-PC-02170
So Caleb sent me to this thread. :)
I'm still trying to trip it again, but it does seem like db845c is also seeing some stability issues with Linus' HEAD.
thanks -john