https://bugzilla.kernel.org/show_bug.cgi?id=205915
Bug ID: 205915 Summary: AMDGPU: Screen flicker after resume from suspend Product: Drivers Version: 2.5 Kernel Version: 5.5.0-rc2 Hardware: x86-64 OS: Linux Tree: Fedora Status: NEW Severity: normal Priority: P1 Component: Video(DRI - non Intel) Assignee: drivers_video-dri@kernel-bugs.osdl.org Reporter: onil@pm.me Regression: No
After resume from suspend screen flickers, flickering intensifies during mouse movement. reverting to 5.4 fixes the issue.
workaround: changing dpm_force_performance_level to "high" (and optionally afterwards to "low") resolves the issue while "auto" introduces it again.
OS: Fedora 31 on gnome wayland GPU: 5700XT Pulse Monitor: LG 27UK650W on DP 1.4 Mesa: 19.3
no info/error message on syslog.
https://bugzilla.kernel.org/show_bug.cgi?id=205915
Alex Deucher (alexdeucher@gmail.com) changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |alexdeucher@gmail.com
--- Comment #1 from Alex Deucher (alexdeucher@gmail.com) --- Please attach your xorg log (if using X) and dmesg output.
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #2 from onil (onil@pm.me) --- Created attachment 286367 --> https://bugzilla.kernel.org/attachment.cgi?id=286367&action=edit dmesg after resume from suspend
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #3 from onil (onil@pm.me) --- Problem only occurs when dpm_force_performance_level is set to "low" before suspend. if dpm_force_performance_level is "auto" or "high" there is no flickering after resume from suspend.
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #4 from onil (onil@pm.me) --- Please disregard comment #3, flickering still occurs when performance level is "auto", after manually switching it to "low" or "high" flickering stops.
https://bugzilla.kernel.org/show_bug.cgi?id=205915
Manuel Ullmann (labre@posteo.de) changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |labre@posteo.de
--- Comment #5 from Manuel Ullmann (labre@posteo.de) --- Confirmed on navi14. Bisected it to 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4. Log is indicating exactly nothing. I’ll attach them nonetheless, if you happen to see more than me. It’s just a wild guess, but could it be, that your indicator for a safe lowering of the clock is not always a good indicator? I mean you know your code a lot better than me. A clock, that is raised too high while voltage being too low could however be a reason for the flickering on OpenGL operations, that I’m seeing. Have not seen those on Vulkan or framebuffer though. Anyway, have a look for yourself:
commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4 Author: Noah Abradjian noah.abradjian@amd.com Date: Fri Sep 27 16:30:57 2019 -0400
drm/amd/display: Make clk mgr the only dto update point
[Why]
* Clk Mgr DTO update point did not cover all needed updates, as it included a check for plane_state which does not exist yet when the updater is called on driver startup * This resulted in another update path in the pipe programming sequence, based on a dppclk update flag * However, this alternate path allowed for stray DTO updates, some of which would occur in the wrong order during dppclk lowering and cause underflow
[How]
* Remove plane_state check and use of plane_res.dpp->inst, getting rid of sequence dependencies (this results in extra dto programming for unused pipes but that doesn't cause issues and is a small cost) * Allow DTOs to be updated even if global clock is equal, to account for edge case exposed by diags tests * Remove update_dpp_dto call in pipe programming sequence (leave update to dppclk_control there, as that update is necessary and shouldn't occur in clk mgr) * Remove call to optimize_bandwidth when committing state, as it is not needed and resulted in sporadic underflows even with other fixes in place
Signed-off-by: Noah Abradjian noah.abradjian@amd.com Reviewed-by: Jun Lei Jun.Lei@amd.com Acked-by: Leo Li sunpeng.li@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com
.../gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c | 14 +++++++++----- drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 3 ++- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 ---- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 8 +------- 4 files changed, 12 insertions(+), 17 deletions(-)
modified drivers/gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c @@ -108,11 +108,12 @@ void dcn20_update_clocks_update_dpp_dto(struct clk_mgr_internal *clk_mgr, for (i = 0; i < clk_mgr->base.ctx->dc->res_pool->pipe_count; i++) { int dpp_inst, dppclk_khz;
- if (!context->res_ctx.pipe_ctx[i].plane_state) - continue; - - dpp_inst = context->res_ctx.pipe_ctx[i].plane_res.dpp->inst; + /* Loop index will match dpp->inst if resource exists, + * and we want to avoid dependency on dpp object + */ + dpp_inst = i; dppclk_khz = context->res_ctx.pipe_ctx[i].plane_res.bw.dppclk_khz; + clk_mgr->dccg->funcs->update_dpp_dto( clk_mgr->dccg, dpp_inst, dppclk_khz); } @@ -235,6 +236,7 @@ void dcn2_update_clocks(struct clk_mgr *clk_mgr_base,
update_dispclk = true; } + if (dc->config.forced_clocks == false || (force_reset && safe_to_lower)) { if (dpp_clock_lowered) { // if clock is being lowered, increase DTO before lowering refclk @@ -244,10 +246,12 @@ void dcn2_update_clocks(struct clk_mgr *clk_mgr_base, // if clock is being raised, increase refclk before lowering DTO if (update_dppclk || update_dispclk) dcn20_update_clocks_update_dentist(clk_mgr); - if (update_dppclk) + // always update dtos unless clock is lowered and not safe to lower + if (new_clocks->dppclk_khz >= dc->current_state->bw_ctx.bw.dcn.clk.dppclk_khz) dcn20_update_clocks_update_dpp_dto(clk_mgr, context); } } + if (update_dispclk && dmcu && dmcu->funcs->is_dmcu_initialized(dmcu)) { /*update dmcu for wait_loop count*/ modified drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c @@ -171,7 +171,8 @@ void rn_update_clocks(struct clk_mgr *clk_mgr_base, // if clock is being raised, increase refclk before lowering DTO if (update_dppclk || update_dispclk) rn_vbios_smu_set_dppclk(clk_mgr, clk_mgr_base->clks.dppclk_khz); - if (update_dppclk) + // always update dtos unless clock is lowered and not safe to lower + if (new_clocks->dppclk_khz >= dc->current_state->bw_ctx.bw.dcn.clk.dppclk_khz) dcn20_update_clocks_update_dpp_dto(clk_mgr, context); }
modified drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1241,10 +1241,6 @@ static enum dc_status dc_commit_state_no_check(struct dc *dc, struct dc_state *c
dc_enable_stereo(dc, context, dc_streams, context->stream_count);
- if (!dc->optimize_seamless_boot) - /* pplib is notified if disp_num changed */ - dc->hwss.optimize_bandwidth(dc, context); - for (i = 0; i < context->stream_count; i++) context->streams[i]->mode_changed = false;
modified drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c @@ -1202,15 +1202,9 @@ static void dcn20_update_dchubp_dpp( struct dpp *dpp = pipe_ctx->plane_res.dpp; struct dc_plane_state *plane_state = pipe_ctx->plane_state;
- if (pipe_ctx->update_flags.bits.dppclk) { + if (pipe_ctx->update_flags.bits.dppclk) dpp->funcs->dpp_dppclk_control(dpp, false, true);
- dc->res_pool->dccg->funcs->update_dpp_dto( - dc->res_pool->dccg, - dpp->inst, - pipe_ctx->plane_res.bw.dppclk_khz); - } - /* TODO: Need input parameter to tell current DCHUB pipe tie to which OTG * VTG is within DCHUBBUB which is commond block share by each pipe HUBP. * VTG is 1:1 mapping with OTG. Each pipe HUBP will select which VTG
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #6 from Manuel Ullmann (labre@posteo.de) --- Created attachment 286981 --> https://bugzilla.kernel.org/attachment.cgi?id=286981&action=edit Xorg log on commit 50575eb5b339683ee148189beae5eb53ccb3158b
https://bugzilla.kernel.org/show_bug.cgi?id=205915
Manuel Ullmann (labre@posteo.de) changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #286981|0 |1 is obsolete| |
--- Comment #7 from Manuel Ullmann (labre@posteo.de) --- Created attachment 286983 --> https://bugzilla.kernel.org/attachment.cgi?id=286983&action=edit Xorg log of good commit 50575eb5b339683ee148189beae5eb53ccb3158b
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #8 from Manuel Ullmann (labre@posteo.de) --- Created attachment 286985 --> https://bugzilla.kernel.org/attachment.cgi?id=286985&action=edit Kernel log of good commit 50575eb5b339683ee148189beae5eb53ccb3158b
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #9 from Manuel Ullmann (labre@posteo.de) --- Created attachment 286987 --> https://bugzilla.kernel.org/attachment.cgi?id=286987&action=edit Xorg log of bad commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #10 from Manuel Ullmann (labre@posteo.de) --- Created attachment 286989 --> https://bugzilla.kernel.org/attachment.cgi?id=286989&action=edit Kernel log of bad commit 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #11 from Manuel Ullmann (labre@posteo.de) --- Reverting 1ea8751bd28d1ec2b36a56ec6bc1ac28903d09b4 on HEAD, currently 5.5-rc7+, seems to fix it. The revert should probably be limited to the 2 hunks in the clock manager files to preserve the underflow avoidance: - if (update_dppclk) + // always update dtos unless clock is lowered and not safe to lower + if (new_clocks->dppclk_khz >= dc->current_state->bw_ctx.bw.dcn.clk.dppclk_khz)
For the sake of comleteness: Steps to reproduce: 1. Suspend with Navi and activated display core. 2. Resume. 3. Start glxgears.
Actual results: Animation flickers. This happens on any OpenGL accelerated task, like Firefox with hardware acceleration or similar.
Expected results: Flicker-free rendering.
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #12 from Manuel Ullmann (labre@posteo.de) --- Created attachment 287449 --> https://bugzilla.kernel.org/attachment.cgi?id=287449&action=edit Call optimize bandwidth again.
I pinned the issue down to the four lines, which call optimize_bandwidth. Unfortunately the commit message indicates, that its removal fixed other issues, so it can not be simply put back in.
Maybe this one is also vendor specific voltage lowering. Are there some sys nodes, which expose vendor adaptations? I’ve got one from MSI [1] with 130 Watt TDP.
[1]: https://us.msi.com/Graphics-card/Radeon-RX-5500-XT-GAMING-X-8G/Specification
https://bugzilla.kernel.org/show_bug.cgi?id=205915
Manuel Ullmann (labre@posteo.de) changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #287449|0 |1 is patch| | Attachment #287449|application/mbox |text/plain mime type| |
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #13 from Alex Deucher (alexdeucher@gmail.com) --- Does this series help? https://patchwork.freedesktop.org/patch/353367/
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #14 from Alex Deucher (alexdeucher@gmail.com) --- See also: https://bugzilla.kernel.org/show_bug.cgi?id=205915 https://bugzilla.kernel.org/show_bug.cgi?id=206393 https://bugzilla.kernel.org/show_bug.cgi?id=206575
https://bugzilla.kernel.org/show_bug.cgi?id=205915
--- Comment #15 from Manuel Ullmann (labre@posteo.de) --- That fixed it. Thanks.
dri-devel@lists.freedesktop.org