Re: [PATCH 0/3] drm/radeon kexec fixes

11 Sep 2013

On Wed, Sep 11, 2013 at 5:01 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
...
On 2013.09.09 at 11:38 +0200, Christian König wrote:
...
Am 09.09.2013 11:21, schrieb Markus Trippelsdorf:
...
On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote:
...
Markus Trippelsdorf <markus@trippelsdorf.de> writes:
...
Here are a couple of patches that get kexec working with radeon devices.
I've tested this on my RS780.
Comments or flames are welcome.
Thanks.
A couple of high level comments.
This looks promising for the usual case.
Removing the printk at the end of the kexec path seems a little dubious,
what of other cpus, interrupt handlers, etc.  Basically establishing a
new rule on when printk is allowed seems a little dubious at this point,
even if it is a useful debugging trick.
OK. I will drop this patch. It doesn't seem to be necessary, because I
cannot reproduce the printk related hang anymore.
...
Having a clean shutdown of the radeon definitely seems worth doing,
because the cases where we care about video are when a person is in
front of the system.
Yes. But please note that even with radeon_pci_shutdown implemented, I
still get ring test failures on roughly every eighth kexec boot:
[drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
  radeon 0000:01:05.0: disabling GPU acceleration
That's definitely better than the current state of affairs, with ring
test failures on every second boot. But I haven't figured out the reason
for these failures yet. It's curious that once a ring test failure
occurs, it will reliably fail after each kexec invocation, no matter how
often repeated. Only a reboot brings the machine back to normal.
The main problem here is that the AMD gfx hardware doesn't really
support being reinitialized once booted (for various reasons). It's a
(intended) limitation of the hardware design that you can only
initialize certain blocks once every power cycle, so the whole approach
actually will never work 100% reliably.
All you can hope for is that stopping the hardware while shutting down
the old kernel and starting it again results in exactly the same
hardware parameters (offsets, clock etc...) otherwise starting the
blocks will just fail and you end up with disabled acceleration like above.
Sorry, but there isn't much we can do about this.
I've tested this further and it turned out that if I revert commit
f5d9b7f0f9 on top of my "drm/radeon: Implement radeon_pci_shutdown"
patch, the initialization failures seem to go away completely.
Any idea what's going on?
You are disabling dynamic power management with that patch reverted.
The patch fixed a copy paste typo in the register.  Bit 0
(SCLK_PWRMGT_OFF) of register SCLK_PWRMGT_CNTL controls whether
dynamic engine clock control is enabled.  Bit 0 (GLOBAL_PWRMGT_EN) of
register GENERAL_PWRMGT controls whether dynamic power management
(dynamic engine/memory/voltage controls, etc.) is enabled at all.
Alex
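To make the effect of the typo concrete, here is a minimal user-space sketch of the two variants. It is not driver code: the register array, the enum indices, and the helper names (wreg32_p, enable_sclk_control_typo, enable_sclk_control_fixed, dynamicpm_enabled) are hypothetical stand-ins. It also assumes that WREG32_P(reg, val, mask) keeps the bits selected by mask and writes val into the remaining bits, which is how the macro is used in the patch below. The only facts taken from the thread are that both SCLK_PWRMGT_OFF and GLOBAL_PWRMGT_EN are bit 0 of their respective registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two MMIO registers (indices, not real offsets). */
enum { GENERAL_PWRMGT, SCLK_PWRMGT_CNTL, NUM_REGS };

/* Both fields are bit 0 of their respective register, as described above. */
#define GLOBAL_PWRMGT_EN (1u << 0)   /* in GENERAL_PWRMGT   */
#define SCLK_PWRMGT_OFF  (1u << 0)   /* in SCLK_PWRMGT_CNTL */

static uint32_t regs[NUM_REGS];

/* Assumed WREG32_P semantics: preserve the bits in mask, write val elsewhere. */
static void wreg32_p(int reg, uint32_t val, uint32_t mask)
{
    uint32_t tmp = regs[reg];

    tmp &= mask;
    tmp |= val & ~mask;
    regs[reg] = tmp;
}

/* Variant with the copy-paste typo: touches bit 0 of GENERAL_PWRMGT. */
static void enable_sclk_control_typo(int enable)
{
    if (enable)
        wreg32_p(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF);
    else
        wreg32_p(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
}

/* Variant with the register fixed: touches bit 0 of SCLK_PWRMGT_CNTL. */
static void enable_sclk_control_fixed(int enable)
{
    if (enable)
        wreg32_p(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF);
    else
        wreg32_p(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
}

/* Dynamic power management is on when GLOBAL_PWRMGT_EN is set. */
static int dynamicpm_enabled(void)
{
    return (regs[GENERAL_PWRMGT] & GLOBAL_PWRMGT_EN) != 0;
}
```

Starting from a state where GLOBAL_PWRMGT_EN is set, calling enable_sclk_control_typo(1) clears bit 0 of GENERAL_PWRMGT, so dynamicpm_enabled() then reports that dynamic power management is off. enable_sclk_control_fixed(1) leaves GENERAL_PWRMGT untouched and only clears SCLK_PWRMGT_OFF.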
...
Here's the patch:

diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c
index fa0de46..4e8c1988 100644
--- a/drivers/gpu/drm/radeon/r600_dpm.c
+++ b/drivers/gpu/drm/radeon/r600_dpm.c
@@ -296,9 +296,9 @@ bool r600_dynamicpm_enabled(struct radeon_device *rdev)
 void r600_enable_sclk_control(struct radeon_device *rdev, bool enable)
 {
        if (enable)
-               WREG32_P(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF);
+               WREG32_P(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF);
        else
-               WREG32_P(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
+               WREG32_P(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
 }
 
 void r600_enable_mclk_control(struct radeon_device *rdev, bool enable)
--
Markus
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

    
