Re: Debugging Thinkpad T430s occasional suspend failure.

16 Feb 2013

On Sat, Feb 16, 2013 at 1:45 PM, Hugh Dickins <hughd@google.com> wrote:
...
> I hacked around on your PM_TRACE set_magic_time() / read_magic_time()
> yesterday, to save an oopsing core kernel ip there, instead of hashed
> pm trace info (it makes sense in this case to invert your sequence,
> putting the high order into years and the low order into minutes).

That sounds like a good idea in general. The PM_TRACE() thing was done
to figure out things that locked up the PCI bus etc, but encoding the
oopses during suspend sounds like a really good idea too.
Is your patch clean enough to just be made part of the standard
PM_TRACE infrastructure, or was it something really hacky and one-off?
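
For readers unfamiliar with the trick being discussed: the idea is to pack a number into the RTC date fields so that it survives a reboot. Below is a minimal user-space sketch of such a mixed-radix packing, with the low order in minutes and the high order in years as Hugh describes. The radices (60 minutes, 24 hours, 28 days, 12 months, 100 years), the struct `magic_time`, and both function names are assumptions made for illustration; this is not Hugh's patch, and how a 64-bit kernel ip is reduced to fit the available range is not stated in the thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for the date fields an RTC would hold. */
struct magic_time {
	unsigned int year;  /* 0..99 */
	unsigned int mon;   /* 0..11 */
	unsigned int mday;  /* 1..28, valid in every month */
	unsigned int hour;  /* 0..23 */
	unsigned int min;   /* 0..59 */
};

/* 100 * 12 * 28 * 24 * 60 = 48,384,000 values, a little over 25 bits. */
#define MAGIC_RANGE (100u * 12u * 28u * 24u * 60u)

/* Pack n into date fields: low order in minutes, high order in years. */
struct magic_time encode_magic(uint32_t n)
{
	struct magic_time t;

	n %= MAGIC_RANGE;            /* values beyond the range wrap around */
	t.min = n % 60;      n /= 60;
	t.hour = n % 24;     n /= 24;
	t.mday = n % 28 + 1; n /= 28;
	t.mon = n % 12;      n /= 12;
	t.year = n;                  /* now guaranteed to be below 100 */
	return t;
}

/* Inverse of encode_magic(): rebuild the number from the date fields. */
uint32_t decode_magic(struct magic_time t)
{
	uint32_t n = t.year;

	n = n * 12 + t.mon;
	n = n * 28 + (t.mday - 1);
	n = n * 24 + t.hour;
	n = n * 60 + t.min;
	return n;
}
```

With these assumed radices the capacity is roughly in line with the "~26 bits of info per crash" mentioned further down in the quoted mail. Putting the low order into minutes means nearby values produce nearby clock times.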
...
> Rewarded last night by reboot to Feb 21 14:45:53 2006.  Which is
> ffffffff812d60ed intel_choose_pipe_bpp_dither.isra.13+0x216/0x2d6
> /home/hugh/3087X/drivers/gpu/drm/i915/intel_display.c:4159
>          * enable dithering as needed, but that costs bandwidth.  So choose
>          * the minimum value that expresses the full color range of the fb but
>          * also stays within the max display bpc discovered above.
>          */
>     switch (fb->depth) {
>
> ffffffff812d60e9:       48 8b 55 c0             mov    -0x40(%rbp),%rdx
> ffffffff812d60ed:       8b 02                   mov    (%rdx),%eax
> (gcc chose to pass a pointer to fb->depth down to the function,
> instead of fb itself, since that is the only use of it there.)
>
> I expect that fb is NULL; but with an average of one failure to resume
> per day, and ~26 bits of info per crash, this is not a fast procedure!
>
> I notice that intel_pipe_set_base() allows for NULL fb,
> so I'm currently running with the oops-in-rtc hackery, plus
>
> -       switch (fb->depth) {
> +       if (WARN_ON(!fb))
> +               bpc = 8;
> +       else switch (fb->depth) {

> There's been a fair bit of change to intel_display.c since 3.7 (if
> my 3.7 was indeed good), mainly splitting intel_ into haswell_ versus
> ironlake_, but I've not yet spotted anything obvious; nor yet looked
> to see where fb would originate from anyway.
>
> Once I've got just a little more info out of it, I'll start another
> thread addressed principally to the drm/gpu/i915 guys.

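
The NULL-fb workaround quoted above can be modelled in user space roughly as follows. This is only a sketch: `struct fb_model` and `choose_bpc()` are hypothetical names, the depth-to-bpc cases are illustrative rather than copied from intel_display.c, and an `fprintf` stands in for the kernel's `WARN_ON()`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down stand-in for the framebuffer structure. */
struct fb_model {
	unsigned int depth;
};

/* Pick bits per color channel; fall back to 8 when fb is NULL. */
unsigned int choose_bpc(const struct fb_model *fb)
{
	unsigned int bpc;

	if (fb == NULL) {
		/* User-space stand-in for WARN_ON(!fb): complain, then carry on. */
		fprintf(stderr, "warning: NULL fb, assuming 8 bpc\n");
		bpc = 8;
	} else switch (fb->depth) {
	case 15:
	case 16:
		bpc = 6;   /* illustrative mapping, an assumption */
		break;
	case 30:
		bpc = 10;  /* illustrative mapping, an assumption */
		break;
	default:
		bpc = 8;
		break;
	}
	return bpc;
}
```

The point of the guard is that a NULL fb produces a warning and a safe default instead of a dereference, so the machine keeps running long enough to log where the NULL came from.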
I think it's worth it to give them a heads-up already. So I've cc'd
the main suspects here..
Daniel, Dave - any comments about a NULL fb in
intel_choose_pipe_bpp_dither() during either suspend or resume? Some
googling shows this:
https://bugzilla.redhat.com/show_bug.cgi?id=895123
which sounds remarkably similar, and is also during a suspend attempt
(but apparently Satish got a full oops out).. Some timing race with a
worker entry?
Linus
