On 2015/9/29 18:51, Borislav Petkov wrote:
> On Tue, Sep 29, 2015 at 04:50:36PM +0800, Jiang Liu wrote:
>> So could you please help to apply the attached debug patch to gather
>> more information about the regression?
>
> Sure, just did.
>
> I'm sending you a full s/r cycle attempt caught over serial in a private
> message.
Hi Boris,
From the log file, we can see that the NULL pointer dereference was
caused by the AMD IOMMU device. For normal MSI-enabled PCI devices, we
get valid irq numbers such as:
[ 74.661170] ahci 0000:04:00.0: irqdomain: freeze msi 1 irq28
[ 74.661297] radeon 0000:01:00.0: irqdomain: freeze msi 1 irq47

But for the AMD IOMMU device, we got an invalid irq number (0) after
enabling MSI:
[ 74.662488] pci 0000:00:00.2: irqdomain: freeze msi 1 irq0
which then caused the NULL pointer dereference when
__pci_restore_msi_state() gets called by the system resume code.

So we need to figure out why we got irq number 0 after enabling MSI for
the AMD IOMMU device. The only hint I have is that the iommu driver just
grabs the PCI device without providing a PCI device driver for the IOMMU
PCI device; we have solved a similar case for the eata driver. So could
you please help to apply this debug patch to gather more info and send
me /proc/interrupts?
Thanks!
Gerry
Thanks.