[Public]
Hi,
After investigating this issue for quite some time, I found that the freeze is not caused by the amdgpu part of the buddy allocator patch: that patch doesn't throw any issues when applied separately on top of the stable base of drm-next. After digging further, the patch below seems to be the cause of the problem:
drm/ttm: rework bulk move handling v5 https://cgit.freedesktop.org/drm/drm/commit/?id=fee2ede155423b0f7a559050a397...
When this patch is applied on top of the stable (working) version of drm-next without the buddy allocator patch, we see the issues listed below, thrown at random on each GravityMark run:
1. general protection fault at ttm_lru_bulk_move_tail()
2. NULL pointer dereference at ttm_lru_bulk_move_tail()
3. NULL pointer dereference at ttm_resource_init()
Regards,
Arun.

-----Original Message-----
From: Alex Deucher alexdeucher@gmail.com
Sent: Monday, May 16, 2022 8:36 PM
To: Mike Lothian mike@fireburn.co.uk
Cc: Paneer Selvam, Arunpravin Arunpravin.PaneerSelvam@amd.com; Intel Graphics Development intel-gfx@lists.freedesktop.org; amd-gfx list amd-gfx@lists.freedesktop.org; Maling list - DRI developers dri-devel@lists.freedesktop.org; Deucher, Alexander Alexander.Deucher@amd.com; Koenig, Christian Christian.Koenig@amd.com; Matthew Auld matthew.auld@intel.com
Subject: Re: [PATCH v12] drm/amdgpu: add drm buddy support to amdgpu
On Mon, May 16, 2022 at 8:40 AM Mike Lothian mike@fireburn.co.uk wrote:
Hi
The merge window for 5.19 will probably be opening next week. Has there been any progress with this bug?
It took a while to find a combination of GPUs that would repro the issue, but now that we can, it is still being investigated.
Alex
Thanks
Mike
On Mon, 2 May 2022 at 17:31, Mike Lothian mike@fireburn.co.uk wrote:
On Mon, 2 May 2022 at 16:54, Arunpravin Paneer Selvam arunpravin.paneerselvam@amd.com wrote:
On 5/2/2022 8:41 PM, Mike Lothian wrote:
On Wed, 27 Apr 2022 at 12:55, Mike Lothian mike@fireburn.co.uk wrote:
On Tue, 26 Apr 2022 at 17:36, Christian König christian.koenig@amd.com wrote:
Hi Mike,
sounds like somehow stitching together the SG table for PRIME doesn't work any more with this patch.
Can you try with P2P DMA disabled?
-CONFIG_PCI_P2PDMA=y
+# CONFIG_PCI_P2PDMA is not set
If that's what you mean, then there's no difference. I'll upload my dmesg to the GitLab issue.
Apart from that can you take a look Arun?
Thanks, Christian.
Hi
Have you had any success in replicating this?
Hi Mike, I couldn't replicate this on my Raven APU machine. I see you have 2 cards initialized: one is Renoir and the other is Navy Flounder. Could you give some more details? Are you running GravityMark on Renoir, and what is your system RAM configuration?
Cheers
Mike
Hi
It's a PRIME laptop. It failed on the RENOIR too: it caused a lockup, but systemd managed to capture it. I'll attach it to the issue.
I've got 64GB RAM, the 6800M has 12GB VRAM
Cheers
Mike