hmm_range_fault() may return NULL pages because some of the pfns are equal to HMM_PFN_NONE. This happens randomly under memory pressure. The reason is that in the pte path for swapped-out pages, hmm_vma_handle_pte() doesn't update the fault variable from cpu_flags, so it fails to call hmm_vma_do_fault() to swap the page in.

The fix is to call hmm_pte_need_fault() to update the fault variable.
Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 mm/hmm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/hmm.c b/mm/hmm.c
index 9f22562e2c43..7ca4fb39d3d8 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -544,6 +544,9 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		swp_entry_t entry = pte_to_swp_entry(pte);
 
 		if (!non_swap_entry(entry)) {
+			cpu_flags = pte_to_hmm_pfn_flags(range, pte);
+			hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
+					   &fault, &write_fault);
 			if (fault || write_fault)
 				goto fault;
 			return 0;
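
For readers without the surrounding function at hand, the effect of the three added lines can be sketched as a small userspace model. This is not kernel code: every MODEL_* name and model_* function below is hypothetical, the flag values are invented, and model_need_fault() only approximates what hmm_pte_need_fault() is described as doing here. It also assumes that a swapped-out pte contributes no valid or write flags to cpu_flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this model; they are not the kernel's. */
#define MODEL_PFN_NONE  0u         /* entry not populated */
#define MODEL_PFN_VALID (1u << 0)  /* page present / caller wants it present */
#define MODEL_PFN_WRITE (1u << 1)  /* page writable / caller wants write access */

/* Outcome of handling one swapped-out pte in this model. */
enum model_result { MODEL_LEFT_NONE, MODEL_GO_FAULT };

/*
 * Simplified stand-in for hmm_pte_need_fault(): compare what the caller
 * requested (req) with what the CPU page table provides (cpu_flags) and
 * report whether a fault, or a write fault, is needed.
 */
static void model_need_fault(uint32_t req, uint32_t cpu_flags,
                             bool *fault, bool *write_fault)
{
    assert(fault && write_fault);
    *fault = false;
    *write_fault = false;

    /* The caller did not ask for this page to be made valid. */
    if (!(req & MODEL_PFN_VALID))
        return;

    /* The CPU page table has no valid mapping: a fault is needed. */
    *fault = !(cpu_flags & MODEL_PFN_VALID);

    /* Write access was requested but the mapping is not writable. */
    if ((req & MODEL_PFN_WRITE) && !(cpu_flags & MODEL_PFN_WRITE)) {
        *write_fault = true;
        *fault = true;
    }
}

/*
 * Model of the swap-entry branch. With with_fix == false, fault and
 * write_fault are never recomputed, so the branch always leaves the
 * entry unpopulated. With with_fix == true they are derived from
 * cpu_flags, as in the patch.
 */
static enum model_result model_handle_swap_pte(uint32_t req, bool with_fix)
{
    bool fault = false, write_fault = false;

    if (with_fix) {
        /* Assumption: a swapped-out pte yields no VALID/WRITE flags. */
        uint32_t cpu_flags = MODEL_PFN_NONE;

        model_need_fault(req, cpu_flags, &fault, &write_fault);
    }

    if (fault || write_fault)
        return MODEL_GO_FAULT;
    return MODEL_LEFT_NONE;
}
```

In this model, a request for a valid page on a swapped-out pte is left unpopulated without the fix and takes the fault path with it, while a request that does not ask for a valid page is left alone in both cases.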
On Thu, Aug 15, 2019 at 08:52:56PM +0000, Yang, Philip wrote:
> Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
On Thu, Aug 15, 2019 at 08:52:56PM +0000, Yang, Philip wrote:
> Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee

I'll fix it for you, but please be careful not to send Change-Ids to the public lists.

Also, what is the Fixes line for this?

> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>  mm/hmm.c | 3 +++
>  1 file changed, 3 insertions(+)

Ralph has also been looking at this area, so I'll give him a bit to chime in; otherwise, with Jerome's review, this looks OK to go to linux-next.

Thanks,
Jason
On 2019-08-15 8:54 p.m., Jason Gunthorpe wrote:
> On Thu, Aug 15, 2019 at 08:52:56PM +0000, Yang, Philip wrote:
>> Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee
>
> I'll fix it for you but please be careful not to send Change-id's to the public lists.

Thanks. The Change-Id was added by our Gerrit hook. In future I need to generate the patch files, remove the Change-Id line, and then send out the modified patch files.

> Also what is the Fixes line for this?

This fixes an issue found by the internal rocrtst: rocrtstFunc.Memory_Max_Mem evicted some user buffers, and the following test then failed to restore those user buffers because they were swapped out and the application doesn't touch them to swap them in.

>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>>  mm/hmm.c | 3 +++
>>  1 file changed, 3 insertions(+)
>
> Ralph has also been looking at this area also so I'll give him a bit to chime in, otherwise with Jerome's review this looks OK to go to linux-next

OK, thanks for helping push this to the hmm branch at https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

> Thanks,
> Jason
On Thu, Aug 15, 2019 at 08:52:56PM +0000, Yang, Philip wrote:
> Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>  mm/hmm.c | 3 +++
>  1 file changed, 3 insertions(+)
Applied to hmm.git, thanks
I fixed the commit message:
Author: Yang, Philip <Philip.Yang@amd.com>
Date:   Thu Aug 15 20:52:56 2019 +0000

mm/hmm: fix hmm_range_fault()'s handling of swapped out pages

hmm_range_fault() may return NULL pages because some of the pfns are equal to HMM_PFN_NONE. This happens randomly under memory pressure. The reason is that in the swapped out page pte path, hmm_vma_handle_pte() doesn't update the fault variable from cpu_flags, so it failed to call hmm_vma_do_fault() to swap the page in.

The fix is to call hmm_pte_need_fault() to update the fault variable.

Fixes: 74eee180b935 ("mm/hmm/mirror: device page fault handler")
Link: https://lore.kernel.org/r/20190815205227.7949-1-Philip.Yang@amd.com
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
dri-devel@lists.freedesktop.org