On 2019-08-15 8:54 p.m., Jason Gunthorpe wrote:
On Thu, Aug 15, 2019 at 08:52:56PM +0000, Yang, Philip wrote:
hmm_range_fault may return NULL pages because some of the pfns are equal to HMM_PFN_NONE. This happens randomly under memory pressure. The reason is that on the swapped-out page pte path, hmm_vma_handle_pte doesn't update the fault variable from cpu_flags, so it fails to call hmm_vma_do_fault to swap the page in.
The fix is to call hmm_pte_need_fault to update the fault variable.
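For illustration, a minimal sketch of the kind of change described above, in the !pte_present() path of hmm_vma_handle_pte() in mm/hmm.c. The helper names (pte_to_hmm_pfn_flags, hmm_pte_need_fault, non_swap_entry) come from the hmm.c of that era; the exact surrounding lines are an assumption, not a verbatim copy of the posted patch.

```c
	/*
	 * Sketch only: the pte is not present, so check whether it is a
	 * plain swap entry (a swapped-out anonymous page).
	 */
	if (!pte_present(pte)) {
		swp_entry_t entry = pte_to_swp_entry(pte);

		if (!non_swap_entry(entry)) {
			/*
			 * Previously fault/write_fault were left untouched
			 * here, so the walker never faulted the page back
			 * in.  Recompute them from cpu_flags as the commit
			 * message describes.
			 */
			cpu_flags = pte_to_hmm_pfn_flags(range, pte);
			hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
					   &fault, &write_fault);
			if (fault || write_fault)
				goto fault;
			return 0;
		}
		/* ... migration / device-private entry handling continues ... */
	}
```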
Change-Id: I2e8611485563d11d938881c18b7935fa1e7c91ee
I'll fix it for you, but please be careful not to send Change-Ids to the public lists.
Thanks, the Change-Id was added by our Gerrit hook. In the future I will generate the patch files, remove the Change-Id line, and then send out the modified patches.
Also what is the Fixes line for this?
This fixes an issue found by the internal rocrtst test suite: rocrtstFunc.Memory_Max_Mem evicted some user buffers, and the subsequent attempt to restore those buffers failed because they had been swapped out and the application never touched them to swap them back in.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>

 mm/hmm.c | 3 +++
 1 file changed, 3 insertions(+)
Ralph has also been looking at this area, so I'll give him a bit to chime in; otherwise, with Jerome's review, this looks OK to go to linux-next.
Ok, thanks for helping push this to the hmm branch at https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
Thanks, Jason