On Tue, Aug 6, 2019 at 11:40 PM Christoph Hellwig <hch(a)infradead.org> wrote:
>
> I'm not an all that huge fan of super magic macro loops. But in this
> case I don't see how it could even work, as we get special callbacks
> for huge pages and holes, and people are trying to add a few more ops
> as well.
Yeah, in this case we definitely don't want to make some magic loop walker.
Loops are certainly simpler than callbacks for most cases (and often
faster because you don't have indirect calls which now are getting
quite expensive), but the walker code really does end up having tons
of different cases that you'd have to handle with magic complex
conditionals or switch statements instead.
So the "walk over range using this set of callbacks" is generally the
right interface. If there is some particular case that might be very
simple and the callback model is expensive due to indirect calls for
each page, then such a case should probably use the normal page
walking loops (that we *used* to have everywhere - the "walk_range()"
interface is the "new" model for all the random odd special cases).
Linus
Most use cases for DRM_MSM will prefer to build both DRM and MSM_DRM as
modules but there are some cases where DRM might be built in for whatever
reason and in those situations it is preferable to still keep MSM as a
module by default and let the user decide if they _really_ want to build
it in.
Additionally select QCOM_COMMAND_DB for ARCH_QCOM targets to make sure
it doesn't get missed when we need it for a6xx targets.
Signed-off-by: Jordan Crouse <jcrouse(a)codeaurora.org>
---
 drivers/gpu/drm/msm/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index 9c37e4d..3b2334b 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -14,11 +14,12 @@ config DRM_MSM
select SHMEM
select TMPFS
select QCOM_SCM if ARCH_QCOM
+ select QCOM_COMMAND_DB if ARCH_QCOM
select WANT_DEV_COREDUMP
select SND_SOC_HDMI_CODEC if SND_SOC
select SYNC_FILE
select PM_OPP
- default y
+ default m
help
DRM/KMS driver for MSM/snapdragon.
--
2.7.4
From: Sean Paul <seanpaul(a)chromium.org>
clk_get_parent returns an error pointer upon failure, not NULL. So the
checks as they exist won't catch a failure. This patch changes the
checks and the return values to properly handle an error pointer.
Fixes: c4d8cfe516dc ("drm/msm/dsi: add implementation for helper functions")
Cc: Sibi Sankar <sibis(a)codeaurora.org>
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: Rob Clark <robdclark(a)chromium.org>
Cc: <stable(a)vger.kernel.org> # v4.19+
Signed-off-by: Sean Paul <seanpaul(a)chromium.org>
---
drivers/gpu/drm/msm/dsi/dsi_host.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c
index aa35d18ab43c9..02acb4338721a 100644
--- a/drivers/gpu/drm/msm/dsi/dsi_host.c
+++ b/drivers/gpu/drm/msm/dsi/dsi_host.c
@@ -421,15 +421,15 @@ static int dsi_clk_init(struct msm_dsi_host *msm_host)
}
msm_host->byte_clk_src = clk_get_parent(msm_host->byte_clk);
- if (!msm_host->byte_clk_src) {
- ret = -ENODEV;
+ if (IS_ERR(msm_host->byte_clk_src)) {
+ ret = PTR_ERR(msm_host->byte_clk_src);
pr_err("%s: can't find byte_clk clock. ret=%d\n", __func__, ret);
goto exit;
}
msm_host->pixel_clk_src = clk_get_parent(msm_host->pixel_clk);
- if (!msm_host->pixel_clk_src) {
- ret = -ENODEV;
+ if (IS_ERR(msm_host->pixel_clk_src)) {
+ ret = PTR_ERR(msm_host->pixel_clk_src);
pr_err("%s: can't find pixel_clk clock. ret=%d\n", __func__, ret);
goto exit;
}
--
Sean Paul, Software Engineer, Google / Chromium OS
On Tue, Aug 06, 2019 at 07:05:42PM +0300, Christoph Hellwig wrote:
> There is only a single place where the pgmap is passed over a function
> call, so replace it with local variables in the places where we deal
> with the pgmap.
>
> Signed-off-by: Christoph Hellwig <hch(a)lst.de>
> mm/hmm.c | 62 ++++++++++++++++++++++++--------------------------------
> 1 file changed, 27 insertions(+), 35 deletions(-)
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 9a908902e4cc..d66fa29b42e0 100644
> +++ b/mm/hmm.c
> @@ -278,7 +278,6 @@ EXPORT_SYMBOL(hmm_mirror_unregister);
>
> struct hmm_vma_walk {
> struct hmm_range *range;
> - struct dev_pagemap *pgmap;
> unsigned long last;
> unsigned int flags;
> };
> @@ -475,6 +474,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> struct hmm_vma_walk *hmm_vma_walk = walk->private;
> struct hmm_range *range = hmm_vma_walk->range;
> + struct dev_pagemap *pgmap = NULL;
> unsigned long pfn, npages, i;
> bool fault, write_fault;
> uint64_t cpu_flags;
> @@ -490,17 +490,14 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
> pfn = pmd_pfn(pmd) + pte_index(addr);
> for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
> if (pmd_devmap(pmd)) {
> - hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
> - hmm_vma_walk->pgmap);
> - if (unlikely(!hmm_vma_walk->pgmap))
> + pgmap = get_dev_pagemap(pfn, pgmap);
> + if (unlikely(!pgmap))
> return -EBUSY;
Unrelated to this patch, but what is the point of getting and checking
that the pgmap exists for the page and then immediately releasing it?
This code has this pattern in several places. It feels racy.
> }
> pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
> }
> - if (hmm_vma_walk->pgmap) {
> - put_dev_pagemap(hmm_vma_walk->pgmap);
> - hmm_vma_walk->pgmap = NULL;
Putting the value in the hmm_vma_walk would have made some sense to me
if the pgmap was not set to NULL all over the place. Then most of the
xa_loads would be eliminated, as I would expect the pgmap tends to be
mostly uniform for these use cases.
Is there some reason the pgmap ref can't be held across
faulting/sleeping? ie like below.
Anyhow, I looked over this pretty carefully and the change looks
functionally OK, I just don't know why the code is like this in the
first place.
Reviewed-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/mm/hmm.c b/mm/hmm.c
index 9a908902e4cc38..4e30128c23a505 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -497,10 +497,6 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
}
pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
}
- if (hmm_vma_walk->pgmap) {
- put_dev_pagemap(hmm_vma_walk->pgmap);
- hmm_vma_walk->pgmap = NULL;
- }
hmm_vma_walk->last = end;
return 0;
#else
@@ -604,10 +600,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
return 0;
fault:
- if (hmm_vma_walk->pgmap) {
- put_dev_pagemap(hmm_vma_walk->pgmap);
- hmm_vma_walk->pgmap = NULL;
- }
pte_unmap(ptep);
/* Fault any virtual address we were asked to fault */
return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
@@ -690,16 +682,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
return r;
}
}
- if (hmm_vma_walk->pgmap) {
- /*
- * We do put_dev_pagemap() here and not in hmm_vma_handle_pte()
- * so that we can leverage get_dev_pagemap() optimization which
- * will not re-take a reference on a pgmap if we already have
- * one.
- */
- put_dev_pagemap(hmm_vma_walk->pgmap);
- hmm_vma_walk->pgmap = NULL;
- }
pte_unmap(ptep - 1);
hmm_vma_walk->last = addr;
@@ -751,10 +733,6 @@ static int hmm_vma_walk_pud(pud_t *pudp,
pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
cpu_flags;
}
- if (hmm_vma_walk->pgmap) {
- put_dev_pagemap(hmm_vma_walk->pgmap);
- hmm_vma_walk->pgmap = NULL;
- }
hmm_vma_walk->last = end;
return 0;
}
@@ -1026,6 +1004,14 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
/* Keep trying while the range is valid. */
} while (ret == -EBUSY && range->valid);
+ /*
+ * We do put_dev_pagemap() here so that we can leverage
+ * get_dev_pagemap() optimization which will not re-take a
+ * reference on a pgmap if we already have one.
+ */
+ if (hmm_vma_walk->pgmap)
+ put_dev_pagemap(hmm_vma_walk->pgmap);
+
if (ret) {
unsigned long i;
Hi Dave, Daniel,
Fixes for 5.3. Nothing too major bug-wise. I'm reverting the kfd GWS ioctl
that was added this cycle. After working with it for a while the kfd team
decided it wasn't quite right. I should have been stricter with it in the
beginning. Revert it.
The following changes since commit 9c8c9c7cdb4c8fb48a2bc70f41a07920f761d2cd:
Merge tag 'exynos-drm-fixes-for-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes (2019-08-02 17:10:17 +0200)
are available in the Git repository at:
git://people.freedesktop.org/~agd5f/linux tags/drm-fixes-5.3-2019-08-07
for you to fetch changes up to 4b3e30ed3ec7864e798403a63ff2e96bd0c19ab0:
Revert "drm/amdkfd: New IOCTL to allocate queue GWS" (2019-08-07 10:21:38 -0500)
----------------------------------------------------------------
drm-fixes-5.3-2019-08-07:
amdgpu:
- Fixes VCN to handle the latest navi10 firmware
- Fixes for fan control on navi10
- Properly handle SMU metrics table on navi10
- Fix a resume regression on Stoney
amdkfd:
- Revert new GWS ioctl. It's not ready.
----------------------------------------------------------------
Alex Deucher (1):
Revert "drm/amdkfd: New IOCTL to allocate queue GWS"
Evan Quan (1):
drm/amd/powerplay: correct navi10 vcn powergate
Kevin Wang (1):
drm/amd/powerplay: honor hw limit on fetching metrics data for navi10
Likun Gao (1):
drm/amdgpu: pin the csb buffer on hw init for gfx v8
Marek Olšák (1):
Revert "drm/amdgpu: fix transform feedback GDS hang on gfx10 (v2)"
Matt Coffin (1):
drm/amd/powerplay: Allow changing of fan_control in smu_v11_0
Thong Thai (2):
drm/amd/amdgpu/vcn_v2_0: Mark RB commands as KMD commands
drm/amd/amdgpu/vcn_v2_0: Move VCN 2.0 specific dec ring test to vcn_v2_0
drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h | 1 -
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 +
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 12 +---
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 40 +++++++++++++
drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 44 +++++++++++---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 28 ---------
drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 4 +-
drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h | 1 +
drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 79 +++++++++++++++++---------
drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 2 +-
include/uapi/linux/kfd_ioctl.h | 20 +------
11 files changed, 138 insertions(+), 94 deletions(-)
On Tue, Aug 06, 2019 at 07:05:38PM +0300, Christoph Hellwig wrote:
>
> Hi Jérôme, Ben, Felix and Jason,
>
> below is a series against the hmm tree which cleans up various minor
> bits and allows HMM_MIRROR to be built on all architectures.
>
> Diffstat:
>
> 11 files changed, 94 insertions(+), 210 deletions(-)
>
> A git tree is also available at:
>
> git://git.infradead.org/users/hch/misc.git hmm-cleanups.2
>
> Gitweb:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hmm-cleanup…
>
> Changes since v1:
> - fix the cover letter subject
> - improve various patch descriptions
> - use svmm->mm in nouveau_range_fault
> - inverse the hmask field when using it
> - select HMM_MIRROR instead of making it a user visible option
I think it is straightforward enough to move into -next, so applied to
the hmm.git. Let's get some more reviewed-bys/tested-bys, though.
For now I dropped 'remove the pgmap field from struct hmm_vma_walk'
just to hear the followup, and 'amdgpu: remove
CONFIG_DRM_AMDGPU_USERPTR' until the AMD team acks it.
Thanks,
Jason
On Fri, Aug 02, 2019 at 05:13:30AM -0700, kernelci.org bot wrote:
Today's -next still fails to boot on CM-QS600 with qcom_defconfig:
> qcom_defconfig:
> gcc-8:
> qcom-apq8064-cm-qs600: 1 failed lab
This has been going on since June. It crashes initializing the GPU:
[ 4.261135] adreno 4300000.adreno-3xx: 4300000.adreno-3xx supply vddcx not found, using dummy regulator
[ 4.270254] msm 5100000.mdp: [drm:msm_gpu_init] A320: using IOMMU
[    4.280025] 8<--- cut here ---
[ 4.285557] Unable to handle kernel paging request at virtual address 40000000
[ 4.288430] pgd = (ptrval)
[ 4.295714] [40000000] *pgd=00000000
[ 4.298329] Internal error: Oops: 805 [#1] PREEMPT SMP ARM
[ 4.302054] Modules linked in:
[ 4.307352] CPU: 2 PID: 88 Comm: kworker/2:1 Tainted: G W 5.3.0-rc3-next-20190807 #1
[ 4.310391] Hardware name: Generic DT based system
[ 4.319353] Workqueue: events deferred_probe_work_func
[ 4.319930] usb 1-1: New USB device found, idVendor=04b4, idProduct=6570, bcdDevice=32.99
[ 4.324201] PC is at v7_dma_clean_range+0x1c/0x34
[ 4.324214] LR is at __dma_page_cpu_to_dev+0x28/0x8c
...
[ 4.753642] [] (v7_dma_clean_range) from [] (__dma_page_cpu_to_dev+0x28/0x8c)
[ 4.761795] [] (__dma_page_cpu_to_dev) from [] (arm_dma_sync_sg_for_device+0x4c/0x64)
[ 4.770654] [] (arm_dma_sync_sg_for_device) from [] (get_pages+0x1bc/0x218)
[ 4.780199] [] (get_pages) from [] (msm_gem_get_and_pin_iova+0xb4/0x13c)
[ 4.788704] [] (msm_gem_get_and_pin_iova) from [] (_msm_gem_kernel_new+0x38/0xa8)
[ 4.797386] [] (_msm_gem_kernel_new) from [] (msm_gem_kernel_new+0x24/0x2c)
[ 4.806501] [] (msm_gem_kernel_new) from [] (msm_gpu_init+0x4a4/0x614)
[ 4.815021] [] (msm_gpu_init) from [] (adreno_gpu_init+0x17c/0x288)
[ 4.823342] [] (adreno_gpu_init) from [] (a3xx_gpu_init+0x84/0x108)
[ 4.831239] [] (a3xx_gpu_init) from [] (adreno_bind+0x1c4/0x268)
[ 4.839224] [] (adreno_bind) from [] (component_bind_all+0x11c/0x258)
[ 4.847213] [] (component_bind_all) from [] (msm_drm_bind+0xf8/0x638)
[ 4.855282] [] (msm_drm_bind) from [] (try_to_bring_up_master+0x1fc/0x2b8)
More details including full logs and the image file at:
https://kernelci.org/boot/id/5d4ac1e659b514754b31b293/
On Tue, Aug 6, 2019 at 1:48 AM Christoph Hellwig <hch(a)lst.de> wrote:
>
> This goes in the wrong direction. drm_cflush_* are a bad API we need to
> get rid of, not add use of it. The reason for that is two-fold:
>
> a) it doesn't address how cache maintenance actually works in most
> platforms. When talking about a cache we have three fundamental operations:
>
> 1) write back - this writes the content of the cache back to the
> backing memory
> 2) invalidate - this remove the content of the cache
> 3) write back + invalidate - do both of the above
Agreed that drm_cflush_* isn't a great API. In this particular case
(IIUC), I need wb+inv so that there aren't dirty cache lines that drop
out to memory later, and so that I don't get a cache hit on
uncached/wc mmap'ing.
> b) which of the above operation you use when depends on a couple of
> factors of what you want to do with the range you do the cache
> maintenance operations
>
> Take a look at the comment in arch/arc/mm/dma.c around line 30 that
> explains how this applies to buffer ownership management. Note that
> "for device" applies to "for userspace" in the same way, just that
> userspace then also needs to follow this protocol. So the whole idea
> that random driver code calls random low-level cache maintenance
> operations (and uses the non-specific term flush to make it all more
> confusing) is a bad idea. Fortunately enough we have really good
> arch helpers for all non-coherent architectures (this excludes the
> magic i915 won't be covered by that, but that is a separate issue
> to be addressed later, and the fact that while arm32 did grow them
> very recently and doesn't expose them for all configs, which is easily
> fixable if needed) with arch_sync_dma_for_device and
> arch_sync_dma_for_cpu. So what we need is to figure out where we
> have valid cases for buffer ownership transfer outside the DMA
> API, and build proper wrappers around the above function for that.
> My guess is it should probably be build to go with the iommu API
> as that is the only other way to map memory for DMA access, but
> if you have a better idea I'd be open to discussion.
Tying it in w/ iommu seems a bit weird to me.. but maybe that is just
me, I'm certainly willing to consider proposals or to try things and
see how they work out.
Exposing the arch_sync_* API and using that directly (bypassing
drm_cflush_*) actually seems pretty reasonable and pragmatic. I did
have one doubt, as phys_to_virt() is only valid for kernel direct
mapped memory (AFAIU), what happens for pages that are not in kernel
linear map? Maybe it is ok to ignore those pages, since they won't
have an aliased mapping?
BR,
-R
On Tue, Aug 06, 2019 at 07:05:47PM +0300, Christoph Hellwig wrote:
> pte_index is an internal arch helper in various architectures,
> without consistent semantics. Open code that calculation of a PMD
> index based on the virtual address instead.
>
> Signed-off-by: Christoph Hellwig <hch(a)lst.de>
> ---
> mm/hmm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
There sure are a lot of different ways to express this, but this one
looks OK to me, at least the switch from the PTRS_PER_PTE expression
in the x86 implementation to PMD_MASK looks equivalent.
Reviewed-by: Jason Gunthorpe <jgg(a)mellanox.com>
Jason