Hi everyone,
Here is the second batch of patches to add GK20A support to Nouveau. This time we are adding the actual chip support, and this series brings the driver to a point where a slightly-tweaked Mesa successfully runs shaders and renders triangles on GBM! Many thanks to Thierry Reding and the people on the #nouveau IRC channel for their help without which we would not have reached this milestone.
A few lines of hacks (not included here) are still needed to deal with cached mappings triggering external aborts and CPU/GPU memory coherency issues, but I hope to understand and address these issues next.
Most of the changes below have already been seen (and sometimes reviewed) in an earlier patchset. What has been added is proper PGRAPH support (still needing an external firmware and mostly reusing NVE4's code) as well as a better RAM implementation.
How to represent and manage VRAM has been the hardest part to deal with, since GK20A shares the system memory with the CPU without any kind of partition. I have tried various approaches (included some in which no RAM object ever gets instanciated) and finally decided to go for one using DMA-contiguous memory allocations and relying on BAR mappings for kernel access and exposure to user-space, as it fits better with existing code and keeps us safe from most of the CPU/GPU memory coherency issues (at the cost of some performance).
Looking forward to your review of these few patches! :)
Cheers, Alex.
Alexandre Courbot (12): drm/nouveau: fix missing newline drm/nouveau/timer: skip calibration on GK20A drm/nouveau/bar: only ioremap BAR3 if it exists drm/nouveau/bar/nvc0: support chips without BAR3 drm/nouveau/fifo: add GK20A support drm/nouveau/ibus: add GK20A support drm/nouveau/fb: add GK20A support drm/nouveau/graph: enable when using external firmware drm/nouveau/graph: pad firmware code at load time drm/nouveau/graph: add GK20A support drm/nouveau: support GK20A in nouveau_accel_init() drm/nouveau: support for probing GK20A
drivers/gpu/drm/nouveau/Makefile | 5 + drivers/gpu/drm/nouveau/core/engine/device/nve0.c | 20 +++ drivers/gpu/drm/nouveau/core/engine/fifo/nve0.h | 1 + drivers/gpu/drm/nouveau/core/engine/fifo/nvea.c | 35 +++++ .../gpu/drm/nouveau/core/engine/graph/ctxnve4.c | 4 +- drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c | 12 +- drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h | 9 ++ drivers/gpu/drm/nouveau/core/engine/graph/nve4.c | 2 +- drivers/gpu/drm/nouveau/core/engine/graph/nvea.c | 75 +++++++++ drivers/gpu/drm/nouveau/core/include/engine/fifo.h | 1 + .../gpu/drm/nouveau/core/include/engine/graph.h | 1 + drivers/gpu/drm/nouveau/core/include/subdev/fb.h | 1 + drivers/gpu/drm/nouveau/core/include/subdev/ibus.h | 1 + drivers/gpu/drm/nouveau/core/subdev/bar/base.c | 7 +- drivers/gpu/drm/nouveau/core/subdev/bar/nvc0.c | 101 +++++++------ drivers/gpu/drm/nouveau/core/subdev/fb/nvea.c | 56 +++++++ drivers/gpu/drm/nouveau/core/subdev/fb/priv.h | 1 + drivers/gpu/drm/nouveau/core/subdev/fb/ramnvea.c | 168 +++++++++++++++++++++ drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c | 110 ++++++++++++++ drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c | 19 ++- drivers/gpu/drm/nouveau/nouveau_drm.c | 12 +- 21 files changed, 578 insertions(+), 63 deletions(-) create mode 100644 drivers/gpu/drm/nouveau/core/engine/fifo/nvea.c create mode 100644 drivers/gpu/drm/nouveau/core/engine/graph/nvea.c create mode 100644 drivers/gpu/drm/nouveau/core/subdev/fb/nvea.c create mode 100644 drivers/gpu/drm/nouveau/core/subdev/fb/ramnvea.c create mode 100644 drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c
Add a missing newline at the end of a DRM_INFO message.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/nouveau_drm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index b2a674531fba..ef27949057c3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -322,7 +322,7 @@ nouveau_get_hdmi_dev(struct drm_device *dev) struct pci_dev *pdev = dev->pdev;
if (!pdev) { - DRM_INFO("not a PCI device; no HDMI"); + DRM_INFO("not a PCI device; no HDMI\n"); drm->hdmi_device = NULL; return; }
On Mon, Mar 24, 2014 at 05:42:23PM +0900, Alexandre Courbot wrote:
Add a missing newline at the end of a DRM_INFO message.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/nouveau_drm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Thierry Reding treding@nvidia.com
GK20A's timer is directly attached to the system timer and cannot be calibrated. Skip the calibration phase on that chip since the corresponding registers do not exist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c index c0bdd10358d7..822fe0d8a871 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c +++ b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c @@ -185,6 +185,10 @@ nv04_timer_init(struct nouveau_object *object) if (ret) return ret;
+ /* gk20a does not have the calibration registers */ + if (device->chipset == 0xea) + goto skip_clk_init; + /* aim for 31.25MHz, which gives us nanosecond timestamps */ d = 1000000 / 32;
@@ -235,20 +239,23 @@ nv04_timer_init(struct nouveau_object *object) d >>= 1; }
- /* restore the time before suspend */ - lo = priv->suspend_time; - hi = (priv->suspend_time >> 32); - nv_debug(priv, "input frequency : %dHz\n", f); nv_debug(priv, "input multiplier: %d\n", m); nv_debug(priv, "numerator : 0x%08x\n", n); nv_debug(priv, "denominator : 0x%08x\n", d); nv_debug(priv, "timer frequency : %dHz\n", (f * m) * d / n); - nv_debug(priv, "time low : 0x%08x\n", lo); - nv_debug(priv, "time high : 0x%08x\n", hi);
nv_wr32(priv, NV04_PTIMER_NUMERATOR, n); nv_wr32(priv, NV04_PTIMER_DENOMINATOR, d); + +skip_clk_init: + /* restore the time before suspend */ + lo = priv->suspend_time; + hi = (priv->suspend_time >> 32); + + nv_debug(priv, "time low : 0x%08x\n", lo); + nv_debug(priv, "time high : 0x%08x\n", hi); + nv_wr32(priv, NV04_PTIMER_INTR_0, 0xffffffff); nv_wr32(priv, NV04_PTIMER_INTR_EN_0, 0x00000000); nv_wr32(priv, NV04_PTIMER_TIME_1, hi);
On Mon, Mar 24, 2014 at 05:42:24PM +0900, Alexandre Courbot wrote:
GK20A's timer is directly attached to the system timer and cannot be calibrated. Skip the calibration phase on that chip since the corresponding registers do not exist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c index c0bdd10358d7..822fe0d8a871 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c +++ b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c @@ -185,6 +185,10 @@ nv04_timer_init(struct nouveau_object *object) if (ret) return ret;
- /* gk20a does not have the calibration registers */
- if (device->chipset == 0xea)
goto skip_clk_init;
I'm concerned that this won't scale in the future. Perhaps a better solution would be to add a "flags" or "features" field to struct nouveau_device along with feature bits such as HAS_CALIBRATION or similar.
That way we don't have to touch this code for every new future Tegra chip. Unless perhaps if there's a reason to expect things to change in newer generations.
Thierry
On Tue, Mar 25, 2014 at 7:54 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:24PM +0900, Alexandre Courbot wrote:
GK20A's timer is directly attached to the system timer and cannot be calibrated. Skip the calibration phase on that chip since the corresponding registers do not exist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c index c0bdd10358d7..822fe0d8a871 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c +++ b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c @@ -185,6 +185,10 @@ nv04_timer_init(struct nouveau_object *object) if (ret) return ret;
/* gk20a does not have the calibration registers */
if (device->chipset == 0xea)
goto skip_clk_init;
I'm concerned that this won't scale in the future. Perhaps a better solution would be to add a "flags" or "features" field to struct nouveau_device along with feature bits such as HAS_CALIBRATION or similar.
That way we don't have to touch this code for every new future Tegra chip. Unless perhaps if there's a reason to expect things to change in newer generations.
I've already handled this in a slightly different way in the tree I'd previously pointed Alex at (I think!), as I needed to do the same for GM107.
Should just be able to use that implementation (so, just change the probe patch) here too.
Thanks, Ben.
Thierry
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
On Wed, Mar 26, 2014 at 1:19 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Tue, Mar 25, 2014 at 7:54 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:24PM +0900, Alexandre Courbot wrote:
GK20A's timer is directly attached to the system timer and cannot be calibrated. Skip the calibration phase on that chip since the corresponding registers do not exist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c index c0bdd10358d7..822fe0d8a871 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c +++ b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c @@ -185,6 +185,10 @@ nv04_timer_init(struct nouveau_object *object) if (ret) return ret;
/* gk20a does not have the calibration registers */
if (device->chipset == 0xea)
goto skip_clk_init;
I'm concerned that this won't scale in the future. Perhaps a better solution would be to add a "flags" or "features" field to struct nouveau_device along with feature bits such as HAS_CALIBRATION or similar.
That way we don't have to touch this code for every new future Tegra chip. Unless perhaps if there's a reason to expect things to change in newer generations.
I've already handled this in a slightly different way in the tree I'd previously pointed Alex at (I think!), as I needed to do the same for GM107.
Should just be able to use that implementation (so, just change the probe patch) here too.
I will skip this patch and use your implementation then. Btw, shouldn't the source file for the GK20A implementation be named nvea.c instead of gk20a.c?
On Fri, Apr 11, 2014 at 12:46 PM, Alexandre Courbot gnurou@gmail.com wrote:
On Wed, Mar 26, 2014 at 1:19 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Tue, Mar 25, 2014 at 7:54 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:24PM +0900, Alexandre Courbot wrote:
GK20A's timer is directly attached to the system timer and cannot be calibrated. Skip the calibration phase on that chip since the corresponding registers do not exist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c index c0bdd10358d7..822fe0d8a871 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c +++ b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c @@ -185,6 +185,10 @@ nv04_timer_init(struct nouveau_object *object) if (ret) return ret;
/* gk20a does not have the calibration registers */
if (device->chipset == 0xea)
goto skip_clk_init;
I'm concerned that this won't scale in the future. Perhaps a better solution would be to add a "flags" or "features" field to struct nouveau_device along with feature bits such as HAS_CALIBRATION or similar.
That way we don't have to touch this code for every new future Tegra chip. Unless perhaps if there's a reason to expect things to change in newer generations.
I've already handled this in a slightly different way in the tree I'd previously pointed Alex at (I think!), as I needed to do the same for GM107.
Should just be able to use that implementation (so, just change the probe patch) here too.
I will skip this patch and use your implementation then. Btw, shouldn't the source file for the GK20A implementation be named nvea.c instead of gk20a.c?
For the Maxwell stuff I've been using "gm107" now too. Since we're working with you guys these days it seems better to use the same names for things ;)
On 04/11/2014 04:31 PM, Ben Skeggs wrote:
On Fri, Apr 11, 2014 at 12:46 PM, Alexandre Courbot gnurou@gmail.com wrote:
On Wed, Mar 26, 2014 at 1:19 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Tue, Mar 25, 2014 at 7:54 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:24PM +0900, Alexandre Courbot wrote:
GK20A's timer is directly attached to the system timer and cannot be calibrated. Skip the calibration phase on that chip since the corresponding registers do not exist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c index c0bdd10358d7..822fe0d8a871 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c +++ b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c @@ -185,6 +185,10 @@ nv04_timer_init(struct nouveau_object *object) if (ret) return ret;
/* gk20a does not have the calibration registers */
if (device->chipset == 0xea)
goto skip_clk_init;
I'm concerned that this won't scale in the future. Perhaps a better solution would be to add a "flags" or "features" field to struct nouveau_device along with feature bits such as HAS_CALIBRATION or similar.
That way we don't have to touch this code for every new future Tegra chip. Unless perhaps if there's a reason to expect things to change in newer generations.
I've already handled this in a slightly different way in the tree I'd previously pointed Alex at (I think!), as I needed to do the same for GM107.
Should just be able to use that implementation (so, just change the probe patch) here too.
I will skip this patch and use your implementation then. Btw, shouldn't the source file for the GK20A implementation be named nvea.c instead of gk20a.c?
For the Maxwell stuff I've been using "gm107" now too. Since we're working with you guys these days it seems better to use the same names for things ;)
So would you like us to use the same naming scheme as well? So far all my patches use "nvea.c" whenever I need to add code.
On Fri, Apr 11, 2014 at 5:34 PM, Alexandre Courbot acourbot@nvidia.com wrote:
On 04/11/2014 04:31 PM, Ben Skeggs wrote:
On Fri, Apr 11, 2014 at 12:46 PM, Alexandre Courbot gnurou@gmail.com wrote:
On Wed, Mar 26, 2014 at 1:19 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Tue, Mar 25, 2014 at 7:54 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:24PM +0900, Alexandre Courbot wrote:
GK20A's timer is directly attached to the system timer and cannot be calibrated. Skip the calibration phase on that chip since the corresponding registers do not exist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c index c0bdd10358d7..822fe0d8a871 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c +++ b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c @@ -185,6 +185,10 @@ nv04_timer_init(struct nouveau_object *object) if (ret) return ret;
/* gk20a does not have the calibration registers */
if (device->chipset == 0xea)
goto skip_clk_init;
I'm concerned that this won't scale in the future. Perhaps a better solution would be to add a "flags" or "features" field to struct nouveau_device along with feature bits such as HAS_CALIBRATION or similar.
That way we don't have to touch this code for every new future Tegra chip. Unless perhaps if there's a reason to expect things to change in newer generations.
I've already handled this in a slightly different way in the tree I'd previously pointed Alex at (I think!), as I needed to do the same for GM107.
Should just be able to use that implementation (so, just change the probe patch) here too.
I will skip this patch and use your implementation then. Btw, shouldn't the source file for the GK20A implementation be named nvea.c instead of gk20a.c?
For the Maxwell stuff I've been using "gm107" now too. Since we're working with you guys these days it seems better to use the same names for things ;)
So would you like us to use the same naming scheme as well? So far all my patches use "nvea.c" whenever I need to add code.
If it's not too much of a problem at this point, then that'd be good. Right before I send -next for the next merge window I'll likely do a mass rename anyway, so if we can get your patches merged before then (which would be really good!), it doesn't matter much.
Ben.
On Mon, Apr 14, 2014 at 5:35 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Fri, Apr 11, 2014 at 5:34 PM, Alexandre Courbot acourbot@nvidia.com wrote:
On 04/11/2014 04:31 PM, Ben Skeggs wrote:
On Fri, Apr 11, 2014 at 12:46 PM, Alexandre Courbot gnurou@gmail.com wrote:
On Wed, Mar 26, 2014 at 1:19 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Tue, Mar 25, 2014 at 7:54 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:24PM +0900, Alexandre Courbot wrote: > > GK20A's timer is directly attached to the system timer and cannot be > calibrated. Skip the calibration phase on that chip since the > corresponding registers do not exist. > > Signed-off-by: Alexandre Courbot acourbot@nvidia.com > --- > drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c | 19 > +++++++++++++------ > 1 file changed, 13 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c > b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c > index c0bdd10358d7..822fe0d8a871 100644 > --- a/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c > +++ b/drivers/gpu/drm/nouveau/core/subdev/timer/nv04.c > @@ -185,6 +185,10 @@ nv04_timer_init(struct nouveau_object *object) > if (ret) > return ret; > > + /* gk20a does not have the calibration registers */ > + if (device->chipset == 0xea) > + goto skip_clk_init;
I'm concerned that this won't scale in the future. Perhaps a better solution would be to add a "flags" or "features" field to struct nouveau_device along with feature bits such as HAS_CALIBRATION or similar.
That way we don't have to touch this code for every new future Tegra chip. Unless perhaps if there's a reason to expect things to change in newer generations.
I've already handled this in a slightly different way in the tree I'd previously pointed Alex at (I think!), as I needed to do the same for GM107.
Should just be able to use that implementation (so, just change the probe patch) here too.
I will skip this patch and use your implementation then. Btw, shouldn't the source file for the GK20A implementation be named nvea.c instead of gk20a.c?
For the Maxwell stuff I've been using "gm107" now too. Since we're working with you guys these days it seems better to use the same names for things ;)
So would you like us to use the same naming scheme as well? So far all my patches use "nvea.c" whenever I need to add code.
If it's not too much of a problem at this point, then that'd be good. Right before I send -next for the next merge window I'll likely do a mass rename anyway, so if we can get your patches merged before then (which would be really good!), it doesn't matter much.
No problem, I will update the naming to follow what you did with the timer driver and gm107.
Hopefully I will soon manage to carve out some time to rebase these patches and send v2.
Some chips that use system memory exclusively (e.g. GK20A) do not expose 2 BAR regions. For them only BAR1 exists, and it should be used for USERD mapping. Do not map BAR3 if its resource does not exist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/core/subdev/bar/base.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bar/base.c b/drivers/gpu/drm/nouveau/core/subdev/bar/base.c index bdf594116f3f..d713eeb75b13 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/bar/base.c +++ b/drivers/gpu/drm/nouveau/core/subdev/bar/base.c @@ -110,6 +110,7 @@ nouveau_bar_create_(struct nouveau_object *parent, { struct nouveau_device *device = nv_device(parent); struct nouveau_bar *bar; + bool has_bar3 = nv_device_resource_len(device, 3) != 0; int ret;
ret = nouveau_subdev_create_(parent, engine, oclass, 0, "BARCTL", @@ -118,8 +119,10 @@ nouveau_bar_create_(struct nouveau_object *parent, if (ret) return ret;
- bar->iomem = ioremap(nv_device_resource_start(device, 3), - nv_device_resource_len(device, 3)); + if (has_bar3) + bar->iomem = ioremap(nv_device_resource_start(device, 3), + nv_device_resource_len(device, 3)); + return 0; }
On Mon, Mar 24, 2014 at 05:42:25PM +0900, Alexandre Courbot wrote:
Some chips that use system memory exclusively (e.g. GK20A) do not expose 2 BAR regions. For them only BAR1 exists, and it should be used for USERD mapping. Do not map BAR3 if its resource does not exist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/subdev/bar/base.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bar/base.c b/drivers/gpu/drm/nouveau/core/subdev/bar/base.c index bdf594116f3f..d713eeb75b13 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/bar/base.c +++ b/drivers/gpu/drm/nouveau/core/subdev/bar/base.c @@ -110,6 +110,7 @@ nouveau_bar_create_(struct nouveau_object *parent, { struct nouveau_device *device = nv_device(parent); struct nouveau_bar *bar;
- bool has_bar3 = nv_device_resource_len(device, 3) != 0;
I don't think this variable is really necessary since you only use the expression once anyway, but I don't feel very strongly about it, so either way:
Reviewed-by: Thierry Reding treding@nvidia.com
On Tue, Mar 25, 2014 at 8:13 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:25PM +0900, Alexandre Courbot wrote:
Some chips that use system memory exclusively (e.g. GK20A) do not expose 2 BAR regions. For them only BAR1 exists, and it should be used for USERD mapping. Do not map BAR3 if its resource does not exist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/subdev/bar/base.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bar/base.c b/drivers/gpu/drm/nouveau/core/subdev/bar/base.c index bdf594116f3f..d713eeb75b13 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/bar/base.c +++ b/drivers/gpu/drm/nouveau/core/subdev/bar/base.c @@ -110,6 +110,7 @@ nouveau_bar_create_(struct nouveau_object *parent, { struct nouveau_device *device = nv_device(parent); struct nouveau_bar *bar;
bool has_bar3 = nv_device_resource_len(device, 3) != 0;
I don't think this variable is really necessary since you only use the expression once anyway, but I don't feel very strongly about it, so either way:
I'd also prefer it folded in, but it's not a big deal. A couple of other upcoming comments will require a re-spin anyway, so we may as well :)
Reviewed-by: Thierry Reding treding@nvidia.com
Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
Adapt the NVC0 BAR driver to make it able to support chips that do not expose a BAR3. When this happens, BAR1 is then used for USERD mapping and the BAR alloc() functions is disabled, making GPU objects unable to rely on BAR for data access and falling back to PRAMIN.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/core/subdev/bar/nvc0.c | 101 +++++++++++++------------ 1 file changed, 52 insertions(+), 49 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bar/nvc0.c b/drivers/gpu/drm/nouveau/core/subdev/bar/nvc0.c index 3f30db62e656..5da1b9447af0 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/bar/nvc0.c +++ b/drivers/gpu/drm/nouveau/core/subdev/bar/nvc0.c @@ -79,87 +79,88 @@ nvc0_bar_unmap(struct nouveau_bar *bar, struct nouveau_vma *vma) }
static int -nvc0_bar_ctor(struct nouveau_object *parent, struct nouveau_object *engine, - struct nouveau_oclass *oclass, void *data, u32 size, - struct nouveau_object **pobject) +nvc0_bar_init_vm(struct nvc0_bar_priv *priv, int nr, int bar) { - struct nouveau_device *device = nv_device(parent); - struct nvc0_bar_priv *priv; + struct nouveau_device *device = nv_device(&priv->base); struct nouveau_gpuobj *mem; struct nouveau_vm *vm; + resource_size_t bar_len; int ret;
- ret = nouveau_bar_create(parent, engine, oclass, &priv); - *pobject = nv_object(priv); - if (ret) - return ret; - - /* BAR3 */ ret = nouveau_gpuobj_new(nv_object(priv), NULL, 0x1000, 0, 0, - &priv->bar[0].mem); - mem = priv->bar[0].mem; + &priv->bar[nr].mem); + mem = priv->bar[nr].mem; if (ret) return ret;
ret = nouveau_gpuobj_new(nv_object(priv), NULL, 0x8000, 0, 0, - &priv->bar[0].pgd); + &priv->bar[nr].pgd); if (ret) return ret;
- ret = nouveau_vm_new(device, 0, nv_device_resource_len(device, 3), 0, &vm); + bar_len = nv_device_resource_len(device, bar); + + ret = nouveau_vm_new(device, 0, bar_len, 0, &vm); if (ret) return ret;
atomic_inc(&vm->engref[NVDEV_SUBDEV_BAR]);
- ret = nouveau_gpuobj_new(nv_object(priv), NULL, - (nv_device_resource_len(device, 3) >> 12) * 8, - 0x1000, NVOBJ_FLAG_ZERO_ALLOC, - &vm->pgt[0].obj[0]); - vm->pgt[0].refcount[0] = 1; - if (ret) - return ret; + /* + * Bootstrap page table lookup. + */ + if (bar == 3) { + ret = nouveau_gpuobj_new(nv_object(priv), NULL, + (bar_len >> 12) * 8, 0x1000, + NVOBJ_FLAG_ZERO_ALLOC, + &vm->pgt[0].obj[0]); + vm->pgt[0].refcount[0] = 1; + if (ret) + return ret; + }
- ret = nouveau_vm_ref(vm, &priv->bar[0].vm, priv->bar[0].pgd); + ret = nouveau_vm_ref(vm, &priv->bar[nr].vm, priv->bar[nr].pgd); nouveau_vm_ref(NULL, &vm, NULL); if (ret) return ret;
- nv_wo32(mem, 0x0200, lower_32_bits(priv->bar[0].pgd->addr)); - nv_wo32(mem, 0x0204, upper_32_bits(priv->bar[0].pgd->addr)); - nv_wo32(mem, 0x0208, lower_32_bits(nv_device_resource_len(device, 3) - 1)); - nv_wo32(mem, 0x020c, upper_32_bits(nv_device_resource_len(device, 3) - 1)); + nv_wo32(mem, 0x0200, lower_32_bits(priv->bar[nr].pgd->addr)); + nv_wo32(mem, 0x0204, upper_32_bits(priv->bar[nr].pgd->addr)); + nv_wo32(mem, 0x0208, lower_32_bits(bar_len - 1)); + nv_wo32(mem, 0x020c, upper_32_bits(bar_len - 1));
- /* BAR1 */ - ret = nouveau_gpuobj_new(nv_object(priv), NULL, 0x1000, 0, 0, - &priv->bar[1].mem); - mem = priv->bar[1].mem; - if (ret) - return ret; + return 0; +}
- ret = nouveau_gpuobj_new(nv_object(priv), NULL, 0x8000, 0, 0, - &priv->bar[1].pgd); - if (ret) - return ret; +static int +nvc0_bar_ctor(struct nouveau_object *parent, struct nouveau_object *engine, + struct nouveau_oclass *oclass, void *data, u32 size, + struct nouveau_object **pobject) +{ + struct nouveau_device *device = nv_device(parent); + struct nvc0_bar_priv *priv; + bool has_bar3 = nv_device_resource_len(device, 3) != 0; + int ret;
- ret = nouveau_vm_new(device, 0, nv_device_resource_len(device, 1), 0, &vm); + ret = nouveau_bar_create(parent, engine, oclass, &priv); + *pobject = nv_object(priv); if (ret) return ret;
- atomic_inc(&vm->engref[NVDEV_SUBDEV_BAR]); + /* BAR3 */ + if (has_bar3) { + ret = nvc0_bar_init_vm(priv, 0, 3); + if (ret) + return ret; + priv->base.alloc = nouveau_bar_alloc; + priv->base.kmap = nvc0_bar_kmap; + }
- ret = nouveau_vm_ref(vm, &priv->bar[1].vm, priv->bar[1].pgd); - nouveau_vm_ref(NULL, &vm, NULL); + /* BAR1 */ + ret = nvc0_bar_init_vm(priv, 1, 1); if (ret) return ret;
- nv_wo32(mem, 0x0200, lower_32_bits(priv->bar[1].pgd->addr)); - nv_wo32(mem, 0x0204, upper_32_bits(priv->bar[1].pgd->addr)); - nv_wo32(mem, 0x0208, lower_32_bits(nv_device_resource_len(device, 1) - 1)); - nv_wo32(mem, 0x020c, upper_32_bits(nv_device_resource_len(device, 1) - 1)); - - priv->base.alloc = nouveau_bar_alloc; - priv->base.kmap = nvc0_bar_kmap; priv->base.umap = nvc0_bar_umap; priv->base.unmap = nvc0_bar_unmap; priv->base.flush = nv84_bar_flush; @@ -201,7 +202,9 @@ nvc0_bar_init(struct nouveau_object *object) nv_mask(priv, 0x100c80, 0x00000001, 0x00000000);
nv_wr32(priv, 0x001704, 0x80000000 | priv->bar[1].mem->addr >> 12); - nv_wr32(priv, 0x001714, 0xc0000000 | priv->bar[0].mem->addr >> 12); + if (priv->bar[0].mem) + nv_wr32(priv, 0x001714, + 0xc0000000 | priv->bar[0].mem->addr >> 12); return 0; }
On Mon, Mar 24, 2014 at 05:42:26PM +0900, Alexandre Courbot wrote: [...]
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bar/nvc0.c b/drivers/gpu/drm/nouveau/core/subdev/bar/nvc0.c
[...]
static int -nvc0_bar_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
struct nouveau_oclass *oclass, void *data, u32 size,
struct nouveau_object **pobject)
+nvc0_bar_init_vm(struct nvc0_bar_priv *priv, int nr, int bar) {
[...]
- /* BAR3 */ ret = nouveau_gpuobj_new(nv_object(priv), NULL, 0x1000, 0, 0,
&priv->bar[0].mem);
- mem = priv->bar[0].mem;
&priv->bar[nr].mem);
mem = priv->bar[nr].mem; if (ret) return ret;
ret = nouveau_gpuobj_new(nv_object(priv), NULL, 0x8000, 0, 0,
&priv->bar[0].pgd);
if (ret) return ret;&priv->bar[nr].pgd);
[...]
+static int +nvc0_bar_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
struct nouveau_oclass *oclass, void *data, u32 size,
struct nouveau_object **pobject)
+{
[...]
- /* BAR3 */
- if (has_bar3) {
ret = nvc0_bar_init_vm(priv, 0, 3);
[...]
- /* BAR1 */
- ret = nvc0_bar_init_vm(priv, 1, 1); if (ret) return ret;
The calls to nvc0_bar_init_vm() are somewhat confusing in my opinion. It is hard to see from the invocation what these numbers mean and therefore distinguish which parameter is which.
Perhaps a slightly more readable way would be to pass in a pointer to a structure as second parameter instead of the index into an array. So it'd look somewhat like this:
if (has_bar3) { ret = nvc0_bar_init_vm(priv, &priv->bar[0], 3); ... } ... ret = nvc0_bar_init_vm(priv, &priv->bar[1], 1); ...
Unfortunately that would require a new type to be created for the bar[] structures, so it'd be slightly more intrusive.
Thierry
On Tue, Mar 25, 2014 at 7:10 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:26PM +0900, Alexandre Courbot wrote: [...]
diff --git a/drivers/gpu/drm/nouveau/core/subdev/bar/nvc0.c b/drivers/gpu/drm/nouveau/core/subdev/bar/nvc0.c
[...]
static int -nvc0_bar_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
struct nouveau_oclass *oclass, void *data, u32 size,
struct nouveau_object **pobject)
+nvc0_bar_init_vm(struct nvc0_bar_priv *priv, int nr, int bar) {
[...]
/* BAR3 */ ret = nouveau_gpuobj_new(nv_object(priv), NULL, 0x1000, 0, 0,
&priv->bar[0].mem);
mem = priv->bar[0].mem;
&priv->bar[nr].mem);
mem = priv->bar[nr].mem; if (ret) return ret; ret = nouveau_gpuobj_new(nv_object(priv), NULL, 0x8000, 0, 0,
&priv->bar[0].pgd);
&priv->bar[nr].pgd); if (ret) return ret;
[...]
+static int +nvc0_bar_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
struct nouveau_oclass *oclass, void *data, u32 size,
struct nouveau_object **pobject)
+{
[...]
/* BAR3 */
if (has_bar3) {
ret = nvc0_bar_init_vm(priv, 0, 3);
[...]
/* BAR1 */
ret = nvc0_bar_init_vm(priv, 1, 1); if (ret) return ret;
The calls to nvc0_bar_init_vm() are somewhat confusing in my opinion. It is hard to see from the invocation what these numbers mean and therefore distinguish which parameter is which.
Perhaps a slightly more readable way would be to pass in a pointer to a structure as second parameter instead of the index into an array. So it'd look somewhat like this:
if (has_bar3) { ret = nvc0_bar_init_vm(priv, &priv->bar[0], 3); ... } ... ret = nvc0_bar_init_vm(priv, &priv->bar[1], 1); ...
Unfortunately that would require a new type to be created for the bar[] structures, so it'd be slightly more intrusive.
These types are local to nvc0.c anyway, so I don't think it would hurt. And you are right that the code would become more readable as a result, passing array indexes as arguments is not a common practice (and should not be).
Alex.
GK20A's FIFO is compatible with NVE0, but only features 128 channels and 1 runlist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/Makefile | 1 + drivers/gpu/drm/nouveau/core/engine/fifo/nve0.h | 1 + drivers/gpu/drm/nouveau/core/engine/fifo/nvea.c | 35 ++++++++++++++++++++++ drivers/gpu/drm/nouveau/core/include/engine/fifo.h | 1 + 4 files changed, 38 insertions(+) create mode 100644 drivers/gpu/drm/nouveau/core/engine/fifo/nvea.c
diff --git a/drivers/gpu/drm/nouveau/Makefile b/drivers/gpu/drm/nouveau/Makefile index d310c195bdfe..a90087bbdf88 100644 --- a/drivers/gpu/drm/nouveau/Makefile +++ b/drivers/gpu/drm/nouveau/Makefile @@ -237,6 +237,7 @@ nouveau-y += core/engine/fifo/nv50.o nouveau-y += core/engine/fifo/nv84.o nouveau-y += core/engine/fifo/nvc0.o nouveau-y += core/engine/fifo/nve0.o +nouveau-y += core/engine/fifo/nvea.o nouveau-y += core/engine/fifo/nv108.o nouveau-y += core/engine/graph/ctxnv40.o nouveau-y += core/engine/graph/ctxnv50.o diff --git a/drivers/gpu/drm/nouveau/core/engine/fifo/nve0.h b/drivers/gpu/drm/nouveau/core/engine/fifo/nve0.h index 014344ebee66..e96b32bb1bbc 100644 --- a/drivers/gpu/drm/nouveau/core/engine/fifo/nve0.h +++ b/drivers/gpu/drm/nouveau/core/engine/fifo/nve0.h @@ -8,6 +8,7 @@ int nve0_fifo_ctor(struct nouveau_object *, struct nouveau_object *, struct nouveau_object **); void nve0_fifo_dtor(struct nouveau_object *); int nve0_fifo_init(struct nouveau_object *); +int nve0_fifo_fini(struct nouveau_object *, bool);
struct nve0_fifo_impl { struct nouveau_oclass base; diff --git a/drivers/gpu/drm/nouveau/core/engine/fifo/nvea.c b/drivers/gpu/drm/nouveau/core/engine/fifo/nvea.c new file mode 100644 index 000000000000..5e3a9514df5d --- /dev/null +++ b/drivers/gpu/drm/nouveau/core/engine/fifo/nvea.c @@ -0,0 +1,35 @@ +/* + * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +#include "nve0.h" + +struct nouveau_oclass * +nvea_fifo_oclass = &(struct nve0_fifo_impl) { + .base.handle = NV_ENGINE(FIFO, 0xea), + .base.ofuncs = &(struct nouveau_ofuncs) { + .ctor = nve0_fifo_ctor, + .dtor = nve0_fifo_dtor, + .init = nve0_fifo_init, + .fini = nve0_fifo_fini, + }, + .channels = 128, +}.base; diff --git a/drivers/gpu/drm/nouveau/core/include/engine/fifo.h b/drivers/gpu/drm/nouveau/core/include/engine/fifo.h index 26b6b2bb1112..823356f45137 100644 --- a/drivers/gpu/drm/nouveau/core/include/engine/fifo.h +++ b/drivers/gpu/drm/nouveau/core/include/engine/fifo.h @@ -109,6 +109,7 @@ extern struct nouveau_oclass *nv50_fifo_oclass; extern struct nouveau_oclass *nv84_fifo_oclass; extern struct nouveau_oclass *nvc0_fifo_oclass; extern struct nouveau_oclass *nve0_fifo_oclass; +extern struct nouveau_oclass *nvea_fifo_oclass; extern struct nouveau_oclass *nv108_fifo_oclass;
void nv04_fifo_intr(struct nouveau_subdev *);
On Mon, Mar 24, 2014 at 05:42:27PM +0900, Alexandre Courbot wrote:
GK20A's FIFO is compatible with NVE0, but only features 128 channels and 1 runlist.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/Makefile | 1 + drivers/gpu/drm/nouveau/core/engine/fifo/nve0.h | 1 + drivers/gpu/drm/nouveau/core/engine/fifo/nvea.c | 35 ++++++++++++++++++++++ drivers/gpu/drm/nouveau/core/include/engine/fifo.h | 1 + 4 files changed, 38 insertions(+) create mode 100644 drivers/gpu/drm/nouveau/core/engine/fifo/nvea.c
Looks good to me:
Reviewed-by: Thierry Reding treding@nvidia.com
Add support for initializing the priv ring of GK20A. This is done by the BIOS on desktop GPUs, but needs to be done by hand on Tegra.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/Makefile | 1 + drivers/gpu/drm/nouveau/core/include/subdev/ibus.h | 1 + drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c | 110 +++++++++++++++++++++ 3 files changed, 112 insertions(+) create mode 100644 drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c
diff --git a/drivers/gpu/drm/nouveau/Makefile b/drivers/gpu/drm/nouveau/Makefile index a90087bbdf88..592141e62dda 100644 --- a/drivers/gpu/drm/nouveau/Makefile +++ b/drivers/gpu/drm/nouveau/Makefile @@ -132,6 +132,7 @@ nouveau-y += core/subdev/i2c/nv94.o nouveau-y += core/subdev/i2c/nvd0.o nouveau-y += core/subdev/ibus/nvc0.o nouveau-y += core/subdev/ibus/nve0.o +nouveau-y += core/subdev/ibus/nvea.o nouveau-y += core/subdev/instmem/base.o nouveau-y += core/subdev/instmem/nv04.o nouveau-y += core/subdev/instmem/nv40.o diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/ibus.h b/drivers/gpu/drm/nouveau/core/include/subdev/ibus.h index 88814f159d89..056a42f92705 100644 --- a/drivers/gpu/drm/nouveau/core/include/subdev/ibus.h +++ b/drivers/gpu/drm/nouveau/core/include/subdev/ibus.h @@ -30,5 +30,6 @@ nouveau_ibus(void *obj)
extern struct nouveau_oclass nvc0_ibus_oclass; extern struct nouveau_oclass nve0_ibus_oclass; +extern struct nouveau_oclass nvea_ibus_oclass;
#endif diff --git a/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c b/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c new file mode 100644 index 000000000000..151851286e99 --- /dev/null +++ b/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c @@ -0,0 +1,110 @@ +/* + * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +#include <subdev/ibus.h> + +struct nvea_ibus_priv { + struct nouveau_ibus base; +}; + +static void +nvea_ibus_init_priv_ring(struct nvea_ibus_priv *priv) +{ + nv_mask(priv, 0x137250, 0x3f, 0); + + nv_mask(priv, 0x000200, 0x20, 0); + udelay(20); + nv_mask(priv, 0x000200, 0x20, 0x20); + + nv_wr32(priv, 0x12004c, 0x4); + nv_wr32(priv, 0x122204, 0x2); + nv_rd32(priv, 0x122204); +} + +static void +nvea_ibus_intr(struct nouveau_subdev *subdev) +{ + struct nvea_ibus_priv *priv = (void *)subdev; + u32 status0 = nv_rd32(priv, 0x120058); + s32 retry = 100; + u32 command; + + if (status0 & 0x7) { + nv_debug(priv, "resetting priv ring\n"); + nvea_ibus_init_priv_ring(priv); + } + + /* Acknowledge interrupt */ + nv_mask(priv, 0x12004c, 0x2, 0x2); + + while (--retry >= 0) { + command = nv_rd32(priv, 0x12004c) & 0x3f; + if (command == 0) + break; + } + + if (retry < 0) + nv_warn(priv, "timeout waiting for ringmaster ack\n"); +} + +static int +nvea_ibus_init(struct nouveau_object *object) +{ + struct nvea_ibus_priv *priv = (void *)object; + int ret; + + ret = _nouveau_ibus_init(object); + if (ret) + return ret; + + nvea_ibus_init_priv_ring(priv); + + return 0; +} + +static int +nvea_ibus_ctor(struct nouveau_object *parent, struct nouveau_object *engine, + struct nouveau_oclass *oclass, void *data, u32 size, + struct nouveau_object **pobject) +{ + struct nvea_ibus_priv *priv; + int ret; + + ret = nouveau_ibus_create(parent, engine, oclass, &priv); + *pobject = nv_object(priv); + if (ret) + return ret; + + nv_subdev(priv)->intr = nvea_ibus_intr; + return 0; +} + +struct nouveau_oclass +nvea_ibus_oclass = { + .handle = NV_SUBDEV(IBUS, 0xea), + .ofuncs = &(struct nouveau_ofuncs) { + .ctor = nvea_ibus_ctor, + .dtor = _nouveau_ibus_dtor, + .init = nvea_ibus_init, + .fini = _nouveau_ibus_fini, + }, +};
On Mon, Mar 24, 2014 at 05:42:28PM +0900, Alexandre Courbot wrote: [...]
diff --git a/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c b/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c
[...]
+#include <subdev/ibus.h>
+struct nvea_ibus_priv {
- struct nouveau_ibus base;
+};
+static void +nvea_ibus_init_priv_ring(struct nvea_ibus_priv *priv) +{
- nv_mask(priv, 0x137250, 0x3f, 0);
- nv_mask(priv, 0x000200, 0x20, 0);
- udelay(20);
usleep_range()?
+static void +nvea_ibus_intr(struct nouveau_subdev *subdev) +{
[...]
- /* Acknowledge interrupt */
- nv_mask(priv, 0x12004c, 0x2, 0x2);
- while (--retry >= 0) {
command = nv_rd32(priv, 0x12004c) & 0x3f;
if (command == 0)
break;
- }
- if (retry < 0)
nv_warn(priv, "timeout waiting for ringmaster ack\n");
+}
Perhaps I'm being paranoid, but this loop now depends on the frequency of the various clocks involved and therefore might break at some point if the frequencies get sufficiently high.
So a slightly safer implementation would use a proper timeout using a combination of msecs_to_jiffies(), time_before() and usleep_range(), like so:
timeout = jiffies + msecs_to_jiffies(...);
while (time_before(jiffies, timeout)) { command = nv_rd32(...) & 0x3f; if (command == 0) break;
usleep_range(...); }
if (time_after(jiffies, timeout)) nv_warn(...);
This assumes that there's some known timeout after which the ringmaster is expected to have acked the interrupt. On that note, I wonder if the warning is accurate here: it's my understanding that writing 0x2 to the register does acknowledge the interrupt, so the ringmaster does in fact "clear" it rather than "acknowledge" it, doesn't it?
Although now that I mention it I seem to remember that this write is actually sending a command to the ring master and perhaps waiting for the register to return to 0 is indeed waiting for an ACK of sorts. Maybe adding a comment or so describing what this sequence does would be appropriate here?
Thierry
On Tue, Mar 25, 2014 at 7:34 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:28PM +0900, Alexandre Courbot wrote: [...]
diff --git a/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c b/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c
[...]
+#include <subdev/ibus.h>
+struct nvea_ibus_priv {
struct nouveau_ibus base;
+};
+static void +nvea_ibus_init_priv_ring(struct nvea_ibus_priv *priv) +{
nv_mask(priv, 0x137250, 0x3f, 0);
nv_mask(priv, 0x000200, 0x20, 0);
udelay(20);
usleep_range()?
Sure.
+static void +nvea_ibus_intr(struct nouveau_subdev *subdev) +{
[...]
/* Acknowledge interrupt */
nv_mask(priv, 0x12004c, 0x2, 0x2);
while (--retry >= 0) {
command = nv_rd32(priv, 0x12004c) & 0x3f;
if (command == 0)
break;
}
if (retry < 0)
nv_warn(priv, "timeout waiting for ringmaster ack\n");
+}
Perhaps I'm being paranoid, but this loop now depends on the frequency of the various clocks involved and therefore might break at some point if the frequencies get sufficiently high.
So a slightly safer implementation would use a proper timeout using a combination of msecs_to_jiffies(), time_before() and usleep_range(), like so:
timeout = jiffies + msecs_to_jiffies(...); while (time_before(jiffies, timeout)) { command = nv_rd32(...) & 0x3f; if (command == 0) break; usleep_range(...); } if (time_after(jiffies, timeout)) nv_warn(...);
Right, now that I look at this code again I don't even understand why I left it this way. Maybe I left some early test code slip into the final patch, sorry about that.
This assumes that there's some known timeout after which the ringmaster is expected to have acked the interrupt. On that note, I wonder if the warning is accurate here: it's my understanding that writing 0x2 to the register does acknowledge the interrupt, so the ringmaster does in fact "clear" it rather than "acknowledge" it, doesn't it?
Although now that I mention it I seem to remember that this write is actually sending a command to the ring master and perhaps waiting for the register to return to 0 is indeed waiting for an ACK of sorts. Maybe adding a comment or so describing what this sequence does would be appropriate here?
Can we from an IP point of view? AFAIK this sequence has never been publicly documented.
On Wed, Apr 2, 2014 at 9:52 AM, Alexandre Courbot gnurou@gmail.com wrote:
On Tue, Mar 25, 2014 at 7:34 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:28PM +0900, Alexandre Courbot wrote: [...]
diff --git a/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c b/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c
[...]
+#include <subdev/ibus.h>
+struct nvea_ibus_priv {
struct nouveau_ibus base;
+};
+static void +nvea_ibus_init_priv_ring(struct nvea_ibus_priv *priv) +{
nv_mask(priv, 0x137250, 0x3f, 0);
nv_mask(priv, 0x000200, 0x20, 0);
udelay(20);
usleep_range()?
Sure.
+static void +nvea_ibus_intr(struct nouveau_subdev *subdev) +{
[...]
/* Acknowledge interrupt */
nv_mask(priv, 0x12004c, 0x2, 0x2);
while (--retry >= 0) {
command = nv_rd32(priv, 0x12004c) & 0x3f;
if (command == 0)
break;
}
if (retry < 0)
nv_warn(priv, "timeout waiting for ringmaster ack\n");
+}
Perhaps I'm being paranoid, but this loop now depends on the frequency of the various clocks involved and therefore might break at some point if the frequencies get sufficiently high.
So a slightly safer implementation would use a proper timeout using a combination of msecs_to_jiffies(), time_before() and usleep_range(), like so:
timeout = jiffies + msecs_to_jiffies(...); while (time_before(jiffies, timeout)) { command = nv_rd32(...) & 0x3f; if (command == 0) break; usleep_range(...); } if (time_after(jiffies, timeout)) nv_warn(...);
Right, now that I look at this code again I don't even understand why I left it this way. Maybe I left some early test code slip into the final patch, sorry about that.
I just remembered about this, but there's also the nv_wait() macro, which you could use, e.g.
if (!nv_wait(subdev, 0x12004c, 0x3f, 0x00)) nv_warn(...)
It has built-in timeout logic/etc (although no sleeps in the middle). It does use the timer subdev, so if that's not operational at this point, you can't use it.
This assumes that there's some known timeout after which the ringmaster is expected to have acked the interrupt. On that note, I wonder if the warning is accurate here: it's my understanding that writing 0x2 to the register does acknowledge the interrupt, so the ringmaster does in fact "clear" it rather than "acknowledge" it, doesn't it?
Although now that I mention it I seem to remember that this write is actually sending a command to the ring master and perhaps waiting for the register to return to 0 is indeed waiting for an ACK of sorts. Maybe adding a comment or so describing what this sequence does would be appropriate here?
Can we from an IP point of view? AFAIK this sequence has never been publicly documented. _______________________________________________ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
On Wed, Apr 2, 2014 at 11:18 PM, Ilia Mirkin imirkin@alum.mit.edu wrote:
On Wed, Apr 2, 2014 at 9:52 AM, Alexandre Courbot gnurou@gmail.com wrote:
On Tue, Mar 25, 2014 at 7:34 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:28PM +0900, Alexandre Courbot wrote: [...]
diff --git a/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c b/drivers/gpu/drm/nouveau/core/subdev/ibus/nvea.c
[...]
+#include <subdev/ibus.h>
+struct nvea_ibus_priv {
struct nouveau_ibus base;
+};
+static void +nvea_ibus_init_priv_ring(struct nvea_ibus_priv *priv) +{
nv_mask(priv, 0x137250, 0x3f, 0);
nv_mask(priv, 0x000200, 0x20, 0);
udelay(20);
usleep_range()?
Sure.
+static void +nvea_ibus_intr(struct nouveau_subdev *subdev) +{
[...]
/* Acknowledge interrupt */
nv_mask(priv, 0x12004c, 0x2, 0x2);
while (--retry >= 0) {
command = nv_rd32(priv, 0x12004c) & 0x3f;
if (command == 0)
break;
}
if (retry < 0)
nv_warn(priv, "timeout waiting for ringmaster ack\n");
+}
Perhaps I'm being paranoid, but this loop now depends on the frequency of the various clocks involved and therefore might break at some point if the frequencies get sufficiently high.
So a slightly safer implementation would use a proper timeout using a combination of msecs_to_jiffies(), time_before() and usleep_range(), like so:
timeout = jiffies + msecs_to_jiffies(...); while (time_before(jiffies, timeout)) { command = nv_rd32(...) & 0x3f; if (command == 0) break; usleep_range(...); } if (time_after(jiffies, timeout)) nv_warn(...);
Right, now that I look at this code again I don't even understand why I left it this way. Maybe I left some early test code slip into the final patch, sorry about that.
I just remembered about this, but there's also the nv_wait() macro, which you could use, e.g.
if (!nv_wait(subdev, 0x12004c, 0x3f, 0x00)) nv_warn(...)
It has built-in timeout logic/etc (although no sleeps in the middle). It does use the timer subdev, so if that's not operational at this point, you can't use it.
IBUS comes after TIMER in the nv_subdev_type enum, so I guess that should work. Will try this solution, thanks!
Add a simple FB device for GK20A, as well as a RAM implementation based on contiguous DMA memory allocations suitable for chips that use system memory as video RAM.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/Makefile | 2 + drivers/gpu/drm/nouveau/core/include/subdev/fb.h | 1 + drivers/gpu/drm/nouveau/core/subdev/fb/nvea.c | 56 ++++++++ drivers/gpu/drm/nouveau/core/subdev/fb/priv.h | 1 + drivers/gpu/drm/nouveau/core/subdev/fb/ramnvea.c | 168 +++++++++++++++++++++++ 5 files changed, 228 insertions(+) create mode 100644 drivers/gpu/drm/nouveau/core/subdev/fb/nvea.c create mode 100644 drivers/gpu/drm/nouveau/core/subdev/fb/ramnvea.c
diff --git a/drivers/gpu/drm/nouveau/Makefile b/drivers/gpu/drm/nouveau/Makefile index 592141e62dda..708d2e33835f 100644 --- a/drivers/gpu/drm/nouveau/Makefile +++ b/drivers/gpu/drm/nouveau/Makefile @@ -100,6 +100,7 @@ nouveau-y += core/subdev/fb/nvaa.o nouveau-y += core/subdev/fb/nvaf.o nouveau-y += core/subdev/fb/nvc0.o nouveau-y += core/subdev/fb/nve0.o +nouveau-y += core/subdev/fb/nvea.o nouveau-y += core/subdev/fb/ramnv04.o nouveau-y += core/subdev/fb/ramnv10.o nouveau-y += core/subdev/fb/ramnv1a.o @@ -114,6 +115,7 @@ nouveau-y += core/subdev/fb/ramnva3.o nouveau-y += core/subdev/fb/ramnvaa.o nouveau-y += core/subdev/fb/ramnvc0.o nouveau-y += core/subdev/fb/ramnve0.o +nouveau-y += core/subdev/fb/ramnvea.o nouveau-y += core/subdev/fb/sddr3.o nouveau-y += core/subdev/fb/gddr5.o nouveau-y += core/subdev/gpio/base.o diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/fb.h b/drivers/gpu/drm/nouveau/core/include/subdev/fb.h index d7ecafbae1ca..3905816755ba 100644 --- a/drivers/gpu/drm/nouveau/core/include/subdev/fb.h +++ b/drivers/gpu/drm/nouveau/core/include/subdev/fb.h @@ -105,6 +105,7 @@ extern struct nouveau_oclass *nvaa_fb_oclass; extern struct nouveau_oclass *nvaf_fb_oclass; extern struct nouveau_oclass *nvc0_fb_oclass; extern struct nouveau_oclass *nve0_fb_oclass; +extern struct nouveau_oclass *nvea_fb_oclass;
#include <subdev/bios/ramcfg.h>
diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/nvea.c b/drivers/gpu/drm/nouveau/core/subdev/fb/nvea.c new file mode 100644 index 000000000000..62dbec48481e --- /dev/null +++ b/drivers/gpu/drm/nouveau/core/subdev/fb/nvea.c @@ -0,0 +1,56 @@ +/* + * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +#include "nvc0.h" + +struct nvea_fb_priv { + struct nouveau_fb base; +}; + +static int +nvea_fb_ctor(struct nouveau_object *parent, struct nouveau_object *engine, + struct nouveau_oclass *oclass, void *data, u32 size, + struct nouveau_object **pobject) +{ + struct nvea_fb_priv *priv; + int ret; + + ret = nouveau_fb_create(parent, engine, oclass, &priv); + *pobject = nv_object(priv); + if (ret) + return ret; + + return 0; +} + +struct nouveau_oclass * +nvea_fb_oclass = &(struct nouveau_fb_impl) { + .base.handle = NV_SUBDEV(FB, 0xea), + .base.ofuncs = &(struct nouveau_ofuncs) { + .ctor = nvea_fb_ctor, + .dtor = _nouveau_fb_dtor, + .init = _nouveau_fb_init, + .fini = _nouveau_fb_fini, + }, + .memtype = nvc0_fb_memtype_valid, + .ram = &nvea_ram_oclass, +}.base; diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/priv.h b/drivers/gpu/drm/nouveau/core/subdev/fb/priv.h index edaf95dee612..0b95a25504d3 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/fb/priv.h +++ b/drivers/gpu/drm/nouveau/core/subdev/fb/priv.h @@ -32,6 +32,7 @@ extern struct nouveau_oclass nva3_ram_oclass; extern struct nouveau_oclass nvaa_ram_oclass; extern struct nouveau_oclass nvc0_ram_oclass; extern struct nouveau_oclass nve0_ram_oclass; +extern struct nouveau_oclass nvea_ram_oclass;
int nouveau_sddr3_calc(struct nouveau_ram *ram); int nouveau_gddr5_calc(struct nouveau_ram *ram, bool nuts); diff --git a/drivers/gpu/drm/nouveau/core/subdev/fb/ramnvea.c b/drivers/gpu/drm/nouveau/core/subdev/fb/ramnvea.c new file mode 100644 index 000000000000..4e3d757614b6 --- /dev/null +++ b/drivers/gpu/drm/nouveau/core/subdev/fb/ramnvea.c @@ -0,0 +1,168 @@ +/* + * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +#include "priv.h" + +#include <subdev/fb.h> + +#include <linux/mm.h> +#include <linux/types.h> +#include <linux/dma-contiguous.h> + +static void +nvea_ram_put(struct nouveau_fb *pfb, struct nouveau_mem **pmem) +{ + struct device *dev = nv_device_base(nv_device(pfb)); + struct nouveau_mem *mem = *pmem; + int i; + + *pmem = NULL; + + for (i = 0; i < mem->size; i++) { + struct page *page; + + if (mem->pages[i] == 0) + break; + + page = pfn_to_page(dma_to_pfn(dev, mem->pages[i])); + dma_release_from_contiguous(dev, page, 1); + } + + kfree(mem->pages); + kfree(mem); +} + +static int +nvea_ram_get(struct nouveau_fb *pfb, u64 size, u32 align, u32 ncmin, + u32 memtype, struct nouveau_mem **pmem) +{ + struct device *dev = nv_device_base(nv_device(pfb)); + struct nouveau_mem *mem; + int type = memtype & 0xff; + dma_addr_t dma_addr; + int npages; + int order; + int i; + + nv_debug(pfb, "%s: size: %llx align: %x, ncmin: %x\n", __func__, size, + align, ncmin); + + npages = size >> PAGE_SHIFT; + if (npages == 0) + npages = 1; + + if (align == 0) + align = PAGE_SIZE; + align >>= PAGE_SHIFT; + + /* round alignment to the next power of 2, if needed */ + order = fls(align); + if ((align & (align - 1)) == 0) + order--; + + ncmin >>= PAGE_SHIFT; + /* + * allocate pages by chunks of "align" size, otherwise we may leave + * holes in the contiguous memory area. + */ + if (ncmin == 0) + ncmin = npages; + else if (align > ncmin) + ncmin = align; + + mem = kzalloc(sizeof(*mem), GFP_KERNEL); + if (!mem) + return -ENOMEM; + + mem->size = npages; + mem->memtype = type; + + mem->pages = kzalloc(sizeof(dma_addr_t) * npages, GFP_KERNEL); + if (!mem) { + kfree(mem); + return -ENOMEM; + } + + while (npages) { + struct page *pages; + int pos = 0; + + /* don't overflow in case size is not a multiple of ncmin */ + if (ncmin > npages) + ncmin = npages; + + pages = dma_alloc_from_contiguous(dev, ncmin, order); + if (!pages) { + nvea_ram_put(pfb, &mem); + return -ENOMEM; + } + + dma_addr = pfn_to_dma(nv_device_base(nv_device(pfb)), + page_to_pfn(pages)); + + nv_debug(pfb, " alloc count: %x, order: %x, addr: %x\n", ncmin, + order, dma_addr); + + for (i = 0; i < ncmin; i++) + mem->pages[pos + i] = dma_addr + (PAGE_SIZE * i); + + pos += ncmin; + npages -= ncmin; + } + + mem->offset = (u64)mem->pages[0]; + + *pmem = mem; + + return 0; +} + +static int +nvea_ram_ctor(struct nouveau_object *parent, struct nouveau_object *engine, + struct nouveau_oclass *oclass, void *data, u32 datasize, + struct nouveau_object **pobject) +{ + struct nouveau_ram *ram; + int ret; + + ret = nouveau_ram_create(parent, engine, oclass, &ram); + *pobject = nv_object(ram); + if (ret) + return ret; + ram->type = NV_MEM_TYPE_STOLEN; + ram->size = get_num_physpages() << PAGE_SHIFT; + + ram->get = nvea_ram_get; + ram->put = nvea_ram_put; + + return 0; +} + +struct nouveau_oclass +nvea_ram_oclass = { + .ofuncs = &(struct nouveau_ofuncs) { + .ctor = nvea_ram_ctor, + .dtor = _nouveau_ram_dtor, + .init = _nouveau_ram_init, + .fini = _nouveau_ram_fini, + }, +};
nvc0_graph_ctor() would only let the graphics engine be enabled if its oclass has a proper microcode linked to it. This prevents GR from being enabled at all on chips that rely exclusively on external firmware, even though such a use-case is valid.
Relax the conditions enabling the GR engine to also include the case where an external firmware has also been loaded.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c index 6ef8bf181b2d..f997a18f5760 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c @@ -1133,10 +1133,14 @@ nvc0_graph_ctor(struct nouveau_object *parent, struct nouveau_object *engine, struct nvc0_graph_oclass *oclass = (void *)bclass; struct nouveau_device *device = nv_device(parent); struct nvc0_graph_priv *priv; + bool use_fw; int ret, i;
+ use_fw = nouveau_boolopt(device->cfgopt, "NvGrUseFW", false); + ret = nouveau_graph_create(parent, engine, bclass, - (oclass->fecs.ucode != NULL), &priv); + (oclass->fecs.ucode != NULL) || use_fw, + &priv); *pobject = nv_object(priv); if (ret) return ret; @@ -1146,7 +1150,7 @@ nvc0_graph_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
priv->base.units = nvc0_graph_units;
- if (nouveau_boolopt(device->cfgopt, "NvGrUseFW", false)) { + if (use_fw) { nv_info(priv, "using external firmware\n"); if (nvc0_graph_ctor_fw(priv, "fuc409c", &priv->fuc409c) || nvc0_graph_ctor_fw(priv, "fuc409d", &priv->fuc409d) ||
On Mon, Mar 24, 2014 at 05:42:30PM +0900, Alexandre Courbot wrote: [...]
diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c index 6ef8bf181b2d..f997a18f5760 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c @@ -1133,10 +1133,14 @@ nvc0_graph_ctor(struct nouveau_object *parent, struct nouveau_object *engine, struct nvc0_graph_oclass *oclass = (void *)bclass; struct nouveau_device *device = nv_device(parent); struct nvc0_graph_priv *priv;
- bool use_fw;
Perhaps "ext_fw" or "use_ext_fw" would be more accurate.
int ret, i;
- use_fw = nouveau_boolopt(device->cfgopt, "NvGrUseFW", false);
- ret = nouveau_graph_create(parent, engine, bclass,
(oclass->fecs.ucode != NULL), &priv);
(oclass->fecs.ucode != NULL) || use_fw,
&priv);
Or perhaps a more intuitive way would be to name the variable "enable" and have something like:
if (!nouveau_boolopt(device->cfgopt, "NvGrUseFW", false)) enable = oclass->fecs.ucode != NULL; else enable = true;
ret = nouveau_graph_create(parent, engine, bclass, enable, &priv); ...
Thierry
On Tue, Mar 25, 2014 at 8:58 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:30PM +0900, Alexandre Courbot wrote: [...]
diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c index 6ef8bf181b2d..f997a18f5760 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c @@ -1133,10 +1133,14 @@ nvc0_graph_ctor(struct nouveau_object *parent, struct nouveau_object *engine, struct nvc0_graph_oclass *oclass = (void *)bclass; struct nouveau_device *device = nv_device(parent); struct nvc0_graph_priv *priv;
bool use_fw;
Perhaps "ext_fw" or "use_ext_fw" would be more accurate.
int ret, i;
use_fw = nouveau_boolopt(device->cfgopt, "NvGrUseFW", false);
ret = nouveau_graph_create(parent, engine, bclass,
(oclass->fecs.ucode != NULL), &priv);
(oclass->fecs.ucode != NULL) || use_fw,
&priv);
Or perhaps a more intuitive way would be to name the variable "enable" and have something like:
if (!nouveau_boolopt(device->cfgopt, "NvGrUseFW", false)) enable = oclass->fecs.ucode != NULL; else enable = true; ret = nouveau_graph_create(parent, engine, bclass, enable, &priv); ...
Agreed, this looks a lot nicer.
Thierry
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
On Wed, Mar 26, 2014 at 1:21 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Tue, Mar 25, 2014 at 8:58 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:30PM +0900, Alexandre Courbot wrote: [...]
diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c index 6ef8bf181b2d..f997a18f5760 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c @@ -1133,10 +1133,14 @@ nvc0_graph_ctor(struct nouveau_object *parent, struct nouveau_object *engine, struct nvc0_graph_oclass *oclass = (void *)bclass; struct nouveau_device *device = nv_device(parent); struct nvc0_graph_priv *priv;
bool use_fw;
Perhaps "ext_fw" or "use_ext_fw" would be more accurate.
int ret, i;
use_fw = nouveau_boolopt(device->cfgopt, "NvGrUseFW", false);
ret = nouveau_graph_create(parent, engine, bclass,
(oclass->fecs.ucode != NULL), &priv);
(oclass->fecs.ucode != NULL) || use_fw,
&priv);
Or perhaps a more intuitive way would be to name the variable "enable" and have something like:
if (!nouveau_boolopt(device->cfgopt, "NvGrUseFW", false)) enable = oclass->fecs.ucode != NULL; else enable = true; ret = nouveau_graph_create(parent, engine, bclass, enable, &priv); ...
Agreed, this looks a lot nicer.
Will fix that, thanks!
Pad the microcode to a multiple of 0x40, otherwise firmware will fail to run from non-prepadded firmware files.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c index f997a18f5760..367e72daf8b1 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c @@ -768,6 +768,10 @@ nvc0_graph_init_fw(struct nvc0_graph_priv *priv, u32 fuc_base, nv_wr32(priv, fuc_base + 0x0188, i >> 6); nv_wr32(priv, fuc_base + 0x0184, code->data[i]); } + + /* code must be padded to 0x40 */ + for (; i < (((code->size / 4) + 0x3f) & ~0x3f); i++) + nv_wr32(priv, fuc_base + 0x0184, 0); }
static void
On Mon, Mar 24, 2014 at 05:42:31PM +0900, Alexandre Courbot wrote:
Pad the microcode to a multiple of 0x40, otherwise firmware will fail to run from non-prepadded firmware files.
Perhaps this (and the comment in the code) should mention a unit. In this case it's 0x40 words. Also, I think using a decimal number would read easier here and in the comment. Furthermore, perhaps choosing a byte as the unit would be even more intuitive.
But either way the code is correct, so:
Reviewed-by: Thierry Reding treding@nvidia.com
On Mon, Mar 24, 2014 at 6:42 PM, Alexandre Courbot acourbot@nvidia.com wrote:
Pad the microcode to a multiple of 0x40, otherwise firmware will fail to run from non-prepadded firmware files.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c index f997a18f5760..367e72daf8b1 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c @@ -768,6 +768,10 @@ nvc0_graph_init_fw(struct nvc0_graph_priv *priv, u32 fuc_base, nv_wr32(priv, fuc_base + 0x0188, i >> 6); nv_wr32(priv, fuc_base + 0x0184, code->data[i]); }
/* code must be padded to 0x40 */
for (; i < (((code->size / 4) + 0x3f) & ~0x3f); i++)
"for (; i & 0x3f; i++)" would work just as well :)
nv_wr32(priv, fuc_base + 0x0184, 0);
}
static void
1.9.1
Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
On Wed, Mar 26, 2014 at 1:22 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Mon, Mar 24, 2014 at 6:42 PM, Alexandre Courbot acourbot@nvidia.com wrote:
Pad the microcode to a multiple of 0x40, otherwise firmware will fail to run from non-prepadded firmware files.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c index f997a18f5760..367e72daf8b1 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c @@ -768,6 +768,10 @@ nvc0_graph_init_fw(struct nvc0_graph_priv *priv, u32 fuc_base, nv_wr32(priv, fuc_base + 0x0188, i >> 6); nv_wr32(priv, fuc_base + 0x0184, code->data[i]); }
/* code must be padded to 0x40 */
for (; i < (((code->size / 4) + 0x3f) & ~0x3f); i++)
"for (; i & 0x3f; i++)" would work just as well :)
Indeed. >_<
Add a GR device for GK20A based on NVE4, with the correct classes definitions (GK20A's 3D class is 0xa297).
Most of the NVE4 code can be used on GK20A, so make relevant bits of NVE4 available to other chips as well.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/Makefile | 1 + .../gpu/drm/nouveau/core/engine/graph/ctxnve4.c | 4 +- drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h | 9 +++ drivers/gpu/drm/nouveau/core/engine/graph/nve4.c | 2 +- drivers/gpu/drm/nouveau/core/engine/graph/nvea.c | 75 ++++++++++++++++++++++ .../gpu/drm/nouveau/core/include/engine/graph.h | 1 + 6 files changed, 89 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/nouveau/core/engine/graph/nvea.c
diff --git a/drivers/gpu/drm/nouveau/Makefile b/drivers/gpu/drm/nouveau/Makefile index 708d2e33835f..032a4744f843 100644 --- a/drivers/gpu/drm/nouveau/Makefile +++ b/drivers/gpu/drm/nouveau/Makefile @@ -270,6 +270,7 @@ nouveau-y += core/engine/graph/nvc8.o nouveau-y += core/engine/graph/nvd7.o nouveau-y += core/engine/graph/nvd9.o nouveau-y += core/engine/graph/nve4.o +nouveau-y += core/engine/graph/nvea.o nouveau-y += core/engine/graph/nvf0.o nouveau-y += core/engine/graph/nv108.o nouveau-y += core/engine/mpeg/nv31.o diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/ctxnve4.c b/drivers/gpu/drm/nouveau/core/engine/graph/ctxnve4.c index e2de73ee5eee..3904073f860d 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/ctxnve4.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/ctxnve4.c @@ -804,7 +804,7 @@ nve4_grctx_init_unk[] = { {} };
-static void +void nve4_grctx_generate_mods(struct nvc0_graph_priv *priv, struct nvc0_grctx *info) { u32 magic[GPC_MAX][2]; @@ -962,7 +962,7 @@ nve4_grctx_generate_main(struct nvc0_graph_priv *priv, struct nvc0_grctx *info) nv_mask(priv, 0x41be10, 0x00800000, 0x00800000); }
-static struct nvc0_graph_init * +struct nvc0_graph_init * nve4_grctx_init_hub[] = { nvc0_grctx_init_base, nve4_grctx_init_unk40xx, diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h index b0ab6de270b2..904f09b540c5 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h @@ -221,6 +221,8 @@ void nvc0_grctx_generate_r418bb8(struct nvc0_graph_priv *); void nve4_grctx_generate_r418bb8(struct nvc0_graph_priv *); void nvc0_grctx_generate_r406800(struct nvc0_graph_priv *);
+void nve4_grctx_generate_mods(struct nvc0_graph_priv *, struct nvc0_grctx *); + extern struct nouveau_oclass *nvc0_grctx_oclass; extern struct nvc0_graph_init *nvc0_grctx_init_hub[]; extern struct nvc0_graph_init nvc0_grctx_init_base[]; @@ -237,12 +239,17 @@ extern struct nvc0_graph_init nvc0_grctx_init_gpc_1[]; extern struct nvc0_graph_init nvc0_grctx_init_tpc[]; extern struct nvc0_graph_init nvc0_grctx_init_icmd[]; extern struct nvc0_graph_init nvd9_grctx_init_icmd[]; // +extern struct nvc0_graph_init nve4_grctx_init_icmd[]; + +extern struct nvc0_graph_init *nve4_grctx_init_hub[]; +extern struct nvc0_graph_init *nve4_grctx_init_gpc[];
extern struct nvc0_graph_mthd nvc0_grctx_init_mthd[]; extern struct nvc0_graph_init nvc0_grctx_init_902d[]; extern struct nvc0_graph_init nvc0_grctx_init_9039[]; extern struct nvc0_graph_init nvc0_grctx_init_90c0[]; extern struct nvc0_graph_init nvc0_grctx_init_mthd_magic[]; +extern struct nvc0_graph_init nve4_grctx_init_a097[];
void nvc1_grctx_generate_mods(struct nvc0_graph_priv *, struct nvc0_grctx *); void nvc1_grctx_generate_unkn(struct nvc0_graph_priv *); @@ -277,6 +284,8 @@ extern struct nvc0_graph_init nvf0_grctx_init_unk60xx[];
extern struct nouveau_oclass *nv108_grctx_oclass;
+extern struct nvc0_graph_init *nve4_graph_init_mmio[]; + #define mmio_data(s,a,p) do { \ info->buffer[info->buffer_nr] = round_up(info->addr, (a)); \ info->addr = info->buffer[info->buffer_nr++] + (s); \ diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nve4.c b/drivers/gpu/drm/nouveau/core/engine/graph/nve4.c index 05ec09c88517..442857c5c120 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nve4.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nve4.c @@ -298,7 +298,7 @@ nve4_graph_init(struct nouveau_object *object) return nvc0_graph_init_ctxctl(priv); }
-static struct nvc0_graph_init * +struct nvc0_graph_init * nve4_graph_init_mmio[] = { nve4_graph_init_regs, nvc0_graph_init_unk40xx, diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvea.c b/drivers/gpu/drm/nouveau/core/engine/graph/nvea.c new file mode 100644 index 000000000000..d5e6a1adcacb --- /dev/null +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvea.c @@ -0,0 +1,75 @@ +/* + * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +#include "nvc0.h" + +static struct nouveau_oclass +nvea_graph_sclass[] = { + { 0x902d, &nouveau_object_ofuncs }, + { 0xa040, &nouveau_object_ofuncs }, + { 0xa297, &nouveau_object_ofuncs }, + { 0xa0c0, &nouveau_object_ofuncs }, + {} +}; + +static struct nvc0_graph_mthd +nvea_grctx_init_mthd[] = { + { 0xa297, nve4_grctx_init_a097, }, + { 0x902d, nvc0_grctx_init_902d, }, + { 0x902d, nvc0_grctx_init_mthd_magic, }, + {} +}; + +struct nouveau_oclass * +nvea_grctx_oclass = &(struct nvc0_grctx_oclass) { + .base.handle = NV_ENGCTX(GR, 0xea), + .base.ofuncs = &(struct nouveau_ofuncs) { + .ctor = nvc0_graph_context_ctor, + .dtor = nvc0_graph_context_dtor, + .init = _nouveau_graph_context_init, + .fini = _nouveau_graph_context_fini, + .rd32 = _nouveau_graph_context_rd32, + .wr32 = _nouveau_graph_context_wr32, + }, + .main = nve4_grctx_generate_main, + .mods = nve4_grctx_generate_mods, + .unkn = nve4_grctx_generate_unkn, + .hub = nve4_grctx_init_hub, + .gpc = nve4_grctx_init_gpc, + .icmd = nve4_grctx_init_icmd, + .mthd = nvea_grctx_init_mthd, +}.base; + + +struct nouveau_oclass * +nvea_graph_oclass = &(struct nvc0_graph_oclass) { + .base.handle = NV_ENGINE(GR, 0xea), + .base.ofuncs = &(struct nouveau_ofuncs) { + .ctor = nvc0_graph_ctor, + .dtor = nvc0_graph_dtor, + .init = nve4_graph_init, + .fini = _nouveau_graph_fini, + }, + .cclass = &nvea_grctx_oclass, + .sclass = nvea_graph_sclass, + .mmio = nve4_graph_init_mmio, +}.base; diff --git a/drivers/gpu/drm/nouveau/core/include/engine/graph.h b/drivers/gpu/drm/nouveau/core/include/engine/graph.h index 97705618de97..632d7e172172 100644 --- a/drivers/gpu/drm/nouveau/core/include/engine/graph.h +++ b/drivers/gpu/drm/nouveau/core/include/engine/graph.h @@ -68,6 +68,7 @@ extern struct nouveau_oclass *nvc8_graph_oclass; extern struct nouveau_oclass *nvd7_graph_oclass; extern struct nouveau_oclass *nvd9_graph_oclass; extern struct nouveau_oclass *nve4_graph_oclass; +extern struct nouveau_oclass *nvea_graph_oclass; extern struct nouveau_oclass *nvf0_graph_oclass; extern struct nouveau_oclass *nv108_graph_oclass;
On Mon, Mar 24, 2014 at 6:42 PM, Alexandre Courbot acourbot@nvidia.com wrote:
Add a GR device for GK20A based on NVE4, with the correct classes definitions (GK20A's 3D class is 0xa297).
Most of the NVE4 code can be used on GK20A, so make relevant bits of NVE4 available to other chips as well.
This will need a bit of a rebase on top of the tree I mentioned earlier (also queued for drm-next now), where I've further split out and named the various chunks of state.
Does GK104 match entirely correctly, or just happen to work? I could probably hunt down the GK20A netlist images and check that actually :)
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/Makefile | 1 + .../gpu/drm/nouveau/core/engine/graph/ctxnve4.c | 4 +- drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h | 9 +++ drivers/gpu/drm/nouveau/core/engine/graph/nve4.c | 2 +- drivers/gpu/drm/nouveau/core/engine/graph/nvea.c | 75 ++++++++++++++++++++++ .../gpu/drm/nouveau/core/include/engine/graph.h | 1 + 6 files changed, 89 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/nouveau/core/engine/graph/nvea.c
diff --git a/drivers/gpu/drm/nouveau/Makefile b/drivers/gpu/drm/nouveau/Makefile index 708d2e33835f..032a4744f843 100644 --- a/drivers/gpu/drm/nouveau/Makefile +++ b/drivers/gpu/drm/nouveau/Makefile @@ -270,6 +270,7 @@ nouveau-y += core/engine/graph/nvc8.o nouveau-y += core/engine/graph/nvd7.o nouveau-y += core/engine/graph/nvd9.o nouveau-y += core/engine/graph/nve4.o +nouveau-y += core/engine/graph/nvea.o nouveau-y += core/engine/graph/nvf0.o nouveau-y += core/engine/graph/nv108.o nouveau-y += core/engine/mpeg/nv31.o diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/ctxnve4.c b/drivers/gpu/drm/nouveau/core/engine/graph/ctxnve4.c index e2de73ee5eee..3904073f860d 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/ctxnve4.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/ctxnve4.c @@ -804,7 +804,7 @@ nve4_grctx_init_unk[] = { {} };
-static void +void nve4_grctx_generate_mods(struct nvc0_graph_priv *priv, struct nvc0_grctx *info) { u32 magic[GPC_MAX][2]; @@ -962,7 +962,7 @@ nve4_grctx_generate_main(struct nvc0_graph_priv *priv, struct nvc0_grctx *info) nv_mask(priv, 0x41be10, 0x00800000, 0x00800000); }
-static struct nvc0_graph_init * +struct nvc0_graph_init * nve4_grctx_init_hub[] = { nvc0_grctx_init_base, nve4_grctx_init_unk40xx, diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h index b0ab6de270b2..904f09b540c5 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvc0.h @@ -221,6 +221,8 @@ void nvc0_grctx_generate_r418bb8(struct nvc0_graph_priv *); void nve4_grctx_generate_r418bb8(struct nvc0_graph_priv *); void nvc0_grctx_generate_r406800(struct nvc0_graph_priv *);
+void nve4_grctx_generate_mods(struct nvc0_graph_priv *, struct nvc0_grctx *);
extern struct nouveau_oclass *nvc0_grctx_oclass; extern struct nvc0_graph_init *nvc0_grctx_init_hub[]; extern struct nvc0_graph_init nvc0_grctx_init_base[]; @@ -237,12 +239,17 @@ extern struct nvc0_graph_init nvc0_grctx_init_gpc_1[]; extern struct nvc0_graph_init nvc0_grctx_init_tpc[]; extern struct nvc0_graph_init nvc0_grctx_init_icmd[]; extern struct nvc0_graph_init nvd9_grctx_init_icmd[]; // +extern struct nvc0_graph_init nve4_grctx_init_icmd[];
+extern struct nvc0_graph_init *nve4_grctx_init_hub[]; +extern struct nvc0_graph_init *nve4_grctx_init_gpc[];
extern struct nvc0_graph_mthd nvc0_grctx_init_mthd[]; extern struct nvc0_graph_init nvc0_grctx_init_902d[]; extern struct nvc0_graph_init nvc0_grctx_init_9039[]; extern struct nvc0_graph_init nvc0_grctx_init_90c0[]; extern struct nvc0_graph_init nvc0_grctx_init_mthd_magic[]; +extern struct nvc0_graph_init nve4_grctx_init_a097[];
void nvc1_grctx_generate_mods(struct nvc0_graph_priv *, struct nvc0_grctx *); void nvc1_grctx_generate_unkn(struct nvc0_graph_priv *); @@ -277,6 +284,8 @@ extern struct nvc0_graph_init nvf0_grctx_init_unk60xx[];
extern struct nouveau_oclass *nv108_grctx_oclass;
+extern struct nvc0_graph_init *nve4_graph_init_mmio[];
#define mmio_data(s,a,p) do { \ info->buffer[info->buffer_nr] = round_up(info->addr, (a)); \ info->addr = info->buffer[info->buffer_nr++] + (s); \ diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nve4.c b/drivers/gpu/drm/nouveau/core/engine/graph/nve4.c index 05ec09c88517..442857c5c120 100644 --- a/drivers/gpu/drm/nouveau/core/engine/graph/nve4.c +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nve4.c @@ -298,7 +298,7 @@ nve4_graph_init(struct nouveau_object *object) return nvc0_graph_init_ctxctl(priv); }
-static struct nvc0_graph_init * +struct nvc0_graph_init * nve4_graph_init_mmio[] = { nve4_graph_init_regs, nvc0_graph_init_unk40xx, diff --git a/drivers/gpu/drm/nouveau/core/engine/graph/nvea.c b/drivers/gpu/drm/nouveau/core/engine/graph/nvea.c new file mode 100644 index 000000000000..d5e6a1adcacb --- /dev/null +++ b/drivers/gpu/drm/nouveau/core/engine/graph/nvea.c @@ -0,0 +1,75 @@ +/*
- Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved.
- Permission is hereby granted, free of charge, to any person obtaining a
- copy of this software and associated documentation files (the "Software"),
- to deal in the Software without restriction, including without limitation
- the rights to use, copy, modify, merge, publish, distribute, sublicense,
- and/or sell copies of the Software, and to permit persons to whom the
- Software is furnished to do so, subject to the following conditions:
- The above copyright notice and this permission notice shall be included in
- all copies or substantial portions of the Software.
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- DEALINGS IN THE SOFTWARE.
- */
+#include "nvc0.h"
+static struct nouveau_oclass +nvea_graph_sclass[] = {
{ 0x902d, &nouveau_object_ofuncs },
{ 0xa040, &nouveau_object_ofuncs },
{ 0xa297, &nouveau_object_ofuncs },
{ 0xa0c0, &nouveau_object_ofuncs },
{}
+};
+static struct nvc0_graph_mthd +nvea_grctx_init_mthd[] = {
{ 0xa297, nve4_grctx_init_a097, },
{ 0x902d, nvc0_grctx_init_902d, },
{ 0x902d, nvc0_grctx_init_mthd_magic, },
{}
+};
+struct nouveau_oclass * +nvea_grctx_oclass = &(struct nvc0_grctx_oclass) {
.base.handle = NV_ENGCTX(GR, 0xea),
.base.ofuncs = &(struct nouveau_ofuncs) {
.ctor = nvc0_graph_context_ctor,
.dtor = nvc0_graph_context_dtor,
.init = _nouveau_graph_context_init,
.fini = _nouveau_graph_context_fini,
.rd32 = _nouveau_graph_context_rd32,
.wr32 = _nouveau_graph_context_wr32,
},
.main = nve4_grctx_generate_main,
.mods = nve4_grctx_generate_mods,
.unkn = nve4_grctx_generate_unkn,
.hub = nve4_grctx_init_hub,
.gpc = nve4_grctx_init_gpc,
.icmd = nve4_grctx_init_icmd,
.mthd = nvea_grctx_init_mthd,
+}.base;
+struct nouveau_oclass * +nvea_graph_oclass = &(struct nvc0_graph_oclass) {
.base.handle = NV_ENGINE(GR, 0xea),
.base.ofuncs = &(struct nouveau_ofuncs) {
.ctor = nvc0_graph_ctor,
.dtor = nvc0_graph_dtor,
.init = nve4_graph_init,
.fini = _nouveau_graph_fini,
},
.cclass = &nvea_grctx_oclass,
.sclass = nvea_graph_sclass,
.mmio = nve4_graph_init_mmio,
+}.base; diff --git a/drivers/gpu/drm/nouveau/core/include/engine/graph.h b/drivers/gpu/drm/nouveau/core/include/engine/graph.h index 97705618de97..632d7e172172 100644 --- a/drivers/gpu/drm/nouveau/core/include/engine/graph.h +++ b/drivers/gpu/drm/nouveau/core/include/engine/graph.h @@ -68,6 +68,7 @@ extern struct nouveau_oclass *nvc8_graph_oclass; extern struct nouveau_oclass *nvd7_graph_oclass; extern struct nouveau_oclass *nvd9_graph_oclass; extern struct nouveau_oclass *nve4_graph_oclass; +extern struct nouveau_oclass *nvea_graph_oclass; extern struct nouveau_oclass *nvf0_graph_oclass; extern struct nouveau_oclass *nv108_graph_oclass;
-- 1.9.1
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
On Wed, Mar 26, 2014 at 1:24 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Mon, Mar 24, 2014 at 6:42 PM, Alexandre Courbot acourbot@nvidia.com wrote:
Add a GR device for GK20A based on NVE4, with the correct classes definitions (GK20A's 3D class is 0xa297).
Most of the NVE4 code can be used on GK20A, so make relevant bits of NVE4 available to other chips as well.
This will need a bit of a rebase on top of the tree I mentioned earlier (also queued for drm-next now), where I've further split out and named the various chunks of state.
Will do that.
Does GK104 match entirely correctly, or just happen to work? I could probably hunt down the GK20A netlist images and check that actually :)
Do you mean, the init sequence? I haven't checked in detail (we are certainly doing things differently in the non-DRM driver), but the registers seem to match and the GPU is able to render after that. I admit I have not looked much further for now.
The only register that does not exist on GK20A is 0x260, but when accessing it Nouveau will be able to continue unharmed after a memory fault.
On Thu, Apr 3, 2014 at 12:03 AM, Alexandre Courbot gnurou@gmail.com wrote:
On Wed, Mar 26, 2014 at 1:24 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Mon, Mar 24, 2014 at 6:42 PM, Alexandre Courbot acourbot@nvidia.com wrote:
Add a GR device for GK20A based on NVE4, with the correct classes definitions (GK20A's 3D class is 0xa297).
Most of the NVE4 code can be used on GK20A, so make relevant bits of NVE4 available to other chips as well.
This will need a bit of a rebase on top of the tree I mentioned earlier (also queued for drm-next now), where I've further split out and named the various chunks of state.
Will do that.
Does GK104 match entirely correctly, or just happen to work? I could probably hunt down the GK20A netlist images and check that actually :)
Do you mean, the init sequence? I haven't checked in detail (we are certainly doing things differently in the non-DRM driver), but the registers seem to match and the GPU is able to render after that. I admit I have not looked much further for now.
I meant the arrays of register data, there's generally been some differences for most major chipset bumps. Where can I find the netlist firmware packages that go with the android driver? I can pull all the required info out of there to check :)
The only register that does not exist on GK20A is 0x260, but when accessing it Nouveau will be able to continue unharmed after a memory fault.
I have it in my queue to fix that too, the register doesn't exist on a couple of the other chips we support too.
GK20A does not embed a dedicated COPY engine and thus cannot allocate the copy channel that nouveau_accel_init() attempts to create. It also lacks any display hardware, so the creation of a software channel does not apply neither.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/nouveau_drm.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index ef27949057c3..f2394e84eae6 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -171,6 +171,11 @@ nouveau_accel_init(struct nouveau_drm *drm) return; }
+ if (device->chipset == 0xea) { + /* gk20a does not have CE0/CE1 */ + arg0 = NVE0_CHANNEL_IND_ENGINE_GR; + arg1 = 1; + } else if (device->card_type >= NV_E0) { ret = nouveau_channel_new(drm, &drm->client, NVDRM_DEVICE, NVDRM_CHAN + 1, @@ -207,6 +212,10 @@ nouveau_accel_init(struct nouveau_drm *drm) return; }
+ /* Need to figure out how to handle sw for gk20a */ + if (device->chipset == 0xea) + goto skip_sw_init; + ret = nouveau_object_new(nv_object(drm), NVDRM_CHAN, NVDRM_NVSW, nouveau_abi16_swclass(drm), NULL, 0, &object); if (ret == 0) { @@ -233,6 +242,7 @@ nouveau_accel_init(struct nouveau_drm *drm) return; }
+skip_sw_init: if (device->card_type < NV_C0) { ret = nouveau_gpuobj_new(drm->device, NULL, 32, 0, 0, &drm->notify);
On Mon, Mar 24, 2014 at 05:42:33PM +0900, Alexandre Courbot wrote:
GK20A does not embed a dedicated COPY engine and thus cannot allocate the copy channel that nouveau_accel_init() attempts to create. It also lacks any display hardware, so the creation of a software channel does not apply neither.
Perhaps this should be two separate patches?
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
[...]
- if (device->chipset == 0xea) {
/* gk20a does not have CE0/CE1 */
This would be another good candidate for a feature flag.
arg0 = NVE0_CHANNEL_IND_ENGINE_GR;
arg1 = 1;
- } else if (device->card_type >= NV_E0) {
The formatting here is somewhat weird. From a brief look I couldn't find any indication that nouveau deviates from the standard coding style, so this should be:
} else if (...) {
- /* Need to figure out how to handle sw for gk20a */
- if (device->chipset == 0xea)
goto skip_sw_init;
The commit message makes it sound like SW isn't needed since gk20a "lacks any display hardware". In that case the comment here doesn't make much sense.
Thierry
On Tue, Mar 25, 2014 at 9:10 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:33PM +0900, Alexandre Courbot wrote:
GK20A does not embed a dedicated COPY engine and thus cannot allocate the copy channel that nouveau_accel_init() attempts to create. It also lacks any display hardware, so the creation of a software channel does not apply neither.
Perhaps this should be two separate patches?
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
[...]
if (device->chipset == 0xea) {
/* gk20a does not have CE0/CE1 */
This would be another good candidate for a feature flag.
There are ways to query this in a chipset-independent way. However, despite reporting it as an error if no copy engines are available, the code should continue on without the channel happily. Perhaps we can just punt the relevent error messages to a debug loglevel for now?
arg0 = NVE0_CHANNEL_IND_ENGINE_GR;
arg1 = 1;
} else if (device->card_type >= NV_E0) {
The formatting here is somewhat weird. From a brief look I couldn't find any indication that nouveau deviates from the standard coding style, so this should be:
} else if (...) {
I use the former in a few places, despite it not entirely being "correct".. It looks nicer though :) I don't mind either way though.
/* Need to figure out how to handle sw for gk20a */
if (device->chipset == 0xea)
goto skip_sw_init;
The commit message makes it sound like SW isn't needed since gk20a "lacks any display hardware". In that case the comment here doesn't make much sense.
Thierry
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
On Wed, Mar 26, 2014 at 1:27 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Tue, Mar 25, 2014 at 9:10 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:33PM +0900, Alexandre Courbot wrote:
GK20A does not embed a dedicated COPY engine and thus cannot allocate the copy channel that nouveau_accel_init() attempts to create. It also lacks any display hardware, so the creation of a software channel does not apply neither.
Perhaps this should be two separate patches?
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
[...]
if (device->chipset == 0xea) {
/* gk20a does not have CE0/CE1 */
This would be another good candidate for a feature flag.
There are ways to query this in a chipset-independent way. However, despite reporting it as an error if no copy engines are available, the code should continue on without the channel happily. Perhaps we can just punt the relevent error messages to a debug loglevel for now?
Sure, that would be more future-proof as well.
arg0 = NVE0_CHANNEL_IND_ENGINE_GR;
arg1 = 1;
} else if (device->card_type >= NV_E0) {
The formatting here is somewhat weird. From a brief look I couldn't find any indication that nouveau deviates from the standard coding style, so this should be:
} else if (...) {
I use the former in a few places, despite it not entirely being "correct".. It looks nicer though :) I don't mind either way though.
Yeah, I just followed the style of the file here. Whether it needs to change or not is not my call. :P
/* Need to figure out how to handle sw for gk20a */
if (device->chipset == 0xea)
goto skip_sw_init;
The commit message makes it sound like SW isn't needed since gk20a "lacks any display hardware". In that case the comment here doesn't make much sense.
Correct. As far as I have looked (that is, not very far), SW methods are used for display-related functions, but there might be other use-cases too. Maybe someone who knows better can confirm?
On Wed, Apr 2, 2014 at 10:14 AM, Alexandre Courbot gnurou@gmail.com wrote:
/* Need to figure out how to handle sw for gk20a */
if (device->chipset == 0xea)
goto skip_sw_init;
The commit message makes it sound like SW isn't needed since gk20a "lacks any display hardware". In that case the comment here doesn't make much sense.
Correct. As far as I have looked (that is, not very far), SW methods are used for display-related functions, but there might be other use-cases too. Maybe someone who knows better can confirm?
http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/engine/software/nvc0...
Take a look at nvc0_software_mthd_mp_control -- that's used in the mesa driver to... well, control those things :) They're related to PGRAPH, which should be available on the GK20A. [Not sure what they do though. One of them is about turning off error reporting.]
-ilia
On Thu, Apr 3, 2014 at 12:23 AM, Ilia Mirkin imirkin@alum.mit.edu wrote:
On Wed, Apr 2, 2014 at 10:14 AM, Alexandre Courbot gnurou@gmail.com wrote:
/* Need to figure out how to handle sw for gk20a */
if (device->chipset == 0xea)
goto skip_sw_init;
The commit message makes it sound like SW isn't needed since gk20a "lacks any display hardware". In that case the comment here doesn't make much sense.
Correct. As far as I have looked (that is, not very far), SW methods are used for display-related functions, but there might be other use-cases too. Maybe someone who knows better can confirm?
http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/engine/software/nvc0...
Take a look at nvc0_software_mthd_mp_control -- that's used in the mesa driver to... well, control those things :) They're related to PGRAPH, which should be available on the GK20A. [Not sure what they do though. One of them is about turning off error reporting.]
Hm, I don't know why I didn't notice that was there for now.. I wouldn't be entirely surprised if nvidia had reserved method ids in the 3d object class that were supposed to be used for this, rather than from a sw object class..
-ilia
On Wed, Mar 26, 2014 at 1:27 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Tue, Mar 25, 2014 at 9:10 AM, Thierry Reding thierry.reding@gmail.com wrote:
On Mon, Mar 24, 2014 at 05:42:33PM +0900, Alexandre Courbot wrote:
GK20A does not embed a dedicated COPY engine and thus cannot allocate the copy channel that nouveau_accel_init() attempts to create. It also lacks any display hardware, so the creation of a software channel does not apply neither.
Perhaps this should be two separate patches?
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
[...]
if (device->chipset == 0xea) {
/* gk20a does not have CE0/CE1 */
This would be another good candidate for a feature flag.
There are ways to query this in a chipset-independent way. However, despite reporting it as an error if no copy engines are available, the code should continue on without the channel happily. Perhaps we can just punt the relevent error messages to a debug loglevel for now?
Do you know how to query this in a chipset-independant way? I have failed to find any information for this.
The code does continue without any issue after reporting the error, so indeed that check is not strictly necessary. But I was just mimicking what follows right after:
if (device->chipset >= 0xa3 && device->chipset != 0xaa && device->chipset != 0xac) { ret = nouveau_channel_new(drm, &drm->client, NVDRM_DEVICE, NVDRM_CHAN + 1, NvDmaFB, NvDmaTT, &drm->cechan); if (ret) NV_ERROR(drm, "failed to create ce channel, %d\n", ret);
arg0 = NvDmaFB; arg1 = NvDmaTT; } else { arg0 = NvDmaFB; arg1 = NvDmaTT; }
So if we are trying to avoid showing this error for 0xa0 class devices, why not for NV_E0?
Set the correct subdev/engine classes when GK20A (0xea) is probed.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com --- drivers/gpu/drm/nouveau/core/engine/device/nve0.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/core/engine/device/nve0.c b/drivers/gpu/drm/nouveau/core/engine/device/nve0.c index 987edbc30a09..8509dd57de1f 100644 --- a/drivers/gpu/drm/nouveau/core/engine/device/nve0.c +++ b/drivers/gpu/drm/nouveau/core/engine/device/nve0.c @@ -156,6 +156,26 @@ nve0_identify(struct nouveau_device *device) device->oclass[NVDEV_ENGINE_PPP ] = &nvc0_ppp_oclass; device->oclass[NVDEV_ENGINE_PERFMON] = &nve0_perfmon_oclass; break; + case 0xea: + device->cname = "GK20A"; + device->oclass[NVDEV_SUBDEV_MC ] = nvc3_mc_oclass; + device->oclass[NVDEV_SUBDEV_BUS ] = nvc0_bus_oclass; + device->oclass[NVDEV_SUBDEV_TIMER ] = &nv04_timer_oclass; + device->oclass[NVDEV_SUBDEV_FB ] = nvea_fb_oclass; + device->oclass[NVDEV_SUBDEV_IBUS ] = &nvea_ibus_oclass; + device->oclass[NVDEV_SUBDEV_INSTMEM] = nv50_instmem_oclass; + device->oclass[NVDEV_SUBDEV_VM ] = &nvc0_vmmgr_oclass; + device->oclass[NVDEV_SUBDEV_BAR ] = &nvc0_bar_oclass; + device->oclass[NVDEV_ENGINE_DMAOBJ ] = &nvd0_dmaeng_oclass; + device->oclass[NVDEV_ENGINE_FIFO ] = nvea_fifo_oclass; + /* TODO will need an implementation for this at some point... */ +#if 0 + device->oclass[NVDEV_ENGINE_SW ] = nvc0_software_oclass; +#endif + device->oclass[NVDEV_ENGINE_GR ] = nvea_graph_oclass; + device->oclass[NVDEV_ENGINE_COPY2 ] = &nve0_copy2_oclass; + device->oclass[NVDEV_ENGINE_PERFMON] = &nve0_perfmon_oclass; + break; case 0xf0: device->cname = "GK110"; device->oclass[NVDEV_SUBDEV_VBIOS ] = &nouveau_bios_oclass;
On Mon, Mar 24, 2014 at 05:42:34PM +0900, Alexandre Courbot wrote: [...]
diff --git a/drivers/gpu/drm/nouveau/core/engine/device/nve0.c b/drivers/gpu/drm/nouveau/core/engine/device/nve0.c
[...]
/* TODO will need an implementation for this at some point... */
Do we? If so the commit message in 11/12 is misleading.
Thierry
On Mon, Mar 24, 2014 at 6:42 PM, Alexandre Courbot acourbot@nvidia.com wrote:
Set the correct subdev/engine classes when GK20A (0xea) is probed.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/engine/device/nve0.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/core/engine/device/nve0.c b/drivers/gpu/drm/nouveau/core/engine/device/nve0.c index 987edbc30a09..8509dd57de1f 100644 --- a/drivers/gpu/drm/nouveau/core/engine/device/nve0.c +++ b/drivers/gpu/drm/nouveau/core/engine/device/nve0.c @@ -156,6 +156,26 @@ nve0_identify(struct nouveau_device *device) device->oclass[NVDEV_ENGINE_PPP ] = &nvc0_ppp_oclass; device->oclass[NVDEV_ENGINE_PERFMON] = &nve0_perfmon_oclass; break;
case 0xea:
device->cname = "GK20A";
device->oclass[NVDEV_SUBDEV_MC ] = nvc3_mc_oclass;
device->oclass[NVDEV_SUBDEV_BUS ] = nvc0_bus_oclass;
device->oclass[NVDEV_SUBDEV_TIMER ] = &nv04_timer_oclass;
As per note on the PTIMER patch, can just switch this to "gk20a_timer_oclass" on the latest code.
device->oclass[NVDEV_SUBDEV_FB ] = nvea_fb_oclass;
device->oclass[NVDEV_SUBDEV_IBUS ] = &nvea_ibus_oclass;
device->oclass[NVDEV_SUBDEV_INSTMEM] = nv50_instmem_oclass;
device->oclass[NVDEV_SUBDEV_VM ] = &nvc0_vmmgr_oclass;
device->oclass[NVDEV_SUBDEV_BAR ] = &nvc0_bar_oclass;
device->oclass[NVDEV_ENGINE_DMAOBJ ] = &nvd0_dmaeng_oclass;
device->oclass[NVDEV_ENGINE_FIFO ] = nvea_fifo_oclass;
/* TODO will need an implementation for this at some point... */
+#if 0
device->oclass[NVDEV_ENGINE_SW ] = nvc0_software_oclass;
+#endif
device->oclass[NVDEV_ENGINE_GR ] = nvea_graph_oclass;
device->oclass[NVDEV_ENGINE_COPY2 ] = &nve0_copy2_oclass;
device->oclass[NVDEV_ENGINE_PERFMON] = &nve0_perfmon_oclass;
break; case 0xf0: device->cname = "GK110"; device->oclass[NVDEV_SUBDEV_VBIOS ] = &nouveau_bios_oclass;
-- 1.9.1
Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau
On Wed, Mar 26, 2014 at 1:28 PM, Ben Skeggs skeggsb@gmail.com wrote:
On Mon, Mar 24, 2014 at 6:42 PM, Alexandre Courbot acourbot@nvidia.com wrote:
Set the correct subdev/engine classes when GK20A (0xea) is probed.
Signed-off-by: Alexandre Courbot acourbot@nvidia.com
drivers/gpu/drm/nouveau/core/engine/device/nve0.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/core/engine/device/nve0.c b/drivers/gpu/drm/nouveau/core/engine/device/nve0.c index 987edbc30a09..8509dd57de1f 100644 --- a/drivers/gpu/drm/nouveau/core/engine/device/nve0.c +++ b/drivers/gpu/drm/nouveau/core/engine/device/nve0.c @@ -156,6 +156,26 @@ nve0_identify(struct nouveau_device *device) device->oclass[NVDEV_ENGINE_PPP ] = &nvc0_ppp_oclass; device->oclass[NVDEV_ENGINE_PERFMON] = &nve0_perfmon_oclass; break;
case 0xea:
device->cname = "GK20A";
device->oclass[NVDEV_SUBDEV_MC ] = nvc3_mc_oclass;
device->oclass[NVDEV_SUBDEV_BUS ] = nvc0_bus_oclass;
device->oclass[NVDEV_SUBDEV_TIMER ] = &nv04_timer_oclass;
As per note on the PTIMER patch, can just switch this to "gk20a_timer_oclass" on the latest code.
Just saw your patch and will try to use it for the next version (if I can figure out how to reapply it to a Linux tree).
Hi Alexandre,
Am Montag, den 24.03.2014, 17:42 +0900 schrieb Alexandre Courbot:
Hi everyone,
[...]
A few lines of hacks (not included here) are still needed to deal with cached mappings triggering external aborts and CPU/GPU memory coherency issues, but I hope to understand and address these issues next.
For the coherency issue part you may want to look at my Nouveau on ARM series. Most of it never made it upstream, as I lacked the time to work further on this, but it solves the coherency issue from the kernel.
It does so by doing the necessary manual cache flushes/invalidates on buffer access, so costs some performance. To avoid this you really want to get writecombined mappings into the kernel<->userspace interface. Simply mapping the pushbuf as WC/US has brought a 7% performance increase in OpenArena when I last tested this. This test was done with only one PCIe lane, so the perf increase may be even better with a more adequate interconnect.
Regards, Lucas
Hi Lucas,
On Mon, Mar 24, 2014 at 10:19 PM, Lucas Stach l.stach@pengutronix.de wrote:
Hi Alexandre,
Am Montag, den 24.03.2014, 17:42 +0900 schrieb Alexandre Courbot:
Hi everyone,
[...]
A few lines of hacks (not included here) are still needed to deal with cached mappings triggering external aborts and CPU/GPU memory coherency issues, but I hope to understand and address these issues next.
For the coherency issue part you may want to look at my Nouveau on ARM series. Most of it never made it upstream, as I lacked the time to work further on this, but it solves the coherency issue from the kernel.
Oh, thanks for pointing this out, it will probably be most useful. Shall I assume the patches at https://www.mail-archive.com/nouveau@lists.freedesktop.org/msg13557.html are up-to-date? Would you mind if I include the relevant patches of yours in the next iteration of this series?
It does so by doing the necessary manual cache flushes/invalidates on buffer access, so costs some performance. To avoid this you really want to get writecombined mappings into the kernel<->userspace interface. Simply mapping the pushbuf as WC/US has brought a 7% performance increase in OpenArena when I last tested this. This test was done with only one PCIe lane, so the perf increase may be even better with a more adequate interconnect.
Interestingly if I allow writecombined mappings in the kernel I get faults when attempting the read the mapped area:
[ 78.074854] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf003e010 ... [ 78.337862] [<c03491a8>] (nouveau_bo_rd32) from [<c0346374>] (nouveau_fence_update+0x5c/0x80) [ 78.352536] [<c0346374>] (nouveau_fence_update) from [<c03463b0>] (nouveau_fence_done+0x18/0x28) [ 78.367531] [<c03463b0>] (nouveau_fence_done) from [<c02b852c>] (ttm_bo_wait+0x104/0x184) [ 78.381915] [<c02b852c>] (ttm_bo_wait) from [<c034c718>] (nouveau_gem_ioctl_cpu_prep+0x40/0xe8) [ 78.396849] [<c034c718>] (nouveau_gem_ioctl_cpu_prep) from [<c029fd5c>] (drm_ioctl+0x404/0x4b8) [ 78.411790] [<c029fd5c>] (drm_ioctl) from [<c0343960>] (nouveau_drm_ioctl+0x54/0x80) [ 78.425805] [<c0343960>] (nouveau_drm_ioctl) from [<c00ea5ec>] (do_vfs_ioctl+0x3f0/0x5bc) [ 78.440277] [<c00ea5ec>] (do_vfs_ioctl) from [<c00ea7ec>] (SyS_ioctl+0x34/0x5c) [ 78.453918] [<c00ea7ec>] (SyS_ioctl) from [<c000e5a0>] (ret_fast_syscall+0x0/0x30)
To avoid these I need to set the VRAM default_caching to TTM_PL_FLAG_UNCACHED. It is not clear to me why this is needed. The BO being accessed through the BAR, they are correctly considered as IO memory and mapped using ttm_bo_ioremap(), so it really seems to be unhappy with the WC mapping itself.
Note that if I go ahead and force the use of pgprot_writecombine() in ttm_io_prot() to get writecombined user-space mappings, pure DRM programs that map a buffer and try to read it fail similarly, while Mesa's glReadPixels() seems to be happy. I'm not sure what it does differently here.
Cheers, Alex.
Hi Alexandre,
Am Mittwoch, den 26.03.2014, 15:33 +0900 schrieb Alexandre Courbot:
Hi Lucas,
On Mon, Mar 24, 2014 at 10:19 PM, Lucas Stach l.stach@pengutronix.de wrote:
Hi Alexandre,
Am Montag, den 24.03.2014, 17:42 +0900 schrieb Alexandre Courbot:
Hi everyone,
[...]
A few lines of hacks (not included here) are still needed to deal with cached mappings triggering external aborts and CPU/GPU memory coherency issues, but I hope to understand and address these issues next.
For the coherency issue part you may want to look at my Nouveau on ARM series. Most of it never made it upstream, as I lacked the time to work further on this, but it solves the coherency issue from the kernel.
Oh, thanks for pointing this out, it will probably be most useful. Shall I assume the patches at https://www.mail-archive.com/nouveau@lists.freedesktop.org/msg13557.html are up-to-date? Would you mind if I include the relevant patches of yours in the next iteration of this series?
It does so by doing the necessary manual cache flushes/invalidates on buffer access, so costs some performance. To avoid this you really want to get writecombined mappings into the kernel<->userspace interface. Simply mapping the pushbuf as WC/US has brought a 7% performance increase in OpenArena when I last tested this. This test was done with only one PCIe lane, so the perf increase may be even better with a more adequate interconnect.
Interestingly if I allow writecombined mappings in the kernel I get faults when attempting the read the mapped area:
This is most likely because your handling of those buffers produces conflicting mappings (if my understanding of what you are doing is right).
At first you allocate memory from CMA without changing the pgprot flags. This yields pages which are mapped uncached or cached (when moveable pages are purged from CMA to make space for your buffer) into the kernels linear space.
Later you regard this memory as iomem (it isn't!) and let TTM remap those pages into the vmalloc area with pgprot set to writecombined.
I don't know exactly why this is causing havoc, but having two conflicting virtual mappings of the same physical memory is documented to at least produce undefined behavior on ARMv7.
Regards, Lucas
[ 78.074854] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf003e010 ... [ 78.337862] [<c03491a8>] (nouveau_bo_rd32) from [<c0346374>] (nouveau_fence_update+0x5c/0x80) [ 78.352536] [<c0346374>] (nouveau_fence_update) from [<c03463b0>] (nouveau_fence_done+0x18/0x28) [ 78.367531] [<c03463b0>] (nouveau_fence_done) from [<c02b852c>] (ttm_bo_wait+0x104/0x184) [ 78.381915] [<c02b852c>] (ttm_bo_wait) from [<c034c718>] (nouveau_gem_ioctl_cpu_prep+0x40/0xe8) [ 78.396849] [<c034c718>] (nouveau_gem_ioctl_cpu_prep) from [<c029fd5c>] (drm_ioctl+0x404/0x4b8) [ 78.411790] [<c029fd5c>] (drm_ioctl) from [<c0343960>] (nouveau_drm_ioctl+0x54/0x80) [ 78.425805] [<c0343960>] (nouveau_drm_ioctl) from [<c00ea5ec>] (do_vfs_ioctl+0x3f0/0x5bc) [ 78.440277] [<c00ea5ec>] (do_vfs_ioctl) from [<c00ea7ec>] (SyS_ioctl+0x34/0x5c) [ 78.453918] [<c00ea7ec>] (SyS_ioctl) from [<c000e5a0>] (ret_fast_syscall+0x0/0x30)
To avoid these I need to set the VRAM default_caching to TTM_PL_FLAG_UNCACHED. It is not clear to me why this is needed. The BO being accessed through the BAR, they are correctly considered as IO memory and mapped using ttm_bo_ioremap(), so it really seems to be unhappy with the WC mapping itself.
Note that if I go ahead and force the use of pgprot_writecombine() in ttm_io_prot() to get writecombined user-space mappings, pure DRM programs that map a buffer and try to read it fail similarly, while Mesa's glReadPixels() seems to be happy. I'm not sure what it does differently here.
Cheers, Alex.
On Wed, Mar 26, 2014 at 7:33 PM, Lucas Stach l.stach@pengutronix.de wrote:
It does so by doing the necessary manual cache flushes/invalidates on buffer access, so costs some performance. To avoid this you really want to get writecombined mappings into the kernel<->userspace interface. Simply mapping the pushbuf as WC/US has brought a 7% performance increase in OpenArena when I last tested this. This test was done with only one PCIe lane, so the perf increase may be even better with a more adequate interconnect.
Interestingly if I allow writecombined mappings in the kernel I get faults when attempting the read the mapped area:
This is most likely because your handling of those buffers produces conflicting mappings (if my understanding of what you are doing is right).
At first you allocate memory from CMA without changing the pgprot flags. This yields pages which are mapped uncached or cached (when moveable pages are purged from CMA to make space for your buffer) into the kernels linear space.
Later you regard this memory as iomem (it isn't!) and let TTM remap those pages into the vmalloc area with pgprot set to writecombined.
I don't know exactly why this is causing havoc, but having two conflicting virtual mappings of the same physical memory is documented to at least produce undefined behavior on ARMv7.
IIUC this is not exactly what happens with GK20A, so let me explain how VRAM is currently accessed to make sure we are in sync.
VRAM pages are allocated by nvea_ram_get(), which allocates chunks of contiguous memory using dma_alloc_from_contiguous(). At that time I don't think the pages are mapped anywhere for the CPU to see (contrary to dma_alloc_coherent() for instance). Nouveau will then map the memory into the GPU context's address space, but it is only when nouveau_ttm_io_mem_reserve() is called that a BAR mapping is created, making the memory accessible to the CPU through the BAR window (which I consider as I/O memory).
The area of the BAR window pointing to the VRAM is then mapped to the kernel (using ioremap_wc() or ioremap_nocache()) or user-space (where ttm_io_prot() is called to get the pgprot_t to use). It is when this mapping is writecombined that I get the faults.
So as far as I can tell, only at most one CPU mapping exists at any time for VRAM memory, which goes through the BAR to access the actual physical memory. It would probably be faster and more logical to map the RAM directly so the CPU can address it, but going through the BAR reduces CPU/GPU synchronization issues and there are a few cases where we would need to map through the BAR anyway (e.g. tiled memory to be made linear for the CPU).
I don't know if that help understanding what the issue might be - I just wanted to make sure we are talking about the same thing. :)
Thanks, Alex.
dri-devel@lists.freedesktop.org