From: Thierry Reding <treding@nvidia.com>
Tegra186 and later are different from earlier generations in that they use an ARM SMMU rather than the Tegra SMMU. The ARM SMMU driver behaves slightly differently in that the geometry for IOMMU domains is set only after a device has been attached to it. This is to make sure that the SMMU instance that the domain belongs to is known, because each instance can have a different input address space (i.e. geometry).
Work around this by moving all IOVA allocations to a point where the geometry of the domain is properly initialized.
This supersedes the following patch:
https://patchwork.kernel.org/patch/10775579/
Thierry
Thierry Reding (5):
  drm/tegra: Store parent pointer in Tegra DRM clients
  drm/tegra: vic: Load firmware on demand
  drm/tegra: Setup shared IOMMU domain after initialization
  drm/tegra: Restrict IOVA space to DMA mask
  gpu: host1x: Supports 40-bit addressing on Tegra186
 drivers/gpu/drm/tegra/drm.c | 57 +++++++++++++++++++++----------------
 drivers/gpu/drm/tegra/drm.h |  1 +
 drivers/gpu/drm/tegra/vic.c | 17 ++++-----
 drivers/gpu/host1x/dev.c    |  2 +-
 4 files changed, 44 insertions(+), 33 deletions(-)
From: Thierry Reding <treding@nvidia.com>
Tegra DRM clients need access to their parent, so store a pointer to it upon registration.
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 2 ++
 drivers/gpu/drm/tegra/drm.h | 1 +
 2 files changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 4b70ce664c41..61dcbd218ffc 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1041,6 +1041,7 @@ int tegra_drm_register_client(struct tegra_drm *tegra,
 {
 	mutex_lock(&tegra->clients_lock);
 	list_add_tail(&client->list, &tegra->clients);
+	client->drm = tegra;
 	mutex_unlock(&tegra->clients_lock);
 
 	return 0;
@@ -1051,6 +1052,7 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
 {
 	mutex_lock(&tegra->clients_lock);
 	list_del_init(&client->list);
+	client->drm = NULL;
 	mutex_unlock(&tegra->clients_lock);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index f1763b4d5b5f..2c809755bca7 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -88,6 +88,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 struct tegra_drm_client {
 	struct host1x_client base;
 	struct list_head list;
+	struct tegra_drm *drm;
 
 	unsigned int version;
 	const struct tegra_drm_client_ops *ops;
On 23.01.2019 12:39, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
Tegra DRM clients need access to their parent, so store a pointer to it upon registration.
Signed-off-by: Thierry Reding <treding@nvidia.com>
 drivers/gpu/drm/tegra/drm.c | 2 ++
 drivers/gpu/drm/tegra/drm.h | 1 +
 2 files changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 4b70ce664c41..61dcbd218ffc 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1041,6 +1041,7 @@ int tegra_drm_register_client(struct tegra_drm *tegra,
 {
 	mutex_lock(&tegra->clients_lock);
 	list_add_tail(&client->list, &tegra->clients);
+	client->drm = tegra;
 	mutex_unlock(&tegra->clients_lock);
return 0;
@@ -1051,6 +1052,7 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
 {
 	mutex_lock(&tegra->clients_lock);
 	list_del_init(&client->list);
+	client->drm = NULL;
 	mutex_unlock(&tegra->clients_lock);
return 0;
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index f1763b4d5b5f..2c809755bca7 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -88,6 +88,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 struct tegra_drm_client {
 	struct host1x_client base;
 	struct list_head list;
+	struct tegra_drm *drm;
 
 	unsigned int version;
 	const struct tegra_drm_client_ops *ops;
There is no problem with retrieving the parent from the client's base structure, so either this patch could be dropped, or please mention in the commit message that the "drm" pointer duplication is solely to help keep the code cleaner.
From: Thierry Reding <treding@nvidia.com>
Loading the firmware requires an allocation of IOVA space to make sure that the VIC's Falcon microcontroller can read the firmware if address translation via the SMMU is enabled.
However, the allocation currently happens at a time when the geometry of an IOMMU domain may not have been initialized yet. This happens for example on Tegra186 and later, where an ARM SMMU is used. Domains created by the ARM SMMU driver postpone the geometry setup until a device is attached to the domain. This is because IOMMU domains aren't attached to a specific IOMMU instance at allocation time and hence the input address space, which defines the geometry, is not known yet.
Work around this by postponing the firmware load until it is needed, at the time when a channel is opened to the VIC. At that point the shared IOMMU domain's geometry has been properly initialized.
As a byproduct this allows the Tegra DRM to be created in the absence of VIC firmware, since the VIC initialization no longer fails if the firmware can't be found.
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/vic.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index d47983deb1cf..afbdc33f49bc 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -181,13 +181,6 @@ static int vic_init(struct host1x_client *client)
 		vic->domain = tegra->domain;
 	}
 
-	if (!vic->falcon.data) {
-		vic->falcon.data = tegra;
-		err = falcon_load_firmware(&vic->falcon);
-		if (err < 0)
-			goto detach;
-	}
-
 	vic->channel = host1x_channel_request(client->dev);
 	if (!vic->channel) {
 		err = -ENOMEM;
@@ -256,6 +249,16 @@ static int vic_open_channel(struct tegra_drm_client *client,
 	if (err < 0)
 		return err;
 
+	if (!vic->falcon.data) {
+		vic->falcon.data = client->drm;
+
+		err = falcon_load_firmware(&vic->falcon);
+		if (err < 0) {
+			pm_runtime_put(vic->dev);
+			return err;
+		}
+	}
+
 	err = vic_boot(vic);
 	if (err < 0) {
 		pm_runtime_put(vic->dev);
On 23.01.2019 12:39, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
Loading the firmware requires an allocation of IOVA space to make sure that the VIC's Falcon microcontroller can read the firmware if address translation via the SMMU is enabled.
However, the allocation currently happens at a time when the geometry of an IOMMU domain may not have been initialized yet. This happens for example on Tegra186 and later, where an ARM SMMU is used. Domains created by the ARM SMMU driver postpone the geometry setup until a device is attached to the domain. This is because IOMMU domains aren't attached to a specific IOMMU instance at allocation time and hence the input address space, which defines the geometry, is not known yet.
Work around this by postponing the firmware load until it is needed, at the time when a channel is opened to the VIC. At that point the shared IOMMU domain's geometry has been properly initialized.
As a byproduct this allows the Tegra DRM to be created in the absence of VIC firmware, since the VIC initialization no longer fails if the firmware can't be found.
Signed-off-by: Thierry Reding <treding@nvidia.com>
 drivers/gpu/drm/tegra/vic.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index d47983deb1cf..afbdc33f49bc 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -181,13 +181,6 @@ static int vic_init(struct host1x_client *client)
 		vic->domain = tegra->domain;
 	}
 
-	if (!vic->falcon.data) {
-		vic->falcon.data = tegra;
-		err = falcon_load_firmware(&vic->falcon);
-		if (err < 0)
-			goto detach;
-	}
-
 	vic->channel = host1x_channel_request(client->dev);
 	if (!vic->channel) {
 		err = -ENOMEM;
@@ -256,6 +249,16 @@ static int vic_open_channel(struct tegra_drm_client *client,
 	if (err < 0)
 		return err;
 
+	if (!vic->falcon.data) {
+		vic->falcon.data = client->drm;
+
+		err = falcon_load_firmware(&vic->falcon);
+		if (err < 0) {
+			pm_runtime_put(vic->dev);
+			return err;
+		}
+	}
+
 	err = vic_boot(vic);
 	if (err < 0) {
 		pm_runtime_put(vic->dev);
This only moves the firmware data copying to a later stage and doesn't touch the reading of the firmware file, hence the claim about the "byproduct" is invalid. Please take a look at the patch I posted some time ago [0] and feel free to use it as a reference.
On Wed, Jan 23, 2019 at 03:47:45PM +0300, Dmitry Osipenko wrote:
On 23.01.2019 12:39, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
Loading the firmware requires an allocation of IOVA space to make sure that the VIC's Falcon microcontroller can read the firmware if address translation via the SMMU is enabled.
However, the allocation currently happens at a time when the geometry of an IOMMU domain may not have been initialized yet. This happens for example on Tegra186 and later, where an ARM SMMU is used. Domains created by the ARM SMMU driver postpone the geometry setup until a device is attached to the domain. This is because IOMMU domains aren't attached to a specific IOMMU instance at allocation time and hence the input address space, which defines the geometry, is not known yet.
Work around this by postponing the firmware load until it is needed, at the time when a channel is opened to the VIC. At that point the shared IOMMU domain's geometry has been properly initialized.
As a byproduct this allows the Tegra DRM to be created in the absence of VIC firmware, since the VIC initialization no longer fails if the firmware can't be found.
Signed-off-by: Thierry Reding <treding@nvidia.com>
 drivers/gpu/drm/tegra/vic.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index d47983deb1cf..afbdc33f49bc 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -181,13 +181,6 @@ static int vic_init(struct host1x_client *client)
 		vic->domain = tegra->domain;
 	}
 
-	if (!vic->falcon.data) {
-		vic->falcon.data = tegra;
-		err = falcon_load_firmware(&vic->falcon);
-		if (err < 0)
-			goto detach;
-	}
-
 	vic->channel = host1x_channel_request(client->dev);
 	if (!vic->channel) {
 		err = -ENOMEM;
@@ -256,6 +249,16 @@ static int vic_open_channel(struct tegra_drm_client *client,
 	if (err < 0)
 		return err;
 
+	if (!vic->falcon.data) {
+		vic->falcon.data = client->drm;
+
+		err = falcon_load_firmware(&vic->falcon);
+		if (err < 0) {
+			pm_runtime_put(vic->dev);
+			return err;
+		}
+	}
+
 	err = vic_boot(vic);
 	if (err < 0) {
 		pm_runtime_put(vic->dev);
This only moves the firmware data copying to a later stage and doesn't touch the reading of the firmware file, hence the claim about the "byproduct" is invalid. Please take a look at the patch I posted some time ago [0] and feel free to use it as a reference.
You're right, that hunk ended up in some other patch. And indeed this patch looks pretty much like yours, so I've merged both together (mine hadn't moved things out to a separate function, so I did that now, and mine still reuses the client->drm pointer introduced in an earlier patch to make it easier to pass that around).
Will send out v2 of this patch.
Thierry
On 23.01.2019 12:39, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
Loading the firmware requires an allocation of IOVA space to make sure that the VIC's Falcon microcontroller can read the firmware if address translation via the SMMU is enabled.
However, the allocation currently happens at a time when the geometry of an IOMMU domain may not have been initialized yet. This happens for example on Tegra186 and later, where an ARM SMMU is used. Domains created by the ARM SMMU driver postpone the geometry setup until a device is attached to the domain. This is because IOMMU domains aren't attached to a specific IOMMU instance at allocation time and hence the input address space, which defines the geometry, is not known yet.
Work around this by postponing the firmware load until it is needed, at the time when a channel is opened to the VIC. At that point the shared IOMMU domain's geometry has been properly initialized.
As a byproduct this allows the Tegra DRM to be created in the absence of VIC firmware, since the VIC initialization no longer fails if the firmware can't be found.
Signed-off-by: Thierry Reding <treding@nvidia.com>
 drivers/gpu/drm/tegra/vic.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index d47983deb1cf..afbdc33f49bc 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -181,13 +181,6 @@ static int vic_init(struct host1x_client *client)
 		vic->domain = tegra->domain;
 	}
 
-	if (!vic->falcon.data) {
-		vic->falcon.data = tegra;
-		err = falcon_load_firmware(&vic->falcon);
-		if (err < 0)
-			goto detach;
-	}
-
 	vic->channel = host1x_channel_request(client->dev);
 	if (!vic->channel) {
 		err = -ENOMEM;
@@ -256,6 +249,16 @@ static int vic_open_channel(struct tegra_drm_client *client,
 	if (err < 0)
 		return err;
 
+	if (!vic->falcon.data) {
+		vic->falcon.data = client->drm;
+
+		err = falcon_load_firmware(&vic->falcon);
+		if (err < 0) {
Also, notice that this hunk misses "vic->falcon.data = NULL" on the error path, hence a second channel opening is likely to crash the kernel.
+			pm_runtime_put(vic->dev);
+			return err;
+		}
+	}
+
 	err = vic_boot(vic);
 	if (err < 0) {
 		pm_runtime_put(vic->dev);
From: Thierry Reding <treding@nvidia.com>
Move initialization of the shared IOMMU domain after the host1x device has been initialized. At this point all the Tegra DRM clients have been attached to the shared IOMMU domain.
This is important because Tegra186 and later use an ARM SMMU, for which the driver defers setting up the geometry for a domain until a device is attached to it. This is to ensure that the domain is properly set up for a specific ARM SMMU instance, which is unknown at allocation time.
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 54 ++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 61dcbd218ffc..271c7a5fc954 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -92,10 +92,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		return -ENOMEM;
 
 	if (iommu_present(&platform_bus_type)) {
-		u64 carveout_start, carveout_end, gem_start, gem_end;
-		struct iommu_domain_geometry *geometry;
-		unsigned long order;
-
 		tegra->domain = iommu_domain_alloc(&platform_bus_type);
 		if (!tegra->domain) {
 			err = -ENOMEM;
@@ -105,27 +101,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		err = iova_cache_get();
 		if (err < 0)
 			goto domain;
-
-		geometry = &tegra->domain->geometry;
-		gem_start = geometry->aperture_start;
-		gem_end = geometry->aperture_end - CARVEOUT_SZ;
-		carveout_start = gem_end + 1;
-		carveout_end = geometry->aperture_end;
-
-		order = __ffs(tegra->domain->pgsize_bitmap);
-		init_iova_domain(&tegra->carveout.domain, 1UL << order,
-				 carveout_start >> order);
-
-		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
-		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
-
-		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
-		mutex_init(&tegra->mm_lock);
-
-		DRM_DEBUG("IOMMU apertures:\n");
-		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
-		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
-			  carveout_end);
 	}
 
 	mutex_init(&tegra->clients_lock);
@@ -159,6 +134,35 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 	if (err < 0)
 		goto fbdev;
 
+	if (tegra->domain) {
+		u64 carveout_start, carveout_end, gem_start, gem_end;
+		dma_addr_t start, end;
+		unsigned long order;
+
+		start = tegra->domain->geometry.aperture_start;
+		end = tegra->domain->geometry.aperture_end;
+
+		gem_start = start;
+		gem_end = end - CARVEOUT_SZ;
+		carveout_start = gem_end + 1;
+		carveout_end = end;
+
+		order = __ffs(tegra->domain->pgsize_bitmap);
+		init_iova_domain(&tegra->carveout.domain, 1UL << order,
+				 carveout_start >> order);
+
+		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
+		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
+
+		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
+		mutex_init(&tegra->mm_lock);
+
+		DRM_DEBUG("IOMMU apertures:\n");
+		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
+		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
+			  carveout_end);
+	}
+
 	if (tegra->hub) {
 		err = tegra_display_hub_prepare(tegra->hub);
 		if (err < 0)
On 23.01.2019 12:39, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
Move initialization of the shared IOMMU domain after the host1x device has been initialized. At this point all the Tegra DRM clients have been attached to the shared IOMMU domain.
This is important because Tegra186 and later use an ARM SMMU, for which the driver defers setting up the geometry for a domain until a device is attached to it. This is to ensure that the domain is properly set up for a specific ARM SMMU instance, which is unknown at allocation time.
Signed-off-by: Thierry Reding <treding@nvidia.com>
 drivers/gpu/drm/tegra/drm.c | 54 ++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 61dcbd218ffc..271c7a5fc954 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -92,10 +92,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		return -ENOMEM;
 
 	if (iommu_present(&platform_bus_type)) {
-		u64 carveout_start, carveout_end, gem_start, gem_end;
-		struct iommu_domain_geometry *geometry;
-		unsigned long order;
-
 		tegra->domain = iommu_domain_alloc(&platform_bus_type);
 		if (!tegra->domain) {
 			err = -ENOMEM;
@@ -105,27 +101,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		err = iova_cache_get();
 		if (err < 0)
 			goto domain;
-
-		geometry = &tegra->domain->geometry;
-		gem_start = geometry->aperture_start;
-		gem_end = geometry->aperture_end - CARVEOUT_SZ;
-		carveout_start = gem_end + 1;
-		carveout_end = geometry->aperture_end;
-
-		order = __ffs(tegra->domain->pgsize_bitmap);
-		init_iova_domain(&tegra->carveout.domain, 1UL << order,
-				 carveout_start >> order);
-
-		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
-		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
-
-		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
-		mutex_init(&tegra->mm_lock);
-
-		DRM_DEBUG("IOMMU apertures:\n");
-		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
-		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
-			  carveout_end);
 	}
 
 	mutex_init(&tegra->clients_lock);
@@ -159,6 +134,35 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 	if (err < 0)
 		goto fbdev;
 
+	if (tegra->domain) {
+		u64 carveout_start, carveout_end, gem_start, gem_end;
+		dma_addr_t start, end;
+		unsigned long order;
+
+		start = tegra->domain->geometry.aperture_start;
+		end = tegra->domain->geometry.aperture_end;
+
+		gem_start = start;
+		gem_end = end - CARVEOUT_SZ;
+		carveout_start = gem_end + 1;
+		carveout_end = end;
+
+		order = __ffs(tegra->domain->pgsize_bitmap);
+		init_iova_domain(&tegra->carveout.domain, 1UL << order,
+				 carveout_start >> order);
+
+		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
+		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
+
+		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
+		mutex_init(&tegra->mm_lock);
+
+		DRM_DEBUG("IOMMU apertures:\n");
+		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
+		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
+			  carveout_end);
+	}
+
 	if (tegra->hub) {
 		err = tegra_display_hub_prepare(tegra->hub);
 		if (err < 0)
Looks good,
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
BTW, the "carveout" domain is only relevant for T124+, so it would be better to avoid allocating it on older Tegras and then factor out the IOVA setup code into a standalone function. Of course that could be done later in a different series if desired; just a minor comment.
On 23.01.2019 12:39, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
Move initialization of the shared IOMMU domain after the host1x device has been initialized. At this point all the Tegra DRM clients have been attached to the shared IOMMU domain.
This is important because Tegra186 and later use an ARM SMMU, for which the driver defers setting up the geometry for a domain until a device is attached to it. This is to ensure that the domain is properly set up for a specific ARM SMMU instance, which is unknown at allocation time.
Signed-off-by: Thierry Reding <treding@nvidia.com>
 drivers/gpu/drm/tegra/drm.c | 54 ++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 61dcbd218ffc..271c7a5fc954 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -92,10 +92,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		return -ENOMEM;
 
 	if (iommu_present(&platform_bus_type)) {
-		u64 carveout_start, carveout_end, gem_start, gem_end;
-		struct iommu_domain_geometry *geometry;
-		unsigned long order;
-
 		tegra->domain = iommu_domain_alloc(&platform_bus_type);
 		if (!tegra->domain) {
 			err = -ENOMEM;
@@ -105,27 +101,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		err = iova_cache_get();
 		if (err < 0)
 			goto domain;
-
-		geometry = &tegra->domain->geometry;
-		gem_start = geometry->aperture_start;
-		gem_end = geometry->aperture_end - CARVEOUT_SZ;
-		carveout_start = gem_end + 1;
-		carveout_end = geometry->aperture_end;
-
-		order = __ffs(tegra->domain->pgsize_bitmap);
-		init_iova_domain(&tegra->carveout.domain, 1UL << order,
-				 carveout_start >> order);
-
-		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
-		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
-
-		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
-		mutex_init(&tegra->mm_lock);
-
-		DRM_DEBUG("IOMMU apertures:\n");
-		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
-		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
-			  carveout_end);
 	}
 
 	mutex_init(&tegra->clients_lock);
@@ -159,6 +134,35 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 	if (err < 0)
 		goto fbdev;
 
+	if (tegra->domain) {
+		u64 carveout_start, carveout_end, gem_start, gem_end;
+		dma_addr_t start, end;
+		unsigned long order;
+
+		start = tegra->domain->geometry.aperture_start;
+		end = tegra->domain->geometry.aperture_end;
+
+		gem_start = start;
+		gem_end = end - CARVEOUT_SZ;
+		carveout_start = gem_end + 1;
+		carveout_end = end;
Given that it seems the Falcon can't address more than 32 bits, at least right now, it would make sense to reserve the carveout from the bottom.
[snip]
From: Thierry Reding <treding@nvidia.com>
On Tegra186 and later, the ARM SMMU provides an input address space that is 48 bits wide. However, memory clients can only address up to 40 bits. If the geometry is used as-is, allocations of IOVA space can end up in a region that cannot be addressed by the memory clients.
To fix this, restrict the IOVA space to the DMA mask of the host1x device. Note that, technically, the IOVA space needs to be restricted to the intersection of the DMA masks for all clients that are attached to the IOMMU domain. In practice using the DMA mask of the host1x device is sufficient because all host1x clients share the same DMA mask.
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 271c7a5fc954..0c5f1e6a0446 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 
 	if (tegra->domain) {
 		u64 carveout_start, carveout_end, gem_start, gem_end;
+		u64 dma_mask = dma_get_mask(&device->dev);
 		dma_addr_t start, end;
 		unsigned long order;
 
-		start = tegra->domain->geometry.aperture_start;
-		end = tegra->domain->geometry.aperture_end;
+		start = tegra->domain->geometry.aperture_start & dma_mask;
+		end = tegra->domain->geometry.aperture_end & dma_mask;
 
 		gem_start = start;
 		gem_end = end - CARVEOUT_SZ;
On 23.01.2019 12:39, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
On Tegra186 and later, the ARM SMMU provides an input address space that is 48 bits wide. However, memory clients can only address up to 40 bits. If the geometry is used as-is, allocations of IOVA space can end up in a region that cannot be addressed by the memory clients.
To fix this, restrict the IOVA space to the DMA mask of the host1x device. Note that, technically, the IOVA space needs to be restricted to the intersection of the DMA masks for all clients that are attached to the IOMMU domain. In practice using the DMA mask of the host1x device is sufficient because all host1x clients share the same DMA mask.
Signed-off-by: Thierry Reding <treding@nvidia.com>
 drivers/gpu/drm/tegra/drm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 271c7a5fc954..0c5f1e6a0446 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 
 	if (tegra->domain) {
 		u64 carveout_start, carveout_end, gem_start, gem_end;
+		u64 dma_mask = dma_get_mask(&device->dev);
 		dma_addr_t start, end;
 		unsigned long order;
 
-		start = tegra->domain->geometry.aperture_start;
-		end = tegra->domain->geometry.aperture_end;
+		start = tegra->domain->geometry.aperture_start & dma_mask;
+		end = tegra->domain->geometry.aperture_end & dma_mask;
 
 		gem_start = start;
 		gem_end = end - CARVEOUT_SZ;
Wow, so the IOVA could address more than 32 bits on later Tegras. AFAIK, currently there is no support for properly programming 64-bit addresses in the driver code, hence... wouldn't it make sense to force the IOVA mask to 32 bits for now and hope that the second half of the address registers happens to be 0x00000000 in HW?
On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
On 23.01.2019 12:39, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
On Tegra186 and later, the ARM SMMU provides an input address space that is 48 bits wide. However, memory clients can only address up to 40 bits. If the geometry is used as-is, allocations of IOVA space can end up in a region that cannot be addressed by the memory clients.
To fix this, restrict the IOVA space to the DMA mask of the host1x device. Note that, technically, the IOVA space needs to be restricted to the intersection of the DMA masks for all clients that are attached to the IOMMU domain. In practice using the DMA mask of the host1x device is sufficient because all host1x clients share the same DMA mask.
Signed-off-by: Thierry Reding <treding@nvidia.com>
 drivers/gpu/drm/tegra/drm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 271c7a5fc954..0c5f1e6a0446 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 
 	if (tegra->domain) {
 		u64 carveout_start, carveout_end, gem_start, gem_end;
+		u64 dma_mask = dma_get_mask(&device->dev);
 		dma_addr_t start, end;
 		unsigned long order;
 
-		start = tegra->domain->geometry.aperture_start;
-		end = tegra->domain->geometry.aperture_end;
+		start = tegra->domain->geometry.aperture_start & dma_mask;
+		end = tegra->domain->geometry.aperture_end & dma_mask;
 
 		gem_start = start;
 		gem_end = end - CARVEOUT_SZ;
Wow, so the IOVA could address more than 32 bits on later Tegras. AFAIK, currently there is no support for properly programming 64-bit addresses in the driver code, hence... wouldn't it make sense to force the IOVA mask to 32 bits for now and hope that the second half of the address registers happens to be 0x00000000 in HW?
I think this restriction only applies to display at this point. In practice you'd be hard put to trigger that case because IOVA memory is allocated from the bottom, so you'd actually need to use up to 4 GiB of IOVA space before hitting that.
That said, I vaguely remember typing up the patch to support writing the WINBUF_START_ADDR_HI register and friends, but it looks as if that was never merged.
I'll try to dig out that patch (or rewrite it, shouldn't be very difficult) and make it part of this series. I'd rather fix that issue than arbitrarily restrict the IOVA space, because that's likely to come back and bite us at some point.
Thierry
On 23.01.2019 17:04, Thierry Reding wrote:
On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
On 23.01.2019 12:39, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
On Tegra186 and later, the ARM SMMU provides an input address space that is 48 bits wide. However, memory clients can only address up to 40 bits. If the geometry is used as-is, allocations of IOVA space can end up in a region that cannot be addressed by the memory clients.
To fix this, restrict the IOVA space to the DMA mask of the host1x device. Note that, technically, the IOVA space needs to be restricted to the intersection of the DMA masks for all clients that are attached to the IOMMU domain. In practice using the DMA mask of the host1x device is sufficient because all host1x clients share the same DMA mask.
Signed-off-by: Thierry Reding <treding@nvidia.com>
 drivers/gpu/drm/tegra/drm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 271c7a5fc954..0c5f1e6a0446 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 
 	if (tegra->domain) {
 		u64 carveout_start, carveout_end, gem_start, gem_end;
+		u64 dma_mask = dma_get_mask(&device->dev);
 		dma_addr_t start, end;
 		unsigned long order;
 
-		start = tegra->domain->geometry.aperture_start;
-		end = tegra->domain->geometry.aperture_end;
+		start = tegra->domain->geometry.aperture_start & dma_mask;
+		end = tegra->domain->geometry.aperture_end & dma_mask;
 
 		gem_start = start;
 		gem_end = end - CARVEOUT_SZ;
Wow, so the IOVA could address more than 32 bits on later Tegras. AFAIK, currently there is no support for properly programming 64-bit addresses in the driver code, hence... wouldn't it make sense to force the IOVA mask to 32 bits for now and hope that the second half of the address registers happens to be 0x00000000 in HW?
I think this restriction only applies to display at this point. In practice you'd be hard put to trigger that case because IOVA memory is allocated from the bottom, so you'd actually need to use up to 4 GiB of IOVA space before hitting that.
That said, I vaguely remember typing up the patch to support writing the WINBUF_START_ADDR_HI register and friends, but it looks as if that was never merged.
I'll try to dig out that patch (or rewrite it, shouldn't be very difficult) and make it part of this series. I'd rather fix that issue than arbitrarily restrict the IOVA space, because that's likely to come back and bite us at some point.
You could also try to change the IOVA allocation logic and see what will fail :)
On 23.01.2019 17:04, Thierry Reding wrote:
On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
On 23.01.2019 12:39, Thierry Reding wrote:
From: Thierry Reding <treding@nvidia.com>
On Tegra186 and later, the ARM SMMU provides an input address space that is 48 bits wide. However, memory clients can only address up to 40 bits. If the geometry is used as-is, allocations of IOVA space can end up in a region that cannot be addressed by the memory clients.
To fix this, restrict the IOVA space to the DMA mask of the host1x device. Note that, technically, the IOVA space needs to be restricted to the intersection of the DMA masks for all clients that are attached to the IOMMU domain. In practice using the DMA mask of the host1x device is sufficient because all host1x clients share the same DMA mask.
Signed-off-by: Thierry Reding treding@nvidia.com
---
 drivers/gpu/drm/tegra/drm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 271c7a5fc954..0c5f1e6a0446 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 	if (tegra->domain) {
 		u64 carveout_start, carveout_end, gem_start, gem_end;
+		u64 dma_mask = dma_get_mask(&device->dev);
 		dma_addr_t start, end;
 		unsigned long order;

-		start = tegra->domain->geometry.aperture_start;
-		end = tegra->domain->geometry.aperture_end;
+		start = tegra->domain->geometry.aperture_start & dma_mask;
+		end = tegra->domain->geometry.aperture_end & dma_mask;
 		gem_start = start;
 		gem_end = end - CARVEOUT_SZ;
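As an aside, the effect of masking the aperture with the DMA mask can be sketched outside the kernel. This is a standalone illustration, not the driver code; `clamp_to_dma_mask()` is a hypothetical helper standing in for the `& dma_mask` expressions in the hunk above:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: the SMMU reports a 48-bit input address space on Tegra186,
 * but memory clients can only address 40 bits. Masking the aperture
 * bounds with the host1x DMA mask keeps IOVA allocations reachable. */
uint64_t clamp_to_dma_mask(uint64_t addr, uint64_t dma_mask)
{
    return addr & dma_mask;
}
```

With a 48-bit aperture end and a 40-bit mask (what `DMA_BIT_MASK(40)` evaluates to), the clamped end is the top of the 40-bit space.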
Wow, so an IOVA could address more than 32 bits on later Tegras. AFAIK, there is currently no support for properly programming 64-bit addresses in the driver code, hence... won't it make sense to force the IOVA mask to 32 bits for now and hope that the second half of the address registers happens to be 0x00000000 in HW?
I think this restriction only applies to display at this point. In practice you'd be hard put to trigger that case because IOVA memory is allocated from the bottom, so you'd actually need to use up to 4 GiB of IOVA space before hitting that.
That said, I vaguely remember typing up the patch to support writing the WINBUF_START_ADDR_HI register and friends, but it looks as if that was never merged.
I'll try to dig out that patch (or rewrite it, shouldn't be very difficult) and make it part of this series. I'd rather fix that issue than arbitrarily restrict the IOVA space, because that's likely to come back and bite us at some point.
Looking at falcon.c, it writes a 32-bit address only. Something needs to be done about that as well; it seems there is a FALCON_DMATRFMOFFS register to facilitate >32-bit addressing.
Secondly, right now there is no support for patching >32-bit BO addresses in the host1x command stream. The current UAPI may not work well in that case, though that's not really a problem given the staging nature of the UAPI. Something to consider for the UAPI de-staging.
23.01.2019 18:55, Dmitry Osipenko wrote:
Looking at falcon.c.. it writes 32bit address only. Something need to be done about it as well, seems there is FALCON_DMATRFMOFFS register to facilitate >32bits addressing.
Although, scratch the remark about FALCON_DMATRFMOFFS, it must be about something else. Mikko, could you please clarify whether the Falcon can load firmware only from a 32-bit address space?
On 23.1.2019 21.42, Dmitry Osipenko wrote:
Although scratch about FALCON_DMATRFMOFFS, it should be about something else. Mikko, could you please clarify whether falcon could load firmware only from a 32bit AS?
The DMA base address is set using DMATRFBASE, which requires 256-byte alignment, meaning 40 bits are available for the address. DMATRFFBOFFS, I believe, is then used as a 32-bit offset from that base.
Mikko
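The scheme Mikko describes can be sketched as follows. `falcon_dma_base()` and `falcon_dma_addr()` are hypothetical helpers for illustration, not functions from falcon.c; the point is that a 256-byte-aligned base lets a 32-bit register cover a 40-bit address:

```c
#include <assert.h>
#include <stdint.h>

/* Because DMATRFBASE requires 256-byte alignment, the register stores
 * addr >> 8, so 32 register bits cover a 40-bit physical address. */
uint32_t falcon_dma_base(uint64_t paddr)
{
    assert((paddr & 0xff) == 0); /* must be 256-byte aligned */
    return (uint32_t)(paddr >> 8);
}

/* Recover the full 40-bit address from the register value. */
uint64_t falcon_dma_addr(uint32_t base)
{
    return (uint64_t)base << 8;
}
```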
24.01.2019 13:24, Mikko Perttunen wrote:
The DMA base address is set using DMATRFBASE, which requires 256B alignment, meaning 40 bits are available for the address. The DMATRFFBOFFS I believe is then used as a 32-bit offset to that value.
The TRM (up to T196) suggests that DMATRFMOFFS is a 16-bit offset. Is that a TRM bug, or am I missing something?
I suppose it should be fine to just reserve the carveout from the bottom of the IOVA space and be done with it.
On 24.1.2019 15.15, Dmitry Osipenko wrote:
TRM (up to T196) suggests that DMATRFMOFFS is a 16bit offset. Is it a kind of TRM bug or I'm missing something?
I suppose it should be fine to just reserve carveout from the bottom of IOVA space and done with it.
DMATRFMOFFS is an offset into the Falcon IMEM, so 16 bits are enough to cover it.
Mikko
24.01.2019 16:27, Mikko Perttunen wrote:
DMATRFMOFFS is an offset to the Falcon IMEM, so 16 bits is enough to cover that.
Ah, there are a source DMATRFMOFFS (a 32-bit DRAM offset) and a destination DMATRFMOFFS (a 16-bit IMEM offset). So one variant is to set up the DMA address correctly; the other is to change the carveout layout and make the IOMMU mandatory for the driver.
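The two carveout layouts under discussion can be contrasted in a small sketch. This is illustrative only: the driver currently carves the firmware region out of the top of the aperture (`gem_end = end - CARVEOUT_SZ`, as in the hunk earlier in the thread), while the alternative would reserve it at the bottom so firmware always lands at low, 32-bit-reachable addresses. The `CARVEOUT_SZ` value here is an assumption, not the driver's constant:

```c
#include <assert.h>
#include <stdint.h>

#define CARVEOUT_SZ (256ULL << 20) /* illustrative size, 256 MiB */

struct iova_layout {
    uint64_t gem_start, gem_end;
    uint64_t carveout_start, carveout_end;
};

/* Current layout: GEM allocations at the bottom, carveout at the top. */
struct iova_layout carveout_at_top(uint64_t start, uint64_t end)
{
    struct iova_layout l;
    l.gem_start = start;
    l.gem_end = end - CARVEOUT_SZ;
    l.carveout_start = l.gem_end + 1;
    l.carveout_end = end;
    return l;
}

/* Alternative: carveout at the bottom, always within 32-bit reach. */
struct iova_layout carveout_at_bottom(uint64_t start, uint64_t end)
{
    struct iova_layout l;
    l.carveout_start = start;
    l.carveout_end = start + CARVEOUT_SZ - 1;
    l.gem_start = l.carveout_end + 1;
    l.gem_end = end;
    return l;
}
```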
On Wed, Jan 23, 2019 at 04:41:44PM +0300, Dmitry Osipenko wrote:
Wow, so an IOVA could address more than 32 bits on later Tegras. AFAIK, there is currently no support for properly programming 64-bit addresses in the driver code, hence... won't it make sense to force the IOVA mask to 32 bits for now and hope that the second half of the address registers happens to be 0x00000000 in HW?
Actually we do have proper programming for 40 bit addresses in the display driver code for Tegra186 and later, see:
drivers/gpu/drm/tegra/hub.c: tegra_shared_plane_atomic_update()
Code to support that on Tegra210 and earlier should be almost identical. I'll try to add that as well. It's not actually necessary there because with an SMMU enabled on those chips the address space will be 32 bits. I suspect there could be an issue when running without an SMMU on Tegra124 (with more than 2 GiB of memory and LPAE enabled) or on Tegra210. I'll see if I can produce a case where this actually fails and then add the code for proper programming there.
In the meantime, I've sent out a v2 of the series that takes care of properly programming the various bits in host1x required for full 40-bit addresses on Tegra186 and later.
Thierry
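For reference, programming an address wider than 32 bits as a lo/hi register pair, in the spirit of the WINBUF_START_ADDR / WINBUF_START_ADDR_HI handling in drivers/gpu/drm/tegra/hub.c, boils down to a split like this. `split_iova()` is a hypothetical helper, not the actual driver code:

```c
#include <assert.h>
#include <stdint.h>

struct addr_pair {
    uint32_t lo; /* written to the base address register */
    uint32_t hi; /* written to the _HI register; zero on a 32-bit aperture */
};

/* Split a 40-bit (or wider) IOVA into the two register values. */
struct addr_pair split_iova(uint64_t iova)
{
    struct addr_pair p;
    p.lo = (uint32_t)(iova & 0xffffffffu);
    p.hi = (uint32_t)(iova >> 32);
    return p;
}
```

On chips where the aperture stays below 4 GiB, `hi` is always zero, which is why the missing `_HI` writes have gone unnoticed so far.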
From: Thierry Reding treding@nvidia.com
The host1x and clients instantiated on Tegra186 support addressing 40 bits of memory.
Signed-off-by: Thierry Reding treding@nvidia.com
---
 drivers/gpu/host1x/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 419d8929a98f..32fad4da7936 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -127,7 +127,7 @@ static const struct host1x_info host1x06_info = {
 	.nb_bases = 16,
 	.init = host1x06_init,
 	.sync_offset = 0x0,
-	.dma_mask = DMA_BIT_MASK(34),
+	.dma_mask = DMA_BIT_MASK(40),
 	.has_hypervisor = true,
 };
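For anyone unfamiliar with the macro: `DMA_BIT_MASK(n)` from linux/dma-mapping.h evaluates to a mask of the low n bits, with a special case for 64 to avoid an undefined 64-bit shift, so the change above widens the host1x DMA mask from 34 to 40 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent to the kernel's definition in linux/dma-mapping.h. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))
```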
dri-devel@lists.freedesktop.org