virglrenderer has logic to validate both stride and layer_stride, but both are always zero. The fallback for that case is:
stride = width * bytes_per_pixel
layer_stride = stride * num_layers
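In C terms, the fallback amounts to this (illustrative sketch only; the helper name is made up and is not virglrenderer's actual function):

/* Illustrative only: what the zero-stride fallback computes. */
static void fallback_strides(uint32_t width, uint32_t bytes_per_pixel,
                             uint32_t num_layers,
                             uint32_t *stride, uint32_t *layer_stride)
{
        *stride = width * bytes_per_pixel;    /* tightly packed rows */
        *layer_stride = *stride * num_layers; /* tightly packed layers */
}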
However, this assumption causes trouble in the following cases:
1) When allocating host-compatible buffers for the planned wayland integration.
2) Certain YUV buffers, which Gallium imports as 3 R8 buffers with variable strides. For example, HAL_PIXEL_FORMAT_YV12 requires that the chroma planes are aligned to 16 bytes (see the sketch below).
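To make case 2 concrete, here is a sketch of the stride rules Android's gralloc documentation gives for HAL_PIXEL_FORMAT_YV12 (ALIGN is the usual round-up-to-multiple helper):

#define ALIGN(v, a) (((v) + (a) - 1) & ~((a) - 1))

/* HAL_PIXEL_FORMAT_YV12: the Y stride is a multiple of 16 and each
 * chroma stride is ALIGN(y_stride / 2, 16), so the chroma planes can
 * end up with strides that width * bytes_per_pixel never produces. */
uint32_t y_stride = ALIGN(width, 16);
uint32_t c_stride = ALIGN(y_stride / 2, 16);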
This commit doesn't fix the discrepancy, but adds the necessary plumbing so we don't forget.
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
---
 drivers/gpu/drm/virtio/virtgpu_drv.h   | 2 ++
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 9 ++++++---
 drivers/gpu/drm/virtio/virtgpu_vq.c    | 6 ++++++
 3 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 314e02f94d9c..c1c9a9b8e25c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -312,12 +312,14 @@ void virtio_gpu_cmd_submit(struct virtio_gpu_device *vgdev,
 void virtio_gpu_cmd_transfer_from_host_3d(struct virtio_gpu_device *vgdev,
                                           uint32_t ctx_id,
                                           uint64_t offset, uint32_t level,
+                                          uint32_t stride, uint32_t layer_stride,
                                           struct virtio_gpu_box *box,
                                           struct virtio_gpu_object_array *objs,
                                           struct virtio_gpu_fence *fence);
 void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
                                         uint32_t ctx_id,
                                         uint64_t offset, uint32_t level,
+                                        uint32_t stride, uint32_t layer_stride,
                                         struct virtio_gpu_box *box,
                                         struct virtio_gpu_object_array *objs,
                                         struct virtio_gpu_fence *fence);
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 9af1ec62434f..98b72dead962 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -324,8 +324,10 @@ static int virtio_gpu_transfer_from_host_ioctl(struct drm_device *dev,
                ret = -ENOMEM;
                goto err_unlock;
        }
+
+       /* TODO: add the correct stride / layer_stride. */
        virtio_gpu_cmd_transfer_from_host_3d
-               (vgdev, vfpriv->ctx_id, offset, args->level,
+               (vgdev, vfpriv->ctx_id, offset, args->level, 0, 0,
                 &box, objs, fence);
        dma_fence_put(&fence->f);
        return 0;
@@ -369,10 +371,11 @@ static int virtio_gpu_transfer_to_host_ioctl(struct drm_device *dev, void *data,
                if (!fence)
                        goto err_unlock;
+               /* TODO: add the correct stride / layer_stride. */
                virtio_gpu_cmd_transfer_to_host_3d
                        (vgdev,
-                        vfpriv ? vfpriv->ctx_id : 0, offset,
-                        args->level, &box, objs, fence);
+                        vfpriv ? vfpriv->ctx_id : 0, offset, args->level,
+                        0, 0, &box, objs, fence);
                dma_fence_put(&fence->f);
        }
        return 0;
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 80176f379ad5..9fb3c8c3b687 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -965,6 +965,7 @@ virtio_gpu_cmd_resource_create_3d(struct virtio_gpu_device *vgdev,
 void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
                                         uint32_t ctx_id,
                                         uint64_t offset, uint32_t level,
+                                        uint32_t stride, uint32_t layer_stride,
                                         struct virtio_gpu_box *box,
                                         struct virtio_gpu_object_array *objs,
                                         struct virtio_gpu_fence *fence)
@@ -990,6 +991,8 @@ void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
        cmd_p->box = *box;
        cmd_p->offset = cpu_to_le64(offset);
        cmd_p->level = cpu_to_le32(level);
+       cmd_p->stride = cpu_to_le32(stride);
+       cmd_p->layer_stride = cpu_to_le32(layer_stride);

        virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, &cmd_p->hdr, fence);
 }
@@ -997,6 +1000,7 @@ void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
 void virtio_gpu_cmd_transfer_from_host_3d(struct virtio_gpu_device *vgdev,
                                           uint32_t ctx_id,
                                           uint64_t offset, uint32_t level,
+                                          uint32_t stride, uint32_t layer_stride,
                                           struct virtio_gpu_box *box,
                                           struct virtio_gpu_object_array *objs,
                                           struct virtio_gpu_fence *fence)
@@ -1016,6 +1020,8 @@ void virtio_gpu_cmd_transfer_from_host_3d(struct virtio_gpu_device *vgdev,
        cmd_p->box = *box;
        cmd_p->offset = cpu_to_le64(offset);
        cmd_p->level = cpu_to_le32(level);
+       cmd_p->stride = cpu_to_le32(stride);
+       cmd_p->layer_stride = cpu_to_le32(layer_stride);

        virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, &cmd_p->hdr, fence);
 }
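For reference, the virtio command these helpers fill already reserves both fields on the wire; this is struct virtio_gpu_transfer_host_3d as defined in include/uapi/linux/virtio_gpu.h (reproduced from memory, so double-check against the header):

/* Used by VIRTIO_GPU_CMD_TRANSFER_TO_HOST_3D and
 * VIRTIO_GPU_CMD_TRANSFER_FROM_HOST_3D. */
struct virtio_gpu_transfer_host_3d {
        struct virtio_gpu_ctrl_hdr hdr;
        struct virtio_gpu_box box;
        __le64 offset;
        __le32 resource_id;
        __le32 level;
        __le32 stride;
        __le32 layer_stride;
};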
This doesn't really break userspace, since it always passes down 0 for stride/layer_stride currently. We could:
(1) modify UAPI now and add a VIRTGPU_PARAM_STRIDE_FIX feature
(2) modify the UAPI now, and not expose a corresponding feature (i.e., VIRTGPU_PARAM_STRIDE_FIX). This would fold this minor fix into another bigger feature (like the proposed metadata query).
(3) don't modify UAPI now, wait until another feature lands.
I don't have a preference either way, as long as we get something like this eventually.
The corresponding userspace is:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2129
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
---
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 8 +++-----
 include/uapi/drm/virtgpu_drm.h         | 4 ++++
 2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index 98b72dead962..c29473ac24a1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -325,10 +325,9 @@ static int virtio_gpu_transfer_from_host_ioctl(struct drm_device *dev,
                goto err_unlock;
        }

-       /* TODO: add the correct stride / layer_stride. */
        virtio_gpu_cmd_transfer_from_host_3d
-               (vgdev, vfpriv->ctx_id, offset, args->level, 0, 0,
-                &box, objs, fence);
+               (vgdev, vfpriv->ctx_id, offset, args->level, args->stride,
+                args->layer_stride, &box, objs, fence);
        dma_fence_put(&fence->f);
        return 0;

@@ -371,11 +370,10 @@ static int virtio_gpu_transfer_to_host_ioctl(struct drm_device *dev, void *data,
                if (!fence)
                        goto err_unlock;

-               /* TODO: add the correct stride / layer_stride. */
                virtio_gpu_cmd_transfer_to_host_3d
                        (vgdev,
                         vfpriv ? vfpriv->ctx_id : 0, offset, args->level,
-                        0, 0, &box, objs, fence);
+                        args->stride, args->layer_stride, &box, objs, fence);
                dma_fence_put(&fence->f);
        }
        return 0;
diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
index f06a789f34cd..b2fc92c3d2be 100644
--- a/include/uapi/drm/virtgpu_drm.h
+++ b/include/uapi/drm/virtgpu_drm.h
@@ -117,6 +117,8 @@ struct drm_virtgpu_3d_transfer_to_host {
        struct drm_virtgpu_3d_box box;
        __u32 level;
        __u32 offset;
+       __u32 stride;
+       __u32 layer_stride;
 };

 struct drm_virtgpu_3d_transfer_from_host {
@@ -124,6 +126,8 @@ struct drm_virtgpu_3d_transfer_from_host {
        struct drm_virtgpu_3d_box box;
        __u32 level;
        __u32 offset;
+       __u32 stride;
+       __u32 layer_stride;
 };

 #define VIRTGPU_WAIT_NOWAIT 1 /* like it */
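With the new fields, a userspace caller would look roughly like this (sketch only; bo_handle, the dimensions, and the 832-pixel pitch are placeholder values):

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <drm/virtgpu_drm.h>

/* Sketch: transfer with an explicit row pitch instead of relying on
 * the host's width * bpp fallback. All values are placeholders. */
static int transfer_with_stride(int fd, uint32_t bo_handle)
{
        struct drm_virtgpu_3d_transfer_to_host xfer;

        memset(&xfer, 0, sizeof(xfer));
        xfer.bo_handle = bo_handle;
        xfer.box.w = 800;
        xfer.box.h = 600;
        xfer.box.d = 1;
        xfer.stride = 832 * 4;  /* host-compatible pitch != width * bpp */
        xfer.layer_stride = 0;  /* zero keeps the host-side fallback */
        return drmIoctl(fd, DRM_IOCTL_VIRTGPU_TRANSFER_TO_HOST, &xfer);
}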
On Tue, Oct 01, 2019 at 06:49:35PM -0700, Gurchetan Singh wrote:
This doesn't really break userspace, since it always passes down 0 for stride/layer_stride currently. We could:
(1) modify UAPI now and add a VIRTGPU_PARAM_STRIDE_FIX feature
This I think. But IMO it's not a fix, it is an added feature ...
Also missing the big picture here. Why do we need this?
For guest object we don't have strides (virtio_gpu_resource_create_3d doesn't allow this).
For host objects the host should know the strides.
Which I think is the reason why the stride and layer_stride fields in the transfer commands are effectively unused ...
-       /* TODO: add the correct stride / layer_stride. */
        virtio_gpu_cmd_transfer_from_host_3d
-               (vgdev, vfpriv->ctx_id, offset, args->level, 0, 0,
-                &box, objs, fence);
+               (vgdev, vfpriv->ctx_id, offset, args->level, args->stride,
+                args->layer_stride, &box, objs, fence);
What happens with old userspace running on a new kernel?
I don't think we can simply use the args here without checking we actually got something from userspace ...
diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
index f06a789f34cd..b2fc92c3d2be 100644
--- a/include/uapi/drm/virtgpu_drm.h
+++ b/include/uapi/drm/virtgpu_drm.h
@@ -117,6 +117,8 @@ struct drm_virtgpu_3d_transfer_to_host {
        struct drm_virtgpu_3d_box box;
        __u32 level;
        __u32 offset;
+       __u32 stride;
+       __u32 layer_stride;
 };
cheers, Gerd
On Wed, Oct 2, 2019 at 1:49 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
On Tue, Oct 01, 2019 at 06:49:35PM -0700, Gurchetan Singh wrote:
This doesn't really break userspace, since it always passes down 0 for stride/layer_stride currently. We could:
(1) modify UAPI now and add a VIRTGPU_PARAM_STRIDE_FIX feature
This I think. But IMO it's not a fix, it is an added feature ...
Also missing the big picture here. Why do we need this?
Two reasons:
a) Wayland proxying. Generally, host_stride != guest_stride, and aligning to 64 is insufficient (for example, Intel X-tiled buffers). There are three places we can make the adjustment:
1) In the wayland guest proxy, before crossing the VM boundary
2) In the wayland host proxy, before sending to the server
3) Make sure host_stride == guest_stride at allocation time
For (1), we'd have to extend drm_virtgpu_resource_info to actually return the stride. This raises questions about strides of 1D buffers, cubemaps, etc.
For (2), somebody actually needs to write a wayland host proxy. It's too much work just for this bug.
For (3), since we have to do something like VIRTIO_GPU_CMD_METADATA_QUERY (or whatever we call it) for Vulkan and zero-copy anyways, this seemed like the most natural choice. Consequently, when guest_stride != bpp * width, we'll have to make some fixes in Mesa/virtio-gpu. The only tricky part is modifiers -- I suspect we'll keep a mapping of virtualized modifiers to host modifiers.
It could make some sense to wait for VIRTIO_GPU_CMD_METADATA_QUERY to land. If we agree (3) is a practical solution to this, I recommend we just land the first patch as a statement of purpose to save some feature bits. We can modify the uapi later.
b) We currently have hacks for YUV we can remove [i][ii]. This is related to the restriction imposed by Android guest userspace that the stride must be aligned to a certain amount of bytes.
[i] https://gitlab.freedesktop.org/virgl/virglrenderer/blob/master/src/virgl_gbm... [ii] https://chromium.googlesource.com/chromiumos/platform/minigbm/+/refs/heads/m...
I don't think we can simply use the args here without checking we
actually got something from userspace ...
Correct me if I'm wrong, but doesn't drm_ioctl(..) already make sure that the pointer is the intersection of the kernel and userspace sizes, so we can safely append data? i.e, layer_stride and stride will be zero for old user space + a new kernel.
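For context, a simplified sketch of that copy-in path in drm_ioctl() (drivers/gpu/drm/drm_ioctl.c; not the verbatim kernel code):

/* drm_ioctl() copies in only as many bytes as userspace passed and
 * zero-fills the rest of the kernel-side struct, so trailing fields
 * appended to the UAPI read as 0 from old userspace. */
in_size = min(asize, ksize);  /* asize: userspace size, ksize: kernel size */
if (copy_from_user(kdata, (void __user *)arg, in_size))
        return -EFAULT;
if (ksize > in_size)
        memset(kdata + in_size, 0, ksize - in_size);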
For guest object we don't have strides (virtio_gpu_resource_create_3d doesn't allow this).
For host objects the host should know the strides.
Which I think is the reason why the stride and layer_stride fields in the transfer commands are effectively unused ...
-       /* TODO: add the correct stride / layer_stride. */
        virtio_gpu_cmd_transfer_from_host_3d
-               (vgdev, vfpriv->ctx_id, offset, args->level, 0, 0,
-                &box, objs, fence);
+               (vgdev, vfpriv->ctx_id, offset, args->level, args->stride,
+                args->layer_stride, &box, objs, fence);
What happens with old userspace running on a new kernel?
I don't think we can simply use the args here without checking we actually got something from userspace ...
diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h
index f06a789f34cd..b2fc92c3d2be 100644
--- a/include/uapi/drm/virtgpu_drm.h
+++ b/include/uapi/drm/virtgpu_drm.h
@@ -117,6 +117,8 @@ struct drm_virtgpu_3d_transfer_to_host {
        struct drm_virtgpu_3d_box box;
        __u32 level;
        __u32 offset;
+       __u32 stride;
+       __u32 layer_stride;
 };
cheers, Gerd
- Make sure host_stride == guest_stride at allocation time
For (3), since we have to do something like VIRTIO_GPU_CMD_METADATA_QUERY (or whatever we call it) for Vulkan and zero-copy anyways, this seemed like the most natural choice.
Yep, for shared resources it certainly makes sense to have host and guest agree on the stride. I'd tend to not touch the current TRANSFER ioctls (and virtio commands) though, but integrate that into the blob resource support instead. We probably need blob transfer ioctls and virtio commands anyway.
I don't think we can simply use the args here without checking we
actually got something from userspace ...
Correct me if I'm wrong, but doesn't drm_ioctl(..) already make sure that the pointer is the intersection of the kernel and userspace sizes, so we can safely append data? i.e, layer_stride and stride will be zero for old user space + a new kernel.
Ah, right, I forgot the generic drm ioctl code does that service for us.
cheers, Gerd
On Wed, Oct 2, 2019 at 5:18 PM Gurchetan Singh <gurchetansingh@chromium.org> wrote:
On Wed, Oct 2, 2019 at 1:49 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
On Tue, Oct 01, 2019 at 06:49:35PM -0700, Gurchetan Singh wrote:
This doesn't really break userspace, since it always passes down 0 for stride/layer_stride currently. We could:
(1) modify UAPI now and add a VIRTGPU_PARAM_STRIDE_FIX feature
This I think. But IMO it's not a fix, it is an added feature ...
Also missing the big picture here. Why do we need this?
Two reasons:
I don't fully get the picture, but drm_virtgpu_resource_create has stride. Can we send that down in transfers? It lacks layer_stride though...
On Fri, Feb 28, 2020 at 01:01:49PM -0800, Chia-I Wu wrote:
On Wed, Oct 2, 2019 at 5:18 PM Gurchetan Singh <gurchetansingh@chromium.org> wrote:
On Wed, Oct 2, 2019 at 1:49 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
On Tue, Oct 01, 2019 at 06:49:35PM -0700, Gurchetan Singh wrote:
This doesn't really break userspace, since it always passes down 0 for stride/layer_stride currently. We could:
(1) modify UAPI now and add a VIRTGPU_PARAM_STRIDE_FIX feature
This I think. But IMO it's not a fix, it is an added feature ...
Also missing the big picture here. Why do we need this?
Two reasons:
I don't fully get the picture, but drm_virtgpu_resource_create has stride. Can we send that down in transfers?
It's unused and suddenly caring about it has a good chance to break stuff ...
cheers, Gerd
On Tue, Oct 01, 2019 at 06:49:34PM -0700, Gurchetan Singh wrote:
virglrenderer has logic to validate both stride and layer_stride, but both are always zero. The fallback for that case is:
stride = width * bytes_per_pixel
layer_stride = stride * num_layers
However, this assumption causes trouble in the following cases:
- When allocating host-compatible buffers for the planned wayland integration.
--verbose please.
I know wayland rounds up to a multiple of 64. But that isn't a problem today. You can have virtio-gpu objects with width = 832 and define a drm_framebuffer using that with width=800 and stride set accordingly.
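A sketch of that with libdrm (bo_handle, the height, and the XRGB8888 format are placeholders):

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_fourcc.h>

/* The underlying virtio-gpu object is 832 pixels wide; the framebuffer
 * advertises width=800 with the pitch set to the object's real row size. */
static int addfb_800_on_832(int fd, uint32_t bo_handle, uint32_t *fb_id)
{
        uint32_t handles[4] = { bo_handle };
        uint32_t pitches[4] = { 832 * 4 };  /* bytes per row of the object */
        uint32_t offsets[4] = { 0 };

        return drmModeAddFB2(fd, 800, 600, DRM_FORMAT_XRGB8888,
                             handles, pitches, offsets, fb_id, 0);
}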
cheers, Gerd