alloc_workqueue replaces deprecated create_singlethread_workqueue().
A dedicated workqueue has been used since work items need to be flushed as a group rather than individually.
Since the flip_queue workqueue is involved in page-flipping and is not being used on a memory reclaim path, WQ_MEM_RECLAIM has not been set.
Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here.
Signed-off-by: Bhaktipriya Shridhar bhaktipriya96@gmail.com --- drivers/gpu/drm/radeon/radeon_display.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 6a41b49..bbb29c7 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -711,7 +711,7 @@ static void radeon_crtc_init(struct drm_device *dev, int index)
drm_mode_crtc_set_gamma_size(&radeon_crtc->base, 256); radeon_crtc->crtc_id = index; - radeon_crtc->flip_queue = create_singlethread_workqueue("radeon-crtc"); + radeon_crtc->flip_queue = alloc_workqueue("radeon-crtc", 0, 0); rdev->mode_info.crtcs[index] = radeon_crtc;
if (rdev->family >= CHIP_BONAIRE) { -- 2.1.4
On Sat, Jul 02, 2016 at 04:33:50PM +0530, Bhaktipriya Shridhar wrote:
alloc_workqueue replaces deprecated create_singlethread_workqueue().
A dedicated workqueue has been used since work items need to be flushed as a group rather than individually.
Since the flip_queue workqueue is involved in page-flipping and is not being used on a memory reclaim path, WQ_MEM_RECLAIM has not been set.
Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here.
What are the involved work items? Is it safe to run them concurrently?
Thanks.
On 02.07.2016 22:46, Tejun Heo wrote:
On Sat, Jul 02, 2016 at 04:33:50PM +0530, Bhaktipriya Shridhar wrote:
alloc_workqueue replaces deprecated create_singlethread_workqueue().
A dedicated workqueue has been used since work items need to be flushed as a group rather than individually.
Since the flip_queue workqueue is involved in page-flipping and is not being used on a memory reclaim path, WQ_MEM_RECLAIM has not been set.
Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here.
What are the involved work items?
drivers/gpu/drm/radeon/radeon_display.c:radeon_flip_work_func()
Is it safe to run them concurrently?
Concurrently with what exactly?
Hello,
On Mon, Jul 04, 2016 at 12:58:32PM +0900, Michel Dänzer wrote:
On 02.07.2016 22:46, Tejun Heo wrote:
On Sat, Jul 02, 2016 at 04:33:50PM +0530, Bhaktipriya Shridhar wrote:
alloc_workqueue replaces deprecated create_singlethread_workqueue().
A dedicated workqueue has been used since work items need to be flushed as a group rather than individually.
There seem to be two work items involved, the flip one and unpin one. Provided that there's no ordering requirement between the two, can't we just flush them individually?
Since the flip_queue workqueue is involved in page-flipping and is not being used on a memory reclaim path, WQ_MEM_RECLAIM has not been set.
Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here.
What are the involved work items?
drivers/gpu/drm/radeon/radeon_display.c:radeon_flip_work_func()
Is it safe to run them concurrently?
Concurrently with what exactly?
The flip and unpin work items.
Thanks.
On 06.07.2016 06:06, Tejun Heo wrote:
On Mon, Jul 04, 2016 at 12:58:32PM +0900, Michel Dänzer wrote:
On 02.07.2016 22:46, Tejun Heo wrote:
On Sat, Jul 02, 2016 at 04:33:50PM +0530, Bhaktipriya Shridhar wrote:
alloc_workqueue replaces deprecated create_singlethread_workqueue().
A dedicated workqueue has been used since work items need to be flushed as a group rather than individually.
There seem to be two work items involved, the flip one and unpin one. Provided that there's no ordering requirement between the two, can't we just flush them individually?
There is an ordering requirement between the two queues, but it's enforced by the driver (by only queuing the unpin work once a flip has completed, which only happens after the corresponding flip work has run).
Not being very familiar with the workqueue APIs, I'll describe how it's supposed to work from a driver POV, which will hopefully help you guys decide on the most appropriate alloc_workqueue parameters.
There is one flip work queue for each hardware CRTC. At most one radeon_flip_work_func item can be queued for any of them at any time. When a radeon_flip_work_func item is queued, it should be executed ASAP (so WQ_HIGHPRI might be appropriate?).
In contrast, the radeon_unpin_work_func items aren't particularly urgent, and it's okay for several of them to be queued up. So I guess it would actually make sense to use a different workqueue for them, maybe even the default one?
Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here.
What are the involved work items?
drivers/gpu/drm/radeon/radeon_display.c:radeon_flip_work_func()
Is it safe to run them concurrently?
Concurrently with what exactly?
The flip and unpin work items.
Yes, it's safe to run those concurrently.
Hello, Michel.
On Wed, Jul 06, 2016 at 12:12:52PM +0900, Michel Dänzer wrote:
There is an ordering requirement between the two queues, but it's enforced by the driver (by only queuing the unpin work once a flip has completed, which only happens after the corresponding flip work has run).
Oh I see.
Not being very familiar with the workqueue APIs, I'll describe how it's supposed to work from a driver POV, which will hopefully help you guys decide on the most appropriate alloc_workqueue parameters.
There is one flip work queue for each hardware CRTC. At most one radeon_flip_work_func item can be queued for any of them at any time. When a radeon_flip_work_func item is queued, it should be executed ASAP (so WQ_HIGHPRI might be appropriate?).
Hmmm... the only time WQ_HIGHPRI should be used is when it'd otherwise require a kthread w/ nice value at -20. Would that be the case here? What are the consequences of the work item getting delayed? Also, what kind of delays matter here? Is it millisec range or micro?
In contrast, the radeon_unpin_work_func items aren't particularly urgent, and it's okay for several of them to be queued up. So I guess it would actually make sense to use a different workqueue for them, maybe even the default one?
If the flip work doesn't require high priority kthread, both of them can be queued to system_wq and flushed individually on shutdown. The default workqueue now has high enough concurrency that it isn't necessary to worry about getting delayed by other work items.
Thanks.
On 06.07.2016 22:45, Tejun Heo wrote:
On Wed, Jul 06, 2016 at 12:12:52PM +0900, Michel Dänzer wrote:
Not being very familiar with the workqueue APIs, I'll describe how it's supposed to work from a driver POV, which will hopefully help you guys decide on the most appropriate alloc_workqueue parameters.
There is one flip work queue for each hardware CRTC. At most one radeon_flip_work_func item can be queued for any of them at any time. When a radeon_flip_work_func item is queued, it should be executed ASAP (so WQ_HIGHPRI might be appropriate?).
Hmmm... the only time WQ_HIGHPRI should be used is when it'd otherwise require a kthread w/ nice value at -20. Would that be the case here? What are the consequences of the work item getting delayed?
A page flip may be delayed to a later display refresh cycle.
Also, what kind of delays matter here? Is it millisec range or micro?
It can be the latter in theory, but normally rather the former.
Am 07.07.2016 um 05:32 schrieb Michel Dänzer:
On 06.07.2016 22:45, Tejun Heo wrote:
On Wed, Jul 06, 2016 at 12:12:52PM +0900, Michel Dänzer wrote:
Not being very familiar with the workqueue APIs, I'll describe how it's supposed to work from a driver POV, which will hopefully help you guys decide on the most appropriate alloc_workqueue parameters.
There is one flip work queue for each hardware CRTC. At most one radeon_flip_work_func item can be queued for any of them at any time. When a radeon_flip_work_func item is queued, it should be executed ASAP (so WQ_HIGHPRI might be appropriate?).
Hmmm... the only time WQ_HIGHPRI should be used is when it'd otherwise require a kthread w/ nice value at -20. Would that be the case here? What are the consequences of the work item getting delayed?
A page flip may be delayed to a later display refresh cycle.
Also, what kind of delays matter here? Is it millisec range or micro?
It can be the latter in theory, but normally rather the former.
Well to be precise with a typical 1920x1080@60 resolution you have about 2.16ms time under ideal conditions for the flip.
So using the high priority queue still sounds like a good idea to me.
Regards, Christian.
On 07.07.2016 16:43, Christian König wrote:
Am 07.07.2016 um 05:32 schrieb Michel Dänzer:
On 06.07.2016 22:45, Tejun Heo wrote:
On Wed, Jul 06, 2016 at 12:12:52PM +0900, Michel Dänzer wrote:
Not being very familiar with the workqueue APIs, I'll describe how it's supposed to work from a driver POV, which will hopefully help you guys decide on the most appropriate alloc_workqueue parameters.
There is one flip work queue for each hardware CRTC. At most one radeon_flip_work_func item can be queued for any of them at any time. When a radeon_flip_work_func item is queued, it should be executed ASAP (so WQ_HIGHPRI might be appropriate?).
Hmmm... the only time WQ_HIGHPRI should be used is when it'd otherwise require a kthread w/ nice value at -20. Would that be the case here? What are the consequences of the work item getting delayed?
A page flip may be delayed to a later display refresh cycle.
Also, what kind of delays matter here? Is it millisec range or micro?
It can be the latter in theory, but normally rather the former.
Well to be precise with a typical 1920x1080@60 resolution you have about 2.16ms time under ideal conditions for the flip.
So using the high priority queue still sounds like a good idea to me.
How did you arrive at 2.16ms?
Userspace can call the ioctl up to one full refresh cycle ahead of time, which is ~16ms at 60 Hz. On the other hand userspace can also call the ioctl arbitrarily close to the vertical blank period, in which case even a delay of just 1ms (or even significantly less) may cause the flip to be delayed by one refresh cycle.
Hello,
On Fri, Jul 08, 2016 at 02:52:30PM +0900, Michel Dänzer wrote:
On 07.07.2016 16:43, Christian König wrote:
Also, what kind of delays matter here? Is it millisec range or micro?
It can be the latter in theory, but normally rather the former.
Well to be precise with a typical 1920x1080@60 resolution you have about 2.16ms time under ideal conditions for the flip.
So using the high priority queue still sounds like a good idea to me.
How did you arrive at 2.16ms?
Userspace can call the ioctl up to one full refresh cycle ahead of time, which is ~16ms at 60 Hz. On the other hand userspace can also call the ioctl arbitrarily close to the vertical blank period, in which case even a delay of just 1ms (or even significantly less) may cause the flip to be delayed by one refresh cycle.
If there's too long a delay, the outcome is missing the refresh cycle, right? Hmmm... yeah, WQ_HIGHPRI probably is the right answer here. Bhaktipriya, can you please update the patch to use a dedicated workqueue with WQ_HIGHPRI?
Thanks.
alloc_workqueue replaces deprecated create_singlethread_workqueue().
Each hardware CRTC has a single flip work queue. When a radeon_flip_work_func item is queued, it needs to be executed ASAP because even a slight delay may cause the flip to be delayed by one refresh cycle.
Hence, a dedicated workqueue with WQ_HIGHPRI set, has been used here since a delay can cause the outcome to miss the refresh cycle.
Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here.
Signed-off-by: Bhaktipriya Shridhar bhaktipriya96@gmail.com --- Changes in v2: -Used a dedicated work queue with WQ_HIGHPRI
drivers/gpu/drm/radeon/radeon_display.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 6a41b49..64b246e 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -711,7 +711,7 @@ static void radeon_crtc_init(struct drm_device *dev, int index)
drm_mode_crtc_set_gamma_size(&radeon_crtc->base, 256); radeon_crtc->crtc_id = index; - radeon_crtc->flip_queue = create_singlethread_workqueue("radeon-crtc"); + radeon_crtc->flip_queue = alloc_workqueue("radeon-crtc", WQ_HIGHPRI, 0); rdev->mode_info.crtcs[index] = radeon_crtc;
if (rdev->family >= CHIP_BONAIRE) { -- 2.1.4
Am 16.07.2016 um 13:30 schrieb Bhaktipriya Shridhar:
alloc_workqueue replaces deprecated create_singlethread_workqueue().
Each hardware CRTC has a single flip work queue. When a radeon_flip_work_func item is queued, it needs to be executed ASAP because even a slight delay may cause the flip to be delayed by one refresh cycle.
Hence, a dedicated workqueue with WQ_HIGHPRI set, has been used here since a delay can cause the outcome to miss the refresh cycle.
Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here.
Signed-off-by: Bhaktipriya Shridhar bhaktipriya96@gmail.com
Reviewed-by: Christian König christian.koenig@amd.com
Changes in v2: -Used a dedicated work queue with WQ_HIGHPRI
drivers/gpu/drm/radeon/radeon_display.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 6a41b49..64b246e 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -711,7 +711,7 @@ static void radeon_crtc_init(struct drm_device *dev, int index)
drm_mode_crtc_set_gamma_size(&radeon_crtc->base, 256); radeon_crtc->crtc_id = index;
- radeon_crtc->flip_queue = create_singlethread_workqueue("radeon-crtc");
radeon_crtc->flip_queue = alloc_workqueue("radeon-crtc", WQ_HIGHPRI, 0); rdev->mode_info.crtcs[index] = radeon_crtc;
if (rdev->family >= CHIP_BONAIRE) {
-- 2.1.4
On Sat, Jul 16, 2016 at 05:00:44PM +0530, Bhaktipriya Shridhar wrote:
alloc_workqueue replaces deprecated create_singlethread_workqueue().
Each hardware CRTC has a single flip work queue. When a radeon_flip_work_func item is queued, it needs to be executed ASAP because even a slight delay may cause the flip to be delayed by one refresh cycle.
Hence, a dedicated workqueue with WQ_HIGHPRI set, has been used here since a delay can cause the outcome to miss the refresh cycle.
Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here.
Signed-off-by: Bhaktipriya Shridhar bhaktipriya96@gmail.com
Acked-by: Tejun Heo tj@kernel.org
Thanks.
On Mon, Jul 18, 2016 at 8:48 PM, Tejun Heo tj@kernel.org wrote:
On Sat, Jul 16, 2016 at 05:00:44PM +0530, Bhaktipriya Shridhar wrote:
alloc_workqueue replaces deprecated create_singlethread_workqueue().
Each hardware CRTC has a single flip work queue. When a radeon_flip_work_func item is queued, it needs to be executed ASAP because even a slight delay may cause the flip to be delayed by one refresh cycle.
Hence, a dedicated workqueue with WQ_HIGHPRI set, has been used here since a delay can cause the outcome to miss the refresh cycle.
Since there are only a fixed number of work items, explicit concurrency limit is unnecessary here.
Signed-off-by: Bhaktipriya Shridhar bhaktipriya96@gmail.com
Acked-by: Tejun Heo tj@kernel.org
Applied. thanks!
Alex
Thanks.
-- tejun _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
dri-devel@lists.freedesktop.org