Following up on my message from Jan 19, now with a lot more hard data and a less intrusive modification. Still a prototype though. CC-ing DRI-devel and Mario Kleiner for a larger audience.
To recap, I was seeing consistent flickering with Mesa/KMS on Android on my Iconia Tab W500 tablet with an AMD C-50 APU. I was also noticing occasional flickering under Ubuntu on the same system when maximizing/restoring windows and releasing windows after moving them.
I believe that flickering is related to page flipping and the associated notifications to user space. A small modification to the page flipping code in the Radeon driver made the problem disappear on both Ubuntu and Android. As I understand it, the page flip notification logic works as follows:
1. Hardware issues vsync IRQ 2. IRQ handler calls radeon_crtc_handle_flip 3. radeon_crtc_handle_flip calls radeon_page_flip, which programs MMIO to flip pages and returns status whether the flip has been completed 4. if flip has not been completed, radeon_crtc_handle_flip uses current scanout position to predict whether flip will complete in the current frame or not 5. if flip is predicted to complete, signal user space, otherwise defer until next vsync IRQ
The condition in step 4 needed a slight modification on my hardware. If the current scanout position is negative (inside vblank interval), the page flip will not complete until the next vsync on my hardware.
I'm attaching two patches. radeon_flip_diag.diff adds some debug output to the kernel log that helped me understand the timing of VSync IRQs used for handling page-flips relative to the screen refresh in progress. It also measures the time it takes to program the page flip in MMIO in pixels scanned out (typically about 300 pixels, so relatively insignificant compared to the vertical refresh).
On my system it turned out that the scanout position at the time radeon_crtc_handle_flip was called was somewhere between 798-800 and -2-0 (the LVDS screen having 800 visible rows). I used an awk script (also attached) to compute some statistics. With the original condition almost no page flips were detected as deferred to the next vsync IRQ. With the modified condition about 50% page flips were completed immediately according to radeon_page_flip and of the remaining ones about 50% were correctly predicted to complete based on the scanout position. In total about 25% were deferred until the next vsync. These 25% must have been causing flickering with the original condition.
radeon_flip_fix.diff shows the minor modification to the condition used to decide whether a page-flip has been completed or will complete in the current screen refresh.
My conclusion is that on this particular hardware the condition that predicts page flip completion must be modified in order to avoid notifying user space of completed page flips prematurely.
Regards, Felix
On Wed, Feb 1, 2012 at 5:43 PM, Felix Kuehling felix.kuehling@amd.com wrote:
Following up on my message from Jan 19, now with a lot more hard data and a less intrusive modification. Still a prototype though. CC-ing DRI-devel and Mario Kleiner for a larger audience.
To recap, I was seeing consistent flickering with Mesa/KMS on Android on my Iconia Tab W500 tablet with an AMD C-50 APU. I was also noticing occasional flickering under Ubuntu on the same system when maximizing/restoring windows and releasing windows after moving them.
I believe that flickering is related to page flipping and the associated notifications to user space. A small modification to the page flipping code in the Radeon driver made the problem disappear on both Ubuntu and Android. As I understand it, the page flip notification logic works as follows:
Hardware issues vsync IRQ IRQ handler calls radeon_crtc_handle_flip radeon_crtc_handle_flip calls radeon_page_flip, which programs MMIO to flip pages and returns status whether the flip has been completed if flip has not been completed, radeon_crtc_handle_flip uses current scanout position to predict whether flip will complete in the current frame or not if flip is predicted to complete, signal user space, otherwise defer until next vsync IRQ
The condition in step 4 needed a slight modification on my hardware. If the current scanout position is negative (inside vblank interval), the page flip will not complete until the next vsync on my hardware.
I'm attaching two patches. radeon_flip_diag.diff adds some debug output to the kernel log that helped me understand the timing of VSync IRQs used for handling page-flips relative to the screen refresh in progress. It also measures the time it takes to program the page flip in MMIO in pixels scanned out (typically about 300 pixels, so relatively insignificant compared to the vertical refresh).
On my system it turned out that the scanout position at the time radeon_crtc_handle_flip was called was somewhere between 798-800 and -2-0 (the LVDS screen having 800 visible rows). I used an awk script (also attached) to compute some statistics. With the original condition almost no page flips were detected as deferred to the next vsync IRQ. With the modified condition about 50% page flips were completed immediately according to radeon_page_flip and of the remaining ones about 50% were correctly predicted to complete based on the scanout position. In total about 25% were deferred until the next vsync. These 25% must have been causing flickering with the original condition.
radeon_flip_fix.diff shows the minor modification to the condition used to decide whether a page-flip has been completed or will complete in the current screen refresh.
My conclusion is that on this particular hardware the condition that predicts page flip completion must be modified in order to avoid notifying user space of completed page flips prematurely.
The fix looks ok to me. Mario any thoughts?
Reviewed-by: Alex Deucher alexdeucher@gmail.com
Regards, Felix
-- _____ Felix Kuehling \ _ | MTS Software Development Eng. /|_| | SW-Linux Base Gfx | AMD |__/ | T 905.882.2600 x8928
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Hi Felix!
Am Dienstag, den 21.02.2012, 15:07 -0500 schrieb Alex Deucher:
On Wed, Feb 1, 2012 at 5:43 PM, Felix Kuehling felix.kuehling@amd.com wrote:
Following up on my message from Jan 19, now with a lot more hard data and a less intrusive modification. Still a prototype though. CC-ing DRI-devel and Mario Kleiner for a larger audience.
To recap, I was seeing consistent flickering with Mesa/KMS on Android on my Iconia Tab W500 tablet with an AMD C-50 APU. I was also noticing occasional flickering under Ubuntu on the same system when maximizing/restoring windows and releasing windows after moving them.
I believe that flickering is related to page flipping and the associated notifications to user space. A small modification to the page flipping code in the Radeon driver made the problem disappear on both Ubuntu and Android. As I understand it, the page flip notification logic works as follows:
Hardware issues vsync IRQ IRQ handler calls radeon_crtc_handle_flip radeon_crtc_handle_flip calls radeon_page_flip, which programs MMIO to flip pages and returns status whether the flip has been completed if flip has not been completed, radeon_crtc_handle_flip uses current scanout position to predict whether flip will complete in the current frame or not if flip is predicted to complete, signal user space, otherwise defer until next vsync IRQ
The condition in step 4 needed a slight modification on my hardware. If the current scanout position is negative (inside vblank interval), the page flip will not complete until the next vsync on my hardware.
I'm attaching two patches. radeon_flip_diag.diff adds some debug output to the kernel log that helped me understand the timing of VSync IRQs used for handling page-flips relative to the screen refresh in progress. It also measures the time it takes to program the page flip in MMIO in pixels scanned out (typically about 300 pixels, so relatively insignificant compared to the vertical refresh).
On my system it turned out that the scanout position at the time radeon_crtc_handle_flip was called was somewhere between 798-800 and -2-0 (the LVDS screen having 800 visible rows). I used an awk script (also attached) to compute some statistics. With the original condition almost no page flips were detected as deferred to the next vsync IRQ. With the modified condition about 50% page flips were completed immediately according to radeon_page_flip and of the remaining ones about 50% were correctly predicted to complete based on the scanout position. In total about 25% were deferred until the next vsync. These 25% must have been causing flickering with the original condition.
radeon_flip_fix.diff shows the minor modification to the condition used to decide whether a page-flip has been completed or will complete in the current screen refresh.
My conclusion is that on this particular hardware the condition that predicts page flip completion must be modified in order to avoid notifying user space of completed page flips prematurely.
The fix looks ok to me. Mario any thoughts?
Reviewed-by: Alex Deucher alexdeucher@gmail.com
A Signed-off-by line is required, is not it? Should it also got to stable?
Additionally should the diagnose also be committed?
Felix:
git config user.name "Felix Kühling" git config user.email "felix.kuehling@amd.com" git clone git://people.freedesktop.org/~airlied/linux git checkout -b flicker-fix drm-next # I hope that is the correct one. # apply your fix git commit -a -s # add your commit message with Alex’s Reviewed-by line git format-patch -1 # use git send-email or paste it to your message editor to send the patch to the list
Thanks,
Paul
On 02/21/2012 09:07 PM, Alex Deucher wrote:
On Wed, Feb 1, 2012 at 5:43 PM, Felix Kuehlingfelix.kuehling@amd.com wrote:
Following up on my message from Jan 19, now with a lot more hard data and a less intrusive modification. Still a prototype though. CC-ing DRI-devel and Mario Kleiner for a larger audience.
To recap, I was seeing consistent flickering with Mesa/KMS on Android on my Iconia Tab W500 tablet with an AMD C-50 APU. I was also noticing occasional flickering under Ubuntu on the same system when maximizing/restoring windows and releasing windows after moving them.
I believe that flickering is related to page flipping and the associated notifications to user space. A small modification to the page flipping code in the Radeon driver made the problem disappear on both Ubuntu and Android. As I understand it, the page flip notification logic works as follows:
Hardware issues vsync IRQ IRQ handler calls radeon_crtc_handle_flip radeon_crtc_handle_flip calls radeon_page_flip, which programs MMIO to flip pages and returns status whether the flip has been completed if flip has not been completed, radeon_crtc_handle_flip uses current scanout position to predict whether flip will complete in the current frame or not if flip is predicted to complete, signal user space, otherwise defer until next vsync IRQ
The condition in step 4 needed a slight modification on my hardware. If the current scanout position is negative (inside vblank interval), the page flip will not complete until the next vsync on my hardware.
I'm attaching two patches. radeon_flip_diag.diff adds some debug output to the kernel log that helped me understand the timing of VSync IRQs used for handling page-flips relative to the screen refresh in progress. It also measures the time it takes to program the page flip in MMIO in pixels scanned out (typically about 300 pixels, so relatively insignificant compared to the vertical refresh).
On my system it turned out that the scanout position at the time radeon_crtc_handle_flip was called was somewhere between 798-800 and -2-0 (the LVDS screen having 800 visible rows). I used an awk script (also attached) to compute some statistics. With the original condition almost no page flips were detected as deferred to the next vsync IRQ. With the modified condition about 50% page flips were completed immediately according to radeon_page_flip and of the remaining ones about 50% were correctly predicted to complete based on the scanout position. In total about 25% were deferred until the next vsync. These 25% must have been causing flickering with the original condition.
radeon_flip_fix.diff shows the minor modification to the condition used to decide whether a page-flip has been completed or will complete in the current screen refresh.
My conclusion is that on this particular hardware the condition that predicts page flip completion must be modified in order to avoid notifying user space of completed page flips prematurely.
The fix looks ok to me. Mario any thoughts?
Reviewed-by: Alex Deucheralexdeucher@gmail.com
Hi,
the fix looks ok to me for that device, but could we make it conditional on the AMD C-50 APU and similar pieces? It is the right thing to do for that gpu, but for regular desktop gpus it is too pessimistic if it defers the pageflip timestamping and completion event for an already completed flip:
1. Makes the timestamps 1 refresh too late, causing timing sensitive software like mine to detect false positives -- reporting skipped frames were there weren't any. Not as bad as missing a really skipped frame, but still not great.
2. Can reduce the framerate due to throttling the client, especially on systems that are already challenged wrt. to their irq timing.
Is the vblank period very short on these kind of devices? From Felix description is sounds as if it is only 2 scanlines?
thanks, -mario
On 12-02-21 07:49 PM, Mario Kleiner wrote:
On 02/21/2012 09:07 PM, Alex Deucher wrote:
[snip]
The fix looks ok to me. Mario any thoughts?
Reviewed-by: Alex Deucheralexdeucher@gmail.com
Hi,
the fix looks ok to me for that device, but could we make it conditional on the AMD C-50 APU and similar pieces? It is the right thing to do for that gpu, but for regular desktop gpus it is too pessimistic if it defers the pageflip timestamping and completion event for an already completed flip:
- Makes the timestamps 1 refresh too late, causing timing sensitive
software like mine to detect false positives -- reporting skipped frames were there weren't any. Not as bad as missing a really skipped frame, but still not great.
Agreed. I was going to perform some more experiments on other hardware to determine what the right threshold is for different hardware generations. I hope I'll get to that this week.
- Can reduce the framerate due to throttling the client, especially
on systems that are already challenged wrt. to their irq timing.
Is the vblank period very short on these kind of devices? From Felix description is sounds as if it is only 2 scanlines?
It looks like that.
Thanks for the feedback, Felix
thanks, -mario
On 12-02-22 11:20 AM, Felix Kuehling wrote:
On 12-02-21 07:49 PM, Mario Kleiner wrote:
On 02/21/2012 09:07 PM, Alex Deucher wrote:
[snip]
The fix looks ok to me. Mario any thoughts?
Reviewed-by: Alex Deucheralexdeucher@gmail.com
Hi,
the fix looks ok to me for that device, but could we make it conditional on the AMD C-50 APU and similar pieces? It is the right thing to do for that gpu, but for regular desktop gpus it is too pessimistic if it defers the pageflip timestamping and completion event for an already completed flip:
- Makes the timestamps 1 refresh too late, causing timing sensitive
software like mine to detect false positives -- reporting skipped frames were there weren't any. Not as bad as missing a really skipped frame, but still not great.
Agreed. I was going to perform some more experiments on other hardware to determine what the right threshold is for different hardware generations. I hope I'll get to that this week.
I have a final version of my patch including an explanation of the observations it's based on (attached). It's against current drm-next from git://people.freedesktop.org/~airlied/linux.
- Can reduce the framerate due to throttling the client, especially
on systems that are already challenged wrt. to their irq timing.
Is the vblank period very short on these kind of devices? From Felix description is sounds as if it is only 2 scanlines?
It looks like that.
Turns out that that's not correct. Smaller negative values of vpos never showed up in my log output because I didn't print it in case update_pending was 0. The actual vblank period is 8 scan lines on this device. Still not much compared to the ~40 I was seeing with an external monitor. Anything > -4 would result in flickering in my experiments, so only 5 scan lines worth of time are available for submitting the page flip in time for the next frame. If I miss that time window, the flip is deferred by an extra frame. In practice that seems to occur in about 25% of cases on this particular device.
Regards, Felix
Thanks for the feedback, Felix
thanks, -mario
On Feb 24, 2012, at 10:20 PM, Felix Kuehling wrote:
On 12-02-22 11:20 AM, Felix Kuehling wrote:
On 12-02-21 07:49 PM, Mario Kleiner wrote:
On 02/21/2012 09:07 PM, Alex Deucher wrote:
[snip]
The fix looks ok to me. Mario any thoughts?
Reviewed-by: Alex Deucheralexdeucher@gmail.com
Hi,
the fix looks ok to me for that device, but could we make it conditional on the AMD C-50 APU and similar pieces? It is the right thing to do for that gpu, but for regular desktop gpus it is too pessimistic if it defers the pageflip timestamping and completion event for an already completed flip:
- Makes the timestamps 1 refresh too late, causing timing sensitive
software like mine to detect false positives -- reporting skipped frames were there weren't any. Not as bad as missing a really skipped frame, but still not great.
Agreed. I was going to perform some more experiments on other hardware to determine what the right threshold is for different hardware generations. I hope I'll get to that this week.
I have a final version of my patch including an explanation of the observations it's based on (attached). It's against current drm-next from git://people.freedesktop.org/~airlied/linux.
The idea of the current logic was that it happened that the update_pending bit isn't read back as zero at the end of the page_flip programming function, immediately after clearing the graphics hardware update_lock, e.g., maybe because there is some delay between clearing that register and the double buffering taking place. But according to the specs, if update_pending is high, ie., the hw has "accepted" the request for a page flip, it will complete as long as that request still happened within the vblank. So on a return from the page_flip function with update_pending == 1 we check if we are still inside the vblank area, ie., if the hw will certainly complete the double buffering within the current vblank, because the request was made in time.
I did all my original testing of these bits with avivo hw (rv530, r600) and with one radeon hd 5850, so i'm a bit surprised if the avivo path would unconditionally need this adjustment. Could it be that the observed accuracy of update_pending depends on the rest of the hw, e.g., bus or processor speed, or bus activity? My test machines were a MacBookPro with Core2Duo 2.3 Ghz for rv530 and a rather old AMD Athlon-64 PC for r600.
I'm worried that your observed reliability of update_pending on >= AVIVO asics could be an artifact of the specific computer hw you used and that this doesn't generalize on older or different hw. If it doesn't generalize then the patch could defer a lot of perfectly good pageflips by 1 frame, screwing the timestamps and reducing framerate.
Here is what the rv630 register programming guide says about the double-buffering of the surface base address registers:
D1GRPH_SURFACE_UPDATE_PENDING: "the double buffering occurs in vertical retrace when D1GRPH_SURFACE_UPDATE_PENDING = 1 and D1GRPH_UPDATE_LOCK = 0 and V_UPDATE = 1."
D1CRTC_V_UPDATE: "Current vertical position ... 1 = within the V_UPDATE region (between end of vertical active display and start_line)"
For us that would mean the threshold for deferred flip completion would need to be whatever that mentioned "start_line" is, and for the tested avivo hw, start_line used to be == start of active scanout, so the threshold of zero made sense.
Alex: I don't know where start_line is stored. Could it be that this value changed due to hw changes or changes in the kms driver or atombios? Or would it be possible to read that value back from some register and use it as threshold?
If you look at radeon_get_crtc_scanoutpos() you can also see that the returned vpos is corrected for some offsets. Could it be that something changed there for the DCE4.1 or DCE5 display engine or whatever the C-50 APU uses? That could also explain why suddenly such a weird threshold as vpos > -4 is needed on your tablet, because the returned vpos is offset wrongly by a few scanlines.
-mario
- Can reduce the framerate due to throttling the client, especially
on systems that are already challenged wrt. to their irq timing.
Is the vblank period very short on these kind of devices? From Felix description is sounds as if it is only 2 scanlines?
It looks like that.
Turns out that that's not correct. Smaller negative values of vpos never showed up in my log output because I didn't print it in case update_pending was 0. The actual vblank period is 8 scan lines on this device. Still not much compared to the ~40 I was seeing with an external monitor. Anything > -4 would result in flickering in my experiments, so only 5 scan lines worth of time are available for submitting the page flip in time for the next frame. If I miss that time window, the flip is deferred by an extra frame. In practice that seems to occur in about 25% of cases on this particular device.
Regards, Felix
Thanks for the feedback, Felix
thanks, -mario
-- _____ Felix Kuehling \ _ | MTS Software Development Eng. /|_| | SW-Linux Base Gfx | AMD |__/ | T 905.882.2600 x8928
<0001-Fix-deferred-page-flip-detection-logic-on-Avivo-base.patch>
********************************************************************* Mario Kleiner Max Planck Institute for Biological Cybernetics Spemannstr. 38 72076 Tuebingen Germany
e-mail: mario.kleiner@tuebingen.mpg.de office: +49 (0)7071/601-1623 fax: +49 (0)7071/601-616 www: http://www.kyb.tuebingen.mpg.de/~kleinerm ********************************************************************* "For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." (Richard Feynman)
On Fri, Feb 24, 2012 at 11:38 PM, Mario Kleiner mario.kleiner@tuebingen.mpg.de wrote:
On Feb 24, 2012, at 10:20 PM, Felix Kuehling wrote:
On 12-02-22 11:20 AM, Felix Kuehling wrote:
On 12-02-21 07:49 PM, Mario Kleiner wrote:
On 02/21/2012 09:07 PM, Alex Deucher wrote:
[snip]
The fix looks ok to me. Mario any thoughts?
Reviewed-by: Alex Deucheralexdeucher@gmail.com
Hi,
the fix looks ok to me for that device, but could we make it conditional on the AMD C-50 APU and similar pieces? It is the right thing to do for that gpu, but for regular desktop gpus it is too pessimistic if it defers the pageflip timestamping and completion event for an already completed flip:
- Makes the timestamps 1 refresh too late, causing timing sensitive
software like mine to detect false positives -- reporting skipped frames were there weren't any. Not as bad as missing a really skipped frame, but still not great.
Agreed. I was going to perform some more experiments on other hardware to determine what the right threshold is for different hardware generations. I hope I'll get to that this week.
I have a final version of my patch including an explanation of the observations it's based on (attached). It's against current drm-next from git://people.freedesktop.org/~airlied/linux.
The idea of the current logic was that it happened that the update_pending bit isn't read back as zero at the end of the page_flip programming function, immediately after clearing the graphics hardware update_lock, e.g., maybe because there is some delay between clearing that register and the double buffering taking place. But according to the specs, if update_pending is high, ie., the hw has "accepted" the request for a page flip, it will complete as long as that request still happened within the vblank. So on a return from the page_flip function with update_pending == 1 we check if we are still inside the vblank area, ie., if the hw will certainly complete the double buffering within the current vblank, because the request was made in time.
I did all my original testing of these bits with avivo hw (rv530, r600) and with one radeon hd 5850, so i'm a bit surprised if the avivo path would unconditionally need this adjustment. Could it be that the observed accuracy of update_pending depends on the rest of the hw, e.g., bus or processor speed, or bus activity? My test machines were a MacBookPro with Core2Duo 2.3 Ghz for rv530 and a rather old AMD Athlon-64 PC for r600.
I'm worried that your observed reliability of update_pending on >= AVIVO asics could be an artifact of the specific computer hw you used and that this doesn't generalize on older or different hw. If it doesn't generalize then the patch could defer a lot of perfectly good pageflips by 1 frame, screwing the timestamps and reducing framerate.
Here is what the rv630 register programming guide says about the double-buffering of the surface base address registers:
D1GRPH_SURFACE_UPDATE_PENDING: "the double buffering occurs in vertical retrace when D1GRPH_SURFACE_UPDATE_PENDING = 1 and D1GRPH_UPDATE_LOCK = 0 and V_UPDATE = 1."
D1CRTC_V_UPDATE: "Current vertical position ... 1 = within the V_UPDATE region (between end of vertical active display and start_line)"
For us that would mean the threshold for deferred flip completion would need to be whatever that mentioned "start_line" is, and for the tested avivo hw, start_line used to be == start of active scanout, so the threshold of zero made sense.
Alex: I don't know where start_line is stored. Could it be that this value changed due to hw changes or changes in the kms driver or atombios? Or would it be possible to read that value back from some register and use it as threshold?
If you look at radeon_get_crtc_scanoutpos() you can also see that the returned vpos is corrected for some offsets. Could it be that something changed there for the DCE4.1 or DCE5 display engine or whatever the C-50 APU uses? That could also explain why suddenly such a weird threshold as vpos > -4 is needed on your tablet, because the returned vpos is offset wrongly by a few scanlines.
Looks like there was a change for DCE4.1/5.0.
DCE1/2/3: CRTC:D1CRTC_START_LINE_CONTROL · [R/W] · 32 bits · Access: 8/16/32 · MMReg:0x60d8 DESCRIPTION: move start_line signal earlier by 1 line in CRTC1 Field Name Bits Default Description D1CRTC_PROGRESSIVE_START_LINE_EARLY 0 0x0 move start_line signal by 1 line eariler in progressive mode
D1CRTC_INTERLACE_START_LINE_EARLY 8 0x1 move start_line signal by 1 line earlier in interlaced timing mode
DCE4.0: CRTC:CRTC_START_LINE_CONTROL · [R/W] · 32 bits · Access: 8/16/32 · [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2] GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4] GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc DESCRIPTION: move start_line signal earlier by 1 line in CRTC Field Name Bits Default Description CRTC_PROGRESSIVE_START_LINE_EARLY 0 0x0 move start_line signal by 1 line eariler in progressive mode
CRTC_INTERLACE_START_LINE_EARLY 8 0x1 move start_line signal by 1 line earlier in interlaced timing mode
DCE4.1/DCE5: CRTC:CRTC_START_LINE_CONTROL · [R/W] · 32 bits · Access: 8/16/32 · [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2] GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4] GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc DESCRIPTION: move start_line signal earlier by 1 line in CRTC Field Name Bits Default Description CRTC_PROGRESSIVE_START_LINE_EARLY 0 0x0 move start_line signal by 1 line eariler in progressive mode
CRTC_INTERLACE_START_LINE_EARLY 8 0x1 move start_line signal by 1 line earlier in interlaced timing mode
CRTC_ADVANCED_START_LINE_POSITION 19:16 0x4 Advanced Start Line position respect to not VBLANK. Default of 4 means the Advanced Start Line is 4 lines before the first non VBLANK line
Also bit 4 (CRTC_V_START_LINE) of CRTC_STATUS will tell you when whether the current line is outside (0) or inside (1) the start line region.
Alex
-mario
- Can reduce the framerate due to throttling the client, especially
on systems that are already challenged wrt. to their irq timing.
Is the vblank period very short on these kind of devices? From Felix description is sounds as if it is only 2 scanlines?
It looks like that.
Turns out that that's not correct. Smaller negative values of vpos never showed up in my log output because I didn't print it in case update_pending was 0. The actual vblank period is 8 scan lines on this device. Still not much compared to the ~40 I was seeing with an external monitor. Anything > -4 would result in flickering in my experiments, so only 5 scan lines worth of time are available for submitting the page flip in time for the next frame. If I miss that time window, the flip is deferred by an extra frame. In practice that seems to occur in about 25% of cases on this particular device.
Regards, Felix
Thanks for the feedback, Felix
thanks, -mario
-- _____ Felix Kuehling \ _ | MTS Software Development Eng. /|_| | SW-Linux Base Gfx | AMD |__/ | T 905.882.2600 x8928
<0001-Fix-deferred-page-flip-detection-logic-on-Avivo-base.patch>
Mario Kleiner Max Planck Institute for Biological Cybernetics Spemannstr. 38 72076 Tuebingen Germany
e-mail: mario.kleiner@tuebingen.mpg.de office: +49 (0)7071/601-1623 fax: +49 (0)7071/601-616 www: http://www.kyb.tuebingen.mpg.de/~kleinerm
"For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." (Richard Feynman)
On 12-02-24 11:38 PM, Mario Kleiner wrote:
On Feb 24, 2012, at 10:20 PM, Felix Kuehling wrote:
On 12-02-22 11:20 AM, Felix Kuehling wrote:
On 12-02-21 07:49 PM, Mario Kleiner wrote:
On 02/21/2012 09:07 PM, Alex Deucher wrote:
[snip]
The fix looks ok to me. Mario any thoughts?
Reviewed-by: Alex Deucheralexdeucher@gmail.com
Hi,
the fix looks ok to me for that device, but could we make it conditional on the AMD C-50 APU and similar pieces? It is the right thing to do for that gpu, but for regular desktop gpus it is too pessimistic if it defers the pageflip timestamping and completion event for an already completed flip:
- Makes the timestamps 1 refresh too late, causing timing sensitive
software like mine to detect false positives -- reporting skipped frames were there weren't any. Not as bad as missing a really skipped frame, but still not great.
Agreed. I was going to perform some more experiments on other hardware to determine what the right threshold is for different hardware generations. I hope I'll get to that this week.
I have a final version of my patch including an explanation of the observations it's based on (attached). It's against current drm-next from git://people.freedesktop.org/~airlied/linux.
The idea of the current logic was that it happened that the update_pending bit isn't read back as zero at the end of the page_flip programming function, immediately after clearing the graphics hardware update_lock, e.g., maybe because there is some delay between clearing that register and the double buffering taking place. But according to the specs, if update_pending is high, ie., the hw has "accepted" the request for a page flip, it will complete as long as that request still happened within the vblank. So on a return from the page_flip function with update_pending == 1 we check if we are still inside the vblank area, ie., if the hw will certainly complete the double buffering within the current vblank, because the request was made in time.
I did all my original testing of these bits with avivo hw (rv530, r600) and with one radeon hd 5850, so i'm a bit surprised if the avivo path would unconditionally need this adjustment. Could it be that the observed accuracy of update_pending depends on the rest of the hw, e.g., bus or processor speed, or bus activity? My test machines were a MacBookPro with Core2Duo 2.3 Ghz for rv530 and a rather old AMD Athlon-64 PC for r600.
I used a few different systems:
Brazos reference board for Brazos (E-350) Iconia Tab W500 (C-50) PhenomII X4 for most of the discrete cards Athlon64 X2 for RS690
The results I got were consistent across all those systems. The only differences I saw between different generations of Avivo-based hardware were about the exact timing of the threshold when I would start seeing flickering. On R5xx and R6xx it would start flickering at -3. On newer hardware at -4 (I didn't test on R7xx).
I'm worried that your observed reliability of update_pending on >= AVIVO asics could be an artifact of the specific computer hw you used and that this doesn't generalize on older or different hw.
Given the number of different ASICs and systems, I think that's unlikely. It's more likely that this depends on the exact mode timings. I was running most of my experiments with a 19" DFP 1280x1024 at 60Hz, connected by DVI if available, or VGA on some of the IGPs.
It's possible that different mode timings would result in a slightly different threshold.
If it doesn't generalize then the patch could defer a lot of perfectly good pageflips by 1 frame, screwing the timestamps and reducing framerate.
I understand your concern. But given my observations, the current implementation potentially produces much more annoying artefacts.
Here is what the rv630 register programming guide says about the double-buffering of the surface base address registers:
D1GRPH_SURFACE_UPDATE_PENDING: "the double buffering occurs in vertical retrace when D1GRPH_SURFACE_UPDATE_PENDING = 1 and D1GRPH_UPDATE_LOCK = 0 and V_UPDATE = 1."
As far as I can tell, D1GRPH_SURFACE_UPDATE_PENDING will only be 1, if the page flip is initiated while outside the vertical retrace. The the actual page flip will occur when V_UPDATE changes to 1, that is, when the next vertical retrace occurs. So if we read D1GRPH_SURFACE_UPDATE_PENDING after initiating a page flip, it implies that we're currently not in a vertical retrace.
D1CRTC_V_UPDATE: "Current vertical position ... 1 = within the V_UPDATE region (between end of vertical active display and start_line)"
For us that would mean the threshold for deferred flip completion would need to be whatever that mentioned "start_line" is, and for the tested avivo hw, start_line used to be == start of active scanout, so the threshold of zero made sense.
In my experiments the start_line seemed to be between -4 and -2. On no ASIC did I observe a start_line == 0.
Alex: I don't know where start_line is stored. Could it be that this value changed due to hw changes or changes in the kms driver or atombios? Or would it be possible to read that value back from some register and use it as threshold?
If you look at radeon_get_crtc_scanoutpos() you can also see that the returned vpos is corrected for some offsets. Could it be that something changed there for the DCE4.1 or DCE5 display engine or whatever the C-50 APU uses? That could also explain why suddenly such a weird threshold as vpos > -4 is needed on your tablet, because the returned vpos is offset wrongly by a few scanlines.
I believe the reason I observe flickering on my tablet is due to the short vsync period. Otherwise there is nothing special about my tablet. I was able to induce the same flickering by artificially delaying the page flip on other Avivo hardware. My proposed fix resolves this potential problem on all Avivo hardware. I am very confident that it does that without deferring page flip notifications to user space unnecessarily.
Regards, Felix
-mario
- Can reduce the framerate due to throttling the client, especially
on systems that are already challenged wrt. to their irq timing.
Is the vblank period very short on these kind of devices? From Felix description is sounds as if it is only 2 scanlines?
It looks like that.
Turns out that that's not correct. Smaller negative values of vpos never showed up in my log output because I didn't print it in case update_pending was 0. The actual vblank period is 8 scan lines on this device. Still not much compared to the ~40 I was seeing with an external monitor. Anything > -4 would result in flickering in my experiments, so only 5 scan lines worth of time are available for submitting the page flip in time for the next frame. If I miss that time window, the flip is deferred by an extra frame. In practice that seems to occur in about 25% of cases on this particular device.
Regards, Felix
Thanks for the feedback, Felix
thanks, -mario
-- _____ Felix Kuehling \ _ | MTS Software Development Eng. /|_| | SW-Linux Base Gfx | AMD |__/ | T 905.882.2600 x8928
<0001-Fix-deferred-page-flip-detection-logic-on-Avivo-base.patch>
Mario Kleiner Max Planck Institute for Biological Cybernetics Spemannstr. 38 72076 Tuebingen Germany
e-mail: mario.kleiner@tuebingen.mpg.de office: +49 (0)7071/601-1623 fax: +49 (0)7071/601-616 www: http://www.kyb.tuebingen.mpg.de/~kleinerm
"For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." (Richard Feynman)
On Feb 27, 2012, at 4:47 PM, Felix Kuehling wrote:
On 12-02-24 11:38 PM, Mario Kleiner wrote:
On Feb 24, 2012, at 10:20 PM, Felix Kuehling wrote:
On 12-02-22 11:20 AM, Felix Kuehling wrote:
On 12-02-21 07:49 PM, Mario Kleiner wrote:
On 02/21/2012 09:07 PM, Alex Deucher wrote:
[snip]
The fix looks ok to me. Mario any thoughts?
Reviewed-by: Alex Deucheralexdeucher@gmail.com
Hi,
the fix looks ok to me for that device, but could we make it conditional on the AMD C-50 APU and similar pieces? It is the right thing to do for that gpu, but for regular desktop gpus it is too pessimistic if it defers the pageflip timestamping and completion event for an already completed flip:
- Makes the timestamps 1 refresh too late, causing timing
sensitive software like mine to detect false positives -- reporting skipped frames were there weren't any. Not as bad as missing a really skipped frame, but still not great.
Agreed. I was going to perform some more experiments on other hardware to determine what the right threshold is for different hardware generations. I hope I'll get to that this week.
I have a final version of my patch including an explanation of the observations it's based on (attached). It's against current drm-next from git://people.freedesktop.org/~airlied/linux.
The idea of the current logic was that it happened that the update_pending bit isn't read back as zero at the end of the page_flip programming function, immediately after clearing the graphics hardware update_lock, e.g., maybe because there is some delay between clearing that register and the double buffering taking place. But according to the specs, if update_pending is high, ie., the hw has "accepted" the request for a page flip, it will complete as long as that request still happened within the vblank. So on a return from the page_flip function with update_pending == 1 we check if we are still inside the vblank area, ie., if the hw will certainly complete the double buffering within the current vblank, because the request was made in time.
I did all my original testing of these bits with avivo hw (rv530, r600) and with one radeon hd 5850, so i'm a bit surprised if the avivo path would unconditionally need this adjustment. Could it be that the observed accuracy of update_pending depends on the rest of the hw, e.g., bus or processor speed, or bus activity? My test machines were a MacBookPro with Core2Duo 2.3 Ghz for rv530 and a rather old AMD Athlon-64 PC for r600.
I used a few different systems:
Brazos reference board for Brazos (E-350) Iconia Tab W500 (C-50) PhenomII X4 for most of the discrete cards Athlon64 X2 for RS690
The results I got were consistent across all those systems. The only differences I saw between different generations of Avivo-based hardware were about the exact timing of the threshold when I would start seeing flickering. On R5xx and R6xx it would start flickering at -3. On newer hardware at -4 (I didn't test on R7xx).
I'm worried that your observed reliability of update_pending on >= AVIVO asics could be an artifact of the specific computer hw you used and that this doesn't generalize on older or different hw.
Given the number of different ASICs and systems, I think that's unlikely. It's more likely that this depends on the exact mode timings. I was running most of my experiments with a 19" DFP 1280x1024 at 60Hz, connected by DVI if available, or VGA on some of the IGPs.
It's possible that different mode timings would result in a slightly different threshold.
If it doesn't generalize then the patch could defer a lot of perfectly good pageflips by 1 frame, screwing the timestamps and reducing framerate.
I understand your concern. But given my observations, the current implementation potentially produces much more annoying artefacts.
Here is what the rv630 register programming guide says about the double-buffering of the surface base address registers:
D1GRPH_SURFACE_UPDATE_PENDING: "the double buffering occurs in vertical retrace when D1GRPH_SURFACE_UPDATE_PENDING = 1 and D1GRPH_UPDATE_LOCK = 0 and V_UPDATE = 1."
As far as I can tell, D1GRPH_SURFACE_UPDATE_PENDING will only be 1, if the page flip is initiated while outside the vertical retrace. The the actual page flip will occur when V_UPDATE changes to 1, that is, when the next vertical retrace occurs. So if we read D1GRPH_SURFACE_UPDATE_PENDING after initiating a page flip, it implies that we're currently not in a vertical retrace.
D1CRTC_V_UPDATE: "Current vertical position ... 1 = within the V_UPDATE region (between end of vertical active display and start_line)"
For us that would mean the threshold for deferred flip completion would need to be whatever that mentioned "start_line" is, and for the tested avivo hw, start_line used to be == start of active scanout, so the threshold of zero made sense.
In my experiments the start_line seemed to be between -4 and -2. On no ASIC did I observe a start_line == 0.
Alex: I don't know where start_line is stored. Could it be that this value changed due to hw changes or changes in the kms driver or atombios? Or would it be possible to read that value back from some register and use it as threshold?
If you look at radeon_get_crtc_scanoutpos() you can also see that the returned vpos is corrected for some offsets. Could it be that something changed there for the DCE4.1 or DCE5 display engine or whatever the C-50 APU uses? That could also explain why suddenly such a weird threshold as vpos > -4 is needed on your tablet, because the returned vpos is offset wrongly by a few scanlines.
I believe the reason I observe flickering on my tablet is due to the short vsync period. Otherwise there is nothing special about my tablet. I was able to induce the same flickering by artificially delaying the page flip on other Avivo hardware. My proposed fix resolves this potential problem on all Avivo hardware. I am very confident that it does that without deferring page flip notifications to user space unnecessarily.
Ok, you had a larger test set of machines and more specific test cases for this problem, i'm convinced. Thanks for all the testing. You have my...
Reviewed-by: Mario Kleiner mario.kleiner@tuebingen.mpg.de
Wrt. to that new CRTC_ADVANCED_START_LINE_POSITION register on >= DCE4.1 that Alex found. Seems its default setting of 0x4 is the reason for the flicker threshold at vpos -4. Maybe it would by worth trying for the W500 tablet or devices/modes with similar short vblank intervals to reprogram that register with 0? That would give more headroom and maybe reduce your 25% number of deferred flips? Or is this likely to cause some hw trouble or artifacts because some line buffer can't fetch pixel data fast enough?
best, -mario
On 12-02-28 09:40 AM, Mario Kleiner wrote:
Ok, you had a larger test set of machines and more specific test cases for this problem, i'm convinced. Thanks for all the testing. You have my...
Reviewed-by: Mario Kleiner mario.kleiner@tuebingen.mpg.de
Thanks. Is there anything else I need to do to get this into drm-next?
Wrt. to that new CRTC_ADVANCED_START_LINE_POSITION register on >= DCE4.1 that Alex found. Seems its default setting of 0x4 is the reason for the flicker threshold at vpos -4. Maybe it would by worth trying for the W500 tablet or devices/modes with similar short vblank intervals to reprogram that register with 0? That would give more headroom and maybe reduce your 25% number of deferred flips? Or is this likely to cause some hw trouble or artifacts because some line buffer can't fetch pixel data fast enough?
I'll ask our display team for advice.
Regards, Felix
best, -mario
On Tue, Feb 28, 2012 at 3:06 PM, Felix Kuehling felix.kuehling@amd.com wrote:
On 12-02-28 09:40 AM, Mario Kleiner wrote:
Ok, you had a larger test set of machines and more specific test cases for this problem, i'm convinced. Thanks for all the testing. You have my...
Reviewed-by: Mario Kleiner mario.kleiner@tuebingen.mpg.de
Thanks. Is there anything else I need to do to get this into drm-next?
Nope, its in there now.
Dave.
Am Mittwoch, den 29.02.2012, 10:36 +0000 schrieb Dave Airlie:
On Tue, Feb 28, 2012 at 3:06 PM, Felix Kuehling felix.kuehling@amd.com wrote:
On 12-02-28 09:40 AM, Mario Kleiner wrote:
Ok, you had a larger test set of machines and more specific test cases for this problem, i'm convinced. Thanks for all the testing. You have my...
Reviewed-by: Mario Kleiner mario.kleiner@tuebingen.mpg.de
Thanks. Is there anything else I need to do to get this into drm-next?
Nope, its in there now.
Great. The URL is [1].
Thanks,
Paul
[1] http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-core-next&id=81...
dri-devel@lists.freedesktop.org