On Wed, 4 Sep 2019 at 19:17, Daniel Vetter daniel@ffwll.ch wrote:
On Wed, Sep 4, 2019 at 10:35 AM Feng Tang feng.tang@intel.com wrote:
Hi Daniel,
On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann tzimmermann@suse.de wrote:
Hi
On 04.09.19 at 08:27, Feng Tang wrote:
Thank you for testing. But don't get too excited, because the patch simulates a bug that was present in the original mgag200 code. A significant number of frames are simply skipped. That is apparently the reason why it's faster.
Thanks for the detailed info, so the original code skips time-consuming work inside atomic context on purpose. Is there any room to optimise it? If two scheduled update workers are handled at almost the same time, can one be skipped?
To my knowledge, there's only one instance of the worker. Re-scheduling the worker before a previous instance has started will not create a second instance. The worker's instance will complete all pending updates. So in some way, skipping workers already happens.
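The coalescing behaviour described above can be illustrated with a toy model. This is only a sketch, not the DRM or mgag200 code: the class `CoalescingWorker`, its fields, and the rectangle tuples are all hypothetical names made up for illustration. It assumes a single work item that can be queued at most once, and a worker body that flushes everything accumulated since its last run.

```python
class CoalescingWorker:
    """Toy model of a single work item that can be pending at most once.

    All names here are hypothetical; this is not a kernel API.
    """

    def __init__(self):
        self.pending = False  # True while the work item is queued but not yet started
        self.dirty = []       # damage rectangles accumulated since the last run
        self.runs = 0         # number of times the worker body actually executed

    def schedule(self, rect):
        """Record damage and queue the worker; return False if it was already queued."""
        self.dirty.append(rect)
        if self.pending:
            return False      # no second instance is created
        self.pending = True
        return True

    def run(self):
        """Worker body: clear the pending flag, then flush all accumulated damage."""
        if not self.pending:
            return []
        self.pending = False
        flushed, self.dirty = self.dirty, []
        self.runs += 1
        return flushed


worker = CoalescingWorker()
first = worker.schedule((0, 0, 8, 16))   # queues the worker
second = worker.schedule((8, 0, 8, 16))  # already queued, only adds damage
flushed = worker.run()                   # one run handles both updates
```

In this model the second `schedule()` call does not queue anything new, yet its damage is still flushed by the single run, which is the sense in which "skipping workers already happens".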
So I think that the most frequent fbcon update from atomic context is the blinking cursor. If you disable that one, you should be back to the old performance level, I think, since just writing to dmesg is done from process context, so that shouldn't change.
Hmm, then the old driver should also do most of its updates in non-atomic context?
One other thing is, I profiled that updating a 3 MB shadow buffer needs 20 ms, which translates to 150 MB/s of bandwidth. Could it be related to the cache setting of the DRM shadow buffer? Say, the original code used a cacheable buffer?
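For reference, the arithmetic behind the 150 MB/s figure can be checked with a short calculation. This is only a sketch using the two numbers quoted above (3 MB per update, 20 ms per update); the function names are made up for illustration.

```python
def bandwidth_mb_per_s(size_mb, duration_ms):
    """Effective copy bandwidth in MB/s for one buffer update."""
    return size_mb * 1000 / duration_ms


def max_updates_per_s(duration_ms):
    """Upper bound on full-buffer updates per second at that copy time."""
    return 1000 / duration_ms


bandwidth = bandwidth_mb_per_s(3, 20)  # 150.0 MB/s
updates = max_updates_per_s(20)        # 50.0 full updates per second
```

So at the measured rate, copying the whole shadow buffer caps the console at about 50 full updates per second.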
Hm, that would indicate the write-combining got broken somewhere. This should definitely be faster. Also we shouldn't transfer the whole thing, except when scrolling ...
First rule of fbcon usage, you are always effectively scrolling.
Also, these devices might be on a PCIe x1 piece of wet string; not sure if the numbers reflect that.
Dave.