Hi All,
I have an application I'm working on where I'm using OpenGL ES / EGL and dri/drm/kms. The main loop of my application looks like the code below. When running htop, I see that the number of minor faults (memory) is increasing over time at a rate of about 500 per second, due to the code below. Is this normal and something to worry about, and is there a way to get rid of the minor faults? I'm on the Rockchip RK3288 platform. The faults do not come from my OpenGL ES code.
while (true) {
    struct gbm_bo *next_bo;
    int waiting_for_flip = 1;

    /* fds, evctx, g_bo, g_fb and ret are set up outside this loop */

    // do OpenGLES stuff ...

    eglSwapBuffers(gl.display, gl.surface);
    next_bo = gbm_surface_lock_front_buffer(gbm.surface);
    g_fb = drm_fb_get_from_bo(next_bo);

    ret = drmModePageFlip(drm.fd, drm.crtc_id, g_fb->fb_id,
                          DRM_MODE_PAGE_FLIP_EVENT, &waiting_for_flip);
    if (ret) {
        printf("failed to queue page flip: %s\n", strerror(errno));
        return -1;
    }

    while (waiting_for_flip) {
        ret = select(drm.fd + 1, &fds, NULL, NULL, NULL);
        if (ret < 0) {
            printf("select err: %s\n", strerror(errno));
            return ret;
        } else if (ret == 0) {
            printf("select timeout!\n");
            return -1;
        } else if (FD_ISSET(0, &fds)) {
            printf("user interrupted!\n");
            break;
        }
        drmHandleEvent(drm.fd, &evctx);
    }

    gbm_surface_release_buffer(gbm.surface, g_bo);
    g_bo = next_bo;
}
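For what it's worth, here is a minimal getrusage() sketch of how the fault rate can be measured from inside the loop itself (the helper name report_minflt is just for illustration); it should show the same per-process counter htop reads:

#include <stdio.h>
#include <sys/resource.h>
#include <time.h>

/* Call once per frame; prints the minor-fault rate roughly once per second. */
static void report_minflt(void)
{
    static long last_minflt = -1;
    static time_t last_time;
    struct rusage ru;
    time_t now = time(NULL);

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return;

    if (last_minflt < 0) {
        last_minflt = ru.ru_minflt;
        last_time = now;
        return;
    }

    if (now != last_time) {
        printf("minor faults: %ld/s\n",
               (ru.ru_minflt - last_minflt) / (long)(now - last_time));
        last_minflt = ru.ru_minflt;
        last_time = now;
    }
}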
Thanks! Bert
Hi Bert,
On Tue, Oct 26, 2021 at 05:18:47PM -0700, Bert Schiettecatte wrote:
> I have an application I'm working on where I'm using OpenGL ES / EGL and dri/drm/kms. The main loop of my application looks like the code below. When running htop, I see that the number of minor faults (memory) is increasing over time at a rate of about 500 per second, due to the code below. Is this normal and something to worry about, and is there a way to get rid of the minor faults? I'm on the Rockchip RK3288 platform. The faults do not come from my OpenGL ES code.
Coincidentally, I've been looking at Panfrost on RK3288 this week as well! I'm testing it with a project that has been using the binary blob driver for several years and unfortunately Panfrost seems to use ~15% more CPU.
Like you, I see a huge number of minor faults (~500/second compared with ~3/second on libmali). It seems that Panfrost is mmap'ing and munmap'ing buffers on every frame which doesn't happen when the same application is using the binary driver.
I've tested Linux 5.10.76 and 5.15-rc7 along with Mesa 20.3.5, 21.1.8 and main (as of 704340f0f67) and there's no real difference in performance across any of those.
Panfrost experts, is there a missed opportunity for optimisation here? Or is there something applications should be doing differently to avoid repeatedly mapping & unmapping the same buffers?
Thanks, John
Hi John
> Coincidentally, I've been looking at Panfrost on RK3288 this week as well! I'm testing it with a project that has been using the binary blob driver for several years and unfortunately Panfrost seems to use ~15% more CPU. Like you, I see a huge number of minor faults (~500/second compared with ~3/second on libmali). It seems that Panfrost is mmap'ing and munmap'ing buffers on every frame, which doesn't happen when the same application is using the binary driver.
Thanks for confirming you are seeing the same issue.
> Panfrost experts, is there a missed opportunity for optimisation here? Or is there something applications should be doing differently to avoid repeatedly mapping & unmapping the same buffers?
Panfrost team - any update on this?
Thanks, Bert
On 01/11/2021 05:20, Bert Schiettecatte wrote:
> Hi John
>> Coincidentally, I've been looking at Panfrost on RK3288 this week as well! I'm testing it with a project that has been using the binary blob driver for several years and unfortunately Panfrost seems to use ~15% more CPU. Like you, I see a huge number of minor faults (~500/second compared with ~3/second on libmali). It seems that Panfrost is mmap'ing and munmap'ing buffers on every frame, which doesn't happen when the same application is using the binary driver.
> Thanks for confirming you are seeing the same issue.
>> Panfrost experts, is there a missed opportunity for optimisation here? Or is there something applications should be doing differently to avoid repeatedly mapping & unmapping the same buffers?
> Panfrost team - any update on this?
I was hoping Alyssa would comment since she's much more familiar with Mesa than I am!
On the first point, that libmali doesn't perform mmap()s very often: I'll just note that this was a specific design goal, and for example the kbase kernel driver provides ioctl()s to do CPU cache maintenance so that this works on Arm platforms (i.e. it's not a portable solution).
So short answer: yes there is room for optimisation here.
However things get tricky when fitting into a portable framework. The easiest way of ensuring cache coherency is to ensure there is a clear owner - so the usual approach is mmap(), read/write some data on the CPU, munmap(), GPU accesses data, repeat. The DMA framework in the kernel will then ensure that any cache maintenance/bounce buffering or other quirks are dealt with.
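Spelled out as code, that portable per-frame pattern looks roughly like this (only a sketch; assume map_offset and size for the buffer were obtained from the driver beforehand):

#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Portable pattern: map, write on the CPU, unmap, then let the GPU access
 * the buffer. The kernel's DMA framework handles any cache maintenance at
 * the ownership transfer. */
static int upload_frame(int drm_fd, off_t map_offset, size_t size,
                        const void *src)
{
    void *cpu = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     drm_fd, map_offset);
    if (cpu == MAP_FAILED)
        return -1;

    memcpy(cpu, src, size);   /* CPU writes while the CPU owns the buffer */
    munmap(cpu, size);        /* hand ownership back before the GPU uses it */
    return 0;
}

Because the mapping is thrown away every frame, the next frame's writes have to fault all the pages back in, which would explain a steady per-frame minor-fault rate like the one reported above.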
Having said that, we know that existing platforms don't require these 'quirks' (because libmali works on them), so in theory it should be possible for Mesa to avoid the mmap()/munmap() dance in many cases (where the memory is coherent with the GPU[1]). But this is where my knowledge of Mesa is lacking, as I've no idea how to go about that.
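Purely for illustration (this is not how Mesa's BO handling is actually structured), avoiding the dance essentially means caching the mapping for the lifetime of the buffer, something like:

#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical BO wrapper: map lazily on first CPU access and keep the
 * mapping until the buffer is destroyed, instead of mapping per frame. */
struct cached_bo {
    int drm_fd;
    off_t map_offset;
    size_t size;
    void *cpu;              /* NULL until the first map */
};

static void *cached_bo_map(struct cached_bo *bo)
{
    if (!bo->cpu) {
        void *p = mmap(NULL, bo->size, PROT_READ | PROT_WRITE, MAP_SHARED,
                       bo->drm_fd, bo->map_offset);
        if (p == MAP_FAILED)
            return NULL;
        bo->cpu = p;
    }
    return bo->cpu;         /* reused every frame; munmap() only on destroy */
}

The catch is exactly the coherency question above: keeping the mapping alive is only safe where no explicit cache maintenance is needed between CPU and GPU accesses.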
Regards, Steve
[1] I think this should actually be true all the time with Panfrost as the buffer is mapped write-combining on the CPU if the GPU isn't fully coherent. But I haven't double checked this.