I am working on a client/server program, where the server creates (and has access to) a framebuffer and then needs to share this framebuffer with a client program so that the client can draw into the framebuffer directly (i.e., no memcpy).

I am trying to figure out what the "cleanest" way to do this is, such that I can support Intel's proprietary driver, the open source AMD and NVidia drivers, and the VMware driver (I have no need for the proprietary AMD/NVidia drivers right now). From what I can tell, GEM is one way to do this. The problem is VMware doesn't support GEM.

I tried (knowing it would not work) using KMS to create the framebuffer and then sending the information needed to mmap it to the client. This, of course, failed because the framebuffer is marked non-sharable in the kernel.

To be clear, I am fine with having to manually write ioctls for each driver, if that's what it takes. But at this point, I am at a loss on the best method to share scanout buffers (or at least in a way that doesn't make someone cringe when they see my code).
Thanks, - Rian
On Fri, Jan 17, 2014 at 6:43 AM, Rian Quinn rianquinn@gmail.com wrote:
> I am working on a client/server program, where the server creates (and has access to) a framebuffer and then needs to share this framebuffer with a client program so that the client can draw into the framebuffer directly (i.e., no memcpy). [...] From what I can tell, GEM is one way to do this. The problem is VMware doesn't support GEM. [...]
Dmabuf (or just plain old EGL/GLX, which uses DRI2 under the hood) would probably be what I'd suggest, *except* you mention mmap. If you are doing software rendering, I guess you probably just want to suck it up and do XShmPutImage.

From what I understand, any sort of mmap access to vmwgfx buffers is tricky, because they end up being backed by normal GL textures on the host OS side (IIUC). So the single-copy upload path in XShmPutImage might be close to the ideal path for sw-rendered content.
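For reference, the single-copy XShm path looks roughly like this. A minimal sketch, with the XShmQueryExtension check and all error handling elided:

/* Software rendering into a MIT-SHM backed XImage; the only copy is
 * the in-kernel one done by XShmPutImage. */
#include <X11/Xlib.h>
#include <X11/extensions/XShm.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static XImage *create_shm_image(Display *dpy, int w, int h,
                                XShmSegmentInfo *shminfo)
{
    int scr = DefaultScreen(dpy);
    XImage *img = XShmCreateImage(dpy, DefaultVisual(dpy, scr),
                                  DefaultDepth(dpy, scr), ZPixmap,
                                  NULL, shminfo, w, h);

    /* Back the image with a SysV shm segment the client renders into. */
    shminfo->shmid = shmget(IPC_PRIVATE,
                            img->bytes_per_line * img->height,
                            IPC_CREAT | 0600);
    shminfo->shmaddr = img->data = shmat(shminfo->shmid, NULL, 0);
    shminfo->readOnly = False;
    XShmAttach(dpy, shminfo);
    return img;
}

/* Per frame: render into img->data, then one in-kernel copy:
 *   XShmPutImage(dpy, win, gc, img, 0, 0, 0, 0, w, h, False);
 */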
> To be clear, I am fine with having to manually write ioctls for each driver, if that's what it takes. But at this point, I am at a loss on the best method to share scanout buffers [...]
Some sort of prepare/finish-access ioctls for dmabuf to bracket mmap access are, I think, what vmwgfx is missing in order to implement dmabuf mmap. But no one so far has needed them badly enough to come up with something and send patches (a rough sketch of what I mean is below).

IIUC, for vmwgfx there would still be a copy back to the original texture on the host on finish-access, so it might not amount to anything much different from XShmPutImage. Probably better to ask some vmwgfx folks to clarify, since the virtual driver has some unique constraints which I may not be adequately representing.
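To illustrate, a purely hypothetical sketch: the two ioctl names and numbers below are made up; nothing like them exists in the dma-buf API today.

/* HYPOTHETICAL: these ioctls do not exist; they only illustrate the
 * bracketing a driver like vmwgfx would need around CPU access. */
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#define DMA_BUF_IOCTL_PREPARE_ACCESS _IO('Z', 0) /* made-up */
#define DMA_BUF_IOCTL_FINISH_ACCESS  _IO('Z', 1) /* made-up */

static void sw_render_frame(int dmabuf_fd, size_t size)
{
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     dmabuf_fd, 0);

    /* Driver makes the backing storage CPU-visible; e.g. vmwgfx could
     * read back the host-side GL texture here. */
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_PREPARE_ACCESS);

    /* ... CPU writes into ptr ... */

    /* Driver flushes caches / uploads back to the host texture. */
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_FINISH_ACCESS);

    munmap(ptr, size);
}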
BR, -R
Thanks for the reply.

Actually, in my case, when I say client/server, I mean a replacement for X, so XShmPutImage won't work. What we are trying to do is actually similar to Wayland: we need to provide each client with a scanout buffer for direct rendering while, at the same time, giving the server a means to read/write all of the scanout buffers.

We have looked into EGL (EGL_DRM_BUFFER_USE_SCANOUT_MESA), but I have yet to code this up to see if it will work for us (I will likely work on that this week).
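Roughly what we plan to try; a sketch only, assuming the EGL_MESA_drm_image extension is exposed, with error checking elided:

#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Allocate a shareable scanout buffer through EGL_MESA_drm_image. */
static EGLImageKHR create_scanout_image(EGLDisplay dpy, EGLint w, EGLint h,
                                        EGLint *name, EGLint *stride)
{
    PFNEGLCREATEDRMIMAGEMESAPROC create_img =
        (PFNEGLCREATEDRMIMAGEMESAPROC)
            eglGetProcAddress("eglCreateDRMImageMESA");
    PFNEGLEXPORTDRMIMAGEMESAPROC export_img =
        (PFNEGLEXPORTDRMIMAGEMESAPROC)
            eglGetProcAddress("eglExportDRMImageMESA");

    const EGLint attribs[] = {
        EGL_WIDTH,                  w,
        EGL_HEIGHT,                 h,
        EGL_DRM_BUFFER_FORMAT_MESA, EGL_DRM_BUFFER_FORMAT_ARGB32_MESA,
        EGL_DRM_BUFFER_USE_MESA,    EGL_DRM_BUFFER_USE_SCANOUT_MESA |
                                    EGL_DRM_BUFFER_USE_SHARE_MESA,
        EGL_NONE
    };
    EGLImageKHR img = create_img(dpy, attribs);

    /* Export a flink name the other process can open on the same device. */
    EGLint handle;
    export_img(dpy, img, name, &handle, stride);
    return img;
}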
We started with an example we found that was coded wrong in many ways. The biggest issue, though, was that the buffer allocated using libkms could not be shared. We also looked into drmAddBuf / drmMapBuf; from what I could tell, nobody is using them. GEM seems to be the option of choice, but vmwgfx doesn't support that yet.
Thanks, - Rian
On Jan 20, 2014, at 8:10 AM, Rob Clark robdclark@gmail.com wrote:
> Dmabuf (or just plain old EGL/GLX, which uses DRI2 under the hood) would probably be what I'd suggest, *except* you mention mmap. If you are doing software rendering, I guess you probably just want to suck it up and do XShmPutImage. [...]
On 01/20/2014 02:10 PM, Rob Clark wrote:
> Dmabuf (or just plain old EGL/GLX, which uses DRI2 under the hood) would probably be what I'd suggest, *except* you mention mmap. [...] Some sort of prepare/finish-access ioctls for dmabuf to bracket mmap access are, I think, what vmwgfx is missing in order to implement dmabuf mmap. But no one so far has needed them badly enough to come up with something and send patches. [...]
Rian, for sharing accelerated buffers, they'd best be created with Mesa's GBM and shared using DRM PRIME. Those interfaces are generic, and AFAICT Ubuntu's Mir works exactly this way. The problem is that the client would need to link against Mesa, and could use GL / GLES to transfer software contents to the buffer.
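A rough sketch of the server side, error handling elided (the returned fd would then travel to the client over a unix socket):

#include <stdint.h>
#include <gbm.h>
#include <xf86drm.h>

/* Create a GPU-renderable scanout buffer and export it as a PRIME fd. */
static int export_scanout_bo(int drm_fd, uint32_t w, uint32_t h,
                             struct gbm_bo **bo_out)
{
    struct gbm_device *gbm = gbm_create_device(drm_fd);
    struct gbm_bo *bo = gbm_bo_create(gbm, w, h, GBM_FORMAT_ARGB8888,
                                      GBM_BO_USE_SCANOUT |
                                      GBM_BO_USE_RENDERING);

    /* PRIME turns the driver-private handle into a dma-buf fd, which is
     * the generic cross-process (and cross-driver) currency. */
    int prime_fd = -1;
    drmPrimeHandleToFD(drm_fd, gbm_bo_get_handle(bo).u32,
                       DRM_CLOEXEC, &prime_fd);
    *bo_out = bo;
    return prime_fd;
}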
For pure software contents, the server would share a generic shared memory buffer, together with a damage protocol, and composite / copy onto the framebuffer in the server.
In principle, as Rob says, a dma-buf shared using DRM PRIME has an mmap() method, but none of the "big" drivers (Intel, Nouveau, and Radeon) implements it, and for vmwgfx an implementation would be extremely inefficient. Other drivers may also have issues with write-combining and tiling of the mmap'ed framebuffer.
If both the server and the client were 100% software, one could create and share "dumb" KMS buffers using DRM PRIME. If there's something in the vmwgfx driver that blocks sharing in this way, we could ease that restriction. But these buffers cannot be rendered into by the GPU.
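A sketch of that path, again with error handling elided; whether the client can then mmap the resulting dma-buf fd is driver-dependent, as noted above:

#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Create a CPU-mappable "dumb" buffer and export it as a PRIME fd. */
static int export_dumb_fb(int drm_fd, uint32_t w, uint32_t h, void **map)
{
    struct drm_mode_create_dumb creq;
    memset(&creq, 0, sizeof(creq));
    creq.width  = w;
    creq.height = h;
    creq.bpp    = 32;                       /* ARGB8888 */
    drmIoctl(drm_fd, DRM_IOCTL_MODE_CREATE_DUMB, &creq);

    /* CPU mapping for the software renderer. */
    struct drm_mode_map_dumb mreq;
    memset(&mreq, 0, sizeof(mreq));
    mreq.handle = creq.handle;
    drmIoctl(drm_fd, DRM_IOCTL_MODE_MAP_DUMB, &mreq);
    *map = mmap(NULL, creq.size, PROT_READ | PROT_WRITE, MAP_SHARED,
                drm_fd, mreq.offset);

    int prime_fd = -1;
    drmPrimeHandleToFD(drm_fd, creq.handle, DRM_CLOEXEC, &prime_fd);
    return prime_fd;
}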
GEM is, BTW, purely driver-private.
As you can see, the big restriction here is that there is no simple, generic way to mmap() accelerated shared buffers from a lean client. This is intentional. For vmwgfx it's because of coherency issues that would make such an implementation inefficient; for other drivers I can imagine there are tiling and caching issues.
/Thomas
Yeah, we looked into GBM. We already link against Mesa, but I was also concerned about having to use GL to render into the buffers, as the format of the buffer is (correct me if I am wrong) specific to the graphics card (i.e., it's not the simple ARGB format I need).

Could you point me to some docs or headers for DRM PRIME? I think that using simple "dumb" KMS buffers should work fine.

In my use case, I actually have to have a memcpy because there will only be one scanout buffer managed by the server. Each client needs to render directly (ARGB) to a framebuffer, and then the server will memcpy the contents it wants to the scanout buffer. In the past, we used a CPU memcpy, but we would like to use a DMA transfer going forward. Our plan was to use Mesa/GL to do a BLT_SRC_COPY. Will the "dumb" buffer support this?

I guess this brings up a performance question too. Is it better to render into system memory and then DMA from that to video memory, or is it better to render into video memory and DMA between video memory?
- Rian
On Jan 20, 2014, at 8:41 AM, Thomas Hellstrom thomas@shipmail.org wrote:
> Rian, for sharing accelerated buffers, they'd best be created with Mesa's GBM and shared using DRM PRIME. [...] If both the server and the client were 100% software, one could create and share "dumb" KMS buffers using DRM PRIME. [...] But these buffers cannot be rendered into by the GPU.
On 01/20/2014 04:21 PM, Rian Quinn wrote:
> Yeah, we looked into GBM. We already link against Mesa, but I was also concerned about having to use GL to render into the buffers, as the format of the buffer is (correct me if I am wrong) specific to the graphics card (i.e., it's not the simple ARGB format I need).
I think for sharing buffers that can be rendered into by the GPU, GBM + EGL is the API of choice. I'm not aware of any docs, but Jesse Barnes has made a small writeup:
http://virtuousgeek.org/blog/index.php/jbarnes
GBM has been designed to do just this, and to hide the driver differences (GEM, TTM objects) from the user.
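On the client side, one generic way to turn a received PRIME fd back into something GL can render to is the EGL_EXT_image_dma_buf_import extension, where the driver exposes it. A minimal sketch, error checks elided:

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>

/* Wrap a PRIME (dma-buf) fd in an EGLImage; bind it to a texture or
 * renderbuffer afterwards with glEGLImageTargetTexture2DOES(). */
static EGLImageKHR import_prime_fd(EGLDisplay dpy, int fd,
                                   EGLint w, EGLint h, EGLint stride)
{
    PFNEGLCREATEIMAGEKHRPROC create_image =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");

    const EGLint attribs[] = {
        EGL_WIDTH,                     w,
        EGL_HEIGHT,                    h,
        EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_ARGB8888,
        EGL_DMA_BUF_PLANE0_FD_EXT,     fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT,  stride,
        EGL_NONE
    };
    return create_image(dpy, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT,
                        NULL, attribs);
}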
It sounds very much like you're after a model identical to Ubuntu's Mir, where the server creates the shared buffers (or Wayland's DRM compositor, for that matter, if you want the clients to create the framebuffers and then share them with the server). At least that's my understanding :). Perhaps a good starting point would be to look at one (or both) of these compositors' low-level DRM code?
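The transport between server and client is the same in all of these schemes: the buffer fd is passed over a unix socket as ancillary data. A sketch, assuming a connected AF_UNIX socket:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one fd (PRIME or plain shm) to the peer process. */
static int send_fd(int sock, int fd)
{
    char dummy = 'F';
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    /* The kernel duplicates the fd into the receiving process. */
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0);
}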
> Could you point me to some docs or headers for DRM PRIME? I think that using simple "dumb" KMS buffers should work fine.

> In my use case, I actually have to have a memcpy because there will only be one scanout buffer managed by the server. [...] Our plan was to use Mesa/GL to do a BLT_SRC_COPY. Will the "dumb" buffer support this?
No. You can never use accelerated rendering to or from dumb buffers. Also reading from dumb buffers with the CPU may be painfully slow on some architectures where they are put in VRAM or write-combined memory.
> I guess this brings up a performance question too. Is it better to render into system memory and then DMA from that to video memory, or is it better to render into video memory and DMA between video memory?
It's highly GPU-dependent. Many older GPUs do not support rendering into system memory.
/Thomas