Hi Dave,
It is a little urgent, so I am writing this right now. As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago. While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init(). ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly. If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access. The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c. This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
_______________________________________________ . . . kernel: [ 34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create. kernel: [ 34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement. kernel: [ 34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement. kernel: [ 34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000 kernel: [ 34.310742] #PF: supervisor instruction fetch in kernel mode kernel: [ 34.310745] #PF: error_code(0x0010) - not-present page . . . kernel: [ 34.310807] Call Trace: kernel: [ 34.310827] ttm_tt_create+0x5f/0xa0 [ttm] kernel: [ 34.310839] ttm_bo_validate+0xb8/0x140 [ttm] kernel: [ 34.310886] ? drm_vma_offset_add+0x56/0x70 [drm] kernel: [ 34.310897] ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome] . . . _______________________________________________
The erroneous call to "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()). Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code. It appears that Linux 5.10's drm-next code broke the code.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
On Mon, 19 Oct 2020 at 05:15, Kevin Brace kevinbrace@gmx.com wrote:
Hi Dave,
It is a little urgent, so I am writing this right now. As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago. While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init(). ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly. If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access. The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c. This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
cc'ing Christian,
I can't remember if we did this deliberately or if just worked by accident previously.
Either way, you should probably need a ttm_tt_create going forward.
Dave.
. . . kernel: [ 34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create. kernel: [ 34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement. kernel: [ 34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement. kernel: [ 34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000 kernel: [ 34.310742] #PF: supervisor instruction fetch in kernel mode kernel: [ 34.310745] #PF: error_code(0x0010) - not-present page . . . kernel: [ 34.310807] Call Trace: kernel: [ 34.310827] ttm_tt_create+0x5f/0xa0 [ttm] kernel: [ 34.310839] ttm_bo_validate+0xb8/0x140 [ttm] kernel: [ 34.310886] ? drm_vma_offset_add+0x56/0x70 [drm] kernel: [ 34.310897] ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome] . . . _______________________________________________
The erroneous call to "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()). Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code. It appears that Linux 5.10's drm-next code broke the code.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Hi Dave,
Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM does not crash for "ttm_tt_create" member being null. It is still not able to boot X Server due to some other TTM related memory allocation issue it is suffering from. I think making huge changes to TTM during this development cycle broke OpenChrome DRM. Following up on the question I raised during the previous e-mail. Shouldn't "use_tt" parameter being "false" for ttm_range_man_init() disable TTM TT functionality? I feel like that should be the expected behavior. Again, there is only 5 to 6 more days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent.). Assuming what I am asserting is correct, I think the reason why this was not discovered earlier was due to the following reasons.
1) nouveau, radeon, and amdgpu already use TTM TT functionality. 2) ast uses GEM VRAM helper that internally uses TTM. It populates "ttm_tt_create" and "ttm_tt_destroy" members, hence, the developers did not notice the breakage. 3) OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now.
Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, hence, I did not believe it was necessary to populate "ttm_tt_create" and "ttm_tt_destroy" members. That implementation worked fine until the previous development cycle code. Of course, I will eventually add support for acceleration, hence, TTM TT functionality will be utilized at some point.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
Sent: Sunday, October 18, 2020 at 12:50 PM From: "Dave Airlie" airlied@gmail.com To: "Kevin Brace" kevinbrace@gmx.com, "Christian König" ckoenig.leichtzumerken@gmail.com Cc: "dri-devel" dri-devel@lists.freedesktop.org, "Dave Airlie" airlied@redhat.com Subject: Re: It appears drm-next TTM cleanup broke something . . .
On Mon, 19 Oct 2020 at 05:15, Kevin Brace kevinbrace@gmx.com wrote:
Hi Dave,
It is a little urgent, so I am writing this right now. As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago. While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init(). ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly. If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access. The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c. This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
cc'ing Christian,
I can't remember if we did this deliberately or if just worked by accident previously.
Either way, you should probably need a ttm_tt_create going forward.
Dave.
. . . kernel: [ 34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create. kernel: [ 34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement. kernel: [ 34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement. kernel: [ 34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000 kernel: [ 34.310742] #PF: supervisor instruction fetch in kernel mode kernel: [ 34.310745] #PF: error_code(0x0010) - not-present page . . . kernel: [ 34.310807] Call Trace: kernel: [ 34.310827] ttm_tt_create+0x5f/0xa0 [ttm] kernel: [ 34.310839] ttm_bo_validate+0xb8/0x140 [ttm] kernel: [ 34.310886] ? drm_vma_offset_add+0x56/0x70 [drm] kernel: [ 34.310897] ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome] . . . _______________________________________________
The erroneous call to "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()). Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code. It appears that Linux 5.10's drm-next code broke the code.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Hi Kevin,
the basic problem you are facing is that ttm_tt_create/destroy is mandatory (It always was). You need an implementation or otherwise you won't be able to use the system domain (additional to the optional GTT domain).
My best guess is that the difference is that we now force to initiate the system domain for all drivers.
If that is correct you just that you never ran into because you never correctly initialized TTM to support buffer moves.
I'm not sure what exactly the OpenChrome DRM driver is doing, but I strongly suggest to just drop TTM support completely and use the GEM VRAM helper layer instead.
Regards, Christian.
Am 19.10.20 um 09:23 schrieb Kevin Brace:
Hi Dave,
Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM does not crash for "ttm_tt_create" member being null. It is still not able to boot X Server due to some other TTM related memory allocation issue it is suffering from. I think making huge changes to TTM during this development cycle broke OpenChrome DRM. Following up on the question I raised during the previous e-mail. Shouldn't "use_tt" parameter being "false" for ttm_range_man_init() disable TTM TT functionality? I feel like that should be the expected behavior. Again, there is only 5 to 6 more days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent.). Assuming what I am asserting is correct, I think the reason why this was not discovered earlier was due to the following reasons.
- nouveau, radeon, and amdgpu already use TTM TT functionality.
- ast uses GEM VRAM helper that internally uses TTM. It populates "ttm_tt_create" and "ttm_tt_destroy" members, hence, the developers did not notice the breakage.
- OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now.
Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, hence, I did not believe it was necessary to populate "ttm_tt_create" and "ttm_tt_destroy" members. That implementation worked fine until the previous development cycle code. Of course, I will eventually add support for acceleration, hence, TTM TT functionality will be utilized at some point.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
Sent: Sunday, October 18, 2020 at 12:50 PM From: "Dave Airlie" airlied@gmail.com To: "Kevin Brace" kevinbrace@gmx.com, "Christian König" ckoenig.leichtzumerken@gmail.com Cc: "dri-devel" dri-devel@lists.freedesktop.org, "Dave Airlie" airlied@redhat.com Subject: Re: It appears drm-next TTM cleanup broke something . . .
On Mon, 19 Oct 2020 at 05:15, Kevin Brace kevinbrace@gmx.com wrote:
Hi Dave,
It is a little urgent, so I am writing this right now. As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago. While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init(). ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly. If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access. The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c. This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
cc'ing Christian,
I can't remember if we did this deliberately or if just worked by accident previously.
Either way, you should probably need a ttm_tt_create going forward.
Dave.
. . . kernel: [ 34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create. kernel: [ 34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement. kernel: [ 34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement. kernel: [ 34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000 kernel: [ 34.310742] #PF: supervisor instruction fetch in kernel mode kernel: [ 34.310745] #PF: error_code(0x0010) - not-present page . . . kernel: [ 34.310807] Call Trace: kernel: [ 34.310827] ttm_tt_create+0x5f/0xa0 [ttm] kernel: [ 34.310839] ttm_bo_validate+0xb8/0x140 [ttm] kernel: [ 34.310886] ? drm_vma_offset_add+0x56/0x70 [drm] kernel: [ 34.310897] ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome] . . . _______________________________________________
The erroneous call to "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()). Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code. It appears that Linux 5.10's drm-next code broke the code.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Hi Christian,
I looked into a few more things, and figured out why OpenChrome DRM was not booting X Server. Now the situation is under control in my side of the world (OpenChrome development world), and I got X Server working again with drm-next 5.10 code and OpenChrome DRM. Code will be committed into OpenChrome DRM upstream repository's drm-next-5.10 branch in the next few days. I can see that OpenChrome DRM "just happened to" work without ttm_tt_create/destroy callbacks being specified. I can fix this issue easily. The other issue I was facing was that commit 48e07c23cbeba2a2cda7ca73be0015e727818536 (drm/ttm: nuke memory type flags) discontinued TTM_PL_FLAG_* flags. This caused incompatibility between OpenChrome DDX and the updated OpenChrome DRM that incorporated the changes made with commit 48e07c2. OpenChrome DDX was sending TTM_PL_FLAG_* based flags to OpenChrome DRM. I changed OpenChrome DDX code to use TTM_PL_* instead for allocating memory via TTM, and OpenChrome DRM based Linux kernel was able to boot X Server properly. Again, the code change to the DDX will happen in a few days. Before I moved on from this issue, I will like to ask for one request regarding TTM and its callbacks. Is it too much to ask for using more BUG_ON null pointer assertions for TTM callbacks? Although I did not bring it up with you back then, I did get hit with an issue similar to this 3 years ago.
https://cgit.freedesktop.org/openchrome/drm-openchrome/commit/?h=drm-next-4.... https://cgit.freedesktop.org/openchrome/drm-openchrome/commit/?h=drm-next-4....
I did not send you an e-mail about it, I do recall complaining to Alex Deucher about the above io_mem_pfn null pointer bug at XDC2017 (I did attend it, and I presented about the state of OpenChrome Project.). I brought this up with Alex because I stuck for more than 2 weeks to figure out the issue (the commit for overcoming io_mem_pfn callback issue was made on 2017-09-13, but the commit prior to that was on 2017-08-28).
https://cgit.freedesktop.org/openchrome/drm-openchrome/log/?h=drm-next-4.13&...
Anyway, someone else got hit by the same issue a month or two later
https://lists.freedesktop.org/archives/dri-devel/2017-November/159002.html https://lists.freedesktop.org/archives/dri-devel/2017-December/161168.html
These are the commits that fixed the issue.
https://cgit.freedesktop.org/drm/drm/commit/?id=ea642c3216cb2a60d1c0e760ae47... https://cgit.freedesktop.org/drm/drm/commit/?id=c67fa6edc8b11afe22c88a239631...
If you compare the commit date, I figured it out io_mem_pfn null pointer bug 2 to 3 months earlier. I think the problem we got with the latest code change to the TTM is, many TTM callbacks do not check for a null pointer for mandatory callbacks. Also, the ttm_bo_driver struct comment section inside ttm_bo_driver.h does not make it 100% clear that what is mandatory and what is not. If these were done, I am sure I would have left ttm_tt_create/destroy callbacks intact.
https://cgit.freedesktop.org/openchrome/drm-openchrome/commit/?h=drm-next-5.... (scroll down to openchrome/openchrome_ttm.c)
Regarding your suggestion that I should abandon OpenChrome DRM's current GEM / TTM implementation and instead I should use the GEM VRAM helpers, since I was able to figure out the issues with some suggestions from you and Dave, I do not think it is worth switching to GEM VRAM helpers at this point. Obviously, the current implementation does not support acceleration, hence, it appears to be a good candidate for switching to GEM VRAM helpers. While that may be true, I do plan to implement acceleration later, and this is why I do not want to settle with the GEM VRAM helpers.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
Sent: Monday, October 19, 2020 at 3:13 AM From: "Christian König" ckoenig.leichtzumerken@gmail.com To: "Kevin Brace" kevinbrace@gmx.com, "Dave Airlie" airlied@gmail.com, "dri-devel" dri-devel@lists.freedesktop.org Subject: Re: It appears drm-next TTM cleanup broke something . . .
Hi Kevin,
the basic problem you are facing is that ttm_tt_create/destroy is mandatory (It always was). You need an implementation or otherwise you won't be able to use the system domain (additional to the optional GTT domain).
My best guess is that the difference is that we now force to initiate the system domain for all drivers.
If that is correct you just that you never ran into because you never correctly initialized TTM to support buffer moves.
I'm not sure what exactly the OpenChrome DRM driver is doing, but I strongly suggest to just drop TTM support completely and use the GEM VRAM helper layer instead.
Regards, Christian.
Am 19.10.20 um 09:23 schrieb Kevin Brace:
Hi Dave,
Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM does not crash for "ttm_tt_create" member being null. It is still not able to boot X Server due to some other TTM related memory allocation issue it is suffering from. I think making huge changes to TTM during this development cycle broke OpenChrome DRM. Following up on the question I raised during the previous e-mail. Shouldn't "use_tt" parameter being "false" for ttm_range_man_init() disable TTM TT functionality? I feel like that should be the expected behavior. Again, there is only 5 to 6 more days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent.). Assuming what I am asserting is correct, I think the reason why this was not discovered earlier was due to the following reasons.
- nouveau, radeon, and amdgpu already use TTM TT functionality.
- ast uses GEM VRAM helper that internally uses TTM. It populates "ttm_tt_create" and "ttm_tt_destroy" members, hence, the developers did not notice the breakage.
- OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now.
Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, hence, I did not believe it was necessary to populate "ttm_tt_create" and "ttm_tt_destroy" members. That implementation worked fine until the previous development cycle code. Of course, I will eventually add support for acceleration, hence, TTM TT functionality will be utilized at some point.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
Sent: Sunday, October 18, 2020 at 12:50 PM From: "Dave Airlie" airlied@gmail.com To: "Kevin Brace" kevinbrace@gmx.com, "Christian König" ckoenig.leichtzumerken@gmail.com Cc: "dri-devel" dri-devel@lists.freedesktop.org, "Dave Airlie" airlied@redhat.com Subject: Re: It appears drm-next TTM cleanup broke something . . .
On Mon, 19 Oct 2020 at 05:15, Kevin Brace kevinbrace@gmx.com wrote:
Hi Dave,
It is a little urgent, so I am writing this right now. As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago. While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init(). ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly. If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access. The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c. This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
cc'ing Christian,
I can't remember if we did this deliberately or if just worked by accident previously.
Either way, you should probably need a ttm_tt_create going forward.
Dave.
. . . kernel: [ 34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create. kernel: [ 34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement. kernel: [ 34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement. kernel: [ 34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000 kernel: [ 34.310742] #PF: supervisor instruction fetch in kernel mode kernel: [ 34.310745] #PF: error_code(0x0010) - not-present page . . . kernel: [ 34.310807] Call Trace: kernel: [ 34.310827] ttm_tt_create+0x5f/0xa0 [ttm] kernel: [ 34.310839] ttm_bo_validate+0xb8/0x140 [ttm] kernel: [ 34.310886] ? drm_vma_offset_add+0x56/0x70 [drm] kernel: [ 34.310897] ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome] . . . _______________________________________________
The erroneous call to "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()). Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code. It appears that Linux 5.10's drm-next code broke the code.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Hi Kevin,
OpenChrome DDX was sending TTM_PL_FLAG_* based flags to OpenChrome DRM.
Ugh, that would be an absolute no-go for upstreaming.
Is it too much to ask for using more BUG_ON null pointer assertions for TTM callbacks?
I don't think that this is useful at all. See a BUG_ON() has the same result as a NULL pointer dereference.
While that may be true, I do plan to implement acceleration later, and this is why I do not want to settle with the GEM VRAM helpers.
TTM is quite a mess and the effort here is essential to clean it up and kill most of the driver specific workarounds we have in there.
As the maintainer of all of this I would probably reject any newly added driver which is using the layer directly instead of the VRAM GEM wrapper.
Regards, Christian.
Am 19.10.20 um 18:20 schrieb Kevin Brace:
Hi Christian,
I looked into a few more things, and figured out why OpenChrome DRM was not booting X Server. Now the situation is under control in my side of the world (OpenChrome development world), and I got X Server working again with drm-next 5.10 code and OpenChrome DRM. Code will be committed into OpenChrome DRM upstream repository's drm-next-5.10 branch in the next few days. I can see that OpenChrome DRM "just happened to" work without ttm_tt_create/destroy callbacks being specified. I can fix this issue easily. The other issue I was facing was that commit 48e07c23cbeba2a2cda7ca73be0015e727818536 (drm/ttm: nuke memory type flags) discontinued TTM_PL_FLAG_* flags. This caused incompatibility between OpenChrome DDX and the updated OpenChrome DRM that incorporated the changes made with commit 48e07c2. OpenChrome DDX was sending TTM_PL_FLAG_* based flags to OpenChrome DRM. I changed OpenChrome DDX code to use TTM_PL_* instead for allocating memory via TTM, and OpenChrome DRM based Linux kernel was able to boot X Server properly. Again, the code change to the DDX will happen in a few days. Before I moved on from this issue, I will like to ask for one request regarding TTM and its callbacks. Is it too much to ask for using more BUG_ON null pointer assertions for TTM callbacks? Although I did not bring it up with you back then, I did get hit with an issue similar to this 3 years ago.
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freed... https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freed...
I did not send you an e-mail about it, I do recall complaining to Alex Deucher about the above io_mem_pfn null pointer bug at XDC2017 (I did attend it, and I presented about the state of OpenChrome Project.). I brought this up with Alex because I stuck for more than 2 weeks to figure out the issue (the commit for overcoming io_mem_pfn callback issue was made on 2017-09-13, but the commit prior to that was on 2017-08-28).
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freed...
Anyway, someone else got hit by the same issue a month or two later
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.free... https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.free...
These are the commits that fixed the issue.
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freed... https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freed...
If you compare the commit date, I figured it out io_mem_pfn null pointer bug 2 to 3 months earlier. I think the problem we got with the latest code change to the TTM is, many TTM callbacks do not check for a null pointer for mandatory callbacks. Also, the ttm_bo_driver struct comment section inside ttm_bo_driver.h does not make it 100% clear that what is mandatory and what is not. If these were done, I am sure I would have left ttm_tt_create/destroy callbacks intact.
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freed... (scroll down to openchrome/openchrome_ttm.c)
Regarding your suggestion that I should abandon OpenChrome DRM's current GEM / TTM implementation and instead I should use the GEM VRAM helpers, since I was able to figure out the issues with some suggestions from you and Dave, I do not think it is worth switching to GEM VRAM helpers at this point.
Obviously, the current implementation does not support acceleration, hence, it appears to be a good candidate for switching to GEM VRAM helpers. While that may be true, I do plan to implement acceleration later, and this is why I do not want to settle with the GEM VRAM helpers.
Regards,
Kevin Brace Brace Computer Laboratory blog https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbracecompu...
Sent: Monday, October 19, 2020 at 3:13 AM From: "Christian König" ckoenig.leichtzumerken@gmail.com To: "Kevin Brace" kevinbrace@gmx.com, "Dave Airlie" airlied@gmail.com, "dri-devel" dri-devel@lists.freedesktop.org Subject: Re: It appears drm-next TTM cleanup broke something . . .
Hi Kevin,
the basic problem you are facing is that ttm_tt_create/destroy is mandatory (It always was). You need an implementation or otherwise you won't be able to use the system domain (additional to the optional GTT domain).
My best guess is that the difference is that we now force to initiate the system domain for all drivers.
If that is correct you just that you never ran into because you never correctly initialized TTM to support buffer moves.
I'm not sure what exactly the OpenChrome DRM driver is doing, but I strongly suggest to just drop TTM support completely and use the GEM VRAM helper layer instead.
Regards, Christian.
Am 19.10.20 um 09:23 schrieb Kevin Brace:
Hi Dave,
Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM does not crash for "ttm_tt_create" member being null. It is still not able to boot X Server due to some other TTM related memory allocation issue it is suffering from. I think making huge changes to TTM during this development cycle broke OpenChrome DRM. Following up on the question I raised during the previous e-mail. Shouldn't "use_tt" parameter being "false" for ttm_range_man_init() disable TTM TT functionality? I feel like that should be the expected behavior. Again, there is only 5 to 6 more days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent.). Assuming what I am asserting is correct, I think the reason why this was not discovered earlier was due to the following reasons.
- nouveau, radeon, and amdgpu already use TTM TT functionality.
- ast uses GEM VRAM helper that internally uses TTM. It populates "ttm_tt_create" and "ttm_tt_destroy" members, hence, the developers did not notice the breakage.
- OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now.
Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, hence, I did not believe it was necessary to populate "ttm_tt_create" and "ttm_tt_destroy" members. That implementation worked fine until the previous development cycle code. Of course, I will eventually add support for acceleration, hence, TTM TT functionality will be utilized at some point.
Regards,
Kevin Brace Brace Computer Laboratory blog https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbracecompu...
Sent: Sunday, October 18, 2020 at 12:50 PM From: "Dave Airlie" airlied@gmail.com To: "Kevin Brace" kevinbrace@gmx.com, "Christian König" ckoenig.leichtzumerken@gmail.com Cc: "dri-devel" dri-devel@lists.freedesktop.org, "Dave Airlie" airlied@redhat.com Subject: Re: It appears drm-next TTM cleanup broke something . . .
On Mon, 19 Oct 2020 at 05:15, Kevin Brace kevinbrace@gmx.com wrote:
Hi Dave,
It is a little urgent, so I am writing this right now. As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago. While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init(). ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly. If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access. The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c. This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
cc'ing Christian,
I can't remember if we did this deliberately or if just worked by accident previously.
Either way, you should probably need a ttm_tt_create going forward.
Dave.
. . . kernel: [ 34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create. kernel: [ 34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement. kernel: [ 34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement. kernel: [ 34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000 kernel: [ 34.310742] #PF: supervisor instruction fetch in kernel mode kernel: [ 34.310745] #PF: error_code(0x0010) - not-present page . . . kernel: [ 34.310807] Call Trace: kernel: [ 34.310827] ttm_tt_create+0x5f/0xa0 [ttm] kernel: [ 34.310839] ttm_bo_validate+0xb8/0x140 [ttm] kernel: [ 34.310886] ? drm_vma_offset_add+0x56/0x70 [drm] kernel: [ 34.310897] ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome] . . . _______________________________________________
The erroneous call to "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()). Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code. It appears that Linux 5.10's drm-next code broke the code.
Regards,
Kevin Brace Brace Computer Laboratory blog https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbracecompu...
dri-devel mailing list dri-devel@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.free...
Hi Kevin.
On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
I know you have been working on and off on the openchrome driver for a long time now. Any chance we will see the driver submitted for upstream soon?
Sam
Hi Sam,
Thanks for asking the question. The current OpenChrome DRM code has these two major issues.
1) It does not support atomic modesetting
I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository. In particular, it suffers from a freeze relating to a cursor plane. The freeze is a bad kind that kern.log does not really tell me what is wrong. If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay. In other words, I am getting close to getting atomic modesetting working, but I am stuck.
2) Double allocation of visible portion of frame buffer
This is a big problem left behind from the previous developer who developed OpenChrome prior to me. For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code. I believe Radeon DRM does something similar to that. The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization. This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one. At this point, OpenChrome is not supporting double buffering. This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup. I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM? Any suggestions on what to do about this issue will be greatly appreciated. Perhaps, I should post a question to dri-devel regarding this issue. I really do not know what I should do at this point.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
Sent: Sunday, October 18, 2020 at 2:04 PM From: "Sam Ravnborg" sam@ravnborg.org To: "Kevin Brace" kevinbrace@gmx.com Cc: dri-devel@lists.freedesktop.org, "Dave Airlie" airlied@redhat.com Subject: Re: It appears drm-next TTM cleanup broke something . . .
Hi Kevin.
On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
I know you have been working on and off on the openchrome driver for a long time now. Any chance we will see the driver submitted for upstream soon?
Sam
Hi Kevin.
On Mon, Oct 19, 2020 at 09:43:08PM +0200, Kevin Brace wrote:
Hi Sam,
Thanks for asking the question. The current OpenChrome DRM code has these two major issues.
- It does not support atomic modesetting
I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository. In particular, it suffers from a freeze relating to a cursor plane. The freeze is a bad kind that kern.log does not really tell me what is wrong. If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay. In other words, I am getting close to getting atomic modesetting working, but I am stuck.
Maybe posting what you have now - and explain that it has this defect. Chances are that you will receive feedback that may help you on your way to fix this.
With all the infrastructure improvements made the last years I would be suprised if you have managed to include it all and maybe some of the infrastructure may help you.
Also I know we have seems some cursor plane related discussions the last months so maybe there are something to gain from the people involved there.
- Double allocation of visible portion of frame buffer
This is a big problem left behind from the previous developer who developed OpenChrome prior to me. For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code. I believe Radeon DRM does something similar to that. The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization. This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one. At this point, OpenChrome is not supporting double buffering. This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup. I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM? Any suggestions on what to do about this issue will be greatly appreciated. Perhaps, I should post a question to dri-devel regarding this issue. I really do not know what I should do at this point.
Likewise.
But obviously you shall not post it to dri-devel unless you are prepared to handle the feedback that you *may* get.
I promise to take a look - but that will cover mostly trivial stuff. You have to rely on others for all the stuff around atomic modestetting and the memory handling etc. - the areas where you have challenges now.
Sam
Hi
On 19.10.20 22:28, Sam Ravnborg wrote:
Hi Kevin.
On Mon, Oct 19, 2020 at 09:43:08PM +0200, Kevin Brace wrote:
Hi Sam,
Thanks for asking the question. The current OpenChrome DRM code has these two major issues.
- It does not support atomic modesetting
I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository. In particular, it suffers from a freeze relating to a cursor plane. The freeze is a bad kind that kern.log does not really tell me what is wrong. If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay. In other words, I am getting close to getting atomic modesetting working, but I am stuck.
Maybe posting what you have now - and explain that it has this defect. Chances are that you will receive feedback that may help you on your way to fix this.
With all the infrastructure improvements made the last years I would be suprised if you have managed to include it all and maybe some of the infrastructure may help you.
Also I know we have seems some cursor plane related discussions the last months so maybe there are something to gain from the people involved there.
I'd be interested in this as well. If you could share an URL to the repo, I'd take a look. I think I even have a Via machine somewhere to give it a try.
Best regards Thomas
- Double allocation of visible portion of frame buffer
This is a big problem left behind from the previous developer who developed OpenChrome prior to me. For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code. I believe Radeon DRM does something similar to that. The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization. This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one. At this point, OpenChrome is not supporting double buffering. This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup. I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM? Any suggestions on what to do about this issue will be greatly appreciated. Perhaps, I should post a question to dri-devel regarding this issue. I really do not know what I should do at this point.
Likewise.
But obviously you shall not post it to dri-devel unless you are prepared to handle the feedback that you *may* get.
I promise to take a look - but that will cover mostly trivial stuff. You have to rely on others for all the stuff around atomic modestetting and the memory handling etc. - the areas where you have challenges now.
Sam _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Mon, Oct 19, 2020 at 3:43 PM Kevin Brace kevinbrace@gmx.com wrote:
Hi Sam,
Thanks for asking the question. The current OpenChrome DRM code has these two major issues.
- It does not support atomic modesetting
I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository. In particular, it suffers from a freeze relating to a cursor plane. The freeze is a bad kind that kern.log does not really tell me what is wrong. If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay. In other words, I am getting close to getting atomic modesetting working, but I am stuck.
- Double allocation of visible portion of frame buffer
This is a big problem left behind from the previous developer who developed OpenChrome prior to me. For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code. I believe Radeon DRM does something similar to that. The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization. This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one. At this point, OpenChrome is not supporting double buffering. This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup. I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM? Any suggestions on what to do about this issue will be greatly appreciated.
All drivers do this. You have one buffer for the fbdev console and then userspace allocates buffers it needs in addition, otherwise, you'd overwrite your console. You can disable it by not enabling the fbdev emulation if you don't want a console.
Alex
Perhaps, I should post a question to dri-devel regarding this issue. I really do not know what I should do at this point.
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
Sent: Sunday, October 18, 2020 at 2:04 PM From: "Sam Ravnborg" sam@ravnborg.org To: "Kevin Brace" kevinbrace@gmx.com Cc: dri-devel@lists.freedesktop.org, "Dave Airlie" airlied@redhat.com Subject: Re: It appears drm-next TTM cleanup broke something . . .
Hi Kevin.
On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
I know you have been working on and off on the openchrome driver for a long time now. Any chance we will see the driver submitted for upstream soon?
Sam
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, Oct 20, 2020 at 1:17 PM Alex Deucher alexdeucher@gmail.com wrote:
On Mon, Oct 19, 2020 at 3:43 PM Kevin Brace kevinbrace@gmx.com wrote:
Hi Sam,
Thanks for asking the question. The current OpenChrome DRM code has these two major issues.
- It does not support atomic modesetting
I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository. In particular, it suffers from a freeze relating to a cursor plane. The freeze is a bad kind that kern.log does not really tell me what is wrong. If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay. In other words, I am getting close to getting atomic modesetting working, but I am stuck.
- Double allocation of visible portion of frame buffer
This is a big problem left behind from the previous developer who developed OpenChrome prior to me. For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code. I believe Radeon DRM does something similar to that. The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization. This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one. At this point, OpenChrome is not supporting double buffering. This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup. I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM? Any suggestions on what to do about this issue will be greatly appreciated.
All drivers do this. You have one buffer for the fbdev console and then userspace allocates buffers it needs in addition, otherwise, you'd overwrite your console. You can disable it by not enabling the fbdev emulation if you don't want a console.
I don't think disabling fbdev emulation is a practical thing to do for standard PCs.
Not sure what the capabilities of the underlying hardware are, but on nouveau, we use 16bpp/8bpp framebuffers by default (for fbdev) on boards with limited VRAM (64/32/16 MB, I don't remember precisely). You'd have to add C8 drm format support for this, but assuming you have that, the core supports it OK.
Ideally you'd have logic which is able to manage buffers and move them between GPU-accessible and GPU-inaccessible memory (e.g. TTM, or perhaps some evolution of it, I'm not up on all the latest).
Cheers,
-ilia
Hi
On 19.10.20 21:43, Kevin Brace wrote:
Hi Sam,
Thanks for asking the question. The current OpenChrome DRM code has these two major issues.
- It does not support atomic modesetting
I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository. In particular, it suffers from a freeze relating to a cursor plane. The freeze is a bad kind that kern.log does not really tell me what is wrong. If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay. In other words, I am getting close to getting atomic modesetting working, but I am stuck.
This could be related to the memory problems. See below. Otherwise, I suggest to reduce the driver to the minimum functionality that is required for modesetting (even without HW cursors) and submit this code for review and merging.
- Double allocation of visible portion of frame buffer
This is a big problem left behind from the previous developer who developed OpenChrome prior to me. For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code. I believe Radeon DRM does something similar to that. The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization. This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one. At this point, OpenChrome is not supporting double buffering. This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup. I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM? Any suggestions on what to do about this issue will be greatly appreciated. Perhaps, I should post a question to dri-devel regarding this issue. I really do not know what I should do at this point.
The double allocation is expected. Atomic modesetting requires two framebuffers in video memory during the pageflip. You have to sort out the modes where 2 framebuffers do not fit into video memory at the same time.
The mode_valid callback in struct drm_mode_config_funcs [1] is a good place to do this. Check the mode's pixels with the maximum BPC against the available memory. Example code is at [2]. You should also plane for common additional layers, such as HW cursors, to require video memory. So maybe test the mode against 80% of the video memory.
Best regards Thomas
[1] https://cgit.freedesktop.org/openchrome/drm-openchrome/tree/drivers/gpu/drm/...
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
Sent: Sunday, October 18, 2020 at 2:04 PM From: "Sam Ravnborg" sam@ravnborg.org To: "Kevin Brace" kevinbrace@gmx.com Cc: dri-devel@lists.freedesktop.org, "Dave Airlie" airlied@redhat.com Subject: Re: It appears drm-next TTM cleanup broke something . . .
Hi Kevin.
On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
I know you have been working on and off on the openchrome driver for a long time now. Any chance we will see the driver submitted for upstream soon?
Sam
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Wed, Oct 21, 2020 at 10:03 AM Thomas Zimmermann tzimmermann@suse.de wrote:
Hi
On 19.10.20 21:43, Kevin Brace wrote:
Hi Sam,
Thanks for asking the question. The current OpenChrome DRM code has these two major issues.
- It does not support atomic modesetting
I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository. In particular, it suffers from a freeze relating to a cursor plane. The freeze is a bad kind that kern.log does not really tell me what is wrong. If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay. In other words, I am getting close to getting atomic modesetting working, but I am stuck.
This could be related to the memory problems. See below. Otherwise, I suggest to reduce the driver to the minimum functionality that is required for modesetting (even without HW cursors) and submit this code for review and merging.
- Double allocation of visible portion of frame buffer
This is a big problem left behind from the previous developer who developed OpenChrome prior to me. For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code. I believe Radeon DRM does something similar to that. The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization. This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one. At this point, OpenChrome is not supporting double buffering. This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup. I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM? Any suggestions on what to do about this issue will be greatly appreciated. Perhaps, I should post a question to dri-devel regarding this issue. I really do not know what I should do at this point.
The double allocation is expected. Atomic modesetting requires two framebuffers in video memory during the pageflip. You have to sort out the modes where 2 framebuffers do not fit into video memory at the same time.
What we have done on severly restricted discrete gpu is to keep one framebuffer for everyone in vram, and blt the kms framebuffers over as needed. With all the dirty tracking helpers for atomic that's like a one-liner to set up (or just slightly more). I think cirrus works like that (but it's using cpu memcpy since that's the only thing that exists, I guess openchrome could even use the blitter for this).
The more usual approach is what nouveau guys already explained: Just run fbcon at very low resolution so it doesn't consume too much space. -Daniel
The mode_valid callback in struct drm_mode_config_funcs [1] is a good place to do this. Check the mode's pixels with the maximum BPC against the available memory. Example code is at [2]. You should also plane for common additional layers, such as HW cursors, to require video memory. So maybe test the mode against 80% of the video memory.
Best regards Thomas
[1] https://cgit.freedesktop.org/openchrome/drm-openchrome/tree/drivers/gpu/drm/...
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
Sent: Sunday, October 18, 2020 at 2:04 PM From: "Sam Ravnborg" sam@ravnborg.org To: "Kevin Brace" kevinbrace@gmx.com Cc: dri-devel@lists.freedesktop.org, "Dave Airlie" airlied@redhat.com Subject: Re: It appears drm-next TTM cleanup broke something . . .
Hi Kevin.
On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
I know you have been working on and off on the openchrome driver for a long time now. Any chance we will see the driver submitted for upstream soon?
Sam
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
-- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Hi
On 21.10.20 10:14, Daniel Vetter wrote:
On Wed, Oct 21, 2020 at 10:03 AM Thomas Zimmermann tzimmermann@suse.de wrote:
Hi
On 19.10.20 21:43, Kevin Brace wrote:
Hi Sam,
Thanks for asking the question. The current OpenChrome DRM code has these two major issues.
- It does not support atomic modesetting
I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository. In particular, it suffers from a freeze relating to a cursor plane. The freeze is a bad kind that kern.log does not really tell me what is wrong. If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay. In other words, I am getting close to getting atomic modesetting working, but I am stuck.
This could be related to the memory problems. See below. Otherwise, I suggest to reduce the driver to the minimum functionality that is required for modesetting (even without HW cursors) and submit this code for review and merging.
- Double allocation of visible portion of frame buffer
This is a big problem left behind from the previous developer who developed OpenChrome prior to me. For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code. I believe Radeon DRM does something similar to that. The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization. This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one. At this point, OpenChrome is not supporting double buffering. This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup. I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM? Any suggestions on what to do about this issue will be greatly appreciated. Perhaps, I should post a question to dri-devel regarding this issue. I really do not know what I should do at this point.
The double allocation is expected. Atomic modesetting requires two framebuffers in video memory during the pageflip. You have to sort out the modes where 2 framebuffers do not fit into video memory at the same time.
What we have done on severly restricted discrete gpu is to keep one framebuffer for everyone in vram, and blt the kms framebuffers over as needed. With all the dirty tracking helpers for atomic that's like a one-liner to set up (or just slightly more). I think cirrus works like that (but it's using cpu memcpy since that's the only thing that exists, I guess openchrome could even use the blitter for this).
A yes, converting to SHMEM is also an option. But it prevents any kind of 3d acceleration, if you want to add that as well.
The more usual approach is what nouveau guys already explained: Just run fbcon at very low resolution so it doesn't consume too much space.
A better approach is to kill openchromes custom fbdev implementation entirely. During a review, we'd probably ask you to do this anyway. :)
Go for the generic fbdev emulation and set struct drm_mode_config.prefer_shadow_fbdev to true. This will give you a shadow buffer for the console, and the actual fbdev framebuffer can be evicted from video memory when the space is required.
Best regards Thomas
-Daniel
The mode_valid callback in struct drm_mode_config_funcs [1] is a good place to do this. Check the mode's pixels with the maximum BPC against the available memory. Example code is at [2]. You should also plane for common additional layers, such as HW cursors, to require video memory. So maybe test the mode against 80% of the video memory.
Best regards Thomas
[1] https://cgit.freedesktop.org/openchrome/drm-openchrome/tree/drivers/gpu/drm/...
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
Regards,
Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com
Sent: Sunday, October 18, 2020 at 2:04 PM From: "Sam Ravnborg" sam@ravnborg.org To: "Kevin Brace" kevinbrace@gmx.com Cc: dri-devel@lists.freedesktop.org, "Dave Airlie" airlied@redhat.com Subject: Re: It appears drm-next TTM cleanup broke something . . .
Hi Kevin.
On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
I know you have been working on and off on the openchrome driver for a long time now. Any chance we will see the driver submitted for upstream soon?
Sam
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
-- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Felix Imendörffer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
dri-devel@lists.freedesktop.org