On 30.04.2012 18:26, Jerome Glisse wrote:
On Mon, Apr 30, 2012 at 11:37 AM, Christian König <deathsimple@vodafone.de> wrote:
On 30.04.2012 17:12, Jerome Glisse wrote:
On Mon, Apr 30, 2012 at 11:12 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
On Mon, Apr 30, 2012 at 10:50 AM, Christian König <deathsimple@vodafone.de> wrote:
Hi Dave,
if nobody has any last-minute concerns, please include the following patches in drm-next.
Except for some minor fixes they have already been on the list for quite some time. I intentionally left out the debugfs related patches because we haven't finished the discussion about them yet.
If you prefer to merge them directly, I have also made them available in the reset-rework branch here: git://people.freedesktop.org/~deathsimple/linux
Cheers, Christian.
I am not completely OK with this; I am against patch 5. I need some time to review it.
Cheers, Jerome
Sorry, I meant patch 7.
I just started to wonder :) what's wrong with patch 7?
Please keep in mind that implementing proper locking in the lower level objects allows us to remove quite a bit of locking in the upper layers.
By the way, do you mind if I dig into the whole locking situation of the kernel driver a bit more? There seem to be a lot of opportunities to clean up and simplify the overall driver.
Cheers, Christian.
Well, when it comes to locks, I have an idea I've wanted to work on for a while. The GPU is transaction based in a sense: we feed the rings and it works on them, and 90% of that work is the cs ioctl, so we should be able to get away with one and only one lock (ignoring the whole modesetting path here, for which I believe one lock is also enough). Things like PM would need to take all the locks, but I believe that's actually a good thing, since my understanding is that the PM we do right now needs most of the GPU idle anyway.
So here is what we have:
- ih spinlock (can be ignored, i.e. left as is)
- irq spinlock (can be ignored, i.e. left as is)
- blit mutex
- pm mutex
- cs mutex
- dc_hw_i2c mutex
- vram mutex
- ib mutex
- rings[] lock
So the real issue is ttm calling back into the driver. The idea I had is to have a work thread that is the only one allowed to mess with the GPU. The work thread would use a locking scheme in which it has preference over the writers (a writer being the cs ioctl, a ttm callback, or anything else that needs to schedule GPU work). This would require only two locks. I am actually thinking of having two lists: one where writers add their "transactions" and one that the worker empties, something like:
cs_ioctl {
    sa_alloc_for_trans ...
    lock(trans_lock)
    list_add_tail(trans, trans_temp_list)
    unlock(trans_lock)
}

worker {
    lock(trans_lock)
    list_splice_tail(trans_temp_list, trans_list)
    unlock(trans_lock)
    // schedule ib ...
    // process fence & semaphore
}
So there would be one transaction lock and one lock for the transaction memory allocation (ib, semaphore and the like). The worker would also be responsible for GPU reset, and there wouldn't be any per-ring lock, as I believe one worker is enough for all rings.
For transactions the only real issue is ttm; cs is easy, it's just an ib + fence + semaphore, while ttm can be more complex: a bo move, bind, unbind, ... Anyway it's doable, and it's a design I've had in mind for a while.
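To make the two-list idea a bit more concrete, here is a minimal sketch of the writer/worker split using the usual kernel list helpers; all the radeon_trans names are hypothetical, nothing here exists in the driver yet:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

enum radeon_trans_type {
    RADEON_TRANS_IB,        /* cs ioctl: ib + fence + semaphore */
    RADEON_TRANS_BO_MOVE,   /* ttm callbacks */
    RADEON_TRANS_BO_BIND,
    RADEON_TRANS_BO_UNBIND,
};

struct radeon_trans {
    struct list_head list;
    enum radeon_trans_type type;
    /* type specific payload, allocated by the writer beforehand */
};

static LIST_HEAD(trans_temp_list);      /* writers add here */
static DEFINE_SPINLOCK(trans_lock);     /* protects trans_temp_list */

/* writer side: cs ioctl, ttm callback, ... */
static void radeon_trans_queue(struct radeon_trans *trans)
{
    spin_lock(&trans_lock);
    list_add_tail(&trans->list, &trans_temp_list);
    spin_unlock(&trans_lock);
    /* wake the worker up here */
}

/* worker: the only context that ever touches the rings */
static void radeon_trans_worker(struct work_struct *work)
{
    struct radeon_trans *trans, *tmp;
    LIST_HEAD(trans_list);

    /* grab everything queued so far, then drop the lock again */
    spin_lock(&trans_lock);
    list_splice_tail_init(&trans_temp_list, &trans_list);
    spin_unlock(&trans_lock);

    list_for_each_entry_safe(trans, tmp, &trans_list, list) {
        /* schedule ib / execute the move, then process the
         * fence & semaphore for this transaction */
        list_del(&trans->list);
    }
}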
Well, that sounds like the right direction to go, but I would like to avoid even the submission thread, since the GPU works on the rings asynchronously anyway.
The locking problems we are currently seeing are more a result of abusing global variables for local data, or of locking too much code with too few primitives, than of simply having too many locking primitives overall. For example, I really can't see why we need a blit mutex for the r600 shader blit code. Also, in retrospect, having one mutex per ring does seem a bit superfluous.
To sum my ideas up:
1. I suggest that memory management is self-contained, i.e. you can request small amounts of memory for IBs, fences, semaphores, blit vertex buffers etc. without worrying about anybody else doing the same at the same time. That really sounds like your "lock for the transaction memory allocation", and I'm pretty sure that I've extended the SA far enough by now to play that role pretty well (see the first sketch below).
2. Have exactly ONE ring submission mutex. This mutex is taken right before a job (and it shouldn't matter whether that's a ttm move/blit or an IB) is pushed onto a ring, and while it is held it is strictly forbidden to allocate more SA memory, call into TTM, etc. Everything necessary to submit a job must happen before grabbing it and be stored in thread local memory, e.g. on the stack or in a kmalloc'ed piece of memory (see the second sketch below).
3. Protect data, not code! I.e. have locks that each protect one data structure, and don't just say: "those two things shouldn't happen at the same time, so acquire this lock and that lock" (see the third sketch below).
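To make point 1 concrete, here is a minimal sketch of a self-contained allocator; the names and the trivial bump-pointer strategy are made up, the real SA manager does proper hole management:

#include <linux/spinlock.h>
#include <linux/errno.h>

/* The allocator owns its lock, so IBs, fences, semaphores and blit
 * vertex buffers can all allocate concurrently without any caller
 * side locking. */
struct radeon_sa_manager {
    spinlock_t lock;    /* protects offset, nothing else */
    unsigned offset;    /* next free byte */
    unsigned size;      /* total size of the backing bo */
};

static int radeon_sa_alloc(struct radeon_sa_manager *sa,
                           unsigned size, unsigned *soffset)
{
    int r = -ENOMEM;

    spin_lock(&sa->lock);
    if (sa->offset + size <= sa->size) {
        *soffset = sa->offset;
        sa->offset += size;
        r = 0;
    }
    spin_unlock(&sa->lock);
    return r;
}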
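For point 2 the submission path could then look roughly like this; radeon_job and radeon_job_submit are placeholders I made up, only radeon_ring_write is meant to be the existing helper. The point is that the critical section does nothing but copy already finished packets onto the ring:

#include <linux/mutex.h>
#include <linux/types.h>
#include "radeon.h"

static DEFINE_MUTEX(ring_submit_mutex);

struct radeon_job {
    u32 *packets;       /* fully built command packets */
    unsigned count;
};

/* Everything that can sleep or allocate (SA memory, TTM calls,
 * building the packets) must have happened before this is called. */
static void radeon_job_submit(struct radeon_ring *ring,
                              const struct radeon_job *job)
{
    unsigned i;

    mutex_lock(&ring_submit_mutex);
    for (i = 0; i < job->count; i++)
        radeon_ring_write(ring, job->packets[i]);
    /* commit the ring's write pointer here */
    mutex_unlock(&ring_submit_mutex);
}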
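And point 3 in a nutshell, as a hypothetical before/after (radeon_vram_mgr is invented purely for illustration):

#include <linux/mutex.h>
#include <linux/types.h>

/* Protecting code: a global that effectively says
 * "nobody run the vram paths right now". */
static DEFINE_MUTEX(vram_mutex);

/* Protecting data: the lock is part of the structure it guards and
 * is only taken for accesses to exactly that structure. */
struct radeon_vram_mgr {
    struct mutex lock;  /* protects usage, nothing else */
    u64 usage;          /* bytes currently allocated */
};

static void radeon_vram_account(struct radeon_vram_mgr *mgr, u64 size)
{
    mutex_lock(&mgr->lock);
    mgr->usage += size;
    mutex_unlock(&mgr->lock);
}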
Overall it sounds like we are playing with similar ideas. I suggest that I just hack together some patches, probably starting with the ring submission and the r600 blit mutex, and then we take another look at where that leads us.
Cheers, Christian.