On Wed, Nov 28, 2012 at 4:51 PM, Marek Olšák maraeo@gmail.com wrote:
I think the problem with Radeon/TTM is much deeper. Let me demonstrate it on the following example.
Unigine Heaven needs about 385MB of space for static resources, which is only 75% of my 512MB card. Yet TTM is not capable of getting all of that into VRAM. If I allow GTT placements, I get 20 fps, which is the old Mesa behavior. If I force VRAM placements, I get 3 fps, because we validate buffers 10 times per frame and there are probably a lot of buffer evictions during each validation.
In theory, we should get the best performance if Radeon/TTM managed to get everything into VRAM. That's probably what fglrx does. 75% of VRAM doesn't look like too much. And that's the problem: even when we seemingly have enough memory, the current stack is not capable of using it efficiently.
Marek
If you read my second-to-last paragraph, this is what I explain. Right now it's inefficient because each CS tries to revalidate things and thus triggers bo moves, which increases memory fragmentation with each CS. As I explained, a much better solution is to have a true heuristic for bo placement and to not revalidate things into a different location at each CS. I was working on something like that, but the minimum residency time alone fixes most of the regression and is a much simpler and smaller patch, so I consider it a temporary fix. I also believe it makes sense in itself by putting a bound on buffer move frequency.
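To illustrate the idea (this is only a sketch, not the actual radeon/TTM patch; the names `bo_sketch`, `bo_can_move`, and `MIN_RESIDENCY_MS` are made up for the example), a minimum residency time simply refuses to move a buffer again until it has stayed in its current placement for some minimum interval, which caps how often validation can thrash a buffer between VRAM and GTT:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative minimum-residency check. A buffer object records when it
 * was last moved; eviction/revalidation to a new placement is refused
 * until it has been resident for at least MIN_RESIDENCY_MS. The value
 * below is arbitrary, chosen only for the example. */
#define MIN_RESIDENCY_MS 500

struct bo_sketch {
	uint64_t last_move_ms; /* timestamp of the last placement change */
};

static bool bo_can_move(const struct bo_sketch *bo, uint64_t now_ms)
{
	/* Bound the move frequency: a recently moved buffer stays put,
	 * so repeated per-frame validations cannot ping-pong it. */
	return now_ms - bo->last_move_ms >= MIN_RESIDENCY_MS;
}
```

With something like this in the eviction path, a CS that revalidates the same buffers ten times per frame can no longer force ten rounds of moves; the worst case becomes one move per residency interval per buffer.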
Cheers, Jerome