On Mon, 4 Jan 2021 21:13:53 +0100 Christian König christian.koenig@amd.com wrote:
On 04.01.21 at 19:43, Alex Williamson wrote:
On Mon, 4 Jan 2021 18:39:33 +0100 Christian König christian.koenig@amd.com wrote:
On 04.01.21 at 17:45, Alex Williamson wrote:
On Mon, 4 Jan 2021 12:34:34 +0100 Christian König christian.koenig@amd.com wrote:
[SNIP]
That's a rather bad idea. Our GPUs, for example, return way more than they actually need.
E.g. a Polaris usually returns 4GiB even when only 2GiB are installed, because 4GiB is simply the maximum amount of memory you can put together with the ASIC on a board.
Would the driver fail or misbehave if the BAR is sized larger than the amount of memory on the card or is memory size determined independently of BAR size?
Uff, good question. I have no idea.
At least the Linux driver should behave well, but no idea about the Windows driver stack.
Some devices even return a mask of all 1s even though they need only 2MiB, resulting in nearly 1TiB of wasted address space with this approach.
Ugh. I'm afraid to ask why a device with a 2MiB BAR would implement a REBAR capability, but I guess we really can't make any assumptions about the breadth of SKUs that ASIC might support (or sanity of the designers).
It's a standard feature for FPGAs these days, since how much BAR space you need depends on what you load onto the FPGA, and that in turn usually only happens after the OS has already started and you fire up your development environment.
We could probe to determine the maximum size the host can support and potentially emulate the capability to remove sizes that we can't allocate. But without any ability for the device to reject a size advertised as supported via the capability protocol, it makes me nervous how we can guarantee the resources are available when the user re-configures the device. That might mean we'd need to reserve the resources, up to what the host can support, regardless of what the device can actually use. I'm not sure how else to know how much to reserve without device-specific code in vfio-pci. Thanks,
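One way to sketch the capability emulation described here: clear, in the virtualized supported-sizes mask, every size larger than what the host can actually back. This is a hypothetical sketch, not actual vfio-pci code, and `vfio_clamp_rebar_sizes` is a made-up name:

```c
#include <stdint.h>

/* Hypothetical sketch, not vfio-pci code: hide REBAR sizes the host
 * cannot back.  Bit n of size_mask means 2^(n+20) bytes (PCIe REBAR
 * encoding); host_avail is the largest window the host could allocate
 * for this BAR. */
static uint32_t vfio_clamp_rebar_sizes(uint32_t size_mask, uint64_t host_avail)
{
	for (int n = 0; n < 28; n++)
		if ((1ull << (n + 20)) > host_avail)
			size_mask &= ~(1u << n);
	return size_mask;
}
```

Even with such a clamp, the guest can still choose a combination of sizes across several BARs that the host cannot satisfy together, which is why reserving up front (or rejecting at configuration time) still has to be solved.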
Well, in the FPGA case I outlined above, you don't really know how much BAR space you need until the setup is completed.
E.g. you could need one BAR with just 2MiB and another with 128GB, or two with 64GB each, or.... That's the reason why somebody came up with the REBAR standard in the first place.
Yes, I suppose without a full bus-reset and soft-hotplug event, resizable BARs are the best way to reconfigure a device based on FPGA programming. Anyway, thanks for the insights here.
I think I can summarize that static resizing might work for some devices like our GPUs, but it doesn't solve the problem in general.
Yup, I don't have a good approach for the general case for a VM yet. We could add a sysfs or other side-channel mechanism to preconfigure a BAR size, but once we're dealing with a VM interacting with the REBAR capability itself, it's far too easy for the guest to create a configuration that the host might not have the bus resources to support, especially if there are multiple resizable BARs under a bridge. Thanks,
Alex