The return value of pci_read_config_*() may not indicate a device error. However, the value read by these functions is more likely to indicate this kind of error. This presents two overlapping ways of reporting errors and complicates error checking.
It is possible to move to one single way of checking for error if the dependencies on the return value of these functions are removed, then it can later be made to return void.
Remove all uses of the return value of pci_read_config_*(). Check the actual value read for ~0. In this case, ~0 is an invalid value thus it indicates some kind of error.
In some cases it madkes sence to make the calling function return void without causing any bug. Future callers can use the value obtained from these functions for validation. This case pertain to cs5536_read() and edac_pci_read_dword()
MERGE: There is no dependency. Merge individually
Saheed O. Bolarinwa (17): ata: Drop uses of pci_read_config_*() return value atm: Drop uses of pci_read_config_*() return value bcma: Drop uses of pci_read_config_*() return value hwrng: Drop uses of pci_read_config_*() return value dmaengine: ioat: Drop uses of pci_read_config_*() return value edac: Drop uses of pci_read_config_*() return value fpga: altera-cvp: Drop uses of pci_read_config_*() return value gpio: Drop uses of pci_read_config_*() return value drm/i915/vga: Drop uses of pci_read_config_*() return value hwmon: Drop uses of pci_read_config_*() return value intel_th: pci: Drop uses of pci_read_config_*() return value i2c: Drop uses of pci_read_config_*() return value ide: Drop uses of pci_read_config_*() return value IB: Drop uses of pci_read_config_*() return value iommu/vt-d: Drop uses of pci_read_config_*() return value mtd: Drop uses of pci_read_config_*() return value net: Drop uses of pci_read_config_*() return value
drivers/ata/pata_cs5536.c | 6 +-- drivers/ata/pata_rz1000.c | 3 +- drivers/atm/eni.c | 3 +- drivers/atm/he.c | 12 +++-- drivers/atm/idt77252.c | 9 ++-- drivers/atm/iphase.c | 46 ++++++++++--------- drivers/atm/lanai.c | 4 +- drivers/atm/nicstar.c | 3 +- drivers/atm/zatm.c | 9 ++-- drivers/bcma/host_pci.c | 6 ++- drivers/char/hw_random/amd-rng.c | 6 +-- drivers/dma/ioat/dma.c | 6 +-- drivers/edac/amd64_edac.c | 8 ++-- drivers/edac/amd8111_edac.c | 16 ++----- drivers/edac/amd8131_edac.c | 6 +-- drivers/edac/i82443bxgx_edac.c | 3 +- drivers/edac/sb_edac.c | 12 +++-- drivers/edac/skx_common.c | 18 +++++--- drivers/fpga/altera-cvp.c | 8 ++-- drivers/gpio/gpio-amd8111.c | 7 ++- drivers/gpio/gpio-rdc321x.c | 21 +++++---- drivers/gpu/drm/i915/display/intel_vga.c | 3 +- drivers/hwmon/i5k_amb.c | 12 +++-- drivers/hwmon/vt8231.c | 8 ++-- drivers/hwtracing/intel_th/pci.c | 12 ++--- drivers/i2c/busses/i2c-ali15x3.c | 6 ++- drivers/i2c/busses/i2c-elektor.c | 3 +- drivers/i2c/busses/i2c-nforce2.c | 4 +- drivers/i2c/busses/i2c-sis5595.c | 17 ++++--- drivers/i2c/busses/i2c-sis630.c | 7 +-- drivers/i2c/busses/i2c-viapro.c | 11 +++-- drivers/ide/cs5536.c | 6 +-- drivers/ide/rz1000.c | 3 +- drivers/ide/setup-pci.c | 26 +++++++---- drivers/infiniband/hw/hfi1/pcie.c | 38 +++++++-------- drivers/infiniband/hw/mthca/mthca_reset.c | 19 ++++---- drivers/iommu/intel/iommu.c | 6 ++- drivers/mtd/maps/ichxrom.c | 4 +- drivers/net/can/peak_canfd/peak_pciefd_main.c | 6 ++- drivers/net/can/sja1000/peak_pci.c | 6 ++- drivers/net/ethernet/agere/et131x.c | 11 +++-- .../ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 5 +- .../cavium/liquidio/cn23xx_pf_device.c | 4 +- drivers/net/ethernet/marvell/sky2.c | 5 +- drivers/net/ethernet/mellanox/mlx4/catas.c | 7 +-- drivers/net/ethernet/mellanox/mlx4/reset.c | 10 ++-- .../net/ethernet/myricom/myri10ge/myri10ge.c | 4 +- drivers/net/wan/farsync.c | 5 +- .../broadcom/brcm80211/brcmfmac/pcie.c | 4 +- .../net/wireless/intel/iwlwifi/pcie/trans.c | 15 ++++-- 50 files changed, 270 insertions(+), 209 deletions(-)
The return value of pci_read_config_*() may not indicate a device error. However, the value read by these functions is more likely to indicate this kind of error. This presents two overlapping ways of reporting errors and complicates error checking.
It is possible to move to one single way of checking for error if the dependency on the return value of these functions is removed, then it can later be made to return void.
Remove all uses of the return value of pci_read_config_*(). Check the actual value read for ~0. In this case, ~0 is an invalid value thus it indicates some kind of error.
Suggested-by: Bjorn Helgaas bjorn@helgaas.com Signed-off-by: Saheed O. Bolarinwa refactormyself@gmail.com --- drivers/gpu/drm/i915/display/intel_vga.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_vga.c b/drivers/gpu/drm/i915/display/intel_vga.c index be333699c515..6f9406699c9d 100644 --- a/drivers/gpu/drm/i915/display/intel_vga.c +++ b/drivers/gpu/drm/i915/display/intel_vga.c @@ -99,7 +99,8 @@ intel_vga_set_state(struct drm_i915_private *i915, bool enable_decode) unsigned int reg = INTEL_GEN(i915) >= 6 ? SNB_GMCH_CTRL : INTEL_GMCH_CTRL; u16 gmch_ctrl;
- if (pci_read_config_word(i915->bridge_dev, reg, &gmch_ctrl)) { + pci_read_config_word(i915->bridge_dev, reg, &gmch_ctrl); + if (gmch_ctrl == (u16)~0) { drm_err(&i915->drm, "failed to read control word\n"); return -EIO; }
On Sat, Aug 01, 2020 at 01:24:29PM +0200, Saheed O. Bolarinwa wrote:
The return value of pci_read_config_*() may not indicate a device error. However, the value read by these functions is more likely to indicate this kind of error. This presents two overlapping ways of reporting errors and complicates error checking.
So why isn't the *value check done in the pci_read_config_* functions instead of touching gazillion callers?
For example, pci_conf{1,2}_read() could check whether the u32 *value it just read depending on the access method, whether that value is ~0 and return proper PCIBIOS_ error in that case.
The check you're replicating
if (val32 == (u32)~0)
everywhere, instead, is just ugly and tests a naked value ~0 which doesn't mean anything...
On 8/1/20 5:56 AM, Borislav Petkov wrote:
On Sat, Aug 01, 2020 at 01:24:29PM +0200, Saheed O. Bolarinwa wrote:
The return value of pci_read_config_*() may not indicate a device error. However, the value read by these functions is more likely to indicate this kind of error. This presents two overlapping ways of reporting errors and complicates error checking.
So why isn't the *value check done in the pci_read_config_* functions instead of touching gazillion callers?
For example, pci_conf{1,2}_read() could check whether the u32 *value it just read depending on the access method, whether that value is ~0 and return proper PCIBIOS_ error in that case.
The check you're replicating
if (val32 == (u32)~0)
everywhere, instead, is just ugly and tests a naked value ~0 which doesn't mean anything...
I agree, if there is a change, it should be in the pci_read_* functions.
Anything returning void should not fail and likely future users of the proposed change will not do the extra checks.
Tom
On 8/1/20 2:56 PM, Borislav Petkov wrote:
On Sat, Aug 01, 2020 at 01:24:29PM +0200, Saheed O. Bolarinwa wrote:
The return value of pci_read_config_*() may not indicate a device error. However, the value read by these functions is more likely to indicate this kind of error. This presents two overlapping ways of reporting errors and complicates error checking.
So why isn't the *value check done in the pci_read_config_* functions instead of touching gazillion callers?
Because the value ~0 has a meaning to some drivers and only drivers have this knowledge. For those cases more checks will be needed to ensure that it is an error that has actually happened.
For example, pci_conf{1,2}_read() could check whether the u32 *value it just read depending on the access method, whether that value is ~0 and return proper PCIBIOS_ error in that case.
The primary goal is to make pci_config_read*() return void, so that there is *only* one way to check for error i.e. through the obtained value. Again, only the drivers can determine if ~0 is a valid value. This information is not available inside pci_config_read*().
- Saheed
On Sun, Aug 02, 2020 at 07:28:00PM +0200, Saheed Bolarinwa wrote:
Because the value ~0 has a meaning to some drivers and only
No, ~0 means that the PCI read failed. For *every* PCI device I know.
Here's me reading from 0xf0 offset of my hostbridge:
# setpci -s 00:00.0 0xf0.l 01000000
That device doesn't have extended config space, so the last valid byte is 0xff. Let's read beyond that:
# setpci -s 00:00.0 0x100.l ffffffff
Again, only the drivers can determine if ~0 is a valid value. This information is not available inside pci_config_read*().
Of course it is.
*every* change you've done in 6/17 - this is the only patch I have received - checks for == ~0. So that check can just as well be moved inside pci_config_read_*().
Here's how one could do it:
#define PCI_OP_READ(size, type, len) \ int noinline pci_bus_read_config_##size \ (struct pci_bus *bus, unsigned int devfn, int pos, type *value) \ { \ int res; \ unsigned long flags; \ u32 data = 0; \ if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \ pci_lock_config(flags); \ res = bus->ops->read(bus, devfn, pos, len, &data); \
/* Check we actually read something which is not all 1s.*/ if (data == ~0) return PCIBIOS_READ_FAILED;
*value = (type)data; \ pci_unlock_config(flags); \ return res; \ }
Also, I'd prefer a function to *not* return void but return either an error or success. In the success case, the @value argument can be consumed by the caller and otherwise not.
In any case, that change is a step in the wrong direction and I don't like it, sorry.
On Sun, Aug 02, 2020 at 08:46:48PM +0200, Borislav Petkov wrote:
On Sun, Aug 02, 2020 at 07:28:00PM +0200, Saheed Bolarinwa wrote:
Because the value ~0 has a meaning to some drivers and only
No, ~0 means that the PCI read failed. For *every* PCI device I know.
Wait, I'm not convinced yet. I know that if a PCI read fails, you normally get ~0 data because the host bridge fabricates it to complete the CPU load.
But what guarantees that a PCI config register cannot contain ~0? If there's something about that in the spec I'd love to know where it is because it would simplify a lot of things.
I don't think we should merge any of these patches as-is. If we *do* want to go this direction, we at least need some kind of macro or function that tests for ~0 so we have a clue about what's happening and can grep for it.
Bjorn
On Sun, Aug 02, 2020 at 02:14:06PM -0500, Bjorn Helgaas wrote:
Wait, I'm not convinced yet. I know that if a PCI read fails, you normally get ~0 data because the host bridge fabricates it to complete the CPU load.
But what guarantees that a PCI config register cannot contain ~0?
Well, I don't think you can differentiate that case, right?
I guess this is where the driver knowledge comes into play: if the read returns ~0, the pci_read_config* should probably return in that case something like:
PCIBIOS_READ_MAYBE_FAILED
to denote it is all 1s and then the caller should be able to determine, based on any of domain:bus:slot.func and whatever else the driver knows about its hardware, whether the 1s are a valid value or an error. Hopefully.
Or something better of which I cannot think of right now...
dri-devel@lists.freedesktop.org