Hi all,
This is a revised version of patch 12 from my series to lock down some follow_pfn vs VM_SPECIAL races:
https://lore.kernel.org/dri-devel/CAKwvOdnSrsnTgPEuQJyaOTSkTP2dR9208Y66HQG_h...
Stephen reported an issue on HAVE_PCI_LEGACY platforms which this patch set tries to address. Previous patches are all still in linux-next.
Stephen, would be awesome if you can give this a spin.
Björn/Greg, review on the first patch is needed, I think that's the cleanest approach from all the options I discussed with Greg in this thread:
https://lore.kernel.org/dri-devel/CAKMK7uGrdDrbtj0OyzqQc0CGrQwc2F3tFJU9vLfm2...
Cheers, Daniel
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Daniel Vetter (2):
  PCI: also set up legacy files only after sysfs init
  PCI: Revoke mappings like devmem
 drivers/pci/pci-sysfs.c | 11 +++++++++++
 drivers/pci/proc.c      |  1 +
 2 files changed, 12 insertions(+)
We are already doing this for all the regular sysfs files on PCI devices, but not yet on the legacy io files on the PCI buses. Thus far now problem, but in the next patch I want to wire up iomem revoke support. That needs the vfs up an running already to make so that iomem_get_mapping() works.
Wire it up exactly like the existing code. Note that pci_remove_legacy_files() doesn't need a check since the one for pci_bus->legacy_io is sufficient.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
---
 drivers/pci/pci-sysfs.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index fb072f4b3176..0c45b4f7b214 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -927,6 +927,9 @@ void pci_create_legacy_files(struct pci_bus *b)
 {
 	int error;
 
+	if (!sysfs_initialized)
+		return;
+
 	b->legacy_io = kcalloc(2, sizeof(struct bin_attribute),
 			       GFP_ATOMIC);
 	if (!b->legacy_io)
@@ -1448,6 +1451,7 @@ void pci_remove_sysfs_dev_files(struct pci_dev *pdev)
 static int __init pci_sysfs_init(void)
 {
 	struct pci_dev *pdev = NULL;
+	struct pci_bus *pbus = NULL;
 	int retval;
 
 	sysfs_initialized = 1;
@@ -1459,6 +1463,9 @@ static int __init pci_sysfs_init(void)
 		}
 	}
 
+	while ((pbus = pci_find_next_bus(pbus)))
+		pci_create_legacy_files(pbus);
+
 	return 0;
 }
 late_initcall(pci_sysfs_init);
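As an illustration of the ordering this patch sets up, here is a minimal userspace sketch in C. It is not kernel code: every model_* name is hypothetical and only mirrors the shape of the diff above, where pci_create_legacy_files() returns early until sysfs_initialized is set and pci_sysfs_init() then walks all buses to catch up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_bus {
	const char *name;
	bool legacy_files_created;
	struct model_bus *next;
};

static bool model_sysfs_initialized;
static struct model_bus *model_bus_list;

/* Mirrors the gate added to pci_create_legacy_files(): a no-op when early. */
static void model_create_legacy_files(struct model_bus *b)
{
	if (!model_sysfs_initialized)
		return;
	b->legacy_files_created = true;
}

/* Bus enumeration; may run before or after the late init step. */
static void model_add_bus(struct model_bus *b)
{
	assert(b != NULL);
	b->legacy_files_created = false;
	b->next = model_bus_list;
	model_bus_list = b;
	model_create_legacy_files(b);
}

/*
 * Mirrors the loop added to pci_sysfs_init(): set the flag, then create
 * the files for every bus that was enumerated too early.
 * Returns the number of buses walked.
 */
static int model_late_init(void)
{
	struct model_bus *b;
	int walked = 0;

	model_sysfs_initialized = true;
	for (b = model_bus_list; b; b = b->next) {
		model_create_legacy_files(b);
		walked++;
	}
	return walked;
}
```

In this model a bus added before model_late_init() gets its files only when the late step runs, while a bus added afterwards gets them immediately, which is the behaviour the regular per-device sysfs files already have.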
[+cc Oliver, Pali, Krzysztof]
s/also/Also/ in subject
On Thu, Feb 04, 2021 at 05:58:30PM +0100, Daniel Vetter wrote:
> We are already doing this for all the regular sysfs files on PCI devices, but not yet on the legacy io files on the PCI buses. Thus far now problem, but in the next patch I want to wire up iomem revoke support. That needs the vfs up an running already to make so that iomem_get_mapping() works.
s/now problem/no problem/
s/an running/and running/
s/so that/sure that/ ?
iomem_get_mapping() doesn't exist; I don't know what that should be.
> Wire it up exactly like the existing code. Note that pci_remove_legacy_files() doesn't need a check since the one for pci_bus->legacy_io is sufficient.
I'm not sure exactly what you mean by "the existing code." I could probably figure it out, but it would save time to mention the existing function here.
This looks like another instance where we should really apply Oliver's idea of converting these to attribute_groups [1].
The cover letter mentions options discussed with Greg in [2], but I don't think the "sysfs_initialized" hack vs attribute_groups was part of that discussion.
It's not absolutely a show-stopper, but it *is* a shame to extend the sysfs_initialized hack if attribute_groups could do this more cleanly and help solve more than one issue.
Bjorn
[1] https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqa...
[2] https://lore.kernel.org/dri-devel/CAKMK7uGrdDrbtj0OyzqQc0CGrQwc2F3tFJU9vLfm2...
linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Thu, Feb 4, 2021 at 10:50 PM Bjorn Helgaas helgaas@kernel.org wrote:
> [+cc Oliver, Pali, Krzysztof]
> s/also/Also/ in subject
> On Thu, Feb 04, 2021 at 05:58:30PM +0100, Daniel Vetter wrote:
> > We are already doing this for all the regular sysfs files on PCI devices, but not yet on the legacy io files on the PCI buses. Thus far now problem, but in the next patch I want to wire up iomem revoke support. That needs the vfs up an running already to make so that iomem_get_mapping() works.
> s/now problem/no problem/
> s/an running/and running/
> s/so that/sure that/ ?
> iomem_get_mapping() doesn't exist; I don't know what that should be.
Series is based on top of linux-next, where iomem_get_mapping exists. This patch fixes the 2nd patch in this series, which I had to take out of my branch because it failed.
> > Wire it up exactly like the existing code. Note that pci_remove_legacy_files() doesn't need a check since the one for pci_bus->legacy_io is sufficient.
> I'm not sure exactly what you mean by "the existing code." I could probably figure it out, but it would save time to mention the existing function here.
Sorry, I meant the existing code in pci_create_sysfs_dev_files().
> This looks like another instance where we should really apply Oliver's idea of converting these to attribute_groups [1].
> The cover letter mentions options discussed with Greg in [2], but I don't think the "sysfs_initialized" hack vs attribute_groups was part of that discussion.
Hm, I'm not sure attribute_groups works here. The problem is that I can't set up the attributes before the vfs layer is initialized: before that point iomem_get_mapping() doesn't return anything useful (well, it crashes), because it needs to have an inode available.
So if you want to set up the attributes earlier, we'd need some kind of callback, which Greg didn't like.
> It's not absolutely a show-stopper, but it *is* a shame to extend the sysfs_initialized hack if attribute_groups could do this more cleanly and help solve more than one issue.
So I think I have yet another init ordering problem here, but not sure.
-Daniel
> Bjorn
> [1] https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqa...
> [2] https://lore.kernel.org/dri-devel/CAKMK7uGrdDrbtj0OyzqQc0CGrQwc2F3tFJU9vLfm2...
Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") /dev/kmem zaps ptes when the kernel requests exclusive acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is the default for all driver uses.
Except there's two more ways to access PCI BARs: sysfs and proc mmap support. Let's plug that hole.
For revoke_devmem() to work we need to link our vma into the same address_space, with consistent vma->vm_pgoff. ->pgoff is already adjusted, because that's how (io_)remap_pfn_range works, but for the mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is to adjust this at at ->open time:
- for sysfs this is easy, now that binary attributes support this. We just set bin_attr->mapping when mmap is supported
- for procfs it's a bit more tricky, since procfs pci access has only one file per device, and access to a specific resources first needs to be set up with some ioctl calls. But mmap is only supported for the same resources as sysfs exposes with mmap support, and otherwise rejected, so we can set the mapping unconditionally at open time without harm.
A special consideration is for arch_can_pci_mmap_io() - we need to make sure that the ->f_mapping doesn't alias between ioport and iomem space. There's only 2 ways in-tree to support mmap of ioports: generic pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single architecture hand-rolling. Both approach support ioport mmap through a special pfn range and not through magic pte attributes. Aliasing is therefore not a problem.
The only difference in access checks left is that sysfs PCI mmap does not check for CAP_RAWIO. I'm not really sure whether that should be added or not.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
---
 drivers/pci/pci-sysfs.c | 4 ++++
 drivers/pci/proc.c      | 1 +
 2 files changed, 5 insertions(+)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 0c45b4f7b214..f8afd54ca3e1 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -942,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_io->read = pci_read_legacy_io;
 	b->legacy_io->write = pci_write_legacy_io;
 	b->legacy_io->mmap = pci_mmap_legacy_io;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_io);
 	error = device_create_bin_file(&b->dev, b->legacy_io);
 	if (error)
@@ -954,6 +955,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_mem->size = 1024*1024;
 	b->legacy_mem->attr.mode = 0600;
 	b->legacy_mem->mmap = pci_mmap_legacy_mem;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_mem);
 	error = device_create_bin_file(&b->dev, b->legacy_mem);
 	if (error)
@@ -1169,6 +1171,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
 			res_attr->mmap = pci_mmap_resource_uc;
 		}
 	}
+	if (res_attr->mmap)
+		res_attr->mapping = iomem_get_mapping();
 	res_attr->attr.name = res_attr_name;
 	res_attr->attr.mode = 0600;
 	res_attr->size = pci_resource_len(pdev, num);
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 3a2f90beb4cb..9bab07302bbf 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file)
 	fpriv->write_combine = 0;
 
 	file->private_data = fpriv;
+	file->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
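To illustrate why the commit message insists on linking every mapping into the same address_space with a consistent offset, here is a rough userspace model in C. It is only a sketch: all model_* names are hypothetical, and the kernel's revoke path is more fine-grained than this (the model marks whole mappings as zapped instead of individual pages). The point it tries to show is that a revoke can only find mappings that were linked into the shared space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VMAS 8

/* Hypothetical stand-ins for vm_area_struct and address_space. */
struct model_vma {
	unsigned long pgoff;	/* first page covered, like vma->vm_pgoff */
	unsigned long npages;	/* length of the mapping in pages */
	bool mapped;		/* false once the mapping was "zapped" */
};

struct model_address_space {
	struct model_vma *vmas[MODEL_MAX_VMAS];
	size_t count;
};

/* Link a mapping into the shared space; returns false when it is full. */
static bool model_link_vma(struct model_address_space *as,
			   struct model_vma *vma)
{
	assert(as != NULL && vma != NULL);
	if (as->count == MODEL_MAX_VMAS)
		return false;
	vma->mapped = true;
	as->vmas[as->count++] = vma;
	return true;
}

/*
 * Zap every linked mapping that overlaps [pgoff, pgoff + npages).
 * Mappings that were never linked into 'as' cannot be found here.
 * Returns the number of mappings zapped.
 */
static size_t model_revoke(struct model_address_space *as,
			   unsigned long pgoff, unsigned long npages)
{
	size_t i, zapped = 0;

	for (i = 0; i < as->count; i++) {
		struct model_vma *vma = as->vmas[i];

		if (!vma->mapped)
			continue;
		if (vma->pgoff < pgoff + npages &&
		    pgoff < vma->pgoff + vma->npages) {
			vma->mapped = false;
			zapped++;
		}
	}
	return zapped;
}
```

In this model, a mapping created through sysfs or procfs without being linked stays mapped after model_revoke(), which corresponds to the hole the patch describes for PCI BAR access outside /dev/mem.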
I see I already acked this, but if you haven't merged it yet there are a few typos in the commit log:
On Thu, Feb 04, 2021 at 05:58:31PM +0100, Daniel Vetter wrote:
> Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") /dev/kmem zaps ptes when the kernel requests exclusive acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is the default for all driver uses.
s/ptes/PTEs/
> Except there's two more ways to access PCI BARs: sysfs and proc mmap support. Let's plug that hole.
s/there's two/there are two/
> For revoke_devmem() to work we need to link our vma into the same address_space, with consistent vma->vm_pgoff. ->pgoff is already adjusted, because that's how (io_)remap_pfn_range works, but for the mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is to adjust this at at ->open time:
> - for sysfs this is easy, now that binary attributes support this. We just set bin_attr->mapping when mmap is supported
> - for procfs it's a bit more tricky, since procfs pci access has only one file per device, and access to a specific resources first needs to be set up with some ioctl calls. But mmap is only supported for the same resources as sysfs exposes with mmap support, and otherwise rejected, so we can set the mapping unconditionally at open time without harm.
s/pci access/PCI access/
s/a specific resources/a specific resource/
> A special consideration is for arch_can_pci_mmap_io() - we need to make sure that the ->f_mapping doesn't alias between ioport and iomem space. There's only 2 ways in-tree to support mmap of ioports: generic pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single architecture hand-rolling. Both approach support ioport mmap through a special pfn range and not through magic pte attributes. Aliasing is therefore not a problem.
s/There's only 2/There are only two/
s/pci mmap/PCI mmap/
s/Both approach/Both approaches/
s/pfn/PFN/
s/pte/PTE/
[+cc Krzysztof, Pali, Oliver]
On Thu, Feb 04, 2021 at 05:58:31PM +0100, Daniel Vetter wrote:
> Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") /dev/kmem zaps ptes when the kernel requests exclusive acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is the default for all driver uses.
> Except there's two more ways to access PCI BARs: sysfs and proc mmap support. Let's plug that hole.
IIUC, the idea is that if a driver calls request_mem_region() on a PCI BAR, we prevent access to the BAR via sysfs. I guess I'm OK with that if it's a real security improvement or something.
But the downside of this implementation is that it depends on iomem_get_mapping(), which doesn't work until after fs_initcalls, which means the sysfs files cannot be static attributes of devices added before that. PCI devices are typically enumerated in subsys_initcall.
Krzysztof is converting PCI sysfs files (config, rom, reset, vpd, etc) to static attributes. This is a major improvement that could get rid of pci_create_sysfs_dev_files(), the late_initcall pci_sysfs_init(), and the "sysfs_initialized" hack. This would fix a race reported by Pali [1] (thanks to Oliver for the idea [2]).
EXCEPT that this revoke change means the "resource%d", "legacy_io", and "legacy_mem" files cannot be static attributes because of iomem_get_mapping().
Any ideas on how to deal with this? Having to keep the pci_sysfs_init() initcall just for these few files seems like the tail wagging the dog.
[1] https://lore.kernel.org/r/20200716110423.xtfyb3n6tn5ixedh@pali
[2] https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqa...
On Sat, Mar 13, 2021 at 10:57 PM Bjorn Helgaas helgaas@kernel.org wrote:
> [+cc Krzysztof, Pali, Oliver]
> On Thu, Feb 04, 2021 at 05:58:31PM +0100, Daniel Vetter wrote:
> > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") /dev/kmem zaps ptes when the kernel requests exclusive acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is the default for all driver uses.
> > Except there's two more ways to access PCI BARs: sysfs and proc mmap support. Let's plug that hole.
> IIUC, the idea is that if a driver calls request_mem_region() on a PCI BAR, we prevent access to the BAR via sysfs. I guess I'm OK with that if it's a real security improvement or something.
Yup.
> But the downside of this implementation is that it depends on iomem_get_mapping(), which doesn't work until after fs_initcalls, which means the sysfs files cannot be static attributes of devices added before that. PCI devices are typically enumerated in subsys_initcall.
> Krzysztof is converting PCI sysfs files (config, rom, reset, vpd, etc) to static attributes. This is a major improvement that could get rid of pci_create_sysfs_dev_files(), the late_initcall pci_sysfs_init(), and the "sysfs_initialized" hack. This would fix a race reported by Pali [1] (thanks to Oliver for the idea [2]).
> EXCEPT that this revoke change means the "resource%d", "legacy_io", and "legacy_mem" files cannot be static attributes because of iomem_get_mapping().
> Any ideas on how to deal with this? Having to keep the pci_sysfs_init() initcall just for these few files seems like the tail wagging the dog.
It's a bit "pick your ugly". Either we have the late init call (not pretty), or the sysfs side needs a callback to fish out the address_space for the mmap at open() time, which didn't stir up much enthusiasm with Greg because we need a new callback just for these mmio files. Either approach works.
-Daniel
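For readers trying to picture the second option, here is a minimal userspace sketch in C of an attribute that resolves its address_space through a callback at open() time instead of storing a pointer when the attribute is defined. This is an assumption-laden illustration, not the sysfs API: the model_* types, the get_mapping member, and model_open() are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space. */
struct model_address_space {
	int id;
};

/* Hypothetical attribute: either a fixed pointer or a lazy lookup. */
struct model_bin_attr {
	const char *name;
	/* Must already be valid when the attribute is set up. */
	struct model_address_space *mapping;
	/* Called at open time, so it may be set in a static definition. */
	struct model_address_space *(*get_mapping)(void);
};

struct model_file {
	struct model_address_space *f_mapping;
};

/* Stays NULL until the modelled "fs init" has run. */
static struct model_address_space *model_iomem_space;

static struct model_address_space *model_iomem_get_mapping(void)
{
	return model_iomem_space;
}

static void model_fs_init(struct model_address_space *as)
{
	assert(as != NULL);
	model_iomem_space = as;
}

/* Resolve the mapping when the file is opened; -1 if not available yet. */
static int model_open(const struct model_bin_attr *attr,
		      struct model_file *file)
{
	file->f_mapping = attr->get_mapping ? attr->get_mapping()
					    : attr->mapping;
	return file->f_mapping ? 0 : -1;
}
```

With the callback, the attribute itself can be defined statically before the modelled fs init runs, and only an open() that happens too early fails. The cost is the extra member that exists just for these mmio files, which is the trade-off described above.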
> [1] https://lore.kernel.org/r/20200716110423.xtfyb3n6tn5ixedh@pali
> [2] https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqa...
Hi Daniel,
On Thu, 4 Feb 2021 17:58:29 +0100 Daniel Vetter daniel.vetter@ffwll.ch wrote:
> Hi all,
> This is a revised version of patch 12 from my series to lock down some follow_pfn vs VM_SPECIAL races:
> https://lore.kernel.org/dri-devel/CAKwvOdnSrsnTgPEuQJyaOTSkTP2dR9208Y66HQG_h...
> Stephen reported an issue on HAVE_PCI_LEGACY platforms which this patch set tries to address. Previous patches are all still in linux-next.
> Stephen, would be awesome if you can give this a spin.
OK, I applied the 2 patches on top of next-20210205 and it no longer panics for my simple boot test (PowerPC pseries_le_defconfig under qemu).
dri-devel@lists.freedesktop.org