On Wed, Feb 3, 2021 at 5:14 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Tue, Jan 19, 2021 at 5:03 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Tue, Jan 19, 2021 at 4:20 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jan 19, 2021 at 03:34:47PM +0100, Daniel Vetter wrote:
On Tue, Jan 19, 2021 at 3:32 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jan 19, 2021 at 09:17:55AM +0100, Daniel Vetter wrote:
On Fri, Nov 27, 2020 at 5:42 PM Daniel Vetter daniel.vetter@ffwll.ch wrote: > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims > the region") /dev/kmem zaps ptes when the kernel requests exclusive > acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is > the default for all driver uses. > > Except there's two more ways to access PCI BARs: sysfs and proc mmap > support. Let's plug that hole. > > For revoke_devmem() to work we need to link our vma into the same > address_space, with consistent vma->vm_pgoff. ->pgoff is already > adjusted, because that's how (io_)remap_pfn_range works, but for the > mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is > to adjust this at at ->open time: > > - for sysfs this is easy, now that binary attributes support this. We > just set bin_attr->mapping when mmap is supported > - for procfs it's a bit more tricky, since procfs pci access has only > one file per device, and access to a specific resources first needs > to be set up with some ioctl calls. But mmap is only supported for > the same resources as sysfs exposes with mmap support, and otherwise > rejected, so we can set the mapping unconditionally at open time > without harm. > > A special consideration is for arch_can_pci_mmap_io() - we need to > make sure that the ->f_mapping doesn't alias between ioport and iomem > space. There's only 2 ways in-tree to support mmap of ioports: generic > pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single > architecture hand-rolling. Both approach support ioport mmap through a > special pfn range and not through magic pte attributes. Aliasing is > therefore not a problem. > > The only difference in access checks left is that sysfs PCI mmap does > not check for CAP_RAWIO. I'm not really sure whether that should be > added or not. > > Acked-by: Bjorn Helgaas bhelgaas@google.com > Reviewed-by: Dan Williams dan.j.williams@intel.com > Signed-off-by: Daniel Vetter daniel.vetter@intel.com > Cc: Jason Gunthorpe jgg@ziepe.ca > Cc: Kees Cook keescook@chromium.org > Cc: Dan Williams dan.j.williams@intel.com > Cc: Andrew Morton akpm@linux-foundation.org > Cc: John Hubbard jhubbard@nvidia.com > Cc: Jérôme Glisse jglisse@redhat.com > Cc: Jan Kara jack@suse.cz > Cc: Dan Williams dan.j.williams@intel.com > Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org > Cc: linux-mm@kvack.org > Cc: linux-arm-kernel@lists.infradead.org > Cc: linux-samsung-soc@vger.kernel.org > Cc: linux-media@vger.kernel.org > Cc: Bjorn Helgaas bhelgaas@google.com > Cc: linux-pci@vger.kernel.org > Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch > -- > v2: > - Totally new approach: Adjust filp->f_mapping at open time. Note that > this now works on all architectures, not just those support > ARCH_GENERIC_PCI_MMAP_RESOURCE > --- > drivers/pci/pci-sysfs.c | 4 ++++ > drivers/pci/proc.c | 1 + > 2 files changed, 5 insertions(+) > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > index d15c881e2e7e..3f1c31bc0b7c 100644 > --- a/drivers/pci/pci-sysfs.c > +++ b/drivers/pci/pci-sysfs.c > @@ -929,6 +929,7 @@ void pci_create_legacy_files(struct pci_bus *b) > b->legacy_io->read = pci_read_legacy_io; > b->legacy_io->write = pci_write_legacy_io; > b->legacy_io->mmap = pci_mmap_legacy_io; > + b->legacy_io->mapping = iomem_get_mapping(); > pci_adjust_legacy_attr(b, pci_mmap_io); > error = device_create_bin_file(&b->dev, b->legacy_io); > if (error) > @@ -941,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b) > b->legacy_mem->size = 1024*1024; > b->legacy_mem->attr.mode = 0600; > b->legacy_mem->mmap = pci_mmap_legacy_mem; > + b->legacy_io->mapping = iomem_get_mapping();
Unlike the normal pci stuff below, the legacy files here go boom because they're set up much earlier in the boot sequence. This only affects HAVE_PCI_LEGACY architectures, which aren't that many. So what should we do here now:
- drop the devmem revoke for these
- rework the init sequence somehow to set up these files a lot later
- redo the sysfs patch so that it doesn't take an address_space
pointer, but instead a callback to get at that (since at open time everything is set up). Imo rather ugly
- ditch this part of the series (since there's not really any takers
for the latter parts it might just not make sense to push for this)
- something else?
Bjorn, Greg, thoughts?
What sysfs patch are you referring to here?
Currently in linux-next:
commit 74b30195395c406c787280a77ae55aed82dbbfc7 (HEAD -> topic/iomem-mmap-vs-gup, drm/topic/iomem-mmap-vs-gup) Author: Daniel Vetter daniel.vetter@ffwll.ch Date: Fri Nov 27 17:41:25 2020 +0100
sysfs: Support zapping of binary attr mmaps
Or the patch right before this one in this submission here:
https://lore.kernel.org/dri-devel/20201127164131.2244124-12-daniel.vetter@ff...
Ah. Hm, a callback in the sysfs file logic seems really hairy, so I would prefer that not happen. If no one really needs this stuff, why not just drop it like you mention?
Well it is needed, but just on architectures I don't care about much. Most relevant is perhaps powerpc (that's where Stephen hit the issue). I do wonder whether we could move the legacy pci files setup to where the modern stuff is set up from pci_create_resource_files() or maybe pci_create_sysfs_dev_files() even for HAVE_PCI_LEGACY. I think that might work, but since it's legacy flow on some funny architectures (alpha, itanium, that kind of stuff) I have no idea what kind of monsters I'm going to anger :-)
Back from a week of vacation, I looked at this again and I think shouldn't be hard to fix this with the sam trick pci_create_sysfs_dev_files() uses: As long as sysfs_initialized isn't set we skip, and then later on when the vfs is up&running we can initialize everything.
To be able to apply the same thing to pci_create_legacy_files() I think all I need is to iterate overa all struct pci_bus in pci_sysfs_init() and we're good. Unfortunately I didn't find any for_each_pci_bus(), so how do I do that?
pci_find_next_bus() seems to be the answer I want. I'll see whether that works and then send out new patches. -Daniel