Hello,
Convert i915 to a new mount API and fix i915_gemfs_init() kernel Oops.
It also appears that we need to EXPORTs put_filesystem(), so i915 can properly put filesystem after it is done with kern_mount().
v2: - export put_filesystem() [Chris] - always put_filesystem() in i915_gemfs_init() [Chris] - improve i915_gemfs_init() error message [Chris]
Sergey Senozhatsky (3): fs: export put_filesystem() i915: convert to new mount API i915: do not leak module ref counter
drivers/gpu/drm/i915/gem/i915_gemfs.c | 33 +++++++++++++++++++-------- fs/filesystems.c | 1 + 2 files changed, 25 insertions(+), 9 deletions(-)
Modules can use get_fs_type(), which is exported, but are unable to put_filesystem(). Export it and let modules to also decrement corresponding file systems' reference counters.
Signed-off-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com --- fs/filesystems.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/filesystems.c b/fs/filesystems.c index 9135646e41ac..02669839b584 100644 --- a/fs/filesystems.c +++ b/fs/filesystems.c @@ -45,6 +45,7 @@ void put_filesystem(struct file_system_type *fs) { module_put(fs->owner); } +EXPORT_SYMBOL(put_filesystem);
static struct file_system_type **find_filesystem(const char *name, unsigned len) {
tmpfs does not set ->remount_fs() anymore and its users need to be converted to new mount API.
BUG: kernel NULL pointer dereference, address: 0000000000000000 PF: supervisor instruction fetch in kernel mode PF: error_code(0x0010) - not-present page RIP: 0010:0x0 Code: Bad RIP value. Call Trace: i915_gemfs_init+0x6e/0xa0 [i915] i915_gem_init_early+0x76/0x90 [i915] i915_driver_probe+0x30a/0x1640 [i915] ? kernfs_activate+0x5a/0x80 ? kernfs_add_one+0xdd/0x130 pci_device_probe+0x9e/0x110 really_probe+0xce/0x230 driver_probe_device+0x4b/0xc0 device_driver_attach+0x4e/0x60 __driver_attach+0x47/0xb0 ? device_driver_attach+0x60/0x60 bus_for_each_dev+0x61/0x90 bus_add_driver+0x167/0x1b0 driver_register+0x67/0xaa
Signed-off-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com Reviewed-by: Chris Wilson chris@chris-wilson.co.uk --- drivers/gpu/drm/i915/gem/i915_gemfs.c | 32 +++++++++++++++++++-------- 1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c index 099f3397aada..feedc9242072 100644 --- a/drivers/gpu/drm/i915/gem/i915_gemfs.c +++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c @@ -7,14 +7,17 @@ #include <linux/fs.h> #include <linux/mount.h> #include <linux/pagemap.h> +#include <linux/fs_context.h>
#include "i915_drv.h" #include "i915_gemfs.h"
int i915_gemfs_init(struct drm_i915_private *i915) { + struct fs_context *fc = NULL; struct file_system_type *type; struct vfsmount *gemfs; + bool ok = true;
type = get_fs_type("tmpfs"); if (!type) @@ -36,18 +39,29 @@ int i915_gemfs_init(struct drm_i915_private *i915) struct super_block *sb = gemfs->mnt_sb; /* FIXME: Disabled until we get W/A for read BW issue. */ char options[] = "huge=never"; - int flags = 0; - int err; - - err = sb->s_op->remount_fs(sb, &flags, options); - if (err) { - kern_unmount(gemfs); - return err; - } + + ok = false; + fc = fs_context_for_reconfigure(sb->s_root, 0, 0); + if (IS_ERR(fc)) + goto out; + + if (!fc->ops->parse_monolithic || + fc->ops->parse_monolithic(fc, options)) + goto out; + + if (fc->ops->reconfigure && !fc->ops->reconfigure(fc)) + ok = true; }
+out: + if (!ok) + dev_err(i915->drm.dev, + "Unable to reconfigure %s. %s\n", + "shmemfs for preferred allocation strategy", + "Continuing, but performance may suffer"); + if (!IS_ERR_OR_NULL(fc)) + put_fs_context(fc); i915->mm.gemfs = gemfs; - return 0; }
On Tue, Aug 06, 2019 at 01:03:06AM +0900, Sergey Senozhatsky wrote:
tmpfs does not set ->remount_fs() anymore and its users need to be converted to new mount API.
Could you explain why the devil do you bother with remount at all? Why not pass the right options when mounting the damn thing?
On Mon, Aug 05, 2019 at 07:12:55PM +0100, Al Viro wrote:
On Tue, Aug 06, 2019 at 01:03:06AM +0900, Sergey Senozhatsky wrote:
tmpfs does not set ->remount_fs() anymore and its users need to be converted to new mount API.
Could you explain why the devil do you bother with remount at all? Why not pass the right options when mounting the damn thing?
... and while we are at it, I really wonder what's going on with that gemfs thing - among the other things, this is the only user of shmem_file_setup_with_mnt(). Sure, you want your own options, but that brings another question - is there any reason for having the huge=... per-superblock rather than per-file?
After all, the readers of ->huge in mm/shmem.c are mm/shmem.c:582: (shmem_huge == SHMEM_HUGE_FORCE || sbinfo->huge) && is_huge_enabled(), sbinfo is an explicit argument
mm/shmem.c:1799: switch (sbinfo->huge) { shmem_getpage_gfp(), sbinfo comes from inode
mm/shmem.c:2113: if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER) shmem_get_unmapped_area(), sb comes from file
mm/shmem.c:3531: if (sbinfo->huge) mm/shmem.c:3532: seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge)); ->show_options() mm/shmem.c:3880: switch (sbinfo->huge) { shmem_huge_enabled(), sbinfo comes from an inode
And the only caller of is_huge_enabled() is shmem_getattr(), with sbinfo picked from inode.
So is there any reason why the hugepage policy can't be per-file, with the current being overridable default?
On Mon, 5 Aug 2019, Al Viro wrote:
On Mon, Aug 05, 2019 at 07:12:55PM +0100, Al Viro wrote:
On Tue, Aug 06, 2019 at 01:03:06AM +0900, Sergey Senozhatsky wrote:
tmpfs does not set ->remount_fs() anymore and its users need to be converted to new mount API.
Could you explain why the devil do you bother with remount at all? Why not pass the right options when mounting the damn thing?
... and while we are at it, I really wonder what's going on with that gemfs thing - among the other things, this is the only user of shmem_file_setup_with_mnt(). Sure, you want your own options, but that brings another question - is there any reason for having the huge=... per-superblock rather than per-file?
Yes: we want a default for how files of that superblock are to allocate their pages, without people having to fcntl or advise each of their files.
Setting aside the weirder options (within_size, advise) and emergency/ testing override (shmem_huge), we want files on an ordinary default tmpfs (huge=never) to be allocated with small pages (so users with access to that filesystem will not consume, and will not waste time and space on consuming, the more valuable huge pages); but files on a huge=always tmpfs to be allocated with huge pages whenever possible.
Or am I missing your point? Yes, hugeness can certainly be decided differently per-file, or even per-extent of file. That is already made possible through "judicious" use of madvise MADV_HUGEPAGE and MADV_NOHUGEPAGE on mmaps of the file, carried over from anon THP.
Though personally I'm averse to managing "f"objects through "m"interfaces, which can get ridiculous (notably, MADV_HUGEPAGE works on the virtual address of a mapping, but the huge-or-not alignment of that mapping must have been decided previously). In Google we do use fcntls F_HUGEPAGE and F_NOHUGEPAGE to override on a per-file basis - one day I'll get to upstreaming those.
Hugh
After all, the readers of ->huge in mm/shmem.c are mm/shmem.c:582: (shmem_huge == SHMEM_HUGE_FORCE || sbinfo->huge) && is_huge_enabled(), sbinfo is an explicit argument
mm/shmem.c:1799: switch (sbinfo->huge) { shmem_getpage_gfp(), sbinfo comes from inode
mm/shmem.c:2113: if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER) shmem_get_unmapped_area(), sb comes from file
mm/shmem.c:3531: if (sbinfo->huge) mm/shmem.c:3532: seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge)); ->show_options() mm/shmem.c:3880: switch (sbinfo->huge) { shmem_huge_enabled(), sbinfo comes from an inode
And the only caller of is_huge_enabled() is shmem_getattr(), with sbinfo picked from inode.
So is there any reason why the hugepage policy can't be per-file, with the current being overridable default?
On Tue, Aug 06, 2019 at 12:50:10AM -0700, Hugh Dickins wrote:
that mapping must have been decided previously). In Google we do use fcntls F_HUGEPAGE and F_NOHUGEPAGE to override on a per-file basis - one day I'll get to upstreaming those.
That'd be nice - we could kill the i915 wierd private shmem instance, along with some kludges in mm/shmem.c.
On Mon, Aug 05, 2019 at 07:12:55PM +0100, Al Viro wrote:
On Tue, Aug 06, 2019 at 01:03:06AM +0900, Sergey Senozhatsky wrote:
tmpfs does not set ->remount_fs() anymore and its users need to be converted to new mount API.
Could you explain why the devil do you bother with remount at all? Why not pass the right options when mounting the damn thing?
Incidentally, the only remaining modular user of get_fs_type() is the same i915_gemfs.c. And I wonder if we should aim for unexporting the damn thing instead of exporting put_filesystem()...
Note that users in tomoyo and apparmor are bogus - they are in the instances of ill-defined method that needs to be split and moved, with the lookups (fs type included) replaced with callers passing the values they look up and will end up using.
IOW, outside of core VFS we have very few legitimate users, and the one in kernel/trace might be better off as vfs_submount_by_name().
On (08/05/19 19:12), Al Viro wrote: [..]
On Tue, Aug 06, 2019 at 01:03:06AM +0900, Sergey Senozhatsky wrote:
tmpfs does not set ->remount_fs() anymore and its users need to be converted to new mount API.
Could you explain why the devil do you bother with remount at all?
I would redirect this question to i915 developers. As far as I know i915 performance suffers with huge pages enabled.
Why not pass the right options when mounting the damn thing?
vfs_kern_mount()? It still requires struct file_system_type, which we need to get and put.
-ss
put_filesystem() must follow get_fs_type().
Signed-off-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com --- drivers/gpu/drm/i915/gem/i915_gemfs.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c index feedc9242072..93ac365ce9ce 100644 --- a/drivers/gpu/drm/i915/gem/i915_gemfs.c +++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c @@ -24,6 +24,7 @@ int i915_gemfs_init(struct drm_i915_private *i915) return -ENODEV;
gemfs = kern_mount(type); + put_filesystem(type); if (IS_ERR(gemfs)) return PTR_ERR(gemfs);
On Mon, Aug 5, 2019 at 6:05 PM Sergey Senozhatsky sergey.senozhatsky@gmail.com wrote:
Hello,
Convert i915 to a new mount API and fix i915_gemfs_init() kernel Oops.
It also appears that we need to EXPORTs put_filesystem(), so i915 can properly put filesystem after it is done with kern_mount().
v2:
- export put_filesystem() [Chris]
- always put_filesystem() in i915_gemfs_init() [Chris]
- improve i915_gemfs_init() error message [Chris]
Sergey Senozhatsky (3): fs: export put_filesystem() i915: convert to new mount API i915: do not leak module ref counter
Fee free to add:
Reported-by: Sedat Dilek sedat.dilek@gmail.com
[1] https://lore.kernel.org/lkml/CA+icZUXh068m8UFeHDXCKDi0YfL2Z=WoONy7J7DJLqAT1C...
drivers/gpu/drm/i915/gem/i915_gemfs.c | 33 +++++++++++++++++++-------- fs/filesystems.c | 1 + 2 files changed, 25 insertions(+), 9 deletions(-)
-- 2.22.0
dri-devel@lists.freedesktop.org