On Mon, 5 Aug 2019, Al Viro wrote:
On Mon, Aug 05, 2019 at 07:12:55PM +0100, Al Viro wrote:
On Tue, Aug 06, 2019 at 01:03:06AM +0900, Sergey Senozhatsky wrote:
tmpfs does not set ->remount_fs() anymore and its users need to be converted to new mount API.
Could you explain why the devil do you bother with remount at all? Why not pass the right options when mounting the damn thing?
... and while we are at it, I really wonder what's going on with that gemfs thing - among the other things, this is the only user of shmem_file_setup_with_mnt(). Sure, you want your own options, but that brings another question - is there any reason for having the huge=... per-superblock rather than per-file?
Yes: we want a default for how files of that superblock are to allocate their pages, without people having to fcntl or advise each of their files.
Setting aside the weirder options (within_size, advise) and emergency/testing override (shmem_huge), we want files on an ordinary default tmpfs (huge=never) to be allocated with small pages (so users with access to that filesystem will not consume, and will not waste time and space on consuming, the more valuable huge pages); but files on a huge=always tmpfs to be allocated with huge pages whenever possible.
Or am I missing your point? Yes, hugeness can certainly be decided differently per-file, or even per-extent of file. That is already made possible through "judicious" use of madvise MADV_HUGEPAGE and MADV_NOHUGEPAGE on mmaps of the file, carried over from anon THP.
Though personally I'm averse to managing "f"objects through "m"interfaces, which can get ridiculous (notably, MADV_HUGEPAGE works on the virtual address of a mapping, but the huge-or-not alignment of that mapping must have been decided previously). In Google we do use fcntls F_HUGEPAGE and F_NOHUGEPAGE to override on a per-file basis - one day I'll get to upstreaming those.
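For readers unfamiliar with the "m"interface route mentioned above, here is a minimal userspace sketch of it: map a file and ask for huge pages on that mapping with madvise(MADV_HUGEPAGE). The function name, its return convention and the idea of passing a path are assumptions made for illustration only. Whether the hint has any effect depends on the kernel configuration and on how the tmpfs holding the file was mounted (for example huge=advise), so the sketch treats a refused hint as non-fatal.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the thread.
 *
 * Create (or reuse) the file at `path`, size it to `len` bytes, map it
 * shared and request huge pages for the mapping with MADV_HUGEPAGE.
 *
 * Returns  0 if the mapping worked and the kernel accepted the hint,
 *          1 if the mapping worked but the hint was refused or is not
 *            available (for example a kernel built without THP),
 *         -1 if open(), ftruncate() or mmap() failed.
 */
int map_file_with_huge_hint(const char *path, size_t len)
{
	int fd = open(path, O_RDWR | O_CREAT, 0600);
	int ret = 0;
	char *addr;

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping keeps the file referenced */
	if (addr == MAP_FAILED)
		return -1;

#ifdef MADV_HUGEPAGE
	/* The hint applies to this virtual address range, not to the file. */
	if (madvise(addr, len, MADV_HUGEPAGE) != 0)
		ret = 1;
#else
	ret = 1;
#endif

	addr[0] = 1;		/* touch the first page so it gets allocated */
	munmap(addr, len);
	return ret;
}
```

Note that the hint is given after mmap() has already chosen the address, which is exactly the ordering problem described above: the alignment of the mapping was decided before the request for huge pages was made.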
Hugh
After all, the readers of ->huge in mm/shmem.c are

mm/shmem.c:582:     (shmem_huge == SHMEM_HUGE_FORCE || sbinfo->huge) &&
        is_huge_enabled(), sbinfo is an explicit argument

mm/shmem.c:1799:        switch (sbinfo->huge) {
        shmem_getpage_gfp(), sbinfo comes from inode

mm/shmem.c:2113:                if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER)
        shmem_get_unmapped_area(), sb comes from file

mm/shmem.c:3531:        if (sbinfo->huge)
mm/shmem.c:3532:                seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge));
        ->show_options()

mm/shmem.c:3880:        switch (sbinfo->huge) {
        shmem_huge_enabled(), sbinfo comes from an inode

And the only caller of is_huge_enabled() is shmem_getattr(), with sbinfo picked from inode.
So is there any reason why the hugepage policy can't be per-file, with the current being overridable default?
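To make the question concrete, the "per-file policy with the superblock value as an overridable default" idea could be modelled roughly as below. This is a standalone sketch and not kernel code: the enum, both structs and effective_huge() are invented names for illustration, and nothing here claims to match how mm/shmem.c stores or looks up the policy.

```c
/* Hypothetical model; all names here are invented for illustration. */
enum huge_policy {
	HUGE_INHERIT = -1,	/* no per-file override, use the superblock default */
	HUGE_NEVER,
	HUGE_ALWAYS,
	HUGE_WITHIN_SIZE,
	HUGE_ADVISE,
};

/* Stand-in for the per-superblock setting (the huge= mount option). */
struct sb_model {
	enum huge_policy huge;
};

/* Stand-in for a file that may carry its own override. */
struct file_model {
	const struct sb_model *sb;
	enum huge_policy override;
};

/* A per-file override wins; otherwise fall back to the superblock default. */
enum huge_policy effective_huge(const struct file_model *f)
{
	if (f->override != HUGE_INHERIT)
		return f->override;
	return f->sb->huge;
}
```

With this shape, a file on a huge=never mount that sets no override still resolves to HUGE_NEVER, so the per-superblock default keeps its role and only files that ask for something else differ.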