Skip to content

Commit

Permalink
Merge tag 'f2fs-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel…
Browse files Browse the repository at this point in the history
…/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've introduced native swap file support which can
  exploit DIO, enhanced existing checkpoint=disable feature with
  additional mount option to tune the triggering condition, and allowed
  user to preallocate physical blocks in a pinned file which will be
  useful to avoid f2fs fragmentation in append-only workloads. In
  addition, we've fixed subtle quota corruption issue.

  Enhancements:
   - add swap file support which uses DIO
   - allocate blocks for pinned file
   - allow SSR and mount option to enhance checkpoint=disable
   - enhance IPU IOs
   - add more sanity checks such as memory boundary access

  Bug fixes:
   - quota corruption in very corner case of error-injected SPO case
   - fix root_reserved on remount and some wrong counts
   - add missing fsck flag

  Some patches were also introduced to clean up ambiguous i_flags and
  debugging messages codes"

* tag 'f2fs-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (33 commits)
  f2fs: improve print log in f2fs_sanity_check_ckpt()
  f2fs: avoid out-of-range memory access
  f2fs: fix to avoid long latency during umount
  f2fs: allow all the users to pin a file
  f2fs: support swap file w/ DIO
  f2fs: allocate blocks for pinned file
  f2fs: fix is_idle() check for discard type
  f2fs: add a rw_sem to cover quota flag changes
  f2fs: set SBI_NEED_FSCK for xattr corruption case
  f2fs: use generic EFSBADCRC/EFSCORRUPTED
  f2fs: Use DIV_ROUND_UP() instead of open-coding
  f2fs: print kernel message if filesystem is inconsistent
  f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()
  f2fs: avoid get_valid_blocks() for cleanup
  f2fs: ioctl for removing a range from F2FS
  f2fs: only set project inherit bit for directory
  f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags
  f2fs: replace ktype default_attrs with default_groups
  f2fs: Add option to limit required GC for checkpoint=disable
  f2fs: Fix accounting for unusable blocks
  ...
  • Loading branch information
torvalds committed Jul 13, 2019
2 parents 4ce9d18 + 2d00883 commit a641a88
Show file tree
Hide file tree
Showing 21 changed files with 1,405 additions and 773 deletions.
8 changes: 8 additions & 0 deletions Documentation/ABI/testing/sysfs-fs-f2fs
Original file line number Diff line number Diff line change
Expand Up @@ -243,3 +243,11 @@ Description:
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension

What: /sys/fs/f2fs/<disk>/unusable
Date April 2019
Contact: "Daniel Rosenberg" <[email protected]>
Description:
If checkpoint=disable, it displays the number of blocks that are unusable.
If checkpoint=enable it displays the enumber of blocks that would be unusable
if checkpoint=disable were to be set.
133 changes: 123 additions & 10 deletions Documentation/filesystems/f2fs.txt
Original file line number Diff line number Diff line change
Expand Up @@ -214,11 +214,22 @@ fsync_mode=%s Control the policy of fsync. Currently supports "posix",
non-atomic files likewise "nobarrier" mount option.
test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt
context. The fake fscrypt context is used by xfstests.
checkpoint=%s Set to "disable" to turn off checkpointing. Set to "enable"
checkpoint=%s[:%u[%]] Set to "disable" to turn off checkpointing. Set to "enable"
to reenable checkpointing. Is enabled by default. While
disabled, any unmounting or unexpected shutdowns will cause
the filesystem contents to appear as they did when the
filesystem was mounted with that option.
While mounting with checkpoint=disabled, the filesystem must
run garbage collection to ensure that all available space can
be used. If this takes too much time, the mount may return
EAGAIN. You may optionally add a value to indicate how much
of the disk you would be willing to temporarily give up to
avoid additional garbage collection. This can be given as a
number of blocks, or as a percent. For instance, mounting
with checkpoint=disable:100% would always succeed, but it may
hide up to all remaining free space. The actual space that
would be unusable can be viewed at /sys/fs/f2fs/<disk>/unusable
This space is reclaimed once checkpoint=enable.

================================================================================
DEBUGFS ENTRIES
Expand Down Expand Up @@ -246,11 +257,14 @@ Files in /sys/fs/f2fs/<devname>
..............................................................................
File Content

gc_max_sleep_time This tuning parameter controls the maximum sleep
gc_urgent_sleep_time This parameter controls sleep time for gc_urgent.
500 ms is set by default. See above gc_urgent.

gc_min_sleep_time This tuning parameter controls the minimum sleep
time for the garbage collection thread. Time is
in milliseconds.

gc_min_sleep_time This tuning parameter controls the minimum sleep
gc_max_sleep_time This tuning parameter controls the maximum sleep
time for the garbage collection thread. Time is
in milliseconds.

Expand All @@ -270,9 +284,6 @@ Files in /sys/fs/f2fs/<devname>
to 1, background thread starts to do GC by given
gc_urgent_sleep_time interval.

gc_urgent_sleep_time This parameter controls sleep time for gc_urgent.
500 ms is set by default. See above gc_urgent.

reclaim_segments This parameter controls the number of prefree
segments to be reclaimed. If the number of prefree
segments is larger than the number of segments
Expand All @@ -287,7 +298,16 @@ Files in /sys/fs/f2fs/<devname>
checkpoint is triggered, and issued during the
checkpoint. By default, it is disabled with 0.

trim_sections This parameter controls the number of sections
discard_granularity This parameter controls the granularity of discard
command size. It will issue discard commands iif
the size is larger than given granularity. Its
unit size is 4KB, and 4 (=16KB) is set by default.
The maximum value is 128 (=512KB).

reserved_blocks This parameter indicates the number of blocks that
f2fs reserves internally for root.

batched_trim_sections This parameter controls the number of sections
to be trimmed out in batch mode when FITRIM
conducts. 32 sections is set by default.

Expand All @@ -309,21 +329,89 @@ Files in /sys/fs/f2fs/<devname>
the number is less than this value, it triggers
in-place-updates.

min_seq_blocks This parameter controls the threshold to serialize
write IOs issued by multiple threads in parallel.

min_hot_blocks This parameter controls the threshold to allocate
a hot data log for pending data blocks to write.

min_ssr_sections This parameter adds the threshold when deciding
SSR block allocation. If this is large, SSR mode
will be enabled early.

ram_thresh This parameter controls the memory footprint used
by free nids and cached nat entries. By default,
10 is set, which indicates 10 MB / 1 GB RAM.

ra_nid_pages When building free nids, F2FS reads NAT blocks
ahead for speed up. Default is 0.

dirty_nats_ratio Given dirty ratio of cached nat entries, F2FS
determines flushing them in background.

max_victim_search This parameter controls the number of trials to
find a victim segment when conducting SSR and
cleaning operations. The default value is 4096
which covers 8GB block address range.

migration_granularity For large-sized sections, F2FS can stop GC given
this granularity instead of reclaiming entire
section.

dir_level This parameter controls the directory level to
support large directory. If a directory has a
number of files, it can reduce the file lookup
latency by increasing this dir_level value.
Otherwise, it needs to decrease this value to
reduce the space overhead. The default value is 0.

ram_thresh This parameter controls the memory footprint used
by free nids and cached nat entries. By default,
10 is set, which indicates 10 MB / 1 GB RAM.
cp_interval F2FS tries to do checkpoint periodically, 60 secs
by default.

idle_interval F2FS detects system is idle, if there's no F2FS
operations during given interval, 5 secs by
default.

discard_idle_interval F2FS detects the discard thread is idle, given
time interval. Default is 5 secs.

gc_idle_interval F2FS detects the GC thread is idle, given time
interval. Default is 5 secs.

umount_discard_timeout When unmounting the disk, F2FS waits for finishing
queued discard commands which can take huge time.
This gives time out for it, 5 secs by default.

iostat_enable This controls to enable/disable iostat in F2FS.

readdir_ra This enables/disabled readahead of inode blocks
in readdir, and default is enabled.

gc_pin_file_thresh This indicates how many GC can be failed for the
pinned file. If it exceeds this, F2FS doesn't
guarantee its pinning state. 2048 trials is set
by default.

extension_list This enables to change extension_list for hot/cold
files in runtime.

inject_rate This controls injection rate of arbitrary faults.

inject_type This controls injection type of arbitrary faults.

dirty_segments This shows # of dirty segments.

lifetime_write_kbytes This shows # of data written to the disk.

features This shows current features enabled on F2FS.

current_reserved_blocks This shows # of blocks currently reserved.

unusable If checkpoint=disable, this shows the number of
blocks that are unusable.
If checkpoint=enable it shows the number of blocks
that would be unusable if checkpoint=disable were
to be set.

================================================================================
USAGE
Expand Down Expand Up @@ -716,3 +804,28 @@ WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE " WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG " WRITE_LIFE_LONG

Fallocate(2) Policy
-------------------

The default policy follows the below posix rule.

Allocating disk space
The default operation (i.e., mode is zero) of fallocate() allocates
the disk space within the range specified by offset and len. The
file size (as reported by stat(2)) will be changed if offset+len is
greater than the file size. Any subregion within the range specified
by offset and len that did not contain data before the call will be
initialized to zero. This default behavior closely resembles the
behavior of the posix_fallocate(3) library function, and is intended
as a method of optimally implementing that function.

However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) in prior to
fallocate(fd, DEFAULT_MODE), it allocates on-disk blocks addressess having
zero or random data, which is useful to the below scenario where:
1. create(fd)
2. ioctl(fd, F2FS_IOC_SET_PIN_FILE)
3. fallocate(fd, 0, 0, size)
4. address = fibmap(fd, offset)
5. open(blkdev)
6. write(blkdev, address)
Loading

0 comments on commit a641a88

Please sign in to comment.