nfs: Block on write congestion
Commit 6df25e5 ("nfs: remove reliance on bdi congestion")
introduced an NFS-private solution for limiting the number of writes
outstanding against a particular server. Unlike the previous bdi congestion
mechanism, this algorithm actually works and limits the number of
outstanding writeback pages to nfs_congestion_kb, which scales with the
amount of the client's memory and is capped at 256 MB. As a result, some
workloads such as random buffered writes over NFS got slower (from
~170 MB/s to ~126 MB/s). The fio command to reproduce is:

fio --direct=0 --ioengine=sync --thread --invalidate=1 --group_reporting=1
  --runtime=300 --fallocate=posix --ramp_time=10 --new_group --rw=randwrite
  --size=64256m --numjobs=4 --bs=4k --fsync_on_close=1 --end_fsync=1

This happens because the client sends ~256 MB worth of dirty pages to
the server, and any further background writeback request is ignored until
the number of writeback pages drops below the threshold of 192 MB. By the
time this happens and the client decides to trigger another round of
writeback, the server often has no pages left to write and the disk is idle.
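
The 256 MB / 192 MB hysteresis described above can be sketched in plain C.
This is only an illustration under stated assumptions: the helper names
(kb_to_pages, off_threshold, next_congested) are hypothetical, the page size
is taken to be 4 KB, and the lower threshold is assumed to be three quarters
of the upper one, which matches the figures quoted in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Convert a limit in KB to a page count; page_shift is log2(page size). */
static unsigned long kb_to_pages(unsigned long kb, unsigned int page_shift)
{
	assert(page_shift >= 10);
	return kb >> (page_shift - 10);
}

/* Assumed lower threshold: three quarters of the upper one (256 MB -> 192 MB). */
static unsigned long off_threshold(unsigned long on_thresh)
{
	return on_thresh - (on_thresh >> 2);
}

/*
 * Hysteresis: the flag is set once the writeback count exceeds the upper
 * threshold and is cleared only after it drops below the lower one.
 * In between, the previous state is kept.
 */
static bool next_congested(bool congested, unsigned long writeback,
			   unsigned long on_thresh)
{
	if (writeback > on_thresh)
		return true;
	if (writeback < off_threshold(on_thresh))
		return false;
	return congested;
}
```

With a 256 MB cap and 4 KB pages, the flag is set above 65536 pages and is
not cleared until fewer than 49152 pages (192 MB) remain under writeback,
which is the window during which background writeback requests were being
dropped.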

To fix this problem, make the client react faster to eased congestion
on the server by blocking until the congestion resolves instead of
aborting writeback. This improves the random 4k buffered write
throughput to 184 MB/s.

Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
jankara authored and amschuma-ntap committed Jul 8, 2024
1 parent f8a3955 commit 2f1f310
Showing 3 changed files with 13 additions and 4 deletions.
1 change: 1 addition & 0 deletions fs/nfs/client.c
@@ -994,6 +994,7 @@ struct nfs_server *nfs_alloc_server(void)

 	server->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;

+	init_waitqueue_head(&server->write_congestion_wait);
 	atomic_long_set(&server->writeback, 0);

 	ida_init(&server->openowner_id);
15 changes: 11 additions & 4 deletions fs/nfs/write.c
@@ -425,8 +425,10 @@ static void nfs_folio_end_writeback(struct folio *folio)

 	folio_end_writeback(folio);
 	if (atomic_long_dec_return(&nfss->writeback) <
-	    NFS_CONGESTION_OFF_THRESH)
+	    NFS_CONGESTION_OFF_THRESH) {
 		nfss->write_congested = 0;
+		wake_up_all(&nfss->write_congestion_wait);
+	}
 }

 static void nfs_page_end_writeback(struct nfs_page *req)
@@ -700,12 +702,17 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
 	struct nfs_pageio_descriptor pgio;
 	struct nfs_io_completion *ioc = NULL;
 	unsigned int mntflags = NFS_SERVER(inode)->flags;
+	struct nfs_server *nfss = NFS_SERVER(inode);
 	int priority = 0;
 	int err;

-	if (wbc->sync_mode == WB_SYNC_NONE &&
-	    NFS_SERVER(inode)->write_congested)
-		return 0;
+	/* Wait with writeback until write congestion eases */
+	if (wbc->sync_mode == WB_SYNC_NONE && nfss->write_congested) {
+		err = wait_event_killable(nfss->write_congestion_wait,
+					  nfss->write_congested == 0);
+		if (err)
+			return err;
+	}

 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);

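
The wait/wake pairing in the fs/nfs/write.c change can be illustrated with a
user-space analogue. This is a rough sketch, not the kernel code: struct
congestion_state and all function names are hypothetical, a pthread
condition variable stands in for the wait queue, a mutex replaces the atomic
counter, and the "killable" aspect of wait_event_killable() (returning an
error when a fatal signal arrives) is left out.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the congestion fields of struct nfs_server. */
struct congestion_state {
	pthread_mutex_t lock;
	pthread_cond_t eased;	/* plays the role of write_congestion_wait */
	long writeback;		/* pages currently under writeback */
	long on_thresh;		/* flag is set above this count */
	long off_thresh;	/* flag is cleared below this count */
	int congested;
};

static void congestion_init(struct congestion_state *s, long on, long off)
{
	assert(off < on);
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->eased, NULL);
	s->writeback = 0;
	s->on_thresh = on;
	s->off_thresh = off;
	s->congested = 0;
}

/* A page enters writeback; set the flag once the count exceeds on_thresh. */
static void page_start_writeback(struct congestion_state *s)
{
	pthread_mutex_lock(&s->lock);
	if (++s->writeback > s->on_thresh)
		s->congested = 1;
	pthread_mutex_unlock(&s->lock);
}

/* A page finishes writeback; below off_thresh, clear the flag and wake all waiters. */
static void page_end_writeback(struct congestion_state *s)
{
	pthread_mutex_lock(&s->lock);
	if (--s->writeback < s->off_thresh) {
		s->congested = 0;
		pthread_cond_broadcast(&s->eased);
	}
	pthread_mutex_unlock(&s->lock);
}

/* Background writeback: block until congestion eases instead of returning early. */
static void wait_for_congestion_to_ease(struct congestion_state *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->congested)
		pthread_cond_wait(&s->eased, &s->lock);
	pthread_mutex_unlock(&s->lock);
}
```

A writeback thread would call wait_for_congestion_to_ease() before
submitting more pages, while completion handlers call page_end_writeback().
Because the waiter is woken as soon as the count falls below the lower
threshold, new writes can be queued while the server still has work in
flight, which is the behavior the commit message credits for the throughput
gain.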
1 change: 1 addition & 0 deletions include/linux/nfs_fs_sb.h
@@ -140,6 +140,7 @@ struct nfs_server {
 	struct rpc_clnt * client_acl; /* ACL RPC client handle */
 	struct nlm_host *nlm_host; /* NLM client handle */
 	struct nfs_iostats __percpu *io_stats; /* I/O statistics */
+	wait_queue_head_t write_congestion_wait; /* wait until write congestion eases */
 	atomic_long_t writeback; /* number of writeback pages */
 	unsigned int write_congested;/* flag set when writeback gets too high */
 	unsigned int flags; /* various flags */
