Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull more block layer updates from Jens Axboe:
 "A followup pull request, with some parts that either needed a bit more
  testing before going in, merge sync, or just later arriving fixes.
  This contains:

   - Timer related updates from Kees. These were purposefully delayed
     since I didn't want to pull in a later v4.14-rc tag to my block
     tree.

   - ide-cd prep sense buffer fix from Bart. Also delayed, so as not to
     clash with the late fix we put into 4.14-rc.

   - Small BFQ updates series from Luca and Paolo.

   - Single nvmet fix from James, fixing a non-functional case there.

   - Bio fast clone fix from Michael, for a bug that made bcache return
     the wrong data in some cases.

   - Legacy IO path regression hang fix from Ming"
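
The timer conversions from Kees all follow the same mechanical pattern: the
callback stops taking an opaque unsigned long and instead receives the
struct timer_list pointer, from which it recovers its containing structure.
A minimal sketch of the pattern, using a hypothetical struct foo (the real
conversions in this pull touch md, swim3, aoe, amifloppy, and floppy):

	#include <linux/timer.h>

	struct foo {
		struct timer_list timer;
		int pending;
	};

	/* New-style callback: takes the timer_list pointer and recovers
	 * the enclosing struct with from_timer(), a container_of()
	 * wrapper. */
	static void foo_timeout(struct timer_list *t)
	{
		struct foo *f = from_timer(f, t, timer);

		f->pending = 0;
	}

	static void foo_init(struct foo *f)
	{
		/* Replaces setup_timer(&f->timer, foo_timeout,
		 * (unsigned long)f). */
		timer_setup(&f->timer, foo_timeout, 0);
	}

The payoff is type safety: the callback can no longer be handed a stale or
mistyped data pointer, since the timer itself is the handle.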

* 'for-linus' of git://git.kernel.dk/linux-block:
  bio: ensure __bio_clone_fast copies bi_partno
  nvmet_fc: fix better length checking
  block: wake up all tasks blocked in get_request()
  block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP
  block, bfq: update blkio stats outside the scheduler lock
  block, bfq: add missing invocations of bfqg_stats_update_io_add/remove
  doc, block, bfq: update max IOPS sustainable with BFQ
  ide: Make ide_cdrom_prep_fs() initialize the sense buffer pointer
  md: Convert timers to use timer_setup()
  block: swim3: Convert timers to use timer_setup()
  block/aoe: Convert timers to use timer_setup()
  amifloppy: Convert timers to use timer_setup()
  block/floppy: Convert callback to pass timer_list
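
The bi_partno fix at the top of the shortlog is worth a note: v4.14 replaced
bio->bi_bdev with the bi_disk/bi_partno pair, and the fast-clone path was
left copying the disk but not the partition number, so a clone could address
the whole disk instead of the partition; bcache hit this. A hedged
illustration of the bug class (hypothetical helper with v4.14-era field
names, not the verbatim patch):

	#include <linux/bio.h>

	/* Hypothetical helper showing the bug class: when cloning, every
	 * identifying field must be copied, and bi_partno was the one
	 * missing from the fast-clone path. */
	static void clone_identity(struct bio *dst, const struct bio *src)
	{
		dst->bi_disk   = src->bi_disk;
		dst->bi_partno = src->bi_partno; /* the field the fix copies */
		dst->bi_opf    = src->bi_opf;
		dst->bi_iter   = src->bi_iter;
	}
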
torvalds committed Nov 17, 2017
2 parents a3841f9 + 62530ed commit 06ede5f
Showing 19 changed files with 311 additions and 166 deletions.
43 changes: 37 additions & 6 deletions Documentation/block/bfq-iosched.txt
@@ -20,12 +20,27 @@ for that device, by setting low_latency to 0. See Section 3 for
details on how to configure BFQ for the desired tradeoff between
latency and throughput, or on how to maximize throughput.

On average CPUs, the current version of BFQ can handle devices
performing at most ~30K IOPS; at most ~50 KIOPS on faster CPUs. As a
reference, 30-50 KIOPS correspond to very high bandwidths with
sequential I/O (e.g., 8-12 GB/s if I/O requests are 256 KB large), and
to 120-200 MB/s with 4KB random I/O. BFQ is currently being tested on
multi-queue devices too.
BFQ has a non-null overhead, which limits the maximum IOPS that a CPU
can process for a device scheduled with BFQ. To give an idea of the
limits on slow or average CPUs, here are, first, the limits of BFQ for
three different CPUs, on, respectively, an average laptop, an old
desktop, and a cheap embedded system, in case full hierarchical
support is enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set), but
CONFIG_DEBUG_BLK_CGROUP is not set (Section 4-2):
- Intel i7-4850HQ: 400 KIOPS
- AMD A8-3850: 250 KIOPS
- ARM Cortex™-A53 Octa-core: 80 KIOPS

If CONFIG_DEBUG_BLK_CGROUP is set (and of course full hierarchical
support is enabled), then the sustainable throughput with BFQ
decreases, because all blkio.bfq* statistics are created and updated
(Section 4-2). For BFQ, this leads to the following maximum
sustainable throughputs, on the same systems as above:
- Intel i7-4850HQ: 310 KIOPS
- AMD A8-3850: 200 KIOPS
- ARM Cortex™-A53 Octa-core: 56 KIOPS

BFQ works for multi-queue devices too.

The table of contents follows. Impatient readers can jump straight to Section 3.

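Both the old and the new figures above are simple IOPS-times-request-size
products, so they are easy to sanity-check. A standalone userspace sketch
(hypothetical, not part of the patch) reproducing the removed paragraph's
arithmetic:

	#include <stdio.h>

	/* Sanity-check the quoted figures: bandwidth = IOPS * request
	 * size. 30-50 KIOPS at 256 KB gives ~8-13 GB/s; at 4 KB it
	 * gives ~120-200 MB/s, matching the doc text. */
	int main(void)
	{
		const double iops[] = { 30e3, 50e3 };
		const double req_bytes[] = { 256 * 1024.0, 4 * 1024.0 };

		for (int i = 0; i < 2; i++)
			for (int j = 0; j < 2; j++)
				printf("%6.0f IOPS x %3.0f KB = %8.1f MB/s\n",
				       iops[i], req_bytes[j] / 1024,
				       iops[i] * req_bytes[j] / 1e6);
		return 0;
	}
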
@@ -500,6 +515,22 @@ BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
parameter to set the weight of a group with BFQ is blkio.bfq.weight
or io.bfq.weight.

As for cgroups-v1 (blkio controller), the exact set of stat files
created, and kept up-to-date by bfq, depends on whether
CONFIG_DEBUG_BLK_CGROUP is set. If it is set, then bfq creates all
the stat files documented in
Documentation/cgroup-v1/blkio-controller.txt. If, instead,
CONFIG_DEBUG_BLK_CGROUP is not set, then bfq creates only the files
blkio.bfq.io_service_bytes
blkio.bfq.io_service_bytes_recursive
blkio.bfq.io_serviced
blkio.bfq.io_serviced_recursive

The value of CONFIG_DEBUG_BLK_CGROUP greatly influences the maximum
throughput sustainable with bfq, because updating the blkio.bfq.*
stats is rather costly, especially for some of the stats enabled by
CONFIG_DEBUG_BLK_CGROUP.

Parameters to set
-----------------

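Since the documentation above lists exactly which blkio.bfq files remain
when CONFIG_DEBUG_BLK_CGROUP is unset, here is a minimal userspace sketch
that dumps one of them. The cgroup-v1 mount point is the conventional one
and is an assumption, as is having BFQ active on the queue:

	#include <stdio.h>

	/* Minimal sketch: print blkio.bfq.io_service_bytes for the root
	 * blkio cgroup. The /sys/fs/cgroup/blkio mount point is assumed
	 * (conventional for cgroup v1); lines look like "8:0 Read 4096",
	 * followed by a Total line. */
	int main(void)
	{
		const char *path =
			"/sys/fs/cgroup/blkio/blkio.bfq.io_service_bytes";
		char line[256];
		FILE *f = fopen(path, "r");

		if (!f) {
			perror(path);
			return 1;
		}
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);
		fclose(f);
		return 0;
	}
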
148 changes: 84 additions & 64 deletions block/bfq-cgroup.c
@@ -24,7 +24,7 @@

#include "bfq-iosched.h"

#ifdef CONFIG_BFQ_GROUP_IOSCHED
#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)

/* bfqg stats flags */
enum bfqg_stats_flags {
@@ -152,6 +152,57 @@ void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg)
bfqg_stats_update_group_wait_time(stats);
}

void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
unsigned int op)
{
blkg_rwstat_add(&bfqg->stats.queued, op, 1);
bfqg_stats_end_empty_time(&bfqg->stats);
if (!(bfqq == ((struct bfq_data *)bfqg->bfqd)->in_service_queue))
bfqg_stats_set_start_group_wait_time(bfqg, bfqq_group(bfqq));
}

void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op)
{
blkg_rwstat_add(&bfqg->stats.queued, op, -1);
}

void bfqg_stats_update_io_merged(struct bfq_group *bfqg, unsigned int op)
{
blkg_rwstat_add(&bfqg->stats.merged, op, 1);
}

void bfqg_stats_update_completion(struct bfq_group *bfqg, uint64_t start_time,
uint64_t io_start_time, unsigned int op)
{
struct bfqg_stats *stats = &bfqg->stats;
unsigned long long now = sched_clock();

if (time_after64(now, io_start_time))
blkg_rwstat_add(&stats->service_time, op,
now - io_start_time);
if (time_after64(io_start_time, start_time))
blkg_rwstat_add(&stats->wait_time, op,
io_start_time - start_time);
}

#else /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */

void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
unsigned int op) { }
void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op) { }
void bfqg_stats_update_io_merged(struct bfq_group *bfqg, unsigned int op) { }
void bfqg_stats_update_completion(struct bfq_group *bfqg, uint64_t start_time,
uint64_t io_start_time, unsigned int op) { }
void bfqg_stats_update_dequeue(struct bfq_group *bfqg) { }
void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg) { }
void bfqg_stats_update_idle_time(struct bfq_group *bfqg) { }
void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg) { }
void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg) { }

#endif /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */

#ifdef CONFIG_BFQ_GROUP_IOSCHED

/*
* blk-cgroup policy-related handlers
* The following functions help in converting between blk-cgroup
@@ -229,42 +280,10 @@ void bfqg_and_blkg_put(struct bfq_group *bfqg)
blkg_put(bfqg_to_blkg(bfqg));
}

void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
unsigned int op)
{
blkg_rwstat_add(&bfqg->stats.queued, op, 1);
bfqg_stats_end_empty_time(&bfqg->stats);
if (!(bfqq == ((struct bfq_data *)bfqg->bfqd)->in_service_queue))
bfqg_stats_set_start_group_wait_time(bfqg, bfqq_group(bfqq));
}

void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op)
{
blkg_rwstat_add(&bfqg->stats.queued, op, -1);
}

void bfqg_stats_update_io_merged(struct bfq_group *bfqg, unsigned int op)
{
blkg_rwstat_add(&bfqg->stats.merged, op, 1);
}

void bfqg_stats_update_completion(struct bfq_group *bfqg, uint64_t start_time,
uint64_t io_start_time, unsigned int op)
{
struct bfqg_stats *stats = &bfqg->stats;
unsigned long long now = sched_clock();

if (time_after64(now, io_start_time))
blkg_rwstat_add(&stats->service_time, op,
now - io_start_time);
if (time_after64(io_start_time, start_time))
blkg_rwstat_add(&stats->wait_time, op,
io_start_time - start_time);
}

/* @stats = 0 */
static void bfqg_stats_reset(struct bfqg_stats *stats)
{
#ifdef CONFIG_DEBUG_BLK_CGROUP
/* queued stats shouldn't be cleared */
blkg_rwstat_reset(&stats->merged);
blkg_rwstat_reset(&stats->service_time);
@@ -276,6 +295,7 @@ static void bfqg_stats_reset(struct bfqg_stats *stats)
blkg_stat_reset(&stats->group_wait_time);
blkg_stat_reset(&stats->idle_time);
blkg_stat_reset(&stats->empty_time);
#endif
}

/* @to += @from */
@@ -284,6 +304,7 @@ static void bfqg_stats_add_aux(struct bfqg_stats *to, struct bfqg_stats *from)
if (!to || !from)
return;

#ifdef CONFIG_DEBUG_BLK_CGROUP
/* queued stats shouldn't be cleared */
blkg_rwstat_add_aux(&to->merged, &from->merged);
blkg_rwstat_add_aux(&to->service_time, &from->service_time);
@@ -296,6 +317,7 @@ static void bfqg_stats_add_aux(struct bfqg_stats *to, struct bfqg_stats *from)
blkg_stat_add_aux(&to->group_wait_time, &from->group_wait_time);
blkg_stat_add_aux(&to->idle_time, &from->idle_time);
blkg_stat_add_aux(&to->empty_time, &from->empty_time);
#endif
}

/*
@@ -342,6 +364,7 @@ void bfq_init_entity(struct bfq_entity *entity, struct bfq_group *bfqg)

static void bfqg_stats_exit(struct bfqg_stats *stats)
{
#ifdef CONFIG_DEBUG_BLK_CGROUP
blkg_rwstat_exit(&stats->merged);
blkg_rwstat_exit(&stats->service_time);
blkg_rwstat_exit(&stats->wait_time);
@@ -353,10 +376,12 @@ static void bfqg_stats_exit(struct bfqg_stats *stats)
blkg_stat_exit(&stats->group_wait_time);
blkg_stat_exit(&stats->idle_time);
blkg_stat_exit(&stats->empty_time);
#endif
}

static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
{
#ifdef CONFIG_DEBUG_BLK_CGROUP
if (blkg_rwstat_init(&stats->merged, gfp) ||
blkg_rwstat_init(&stats->service_time, gfp) ||
blkg_rwstat_init(&stats->wait_time, gfp) ||
@@ -371,6 +396,7 @@ static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
bfqg_stats_exit(stats);
return -ENOMEM;
}
#endif

return 0;
}
@@ -887,6 +913,7 @@ static ssize_t bfq_io_set_weight(struct kernfs_open_file *of,
return bfq_io_set_weight_legacy(of_css(of), NULL, weight);
}

#ifdef CONFIG_DEBUG_BLK_CGROUP
static int bfqg_print_stat(struct seq_file *sf, void *v)
{
blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
@@ -991,6 +1018,7 @@ static int bfqg_print_avg_queue_size(struct seq_file *sf, void *v)
0, false);
return 0;
}
#endif /* CONFIG_DEBUG_BLK_CGROUP */

struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
{
@@ -1028,15 +1056,6 @@ struct cftype bfq_blkcg_legacy_files[] = {
},

/* statistics, covers only the tasks in the bfqg */
{
.name = "bfq.time",
.private = offsetof(struct bfq_group, stats.time),
.seq_show = bfqg_print_stat,
},
{
.name = "bfq.sectors",
.seq_show = bfqg_print_stat_sectors,
},
{
.name = "bfq.io_service_bytes",
.private = (unsigned long)&blkcg_policy_bfq,
@@ -1047,6 +1066,16 @@ struct cftype bfq_blkcg_legacy_files[] = {
.private = (unsigned long)&blkcg_policy_bfq,
.seq_show = blkg_print_stat_ios,
},
#ifdef CONFIG_DEBUG_BLK_CGROUP
{
.name = "bfq.time",
.private = offsetof(struct bfq_group, stats.time),
.seq_show = bfqg_print_stat,
},
{
.name = "bfq.sectors",
.seq_show = bfqg_print_stat_sectors,
},
{
.name = "bfq.io_service_time",
.private = offsetof(struct bfq_group, stats.service_time),
@@ -1067,17 +1096,9 @@ struct cftype bfq_blkcg_legacy_files[] = {
.private = offsetof(struct bfq_group, stats.queued),
.seq_show = bfqg_print_rwstat,
},
#endif /* CONFIG_DEBUG_BLK_CGROUP */

/* the same statistics which cover the bfqg and its descendants */
{
.name = "bfq.time_recursive",
.private = offsetof(struct bfq_group, stats.time),
.seq_show = bfqg_print_stat_recursive,
},
{
.name = "bfq.sectors_recursive",
.seq_show = bfqg_print_stat_sectors_recursive,
},
{
.name = "bfq.io_service_bytes_recursive",
.private = (unsigned long)&blkcg_policy_bfq,
@@ -1088,6 +1109,16 @@ struct cftype bfq_blkcg_legacy_files[] = {
.private = (unsigned long)&blkcg_policy_bfq,
.seq_show = blkg_print_stat_ios_recursive,
},
#ifdef CONFIG_DEBUG_BLK_CGROUP
{
.name = "bfq.time_recursive",
.private = offsetof(struct bfq_group, stats.time),
.seq_show = bfqg_print_stat_recursive,
},
{
.name = "bfq.sectors_recursive",
.seq_show = bfqg_print_stat_sectors_recursive,
},
{
.name = "bfq.io_service_time_recursive",
.private = offsetof(struct bfq_group, stats.service_time),
@@ -1132,6 +1163,7 @@ struct cftype bfq_blkcg_legacy_files[] = {
.private = offsetof(struct bfq_group, stats.dequeue),
.seq_show = bfqg_print_stat,
},
#endif /* CONFIG_DEBUG_BLK_CGROUP */
{ } /* terminate */
};

@@ -1147,18 +1179,6 @@ struct cftype bfq_blkg_files[] = {

#else /* CONFIG_BFQ_GROUP_IOSCHED */

void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
unsigned int op) { }
void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op) { }
void bfqg_stats_update_io_merged(struct bfq_group *bfqg, unsigned int op) { }
void bfqg_stats_update_completion(struct bfq_group *bfqg, uint64_t start_time,
uint64_t io_start_time, unsigned int op) { }
void bfqg_stats_update_dequeue(struct bfq_group *bfqg) { }
void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg) { }
void bfqg_stats_update_idle_time(struct bfq_group *bfqg) { }
void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg) { }
void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg) { }

void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
struct bfq_group *bfqg) {}
