Commit f568849

Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.
        - bio-integrity fix from Nic Bellinger, which is also going to
          stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2 parents d9894c2 + 675675a commit f568849

139 files changed

Lines changed: 2144 additions & 2683 deletions


Documentation/block/biodoc.txt

Lines changed: 3 additions & 4 deletions
@@ -447,14 +447,13 @@ struct bio_vec {
  * main unit of I/O for the block layer and lower layers (ie drivers)
  */
 struct bio {
-	sector_t bi_sector;
 	struct bio *bi_next; /* request queue link */
 	struct block_device *bi_bdev; /* target device */
 	unsigned long bi_flags; /* status, command, etc */
 	unsigned long bi_rw; /* low bits: r/w, high: priority */
 
 	unsigned int bi_vcnt; /* how may bio_vec's */
-	unsigned int bi_idx; /* current index into bio_vec array */
+	struct bvec_iter bi_iter; /* current index into bio_vec array */
 
 	unsigned int bi_size; /* total size in bytes */
 	unsigned short bi_phys_segments; /* segments after physaddr coalesce*/
@@ -480,7 +479,7 @@ With this multipage bio design:
 - Code that traverses the req list can find all the segments of a bio
   by using rq_for_each_segment. This handles the fact that a request
   has multiple bios, each of which can have multiple segments.
-- Drivers which can't process a large bio in one shot can use the bi_idx
+- Drivers which can't process a large bio in one shot can use the bi_iter
   field to keep track of the next bio_vec entry to process.
   (e.g a 1MB bio_vec needs to be handled in max 128kB chunks for IDE)
   [TBD: Should preferably also have a bi_voffset and bi_vlen to avoid modifying
@@ -589,7 +588,7 @@ driver should not modify these values. The block layer sets up the
 nr_sectors and current_nr_sectors fields (based on the corresponding
 hard_xxx values and the number of bytes transferred) and updates it on
 every transfer that invokes end_that_request_first. It does the same for the
-buffer, bio, bio->bi_idx fields too.
+buffer, bio, bio->bi_iter fields too.
 
 The buffer field is just a virtual address mapping of the current segment
 of the i/o buffer in cases where the buffer resides in low-memory. For high

Documentation/block/biovecs.txt

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+
+Immutable biovecs and biovec iterators:
+=======================================
+
+Kent Overstreet <kmo@daterainc.com>
+
+As of 3.13, biovecs should never be modified after a bio has been submitted.
+Instead, we have a new struct bvec_iter which represents a range of a biovec -
+the iterator will be modified as the bio is completed, not the biovec.
+
+More specifically, old code that needed to partially complete a bio would
+update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
+ended up partway through a biovec, it would increment bv_offset and decrement
+bv_len by the number of bytes completed in that biovec.
+
+In the new scheme of things, everything that must be mutated in order to
+partially complete a bio is segregated into struct bvec_iter: bi_sector,
+bi_size and bi_idx have been moved there; and instead of modifying bv_offset
+and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
+bytes completed in the current bvec.
+
+There are a bunch of new helper macros for hiding the gory details - in
+particular, presenting the illusion of partially completed biovecs so that
+normal code doesn't have to deal with bi_bvec_done.
+
+ * Driver code should no longer refer to biovecs directly; we now have
+   bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs,
+   constructed from the raw biovecs but taking into account bi_bvec_done and
+   bi_size.
+
+   bio_for_each_segment() has been updated to take a bvec_iter argument
+   instead of an integer (that corresponded to bi_idx); for a lot of code the
+   conversion just required changing the types of the arguments to
+   bio_for_each_segment().
+
+ * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
+   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
+   advances the bio integrity's iter if present.
+
+   There is a lower level advance function - bvec_iter_advance() - which takes
+   a pointer to a biovec, not a bio; this is used by the bio integrity code.
+
+What's all this get us?
+=======================
+
+Having a real iterator, and making biovecs immutable, has a number of
+advantages:
+
+ * Before, iterating over bios was very awkward when you weren't processing
+   exactly one bvec at a time - for example, bio_copy_data() in fs/bio.c,
+   which copies the contents of one bio into another. Because the biovecs
+   wouldn't necessarily be the same size, the old code was tricky and
+   convoluted - it had to walk two different bios at the same time, keeping
+   both bi_idx and an offset into the current biovec for each.
+
+   The new code is much more straightforward - have a look. This sort of
+   pattern comes up in a lot of places; a lot of drivers were essentially open
+   coding bvec iterators before, and having a common implementation
+   considerably simplifies a lot of code.
+
+ * Before, any code that might need to use the biovec after the bio had been
+   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
+   it somewhere else if there was an error) had to save the entire bvec array
+   - again, this was being done in a fair number of places.
+
+ * Biovecs can be shared between multiple bios - a bvec iter can represent an
+   arbitrary range of an existing biovec, both starting and ending midway
+   through biovecs. This is what enables efficient splitting of arbitrary
+   bios. Note that this means we _only_ use bi_size to determine when we've
+   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
+   bi_size into account when constructing biovecs.
+
+ * Splitting bios is now much simpler. The old bio_split() didn't even work on
+   bios with more than a single bvec! Now, we can efficiently split arbitrary
+   size bios - because the new bio can share the old bio's biovec.
+
+   Care must be taken to ensure the biovec isn't freed while the split bio is
+   still using it, in case the original bio completes first, though. Using
+   bio_chain() when splitting bios helps with this.
+
+ * Submitting partially completed bios is now perfectly fine - this comes up
+   occasionally in stacking block drivers and various code (e.g. md and
+   bcache) had some ugly workarounds for this.
+
+   It used to be the case that submitting a partially completed bio would work
+   fine to _most_ devices, but since accessing the raw bvec array was the
+   norm, not all drivers would respect bi_idx and those would break. Now,
+   since all drivers _must_ go through the bvec iterator - and have been
+   audited to make sure they are - submitting partially completed bios is
+   perfectly fine.
+
+Other implications:
+===================
+
+ * Almost all usage of bi_idx is now incorrect and has been removed; instead,
+   where previously you would have used bi_idx you'd now use a bvec_iter,
+   probably passing it to one of the helper macros.
+
+   I.e. instead of using bio_iovec_idx() (or bio->bi_iovec[bio->bi_idx]), you
+   now use bio_iter_iovec(), which takes a bvec_iter and returns a
+   literal struct bio_vec - constructed on the fly from the raw biovec but
+   taking into account bi_bvec_done (and bi_size).
+
+ * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
+   doesn't actually own the bio. The reason is twofold: firstly, it's not
+   actually needed for iterating over the bio anymore - we only use bi_size.
+   Secondly, when cloning a bio and reusing (a portion of) the original bio's
+   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
+   over all the biovecs in the new bio - which is silly as it's not needed.
+
+   So, don't use bi_vcnt anymore.
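
Review note: the iterator scheme described in biovecs.txt can be modelled in userspace to see why the biovec array stays untouched. What follows is a minimal sketch, not the kernel implementation. The struct field names (bi_sector, bi_size, bi_idx, bi_bvec_done, bv_len, bv_offset) come from the text above. Everything else is an assumption made for illustration: the `model_*` function names are hypothetical, `bv_data` stands in for `bv_page`, and every bvec is assumed to have a non-zero length.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; not the real definitions. */
struct bio_vec {
	const char *bv_data;		/* hypothetical: replaces bv_page here */
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bvec_iter {
	unsigned long long bi_sector;	/* device address in 512-byte sectors */
	unsigned int bi_size;		/* bytes remaining in the bio */
	unsigned int bi_idx;		/* current index into the biovec array */
	unsigned int bi_bvec_done;	/* bytes completed in the current bvec */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Build the current segment by value; the shared array is never written. */
static struct bio_vec model_iter_iovec(const struct bio_vec *bv,
				       struct bvec_iter iter)
{
	struct bio_vec cur = bv[iter.bi_idx];

	cur.bv_offset += iter.bi_bvec_done;
	cur.bv_len = min_u(iter.bi_size, cur.bv_len - iter.bi_bvec_done);
	return cur;
}

/* Advance by `bytes`, possibly stopping partway through a bvec. */
static void model_iter_advance(const struct bio_vec *bv,
			       struct bvec_iter *iter, unsigned int bytes)
{
	assert(bytes <= iter->bi_size);
	iter->bi_sector += bytes >> 9;

	while (bytes) {
		unsigned int len = min_u(bytes,
					 model_iter_iovec(bv, *iter).bv_len);

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;
		if (iter->bi_bvec_done == bv[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
}

/* Sum the remaining segment lengths; `iter` is a private copy. */
static unsigned int model_count_bytes(const struct bio_vec *bv,
				      struct bvec_iter iter)
{
	unsigned int total = 0;

	while (iter.bi_size) {
		struct bio_vec cur = model_iter_iovec(bv, iter);

		total += cur.bv_len;
		model_iter_advance(bv, &iter, cur.bv_len);
	}
	return total;
}
```

Because the iterator is a small value type, a caller can copy it, walk the copy to the end, and still hold the original position. This is the property the document relies on for resubmitting partially completed bios and for sharing one biovec between split bios.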

arch/m68k/emu/nfblock.c

Lines changed: 7 additions & 6 deletions
@@ -62,17 +62,18 @@ struct nfhd_device {
 static void nfhd_make_request(struct request_queue *queue, struct bio *bio)
 {
 	struct nfhd_device *dev = queue->queuedata;
-	struct bio_vec *bvec;
-	int i, dir, len, shift;
-	sector_t sec = bio->bi_sector;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
+	int dir, len, shift;
+	sector_t sec = bio->bi_iter.bi_sector;
 
 	dir = bio_data_dir(bio);
 	shift = dev->bshift;
-	bio_for_each_segment(bvec, bio, i) {
-		len = bvec->bv_len;
+	bio_for_each_segment(bvec, bio, iter) {
+		len = bvec.bv_len;
 		len >>= 9;
 		nfhd_read_write(dev->id, 0, dir, sec >> shift, len >> shift,
-				bvec_to_phys(bvec));
+				bvec_to_phys(&bvec));
 		sec += len;
 	}
 	bio_endio(bio, 0);
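
Review note: the unit conversions in this loop are easy to misread, so here is a small userspace sketch of the arithmetic only. It assumes that `shift` (dev->bshift in the driver) is the log2 of the number of 512-byte sectors per device block, which the hunk itself does not state. The names `blk_span`, `span_for_segment` and `walk_segments` are hypothetical, and the alignment asserts are an added precondition that the driver does not check.

```c
#include <assert.h>

/* Hypothetical result type: a run of device blocks. */
struct blk_span {
	unsigned long long first_block;
	unsigned int nr_blocks;
};

/* Convert one segment (start sector, byte length) into device blocks. */
static struct blk_span span_for_segment(unsigned long long sec,
					unsigned int bytes, unsigned int shift)
{
	unsigned int sectors = bytes >> 9;	/* bytes -> 512-byte sectors */
	struct blk_span s = { sec >> shift, sectors >> shift };

	/* Assumed precondition: start and length are block-aligned. */
	assert((sec & ((1ULL << shift) - 1)) == 0);
	assert((sectors & ((1U << shift) - 1)) == 0);
	return s;
}

/* Mirror the loop shape: the running sector advances by each segment. */
static unsigned long long walk_segments(unsigned long long sec,
					const unsigned int *seg_bytes, int n,
					unsigned int shift,
					unsigned int *total_blocks)
{
	*total_blocks = 0;
	for (int i = 0; i < n; i++) {
		struct blk_span s = span_for_segment(sec, seg_bytes[i], shift);

		*total_blocks += s.nr_blocks;
		sec += seg_bytes[i] >> 9;
	}
	return sec;
}
```

The running `sec` stays in 512-byte sectors throughout and is only shifted into device blocks at the point of use, which matches how the hunk keeps `sec += len` separate from the `sec >> shift` argument.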

arch/powerpc/sysdev/axonram.c

Lines changed: 11 additions & 10 deletions
@@ -109,27 +109,28 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
 	struct axon_ram_bank *bank = bio->bi_bdev->bd_disk->private_data;
 	unsigned long phys_mem, phys_end;
 	void *user_mem;
-	struct bio_vec *vec;
+	struct bio_vec vec;
 	unsigned int transfered;
-	unsigned short idx;
+	struct bvec_iter iter;
 
-	phys_mem = bank->io_addr + (bio->bi_sector << AXON_RAM_SECTOR_SHIFT);
+	phys_mem = bank->io_addr + (bio->bi_iter.bi_sector <<
+				    AXON_RAM_SECTOR_SHIFT);
 	phys_end = bank->io_addr + bank->size;
 	transfered = 0;
-	bio_for_each_segment(vec, bio, idx) {
-		if (unlikely(phys_mem + vec->bv_len > phys_end)) {
+	bio_for_each_segment(vec, bio, iter) {
+		if (unlikely(phys_mem + vec.bv_len > phys_end)) {
 			bio_io_error(bio);
 			return;
 		}
 
-		user_mem = page_address(vec->bv_page) + vec->bv_offset;
+		user_mem = page_address(vec.bv_page) + vec.bv_offset;
 		if (bio_data_dir(bio) == READ)
-			memcpy(user_mem, (void *) phys_mem, vec->bv_len);
+			memcpy(user_mem, (void *) phys_mem, vec.bv_len);
 		else
-			memcpy((void *) phys_mem, user_mem, vec->bv_len);
+			memcpy((void *) phys_mem, user_mem, vec.bv_len);
 
-		phys_mem += vec->bv_len;
-		transfered += vec->bv_len;
+		phys_mem += vec.bv_len;
+		transfered += vec.bv_len;
 	}
 	bio_endio(bio, 0);
 }
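
Review note: the per-segment range check followed by a direction-dependent memcpy can be exercised outside the kernel. The sketch below models a memory bank as a plain byte array. All names (`model_seg`, `model_dir`, `model_ram_transfer`) are hypothetical, and the bounds test is deliberately written as `len > size - pos` rather than the driver's `phys_mem + vec.bv_len > phys_end`, so that the sum cannot wrap. That is a design choice of this sketch, not a claim about the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum model_dir { MODEL_READ, MODEL_WRITE };

/* Hypothetical stand-in for a segment: a buffer and its length. */
struct model_seg {
	char *buf;
	size_t len;
};

/*
 * Copy each segment to or from the bank, starting at byte `start`.
 * Returns 0 on success, -1 if a segment would run past the end of the
 * bank. As in the hunk above, segments before the failing one have
 * already been copied when the error is reported.
 */
static int model_ram_transfer(char *bank, size_t bank_size, size_t start,
			      const struct model_seg *segs, int n,
			      enum model_dir dir)
{
	size_t pos = start;

	for (int i = 0; i < n; i++) {
		struct model_seg seg = segs[i];	/* by-value copy of the segment */

		if (pos > bank_size || seg.len > bank_size - pos)
			return -1;

		if (dir == MODEL_READ)
			memcpy(seg.buf, bank + pos, seg.len);
		else
			memcpy(bank + pos, seg.buf, seg.len);

		pos += seg.len;
	}
	return 0;
}
```

Taking each segment by value mirrors the conversion in the hunk, where `struct bio_vec *vec` becomes `struct bio_vec vec` and every `vec->` access becomes `vec.`.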

arch/xtensa/platforms/iss/simdisk.c

Lines changed: 7 additions & 7 deletions
@@ -103,18 +103,18 @@ static void simdisk_transfer(struct simdisk *dev, unsigned long sector,
 
 static int simdisk_xfer_bio(struct simdisk *dev, struct bio *bio)
 {
-	int i;
-	struct bio_vec *bvec;
-	sector_t sector = bio->bi_sector;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
+	sector_t sector = bio->bi_iter.bi_sector;
 
-	bio_for_each_segment(bvec, bio, i) {
-		char *buffer = __bio_kmap_atomic(bio, i);
-		unsigned len = bvec->bv_len >> SECTOR_SHIFT;
+	bio_for_each_segment(bvec, bio, iter) {
+		char *buffer = __bio_kmap_atomic(bio, iter);
+		unsigned len = bvec.bv_len >> SECTOR_SHIFT;
 
 		simdisk_transfer(dev, sector, len, buffer,
 				bio_data_dir(bio) == WRITE);
 		sector += len;
-		__bio_kunmap_atomic(bio);
+		__bio_kunmap_atomic(buffer);
 	}
 	return 0;
 }
