src/MAXPHYS-NOTES
Notes on eliminating the fixed (usually 64K) MAXPHYS, for more efficient
operation both with single disk drives/SSDs (transfers in the 128K-256K
range are advantageous for many workloads), and particularly with
RAID sets (consider a typical 12-disk chassis of 2.5" SAS drives, set up
as an entirely ordinary P+Q parity RAID array with a single hot spare.  To
feed 64K transfers to each of the resulting 8 data disks requires 512K
transfers fed to the RAID controller -- is it any wonder NetBSD performs
so poorly with such hardware for many workloads?).
The basic approach taken here:
	1) Propagate the maximum transfer size down the device tree at
	   autoconf time.  Drivers take the stricter (smaller) of their
	   own transfer-size limit and their parents' limit, apply that
	   in their minphys() routines (if they are disk drivers), and
	   propagate it down to their children.
	2) This is just about sufficient for physio: once you've got
	   the disk, you can find its minphys routine, and *that* can
	   get at the device instance's softc, which has the size
	   determined by autoconf.
	3) For filesystem I/O, however, we need to be able to find that
	   maximum transfer size starting not with a device_t but with
	   a disk driver name (or major number) and unit number.
	   The "disk" interface within the kernel is extended to
	   let us fish out the dkdevice's minphys routine starting
	   with the data we've got.  We then feed a fake, huge buffer
	   to that minphys and see what we get back.
	   This is stashed in the mount point's data structure and is
	   then available to the filesystem and pager code via
	   vp->v_mount any time you've got a filesystem-backed vnode.
The rest is a "simple" matter of making the necessary MD adjustments
and figuring out where the rest of the hidden 64K bottlenecks are....
MAXPHYS is retained and is used as a default.  A new MACHINE_MAXPHYS
must be defined, and is the actual largest transfer any hardware for
a given port can do, or which the portmaster considers appropriate.
MACHINE_MAXPHYS is used to size some on-stack arrays in the pager code,
so don't go too crazy with it.
==== STATUS ====
All work done on amd64.  Not hard to get it going on other ports.  Every
top-level bus attachment will need code to clamp transfer sizes
appropriately; see the PCI or ISA code here, or, for an unfortunate
example of when you have to clamp more than you'd like, the pnpbios code.
Access through physio: done?  Disk drivers other than sd, cd, and wd
will need their minphys functions adjusted as those were, and will be
limited to MAXPHYS per transfer until they are.
A notable exception is RAIDframe.  It could benefit immediately,
but needs something a little more sophisticated done to its
minphys: per unit, it needs to sum up the maxphys values of the
unit's data (not parity!) components and return that value.
Access through filesystems: for read, this is controlled by the UVM
readahead code.  We can stash the readahead max size in the readahead
context -- we can get it from v_mount in the vnode (the uobj!) *if* we
put it into struct mount.  Then we only have to do the awful
walk-the-device-list crap at mount time.  This likely wins!
Unfortunately, there is still a bottleneck, probably from
the pager code (the genfs I/O code).  The genfs read/getpages
code is repellent and huge.  Haven't even started on it yet.
I have attacked the genfs write path already, but though my printfs
show the appropriate maxpages value propagating down, the resulting
stream of I/O requests is still 64K.  This needs further investigation:
with maxcontig now gone from the FFS code, where on earth are we
still clamping the I/O size?