
src/MAXPHYS-NOTES, revision 1.1.2.1

Notes on eliminating the fixed (usually 64K) MAXPHYS, for more efficient
operation both with single disk drives/SSDs (transfers in the 128K-256K
range are advantageous for many workloads), and particularly with RAID
sets (consider a typical 12-disk chassis of 2.5" SAS drives, set up as
an entirely ordinary P+Q parity RAID array with a single hot spare.  To
feed 64K transfers to each of the resulting 8 data disks requires 512K
transfers fed to the RAID controller -- is it any wonder NetBSD performs
so poorly with such hardware for many workloads?).

The basic approach taken here:

        1) Propagate the maximum transfer size down the device tree at
           autoconf time.  Drivers take the more restrictive (smaller)
           of their own transfer-size limit and their parent's limit,
           apply that in their minphys() routines (if they are disk
           drivers), and propagate it down to their children.  (See the
           first sketch after this list.)

        2) This is just about sufficient, for physio, since once you've
           got the disk, you can find its minphys routine, and *that*
           can get access to the device instance's softc, which has the
           size determined by autoconf.

        3) For filesystem I/O, however, we need to be able to find that
           maximum transfer size starting not with a device_t but with
           a disk driver name (or major number) and unit number.

           The "disk" interface within the kernel is extended to
           let us fish out the dkdevice's minphys routine starting
           with the data we've got.  We then feed a fake, huge buffer
           to that minphys and see what we get back.  (See the second
           sketch after this list.)

           The result is stashed in the mount point's data structure
           and is then available to the filesystem and pager code via
           vp->v_mount any time you've got a filesystem-backed vnode.
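
A minimal sketch of steps 1 and 2.  All the names here (xx_attach_args,
xa_maxphys, sc_maxphys, XX_HW_MAXXFER) are invented for illustration,
not the branch's actual API:

        #include <sys/param.h>
        #include <sys/device.h>
        #include <sys/buf.h>
        #include <sys/disklabel.h>

        #define XX_HW_MAXXFER   (256 * 1024)    /* this controller's own limit */

        struct xx_attach_args {
                u_int   xa_maxphys;             /* parent's transfer limit */
        };

        struct xx_softc {
                device_t        sc_dev;
                u_int           sc_maxphys;     /* per-instance limit from autoconf */
        };

        extern struct cfdriver xx_cd;

        static void
        xx_attach(device_t parent, device_t self, void *aux)
        {
                struct xx_softc *sc = device_private(self);
                struct xx_attach_args *xa = aux;

                sc->sc_dev = self;
                /* The smaller of our limit and the parent's wins; this
                 * is also the value handed down to our own children. */
                sc->sc_maxphys = MIN(XX_HW_MAXXFER, xa->xa_maxphys);
        }

        /* A disk driver applies the per-instance limit in its minphys. */
        static void
        xx_minphys(struct buf *bp)
        {
                struct xx_softc *sc =
                    device_lookup_private(&xx_cd, DISKUNIT(bp->b_dev));

                if (bp->b_bcount > sc->sc_maxphys)
                        bp->b_bcount = sc->sc_maxphys;
        }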
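
And a sketch of the step-3 probe.  mount_probe_maxphys() is invented,
and reaching the minphys hook through struct dkdriver's d_minphys is an
assumption about the extended "disk" interface:

        #include <sys/param.h>
        #include <sys/systm.h>
        #include <sys/buf.h>
        #include <sys/disk.h>

        static u_int
        mount_probe_maxphys(struct disk *dk, dev_t dev)
        {
                struct buf bp;

                memset(&bp, 0, sizeof(bp));
                bp.b_dev = dev;                 /* so minphys can find its softc */
                bp.b_bcount = MACHINE_MAXPHYS;  /* the fake, huge request */
                (*dk->dk_driver->d_minphys)(&bp);
                return bp.b_bcount;             /* clamped to the real limit */
        }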

The rest is a "simple" matter of making the necessary MD adjustments
and figuring out where the rest of the hidden 64K bottlenecks are....

MAXPHYS is retained and is used as a default.  A new MACHINE_MAXPHYS
must be defined for each port: the actual largest transfer any hardware
for that port can do, or which the portmaster considers appropriate.

MACHINE_MAXPHYS is used to size some on-stack arrays in the pager code,
so don't go too crazy with it.
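
For example, something like this in the port's <machine/param.h> (the
location and the value here are illustrative assumptions, not what the
branch actually uses):

        /* Largest transfer any hardware on this port can do (example value). */
        #define MACHINE_MAXPHYS (1024 * 1024)

        /* MAXPHYS itself survives as the conservative default. */
        #ifndef MAXPHYS
        #define MAXPHYS         (64 * 1024)
        #endif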

==== STATUS ====

All work done on amd64.  Not hard to get it going on other ports.  Every
top-level bus attachment will need code to clamp transfer sizes
appropriately; see the PCI or ISA code here, or, for an unfortunate
example of when you have to clamp more than you'd like, the pnpbios code.

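The shape of such a clamp, with an invented pba_maxphys field (and an
invented bridge_max_xfer limit) standing in for whatever the bus's
attach args really carry:

        /* Sketch: a top-level bus seeds the chain at the machine-wide
         * limit, or lower if the bus itself is constrained (as pnpbios
         * turns out to be). */
        pba.pba_maxphys = MIN(MACHINE_MAXPHYS, bridge_max_xfer);
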
Access through physio: done?  Disk drivers other than sd, cd, and wd
will need their minphys functions adjusted as those were, and
will be limited to MAXPHYS per transfer until they do.

        A notable exception is RAIDframe.  It could benefit immediately
        but needs something a little more sophisticated done to its
        minphys -- per-unit, it needs to sum up the maxphyses of the unit's
        data (not parity!) components and return that value.

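        A sketch of that shape; the rf_softc layout here (sc_ncols,
        sc_isparity[], sc_cmaxphys[]) is invented, not RAIDframe's real
        per-unit state:

                #include <sys/param.h>
                #include <sys/buf.h>
                #include <sys/device.h>
                #include <sys/disklabel.h>

                extern struct cfdriver raid_cd;

                struct rf_softc {
                        int     sc_ncols;       /* columns in the set */
                        bool    *sc_isparity;   /* per-column: parity? */
                        u_int   *sc_cmaxphys;   /* per-component maxphys */
                };

                static void
                raid_minphys(struct buf *bp)
                {
                        struct rf_softc *sc = device_lookup_private(
                            &raid_cd, DISKUNIT(bp->b_dev));
                        u_int total = 0;
                        int col;

                        /* Only data columns add to the stripe's width;
                         * parity columns carry no file data. */
                        for (col = 0; col < sc->sc_ncols; col++)
                                if (!sc->sc_isparity[col])
                                        total += sc->sc_cmaxphys[col];

                        if (bp->b_bcount > total)
                                bp->b_bcount = total;
                }
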
Access through filesystems -- for read, controlled by the uvm readahead
code.  We can stash the readahead max size in the ra context -- we can
get it from v_mount in the vnode (the uobj!) *if* we put it into struct
mount.  Then we only have to do the awful walk-the-device-list crap at
mount time.  This likely wins!

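In code, roughly -- mnt_maxphys is a hypothetical addition to struct
mount, mount_probe_maxphys() is the probe sketched earlier, and the
other names (mp, dk, dev, ra_uobj, ra_winsize) are illustrative:

        /* At mount time, once: the awful device-list walk lives here. */
        mp->mnt_maxphys = mount_probe_maxphys(dk, dev);

        /* In the readahead path, clamp the window through the vnode
         * backing the uobj (the uvm object is embedded in the vnode). */
        struct vnode *vp = (struct vnode *)ra_uobj;
        ra_winsize = MIN(ra_winsize, vp->v_mount->mnt_maxphys);
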
        Unfortunately, there is still a bottleneck, probably from
        the pager code (genfs I/O code).  The genfs read/getpages
        code is repellent and huge.  Haven't even started on it yet.

I have already attacked the genfs write path, but though my printfs
show the appropriate maxpages value propagating down, the resulting
stream of I/O requests is still 64K.  This needs further investigation:
with maxcontig now gone from the FFS code, where on earth are we
still clamping the I/O size?
