TLVC Technical notes
[By Helge Skrivervik/@mellvik 2023, 2025]
A collection of notes, musings and relevant issues that have come up while developing TLVC. Like the OS itself, these notes are continuously evolving. You're welcome to contribute.
Raw IO (aka 'direct IO' or 'character IO') provides direct access to storage devices, bypassing the buffer system. This is important functionality for maintenance (such as creating or checking filesystems, or diagnosing physical media), for backup (copying entire volumes), for performance testing, and when using disk partitions for non-filesystem purposes (such as database systems).
Generally speaking, the buffer system is for file system access, raw IO is for device access. All device access except mount/umount should use the raw devices even though block access would work. Typical commands that should (and will soon) default to raw devices are fsck, mkfs, fdisk, makeboot, fdtest.
In his famous "The Design of the UNIX Operating System", Maurice J. Bach writes about raw device access (p. 328):
"The advantage of using the raw interface is speed, assuming there is no advantage to caching data for later access. Processes accessing block devices transfer blocks of data whose size is constrained by the file system logical block size. For example, if a file system has a logical block size of 1K bytes, at most 1K bytes are transferred per I/O operation. However, processes accessing the disk as a raw device can transfer many disk blocks during a disk operation, subject to the capabilities of the disk controller. Functionally, the process sees the same result, but the raw interface may be much faster."
In the TLVC case, the file system block size is indeed 1K, and Bach's observations are as relevant as ever. And while speed is important, it's not the most important reason to implement a raw interface, which is a recent (2023) addition to the system. Avoiding 'cache pollution' and the ability to access a device without the interference of the buffer system matter more. Think about it: a system is operational with normal activity, most if not all of which involves storage IO. You want to copy a floppy disk, maybe even a full disk partition - for whatever purpose. Done via the regular block IO system, the operation would consume all or most buffers on the system for quite some time, adversely affecting all other users and activities. Done via raw IO instead, the operation would possibly be noticeable, but not adversely so.
A different example: consider the fdisk or fsck utilities. They operate outside any file system, reading and updating critical metadata which we take for granted is on the medium when a write completes. Going through the buffer system, a write is complete only when the system decides to flush the cache, which is outside our control unless we do a manual sync - in which case the write happens shortly, but still not immediately. This is NOT what we want or expect.
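To make the fsck/fdisk point concrete, here is a minimal sketch of a metadata update through the raw device. The device name and the 1K superblock offset are illustrative assumptions, not TLVC specifics; the point is that the data is on the medium when write() returns, with no sync() involved.

```c
/* Sketch only: the device name and the 1K superblock offset are assumptions,
 * not TLVC specifics. Raw transfers should be sector aligned and a multiple
 * of the sector size. Via the raw (character) device the data is on the
 * medium when write() returns; via the block device it would sit in the
 * buffer cache until the kernel decided to flush it. */
#include <fcntl.h>
#include <unistd.h>

int update_superblock(const char *rawdev, const char *buf, int len)
{
    int fd = open(rawdev, O_RDWR);          /* e.g. "/dev/rhda1" (assumed name) */

    if (fd < 0)
        return -1;
    if (lseek(fd, 1024L, SEEK_SET) < 0 ||   /* superblock lives at offset 1K */
        write(fd, buf, len) != len) {
        close(fd);
        return -1;
    }
    close(fd);                              /* already on disk - no sync() needed */
    return 0;
}
```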
A final example: if we want to benchmark a device, there is no way to do that without raw access. Benchmarking via the buffer system is like showering with your clothes on.
The notion of a 'character device' may seem illogical when referring to storage devices, which at the lowest level always use blocks (sectors) for physical storage. The term is historical, and 'raw' is more appropriate. Even though the devices always transfer data in blocks, reading or writing single or just a few characters (or bytes) from/to a storage device via the raw interface works fine - but is inefficient.
In short, the primary purpose of the raw drivers is to provide direct access between applications and devices - for special purposes - without involving the buffer subsystem. Speed is a welcome additional benefit and comes not only because we're bypassing the buffer system, but because direct access in many cases allows us to make use of special hardware features not available to block drivers, such as multisector IO. A rough illustration of the latter is the dd-style copy loop sketched below.
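The sketch copies a raw floppy device to a file; the device name and the track-sized chunk are assumptions for the example. What matters is that each read() can become one multisector transfer and no kernel buffers are consumed.

```c
/* dd-style raw copy sketch. Device name and chunk size are illustrative. */
#include <fcntl.h>
#include <unistd.h>

#define CHUNK (18*512)          /* one track's worth on a 1.44M floppy */

int copy_raw(const char *rawdev, const char *outfile)
{
    static char buf[CHUNK];     /* static: keep it off the small stack */
    int in, out, n, err = 0;

    if ((in = open(rawdev, O_RDONLY)) < 0)          /* e.g. "/dev/rfd0" (assumed name) */
        return -1;
    if ((out = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, CHUNK)) > 0) {        /* one multisector transfer per read */
        if (write(out, buf, n) != n) {
            err = 1;
            break;
        }
    }
    close(in);
    close(out);
    return (n < 0 || err) ? -1 : 0;
}
```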
For more about the raw drivers, see the development notes Wiki.
DMA, Direct Memory Access, is conceptually simple, complicated in practice, in particular on PCs. It's also slow, and should be avoided if possible (except on XT class systems, see below). Important note: We're talking about traditional (PC/XT and AT style) DMA, not modern variants using bus mastering with PCI and later buses.
Why do we have DMA if it's so slow? Because slow is a relative metric. Early PCs - 8088 and 8086 based - were slow by any metric. Before that, 8-bit systems, typically CP/M systems using Z80 or 8085 CPUs, were even slower. So slow they had a hard time handling floppy and early hard drive reads/writes fast enough. There were no device buffers at that time; data had to be fed to and read from the devices at the speed they were delivered (or read). Missing a cycle would mean errors and retries. So DMA was required: get the CPU out of the way and let data transfers happen between the IO device and memory. The DMA controller took over the memory bus and managed the transfer without the CPU interfering - pretty much like many minicomputer systems in the 70s and early 80s.
Even the original PC and the PC/XT were too slow to reliably handle block IO, and DMA was required. In a hurry to get the product to market and lacking viable alternatives, IBM decided to use an old DMA controller for the PC: the Intel 8237 chip was well known at the time, made for the 8-bit Intel 8085 CPU, which maxed out at 64k of addressable RAM.
In order to provide DMA access to the PC's 1MB/20-bit memory space, IBM added a 4-bit page register which selects which 64k block of PC memory a DMA operation is to access. Smart and logical, but it doesn't fix the DMA controller's inability to cross 64k physical memory boundaries. Thus every driver that handles DMA must ensure that a transfer never crosses a 64k physical boundary. This complicates drivers quite a bit. It is also confusing because it forces developers to juggle x86 architecture segments and physical 64k segments, a mismatch that over the years has caused innumerable bugs. In TLVC this is what the bounce buffers in low memory are for (see the TLVC Memory and Buffer Subsystem Wiki for details).
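A hedged sketch of the check such a driver has to make, with illustrative helper names rather than the actual TLVC routines:

```c
/* Sketch: deciding whether a DMA transfer would cross a 64k physical
 * boundary. The 8237 can only count within the one 64k page selected by
 * the page register, so a buffer straddling such a boundary must be
 * bounced via a low-memory buffer instead. */
#include <stdint.h>

/* 20-bit physical address from an 8086 segment:offset pair */
static uint32_t phys_addr(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Nonzero if [addr, addr+count) stays inside one 64k DMA page */
static int dma_ok(uint32_t addr, uint16_t count)
{
    return (addr & 0xFFFF0000UL) == ((addr + count - 1) & 0xFFFF0000UL);
}

/* Usage: if (!dma_ok(phys_addr(seg, off), count)) use a bounce buffer */
```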
When the much faster PC/AT came along, the basic PC architecture had settled; it was a standard. So even if a much better DMA controller chip had been available, IBM would likely not have chosen it, for compatibility reasons. The 8237 survived - and multiplied. The AT got 2 such controllers instead of one, and the page register was expanded to 8 bits to cover the expanded (24 bit/16MB) address space. In addition to doubling the number of DMA channels, the second DMA controller allowed IBM to pull another trick: by right-shifting the address lines on the second controller, it became a word-transfer controller instead of a byte-transfer controller. IOW it could transfer twice as much in one operation, and twice as fast, by using word-sized instead of byte-sized bus transfers. Check this link for an extensive low level run-through of the PC DMA hardware and how to program it.
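For orientation, here is a sketch of what programming channel 2 (the floppy channel) for a disk-to-memory transfer looks like. The port numbers are the standard PC assignments; the outb() helper and the function name are assumptions for the example, not TLVC code.

```c
/* Sketch of programming 8237 DMA channel 2 (floppy) for a read from disk
 * into memory. Port numbers are the standard PC/XT assignments; outb() is
 * assumed to write one byte to an I/O port. Interrupts should be off here. */
#include <stdint.h>

extern void outb(uint8_t value, uint16_t port);   /* assumed port I/O helper */

void dma2_setup_read(uint32_t phys, uint16_t count)
{
    count--;                          /* the 8237 is programmed with count minus one */

    outb(0x06, 0x0A);                 /* mask channel 2 */
    outb(0x00, 0x0C);                 /* clear byte-pointer flip-flop */
    outb(0x46, 0x0B);                 /* mode: single transfer, write to memory, ch 2 */

    outb(phys & 0xFF, 0x04);          /* address bits 0-7  */
    outb((phys >> 8) & 0xFF, 0x04);   /* address bits 8-15 */
    outb((phys >> 16) & 0x0F, 0x81);  /* page register: address bits 16-19 */

    outb(count & 0xFF, 0x05);         /* count low byte  */
    outb((count >> 8) & 0xFF, 0x05);  /* count high byte */

    outb(0x02, 0x0A);                 /* unmask channel 2 - ready to go */
}
```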
Still, the DMA controller was locked at the original 4.77MHz speed. The original AT could do ISA-bus word PIO at 6MHz, and the ISA bus itself could run at up to 12MHz. Thus from the AT and onwards, DMA was avoided and became a 'compatibility feature', not a contribution to speed - and it was never used by equipment designed after the advent of the AT. Interestingly, some post-AT expansion cards had their own, much faster DMA controllers which could do efficient 'cycle stealing' from the CPU.
Where does that leave us in a TLVC context? There is no sound card support and no applications that could use such functionality, so what we have is floppy, MFM hard disks and the AMD Lance Ethernet card. All of them require old fashioned DMA and all of them are supported by TLVC, the latter under development at the time of writing. For even more technical detail than the link above, check out the OSdev Wiki.
Let's end this DMA rundown on a positive note though. When running TLVC on an XT class system with floppies and MFM hard disks, the system is surprisingly responsive thanks to DMA and interrupt driven drivers, which include the network drivers. There are few if any wait-loops, the available resources are used extremely efficiently, and the system, while not fast, appears responsive most if not all of the time. Which is not always the case on much faster systems relying more or less entirely on PIO.
PIO is the alternative to DMA: data transfer via read/write loops, moving one byte or word at a time to/from IO ports via the ISA bus. PIO was always there, even on devices using DMA, for communicating with the devices: sending commands, reading statuses, handling errors etc. The difference is that with PIO only, data are transferred the same way as commands etc., by repeatedly writing to or reading from an IO port. If these reads or writes are time sensitive, i.e. they have to happen at a certain rate or data get lost, the CPU must be fast enough and attentive enough - that is, not disturbed long enough to miss a read or write cycle. That is easy on a single-tasking OS like DOS, much harder (and more disruptive) on a multitasking system like Linux, Unix or TLVC. Thus the requirement for DMA on the first generation of PCs.
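A minimal sketch of such a data loop, reading one sector from the standard primary IDE data port. The inw() helper is an assumption, and a real driver would rather use a 'rep insw' equivalent than a C loop.

```c
/* Sketch of a word-wide PIO transfer: pulling one 512-byte sector from the
 * IDE data register at 0x1F0 (the standard primary-channel port). inw() is
 * an assumed helper reading one 16-bit word from an I/O port. */
#include <stdint.h>

extern uint16_t inw(uint16_t port);     /* assumed port I/O helper */

void pio_read_sector(uint16_t *buf)
{
    int i;

    for (i = 0; i < 256; i++)           /* 256 words = 512 bytes */
        buf[i] = inw(0x1F0);            /* one bus transfer per word */
}
```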
As mentioned above, two factors changed this situation with the advent of the PC/AT. The first was processor speed. Even at the initial 6MHz clock speed, the AT was fast enough to keep up with many, maybe most, IO devices while also servicing priority interrupts. The ability to run PIO on the ISA bus, word size instead of byte size, at CPU speed instead of 'DMA speed', contributed significantly. The second and maybe more important factor was that peripheral devices became increasingly intelligent and got their own buffers. Device-local buffers meant that the time criticality of device IO disappeared. The IDE drives discussed in detail in the Developer notes pt. I Wiki are great examples of just that. The 'real time service' requirement went away.
Further, PC/AT clock speed increased rapidly with a continuous stream of AT-compatible products from a growing 'compatibles'-industry. In short, the need for DMA disappeared - on ISA-bus systems. DMA came back - same concept, different environment - with the PCI bus and its younger siblings, but that is a different story.
A final developer-related point about the benefits of PIO: in most cases, a tight PIO loop transferring data one way or the other may safely be interrupted. This is another consequence of the new situation in which most if not all data transfers mean 'moving data between buffers': while speed is and will always be important, the real-time-attention element is gone.
Polling vs. interrupts - 101: [Quote from Linkedin Advice]
Polling I/O is a method of checking the status of an I/O device periodically, usually in a loop, to see if it is ready to send or receive data. For example, you can use polling to read a temperature sensor every second, or to write a message to a serial port when it is not busy. Polling I/O is simple to implement and understand, and it gives you full control over the timing and order of your I/O operations. However, this method also has some drawbacks such as wasting CPU cycles and power by constantly checking for I/O events that may not occur frequently or at all, causing delays and missed data if the polling frequency is too low or the I/O device is too fast, and blocking other tasks or processes if the polling loop is too long or too complex.
[Quote from Linkedin Advice]
Interrupt-driven I/O is a method of allowing the I/O device to notify the CPU when it is ready to send or receive data, using a hardware signal called an interrupt. This method is more efficient and responsive than polling I/O, as it avoids unnecessary checks and allows the CPU to perform other tasks until an I/O event occurs. However, there are some challenges associated with interrupt-driven I/O, such as complexity and overhead in your code due to writing and managing interrupt service routines (ISRs). Additionally, conflicts and errors can arise if multiple I/O devices share the same interrupt line or priority level, or if the ISRs interfere with each other or the main program. Furthermore, the performance and reliability of your system could be affected if the interrupt frequency is too high or the ISRs are too long or slow.
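To tie the two quotes back to driver code, here is a contrasting sketch (not TLVC code) of waiting for an IDE drive to have data ready. 0x1F7 is the standard status port; the helper names and the flag-based wakeup are assumptions for the example.

```c
/* Contrasting sketch: polling vs. interrupt-driven waiting for an IDE
 * drive. BSY is bit 7 of the status register, DRQ is bit 3. inb() is an
 * assumed port I/O helper. */
#include <stdint.h>

extern uint8_t inb(uint16_t port);      /* assumed port I/O helper */

#define IDE_STATUS  0x1F7
#define IDE_BSY     0x80
#define IDE_DRQ     0x08

/* Polling: the CPU spins until the drive drops BSY and raises DRQ */
void wait_drq_polled(void)
{
    while ((inb(IDE_STATUS) & (IDE_BSY | IDE_DRQ)) != IDE_DRQ)
        ;                               /* burning cycles the whole time */
}

/* Interrupt driven: the ISR signals readiness, the driver can sleep */
static volatile int ide_ready;

void ide_isr(void)                      /* runs when the drive asserts its IRQ */
{
    ide_ready = 1;                      /* a real kernel would wake the sleeping process */
}

void wait_drq_interrupt(void)
{
    while (!ide_ready)
        ;   /* placeholder: a real kernel blocks the process here, freeing the CPU */
    ide_ready = 0;
}
```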
Moving from polling to interrupt driven was one of the key motivations for forking TLVC off of ELKS. At the time of writing there is still work to do to make the system completely interrupt driven: the IDE device driver is still polling, the serial line driver is polling on output and the networking subsystem is polling at the protocol (not the driver) level. Still, the difference between where TLVC is today (January 2025) and where it came from is remarkable. A much more responsive system, in particular when running off of floppies, or - as in the example mentioned above - an XT-class system with floppies and MFM disks (which are completely interrupt driven).
There is a reason the IDE disk driver interrupts have been postponed: Since IDE drives are always buffered, the PIO data transfers always happen as fast as the CPU or (more likely) the ISA bus allows. This means that there isn't much speed to gain from moving to fully interrupt driven. But there is - again as alluded to above - a second reason to make the change: Responsiveness. An interrupt driven system will always appear more responsive because system (i.e. user) activity determines priority, not device activity. In a single user, single activity setting the difference may be marginal, but it doesn't take more than some network activity or buffer syncs to make the difference apparent, even with one user.
IOW - more interrupt support is coming to TLVC, and has high priority.
TLVC: Tiny Linux for Vintage Computers