
TLVC Developer Notes pt. II: Networking


Refer to the TLVC Networking Guide Wiki for a practical introduction to networking on TLVC.

TLVC NIC driver development

[By Helge Skrivervik/@mellvik 2024/2025]

TLVC inherited the general driver API from ELKS, and it remains unchanged. However, there is more to a driver interface than the API itself - in particular in a system this compact: global variables, assumptions about the environment, configuration options, etc. This document discusses these 'environmental issues' - and is subject to change as the environment evolves.

Some NIC drivers have an ASM component with functionality that partly overlaps. In order to optimize, and not least to improve portability, these components are going away over time, being replaced by C code or merged into kernel libraries. The exception is the ne2k driver, which has significant parts implemented in ASM and will most likely stay that way for efficiency reasons.

Default configuration settings - irq, ioport, … - for NIC interfaces (and other interfaces) are set in tlvc/include/arch/ports.h. These settings populate the netif_parms array in init/main.c and are overridden by per-NIC settings from /bootopts at boot time. ports.h is not needed by the drivers themselves.

struct netif_parms netif_parms[MAX_ETHS] = {
    /* NOTE:  The order must match the defines in netstat.h */
    { NE2K_IRQ, NE2K_PORT, 0, NE2K_FLAGS },
    { WD_IRQ, WD_PORT, WD_RAM, WD_FLAGS },
    { EL3_IRQ, EL3_PORT, 0, EL3_FLAGS },
    { EE16_IRQ, EE16_PORT, EE16_RAM, EE16_FLAGS },
    { LANCE_IRQ, LANCE_PORT, 0, LANCE_FLAGS },
};
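For reference, the entries in ports.h are plain per-driver #defines that feed the initializer above. The values below are illustrative only; check tlvc/include/arch/ports.h for the actual defaults:

    /* Illustrative only - the real defaults are in tlvc/include/arch/ports.h */
    #define NE2K_IRQ    9           /* default IRQ for the ne2k driver */
    #define NE2K_PORT   0x300       /* default I/O base port */
    #define NE2K_FLAGS  0           /* default interface flags */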

tlvc/include/linuxmt/netstat.h defines the structs and macros needed by the drivers. Additionally, some drivers have their own header files for constants and macros.

Inside the driver proper, the global data are conveniently accessed via these macros:

extern struct eth eths[];

/* runtime configuration set in /bootopts or defaults in ports.h */
#define net_irq     (netif_parms[ETH_EE16].irq)
#define net_port    (netif_parms[ETH_EE16].port)
#define net_ram     (netif_parms[ETH_EE16].ram)
#define net_flags   (netif_parms[ETH_EE16].flags)

static struct netif_stat netif_stat;
static char model_name[] = "ee16";
static char dev_name[] = "ee0";

The eths array is defined in tlvc/arch/i86/drivers/char/eth.c and holds the file_operations pointers for each driver, plus a pointer to the netif_stat structure, which contains the interface's MAC address and statistics collected by the driver (primarily error stats).
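As a rough illustration of what such an entry carries (field names here are hypothetical - the actual definitions are in tlvc/include/linuxmt/netstat.h):

    /* Hypothetical sketch - see tlvc/include/linuxmt/netstat.h for the real layout */
    struct netif_stat {
        unsigned char mac_addr[6];  /* MAC address read from the NIC at probe time */
        unsigned int  rx_errors;    /* receive errors (CRC, framing, ...) */
        unsigned int  rx_overruns;  /* NIC receive buffer overruns */
        unsigned int  tx_errors;    /* transmit errors */
    };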

The model_name and dev_name are more consistently used by the network drivers than elsewhere in the system. dev_name is the device name as found in the dev directory and is always the same for a given driver. The model_name may vary, depending on what the _probe routine finds during initialization. For example, the ne2k driver may report either ne2k or ne1k as the name, the wd driver may report wd8003 or wd8013, etc.

dev_name is also the name used for per-device configuration in /bootopts and may conveniently be made part of the netif_parms struct, saving some string space.

…
               if (!strncmp(line,"ne0=", 4)) {
                        parse_nic(line+4, &netif_parms[ETH_NE2K]);
                        continue;
                }
                if (!strncmp(line,"wd0=", 4)) {
                        parse_nic(line+4, &netif_parms[ETH_WD]);
                        continue;
                }
…
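A minimal sketch of what parse_nic has to do, assuming the /bootopts value is a comma-separated list in the same order as the netif_parms initializer (irq,port,ram,flags); the actual parser in init/main.c may differ in detail:

    /* Sketch only: parse "irq,port,ram,flags" into a netif_parms entry.
     * Field order is assumed to match the initializer shown earlier. */
    #include <stdlib.h>

    struct netif_parms { int irq; int port; int ram; int flags; };

    static void parse_nic(char *s, struct netif_parms *parms)
    {
        parms->irq = (int)strtol(s, &s, 0);     /* base 0: accepts decimal and 0x... */
        if (*s == ',') parms->port  = (int)strtol(s + 1, &s, 0);
        if (*s == ',') parms->ram   = (int)strtol(s + 1, &s, 0);
        if (*s == ',') parms->flags = (int)strtol(s + 1, &s, 0);
    }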

Buffer configuration for TLVC Ethernet NICs

NIC buffering is experimental in TLVC, and currently implemented only in the ne2k NIC driver. Given its experimental nature, there are several configuration options available, allowing test and evaluation of what works and what does not. There is of course a dynamic here, in that what works depends on other components of the system and not least the system itself. The more asynchronous the IO system in general becomes, the more useful NIC buffering becomes, as it can take advantage of overlapping IO. Likewise, the slower the system is, the more beneficial buffering becomes - for networking speed and for general system responsiveness.

As of July 2024, there is little if any benefit in allocating transmit buffers for NICs. Extensive testing confirms that given the typical speed/performance of the systems involved and the nature of today's networks, a transmitted packet will exit the NIC (and generate a transmit complete interrupt) almost before the write-packet routine has returned to its caller. This means (among other things) that allocating 4 on-NIC tx-buffers, as we do in the ee16-driver with 32k RAM, is really a waste.

The exception - as alluded to above - is really slow systems, i.e. XT class systems. On such systems even the transmit buffer becomes useful thanks to the interrupt driven nature of the NIC drivers in general and the ne2k-driver in particular. Anything that can be offloaded to run asynchronously helps general performance. So, given the availability of memory, allocating 1 or 2 transmit buffers and 2-4 receive buffers is a good thing. The latter will augment whatever buffer space the NIC itself has and reduce the chances of NIC overruns.

The current buffer implementation offers 3 buffer strategies:

  • No buffers: The driver is moving data directly to/from the requester (via far mem-move), i.e. ktcp.
  • Static buffers: The number of send and receive buffers specified in the NET_OBUFCNT and NET_IBUFCNT defines are allocated statically at compile time. The BUFCNT numbers may be 0, in which case the driver will run as if NO_BUFS was set (except the extra code is compiled in).
  • Heap allocation: Buffer space is allocated from the kernel heap, which is slightly different from static allocation in that memory is allocated only if the network is running. Memory consumption will be slightly higher because of the headers in the heap allocation.

When HEAP allocation is used, the number of buffers may be set in /bootopts via the netbufs= directive: netbufs=2,1 means 2 receive buffers, 1 transmit buffer. The kernel does no sanity checking on the netbufs= numbers, so it's entirely possible to make the system unbootable by requesting too many buffers. 2,0 or 2,1 are reasonable choices for regular usage; the default (set in tlvc/arch/i86/drivers/net/netbuf.h) is 2,0. Again, zero is a valid selection and will turn off buffers entirely. When using heap allocation, a header structure per buffer is also allocated from the heap, 6 bytes per buffer.
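A minimal sketch of the heap strategy under stated assumptions: the allocator name, heap tag and packet size below (heap_alloc, HEAP_TAG_NET, PKT_SIZE) are placeholders, not necessarily the actual TLVC identifiers; the real code lives in the ne2k driver and netbuf.h:

    /* Sketch only - names and sizes are placeholders, not the actual TLVC code */
    extern char *heap_alloc(unsigned int size, int tag);    /* assumed kernel heap API */
    #define HEAP_TAG_NET  0         /* placeholder heap tag */
    #define PKT_SIZE      1536      /* illustrative per-buffer packet size */

    struct netbuf {                 /* small per-buffer header, also heap allocated */
        struct netbuf *next;
        unsigned short len;
        char *data;
    };

    /* Allocate 'count' packet buffers at network startup; if the heap runs out,
     * keep whatever was obtained - there is no sanity check on the request. */
    static struct netbuf *alloc_netbufs(int count)
    {
        struct netbuf *head = 0, *nb;

        while (count--) {
            nb = (struct netbuf *)heap_alloc(sizeof(struct netbuf) + PKT_SIZE,
                                             HEAP_TAG_NET);
            if (!nb) break;
            nb->data = (char *)(nb + 1);    /* packet space follows the header */
            nb->len  = 0;
            nb->next = head;
            head = nb;
        }
        return head;
    }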

AGAIN: zero buffers is always a valid choice - which complicates the driver somewhat but makes benchmarking much more convenient.

Ktcp, slow start & congestion avoidance - an analysis

Intro

Slow start and congestion avoidance are mandatory parts of TCP but have never been implemented in ktcp. This has not been a practical problem, because ktcp has typically been used either in an emulation environment or with 'bigger' (newer, faster, …) systems as the peer, running 'full scale' TCP/IP implementations on Linux, MacOS, Windows and the like - in a LAN setting. Predictable environments in which 'the other end' has easily smoothed over whatever weaknesses ktcp has had. A ktcp system is in most cases orders of magnitude slower than even a peer like the Raspberry Pi 2B, with buffers and window sizes so limited that even a 1990 NIC becomes 'big' in terms of buffer capacity. IOW, the incompleteness of and weaknesses in ktcp have been, if not invisible, then acceptable. Slow but good enough.

This changes when ktcp is talking to ktcp, and the challenge gets exacerbated when the systems have significantly different speeds and possibly different NIC buffer capacities - or when both systems are very slow. Like an XT class system running an ne1k NIC with an 8k buffer against a 386 AT class with an ee16 NIC and a 32k or even 64k NIC buffer. The slower the system, the more important the NIC buffer becomes in keeping the flow going. And while the recently introduced driver level buffers in TLVC have had minimal if any effect on faster systems, they seem to make a difference on slower ones. Since slow start and congestion avoidance are mostly about keeping data flowing with reasonable efficiency even when network and resource conditions vary, their presence (or absence) has little effect on interactive (telnet) connections. Even with long listings, such connections are comparably slow, and their 'jaggedness' is usually caused by other limitations. OTOH, telnet connections are useful in testing for that very reason - they pose an entirely different scenario and thus different challenges: small packets, long waits, etc.

Slow Start, Congestion Avoidance (SS,CA)

Slow start and congestion avoidance are different algorithms solving different but related problems. Like siamese twins: working together, implemented together, interdependent. Slow start takes a connection from no traffic to the first 'problem' - a retransmit, triggered by either a timeout or a double or triple ACK from the receiver. The starting 'window size' is one packet, meaning 'send one packet, wait for ack' mode. The window, often referred to as the 'congestion window', then increases by one packet for every ACK received (exponential growth, and not slow at all in spite of the name). At the point where trouble arises (a retransmit is required), the window is divided by 2 and saved as the 'slow start threshold', aka 'ssthrshld', to be used later. Then the window is set back to 1 and the procedure is restarted.

The second time (and later) is different. When the congestion window reaches the value saved in ssthrshld, slow start passes the baton to congestion avoidance, which instead of the exponential window increase uses a linear one: The window increases by one per RTT (connection round trip time) instead of per received ACK. A timed increase instead of an activity based increase. Note that the parameters are gathered from the actual connection, which is a key point: While a receiver may signal its capacity to the sender by advertising a window size and a max segment size, this is the sender's perception of the connection and the receiver's capabilities combined.
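Expressed in code, the textbook logic looks roughly like this - a sketch counted in whole packets to match the discussion below, not ktcp's actual implementation:

    /* Textbook slow start / congestion avoidance, counted in whole packets.
     * Sketch only - not the actual ktcp code. */
    static unsigned int cwnd = 1;           /* congestion window: start at one packet */
    static unsigned int ssthrshld = 0xffff; /* slow start threshold: initially 'infinite' */

    void ack_received(void)
    {
        if (cwnd < ssthrshld)
            cwnd++;             /* slow start: +1 per ACK (exponential per RTT) */
        /* else: congestion avoidance - +1 per RTT, driven by a timer instead */
    }

    void retransmit_needed(void)
    {
        ssthrshld = cwnd / 2;   /* remember half the window where trouble hit */
        if (ssthrshld < 2)
            ssthrshld = 2;      /* the spec minimum is two packets */
        cwnd = 1;               /* and restart from one packet */
    }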

SS, CA and ktcp: When reality meets theory

In literature and net posts, slow start and congestion avoidance are discussed in the context of segments, bytes and window sizes. Also, the discussions (and implementations) are an order of magnitude beyond anything ktcp and a TLVC or ELKS system are capable of in terms of complexity, traffic, buffers and code size. For example, a typical ktcp connection has an outbound effective window size of between 1500 and 2000 bytes, limited by the amount of retransmit buffer space allowed per connection, a compile time constant. In more capable systems, whether it's BSD Unix from the mid 80s or a Mac/Windows/Linux system of our day, it is the receiver's advertised window that determines how much data (and thus how many packets) may be 'inflight' at any time. And it's comparatively huge - 32k, 64k or more.

This difference, or rather, the small resources available to ktcp, severely limits the usefulness of the standard algorithms in TCP/IP, in particular the congestion avoidance part of the SS/CA twins. In an outgoing file transfer situation, the ktcp retransmit buffer typically allows 3 packets of 540 bytes each - the practical 'Max Segment Size' (MSS) defined by the system (actually 512 bytes + overhead). If we run into trouble and need retransmits, the slow start threshold becomes 1 or 2 depending on whether we round up or down. But the spec says the minimum is two packets, so 2 is the choice. But how do we do a linear increase from 2 to 3, which is the max? It's easy, but it's also nonsensical. We don't need an algorithm for that, do we?

There are other settings in which the picture looks slightly different. When running HTTP or TELNET, packet sizes vary wildly and we may at times fit 5 or even 10 packets into the retransmit buffer. Maybe unsurprisingly, this is also the type of scenario that tends to cause trouble. If the sender is at least slightly faster than the receiver, the sender will fill the retransmit buffer with as many packets as possible in a rush, and effectively choke a slow receiver. A timeout follows and the sender will start retransmitting - mechanically, one packet at a time (not back to back), without waiting for acks, while the receiver's window is 'open'. This is obviously unsmart because it kills performance. Still, this is how things worked in ktcp before the major rework in TLVC 2024/2025.

On the other hand, the slow start idea seems immediately attractive: Starting slow will avoid choking a slow receiver and the (relatively) fast acceleration will allow us to discover the receiver's choking point fast. Knowing that, we can adjust the sending to something the receiver can deal with. It sounds easy, but there is a lot more to it than meets the eye at first. Such as the receiver's ability to prioritize network activity when other activities are competing for attention locally. We obviously need a lot of flexibility and adaptability in our slow start implementation, and to consider elements not mentioned in any standard or specification.

The key challenge with a ktcp-ktcp connection is that a very slow receiver will cause the sender to time out and retransmit - not once-in-a-while but frequently, in most cases dumping the entire retransmit buffer. Not because packets have been lost (which may happen when there is a NIC rcv-buffer overrun on the recipient side), but simply because the receiver is slow. Slow start and congestion avoidance help alleviate this (more about that later), but as it turns out, they don't help with the most aggravating problem, the dumping of the entire retransmit buffer.

This is not a functional problem - TCP can deal with it - only a performance problem. What happens in a scenario where the recipient is significantly slower than the sender, with ktcp running at both ends, is as follows:

  1. The recipient doesn't respond for a while, being busy with other activities like disk or floppy IO to a synchronous (non-interrupt-driven) device. A 'while' can be as much as half a second, possibly more - for example if a sync is in progress.
  2. The sender's retrans buffer fills up and the oldest packet times out.
  3. The sender retransmits the first (oldest) unacked packet.
  4. Usually, the next packet would be retransmitted shortly after, because the retransmit routine finds that it too has timed out. However, the retrans_retransmit() function has been modified to limit the number of retransmits inflight to one (see the sketch after this list).
  5. When the first retransmitted packet eventually gets ack'ed by the recipient, the next one gets fired off by the sender almost instantly because it has timed out. This immediate retransmit is exactly what the (slow) recipient doesn't need: It's in the process of, but hasn't yet had the chance to ack the next packet(s) in its input queue (in most cases on the NIC), which likely is the ACK (maybe several) we're waiting for.
  6. So the next retransmit fires off, then the next and the next, until the retrans buffer is empty - completely useless, and ensuring a continuous overload on the recipient.
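A minimal sketch of the 'at most one retransmit inflight' rule mentioned in step 4; all names are illustrative and the real logic lives in ktcp's retransmit code:

    /* Sketch only - illustrative names, not the actual ktcp retransmit code */
    struct rt_entry {
        struct rt_entry *next;      /* oldest-first list of unacked packets */
        int timed_out;              /* set when this entry's timeout has expired */
        int retransmitted;          /* set when resent; entry is removed on ACK */
    };

    void retrans_retransmit(struct rt_entry *list)
    {
        struct rt_entry *r;

        for (r = list; r; r = r->next)
            if (r->retransmitted)
                return;             /* one retransmit already inflight: wait for its ACK */

        for (r = list; r; r = r->next)
            if (r->timed_out) {
                /* resend only the oldest timed-out packet, then stop */
                /* send_packet(r); */
                r->retransmitted = 1;
                return;
            }
    }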

This is obviously a waste of time (slowing down an already slow connection) even though it doesn't break anything. We're already adding an extra delay (the current inflight value) which works well on some systems but not on the slowest. Also, the extra delay is decreasing as more packets get ack'ed while the actual round trip times may be increasing, reaching levels like 1-2.5s.

The pattern we're seeing when doing transfers from a not-so-slow to a very slow system is that

  1. Thanks to the speed difference, the first retrans comes early, just after the chain fills up - the chain being primarily the sender's retrans buffer.
  2. The larger the retrans buffer, the larger the problem - which tells us that it's important to keep the receive window small on slow systems.
  3. When there is a timeout, the entire retrans buffer is almost always sent before 'regular' traffic continues.
  4. As the RTT grows - driven by the retransmits in particular - the situation stabilizes, and we end up with a flow adapted to the recipient's capacity.

Ideally, and this is not in the TCP spec, just common sense given the situation: on timeout the sender should retransmit only the first (oldest) unack'ed packet, then wait for the corresponding ack (or do another retransmit if the ack doesn't come) before continuing with the next packet if still unack'ed. However, ktcp cannot do that because the retrans buffer does not track connections. It's just a collection of packets with a timeout. There is no link between packets belonging to the same connection, and keeping track of retransmits per connection becomes rather complicated.

Such added complexity is as undesirable as the original simplicity is attractive. Fortunately, there is another way - a 'trick' or 'kludge' or whatever, that turns out to work quite well. With the implementation of the new algorithms in ktcp came - for convenience - a per connection counter keeping track of the number of outstanding (unack'ed) packets, commonly known as packets inflight. In most cases when there is a retransmit timeout, the retransmit buffer is filled with packets belonging to this particular connection, which means there are between 3 and 6 packets inflight. Adding this number to the timeout value (unit = 1/16s) before retransmitting any packet turns out to work well, not only as a remedy for the dump-all-packets problem but for retransmits in general: In many cases it gives the slow peer just enough time to ACK one or more inflight packets and get the flow going again.
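In code, the adjustment amounts to something like the check below; the structure and field names (rto, last_sent, inflight) are assumptions for the sketch, not the actual ktcp identifiers:

    /* Sketch: stretch the effective retransmit timeout by the number of packets
     * currently inflight on the connection. Time unit is 1/16 second ticks. */
    struct conn {
        unsigned long rto;          /* retransmit timeout for this connection */
        unsigned long last_sent;    /* tick count when the packet was (re)sent */
        unsigned int  inflight;     /* unacked packets on this connection */
    };

    int retransmit_due(struct conn *c, unsigned long now)
    {
        /* more packets inflight -> give the slow peer proportionally more time */
        return (now - c->last_sent) > (c->rto + c->inflight);
    }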

An interesting question is why the congestion avoidance algorithm doesn't fix this problem. After all, this seems to be exactly the type of problem it is aimed at - a 'lost' packet and a retransmit timeout. The answer is: the algorithm does, but the implementation doesn't. There is also a grey area here, because the packets in the retransmit buffer aren't part of the main flow. Regardless, the implementation should ensure that dumping the retransmit buffer - which keeps the receiver busy receiving instead of sending more acks - doesn't (and cannot) happen.

There are two more measures which turn out to improve the retransmit situation. One is an adjustment of the weighting in the RTT convergence algorithm. For every packet acked, ktcp calculates the round trip time (RTT) for this packet. The RTT is then used to update the connection RTT (and subsequently the RTO). [to be continued]…
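For reference, the classic exponentially weighted RTT update - the kind of weighting being adjusted here - looks like the sketch below; the 7/8 vs 1/8 split is the textbook choice, not necessarily ktcp's:

    /* Classic smoothed-RTT update, shown only to illustrate the weighting;
     * not the actual ktcp code. */
    #define RTT_GAIN 8      /* weight: 7/8 old estimate, 1/8 new measurement */

    void update_rtt(unsigned long *srtt, unsigned long measured)
    {
        *srtt = (*srtt * (RTT_GAIN - 1) + measured) / RTT_GAIN;
        /* the RTO is then derived from srtt, e.g. srtt plus a variance term */
    }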

The other is the size of the retransmit buffer: Too small and it kills performance, too large and it kills recovery from a retransmit. A good size seems to be - with the current test regime - 1560; using MAX/2 is too large. [to be continued]

Worst case test scenarios

There are two settings that create 'extreme' network delays in a TLVC/ELKS system. The most obvious is floppy I/O. When the recipient system needs to write to or flush buffers to floppy and the IO passes via the BIOS, the delays can often be measured in seconds. Should the system have a large buffer cache (possibly even XMS buffers) and a sync() hit, many seconds may elapse before the system is responsive again. Fortunately, really slow (XT-class) systems will never have an XMS buffer cache, which limits the maximum delay.

While this may not be anywhere close to a normal setting, it needs to be handled in a reasonable (robust, predictable and reliable) manner. In this setting, real packet loss due to NIC buffer overruns is common. Using the direct floppy driver makes a significant difference, but doesn't eliminate the problem.

The other setting is when the user accesses the slow system via a serial line running at, say, 9600 baud (the actual output speed is lower, and variable depending on load), then telnets to a different TLVC/ELKS system - which may be faster or the same speed - and runs a long listing, such as hd /bin/vi, to the terminal. The delays caused by waiting for serial line output (currently not interrupt driven) at this speed turn out to be a real worst case scenario for ktcp.

[to be continued]
