init_c_runtime BSS clear overshoots __bss_end by up to 32 bytes #141

@dogruis

Description

movem.l %d0-%d7/%a0-%a7, -(%sp)

The BSS clear loop in init_c_runtime overshoots __bss_end by 1-32 bytes
due to dbra semantics plus missing tail handling:

        movea.l #__bss_start_in_ram, %a3
        move.l  #__bss_end, %d0
        sub.l   #__bss_start, %d0
        lsr.w   #5, %d0           ; d0 = size / 32

.Lcopybss:
        movem.l %d2-%d7/%a0-%a1, (%a3)   ; writes 32 bytes
        lea.l   0x20(%a3), %a3
        dbra    %d0, .Lcopybss

dbra decrements %d0 and branches back unless the result is -1, so a loop
entered at the top with %d0 = N runs the body N+1 times: the final
iteration executes, then dbra takes %d0 to -1 and falls through. With
size = 64 (so initial %d0 = 2) the loop writes 3 blocks = 96 bytes, a
32-byte overshoot. Because there is also no tail handling, the loop
always writes (size / 32 + 1) * 32 bytes (integer division) and so
overshoots by 32 - (size mod 32) bytes. The overshoot is always at least
1 byte and at most 32.
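
The arithmetic above can be checked on the host with a small model. This is only a sketch, not code from crt0.S: the function names dbra_iterations and original_bss_clear are made up for illustration, and the model assumes the size fits in 16 bits because the loop works on the low word of %d0.

```python
BLOCK = 32  # bytes written per movem.l of 8 registers


def dbra_iterations(counter):
    """Body executions for a loop entered at the top and closed by dbra."""
    counter &= 0xFFFF
    iterations = 0
    while True:
        iterations += 1                     # the loop body runs first
        counter = (counter - 1) & 0xFFFF    # dbra decrements the low word
        if counter == 0xFFFF:               # result is -1: fall through
            return iterations


def original_bss_clear(size):
    """Return (bytes_written, overshoot) for the current clear loop."""
    d0 = (size & 0xFFFF) >> 5               # lsr.w #5, %d0
    written = dbra_iterations(d0) * BLOCK
    return written, written - size


for size in (1, 31, 32, 33, 64):
    written, overshoot = original_bss_clear(size)
    print(f"size={size:3d} written={written:3d} overshoot={overshoot:2d}")
```

For size = 64 the model reports 96 bytes written, matching the trace in this issue, and for every size it tried the overshoot equals 32 - (size mod 32).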

Today this is masked by the RAM layout in runtime/ngdevkit.ld:

__data_start_in_ram = (__bss_start_in_ram + SIZEOF(.bss) + 3) / 4 * 4;

.data is placed immediately after .bss in RAM, so the BSS overshoot
lands on the head of .data and is rewritten by the data-copy loop a few
lines down. Any future change putting something else past BSS (debug pads,
malloc heap, ngdevkit-managed scratch) would expose silent corruption at
every boot.
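
To make the masking concrete, the sketch below evaluates the linker expression for a few BSS sizes and splits the overshoot into alignment padding and bytes that land inside .data. The base address and the helper names (data_start_in_ram, overshoot_split) are assumptions for illustration, not values or symbols taken from ngdevkit.ld.

```python
BLOCK = 32  # bytes written per loop iteration (8 registers x 4 bytes)


def data_start_in_ram(bss_start, bss_size):
    # Mirrors (__bss_start_in_ram + SIZEOF(.bss) + 3) / 4 * 4
    return (bss_start + bss_size + 3) // 4 * 4


def overshoot_split(bss_start, bss_size):
    """Return (padding_bytes, data_bytes) written past the end of .bss."""
    bss_end = bss_start + bss_size
    # The current loop writes N+1 blocks, with N = size / 32.
    write_end = bss_start + ((bss_size >> 5) + 1) * BLOCK
    data_start = data_start_in_ram(bss_start, bss_size)
    padding = min(write_end, data_start) - bss_end
    data = max(0, write_end - data_start)
    return padding, data


BSS_START = 0x100000  # assumed 4-byte-aligned base, for illustration only
for size in (31, 33, 64):
    print(size, overshoot_split(BSS_START, size))
```

With these inputs, a 33-byte .bss puts 3 bytes in padding and 28 bytes in the head of .data, while a 64-byte .bss puts all 32 overshoot bytes in .data.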

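The data-copy loop right below at crt0.S:215-229 already handles both
problems correctly (it branches into the dbra test before the first
iteration and copies the tail byte by byte) and is the obvious template: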

        move.w  %d0, %d1
        lsr.w   #5, %d0
        andi.w  #0x001F, %d1      ; tail byte count
        bra     .Lcopydata_begin  ; pre-branch into the dbra test
.Lcopydata:
        movem.l (%a2)+, %d2-%d7/%a0-%a1
        movem.l %d2-%d7/%a0-%a1, (%a3)
        lea.l   0x20(%a3), %a3
.Lcopydata_begin:
        dbra    %d0, .Lcopydata

        bra     .Lcopylastdata_begin
.Lcopylastdata:
        move.b  (%a2)+, (%a3)+
.Lcopylastdata_begin:
        dbra    %d1, .Lcopylastdata

Suggested fix - mirror the same shape for the BSS clear:

        movea.l #__bss_start_in_ram, %a3
        move.l  #__bss_end, %d0
        sub.l   #__bss_start, %d0
        move.w  %d0, %d1
        lsr.w   #5, %d0
        andi.w  #0x001F, %d1
        bra     .Lcopybss_begin
.Lcopybss:
        movem.l %d2-%d7/%a0-%a1, (%a3)
        lea.l   0x20(%a3), %a3
.Lcopybss_begin:
        dbra    %d0, .Lcopybss

        bra     .Lcopylastbss_begin
.Lcopylastbss:
        clr.b   (%a3)+
.Lcopylastbss_begin:
        dbra    %d1, .Lcopylastbss
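
As a sanity check of the suggested shape, here is a host-side model of the two loops. It is a sketch under stated assumptions, not a test from the repository: the function names are hypothetical, and sizes are assumed to fit in 16 bits since the code uses word-sized operations.

```python
BLOCK = 32  # bytes written per movem.l of 8 registers


def dbra_entered_at_test(counter):
    """Body executions when the loop is entered by branching to its dbra."""
    counter &= 0xFFFF
    iterations = 0
    while True:
        counter = (counter - 1) & 0xFFFF    # dbra runs before any body
        if counter == 0xFFFF:               # result is -1: fall through
            return iterations
        iterations += 1                     # branch taken, body runs


def fixed_bss_clear(size):
    """Bytes written by the suggested block loop plus byte-wise tail loop."""
    d0 = size & 0xFFFF
    d1 = d0 & 0x001F                        # andi.w #0x001F, %d1
    d0 >>= 5                                # lsr.w #5, %d0
    blocks = dbra_entered_at_test(d0)       # .Lcopybss
    tail = dbra_entered_at_test(d1)         # .Lcopylastbss
    return blocks * BLOCK + tail


for size in (0, 1, 31, 32, 33, 64):
    print(size, fixed_bss_clear(size))
```

In this model the number of bytes written equals the region size for every input, including 0 and exact multiples of 32.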

Repro:

Paper trace - enter .Lcopybss with %d0 = 2 (64-byte region).
After iter 1 %d0 = 1, iter 2 %d0 = 0, iter 3 %d0 = -1 and the loop
falls through. Three iterations, 96 bytes written.

Or set a write watchpoint in MAME/gdb at __bss_end and run any project
where (SIZEOF(.bss) & 0x1F) != 0; observe the clear loop writing at and
past that address on boot.
