Skip to content

Conversation

@danolivo
Copy link
Contributor

This commit made it clear that the recently added (c436818) spock_shmem_attach is unnecessary, complicates the code, and is likely to cause future bugs. I removed that old stuff and manually tested how it works during recovery, during checkpoints, and after a hard postmaster SIGKILL.

@danolivo danolivo requested a review from mason-sharp December 24, 2025 12:51
@danolivo danolivo self-assigned this Dec 24, 2025
@danolivo danolivo added the bug Something isn't working label Dec 24, 2025
claude and others added 10 commits December 26, 2025 12:41
Allow PostgreSQL's configure script to auto-detect llvm-config
instead of hardcoding the path to /usr/bin/llvm-config-64.

This improves portability across different systems and LLVM
installations, as the hardcoded path may not exist on all
distributions. The --with-llvm flag is sufficient for configure
to find the appropriate llvm-config binary automatically.

Based on commit e41e201 from dockerfile-stable-llvm-fix branch.
Replace the anti-pattern of setting environment variables via .bashrc
with proper Docker ENV directives. This ensures:

- Environment variables are available in all RUN commands
- No need to source .bashrc in build steps
- More reliable and Docker-native approach
- Cleaner and more maintainable Dockerfile

Variables moved to ENV:
- PATH: Includes PostgreSQL bin directory
- LD_LIBRARY_PATH: Includes PostgreSQL lib directory
- PG_CONFIG: Points to pg_config binary

This simplifies the Spock build step significantly.
Reformat the configure command from a single unreadable line with
eval and quoted options to a clean multi-line format.

Changes:
- Remove unnecessary eval (potential security concern)
- Remove shell variable with quoted options pattern
- Use direct ./configure with multi-line arguments
- Each option on its own line for easy review
- Improved maintainability for future option changes

This makes it much easier to:
- Review which features are enabled
- Add or remove configure options
- Understand the build configuration at a glance
Merge multiple RUN commands into single layers to reduce the number
of layers in the final Docker image. This optimization:

- Reduces image size by combining related operations
- Improves build efficiency
- Better utilizes Docker layer caching
- Follows Dockerfile best practices

Changes:
- PostgreSQL clone + chmod: 2 RUN → 1 RUN
- pgedge setup: 3 RUN → 1 RUN
- PostgreSQL compile: 2 RUN → 1 RUN
- Spock compile: 3 RUN → 1 RUN
- Added descriptive comments for each build stage
- Removed duplicate WORKDIR directive

Total reduction: ~6 fewer layers
Introduce a build argument to control the number of parallel make jobs
instead of using hardcoded values that differed between PostgreSQL
and Spock builds.

Changes:
- Add ARG MAKE_JOBS=4 (default value)
- Replace hardcoded -j4 (PostgreSQL) with -j${MAKE_JOBS}
- Replace hardcoded -j16 (Spock) with -j${MAKE_JOBS}
- Ensures consistent parallelism across all builds

Benefits:
- Can override at build time: --build-arg MAKE_JOBS=16
- Consistent behavior between PostgreSQL and Spock
- Easy to tune for different build environments
- Default of 4 is conservative and works on most systems

The previous inconsistency (j4 vs j16) is now eliminated.
Replace the two-step pattern of COPY + RUN chmod with the modern
COPY --chmod directive available in BuildKit.

Changes:
- COPY --chmod=755 instead of COPY + RUN sudo chmod +x
- Eliminates one RUN layer
- Removes unnecessary sudo usage

Benefits:
- One fewer layer in the final image
- Cleaner and more concise Dockerfile
- Uses modern Docker/BuildKit features
- Atomic operation (no window where files lack execute permission)

This is the recommended approach in modern Dockerfiles.
Major improvements to the base image to fix several critical issues:

1. **Inline package list** - Removed dependency on lib-list.txt file
   - Eliminates need for build context with external files
   - Base image can now be built independently
   - All dependencies explicitly listed in Dockerfile

2. **Fix SSH key location** - SSH keys now created for pgedge user
   - Previously created in /root/.ssh (broken!)
   - Now correctly created in /home/pgedge/.ssh
   - Proper permissions (700 for .ssh, 600 for files)
   - Actually usable for SSH-based testing

3. **Set WORKDIR** - Added WORKDIR /home/pgedge
   - Child images no longer need to set it immediately
   - More sensible default than /

4. **Remove unused directory creation**
   - Removed mkdir /home/pgedge/spock
   - Child images create this when COPYing anyway
   - Eliminates wasted layer

5. **Better documentation**
   - Enhanced header comments
   - Inline comments for each section
   - Additional LABEL for description
   - Clarifies image purpose and contents

6. **Cleanup dnf cache** - Added dnf clean all
   - Reduces final image size

This makes the base image self-contained and actually usable
as a standalone build artifact.
Clarify and fix the user context throughout the build process to
eliminate confusion and properly handle permissions.

Changes:

1. **Explicit USER root at start**
   - Base image ends as USER pgedge
   - Step-1 needs root for installations
   - Now explicitly documented with comment

2. **Proper ownership after COPY**
   - Added chown after copying spock source
   - Ensures pgedge user can access files
   - Prevents permission issues during build

3. **Remove sudo, use su instead**
   - Changed: sudo -u pgedge → su - pgedge -c
   - More explicit about running as different user
   - Added chown before running install script

4. **Better comments on USER switching**
   - Documented why we switch to root
   - Documented why we switch back to pgedge
   - Makes build process clear

Why this matters:
- Eliminates confusion about current user context
- Proper permissions throughout build
- Better suited for CI/CD (GitHub Actions)
- More maintainable and debuggable

The image now has clear user context at each stage:
- Start: root (for installations)
- Runtime: pgedge (for testing)
The SpockApplyProgress structure contained a redundant 'key' field that
duplicated information already present in the containing SpockGroupEntry.
This redundancy violated the DRY principle and could lead to inconsistency
if the key and entry's key ever diverged.

This commit aligns Spock with PostgreSQL core RMGR conventions by:

1. Removing the 'key' field from SpockApplyProgress
2. Introducing a separate xl_progress structure for WAL records that
   explicitly includes both the key (dbid, node_id, remote_node_id) and
   the progress data
3. Updating all callers to pass OIDs explicitly rather than relying on
   the embedded key field

Benefits:
- Eliminates data duplication and potential inconsistency
- Makes the separation between in-memory and on-disk representations clearer
- Follows PostgreSQL conventions for RMGR WAL record structures
- Removes unnecessary assertions that checked key field consistency

The xl_progress structure now serves as the canonical WAL format, while
SpockApplyProgress is purely the in-memory progress data structure.
Being loaded on startup, Spock always initializes its shared memory segment
and shared HTABs in the postmaster process as well as all local pointers to
shared memory.

There is no case when the postmaster does not call shmem_startup_hook() after
cleaning up its shared memory. Also, there is no case when the postmaster
does any sensitive operations before initializing shared memory.

In case of recovery, evidence that shared memory is already initialized and
operable is the fact that redo operations can acquire locks.

Remove the 'attach' routine and its calls in checkpoint hook and bgworker
initialization to make the code more straightforward, reduce the code base,
and reduce potential errors.

Additional code quality improvements:
- Clarify TODO comment about undefined behavior with padding bytes
- Improve function documentation for return values (spock_group_progress_update)
- Remove redundant Assert statement that duplicated runtime check
- Clarify comments about progress structure initialization

This is a partial revert of commit c436818.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants