Auto-TLS support for py-libp2p #1072
@lla-dane : Hi Abhinav. Fantastic progress on the autotls module. Thank you so much for sharing the details; I appreciate it. Did you find the fix in trio.py? Please also resolve the CI/CD issues whenever you get a chance.
- Enhanced get_remote_address() in TrioTCPStream with address caching and defensive checks to handle socket state transitions gracefully
- Fixed Ed25519PublicKey initialization to use from_bytes() method
- Added proper type annotation for server_id: ID | None
- Added None check for hostname before passing to ClientInitiatedHandshake
- Removed unused variables (commented with explanations for future use)
- Removed dead code (unused function calls with hardcoded port)
- Removed debug print statements in favor of proper logging
- Fixed code formatting, import ordering, and line length violations

This resolves the get_remote_address() exception that was occurring when the Auto-TLS broker dials back into the node. Fixes issue reported in PR #1072 comments.
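The `get_remote_address()` change described above can be pictured roughly as follows. This is a minimal sketch of the caching idea only, not the actual `TrioTCPStream` code: `CachedPeerAddress` and `FakeSocket` are hypothetical names introduced for illustration, and the only real behaviour assumed is that a socket's `getpeername()` raises `OSError` once the socket is closed.

```python
from typing import Optional, Tuple

Address = Tuple[str, int]


class FakeSocket:
    """Hypothetical stand-in for a TCP socket, used only for this demo."""

    def __init__(self, peer: Address) -> None:
        self._peer = peer
        self.closed = False

    def getpeername(self) -> Address:
        # Real sockets raise OSError when queried after close/disconnect.
        if self.closed:
            raise OSError("socket is closed")
        return self._peer


class CachedPeerAddress:
    """Remember the last known peer address so later lookups do not raise."""

    def __init__(self, sock) -> None:
        self._sock = sock
        self._cached: Optional[Address] = None
        self._refresh()  # cache eagerly while the socket is still live

    def _refresh(self) -> None:
        try:
            addr = self._sock.getpeername()
        except OSError:
            return  # keep whatever was cached earlier
        # Defensive check: only accept (host, port, ...) shaped tuples.
        if isinstance(addr, tuple) and len(addr) >= 2:
            self._cached = (str(addr[0]), int(addr[1]))

    def get_remote_address(self) -> Optional[Address]:
        self._refresh()
        return self._cached


sock = FakeSocket(("203.0.113.7", 4001))
stream = CachedPeerAddress(sock)
before = stream.get_remote_address()
sock.closed = True  # simulate the socket changing state mid-connection
after = stream.get_remote_address()
print(before, after)
```

The trade-off in a design like this is that a stale address may be returned after the peer is gone, which is usually acceptable for logging and identify-style lookups.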
Fixes for Auto-TLS PR #1072
This commit addresses the
🔧 Main Fix: Enhanced
async def negotiate(
    self,
    communicator: IMultiselectCommunicator,
    negotiate_timeout: int = DEFAULT_NEGOTIATE_TIMEOUT,
) -> tuple[TProtocol | None, StreamHandlerFn | None]:
    """
    Negotiate performs protocol selection.

    :param stream: stream to negotiate on
    :param negotiate_timeout: timeout for negotiation
    :return: selected protocol name, handler function
    :raise MultiselectError: raised when negotiation failed
    """
    try:
        with trio.fail_after(negotiate_timeout):
            await self.handshake(communicator)

            while True:
                try:
                    print("\nNEGOTIATE LOOP")
                    command = await communicator.read()
                    print("COMMAND: ", command)
                except MultiselectCommunicatorError as error:
                    print("ERROR IN NEGOTIATE READ")
                    raise MultiselectError() from error

                if command == "ls":
                    supported_protocols = [
                        p for p in self.handlers.keys() if p is not None
                    ]
                    response = "\n".join(supported_protocols) + "\n"

                    try:
                        await communicator.write(response)
                    except MultiselectCommunicatorError as error:
                        raise MultiselectError() from error

                else:
                    protocol_to_check = None if not command else TProtocol(command)
                    if protocol_to_check in self.handlers:
                        try:
                            await communicator.write(command)
                        except MultiselectCommunicatorError as error:
                            raise MultiselectError() from error

                        return protocol_to_check, self.handlers[protocol_to_check]
                    try:
                        await communicator.write(PROTOCOL_NOT_FOUND_MSG)
                        print("PROTOCOL NOT IN HANDLERS: ", command)

                    except MultiselectCommunicatorError as error:
                        print("ERROR IN NEGOTIATE WRITE")
                        raise MultiselectError() from error

            raise MultiselectError("Negotiation failed: no matching protocol")
Debugged further and found an issue happening here:
As we can see, the broker wrote /tls/1.0.0 and we wrote back na because we do not have a handler for TLS. After this, the loop should have continued and the broker should have tried another security option, but instead we got a read error.
But this does not happen when I dial our Python node from a go-libp2p node.
There the negotiation continued after this log:
NEGOTIATE LOOP
COMMAND: /tls/1.0.0
PROTOCOL NOT IN HANDLERS: /tls/1.0.0
The read error only happens when the Auto-TLS broker dials in, and I don't understand why.
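The behaviour expected in this thread can be sketched as follows. This is a simplified, synchronous model of the listener side of multistream-select, not py-libp2p's `negotiate` method: `FakeCommunicator` and `negotiate_listener` are hypothetical names, and the initial handshake and the `ls` command are left out. The listener answers `na` to a proposal it has no handler for and keeps reading, so a dialer that offers `/tls/1.0.0` first can still fall back to another option such as `/noise`.

```python
from collections import deque

NA = "na"  # multistream-select reply for "protocol not available"


class FakeCommunicator:
    """Hypothetical in-memory stand-in for the remote dialer."""

    def __init__(self, proposals):
        self._incoming = deque(proposals)
        self.sent = []

    def read(self) -> str:
        if not self._incoming:
            # Models the dialer going away instead of proposing again.
            raise ConnectionError("remote closed the stream")
        return self._incoming.popleft()

    def write(self, message: str) -> None:
        self.sent.append(message)


def negotiate_listener(comm: FakeCommunicator, handlers: dict) -> str:
    """Echo the first supported proposal; answer "na" to everything else."""
    while True:
        proposal = comm.read()
        if proposal in handlers:
            comm.write(proposal)  # echoing the name accepts the protocol
            return proposal
        comm.write(NA)  # reject, then loop to read the next proposal


handlers = {"/noise": object()}

# Dialer that keeps negotiating after "na" (as the go-libp2p node did).
ok = FakeCommunicator(["/tls/1.0.0", "/noise"])
selected = negotiate_listener(ok, handlers)

# Dialer that sends nothing after "na": the next read fails.
broken = FakeCommunicator(["/tls/1.0.0"])
try:
    negotiate_listener(broken, handlers)
    read_failed = False
except ConnectionError:
    read_failed = True

print(selected, ok.sent, read_failed, broken.sent)
```

In this model the read error is a symptom of the remote side not sending a second proposal, which is one possible reading of the broker logs rather than a confirmed diagnosis.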
@lla-dane I can't see the full images. Can you please include the logs as text, with the full commands and output?
Please also add a clear explanation of how to run the test and what you expect.
I'm confused: sometimes I see echo and sometimes ping. Why?
Thanks
Yeah sure @acul71, I will explain everything properly. In the Auto-TLS procedure, the autotls-broker has to dial in to our node (which has to be publicly accessible) and run the identify protocol on it, to check that our node is real. Presently, when the autotls-broker dials in to our node, something goes wrong in the multistream-select protocol negotiation. LOGS: These are the first logs. They come from running the autotls-demo script. Here we got dialed in, in this part
Since the p2p-forge autotls-broker repo (https://github.com/ipshipyard/p2p-forge) uses go-libp2p, I dialed in to our node from a go-libp2p node to see what happens during the multistream-select protocol negotiation.
DIALER:
LISTENER:
Just for debugging purposes, I dialed our py-libp2p node from the echo example of go-libp2p. I only needed to see how the multistream-select protocol negotiation goes.
@acul71: For testing, I have DM'd you the EC2 instance keys and how to connect to the instance on Discord. There you can simply run the
Hello @lla-dane @seetadev

Suspecting it could be a transport-select negotiation issue, I've verified that py-libp2p's transport-select negotiation works correctly with the main branch. The broker negotiation issue does not seem to be a py-libp2p transport-select bug, but rather a broker-specific issue in the HTTP handler context.

Evidence: Broker Simulation Test

I created a Go dialer that exactly replicates the broker's

Test Results

Broker Simulation Code

The following Go program exactly mimics the broker's connection behavior:

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/p2p/net/swarm"
	ma "github.com/multiformats/go-multiaddr"
)

// This mimics what the broker does: creates a libp2p host with default security
// (TLS + Noise) and tries to connect to another peer.
func main() {
	if len(os.Args) < 2 {
		log.Fatal("Usage: dialer <multiaddr>")
	}
	targetAddr := os.Args[1]
	fmt.Printf("Creating libp2p host with default security (TLS + Noise)...\n")

	// Create host EXACTLY like the broker does
	h, err := libp2p.New(
		libp2p.NoListenAddrs,
		libp2p.DisableRelay(),
		libp2p.WithDialTimeout(10*time.Second),
		libp2p.SwarmOpts(swarm.WithDialTimeoutLocal(10*time.Second)),
	)
	if err != nil {
		log.Fatalf("Failed to create host: %v", err)
	}
	defer h.Close()
	fmt.Printf("Host created with ID: %s\n", h.ID())
	fmt.Printf("Dialing target: %s\n", targetAddr)

	// Parse the multiaddr
	maddr, err := ma.NewMultiaddr(targetAddr)
	if err != nil {
		log.Fatalf("Invalid multiaddr: %v", err)
	}

	// Extract peer info
	info, err := peer.AddrInfoFromP2pAddr(maddr)
	if err != nil {
		log.Fatalf("Failed to extract peer info: %v", err)
	}
	fmt.Printf("Target peer ID: %s\n", info.ID)
	fmt.Printf("Target addresses: %v\n", info.Addrs)

	// Try to connect with timeout
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	startTime := time.Now()
	fmt.Printf("Connecting...\n")
	err = h.Connect(ctx, *info)
	duration := time.Since(startTime)
	if err != nil {
		fmt.Printf("FAILED to connect after %v: %v\n", duration, err)
	} else {
		fmt.Printf("SUCCESS! Connected in %v\n", duration)
		// Check what security protocol was used
		conns := h.Network().ConnsToPeer(info.ID)
		for _, conn := range conns {
			fmt.Printf("Connection security: %s\n", conn.ConnState().Security)
		}
	}
}

Python Listener Code

The Python listener used for testing (main branch, no fixes):

#!/usr/bin/env python3
"""
Simple Python libp2p listener for incremental fix testing.
This will be used to test each fix incrementally with the Go dialer.
"""
import logging
import sys
import trio
import multiaddr
import socket

# Enable basic logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

from libp2p import new_host
from libp2p.crypto.secp256k1 import create_new_key_pair
from libp2p.custom_types import TProtocol


def find_free_port():
    """Find a free port for listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('', 0))
        s.listen(1)
        port = s.getsockname()[1]
    return port


PROTOCOL_ID = TProtocol("/echo/1.0.0")


async def echo_handler(stream):
    """Echo handler - reads and echoes back data."""
    print(f"[ECHO] Got stream! Protocol: {stream.get_protocol()}", flush=True)
    try:
        data = await stream.read(1024)
        print(f"[ECHO] Received: {data!r}", flush=True)
        await stream.write(data)
        print(f"[ECHO] Echoed back", flush=True)
    except Exception as e:
        print(f"[ECHO] Error: {e}", flush=True)
    finally:
        await stream.close()


async def main():
    port = find_free_port()
    print(f"[MAIN] Creating libp2p host on port {port}...", flush=True)

    # Create a fresh (random) key pair for testing
    key_pair = create_new_key_pair()
    host = new_host(key_pair=key_pair)
    listen_addr = [multiaddr.Multiaddr(f"/ip4/127.0.0.1/tcp/{port}")]

    async with host.run(listen_addrs=listen_addr):
        # Set up echo handler
        host.set_stream_handler(PROTOCOL_ID, echo_handler)
        peer_id = host.get_id().to_string()
        full_addr = f"/ip4/127.0.0.1/tcp/{port}/p2p/{peer_id}"
        print(f"\n{'='*60}", flush=True)
        print(f"[MAIN] Python libp2p host started!", flush=True)
        print(f"[MAIN] Peer ID: {peer_id}", flush=True)
        print(f"[MAIN] Full address: {full_addr}", flush=True)
        print(f"\n[MAIN] To test with Go dialer, run:", flush=True)
        print(f" /tmp/test_broker_go/dialer_test \"{full_addr}\"", flush=True)
        print(f"{'='*60}\n", flush=True)
        print("[MAIN] Waiting for connections (Ctrl+C to stop)...", flush=True)

        # Wait forever
        try:
            await trio.sleep_forever()
        except KeyboardInterrupt:
            print("\n[MAIN] Shutting down...", flush=True)


if __name__ == "__main__":
    trio.run(main)

Test Execution

Python listener (main branch, no fixes):

cd /tmp/py-libp2p && source venv/bin/activate
python3 test_incremental_fixes.py
# Output: Listening on /ip4/127.0.0.1/tcp/54027/p2p/16Uiu2HAm...

Go dialer (broker simulation):

/tmp/test_broker_go/dialer_test "/ip4/127.0.0.1/tcp/54027/p2p/16Uiu2HAm..."

Result:

What This Proves

Comparison: Broker Replica vs Actual Broker

Key difference: The broker replica runs as a standalone program, while the actual broker runs inside an HTTP handler context.
Final Answer: What Was Actually Needed?

✅ CONCLUSION: Only autotls.py Fix Was Required!

The test with only autotls.py changes (all core modules reverted to original) worked perfectly:

Test Results:
Evidence from Logs:
What This Means:
✅ REQUIRED:
❌ NOT REQUIRED:

Core Modules Were Already Fine!

The py-libp2p core modules were already correct and didn't need any changes. The only issue was in the application code (

Recommendation:
Summary:
The core py-libp2p implementation was already correct! 🎉
Auto-TLS Protocol Compliance Analysis

Auto-TLS Protocol Steps (from spec)

According to

Analysis of Your Output

✅ Step 1: Request Challenge from ACME Server
Evidence from output:
Compliance: ✅ COMPLETE

✅ Step 2: Send Challenge to Broker
Evidence from output:
Compliance: ✅ COMPLETE

✅ Step 3: Broker Tests Node and Sets DNS Record
Evidence from output:
Compliance: ✅ COMPLETE
Note: The broker's

❌ Step 4: Node Queries DNS
Missing from output:
Compliance: ❌ NOT IMPLEMENTED

❌ Step 5: Signal Challenge Completion to ACME
Missing from output:
Compliance: ❌ NOT IMPLEMENTED

❌ Step 6: Poll ACME Server for Challenge Status
Missing from output:
Compliance: ❌ NOT IMPLEMENTED

❌ Step 7: Finalize Certificate Request (CSR)
Missing from output:
Compliance: ❌ NOT IMPLEMENTED

❌ Step 8: Poll ACME Server for Certificate
Missing from output:
Compliance: ❌ NOT IMPLEMENTED

❌ Step 9: Download Certificate
Missing from output:
Compliance: ❌ NOT IMPLEMENTED

Summary

✅ Completed Steps (3/9):
❌ Missing Steps (6/9):

Conclusion

Partial Compliance: The output shows the first 3 steps of the Auto-TLS protocol were completed successfully:

However, the remaining 6 steps are not implemented in the current

The code stops after receiving the HTTP 200 OK from the broker, which indicates the broker successfully verified the node's reachability. But the full Auto-TLS flow requires completing the ACME certificate issuance process.

Recommendation

To achieve full Auto-TLS protocol compliance, the following steps need to be implemented:

The current implementation is a proof-of-concept that demonstrates the broker dial-back mechanism works, but it does not complete the full certificate issuance flow.
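The nine steps named in the analysis above can be tracked with a small checklist. This is an illustrative sketch only: the `AutoTLSStep` enum and the `compliance_report` helper are hypothetical names, not part of py-libp2p, and the step names are simply transcribed from the analysis.

```python
from enum import IntEnum


class AutoTLSStep(IntEnum):
    """The nine Auto-TLS steps, in the order used in the analysis."""

    REQUEST_ACME_CHALLENGE = 1
    SEND_CHALLENGE_TO_BROKER = 2
    BROKER_TESTS_NODE_AND_SETS_DNS = 3
    NODE_QUERIES_DNS = 4
    SIGNAL_CHALLENGE_COMPLETION = 5
    POLL_CHALLENGE_STATUS = 6
    FINALIZE_CSR = 7
    POLL_FOR_CERTIFICATE = 8
    DOWNLOAD_CERTIFICATE = 9


def compliance_report(completed: set) -> dict:
    """Split the steps into done and missing, keeping protocol order."""
    done = sorted(s for s in AutoTLSStep if s in completed)
    missing = sorted(s for s in AutoTLSStep if s not in completed)
    return {
        "done": [s.name for s in done],
        "missing": [s.name for s in missing],
        "summary": f"{len(done)}/{len(AutoTLSStep)} steps completed",
        "full": not missing,
    }


# State described in the analysis: steps 1 to 3 are done.
report = compliance_report({AutoTLSStep(n) for n in (1, 2, 3)})
print(report["summary"])
print("next step:", report["missing"][0])
```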
Force-pushed from 855360f to 8898d1b.
All the linting errors and merge conflicts are resolved. The working example of autotls is under
Here's the demo: autotls.mp4
Next I will start restructuring all the code into a separate autotls module, and set up a clean and modular example from there. CCing: @seetadev
Force-pushed from fb3002a to 16955b8.
autotls-mpv.mp4
This is the latest demo of the Auto-TLS procedure. There are a few vulnerabilities (and things to do) in the TLS handshake procedure. Some of the hacks that I did were:
We can get the certificate on the listener's side by giving its SSL context a trust store, but I need to see how to do it properly; that will be done in future PRs.
There may be many more issues that you find while reviewing; I will resolve them as soon as they are flagged. Docstrings are still remaining, and I will work on them next.
def generate_rsa_key(bits: int = 2048) -> RSAPrivateKey:
    key = rsa.generate_private_key(
        public_exponent=65537,
        key_size=bits,
        backend=default_backend(),
    )
    return key
Currently the procedure uses this RSA key generation technique; I will switch to native RSA key generation in future PRs.
# TODO: need to also verify the signature sent by the server
# sig_b64 = msg["sig"]  # Will be used for signature verification
This signature verification is still pending; I will complete it in an upcoming PR.
Add the auto-generated examples.autotls.rst file to the repository so that ReadTheDocs can find it when building the documentation. This file is generated by sphinx-apidoc and is referenced in the examples.rst toctree.
Hey @pacrob, @seetadev, @acul71: I have added the docstrings and a few cleanups. The HTTP utilities that I used for closing the exchanges with the ACME server and the libp2p-forge listener were recommended by an AI agent, so it may well be that this could have been done in a simpler way. I had limited knowledge of how HTTP APIs work while writing this, so I relied on AI and focused on making it work. If there is feedback on refactoring the HTTP utilities or patterns used in the ACME and AutoTLS-broker negotiation code, I will follow up in a future PR, along with the 2 changes that I have mentioned in the comments. Other than that, I will fix any logic errors that are flagged in this PR. Below is a screenshot of the working ping exchange over TCP (I will integrate it with WebTransport and QUIC in a future PR) between 2 peers, with AutoTLS certificates being used in the TLS handshake.
LISTENER
DIALER
async def _do_primitive_key_exchange(self) -> None:
    """
    Perform a primitive key exchange with the remote peer.

    Sends the local public key over the raw connection, receives the
    peer's key, and stores it along with the derived Peer ID for subsequent use.

    :return: None
    """
    pk_bytes = self.local_prim_pk

    # Exchange: send our serialized public key, then read the peer's
    await self.raw_connection.write(pk_bytes)
    data = await self.raw_connection.read(36)

    # Decode the protobuf wrapper and rebuild the Ed25519 public key
    pub_key_pb = PublicKey.deserialize_from_protobuf(data)
    ed25519_key = Ed25519PublicKey.from_bytes(pub_key_pb.data)

    # Keep the peer's key and the Peer ID derived from it
    self.remote_primitive_pk = ed25519_key
    self.remote_pid = ID.from_pubkey(ed25519_key)
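The `read(36)` in the excerpt above matches the size of a protobuf-encoded libp2p `PublicKey` message carrying an Ed25519 key: two bytes for the key type field, two bytes for the data field header, and 32 bytes of raw key. A minimal sketch of that arithmetic with the encoding written out by hand; `encode_ed25519_pubkey` is a hypothetical helper, not py-libp2p API, and the all-zero key is a placeholder rather than a real key.

```python
ED25519_KEY_TYPE = 1  # KeyType value for Ed25519 in the libp2p keys protobuf
RAW_KEY_LEN = 32      # an Ed25519 public key is always 32 bytes


def encode_ed25519_pubkey(raw_key: bytes) -> bytes:
    """Hand-rolled protobuf encoding of PublicKey{Type=Ed25519, Data=raw_key}."""
    if len(raw_key) != RAW_KEY_LEN:
        raise ValueError("Ed25519 public keys must be 32 bytes")
    type_field = bytes([0x08, ED25519_KEY_TYPE])  # field 1, varint
    data_header = bytes([0x12, RAW_KEY_LEN])      # field 2, length-delimited
    return type_field + data_header + raw_key


encoded = encode_ed25519_pubkey(bytes(32))
print(len(encoded))       # 36
print(encoded[:4].hex())  # 08011220
```

A single `read(36)` on a stream may return fewer bytes than requested, so a hardened version would probably loop until all 36 bytes have arrived; whether `raw_connection.read` already guarantees that is not visible in the excerpt. The fixed length also ties the exchange to Ed25519 keys.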
@lla-dane : Hey Abhinav, this is really solid work. 👏 A few highlights from my side:
Overall, this PR meaningfully moves py-libp2p forward. It unblocks Auto-TLS as a real, usable capability and sets us up nicely for QUIC, WebTransport, and broader browser-facing use cases in follow-ups. From my side, this looks ready to merge once the remaining checks settle. 🙌 Huge thanks for the persistence and for tackling something this deep in the stack; I really appreciate the work here.
@acul71, @pacrob: I also wish to thank you for your great support and for helping us get unblocked multiple times while working on the PR. Luca, your help at the beginning was simply awesome :) Appreciate it.





Aims to resolve #555
This PR introduces Auto-TLS support for py-libp2p, aligned with the libp2p Auto-TLS client specification. The goal is to allow libp2p nodes to automatically obtain and manage TLS certificates issued by a CA via ACME, without relying on self-signed certificates.
Importance of Auto-TLS
In many environments that rely on TLS, most notably the browser ecosystem and standard HTTPS stacks, self-signed certificates are not accepted by default. Browsers, reverse proxies, and many client libraries enforce Web PKI validation rules that require certificates to chain back to a trusted Certificate Authority. As a result, self-signed certificates often require custom trust configuration or are rejected outright.
By using CA-authorized certificates, py-libp2p transports can operate within these existing constraints instead of working around them.
Implementation
At a high level, this work wires together:
libp2p.direct namespace.
The implementation closely follows the Auto-TLS client spec and the peer-id-auth flow described in the libp2p specs, and is intended to be a foundation that transports (e.g. QUIC / TLS / WebTransport) can build on.
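For context on the ACME side of this flow: Auto-TLS relies on the ACME DNS-01 challenge, where the value published in the `_acme-challenge` TXT record is derived from the challenge token and the account key thumbprint (RFC 8555, section 8.4). A minimal sketch of that derivation using only the standard library; the function names and the token and thumbprint values are made up for illustration, and this is not the code from this PR.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, jwk_thumbprint: str) -> str:
    """TXT record value: base64url(SHA-256(token + "." + thumbprint))."""
    key_authorization = f"{token}.{jwk_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


# Placeholder inputs; real values come from the ACME server and account key.
txt = dns01_txt_value("example-token", "example-thumbprint")
print(len(txt))  # 43
```

In the Auto-TLS flow the node does not set this record itself; it hands the value to the broker, which publishes it after dialing back to the node.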
References:
https://github.com/libp2p/specs/blob/master/tls/autotls-client.md
https://github.com/libp2p/specs/blob/master/http/peer-id-auth.md
https://blog.libp2p.io/autotls/