diff --git a/docs/api.md b/docs/api.md index e2b6138..7f0f28f 100644 --- a/docs/api.md +++ b/docs/api.md @@ -1,8 +1,8 @@ *** -# File Management Endpoint (`/files`) +# File Management API -This endpoint provides file management capabilities, allowing clients to upload, retrieve, and manage files through various HTTP methods. `http_files_api` must be set to `True` in pyrobusta.env to enable this API. +This API provides file management capabilities, allowing clients to upload, retrieve, and manage files through various HTTP methods. `http_files_api` must be set to `True` in pyrobusta.env to enable this API. ## Summary @@ -19,12 +19,22 @@ This endpoint provides file management capabilities, allowing clients to upload, ### 1. File Retrieval/Listing (`GET /files/{path}`) -This endpoint allows general file system interaction, enabling operations such as listing directory contents and retrieving metadata as well as downloading files. +This method allows general file system interaction, enabling operations such as listing directory contents and retrieving metadata as well as downloading files. * **Method:** `GET` * **Path:** `/files/{path}` * **Success Response:** 200 OK. +#### Example request + +```bash +$ curl 192.168.1.100/files/www +[ + {"path": "/www/examples.html", "created": "90", "size": "4507"}, + {"path": "/www/index.html", "created": "91", "size": "1198"} +] +``` + ### 2. File Upload / Overwrite (`PUT /files/{file path}`) This method is used to upload a file or overwrite an existing file at a specific path. @@ -36,19 +46,42 @@ The upload path is restricted to /www/user_data. * **Success Response:** 201 Created. * **Notes:** `transfer-encoding: chunked` is supported. +#### Example request + +```bash +$ curl -X PUT --data 'This is a test.' http://192.168.1.100/files/www/user_data/test.txt +OK + +$ curl 192.168.1.100/files/www/user_data/test.txt +This is a test. +``` + ### 3. 
File Upload (`POST /files`) This method handles general file uploads, designed for uploading multiple files with per-file chunking supported. Only multipart/form-data is accepted as a content type. The upload path is restricted to /www/user_data, however, content-disposition headers only have to specify the file name, /www/user_data is prepended by default. -`http_multipart` must be set to `True` in the configuration to use this endpoint. +`http_multipart` must be set to `True` in the configuration to use this method. * **Method:** `POST` * **Path:** `/files` * **Body:** File content encapsulated in multipart/form-data. * **Success Response:** 201 Created. +#### Example request + +```bash +$ echo "File 1 content" > /tmp/upload-1.txt +$ echo "File 2 content" > /tmp/upload-2.txt +$ curl -X POST --form file1='@/tmp/upload-1.txt' --form file2='@/tmp/upload-2.txt' http://192.168.1.100/files +$ curl 192.168.1.100/files/www/user_data +[ + {"path": "/www/user_data/upload-1.txt", "created": "418", "size": "15"}, + {"path": "/www/user_data/upload-2.txt", "created": "418", "size": "15"} +] +``` + ### 4. File Delete (`DELETE /files/{file path}`) This method is used to delete a file at a specific path. @@ -57,3 +90,9 @@ The path is restricted to /www/user_data. * **Method:** `PUT` * **Path:** `/files/{file path}` * **Success Response:** 204 No Content. + +#### Example request + +```bash +$ curl -X DELETE 192.168.1.100/files/www/user_data/test.txt +``` diff --git a/docs/architecture/state_machine.md b/docs/architecture/state_machine.md new file mode 100644 index 0000000..79f7160 --- /dev/null +++ b/docs/architecture/state_machine.md @@ -0,0 +1,91 @@ +# HTTP state machine parser + +[http.py](../../src/pyrobusta/protocol/http.py) implements a continuation-passing parser using a +finite state machine (FSM). Each state either consumes enough of the available data to make progress or explicitly +suspends until more data arrives. 
+ +In general, states are not required to transition to a terminal state if a request is incomplete. +Instead, states return control to the asyncio event loop, which drives subsequent invocations of the +state machine based on socket readiness. The state machine may be terminated by the surrounding coroutine in +the case of a session timeout or transport error. This is a deliberate architectural decision to separate HTTP +protocol semantics from transport-level I/O scheduling concerns. + +The state machine can be decomposed into four sub-FSMs, shown in the diagrams below. The state machine applies +to a single HTTP session, which has dedicated request and response stream buffers. + + +## HTTP Request Line and Header Parsing +```mermaid +stateDiagram-v2 + + [*] --> start_parser + + start_parser --> parse_request_line_st: rx.size() > 0 + start_parser --> start_parser: empty buffer + + parse_request_line_st --> parse_headers_st: valid request line parsed + parse_request_line_st --> parse_request_line_st: incomplete line + parse_request_line_st --> [*]: 405/505 terminate + + parse_headers_st --> route_request_st: headers complete + parse_headers_st --> parse_headers_st: waiting for \r\n\r\n + parse_headers_st --> [*]: invalid headers (host missing etc.) 
+``` + +## Routing and Body Strategy Selection +```mermaid +stateDiagram-v2 + + route_request_st --> app_endpoint_st: endpoint + no payload + + route_request_st --> recv_payload_st: content-length body + route_request_st --> recv_chunk_size_st: chunked encoding + route_request_st --> start_multipart_parser_st: multipart body + + route_request_st --> fs_retrieve_st: GET/HEAD fallback file server + + route_request_st --> [*]: 404 no route + route_request_st --> [*]: 405 method not allowed + route_request_st --> [*]: 204 OPTIONS + + recv_payload_st --> app_endpoint_st: full body received + recv_payload_st --> recv_payload_st: waiting for content-length + + recv_chunk_size_st --> recv_chunk_st: size parsed + recv_chunk_size_st --> recv_chunk_size_st: waiting for chunk size + + recv_chunk_st --> app_endpoint_st: chunk complete + recv_chunk_st --> recv_chunk_st: waiting for full chunk +``` + +## Application Execution and Response Generation +```mermaid +stateDiagram-v2 + + app_endpoint_st --> app_endpoint_st: execute callback / process request + + app_endpoint_st --> recv_chunk_size_st: more chunked data expected + + app_endpoint_st --> generate_multipart_response_st: multipart response + + app_endpoint_st --> [*]: 200 OK (default completion) + + fs_retrieve_st --> [*]: 200 file served + fs_retrieve_st --> [*]: 403 forbidden + fs_retrieve_st --> [*]: 404 file missing + + generate_multipart_response_st --> [*]: 200 headers set + stream ready +``` + +## Multipart Request Processing +```mermaid +stateDiagram-v2 + + start_multipart_parser_st --> parse_boundary_st: boundary validated + + parse_boundary_st --> parse_complete_part_st: boundary detected + parse_boundary_st --> parse_boundary_st: waiting for boundary + + parse_complete_part_st --> parse_boundary_st: more parts remain + parse_complete_part_st --> [*]: final part processed (200) +``` \ No newline at end of file diff --git a/docs/configuration.md b/docs/configuration.md index 428ad33..b6eb3da 100644 --- 
a/docs/configuration.md +++ b/docs/configuration.md @@ -12,7 +12,7 @@ to upload it to the root directory of the target device. | http_multipart | Enable multipart HTTP requests/responses. | False | | http_mem_cap | Max memory cap (% × 0.01) of usable heap for HTTP request/response stream buffers. | 0.1 | | http_served_paths | Space delimited list of filesystem paths allowed to be served through HTTP. | "/www /lib/pyrobusta" | -| http_files_api | Enables or disables the file management API endpoint (/files), allowing to upload, download, and list files. | False | +| http_files_api | Enables or disables the [file management API](./api.md#file-management-api) endpoint (/files), allowing to upload, download, and list files. | False | | socket_max_con | Max number of socket connections of any enabled application server. | 2 | | tls | Enables or disables TLS. When turned on, cert.der/key.der must be installed at the root. | False | | log_level | Can be one of: warning, info, debug. | "info" | diff --git a/src/pyrobusta/bindings/http_connection.py b/src/pyrobusta/bindings/http_connection.py index 8577832..876937d 100644 --- a/src/pyrobusta/bindings/http_connection.py +++ b/src/pyrobusta/bindings/http_connection.py @@ -91,7 +91,10 @@ async def _run_state_machine(self): if not self._engine.is_request_empty() and self._engine.is_terminated(): self._engine.write_response_head(self._send_buf) await self._flush_response() - if self._engine.resp_handler is not None: + if ( + self._engine.resp_handler is not None + and not self._engine.method == self._engine.HEAD + ): await self._response_handler(self._engine.resp_handler) async def _response_handler(self, resp_handler): diff --git a/src/pyrobusta/protocol/http.py b/src/pyrobusta/protocol/http.py index b14fead..43462d0 100644 --- a/src/pyrobusta/protocol/http.py +++ b/src/pyrobusta/protocol/http.py @@ -488,7 +488,7 @@ def set_response_body( object, stored by the resp_handler member. 
resp_handler can be used for writing the body by the transport layer. This method also updates the content-type and content-length - headers. + headers. In the case of a HEAD request, the body is omitted. :param body: body to be sent in the response :param content_type: content-type of the body """ @@ -616,6 +616,20 @@ def has_payload(self): "content-length" in self.headers and self.headers["content-length"] > 0 ) or self.is_chunked() + def _consume_payload(self, rx, size, last=False): + """ + Consume data from the request buffer and increment content length counter. + Raise an exception if the content length is exceeded. Allow strict checking + of content length when the last flag is set. + """ + if "content-length" in self.headers and ( + (self.content_len_cnt + size > self.headers["content-length"]) + or (last and self.headers["content-length"] != self.content_len_cnt + size) + ): + raise InvalidContentLength() + self.content_len_cnt += size + rx.consume(size) + # ================================================================================ # Parser states # - all states must handle rx buffer argument for reading request data @@ -690,7 +704,7 @@ def _route_request_st(self, _): self.terminate(204, True) return if self.has_payload(): - if self.method == self.HEAD: + if self.method in (self.GET, self.HEAD): raise MalformedRequest() if mp_boundary := self._get_mp_boundary(self.headers): # Request body is multipart @@ -731,7 +745,7 @@ def _recv_chunk_size_st(self, rx): self.recv_chunk_size = int(bytes(rx.peek(blank_idx)), 16) if self.recv_chunk_size < 0: raise InvalidContentLength() - rx.consume(blank_idx + 2) + self._consume_payload(rx, blank_idx + 2) self.state = self._recv_chunk_st def _recv_chunk_st(self, rx): @@ -756,6 +770,8 @@ def _recv_payload_st(self, rx): def _app_endpoint_st(self, rx): """ Process a request by registered callback functions. 
+ HEAD requests are temporarily mapped to GET for routing and callback execution, + but the response body is not sent back. """ method = self.GET if self.method == self.HEAD else self.method callback = self._get_callback(self.url, method) @@ -763,15 +779,17 @@ def _app_endpoint_st(self, rx): if self.is_chunked(): if self.recv_chunk_size: callback(self, bytes(rx.peek(self.recv_chunk_size))) - rx.consume(self.recv_chunk_size + 2) + self._consume_payload(rx, self.recv_chunk_size + 2) self.state = self._recv_chunk_size_st return + # Last chunk, callback with empty body to signal end of request body callback_response = callback(self, b"") - rx.consume(self.recv_chunk_size + 2) + self._consume_payload(rx, self.recv_chunk_size + 2, last=True) else: callback_response = callback( self, bytes(rx.peek(self.headers["content-length"])) ) + self._consume_payload(rx, self.headers["content-length"], last=True) else: callback_response = callback(self, b"") diff --git a/src/pyrobusta/protocol/http_file_server.py b/src/pyrobusta/protocol/http_file_server.py index c051e13..c8fa52e 100644 --- a/src/pyrobusta/protocol/http_file_server.py +++ b/src/pyrobusta/protocol/http_file_server.py @@ -126,9 +126,7 @@ def upload_file(http_ctx, payload: bytes): if not file_name_idx: http_ctx.terminate(400) return "text/plain", "Bad request" - file_path = normalize_path( - _TMP_DIR + "/" + f"{url_path[file_name_idx:]}.{http_ctx.id}" - ) + file_path = _TMP_DIR + "/" + f"{url_path[file_name_idx:]}.{http_ctx.id}" else: file_path = normalize_path(http_ctx.url.decode("ascii")[6:]) @@ -192,7 +190,7 @@ def bulk_upload_file(http_ctx, payload: tuple): remove(_TMP_DIR + "/" + file) # TODO: support X-Upload-Directory; pylint: disable=W0511 - target_path = normalize_path(_TMP_DIR + "/" + f"{filename}.{http_ctx.id}") + target_path = _TMP_DIR + "/" + f"{filename}.{http_ctx.id}" with open(target_path, "ab") as f: f.write(part_body) diff --git a/src/pyrobusta/protocol/http_multipart.py 
b/src/pyrobusta/protocol/http_multipart.py index 966c00e..0d04679 100644 --- a/src/pyrobusta/protocol/http_multipart.py +++ b/src/pyrobusta/protocol/http_multipart.py @@ -1,5 +1,12 @@ """ State machine extension for multipart parsing. + +This parser does not support chunked multipart requests, +and requires a content-length header for multipart parsing. + +Requests with a preamble or an epilogue are not supported, +and the parser expects the body to start with a boundary +delimiter. """ # pylint: disable=W0212,R0401 @@ -70,6 +77,8 @@ def _multipart_wrapper(tx): def _start_multipart_parser_st(self, rx): """ Initial state for processing multipart requests. + Chunked requests are not supported, and a content-length + header is required for multipart parsing. """ if not "content-length" in self.headers: raise http.InvalidContentLength() @@ -79,8 +88,7 @@ def _start_multipart_parser_st(self, rx): self.mp_last_delimiter = b"--" + self.mp_boundary + b"--" if rx.peek(start_delimiter + 2) != self.mp_delimiter: raise http.MalformedRequest() - rx.consume(start_delimiter + 2) - self.content_len_cnt += start_delimiter + 2 + self._consume_payload(rx, start_delimiter + 2) self.state = self._parse_boundary_st @@ -88,11 +96,15 @@ def _parse_boundary_st(self, rx): """ State for parsing multipart boundary delimiter. 
""" - if ( - rx.find(b"\r\n" + self.mp_delimiter) == -1 - and rx.find(b"\r\n" + self.mp_last_delimiter) == -1 - ): + is_intermediate = rx.find(b"\r\n" + self.mp_delimiter) != -1 + is_last = rx.find(b"\r\n" + self.mp_last_delimiter) != -1 + + if not is_intermediate and not is_last: + return + + if is_last and self.content_len_cnt + rx.size() < self.headers["content-length"]: return + self.state = self._parse_complete_part_st @@ -103,16 +115,12 @@ def _parse_complete_part_st(self, rx): """ next_delimiter = rx.find(b"\r\n--" + self.mp_boundary) part = rx.peek(next_delimiter) - rx.consume(next_delimiter + 2) # Consume leading CRLF - self.content_len_cnt += next_delimiter + 2 + self._consume_payload(rx, next_delimiter + 2) # Consume leading CRLF is_final = ( rx.size() >= len(self.mp_last_delimiter) and rx.peek(len(self.mp_last_delimiter)) == self.mp_last_delimiter ) - # Validate part and content-length - if self.headers["content-length"] < self.content_len_cnt: - raise http.InvalidContentLength() part_headers, part_body = http.HttpEngine._parse_body_part(part) callback = http.HttpEngine._get_callback(self.url, self.method) @@ -121,20 +129,23 @@ def _parse_complete_part_st(self, rx): callback(self, (part_headers, part_body)) if rx.peek(len(self.mp_delimiter)) != self.mp_delimiter: raise http.MalformedRequest() - rx.consume(len(self.mp_delimiter)) - self.content_len_cnt += len(self.mp_delimiter) + self._consume_payload(rx, len(self.mp_delimiter)) self.mp_is_first = False self.state = self._parse_boundary_st return # Process last part - rx.consume(len(self.mp_last_delimiter)) - self.content_len_cnt += len(self.mp_last_delimiter) + self._consume_payload(rx, len(self.mp_last_delimiter)) + if ( - self.headers["content-length"] != self.content_len_cnt - and self.content_len_cnt + rx.size() < self.headers["content-length"] + self.content_len_cnt + 2 == self.headers["content-length"] + and rx.peek(2) == b"\r\n" ): - raise http.InvalidContentLength() + # Consume optional trailing 
CRLF + self._consume_payload(rx, 2, last=True) + else: + self._consume_payload(rx, 0, last=True) + self.mp_is_last = True dtype, data = callback(self, (part_headers, part_body)) diff --git a/tests/functional/test_http.py b/tests/functional/test_http.py index bcce7c4..3283d05 100644 --- a/tests/functional/test_http.py +++ b/tests/functional/test_http.py @@ -263,7 +263,7 @@ async def test_fs_access_control(): response_body = response.split(b"\r\n\r\n")[1] test_assert( - f"test FS access control - index page loaded", + f"FS access control - index page loaded", response_body, b"PyRobusta Home", ) @@ -276,7 +276,7 @@ async def test_fs_access_control(): ) test_assert( - f"test FS access control - index page rejected", + f"FS access control - index page rejected", response.startswith(b"HTTP/1.1 403 Forbidden"), True, ) @@ -286,68 +286,6 @@ async def test_fs_access_control(): await server.terminate() -@garbage_collect -async def test_fs_path_traversal(): - setup_config(served_paths="/test", files_api_enabled=True) - server, server_task = await start_server() - test_root = normalize_path("/test") - styles_dir = normalize_path("/test/style") - fmkdir(test_root) - fmkdir(styles_dir) - - index_html = normalize_path("/test/index.html") - styles_css = normalize_path("/test/style/styles.css") - - with open(index_html, "w") as f: - f.write("PyRobusta Home") - with open(styles_css, "w") as f: - f.write("/* This is the main stylesheet */") - - try: - # Test case - response = await send_request( - b"GET /files/test HTTP/1.1\r\n" - b"Connection: close\r\n" - b"Host: localhost\r\n\r\n" - ) - - # Decode chunked transfer encoding - response_body = response.split(b"\r\n\r\n")[1] - response_body_decoded = b"" - start = 0 - - while start < len(response_body): - cursor = response_body.index(b"\r\n", start) - chunk_size = int(response_body[start:cursor], 16) - if chunk_size == 0: - break - chunk_start = cursor + 2 - chunk_end = chunk_start + chunk_size - response_body_decoded += 
response_body[chunk_start:chunk_end] - start = chunk_end + 2 - - test_assert( - f"test FS path traversal - JSON chunks received", - json.loads(response_body_decoded), - [ - { - "path": index_html, - "created": str(stat(index_html)[9]), - "size": str(stat(index_html)[6]), - }, - { - "path": styles_css, - "created": str(stat(styles_css)[9]), - "size": str(stat(styles_css)[6]), - }, - ], - ) - finally: - delete_path(test_root) - server_task.cancel() - await server.terminate() - - @garbage_collect async def test_keepalive(): setup_config() @@ -472,7 +410,6 @@ def test_main(): asyncio.run(test_server_busy()) asyncio.run(test_chunked_transfer_encoding()) asyncio.run(test_fs_access_control()) - asyncio.run(test_fs_path_traversal()) asyncio.run(test_keepalive()) diff --git a/tests/functional/test_http_file_server.py b/tests/functional/test_http_file_server.py new file mode 100644 index 0000000..0f1bc2d --- /dev/null +++ b/tests/functional/test_http_file_server.py @@ -0,0 +1,374 @@ +import asyncio +import json +import json +import ssl +import gc + +from os import mkdir, listdir, remove, rmdir, stat + +from pyrobusta.server import http_server +from pyrobusta.protocol.http import ( + CONF_HTTP_SERVED_PATHS, + enable_optional_features, + stat, +) +from pyrobusta.utils.config import ( + CONF_TLS, + CONF_LOG_LEVEL, + CONF_HTTP_MULTIPART, + CONF_HTTP_FILES_API, + _CONFIG_CACHE, + normalize_path, + parse_config, +) + +################################################# +# Test helpers +################################################# + + +def garbage_collect(coroutine): + async def decorated(*args, **kwargs): + gc.collect() + await coroutine(*args, **kwargs) + gc.collect() + + return decorated + + +def test_assert(name, actual, expected): + print(f"Test {name}: ", end="") + if actual == expected: + print("OK") + else: + print("Fail") + raise AssertionError(f"{actual} != {expected}") + + +async def send_request(request, tls=False): + port = ( + 
http_server.HttpServer.LISTEN_PORT_HTTPS + if tls + else http_server.HttpServer.LISTEN_PORT_HTTP + ) + + ctx = None + if tls: + # Disable certificate verification due to self-signed cert + ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT) + ctx.verify_mode = ssl.CERT_NONE + + reader, writer = await asyncio.open_connection("127.0.0.1", port, ssl=ctx) + writer.write(request) + await writer.drain() + + to_read = True + response = b"" + while to_read: + response_part = await reader.read(1024) + response += response_part + to_read = len(response_part) + writer.close() + return response + + +def multipart_response(num_responses): + i = 0 + + def response_generator(): + nonlocal i + i += 1 + if i > num_responses: + return None + return "text/plain", b"Response %s" % i + + return response_generator + + +def fmkdir(path: str): + try: + mkdir(path) + except OSError: + pass + + +def delete_path(path): + for name in listdir(path): + if path == "/": + full = "/" + name + else: + full = path + "/" + name + + try: + remove(full) + except OSError: + delete_path(full) + try: + rmdir(full) + except OSError: + pass + + +################################################# +# Test driver +################################################# + + +async def start_server(): + """ + Start an HTTP server as a background task. 
+ """ + server = http_server.HttpServer() + server_task = asyncio.create_task(server.start_socket_server()) + await asyncio.sleep_ms(100) + return server, server_task + + +@garbage_collect +async def test_fs_path_traversal(): + setup_config(served_paths="/test") + server, server_task = await start_server() + test_root = normalize_path("/test") + styles_dir = normalize_path("/test/style") + fmkdir(test_root) + fmkdir(styles_dir) + + index_html = normalize_path("/test/index.html") + styles_css = normalize_path("/test/style/styles.css") + + with open(index_html, "w") as f: + f.write("PyRobusta Home") + with open(styles_css, "w") as f: + f.write("/* This is the main stylesheet */") + + try: + # Test case + response = await send_request( + b"GET /files/test HTTP/1.1\r\n" + b"Connection: close\r\n" + b"Host: localhost\r\n\r\n" + ) + + # Decode chunked transfer encoding + response_body = response.split(b"\r\n\r\n")[1] + response_body_decoded = b"" + start = 0 + + while start < len(response_body): + cursor = response_body.index(b"\r\n", start) + chunk_size = int(response_body[start:cursor], 16) + if chunk_size == 0: + break + chunk_start = cursor + 2 + chunk_end = chunk_start + chunk_size + response_body_decoded += response_body[chunk_start:chunk_end] + start = chunk_end + 2 + + test_assert( + f"FS path traversal - JSON chunks received", + json.loads(response_body_decoded), + [ + { + "path": index_html, + "created": str(stat(index_html)[9]), + "size": str(stat(index_html)[6]), + }, + { + "path": styles_css, + "created": str(stat(styles_css)[9]), + "size": str(stat(styles_css)[6]), + }, + ], + ) + finally: + delete_path(test_root) + server_task.cancel() + await server.terminate() + + +@garbage_collect +async def test_fs_access_control(): + setup_config(served_paths="/test/allowed") + server, server_task = await start_server() + + test_root = normalize_path("/test") + fmkdir(test_root) + + # Index page under /test/allowed -> accepted + allowed_workdir = 
normalize_path("/test/allowed") + allowed_index_html = normalize_path("/test/allowed/index.html") + fmkdir(allowed_workdir) + with open(allowed_index_html, "w") as f: + f.write("PyRobusta Home") + + # Index page under /test/rejected -> rejected + rejected_workdir = normalize_path("/test/rejected") + rejected_index_html = normalize_path("/test/rejected/index.html") + fmkdir(rejected_workdir) + with open(rejected_index_html, "w") as f: + f.write("PyRobusta Home") + + try: + # Case #1: /test/allowed/index.html + response = await send_request( + b"GET /files/test/allowed/index.html HTTP/1.1\r\n" + b"Connection: close\r\n" + b"Host: localhost\r\n\r\n" + ) + + response_body = response.split(b"\r\n\r\n")[1] + test_assert( + f"FS access control - index page loaded", + response_body, + b"PyRobusta Home", + ) + + # Case #2: /test/rejected/index.html + response = await send_request( + b"GET /files/test/rejected/index.html HTTP/1.1\r\n" + b"Connection: close\r\n" + b"Host: localhost\r\n\r\n" + ) + + test_assert( + f"FS access control - index page rejected", + response.startswith(b"HTTP/1.1 403 Forbidden"), + True, + ) + finally: + delete_path(test_root) + server_task.cancel() + await server.terminate() + + +@garbage_collect +async def test_bulk_file_upload(): + setup_config(http_multipart_enabled=True) + server, server_task = await start_server() + + user_data = normalize_path("/www/user_data") + tmp_dir = normalize_path("/tmp") + fmkdir(user_data) + fmkdir(tmp_dir) + + try: + data = ( + # Status line + headers + b"POST /files HTTP/1.1\r\nHost: localhost\r\n" + b"Connection:close\r\nUser-Agent: curl/8.5.0\r\nAccept: */*\r\nContent-Length: 384\r\n" + b"Content-Type: multipart/form-data; boundary=------------------------1ukf3aC3uDA7tUn2xudQXn\r\n\r\n" + # Body with 2 file parts + b"--------------------------1ukf3aC3uDA7tUn2xudQXn\r\n" + b'Content-Disposition: form-data; name="file1"; filename="upload-1.txt"\r\n' + b"Content-Type: text/plain\r\n\r\n" + b"File 1 content\n\r\n" + 
b"--------------------------1ukf3aC3uDA7tUn2xudQXn\r\n" + b'Content-Disposition: form-data; name="file2"; filename="upload-2.txt"\r\n' + b"Content-Type: text/plain\r\n\r\n" + b"File 2 content\n\r\n" + b"--------------------------1ukf3aC3uDA7tUn2xudQXn--\r\n" + ) + + response = await send_request(data) + test_assert( + "bulk file upload - response status is 201 Created", + response.startswith(b"HTTP/1.1 201 Created"), + True, + ) + + # Verify files were saved with correct content + with open(user_data + "/upload-1.txt", "rb") as f: + content = f.read() + test_assert( + "bulk file upload - file 1 content is correct", + content, + b"File 1 content\n", + ) + + with open(user_data + "/upload-2.txt", "rb") as f: + content = f.read() + test_assert( + "bulk file upload - file 2 content is correct", + content, + b"File 2 content\n", + ) + finally: + delete_path(user_data) + delete_path(tmp_dir) + server_task.cancel() + await server.terminate() + + +@garbage_collect +async def test_chunked_file_upload(): + setup_config() + server, server_task = await start_server() + + user_data = normalize_path("/www/user_data") + tmp_dir = normalize_path("/tmp") + fmkdir(user_data) + fmkdir(tmp_dir) + + try: + data = ( + # Status line + headers + b"PUT /files/www/user_data/upload-1.txt HTTP/1.1\r\nHost: localhost\r\n" + b"Connection:close\r\nUser-Agent: curl/8.5.0\r\nAccept: */*\r\nTransfer-Encoding: chunked\r\n" + b"Content-Type: application/octet-stream\r\n\r\n" + # Body with 1 file part sent in 2 chunks + b"16\r\n" + b"File 1 content part 1\n\r\n" + b"16\r\n" + b"File 1 content part 2\n\r\n" + b"0\r\n\r\n" + ) + + response = await send_request(data) + test_assert( + f"chunked file upload - response status is 201 Created", + response.startswith(b"HTTP/1.1 201 Created"), + True, + ) + + # Verify file was saved with correct content + with open(user_data + "/upload-1.txt", "rb") as f: + content = f.read() + test_assert( + "chunked file upload - file content is correct", + content, + b"File 
1 content part 1\nFile 1 content part 2\n", + ) + finally: + delete_path(user_data) + delete_path(tmp_dir) + server_task.cancel() + await server.terminate() + + +################################################# +# Test methods +################################################# + + +def setup_config(tls_enabled=False, http_multipart_enabled=False, served_paths=""): + http_server.HttpServer.LISTEN_PORT_HTTP = 8080 + http_server.HttpServer.LISTEN_PORT_HTTPS = 4443 + + _CONFIG_CACHE[2 * CONF_LOG_LEVEL + 1] = "warning" + _CONFIG_CACHE[2 * CONF_TLS + 1] = tls_enabled + _CONFIG_CACHE[2 * CONF_HTTP_SERVED_PATHS + 1] = parse_config( + CONF_HTTP_SERVED_PATHS, served_paths + ) + _CONFIG_CACHE[2 * CONF_HTTP_MULTIPART + 1] = http_multipart_enabled + _CONFIG_CACHE[2 * CONF_HTTP_FILES_API + 1] = True + enable_optional_features() + + +def test_main(): + asyncio.run(test_fs_path_traversal()) + asyncio.run(test_fs_access_control()) + asyncio.run(test_bulk_file_upload()) + asyncio.run(test_chunked_file_upload()) + + +test_main() diff --git a/tests/functional/test_http_multipart.py b/tests/functional/test_http_multipart.py index e52f665..448a7ea 100644 --- a/tests/functional/test_http_multipart.py +++ b/tests/functional/test_http_multipart.py @@ -2,6 +2,8 @@ import ssl import gc +from os import mkdir, listdir, remove, rmdir + from pyrobusta.server import http_server from pyrobusta.protocol import http_multipart from pyrobusta.protocol.http import ( @@ -12,6 +14,7 @@ CONF_TLS, CONF_LOG_LEVEL, CONF_HTTP_MULTIPART, + CONF_HTTP_FILES_API, _CONFIG_CACHE, ) @@ -78,6 +81,30 @@ def response_generator(): return response_generator +def fmkdir(path: str): + try: + mkdir(path) + except OSError: + pass + + +def delete_path(path): + for name in listdir(path): + if path == "/": + full = "/" + name + else: + full = path + "/" + name + + try: + remove(full) + except OSError: + delete_path(full) + try: + rmdir(full) + except OSError: + pass + + ################################################# # Test 
driver
 #################################################
@@ -142,13 +169,14 @@ async def test_multipart_response(tls_enabled):
 #################################################
 
 
-def setup_config(tls_enabled=False):
+def setup_config(tls_enabled=False, files_api_enabled=False):
     http_server.HttpServer.LISTEN_PORT_HTTP = 8080
     http_server.HttpServer.LISTEN_PORT_HTTPS = 4443
 
     _CONFIG_CACHE[2 * CONF_LOG_LEVEL + 1] = "warning"
     _CONFIG_CACHE[2 * CONF_TLS + 1] = tls_enabled
     _CONFIG_CACHE[2 * CONF_HTTP_MULTIPART + 1] = True
+    _CONFIG_CACHE[2 * CONF_HTTP_FILES_API + 1] = files_api_enabled
 
     enable_optional_features()
 
diff --git a/tests/unit/test_http.py b/tests/unit/test_http.py
index 8283b84..a02907c 100644
--- a/tests/unit/test_http.py
+++ b/tests/unit/test_http.py
@@ -487,6 +487,88 @@ def test_chunked_transfer_encoding_chunk_incomplete(self):
         self.assertEqual(self.engine.status_code, None)
         self.assertEqual(self.engine.state, self.engine._recv_chunk_st)
 
+    def test_payload_length_matches_content_length(self):
+        self.engine.url = b"/api/test"
+        self.engine.method = b"POST"
+        self.engine.version = b"HTTP/1.1"
+        self.engine.headers["content-length"] = 11
+        self.engine.state = self.engine._recv_payload_st
+
+        test_callback = mock.Mock(return_value=("text/plain", "OK"))
+        self.engine.register("/api/test", test_callback, "POST")
+
+        payload = b"hello world"
+        for i in range(len(payload)):
+            self.rx.write(payload[i : i + 1])
+            self.engine.state(self.rx)
+
+        while self.engine.state is not None:
+            self.engine.state(self.rx)
+
+        self.assertEqual(self.engine.status_code, 200)
+        self.assertEqual(self.engine.state, None)
+        test_callback.assert_called_with(self.engine, payload)
+
+    def test_payload_length_exceeds_content_length(self):
+        """
+        Test if the engine correctly reads the payload until content-length
+        and ignores remaining data. The remaining data should not cause an
+        error since the parser should be able to read it in a subsequent request
+        if the connection is kept alive.
+        """
+
+        self.engine.url = b"/api/test"
+        self.engine.method = b"POST"
+        self.engine.version = b"HTTP/1.1"
+        self.engine.headers["content-length"] = 11
+        self.engine.state = self.engine._recv_payload_st
+
+        test_callback = mock.Mock(return_value=("text/plain", "OK"))
+        self.engine.register("/api/test", test_callback, "POST")
+
+        payload = b"hello world!"
+        for i in range(len(payload)):
+            self.rx.write(payload[i : i + 1])
+            self.engine.state(self.rx)
+            if self.engine.state is None:
+                break
+
+        while self.engine.state is not None:
+            self.engine.state(self.rx)
+
+        self.assertEqual(self.engine.status_code, 200)
+        self.assertEqual(self.engine.state, None)
+        test_callback.assert_called_with(self.engine, b"hello world")
+        self.assertEqual(
+            self.rx.peek(), b"!"
+        )  # Remaining data after content-length is ignored
+
+    def test_payload_length_less_than_content_length(self):
+        """
+        Test if the engine correctly waits for the full payload when
+        content-length is not yet satisfied.
+        """
+        self.engine.url = b"/api/test"
+        self.engine.method = b"POST"
+        self.engine.version = b"HTTP/1.1"
+        self.engine.headers["content-length"] = 11
+        self.engine.state = self.engine._recv_payload_st
+
+        test_callback = mock.Mock(return_value=("text/plain", "OK"))
+        self.engine.register("/api/test", test_callback, "POST")
+
+        payload = b"hello"
+        for i in range(len(payload)):
+            self.rx.write(payload[i : i + 1])
+            self.engine.state(self.rx)
+            if self.engine.state is None:
+                break
+
+        self.engine.state(self.rx)
+
+        self.assertEqual(self.engine.status_code, None)
+        self.assertEqual(self.engine.state, self.engine._recv_payload_st)
+
 
 class TestFileServingStateMachine(TestHttpBase):
     """
diff --git a/tests/unit/test_http_file_server.py b/tests/unit/test_http_file_server.py
index a6402ff..2bfb210 100644
--- a/tests/unit/test_http_file_server.py
+++ b/tests/unit/test_http_file_server.py
@@ -525,8 +525,7 @@ def test_file_serving_multiple_file_chunked_upload(self, *_):
         self.engine.url = b"/files"
         self.engine.method = b"POST"
         self.engine.version = b"HTTP/1.1"
-
-        self.engine.headers["content-length"] = 548
+        self.engine.headers["content-length"] = 565
         self.engine.headers["content-type"] = "multipart/form-data"
         self.engine.mp_boundary = b"test-boundary"
 
diff --git a/tests/unit/test_http_multipart.py b/tests/unit/test_http_multipart.py
index 2108f12..740c488 100644
--- a/tests/unit/test_http_multipart.py
+++ b/tests/unit/test_http_multipart.py
@@ -165,6 +165,176 @@ def test_multipart_receiver_last_part(self):
         self.assertEqual(self.engine.mp_is_first, True)
         self.assertEqual(self.engine.mp_is_last, True)
 
+    def test_multipart_content_length_match(self):
+        self.engine.state = self.engine._start_multipart_parser_st
+        self.engine.url = b"/api/test"
+        self.engine.method = b"GET"
+        self.engine.version = b"HTTP/1.1"
+        self.engine.headers["content-length"] = 148
+        self.engine.mp_boundary = b"test-boundary"
+
+        test_callback = mock.Mock(return_value=("text/plain", "OK"))
+        self.engine.register("/api/test", test_callback)
+
+        body_part = (
+            b"--test-boundary\r\n"
+            b'Content-Disposition:form-data;name="file-chunk";filename="upload.txt"\r\n'
+            b"Content-Type:text/plain\r\n\r\n"
+            b"Upload content\r\n"
+            b"--test-boundary--"
+        )
+
+        for i in range(len(body_part)):
+            self.rx.write(body_part[i : i + 1])
+            self.engine.state(self.rx)
+
+        while self.engine.state is not None:
+            self.engine.state(self.rx)
+
+        self.assertEqual(self.engine.status_code, 200)
+        test_callback.assert_called_once_with(
+            self.engine,
+            (
+                {
+                    "content-disposition": 'form-data;name="file-chunk";filename="upload.txt"',
+                    "content-type": "text/plain",
+                },
+                b"Upload content",
+            ),
+        )
+
+    def test_multipart_content_length_smaller(self):
+        """
+        Test if the engine correctly raises an error when content-length is
+        smaller than actual payload length.
+        """
+        self.engine.state = self.engine._start_multipart_parser_st
+        self.engine.url = b"/api/test"
+        self.engine.method = b"GET"
+        self.engine.version = b"HTTP/1.1"
+        self.engine.headers["content-length"] = 148 - 1
+        self.engine.mp_boundary = b"test-boundary"
+
+        test_callback = mock.Mock(return_value=("text/plain", "OK"))
+        self.engine.register("/api/test", test_callback)
+
+        body_part = (
+            b"--test-boundary\r\n"
+            b'Content-Disposition:form-data;name="file-chunk";filename="upload.txt"\r\n'
+            b"Content-Type:text/plain\r\n\r\n"
+            b"Upload content\r\n"
+            b"--test-boundary--"
+        )
+
+        for i in range(len(body_part)):
+            self.rx.write(body_part[i : i + 1])
+            self.engine.state(self.rx)
+
+        with self.assertRaises(self.http_module.InvalidContentLength):
+            while self.engine.state is not None:
+                self.engine.state(self.rx)
+
+    def test_multipart_content_length_larger(self):
+        """
+        Test if the engine correctly waits for remaining data when content-length is larger
+        than actual payload length.
+        """
+        self.engine.state = self.engine._start_multipart_parser_st
+        self.engine.url = b"/api/test"
+        self.engine.method = b"GET"
+        self.engine.version = b"HTTP/1.1"
+        self.engine.headers["content-length"] = 148 + 1
+        self.engine.mp_boundary = b"test-boundary"
+
+        test_callback = mock.Mock(return_value=("text/plain", "OK"))
+        self.engine.register("/api/test", test_callback)
+
+        body_part = (
+            b"--test-boundary\r\n"
+            b'Content-Disposition:form-data;name="file-chunk";filename="upload.txt"\r\n'
+            b"Content-Type:text/plain\r\n\r\n"
+            b"Upload content\r\n"
+            b"--test-boundary--"
+        )
+
+        for i in range(len(body_part)):
+            self.rx.write(body_part[i : i + 1])
+            self.engine.state(self.rx)
+
+        self.engine.state(self.rx)
+
+        self.assertEqual(self.engine.state, self.engine._parse_boundary_st)
+        self.assertEqual(self.engine.status_code, None)
+
+    def test_multipart_epilogue_data(self):
+        """
+        Test if the engine correctly raises an error when epilogue data
+        is present after the last boundary delimiter.
+        """
+        self.engine.state = self.engine._start_multipart_parser_st
+        self.engine.url = b"/api/test"
+        self.engine.method = b"GET"
+        self.engine.version = b"HTTP/1.1"
+        self.engine.headers["content-length"] = 148 + 13
+        self.engine.mp_boundary = b"test-boundary"
+
+        test_callback = mock.Mock(return_value=("text/plain", "OK"))
+        self.engine.register("/api/test", test_callback)
+
+        body_part = (
+            b"--test-boundary\r\n"
+            b'Content-Disposition:form-data;name="file-chunk";filename="upload.txt"\r\n'
+            b"Content-Type:text/plain\r\n\r\n"
+            b"Upload content\r\n"
+            b"--test-boundary--epilogue-data"
+        )
+
+        for i in range(len(body_part)):
+            self.rx.write(body_part[i : i + 1])
+            self.engine.state(self.rx)
+
+        with self.assertRaises(self.http_module.InvalidContentLength):
+            while self.engine.state is not None:
+                self.engine.state(self.rx)
+
+    def test_multipart_complete_part_trailing_crlf(self):
+        self.engine.state = self.engine._start_multipart_parser_st
+        self.engine.url = b"/api/test"
+        self.engine.method = b"GET"
+        self.engine.version = b"HTTP/1.1"
+        self.engine.headers["content-length"] = 150
+        self.engine.mp_boundary = b"test-boundary"
+
+        test_callback = mock.Mock(return_value=("text/plain", "OK"))
+        self.engine.register("/api/test", test_callback)
+
+        body_part = (
+            b"--test-boundary\r\n"
+            b'Content-Disposition:form-data;name="file-chunk";filename="upload.txt"\r\n'
+            b"Content-Type:text/plain\r\n\r\n"
+            b"Upload content\r\n"
+            b"--test-boundary--\r\n"
+        )
+
+        for i in range(len(body_part)):
+            self.rx.write(body_part[i : i + 1])
+            self.engine.state(self.rx)
+
+        while self.engine.state is not None:
+            self.engine.state(self.rx)
+
+        self.assertEqual(self.engine.status_code, 200)
+        test_callback.assert_called_once_with(
+            self.engine,
+            (
+                {
+                    "content-disposition": 'form-data;name="file-chunk";filename="upload.txt"',
+                    "content-type": "text/plain",
+                },
+                b"Upload content",
+            ),
+        )
+
 
 if __name__ == "__main__":
     unittest.main(verbosity=2)
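
Note on the hard-coded `content-length` values in the multipart tests: 148, 150 and `148 + 13` can be cross-checked by rebuilding the test body and measuring it. The snippet below is a minimal sketch for that purpose only and is not part of the patch; `multipart_body` and `BOUNDARY` are hypothetical names introduced here, and the body bytes are copied from the `body_part` literals in the tests.

```python
# Sketch only (not part of the patch): recompute the Content-Length values
# used by the multipart tests. `multipart_body` is a hypothetical helper.
BOUNDARY = b"test-boundary"


def multipart_body(trailer: bytes = b"") -> bytes:
    """Rebuild the single-part body from the tests, plus optional trailing bytes."""
    return b"".join(
        [
            b"--" + BOUNDARY + b"\r\n",
            b'Content-Disposition:form-data;name="file-chunk";filename="upload.txt"\r\n',
            b"Content-Type:text/plain\r\n\r\n",
            b"Upload content\r\n",
            b"--" + BOUNDARY + b"--",  # closing delimiter
            trailer,
        ]
    )


if __name__ == "__main__":
    print(len(multipart_body()))                  # exact match case
    print(len(multipart_body(b"\r\n")))           # trailing CRLF case
    print(len(multipart_body(b"epilogue-data")))  # epilogue case
```

Run as a script, this should print 148, 150 and 161, which matches the values set in `test_multipart_content_length_match`, `test_multipart_complete_part_trailing_crlf` and `test_multipart_epilogue_data` respectively.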