Skip to content

Conversation

@stanislav-shchetinin
Copy link
Collaborator

Changelog entry

...

Changelog category

  • Not for changelog (changelog entry is not required)

Description for reviewers

...

Copilot AI review requested due to automatic review settings December 5, 2025 19:37
@stanislav-shchetinin stanislav-shchetinin requested review from a team as code owners December 5, 2025 19:37
@github-actions
Copy link

github-actions bot commented Dec 5, 2025

🔴 Unable to merge your PR into the base branch. Please rebase or merge it with the base branch.

@stanislav-shchetinin stanislav-shchetinin linked an issue Dec 5, 2025 that may be closed by this pull request
@ydbot
Copy link
Collaborator

ydbot commented Dec 5, 2025

Run Extra Tests

Run additional tests for this PR. You can customize:

  • Test Size: small, medium, large (default: all)
  • Test Targets: any directory path (default: ydb/)
  • Sanitizers: ASAN, MSAN, TSAN
  • Coredumps: enable for debugging (default: off)
  • Additional args: custom ya make arguments

▶  Run tests

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

🟢 2025-12-05 19:38:37 UTC The validation of the Pull Request description is successful.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds support for exporting and importing YDB table data to/from the local filesystem, mirroring the existing S3 export/import functionality but for local storage instead of object storage.

Key Changes:

  • New FS export/import infrastructure with filesystem-based data transfer
  • Test suites for validating FS export and import operations
  • Protocol buffer definitions for FS settings
  • Integration with existing backup/restore framework

Reviewed changes

Copilot reviewed 37 out of 37 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
ydb/core/tx/schemeshard/ut_restore/ut_restore_fs.cpp Test suite for FS import functionality with various scenarios
ydb/core/tx/schemeshard/ut_export/ut_export_fs.cpp Test suite for FS export functionality including file validation
ydb/core/tx/schemeshard/schemeshard_info_types.h Extended TImportInfo/TExportInfo to support FS kind with variant-based settings
ydb/core/tx/schemeshard/schemeshard_info_types_helper.h Helper templates for working with different import/export settings types
ydb/core/tx/schemeshard/schemeshard_import_getters.cpp Added TSchemeGetterFS for reading schema from filesystem during import
ydb/core/tx/schemeshard/schemeshard_import__create.cpp Import creation logic for FS with validation and item filling
ydb/core/tx/schemeshard/schemeshard_import_flow_proposals.cpp Import flow proposal handling for FS settings
ydb/core/tx/datashard/export_fs_uploader.cpp Core FS export uploader that writes metadata, schema, permissions, and data files
ydb/core/tx/datashard/export_fs.h FS export interface definition
ydb/core/tx/datashard/restore_unit.cpp Added FS restore case (currently stub implementation)
ydb/core/tx/datashard/backup_unit.cpp Integrated FS export into backup unit
ydb/core/protos/fs_settings.proto Protocol buffer definition for FS settings (BasePath and Path)
ydb/core/protos/flat_scheme_op.proto Added TFSSettings to backup and restore task settings
Export factory files Updated all export factory implementations to support CreateExportToFs

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +52 to +53
op->SetAsyncJobResult(new TImportJobProduct(true, TString(), 0, 0));
break;
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FS restore (import) implementation is a stub that immediately returns success without actually importing any data. It sets op->SetAsyncJobResult(new TImportJobProduct(true, TString(), 0, 0)) which indicates success with 0 bytes and 0 rows imported.

This means:

  1. FS import will appear to succeed but won't actually restore any data
  2. The table will be created with the correct schema (from TSchemeGetterFS) but will be empty
  3. This inconsistency could lead to silent data loss if users believe their import succeeded

Either:

  • Implement the actual FS import functionality (similar to S3's CreateS3Downloader)
  • Or add a clear error message indicating that FS data import is not yet implemented
  • Or mark the entire FS import feature as experimental/incomplete in documentation
Suggested change
op->SetAsyncJobResult(new TImportJobProduct(true, TString(), 0, 0));
break;
Abort(op, ctx, "FS data import is not implemented yet");
return false;

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +6
optional string BasePath = 1; // Base path on the file system (e.g., /mnt/exports)
optional string Path = 2; // Relative path for this specific backup/restore operation
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protobuf fields BasePath and Path use optional string which allows them to be empty. However, operations that use these settings expect non-empty values. Consider:

  1. Making these fields required instead of optional (though this is deprecated in proto3)
  2. Or adding validation in the code that uses these settings to reject empty paths
  3. Adding comments to document that these fields must not be empty
Suggested change
optional string BasePath = 1; // Base path on the file system (e.g., /mnt/exports)
optional string Path = 2; // Relative path for this specific backup/restore operation
// Base path on the file system (e.g., /mnt/exports). This field must not be empty.
optional string BasePath = 1;
// Relative path for this specific backup/restore operation. This field must not be empty.
optional string Path = 2;

Copilot uses AI. Check for mistakes.
Comment on lines +90 to +114
bool WriteFile(const TString& path, const TStringBuf& data, TString& error, bool isAppend = false) {
try {
TFsPath fsPath(path);
fsPath.Parent().MkDirs();

auto flags = CreateAlways | WrOnly;
if (isAppend) {
flags = OpenAlways | WrOnly | ForAppend;
}
TFile file(path, flags);
file.Flock(LOCK_EX);
file.Write(data.data(), data.size());
file.Close();

EXPORT_LOG_D("WriteFile succeeded"
<< ": self# " << SelfId()
<< ", path# " << path
<< ", size# " << data.size());

return true;
} catch (const std::exception& ex) {
error = TStringBuilder() << "Failed to write file " << path << ": " << ex.what();
return false;
}
}
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WriteFile method uses file.Flock(LOCK_EX) for exclusive locking, but the file lock is released when the TFile object is destroyed. If multiple shards try to write to the same file path simultaneously (e.g., during a multi-shard export), there could be a race condition. The lock is released after file.Close(), but before the function returns, which means concurrent writes could still potentially corrupt the file.

Consider either:

  1. Validating that different shards write to different paths
  2. Using a more robust file locking mechanism that persists across the write operation
  3. Adding explicit documentation about the concurrency expectations

Copilot uses AI. Check for mistakes.
Comment on lines +146 to +150
bool WriteMessageWithChecksum(const google::protobuf::Message& message, const TString& path, TString& error) {
TString data;
google::protobuf::TextFormat::PrintToString(message, &data);
return WriteFileWithChecksum(path, data, error);
}
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protobuf serialization uses TextFormat::PrintToString() which produces human-readable text format. However, the scheme.pb file extension suggests a binary protobuf format. This inconsistency could cause confusion. Either:

  1. Use SerializeToString() for binary format (matching the .pb extension)
  2. Or change the file extension to .pbtxt to indicate text format

This is inconsistent with how the import side reads the scheme file (line 1454 in schemeshard_import_getters.cpp uses ParseFromString() which expects binary format).

Copilot uses AI. Check for mistakes.

bool WriteMessage(const google::protobuf::Message& message, const TString& path, TString& error) {
TString data;
google::protobuf::TextFormat::PrintToString(message, &data);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WriteMessage method also uses TextFormat::PrintToString() for writing protobuf messages. This should use binary serialization (SerializeToString()) to match the reading side (ParseFromString() used in ProcessPermissions at line 1466 in schemeshard_import_getters.cpp). This affects permissions.pb files.

Suggested change
google::protobuf::TextFormat::PrintToString(message, &data);
if (!message.SerializeToString(&data)) {
error = TStringBuilder() << "Failed to serialize protobuf message to binary format for file " << path;
return false;
}

Copilot uses AI. Check for mistakes.
Comment on lines +1492 to +1545
void Bootstrap() {
const auto settings = ImportInfo->GetFsSettings();
const TString basePath = settings.base_path();

Y_ABORT_UNLESS(ItemIdx < ImportInfo->Items.size());
auto& item = ImportInfo->Items[ItemIdx];

TString sourcePath = item.SrcPath;
if (sourcePath.empty()) {
Reply(false, "Source path is empty for import item");
return;
}

const TFsPath itemPath = TFsPath(basePath) / sourcePath;
TString error;

const TString metadataPath = itemPath / "metadata.json";
TString metadataContent;

if (!TFSHelper::ReadFile(metadataPath, metadataContent, error)) {
Reply(false, error);
return;
}

if (!ProcessMetadata(metadataContent, error)) {
Reply(false, error);
return;
}

const TString schemeFileName = NYdb::NDump::NFiles::TableScheme().FileName;
const TString schemePath = itemPath / schemeFileName;
TString schemeContent;

if (!TFSHelper::ReadFile(schemePath, schemeContent, error)) {
Reply(false, error);
return;
}

if (!ProcessScheme(schemeContent, error)) {
Reply(false, error);
return;
}

if (!ImportInfo->GetNoAcl()) {
const TString permissionsPath = itemPath / "permissions.pb";
TString permissionsContent;

if (TFSHelper::ReadFile(permissionsPath, permissionsContent, error)) {
ProcessPermissions(permissionsContent);
}
}

Reply(true);
}
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TSchemeGetterFS::Bootstrap() method performs synchronous file I/O operations in the actor's bootstrap method. This can block the actor system thread, especially for large files or slow filesystems. Consider:

  1. Using asynchronous I/O operations
  2. Running the file reading in a separate thread pool (using ctx.Register(..., TMailboxType::Simple, AppData()->IOPoolId) like done for the actor creation on line 472 of schemeshard_import__create.cpp)
  3. Or at minimum, adding a comment explaining why synchronous I/O is acceptable here

The actor is already registered with AppData()->IOPoolId (line 472 in schemeshard_import__create.cpp), so it won't block the main actor system, but this should be documented.

Copilot uses AI. Check for mistakes.
Comment on lines +354 to +355
}

Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source_path from user input is normalized using NBackup::NormalizeItemPath() but there's no validation to prevent path traversal attacks (e.g., ../../../etc/passwd). The path is later concatenated with base_path in TSchemeGetterFS::Bootstrap() (line 1505) using TFsPath(basePath) / sourcePath, which could allow reading arbitrary files on the filesystem if sourcePath contains .. components.

Consider adding validation to:

  1. Reject paths containing .. components
  2. Ensure the normalized path doesn't escape the base_path
  3. Use TFsPath::RealPath() or similar to resolve and validate the final path is within the allowed directory
Suggested change
}
// Path traversal validation: reject if normalized path contains ".." or is absolute
TVector<TString> pathComponents;
Split(item.SrcPath, "/", pathComponents);
for (const auto& comp : pathComponents) {
if (comp == "..") {
explain = TStringBuilder() << "source_path '" << srcPath << "' is invalid: path traversal is not allowed";
return false;
}
}
if (item.SrcPath.StartsWith("/")) {
explain = TStringBuilder() << "source_path '" << srcPath << "' is invalid: absolute paths are not allowed";
return false;
}

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +52
}

static TFsSettings FromBackupTask(const NKikimrSchemeOp::TBackupTask& task) {
return TFsSettings(task.GetFSSettings(), task.GetShardNum());
}

TString GetFullPath() const {
return TFsPath(BasePath) / RelativePath;
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The export destination path (RelativePath from settings.GetPath()) is concatenated with BasePath using TFsPath(BasePath) / RelativePath without validation. If RelativePath contains .. components, it could allow writing files outside the intended BasePath directory, potentially overwriting system files or accessing restricted areas.

Similar to the import vulnerability, consider adding validation to:

  1. Reject paths containing .. components
  2. Ensure the final path doesn't escape the base_path
  3. Validate the resolved path is within the allowed directory before any write operations
Suggested change
}
static TFsSettings FromBackupTask(const NKikimrSchemeOp::TBackupTask& task) {
return TFsSettings(task.GetFSSettings(), task.GetShardNum());
}
TString GetFullPath() const {
return TFsPath(BasePath) / RelativePath;
ValidateRelativePath(RelativePath);
}
static void ValidateRelativePath(const TString& relPath) {
TFsPath path(relPath);
if (path.IsAbsolute()) {
ythrow yexception() << "RelativePath must not be absolute: " << relPath;
}
for (const auto& part : path) {
if (part == "..") {
ythrow yexception() << "RelativePath must not contain '..' components: " << relPath;
}
}
}
static TFsSettings FromBackupTask(const NKikimrSchemeOp::TBackupTask& task) {
return TFsSettings(task.GetFSSettings(), task.GetShardNum());
}
TString GetFullPath() const {
// Join and normalize the path, then check containment
TFsPath base(BasePath);
TFsPath full = base / RelativePath;
full.Fix();
if (!full.IsSubpathOf(base)) {
ythrow yexception() << "Export path escapes base directory: " << full.GetPath() << " (base: " << base.GetPath() << ")";
}
return full.GetPath();

Copilot uses AI. Check for mistakes.
execution_unit_ctors.h
execution_unit_kind.h
export_common.cpp
export_fs_uploader.cpp
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding export_fs_uploader.cpp to the DataShard build enables the FS export implementation, which currently constructs file paths from user-controllable ExportToFsSettings.base_path and items[].destination_path and then writes files directly to those paths. Because the uploader joins BasePath and RelativePath using TFsPath(BasePath) / RelativePath without rejecting .. segments or otherwise enforcing that the final path stays under a configured export directory, an attacker who can start an FS export can choose values like base_path: "/mnt/exports" and destination_path: "../..//etc/ydb.conf" to cause the DataShard process to create or overwrite arbitrary files on the host filesystem. To fix this, harden the FS exporter so that after joining paths it verifies TFsPath(fullPath).IsSubpathOf(TFsPath(basePath)) (or equivalent) and rejects any RelativePath containing .. or being absolute, preventing exports from escaping a safe root directory.

Suggested change
export_fs_uploader.cpp
# export_fs_uploader.cpp

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement NFS Export in DataShard

2 participants