Skip to content

Persistance#12

Open
SuperALKALINEdroiD wants to merge 15 commits intodevelopfrom
persistance
Open

Persistance#12
SuperALKALINEdroiD wants to merge 15 commits intodevelopfrom
persistance

Conversation

@SuperALKALINEdroiD
Copy link
Copy Markdown
Owner

No description provided.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @SuperALKALINEdroiD, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the database's foundational structure by introducing robust persistence capabilities. It establishes a new manifest system for managing SSTables, refactors configuration loading with dynamic generation and validation, and lays the groundwork for flushing in-memory data to disk. Furthermore, it adds a GET endpoint for data retrieval and updates internal node logic to intelligently handle in-memory storage thresholds and data searching. These changes collectively move the database closer to a fully persistent and reliable state.

Highlights

  • Enhanced Configuration Management: The database now dynamically generates a default configuration if one isn't found, and includes robust validation for all settings. The Write-Ahead Log (WAL) path is now configurable and relative to the application's runtime files, improving deployment flexibility.
  • Introduction of Manifest System: A new manifest package has been introduced to track SSTable metadata, including file paths and associated Bloom filter files. This lays the essential groundwork for implementing efficient disk-based storage and retrieval mechanisms.
  • In-Memory Data Management and Flushing: Internal node logic has been updated to monitor the size of in-memory data. When a configurable threshold is reached, the system will automatically trigger a flush of this data to persistent storage, significantly improving data durability and recovery capabilities.
  • New Data Retrieval Endpoint: A new GET / endpoint has been implemented, allowing clients to query data by key. This endpoint leverages the internal node's in-memory store for quick lookups, marking a step towards full CRUD functionality.
  • Persistence Utilities (Bloom Filter): A BloomFilter implementation has been added to the utils/persistance package. This probabilistic data structure will be crucial for efficient key lookups in disk-based SSTables, helping to reduce unnecessary disk I/O operations.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces persistence-related features, including configuration validation, manifest file handling, and the groundwork for flushing in-memory data to disk. While these are great additions, there are several critical issues that need to be addressed. These include incorrect algorithm usage (sort.Search), potential panics in HTTP handlers and gRPC methods from unhandled errors and nil dereferences, and flawed logic for memtable flushing which could lead to data loss. I've left detailed comments on these points with suggestions for fixes.

Comment on lines +31 to +39
value := r.URL.Query().Get("value")

grpcNode, hashError := appConfig.NodeHashInfo.GetNode(key)

if hashError != nil {
panic("Unable to get a node to store data")
}

logs.AddWalEntry(appConfig.WAL, key, value, grpcNode)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This GetValue handler has side effects, which is a violation of REST principles for GET requests. It takes a value from the query parameters and writes a WAL entry. GET requests should be idempotent and should not modify server state. This logic seems to be incorrectly copied from an insert/update handler and should be removed.

Comment on lines 50 to 59
searhResult, _ := server.memTable.Get(request.Key)

return &NodeResponse{
Status: Status_OK,
Timestamp: time.Now().String(),
Node: request.Node,
Key: request.Key,
Value: "DummyValue", // Dummy response
ErrorMessage: "",
Value: searhResult.(string),
ErrorMessage: "Not Found",
}, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This method has a critical bug. server.memTable.Get returns two values: the result and a boolean indicating if the key was found. The boolean is ignored here (using _). If the key is not found, searhResult will be nil. The subsequent type assertion searhResult.(string) will cause a panic.

You should check the boolean and handle the 'not found' case gracefully, for example by returning an error or a specific response. Additionally, the ErrorMessage is hardcoded to "Not Found" even when a value is successfully retrieved.

searhResult, found := server.memTable.Get(request.Key)
	if !found {
		return &NodeResponse{
			Status:       Status_ERROR,
			Timestamp:    time.Now().String(),
			Node:         request.Node,
			Key:          request.Key,
			ErrorMessage: "Not Found",
		}, nil
	}

	return &NodeResponse{
		Status:       Status_OK,
		Timestamp:    time.Now().String(),
		Node:         request.Node,
		Key:          request.Key,
		Value:        searhResult.(string),
		ErrorMessage: "",
	}, nil


logs.AddWalEntry(appConfig.WAL, key, value, grpcNode)

destinationNodeIndex := sort.Search(len(appConfig.Nodes), func(i int) bool { return appConfig.Nodes[i].ID == grpcNode }) % len(appConfig.Nodes)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The use of sort.Search here is incorrect. sort.Search requires a sorted slice and a predicate that returns false for the first part of the slice and true for the rest. The provided predicate func(i int) bool { return appConfig.Nodes[i].ID == grpcNode } does not meet this requirement and will not work correctly for finding the node index.

Additionally, comparing node IDs as strings can lead to incorrect sorting order (e.g., "10" < "2").

A simple linear scan would be more appropriate and correct here.

Suggested change
destinationNodeIndex := sort.Search(len(appConfig.Nodes), func(i int) bool { return appConfig.Nodes[i].ID == grpcNode }) % len(appConfig.Nodes)
destinationNodeIndex := -1
for i, node := range appConfig.Nodes {
if node.ID == grpcNode {
destinationNodeIndex = i
break
}
}
if destinationNodeIndex == -1 {
http.Error(w, "Could not find destination node", http.StatusInternalServerError)
return
}

Comment on lines +141 to +153
func (server *internalNode) flushMemTableToMemory() {
log.Println("Starting memory flush to persistant storage")
defer log.Println("Completed Memory write, Continuing normal operations")

keysInTable := server.memTable.Keys()

for index, value := range keysInTable {
fmt.Println(index, value.(string))
}

defer server.memTable.Clear()

}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This function has several issues that could lead to data loss:

  1. It iterates over keysInTable but doesn't retrieve the corresponding values from the memtable. The values are needed to write the data to persistent storage.
  2. defer server.memTable.Clear() is risky. If writing to storage fails partway through, the defer will still execute, clearing the memtable and causing data loss. The memtable should only be cleared after all its contents have been successfully written to disk.
  3. The actual logic to persist data is missing, it only prints to the console.

Comment on lines +132 to +139
func (server *internalNode) shouldFlushToMemory() bool {
nodeCount := server.memTable.Size()
var sampleNode redblacktree.Node
nodeSize := int64(unsafe.Sizeof(sampleNode))
totalSize := int64(nodeCount) * nodeSize

return totalSize > server.dbConfig.InMemoryStorageThreshold
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The calculation of the memtable's memory size is incorrect. unsafe.Sizeof(sampleNode) only returns the size of the redblacktree.Node struct itself (which contains pointers and other small fields), not the size of the actual key and value data that the pointers point to. This will drastically underestimate the memory usage, and the flush to disk will likely not be triggered when intended.

A more accurate approach would be to track the total size of keys and values as they are added to and removed from the memtable.

return nil, fmt.Errorf("failed to save config file, error on encoder: %v", err)
manifest, err := manifest.GetManifest()
if err != nil {
return nil, fmt.Errorf("failed to parse config file: %v", err)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error message here is misleading. The error is returned from manifest.GetManifest(), not from parsing a config file. A more accurate error message would be "failed to get manifest".

Suggested change
return nil, fmt.Errorf("failed to parse config file: %v", err)
return nil, fmt.Errorf("failed to get manifest: %v", err)

Comment on lines +18 to +20
func findDestinationNode() {

}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function findDestinationNode is empty and unused. It should be removed to avoid dead code in the repository.

Comment on lines +29 to +33
func GetAppPath() string {
path := common.GetAppPath()

return path
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function GetAppPath is a wrapper around common.GetAppPath but it's not used anywhere in the cmd package. The main function calls common.GetAppPath directly. This function appears to be dead code and should be removed to avoid confusion.

data, err := json.Marshal(configData)
if err != nil {
return nil, fmt.Errorf("failed to create config file: %v", err)
fmt.Println("Error marshalling example config:", err)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The codebase predominantly uses the log package for logging. Using fmt.Println here is inconsistent. It's better to use log.Printf or log.Println for consistency and to ensure logs are routed to the standard logger output (usually stderr).

Suggested change
fmt.Println("Error marshalling example config:", err)
log.Println("Error marshalling example config:", err)

if config.validateConfig() {
manifest, err := manifest.GetManifest()
if err != nil {
return nil, fmt.Errorf("failed to parse config file: %v", err)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error message "failed to parse config file" is misleading here. The error comes from manifest.GetManifest(), which can fail for reasons other than parsing, such as file I/O errors when creating the manifest. The error message should more accurately reflect the source of the error, for example: "failed to get manifest".

Suggested change
return nil, fmt.Errorf("failed to parse config file: %v", err)
return nil, fmt.Errorf("failed to get manifest: %v", err)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant