Conversation
Summary of Changes
Hello @SuperALKALINEdroiD, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the database's foundational structure by introducing robust persistence capabilities. It establishes a new manifest system for managing SSTables, refactors configuration loading with dynamic generation and validation, and lays the groundwork for flushing in-memory data to disk. Furthermore, it adds a GET endpoint for data retrieval and updates internal node logic to intelligently handle in-memory storage thresholds and data searching. These changes collectively move the database closer to a fully persistent and reliable state.
Highlights
- Enhanced Configuration Management: The database now dynamically generates a default configuration if one isn't found, and includes robust validation for all settings. The Write-Ahead Log (WAL) path is now configurable and relative to the application's runtime files, improving deployment flexibility.
- Introduction of Manifest System: A new `manifest` package has been introduced to track SSTable metadata, including file paths and associated Bloom filter files. This lays the essential groundwork for implementing efficient disk-based storage and retrieval mechanisms.
- In-Memory Data Management and Flushing: Internal node logic has been updated to monitor the size of in-memory data. When a configurable threshold is reached, the system will automatically trigger a flush of this data to persistent storage, significantly improving data durability and recovery capabilities.
- New Data Retrieval Endpoint: A new `GET /` endpoint has been implemented, allowing clients to query data by key. This endpoint leverages the internal node's in-memory store for quick lookups, marking a step towards full CRUD functionality.
- Persistence Utilities (Bloom Filter): A `BloomFilter` implementation has been added to the `utils/persistance` package. This probabilistic data structure will be crucial for efficient key lookups in disk-based SSTables, helping to reduce unnecessary disk I/O operations.
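The Bloom filter idea behind that package can be sketched as follows. This is a minimal illustration, not the PR's actual `BloomFilter` API: the type name, constructor, and `MightContain` method are assumptions, and the k probe positions are derived by double hashing an FNV-1a digest.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is an illustrative sketch, not the PR's actual BloomFilter
// type: m bits and k probe positions derived from a single FNV-1a hash.
type bloomFilter struct {
	bits []bool
	k    uint32
}

func newBloomFilter(m int, k uint32) *bloomFilter {
	return &bloomFilter{bits: make([]bool, m), k: k}
}

// positions derives k bit positions via double hashing: split the 64-bit
// FNV-1a sum into two 32-bit halves and combine them as h1 + i*h2.
func (b *bloomFilter) positions(key string) []uint32 {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	out := make([]uint32, b.k)
	for i := uint32(0); i < b.k; i++ {
		out[i] = (h1 + i*h2) % uint32(len(b.bits))
	}
	return out
}

func (b *bloomFilter) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p] = true
	}
}

// MightContain returns false only if the key was definitely never added;
// true means "possibly present" (false positives are allowed).
func (b *bloomFilter) MightContain(key string) bool {
	for _, p := range b.positions(key) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	bf := newBloomFilter(1024, 3)
	bf.Add("user:1")
	fmt.Println(bf.MightContain("user:1")) // always true for added keys
}
```

Before probing an SSTable on disk, a reader would consult its filter first and skip the file entirely on a definite miss, which is how the disk I/O savings mentioned above come about.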
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
This pull request introduces persistence-related features, including configuration validation, manifest file handling, and the groundwork for flushing in-memory data to disk. While these are great additions, there are several critical issues that need to be addressed. These include incorrect algorithm usage (sort.Search), potential panics in HTTP handlers and gRPC methods from unhandled errors and nil dereferences, and flawed logic for memtable flushing, which could lead to data loss. I've left detailed comments on these points with suggestions for fixes.
```go
value := r.URL.Query().Get("value")

grpcNode, hashError := appConfig.NodeHashInfo.GetNode(key)

if hashError != nil {
	panic("Unable to get a node to store data")
}

logs.AddWalEntry(appConfig.WAL, key, value, grpcNode)
```
This GetValue handler has side effects, which is a violation of REST principles for GET requests. It takes a value from the query parameters and writes a WAL entry. GET requests should be idempotent and should not modify server state. This logic seems to be incorrectly copied from an insert/update handler and should be removed.
utils/nodes/server.go
Outdated
```diff
  searhResult, _ := server.memTable.Get(request.Key)

  return &NodeResponse{
  	Status:       Status_OK,
  	Timestamp:    time.Now().String(),
  	Node:         request.Node,
  	Key:          request.Key,
- 	Value:        "DummyValue", // Dummy response
- 	ErrorMessage: "",
+ 	Value:        searhResult.(string),
+ 	ErrorMessage: "Not Found",
  }, nil
```
This method has a critical bug. server.memTable.Get returns two values: the result and a boolean indicating if the key was found. The boolean is ignored here (using _). If the key is not found, searhResult will be nil. The subsequent type assertion searhResult.(string) will cause a panic.
You should check the boolean and handle the 'not found' case gracefully, for example by returning an error or a specific response. Additionally, the ErrorMessage is hardcoded to "Not Found" even when a value is successfully retrieved.
```go
searhResult, found := server.memTable.Get(request.Key)
if !found {
	return &NodeResponse{
		Status:       Status_ERROR,
		Timestamp:    time.Now().String(),
		Node:         request.Node,
		Key:          request.Key,
		ErrorMessage: "Not Found",
	}, nil
}
return &NodeResponse{
	Status:       Status_OK,
	Timestamp:    time.Now().String(),
	Node:         request.Node,
	Key:          request.Key,
	Value:        searhResult.(string),
	ErrorMessage: "",
}, nil
```
```go
logs.AddWalEntry(appConfig.WAL, key, value, grpcNode)

destinationNodeIndex := sort.Search(len(appConfig.Nodes), func(i int) bool { return appConfig.Nodes[i].ID == grpcNode }) % len(appConfig.Nodes)
```
The use of sort.Search here is incorrect. sort.Search requires a sorted slice and a predicate that returns false for the first part of the slice and true for the rest. The provided predicate func(i int) bool { return appConfig.Nodes[i].ID == grpcNode } does not meet this requirement and will not work correctly for finding the node index.
Additionally, comparing node IDs as strings can lead to incorrect sorting order (e.g., "10" < "2").
A simple linear scan would be more appropriate and correct here.
```diff
- destinationNodeIndex := sort.Search(len(appConfig.Nodes), func(i int) bool { return appConfig.Nodes[i].ID == grpcNode }) % len(appConfig.Nodes)
+ destinationNodeIndex := -1
+ for i, node := range appConfig.Nodes {
+ 	if node.ID == grpcNode {
+ 		destinationNodeIndex = i
+ 		break
+ 	}
+ }
+ if destinationNodeIndex == -1 {
+ 	http.Error(w, "Could not find destination node", http.StatusInternalServerError)
+ 	return
+ }
```
utils/nodes/server.go
Outdated
```go
func (server *internalNode) flushMemTableToMemory() {
	log.Println("Starting memory flush to persistant storage")
	defer log.Println("Completed Memory write, Continuing normal operations")

	keysInTable := server.memTable.Keys()

	for index, value := range keysInTable {
		fmt.Println(index, value.(string))
	}

	defer server.memTable.Clear()

}
```
This function has several issues that could lead to data loss:

- It iterates over `keysInTable` but doesn't retrieve the corresponding values from the memtable. The values are needed to write the data to persistent storage.
- `defer server.memTable.Clear()` is risky. If writing to storage fails partway through, the `defer` will still execute, clearing the memtable and causing data loss. The memtable should only be cleared after all its contents have been successfully written to disk.
- The actual logic to persist data is missing; it only prints to the console.
utils/nodes/server.go
Outdated
```go
func (server *internalNode) shouldFlushToMemory() bool {
	nodeCount := server.memTable.Size()
	var sampleNode redblacktree.Node
	nodeSize := int64(unsafe.Sizeof(sampleNode))
	totalSize := int64(nodeCount) * nodeSize

	return totalSize > server.dbConfig.InMemoryStorageThreshold
}
```
The calculation of the memtable's memory size is incorrect. unsafe.Sizeof(sampleNode) only returns the size of the redblacktree.Node struct itself (which contains pointers and other small fields), not the size of the actual key and value data that the pointers point to. This will drastically underestimate the memory usage, and the flush to disk will likely not be triggered when intended.
A more accurate approach would be to track the total size of keys and values as they are added to and removed from the memtable.
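Incremental size tracking could look like this. The `sizeTrackedTable` wrapper is illustrative (the real memtable is a red-black tree, not a map); the point is maintaining a running byte count of key and value payloads on every Put/Delete instead of multiplying node count by `unsafe.Sizeof`.

```go
package main

import "fmt"

// sizeTrackedTable is an illustrative wrapper that keeps a running byte
// total of key and value payloads, updated on every mutation.
type sizeTrackedTable struct {
	data      map[string]string
	totalSize int64
}

func newSizeTrackedTable() *sizeTrackedTable {
	return &sizeTrackedTable{data: map[string]string{}}
}

func (t *sizeTrackedTable) Put(key, value string) {
	// On overwrite, subtract the old entry's size before adding the new one.
	if old, ok := t.data[key]; ok {
		t.totalSize -= int64(len(key) + len(old))
	}
	t.data[key] = value
	t.totalSize += int64(len(key) + len(value))
}

func (t *sizeTrackedTable) Delete(key string) {
	if old, ok := t.data[key]; ok {
		t.totalSize -= int64(len(key) + len(old))
		delete(t.data, key)
	}
}

// shouldFlush compares the tracked payload size against the threshold,
// so it reflects actual data volume rather than node-struct overhead.
func (t *sizeTrackedTable) shouldFlush(threshold int64) bool {
	return t.totalSize > threshold
}

func main() {
	t := newSizeTrackedTable()
	t.Put("key", "value") // 3 + 5 = 8 bytes tracked
	fmt.Println(t.totalSize, t.shouldFlush(4)) // 8 true
}
```

This still undercounts fixed per-node overhead, but unlike the `unsafe.Sizeof` approach it scales with the actual payload, so the threshold fires when intended.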
```go
return nil, fmt.Errorf("failed to save config file, error on encoder: %v", err)
manifest, err := manifest.GetManifest()
if err != nil {
	return nil, fmt.Errorf("failed to parse config file: %v", err)
```
The error message here is misleading. The error is returned from manifest.GetManifest(), not from parsing a config file. A more accurate error message would be "failed to get manifest".
```diff
- return nil, fmt.Errorf("failed to parse config file: %v", err)
+ return nil, fmt.Errorf("failed to get manifest: %v", err)
```
utils/common/main.go
Outdated
```go
func findDestinationNode() {

}
```

```go
func GetAppPath() string {
	path := common.GetAppPath()

	return path
}
```
```go
data, err := json.Marshal(configData)
if err != nil {
	return nil, fmt.Errorf("failed to create config file: %v", err)
	fmt.Println("Error marshalling example config:", err)
```
The codebase predominantly uses the log package for logging. Using fmt.Println here is inconsistent. It's better to use log.Printf or log.Println for consistency and to ensure logs are routed to the standard logger output (usually stderr).
```diff
- fmt.Println("Error marshalling example config:", err)
+ log.Println("Error marshalling example config:", err)
```
```go
if config.validateConfig() {
	manifest, err := manifest.GetManifest()
	if err != nil {
		return nil, fmt.Errorf("failed to parse config file: %v", err)
```
The error message "failed to parse config file" is misleading here. The error comes from manifest.GetManifest(), which can fail for reasons other than parsing, such as file I/O errors when creating the manifest. The error message should more accurately reflect the source of the error, for example: "failed to get manifest".
```diff
- return nil, fmt.Errorf("failed to parse config file: %v", err)
+ return nil, fmt.Errorf("failed to get manifest: %v", err)
```
No description provided.