feat(io): Make Storage a trait
#1755
Conversation
crates/iceberg/src/io/storage_gcs.rs
Outdated
}

#[async_trait]
impl Storage for OpenDALGcsStorage {
It can be pretty annoying to implement nearly the same thing for every storage service. Can we avoid that?
I think this thread is relevant: https://docs.google.com/document/d/1-CEvRvb52vPTDLnzwJRBx5KLpej7oSlTu_rg0qKEGZ8/edit?disco=AAABrRO9Prk
The benefit of doing this is that users can implement Storage for only the schemes they need. The annoyance of duplicating code across multiple schemes mostly applies to a versatile storage implementation like OpenDAL, which already has a convenient operator layer. For custom storage, I don't expect users to implement all schemes anyway (I may be wrong on this assumption).
As for code duplication, I consider OpenDAL Storage to be the "managed" default storage that lives in this repo, so we will have more control over its implementation. Once we have a new crate for each storage implementation (iceberg-storage-opendal), we can add some helpers to reduce the code duplication.
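To illustrate the point about implementing Storage for only certain schemes, here is a minimal sketch. It is not the API from this PR: the trait below is a simplified, synchronous stand-in (the real trait is async), and `InMemoryStorage`, `StorageRegistry`, `register`, and `resolve` are hypothetical names chosen for this example.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified stand-in for the `Storage` trait discussed here.
// The trait in the PR uses `#[async_trait]`; this sketch is synchronous so it
// only depends on the standard library.
trait Storage: Send + Sync {
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
}

// A custom storage that serves exactly one scheme ("memory" in this sketch).
struct InMemoryStorage {
    files: HashMap<String, Vec<u8>>,
}

impl Storage for InMemoryStorage {
    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| format!("not found: {path}"))
    }
}

// Hypothetical registry that maps a URL scheme to a storage implementation.
#[derive(Default)]
struct StorageRegistry {
    by_scheme: HashMap<String, Arc<dyn Storage>>,
}

impl StorageRegistry {
    // Register a storage for a single scheme, e.g. "memory" or "gs".
    fn register(&mut self, scheme: &str, storage: Arc<dyn Storage>) {
        self.by_scheme.insert(scheme.to_string(), storage);
    }

    // Pick the storage by the scheme in front of "://".
    fn resolve(&self, path: &str) -> Result<Arc<dyn Storage>, String> {
        let (scheme, _) = path
            .split_once("://")
            .ok_or_else(|| format!("no scheme in: {path}"))?;
        self.by_scheme
            .get(scheme)
            .cloned()
            .ok_or_else(|| format!("unsupported scheme: {scheme}"))
    }
}
```

With this shape, a user who only cares about one scheme registers one implementation, and any other scheme fails at `resolve` rather than requiring a stub implementation.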
I'm keeping the current implementation as a unified storage (one OpenDalStorage for all backends). We can split it up later if needed.
Hi, @liurenjie1024 and @CTTY, I have thought about this again. I think we can split the concept around As discussed in #1797, sometimes people don't care about FileIO (the thing in Java) at all. They will initiate, build and manage their own IO abstraction and only want to be used as So, I think we should make
I'm a little confused about your point. Would you mind writing a more detailed proposal?
Will do
569736e to a9d8fdf
77f8b1d to e881f83
322ae2a to 459dc73
I have concerns about this PR: it leaves the whole crate in a state inconsistent with our design. I think that to follow our design, you don't need to change anything existing. We could name our new trait to be
Which issue does this PR close?
What changes are included in this PR?
Are these changes tested?
Added some tests for the new storage builder and registry, but this PR mostly relies on the existing tests.