Spool Processing #212
d-chambers started this conversation in Ideas
Replies: 2 comments · 1 reply
Hi Kit,
Great! Thanks,
Rigo
From: Derrick Chambers, Thursday, September 26, 2024 6:36 PM
Subject: Re: [DASDAE/dascore] Spool Processing (Discussion #212)
Hey @rigotibi,
We don't yet have a way to do massive down-sampling over a spool in DASCore, but you can check out this repo: https://github.com/DASDAE/DASLowFreqProcessing
If all you need is to resample a single patch, or you don't mind combining them later, you can just use `patch.resample` (https://dascore.org/api/dascore/proc/resample.html).
CC @jinwar
This discussion relates to how to bring spool processing into dascore. #200 is a pre-requisite for providing a flexible way to map processes over spools while decoupling concurrency and IO concerns.
So, thinking about this, I propose the following:
- We create a new module in DASCore called `processors`. This may be a little confusing since we already have a module called `proc` for patch processors, but since those are simply used as patch attributes (e.g., `patch.pass_filter` rather than `dascore.proc.filter.pass_filter`) it probably isn't a big deal, especially if we explain this in the `__init__.py` files.
- In `dascore.processors` we create a base class called `SpoolProcessor` from which specific spool processors will inherit.
- We create submodules for specific spool processors (e.g., low_frequency, or maybe just resampling) which will define subclasses of these processors for applying some process over an entire spool. It might look like this:
(contents of resample)
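A minimal sketch of what such a `resample` submodule might look like; the original example is not shown above, so the method and parameter names here (`apply`, `process_patch`, `time_step`) are assumptions, not existing dascore API:

```python
# Hypothetical sketch of dascore/processors/resample.py.
# SpoolProcessor, apply, process_patch, and time_step are assumed
# names for illustration, not existing dascore API.


class SpoolProcessor:
    """Base class for processors that map an operation over a spool."""

    def apply(self, spool):
        """Apply the processor to each patch in the spool."""
        return [self.process_patch(patch) for patch in spool]

    def process_patch(self, patch):
        """Subclasses implement the per-patch operation."""
        raise NotImplementedError


class Resample(SpoolProcessor):
    """Down-sample every patch in a spool along the time dimension."""

    def __init__(self, time_step):
        # Desired sample spacing; the parameter name is illustrative.
        self.time_step = time_step

    def process_patch(self, patch):
        # Delegate to the existing patch-level resample method.
        return patch.resample(time=self.time_step)
```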
Of course, there are more details to flesh out, specifically what methods we need in `SpoolProcessor` and a bit of the edges of the interface, but I think this is a solid start.

I prefer this to `Spool` methods (e.g., a method `Spool.Resample`) because:

1. Given spool processors may be quite complicated, it is better, from an ergonomics and aesthetics perspective, to have a class with many input parameters than a method.
2. `SpoolProcessor`s will, in many cases, take a long time to run. From a user's perspective, I would like most methods of `Patch` and `Spool` to be fast, maybe taking a few seconds at most. This is consistent with the method-chaining paradigm we encourage in the documentation. Importing a class from a module and then feeding it spools feels "heavier", which is more commensurate with the task at hand.
3. `SpoolProcessor`s could get quite complicated, and perhaps there is more to do than just apply the processing to spools. For example, `SpoolProcessor` subclasses may implement other methods besides just `apply` that do some kind of plotting, diagnosis, or just provide more insight into the processor. Bolting such functionality onto a method quickly becomes untenable. As a side note, this class-based approach to handling complexity around a process is, in my opinion, what made the scikit-learn API successful.
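To illustrate the third point, here is a hedged sketch of how a processor class could carry diagnostic functionality beyond `apply`; all names (`Resample`, `apply`, `summary`, `time_step`) are hypothetical, not existing dascore API:

```python
# Hypothetical sketch: a processor object can accumulate state across a
# run and expose diagnostics, which a single Spool.resample method
# could not. Resample, apply, and summary are assumed names.


class Resample:
    def __init__(self, time_step):
        self.time_step = time_step
        self._patch_count = 0

    def apply(self, spool):
        """Resample each patch, tracking how many were processed."""
        out = []
        for patch in spool:
            out.append(patch.resample(time=self.time_step))
            self._patch_count += 1
        return out

    def summary(self):
        """Extra insight into the last run; awkward to bolt onto a method."""
        return f"resampled {self._patch_count} patches to dt={self.time_step}s"
```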