In some cases, several id columns together define a time series, e.g. factory and machine type (Factory A & Machine A; Factory B & Machine A; ...). You could create a single id column beforehand, ideally just a tuple of the combinations. Unfortunately, the roll_time_series() function can't handle tuples right now (at least it didn't work for me).
Therefore I adapted the code in this function a little to handle multiple id columns.
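For reference, the tuple workaround mentioned above can be sketched with plain pandas. The column names `factory`, `machine`, `time` and `value` are assumptions made up for this illustration, and the snippet only builds the combined id column; it does not call roll_time_series().

```python
import pandas as pd

# Hypothetical example data: "factory" and "machine" together identify one time series.
df = pd.DataFrame({
    "factory": ["A", "A", "B", "B"],
    "machine": ["M1", "M1", "M1", "M1"],
    "time": [0, 1, 0, 1],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Combine the id columns into a single column holding one tuple per row.
df["id"] = list(zip(df["factory"], df["machine"]))

# The original id columns are now redundant and can be dropped.
df_single_id = df.drop(columns=["factory", "machine"])

print(df_single_id)
```

The resulting dataframe has a single `id` column, so it fits the usual column_id / column_sort parameters once the rolling function accepts tuple ids.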
Moreover, if the id column has a name other than "id", e.g. "machine_name", this column is also present in the rolled dataframe after calling roll_time_series().
from tsfresh.utilities.dataframe_functions import roll_time_series
df_rolled = roll_time_series(df, column_id="machine_name", column_sort="time")
from tsfresh import extract_features
df_features = extract_features(df_rolled, column_id="id", column_sort="time")
A naive user (like me) would just pass the rolled dataframe directly into the extract_features() function. With the above workflow, extract_features() would:
- fail if the column "machine_name" contains strings
- run without errors and calculate features for "machine_name" if the column contains numerical values. This doesn't make sense to me, as we group indirectly by this column anyway, and calculating features for ids... well.
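Without the change from this PR, one way around both cases is to drop the original id column by hand before extracting features. The sketch below uses a hand-built stand-in for the rolled dataframe (the `id` values are placeholders, not real roll_time_series() output), so only the pandas step is shown.

```python
import pandas as pd

# Hand-built stand-in for a rolled dataframe; the "id" values are placeholders
# and do not reflect the actual id format produced by roll_time_series().
df_rolled = pd.DataFrame({
    "id": ["w0", "w1", "w1"],
    "machine_name": ["A", "A", "A"],
    "time": [0, 0, 1],
    "value": [1.0, 1.0, 2.0],
})

# Drop the original id column so that it is neither treated as a value
# column (numerical case) nor causes a failure (string case).
df_for_extraction = df_rolled.drop(columns=["machine_name"])

print(df_for_extraction.columns.tolist())
```

The cleaned dataframe can then be passed to extract_features() with column_id="id" and column_sort="time", as in the workflow above.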
Therefore, in this PR I drop the initial id column(s) after rolling. The ids are included in the newly created "id" column and are easily accessible again after extracting the features from the rolled dataframe.
I am sure there is a lot of room for improvement in my code changes, but it works for my use cases, and I would be happy to get your feedback and thoughts on this.
Great idea @mmcux! I like both your ideas (allow for multiple columns in the id as well as remove the created id, as it can be easily recreated).
However, I would propose not to allow multiple id columns as a parameter of the roll_time_series function. The reason is as follows: all our "main" functions in tsfresh accept the same input parameters (column_id, column_sort, ...), and it would be a bit inconsistent if one of them allowed a list and the others didn't. We could of course change all functions to allow lists, but that would be a lot of work. Therefore I would propose to go the other way you suggested: make sure the roll_time_series function can actually work with an id column that consists of tuples. There is not much to change in this case, basically one line. What do you think? If you like it, feel free to just cherry-pick the commit this link points to (I have also adjusted the tests) into your PR. We can think about adding a dedicated example for this use case to the documentation.