Describe the feature you'd like to see
I would love to see an extension to the core conditioning pipelines that lets users supply structured camera trajectories (e.g., pan left/right, zoom in/out, orbital tracking) during inference. This could be implemented via extra conditioning inputs or an integrated spatio-temporal attention mechanism, similar to what MotionCtrl and CameraCtrl do for other video diffusion models.
Is your feature request related to a problem?
Currently, controlling exact camera motion in text-to-video or image-to-video generation relies almost entirely on text prompting. This frequently results in unpredictable camera behavior, unwanted "Ken Burns" drifting effects, or static frames.
Describe alternatives you've considered
Applying external IP-Adapter or ControlNet workarounds after generation. These often break visual consistency or require heavy reprocessing.
Additional context
Providing a clean Python API wrapper within ltx-pipelines for coordinate-based or preset-based camera trajectories would significantly improve usability for professional cinematic workflows.
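
To make the request more concrete, here is a rough sketch of what a preset-based trajectory helper might look like. Everything in it (`CameraPose`, `preset_trajectory`, the preset names, and the yaw convention) is hypothetical and not an existing ltx-pipelines API. How the resulting per-frame poses would be passed to the model as conditioning is left open.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical per-frame camera state (not an ltx-pipelines type)."""
    x: float        # lateral position, scene units
    z: float        # position along the initial viewing axis, scene units
    yaw_deg: float  # rotation about the vertical axis, measured from +z toward +x
    zoom: float     # focal-length multiplier, 1.0 = unchanged


def preset_trajectory(preset: str, num_frames: int, amount: float = 1.0) -> list[CameraPose]:
    """Return one CameraPose per frame for a named preset.

    `amount` is the total motion over the clip: degrees for pans and
    orbits, a zoom-multiplier delta for zooms.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    poses = []
    for i in range(num_frames):
        # t runs linearly from 0.0 on the first frame to 1.0 on the last.
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        if preset == "pan_left":
            pose = CameraPose(0.0, 0.0, -amount * t, 1.0)
        elif preset == "pan_right":
            pose = CameraPose(0.0, 0.0, amount * t, 1.0)
        elif preset == "zoom_in":
            pose = CameraPose(0.0, 0.0, 0.0, 1.0 + amount * t)
        elif preset == "zoom_out":
            pose = CameraPose(0.0, 0.0, 0.0, 1.0 / (1.0 + amount * t))
        elif preset == "orbit":
            # Unit circle around a subject at the origin; the yaw is chosen
            # so the camera keeps pointing at the origin.
            angle = math.radians(amount * t)
            pose = CameraPose(math.sin(angle), -math.cos(angle), -amount * t, 1.0)
        else:
            raise ValueError(f"unknown preset: {preset!r}")
        poses.append(pose)
    return poses


# Example: a 30-degree pan to the right spread over 5 frames.
for pose in preset_trajectory("pan_right", num_frames=5, amount=30.0):
    print(pose)
```

A coordinate-based variant could accept a user-supplied list of such poses directly, with the presets acting as convenience constructors on top of it.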