I've briefly investigated supporting MLflow as a self-hosted alternative to wandb (which requires a license for commercial usage). The current approach couples “logging” to the wandb API and to wandb-specific assumptions, which makes it hard to add other loggers cleanly:
- Non-scalar metric logging is implemented via `wandb.plot` (e.g., confusion matrices). Other loggers (MLflow, TensorBoard, CSV) don’t necessarily have the same plotting API, so you need per-backend implementations or a generic artifact representation.
- The “is wandb initialized?” checks encode a wandb-specific lifecycle. Other backends have different notions of an active run, run IDs, and resume semantics.
- Project management/resume currently stores a `wandb_id` and tries to resume by setting `logger.init_args.id`. That only makes sense for WandbLogger-like backends; MLflow uses `run_id`, others have no equivalent.
- Automatically injecting/configuring a specific logger inside the CLI (rather than letting Lightning instantiate the logger from config) creates a second “source of truth” for logging behavior, which becomes a combinatorial mess once you support multiple backends.
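
To make the "generic artifact representation" and resume points concrete, here is a minimal sketch of what a backend-neutral layer might look like. Everything in it is hypothetical and not existing rslearn code: `ConfusionMatrix`, `ArtifactSink`, `InMemorySink`, `RUN_ID_ARG`, and `resume_init_args` are names I made up for illustration, and the class paths in the mapping assume Lightning's `WandbLogger` (`id`) and `MLFlowLogger` (`run_id`) init arguments.

```python
import csv
import io
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ConfusionMatrix:
    """Backend-neutral representation of a non-scalar metric (hypothetical)."""

    labels: list[str]
    counts: list[list[int]]  # counts[i][j]: true class i predicted as class j

    def to_csv(self) -> str:
        """Serialize to CSV text so any backend can store it as a plain artifact."""
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["true_label", *self.labels])
        for label, row in zip(self.labels, self.counts):
            writer.writerow([label, *row])
        return buf.getvalue()


class ArtifactSink(Protocol):
    """What each backend adapter would need to implement (hypothetical)."""

    def log_confusion_matrix(self, name: str, cm: ConfusionMatrix, step: int) -> None: ...


class InMemorySink:
    """Stand-in adapter for this sketch; a real adapter would forward the
    matrix to its backend (e.g. a wandb plot or an MLflow text artifact)."""

    def __init__(self) -> None:
        self.records: dict[tuple[str, int], str] = {}

    def log_confusion_matrix(self, name: str, cm: ConfusionMatrix, step: int) -> None:
        self.records[(name, step)] = cm.to_csv()


# Assumed mapping: name of the init arg that carries the run identifier,
# keyed by logger class path. Backends without a run ID are simply absent.
RUN_ID_ARG = {
    "lightning.pytorch.loggers.WandbLogger": "id",
    "lightning.pytorch.loggers.MLFlowLogger": "run_id",
}


def resume_init_args(class_path: str, run_id: str) -> dict[str, str]:
    """Return the init args needed to resume, or {} if the backend has no run ID."""
    arg = RUN_ID_ARG.get(class_path)
    return {arg: run_id} if arg else {}
```

With something like this, the metric code would only build a `ConfusionMatrix` and hand it to whichever sink is configured, and project management would store a generic run ID instead of `wandb_id`. Whether an adapter layer like this is worth the extra surface area is exactly the design question raised here.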
If the stance is to only support wandb in rslearn, I appreciate how that keeps things simple, but it would be good to state this explicitly.