
Non-stratified splitting and overwriting of loss function in classification tasks #228

@kaueltzen

Description


Hello,

I'd like to report two issues regarding classification tasks in modnet:

First, the loss function passed to ModnetModel().fit() is overwritten with "categorical_crossentropy" whenever val_data is not None and self.multi_label is False:

```python
if self.num_classes[prop[0]] >= 2:  # Classification
    targ = prop[0]
    if self.multi_label:
        y_inner = np.stack(val_data.df_targets[targ].values)
        if loss is None:
            loss = "binary_crossentropy"
    else:
        y_inner = tf.keras.utils.to_categorical(
            val_data.df_targets[targ].values,
            num_classes=self.num_classes[targ],
        )
        loss = "categorical_crossentropy"
```

Since the loss=None case is already handled earlier (L352-L360, during preprocessing of the training data), perhaps the unconditional assignment could be removed here when preprocessing the validation data?
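As a minimal sketch of the proposed fix (simplified, standalone names, not the actual modnet code): only fall back to a default loss when the user did not supply one, instead of overwriting it unconditionally in the else branch.

```python
def resolve_loss(user_loss, multi_label):
    """Return the user-supplied loss if given; otherwise pick a default.

    Mirrors the branch above, but never clobbers a loss the caller
    passed explicitly to fit().
    """
    if user_loss is not None:
        return user_loss  # respect the caller's choice
    return "binary_crossentropy" if multi_label else "categorical_crossentropy"


# e.g. a custom loss passed to fit() now survives:
assert resolve_loss("my_focal_loss", multi_label=False) == "my_focal_loss"
assert resolve_loss(None, multi_label=False) == "categorical_crossentropy"
```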

Second, if nested=False, both FitGenetic and ModnetModel.fit_preset() perform a train/test split that is not stratified:

```python
train_test_split(range(len(data.df_featurized)), test_size=val_fraction)
```

```python
splits = [
    train_test_split(
        range(len(self.train_data.df_featurized)), test_size=val_fraction
    )
]
```

This is a problem for imbalanced datasets, where the minority class can end up under-represented (or entirely absent) in the validation fold; it would be helpful if the split were stratified for classification tasks.
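To illustrate on a toy imbalanced dataset (hypothetical labels, not modnet data): passing the class labels via sklearn's `stratify` keyword guarantees the validation fold preserves the class proportions, whereas the plain split gives no such guarantee.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy labels: 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
idx = np.arange(len(y))

# Current behaviour: nothing prevents the 10% validation fold from
# containing zero (or too many) minority-class samples.
_, val_plain = train_test_split(idx, test_size=0.1, random_state=0)

# Proposed behaviour: stratify on the targets so both folds keep the
# 90/10 class ratio.
_, val_strat = train_test_split(idx, test_size=0.1, stratify=y, random_state=0)
print(np.bincount(y[val_strat], minlength=2))  # → [9 1]
```

In modnet this would mean passing the relevant `df_targets` column as `stratify=` in the two call sites above (only for classification tasks, since regression targets cannot be stratified directly).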

If you are interested, I'm happy to raise a PR with fixes.
