Commit 00e3ab4

AndreFCruz, andre.cruz, fdz-sergio-jesus, ipiresmac, and AlbertoEAF authored
FairGBM openml API implementation (#116)
* resolving clashes with LightGBM version 3.2.1.99
* Implemented the FairGBM parameter descriptor
* all tests now pass
* passing necessary group information to LightGBM c++ metadata class
* checking whether sensitive group column is in categorical format
* added tests for FairGBM openml interface
* fixing bug on replace of ImmutableMap
* tests now pass
* remove debug messages
* tidying code according to PR feedback
* constraint_group data is now held in int instead of float for compatibility with cpp code
* applying PR feedback
* updated lightgbm pom to point to latest python-api branch
* removed deprecated code
* improving memory management of SWIG data
* running intellij code cleanup
* Revert H2OApp changes.
* Revert H2OApp changes on all files.
* Small fixes to typing, and javadocs.
* updated make-lightgbm submodule
* update lightgbm version
* asserting fairnessConstrained=True before setting group data
* moved all FairGBM-specific input handling to a separate class
* fairgbm input processing
* ensure that loaded file is properly closed
* Update openml-lightgbm/lightgbm-provider/src/main/java/com/feedzai/openml/provider/lightgbm/FairGBMParamParserUtil.java Co-authored-by: ipiresmac <93192669+ipiresmac@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: ipiresmac <93192669+ipiresmac@users.noreply.github.com>
* applied PR feedback
* added links to appropriate GH issues
* speeding up system tests by running only 2 boosting iterations
* Tests fix.
* Restore comparison to fairnessConstrained SWIG objects.
* Change to comparison when set has size of only two elements.
* applying PR feedback
* disallowing usage of RF with FairGBM
* DescriptorUtilTest
* allocating ML models with try/with
* testing FairGBMDescriptorUtil
* reducing code complexity
* Docstring update from Iva Co-authored-by: ipiresmac <93192669+ipiresmac@users.noreply.github.com>
* Tests for GBM Providers classes.
* Apply suggestions from documentation review Co-authored-by: ipiresmac <93192669+ipiresmac@users.noreply.github.com>
* Apply suggestions from documentation review Co-authored-by: ipiresmac <93192669+ipiresmac@users.noreply.github.com>
* making global FPR/FNR constraint documentation clearer
* asserting sensitive attribute is only loaded for constrained optimization settings
* Change methods names in providers tests, move property to inside test that uses it.
* updating UI docstring for the global target FPR/FNR
* Set better line spacing.

Co-authored-by: andre.cruz <andre.cruz@feedzai.com>
Co-authored-by: sergio.jesus <sergio.jesus@feedzai.com>
Co-authored-by: ipiresmac <93192669+ipiresmac@users.noreply.github.com>
Co-authored-by: Alberto Ferreira <AlbertoEAF@users.noreply.github.com>
1 parent de77c62 commit 00e3ab4

31 files changed

Lines changed: 101741 additions & 50142 deletions

openml-lightgbm/lightgbm-builder/pom.xml

Lines changed: 12 additions & 7 deletions
@@ -15,21 +15,26 @@

     <groupId>com.feedzai.openml.lightgbm</groupId>
     <artifactId>lightgbm-lib</artifactId>
-    <version>3.0.1-with_model_locale_fix_for_java_and_streaming</version>
+    <version>v3.2.1-fairgbm-alpha</version>

     <packaging>jar</packaging>
     <name>Openml LightGBM lib</name>
     <description>
         LightGBM build for Java generated with make-lightgbm.
     </description>
     <url>https://github.com/feedzai/make-lightgbm</url>
-
     <properties>
-        <!-- Microsoft hasn't merged our model-locale-fix patch yet. -->
-        <!--<lightgbm.repo.url>https://github.com/microsoft/LightGBM</lightgbm.repo.url>-->
-        <lightgbm.repo.url>https://github.com/feedzai/LightGBM.git</lightgbm.repo.url>
-        <lightgbmlib.version>3.0.1-with_model_locale_fix_for_java_and_streaming</lightgbmlib.version>
-        <lightgbm.version>v3.0.1-with_model_locale_fix_for_java_and_streaming</lightgbm.version>
+        <!-- Microsoft LightGBM -->
+        <!-- <lightgbm.repo.url>https://github.com/microsoft/LightGBM</lightgbm.repo.url> -->
+
+        <!-- Feedzai's custom LightGBM -->
+        <!-- <lightgbm.repo.url>https://github.com/feedzai/LightGBM.git</lightgbm.repo.url> -->
+
+        <!-- Feedzai's FairGBM! -->
+        <lightgbm.repo.url>https://github.com/feedzai/fairgbm.git</lightgbm.repo.url>
+
+        <lightgbm.version>main-fairgbm</lightgbm.version>
+        <lightgbmlib.version>v3.2.1-fairgbm-alpha</lightgbmlib.version>
     </properties>

     <build>
openml-lightgbm/lightgbm-provider/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -22,10 +22,10 @@
     <groupId>com.feedzai</groupId>
     <artifactId>openml-lightgbm</artifactId>

-    <description>OpenML Microsoft LightGBM Machine Learning Model and Classifier provider</description>
+    <description>OpenML LightGBM Machine Learning Model and Classifier provider</description>

     <properties>
-        <lightgbmlib.version>3.0.1-with_model_locale_fix_for_java_and_streaming</lightgbmlib.version>
+        <lightgbmlib.version>v3.2.1-fairgbm-alpha</lightgbmlib.version>
     </properties>

     <dependencies>
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+package com.feedzai.openml.provider.lightgbm;
+
+import com.feedzai.openml.provider.descriptor.fieldtype.NumericFieldType;
+
+public abstract class AlgoDescriptorUtil {
+
+    /**
+     * An alias to ease the readability of parameters' configuration that are not mandatory.
+     */
+    protected static final boolean NOT_MANDATORY = false;
+
+    /**
+     * An alias to ease the readability of parameters' configuration that are mandatory.
+     */
+    protected static final boolean MANDATORY = true;
+
+    /**
+     * Helper method to return a range of type DOUBLE.
+     *
+     * @param minValue     Minimum allowed value.
+     * @param maxValue     Maximum allowed value.
+     * @param defaultValue Default value.
+     * @return Double range with the specs above.
+     */
+    protected static NumericFieldType doubleRange(final double minValue,
+                                                  final double maxValue,
+                                                  final double defaultValue) {
+        return NumericFieldType.range(minValue, maxValue, NumericFieldType.ParameterConfigType.DOUBLE, defaultValue);
+    }
+
+    /**
+     * Helper method to return a range of type INT.
+     *
+     * @param minValue     Minimum allowed value.
+     * @param maxValue     Maximum allowed value.
+     * @param defaultValue Default value.
+     * @return Integer range with the specs above.
+     */
+    protected static NumericFieldType intRange(final int minValue,
+                                               final int maxValue,
+                                               final int defaultValue) {
+        return NumericFieldType.range(minValue, maxValue, NumericFieldType.ParameterConfigType.INT, defaultValue);
+    }
+
+}
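
Note on the range helpers above: `doubleRange` and `intRange` only shorten calls to `NumericFieldType.range`. The sketch below illustrates the same factory-helper idea with a hypothetical stand-in type, since the OpenML classes are not reproduced here. `RangeHelperSketch`, `NumericRange`, `ConfigType`, and `accepts` are assumptions made for illustration and may not match how `NumericFieldType` validates values.

```java
/** Hypothetical stand-ins for illustration; these are not the OpenML API classes. */
final class RangeHelperSketch {

    /** Mirrors the INT/DOUBLE distinction used by the helpers in the diff. */
    enum ConfigType { INT, DOUBLE }

    /** A closed numeric range [min, max] with a default value inside it. */
    static final class NumericRange {
        final double min;
        final double max;
        final ConfigType type;
        final double defaultValue;

        NumericRange(final double min, final double max, final ConfigType type, final double defaultValue) {
            if (min > max) {
                throw new IllegalArgumentException("min must not exceed max");
            }
            if (defaultValue < min || defaultValue > max) {
                throw new IllegalArgumentException("default must lie within [min, max]");
            }
            this.min = min;
            this.max = max;
            this.type = type;
            this.defaultValue = defaultValue;
        }

        /** True when the value lies in range and, for INT ranges, is a whole number. */
        boolean accepts(final double value) {
            final boolean inRange = value >= min && value <= max;
            final boolean wholeIfInt = type != ConfigType.INT || value == Math.rint(value);
            return inRange && wholeIfInt;
        }
    }

    /** Shorthand for a DOUBLE range, in the spirit of doubleRange in the diff. */
    static NumericRange doubleRange(final double min, final double max, final double defaultValue) {
        return new NumericRange(min, max, ConfigType.DOUBLE, defaultValue);
    }

    /** Shorthand for an INT range, in the spirit of intRange in the diff. */
    static NumericRange intRange(final int min, final int max, final int defaultValue) {
        return new NumericRange(min, max, ConfigType.INT, defaultValue);
    }

    public static void main(final String[] args) {
        final NumericRange targetFpr = doubleRange(0.0, 1.0, 0.05);
        System.out.println(targetFpr.accepts(0.2));  // true
        System.out.println(targetFpr.accepts(1.5));  // false

        final NumericRange iterations = intRange(1, 100, 2);
        System.out.println(iterations.accepts(2.5)); // false
    }
}
```

Keeping the type tag inside the helper means a parameter declaration such as `doubleRange(0.0, 1.0, 0.05)` states only the bounds and the default.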
Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
+/*
+ * Copyright 2022 Feedzai
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package com.feedzai.openml.provider.lightgbm;
+
+import com.feedzai.openml.provider.descriptor.ModelParameter;
+import com.feedzai.openml.provider.descriptor.fieldtype.ChoiceFieldType;
+import com.feedzai.openml.provider.descriptor.fieldtype.FreeTextFieldType;
+import com.feedzai.openml.provider.descriptor.fieldtype.NumericFieldType;
+import com.google.common.collect.ImmutableSet;
+
+import com.google.common.collect.Sets;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Utility to organize all the necessary Machine Learning Hyper-Parameters for configuring the training of FairGBM.
+ *
+ * @author Andre Cruz (andre.cruz@feedzai.com)
+ * @since 1.3.6
+ */
+public class FairGBMDescriptorUtil extends LightGBMDescriptorUtil {
+
+    public static final String CONSTRAINT_GROUP_COLUMN_PARAMETER_NAME = "constraint_group_column";
+
+    /**
+     * Defines the set of model parameters supported by the FairGBM algorithm.
+     */
+    static final Set<ModelParameter> PARAMS = Sets.union(ImmutableSet.of(
+            // The single parameter that will change for every different dataset
+            new ModelParameter(
+                    CONSTRAINT_GROUP_COLUMN_PARAMETER_NAME,
+                    "(Fairness) Sensitive group column",
+                    "Fairness constraints are enforced over this column.\n"
+                            + "This column must be in categorical format.\n"
+                            + "Start this string with `name:` to use the name of a column, \n"
+                            + "e.g., `name:age_group` for a column named `age_group`.",
+                    MANDATORY,
+                    new FreeTextFieldType("")
+                    // new FreeTextFieldType("", ".+")  # TODO: https://github.com/feedzai/feedzai-openml/issues/68
+            ),
+
+            new ModelParameter(
+                    "constraint_type",
+                    "(Fairness) Constraint type",
+                    "Enforces group-wise parity on the given target metric for the selected group column. "
+                            + "In general, FPR can be used for most detection settings "
+                            + "to equalize the negative outcomes on legitimate individuals "
+                            + "(false positives).",
+                    NOT_MANDATORY,
+                    new ChoiceFieldType(ImmutableSet.of("FPR", "FNR", "FPR,FNR"), "FPR")
+            ),
+
+            // Parameters related to global constraints
+            new ModelParameter(
+                    "global_constraint_type",
+                    "(Fairness) Global constraint type",
+                    "FairGBM modifies the output scores to meet your target FPR and/or FNR as well as "
+                            + "fairness at a decision threshold of approximately 0.5 (or 500 in Pulse). Set parameters "
+                            + "(Fairness) Global target FPR/FNR accordingly. Using decision thresholds far from 0.5 "
+                            + "will not ensure fairness.",
+                    NOT_MANDATORY,
+                    new ChoiceFieldType(ImmutableSet.of("FPR", "FNR", "FPR,FNR"), "FPR,FNR")
+            ),
+            new ModelParameter(
+                    "global_target_fpr",
+                    "(Fairness) Global target FPR",
+                    "This parameter is only active when '(Fairness) Global constraint type' includes "
+                            + "'FPR'. This is an inequality constraint: inactive when FPR is lower than the target. "
+                            + "Oftentimes, some tension is required between global FPR and FNR constraints in order to "
+                            + "achieve the target values (in these cases pick 'FPR,FNR' for the '(Fairness) Global "
+                            + "constraint type' parameter).",
+                    NOT_MANDATORY,
+                    doubleRange(0.0, 1.0, 0.05)
+            ),
+            new ModelParameter(
+                    "global_target_fnr",
+                    "(Fairness) Global target FNR",
+                    "This parameter is only active when '(Fairness) Global constraint type' includes "
+                            + "'FNR'. This is an inequality constraint: inactive when FNR is lower than the target. "
+                            + "Oftentimes, some tension is required between global FPR and FNR constraints in order to "
+                            + "achieve the target values (in these cases pick 'FPR,FNR' for the '(Fairness) Global "
+                            + "constraint type' parameter).",
+                    NOT_MANDATORY,
+                    doubleRange(0.0, 1.0, 0.5)
+            ),
+
+            new ModelParameter(
+                    "objective",
+                    "(Fairness) Objective function",
+                    "For FairGBM you must use a constrained optimization function. "
+                            + "`constrained_cross_entropy` is recommended for most cases.",
+                    NOT_MANDATORY,
+                    new ChoiceFieldType(
+                            ImmutableSet.of("constrained_cross_entropy", "constrained_recall_objective"),
+                            "constrained_cross_entropy")
+            ),
+
+            // Tolerance on the fairness constraints
+            new ModelParameter(
+                    "constraint_fpr_threshold",
+                    "(Fairness) FPR tolerance for fairness",
+                    "The tolerance when fulfilling fairness FPR constraints. "
+                            + "The allowed difference between group-wise FPR. "
+                            + "The value 0.0 enforces group-wise FPR to be *exactly* equal. "
+                            + "Higher values lead to a less strict fairness enforcement.",
+                    NOT_MANDATORY,
+                    doubleRange(0.0, 1.0, 0.0)
+            ),
+            new ModelParameter(
+                    "constraint_fnr_threshold",
+                    "(Fairness) FNR tolerance for fairness",
+                    "The tolerance when fulfilling fairness FNR constraints. "
+                            + "The allowed difference between group-wise FNR. "
+                            + "The value 0.0 enforces group-wise FNR to be *exactly* equal. "
+                            + "Higher values lead to a less strict fairness enforcement.",
+                    NOT_MANDATORY,
+                    doubleRange(0.0, 1.0, 0.0)
+            ),
+
+            // Eventually we want this parameter to not depend as much on the size of the dataset
+            // But currently this needs to be changed for each dataset considering its size (larger for larger datasets)
+            // See: https://github.com/feedzai/fairgbm/issues/7
+            new ModelParameter(
+                    "multiplier_learning_rate",
+                    "(Fairness) Multipliers' learning rate",
+                    "The Lagrangian multipliers control how strict the constraint enforcement is.",
+                    NOT_MANDATORY,
+                    NumericFieldType.min(Float.MIN_VALUE, NumericFieldType.ParameterConfigType.DOUBLE, 1e3)
+            ),  // NOTE: I'm using Float.MIN_VALUE here because the minimum value of a double in C++ depends on the architecture it runs on; using float here is more conservative
+            new ModelParameter(
+                    "init_multipliers",
+                    "(Fairness) Initial multipliers",
+                    "The Lagrangian multipliers control how strict the constraint enforcement is. "
+                            + "By default, the multiplier for each constraint starts at zero (`0`).",
+                    NOT_MANDATORY,
+                    new FreeTextFieldType("")
+                    // new FreeTextFieldType("", "^((\\d+(\\.\\d*)?,)*(\\d+(\\.\\d*)?))?$")  # TODO: https://github.com/feedzai/feedzai-openml/issues/68
+            ),
+
+            // These parameters probably shouldn't be changed in 90% of cases
+            new ModelParameter(
+                    "constraint_stepwise_proxy",
+                    "(Fairness) Stepwise proxy for fairness constraints",
+                    "The type of proxy function to use for the fairness constraint. "
+                            + "We need to use a differentiable proxy function, as FPR and FNR have discontinuous gradients.",
+                    NOT_MANDATORY,
+                    new ChoiceFieldType(ImmutableSet.of("cross_entropy", "quadratic", "hinge"), "cross_entropy")
+            ),
+            new ModelParameter(
+                    "objective_stepwise_proxy",
+                    "(Fairness) Stepwise proxy for global constraints",
+                    "The proxy function to use for the objective function. "
+                            + "Only used when explicitly optimizing for Recall (or any other metric of the "
+                            + "confusion matrix). Leave blank when using standard objectives, such as cross-entropy.",
+                    NOT_MANDATORY,
+                    new ChoiceFieldType(ImmutableSet.of("cross_entropy", "quadratic", "hinge", ""), "")
+            ),
+
+            // Override this parameter from LightGBM so we can disallow using RF
+            new ModelParameter(
+                    BOOSTING_TYPE_PARAMETER_NAME,
+                    "Boosting type",
+                    "Type of boosting model:\n"
+                            + "'gbdt' is a good starting point,\n"
+                            + "'goss' is faster but slightly less accurate,\n"
+                            + "'dart' is much slower but might improve performance.",
+                    MANDATORY,
+                    new ChoiceFieldType(
+                            ImmutableSet.of("gbdt", "dart", "goss"),
+                            "gbdt"
+                    )
+            )
+
+            // TODO: assess whether these parameters would ever be useful
+//            // These parameters probably shouldn't be changed in 99% of cases
+//            new ModelParameter(
+//                    "stepwise_proxy_margin",
+//                    "",
+//                    "",
+//                    NOT_MANDATORY,
+//                    new FreeTextFieldType("")
+//            ),
+//            new ModelParameter(
+//                    "score_threshold",
+//                    "",
+//                    "",
+//                    NOT_MANDATORY,
+//                    new FreeTextFieldType("")
+//            ),
+//            new ModelParameter(
+//                    "global_score_threshold",
+//                    "",
+//                    "",
+//                    NOT_MANDATORY,
+//                    new FreeTextFieldType("")
+//            )
+
+    ), LightGBMDescriptorUtil.PARAMS.stream()
+            .filter(el -> !el.getName().equals(BOOSTING_TYPE_PARAMETER_NAME))
+            .collect(Collectors.toSet()));
+
+}
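
Note on how `PARAMS` is assembled above: the FairGBM-specific parameters are united with the inherited LightGBM ones after the inherited boosting-type entry is filtered out, presumably so the set cannot end up with two entries of the same name. The sketch below shows that override-by-name idea using only the JDK. `ParamOverrideSketch`, `Param`, and `merge` are hypothetical names, the parameter lists are illustrative, and the sketch generalizes the diff by dropping every inherited entry whose name is overridden rather than one fixed name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of overriding inherited parameters by name; not the OpenML API. */
final class ParamOverrideSketch {

    /** Simplified parameter: a name plus its allowed choices (empty when unrestricted). */
    static final class Param {
        final String name;
        final List<String> choices;

        Param(final String name, final String... choices) {
            this.name = name;
            this.choices = Arrays.asList(choices);
        }
    }

    /** Returns the overrides followed by every inherited parameter whose name is not overridden. */
    static Map<String, Param> merge(final List<Param> overrides, final List<Param> inherited) {
        final Set<String> overriddenNames = overrides.stream()
                .map(param -> param.name)
                .collect(Collectors.toSet());

        final Map<String, Param> merged = new LinkedHashMap<>();
        overrides.forEach(param -> merged.put(param.name, param));
        inherited.stream()
                .filter(param -> !overriddenNames.contains(param.name))
                .forEach(param -> merged.put(param.name, param));
        return merged;
    }

    public static void main(final String[] args) {
        // Illustrative parent parameters; the real LightGBM set is larger.
        final List<Param> lightgbm = Arrays.asList(
                new Param("boosting_type", "gbdt", "dart", "goss", "rf"),
                new Param("num_iterations"));
        // The child redefines boosting_type without the "rf" choice.
        final List<Param> fairgbm = Arrays.asList(
                new Param("constraint_type", "FPR", "FNR", "FPR,FNR"),
                new Param("boosting_type", "gbdt", "dart", "goss"));

        final Map<String, Param> merged = merge(fairgbm, lightgbm);
        System.out.println(merged.keySet());                     // [constraint_type, boosting_type, num_iterations]
        System.out.println(merged.get("boosting_type").choices); // [gbdt, dart, goss]
    }
}
```

Keying the result by name makes the "one definition per parameter" rule explicit, whereas a plain set union relies on filtering the parent set first, as the diff does.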
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+/*
+ * Copyright 2020 Feedzai
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package com.feedzai.openml.provider.lightgbm;
+
+import com.google.auto.service.AutoService;
+import java.util.Optional;
+import java.util.Set;
+
+import com.feedzai.openml.provider.MachineLearningProvider;
+import com.feedzai.openml.provider.TrainingMachineLearningProvider;
+import com.feedzai.openml.provider.descriptor.MLAlgorithmDescriptor;
+import com.feedzai.openml.util.algorithm.MLAlgorithmEnum;
+
+/**
+ * This class implements Feedzai's OpenML MachineLearningProvider interface for FairGBM (constrained LightGBM).
+ *
+ * @author Andre Cruz (andre.cruz@feedzai.com)
+ * @since 1.3.6
+ */
+@AutoService(MachineLearningProvider.class)
+public class FairGBMMLProvider implements TrainingMachineLearningProvider<LightGBMModelCreator> {
+
+    /**
+     * The reported name of this provider.
+     */
+    private static final String PROVIDER_NAME = "Feedzai GBM";
+
+    @Override
+    public String getName() {
+        return PROVIDER_NAME;
+    }
+
+    @Override
+    public Set<MLAlgorithmDescriptor> getAlgorithms() {
+        return MLAlgorithmEnum.getDescriptors(new MLAlgorithmEnum[]{LightGBMAlgorithms.FAIRGBM_BINARY_CLASSIFIER});
+    }
+
+    @Override
+    public Optional<LightGBMModelCreator> getModelCreator(final String algorithmName) {
+        return MLAlgorithmEnum.getByName(new MLAlgorithmEnum[]{LightGBMAlgorithms.FAIRGBM_BINARY_CLASSIFIER}, algorithmName)
+                .map(algorithm -> new LightGBMModelCreator());
+    }
+}
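
Note on `getModelCreator` above: it looks the algorithm up by name and maps a match to a new model creator, so an unknown name yields an empty `Optional` instead of an exception. The sketch below mirrors that shape with hypothetical stand-ins. `ProviderLookupSketch`, `Algorithm`, `ModelCreator`, and the display name string are assumptions for illustration, not the OpenML types or the real algorithm name.

```java
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical sketch of a name-based provider lookup; not the OpenML API. */
final class ProviderLookupSketch {

    /** Stand-in for the algorithms a provider exposes; the display name is an assumption. */
    enum Algorithm {
        FAIRGBM_BINARY_CLASSIFIER("FairGBM Binary Classifier");

        final String displayName;

        Algorithm(final String displayName) {
            this.displayName = displayName;
        }
    }

    /** Stand-in for a model creator; it carries no behavior in this sketch. */
    static final class ModelCreator {
    }

    /** Finds the first candidate whose display name equals the requested name. */
    static Optional<Algorithm> getByName(final Algorithm[] candidates, final String name) {
        return Arrays.stream(candidates)
                .filter(candidate -> candidate.displayName.equals(name))
                .findFirst();
    }

    /** Returns a creator only when the requested algorithm is one this provider supports. */
    static Optional<ModelCreator> getModelCreator(final String algorithmName) {
        return getByName(Algorithm.values(), algorithmName)
                .map(algorithm -> new ModelCreator());
    }

    public static void main(final String[] args) {
        System.out.println(getModelCreator("FairGBM Binary Classifier").isPresent()); // true
        System.out.println(getModelCreator("Unknown algorithm").isPresent());         // false
    }
}
```

Returning `Optional` lets a caller that iterates over several providers skip the ones that do not recognize the requested name.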
