google-research
Fair matroid submodular maximization
This folder contains the codes for paper titled "Fairness in Streaming Submodular Maximization over a Matroid Constraint".
Each experiment requires a submodular function, a matroid, both accessed through oracles, and a fairness constraint.
Oracles are implemented for several submodular functions and matroids; to implement a new submodular
function inherit from SubmodularFunction
class and to implement a new matroid inherit from Matroid
class.
Compiling the code
One can compile all the files with c++17 or later, beside the ones used for preparing the inputs, described below.
Downloading and preparing datasets
-
To run the maximum coverage experiment, download the files
soc-pokec-relationships.txt
andsoc-pokec-profiles.txt
from https://snap.stanford.edu/data/soc-Pokec.html and save them in the same directory containing the preprocessing scripts (extract-attributes.cpp
,color-vertices.cpp
,statistics_bmi.cpp
,clean_graph_for_bmi.cpp
). Runextract-attributes.cpp
which should generatefiltered-attributes.txt
file. Next, runcolor-vertices.cpp
andstatistics_BMI.cpp
which should generatecolor_age_1.txt
andcolor-BMI.txt
files, respectively. Finally, runclean_graph_for_BMI.cpp
which should generateBMI-soc-pokec-relationships.txt
file. Update the path ingraph.cc
(inname_to_filename
map) to point toBMI-soc-pokec-relationships.txt
,color_age_1.txt
andcolor-BMI.txt
. -
To run the exemplar-based clustering experiment, download the data from https://archive.ics.uci.edu/ml/datasets/Bank+Marketing. Run
bank_input_converter_main.cc
on the downloaded data (bank
) which converts the input to the format that the oracle function expects. Update theinput_path
inClusteringExperiment
function in the main file to point to the converted features. -
To run the movie recommendation experiment, use the
prepare_movies.py
script from https://github.com/dj3500/movielens-matrix-completion and follow the instructions there. Place the filesmovies.dat
(from the MovieLens-1M archive),U.txt
andVT.txt
(generated by the script) in some directory, and update the path to that directory inmovies_data.cc
.
Running experiments
Refer to the main.cc
file for examples of how the experiments can be run. Please keep in mind to update the paths for reading the input (as explained above) and writing the results (in exp_base_path
in main.cc
). The "f" value in the result files correspond to the submodular objective value, "rank" to the rank $k$ of the matroid, and "error" to the violation of the fairness constraint $\mathrm{err}(S)$.
Example Makefile
One can create a file Makefile
with the following content.
SRC_FILES := $(wildcard *.cc)
H_FILES := $(wildcard *.h)
CXXFLAGS := -O3 -W -Wall -Wshadow -Wno-unused-parameter -Wno-sign-compare -std=c++17
BIN := fair-submodular.exe
$(BIN): $(SRC_FILES) $(H_FILES)
g++ $(CXXFLAGS) -o $@ $(SRC_FILES)
all: $(BIN)
Then running make
will build the code.