DreamML - Self Machine Learning ❤️
The next stage in the evolution of DS-Template

About DreamML
DreamML is a machine learning framework aimed at industrial use. Its main task is to choose a simple model that balances complexity against quality metrics. It also lets you review model quality in dedicated development reports and, for some tasks, in a validation report built according to the central bank's methodology.
*This is the first stage of the project's open-source release; we plan to publish more materials and improve the framework.
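The trade-off between simplicity and quality can be illustrated with a minimal sketch. It is not DreamML's actual selection rule: the `Candidate` class, the `pick_simplest` function, the use of feature count as a complexity proxy, the tolerance and the scores are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical model label
    n_features: int  # complexity proxy (assumption: fewer features means simpler)
    score: float     # validation metric, higher is better


def pick_simplest(candidates, tolerance=0.01):
    """Return the simplest candidate whose score is within `tolerance` of the best."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    best = max(c.score for c in candidates)
    good_enough = [c for c in candidates if c.score >= best - tolerance]
    # Among acceptable candidates prefer fewer features; break ties by higher score.
    return min(good_enough, key=lambda c: (c.n_features, -c.score))


# Made-up scores, for illustration only.
candidates = [
    Candidate("all_features", 120, 0.842),
    Candidate("top_40", 40, 0.838),
    Candidate("top_10", 10, 0.815),
]
chosen = pick_simplest(candidates, tolerance=0.01)
print(chosen.name)
```

With a tolerance of 0.01 the 40-feature candidate is chosen, because it is within the tolerance of the best score while being much simpler; a wider tolerance would admit the 10-feature candidate as well.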
DreamML Concepts
- Flexibility. DreamML can be used to automate the construction of solutions for various problems, data types (text, tables), and models.
- Tunability. Various hyperparameter tuning methods are supported, including custom evaluation metrics and search spaces for models.
- Validability. DreamML provides the ability to validate models, ensuring they meet the necessary quality standards and are ready for use in real-world conditions.
- Integrability. DreamML supports widely used ML libraries (Scikit-learn, CatBoost, XGBoost, Optuna, etc.).
- Reproducibility. The generated pipelines and model artifacts are automatically saved in the experiment folder for reproducibility. Additionally, there is an option to resume training from checkpoints.
- Customizability. DreamML allows managing model complexity and thereby achieving the desired quality.
- Production-orientability. The saved model artifacts and code can be easily wrapped into the necessary artifacts for deployment in production.
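To make the idea of a custom metric combined with a custom search space more concrete, here is a minimal sketch. It uses plain random search from the Python standard library as a stand-in; it is not Optuna and not DreamML's tuner. The names `SEARCH_SPACE`, `custom_metric` and `random_search` are hypothetical, and the metric is a toy function instead of a real train-and-evaluate step.

```python
import math
import random

# Hypothetical search space: parameter name -> sampler taking a random generator.
SEARCH_SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform in [1e-3, 1]
    "max_depth": lambda rng: rng.randint(2, 8),             # integer in [2, 8]
}


def custom_metric(params):
    """Toy stand-in for 'train a model and score it'; higher is better.

    The maximum (0.0) is at learning_rate=0.1 and max_depth=5.
    """
    lr_term = (math.log10(params["learning_rate"]) + 1) ** 2
    depth_term = 0.1 * (params["max_depth"] - 5) ** 2
    return -(lr_term + depth_term)


def random_search(space, metric, n_trials=50, seed=0):
    """Sample `n_trials` parameter sets and return the best one with its score."""
    rng = random.Random(seed)  # fixed seed keeps the search reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = metric(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(SEARCH_SPACE, custom_metric)
print(best_params, best_score)
```

Keeping the metric and the search space as plain arguments means either can be swapped without touching the search loop, which is the property the concept list describes.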
Installation
Get started
To develop a model, you can use the notebooks located in the
and select the one you need depending on the type of your task.
To validate models, you can use the notebooks located in the
To calibrate models, you can use the notebooks located in the
How to Use
Information on the development notebooks in notebooks/1. Model Development
- First, you need to determine the pipeline configuration
  - For multilabel, regression, binary, multiclass tasks you can refer to this document: 1_Model_Development_doc.md
  - For the topic_modeling task you can refer to this document: 1_Topic_Modeling_doc.md
  - For the timeseries (boosting) task you can refer to this document: 1_TimeSeries_doc.md
  - For the amts (Prophet) task you can refer to this document: 1_AltModeTimeSeries_forecast.md
  - If your dataset contains text features you should refer to this document: 1_NLP_text_classification_doc.md
  - If you would like to learn more about quality metrics and loss functions, we recommend that you refer to the document Binary_Classification_Metrics_doc.md
- You should start building the configuration and preparing the data for modeling
config_storage = ConfigStorage(config=config)
transformer = DataTransformer(config_storage)
data_storage = transformer.transform()
- Next, you should run the modeling pipeline
pipeline = MainPipeline(config_storage=config_storage, data_storage=data_storage)
pipeline.transform()
- For some tasks, you can also use LightAutoML as a model and calculate the out-of-time (OOT) potential
lama = add_lama_model(data_storage.get_eval_set(), config_storage)
oot_potential = calculate_oot_metrics(data_storage.get_eval_set(), config_storage)
- You can also save the modeling artifacts if you need them
saver = pipeline.artifact_saver
models = pipeline.prepared_model_dict
pipeline.oot_potential = oot_potential
models.update(lama)
nb_name = saver.get_notebook_path_and_save()
saver.save_artifacts(
    models=models,
    other_models=pipeline.other_model_dict,
    encoder=transformer.cat_transformer,
    ipynb_name=nb_name,
    feature_threshold=config_storage.feature_threshold,
)
saver.save_data(data=data_storage.get_eval_set(), dropped_data=data_storage.get_dropped_data())
- At the end, you can generate a development report. By default, it will be saved to the dreamml/results folder.
get_report(pipeline=pipeline, config_storage=config_storage, data_storage=data_storage, encoder=transformer.cat_transformer)
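The out-of-time (OOT) evaluation mentioned in the steps above relies on holding out the most recent observations. The sketch below shows only that generic idea; it does not reproduce what `calculate_oot_metrics` does internally, which the steps above do not describe. The function `out_of_time_split`, the `report_date` field and the sample rows are hypothetical.

```python
from datetime import date


def out_of_time_split(rows, date_key, cutoff):
    """Split rows into in-time (before `cutoff`) and out-of-time (on or after it)."""
    in_time = [r for r in rows if r[date_key] < cutoff]
    oot = [r for r in rows if r[date_key] >= cutoff]
    return in_time, oot


# Made-up rows, for illustration only.
rows = [
    {"report_date": date(2023, 1, 31), "target": 0},
    {"report_date": date(2023, 2, 28), "target": 1},
    {"report_date": date(2023, 3, 31), "target": 0},
    {"report_date": date(2023, 4, 30), "target": 1},
]

# Train and validate on data before March 2023, then check stability on later data.
in_time, oot = out_of_time_split(rows, "report_date", date(2023, 3, 1))
print(len(in_time), len(oot))
```

Splitting by date instead of at random keeps future observations out of training, so a drop in the metric on the later part signals that the model may degrade over time.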
Authors
| Author | Email |
|---|---|
| Nikita Buts | nikitabuts2000@gmail.com |
| Alexander Izyurov | halfbrick845@gmail.com |
| Ivan Plotnikov | com.gateway.api@gmail.com |
| Maidari Tsydenov | maidaritsydenov@gmail.com |
| Evgeny Tkachenko | e_t@inbox.ru |
| Ilya Ivanov | morwes4@gmail.com |
| Nikita Varganov | - |
LICENSE
This project is licensed under the Apache License, Version 2.0. See LICENSE for details.