DreamML - Self Machine Learning ❤️

The next stage in the evolution of DS-Template


About the DreamML


DreamML is a machine learning framework aimed at industrial processes. Its main task is to select a simple model while balancing complexity against quality metrics. It also lets you review model quality in dedicated development reports and, for some tasks, in a validation report built according to the central bank's methodology.
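The trade-off between complexity and quality can be illustrated with a minimal sketch. It is not DreamML's selection logic: the `Candidate` record, the `pick_simplest` helper, the feature counts, and the metric values are all hypothetical and exist only for illustration. The sketch keeps every candidate whose metric is within a tolerance of the best one and returns the least complex of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one trained model (not a DreamML type)."""
    name: str
    n_features: int   # stand-in for model complexity
    metric: float     # validation quality, higher is better


def pick_simplest(candidates, tolerance=0.01):
    """Return the least complex candidate whose metric is within
    `tolerance` of the best metric among all candidates."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    best = max(c.metric for c in candidates)
    good_enough = [c for c in candidates if best - c.metric <= tolerance]
    # Fewest features first; ties are broken by the higher metric.
    return min(good_enough, key=lambda c: (c.n_features, -c.metric))


# Illustrative values only, not results produced by DreamML.
candidates = [
    Candidate("all_features", n_features=120, metric=0.812),
    Candidate("top_40", n_features=40, metric=0.808),
    Candidate("top_10", n_features=10, metric=0.781),
]
chosen = pick_simplest(candidates, tolerance=0.01)
print(chosen.name)  # top_40
```

A larger tolerance favors simpler models, while a tolerance of zero always returns the best-scoring one.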

*This is the project's first open-source release. We plan to publish more materials and keep improving the framework.


DreamML Concepts

  • Flexibility. DreamML can be used to automate building solutions for various problems, data types (text, tables), and models.

  • Tunability. Various hyperparameter tuning methods are supported, including custom evaluation metrics and search spaces for models.

  • Validability. DreamML provides the ability to validate models, ensuring they meet necessary quality standards and are ready for use in real-world conditions.

  • Integrability. DreamML supports widely used ML libraries (Scikit-learn, CatBoost, XGBoost, Optuna, etc.).

  • Reproducibility. The generated pipelines and model artifacts are automatically saved in the experiment folder for reproducibility. Additionally, there is an option to resume training from checkpoints.

  • Customizability. DreamML lets you manage model complexity and thereby reach the desired quality.

  • Production-orientability. The saved model artifacts and code can be easily wrapped into the necessary artifacts for deployment in production.
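The reproducibility idea above (artifacts saved to an experiment folder, with the option to resume from a checkpoint) can be sketched in a few lines. This is a generic illustration using only the standard library, not DreamML's implementation: the `checkpoint.json` file name, the stage names, and the helper functions are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(experiment_dir, stage, state):
    """Write the state reached after `stage` to the experiment folder."""
    experiment_dir = Path(experiment_dir)
    experiment_dir.mkdir(parents=True, exist_ok=True)
    payload = {"stage": stage, "state": state}
    (experiment_dir / "checkpoint.json").write_text(json.dumps(payload))


def load_checkpoint(experiment_dir):
    """Return the saved payload, or None if nothing was saved yet."""
    path = Path(experiment_dir) / "checkpoint.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())


def run_pipeline(experiment_dir, stages):
    """Run stages in order, skipping those already completed."""
    checkpoint = load_checkpoint(experiment_dir)
    done = checkpoint["state"]["done"] if checkpoint else []
    executed = []
    for stage in stages:
        if stage in done:
            continue  # finished in a previous run
        executed.append(stage)  # real work would happen here
        done = done + [stage]
        save_checkpoint(experiment_dir, stage, {"done": done})
    return executed


# Hypothetical stage names, chosen only for the example.
stages = ["feature_selection", "tuning", "final_fit"]
with tempfile.TemporaryDirectory() as tmp:
    # Simulate an earlier run that stopped after the first stage.
    save_checkpoint(tmp, "feature_selection", {"done": ["feature_selection"]})
    resumed = run_pipeline(tmp, stages)
    rerun = run_pipeline(tmp, stages)

print(resumed)  # ['tuning', 'final_fit']
print(rerun)    # []
```

Writing the checkpoint after every stage means an interrupted run loses at most the stage that was in progress.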


Installation

Get started


To develop a model, use the notebooks located in `notebooks/1. Model Development` and select the one that matches your task type.

To validate models, use the notebooks located in `notebooks/2. Validate Model`.

To calibrate models, use the notebooks located in `notebooks/3. Calibration`.

How to Use


Information on the development notebooks in `notebooks/1. Model Development`:

  1. First, define the pipeline configuration.

  2. Build the configuration storage and prepare the data for modeling:

    # Wrap the configuration, then transform the raw data for modeling.
    config_storage = ConfigStorage(config=config)
    transformer = DataTransformer(config_storage)
    data_storage = transformer.transform()

  3. Next, run the modeling pipeline:

    pipeline = MainPipeline(config_storage=config_storage, data_storage=data_storage)
    pipeline.transform()

  4. For some tasks, you can also use LightAutoML as a model and calculate the out-of-time (OOT) potential:

    lama = add_lama_model(data_storage.get_eval_set(), config_storage)
    oot_potential = calculate_oot_metrics(data_storage.get_eval_set(), config_storage)

  5. If needed, save the modeling artifacts:

    saver = pipeline.artifact_saver
    models = pipeline.prepared_model_dict
    pipeline.oot_potential = oot_potential
    models.update(lama)
    nb_name = saver.get_notebook_path_and_save()
    saver.save_artifacts(
        models=models,
        other_models=pipeline.other_model_dict,
        encoder=transformer.cat_transformer,
        ipynb_name=nb_name,
        feature_threshold=config_storage.feature_threshold,
    )
    saver.save_data(
        data=data_storage.get_eval_set(),
        dropped_data=data_storage.get_dropped_data(),
    )

  6. Finally, generate a development report. By default, it is saved to the `dreamml/results` folder:

    get_report(
        pipeline=pipeline,
        config_storage=config_storage,
        data_storage=data_storage,
        encoder=transformer.cat_transformer,
    )
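
The steps above mention an out-of-time evaluation. As a generic illustration, and assuming the term is used in its usual sense of holding out the most recent period, the sketch below splits records by a cutoff date. It does not reproduce what `calculate_oot_metrics` does internally: the `out_of_time_split` helper, the `report_date` field, and the sample rows are hypothetical.

```python
from datetime import date


def out_of_time_split(rows, date_key, cutoff):
    """Split rows into (in_time, out_of_time) by a cutoff date.

    Rows dated strictly before `cutoff` are used for development;
    rows on or after it are held out as the out-of-time sample.
    """
    in_time = [r for r in rows if r[date_key] < cutoff]
    out_of_time = [r for r in rows if r[date_key] >= cutoff]
    return in_time, out_of_time


# Made-up rows, only to show the shape of the split.
rows = [
    {"id": 1, "report_date": date(2023, 1, 31), "target": 0},
    {"id": 2, "report_date": date(2023, 2, 28), "target": 1},
    {"id": 3, "report_date": date(2023, 3, 31), "target": 0},
    {"id": 4, "report_date": date(2023, 4, 30), "target": 1},
]
in_time, oot = out_of_time_split(rows, "report_date", cutoff=date(2023, 3, 31))
print([r["id"] for r in in_time])  # [1, 2]
print([r["id"] for r in oot])      # [3, 4]
```

Because the held-out rows are all later than the development rows, a quality drop on them points to instability over time rather than to ordinary sampling noise.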

Authors


| Author | Email |
| --- | --- |
| Nikita Buts | nikitabuts2000@gmail.com |
| Alexander Izyurov | halfbrick845@gmail.com |
| Ivan Plotnikov | com.gateway.api@gmail.com |
| Maidari Tsydenov | maidaritsydenov@gmail.com |
| Evgeny Tkachenko | e_t@inbox.ru |
| Ilya Ivanov | morwes4@gmail.com |
| Nikita Varganov | - |

LICENSE


This project is licensed under the Apache License, Version 2.0. See LICENSE for details.
