yargy
Yargy uses rules and dictionaries to extract structured information from Russian texts. It is similar to the Tomita parser.
Install
Yargy supports Python 3.7+ and PyPy 3, and depends only on Pymorphy2.
$ pip install yargy
Usage
from yargy import Parser, rule, and_, not_
from yargy.interpretation import fact
from yargy.predicates import gram
from yargy.relations import gnc_relation
from yargy.pipelines import morph_pipeline

Name = fact(
    'Name',
    ['first', 'last'],
)
Person = fact(
    'Person',
    ['position', 'name']
)

LAST = and_(
    gram('Surn'),
    not_(gram('Abbr')),
)
FIRST = and_(
    gram('Name'),
    not_(gram('Abbr')),
)

POSITION = morph_pipeline([
    'управляющий директор',
    'вице-мэр'
])

gnc = gnc_relation()
NAME = rule(
    FIRST.interpretation(
        Name.first
    ).match(gnc),
    LAST.interpretation(
        Name.last
    ).match(gnc)
).interpretation(
    Name
)

PERSON = rule(
    POSITION.interpretation(
        Person.position
    ).match(gnc),
    NAME.interpretation(
        Person.name
    )
).interpretation(
    Person
)

parser = Parser(PERSON)
match = parser.match('управляющий директор Иван Ульянов')
print(match.fact)
Person(
    position='управляющий директор',
    name=Name(
        first='Иван',
        last='Ульянов'
    )
)
Documentation
All materials are in Russian.
Support
- Chat — https://t.me/natural_language_processing
- Issues — https://github.com/natasha/yargy/issues
- Commercial support — https://lab.alexkuk.ru
Development
Dev env
brew install graphviz
python -m venv ~/.venvs/natasha-yargy
source ~/.venvs/natasha-yargy/bin/activate
pip install -r requirements/dev.txt
pip install -e .
python -m ipykernel install --user --name natasha-yargy
Test + lint
make test
Update docs
make exec-docs
# Manually check git diff docs/, commit
Release
# Update setup.py version
git commit -am 'Up version'
git tag v0.16.0
git push
git push --tags
# GitHub Action builds dist and publishes to PyPI