rulex

Форк

Описание

Russian pronunciation dictionary. Forked from github.com/poretsky/rulex

linux

27.12 MiB

Языки

C62%
Roff27,4%
M43,6%
Makefile2,6%
Emacs Lisp2,6%
Shell1,8%

Igor B. Poretsky

Package version update

26 июн 2026, 14:42

26 июн 2026, 14:4296285d6

.github

Github action fix

месяц назад

data

Minor dictionary update

месяц назад

Transition to GNU Autoconf

2 года назад

manpages

Synchronized manpage

3 месяца назад

src

Library API version increment

3 месяца назад

workshop

Minor development workflow enhancement

2 года назад

LICENSE.GPL-2

Add all necessary license texts

2 года назад

LICENSE.LGPL-2

Add all necessary license texts

2 года назад

Makefile.am

Transition to GNU Autoconf

2 года назад

README

Transition to GNU Autoconf

2 года назад

README.ru

Transition to GNU Autoconf

2 года назад

configure.ac

Package version update

месяц назад

rulex.el

Special Emacs mode to construct lexicon entries. Suggested by Dmitry Paduchikh.

23 года назад

README

RUSSIAN PRONUNCIATION DICTIONARY

If you prefer reading this info in Russian, you will find it in the

separate file README.ru.

Russian pronunciation dictionary RuLex is aimed primarily for use

together with the Russian TTS engine Ru_tts. When it is installed you

can instruct Ru_tts to use the dictionary by the command line switch

"-s /usr/local/share/freespeech/rulex.db".

The lexical database consists of two dictionaries and four sets of

rules. Subdirectory data contains source files of all these datasets

along with the test dictionary in source format. Dictionary source

format is as follows:

The dictionary source must be represented by a text file in koi8-r.

Each entry occupies exactly one line consisting of two fields

separated by space. The first field contains word itself and is used

as a key. The second field tells how this word should be

uttered. Stresses are denoted by the "+" sign following the stressed

letter. In such a manner you can use the "=" sign as well to denote

so-called weak stresses where it is appropriate, for instance, in some

complex words. In the key field only lowercase Russian letters are

allowed. In the utterance field additional characters "+", "=" and "-"

may be used as well. The "-" sign can serve as a separator in some

complex words.

For Ru_tts synthesizer could use the lexical database it must be

compiled. Since this project uses GNU Autoconf tools, it mustbe

configured at first by commands:

autoreconf -ifs

./configure

Then the dictionary creation is as simple as

make db

The file lexicon containing the lexical database will be created in

the data subdirectory as a result of these manipulations.

To install lexical database issue the command:

make install-db

Of course, this command should be performed with the superuser

privileges.

To recompile the lexholder utility issue the command:

make lexholder

To compile and install all stuff issue command:

make install

Of course, this command requires superuser privileges.

Along with the lexical database itself the package includes some

supporting tools, such as utility lexholder-ru, which allows updating

and extending the lexical database as well as extracting the database

content in textual form. This utility can be used either from command

line or from a shell-script. For information about it's using try

lexholder-ru -h

Or consult README.ru for the detailed description in Russian. Anyway, I

think, it would unlikely be useful for those who do not speak Russian.

Note 1. If you wish Ru_tts to make use of the lexical database created

by the lexholder-ru utility, they both must use the same database

support library. The version also makes sense. Usually it is expected

that the new version library routines can read the files created by

the ones of older versions, but it is not necessarily true. Detailed

information about used libraries for dynamically linked binaries can

be obtained by the "ldd" command.

Note 2. The lexical data storage format was changed in RuLex version

2.0. The new format can not be used by Ru_tts older than 2.0.

Note 3. The lexholder-ru utility does not free disk space when

database records are modified or deleted. Moreover, the database

size can grow dramatically during multiple updates. More compact

representation in this case can be achieved by the following steps:

1) Clean the database.

2) Extract all the datasets.

3) Recreate the database.

A little history

This dictionary was initially created as an attempt to provide some

stressing capability to the Ru_tts speech engine. My search through

the internet for some Russian stress dictionary had no practical

result. But then I've remembered that one of freely distributed

versions of Mueller's English-Russian dictionary actually contains the

desired information. I've extracted this stuff and got stress

dictionary about 60000 entries. It was not so much, but, anyway, I

already could see how the idea works. It became clear soon that there

are many special cases: some words should not be stressed at all and

the dictionary should contain not only stressing information. So, now

it is positioned as a pronunciation dictionary (not only stressing)

and it had grown dramatically since that time.

At some point I decided to extend the plain dictionary by some rules

allowing to guess pronunciation for some implicit word forms as well

as for the words where it can be done by formal criterions. As a

result, overall database size was greatly reduced, but the database

itself became more informative. And the old pronunciation dictionary

now serves for the database testing.

Now I'd like to thank all who helped me in this work:

The authors and publishers of the electronic version of Mueller's

dictionary with stress information, from which I obtained the initial

portion of stressed words;

Ivanov Guennady and my sister Tatyana, who had accomplished a lot of

hard work fixing and extending the dictionary at the beginning time;

Serge Fleytin, who provided me a bunch of new entries;

Dmitry Paduchih, who not only provided me new entries for the

dictionary, but had also developed the Emacs extension helping to

generate ones. This extension is included into the package and

Dmitry's comments about it are at the end of this file (in the Russian

section).

Igor B. Poretsky <poretsky@mlbox.ru>