.\"                                      Hey, EMACS: -*- nroff -*-
.TH RULEXDB_OPEN 3 "February 19, 2012"
.SH NAME
rulexdb_open \- open or create a rulex database
.SH SYNOPSIS
.nf
.B #include <rulexdb.h>
.sp
.BI "RULEXDB *rulexdb_open(const char *" path ", int " mode );
.fi
.SH DESCRIPTION
The
.BR rulexdb_open ()
function opens the rulex database in the file whose name is the string
pointed to by
.I path
and allocates and initializes all necessary internal data structures
associated with it.
.PP
The
.I mode
argument specifies the database access mode. It must be one of the
following values:
.TP
.B RULEXDB_SEARCH
Open the database for searching only (read-only mode).
.TP
.B RULEXDB_UPDATE
Open an existing database for searching and updating (read and write
mode).
.TP
.B RULEXDB_CREATE
Create a new database and open it for updating and searching.
.SH "DATABASE STRUCTURE"
The rulex database consists of two dictionaries and four sets
of rules. The \fBExplicit\fP dictionary contains words that
are described individually and imply nothing about their
other forms. This dictionary is looked up first if the search
includes this stage. The \fBImplicit\fP dictionary contains
words in a basic form. It is used to construct the
pronunciation string for various forms of these words. The basic
form of a word is guessed according to the rules from the
\fBClassifiers\fP and \fBPrefix detectors\fP rulesets. This is the
second stage of the search process. If these stages yield no result
or are not performed, the rules from the \fBGeneral\fP ruleset are used
to guess the stress position in the word. If none of these rules can be
applied, no guess is made and the search fails.
.PP
Externally, all data are represented as text. Russian
letters are encoded in the \fBkoi8\-r\fP character set, and only
lowercase letters are allowed.
.PP
Each dictionary record consists of two fields. The first field
contains the Russian word that serves as the search key. Only
lowercase Russian letters are allowed here. The second field provides
the pronunciation string for this word. The pronunciation string
is the word itself, written the way it should
be pronounced. Three additional symbols are allowed
in the pronunciation string along with the lowercase
Russian letters. The "+" sign marks the stressed
letter and is placed immediately after it. The "=" sign
is used in the same manner in some cases to mark a so-called
weak stress. The "-" sign can serve as a separator in some complex
words. All other symbols are illegal.
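.PP
The character rules above can be illustrated with a minimal sketch.
It is not part of the rulexdb API: both function names are hypothetical.
It assumes the standard KOI8-R layout, in which lowercase Russian letters
occupy bytes 0xC0 through 0xDF plus 0xA3. Rejecting an empty string and
requiring a letter directly before each stress mark are assumptions of
this sketch, not documented library behavior.
.PP
.nf
```c
/* Hypothetical helper: 1 if c is a lowercase Russian letter in KOI8-R. */
static int is_koi8r_lower(unsigned char c)
{
    return (c >= 0xC0 && c <= 0xDF) || c == 0xA3; /* 0xA3 is "yo" */
}

/*
 * Hypothetical check of a pronunciation string.
 * Returns 1 if the string uses only lowercase KOI8-R letters and
 * the three extra symbols, with each stress mark placed right
 * after a letter. Returns 0 otherwise.
 */
int is_valid_pronunciation(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    int prev_letter = 0;

    if (*p == 0)
        return 0;               /* assumption: empty string is invalid */
    for (; *p != 0; p++) {
        if (is_koi8r_lower(*p)) {
            prev_letter = 1;
        } else if (*p == '+' || *p == '=') {
            if (!prev_letter)
                return 0;       /* a stress mark must follow a letter */
            prev_letter = 0;
        } else if (*p == '-') {
            prev_letter = 0;    /* separator inside a complex word */
        } else {
            return 0;           /* any other byte is illegal */
        }
    }
    return 1;
}
```
.fi
.PP
The function works on raw bytes, so the caller is expected to pass text
that is already encoded in koi8\-r.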
.PP
There are four rulesets in the database: \fBGeneral\fP rules,
\fBClassifiers\fP, \fBPrefix detectors\fP and
\fBCorrectors\fP. Externally, each rule is represented by a
record consisting of one or two fields. The first field always
contains a regular expression that is matched against the word to
decide whether the rule can be applied.
.PP
The only task of the \fBGeneral\fP rules is to guess the stress
in a word when dictionary lookup fails. The rules are tried
sequentially until one matches or the list is exhausted. If a match
succeeds, the "+" sign is inserted into the word right after the first
subexpression match to mark the stress position.
These rules do not contain a second field.
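.PP
The effect of a single General rule can be sketched as follows.
This is an illustration only, not the library's implementation:
the function name is hypothetical, and POSIX extended regular
expression syntax is an assumption, since the regular expression
flavor is not specified here.
.PP
.nf
```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: apply one General-style rule to a word.
 * On a match, copies the word to out with '+' inserted right after
 * the first subexpression match and returns 1. Returns 0 if the
 * rule does not apply or out is too small.
 */
int apply_general_rule(const char *pattern, const char *word,
                       char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[2];
    size_t len = strlen(word);
    size_t pos;
    int rc;

    if (len + 2 > outsize)      /* room for '+' and the terminator */
        return 0;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    rc = regexec(&re, word, 2, m, 0);
    regfree(&re);
    if (rc != 0 || m[1].rm_so < 0)
        return 0;               /* no match or no first subexpression */

    pos = (size_t)m[1].rm_eo;   /* end of the first subexpression match */
    memcpy(out, word, pos);
    out[pos] = '+';
    memcpy(out + pos + 1, word + pos, len - pos + 1); /* with terminator */
    return 1;
}
```
.fi
.PP
A caller would try the rules in order and stop at the first one for which
the function returns 1.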
.PP
The rules of the \fBClassifiers\fP ruleset are checked one by one
until a match occurs. Then the part from the beginning of the word
through the end of the first subexpression match is extracted,
and if a second field is present, it is appended to the extracted
part as a suffix. The resulting string is treated as the basic form
of the word and is looked up in the \fBImplicit\fP dictionary.
If nothing is found, the process continues
until the ruleset is exhausted.
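.PP
The derivation of a basic form by one Classifiers rule can be sketched
in the same spirit. The function name is hypothetical, POSIX extended
regular expressions are again an assumption, and the lookup in the
Implicit dictionary is left to the caller.
.PP
.nf
```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: derive a basic form from a word using one
 * Classifiers-style rule. The suffix is the optional second field
 * and may be NULL. Returns 1 and fills out on a match, 0 otherwise.
 */
int derive_basic_form(const char *pattern, const char *suffix,
                      const char *word, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[2];
    size_t stem, slen;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    rc = regexec(&re, word, 2, m, 0);
    regfree(&re);
    if (rc != 0 || m[1].rm_so < 0)
        return 0;               /* no match or no first subexpression */

    stem = (size_t)m[1].rm_eo;  /* word start through subexpression end */
    slen = suffix ? strlen(suffix) : 0;
    if (stem + slen + 1 > outsize)
        return 0;
    memcpy(out, word, stem);
    if (slen)
        memcpy(out + stem, suffix, slen);
    out[stem + slen] = 0;
    return 1;
}
```
.fi
.PP
If the derived form is not found in the dictionary, the caller moves on
to the next rule.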
.PP
When nothing is found in the database for a word in its original form,
the \fBPrefix detectors\fP rules are applied to it sequentially until
a match occurs. The matched prefix is stripped and replaced by the
replacement string, if any. The resulting word is then looked up in the
\fBImplicit\fP dictionary. On success, the original prefix
is restored in the pronunciation string.
.PP
The rules from the \fBCorrectors\fP ruleset are applied
to pronunciation strings rather than to the original words.
The second field of these rules specifies a
replacement string in which digits serve as subexpression numbers.
.SH "RETURN VALUE"
Upon successful completion,
.BR rulexdb_open ()
returns a
.I RULEXDB
pointer that should be passed to the other database access functions
to reference the database.
Otherwise, NULL is returned.
.SH "SEE ALSO"
.BR rulexdb_classify (3),
.BR rulexdb_close (3),
.BR rulexdb_dataset_name (3),
.BR rulexdb_discard_dictionary (3),
.BR rulexdb_discard_ruleset (3),
.BR rulexdb_fetch_rule (3),
.BR rulexdb_lexbase (3),
.BR rulexdb_load_ruleset (3),
.BR rulexdb_remove_item (3),
.BR rulexdb_remove_rule (3),
.BR rulexdb_remove_this_item (3),
.BR rulexdb_retrieve_item (3),
.BR rulexdb_search (3),
.BR rulexdb_seq (3),
.BR rulexdb_subscribe_item (3),
.BR rulexdb_subscribe_rule (3)
.SH AUTHOR
Igor B. Poretsky <poretsky@mlbox.ru>.