<article data-sblg-article="1" data-sblg-tags="tutorial" itemscope="itemscope" itemtype="http://schema.org/BlogPosting">
<header>
<h2 itemprop="name">
Using Pages
</h2>
<address itemprop="author">Ross Richardson</address>
<time itemprop="datePublished" datetime="2017-09-20">20 September, 2017</time>
</header>
<p>
<strong>Thanks to Ross Richardson for his fine work in contributing this tutorial!</strong>
</p>
<p>
To make common cases convenient to handle, <span class="nm">kcgi</span> provides functionality for dealing with
the <abbr>CGI</abbr> meta variable <code>PATH_INFO</code>.
For example, if <span class="file">/cgi-bin/foo</span> is the CGI script, invoking <span
class="file">/cgi-bin/foo/bar/baz</span> will pass <span class="file">/bar/baz</span> as additional information.
Many CGI scripts use this functionality for <q>URL normalisation</q>, that is, pushing query-string variables into the path.
</p>
<p>
This tutorial describes an example CGI which implements a news site devoted to some particular topic.
The default document shows an index page, and there are sections for particular relevant areas.
In each of these, the trailing slash may be included or omitted.
I assume that your script is available at <span class="file">/cgi-bin/news</span>.
</p>
<dl>
<dt><span class="file">/cgi-bin/news</span>, <span class="file">/cgi-bin/news/index</span></dt>
<dd>main index</dd>
<dt><span class="file">/cgi-bin/news/about/</span></dt>
<dd>about the site</dd>
<dt><span class="file">/cgi-bin/news/archive/</span></dt>
<dd>archive of old articles</dd>
<dt><span class="file">/cgi-bin/news/archive/<var>yyyy</var></span></dt>
<dd>archive/index of articles for year <var>yyyy</var></dd>
<dt><span class="file">/cgi-bin/news/archive/<var>yyyy</var>/<var>mm</var></span></dt>
<dd>archive/index of articles for month <var>mm</var> of year <var>yyyy</var></dd>
<dt><span class="file">/cgi-bin/news/archive/<var>yyyy</var>/<var>mm</var>/<var>dd</var></span></dt>
<dd>archive/index of articles for date <var>yyyy</var>-<var>mm</var>-<var>dd</var></dd>
<dt><span class="file">/cgi-bin/news/random</span></dt>
<dd>a random article</dd>
<dt><span class="file">/cgi-bin/news/tag/<var>subj</var></span></dt>
<dd>articles tagged with "<var>subj</var>"</dd>
</dl>
<p>
<aside itemprop="about">
The tutorial gives an overview of the basic path handling provided by <span class="nm">kcgi</span>, and then shows and discusses
relevant code snippets.
</aside>
</p>
<h3>
Basic Handling
</h3>
<p>
Assuming a call to <a href="khttp_parse.3.html">khttp_parse(3)</a> returns <code>KCGI_OK</code>, the relevant fields of the
<code>struct kreq</code> are:
</p>
<dl>
<dt><code>fullpath</code></dt>
<dd>the value of the <abbr>CGI</abbr> meta variable <code>PATH_INFO</code> (which may be the empty string)</dd>
<dt><code>pagename</code></dt>
<dd>the substring of <code>PATH_INFO</code> from after the initial '/' to (but excluding) the next '/', or to the end of the string
(or the empty string if no such substring exists)</dd>
<dt><code>page</code></dt>
<dd>
<ul>
<li>if <code>pagename</code> is the empty string, the <code>defpage</code> parameter passed to
<a href="khttp_parse.3.html">khttp_parse(3)</a> (that is, the index corresponding to the default page)</li>
<li>if <code>pagename</code> matches one of the strings in the <code>pages</code> parameter passed to
<a href="khttp_parse.3.html">khttp_parse(3)</a>, the index of that string</li>
<li>if <code>pagename</code> does not match any of the strings in <code>pages</code>, the <code>pagesz</code>
parameter passed to <a href="khttp_parse.3.html">khttp_parse(3)</a></li>
</ul>
</dd>
<dt><code>path</code></dt>
<dd>the middle part of <code>PATH_INFO</code> after stripping <code>pagename/</code> at the beginning and <code>.suffix</code>
at the end.</dd>
</dl>
<p>
In addition, the field <code>pname</code> contains the value of the <abbr>CGI</abbr> meta variable <code>SCRIPT_NAME</code>.
</p>
<h3>
Source Code
</h3>
<p>
Here we look only at the code snippets not covered by the earlier tutorials.
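</p>
<p>
As a warm-up, the field definitions under Basic Handling can be made concrete with a small stand-alone sketch.
It does not use <span class="nm">kcgi</span> and is not how the library is implemented: the names
<code>sketch_pages</code>, <code>sketch_split()</code> and <code>sketch_page()</code> are hypothetical, and suffix
stripping is deliberately left out.
It only mirrors the rules given above for <code>pagename</code>, <code>page</code> and the start of <code>path</code>.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page table, mirroring the tutorial's pages[] array. */
static const char *const sketch_pages[] = {
	"index", "about", "archive", "random", "tag"
};
#define SKETCH_PAGESZ (sizeof(sketch_pages) / sizeof(sketch_pages[0]))

/*
 * Copy the first component of path_info (the "pagename") into name
 * and return a pointer to whatever follows "pagename/".
 * Assumption: suffix handling is omitted for brevity.
 */
static const char *
sketch_split(const char *path_info, char *name, size_t namesz)
{
	const char *start, *end;
	size_t len;

	assert(namesz > 0);
	/* Skip the initial '/' if there is one. */
	start = (path_info[0] == '/') ? path_info + 1 : path_info;
	end = strchr(start, '/');
	len = (end == NULL) ? strlen(start) : (size_t)(end - start);
	if (len >= namesz)
		len = namesz - 1;	/* truncate rather than overflow */
	memcpy(name, start, len);
	name[len] = '\0';
	return (end == NULL) ? start + strlen(start) : end + 1;
}

/*
 * Map a page name to an index: defpage for the empty name,
 * SKETCH_PAGESZ when nothing matches.
 */
static size_t
sketch_page(const char *name, size_t defpage)
{
	size_t i;

	if (name[0] == '\0')
		return defpage;
	for (i = 0; i < SKETCH_PAGESZ; i++)
		if (strcmp(name, sketch_pages[i]) == 0)
			return i;
	return SKETCH_PAGESZ;
}
```

<p>
Given <code>/archive/2017/09</code>, this sketch produces the page name <code>archive</code> and the remainder
<code>2017/09</code>; an empty <code>PATH_INFO</code> selects the default page, and an unknown name maps to the table size.
The tutorial's own snippets now follow.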
Firstly, we define some values corresponding to the subsections of the site.
</p>
<figure class="sample">
<pre class="prettyprint linenums">enum pg {
	PG_INDEX,
	PG_ABOUT,
	PG_ARCHIVE,
	PG_RANDOM,
	PG_TAG,
	PG__MAX
};</pre>
</figure>
<p>
Next, we define the path strings corresponding to the enumeration values.
</p>
<figure class="sample">
<pre class="prettyprint linenums">static const char *pages[PG__MAX] = {
	"index",
	"about",
	"archive",
	"random",
	"tag"
};</pre>
</figure>
<p>
We then define a constant bitmap corresponding to those <code>enum pg</code> values for which no extra path information should
be present in the <abbr>HTTP</abbr> request.
This will be used for sanity-checking the request.
</p>
<figure class="sample">
<pre class="prettyprint linenums">const size_t pg_no_extra_permitted =
	((1 << PG_INDEX) |
	 (1 << PG_ABOUT) |
	 (1 << PG_RANDOM));</pre>
</figure>
<p>
Next, we define a type for dates, a constant for the earliest valid year, and functions for parsing a string that specifies a date.
(We use year zero to indicate an invalid specification, and month/day zero to indicate that a month/day value was not specified.)
</p>
<p>
<strong>Editor's note</strong>: remember that <a href="https://man.openbsd.org/strptime">strptime(3)</a> and friends may not be
available within a file-system sandbox due to time-zone access, so we need to find another way.
</p>
<figure class="sample">
<pre class="prettyprint linenums">struct adate {
	unsigned int year; /* 0 if invalid */
	unsigned int month; /* 0 if not specified */
	unsigned int day; /* 0 if not specified */
};

const unsigned int archive_first_yr = 1995;

static unsigned int
current_year(void)
{
	struct tm *t;
	time_t now;

	if ((now = time(NULL)) == (time_t)-1 ||
	    (t = gmtime(&now)) == NULL)
		exit(EXIT_FAILURE);

	return t->tm_year + 1900;
} /* current_year */

static unsigned int
month_length(unsigned int y, unsigned int m)
{
	unsigned int len;

	switch (m) {
	case 2:
		if (y % 4 == 0 && (y % 100 != 0 || y % 400 == 0))
			len = 29;
		else
			len = 28;
		break;
	case 1:
	case 3:
	case 5:
	case 7:
	case 8:
	case 10:
	case 12:
		len = 31;
		break;
	case 4:
	case 6:
	case 9:
	case 11:
		len = 30;
		break;
	default:
		exit(EXIT_FAILURE);
	}
	return len;
} /* month_length */

static void
str_to_adate(const char* s, char sep, struct adate *d)
{
	long long val;
	char *t, *a, *b;
	size_t i;

	/* Set error/default state until proven otherwise. */
	d->year = 0;
	d->month = 0;
	d->day = 0;

	i = 0;
	while (isdigit((unsigned char)s[i]) || s[i] == sep)
		i++;

	if (i > 0 && s[i] == '\0') {
		/* s consists of digits and sep characters only. */
		/* Make a copy with which it is safe to tamper. */
		t = kstrdup(s);
		a = t;
		if ((b = strchr(a, sep)) != NULL)
			*b = '\0';
		val = strtonum(a, archive_first_yr, current_year(), NULL);
		if (val != 0) {
			/* Year is OK. */
			d->year = val;
			if (b != NULL && b[1] != '\0') {
				/* Move on to month. */
				a = &b[1];
				if ((b = strchr(a, sep)) != NULL)
					*b = '\0';
				val = strtonum(a, 1, 12, NULL);
				if (val == 0) {
					d->year = 0;
				} else {
					d->month = val;
					if (b != NULL && b[1] != '\0') {
						/* Move on to day. */
						a = &b[1];
						if ((b = strchr(a, sep)) != NULL)
							*b = '\0';
						if ((b != NULL && b[1] != '\0') ||
						    (val = strtonum(a, 1, month_length
						    (d->year, d->month), NULL)) == 0) {
							d->year = 0;
							d->month = 0;
						} else {
							d->day = val;
						}
					}
				}
			}
		}
		free(t);
	}
} /* str_to_adate */</pre>
</figure>
<p>
Now, we consider the basic handling of the request.
</p>
<figure class="sample">
<pre class="prettyprint linenums">int
main(void) {
	struct kreq r;
	struct adate ad;

	if (khttp_parse(&r, NULL, 0,
	    pages, PG__MAX, PG_INDEX) != KCGI_OK)
		return EXIT_FAILURE /* abort */;

	if (r.mime != KMIME_TEXT_HTML) {
		handle_err(&r, KHTTP_404);
	} else if (r.method != KMETHOD_GET &&
	    r.method != KMETHOD_HEAD) {
		handle_err(&r, KHTTP_405);
	} else if (r.page == PG__MAX ||
	    (r.path[0] != '\0' &&
	    ((1 << r.page) & pg_no_extra_permitted))) {
		handle_err(&r, KHTTP_404);
	} else {
		switch (r.page) {
		case PG_INDEX :
			handle_index(&r);
			break;
		case PG_ABOUT :
			handle_about(&r);
			break;
		case PG_ARCHIVE :
			if (r.path != NULL && r.path[0] != '\0') {
				str_to_adate(r.path, '/', &ad);
				if (ad.year != 0) {
					handle_archive(&r, &ad);
				} else {
					handle_err(&r, KHTTP_404);
				}
			} else {
				/* Not specified at all. */
				handle_archive(&r, NULL);
			}
			break;
		case PG_RANDOM :
			handle_random(&r);
			break;
		case PG_TAG :
			handle_tag(&r, r.path);
			break;
		default :
			/* shouldn't happen */
			handle_err(&r, KHTTP_500);
			break;
		}
	}
	khttp_free(&r);
	return EXIT_SUCCESS;
}</pre>
</figure>
<p>
Suppose we now decide to fall back to looking for a date specification (with '-' separators rather than '/') in the
query string if none is specified in the path.
This is as simple as adding the required definition…
</p>
<figure class="sample">
<pre class="prettyprint linenums">enum key {
	KEY_ADATE,
	KEY__MAX
};</pre>
</figure>
<p>
…and adding a validator function…
</p>
<figure class="sample">
<pre class="prettyprint linenums">static int
valid_adate(struct kpair* kp)
{
	struct adate ad;
	int ok;

	/* Invalid until proven otherwise. */
	ok = 0;

	if (kvalid_stringne(kp)) {
		str_to_adate(kp->val, '-', &ad);
		if (ad.year != 0) {
			/* We have a valid specification. */
			kp->type = KPAIR__MAX /* Not a simple type. */;
			kp->valsz = sizeof(ad);
			kp->val = kmalloc(kp->valsz);
			((struct adate*)kp->val)->year = ad.year;
			((struct adate*)kp->val)->month = ad.month;
			((struct adate*)kp->val)->day = ad.day;
			ok = 1;
		}
	}
	return ok;
} /* valid_adate */

static const struct kvalid keys[KEY__MAX] = {
	{ valid_adate, "adate" } /* KEY_ADATE */
};</pre>
</figure>
<p>
(Note that the same date-parsing function, <kbd>str_to_adate()</kbd>, is used here, but because it is wrapped in a validator
function it executes in the sandboxed environment.)
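</p>
<p>
To see why a separator parameter is enough to serve both the path form and the query-string form, here is a minimal,
portable sketch of the same idea.
It is not the tutorial's <kbd>str_to_adate()</kbd>: it avoids the BSD-specific <code>strtonum(3)</code>, takes the year
bounds as arguments, and is stricter in that it rejects a trailing separator.
All names beginning with <code>sketch_</code> are hypothetical.
</p>

```c
#include <assert.h>
#include <ctype.h>

struct sketch_date {
	unsigned int year;	/* 0 if invalid */
	unsigned int month;	/* 0 if not specified */
	unsigned int day;	/* 0 if not specified */
};

/* Days in month m (1-12) of year y, using Gregorian leap-year rules. */
static unsigned int
sketch_month_length(unsigned int y, unsigned int m)
{
	static const unsigned int len[12] =
	    { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

	assert(m >= 1 && m <= 12);
	if (m == 2 && y % 4 == 0 && (y % 100 != 0 || y % 400 == 0))
		return 29;
	return len[m - 1];
}

/*
 * Read one run of 1 to 4 decimal digits at *sp into *out and advance *sp.
 * Returns 0 (leaving *sp alone) if the run is empty or too long.
 */
static int
sketch_field(const char **sp, unsigned int *out)
{
	const char *s = *sp;
	unsigned int v = 0;
	int n = 0;

	while (isdigit((unsigned char)*s)) {
		if (++n > 4)
			return 0;
		v = v * 10 + (unsigned int)(*s - '0');
		s++;
	}
	if (n == 0)
		return 0;
	*sp = s;
	*out = v;
	return 1;
}

/*
 * Parse "yyyy", "yyyy<sep>mm" or "yyyy<sep>mm<sep>dd" into d.
 * On any error d->year is left at 0; min_year must be greater than 0.
 */
static void
sketch_parse_date(const char *s, char sep, unsigned int min_year,
    unsigned int max_year, struct sketch_date *d)
{
	unsigned int f[3] = { 0, 0, 0 };
	int n = 0;

	d->year = d->month = d->day = 0;

	/* Collect up to three fields separated by sep. */
	for (;;) {
		if (n == 3 || !sketch_field(&s, &f[n]))
			return;
		n++;
		if (*s == '\0')
			break;
		if (*s != sep)
			return;
		s++;
	}

	/* Range-check each field that was supplied. */
	if (f[0] < min_year || f[0] > max_year)
		return;
	if (n >= 2 && (f[1] < 1 || f[1] > 12))
		return;
	if (n == 3 && (f[2] < 1 || f[2] > sketch_month_length(f[0], f[1])))
		return;

	d->year = f[0];
	d->month = f[1];
	d->day = f[2];
}
```

<p>
Under these assumptions, <code>2017/09/20</code> parsed with <code>'/'</code> and <code>2017-09-20</code> parsed with
<code>'-'</code> yield the same year, month and day, while a string that uses the wrong separator is rejected with the
year left at zero.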
</p>
<p>
…and, in <kbd>main()</kbd>, modifying the call to <a href="khttp_parse.3.html">khttp_parse(3)</a>…
</p>
<figure class="sample">
<pre class="prettyprint linenums">if (khttp_parse(&r, keys, KEY__MAX,
    pages, PG__MAX, PG_INDEX) != KCGI_OK) {
	khttp_free(&r);
	return EXIT_FAILURE /* abort */;
}</pre>
</figure>
<p>
…and handling of the <kbd>PG_ARCHIVE</kbd> case…
</p>
<figure class="sample">
<pre class="prettyprint linenums">case PG_ARCHIVE :
	if (r.path != NULL && r.path[0] != '\0') {
		str_to_adate(r.path, '/', &ad);
		if (ad.year != 0)
			handle_archive(&r, &ad);
		else
			handle_err(&r, KHTTP_404);
	} else if (r.fieldmap[KEY_ADATE] != NULL) {
		/* Fallback to field. */
		handle_archive(&r, (struct adate*)r.fieldmap[KEY_ADATE]->val);
	} else if (r.fieldnmap[KEY_ADATE] != NULL) {
		/* Field is invalid. */
		handle_err(&r, KHTTP_404);
	} else {
		/* Not specified at all. */
		handle_archive(&r, NULL);
	}
	break;</pre>
</figure>
<p>
Whilst some specifications are naturally suited to the use of path information (for example, dates, file system hierarchies, and
timezones), others are a less natural fit.
Suppose, in our example, that we want to be able to specify a date and a tag <em>at the same time</em>. This could be achieved
by extending the behaviour of the <kbd>archive</kbd> or <kbd>tag</kbd> "page", but does not fit comfortably with
either.
In general, use of query string <kbd>keys</kbd> is preferred over <kbd>pages</kbd> because the former:
</p>
<ul>
<li><strong>involve parsing/validation in a sandboxed environment</strong></li>
<li>allow for greater flexibility</li>
</ul>
<p>
<strong>Editor's note</strong>: Ross makes a good case
for putting some sort of handling facility for URLs into
the protected child process.
For example, we could pass a string into <a href="khttp_parse.3.html">khttp_parsex(3)</a> that would define a template for
splitting the path into arguments.
For instance, the template <q>/@@0@@/@@1@@/@@2@@</q> might match the path <q>/foo/bar/baz</q>, with each component validated as
a query argument.
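</p>
<p>
Purely as an illustration of that idea, the following sketch matches a path against such a template and copies each
component into a numbered slot.
No such facility is described here as existing in <span class="nm">kcgi</span>: the function
<code>sketch_match()</code>, the constant <code>SKETCH_ARGLEN</code> and the exact meaning given to the
<code>@@N@@</code> markers are assumptions made for this example only.
</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ARGLEN 32	/* hypothetical per-argument buffer size */

/*
 * Hypothetical sketch, not a kcgi API.
 * Match path against tmpl, where "@@N@@" captures one non-empty path
 * component into out[N] and every other character must match literally.
 * Returns 1 on a complete match, 0 otherwise.
 */
static int
sketch_match(const char *tmpl, const char *path,
    char out[][SKETCH_ARGLEN], size_t outsz)
{
	size_t idx, len;

	assert(tmpl != NULL && path != NULL);

	while (*tmpl != '\0') {
		if (strncmp(tmpl, "@@", 2) != 0) {
			/* Literal template character: must match exactly. */
			if (*tmpl++ != *path++)
				return 0;
			continue;
		}
		/* Placeholder: read the slot number between the markers. */
		tmpl += 2;
		if (!isdigit((unsigned char)*tmpl))
			return 0;
		for (idx = 0; isdigit((unsigned char)*tmpl); tmpl++)
			idx = idx * 10 + (size_t)(*tmpl - '0');
		if (strncmp(tmpl, "@@", 2) != 0 || idx >= outsz)
			return 0;
		tmpl += 2;
		/* Capture one path component, up to the next '/' or the end. */
		len = strcspn(path, "/");
		if (len == 0 || len >= SKETCH_ARGLEN)
			return 0;
		memcpy(out[idx], path, len);
		out[idx][len] = '\0';
		path += len;
	}
	/* The whole path must have been consumed. */
	return *path == '\0';
}
```

<p>
With the template <q>/@@0@@/@@1@@/@@2@@</q>, the path <q>/foo/bar/baz</q> fills three slots with <code>foo</code>,
<code>bar</code> and <code>baz</code>, whereas paths with fewer or more components do not match.
In the design the editor's note has in mind, each captured string would then be handed to a validator inside the
protected child process, just as query-string keys are.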
</p>
</article>