unstructured

[
  {
    "element_id": "259298f61f4d6456eb95121be01c03dc",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "links": [
        {
          "start_index": 15,
          "text": "Theseareconcatenatedandonceagainprojected , resultinginthefinalvalues , depictedinFigure2",
          "url": "figure.2"
        }
      ],
      "page_number": 1
    },
    "text": "output values. These are concatenated and once again projected, resulting in the final values, as depicted in Figure 2.",
    "type": "NarrativeText"
  },
  {
    "element_id": "be50bef9d05264591898ec2a7afeabdf",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions. With a single attention head, averaging inhibits this.",
    "type": "NarrativeText"
  },
  {
    "element_id": "cffeb4133cb6ed7b9998087ff6eeedb5",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "MultiHead(Q, K, V ) = Concat(head1, ..., headh)W O",
    "type": "Title"
  },
  {
    "element_id": "63f06d447c3b500d3724a17730b6a340",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "where headi = Attention(QW Q",
    "type": "Title"
  },
  {
    "element_id": "9fab14a755f04854dc4b4329737c315a",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "i , KW K i",
    "type": "Title"
  },
  {
    "element_id": "5da738390f100e4d2eefc5e286f2cc96",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": ", V W V",
    "type": "Title"
  },
  {
    "element_id": "254bce889c65e7d964e2106a5b2640f9",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "i )",
    "type": "Title"
  },
  {
    "element_id": "a11edfa4b89f9c2d63ef8cac98240e45",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "Where the projections are parameter matrices W Q and W O ∈ Rhdv×dmodel.",
    "type": "NarrativeText"
  },
  {
    "element_id": "c3a800cafce18223e5e90c2ec57bb27f",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "i ∈ Rdmodel×dk , W K",
    "type": "Title"
  },
  {
    "element_id": "4d14397c6db433cb643ecf17bba23bf2",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "i ∈ Rdmodel×dk , W V",
    "type": "Title"
  },
  {
    "element_id": "accfd5998875a89c8cdf2bd9c74bff26",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "i ∈ Rdmodel×dv",
    "type": "NarrativeText"
  },
  {
    "element_id": "10d4ac36f66e0318cd97f5921638dc37",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "In this work we employ h = 8 parallel attention layers, or heads. For each of these we use dk = dv = dmodel/h = 64. Due to the reduced dimension of each head, the total computational cost is similar to that of single-head attention with full dimensionality.",
    "type": "NarrativeText"
  },
  {
    "element_id": "d31b9469d49a1d806388e85c09c96669",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "3.2.3 Applications of Attention in our Model",
    "type": "Title"
  },
  {
    "element_id": "3a881c92f819fd398474f23105de039f",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "The Transformer uses multi-head attention in three different ways:",
    "type": "NarrativeText"
  },
  {
    "element_id": "831aa8a0f86419eb62b4b4d5afb18f23",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "links": [
        {
          "start_index": 354,
          "text": "38",
          "url": "cite.wu2016google"
        },
        {
          "start_index": 358,
          "text": "2",
          "url": "cite.bahdanau2014neural"
        },
        {
          "start_index": 361,
          "text": "9",
          "url": "cite.JonasFaceNet2017"
        }
      ],
      "page_number": 1
    },
    "text": "In \"encoder-decoder attention\" layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder. This allows every position in the decoder to attend over all positions in the input sequence. This mimics the typical encoder-decoder attention mechanisms in sequence-to-sequence models such as [38, 2, 9].",
    "type": "ListItem"
  },
  {
    "element_id": "62fb78b6c60c3144d075a419a5460bf2",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "The encoder contains self-attention layers. In a self-attention layer all of the keys, values and queries come from the same place, in this case, the output of the previous layer in the encoder. Each position in the encoder can attend to all positions in the previous layer of the encoder.",
    "type": "ListItem"
  },
  {
    "element_id": "6abb9e18ac585710c60c06901be658dd",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "links": [
        {
          "start_index": 347,
          "text": "−∞) ofthesoftmaxwhichcorrespondtoillegalconnections . SeeFigure2",
          "url": "figure.2"
        }
      ],
      "page_number": 1
    },
    "text": "Similarly, self-attention layers in the decoder allow each position in the decoder to attend to all positions in the decoder up to and including that position. We need to prevent leftward information flow in the decoder to preserve the auto-regressive property. We implement this inside of scaled dot-product attention by masking out (setting to −∞) all values in the input of the softmax which correspond to illegal connections. See Figure 2.",
    "type": "ListItem"
  },
  {
    "element_id": "5abe45658c9277b6f0ffa3080a007d73",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "3.3 Position-wise Feed-Forward Networks",
    "type": "Title"
  },
  {
    "element_id": "d25fe7951542a7487ea748861d01d8e8",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "In addition to attention sub-layers, each of the layers in our encoder and decoder contains a fully connected feed-forward network, which is applied to each position separately and identically. This consists of two linear transformations with a ReLU activation in between.",
    "type": "NarrativeText"
  },
  {
    "element_id": "158730a18e567cd5d48b02e42a288838",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "FFN(x) = max(0, xW1 + b1)W2 + b2",
    "type": "UncategorizedText"
  },
  {
    "element_id": "0e77e68ba5473d98840c3212f4a8cb80",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "(2)",
    "type": "UncategorizedText"
  },
  {
    "element_id": "f0894df186d469df50a36f0d7022b0d7",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "While the linear transformations are the same across different positions, they use different parameters from layer to layer. Another way of describing this is as two convolutions with kernel size 1. The dimensionality of input and output is dmodel = 512, and the inner-layer has dimensionality df f = 2048.",
    "type": "NarrativeText"
  },
  {
    "element_id": "ee3bb63cce2f927a9195c1cf7ac06417",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "3.4 Embeddings and Softmax",
    "type": "Title"
  },
  {
    "element_id": "e0fea088e5ff762de65f5bed79ec58a3",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "links": [
        {
          "start_index": 439,
          "text": "30",
          "url": "cite.press2016using"
        }
      ],
      "page_number": 1
    },
    "text": "Similarly to other sequence transduction models, we use learned embeddings to convert the input tokens and output tokens to vectors of dimension dmodel. We also use the usual learned linear transfor- mation and softmax function to convert the decoder output to predicted next-token probabilities. In our model, we share the same weight matrix between the two embedding layers and the pre-softmax dmodel. linear transformation, similar to [30]. In the embedding layers, we multiply those weights by",
    "type": "NarrativeText"
  },
  {
    "element_id": "4ba002bdd74ed597330cab2461ee5a85",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "√",
    "type": "UncategorizedText"
  },
  {
    "element_id": "ef2d127de37b942baad06145e54b0c61",
    "metadata": {
      "data_source": {
        "permissions_data": [
          {
            "mode": 33188
          }
        ],
        "url": "/home/runner/work/unstructured/unstructured/test_unstructured_ingest/download/s3/small-pdf-set/page-with-formula.pdf"
      },
      "filetype": "application/pdf",
      "languages": [
        "eng"
      ],
      "page_number": 1
    },
    "text": "5",
    "type": "Footer"
  }
]
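
The fixture is a flat JSON array. Every element carries `element_id`, `type`, `text` and a `metadata` object with `page_number`, `filetype`, `languages` and `data_source`. Some elements also carry `links`, each with a `start_index`, `text` and `url`. Below is a minimal Python sketch that reads this shape back using only the standard library. The helper names (`load_elements`, `count_by_type`, `page_text`, `link_targets`) are hypothetical and are not part of the `unstructured` API. Dropping `Footer` elements when rebuilding page text is also an illustrative assumption.

```python
import json
from collections import Counter
from typing import Any, Iterable

Element = dict[str, Any]


def load_elements(raw: str) -> list[Element]:
    """Parse a top-level JSON array of partitioned elements (hypothetical helper)."""
    elements = json.loads(raw)
    if not isinstance(elements, list):
        raise ValueError("expected a top-level JSON array of elements")
    return elements


def count_by_type(elements: Iterable[Element]) -> Counter:
    """Tally elements by their "type" field (Title, NarrativeText, ListItem, ...)."""
    return Counter(el["type"] for el in elements)


def page_text(elements: Iterable[Element], page: int,
              skip_types: tuple[str, ...] = ("Footer",)) -> str:
    """Join element texts on one page in array order, skipping the given types.

    Skipping Footer is an assumption made for this sketch, not fixture behaviour.
    """
    return "\n".join(
        el["text"]
        for el in elements
        if el.get("metadata", {}).get("page_number") == page
        and el["type"] not in skip_types
    )


def link_targets(elements: Iterable[Element]) -> list[tuple[str, int]]:
    """Collect (url, start_index) pairs from metadata.links where present."""
    return [
        (link["url"], link["start_index"])
        for el in elements
        for link in el.get("metadata", {}).get("links", [])
    ]
```

For example, you could read this fixture from your checkout (its exact location is not shown here) and pass the contents to `load_elements`. `count_by_type` then shows that the multi-head attention formula on this page was split into several short `Title` elements, such as `"i )"`. `link_targets` returns pairs like `("figure.2", 15)` and `("cite.press2016using", 439)`.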