<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <meta name="author" content="Michael Cysouw" />
  <title>Using pandoc-ling</title>
  <style>
    html {
      line-height: 1.5;
      font-family: Georgia, serif;
      font-size: 20px;
      color: #1a1a1a;
      background-color: #fdfdfd;
    }
    body {
      margin: 0 auto;
      max-width: 36em;
      padding-left: 50px;
      padding-right: 50px;
      padding-top: 50px;
      padding-bottom: 50px;
      hyphens: auto;
      overflow-wrap: break-word;
      text-rendering: optimizeLegibility;
      font-kerning: normal;
    }
    @media (max-width: 600px) {
      body {
        font-size: 0.9em;
        padding: 1em;
      }
      h1 {
        font-size: 1.8em;
      }
    }
    @media print {
      body {
        background-color: transparent;
        color: black;
        font-size: 12pt;
      }
      p, h2, h3 {
        orphans: 3;
        widows: 3;
      }
      h2, h3, h4 {
        page-break-after: avoid;
      }
    }
    p {
      margin: 1em 0;
    }
    a {
      color: #1a1a1a;
    }
    a:visited {
      color: #1a1a1a;
    }
    img {
      max-width: 100%;
    }
    h1, h2, h3, h4, h5, h6 {
      margin-top: 1.4em;
    }
    h5, h6 {
      font-size: 1em;
      font-style: italic;
    }
    h6 {
      font-weight: normal;
    }
    ol, ul {
      padding-left: 1.7em;
      margin-top: 1em;
    }
    li > ol, li > ul {
      margin-top: 0;
    }
    blockquote {
      margin: 1em 0 1em 1.7em;
      padding-left: 1em;
      border-left: 2px solid #e6e6e6;
      color: #606060;
    }
    code {
      font-family: Menlo, Monaco, 'Lucida Console', Consolas, monospace;
      font-size: 85%;
      margin: 0;
    }
    pre {
      margin: 1em 0;
      overflow: auto;
    }
    pre code {
      padding: 0;
      overflow: visible;
      overflow-wrap: normal;
    }
    .sourceCode {
     background-color: transparent;
     overflow: visible;
    }
    hr {
      background-color: #1a1a1a;
      border: none;
      height: 1px;
      margin: 1em 0;
    }
    table {
      margin: 1em 0;
      border-collapse: collapse;
      width: 100%;
      overflow-x: auto;
      display: block;
      font-variant-numeric: lining-nums tabular-nums;
    }
    table caption {
      margin-bottom: 0.75em;
    }
    tbody {
      margin-top: 0.5em;
      border-top: 1px solid #1a1a1a;
      border-bottom: 1px solid #1a1a1a;
    }
    th {
      border-top: 1px solid #1a1a1a;
      padding: 0.25em 0.5em 0.25em 0.5em;
    }
    td {
      padding: 0.125em 0.5em 0.25em 0.5em;
    }
    header {
      margin-bottom: 4em;
      text-align: center;
    }
    #TOC li {
      list-style: none;
    }
    #TOC ul {
      padding-left: 1.3em;
    }
    #TOC > ul {
      padding-left: 0;
    }
    #TOC a:not(:hover) {
      text-decoration: none;
    }
    code{white-space: pre-wrap;}
    span.smallcaps{font-variant: small-caps;}
    span.underline{text-decoration: underline;}
    div.column{display: inline-block; vertical-align: top; width: 50%;}
    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
    ul.task-list{list-style: none;}
  </style>
  <!-- CSS added by lua-filter 'pandoc-ling' -->
  <style>
  .linguistic-example { 
    margin: 0; 
  }
  .linguistic-example caption { 
    margin-bottom: 0; 
  }
  .linguistic-example tbody { 
    border-top: none; 
    border-bottom: none;
  }
  .linguistic-example-preamble {
    height: 1em;
    vertical-align: top; 
  }
  .linguistic-example td {
    padding-left: 0;
  }
  .linguistic-example-content { 
    vertical-align: top;  
  }
  .linguistic-example-label {
    vertical-align: top;
  }
  .linguistic-example-judgement { 
    vertical-align: top; 
    padding-right: 2px;
  }
  </style>
        
  <!--[if lt IE 9]>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<header id="title-block-header">
<h1 class="title">Using pandoc-ling</h1>
<p class="author">Michael Cysouw</p>
</header>
<nav id="TOC" role="doc-toc">
<ul>
<li><a href="#pandoc-ling"><span class="toc-section-number">1</span> pandoc-ling</a></li>
<li><a href="#rationale"><span class="toc-section-number">2</span> Rationale</a></li>
<li><a href="#the-basic-structure-of-a-linguistic-example"><span class="toc-section-number">3</span> The basic structure of a linguistic example</a></li>
<li><a href="#introducing-pandoc-ling"><span class="toc-section-number">4</span> Introducing <code>pandoc-ling</code></a>
<ul>
<li><a href="#editing-linguistic-examples"><span class="toc-section-number">4.1</span> Editing linguistic examples</a></li>
<li><a href="#interlinear-examples"><span class="toc-section-number">4.2</span> Interlinear examples</a></li>
<li><a href="#cross-referencing-examples"><span class="toc-section-number">4.3</span> Cross-referencing examples</a></li>
<li><a href="#options-of-pandoc-ling"><span class="toc-section-number">4.4</span> Options of <code>pandoc-ling</code></a>
<ul>
<li><a href="#global-options"><span class="toc-section-number">4.4.1</span> Global options</a></li>
<li><a href="#local-options"><span class="toc-section-number">4.4.2</span> Local options</a></li>
</ul></li>
<li><a href="#issues-with-pandoc-ling"><span class="toc-section-number">4.5</span> Issues with <code>pandoc-ling</code></a></li>
<li><a href="#a-note-on-latex-conversion"><span class="toc-section-number">4.6</span> A note on Latex conversion</a></li>
<li><a href="#a-note-on-implementation"><span class="toc-section-number">4.7</span> A note on implementation</a></li>
</ul></li>
</ul>
</nav>
<h1 data-number="1" id="pandoc-ling"><span class="header-section-number">1</span> pandoc-ling</h1>
<p><em>Michael Cysouw</em> &lt;<a href="mailto:[email protected]" class="email">[email protected]</a>&gt;</p>
<p>A Pandoc filter for linguistic examples</p>
<p>tl;dr</p>
<ul>
<li>Easily write linguistic examples including basic interlinear glossing.</li>
<li>Let numbering and cross-referencing be done for you.</li>
<li>Export to (almost) any format you wish for final polishing.</li>
<li>As an example, check out this readme in <a href="https://cysouw.github.io/pandoc-ling/readme.html">HTML</a> or <a href="https://cysouw.github.io/pandoc-ling/readme_gb4e.pdf">Latex</a>.</li>
</ul>
<h1 data-number="2" id="rationale"><span class="header-section-number">2</span> Rationale</h1>
<p>Linguistics has a long-standing tradition of formatting example sentences in research papers in a very specific way, and getting such example sentences to look just right is a perennial problem. Within Latex, there are numerous packages to deal with this problem (e.g. covington, linguex, gb4e, expex). Depending on your needs, there is some Latex solution for almost everyone. However, these solutions in Latex are often cumbersome to type, and they are not portable to other formats. Specifically, transfer between latex, html, docx, odt or epub would be highly desirable. Such transfer is the hallmark of <a href="https://pandoc.org">Pandoc</a>, a tool by John MacFarlane that provides conversion between these (and many more) formats.</p>
<p>Any such conversion between text-formats naturally never works perfectly: every text-format has specific features that are not transferable to other formats. A central goal of Pandoc (at least in my interpretation) is to define a set of shared concepts for text-structure (a ‘common denominator’ if you will, but surely not ‘least’!) that can then be mapped to other formats. In many ways, Pandoc tries (again) to define a set of logical concepts for text structure (‘semantic markup’), which can then be formatted by your favourite typesetter. As long as you stay inside the realm of this ‘common denominator’ (in practice that means Pandoc’s extended version of Markdown/CommonMark), conversion works reasonably well (think 90%-plus).</p>
<p>Building on John Gruber’s <a href="https://daringfireball.net/projects/markdown/syntax">Markdown philosophy</a>, there is a strong urge here to learn to restrain oneself while writing, and try to restrict the number of layout-possibilities to a minimum. In this sense, with <code>pandoc-ling</code> I propose a Markdown-structure for linguistic examples that is simple, easy to type, easy to read, and portable through the Pandoc universe by way of an extension mechanism of Pandoc, called a ‘Pandoc Lua Filter’. This extension will not magically allow you to write every conceivable linguistic example, but my guess is that in practice the present proposal covers the majority of situations in linguistic publications (think 90%-plus). As an example (and test case) I have included automatic conversions into various formats in this repository (check them out in the directory <code>tests</code> to get an idea of the strengths and weaknesses of the current implementation).</p>
<h1 data-number="3" id="the-basic-structure-of-a-linguistic-example"><span class="header-section-number">3</span> The basic structure of a linguistic example</h1>
<p>Basically, a linguistic example consists of 6 possible building blocks, of which only the number and at least one example line are necessary. The space between the building blocks is kept as minimal as possible without becoming cramped. When (optional) building blocks are not included, then the other blocks shift left and up (only exception: a preamble without labels is not shifted left completely, but left-aligned with the example, not with the judgement).</p>
<ul>
<li><strong>Number</strong>: Running tally of all examples in the work, possibly restarting at chapters or other major headings. Typically between round brackets, possibly with a chapter number added before in long works, e.g. example (7.26). Aligned top-left, typically left-aligned to main text margin.</li>
<li><strong>Preamble</strong>: Optional information about the content/kind of example. Aligned top-left: to the top with the number, to the left with the (optional) label. When there is no label, the preamble is aligned with the example, not with the judgement.</li>
<li><strong>Label</strong>: Indices for sub-examples. Only present when more than one example is grouped together inside one numbered entity. Typically these sub-example labels use Latin letters followed by a full stop. They are left-aligned with the preamble, and each label is top-aligned with the top line of the corresponding example (important for longer line-wrapped examples).</li>
<li><strong>Judgement</strong>: Examples can optionally have grammaticality judgements, typically symbols like <code>*</code>, <code>?</code> or <code>!</code>, sometimes in superscript relative to the corresponding example. Judgements are right-aligned to each other, typically with only minimal space to the left-aligned examples.</li>
<li><strong>Line example</strong>: A minimal linguistic example has at least one line example, i.e. an utterance of interest. Building blocks in general shift left and up when other (optional) building blocks are not present. Minimally, this results in a number with one line example.</li>
<li><strong>Interlinear example</strong>: A complex structure typically used for examples from languages unknown to most readers. Consists of three or four lines that are left-aligned:
<ul>
<li><strong>Header</strong>: An optional header is typically used to display information about the language of the example, including literature references. When not present, then all other lines from the interlinear example shift upwards.</li>
<li><strong>Source</strong>: The actual language utterance, often typeset in italics. This line is internally separated at spaces, and each sub-block is left-aligned with the corresponding sub-blocks of the gloss.</li>
<li><strong>Gloss</strong>: Explanation of the meaning of the source, often using abbreviations in small caps. This line is internally separated at spaces, and each block is left-aligned with the block from source.</li>
<li><strong>Translation</strong>: Free translation of the source, typically quoted. Not separated in blocks, but freely extending to the right. Left-aligned with the other lines from the interlinear example.</li>
</ul></li>
</ul>
<figure>
<img src="figure/ExampleStructure.png" alt="The structure of a linguistic example." />
<figcaption aria-hidden="true">The structure of a linguistic example.</figcaption>
</figure>
<p>There are of course many more possibilities to extend the structure of a linguistic example, like third or fourth subdivisions of labels (often using small roman numerals as a third level) or multiple glossing lines in the interlinear example. Also, the content of the header is sometimes found right-aligned to the right of the interlinear example (language info at the top, reference at the bottom). None of these options is currently supported by <code>pandoc-ling</code>.</p>
<p>Under the hood, this structure is prepared by <code>pandoc-ling</code> as a table. Tables are reasonably well transcoded to different document formats. Specific layout considerations mostly have to be set manually. Alignment of the text should work in most exports. Some <code>CSS</code> styling is proposed by <code>pandoc-ling</code>, but it can of course be overridden. For latex (and beamer) special output is prepared using various available latex packages (see options, below).</p>
<h1 data-number="4" id="introducing-pandoc-ling"><span class="header-section-number">4</span> Introducing <code>pandoc-ling</code></h1>
<h2 data-number="4.1" id="editing-linguistic-examples"><span class="header-section-number">4.1</span> Editing linguistic examples</h2>
<p>To include a linguistic example in Markdown, <code>pandoc-ling</code> uses the <code>div</code> structure, which is indicated in Pandoc-Markdown by typing three colons at the start and three colons at the end. To indicate the <code>class</code> of this <code>div</code> the letters ‘ex’ (for ‘example’) should be added after the top colons (with or without space in between). This ‘ex’-class is the signal for <code>pandoc-ling</code> to start processing such a <code>div</code>. The numbering of these examples will be inserted by <code>pandoc-ling</code>.</p>
<p>Empty lines can be added inside the <code>div</code> for visual pleasure, as they mostly do not have an influence on the output. Exception: do <em>not</em> use empty lines between unlabelled line examples. Multiple lines of text can be used (without empty lines in between), but they will simply be interpreted as one sequential paragraph.</p>
<pre><code>::: ex
This is the most basic structure of a linguistic example. 
:::</code></pre>

<div id="ex4.1">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.1)</td>
<td class="linguistic-example-content" style="text-align: left;">This is the most basic structure of a linguistic example.</td>
</tr>
</tbody>
</table>
</div>
<p>Alternatively, the <code>class</code> can be put in curly brackets (and then a leading full stop is necessary before <code>ex</code>). Inside these brackets more attributes can be added (separated by space), for example an id, using a hash, or any attribute=value pairs that should apply to this example. Currently there is only one real attribute implemented (<code>formatGloss</code>), but in principle it is possible to add more attributes that can be used to fine-tune the typesetting of the example (see below for a description of such <code>local options</code>).</p>
<pre><code>::: {#id .ex formatGloss=false}

This is a multi-line example.
But that does not mean anything for the result
All these lines are simply treated as one paragraph.
They will become one example with one number.

:::</code></pre>

<div id="ex4.2">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.2)</td>
<td class="linguistic-example-content" style="text-align: left;">This is a multi-line example.
But that does not mean anything for the result
All these lines are simply treated as one paragraph.
They will become one example with one number.</td>
</tr>
</tbody>
</table>
</div>
<p>A preamble can be added by inserting an empty line between preamble and example. The same considerations about multiple text-lines apply.</p>
<pre><code>:::ex
Preamble

This is an example with a preamble.
:::</code></pre>

<div id="ex4.3">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.3)</td>
<td class="linguistic-example-preamble" style="text-align: left;">Preamble</td>
</tr>
<tr class="even">
<td></td>
<td class="linguistic-example-content" style="text-align: left;">This is an example with a preamble.</td>
</tr>
</tbody>
</table>
</div>
<p>Sub-examples with labels are entered by starting each sub-example with a lowercase Latin letter and a full stop. Empty lines between labels are allowed. Subsequent lines without labels are treated as one paragraph. Empty lines <em>not</em> followed by a label with a full stop will result in errors.</p>
<pre><code>:::ex
a. This is the first example.
b. This is the second.
a. The actual letters are not important, `pandoc-ling` will put them in order.

e. Empty lines are allowed between labelled lines
Subsequent lines are again treated as one sequential paragraph.
:::</code></pre>

<div id="ex4.4">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.4)</td>
<td class="linguistic-example-label" style="text-align: left;">a.</td>
<td class="linguistic-example-content" style="text-align: left;">This is the first example.</td>
</tr>
<tr class="even">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">b.</td>
<td class="linguistic-example-content" style="text-align: left;">This is the second.</td>
</tr>
<tr class="odd">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">c.</td>
<td class="linguistic-example-content" style="text-align: left;">The actual letters are not important, <code>pandoc-ling</code> will put them in order.</td>
</tr>
<tr class="even">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">d.</td>
<td class="linguistic-example-content" style="text-align: left;">Empty lines are allowed between labelled lines
Subsequent lines are again treated as one sequential paragraph.</td>
</tr>
</tbody>
</table>
</div>
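<p>The relabelling seen in this example can be pictured roughly as follows. This is only an illustrative Python sketch, not the Lua code of the filter itself: the function name <code>relabel</code> and the label pattern are assumptions made for the illustration, and continuation lines are simply joined with a space.</p>
<pre class="python"><code>import re
import string

# Hypothetical label pattern: one lowercase letter, a full stop, then a space.
LABEL = re.compile(r"^[a-z]\. ")

def relabel(lines):
    """Group lines into sub-examples and assign the labels a., b., c. in order."""
    items = []
    for line in lines:
        if not line.strip():
            continue  # empty lines between labelled lines are skipped
        if LABEL.match(line) or not items:
            # A typed label (whatever its letter) opens a new sub-example.
            items.append(LABEL.sub("", line))
        else:
            # A line without a label continues the previous paragraph.
            items[-1] += " " + line
    return [(string.ascii_lowercase[i] + ".", text) for i, text in enumerate(items)]

example = [
    "a. This is the first example.",
    "b. This is the second.",
    "a. The actual letters are not important.",
    "",
    "e. Empty lines are allowed between labelled lines",
    "Subsequent lines are again treated as one sequential paragraph.",
]
for label, text in relabel(example):
    print(label, text)</code></pre>
<p>The typed letters are thrown away and only the order of the items counts, which is why the third and fourth items above come out as c. and d.</p>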
<p>A labelled list can be combined with a preamble.</p>
<pre><code>:::ex
Any nice description here

a. one example sentence.
b. two
c. three
:::</code></pre>

<div id="ex4.5">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.5)</td>
<td colspan="2" class="linguistic-example-preamble" style="text-align: left;">Any nice description here</td>
</tr>
<tr class="even">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">a.</td>
<td class="linguistic-example-content" style="text-align: left;">one example sentence.</td>
</tr>
<tr class="odd">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">b.</td>
<td class="linguistic-example-content" style="text-align: left;">two</td>
</tr>
<tr class="even">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">c.</td>
<td class="linguistic-example-content" style="text-align: left;">three</td>
</tr>
</tbody>
</table>
</div>
<p>Grammaticality judgements should be added before an example, and after an optional label, separated from both by spaces (though four spaces in a row should be avoided, as that could lead to layout errors). To indicate that a sequence of symbols is a judgement, prepend it with a caret <code>^</code>. Alignment will be figured out by <code>pandoc-ling</code>.</p>
<pre><code>:::ex
Throwing in a preamble for good measure

a. ^* This traditionally signals ungrammaticality.
b. ^? Question-marks indicate questionable grammaticality.
c. ^^whynot?^ But in principle any sequence can be used (here even in superscript).
d. However, such long sequences sometimes lead to undesirable effects in the layout.
:::</code></pre>

<div id="ex4.6">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.6)</td>
<td colspan="3" class="linguistic-example-preamble" style="text-align: left;">Throwing in a preamble for good measure</td>
</tr>
<tr class="even">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">a.</td>
<td class="linguistic-example-judgement" style="text-align: right;">*</td>
<td class="linguistic-example-content" style="text-align: left;">This traditionally signals ungrammaticality.</td>
</tr>
<tr class="odd">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">b.</td>
<td class="linguistic-example-judgement" style="text-align: right;">?</td>
<td class="linguistic-example-content" style="text-align: left;">Question-marks indicate questionable grammaticality.</td>
</tr>
<tr class="even">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">c.</td>
<td class="linguistic-example-judgement" style="text-align: right;"><sup>whynot?</sup></td>
<td class="linguistic-example-content" style="text-align: left;">But in principle any sequence can be used (here even in superscript).</td>
</tr>
<tr class="odd">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">d.</td>
<td class="linguistic-example-judgement" style="text-align: right;"></td>
<td class="linguistic-example-content" style="text-align: left;">However, such long sequences sometimes lead to undesirable effects in the layout.</td>
</tr>
</tbody>
</table>
</div>
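<p>As a rough picture of how the caret convention separates a judgement from its example, consider the following Python sketch. It is not the code of the filter; the function name <code>split_judgement</code> is hypothetical, and the label is assumed to have been removed already.</p>
<pre class="python"><code>def split_judgement(text):
    """Return (judgement, example) for one line without its label."""
    text = text.strip()
    if not text.startswith("^"):
        return "", text  # no caret: the whole line is the example
    marker, _, rest = text.partition(" ")
    # Only the leading caret is dropped, so "^whynot?^" remains
    # ordinary Pandoc superscript markup.
    return marker[1:], rest.strip()

print(split_judgement("^* This traditionally signals ungrammaticality."))
print(split_judgement("^^whynot?^ But in principle any sequence can be used."))
print(split_judgement("However, such long sequences can upset the layout."))</code></pre>
<p>Because the judgement ends at the first space, a judgement in this sketch cannot itself contain a space.</p>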
<p>A minor detail is the alignment of a single example with a preamble and grammaticality judgements. In this case it looks better for the preamble to be left aligned with the example and not with the judgement.</p>
<pre><code>:::ex
Here is a special case with a preamble

^^???^ With a singly questionably example.
Note the alignment! Especially with this very long example
that should go over various lines in the output.
:::</code></pre>

<div id="ex4.7">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.7)</td>
<td style="text-align: right;"></td>
<td class="linguistic-example-preamble" style="text-align: left;">Here is a special case with a preamble</td>
</tr>
<tr class="even">
<td></td>
<td class="linguistic-example-judgement" style="text-align: right;"><sup>???</sup></td>
<td class="linguistic-example-content" style="text-align: left;">With a singly questionably example.
Note the alignment! Especially with this very long example
that should go over various lines in the output.</td>
</tr>
</tbody>
</table>
</div>
<p>For the lazy writers among us, it is also possible to use a simple bullet list instead of a labelled list. Note that the listed elements will still be formatted as a labelled list.</p>
<pre><code>:::ex
- This is a lazy example.
- ^# It should return letters at the start just as before.
- ^% Also testing some unusual judgements.
:::</code></pre>

<div id="ex4.8">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.8)</td>
<td class="linguistic-example-label" style="text-align: left;">a.</td>
<td class="linguistic-example-judgement" style="text-align: right;"></td>
<td class="linguistic-example-content" style="text-align: left;">This is a lazy example.</td>
</tr>
<tr class="even">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">b.</td>
<td class="linguistic-example-judgement" style="text-align: right;">#</td>
<td class="linguistic-example-content" style="text-align: left;">It should return letters at the start just as before.</td>
</tr>
<tr class="odd">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">c.</td>
<td class="linguistic-example-judgement" style="text-align: right;">%</td>
<td class="linguistic-example-content" style="text-align: left;">Also testing some unusual judgements.</td>
</tr>
</tbody>
</table>
</div>
<p>Just for testing: a single example with a judgement (which resulted in an error in earlier versions).</p>
<pre><code>::: ex
^* This traditionally signals ungrammaticality.
:::</code></pre>

<div id="ex4.9">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.9)</td>
<td class="linguistic-example-judgement" style="text-align: right;">*</td>
<td class="linguistic-example-content" style="text-align: left;">This traditionally signals ungrammaticality.</td>
</tr>
</tbody>
</table>
</div>
<h2 data-number="4.2" id="interlinear-examples"><span class="header-section-number">4.2</span> Interlinear examples</h2>
<p>For interlinear examples with aligned source and gloss, the structure of a <code>lineblock</code> is used, starting each line with a vertical line <code>|</code>. There should always be four vertical lines (for header, source, gloss and translation, respectively), although the content after the first vertical line can be empty. The source and gloss lines are separated at spaces, and all parts are left-aligned. If you want to have a space that is not separated, you will have to ‘protect’ the space, either by putting a backslash before the space, or by inserting a non-breaking space instead of a normal space (either type <code>&amp;nbsp;</code> or insert an actual non-breaking space, i.e. Unicode character <code>U+00A0</code>).</p>
<pre><code>:::ex
| Dutch (Germanic)
| Deze zin is in het nederlands.
| DEM sentence AUX in DET dutch.
| This sentence is dutch.
:::</code></pre>

<div id="ex4.10">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.10)</td>
<td colspan="6" class="linguistic-example-header linguistic-example-content" style="text-align: left;">Dutch (Germanic)</td>
</tr>
<tr class="even">
<td></td>
<td style="text-align: left;">Deze</td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;">zin</td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;">is</td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;">in</td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;">het</td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;">nederlands.</td>
</tr>
<tr class="odd">
<td></td>
<td style="text-align: left;">DEM</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">sentence</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">AUX</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">in</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">DET</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">dutch.</td>
</tr>
<tr class="even">
<td></td>
<td colspan="6" class="linguistic-example-translation linguistic-example-content" style="text-align: left;">This sentence is dutch.</td>
</tr>
</tbody>
</table>
</div>
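<p>The splitting and pairing of source and gloss can be sketched in a few lines of Python. This is an illustration under assumptions, not the filter's implementation: the names <code>split_words</code> and <code>align</code> are made up here, and any <code>&amp;nbsp;</code> entity is assumed to have been turned into the character <code>U+00A0</code> before the line is split.</p>
<pre class="python"><code>from itertools import zip_longest

NBSP = "\u00a0"

def split_words(line):
    """Split at plain spaces; backslash-space and U+00A0 stay inside a word."""
    protected = line.replace("\\ ", NBSP)
    # split(" ") is used on purpose: a bare split() would also break at U+00A0.
    return [w.replace(NBSP, " ") for w in protected.split(" ") if w]

def align(source, gloss):
    """Pair each source word with the gloss word in the same column."""
    return list(zip_longest(split_words(source), split_words(gloss), fillvalue=""))

for src, gls in align("Deze zin is in het nederlands.", "DEM sentence AUX in DET dutch."):
    print(src.ljust(12) + gls)</code></pre>
<p>When source and gloss have a different number of words, this sketch pads the shorter line with empty cells, so a forgotten gloss shows up as a visible gap in the columns.</p>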
<p>An attempt is made to format interlinear examples when the option <code>formatGloss=true</code> is added. This will:</p>
<ul>
<li>remove formatting from the source and set everything in italics,</li>
<li>remove formatting from the gloss and set sequences (&gt;1) of capitals and numbers in small caps (note that the positioning of small caps on web pages is <a href="https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align">highly complex</a>),</li>
<li>treat a tilde <code>~</code> between spaces in the gloss as a shortcut for an empty gloss (internally, the sequence <code>space-tilde-space</code> is replaced by <code>space-space-nonBreakingSpace-space-space</code>),</li>
<li>consistently put translations in single quotes, possibly removing other quotes.</li>
</ul>
<!-- -->
<pre><code>::: {.ex formatGloss=true}
| Dutch (Germanic)
| Is deze zin in het nederlands ?
| AUX DEM sentence in DET dutch Q
| Is this sentence dutch?
:::</code></pre>

<div id="ex4.11">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.11)</td>
<td colspan="7" class="linguistic-example-header linguistic-example-content" style="text-align: left;">Dutch (Germanic)</td>
</tr>
<tr class="even">
<td></td>
<td style="text-align: left;"><em>Is</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>deze</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>zin</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>in</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>het</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>nederlands</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>?</em></td>
</tr>
<tr class="odd">
<td></td>
<td style="text-align: left;"><span class="smallcaps">aux</span></td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;"><span class="smallcaps">dem</span></td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">sentence</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">in</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;"><span class="smallcaps">det</span></td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">dutch</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;"><span class="smallcaps">q</span></td>
</tr>
<tr class="even">
<td></td>
<td colspan="7" class="linguistic-example-translation linguistic-example-content" style="text-align: left;">‘Is this sentence dutch?’</td>
</tr>
</tbody>
</table>
</div>
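<p>The rules applied by <code>formatGloss=true</code> can be approximated with the Python sketch below. The function names are hypothetical and the output uses Pandoc Markdown markup (<code>*...*</code> for italics, <code>[...]{.smallcaps}</code> for small caps) merely for readability. The small-caps pattern follows the rule as worded above (runs of two or more capitals or numbers); the filter's own pattern may differ, since the rendered example above also sets the lone <code>Q</code> in small caps.</p>
<pre class="python"><code>import re

NBSP = "\u00a0"

def format_source(source):
    """Set every space-separated source word in italics."""
    return " ".join("*" + word + "*" for word in source.split(" ") if word)

def format_gloss(gloss):
    """Apply the tilde shortcut and mark runs of capitals or digits as small caps."""
    gloss = gloss.replace(" ~ ", "  " + NBSP + "  ")
    # Runs of two or more capitals or digits become a lowercase small-caps span.
    return re.sub(
        r"[A-Z0-9]{2,}",
        lambda m: "[" + m.group(0).lower() + "]{.smallcaps}",
        gloss,
    )

def format_translation(translation):
    """Strip any surrounding quotes and wrap the text in single quotes."""
    bare = translation.strip().strip("\"'\u2018\u2019\u201c\u201d")
    return "\u2018" + bare + "\u2019"

print(format_source("Deze tweede zin heeft geen header."))
print(format_gloss("DEM second sentence have.3SG.PRES no header."))
print(format_translation('"This second sentence does not have a header."'))</code></pre>
<p>The sketch leaves out the removal of existing formatting from source and gloss, which in the filter happens before the new formatting is applied.</p>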
<p>Such formatting will not always work, but it seems to be quite robust in my testing. The next example brings everything together:</p>
<ul>
<li>a preamble,</li>
<li>labels, both for single lines and for interlinear examples,</li>
<li>interlinear examples start on a new line immediately after the letter-label,</li>
<li>grammaticality judgements with proper alignment,</li>
<li>when the header of an interlinear example is left out, everything is shifted up,</li>
<li>the formatting of the interlinear examples is harmonised.</li>
</ul>
<!-- -->
<pre><code>::: {.ex formatGloss=true samePage=false}
Completely superfluous preamble, but it works ...

a. Mixing single line examples with interlinear examples.
a. This is of course highly unusual.
Just for this example, let&#39;s add some extra material in this example.

a.
| Dutch (Germanic) Note the grammaticality judgement!
| ^^:–)^ Deze zin is (dit\ is&amp;nbsp;test) nederlands.
| DEM sentence AUX ~ dutch.
| This sentence is dutch.

b.
|
| Deze tweede zin heeft geen header.
| DEM second sentence have.3SG.PRES no header.
| This second sentence does not have a header.
:::</code></pre>

<div id="ex4.12">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.12)</td>
<td colspan="3" class="linguistic-example-preamble" style="text-align: left;">Completely superfluous preamble, but it works …</td>
</tr>
<tr class="even">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">a.</td>
<td style="text-align: right;"></td>
<td class="linguistic-example-content" style="text-align: left;">Mixing single line examples with interlinear examples.</td>
</tr>
<tr class="odd">
<td></td>
<td class="linguistic-example-label" style="text-align: left;">b.</td>
<td class="linguistic-example-judgement" style="text-align: right;"></td>
<td class="linguistic-example-content" style="text-align: left;">This is of course highly unusual.
Just for this example, let’s add some extra material in this example.</td>
</tr>
</tbody>
</table>
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td>          </td>
<td class="linguistic-example-label" style="text-align: left;">c.</td>
<td style="text-align: right;"></td>
<td colspan="5" class="linguistic-example-header linguistic-example-content" style="text-align: left;">Dutch (Germanic) Note the grammaticality judgement!</td>
</tr>
<tr class="even">
<td></td>
<td style="text-align: left;"></td>
<td style="text-align: right;"><sup>:–)</sup></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>Deze</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>zin</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>is</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>(dit is test)</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>nederlands.</em></td>
</tr>
<tr class="odd">
<td></td>
<td style="text-align: left;"></td>
<td style="text-align: right;"></td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;"><span class="smallcaps">dem</span></td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">sentence</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;"><span class="smallcaps">aux</span></td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">   </td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">dutch.</td>
</tr>
<tr class="even">
<td></td>
<td style="text-align: left;"></td>
<td style="text-align: right;"></td>
<td colspan="5" class="linguistic-example-translation linguistic-example-content" style="text-align: left;">‘This sentence is dutch.’</td>
</tr>
</tbody>
</table>
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td>          </td>
<td class="linguistic-example-label" style="text-align: left;">d.</td>
<td class="linguistic-example-judgement" style="text-align: right;"></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>Deze</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>tweede</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>zin</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>heeft</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>geen</em></td>
<td class="linguistic-example-source linguistic-example-content" style="text-align: left;"><em>header.</em></td>
</tr>
<tr class="even">
<td></td>
<td style="text-align: left;"></td>
<td style="text-align: right;"></td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;"><span class="smallcaps">dem</span></td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">second</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">sentence</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">have.<span class="smallcaps">3sg</span>.<span class="smallcaps">pres</span></td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">no</td>
<td class="linguistic-example-gloss linguistic-example-content" style="text-align: left;">header.</td>
</tr>
<tr class="odd">
<td></td>
<td style="text-align: left;"></td>
<td style="text-align: right;"></td>
<td colspan="6" class="linguistic-example-translation linguistic-example-content" style="text-align: left;">‘This second sentence does not have a header.’</td>
</tr>
</tbody>
</table>
</div>
<h2 data-number="4.3" id="cross-referencing-examples"><span class="header-section-number">4.3</span> Cross-referencing examples</h2>
<p>The examples are automatically numbered by <code>pandoc-ling</code>. Cross-references to examples inside a document can be made by using the <code>[@ID]</code> format (used by Pandoc for citations). When an example has an explicit identifier (like <code>#test</code> in the next example), a reference to this example can be made with <code>[@test]</code>, leading to <a href="#test">(4.13)</a> when formatted. (Note that this formatting does not work on the GitHub website; please check the ‘docs’ subdirectory.)</p>
<pre><code>::: {#test .ex}
This is a test
:::</code></pre>

<div id="ex4.13">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: top;">(4.13)</td>
<td class="linguistic-example-content" style="text-align: left;">This is a test</td>
</tr>
</tbody>
</table>
</div>
<p>Inspired by the <code>linguex</code>-approach, you can also use the keywords <code>next</code> or <code>last</code> to refer to the next or the last example, e.g. <code>[@last]</code> will be formatted as <a href="#test">(4.13)</a>. By doubling the first letter, as in <code>nnext</code> or <code>llast</code>, a reference to the next-but-one or last-but-one example can be made. In fact, the first letter can be repeated at will in <code>pandoc-ling</code>, so something like <code>[@llllllllast]</code> will also work. It will be formatted as <a href="#ex4.6">(4.6)</a> after the processing of <code>pandoc-ling</code>. Needless to say, in such a situation an explicit identifier would be a better choice.</p>
<p>Referring to sub-examples can be done by manually adding a suffix to the cross-reference, separated from the identifier by a space. For example, <code>[@llast c]</code> will refer to the third sub-example of the last-but-one example. Formatted, this will look like this: <a href="#ex4.12">(4.12 c)</a>, smile! However, note that the “c” has to be determined manually. It is simply a literal suffix that will be copied into the cross-reference. Something like <code>[@last hA1l0]</code> will also work, leading to <a href="#test">(4.13 hA1l0)</a> when formatted (which is of course nonsensical).</p>
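<p>As a minimal sketch of the reference types described above, the markdown source could look like this. The identifier <code>#pair</code> and the example sentences are made up for illustration:</p>
<pre><code>::: {#pair .ex}
a. First sub-example.
b. Second sub-example.
:::

By explicit identifier: [@pair]. The most recent example: [@last].
The one before that: [@llast]. The upcoming one: [@next].
With a literal suffix after a space: [@pair b].</code></pre>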
<p>For exports that include attributes (like html), the examples have an explicit id of the form <code>exNUMBER</code> in which <code>NUMBER</code> is the actual number as given in the formatted output. This means that it is possible to refer to an example on any web-page by using the hash-mechanism to refer to a part of the web-page. For example, <code>#ex4.7</code> can be used to refer to the seventh example in the html-output of this readme (try <a href="https://cysouw.github.io/pandoc-ling/readme.html#ex4.7">this link</a>). The id in this example has a chapter number ‘4’ because in the html conversion I have set the option <code>addChapterNumber</code> to <code>true</code>. (Note: when numbers restart the count in each chapter with the option <code>restartAtChapter</code>, then the id is of the form <code>exCHAPTER.NUMBER</code>. This is necessary to resolve clashing ids, as the same number might then be used in different chapters.)</p>
<p>I propose to use these ids also to refer to examples in citations when writing scholarly papers, e.g. (Cysouw 2021: #ex7), independent of whether the links actually resolve. In principle, such citations could easily be resolved when online publications are properly prepared. The same proposal could also work for other parts of research papers, for example using tags like <code>#sec, #fig, #tab, #eq</code> (see the Pandoc filter <a href="https://github.com/cysouw/crossref-adapt"><code>crossref-adapt</code></a>). To refer to paragraphs (which should replace page numbers in a future of adaptive design), I propose to use no tag, but directly add the number to the hash (see the Pandoc filter <a href="https://github.com/cysouw/count-para"><code>count-para</code></a> for a practical mechanism to add such numbering).</p>
<h2 data-number="4.4" id="options-of-pandoc-ling"><span class="header-section-number">4.4</span> Options of <code>pandoc-ling</code></h2>
<h3 data-number="4.4.1" id="global-options"><span class="header-section-number">4.4.1</span> Global options</h3>
<p>The following global options are available with <code>pandoc-ling</code>. These can be added to the <a href="https://pandoc.org/MANUAL.html#metadata-blocks">Pandoc metadata</a>. An example of such metadata can be found at the bottom of this <code>readme</code> in the form of a YAML-block. Pandoc allows for various methods to provide metadata (see the link above).</p>
<ul>
<li><strong><code>formatGloss</code></strong> (boolean, default <code>false</code>): should all interlinear examples be consistently formatted? If you use this option, you can simply use capital letters for abbreviations in the gloss, and they will be changed to small caps. The source line is set to italics, and the translation is put into single quotes.</li>
<li><strong><code>samePage</code></strong> (boolean, default <code>true</code>, only for Latex): should examples be kept together on the same page? Can also be overridden for individual examples by adding <code>{.ex samePage=false}</code> at the start of an example (cf. below on <code>local options</code>).</li>
<li><strong><code>xrefSuffixSep</code></strong> (string, defaults to no-break-space): When cross references have a suffix, how should the separator be formatted? The default ‘no-break-space’ is a safe option. I personally like a ‘narrow no-break space’ better (Unicode <code>U+202F</code>), but this symbol does not work with all fonts, and might thus lead to errors. For Latex typesetting, all space-like symbols are converted to a Latex thin space <code>\,</code>.</li>
<li><strong><code>restartAtChapter</code></strong> (boolean, default <code>false</code>): should the counting restart for each chapter?
<ul>
<li>Actually, when <code>true</code> this setting will restart the counting at the highest heading level, which for various output formats can be set by the Pandoc option <code>top-level-division</code>.</li>
<li>The id of each example will now be of the form <code>exCHAPTER.NUMBER</code> to resolve any clashes when the same number appears in different chapters.</li>
<li>Depending on your Latex setup, an explicit entry <code>top-level-division: chapter</code> might be necessary in your metadata.</li>
</ul></li>
<li><strong><code>addChapterNumber</code></strong> (boolean, default <code>false</code>): should the chapter (= highest heading level) number be added to the number of the example? When setting this to <code>true</code> any setting of <code>restartAtChapter</code> will be ignored. In most Latex situations this only works in combination with a <code>documentclass: book</code>.</li>
<li><strong><code>latexPackage</code></strong> (one of: <code>linguex</code>, <code>gb4e</code>, <code>langsci-gb4e</code>, <code>expex</code>, default <code>linguex</code>): Various options for converting examples to Latex packages that typeset linguistic examples. None of the conversions works perfectly, though they should work in most normal situations (think 90%-plus). It might be necessary to first convert to <code>Latex</code>, correct the output, and then typeset separately with a latex compiler like <code>xelatex</code>. Using the direct option inside Pandoc might also work in many situations. Export to <strong><code>beamer</code></strong> seems to work reasonably well with the <code>gb4e</code> package. All others have artefacts or errors.</li>
</ul>
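<p>As an illustration only, a metadata block setting some of the options listed above might look as follows. The values are arbitrary choices, not recommendations, and <code>xrefSuffixSep</code> is simply left at its default:</p>
<pre><code>---
formatGloss: true        # small caps in glosses, italic source, quoted translation
samePage: true           # Latex only: keep each example together on one page
addChapterNumber: true   # any setting of restartAtChapter is then ignored
latexPackage: gb4e       # one of: linguex, gb4e, langsci-gb4e, expex
documentclass: book      # often needed for addChapterNumber in Latex
---</code></pre>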
<h3 data-number="4.4.2" id="local-options"><span class="header-section-number">4.4.2</span> Local options</h3>
<p>Local options are options that can be set for each individual example. The <code>formatGloss</code> option can be used to have an individual example be formatted differently from the global setting. For example, when the global setting is <code>formatGloss: true</code> in the metadata, then adding <code>formatGloss=false</code> in the curly brackets of a specific example will block the formatting. This is especially useful when the automatic formatting does not give the desired result.</p>
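<p>A sketch of such a local override, assuming that <code>formatGloss: true</code> is set in the metadata (the sentence is reused from the examples above):</p>
<pre><code>::: {.ex formatGloss=false}
| Dutch (Germanic)
| Deze zin is nederlands.
| DEM sentence AUX dutch.
| This sentence is dutch.
:::</code></pre>
<p>With this override, the lines of this one example should be left as typed, without the automatic italics, small caps and quotes.</p>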
<p>If you want to add something else (not a linguistic example) in a numbered example, then there is the local option <code>noFormat=true</code>. An attempt will be made to produce a reasonable layout. Multiple paragraphs will simply be taken as is, and the number will be put in front. In HTML the number will be centred. This is useful for an occasional mathematical formula.</p>
<pre><code>::: {.ex noFormat=true}
$$\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}$$
:::</code></pre>

<div id="ex4.14">
<table class="linguistic-example">
<tbody>
<tr class="odd">
<td class="linguistic-example-number" style="vertical-align: middle;">(4.14)</td>
<td class="linguistic-example-content" style="text-align: left;"><div class="ex" data-noFormat="true">
<p><math display="block" xmlns="https://www.w3.org/1998/Math/MathML"><semantics><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mi>i</mi><mo>=</mo><mfrac><mrow><msup><mi>n</mi><mn>2</mn></msup><mo>+</mo><mi>n</mi></mrow><mn>2</mn></mfrac></mrow><annotation encoding="application/x-tex">\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}</annotation></semantics></math></p>
</div></td>
</tr>
</tbody>
</table>
</div>
<h2 data-number="4.5" id="issues-with-pandoc-ling"><span class="header-section-number">4.5</span> Issues with <code>pandoc-ling</code></h2>
<ul>
<li>Manually provided identifiers for examples should not be purely numerical (so do not use e.g. <code>#5789</code>). In some situations this interferes with the setting of the cross-references.</li>
<li>Because the cross-references use the same structure as citations in Pandoc, the processing of citations (by <code>citeproc</code>) should be performed <strong>after</strong> the processing by <code>pandoc-ling</code>. Another Pandoc filter, <a href="https://github.com/lierdakil/pandoc-crossref"><code>pandoc-crossref</code></a>, for numbering figures and other captions, also uses the same system. There seems to be no conflict between <code>pandoc-ling</code> and <code>pandoc-crossref</code>.</li>
<li>Interlinear examples will not wrap at the end of the page. There is no solution yet for examples that are longer than the width of the page.</li>
<li>It is not (yet) possible to have more than one glossing line.</li>
<li>When exporting to <code>docx</code> there is a problem because there are paragraphs inserted after tables, which adds space in lists with multiple interlinear examples (except when they have exactly the same number of columns). This is <a href="https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262">by design</a>. The official solution is to set font-size to 1 for this paragraph inside MS Word.</li>
<li>Multi-column cells are crucial for <code>pandoc-ling</code> to work properly. These were only introduced with the new table format in Pandoc 2.10 (so older Pandoc versions are not supported). Also note that these structures are not yet exported to all formats, e.g. they will not be displayed correctly in <code>docx</code>. However, this is currently an area of active development.</li>
<li><code>langsci-gb4e</code> is only available as part of the <a href="https://ctan.org/pkg/langsci?lang=en"><code>langsci</code> package</a>. You have to make it available to Pandoc, e.g. by adding it into the same directory as the pandoc-ling.lua filter. I have added a recent version of <code>langsci-gb4e</code> here for convenience, but this one might be outdated at some time in the future.</li>
<li><code>beamer</code> output seems to work best with <code>latexPackage: gb4e</code>.</li>
</ul>
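<p>One way to pin down the filter order mentioned in the list above is a Pandoc defaults file, in which <code>citeproc</code> can be listed among the filters. This is only a sketch: it assumes that <code>pandoc-ling.lua</code> can be found by Pandoc (e.g. in the working directory), and the file name <code>defaults.yaml</code> is arbitrary.</p>
<pre><code># defaults.yaml, used as: pandoc --defaults defaults.yaml ...
filters:
  - pandoc-ling.lua   # first resolve the example cross-references
  - citeproc          # then process the remaining citations</code></pre>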
<h2 data-number="4.6" id="a-note-on-latex-conversion"><span class="header-section-number">4.6</span> A note on Latex conversion</h2>
<p>Originally, I decided to write this filter as a two-pronged conversion, making a markdown version myself, but using a mapping to one of the many latex libraries for linguistic examples as a quick fix. I assumed that such a mapping would be the easy part. However, it turned out that the mapping to latex was much more difficult than I anticipated. Basically, it turned out that the ‘common denominator’ that I was aiming for was not necessarily the ‘common denominator’ provided by the latex packages. I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and expex) with growing dismay. This approach resulted in a first version. However, after this version was (more or less) finished, I realised that it would be better to first define the ‘common denominator’ more clearly (as done here), and then implement this purely in Pandoc. From that basis I have then made attempts to map them to the various latex packages.</p>
<h2 data-number="4.7" id="a-note-on-implementation"><span class="header-section-number">4.7</span> A note on implementation</h2>
<p>The basic structure of the examples is transformed into Pandoc tables. Tables are reasonably safe for converting to other formats. Care has been taken to add <code>classes</code> to all elements of the tables (e.g. the preamble has the class <code>linguistic-example-preamble</code>). When exported formats are aware of these classes, they can be used to fine-tune the formatting. I have used a few such fine-tunings in the html output of this filter by adding a few CSS-style statements. The naming of the classes is quite transparent, using the form <code>linguistic-example-STRUCTURE</code>.</p>
<p>The whole table is encapsulated in a <code>div</code> with class <code>ex</code> and an id of the form <code>exNUMBER</code>. This means that an example can be directly referred to in web-links by using the hash-mechanism. For example, adding <code>#ex3</code> to the end of a link will immediately jump to this example in a browser.</p>
<p>The current implementation is completely independent from the <a href="https://pandoc.org/MANUAL.html#numbered-example-lists">Pandoc numbered examples implementation</a> and both can work side by side, like (2):</p>
<ol type="1">
<li><p>These are native Pandoc numbered examples</p></li>
<li><p>They are independent of <code>pandoc-ling</code> but use the same output formatting in many default exports, like latex.</p></li>
</ol>
<p>However, various output formats of Pandoc (e.g. latex) also use numbers in round brackets for these, so in practice it might be confusing to combine both.</p>
</body>
</html>