Fahhem's Blog - Wikipedia
https://blog.fahhem.com/
An intermittent post of thoughts
Wikipedia "templates" Part 2
2013-03-01 by fahhem
<p>I'll start this adventure off with the basics, what one would expect in
a template language for an online encyclopedia. We'll be skipping all
the text-formatting and layout markup since it's well documented and
very straightforward. You can find it on the <a href="http://en.wikipedia.org/wiki/Help:Wiki_markup" title="Wiki markup">Help:Wiki
markup</a> page;
I'll be skipping its first two sections, Layout and Format. Note that
those two sections make up half of that page, despite the page having 11
other sections to cover.<!--more--></p>
<p>First, most syntax other than text formatting and layout is essentially a series
of tokens surrounded symmetrically by some number of {, [, |, + and -
characters, with | or : separating internal tokens when appropriate. Text is
anything not between these characters.</p>
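<p>As a rough illustration of that description, here is a minimal Python sketch that splits wikicode into runs of brackets, pipes, and plain text. The <code>tokenize</code> name and the choice to ignore :, + and - are my own simplifications, not how any real Wikipedia parser works.</p>

```python
import re

# Hypothetical, simplified token pattern: runs of one bracket type, a single
# pipe, or a run of anything else (plain text). Colons, + and - are left
# inside text tokens so that URLs such as http://... are not split apart.
TOKEN = re.compile(r"\{+|\}+|\[+|\]+|\||[^{}\[\]|]+")


def tokenize(wikicode):
    """Split wikicode into delimiter tokens and text tokens, losing nothing."""
    return TOKEN.findall(wikicode)
```

<p>Joining the tokens back together gives the original string, so a later pass can decide what each run of brackets actually means.</p>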
<h2>Links</h2>
<p>The first syntax is that of links. It's the only one that uses
square brackets, [, and it essentially has two forms, single- and
double-bracketed, which combine with parameters and namespaces to handle
dozens of uses. The single-bracketed link is simply for external links,
and its display name is optional. The wikicode below renders as
<a href="http://example.com/">Example</a>:</p>
<div class="highlight"><pre><span></span><code><span class="err">[http://example.com/ Example]</span>
</code></pre></div>
<p>If there were no name after the URL, it would render as an automatically
numbered link such as <a href="http://example.com/">[1]</a>; a bare URL with no brackets renders as the URL itself.</p>
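<p>To make the single-bracket form concrete, here is a small Python sketch that pulls the URL and the optional name out of an external link. The function name, the regular expression, and the list of URL schemes are assumptions made for illustration; MediaWiki accepts more schemes than this.</p>

```python
import re

# Assumed pattern: an opening bracket, a URL that runs up to the first space,
# then an optional display name that runs up to the closing bracket.
EXTERNAL_LINK = re.compile(
    r"\[(?P<url>(?:https?|ftp)://[^\s\]]+)(?:\s+(?P<name>[^\]]+))?\]"
)


def parse_external_link(wikicode):
    """Return (url, name) for a single-bracketed link, or None otherwise.

    name is None when the link has no display name.
    """
    match = EXTERNAL_LINK.fullmatch(wikicode.strip())
    if match is None:
        return None
    return match.group("url"), match.group("name")
```

<p>Double-bracketed input such as [[target]] is rejected here, since it is not an external link.</p>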
<h3>Internal links</h3>
<p>These are delimited by double square brackets and use the | for setting
the name. The target can be any of a whole slew of things, depending on the
namespace used, which precedes a : in the target.</p>
<p>For the most part, these links point to other Wikipedia articles, so if
they're in one of the following formats, that's what they are:</p>
<div class="highlight"><pre><span></span><code><span class="k">[[target]]</span>
<span class="k">[[target#section]]</span>
<span class="k">[[target|name]]</span>
<span class="k">[[target#section|name]]</span>
</code></pre></div>
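<p>The four formats above can be handled by one small routine. This is only a sketch: the <code>InternalLink</code> type and <code>parse_internal_link</code> function are hypothetical names, and it ignores namespaces and nested markup.</p>

```python
from typing import NamedTuple, Optional


class InternalLink(NamedTuple):
    target: str
    section: Optional[str]
    name: Optional[str]


def parse_internal_link(wikicode: str) -> Optional[InternalLink]:
    """Parse [[target]], [[target#section]], [[target|name]] and
    [[target#section|name]]; return None for anything else."""
    text = wikicode.strip()
    if not (text.startswith("[[") and text.endswith("]]")):
        return None
    inner = text[2:-2]
    # Everything after the first | is the display name.
    target, pipe, name = inner.partition("|")
    # Everything after the first # in the target is the section.
    target, pound, section = target.partition("#")
    return InternalLink(
        target.strip(),
        section.strip() if pound else None,
        name.strip() if pipe else None,
    )
```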
<p>However, the following formats are for links with namespaces, which can
be to various other types:</p>
<div class="highlight"><pre><span></span><code><span class="k">[[namespace:target]]</span>
<span class="k">[[namespace:target|name]]</span>
</code></pre></div>
<p>If the namespace is one of the 26 <a href="http://en.wikipedia.org/wiki/Help:Namespace">Wikipedia
namespaces</a> (at the time of writing), then these are
equivalent to the normal article links, except that the page titles
include a :. This is because, in Wikipedia's database, pages in a
namespace are stored alongside ordinary articles, with the namespace kept as part of each page's identity. The prefix
could also mark an 'Interwiki' link, which could point to one of the <a href="http://en.wikipedia.org/wiki/Wikipedia:Wikimedia_sister_projects#Linking_between_projects">Wikimedia
projects</a>
or one of the many (and constantly growing) <a href="http://meta.wikimedia.org/wiki/Interwiki_map">other
wikis</a> known to Wikipedia.</p>
<p>The Category namespace is a little different. A namespaced
link with the namespace 'Category' means the current article is
in that category. With a leading colon, as below, it's
actually a link to that category's page:</p>
<div class="highlight"><pre><span></span><code><span class="err">[[:Category:category_name]]</span>
</code></pre></div>
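<p>Here is a hedged Python sketch of how a parser might tell these cases apart. The two sets below are tiny illustrative subsets that I made up for the example, not the real namespace list or interwiki map, and <code>classify_link</code> is a hypothetical helper that expects only the target part of a link (the text before any |).</p>

```python
# Illustrative subsets only; the real lists are much longer.
KNOWN_NAMESPACES = {"Talk", "User", "Wikipedia", "File", "Template", "Help"}
INTERWIKI_PREFIXES = {"wikt", "commons", "meta"}


def classify_link(target):
    """Return (kind, title) for the target of a [[...]] link."""
    leading_colon = target.startswith(":")
    body = target[1:] if leading_colon else target
    prefix, colon, rest = body.partition(":")
    if not colon:
        return ("article", body)
    prefix = prefix.strip()
    if prefix.lower() == "category":
        # A leading colon turns category membership into a plain link.
        kind = "category_link" if leading_colon else "category_membership"
        return (kind, rest)
    if prefix in KNOWN_NAMESPACES:
        # Namespaced pages keep the prefix as part of their title.
        return ("namespace", body)
    if prefix.lower() in INTERWIKI_PREFIXES:
        return ("interwiki", rest)
    # Unknown prefix: an ordinary article whose title happens to contain ':'.
    return ("article", body)
```

<p>The last branch matters because plenty of ordinary article titles contain a colon, so a parser cannot treat every colon as a namespace separator.</p>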
<h2>Images</h2>
<p>Syntactically, these look just like links, but with a File: namespace.
Many options can be passed using the | character, as if they were
templates, but images still use the [[]] syntax. All the options are
laid out in the <a href="http://en.wikipedia.org/wiki/Wikipedia:Picture_tutorial">picture
tutorial</a> on
Wikipedia, so I'm not going to go over them here; I'll just note that this is the
general syntax for an image:</p>
<div class="highlight"><pre><span></span><code><span class="err">[[File:image_filename.type|parameter1|parameter2|caption]]</span>
</code></pre></div>
<p>Some templates exist for adding layers around images, but
they're normally just a ton of boilerplate around a [[File:...]] link.</p>
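<p>A naive Python sketch of the general image syntax follows. It assumes, as in the example above, that the last field is the caption, which is not always true of real articles, and it splits on every |, so a caption containing its own [[link|name]] would break it. The function name and the returned dictionary keys are hypothetical.</p>

```python
def parse_image_link(wikicode):
    """Split [[File:name|option|...|caption]] into its parts, or return None."""
    text = wikicode.strip()
    if not (text.startswith("[[") and text.endswith("]]")):
        return None
    # Naive split: does not cope with nested links or templates in the caption.
    parts = text[2:-2].split("|")
    namespace, colon, filename = parts[0].partition(":")
    if not colon or namespace.strip().lower() != "file":
        return None
    options = [part.strip() for part in parts[1:]]
    # Assumption: the final field, if any, is the caption.
    caption = options.pop() if options else None
    return {"filename": filename.strip(), "options": options, "caption": caption}
```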
<p>That's all for this round. I'll be explaining templates/transclusion in
the next part, and after that, how it all comes together to
give anyone trying to parse it all a headache.</p>
Wikipedia "templates" Part 1
2013-02-25 by fahhem
<p>Wikipedia templates seem to have grown organically, and that's the
nicest way to describe it. They look simple at first glance, just a
bunch of squiggly brackets to denote everything, how bad can it be?</p>
<p>In this multi-part series, I'll be writing about the complicated bits of
the Wikipedia template syntax.<!--more--> I'll start off by explaining what
they look like, with all their variations included, then combine them to show
how day-to-day articles are really torturous bundles of thousands of
squiggly and square brackets, lumped with pipes, colons, and pound signs to
make parsing it all feel like a drive down a pothole-laden street with
bicyclists and pedestrians making sudden appearances and hour-long
street lights followed by miles of empty road with a stop sign every
hundred feet for good measure.</p>
<p>At the end, I'll try to provide an AST for parsing Wikipedia articles
into a sane structure to render from. Looking at many of the Wikipedia
parsing engines, I've noticed they're mainly built to parse/understand
only some of the syntax, and that most are rather haphazardly built to
accept the ever-evolving Wikipedia template syntax. To fight this
code-organization problem, one practice is to parse the source syntax
into an AST (abstract syntax tree) as an intermediary, in-memory format,
and then to convert that AST into the format you expect, whether it's
HTML for a website, a PDF for a printable version, or LaTeX because
you're a grad student with an insane professor. There is a
performance loss due to the multiple passes required, but by the end of this
series you'll see that's not really a concern with Wikipedia template parsing.
The gain in programmer productivity is rather large: you can
sustainably add support for new syntax to the parser and to the renderers
without needing to do both in tandem (so multiple
programmers can work on it if you wish) and without adding O(N<sup>2</sup>) complexity for each
new syntax.</p>
Wikipedia "templates"
2013-02-23 by fahhem
<p>I've been dealing with Wikipedia 'templates' recently, so I'll be
exploring their syntax, parsing, and rendering in the next few posts.
Here's a taste:</p>
<p>Wikipedia templates seem to have grown organically, and that's the
nicest way to describe it. They look simple at first glance, just a
bunch of squiggly brackets to denote everything, how bad can it be?</p>
<p>Let me lead you down the rabbit hole...</p>