Swift MarkdownKit is a framework for parsing text in Markdown
format. The supported syntax is based on the CommonMark Markdown specification.
Swift MarkdownKit also provides an extended version of the parser that is able to handle Markdown tables.
Swift MarkdownKit defines an abstract syntax for Markdown, provides a parser for parsing strings into
abstract syntax trees, and comes with generators for creating HTML and
attributed strings.
Using the framework
Parsing Markdown
Class MarkdownParser
provides a simple API for parsing Markdown in a string. The parser returns an abstract syntax tree
representing the Markdown structure in the string:
let markdown = MarkdownParser.standard.parse("""
                 # Header
                 ## Sub-header
                 And this is a **paragraph**.
                 """)
print(markdown)
Executing this code prints the following data structure of type Block:
document(heading(1, text("Header")),
         heading(2, text("Sub-header")),
         paragraph(text("And this is a "),
                   strong(text("paragraph")),
                   text(".")))
Block
is a recursively defined enumeration of cases with associated values (also called an algebraic datatype).
Case document refers to the root of a document. It contains a sequence of blocks. In the example above, two
different types of blocks appear within the document: heading and paragraph. A heading case consists
of a heading level (as its first argument) and heading text (as the second argument). A paragraph case simply
consists of text.
Text is represented using the struct
Text
which is effectively a sequence of TextFragment values.
TextFragment
is yet another recursively defined enumeration with associated values. The example above shows two different
TextFragment cases in action: text and strong. Case text represents plain strings. Case strong
contains a Text object, i.e. it encapsulates a sequence of TextFragment values which are
“marked up strongly”.
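To make the shape of these recursive enumerations concrete, here is a minimal, self-contained sketch. The types `MiniBlock` and `MiniFragment` and the function `plainText` are hypothetical stand-ins invented for this illustration; the real `Block`, `Text`, and `TextFragment` types of MarkdownKit have many more cases and are not reproduced here.

```swift
// Simplified stand-ins for the enumerations described above.
// These are NOT MarkdownKit's types; they only mirror the cases used in the example.
indirect enum MiniFragment {
  case text(String)
  case strong([MiniFragment])
}

indirect enum MiniBlock {
  case document([MiniBlock])
  case heading(Int, [MiniFragment])
  case paragraph([MiniFragment])
}

// Recursively strips markup from a sequence of fragments.
func plainText(_ fragments: [MiniFragment]) -> String {
  return fragments.map { fragment -> String in
    switch fragment {
      case .text(let str):
        return str
      case .strong(let inner):
        return plainText(inner)
    }
  }.joined()
}

// The tree from the example above, rebuilt with the stand-in types.
let miniDoc = MiniBlock.document([
  .heading(1, [.text("Header")]),
  .heading(2, [.text("Sub-header")]),
  .paragraph([.text("And this is a "), .strong([.text("paragraph")]), .text(".")])
])

if case .document(let blocks) = miniDoc, case .paragraph(let fragments) = blocks[2] {
  print(plainText(fragments))
}
```

Because every case carries its payload as associated values, a `switch` over the cases is all that is needed to walk the tree.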
Parsing “extended” Markdown
Class ExtendedMarkdownParser has the same interface as MarkdownParser but supports tables and
definition lists in addition to the block types defined by the CommonMark specification.
Tables are based on the
GitHub Flavored Markdown specification with one extension: within a table
block, it is possible to escape newline characters to enable cell text to be written on multiple lines. Here is an example:
| Column 1 | Column 2 |
| ------------ | -------------- |
| This text \
is very long | More cell text |
| Last line | Last cell |
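The effect of escaped newlines can be pictured as joining each line that ends in a backslash with the line that follows it before the row is split into cells. The sketch below illustrates only that idea. The function `joinEscapedNewlines` is a hypothetical helper and not part of MarkdownKit, and how the actual parser treats whitespace around the line break is an assumption here.

```swift
// Joins physical lines that end in a backslash with the following line.
// Hypothetical helper for illustration only; not MarkdownKit's implementation.
func joinEscapedNewlines(_ lines: [String]) -> [String] {
  var result: [String] = []
  var pending = ""
  for line in lines {
    if line.hasSuffix("\\") {
      // Drop the backslash and keep collecting the logical row.
      pending += String(line.dropLast())
    } else {
      result.append(pending + line)
      pending = ""
    }
  }
  // A backslash at the end of the last line leaves an unfinished row.
  if !pending.isEmpty {
    result.append(pending)
  }
  return result
}

let logicalRows = joinEscapedNewlines([
  "| This text \\",
  "is very long | More cell text |",
  "| Last line | Last cell |"
])
for row in logicalRows {
  print(row)
}
```

With this reading, the two physical lines of the first body row form a single logical row with two cells.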
Definition lists are implemented in an
ad hoc fashion. A definition list consists of terms and their corresponding definitions. Here is an example
with two terms:
Apple
: Pomaceous fruit of plants of the genus Malus in the family Rosaceae.
Orange
: The fruit of an evergreen tree of the genus Citrus.
: A large round juicy citrus fruit with a tough bright reddish-yellow rind.
Configuring the Markdown parser
The Markdown dialect supported by MarkdownParser is defined by two parameters: a sequence of
block parsers (each represented as a subclass of
BlockParser),
and a sequence of inline transformers (each represented as a subclass of
InlineTransformer).
The initializer of class MarkdownParser accepts both components optionally. The default configuration
(neither block parsers nor inline transformers are provided for the initializer) is able to handle Markdown based on the
CommonMark specification.
Since MarkdownParser objects are stateless (beyond the configuration of block parsers and inline
transformers), there is a predefined default MarkdownParser object accessible via the static property
MarkdownParser.standard. This default parsing object is used in the example above.
New markdown parsers with different configurations can also be created by subclassing
MarkdownParser
and by overriding the class properties defaultBlockParsers and defaultInlineTransformers. Here is
an example of how class
ExtendedMarkdownParser
is derived from MarkdownParser simply by overriding
defaultBlockParsers and by specializing standard in a covariant fashion.
open class ExtendedMarkdownParser: MarkdownParser {
  override open class var defaultBlockParsers: [BlockParser.Type] {
    return self.blockParsers
  }
  private static let blockParsers: [BlockParser.Type] =
    MarkdownParser.defaultBlockParsers + [TableParser.self]
  override open class var standard: ExtendedMarkdownParser {
    return self.singleton
  }
  private static let singleton: ExtendedMarkdownParser = ExtendedMarkdownParser()
}
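The interplay between optional initializer arguments and overridable class-level defaults can be sketched independently of MarkdownKit. In the sketch below, `MiniParser`, `MiniExtendedParser`, and the string-based "rules" are hypothetical stand-ins chosen for illustration. The way the real MarkdownParser resolves its configuration internally is not shown in this document, so the lookup via `type(of: self)` is an assumption about one plausible design.

```swift
// Hypothetical stand-in types; these are not MarkdownKit's classes.
class MiniParser {
  // Class-level default configuration; subclasses can override it.
  class var defaultRules: [String] {
    return ["heading", "paragraph"]
  }

  // Explicit configuration passed to the initializer, if any.
  private let customRules: [String]?

  init(rules: [String]? = nil) {
    self.customRules = rules
  }

  // Effective configuration: explicit rules win, otherwise the defaults
  // of the dynamic class of this object are used.
  var rules: [String] {
    return self.customRules ?? type(of: self).defaultRules
  }
}

// A subclass changes the dialect simply by extending the defaults.
class MiniExtendedParser: MiniParser {
  override class var defaultRules: [String] {
    return super.defaultRules + ["table"]
  }
}

print(MiniParser().rules)
print(MiniExtendedParser().rules)
print(MiniExtendedParser(rules: ["paragraph"]).rules)
```

Since such objects hold no state beyond their configuration, a single shared instance per class is sufficient, which is the role the `standard` property plays above.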
Extending the Markdown parser
With version 1.1 of the MarkdownKit framework, it is now also possible to extend the abstract
syntax supported by MarkdownKit. Both Block and TextFragment enumerations now include
a custom case which refers to objects representing the extended syntax. These objects have to
implement protocol CustomBlock for blocks and CustomTextFragment for text fragments.
Here is a simple example of how one can add support for “underline” (e.g. this is ~underlined~ text)
and “strikethrough” (e.g. this is using ~~strike-through~~) by subclassing existing inline transformers.
First, a new custom text fragment type has to be implemented for representing underlined and
strike-through text. This is done with an enumeration which implements the CustomTextFragment protocol:
enum LineEmphasis: CustomTextFragment {
  case underline(Text)
  case strikethrough(Text)

  func equals(to other: CustomTextFragment) -> Bool {
    guard let that = other as? LineEmphasis else {
      return false
    }
    switch (self, that) {
      case (.underline(let lhs), .underline(let rhs)):
        return lhs == rhs
      case (.strikethrough(let lhs), .strikethrough(let rhs)):
        return lhs == rhs
      default:
        return false
    }
  }

  func transform(via transformer: InlineTransformer) -> TextFragment {
    switch self {
      case .underline(let text):
        return .custom(LineEmphasis.underline(transformer.transform(text)))
      case .strikethrough(let text):
        return .custom(LineEmphasis.strikethrough(transformer.transform(text)))
    }
  }

  func generateHtml(via htmlGen: HtmlGenerator) -> String {
    switch self {
      case .underline(let text):
        return "<u>" + htmlGen.generate(text: text) + "</u>"
      case .strikethrough(let text):
        return "<s>" + htmlGen.generate(text: text) + "</s>"
    }
  }

  func generateHtml(via htmlGen: HtmlGenerator,
                    and attrGen: AttributedStringGenerator?) -> String {
    return self.generateHtml(via: htmlGen)
  }

  var rawDescription: String {
    switch self {
      case .underline(let text):
        return text.rawDescription
      case .strikethrough(let text):
        return text.rawDescription
    }
  }

  var description: String {
    switch self {
      case .underline(let text):
        return "~\(text.description)~"
      case .strikethrough(let text):
        return "~~\(text.description)~~"
    }
  }

  var debugDescription: String {
    switch self {
      case .underline(let text):
        return "underline(\(text.debugDescription))"
      case .strikethrough(let text):
        return "strikethrough(\(text.debugDescription))"
    }
  }
}
Next, two inline transformers need to be extended to recognize the new emphasis delimiter ~:
final class EmphasisTestTransformer: EmphasisTransformer {
  override public class var supportedEmphasis: [Emphasis] {
    return super.supportedEmphasis + [
      Emphasis(ch: "~", special: false, factory: { double, text in
        return .custom(double ? LineEmphasis.strikethrough(text)
                              : LineEmphasis.underline(text))
      })]
  }
}

final class DelimiterTestTransformer: DelimiterTransformer {
  override public class var emphasisChars: [Character] {
    return super.emphasisChars + ["~"]
  }
}
Finally, a new extended markdown parser can be created:
final class EmphasisTestMarkdownParser: MarkdownParser {
  override public class var defaultInlineTransformers: [InlineTransformer.Type] {
    return [DelimiterTestTransformer.self,
            CodeLinkHtmlTransformer.self,
            LinkTransformer.self,
            EmphasisTestTransformer.self,
            EscapeTransformer.self]
  }
  override public class var standard: EmphasisTestMarkdownParser {
    return self.singleton
  }
  private static let singleton: EmphasisTestMarkdownParser = EmphasisTestMarkdownParser()
}
Processing Markdown
Representing Markdown text as an abstract syntax tree makes it easy to process such data, in particular
to transform it and to extract information. Below is a short Swift snippet which illustrates how to
process an abstract syntax tree to extract all top-level headers
(i.e. this code prints the top-level outline of a text in Markdown format).
let markdown = MarkdownParser.standard.parse("""
                 # First *Header*
                 ## Sub-header
                 And this is a **paragraph**.
                 # Second **Header**
                 And this is another paragraph.
                 """)

func topLevelHeaders(doc: Block) -> [String] {
  guard case .document(let topLevelBlocks) = doc else {
    preconditionFailure("markdown block does not represent a document")
  }
  var outline: [String] = []
  for block in topLevelBlocks {
    if case .heading(1, let text) = block {
      outline.append(text.rawDescription)
    }
  }
  return outline
}

let headers = topLevelHeaders(doc: markdown)
print(headers)
This will print an array with the following two entries:
["First Header", "Second Header"]
Converting Markdown into other formats
Swift MarkdownKit currently provides two different generators, i.e. Markdown processors which
output, for a given Markdown document, a corresponding representation in a different format.
HtmlGenerator
defines a simple mapping from Markdown into HTML. Here is an example for the usage of the generator:
let html = HtmlGenerator.standard.generate(doc: markdown)
There are currently no means to customize HtmlGenerator beyond subclassing. A subclass can, for example,
define a customized HTML generator which formats blockquote Markdown blocks using HTML tables.
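The subclassing pattern for such a customization can be sketched with miniature stand-in types. `SketchBlock`, `SketchHtmlGenerator`, and `TableQuoteHtmlGenerator` below are hypothetical and defined only for this illustration. The method that a real HtmlGenerator subclass has to override, and its exact signature, are not shown in this document and should be looked up in the framework's source.

```swift
// Hypothetical miniature types for illustration; MarkdownKit's `Block` and
// `HtmlGenerator` are richer and their method signatures may differ.
indirect enum SketchBlock {
  case paragraph(String)
  case blockquote([SketchBlock])
}

class SketchHtmlGenerator {
  // Generates HTML for a sequence of blocks by concatenating the HTML of each block.
  func generate(blocks: [SketchBlock]) -> String {
    return blocks.map { self.generate(block: $0) }.joined()
  }

  // Default mapping of a single block to HTML.
  func generate(block: SketchBlock) -> String {
    switch block {
      case .paragraph(let text):
        return "<p>" + text + "</p>\n"
      case .blockquote(let blocks):
        return "<blockquote>\n" + self.generate(blocks: blocks) + "</blockquote>\n"
    }
  }
}

// Overrides only the blockquote case and defers everything else to `super`.
final class TableQuoteHtmlGenerator: SketchHtmlGenerator {
  override func generate(block: SketchBlock) -> String {
    switch block {
      case .blockquote(let blocks):
        return "<table><tbody><tr><td>\n" +
               self.generate(blocks: blocks) +
               "</td></tr></tbody></table>\n"
      default:
        return super.generate(block: block)
    }
  }
}

let quote = SketchBlock.blockquote([.paragraph("Quoted text")])
print(TableQuoteHtmlGenerator().generate(block: quote))
```

Nested blocks are rendered through `self`, so a blockquote inside a blockquote is also formatted as a table.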
Swift MarkdownKit also comes with a generator for attributed strings.
AttributedStringGenerator
uses a customized HTML generator internally to define the translation from Markdown into
NSAttributedString. The initializer of AttributedStringGenerator provides a number of
parameters for customizing the style of the generated attributed string.
let generator = AttributedStringGenerator(fontSize: 12,
                                          fontFamily: "Helvetica, sans-serif",
                                          fontColor: "#33C",
                                          h1Color: "#000")
let attributedStr = generator.generate(doc: markdown)
Using the command-line tool
The Swift MarkdownKit Xcode project also implements a
very simple command-line tool
for either translating a single Markdown text file into HTML or for translating all Markdown files within a given
directory into HTML.
The tool is provided to serve as a basis for customization to specific use cases. The simplest way to build the
binary is to use the Swift Package Manager (SPM):
> git clone https://github.com/objecthub/swift-markdownkit.git
Cloning into 'swift-markdownkit'...
remote: Enumerating objects: 70, done.
remote: Counting objects: 100% (70/70), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 70 (delta 13), reused 65 (delta 11), pack-reused 0
Unpacking objects: 100% (70/70), done.
> cd swift-markdownkit
> swift build -c release
[1/3] Compiling Swift Module 'MarkdownKit' (25 sources)
[2/3] Compiling Swift Module 'MarkdownKitProcess' (1 sources)
[3/3] Linking ./.build/x86_64-apple-macosx/release/MarkdownKitProcess
> ./.build/x86_64-apple-macosx/release/MarkdownKitProcess
usage: mdkitprocess <source> [<target>]
where: <source> is either a Markdown file or a directory containing Markdown files
       <target> is either an HTML file or a directory in which HTML files are written
Known issues
There are a number of limitations and known issues:
The Markdown parser currently does not fully support link reference definitions in a CommonMark-compliant
fashion. It is possible to define link reference definitions and use them, but for some corner cases, the current
implementation behaves differently from the spec.
Requirements
The following technologies are needed to build the components of the Swift MarkdownKit framework.
The command-line tool can be compiled with the Swift Package Manager, so Xcode is not strictly needed
for that. Similarly, just for compiling the framework and trying the command-line tool in Xcode, the
Swift Package Manager is not needed.
Copyright
Author: Matthias Zenger (matthias@objecthub.net)
Copyright © 2019-2023 Google LLC.
Please note: This is not an official Google product.