Patterns is a Swift library for Parsing Expression Grammars (PEGs). It can be used to create expressions similar to regular expressions (regexes) and grammars (for parsers). For general information about PEGs, see the original paper or Wikipedia.
    let text = "This is a point: (43,7), so is (0, 5). But my final point is (3,-1)."
    let number = ("+" / "-" / "") • digit+
    let point = "(" • Capture(name: "x", number)
        • "," • " "¿ • Capture(name: "y", number) • ")"
    struct Point: Codable {
        let x, y: Int
    }
    let points = try Parser(search: point).decode([Point].self, from: text)
    // points == [Point(x: 43, y: 7), Point(x: 0, y: 5), Point(x: 3, y: -1)]
Patterns are defined directly in code, instead of in a text string.
Note: Long patterns can give the Swift type checker a lot to think about, especially long series of alternatives such as a / b / c / etc. To improve build times, try to split a long pattern into multiple shorter ones.
Standard PEG
"text"
Text within double quotes matches that exact text; there is no need to escape special characters with \. If you want to turn a string variable s into a pattern, use Literal(s).
OneOf(...)
This is like character classes ([...]) from regular expressions, and matches 1 character. OneOf("aeiouAEIOU") matches any single character in that string, and OneOf("a"..."e") matches any of “abcde”. They can also be combined, like OneOf("aeiou", punctuation, "x"..."z"). To match any character except …, use OneOf(not: ...).
You can also implement one yourself:
    OneOf(description: "ten") { character in
        character.wholeNumberValue == 10
    }
It takes a closure @escaping (Character) -> Bool and matches any character for which the closure returns true. The description parameter is only used when creating a textual representation of the pattern.
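As a small usage sketch of the string form described above, a OneOf pattern can be wrapped in a Capture and handed to a Parser. This assumes the Patterns package is available to import; the sample text and the capture name "vowel" are made up for illustration, and the explicit `<String>` is only there to make the input type obvious.

```swift
import Patterns

// Sample input, invented for this sketch.
let text = "Swift PEG"

// Search for single vowels. "vowel" is an arbitrary capture name.
let vowelParser = try Parser(search: Capture(name: "vowel", OneOf<String>("aeiouAEIOU")))

// Turn each captured range into a String.
let vowels = vowelParser.matches(in: text).map { match in
    String(text[match[one: "vowel"]!])
}
// Every element of `vowels` is one character long, because OneOf matches exactly 1 character.
```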
a • b • c
The • operator (Option-8 on U.S. keyboards, Option-Q on Norwegian ones) first matches a, then b and then c. It is used to create a pattern from a sequence of other patterns.
a*
matches 0 or more, as many as it can (it is greedy and never gives anything back, like the possessive regex quantifier a*+). So a pattern like a* • a will never match anything, because the a* pattern will always match all it can, leaving nothing for the last a.
a+
matches 1 or more, also as many as it can (like the possessive regex quantifier a++).
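The following sketch illustrates the greedy behaviour of the two repetition operators with the predefined digit pattern. It assumes the Patterns package is available to import; the sample text and the capture name "n" are invented for the example.

```swift
import Patterns

// Sample input, invented for this sketch.
let text = "room 4711"

// `digit+` takes every digit it can reach in one go.
let number = try Parser(search: Capture(name: "n", digit+))
let numbers = number.matches(in: text).map { String(text[$0[one: "n"]!]) }

// `digit*` consumes all the digits and never gives any back,
// so the trailing `digit` has nothing left to match.
let stuck = try Parser(search: digit* • digit)
let stuckMatches = Array(stuck.matches(in: text))
```

Here `numbers` holds the whole run of digits as a single match, while `stuckMatches` stays empty.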
a¿
makes a optional, but it always matches if it can (to type ¿ on most keyboards, press Option-Shift together with the key that has ? on it).
a / b
This first tries the pattern on the left. If that fails, it tries the pattern on the right. This is ordered choice: once a has matched, it will never go back and try b if a later part of the expression fails. This is the main difference between PEGs and most other grammars and regexes.
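To see why the order of alternatives matters, here is a sketch that runs the same two alternatives in both orders. It assumes the Patterns package is available to import; the sample text and the capture name "m" are made up for illustration.

```swift
import Patterns

// Sample input, invented for this sketch.
let text = "12a"

// Tries a single digit first. Once it has matched "1", the two-digit
// alternative is never tried, even though `letter` then fails on "2".
// The search moves on and finds a match starting at "2" instead.
let shortFirst = try Parser(search: Capture(name: "m", (digit / (digit • digit)) • letter))
let shortFirstHits = shortFirst.matches(in: text).map { String(text[$0[one: "m"]!]) }

// Tries two digits first and falls back to one digit only if that fails.
let longFirst = try Parser(search: Capture(name: "m", ((digit • digit) / digit) • letter))
let longFirstHits = longFirst.matches(in: text).map { String(text[$0[one: "m"]!]) }
```

A common rule of thumb that follows from this is to put the longer or more specific alternative first.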
&&a • b
The “and predicate” first verifies that a matches, then moves the position in the input back to where a began and continues with b. In other words it verifies that both a and b match from the same position. So to match one ASCII letter you can use &&ascii • letter.
!a • b
The “not predicate” verifies that a does not match, then just like above it moves the position in the input back to where a began and continues with b. You can read it like “b and not a”.
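Both predicates can be tried out with the predefined character patterns. The sketch below assumes the Patterns package is available to import; the sample text and the capture name "c" are invented for the example.

```swift
import Patterns

// Sample input, invented for this sketch. "ñ" is a letter but not ASCII.
let text = "Señor"

// And predicate: the character must be ASCII and also a letter.
let asciiLetter = try Parser(search: Capture(name: "c", &&ascii • letter))
let asciiLetters = asciiLetter.matches(in: text).map { String(text[$0[one: "c"]!]) }

// Not predicate: "letter and not uppercase".
let notUppercase = try Parser(search: Capture(name: "c", !uppercase • letter))
let lowerLetters = notUppercase.matches(in: text).map { String(text[$0[one: "c"]!]) }
```

Neither predicate consumes any input, so each capture covers only the single character matched by letter.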
Grammars
The main advantage of PEGs over regular expressions is that they support recursive expressions. These expressions can contain themselves, or other expressions that in turn contain them. Here is how you can parse simple arithmetic expressions:
This will parse expressions like “1+2-3^(4*3)/2”. The top expression is called first. • !any means it must match the entire string, because only at the end of the string are there no characters. If you want to match multiple arithmetic expressions in a string, comment out the first expression. Grammars use dynamic properties, so there is no auto-completion for the expression names.
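To make the recursion concrete without relying on the library's grammar syntax, the following plain-Swift sketch recognizes the same kind of arithmetic expressions with one function per rule. The rule names (sum, product, power, value) and the ArithmeticRecognizer type are assumptions made for this illustration; they are not part of Patterns.

```swift
/// Hand-written recursive recognizer, for illustration only; not part of Patterns.
/// Each rule returns true and advances `position` on success.
/// On failure it returns false and leaves `position` where the rule started.
final class ArithmeticRecognizer {
    private let input: [Character]
    private var position = 0

    init(_ text: String) { input = Array(text) }

    /// all <- sum !any   (the whole input must be consumed)
    func matchesAll() -> Bool {
        position = 0
        return sum() && position == input.count
    }

    /// Consumes one character if it is contained in `set`.
    private func oneOf(_ set: String) -> Bool {
        guard position < input.count, set.contains(input[position]) else { return false }
        position += 1
        return true
    }

    /// operand (operator operand)*, shared by sum and product.
    private func chain(_ operators: String, _ operand: () -> Bool) -> Bool {
        guard operand() else { return false }
        while true {
            let start = position
            guard oneOf(operators), operand() else {
                position = start // undo a dangling operator
                return true
            }
        }
    }

    /// sum <- product (("+" / "-") product)*
    private func sum() -> Bool { chain("+-") { product() } }

    /// product <- power (("*" / "/") power)*
    private func product() -> Bool { chain("*/") { power() } }

    /// power <- value ("^" power)?   (calls itself, so "^" nests to the right)
    private func power() -> Bool {
        guard value() else { return false }
        let start = position
        if !(oneOf("^") && power()) { position = start }
        return true
    }

    /// value <- digit+ / "(" sum ")"   (recursion back into sum)
    private func value() -> Bool {
        let start = position
        var digitCount = 0
        while oneOf("0123456789") { digitCount += 1 }
        if digitCount > 0 { return true }
        if oneOf("(") && sum() && oneOf(")") { return true }
        position = start
        return false
    }
}

let complete = ArithmeticRecognizer("1+2-3^(4*3)/2").matchesAll()
let dangling = ArithmeticRecognizer("1+").matchesAll()
let unbalanced = ArithmeticRecognizer("(1+2").matchesAll()
```

The check `position == input.count` plays the role of • !any: a prefix such as "1" in "1+" is not enough, the whole string has to be consumed.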
Additions
There are predefined OneOf patterns for all the boolean is... properties of Swift’s Character: letter, lowercase, uppercase, punctuation, whitespace, newline, hexDigit, digit, ascii, symbol, mathSymbol, currencySymbol.
They all have the same name as the last part of the property, except for wholeNumber, which is renamed to digit because wholeNumber sounds more like an entire number than a single digit.
There is also alphanumeric, which is a letter or a digit.
any
Matches any character. !any matches only the end of the text.
Line()
Matches a single line, not including the newline characters. So Line() • Line() will never match anything, but Line() • "\n" • Line() matches 2 lines.
Line.Start() matches at the beginning of the text, and after any newline characters. Line.End() matches at the end of the text, and right before any newline characters. They both have a length of 0, which means the next pattern will start at the same position in the text.
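As a sketch of how these zero-length patterns combine with others, the following picks out the first and the last word of every line. It assumes the Patterns package is available to import; the sample text and the capture name "w" are invented for the example.

```swift
import Patterns

// Two lines of sample input, invented for this sketch.
let text = "alpha beta\ngamma delta"

// A run of letters that begins exactly at the start of a line.
let firstWord = try Parser(search: Line.Start() • Capture(name: "w", letter+))
let firstWords = firstWord.matches(in: text).map { String(text[$0[one: "w"]!]) }

// A run of letters that ends exactly at the end of a line.
let lastWord = try Parser(search: Capture(name: "w", letter+) • Line.End())
let lastWords = lastWord.matches(in: text).map { String(text[$0[one: "w"]!]) }
```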
Word.Boundary()
Matches the position right before or right after a word. Like Line.Start() and Line.End() it also has a length of 0.
a.repeat(...)
a.repeat(2) matches 2 of that pattern in a row. a.repeat(...2) matches 0, 1 or 2, a.repeat(2...) matches 2 or more and a.repeat(3...6) between 3 and 6.
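A short sketch of an exact count and a bounded range, using the predefined hexDigit and digit patterns. It assumes the Patterns package is available to import; the sample texts and capture names are made up for illustration.

```swift
import Patterns

// Exactly six hex digits after "#"; the shorter "#abc" does not qualify.
let colorText = "#ff8800 and #abc"
let color = try Parser(search: Capture(name: "color", "#" • hexDigit.repeat(6)))
let colors = color.matches(in: colorText).map { String(colorText[$0[one: "color"]!]) }

// Two or three digits, as many as possible each time.
// A lone "7" is too short, and "12345" is split into "123" and "45".
let digitText = "7 42 12345"
let group = try Parser(search: Capture(name: "group", digit.repeat(2...3)))
let groups = group.matches(in: digitText).map { String(digitText[$0[one: "group"]!]) }
```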
Skip() • a • b
Finds the first match of a • b from the current position.
Parsing
To actually use a pattern, pass it to a Parser:
    let parser = try Parser(search: a)
    for match in parser.matches(in: text) {
        // ...
    }
Parser(search: a) searches for the first match for a. It is the same as Parser(Skip() • a).
The .matches(in: String) method returns a lazy sequence of Match instances.
Often we are only interested in parts of a pattern. You can use the Capture pattern to assign a name to those parts:
    let text = "This is a point: (43,7), so is (0, 5). But my final point is (3,-1)."
    let number = ("+" / "-" / "") • digit+
    let point = "(" • Capture(name: "x", number)
        • "," • " "¿ • Capture(name: "y", number) • ")"
    struct Point: Codable {
        let x, y: Int
    }
    let parser = try Parser(search: point)
    let points = try parser.decode([Point].self, from: text)
Or you can use subscripting:
    let pointsAsSubstrings = parser.matches(in: text).map { match in
        (text[match[one: "x"]!], text[match[one: "y"]!])
    }
You can also use match[multiple: name] to get an array if captures with that name may be matched multiple times. match[one: name] only returns the first capture of that name.
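The difference between the two subscripts shows up when one capture name is used several times within a single match. The sketch below assumes the Patterns package is available to import and that match[multiple:] returns an array of ranges into the input; the sample text and the capture name "n" are invented for the example.

```swift
import Patterns

// A comma-separated list, invented for this sketch.
let text = "1,22,333"

// The same capture name is used for every number in the list.
let list = Capture(name: "n", digit+) • ("," • Capture(name: "n", digit+))*
let parser = try Parser(search: list)

// All captures named "n", one array per match.
let allNumbers = parser.matches(in: text).map { match in
    match[multiple: "n"].map { String(text[$0]) }
}

// Only the first capture named "n" of each match.
let firstNumbers = parser.matches(in: text).map { match in
    String(text[match[one: "n"]!])
}
```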
Inputs
By default, patterns have String as their input type. But you can use any BidirectionalCollection with Hashable elements for input. Just explicitly specify the input type of the first pattern, and the rest should get it automatically:
    let text = "This is a point: (43,7), so is (0, 5). But my final point is (3,-1).".utf8
    let digit = OneOf<String.UTF8View>(UInt8(ascii: "0")...UInt8(ascii: "9"))
    let number = ("+" / "-" / "") • digit+
    let point = "(" • Capture(name: "x", number)
        • "," • " "¿ • Capture(name: "y", number) • ")"
    struct Point: Codable {
        let x, y: Int
    }
    let parser = try Parser(search: point)
    let pointsAsSubstrings = parser.matches(in: text).map { match in
        (text[match[one: "x"]!], text[match[one: "y"]!])
    }
Parser.decode can (currently) only take String as input, but .matches handles all types.
Setup
Swift Package Manager
Add Patterns as a dependency in your Package.swift file, or choose “Add Package Dependency” from within Xcode.
Implementation
Patterns is implemented using a virtual parsing machine, similar to how LPEG is implemented, and the backtrackingvm function described here.
Contributing
Contributions are most welcome 🙌.
License
MIT