(lispkit markdown)

Library (lispkit markdown) provides an API for programmatically constructing Markdown documents, for parsing strings in Markdown format, as well as for mapping Markdown documents into corresponding HTML. The Markdown syntax supported by this library is based on the CommonMark Markdown specification.

Data Model

Markdown documents are represented by an abstract syntax implemented as three algebraic datatypes, block, list-item, and inline, which are defined via define-datatype from library (lispkit datatype).

Blocks

At the top level, a Markdown document consists of a list of blocks. The following recursively defined datatype lists all supported block types as variants of type block.

(define-datatype block markdown-block?
  (document blocks)
      where (markdown-blocks? blocks)
  (blockquote blocks)
      where (markdown-blocks? blocks)
  (list-items start tight items)
      where (and (opt fixnum? start) (markdown-list? items))
  (paragraph text)
      where (markdown-text? text)
  (heading level text)
      where (and (fixnum? level) (markdown-text? text))
  (indented-code lines)
      where (every? string? lines)
  (fenced-code lang lines)
      where (and (opt string? lang) (every? string? lines))
  (html-block lines)
      where (every? string? lines)
  (reference-def label dest title)
      where (and (string? label) (string? dest) (every? string? title))
  (table header alignments rows)
      where (and (every? markdown-text? header)
                 (every? symbol? alignments)
                 (every? (lambda (x) (every? markdown-text? x)) rows))
  (definition-list defs)
      where (every? (lambda (x)
                      (and (markdown-text? (car x))
                           (markdown-list? (cdr x)))) defs)
  (thematic-break))

(document blocks) represents a full Markdown document consisting of a list of blocks. (blockquote blocks) represents a blockquote, which itself contains a list of sub-blocks. (list-items start tight items) defines either a bullet list or an ordered list. start is #f for bullet lists and specifies the number of the first item for ordered lists. tight is a boolean which is #f for a loose list (i.e. a list with vertical spacing between its items). items is a list of list items of type list-item, defined as follows:

(define-datatype list-item markdown-list-item?
  (bullet ch tight? blocks)
      where (and (char? ch) (markdown-blocks? blocks))
  (ordered num ch tight? blocks)
      where (and (fixnum? num) (char? ch) (markdown-blocks? blocks)))

The most frequent Markdown block type is the paragraph. (paragraph text) represents a single paragraph, where text is a list of inline text fragments of type inline (see below). (heading level text) defines a heading of the given level, where level is a number from 1 to 6.

(indented-code lines) represents a code block consisting of a list of text lines, each represented by a string. (fenced-code lang lines) is similar, but additionally specifies the language lang of the code, or #f if no language is given. (html-block lines) defines an HTML block consisting of the given lines of text. (reference-def label dest title) introduces a reference definition consisting of a label, a destination URI dest, and a title, given as a list of strings.

(table header alignments rows) defines a table: header is a list of markdown text values, one per column; alignments is a list of symbols l (= left), c (= center), and r (= right), one per column; and rows is a list of rows, each of which is a list of markdown text values. (definition-list defs) represents a definition list, where defs is a list of definitions. A definition has the form (name def ...), where name is markdown text defining the term and each def is a bullet list item using : as its bullet character. Finally, (thematic-break) introduces a thematic break that visually separates the preceding and following blocks, often rendered as a horizontal line.
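The sketch below combines several of these block types into one document: a table with a left- and a right-aligned column, a fenced code block, a thematic break, and a definition list. The file names and texts are made-up sample data.

```scheme
(import (lispkit base) (lispkit markdown))

;; A two-column table: header cells and body cells are markdown text,
;; i.e. lists of inline values. The first column is left-aligned,
;; the second one right-aligned.
(define files
  (table (list (list (text "File")) (list (text "Lines")))
         '(l r)
         (list (list (list (text "main.scm")) (list (text "120")))
               (list (list (text "util.scm")) (list (text "45"))))))

;; A definition list with a single term; its definition is a bullet
;; item whose bullet character is `:`.
(define glossary
  (definition-list
    (list (cons (list (text "Tight list"))
                (list (bullet #\: #t
                        (list (paragraph
                                (list (text "A list without blank lines between items."))))))))))

;; Assemble a complete document from the blocks above.
(define doc
  (document
    (list (heading 2 (list (text "Project files")))
          files
          (fenced-code "scheme" (list "(display \"hello\")"))
          (thematic-break)
          glossary)))

(markdown? doc)  ; ⟹ #t
```

Each definition is built with cons so that the term comes first and the remaining elements are the definition items, matching the (name def ...) shape described above.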

Inline Text

Markdown text is represented as a list of inline text segments, each of which is an object of type inline. inline is defined as follows:

(define-datatype inline markdown-inline?
  (text str)
      where (string? str)
  (code str)
      where (string? str)
  (emph text)
      where (markdown-text? text)
  (strong text)
      where (markdown-text? text)
  (link text uri title)
      where (and (markdown-text? text) (string? uri) (string? title))
  (auto-link uri)
      where (string? uri)
  (email-auto-link email)
      where (string? email)
  (image text uri title)
      where (and (markdown-text? text) (string? uri) (string? title))
  (html tag)
      where (string? tag)
  (line-break hard?))

(text str) refers to a text segment consisting of the string str. (code str) refers to a code string str (often displayed as verbatim text). (emph text) represents emphasized text (often displayed in italics). (strong text) represents strongly emphasized text (often displayed in boldface). (link text uri title) represents a hyperlink whose link text text points to uri, with title as the title of the link. (auto-link uri) is a link where uri is both the link text and the destination URI. (email-auto-link email) is a "mailto:" link to the email address email. (image text uri title) inserts the image located at uri, with image description text and image title title. (html tag) represents a single inline HTML tag of the form <tag>. Finally, (line-break #f) introduces a "soft line break", whereas (line-break #t) inserts a "hard line break".
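To show how these variants nest, the following sketch builds a piece of markdown text that mixes plain text, a link, emphasis, a code span, a hard line break, and an email auto link, and then wraps it in a paragraph. The link target and email address are placeholders.

```scheme
(import (lispkit base) (lispkit markdown))

;; Markdown text is a plain list of inline values; nested markup such as
;; emphasis or links wraps its own list of inline values.
(define intro
  (list (text "Read ")
        (link (list (text "the spec")) "https://spec.commonmark.org" "CommonMark")
        (text " for details on ")
        (emph (list (text "emphasis")))
        (text ", ")
        (strong (list (text "strong emphasis")))
        (text ", and ")
        (code "code spans")
        (text ".")
        (line-break #t)   ; hard line break
        (text "Questions: ")
        (email-auto-link "info@example.com")))

(markdown-text? intro)              ; ⟹ #t
(markdown-block? (paragraph intro)) ; ⟹ #t
```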

Creating Markdown documents

Markdown documents can either be constructed programmatically via the datatypes introduced above, or a string containing a Markdown document can be parsed into the internal abstract syntax representation via function markdown.

For instance, (markdown "# My title\n\nThis is a paragraph.") returns a Markdown document consisting of two blocks, a level-1 heading block with the text "My title" and a paragraph block with the text "This is a paragraph.", as shown here:

(markdown "# My title\n\nThis is a paragraph.")
⟹ #block:(document (#block:(heading 1 (#inline:(text "My title"))) #block:(paragraph (#inline:(text "This is a paragraph.")))))

The same document can be created programmatically in the following way:

(document
  (list
    (heading 1 (list (text "My title")))
    (paragraph (list (text "This is a paragraph.")))))
⟹ #block:(document (#block:(heading 1 (#inline:(text "My title"))) #block:(paragraph (#inline:(text "This is a paragraph.")))))
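Building documents by hand quickly becomes verbose, so a small helper can generate the constructor calls. The sketch below defines a hypothetical helper simple-doc that produces a level-1 heading followed by one paragraph per string, and compares its result with the parsed document. The comparison uses markdown=?, which is an assumed name for the document equality predicate listed in the API section below; this page does not spell out that name.

```scheme
(import (lispkit base) (lispkit markdown))

;; Hypothetical helper: a level-1 heading followed by one paragraph per string.
(define (simple-doc title paragraphs)
  (document
    (cons (heading 1 (list (text title)))
          (map (lambda (str) (paragraph (list (text str)))) paragraphs))))

(define built (simple-doc "My title" '("This is a paragraph.")))
(define parsed (markdown "# My title\n\nThis is a paragraph."))

;; Assumption: `markdown=?` is the name of the document equality predicate.
;; Both documents print identically above, so this is expected to be #t.
(markdown=? built parsed)
```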

Processing Markdown documents

Since the abstract syntax of Markdown documents is represented via algebraic datatypes, pattern matching can be used to deconstruct the data. For instance, the following function returns the text of all level-1 headings among the top-level blocks of a given Markdown document:

(import (lispkit datatype)   ; provides `match`
        (lispkit markdown))  ; provides the Markdown datatypes and `text->raw-string`

(define (top-headings doc)
  (match doc
    ((document blocks)
      (filter-map (lambda (block)
                    (match block
                      ((heading 1 text) (text->raw-string text))
                      (else #f)))
                  blocks))))

Consider, for instance, the following Markdown document:

# *header* 1
Paragraph.
# __header__ 2
## header 3
The end.

Applying top-headings to it returns the text of both level-1 headings, with their inline markup removed:

(top-headings (markdown "# *header* 1\nParagraph.\n# __header__ 2\n## header 3\nThe end."))
⟹ ("header 1" "header 2")
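Pattern matching also makes recursive traversals straightforward. The following sketch extends the idea of top-headings to a hypothetical document-outline function that collects (level . title) pairs for headings of every level, descending into blockquotes and list items. It only uses the datatype variants and text->raw-string shown above.

```scheme
(import (lispkit base) (lispkit datatype) (lispkit markdown))

;; Collects (level . title) pairs for all headings in a list of blocks,
;; descending into blockquotes and list items.
(define (heading-outline blocks)
  (apply append
    (map (lambda (blk)
           (match blk
             ((heading level txt)
               (list (cons level (text->raw-string txt))))
             ((blockquote sub) (heading-outline sub))
             ((list-items start tight items)
               (apply append (map list-item-outline items)))
             (else '())))
         blocks)))

;; Extracts the outline of the blocks nested in a single list item.
(define (list-item-outline item)
  (match item
    ((bullet ch tight? sub) (heading-outline sub))
    ((ordered num ch tight? sub) (heading-outline sub))))

;; Entry point: accepts a full document.
(define (document-outline doc)
  (match doc
    ((document blocks) (heading-outline blocks))))

(document-outline
  (document
    (list (heading 1 (list (text "Intro")))
          (blockquote (list (heading 2 (list (emph (list (text "Quoted")))))))
          (paragraph (list (text "Body."))))))
;; ⟹ ((1 . "Intro") (2 . "Quoted"))
```

Other block types, such as tables or definition lists, fall through to the else clause; they could be handled with additional clauses in the same way.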

API

Symbol representing the markdown block type. The type-for procedure of library (lispkit type) returns this symbol for all block objects.

Symbol representing the markdown list-item type. The type-for procedure of library (lispkit type) returns this symbol for all list item objects.

Symbol representing the markdown inline type. The type-for procedure of library (lispkit type) returns this symbol for all inline objects.
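The type symbols make it possible to tell the three datatypes apart generically. The sketch below assumes that type-for from (lispkit type) takes the object as its single argument; the concrete symbol names are not relied upon, only the fact that all blocks share one symbol that differs from the one for inline values.

```scheme
(import (lispkit base) (lispkit type) (lispkit markdown))

(define p (paragraph (list (text "x"))))
(define hr (thematic-break))
(define frag (text "x"))

;; All variants of datatype block share one type symbol ...
(eq? (type-for p) (type-for hr))    ; ⟹ #t
;; ... which differs from the symbol used for inline values.
(eq? (type-for p) (type-for frag))  ; ⟹ #f
```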

Returns #t if obj is a proper list of objects o for which (markdown-block? o) returns #t; otherwise it returns #f.

Returns #t if obj is a variant of algebraic datatype block.

Returns #t if markdown blocks lhs and rhs are equal; otherwise it returns #f.

Returns #t if obj is a proper list of list items i for which (markdown-list-item? i) returns #t; otherwise it returns #f.

Returns #t if obj is a variant of algebraic datatype list-item.

Returns #t if markdown list items lhs and rhs are equal; otherwise it returns #f.

Returns #t if obj is a proper list of objects o for which (markdown-inline? o) returns #t; otherwise it returns #f.

Returns #t if obj is a variant of algebraic datatype inline.

Returns #t if markdown inline texts lhs and rhs are equal; otherwise it returns #f.

Returns #t if obj is a valid markdown document, i.e. an instance of the document variant of datatype block; returns #f otherwise.
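The following sketch shows how the type predicates distinguish documents, blocks, lists of blocks, inline values, and markdown text. All predicate names used here appear in the datatype definitions and descriptions on this page.

```scheme
(import (lispkit base) (lispkit markdown))

(define doc (markdown "Some *emphasized* text."))
(define para (paragraph (list (text "x"))))

(markdown? doc)                                 ; ⟹ #t
(markdown? para)                                ; ⟹ #f, a block but not a document
(markdown-block? para)                          ; ⟹ #t
(markdown-blocks? (list para (thematic-break))) ; ⟹ #t
(markdown-inline? (text "x"))                   ; ⟹ #t
(markdown-block? (text "x"))                    ; ⟹ #f, an inline value
(markdown-text? (list (text "a") (emph (list (text "b")))))  ; ⟹ #t
```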

Returns #t if markdown documents lhs and rhs are equals; otherwise it returns #f.

Parses the Markdown-formatted string str and returns its abstract syntax, represented using the algebraic datatypes block, list-item, and inline.

Converts a Markdown document md into HTML, returned as a string. md needs to satisfy the markdown? predicate.

Converts a Markdown block or list of blocks bs into HTML, returned as a string. tight? is a boolean which should be #t if the conversion should apply tight typesetting (see the CommonMark specification for details on tight and loose lists).

Converts Markdown inline text, or a list of inline texts, txt into HTML, returned as a string.
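The following sketch converts a parsed document, a list of blocks, and a fragment of inline text into HTML. The procedure names markdown->html, blocks->html, and text->html are assumptions, since the entries above do not state them; the optional second argument of blocks->html is assumed to be the tight? flag. No specific HTML output is claimed here.

```scheme
(import (lispkit base) (lispkit markdown))

(define doc (markdown "# Notes\n\nSome *emphasized* text."))

;; Assumed names for the three conversion procedures described above.
(define page-html (markdown->html doc))
(define block-html
  (blocks->html (list (paragraph (list (text "A paragraph")))) #f))
(define inline-html (text->html (list (strong (list (text "bold"))))))

(display page-html)
(newline)
```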

Converts a Markdown document md into a styled HTML document, returned as a string. md needs to satisfy the markdown? predicate. style is a list with up to three elements, (size font color), specifying the default text style of the document: size is the point size of the font, font is a font name, and color is an HTML color specification (e.g. "#FF6789"). codestyle specifies the style of inline code in the same format. colors is a list of HTML color specifications for the following document elements, in this order: the border color of code blocks, the color of blockquote "bars", and the colors of H1, H2, H3, and H4 headings.
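A sketch of a styled conversion, passing all three optional style arguments in the order in which they are described above. The procedure name markdown->html-doc and this argument order are assumptions; the fonts and colors are arbitrary sample values.

```scheme
(import (lispkit base) (lispkit markdown))

(define doc (markdown "# Report\n\n> A quote.\n\n    indented code"))

;; Assumption: the styled conversion is named markdown->html-doc and takes
;; its optional arguments in the order style, codestyle, colors.
(define styled
  (markdown->html-doc doc
    '(13 "Helvetica" "#333333")      ; style: point size, font, text color
    '(12 "Menlo" "#AA3333")          ; codestyle: same format, for inline code
    '("#CCCCCC"                      ; border color of code blocks
      "#DDDDDD"                      ; blockquote bars
      "#000066" "#000077"            ; H1, H2
      "#000088" "#000099")))         ; H3, H4
```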

Converts the inline text text into a string that encodes the markup of text using Markdown syntax. text needs to satisfy the markdown-text? predicate.

Converts the inline text text into a plain string, discarding any markup in text. text needs to satisfy the markdown-text? predicate.
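The two string conversions differ only in how they treat markup, as the following sketch shows. text->raw-string is the name used in the top-headings example above; text->string is an assumed name for the markup-preserving variant. The exact Markdown syntax it emits for strong emphasis is not claimed here.

```scheme
(import (lispkit base) (lispkit markdown))

(define fragment
  (list (text "a ") (strong (list (text "bold"))) (text " word")))

;; Assumption: the markup-preserving conversion is named text->string.
(text->string fragment)      ; Markdown syntax for the strong emphasis is kept
(text->raw-string fragment)  ; ⟹ "a bold word"
```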
