Markdown To Html Python



In the previous article we looked at what static sites are, and how they work.

Now we will look at how to convert a single markdown file into an HTML file.

The conversion process

This diagram from the previous article shows the basic process for converting a set of markdown files into the required HTML files for a complete website:

This time we will look in more detail at what is involved in converting a single page of markdown into the corresponding HTML file:

Here is an example markdown file, test.md:

This actually isn't a pure markdown file. The top part of the file is meta-data for the page, in a format called yaml. Many static site generators use a similar system. The yaml is contained between the two '---' markers. The rest of the file (after the second '---') is the markdown content of the file. But for brevity we will call the entire file a markdown file.
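As a sketch of the layout just described, a hypothetical test.md could be written out like this. The field values, the link, and the image name are all invented for illustration; only the overall shape (a yaml header between two '---' markers, then markdown) comes from the description above.

```python
from pathlib import Path

# Hypothetical sample page: a yaml header between two '---' markers,
# followed by the markdown body. All values are invented for illustration.
SAMPLE_PAGE = """---
title: Test page
author: Jane Doe
date: "2020-01-15"
tags:
  - python
  - markdown
---
This is **bold** text and this is *italic* text.

Here is a [link](https://example.com/) and an image:

![Example image](example.png)
"""


def write_sample(path="test.md"):
    """Write the sample page to disk and return its path."""
    target = Path(path)
    target.write_text(SAMPLE_PAGE, encoding="utf-8")
    return target
```

Calling write_sample() creates test.md in the current directory.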

Converting this page to HTML actually involves 4 separate tasks:

  • Split the file into yaml and markdown parts.
  • Extract the meta-data from the YAML.
  • Convert the markdown to an HTML fragment (the page content).
  • Combine the meta-data and page content with the HTML template to create a complete HTML file.

Fortunately, if we use the right Python libraries, each of these steps is very easy.

Splitting the file

This part is fairly standard Python. We read the markdown file in, line by line, and create two strings, ym that contains the yaml text, and md that contains the markdown text.

Python allows us to treat a text file as a sequence of lines of text, that we can loop through using a for loop.

The first loop discards strings until we find the first '---'. The second loop reads all the strings until the next '---'. Those are the yaml_lines. Finally, all the remaining lines after the second '---' are the markdown data.

We join all the yaml_lines to form a string ym. We join all the lines of markdown data to form the string md.
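The two loops and the final joins might look roughly like the sketch below. The function name split_page and the in-memory sample text are assumptions made for illustration; the names ym, md and yaml_lines follow the description above.

```python
import io


def split_page(lines):
    """Split an iterable of text lines into (ym, md) strings."""
    lines = iter(lines)

    # First loop: discard lines until we reach the opening '---'.
    for line in lines:
        if line.strip() == "---":
            break

    # Second loop: collect the yaml lines until the closing '---'.
    yaml_lines = []
    for line in lines:
        if line.strip() == "---":
            break
        yaml_lines.append(line)

    # Everything left on the iterator is the markdown body.
    ym = "".join(yaml_lines)
    md = "".join(lines)
    return ym, md


# Stand-in for an open file; the content is invented for illustration.
sample = io.StringIO("---\ntitle: Test page\n---\nSome **markdown** text.\n")
ym, md = split_page(sample)
print(repr(ym))  # 'title: Test page\n'
print(repr(md))  # 'Some **markdown** text.\n'
```

With a real file you would pass the file object instead, for example inside a with open("test.md") as f: block.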

Parsing the yaml data

We will use the Python yaml library to parse the yaml data, like this:

This parses a block of yaml text and creates a dictionary with the result. Here is what it prints:

This is the same data as we had on the test.md file, but now in the form of a Python dictionary.

Notice that the tags element has a list of values. That is because the yaml header uses a syntax for tags that allows for multiple values.
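A minimal sketch of this step, assuming the PyYAML package and its safe_load function (the text above only says "the Python yaml library", so the exact call is an assumption). The header values are invented.

```python
import yaml  # provided by the PyYAML package

# Hypothetical yaml header text, standing in for the ym string.
ym = """\
title: Test page
author: Jane Doe
date: "2020-01-15"
tags:
  - python
  - markdown
"""

# safe_load parses the text into plain Python objects (here, a dict).
info = yaml.safe_load(ym)
print(info["title"])  # Test page
print(info["tags"])   # ['python', 'markdown']
```

The date is quoted here so that it stays a string; PyYAML converts an unquoted ISO date into a datetime.date object.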

Converting the markdown data

Here we convert the second part of the file, the markdown data, into an html fragment, like this:

We are using the markdown library to do the conversion. This takes a markdown format string and returns an html string. Based on the markdown code above, the html content string will be:

As you can see it correctly marked up the bold and italic text, hyperlink, and image. The markdown method has several extensions that can be added, for example to provide syntax highlighting, but we aren't using those here.

The output is an html fragment. It places each paragraph inside its own paragraph tags, but it doesn't provide higher level tags such as a body tag. It is assumed that the html fragment will be placed within a full html document (which we will do next).
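To make the fragment idea concrete, here is a small sketch using markdown.markdown from the Python-Markdown package. The input text is invented and is not the article's original test.md content.

```python
import markdown

# Hypothetical markdown body, standing in for the md string.
md = (
    "This is **bold** text and this is *italic* text.\n"
    "\n"
    "A second paragraph with a [link](https://example.com/).\n"
)

# No extensions are passed, so only core Markdown syntax is converted.
content = markdown.markdown(md)
print(content)
```

The result contains two p elements and nothing else: no html, head or body tags are added.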

Creating the full html

We create our final html using a template like this:

This template is just a basic html page. For a real website, you would probably want to use something more sophisticated, maybe a responsive template and some CSS styling.

But the basic method is the same. You use a full html page template, but with placeholders for variable content such as the title of the page, the author's name, and the main content itself.

The placeholders are enclosed in double curly brackets, for example {{title}}. We use the pystache module to substitute real values for the placeholders to create the final html. Here is the code:

The render function accepts the html template, plus a dictionary that maps the template names on to their values.

Notice that the info dictionary we are using comes straight from the yaml parser. It already contains entries for the title, author and date. The trick here is to make sure that each tag in the html template exactly matches the equivalent field in the yaml header. That way, pystache will be looking for the same tags that the yaml parser stored.

Well, that isn't quite true. The info dictionary doesn't yet have an entry for content, because the content comes from the markdown. So we add an extra element to the dictionary, called 'content', containing the processed markdown content.

The other thing to notice is that we use triple brackets for content - {{{content}}}. The reason for this is that the content is raw html data:

  • For {{value}}, pystache renders the value assuming it is text that you want to display. If it contains html characters such as < it will escape them, so that the symbol is displayed as a < in the browser. That is what you would want in the page title, for instance.
  • For {{{value}}}, pystache renders the text unaltered, so if the text contains <p>, it will cause a paragraph break. This is what you want for the page content, which does include paragraph breaks.

Putting it all together

This has taken a bit of explaining, but if you actually look at the code to convert the yaml plus markdown into a final html page, it is remarkably simple:

In the next article we will look at how to build a complete site.

Introduction

Python-Markdown is a package that converts content in Markdown format to HTML. In this example, we will look at how to convert Markdown to HTML and automatically generate a table of contents. We will also look at using the command-line tool to convert content, and cover how to use fenced code blocks and syntax highlighting.

Setup

Install the markdown library with pip (pip install markdown). I am using Python 3.8 in this example.

Convert Markdown to HTML in Python

The easiest way to convert is to pass in a string and get a string back.

To use files for input and output instead:
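Both variants can be sketched as follows, using markdown.markdown for strings and markdown.markdownFromFile for files. The file names and the sample text are placeholders, and a temporary directory is used so the example does not touch your project files.

```python
import tempfile
from pathlib import Path

import markdown

# String in, string out.
html = markdown.markdown("# Hello\n\nSome *emphasised* text.")
print(html)

# File in, file out. The paths are placeholders inside a temp directory.
workdir = Path(tempfile.mkdtemp())
source = workdir / "input.md"
target = workdir / "output.html"
source.write_text("# Hello\n\nSome *emphasised* text.\n", encoding="utf-8")

markdown.markdownFromFile(input=str(source), output=str(target), encoding="utf-8")
print(target.read_text(encoding="utf-8"))
```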

Convert Markdown to HTML with command-line tool

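The Python-Markdown CLI tool is convenient when you just want to convert a document without embedding the code in a larger application.

The easiest way to invoke it is to run it as a module with python -m. For example, python -m markdown input.md > output.html converts input.md and writes the HTML to output.html (both file names are placeholders).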

Generate a table of contents (TOC)

To generate a TOC, we need to use the toc extension. There are a number of other extensions available with the package that you can check out at https://python-markdown.github.io/extensions/.

You convert the same way as before, except this time you pass in an extra parameter to include the extension.

To customize options, you need to import the markdown.extensions.toc.TocExtension class and pass an instance of it in the extensions parameter. Read more at https://python-markdown.github.io/extensions/toc/#usage

In your Markdown, add [TOC] where the TOC should go.
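Here is a sketch of both forms: the extension enabled by name, and a TocExtension instance carrying options. The sample document is invented, and title and permalink are just two of the documented options, chosen for illustration.

```python
import markdown
from markdown.extensions.toc import TocExtension

# Invented sample document; [TOC] marks where the table of contents goes.
text = """[TOC]

# First section

Some text.

## A subsection

More text.
"""

# Enable the extension by name, with default options.
html_default = markdown.markdown(text, extensions=["toc"])

# Pass an instance instead when you want to set options.
html_custom = markdown.markdown(
    text,
    extensions=[TocExtension(title="Contents", permalink=True)],
)

print(html_default)
```

The [TOC] marker is replaced by a div with the class toc, and each heading receives an id attribute that the TOC links point to.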

Fenced code blocks

To make a code block, you can indent all lines by 4 spaces, which is the default Markdown syntax. Personally, I prefer using three backticks (```) to enclose code without indenting. It also gives a place to define which language is being used.

To use the triple backticks you need to enable the fenced_code extension. This extension already comes with Python-Markdown. It wraps the code block in <pre> and <code> tags.

TIP: If you need to write a triple-backtick code block within your Markdown code, you can wrap the outermost code block with additional backticks. For example, use a set of 4 or 5 instead of 3 like this:
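The sketch below shows a plain fenced block and a nested one, both converted with the fenced_code extension. The fences are built from single backticks in Python so that the listing stays easy to embed; the sample code inside them is invented.

```python
import markdown

# Build the fences programmatically to keep this listing easy to embed.
FENCE3 = "`" * 3
FENCE4 = "`" * 4

# A plain fenced block with a language name after the opening fence.
simple = f"{FENCE3}python\nprint('hello')\n{FENCE3}\n"
html_simple = markdown.markdown(simple, extensions=["fenced_code"])
print(html_simple)

# A four-backtick outer fence can contain a three-backtick block verbatim.
nested = f"{FENCE4}markdown\n{FENCE3}python\nprint('hello')\n{FENCE3}\n{FENCE4}\n"
html_nested = markdown.markdown(nested, extensions=["fenced_code"])
print(html_nested)
```

In the nested case the inner fence lines are kept as literal text inside a single code block.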

Source code syntax highlighting

To build on the previous section using fenced_code, you can add syntax highlighting with the codehilite extension. This extension already comes with Python-Markdown, but it depends on another Python library named Pygments.

Install Pygments with pip (pip install Pygments).

Here is an example of generating HTML with both fenced_code and codehilite extensions together.
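A minimal sketch of the combination, with an invented code sample. It assumes Pygments is installed so that the tokens are actually highlighted.

```python
import markdown

FENCE = "`" * 3  # three backticks, built in Python to keep the listing simple

# Invented Markdown text containing one fenced Python block.
text = (
    "Some code:\n"
    "\n"
    f"{FENCE}python\n"
    "def greet(name):\n"
    "    return 'Hello ' + name\n"
    f"{FENCE}\n"
)

# fenced_code recognises the block; codehilite hands it to Pygments.
html = markdown.markdown(text, extensions=["fenced_code", "codehilite"])
print(html)
```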

When you add the codehilite extension, the code block is wrapped in an element with the class codehilite, and the highlighted tokens are marked up with their own classes. You could write your own styles, but Pygments comes with several style sets you can use. You can generate the different styles using a command-line tool called pygmentize. Use this tool to list the available color themes (pygmentize -L styles) and to generate the styles. Save the CSS output to a .css file and link it in your HTML like normal.

To apply the proper styles, you must generate the CSS, for example with pygmentize -S default -f html -a .codehilite > styles.css, and apply it.

In the HTML, reference the generated file with a link element in the head of the page.
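If you would rather stay in Python than call pygmentize, the same CSS can be produced through the Pygments API. This is an alternative to the command-line route described above, not the article's original method; the function name write_codehilite_css and the file name codehilite.css are placeholders.

```python
from pathlib import Path

from pygments.formatters import HtmlFormatter
from pygments.styles import get_all_styles

# Names of the style sets that ship with Pygments.
available = sorted(get_all_styles())


def write_codehilite_css(path, style="default"):
    """Write Pygments CSS rules scoped to the .codehilite class."""
    css = HtmlFormatter(style=style).get_style_defs(".codehilite")
    Path(path).write_text(css, encoding="utf-8")
    return css


# Element to place inside the head of the page; the file name is a placeholder.
LINK_TAG = '<link rel="stylesheet" href="codehilite.css">'
```

Calling write_codehilite_css("codehilite.css") writes the stylesheet next to your HTML, and LINK_TAG shows the element that loads it.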

Conclusion

After reading this, you should understand how to convert Markdown content to HTML and how to automatically generate a table of contents. You should be able to use strings or files for conversion, and to use the CLI tool to convert content. You should also know how to include extensions and apply fenced code blocks and source code syntax highlighting with Pygments.
