
From now on I will be able to write posts in Org Mode for this blog, besides the existing Markdown method. Here’s how I configure it. This article is written in an Org file and also serves as a demonstration of this new feature.

The original Markdown workflow is kept intact. In addition, you can now write everything in Org Mode and still use front matter, Liquid syntax, figures, galleries, etc. exactly as in Markdown.

Here are the basic elements demonstrated:

Org Elements Examples

Headings

Custom ID

In Markdown you can write the attribute syntax {: id="your-custom-id"} to assign a custom ID to a heading. In Org you can achieve the same by using the conventional CUSTOM_ID in a property drawer.

Topdown IDs and ID Deduplication

By default, heading IDs are generated automatically by converting the heading text to a hyphenated string. Adding heading_ids: topdown to the front matter enables topdown IDs, where each heading's ID is prefixed with the IDs of its parent headings, if any. For example:

* Heading 1
# id: heading-1
** Foo
# id: heading-1-foo
*** Bar
# id: heading-1-foo-bar
* Heading 2
# id: heading-2
** Foo
# id: heading-2-foo
*** Bar
# id: heading-2-foo-bar

In either mode, generated IDs are checked for duplicates. If an ID is already in use, an incrementing number is appended to it.

However, CUSTOM_ID is never affected.
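
To make the two rules concrete, here is a minimal standalone sketch of the ID generation in plain Ruby. It is not the plugin code itself: slugify and heading_ids are hypothetical helper names, and the input is assumed to be a simple list of [level, title] pairs instead of parsed HTML.

```ruby
# Hypothetical helper: turn a heading title into a hyphenated slug.
def slugify(title)
  slug = title.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
  slug.empty? ? 'section' : slug
end

# Hypothetical helper. headings: [level, title] pairs in document order
# (levels 1..6). Returns the generated IDs in the same order.
def heading_ids(headings, topdown: false)
  id_stack = Array.new(6) # id_stack[level - 1] holds the current ancestor ID
  seen = {}               # IDs that have already been handed out

  headings.map do |level, title|
    slug = slugify(title)
    parent_id = topdown ? id_stack[0...(level - 1)].compact.last : nil
    base_id = parent_id ? "#{parent_id}-#{slug}" : slug

    # Duplicate check: append -1, -2, ... until the ID is unique.
    final_id = base_id
    count = 1
    while seen.key?(final_id)
      final_id = "#{base_id}-#{count}"
      count += 1
    end
    seen[final_id] = true

    id_stack[level - 1] = final_id
    (level..5).each { |i| id_stack[i] = nil } # forget deeper levels
    final_id
  end
end

outline = [[1, 'Heading 1'], [2, 'Foo'], [3, 'Bar'],
           [1, 'Heading 2'], [2, 'Foo'], [3, 'Bar']]

p heading_ids(outline, topdown: true)
# => ["heading-1", "heading-1-foo", "heading-1-foo-bar",
#     "heading-2", "heading-2-foo", "heading-2-foo-bar"]
p heading_ids(outline)
# => ["heading-1", "foo", "bar", "heading-2", "foo-1", "bar-1"]
```

Without topdown, the second Foo and Bar collide with the first ones, so the duplicate check is what keeps them unique.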

Heading Numbering

Headings can be numbered automatically. You can give numbers to all headings of a post or only the headings of a subtree:

  • For all headings: Add ordered: true to the front matter of the post.
  • For all headings under a subtree: Add the attribute {: .ordered} right under a heading. All of its sub-headings will be numbered. For example:

    *** Headings A
    {: .ordered}
    **** Heading 1
    ***** Heading 1.1
    ****** Heading 1.1.1
    ***** Heading 1.2
    **** Heading 2
    ***** Heading 2.1
    
    *** Headings B
    **** Heading B1
    {: .ordered}
    ***** Heading B1.1
    ****** Heading B1.1.1
    ***** Heading B1.2
    **** Heading B2
    ***** Heading B2.1

    Output:

    * Headings
    ** Headings A
    1. Heading 1
    1.1. Heading 1.1
    1.1.1. Heading 1.1.1
    1.2. Heading 1.2
    2. Heading 2
    2.1. Heading 2.1
    ** Headings B
    *** Heading B1
    1. Heading B1.1
    1.1. Heading B1.1.1
    2. Heading B1.2
    *** Heading B2
    **** Heading B2.1

    If a heading is marked with the {: .ordered} attribute, the numbering for its sub-headings always starts from 1, regardless of its level in the document or its position relative to its siblings.

If both ordered: true and {: .ordered} are present for a heading, the rules of the latter should override the former.
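
The numbering rules can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions: number_headings is a hypothetical name, and each heading is assumed to be a [level, title, ordered] triple, where ordered stands in for a {: .ordered} attribute on that heading.

```ruby
# Hypothetical helper: returns each title with its number prefix, if any.
# headings: [level, title, ordered] triples in document order (levels 1..6).
def number_headings(headings, ordered_post: false)
  counters = Array.new(7, 0)       # counters[level] for levels 1..6
  scopes = ordered_post ? [0] : [] # levels of the active "ordered" anchors

  headings.map do |level, title, ordered|
    scopes.reject! { |anchor| anchor >= level }   # leave scopes we moved out of
    counters[level] += 1
    ((level + 1)..6).each { |i| counters[i] = 0 } # restart deeper counters

    # Only the levels below the innermost anchor contribute to the number.
    numbers = scopes.empty? ? [] : counters[(scopes.last + 1)..level]
    scopes << level if ordered # an .ordered heading numbers its sub-headings
    numbers.empty? ? title : "#{numbers.join('.')}. #{title}"
  end
end

demo = [
  [3, 'Headings A', true],
  [4, 'Heading 1'], [5, 'Heading 1.1'], [6, 'Heading 1.1.1'],
  [5, 'Heading 1.2'], [4, 'Heading 2'], [5, 'Heading 2.1'],
  [3, 'Headings B'],
  [4, 'Heading B1', true],
  [5, 'Heading B1.1'], [6, 'Heading B1.1.1'], [5, 'Heading B1.2'],
  [4, 'Heading B2'], [5, 'Heading B2.1']
]
puts number_headings(demo)
```

Because the innermost anchor always wins, an {: .ordered} heading inside an ordered: true post restarts the numbering for its own subtree.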

Heading A

1. Heading 1
1.1. Heading 1.1
Heading 1.1.1
1.2. Heading 1.2
2. Heading 2
2.1. Heading 2.1

Heading B

Heading B1
1. Heading B1.1
Heading B1.1.1
2. Heading B1.2
Heading B2
Heading B2.1

Tables

Here is the morphological breakdown:

| Component | Origin           | Meaning                    |
|-----------+------------------+----------------------------|
| Erythro-  | Greek (erythros) | Red                        |
| -xyl-     | Greek (xylon)    | Wood                       |
| -aceae    | Latin (Suffix)   | Belonging to the family of |

Lists

An ordered list with multiple levels:

  1. First item

    Hello world!
  2. Second item

    1. Sub-item A (Indented 3 spaces)

      Some text

      1. Sub-item A (Indented 3 spaces)
      2. Sub-item A (Indented 3 spaces)

        1. Sub-item A (Indented 3 spaces)
        2. Sub-item A (Indented 3 spaces)
      3. Sub-item A (Indented 3 spaces)
    2. Sub-item B

      • Deeply nested bullet (Indented another 3 spaces)
      • Deeply nested bullet (Indented another 3 spaces)
  3. Third item

Images

A simple image:

https://i.postimg.cc/Vv8jFw8D/unsplash-gallery-image-1.jpg

A figure:

This is a figure alt text
This is a figure caption

A gallery (with customization):

This is a gallery caption

Code highlighting

Inline codes: print, hello

Using : to start a line of code:

descriptor + subject + taxonomic rank

Using structure template (code block):

def print_hi(name)
  puts "Hi, #{name}"
end
print_hi('Tom')

#=> prints 'Hi, Tom' to STDOUT.

Text block:

Hello
{% include gallery columns=6 caption="My custom gallery" %}
{% include gallery columns=6 caption="My custom gallery" %}

Notices

{: .notice--warning}
*Watch out!* This is a warning notice inside an Org file.

Watch out! This is a warning notice inside an Org file.


Furigana

You can write furigana for Japanese Kanji words like the following:

本|日(ほん|じつ)はお時|間(じ|かん)をいただき、ありがとうございます。私はマカ
オにあるマカオ理|工|大|学(り|こう|だい|がく)を4年|制(ねん|せい)の学|士(が
く|し)課|程(か|てい)で卒業(そつ|ぎょう)しました。

本|日(ほん|じつ)はお時|間(じ|かん)をいただき、ありがとうございます。私はマカオにあるマカオ理|工|大|学(り|こう|だい|がく)を4年|制(ねん|せい)の学|士(がく|し)課|程(か|てい)で卒業(そつ|ぎょう)しました。

Configurations

Basic Converter

Here is how I set it up.

  1. Add org-ruby to your Gemfile:

    group :jekyll_plugins do
      ...
      gem "org-ruby"
    end
  2. Write a new plugin: _plugins/org_converter.rb

       require 'nokogiri'
       require 'rouge'
    
       Jekyll::Hooks.register [:pages, :documents], :pre_render do |doc|
         # Enable Liquid syntax
         if doc.extname.downcase == '.org'
           # Enable {% raw %}...{% endraw %}
    @@LIT_2@@
           # Enable figures, galleries, and links to posts (skip inside code or raw blocks)
           doc.content = doc.content.gsub(/#{block_regex}|#{raw_regex}|#{include_regex}/) do |match|
             $1 ? "\n#+BEGIN_HTML\n#{$1}\n#+END_HTML\n" : match
           end
    
           inline_code = /[=~][^=~\n]+[=~]/
           markdown_link = /(?<!\!)\[([^\]]+)\]\(([^)]+)\)/
    
           doc.content = doc.content.gsub(/#{block_regex}|^[ \t]*:[^\n]*$|#{inline_code}|#{markdown_link}/) do |match|
             match.start_with?('[') && !match.start_with?('[[') ? "[[#{$2}][#{$1}]]" : match
           end
         end
       end
    
       module Jekyll
         class OrgConverter < Converter
           safe true
           priority :low
    
           def matches(ext)
             ext =~ /^\.org$/i
           end
    
           def output_ext(ext)
             ".html"
           end
    
           def convert(content)
             require 'org-ruby'
             html = Orgmode::Parser.new(content).to_html
             doc = Nokogiri::HTML.fragment(html)
    
             doc.css('h1, h2, h3, h4, h5, h6').each do |node|
               node['id'] = node.text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
             end
    
             # Add <thead> to tables
             doc.css('table').each do |table|
               trs = table.xpath('./tr')
               next if trs.empty?
    
               if trs.first.at_xpath('./th')
                 thead = Nokogiri::XML::Node.new('thead', doc)
                 thead.add_child(trs.first)
                 tbody = Nokogiri::XML::Node.new('tbody', doc)
                 trs[1..-1].each { |tr| tbody.add_child(tr) }
                 table.add_child(thead)
                 table.add_child(tbody)
               else
                 tbody = Nokogiri::XML::Node.new('tbody', doc)
                 trs.each { |tr| tbody.add_child(tr) }
                 table.add_child(tbody)
               end
             end
    
             # Process code blocks to enable code highlighting
             doc.css('pre.src[lang]').each do |pre|
               lang = pre['lang']
               lexer = Rouge::Lexer.find_fancy(lang) || Rouge::Lexers::PlainText
               formatter = Rouge::Formatters::HTML.new
    
               # .sub(/\A[\r\n]+/, '') targets the absolute beginning of the string
               # (\A) and removes any leading line breaks or carriage returns.
               # .sub(/\s+\z/, '') targets the absolute end of the string (\z) and
               # removes any trailing whitespace, including empty lines and spaces.
               code_text = pre.text.sub(/\A[\r\n]+/, '').sub(/\s+\z/, '')
    
               highlighted = formatter.format(lexer.lex(code_text))
               new_node = Nokogiri::HTML.fragment(%Q{
                   <div class="language-#{lang} highlighter-rouge">
                     <div class="highlight"><pre class="highlight"><code>#{highlighted}</code></pre></div>
                   </div>
                 })
               pre.replace(new_node)
             end
    
             doc.to_html
           end
         end
       end
  3. Install the gems with bundler and run:

    bundle install
    bundle exec jekyll serve

Headings

Custom ID

In Markdown you can write {: id="20260116203517"} to assign a custom ID to a heading. In Org there is the CUSTOM_ID property in a drawer. By default, org-ruby drops :PROPERTIES: drawers entirely when converting to HTML, so CUSTOM_ID values disappear before Nokogiri ever sees them. In addition, the basic org_converter.rb above overwrites every heading's ID with a hyphenated slug.

A simple way to make this work is to pre-process the Org document line by line right before passing it to org-ruby. We detect the :CUSTOM_ID: inside the drawer, temporarily "smuggle" it into the heading's text as a special marker (%%CUSTOM_ID:...%%), and then have the Nokogiri loop extract it and assign it as the real HTML ID.

Here are the lines to change in _plugins/org_converter.rb:

modified   _plugins/org_converter.rb
@@ -36,12 +36,42 @@ module Jekyll
     end

     def convert(content)
+      # Parse CUSTOM_ID properties and inject a temporary marker
+      lines = content.lines
+      lines.each_with_index do |line, index|
+        if line =~ /^\*+[ \t]+/
+          j = index + 1
+          if j < lines.length && lines[j] =~ /^[ \t]*:PROPERTIES:[ \t]*$/i
+            k = j + 1
+            custom_id = nil
+            while k < lines.length && lines[k] !~ /^[ \t]*:END:[ \t]*$/i && lines[k] !~ /^\*+[ \t]+/
+              if lines[k] =~ /^[ \t]*:CUSTOM_ID:[ \t]+(\S+)/i
+                custom_id = $1
+              end
+              k += 1
+            end
+            if custom_id && lines[k] =~ /^[ \t]*:END:[ \t]*$/i
+              lines[index] = lines[index].chomp + " %%CUSTOM_ID:#{custom_id}%%\n"
+            end
+          end
+        end
+      end
+      content = lines.join
+
       require 'org-ruby'
       html = Orgmode::Parser.new(content).to_html
       doc = Nokogiri::HTML.fragment(html)

       doc.css('h1, h2, h3, h4, h5, h6').each do |node|
-        node['id'] = node.text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
+        if node.content =~ /%%CUSTOM_ID:(\S+)%%/
+          node['id'] = $1
+          # Clean the marker out of all text nodes inside this heading
+          node.xpath('.//text()').each do |t|
+            t.content = t.content.gsub(/\s*%%CUSTOM_ID:\S+%%/, '')
+          end
+        else
+          node['id'] = node.text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
+        end
       end

       # Add <thead> to tables

Topdown IDs and ID Deduplication

To achieve this, we need to read the heading_ids: topdown preference from the front matter and pass it into the org_converter.rb pipeline. Because the convert(content) method receives only the raw string and has no access to the front matter variables, the simplest approach is to inject a temporary marker (%%HEADING_IDS:topdown%%) into the document during the :pre_render hook, and then strip it inside the converter before passing the content to org-ruby.

At the same time, a per-level stack of ancestor IDs handles the concatenation, and a hash of already-used IDs guarantees uniqueness.

Here are the exact diffs to update the _plugins/org_converter.rb file.

  1. Inject the configuration marker

    Add the injection logic to your :pre_render hook:

    --- _plugins/org_converter.rb
    +++ _plugins/org_converter.rb
    @@ -4,6 +4,10 @@
     Jekyll::Hooks.register [:pages, :documents], :pre_render do |doc|
       # Enable Liquid syntax
       if doc.extname.downcase == '.org'
    +    if doc.data['heading_ids'] == 'topdown'
    +      doc.content = "%%HEADING_IDS:topdown%%\n" + doc.content
    +    end
    +
         # Enable {% raw %}...{% endraw %}
         doc.content = doc.content.gsub(/^[ \t]*(\{%[ \t]*raw[ \t]*%\})[ \t]*\n?/, '')
         doc.content = doc.content.gsub(/^[ \t]*(\{%[ \t]*endraw[ \t]*%\})[ \t]*\n?/, '')
  2. Implement Topdown and Duplicate Logic

    Update the convert function to intercept the marker and apply the new logic dynamically:

    modified   _plugins/org_converter.rb
    @@ -40,6 +44,11 @@ module Jekyll
         end
    
         def convert(content)
    +      topdown = false
    +      if content.sub!(/\A%%HEADING_IDS:topdown%%\r?\n/, '')
    +        topdown = true
    +      end
    +
           # Parse CUSTOM_ID properties and inject a temporary marker
           lines = content.lines
           lines.each_with_index do |line, index|
    @@ -66,16 +75,44 @@ module Jekyll
           html = Orgmode::Parser.new(content).to_html
           doc = Nokogiri::HTML.fragment(html)
    
    +      id_stack = Array.new(6)
    +      seen_ids = {}
    +
           doc.css('h1, h2, h3, h4, h5, h6').each do |node|
    +        level = node.name[1].to_i
    +        custom_id = nil
    +
             if node.content =~ /%%CUSTOM_ID:(\S+)%%/
    -          node['id'] = $1
    +          custom_id = $1
               # Clean the marker out of all text nodes inside this heading
               node.xpath('.//text()').each do |t|
                 t.content = t.content.gsub(/\s*%%CUSTOM_ID:\S+%%/, '')
               end
    +        end
    +
    +        slug = node.text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
    +        slug = 'section' if slug.empty?
    +
    +        if custom_id
    +          base_id = custom_id
    +        elsif topdown
    +          parent_id = id_stack[0...level-1].compact.last
    +          base_id = parent_id ? "#{parent_id}-#{slug}" : slug
             else
    -          node['id'] = node.text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
    +          base_id = slug
             end
    +
    +        final_id = base_id
    +        count = 1
    +        while seen_ids.key?(final_id)
    +          final_id = "#{base_id}-#{count}"
    +          count += 1
    +        end
    +        seen_ids[final_id] = true
    +
    +        node['id'] = final_id
    +        id_stack[level - 1] = final_id
    +        (level..5).each { |i| id_stack[i] = nil }
           end
    
           # Add <thead> to tables

Heading Numbering

To make this work seamlessly, we must append a :post_render hook to the bottom of org_converter.rb. This is structurally necessary for two reasons:

  1. Front Matter Access: Jekyll strips front matter (ordered: true) before passing the text to the convert function, so org_converter cannot natively see it.
  2. TOC Availability: Minimal Mistakes injects the Table of Contents via Liquid templates during the layout phase. If we try to inject numbers inside the convert step, the TOC doesn’t exist in the HTML yet, meaning it would remain unnumbered.

By running this logic in :post_render, we have full access to both the front matter and the fully generated layout (including the TOC).

Append these lines to the very end of the _plugins/org_converter.rb file:

modified   _plugins/org_converter.rb
@@ -151,3 +151,53 @@ module Jekyll
     end
   end
 end
+
+Jekyll::Hooks.register [:pages, :documents], :post_render do |doc|
+  # Only process Org Mode files
+  next unless doc.extname.downcase == '.org'
+
+  is_ordered_post = doc.data['ordered'] == true || doc.data['ordered'] == 'true'
+  fragment = Nokogiri::HTML(doc.output)
+  modified = false
+
+  counters = Array.new(7, 0)
+  anchor_stack = is_ordered_post ? [0] : []
+
+  main_content = fragment.at_css('section.page__content') || fragment.at_css('#main') || fragment
+
+  main_content.css('h1, h2, h3, h4, h5, h6').each do |heading|
+    # Skip structural layout headings
+    next if heading.ancestors('.toc, .page__comments, .page__related, .sidebar').any?
+
+    level = heading.name[1].to_i
+    # Exit any ordered scopes that are at the same or higher level than the current heading
+    anchor_stack.reject! { |a| a >= level }
+
+    counters[level] += 1
+    ((level + 1)..6).each { |i| counters[i] = 0 }
+
+    if anchor_stack.any?
+      active_anchor = anchor_stack.last
+      visible_counters = counters[(active_anchor + 1)..level]
+
+      if visible_counters.any?
+        number_prefix = visible_counters.join('.') + '. '
+        heading.inner_html = "<span class=\"heading-number\">#{number_prefix}</span>" + heading.inner_html
+
+        if heading['id']
+          toc_link = fragment.at_css(".toc__menu a[href='##{heading['id']}']")
+          if toc_link
+            toc_link.inner_html = "<span class=\"toc-number\">#{number_prefix}</span>" + toc_link.inner_html
+          end
+        end
+        modified = true
+      end
+    end
+
+    if heading['class'] && heading['class'].split.include?('ordered')
+      anchor_stack << level
+    end
+  end
+
+  doc.output = fragment.to_html if modified
+end

Lists

Normalizing Loose Lists

The outputs of the same list structure produced from Org and Markdown differ slightly, as the “Diff” comments in the examples below indicate. In Markdown, when an item has sub-content such as a literal block, a sub-list, or a paragraph, its leading text is enclosed in <p>, while in Org it is not. How can Org lists be made to follow the same pattern in the output?

  • Org list:

    1. First item
    
       #+begin_src text
       Hello world!
       #+end_src
    
    2. Second item
       1. Sub-item A (Indented 3 spaces)
    
          Some text
    
          1. Sub-item A (Indented 3 spaces)
          2. Sub-item A (Indented 3 spaces)
             1. Sub-item A (Indented 3 spaces)
             2. Sub-item A (Indented 3 spaces)
          3. Sub-item A (Indented 3 spaces)
       2. Sub-item B
          * Deeply nested bullet (Indented another 3 spaces)
    3. Third item
  • Org list output:

    <ol>
      <li>First item <!-- Diff 1: No <p> around the content -->
        <div class="language-text highlighter-rouge">
          <div class="highlight">
            <pre class="highlight">
              <code>Hello world!</code>
            </pre>
          </div>
        </div>
      </li>
      <li>Second item
        <ol>
          <li>Sub-item A (Indented 3 spaces) <!-- Diff 2: No <p> around the content -->
            <p>Some text</p>
            <ol>
              <li>Sub-item A (Indented 3 spaces)</li>
              <li>Sub-item A (Indented 3 spaces)
                <ol>
                  <li>Sub-item A (Indented 3 spaces)</li>
                  <li>Sub-item A (Indented 3 spaces)</li>
                </ol>
              </li>
              <li>Sub-item A (Indented 3 spaces)</li>
            </ol>
          </li>
          <li>Sub-item B <!-- Diff 3: No <p> around the content -->
            <ul>
              <li>Deeply nested bullet (Indented another 3 spaces)</li>
            </ul>
          </li>
        </ol>
      </li>
      <li>Third item</li>
    </ol>
  • Markdown list:

    1. First item
    
       ``` text
       Hello world!
       ```
    
    2. Second item
       1. Sub-item A (Indented 3 spaces)
    
          Some text
    
          1. Sub-item A (Indented 3 spaces)
          1. Sub-item A (Indented 3 spaces)
             1. Sub-item A (Indented 3 spaces)
             1. Sub-item A (Indented 3 spaces)
          1. Sub-item A (Indented 3 spaces)
       2. Sub-item B
          * Deeply nested bullet (Indented another 3 spaces)
    3. Third item
  • Markdown list output:

    <ol>
      <li>
        <p>First item</p>
        <div class="language-text highlighter-rouge">
          <div class="highlight">
            <pre class="highlight">
              <code>Hello world!</code>
            </pre>
          </div>
        </div>
      </li>
      <li>Second item
        <ol>
          <li>
            <p>Sub-item A (Indented 3 spaces)</p>
            <p>Some text</p>
            <ol>
              <li>Sub-item A (Indented 3 spaces)</li>
              <li>Sub-item A (Indented 3 spaces)
                <ol>
                  <li>Sub-item A (Indented 3 spaces)</li>
                  <li>Sub-item A (Indented 3 spaces)</li>
                </ol>
              </li>
              <li>Sub-item A (Indented 3 spaces)</li>
            </ol>
          </li>
          <li>
            <p>Sub-item B</p>
            <ul>
              <li>Deeply nested bullet (Indented another 3 spaces)</li>
            </ul>
          </li>
        </ol>
      </li>
      <li>Third item</li>
    </ol>

The differences originate from how Kramdown (Markdown) and org-ruby process “loose” lists.

In Markdown, when a list item contains block-level elements (like a nested list, a code block, or a paragraph) or is separated by blank lines, Kramdown considers it a “loose” list item and actively wraps its leading, raw text into a <p> tag. org-ruby, however, natively dumps the raw text directly into the <li> element as an inline node before appending the block-level children.

To force your Org files to mimic Kramdown’s exact wrapping behavior, we can add a logic step right at the end of your HTML processing pipeline. Nokogiri will scan every <li>, check if it contains any block-level child elements, and if it does, it will gather all the leading text up to that block and neatly wrap it in a <p> tag.

--- _plugins/org_converter.rb
+++ _plugins/org_converter.rb
@@ -134,6 +134,28 @@
           target['class'] = (existing_classes + class_names).uniq.join(' ')
         end
       end

+      # Process complex list items to wrap leading text in <p>
+      block_tags = %w[p div ul ol blockquote pre table dl figure h1 h2 h3 h4 h5 h6 hr]
+      doc.css('li').each do |li|
+        first_block = li.children.find { |c| c.element? && block_tags.include?(c.name.downcase) }
+
+        if first_block
+          leading_nodes = []
+          li.children.each do |child|
+            break if child == first_block
+            leading_nodes << child
+          end
+
+          # Check if there is actual inline content/text to wrap
+          has_content = leading_nodes.any? do |n|
+            (n.text? && !n.text.strip.empty?) || (n.element? && n.name.downcase != 'br')
+          end
+
+          if has_content
+            p_node = Nokogiri::XML::Node.new('p', doc)
+            first_block.add_previous_sibling(p_node)
+            leading_nodes.each { |n| p_node.add_child(n) }
+          end
+        end
+      end
+
       doc.to_html
     end
   end
  1. Resolve the (potential) conflict between link_abbrs and #+LINK?

    There is no conflict. Because the custom _plugins/link_abbr.rb plugin runs in a :pre_render hook, it rewrites the raw text before org-ruby touches the file. By the time org-ruby compiles the HTML, the plugin has already replaced [[foo:image.jpg]] with [[/assets/.../image.jpg]], so org-ruby sees an ordinary file link, renders it, and ignores the now-unused #+LINK: foo directive at the top of the file.

  2. Add Org Syntax Support to Your Plugin

    Adding support for [[foo:filename]] is straightforward, but it introduces an edge case in the front matter: unquoted brackets break YAML. If you type - [[foo:image.jpg]] in YAML without quotes, the parser interprets [ as the start of an array and turns the value into a nested Ruby array [["foo:image.jpg"]], so any code that expects a string fails.

    The code handles Org-style [[...]] links in the body content and title fields, and adds a safety net that recovers the string from the front matter even when the YAML parser has turned the unquoted brackets into an array.
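
The safety net for the front matter could look roughly like the sketch below. The plugin code is not shown in this post, so restore_org_link is a hypothetical name, and the only assumption about the input is the shape described above: a one-element array wrapping a one-element array that holds the link string.

```ruby
# Hypothetical helper: undo the nesting that results from unquoted
# [[...]] in the front matter. Anything that does not look exactly like
# [["some string"]] is returned unchanged.
def restore_org_link(value)
  nested = value.is_a?(Array) && value.size == 1 &&
           value[0].is_a?(Array) && value[0].size == 1 &&
           value[0][0].is_a?(String)
  nested ? "[[#{value[0][0]}]]" : value
end

p restore_org_link([['foo:image.jpg']]) # => "[[foo:image.jpg]]"
p restore_org_link('[[foo:image.jpg]]') # => "[[foo:image.jpg]]"
p restore_org_link(%w[a.jpg b.jpg])     # => ["a.jpg", "b.jpg"]
```

Quoting the value in the front matter avoids the problem altogether; a check like this only matters when the quotes are forgotten.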


Notices

In Markdown files you can write notices such as:

**Watch out!** This paragraph of text has been [emphasized](#) with the `{: .notice--warning}` class.
{: .notice--warning}

Because Jekyll processes Org files into HTML using org-ruby before Kramdown can parse them, Kramdown’s block attribute syntax like {: .notice--warning} will not work.

To achieve the exact same result in org-ruby while still being able to use standard Org formatting (like bold, lists, and links) inside the notice, you must inject the raw HTML wrapper directly using the #+html: directive.

#+html: <div class="notice--warning">
*Watch out!*
This is a warning notice inside an Org file.
#+html: </div>

org-ruby passes the lines starting with #+html: straight to the output without modification. Because you leave empty lines between the HTML tags and your text, org-ruby will still evaluate the text in the middle as standard Org syntax, converting Watch out! into bold tags and wrapping the lines in paragraph tags.

However, this is tedious compared with the Markdown syntax. It would be much better if we could mimic Kramdown's attribute syntax in Org files.

Because you are already utilizing Nokogiri in your Jekyll pipeline, we can create a lightweight parser that finds the exact {: .classname } text, strips it out, and natively injects the CSS classes into the correct HTML tags.

modified   _plugins/org_converter.rb
@@ -84,6 +84,39 @@ module Jekyll
         pre.replace(new_node)
       end

+      # Process Kramdown-style attribute lists {: .class1 .class2 }
+      doc.xpath('.//text()[contains(., "{:")]').each do |node|
+        # Skip if the syntax is inside a literal code block
+        next if node.ancestors('pre, code').any?
+
+        if node.content =~ /\{:\s*((?:\.[a-zA-Z0-9_\-–—]+\s*)+)\}/
+          raw_classes = $1
+          normalized_classes = raw_classes.gsub(/[–—]/, '--')
+          class_names = normalized_classes.scan(/\.([a-zA-Z0-9_-]+)/).flatten
+
+          # Strip the syntax from the text node
+          node.content = node.content.sub(/\{:\s*(?:\.[a-zA-Z0-9_\-–—]+\s*)+\}/, '')
+
+          parent = node.parent
+          target = parent
+
+          # Clean up trailing <br> if the syntax was on a new line
+          if node.content.strip.empty? && node.previous_sibling && node.previous_sibling.name == 'br'
+            node.previous_sibling.remove
+          end
+
+          # If removing the syntax leaves the block entirely empty, it targets the previous element
+          if parent.name == 'p' && parent.text.strip.empty? && parent.children.all? { |c| c.name == 'text' || c.name == 'br' }
+            target = parent.previous_element || parent
+            parent.remove if target != parent
+          end
+
+          # Apply the classes safely without overwriting existing ones
+          existing_classes = target['class'] ? target['class'].split(' ') : []
+          target['class'] = (existing_classes + class_names).uniq.join(' ')
+        end
+      end
+
       doc.to_html
     end
   end
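
The class extraction at the heart of that step can be tried in isolation. The sketch below applies the same regular expression to a plain string instead of a Nokogiri text node; extract_classes is a hypothetical name. It assumes the en and em dashes are accepted because typographic conversion may already have turned -- into a dash by the time this code runs.

```ruby
# Same pattern as in the converter: {: .class1 .class2 }
ATTR_LIST = /\{:\s*((?:\.[a-zA-Z0-9_\-–—]+\s*)+)\}/

# Hypothetical helper: returns [text without the attribute list, class names].
def extract_classes(text)
  match = text.match(ATTR_LIST)
  return [text, []] unless match

  # Map en/em dashes back to "--" before reading the class names.
  classes = match[1].gsub(/[–—]/, '--').scan(/\.([a-zA-Z0-9_-]+)/).flatten
  [text.sub(ATTR_LIST, '').strip, classes]
end

p extract_classes('{: .notice--warning}')
# => ["", ["notice--warning"]]
p extract_classes('Watch out! {: .notice–warning .ordered }')
# => ["Watch out!", ["notice--warning", "ordered"]]
```

An empty remaining text corresponds to the case in the converter where the attribute line stands alone and the classes go to the previous element.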

Newlines in Body Text

org-ruby and standard Markdown engines preserve newlines during HTML generation for a specific reason: to maintain the readability of the generated HTML source code. By HTML design, browsers interpret any newline in the source code as a single space. While this works perfectly for languages like English that use spaces to separate words, it creates unwanted, unnatural gaps in CJK (Chinese, Japanese, Korean) texts where words are not separated by spaces.

Here are some examples:

English:

Watch out! This is a warning notice inside an Org file. Watch out! This is a warning notice inside an Org file. Watch out! This is a warning notice inside an Org file.

Source:

Watch out! This is a warning notice inside an Org
file. Watch out! This is a warning notice inside
an Org file. Watch out! This is a warning notice
inside an Org file.

Output:

<p>Watch out! This is a warning notice inside an Org
  file. Watch out! This is a warning notice inside
  an Org file. Watch out! This is a warning notice
  inside an Org file.</p>

Japanese:

本日はお時間をいただき、ありがとうございます。私はマカオにあるマカオ理工大学を4年制の学士課程で卒業しました。

Source:

本日はお時間をいただき、ありがとうございます。私は
マカオにあるマカオ理工大学を4年制の学士課程で卒業
しました。

Output:

<p>本日はお時間をいただき、ありがとうございます。私は
  マカオにあるマカオ理工大学を4年制の学士課程で卒業
  しました。</p>

The most efficient and safe approach to fix this is to clean up the text using Nokogiri right before outputting the final HTML. We can implement logic to completely remove newlines (and any surrounding whitespace) only when they are sandwiched between CJK characters, while safely converting all other newlines into a single space so English words don’t get squashed together.

Add the following code to _plugins/org_converter.rb right before the doc.to_html call:

modified   _plugins/org_converter.rb
@@ -242,6 +242,21 @@ module Jekyll
         end
       end

+      # Newline cleanup (Remove extra spaces between CJK characters)
+      cjk = '\p{Han}\p{Hiragana}\p{Katakana}ー、。！？「」『』（）【】，．：；'
+      doc.xpath('.//text()[not(ancestor::pre or ancestor::code)]').each do |node|
+        content = node.content
+        next unless content.match?(/[\r\n]/)
+
+        # 1. Completely remove newlines and spaces between CJK characters (uses
+        # lookahead to safely handle overlapping lines)
+        content = content.gsub(/([#{cjk}])\s*[\r\n]+\s*(?=[#{cjk}])/o, '\1')
+        # 2. Safely convert remaining newlines to a single space
+        content = content.gsub(/\s*[\r\n]+\s*/, ' ')
+
+        node.content = content if content != node.content
+      end
+
       doc.to_html
     end
   end

Why this implementation works:

  • Newline check in Ruby: The XPath expression only filters out pre and code ancestors with [not(ancestor::pre or ancestor::code)], while the newline test next unless content.match?(/[\r\n]/) runs in Ruby. This avoids having to write a newline inside an XPath string literal, which has no escape sequences, so every text node that contains a newline is caught.
  • Positive Lookahead (?=...): In a sentence spanning 3 lines, the lookahead (?=[#{cjk}]) asserts that a CJK character exists after the newline without “consuming” it, so the same character can also start the match for the next line break.
  • Cross-Platform Newlines: Using [\r\n]+ makes it work regardless of whether your files were saved with Windows (\r\n) or Unix (\n) line endings.
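
The two substitutions can be checked on plain strings without Nokogiri. In this sketch, join_wrapped_lines is a hypothetical name. The character class is written as a single-quoted string so that \p{...} reaches the regex engine intact, and the replacement is '\1' so that the character before the line break is kept.

```ruby
# CJK scripts plus common fullwidth punctuation (single-quoted on purpose).
CJK = '\p{Han}\p{Hiragana}\p{Katakana}ー、。！？「」『』（）【】，．：；'

# Hypothetical helper: join hard-wrapped lines of a paragraph.
def join_wrapped_lines(text)
  text
    # 1. Between two CJK characters: drop the line break, keep the character.
    .gsub(/([#{CJK}])\s*[\r\n]+\s*(?=[#{CJK}])/, '\1')
    # 2. Everywhere else: a line break becomes a single space.
    .gsub(/\s*[\r\n]+\s*/, ' ')
end

puts join_wrapped_lines("私は\nマカオにある")      # 私はマカオにある
puts join_wrapped_lines("inside an Org\nfile.") # inside an Org file.
puts join_wrapped_lines("日本語\nEnglish")        # 日本語 English
```

A line break between a CJK character and a Latin word still becomes a space, which is usually what you want for mixed text.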

Furigana

Here’s an edge case for the furigana handling code. The 3rd and 5th “理工大学” aren’t parsed or matched correctly. What they have in common is that both are broken by a newline with the | character leading the next line, which gets confused with the Org table syntax.

本日(ほんじつ)
本|日(ほん|じつ)
理工大学(りこうだいがく)
理|工|
大|学(り|こう|だい|がく)
理|工|大
|学(り|こう|だい|がく)
理|工|大|学(り|
こう|だい|がく)
理|工|大|学(り
|こう|だい|がく)

本日ほんじつ本|日(ほん|じつ)理工大学りこうだいがく理|工| 大|学(り|こう|だい|がく)理|工|大

学(り こう だい がく)

理|工|大|学(り| こう|だい|がく)理|工|大|学(り

こう だい がく)

Changes:

git diff 1cb2b19..204e959 -- _plugins/org_converter.rb
modified   _plugins/org_converter.rb
@@ -1,6 +1,10 @@
 require 'nokogiri'
 require 'rouge'
 
+module Jekyll
+  FURIGANA_REGEX = /((?:\p{Han}|々|\|)(?:(?:\p{Han}|々|\||\s)*(?:\p{Han}|々|\|))?)\s*[（(]([ぁ-んァ-ヶー|\s]+)[）)]/
+end
+
 Jekyll::Hooks.register [:pages, :documents], :pre_render do |doc|
   # Enable Liquid syntax
   if doc.extname.downcase == '.org'
@@ -21,6 +25,15 @@ Jekyll::Hooks.register [:pages, :documents], :pre_render do |doc|
       $1 ? "\n#+BEGIN_HTML\n#{$1}\n#+END_HTML\n" : match
     end
 
+    # Pre-process furigana to strip newlines and prevent org-ruby table corruption
+    doc.content = doc.content.gsub(/#{block_regex}|#{raw_regex}|#{Jekyll::FURIGANA_REGEX}/) do |match|
+      if match.match?(/\A[ \t]*(?:#|\{%)/)
+        match
+      else
+        match.gsub(/[\r\n]+/, '')
+      end
+    end
+
     inline_code = /[=~][^=~\n]+[=~]/
     markdown_link = /(?<!\!)\[([^\]]+)\]\(([^)]+)\)/
 
@@ -213,6 +226,49 @@ module Jekyll
         end
       end
 
+      # Furigana handling
+      # 1. Use XPath to filter out pre/code ancestors at the C-level (libxml2) for maximum speed.
+      # 2. Fast pre-filter using 'contains' so Ruby only processes nodes that actually have parentheses.
+      target_nodes = doc.xpath('.//text()[not(ancestor::pre or ancestor::code) and (contains(., "(") or contains(., "（"))]')
+      target_nodes.each do |node|
+        content = node.content
+        if content.match?(Jekyll::FURIGANA_REGEX)
+          new_html = content.gsub(Jekyll::FURIGANA_REGEX) do |match|
+            raw_base = $1
+            raw_ruby = $2
+            clean_base = raw_base.gsub(/\s+/, '')
+            clean_ruby = raw_ruby.gsub(/\s+/, '')
+            bases = clean_base.split('|')
+            rubies = clean_ruby.split('|')
+
+            if bases.length == rubies.length
+              ruby_content = bases.zip(rubies).map { |b, r| "#{b}<rt>#{r}</rt>" }.join('')
+              "<ruby>#{ruby_content}</ruby>"
+            else
+              base = clean_base.delete('|')
+              rb = clean_ruby.delete('|')
+              "<ruby>#{base}<rt>#{rb}</rt></ruby>"
+            end
+          end
+          node.replace(Nokogiri::HTML::DocumentFragment.parse(new_html)) if new_html != content
+        end
+      end
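
The conversion itself can be tested on plain strings. The sketch below is a standalone version under a few assumptions: furigana_to_ruby is a hypothetical name, the pattern has two capture groups (base text and reading), and both ASCII and fullwidth parentheses are accepted.

```ruby
FURIGANA = /((?:\p{Han}|々|\|)(?:(?:\p{Han}|々|\||\s)*(?:\p{Han}|々|\|))?)\s*[（(]([ぁ-んァ-ヶー|\s]+)[）)]/

# Hypothetical helper: replace "base(reading)" with <ruby> markup.
def furigana_to_ruby(text)
  text.gsub(FURIGANA) do
    # Copy the groups first: the gsub calls below reset $1 and $2.
    raw_base = $1
    raw_ruby = $2
    bases = raw_base.gsub(/\s+/, '').split('|')
    rubies = raw_ruby.gsub(/\s+/, '').split('|')

    if bases.length == rubies.length
      # One reading per segment.
      "<ruby>#{bases.zip(rubies).map { |b, r| "#{b}<rt>#{r}</rt>" }.join}</ruby>"
    else
      # Segment counts differ: fall back to one reading for the whole word.
      "<ruby>#{bases.join}<rt>#{rubies.join}</rt></ruby>"
    end
  end
end

puts furigana_to_ruby('本|日(ほん|じつ)')
# <ruby>本<rt>ほん</rt>日<rt>じつ</rt></ruby>
puts furigana_to_ruby('本日(ほんじつ)')
# <ruby>本日<rt>ほんじつ</rt></ruby>
```

Whitespace inside either part is removed before splitting, so a word that was wrapped across lines still pairs up segment by segment.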
