Enabling Drafting in Org Mode in Minimal Mistakes
From now on I will be able to write posts in Org Mode for this blog, besides the existing Markdown method. Here’s how I configure it. This article is written in an Org file and also serves as a demonstration of this new feature.
First, you can do everything in the original Markdown way, which is kept intact. Now you can write everything in Org Mode with the extra benefit of using front matter, Liquid syntax, figures, galleries, etc. exactly the same as in Markdown.
Here are the basic elements demonstrated:
Org Elements Examples
Headings
Custom ID
In Markdown you can write the attribute syntax {: id="your-custom-id"} to assign a custom ID to a heading. In Org you can achieve the same by using the conventional CUSTOM_ID in a property drawer.
Topdown IDs and ID Deduplication
By default, heading IDs are generated automatically by converting the heading text to a hyphenated string. Adding heading_ids: topdown to the front matter enables topdown IDs, where each heading's ID is prefixed with the IDs of all its parent headings, if any. For example,
* Heading 1
# id: heading-1
** Foo
# id: heading-1-foo
*** Bar
# id: heading-1-foo-bar
* Heading 2
# id: heading-2
** Foo
# id: heading-2-foo
*** Bar
# id: heading-2-foo-bar
At any rate, there is a duplication check during the generation. If a duplicate ID is found, a number will be appended incrementally.
However, CUSTOM_ID is never affected.
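The ID rules above can be tried outside Jekyll with a minimal sketch. It assumes headings arrive as [level, title] pairs; the names slugify and heading_ids are illustrative only and are not part of the plugin.

```ruby
# Turn a heading title into a hyphenated slug (same transformation the converter uses).
def slugify(text)
  text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
end

# headings: array of [level, title] pairs in document order.
# Returns one ID per heading, de-duplicated with a numeric suffix.
def heading_ids(headings, topdown: false)
  stack = Array.new(6) # last ID seen at each level (index = level - 1)
  seen = {}
  headings.map do |level, title|
    slug = slugify(title)
    parent = stack[0...(level - 1)].compact.last
    base = topdown && parent ? "#{parent}-#{slug}" : slug

    # Append -1, -2, ... until the ID is unique.
    id = base
    n = 1
    while seen.key?(id)
      id = "#{base}-#{n}"
      n += 1
    end
    seen[id] = true

    stack[level - 1] = id
    (level..5).each { |i| stack[i] = nil } # forget deeper levels
    id
  end
end

tree = [[1, 'Heading 1'], [2, 'Foo'], [3, 'Bar'],
        [1, 'Heading 2'], [2, 'Foo'], [3, 'Bar']]
puts heading_ids(tree, topdown: true)
puts heading_ids(tree)
```

With topdown disabled, the second Foo and Bar collide with the first ones, so they receive the suffixes foo-1 and bar-1.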
Heading Numbering
Headings can be numbered automatically. You can give numbers to all headings of a post or only the headings of a subtree:
- For all headings: Add ordered: true to the front matter of the post.
- For all headings under a subtree: Add the attribute {: .ordered} right under a heading. All of its sub-headings will be numbered. For example:
*** Headings A
{: .ordered}
**** Heading 1
***** Heading 1.1
****** Heading 1.1.1
***** Heading 1.2
**** Heading 2
***** Heading 2.1
*** Headings B
**** Heading B1
{: .ordered}
***** Heading B1.1
****** Heading B1.1.1
***** Heading B1.2
**** Heading B2
***** Heading B2.1
Output:
* Headings
** Headings A
1. Heading 1
1.1. Heading 1.1
1.1.1. Heading 1.1.1
1.2. Heading 1.2
2. Heading 2
2.1. Heading 2.1
** Headings B
*** Heading 1
1. Heading 1.1
1.1. Heading 1.1.1
2. Heading 1.2
*** Heading 2
**** Heading 2.1
If a heading is marked with the {: .ordered} attribute, the numbering for its sub-headings should always start from 1 regardless of its level in the document or its position relative to its siblings.
If both ordered: true and {: .ordered} are present for a heading, the rules of the latter should override the former.
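The numbering rules can be sketched in plain Ruby without touching any HTML. This is an illustration under assumptions: headings are given as [level, title, ordered] triples, and number_headings is a hypothetical name, not a function of the plugin.

```ruby
# headings: array of [level, title, ordered] in document order, where
# `ordered` is truthy when the heading carries {: .ordered}.
# ordered_post mirrors `ordered: true` in the front matter.
def number_headings(headings, ordered_post: false)
  counters = Array.new(7, 0)          # counters[level], levels 1..6
  anchors = ordered_post ? [0] : []   # levels that opened an ordered scope
  headings.map do |level, title, ordered|
    # Leave every ordered scope opened at this level or deeper.
    anchors.reject! { |a| a >= level }

    counters[level] += 1
    ((level + 1)..6).each { |i| counters[i] = 0 }

    # Only counters below the innermost ordered scope are shown,
    # so numbering restarts from 1 under each {: .ordered} heading.
    visible = anchors.empty? ? [] : counters[(anchors.last + 1)..level]
    anchors << level if ordered

    visible.empty? ? title : "#{visible.join('.')}. #{title}"
  end
end

puts number_headings([[1, 'A', true], [2, 'One'], [3, 'Sub'],
                      [2, 'Two'], [1, 'B'], [2, 'Plain']])
```

The heading that carries the attribute is not numbered itself; only its descendants are, which matches the output shown above.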
Heading A
1. Heading 1
1.1. Heading 1.1
Heading 1.1.1
1.2. Heading 1.2
2. Heading 2
2.1. Heading 2.1
Heading B
Heading B1
1. Heading B1.1
Heading B1.1.1
2. Heading B1.2
Heading B2
Heading B2.1
Tables
Here is the morphological breakdown:
| Component | Origin | Meaning |
|---|---|---|
| Erythro- | Greek (erythros) | Red |
| -xyl- | Greek (xylon) | Wood |
| -aceae | Latin (Suffix) | Belonging to the family of |
Lists
An ordered list with multiple levels:
1. First item
   Hello world!
2. Second item
   1. Sub-item A (Indented 3 spaces)
      Some text
      1. Sub-item A (Indented 3 spaces)
      2. Sub-item A (Indented 3 spaces)
         1. Sub-item A (Indented 3 spaces)
         2. Sub-item A (Indented 3 spaces)
      3. Sub-item A (Indented 3 spaces)
   2. Sub-item B
      - Deeply nested bullet (Indented another 3 spaces)
      - Deeply nested bullet (Indented another 3 spaces)
3. Third item
Images
A simple image:

A figure:

A gallery (with customization):
Code highlighting
Inline codes: print, hello
Using : to start a line of code:
descriptor + subject + taxonomic rank
Using structure template (code block):
def print_hi(name)
  puts "Hi, #{name}"
end
print_hi('Tom')
#=> prints 'Hi, Tom' to STDOUT.
Text block:
Hello
{% include gallery columns=6 caption="My custom gallery" %}
{% include gallery columns=6 caption="My custom gallery" %}
Notices
{: .notice--warning}
*Watch out!* This is a warning notice inside an Org file.
Watch out! This is a warning notice inside an Org file.
Watch out! This is a warning notice inside an Org file.
Watch out! This is a warning notice inside an Org file.
Watch out! This is a warning notice inside an Org file.
Watch out! This is a warning notice inside an Org file.
Watch out! This is a warning notice inside an Org file.
Furigana
You can write furigana for Japanese Kanji words like the following:
本|日(ほん|じつ)はお時|間(じ|かん)をいただき、ありがとうございます。私はマカ
オにあるマカオ理|工|大|学(り|こう|だい|がく)を4年|制(ねん|せい)の学|士(が
く|し)課|程(か|てい)で卒業(そつ|ぎょう)しました。
本|日(ほん|じつ)はお時|間(じ|かん)をいただき、ありがとうございます。私はマカオにあるマカオ理|工|大|学(り|こう|だい|がく)を4年|制(ねん|せい)の学|士(がく|し)課|程(か|てい)で卒業(そつ|ぎょう)しました。
Configurations
Basic Converter
Here’s how I did the configurations.
- Add org-ruby to your Gemfile:
group :jekyll_plugins do
  ...
  gem "org-ruby"
end
- Write a new plugin: _plugins/org_converter.rb
require 'nokogiri'
require 'rouge'

Jekyll::Hooks.register [:pages, :documents], :pre_render do |doc|
  # Enable Liquid syntax
  if doc.extname.downcase == '.org'
    # Enable {% raw %}...{% endraw %}
    @@LIT_2@@

    # Enable figures, galleries, and links to posts (skip inside code or raw blocks)
    doc.content = doc.content.gsub(/#{block_regex}|#{raw_regex}|#{include_regex}/) do |match|
      $1 ? "\n#+BEGIN_HTML\n#{$1}\n#+END_HTML\n" : match
    end

    inline_code = /[=~][^=~\n]+[=~]/
    markdown_link = /(?<!\!)\[([^\]]+)\]\(([^)]+)\)/
    doc.content = doc.content.gsub(/#{block_regex}|^[ \t]*:[^\n]*$|#{inline_code}|#{markdown_link}/) do |match|
      match.start_with?('[') && !match.start_with?('[[') ? "[[#{$2}][#{$1}]]" : match
    end
  end
end

module Jekyll
  class OrgConverter < Converter
    safe true
    priority :low

    def matches(ext)
      ext =~ /^\.org$/i
    end

    def output_ext(ext)
      ".html"
    end

    def convert(content)
      require 'org-ruby'
      html = Orgmode::Parser.new(content).to_html
      doc = Nokogiri::HTML.fragment(html)
      doc.css('h1, h2, h3, h4, h5, h6').each do |node|
        node['id'] = node.text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
      end

      # Add <thead> to tables
      doc.css('table').each do |table|
        trs = table.xpath('./tr')
        next if trs.empty?
        if trs.first.at_xpath('./th')
          thead = Nokogiri::XML::Node.new('thead', doc)
          thead.add_child(trs.first)
          tbody = Nokogiri::XML::Node.new('tbody', doc)
          trs[1..-1].each { |tr| tbody.add_child(tr) }
          table.add_child(thead)
          table.add_child(tbody)
        else
          tbody = Nokogiri::XML::Node.new('tbody', doc)
          trs.each { |tr| tbody.add_child(tr) }
          table.add_child(tbody)
        end
      end

      # Process code blocks to enable code highlighting
      doc.css('pre.src[lang]').each do |pre|
        lang = pre['lang']
        lexer = Rouge::Lexer.find_fancy(lang) || Rouge::Lexers::PlainText
        formatter = Rouge::Formatters::HTML.new
        # .sub(/\A[\r\n]+/, '') targets the absolute beginning of the string
        # (\A) and removes any leading line breaks or carriage returns.
        # .sub(/\s+\z/, '') targets the absolute end of the string (\z) and
        # removes any trailing whitespace, including empty lines and spaces.
        code_text = pre.text.sub(/\A[\r\n]+/, '').sub(/\s+\z/, '')
        highlighted = formatter.format(lexer.lex(code_text))
        new_node = Nokogiri::HTML.fragment(%Q{
          <div class="language-#{lang} highlighter-rouge">
            <div class="highlight"><pre class="highlight"><code>#{highlighted}</code></pre></div>
          </div>
        })
        pre.replace(new_node)
      end

      doc.to_html
    end
  end
end
- Install the gems with bundler and run:
bundle install
bundle exec jekyll serve
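One step of the :pre_render hook, rewriting Markdown links into Org links, can be tried in isolation. The sketch below reuses the markdown_link pattern from the plugin; the helper name markdown_links_to_org is hypothetical, and unlike the plugin it does not skip code blocks or inline code.

```ruby
# Same pattern as markdown_link in the plugin: [text](url), but not ![alt](src).
MARKDOWN_LINK = /(?<!\!)\[([^\]]+)\]\(([^)]+)\)/

# Rewrite every Markdown link in `text` as an Org link [[url][text]].
def markdown_links_to_org(text)
  text.gsub(MARKDOWN_LINK) do
    "[[#{Regexp.last_match(2)}][#{Regexp.last_match(1)}]]"
  end
end

puts markdown_links_to_org('See [the docs](https://example.com/docs).')
puts markdown_links_to_org('![alt](/a.png)') # images are left alone
```

The negative lookbehind keeps Markdown image syntax untouched, so images can still be handled by a separate rule.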
Headings
Custom ID
In Markdown you can write {: id="20260116203517"} to assign a custom ID to a heading. In Org, the equivalent is the CUSTOM_ID property in a property drawer. By default, org-ruby completely drops :PROPERTIES: drawers when converting to HTML, which is why your CUSTOM_ID values disappear before Nokogiri even has a chance to see them. Additionally, the current org_converter.rb script overwrites every heading's ID with a hyphenated slug.
The fastest, most efficient way to make this work is to pre-process the Org document line-by-line right before passing it to org-ruby. We can detect the :CUSTOM_ID: inside the drawer, temporarily “smuggle” it directly into the heading’s text as a special marker (%%CUSTOM_ID:...%%), and then have your Nokogiri loop extract it and safely assign it as the real HTML ID.
Here are the lines to change in _plugins/org_converter.rb:
modified _plugins/org_converter.rb
@@ -36,12 +36,42 @@ module Jekyll
     end

     def convert(content)
+      # Parse CUSTOM_ID properties and inject a temporary marker
+      lines = content.lines
+      lines.each_with_index do |line, index|
+        if line =~ /^\*+[ \t]+/
+          j = index + 1
+          if j < lines.length && lines[j] =~ /^[ \t]*:PROPERTIES:[ \t]*$/i
+            k = j + 1
+            custom_id = nil
+            while k < lines.length && lines[k] !~ /^[ \t]*:END:[ \t]*$/i && lines[k] !~ /^\*+[ \t]+/
+              if lines[k] =~ /^[ \t]*:CUSTOM_ID:[ \t]+(\S+)/i
+                custom_id = $1
+              end
+              k += 1
+            end
+            if custom_id && lines[k] =~ /^[ \t]*:END:[ \t]*$/i
+              lines[index] = lines[index].chomp + " %%CUSTOM_ID:#{custom_id}%%\n"
+            end
+          end
+        end
+      end
+      content = lines.join
+
       require 'org-ruby'
       html = Orgmode::Parser.new(content).to_html
       doc = Nokogiri::HTML.fragment(html)
       doc.css('h1, h2, h3, h4, h5, h6').each do |node|
-        node['id'] = node.text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
+        if node.content =~ /%%CUSTOM_ID:(\S+)%%/
+          node['id'] = $1
+          # Clean the marker out of all text nodes inside this heading
+          node.xpath('.//text()').each do |t|
+            t.content = t.content.gsub(/\s*%%CUSTOM_ID:\S+%%/, '')
+          end
+        else
+          node['id'] = node.text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
+        end
       end

       # Add <thead> to tables
Topdown IDs and ID Deduplication
To achieve this, we need to extract the heading_ids: topdown preference from the front matter and pass it into the org_converter.rb pipeline. Because the convert(content) method strictly receives the raw string and does not have native access to the front matter variables, the most efficient approach is to inject a temporary marker (%%HEADING_IDS:topdown%%) into the document during the :pre_render hook, and then seamlessly extract it inside the converter before passing the content to org-ruby.
We can simultaneously implement a stack-based hierarchy tracking array and an ID tracking hash to guarantee uniqueness and concatenation.
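The marker handshake between the hook and the converter can be sketched on plain strings. The helper names inject_marker and extract_marker are hypothetical; only the marker text and the regular expression come from the plugin.

```ruby
MARKER = "%%HEADING_IDS:topdown%%\n"

# What the :pre_render hook does: prepend the marker when the
# front matter asks for topdown IDs.
def inject_marker(content, front_matter)
  front_matter['heading_ids'] == 'topdown' ? MARKER + content : content
end

# What convert(content) does: strip the marker again and remember
# whether it was present. Returns [topdown_flag, cleaned_content].
def extract_marker(content)
  body = content.dup
  # String#sub! returns nil when nothing was replaced.
  topdown = !body.sub!(/\A%%HEADING_IDS:topdown%%\r?\n/, '').nil?
  [topdown, body]
end

flag, body = extract_marker(inject_marker("* Heading\n", 'heading_ids' => 'topdown'))
puts flag
puts body
```

Because the marker is anchored with \A, it is only recognized at the very start of the content, so the same text appearing later in a post body is left untouched.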
Here are the exact diffs to update the _plugins/org_converter.rb file.
- Inject the configuration marker
Add the injection logic to your :pre_render hook:
--- _plugins/org_converter.rb
+++ _plugins/org_converter.rb
@@ -4,6 +4,10 @@ Jekyll::Hooks.register [:pages, :documents], :pre_render do |doc|
   # Enable Liquid syntax
   if doc.extname.downcase == '.org'
+    if doc.data['heading_ids'] == 'topdown'
+      doc.content = "%%HEADING_IDS:topdown%%\n" + doc.content
+    end
+
     # Enable {% raw %}...{% endraw %}
     doc.content = doc.content.gsub(/^[ \t]*(\{%[ \t]*raw[ \t]*%\})[ \t]*\n?/, '')
     doc.content = doc.content.gsub(/^[ \t]*(\{%[ \t]*endraw[ \t]*%\})[ \t]*\n?/, '')
- Implement Topdown and Duplicate Logic
Update the convert function to intercept the marker and apply the new logic dynamically:
modified _plugins/org_converter.rb
@@ -40,6 +44,11 @@ module Jekyll
     end

     def convert(content)
+      topdown = false
+      if content.sub!(/\A%%HEADING_IDS:topdown%%\r?\n/, '')
+        topdown = true
+      end
+
       # Parse CUSTOM_ID properties and inject a temporary marker
       lines = content.lines
       lines.each_with_index do |line, index|
@@ -66,16 +75,44 @@ module Jekyll
       html = Orgmode::Parser.new(content).to_html
       doc = Nokogiri::HTML.fragment(html)
+      id_stack = Array.new(6)
+      seen_ids = {}
+
       doc.css('h1, h2, h3, h4, h5, h6').each do |node|
+        level = node.name[1].to_i
+        custom_id = nil
+
         if node.content =~ /%%CUSTOM_ID:(\S+)%%/
-          node['id'] = $1
+          custom_id = $1
           # Clean the marker out of all text nodes inside this heading
           node.xpath('.//text()').each do |t|
             t.content = t.content.gsub(/\s*%%CUSTOM_ID:\S+%%/, '')
           end
+        end
+
+        slug = node.text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
+        slug = 'section' if slug.empty?
+
+        if custom_id
+          base_id = custom_id
+        elsif topdown
+          parent_id = id_stack[0...level-1].compact.last
+          base_id = parent_id ? "#{parent_id}-#{slug}" : slug
         else
-          node['id'] = node.text.downcase.strip.gsub(/[^a-z0-9\s-]/, '').gsub(/\s+/, '-')
+          base_id = slug
         end
+
+        final_id = base_id
+        count = 1
+        while seen_ids.key?(final_id)
+          final_id = "#{base_id}-#{count}"
+          count += 1
+        end
+        seen_ids[final_id] = true
+
+        node['id'] = final_id
+        id_stack[level - 1] = final_id
+        (level..5).each { |i| id_stack[i] = nil }
       end

       # Add <thead> to tables
Heading Numbering
To make this work seamlessly, we must append a :post_render hook to the bottom of org_converter.rb. This is structurally necessary for two reasons:
- Front Matter Access: Jekyll strips front matter (ordered: true) before passing the text to the convert function, so org_converter cannot natively see it.
- TOC Availability: Minimal Mistakes injects the Table of Contents via Liquid templates during the layout phase. If we try to inject numbers inside the convert step, the TOC doesn’t exist in the HTML yet, meaning it would remain unnumbered.
By running this logic in :post_render, we have full access to both the front matter and the fully generated layout (including the TOC).
Append these lines to the very end of the _plugins/org_converter.rb file:
modified _plugins/org_converter.rb
@@ -151,3 +151,53 @@ module Jekyll
     end
   end
 end
+
+Jekyll::Hooks.register [:pages, :documents], :post_render do |doc|
+  # Only process Org Mode files
+  next unless doc.extname.downcase == '.org'
+
+  is_ordered_post = doc.data['ordered'] == true || doc.data['ordered'] == 'true'
+  fragment = Nokogiri::HTML(doc.output)
+  modified = false
+
+  counters = Array.new(7, 0)
+  anchor_stack = is_ordered_post ? [0] : []
+
+  main_content = fragment.at_css('section.page__content') || fragment.at_css('#main') || fragment
+
+  main_content.css('h1, h2, h3, h4, h5, h6').each do |heading|
+    # Skip structural layout headings
+    next if heading.ancestors('.toc, .page__comments, .page__related, .sidebar').any?
+
+    level = heading.name[1].to_i
+    # Exit any ordered scopes that are at the same or higher level than the current heading
+    anchor_stack.reject! { |a| a >= level }
+
+    counters[level] += 1
+    ((level + 1)..6).each { |i| counters[i] = 0 }
+
+    if anchor_stack.any?
+      active_anchor = anchor_stack.last
+      visible_counters = counters[(active_anchor + 1)..level]
+
+      if visible_counters.any?
+        number_prefix = visible_counters.join('.') + '. '
+        heading.inner_html = "<span class=\"heading-number\">#{number_prefix}</span>" + heading.inner_html
+
+        if heading['id']
+          toc_link = fragment.at_css(".toc__menu a[href='##{heading['id']}']")
+          if toc_link
+            toc_link.inner_html = "<span class=\"toc-number\">#{number_prefix}</span>" + toc_link.inner_html
+          end
+        end
+        modified = true
+      end
+    end
+
+    if heading['class'] && heading['class'].split.include?('ordered')
+      anchor_stack << level
+    end
+  end
+
+  doc.output = fragment.to_html if modified
+end
Lists
Normalizing Loose Lists
The outputs of the same list structure produced from Org and Markdown are slightly different, as indicated by the “Diff” comments in the examples below. In Markdown, when an item has sub-contents such as a literal block, a sub-list, or a paragraph, its leading text is enclosed in <p>, while in Org this doesn’t happen. How can Org lists be made to follow the same pattern in the output?
- Org list:
1. First item
   #+begin_src text
   Hello world!
   #+end_src
2. Second item
   1. Sub-item A (Indented 3 spaces)

      Some text
      1. Sub-item A (Indented 3 spaces)
      2. Sub-item A (Indented 3 spaces)
         1. Sub-item A (Indented 3 spaces)
         2. Sub-item A (Indented 3 spaces)
      3. Sub-item A (Indented 3 spaces)
   2. Sub-item B
      * Deeply nested bullet (Indented another 3 spaces)
3. Third item
- Org list output:
<ol>
  <li>First item <!-- Diff 1: No <p> around the content -->
    <div class="language-text highlighter-rouge">
      <div class="highlight">
        <pre class="highlight">
          <code>Hello world!</code>
        </pre>
      </div>
    </div>
  </li>
  <li>Second item
    <ol>
      <li>Sub-item A (Indented 3 spaces) <!-- Diff 2: No <p> around the content -->
        <p>Some text</p>
        <ol>
          <li>Sub-item A (Indented 3 spaces)</li>
          <li>Sub-item A (Indented 3 spaces)
            <ol>
              <li>Sub-item A (Indented 3 spaces)</li>
              <li>Sub-item A (Indented 3 spaces)</li>
            </ol>
          </li>
          <li>Sub-item A (Indented 3 spaces)</li>
        </ol>
      </li>
      <li>Sub-item B <!-- Diff 3: No <p> around the content -->
        <ul>
          <li>Deeply nested bullet (Indented another 3 spaces)</li>
        </ul>
      </li>
    </ol>
  </li>
  <li>Third item</li>
</ol>
- Markdown list:
1. First item
   ``` text
   Hello world!
   ```
2. Second item
   1. Sub-item A (Indented 3 spaces)

      Some text
      1. Sub-item A (Indented 3 spaces)
      1. Sub-item A (Indented 3 spaces)
         1. Sub-item A (Indented 3 spaces)
         1. Sub-item A (Indented 3 spaces)
      1. Sub-item A (Indented 3 spaces)
   2. Sub-item B
      * Deeply nested bullet (Indented another 3 spaces)
3. Third item
- Markdown list output:
<ol>
  <li>
    <p>First item</p>
    <div class="language-text highlighter-rouge">
      <div class="highlight">
        <pre class="highlight">
          <code>Hello world!</code>
        </pre>
      </div>
    </div>
  </li>
  <li>Second item
    <ol>
      <li>
        <p>Sub-item A (Indented 3 spaces)</p>
        <p>Some text</p>
        <ol>
          <li>Sub-item A (Indented 3 spaces)</li>
          <li>Sub-item A (Indented 3 spaces)
            <ol>
              <li>Sub-item A (Indented 3 spaces)</li>
              <li>Sub-item A (Indented 3 spaces)</li>
            </ol>
          </li>
          <li>Sub-item A (Indented 3 spaces)</li>
        </ol>
      </li>
      <li>
        <p>Sub-item B</p>
        <ul>
          <li>Deeply nested bullet (Indented another 3 spaces)</li>
        </ul>
      </li>
    </ol>
  </li>
  <li>Third item</li>
</ol>
The differences originate from how Kramdown (Markdown) and org-ruby process “loose” lists.
In Markdown, when a list item contains block-level elements (like a nested list, a code block, or a paragraph) or is separated by blank lines, Kramdown considers it a “loose” list item and actively wraps its leading, raw text into a <p> tag. org-ruby, however, natively dumps the raw text directly into the <li> element as an inline node before appending the block-level children.
To force your Org files to mimic Kramdown’s exact wrapping behavior, we can add a logic step right at the end of your HTML processing pipeline. Nokogiri will scan every <li>, check if it contains any block-level child elements, and if it does, it will gather all the leading text up to that block and neatly wrap it in a <p> tag.
--- _plugins/org_converter.rb
+++ _plugins/org_converter.rb
@@ -134,6 +134,28 @@
           target['class'] = (existing_classes + class_names).uniq.join(' ')
         end
       end
+      # Process complex list items to wrap leading text in <p>
+      block_tags = %w[p div ul ol blockquote pre table dl figure h1 h2 h3 h4 h5 h6 hr]
+      doc.css('li').each do |li|
+        first_block = li.children.find { |c| c.element? && block_tags.include?(c.name.downcase) }
+
+        if first_block
+          leading_nodes = []
+          li.children.each do |child|
+            break if child == first_block
+            leading_nodes << child
+          end
+
+          # Check if there is actual inline content/text to wrap
+          has_content = leading_nodes.any? do |n|
+            (n.text? && !n.text.strip.empty?) || (n.element? && n.name.downcase != 'br')
+          end
+
+          if has_content
+            p_node = Nokogiri::XML::Node.new('p', doc)
+            first_block.add_previous_sibling(p_node)
+            leading_nodes.each { |n| p_node.add_child(n) }
+          end
+        end
+      end
+
       doc.to_html
     end
   end
Links
Link Abbreviations
- Resolve the (potential) conflict between link_abbrs and #+LINK?
There will be absolutely no conflict. Because your custom _plugins/link_abbr.rb uses a :pre_render hook, it modifies the raw text document before org-ruby even touches the file. By the time org-ruby wakes up to compile the HTML, your plugin has already replaced [[foo:image.jpg]] with [[/assets/.../image.jpg]]. org-ruby will simply see a standard file link and successfully render it, completely ignoring the now-unused #+LINK: foo directive at the top of the file.
- Add Org Syntax Support to Your Plugin
Adding support for [[foo:filename]] is highly achievable, but introduces a major edge-case in the Front Matter: Unquoted brackets break YAML. If you type - [[foo:image.jpg]] in YAML without quotes, the YAML parser natively interprets [ as an array constructor. It will process your string into a nested Ruby array [["foo:image.jpg"]], which causes standard string comparisons to crash.
The code intercepts Org-style [[...]] links in the body content and title fields, and implements a safety net to deeply extract the string from the front matter even if the YAML parser incorrectly converted the unquoted brackets into an array.
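The front matter safety net described above could look roughly like the following. This is only a sketch of the idea, not the code in link_abbr.rb: the helper name unwrap_org_link is hypothetical, and it assumes the YAML parser delivered either the original string or a nested array wrapping it.

```ruby
# Recover an Org-style link string from a front matter value.
# Accepts either the quoted form ("[[foo:image.jpg]]") or the nested
# array produced when the brackets were left unquoted in YAML.
def unwrap_org_link(value)
  return value if value.is_a?(String)
  return nil unless value.is_a?(Array)

  inner = value.flatten.first # dig through any depth of nesting
  inner.is_a?(String) ? "[[#{inner}]]" : nil
end

puts unwrap_org_link('[[foo:image.jpg]]')
puts unwrap_org_link([['foo:image.jpg']])
```

Normalizing both shapes to the same string first means the rest of the plugin can keep using plain string comparisons.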
Related articles:
- Enabling Link Abbreviations in Minimal Mistakes
- Migrating Link Abbreviations to Minimal Mistakes Plus
- User Guide: Link Abbreviations
Notices
In Markdown files you can write notices such as:
**Watch out!** This paragraph of text has been [emphasized](#) with the `{: .notice--warning}` class.
{: .notice--warning}
Because Jekyll processes Org files into HTML using org-ruby before Kramdown can parse them, Kramdown’s block attribute syntax like {: .notice--warning} will not work.
To achieve the exact same result in org-ruby while still being able to use standard Org formatting (like bold, lists, and links) inside the notice, you must inject the raw HTML wrapper directly using the #+html: directive.
#+html: <div class="notice--warning">
*Watch out!*
This is a warning notice inside an Org file.
#+html: </div>
org-ruby passes the lines starting with #+html: straight to the output without modification. Because you leave empty lines between the HTML tags and your text, org-ruby will still evaluate the text in the middle as standard Org syntax, converting *Watch out!* into bold tags and wrapping the lines in paragraph tags.
However, this is tedious to write compared with the Markdown syntax. It would be much better if we could mimic Kramdown’s attribute syntax in Org files.
Because you are already utilizing Nokogiri in your Jekyll pipeline, we can create a lightweight parser that finds the exact {: .classname } text, strips it out, and natively injects the CSS classes into the correct HTML tags.
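Before wiring this into Nokogiri, the class-extraction step can be tried on plain strings. The sketch below reuses the pattern from the plugin change; extract_attr_classes is a hypothetical helper name, and the dash normalization assumes the converter may have turned -- into a typographic dash before this step runs.

```ruby
# {: .class1 .class2 } with optional en/em dashes inside class names.
ATTR_LIST = /\{:\s*((?:\.[a-zA-Z0-9_\-–—]+\s*)+)\}/

# Returns [text_without_attribute_list, class_names].
def extract_attr_classes(text)
  match = ATTR_LIST.match(text)
  return [text, []] unless match

  # Undo typographic dashes, then collect the names after each dot.
  classes = match[1].gsub(/[–—]/, '--').scan(/\.([a-zA-Z0-9_-]+)/).flatten
  [text.sub(ATTR_LIST, '').strip, classes]
end

p extract_attr_classes('Watch out! {: .notice--warning}')
p extract_attr_classes('Plain paragraph')
```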
modified _plugins/org_converter.rb
@@ -84,6 +84,39 @@ module Jekyll
         pre.replace(new_node)
       end
+      # Process Kramdown-style attribute lists {: .class1 .class2 }
+      doc.xpath('.//text()[contains(., "{:")]').each do |node|
+        # Skip if the syntax is inside a literal code block
+        next if node.ancestors('pre, code').any?
+
+        if node.content =~ /\{:\s*((?:\.[a-zA-Z0-9_\-–—]+\s*)+)\}/
+          raw_classes = $1
+          normalized_classes = raw_classes.gsub(/[–—]/, '--')
+          class_names = normalized_classes.scan(/\.([a-zA-Z0-9_-]+)/).flatten
+
+          # Strip the syntax from the text node
+          node.content = node.content.sub(/\{:\s*(?:\.[a-zA-Z0-9_\-–—]+\s*)+\}/, '')
+
+          parent = node.parent
+          target = parent
+
+          # Clean up trailing <br> if the syntax was on a new line
+          if node.content.strip.empty? && node.previous_sibling && node.previous_sibling.name == 'br'
+            node.previous_sibling.remove
+          end
+
+          # If removing the syntax leaves the block entirely empty, it targets the previous element
+          if parent.name == 'p' && parent.text.strip.empty? && parent.children.all? { |c| c.name == 'text' || c.name == 'br' }
+            target = parent.previous_element || parent
+            parent.remove if target != parent
+          end
+
+          # Apply the classes safely without overwriting existing ones
+          existing_classes = target['class'] ? target['class'].split(' ') : []
+          target['class'] = (existing_classes + class_names).uniq.join(' ')
+        end
+      end
+
       doc.to_html
     end
   end
Newlines in Body Text
org-ruby and standard Markdown engines preserve newlines during HTML generation for a specific reason: to maintain the readability of the generated HTML source code. By HTML design, browsers interpret any newline in the source code as a single space. While this works perfectly for languages like English that use spaces to separate words, it creates unwanted, unnatural gaps in CJK (Chinese, Japanese, Korean) texts where words are not separated by spaces.
Here are some examples:
English:
Watch out! This is a warning notice inside an Org file. Watch out! This is a warning notice inside an Org file. Watch out! This is a warning notice inside an Org file.
Source:
Watch out! This is a warning notice inside an Org
file. Watch out! This is a warning notice inside
an Org file. Watch out! This is a warning notice
inside an Org file.
Output:
<p>Watch out! This is a warning notice inside an Org
file. Watch out! This is a warning notice inside
an Org file. Watch out! This is a warning notice
inside an Org file.</p>
Japanese:
本日はお時間をいただき、ありがとうございます。私はマカオにあるマカオ理工大学を4年制の学士課程で卒業しました。
Source:
本日はお時間をいただき、ありがとうございます。私は
マカオにあるマカオ理工大学を4年制の学士課程で卒業
しました。
Output:
<p>本日はお時間をいただき、ありがとうございます。私は
マカオにあるマカオ理工大学を4年制の学士課程で卒業
しました。</p>
The most efficient and safe approach to fix this is to clean up the text using Nokogiri right before outputting the final HTML. We can implement logic to completely remove newlines (and any surrounding whitespace) only when they are sandwiched between CJK characters, while safely converting all other newlines into a single space so English words don’t get squashed together.
Add the following code to _plugins/org_converter.rb right before the doc.to_html call:
modified _plugins/org_converter.rb
@@ -242,6 +242,21 @@ module Jekyll
         end
       end
+      # Newline cleanup (Remove extra spaces between CJK characters)
+      # Single quotes keep \p{...} intact for interpolation into the regexes below
+      cjk = '\p{Han}\p{Hiragana}\p{Katakana}ー、。!?「」『』()【】,.:;'
+      doc.xpath('.//text()[not(ancestor::pre or ancestor::code)]').each do |node|
+        content = node.content
+        next unless content.match?(/[\r\n]/)
+
+        # 1. Completely remove newlines and spaces between CJK characters (uses
+        #    lookahead to safely handle overlapping lines); '\1' keeps the
+        #    character captured before the newline
+        content = content.gsub(/([#{cjk}])\s*[\r\n]+\s*(?=[#{cjk}])/o, '\1')
+        # 2. Safely convert remaining newlines to a single space
+        content = content.gsub(/\s*[\r\n]+\s*/, ' ')
+
+        node.content = content if content != node.content
+      end
+
       doc.to_html
     end
   end
Why this implementation excels:
- Bypassing XPath Blindspots: By filtering [not(ancestor::pre or ancestor::code)] via XPath and moving the newline check next unless content.match?(/[\r\n]/) into Ruby, we bypass libxml2’s strict string literal parsing. Ruby will now correctly evaluate and catch every single node that contains a newline.
- Positive Lookahead (?=...): In a sentence spanning 3 lines, the lookahead (?=[#{cjk}]) asserts that a CJK character exists after the newline without “consuming” it, allowing the engine to successfully match and delete consecutive newlines.
- Cross-Platform Newlines: Using [\r\n]+ ensures it works flawlessly regardless of whether your files were saved with Windows (\r\n) or Unix (\n) line endings.
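The two-pass cleanup can be checked on plain strings without Nokogiri. This is a minimal sketch with a shortened character set; join_wrapped_lines is a hypothetical helper name, not part of the plugin.

```ruby
# Shortened CJK set for illustration. Single quotes keep \p{...} literal
# so it can be interpolated into the regular expressions below.
CJK = '\p{Han}\p{Hiragana}\p{Katakana}ー、。'

def join_wrapped_lines(text)
  text
    # Pass 1: drop the line break when both neighbours are CJK characters,
    # keeping the character captured before the break.
    .gsub(/([#{CJK}])\s*[\r\n]+\s*(?=[#{CJK}])/) { Regexp.last_match(1) }
    # Pass 2: every remaining line break becomes a single space.
    .gsub(/\s*[\r\n]+\s*/, ' ')
end

puts join_wrapped_lines("私は\nマカオにある")
puts join_wrapped_lines("inside an Org\nfile.")
```

A line break between a CJK character and a Latin word still becomes a space, since only one side matches the character set.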
Furigana
Here’s an edge case for the furigana handling code. The 3rd and 5th “理工大学” aren’t parsed or matched correctly. One thing in common is that they are all broken by a newline with the | character leading the next line. This may cause some confusion between the Org table syntax and the furigana syntax.
本日(ほんじつ)
本|日(ほん|じつ)
理工大学(りこうだいがく)
理|工|
大|学(り|こう|だい|がく)
理|工|大
|学(り|こう|だい|がく)
理|工|大|学(り|
こう|だい|がく)
理|工|大|学(り
|こう|だい|がく)
本日本|日(ほん|じつ)理工大学理|工| 大|学(り|こう|だい|がく)理|工|大
| 学(り | こう | だい | がく) |
理|工|大|学(り| こう|だい|がく)理|工|大|学(り
| こう | だい | がく) |
Changes:
git diff 1cb2b19..204e959 -- _plugins/org_converter.rb
modified _plugins/org_converter.rb
@@ -1,6 +1,10 @@
 require 'nokogiri'
 require 'rouge'
+module Jekyll
+  FURIGANA_REGEX = /((?:\p{Han}|々|\|)(?:(?:\p{Han}|々|\||\s)*(?:\p{Han}|々|\|))?)\s*[((]([ぁ-んァ-ヶー|\s]+)[))]/
+end
+
 Jekyll::Hooks.register [:pages, :documents], :pre_render do |doc|
   # Enable Liquid syntax
   if doc.extname.downcase == '.org'
@@ -21,6 +25,15 @@ Jekyll::Hooks.register [:pages, :documents], :pre_render do |doc|
       $1 ? "\n#+BEGIN_HTML\n#{$1}\n#+END_HTML\n" : match
     end
+    # Pre-process furigana to strip newlines and prevent org-ruby table corruption
+    doc.content = doc.content.gsub(/#{block_regex}|#{raw_regex}|#{Jekyll::FURIGANA_REGEX}/) do |match|
+      if match.match?(/\A[ \t]*(?:#|\{%)/)
+        match
+      else
+        match.gsub(/[\r\n]+/, '')
+      end
+    end
+
     inline_code = /[=~][^=~\n]+[=~]/
     markdown_link = /(?<!\!)\[([^\]]+)\]\(([^)]+)\)/
@@ -213,6 +226,49 @@ module Jekyll
         end
       end
+      # Furigana handling
+      # 1. Use XPath to filter out pre/code ancestors at the C-level (libxml2) for maximum speed.
+      # 2. Fast pre-filter using 'contains' so Ruby only processes nodes that actually have parentheses.
+      target_nodes = doc.xpath('.//text()[not(ancestor::pre or ancestor::code) and (contains(., "(") or contains(., "("))]')
+      target_nodes.each do |node|
+        content = node.content
+        if content.match?(Jekyll::FURIGANA_REGEX)
+          new_html = content.gsub(Jekyll::FURIGANA_REGEX) do |match|
+            raw_base = $1
+            raw_ruby = $2
+            clean_base = raw_base.gsub(/\s+/, '')
+            clean_ruby = raw_ruby.gsub(/\s+/, '')
+            bases = clean_base.split('|')
+            rubies = clean_ruby.split('|')
+
+            if bases.length == rubies.length
+              ruby_content = bases.zip(rubies).map { |b, r| "#{b}<rt>#{r}</rt>" }.join('')
+              "<ruby>#{ruby_content}</ruby>"
+            else
+              base = clean_base.delete('|')
+              rb = clean_ruby.delete('|')
+              "<ruby>#{base}<rt>#{rb}</rt></ruby>"
+            end
+          end
+          node.replace(Nokogiri::HTML::DocumentFragment.parse(new_html)) if new_html != content
+        end
+      end
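The conversion step in the diff above can be tried on plain strings. The sketch below uses a simplified pattern (no whitespace or line-break handling) that accepts both half-width and full-width parentheses; FURIGANA and furigana_to_ruby are hypothetical names for illustration.

```ruby
# Simplified pattern: kanji (optionally split by |), then a reading in parentheses.
FURIGANA = /((?:\p{Han}|々)(?:[\p{Han}々|]*[\p{Han}々])?)[((]([ぁ-んァ-ヶー|]+)[))]/

def furigana_to_ruby(text)
  text.gsub(FURIGANA) do
    bases = Regexp.last_match(1).split('|')
    rubies = Regexp.last_match(2).split('|')
    if bases.length == rubies.length
      # One <rt> per segment when base and reading are split the same way.
      "<ruby>#{bases.zip(rubies).map { |b, r| "#{b}<rt>#{r}</rt>" }.join}</ruby>"
    else
      # Otherwise fall back to one reading for the whole word.
      "<ruby>#{bases.join}<rt>#{rubies.join}</rt></ruby>"
    end
  end
end

puts furigana_to_ruby('本|日(ほん|じつ)')
puts furigana_to_ruby('本日(ほんじつ)')
```

Falling back to a single reading when the segment counts differ keeps a mistyped | from dropping the annotation entirely.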