Automatic Parsing

Requirements

I found an awesome browser extension called Furiganator which can insert Furigana (Ruby text) in the webpage with the following features:

  • Very fast and reliable.
  • Furigana in Hiragana, Katakana or Romaji.
  • Option to show Furigana only on mouse over.
  • Option to make the furigana text unselectable.
  • Option to use color-tags to easily identify the kanji level or grade.
  • Multiple color themes.

In its Licenses, Certificates and Acknowledgements it describes the technologies used:

Now I want to implement the same features in our minimal-mistakes-plus plugin with the same technologies. Specifically,

  • It should be very fast and reliable.
  • Use the option furigana: true/false in the front matter to turn the whole furigana parsing process on (true) or off (false or unset).
  • It can automatically add Furigana in Hiragana, Katakana or Romaji to normal text that does not use the furigana pattern, using the same technologies as Furiganator. The script is selected by the option furigana_mode, which

    • applies to the whole document when specified in the front matter:

      furigana_mode: hiragana/katakana/romaji
    • applies to only the paragraphs marked by the attribute syntax:

      {: furigana_mode="hiragana/katakana/romaji"}

    The attribute syntax takes precedence if both apply to a paragraph.

    If the option is unset, default to hiragana.

  • Use the option furigana_selectable: true/false in the front matter to make the furigana text selectable (true) or unselectable (false or unset).
  • Use the option furigana_mouse_over: true/false to only show Furigana on mouse over (true) or to show it all the time (false or unset).
  • Option to use color-tags to easily identify the kanji level or grade (Do not implement).
  • Multiple color themes (Do not implement).

Kramdown attribute syntax

These options can be applied per paragraph with the Kramdown attribute syntax:

{: furigana_mode="hiragana/katakana/romaji" furigana_selectable="true/false" furigana_mouse_over="true/false"}

The attribute syntax overrides the same option in the front matter.
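
The precedence rules above can be sketched as a small resolver. This is a minimal illustration in JavaScript, not the plugin's actual code: `resolveFuriganaOptions`, `pick`, `toBool`, and the shape of the two arguments are hypothetical. It assumes attribute values arrive as strings, and that an unrecognized furigana_mode falls back to hiragana.

```javascript
// Hypothetical helper: merges page-level (front matter) options with
// paragraph-level (Kramdown attribute) options. Attributes win.
const VALID_MODES = ['hiragana', 'katakana', 'romaji'];

// Attribute values are strings ("true"/"false"); front matter values may
// already be booleans. Anything else counts as unset.
function toBool(value) {
  if (value === true || value === 'true') return true;
  if (value === false || value === 'false') return false;
  return undefined;
}

// First defined value wins: attribute, then front matter, then the default.
function pick(attrValue, pageValue, fallback) {
  if (attrValue !== undefined) return attrValue;
  if (pageValue !== undefined) return pageValue;
  return fallback;
}

function resolveFuriganaOptions(frontMatter = {}, attrs = {}) {
  const mode = pick(attrs.furigana_mode, frontMatter.furigana_mode, 'hiragana');
  return {
    // Assumption: an unknown mode is treated like an unset one.
    mode: VALID_MODES.includes(mode) ? mode : 'hiragana',
    selectable: pick(toBool(attrs.furigana_selectable),
                     toBool(frontMatter.furigana_selectable), false),
    mouseOver: pick(toBool(attrs.furigana_mouse_over),
                    toBool(frontMatter.furigana_mouse_over), false),
  };
}

console.log(resolveFuriganaOptions(
  { furigana_mode: 'katakana', furigana_selectable: true },
  { furigana_mode: 'romaji', furigana_mouse_over: 'true' }
));
```

An attribute set to "false" still overrides a front matter value of true, because only a missing attribute falls through to the page-level setting.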

For the options furigana_selectable and furigana_mouse_over:

  • If specified in the front matter, add a toolbar just above the table of contents (`class="toc"`) and inside the sidebar (`class="sidebar__right sticky"`). Then add a small toggle icon (with different looks for on and off) for each option inside the toolbar so that each can be toggled on or off for the whole page.
  • If specified per paragraph in Kramdown attribute syntax, add the same toggle icon for each option just above the marked paragraph so that they can be toggled on or off only for that paragraph.

The HTML of the sidebar:

<aside class="sidebar__right sticky">
  <nav class="toc" aria-label="Table of contents">
    <header><h4 class="nav__title">
        <i class="fas fa-file-alt"></i> On this page</h4>
    </header>
  </nav>
</aside>

Use different icons for furigana_selectable and furigana_mouse_over, and each icon should still have different looks when on and off. Make the frgntr-global-toolbar prettier and more nicely integrated with the theme.

Use the following layout for the toolbar above the TOC:

Tools        <-- Title of the entire toolbar section, background color same as the "nav__title" of the TOC
Furigana     <-- Title of the Furigana section, background color same as current
icon1 icon2  <-- The icons, smaller, more compact, no background color

# Light dim border between the sections.

Integration with Existing Furigana Parsing

For the existing furigana parsing process dealing with the pattern:

(X|Y|Z:X1|Y1|Z1)

its processing logic should be kept intact, with these additional changes:

  • Now it belongs to the whole furigana parsing process which can be turned on or off using the option furigana: true/false as mentioned above.
  • If it’s turned off, convert all furigana patterns to normal texts without the pattern in the output, that is, (X|Y|Z:X1|Y1|Z1) becomes XYZ.
  • The pattern takes precedence over the automatic parsing, that is, a pattern should not be sent for automatic processing.
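
The two pattern rules above (strip when turned off, keep manual patterns away from the automatic parser when turned on) might look like the following. This is only a sketch in JavaScript; the existing logic lives in furigana.rb and is not shown here. The regular expression is an assumed reading of the (X|Y|Z:X1|Y1|Z1) syntax, and `stripFuriganaPatterns` and `splitByPattern` are hypothetical names.

```javascript
// Assumed grammar: "(" base segments joined by "|", ":", reading segments
// joined by "|", ")". The real pattern in furigana.rb may be stricter.
const FURIGANA_PATTERN =
  /\(([^()|:]+(?:\|[^()|:]+)*):([^()|:]+(?:\|[^()|:]+)*)\)/g;

// furigana off: keep only the base text, e.g. (X|Y|Z:X1|Y1|Z1) -> XYZ.
function stripFuriganaPatterns(text) {
  return text.replace(FURIGANA_PATTERN, (_match, base) => base.split('|').join(''));
}

// furigana on: split the text so that manual patterns are kept apart from
// the plain runs, which are the only parts eligible for automatic parsing.
function splitByPattern(text) {
  const parts = [];
  let last = 0;
  for (const m of text.matchAll(FURIGANA_PATTERN)) {
    if (m.index > last) parts.push({ manual: false, text: text.slice(last, m.index) });
    parts.push({ manual: true, text: m[0] });
    last = m.index + m[0].length;
  }
  if (last < text.length) parts.push({ manual: false, text: text.slice(last) });
  return parts;
}

console.log(stripFuriganaPatterns('(X|Y|Z:X1|Y1|Z1)'));
console.log(splitByPattern('今日は(漢|字:かん|じ)を読む'));
```

Ordinary parentheses without a colon, such as "(note)", do not match and are left untouched.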

Example Outputs

Here is an example of how the Furiganator extension converts a piece of text to HTML when “Make furigana unselectable” and “Only show furigana on mouse hover” are turned on:

本日はお時間をいただき、ありがとうございます。私はマカオにあるマカオ理工大学を4年制の学士課程で卒業しました。3ヶ月後、鶴ヶ崎さんと会う。

HTML Output:

<ruby furiganated="true">本日<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">ほんじつ</rt>
  <rp>)</rp>
</ruby>はお<ruby furiganated="true">時間<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">じかん</rt>
  <rp>)</rp>
</ruby>をいただき、ありがとうございます。<ruby furiganated="true">私<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">わたし</rt>
  <rp>)</rp>
</ruby>はマカオにあるマカオ<ruby furiganated="true">理工<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">りこう</rt>
  <rp>)</rp>
</ruby><ruby furiganated="true">大学<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">だいがく</rt>
  <rp>)</rp>
</ruby>を4 <ruby furiganated="true">年<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">とし</rt>
  <rp>)</rp>
</ruby><ruby furiganated="true">制<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">せい</rt>
  <rp>)</rp>
</ruby>の<ruby furiganated="true">学士<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">がくし</rt>
  <rp>)</rp>
</ruby><ruby furiganated="true">課程<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">かてい</rt>
  <rp>)</rp>
</ruby>で<ruby furiganated="true">卒業<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">そつぎょう</rt>
  <rp>)</rp>
</ruby>しました。3<ruby furiganated="true">ヶ月<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">かげつ</rt>
  <rp>)</rp>
</ruby><ruby furiganated="true">後<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable "></rt>
  <rp>)</rp>
</ruby>、<ruby furiganated="true">鶴<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">つる</rt>
  <rp>)</rp>
</ruby>ヶ<ruby furiganated="true">崎<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable ">さき</rt>
  <rp>)</rp>
</ruby>さんと<ruby furiganated="true">会<rp>(</rp>
  <rt class="frgntr_hidden frgntr_unselectable "></rt>
  <rp>)</rp>
</ruby>う。

When “Make furigana unselectable” and “Only show furigana on mouse hover” are both turned off:

<ruby furiganated="true">本日<rp>(</rp>
  <rt class="">ほんじつ</rt>
  <rp>)</rp>
</ruby>はお<ruby furiganated="true">時間<rp>(</rp>
  <rt class="">じかん</rt>
  <rp>)</rp>
</ruby>をいただき、ありがとうございます。<ruby furiganated="true">私<rp>(</rp>
  <rt class="">わたし</rt>
  <rp>)</rp>
</ruby>はマカオにあるマカオ<ruby furiganated="true">理工<rp>(</rp>
  <rt class="">りこう</rt>
  <rp>)</rp>
</ruby><ruby furiganated="true">大学<rp>(</rp>
  <rt class="">だいがく</rt>
  <rp>)</rp>
</ruby>を4 <ruby furiganated="true">年<rp>(</rp>
  <rt class="">とし</rt>
  <rp>)</rp>
</ruby><ruby furiganated="true">制<rp>(</rp>
  <rt class="">せい</rt>
  <rp>)</rp>
</ruby>の<ruby furiganated="true">学士<rp>(</rp>
  <rt class="">がくし</rt>
  <rp>)</rp>
</ruby><ruby furiganated="true">課程<rp>(</rp>
  <rt class="">かてい</rt>
  <rp>)</rp>
</ruby>で<ruby furiganated="true">卒業<rp>(</rp>
  <rt class="">そつぎょう</rt>
  <rp>)</rp>
</ruby>しました。3<ruby furiganated="true">ヶ月<rp>(</rp>
  <rt class="">かげつ</rt>
  <rp>)</rp>
</ruby><ruby furiganated="true">後<rp>(</rp>
  <rt class=""></rt>
  <rp>)</rp>
</ruby>、<ruby furiganated="true">鶴<rp>(</rp>
  <rt class="">つる</rt>
  <rp>)</rp>
</ruby>ヶ<ruby furiganated="true">崎<rp>(</rp>
  <rt class="">さき</rt>
  <rp>)</rp>
</ruby>さんと<ruby furiganated="true">会<rp>(</rp>
  <rt class=""></rt>
  <rp>)</rp>
</ruby>う。
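
Comparing the two outputs, the only difference is the class list on each <rt>. A minimal sketch of that mapping follows, in JavaScript. The class names are copied from the examples above; `rtClassFor` and `rubyHtml` are hypothetical helpers, and `base` and `reading` are assumed to be HTML-safe already.

```javascript
// Maps the two options to the <rt> class list seen in the example output.
// Defaults follow the spec: unselectable and always visible when unset.
function rtClassFor({ selectable = false, mouseOver = false } = {}) {
  const classes = [];
  if (mouseOver) classes.push('frgntr_hidden');         // shown only on hover
  if (!selectable) classes.push('frgntr_unselectable'); // excluded from selection
  return classes.join(' ');
}

// Builds one ruby element in the same shape as the example output.
function rubyHtml(base, reading, options) {
  return `<ruby furiganated="true">${base}<rp>(</rp>` +
    `<rt class="${rtClassFor(options)}">${reading}</rt><rp>)</rp></ruby>`;
}

console.log(rubyHtml('本日', 'ほんじつ', { selectable: false, mouseOver: true }));
console.log(rubyHtml('本日', 'ほんじつ', { selectable: true, mouseOver: false }));
```

Because the state lives entirely in class names, a toolbar toggle only has to add or remove these classes on the affected <rt> elements.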

Implementation

To achieve a seamless, lightning-fast, and precise automatic Furigana injection using the exact same technologies as Furiganator, we need to architect a Ruby-to-Node.js bridge.

Spawning a new Node.js process for every single paragraph or file would be extremely slow because Kuromoji’s dictionaries take hundreds of milliseconds to load. Instead, the solution uses a persistent Node.js worker that boots up once per Jekyll build, loads the dictionary, and sits idly waiting for arrays of text jobs over stdin.
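
The persistent-worker idea can be sketched as a line-oriented loop. This is an illustration under assumptions, not the contents of kuroshiro_worker.js: the protocol (one JSON array of strings per input line, one JSON array per output line) and the names `handleLine` and `startWorker` are hypothetical, and the converter is injected so the sketch runs without Kuroshiro installed.

```javascript
const readline = require('readline');

// Converts one batch. `convert` is any async function from text to HTML.
async function handleLine(line, convert) {
  const jobs = JSON.parse(line);
  if (!Array.isArray(jobs)) throw new Error('expected a JSON array of strings');
  const results = [];
  for (const text of jobs) results.push(await convert(text));
  return JSON.stringify(results);
}

// Reads batches from `input` for the lifetime of the process and writes one
// reply line per batch to `output`. Batches are chained on a promise queue
// so replies are written in the same order the requests arrived.
function startWorker(input, output, convert) {
  const rl = readline.createInterface({ input, terminal: false });
  let queue = Promise.resolve();
  rl.on('line', (line) => {
    queue = queue
      .then(() => handleLine(line, convert))
      .catch((err) => JSON.stringify({ error: String(err.message || err) }))
      .then((reply) => output.write(reply + '\n'));
  });
  return rl;
}
```

A real worker would load the dictionary once at startup and then call `startWorker(process.stdin, process.stdout, convert)`, with `convert` wrapping the already-initialized engine. The Ruby side then pays the dictionary cost a single time per build.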

Here is the complete implementation with the required comments and explanations.

1. Install Node.js Dependencies

Navigate to your plugin’s directory and initialize the Node dependencies. Kuroshiro and its analyzer are strictly required for the Node worker.

cd lib/minimal_mistakes_plus
npm init -y
npm install kuroshiro kuroshiro-analyzer-kuromoji

2. The Node.js Worker (kuroshiro_worker.js)

Create a new file lib/minimal_mistakes_plus/kuroshiro_worker.js. This is the high-performance daemon that parses Japanese morphology via Kuromoji and converts it to HTML Ruby text.

3. The Updated Ruby File (furigana.rb)

Overwrite your existing lib/minimal_mistakes_plus/furigana.rb.

This code acts as the director. It handles global/local settings, strips out manual syntax when toggled off, acts as an isolation barrier for <code>/<pre> tags so Kuroshiro doesn’t parse code blocks, and securely streams strings back and forth with the Node worker.
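
One common way to build such an isolation barrier is to swap protected blocks for placeholders before conversion and restore them afterwards. The sketch below shows the idea in JavaScript; the actual barrier is implemented in furigana.rb and may work differently. `maskProtected` is a hypothetical name, and the sketch assumes the NUL-delimited placeholders pass through the converter unchanged.

```javascript
// Matches <pre>...</pre> and <code>...</code>, attributes included. A
// <code> nested inside a <pre> is captured as part of the outer block.
const PROTECTED = /<(pre|code)\b[^>]*>[\s\S]*?<\/\1>/gi;

function maskProtected(html) {
  const stash = [];
  const masked = html.replace(PROTECTED, (block) => {
    stash.push(block);
    return `\u0000${stash.length - 1}\u0000`; // placeholder: NUL index NUL
  });
  // Puts every stashed block back in place of its placeholder.
  const restore = (text) =>
    text.replace(/\u0000(\d+)\u0000/g, (_m, i) => stash[Number(i)]);
  return { masked, restore };
}

const { masked, restore } = maskProtected('<p>漢字</p><pre><code>漢字</code></pre>');
// Only `masked` would be sent for conversion; the <pre> block never is.
console.log(restore(masked));
```

Regex-based matching is a simplification here; markup with unbalanced or unusually nested tags would need a real HTML parser.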

4. GitHub Actions

You should commit the following files:

kuroshiro_worker.js
package.json
package-lock.json

You must not commit the node_modules folder. It is already included in your .gitignore file, which is the correct standard practice.

However, because you are not committing node_modules, there is a hidden trap for GitHub Actions: when your workflow pulls your plugin as a Ruby Gem to build the site, the Node.js dependencies will be missing, causing the Jekyll build to crash when it tries to spawn the worker.

To solve this flawlessly without bloating your Git repository or requiring users to modify their GitHub Actions YAML files, you can add a tiny “auto-install” hook to your Ruby script. GitHub Actions (ubuntu-latest) has Node.js pre-installed, so your plugin can simply instruct it to download the dictionaries automatically on the very first run.

--- lib/minimal_mistakes_plus/furigana.rb
+++ lib/minimal_mistakes_plus/furigana.rb
@@ -20,6 +20,12 @@
           @disabled = true
           return
         end
+
+        # Auto-install Node dependencies if missing (e.g., in GitHub Actions or fresh gem install)
+        unless Dir.exist?(File.join(File.dirname(__FILE__), 'node_modules'))
+          Jekyll.logger.info "Furigana:", "Installing Node.js dependencies. This may take a moment..."
+          system("npm install", chdir: File.dirname(__FILE__))
+        end
 
         @stdin, @stdout, @stderr, @wait_thr = Open3.popen3("node #{script_path}")

Bug Fixes

Characters not found in the dictionary

Some characters are parsed incorrectly, such as 箇 and 个:

<p>Some characters are parsed incorrectly, such as <ruby furiganated="true">箇<rp>(</rp><rt class="frgntr_unselectable">箇</rt><rp>)</rp></ruby> or <ruby furiganated="true">个<rp>(</rp><rt class="frgntr_unselectable">个</rt><rp>)</rp></ruby>:</p>

Their Furigana is rendered as the character itself. Is this caused by the Kuroshiro backend or by our own code?

This happens because of the underlying Kuromoji dictionary used by the Kuroshiro backend.

When Kuromoji encounters a single kanji character or variant that it doesn’t have a specific contextual reading for (or treats as an unknown word), it defaults to returning the surface character itself as the reading. Kuroshiro then sees a kanji character, sees a reading attached to it, and faithfully wraps it in a <ruby> tag, resulting in <ruby>箇...<rt>箇</rt>...</ruby>.

We can easily fix this by filtering the HTML output. We’ll add a cleanup step to strip the <ruby> tags whenever the base text perfectly matches the reading text. We need to do this on both the Ruby backend (for the static build) and the JavaScript frontend (for the live converter).
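
The cleanup step could be as small as the function below. It is a sketch for the JavaScript side only, with `stripSelfReadings` as a hypothetical name; it assumes the ruby markup has the <ruby>…<rp>(</rp><rt>…</rt><rp>)</rp></ruby> shape shown in the example outputs, with optional whitespace between the inner tags.

```javascript
// Captures the base text (group 1) and the reading (group 2) of one ruby
// element in the shape produced above.
const RUBY =
  /<ruby\b[^>]*>([^<]*)<rp>\(<\/rp>\s*<rt\b[^>]*>([^<]*)<\/rt>\s*<rp>\)<\/rp>\s*<\/ruby>/g;

// Unwraps a ruby element when its reading is identical to its base text,
// and leaves every other ruby element exactly as it was.
function stripSelfReadings(html) {
  return html.replace(RUBY, (whole, base, reading) =>
    base.trim() === reading.trim() ? base : whole);
}

console.log(stripSelfReadings(
  '<ruby furiganated="true">箇<rp>(</rp><rt class="frgntr_unselectable">箇</rt><rp>)</rp></ruby>'
));
```

The Ruby backend would need an equivalent substitution so that the static build and the live converter agree.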

Furigana Converter

Now I want to add an input box to a post: you paste Japanese text into it, and the page instantly displays the text with Furigana just below the box. The parsing should use the same mechanism we’ve implemented in kuroshiro_worker.js and furigana.rb.

For example,

# The input box, whose width is as wide as the post content, height is 3 lines
# high by default and expands automatically as the content grows, no scroll bar.
{% include furigana-converter.html placeholder="Enter your contents..." %}

The text with furigana goes here...

Because Jekyll is a static site generator, furigana.rb and kuroshiro_worker.js only run on your computer during the build phase. Once the site is published, it becomes purely static HTML and cannot send text back to the Ruby script.

To achieve instant, live parsing inside the browser, we need to load the exact same Kuroshiro and Kuromoji engines directly into the client via a CDN. The logic will be identical to your backend worker, and it will even detect and apply your global visibility/selectability toggles.

Create a new file at _includes/furigana-converter.html with the following contents (see the source file in the repository).

Then add to your post:

{% include furigana-converter.html placeholder="Enter your contents..." %}

I see the message “Downloading dictionary (first time only, ~18MB)…” the first time I click the input box after every page refresh. Is it actually downloading the libraries on every refresh? Can we cache them so that they are downloaded again only on a forced refresh, not on a normal one?

Actually, your browser is already caching those files automatically.

The CDN we are using (cdn.jsdelivr.net) sends aggressive caching headers. On your first visit, the browser downloads the 18MB dictionary. On subsequent normal refreshes, the browser fetches those files directly from its local disk cache, which requires zero internet bandwidth.

The delay you are experiencing is the initialization phase. Every time you refresh the page, the browser’s JavaScript memory is wiped clean. When you click the input box, the Kuromoji engine has to read the cached files, decompress them, and rebuild its complex text-parsing data structures (a double-array trie) in your computer’s RAM. This computation takes a second or two, even on fast devices.

Because it’s impossible to persist instantiated JavaScript memory states across page reloads without deeply rewriting the library to support IndexedDB serialization, this brief loading phase is unavoidable.

To prevent this confusion, we should update the wording in the status message to reflect what is actually happening:

--- lib/minimal_mistakes_plus/furigana-converter.rb
+++ lib/minimal_mistakes_plus/furigana-converter.rb
@@ -66,7 +66,7 @@
             kuroshiroInitializing = true;
             if (statusEl) {
                 statusEl.style.display = 'block';
-                statusEl.innerText = "Downloading dictionary (first time only, ~18MB)...";
+                statusEl.innerText = "Loading dictionary engine...";
             }
 
             try {
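
Since the engine has to be rebuilt once per page load, it helps to make sure it is built only once per page load, however often the input box is clicked. A minimal sketch of that guard follows; `createLazyInit` is a hypothetical helper, and `load` stands for whatever function constructs and initializes the engine in the converter.

```javascript
// Wraps an expensive async loader so that concurrent and repeated callers
// all share one in-flight (or finished) load.
function createLazyInit(load) {
  let pending = null;
  return function ensureReady() {
    if (!pending) {
      pending = Promise.resolve()
        .then(load)
        .catch((err) => {
          pending = null; // allow a retry after a failed load
          throw err;
        });
    }
    return pending;
  };
}
```

Every click handler can then `await ensureReady()` before converting, and the status message only needs to be shown while that promise is still pending.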

Bug Fixes

Kuromoji Resources Not Found

On the site deployed through GitHub Actions, clicking the Furigana Converter input box freezes the whole page. The console shows 404 (Not Found) errors for the kuromoji resources, as shown in the screenshot. It works fine when running locally. I think the CDN links are wrong, for example https://josephtesfaye.com/cdn.jsdelivr.net/npm/kuromoji@0.1.2/dict/base.dat.gz. How does this happen? Fix it.

Gemini (Pro):

This is a deep, notorious quirk caused by a combination of Kuromoji’s internal code and the browser’s URL resolution rules. Here is exactly what is happening:

  1. The path.join Bug: Inside the Kuromoji library, the loader combines your dictPath with the filename using a browser polyfill of Node’s path.join(). This polyfill strictly normalizes paths and strips out double slashes. It takes “https://cdn.jsdelivr.net/npm/kuromoji@0.1.2/dict/” and turns it into “https:/cdn.jsdelivr.net/npm/kuromoji@0.1.2/dict/base.dat.gz” (with only one slash after https:).
  2. Why it worked locally (HTTP): Your local server runs on http://. When the browser’s XMLHttpRequest tries to fetch https:/cdn..., it sees that the scheme (https) doesn’t match your local server (http). Because it’s a cross-scheme request, the browser safely auto-corrects the missing slash and fetches the dictionary from the external CDN.
  3. Why it failed in Actions (HTTPS): GitHub Pages runs on https://. When the browser sees the request to https:/cdn..., the schemes match. Under the URL specification, a URL whose scheme matches the page’s and is followed by only a single slash is resolved as an absolute path on the current host. The browser therefore keeps your site’s host and uses the rest as the path, which produces the 404 error at https://josephtesfaye.com/cdn.jsdelivr.net/…!
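
The three steps above can be reproduced in Node.js. This sketch uses Node’s own `path.posix.join` as a stand-in for the browser polyfill inside Kuromoji (an assumption: the polyfill is expected to normalize this input the same way), and the WHATWG `URL` class to mimic how a browser resolves the damaged address against each page. The localhost port is illustrative.

```javascript
const path = require('path');

const dictPath = 'https://cdn.jsdelivr.net/npm/kuromoji@0.1.2/dict/';

// Step 1: joining normalizes the result and collapses "//" into "/".
const joined = path.posix.join(dictPath, 'base.dat.gz');
console.log(joined);

// Steps 2 and 3: resolve the damaged URL against the page it is loaded from.
function resolveFrom(pageUrl) {
  return new URL(joined, pageUrl).href;
}

// Different scheme (http page): the host is still read from the URL.
console.log(resolveFrom('http://localhost:4000/'));

// Same scheme (https page): the URL is treated as a path on the page's host.
console.log(resolveFrom('https://josephtesfaye.com/'));
```

The first resolution yields the intended CDN address, while the second yields the same josephtesfaye.com address that appears in the 404 errors.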

Updated: