[EN] Automatic Meaningful Custom IDs for Org Headings
Spoiler alert: I will just modify a bit of code that already exists. Go directly to the bottom if you want the solution, or read the whole post if you are interested in how I got there.
Update 2021-11-22
I’ve turned the code presented here into a complete package. You can find it in this repository or in its GitHub mirror (be aware the latter may not be as up-to-date as the former). Installation instructions are in the README.
The issue
About two to three years ago, as I was working on a project that was meant to be published on the internet, I looked for a way to get fixed anchor links to my various headings when I performed HTML exports. As some of you may know, by default when an Org file is exported to an HTML file, a random ID is generated for each heading, and this ID is used as its anchor. Here’s a quick example of a simple Org file:
#+title: Sample org file
* First heading
Reference to a subheading
* Second heading
Some stuff written here
** First subheading
Some stuff
** Second subheading
Some other stuff
And this is the result once exported to HTML (with a lot of noise
removed from <head>):
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Sample org file</title>
<meta name="generator" content="Org mode" />
<meta name="author" content="Lucien Cartier-Tilet" />
</head>
<body>
<div id="content">
<h1 class="title">Sample org file</h1>
<div id="outline-container-orgd8e6238" class="outline-2">
<h2 id="orgd8e6238"><span class="section-number-2">1</span> First heading</h2>
<div class="outline-text-2" id="text-1">
<p>
Reference to a subheading
</p>
</div>
</div>
<div id="outline-container-org621c39a" class="outline-2">
<h2 id="org621c39a"><span class="section-number-2">2</span> Second heading</h2>
<div class="outline-text-2" id="text-2">
<p>
Some stuff written here
</p>
</div>
<div id="outline-container-orgae45d6b" class="outline-3">
<h3 id="orgae45d6b"><span class="section-number-3">2.1</span> First subheading</h3>
<div class="outline-text-3" id="text-2-1">
<p>
Some stuff
</p>
</div>
</div>
<div id="outline-container-org9301aa9" class="outline-3">
<h3 id="org9301aa9"><span class="section-number-3">2.2</span> Second subheading</h3>
<div class="outline-text-3" id="text-2-2">
<p>
Some other stuff
</p>
</div>
</div>
</div>
</div>
</body>
</html>
As you can see, all the anchors are in the format of org[a-f0-9]{7}.
First, this is not really meaningful if you want to read the anchor
and guess where it will lead you. Second, these anchors will
change each time you export your Org file to HTML. If I want to share
a URL to my website and to a specific heading… well, I can’t: it will
change the next time I update the document. And I don’t want to have
to set a CUSTOM_ID property for each one of my headings manually. So,
what to do?
A first solution
A first solution I found came from this blog post, where Lee Hinman
described the very same issue they had and wrote some Elisp code to
remedy that (it’s a great read, go take a look). And it worked, and
for some time I used their code in my Emacs configuration file in
order to generate unique custom IDs for my Org headings. Basically,
the code detects whether auto-id:t is set in an #+OPTIONS header. If
it is, it iterates over all the Org headings and inserts into each one
a CUSTOM_ID made from a UUID generated by Emacs. And tadah! Each
heading gets a
h-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
custom ID that won’t change the next time we export our Org file to
HTML. The IDs are inserted when we save the file, and only for
headings which don’t already have a CUSTOM_ID property. Wohoo!
Except…
These headers are not meaningful
OK, alright, that’s still a huge step forward: we don’t have to type
any CUSTOM_ID property manually any more, it’s done automatically for
us. But when I send someone a link like
https://langue.phundrak.com/eittland#h-76fc0b91-e41c-42ad-8652-bba029632333,
the first reaction to this URL is often something along the lines of
“What the fuck?”. And they’re right, this URL is unreadable when it
comes to the anchor. How am I supposed to guess it links to the
description of the vowels of the Eittlandic language? (That’s a
constructed language I’m working on, you won’t find anything about it
outside my website. Also, this link is dead now, it got simplified
thanks to VuePress.)
So, I went back to my configuration file for Emacs, and through some
trial and error, I finally found a way to get a consistent custom ID
which is readable and automatically set. With the current state of my
code, what you get is the complete path of the Org heading, all spaces
replaced by underscores and headings separated by dashes, with a final
unique identifier taken from an Emacs-generated UUID. Now, the same
link as above will look like
https://langue.phundrak.com/eittland#Aperçu_structurel-Inventaire_phonétique_et_orthographe-Voyelles_pures-84f05c2c.
It won’t be more readable to you if you don’t speak French, but you
can guess it is way better than what we had before. I even added a
safety net by stripping forward slashes, tildes, and square brackets
from the result. The trailing ID is there to ensure the anchor stays
unique in case two headings share an identical path in the Org file
for one reason or another.
The modifications I made to the first function, eos/org-id-new, are
minimal: I just split the UUID and keep its first part. This is
basically a way to shorten it.
(defun eos/org-id-new (&optional prefix)
  "Create a new globally unique ID.
An ID consists of two parts separated by a dash:
- a prefix
- a unique part that will be created according to
  `org-id-method'.
PREFIX can specify the prefix, the default is given by the
variable `org-id-prefix'. However, if PREFIX is the symbol
`none', don't use any prefix even if `org-id-prefix' specifies
one.
So a typical ID could look like \"Org-4nd91V40HI\"."
  (let* ((prefix (if (eq prefix 'none)
                     ""
                   (concat (or prefix org-id-prefix)
                           "-")))
         unique)
    (if (equal prefix "-")
        (setq prefix ""))
    (cond
     ((memq org-id-method
            '(uuidgen uuid))
      (setq unique (org-trim (shell-command-to-string org-id-uuid-program)))
      (unless (org-uuidgen-p unique)
        (setq unique (org-id-uuid))))
     ((eq org-id-method 'org)
      (let* ((etime (org-reverse-string (org-id-time-to-b36)))
             (postfix (if org-id-include-domain
                          (progn
                            (require 'message)
                            (concat "@"
                                    (message-make-fqdn))))))
        (setq unique (concat etime postfix))))
     (t (error "Invalid `org-id-method'")))
    ;; Keep only the part of the unique string before its first dash.
    (concat prefix (car (split-string unique "-")))))
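To make the last line of that function concrete, here is a minimal sketch of the truncation on its own. The helper name my/uuid-first-chunk is hypothetical and exists only for this illustration; the sample UUID is the one from the dead link shown earlier.

```elisp
;; Hypothetical helper, not part of the original code: it isolates the
;; truncation that `eos/org-id-new' applies to the generated UUID.
(defun my/uuid-first-chunk (uuid)
  "Return the part of UUID that precedes its first dash."
  (car (split-string uuid "-")))

(my/uuid-first-chunk "76fc0b91-e41c-42ad-8652-bba029632333")
;; → "76fc0b91"
```

Keeping only eight hexadecimal digits leaves 32 bits of randomness, far less than a full UUID, but the suffix only has to tell apart headings that share the exact same path.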
Next, we have here the actual generation of the custom ID. As you can
see, the let has been replaced by a let*, which allowed me to build
the ID from the variables orgpath and heading. The former holds the
path to the heading, joined by dashes; heading appends the name of the
current heading to orgpath, joined by a dash if orgpath is not empty.
The result is then turned into a slug: all whitespace is replaced by
underscores, and some characters such as forward slashes or tildes are
deleted. heading is then passed as an argument to the function
described above, which appends the unique ID to it.
(defun eos/org-custom-id-get (&optional pom create prefix)
  "Get the CUSTOM_ID property of the entry at point-or-marker POM.
If POM is nil, refer to the entry at point. If the entry does not
have a CUSTOM_ID, the function returns nil. However, when CREATE
is non-nil, create a CUSTOM_ID if none is present already. PREFIX
will be passed through to `eos/org-id-new'. In any case, the
CUSTOM_ID of the entry is returned."
  (interactive)
  (org-with-point-at pom
    (let* ((orgpath (mapconcat #'identity (org-get-outline-path) "-"))
           (heading (replace-regexp-in-string
                     "/\\|~\\|\\[\\|\\]" ""
                     (replace-regexp-in-string
                      "[[:space:]]+" "_" (if (string= orgpath "")
                                             (org-get-heading t t t t)
                                           (concat orgpath "-" (org-get-heading t t t t))))))
           (id (org-entry-get nil "CUSTOM_ID")))
      (cond
       ;; An existing, non-blank CUSTOM_ID is returned untouched.
       ((and id
             (stringp id)
             (string-match "\\S-" id)) id)
       (create (setq id (eos/org-id-new (concat prefix heading)))
               (org-entry-put pom "CUSTOM_ID" id)
               (org-id-add-location id
                                    (buffer-file-name (buffer-base-buffer)))
               id)))))
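To see what the slug step produces without opening an Org buffer, the two nested replace-regexp-in-string calls can be tried on a plain list of titles. This is only a sketch: my/heading-path-to-slug is a hypothetical helper, and the list stands in for what org-get-outline-path and org-get-heading would return for a real heading.

```elisp
;; Hypothetical helper, for illustration only: it applies the same two
;; regexp replacements as `eos/org-custom-id-get' to a list of titles.
(defun my/heading-path-to-slug (path)
  "Return a slug for PATH, a list of heading titles, outermost first."
  (replace-regexp-in-string
   "/\\|~\\|\\[\\|\\]" ""               ; delete slashes, tildes, brackets
   (replace-regexp-in-string
    "[[:space:]]+" "_"                  ; runs of whitespace become "_"
    (mapconcat #'identity path "-"))))  ; join the titles with dashes

(my/heading-path-to-slug '("Second heading" "First subheading"))
;; → "Second_heading-First_subheading"

(my/heading-path-to-slug '("I/O" "~/.emacs [draft]"))
;; → "IO-.emacs_draft"
```

In the real function, this slug is then handed to eos/org-id-new, which appends a dash and the first chunk of a UUID to it.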
The rest of the code is unchanged, here it is anyway:
(defun eos/org-add-ids-to-headlines-in-file ()
  "Add CUSTOM_ID properties to all headlines in the current file
which do not already have one.
Only adds ids if the `auto-id' option is set to `t' in the file
somewhere. ie, #+OPTIONS: auto-id:t"
  (interactive)
  (save-excursion
    (widen)
    (goto-char (point-min))
    (when (re-search-forward "^#\\+OPTIONS:.*auto-id:t"
                             (point-max)
                             t)
      (org-map-entries (lambda ()
                         (eos/org-custom-id-get (point)
                                                'create))))))

(add-hook 'org-mode-hook
          (lambda ()
            (add-hook 'before-save-hook
                      (lambda ()
                        (when (and (eq major-mode 'org-mode)
                                   (eq buffer-read-only nil))
                          (eos/org-add-ids-to-headlines-in-file))))))
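The hook above registers its function in the global before-save-hook and filters on major-mode at save time. As a possible alternative, and only as a sketch, the same behaviour could be registered buffer-locally through the LOCAL argument of add-hook. The two my/ function names below are hypothetical; eos/org-add-ids-to-headlines-in-file is assumed to be defined as shown above.

```elisp
;; Hypothetical variant, not the original author's code.
(defun my/org-auto-id-before-save ()
  "Add missing CUSTOM_IDs unless the current buffer is read-only."
  (unless buffer-read-only
    (eos/org-add-ids-to-headlines-in-file)))

(defun my/org-auto-id-setup ()
  "Register the ID function on `before-save-hook' for this buffer only."
  ;; The final t makes the hook buffer-local, so no major-mode check
  ;; is needed when saving.
  (add-hook 'before-save-hook #'my/org-auto-id-before-save nil t))

(add-hook 'org-mode-hook #'my/org-auto-id-setup)
```

Using named functions rather than lambdas also makes it possible to undo the setup later with remove-hook.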
Note that you will need the package org-id to make this code work. You
simply need to add the following code before the code I shared above:
(require 'org-id)
(setq org-id-link-to-org-use-id 'create-if-interactive-and-no-custom-id)
And that’s how my links are now way more readable and persistent! The only downside I found is that when you move a heading so that its path changes, or when you rename the heading itself, the custom ID is not automatically updated. I could fix that by regenerating the custom ID on each save, regardless of whether one already exists, but that would risk overwriting IDs that were set manually.