kindle->org
1. Overview
I've enjoyed using an e-reader for the past 10 years. One of my favorite features is being able to highlight and take notes on interesting passages. I was recently reading Elements of Clojure (highly recommended) and thought it would be great if I could combine these notes with an actual note-taking system like org. Well, why not do that?
It turns out this isn't too hard! Kindles store all notes, highlights and bookmarks in a single text file. Simply plug the Kindle in and open documents/My Clippings.txt. You'll see something like the following:
```
elementsofclojure (Zachary Tellman)^M
- Your Note on Location 386 | Added on Saturday, May 27, 2023 5:04:07 AM^M
^M
Layers of indirection should provide an easy path for someone to learn the codebase. A layer should contain new information but not soo much its unfamiliar.^M
==========^M
elementsofclojure (Zachary Tellman)^M
- Your Highlight on Location 390-391 | Added on Saturday, May 27, 2023 5:04:38 AM^M
^M
Idioms provide a mapping between code structure and intent. Consistently used, they allow readers to trust their own intuition.^M
==========^M
```
Pretty cool, huh? Let's import these into org so we can search and manipulate them with Emacs. We'll create a heading for each book and, under each book, a sub-heading for each note and highlight. The remaining metadata is stored as properties on each of those sub-headings.
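To make the target layout concrete, here is a small sketch that renders one book and its annotations as plain Org text. The importer itself goes through the Org API rather than building strings; the helper name kindle-demo-book-to-org, the "title [author]" heading format and the fixed levels (assuming the import heading is level 1) are assumptions chosen for illustration.

```emacs-lisp
(defun kindle-demo-book-to-org (title author annotations)
  "Return Org text for one book heading followed by its ANNOTATIONS.
Each annotation plist becomes a level-3 sub-heading whose headline is
its :content; the other fields go into a property drawer.
Hypothetical helper, for illustration only."
  (concat
   (format "** %s [%s]\n" title author)
   (mapconcat
    (lambda (annotation)
      (concat
       (format "*** %s\n" (plist-get annotation :content))
       ":PROPERTIES:\n"
       ;; One ":key: value" line per metadata field.
       (mapconcat (lambda (key)
                    (format ":%s: %s\n"
                            (substring (symbol-name key) 1) ; drop the leading colon
                            (or (plist-get annotation key) "")))
                  '(:type :location :timestamp)
                  "")
       ":END:\n"))
    annotations
    "")))
```

For the highlight in the sample, this yields a "** elementsofclojure [Zachary Tellman]" heading with an "*** Idioms provide ..." child carrying :type:, :location: and :timestamp: properties.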
Here's how importing data will work at a high level:
- Call the interactive kindle-import-notes function.
- Specify where our Kindle clippings file is located.
- Import the Kindle data as org headings.
One early design decision is how to sync with existing org data. For example, if I import my Kindle data into org and then edit the imported headings, what should happen the next time I import? Should my changes be overwritten? Should I compute a diff and keep the org side? Or keep the Kindle side? Personally, I'd rather use a flush-and-fill strategy: delete everything under the import heading in org and replace it with what's on my Kindle every time I import.
2. Implementation
At a high level we are going to:
- Parse the clippings file into our own data structures.
- Use the org API to create headings for our notes.
We can start by choosing how to store the data. This is a fairly simple task, so a simple data structure will do: plists (property lists). For example:
| Field | Example |
|---|---|
| title | elementsofclojure |
| author | Zachary Tellman |
| type | Highlight |
| timestamp | May 27, 2023 5:04:38 AM |
| location | 390-391 |
| content | Idioms provide a … |
From here on, I'll refer to these plists as annotations.
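For concreteness, here is what one annotation looks like as an Emacs Lisp plist, using the highlight from the sample above. The variable name kindle-demo-annotation is made up for illustration. Note that the timestamp keeps the weekday, since that is everything after "Added on" in the clippings file.

```emacs-lisp
;; Hypothetical example annotation, built by hand from the sample entry.
(defvar kindle-demo-annotation
  (list :title "elementsofclojure"
        :author "Zachary Tellman"
        :type "Highlight"
        :location "390-391"
        :timestamp "Saturday, May 27, 2023 5:04:38 AM"
        :content "Idioms provide a mapping between code structure and intent. Consistently used, they allow readers to trust their own intuition.")
  "Example annotation plist (illustration only).")

;; Fields are read by keyword; a missing key simply yields nil.
(plist-get kindle-demo-annotation :author) ; => "Zachary Tellman"
(plist-get kindle-demo-annotation :page)   ; => nil
```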
Next, we need to parse the clippings file into annotations. To do this, we iterate over each entry in the file and extract the relevant information from it to build an annotation, using one regular expression per field.
```emacs-lisp
(require 's) ; s-match comes from the s.el package

;; "\r" is the carriage return Emacs displays as ^M. Making it optional
;; ("\r?") lets the same regexes work whether or not the file has CRs.
(defconst re-title "^\\(.*\\) (.*)")
(defconst re-author "^.* (\\(.*\\))\r?$")
(defconst re-type "^- Your \\([A-Za-z]+\\) ")
(defconst re-location "Location \\([0-9]+\\(?:-[0-9]+\\)?\\)")
(defconst re-timestamp "Added on \\(.*?\\)\r?$")
(defconst re-content "^\\(.*?\\)\r?$")

(defun extract-regex (regex text)
  "Apply REGEX to TEXT and return the first capture group match, or nil."
  (nth 1 (s-match regex text)))

(defun peek-line (n)
  "Return the text of the line N lines away from point, without moving point."
  (save-excursion
    (forward-line n)
    (buffer-substring-no-properties (line-beginning-position)
                                    (line-end-position))))
```
The ^M characters are carriage returns (\r), left over from DOS-style CRLF line endings. The regexes match them explicitly, but you could alternatively run dos2unix on the clippings file to remove them before parsing.
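If you would rather not depend on dos2unix, the same cleanup can be done inside Emacs right after the file is loaded into a buffer. This is a minimal sketch with a hypothetical function name. One caveat: after stripping, any regex that requires a trailing ^M will no longer match, so that part must be optional or dropped.

```emacs-lisp
(defun kindle-demo-strip-carriage-returns ()
  "Delete every carriage return (^M) in the current buffer.
A hypothetical in-Emacs stand-in for running dos2unix on the file."
  (save-excursion
    (goto-char (point-min))
    ;; Replace each literal \r with nothing, searching from the top.
    (while (search-forward "\r" nil t)
      (replace-match "" nil t))))
```

It would be called inside the with-temp-buffer form, directly after insert-file-contents and before searching for entries.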
For reference, here is the same sample from the clippings file:
```
elementsofclojure (Zachary Tellman)^M
- Your Note on Location 386 | Added on Saturday, May 27, 2023 5:04:07 AM^M
^M
Layers of indirection should provide an easy path for someone to learn the codebase. A layer should contain new information but not soo much its unfamiliar.^M
==========^M
elementsofclojure (Zachary Tellman)^M
- Your Highlight on Location 390-391 | Added on Saturday, May 27, 2023 5:04:38 AM^M
^M
Idioms provide a mapping between code structure and intent. Consistently used, they allow readers to trust their own intuition.^M
==========^M
```
You can see that each entry ends with a line of ten equals signs (==========). Let's call this our separator and use it to find each entry. We search forward for a separator and extract the entry's information from the lines above it: the title and author are four lines up, the type, location and timestamp three lines up, and the content on the line just above the separator. We push the resulting plist onto a stack and search for the next separator, until there are none left. Because we push onto a stack, we need to reverse the results at the end.
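The same separator idea can also be expressed without moving point around a buffer. The sketch below is an alternative to the buffer search just described: it removes carriage returns, splits the whole file text on the separator with split-string, and then reads each entry's lines by position (title line, metadata line, a blank line, then the content). The function name and the :header, :meta and :content keys are hypothetical.

```emacs-lisp
(defun kindle-demo-parse-entries (text)
  "Split clippings TEXT into a list of plists with :header, :meta, :content.
Hypothetical helper: an alternative to searching a buffer for separators."
  (let* ((clean (replace-regexp-in-string "\r" "" text))
         ;; Split on the separator; the TRIM regexp strips the newlines
         ;; around each chunk, and chunks that end up empty are dropped.
         (chunks (split-string clean "==========" t "\n+")))
    (mapcar (lambda (chunk)
              ;; Inside a chunk: title line, metadata line, blank line, content.
              (let ((lines (split-string chunk "\n")))
                (list :header (nth 0 lines)
                      :meta (nth 1 lines)
                      :content (nth 3 lines))))
            chunks)))
```

The :header and :meta strings would still go through the field regexes to get title, author, type, location and timestamp. Bookmarks, which have no content line, come back with a nil :content.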
```emacs-lisp
(defconst kindle-separator "==========")

(defun annotations-from-kindle-file (path)
  "Extract kindle annotations from the clippings file at PATH.
Return a list of plists, one per entry, with the properties
:title, :author, :type, :location, :timestamp and :content."
  (let ((annotations nil))
    (with-temp-buffer
      (insert-file-contents path)
      (while (search-forward kindle-separator nil t)
        ;; Relative to the separator: the title line is 4 lines up, the
        ;; metadata line 3 lines up and the content 1 line up.
        (push (list :title (extract-regex re-title (peek-line -4))
                    :author (extract-regex re-author (peek-line -4))
                    :type (extract-regex re-type (peek-line -3))
                    :location (extract-regex re-location (peek-line -3))
                    :timestamp (extract-regex re-timestamp (peek-line -3))
                    :content (extract-regex re-content (peek-line -1)))
              annotations)
        ;; Step past the separator line so the next search can't find it again.
        (forward-line 1)))
    (nreverse annotations)))
```
Now that we have our data, we need to convert it to org headings. To recap, our strategy for formatting annotations is to:
- Clear all existing notes under a heading.
- Create a heading for each book.
- Create a sub-heading under each book for every annotation in that book. Each annotation's text becomes the heading title, and its metadata is stored as properties so we can query it later.
One potential issue is that annotations can be out of order. For example, maybe we are reading book A and take a note. Then we highlight a passage from book B before returning to book A, where we take more notes. We can solve this by sorting annotations by book title. Once they are sorted, all annotations from the same book are adjacent, so we don't need extra logic to put each annotation in the right place: whenever we see a title we haven't seen yet, we create a new heading for that book and insert its annotations below it.
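The effect of sorting can be checked in isolation. This is a minimal sketch, assuming annotations are plists with a :title key as above. It sorts a copy of the list by title (Emacs's sort is stable, so annotations from the same book keep their file order) and then lists where a new book heading would start, which is simply every point where the title changes. Both helper names are hypothetical.

```emacs-lisp
(defun kindle-demo-sort-by-title (annotations)
  "Return a copy of ANNOTATIONS sorted by :title (stable, non-destructive)."
  (sort (copy-sequence annotations)
        (lambda (a b)
          (string< (or (plist-get a :title) "")
                   (or (plist-get b :title) "")))))

(defun kindle-demo-book-boundaries (sorted)
  "Return the titles at which a new book heading would start in SORTED."
  ;; An uninterned symbol never equals a real title, so the first
  ;; annotation always starts a book, even if its title is nil.
  (let ((previous (make-symbol "none"))
        (titles nil))
    (dolist (annotation sorted (nreverse titles))
      (let ((title (plist-get annotation :title)))
        (unless (equal title previous)
          (push title titles))
        (setq previous title)))))
```

With the A, B, A reading order from the example, the sorted list is A, A, B and only two book headings are needed.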
```emacs-lisp
(require 'org)
(require 'cl-lib)

(defun clear-subtree ()
  "Delete everything below the current heading, keeping the heading itself."
  (save-excursion
    (org-back-to-heading t)
    (let ((start (line-beginning-position 2))
          (end (save-excursion (org-end-of-subtree t t) (point))))
      (when (< start end)
        (delete-region start end)))))

(defun heading-title ()
  "The title of the heading at the current point."
  (nth 4 (org-heading-components)))

(defun heading-level ()
  "The tree level of the heading at the current point."
  (nth 0 (org-heading-components)))

(defun next-heading ()
  "Insert an empty sibling heading after the subtree at point."
  (org-insert-heading-respect-content))

(defun book-heading-text (title author)
  "The headline text used for the book TITLE by AUTHOR."
  (format "%s [%s]" (or title "NA") (or author "NA")))

(defun new-book? (title author)
  "Return non-nil if the parent of the heading at point is not the
heading for TITLE by AUTHOR."
  (not (equal (book-heading-text title author)
              (car (last (org-get-outline-path))))))

(defun heading-for-book (title author)
  "Turn the empty heading at point into a heading for the book TITLE
by AUTHOR, then open an empty annotation heading below it."
  (org-promote)
  (org-edit-headline (book-heading-text title author))
  (next-heading)
  (org-demote))

(defun heading-for-annotation (title author type location timestamp content)
  "Fill the empty heading at point with an annotation, then open the next one."
  (org-edit-headline (or content "nil"))
  (org-entry-put (point) "title" title)
  (org-entry-put (point) "author" author)
  (org-entry-put (point) "timestamp" timestamp)
  (org-entry-put (point) "location" location)
  (org-entry-put (point) "type" type)
  (next-heading))

(defun annotations-to-org (annotations)
  "Replace everything under the heading at point with ANNOTATIONS."
  (let ((annotations (sort (copy-sequence annotations)
                           (lambda (a b)
                             (string< (or (plist-get a :title) "")
                                      (or (plist-get b :title) ""))))))
    (clear-subtree)
    ;; Start on an empty heading two levels below the current one.
    (next-heading)
    (org-demote)
    (org-demote)
    (dolist (annotation annotations)
      (cl-destructuring-bind (&key title author type location timestamp content)
          annotation
        (when (new-book? title author)
          (heading-for-book title author))
        (heading-for-annotation title author type location timestamp content)))
    ;; The final `next-heading' left an empty heading behind; delete it.
    (delete-region (line-beginning-position)
                   (min (point-max) (1+ (line-end-position))))))
```
The code assumes that each iteration of the loop starts on a new, empty heading two levels below the heading the import was called from.
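Finally, we can wrap everything in an interactive function. Just call the following function with point on an org heading, pick your clippings file, and your notes will be imported under that heading. Following the flush-and-fill strategy, anything already under the heading is deleted first.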
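```emacs-lisp
(require 'ido)

(defun kindle-import-notes (path)
  "Replace the contents of the org heading at point with the Kindle
annotations read from the clippings file at PATH."
  (interactive (list (ido-read-file-name "Clippings file: ")))
  (unless (derived-mode-p 'org-mode)
    (user-error "Not in an org buffer"))
  (org-back-to-heading t)
  (annotations-to-org (annotations-from-kindle-file path)))
```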