Using Calibre to catalogue a physical speculative fiction book collection

The problem

I have a large collection of paper books, which I am slowly entering into Calibre as empty book records, so that I can have a more easily accessible record of what I do and do not own.

I initially thought that I could make the data capture trivial by scanning the ISBN barcodes of the books with a phone app. Then I would enter a list of ISBNs into Calibre, push a button to fetch metadata for them automatically, and everything would Just Work, like magic.

I was wrong for various reasons:

  • A lot of my books predate the existence of ISBNs.

  • I want my virtual book records to match my physical books as closely as possible -- but an ISBN is not an unambiguous identifier of a particular edition of a book. Multiple editions with completely different covers can share the same ISBN.

  • Calibre wasn't really designed for this kind of pedantic cataloguing of physical objects, so it doesn't really care about these distinctions. It also helpfully smushes metadata records together if their ISBNs match, and there is no way to make it stop.

  • The default public metadata sources that Calibre uses don't even have metadata for the vast majority of old editions of SFF books.

The solution

A really good source of metadata for old SFF books is ISFDB, the Internet Speculative Fiction Database. And there is a plugin for Calibre which scrapes metadata from it! Unfortunately it hasn't been updated in several years, and ISFDB's HTML periodically changes. So I am maintaining a fork, which I tweak whenever I enter a new batch of books and discover that something has broken. I have also started a reimplementation, in which I hope to include all the little bits of data entry glue that I am about to describe.

Edit: I am now focusing on the reimplementation -- the fork is pretty much abandoned.

My current workflow

  1. I take a pile of books and look them up on ISFDB in my browser (Firefox). I usually process one author at a time, since it's the most efficient way to find multiple book titles at once. I search each title page for the specific edition which most closely matches the physical copy I have. This record is uniquely identified by an ISFDB ID which appears in the URL and on the publication page.

  2. I used to copy the ISFDB IDs from all the open ISFDB pages by hand into a text file, and then run a script to create entries in Calibre with these identifiers. The manual copying became very annoying very quickly, so I hacked together a Python script which automatically extracts these identifiers from the currently open tabs in a running Firefox session. Now I can pipe the output of this script to the script which creates records.

  3. I now have some empty records with only the ISFDB ID set (and also some custom columns which are not related to the metadata). To avoid the record-smushing issue, I disable all metadata sources except ISFDB (important!) and fetch metadata for all the records. If an ISFDB ID is present, my fork of the plugin will ignore all other data (like author and title) and use only the ID in its search, so as long as I found the correct records in step 1, the download fetches the correct data.

  4. Now I do some manual cleanup, like fetching or correcting cover images which were missing from ISFDB.
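The ID extraction that steps 1 to 3 depend on can be sketched as a small function. This is a minimal sketch, not the script itself: it uses the same publication URL pattern as the script in this post, the function name `extract_isfdb_ids` is hypothetical, and the deduplication (for a publication that is open in more than one tab) is an addition that the script does not perform.

```python
import re

# Same URL pattern as the script in this post: the publication ID is the
# number that follows "pl.cgi?".
ISFDB_PUB_URL = re.compile(r"www\.isfdb\.org/cgi-bin/pl\.cgi\?(\d+)")


def extract_isfdb_ids(urls):
    """Return the publication IDs found in urls, in order, without duplicates.

    Hypothetical helper; the deduplication is an addition for illustration.
    """
    seen = set()
    ids = []
    for url in urls:
        match = ISFDB_PUB_URL.search(url)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids
```

Calling it on a list of tab URLs returns only the IDs of publication pages; any other URL is silently skipped, which is the behaviour you want when unrelated tabs are open in the same session.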

The Horrible Firefox Hack

Edit: The latest versions of Firefox store the session in an lz4-compressed JSON file rather than an uncompressed JSON file, which necessitates the update below (thanks, StackOverflow!). You will need to install the lz4 library.

Edit: Recent versions of the lz4 library require you to import lz4.block explicitly.

isfdb_ids_from_firefox.py:

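#!/usr/bin/env python3
"""Print the ISFDB publication IDs of all tabs open in a running Firefox session."""

import json
import re

import lz4.block

firefox_session_path = "/home/confluence/.mozilla/firefox/mhxsxkg0.default/sessionstore-backups/recovery.jsonlz4"

with open(firefox_session_path, "rb") as f:
    f.read(8)  # skip the 8-byte header that precedes the lz4 block
    session = json.loads(lz4.block.decompress(f.read()).decode("utf-8"))

# Collect the tabs from every open window.
tabs = []
for w in session["windows"]:
    tabs.extend(w["tabs"])

# Take the last history entry of each tab; tabs with no history are skipped.
urls = [t["entries"][-1]["url"] for t in tabs if t.get("entries")]

for u in urls:
    # The publication ID is the number after "pl.cgi?" in the URL.
    m = re.search(r"www\.isfdb\.org/cgi-bin/pl\.cgi\?(\d+)", u)
    if m:
        print(m.group(1))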
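Since the session file format has already changed once, a loader that accepts both the old uncompressed file and the newer compressed one may be more robust. The following is a sketch under stated assumptions, not a tested replacement: the 8-byte header value `mozLz40\0` and the 1-based `index` field (the history entry a tab is currently showing) come from my understanding of Firefox's session format, not from anything above, and the function names are hypothetical.

```python
import json

# Assumption: compressed session files start with this 8-byte header.
MOZLZ4_MAGIC = b"mozLz40\0"


def load_session(path):
    """Load a Firefox session file, compressed or plain JSON."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(MOZLZ4_MAGIC):
        import lz4.block  # only needed for the compressed format
        data = lz4.block.decompress(data[len(MOZLZ4_MAGIC):])
    return json.loads(data.decode("utf-8"))


def current_url(tab):
    """Return the URL a tab is showing, or None if it has no history."""
    entries = tab.get("entries") or []
    if not entries:
        return None
    # Assumption: "index" is the 1-based position of the entry on display.
    # Fall back to the last entry if it is missing, and clamp it to range.
    index = tab.get("index", len(entries))
    index = min(max(index, 1), len(entries))
    return entries[index - 1]["url"]


def open_urls(session):
    """Return the current URL of every tab in every window."""
    urls = []
    for window in session["windows"]:
        for tab in window["tabs"]:
            url = current_url(tab)
            if url:
                urls.append(url)
    return urls
```

Using `index` instead of the last entry matters only when you have pressed the back button in a tab: the last history entry is then a page you navigated away from, not the one on screen.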

The Record Creation Script

calibre-add-from-isfdb.sh (marvel at my consistent naming conventions):

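#!/bin/bash
# Create an empty Calibre record for each ISFDB ID read from a file argument or stdin.

while read -r id
do
  # Add an empty record carrying only the ISFDB identifier.
  calibredb add -e -I "isfdb:$id"
  # Look up the Calibre ID of the record that was just added.
  added_id=$(calibredb search -l 1 "identifiers:isfdb:$id")
  if [ -n "$added_id" ]
  then
    calibredb set_custom "shelf" "$added_id" "SFF"
    calibredb set_custom "read" "$added_id" "1"
  fi
done < "${1:-/dev/stdin}"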

In addition to creating a new record with the ISFDB ID filled into the identifier field with the appropriate prefix, I also set two custom columns to mark the book as read and file it in the correct category, which I have called a shelf. You can edit this to set whatever custom values you want, or remove it entirely.

Assuming that both scripts are executable and in your path, you can put them together like this:

isfdb_ids_from_firefox.py | calibre-add-from-isfdb.sh
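If you would rather keep everything in Python, or want the custom columns to be configurable, the record creation step could be sketched as below. It uses only the calibredb invocations that already appear in the shell script above; the function names and the `custom_values` dictionary are hypothetical, and I have not run this against a real library.

```python
import subprocess


def add_command(isfdb_id):
    """Command that adds an empty record with only the ISFDB identifier."""
    return ["calibredb", "add", "-e", "-I", f"isfdb:{isfdb_id}"]


def search_command(isfdb_id):
    """Command that looks up the Calibre ID of the record just added."""
    return ["calibredb", "search", "-l", "1", f"identifiers:isfdb:{isfdb_id}"]


def custom_commands(book_id, custom_values):
    """One set_custom command per custom column, in dictionary order."""
    return [
        ["calibredb", "set_custom", column, str(book_id), str(value)]
        for column, value in custom_values.items()
    ]


def add_record(isfdb_id, custom_values):
    """Create the record, then set custom columns if the record is found."""
    subprocess.run(add_command(isfdb_id), check=True)
    # No check=True here: an empty result just means nothing was found.
    result = subprocess.run(
        search_command(isfdb_id), capture_output=True, text=True
    )
    book_id = result.stdout.strip()
    if book_id:
        for command in custom_commands(book_id, custom_values):
            subprocess.run(command, check=True)
```

To mirror the shell script, you would call `add_record(isfdb_id, {"shelf": "SFF", "read": "1"})` for each ID read from standard input. Building the commands as argument lists avoids shell quoting problems and makes them easy to inspect without touching the library.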

Future work

In the reimplementation I hope to incorporate an entry field for ISFDB IDs and the magical Firefox session scraping directly into the plugin, so that the whole process is more streamlined, not operating system-dependent, and more usable by other people. In the meantime, if you use some flavour of Unix, you may be able to use my very messy current setup with minor modifications.

If you have comments, questions or rotten tomatoes, contact me on Twitter or file an issue against the reimplementation on GitHub (even though it currently doesn't exist).

Tweets from September 2017

The Traitor Baru Cormorant by Seth Dickinson: very good, and utterly depressing, secondary world fantasy.

Mon Sep 04 18:42:46 +0000 2017


That awkward moment when you realise that two completely unrelated bugs in your code were masking each other. :/

Wed Sep 06 16:50:20 +0000 2017


@astrolabe_cat What OS? GIMP (OSS Photoshop equivalent) and Inkscape (vector drawing program) have ports.… https://twitter.com/i/web/status/906953893296386049

Sun Sep 10 18:53:53 +0000 2017


@astrolabe_cat I don't want to be That Guy and say "install Linux", but... install Linux. ;) It gives access to a h… https://twitter.com/i/web/status/906954373972013059

Sun Sep 10 18:55:47 +0000 2017


@astrolabe_cat Tux Paint? http://www.tuxpaint.org/ I also remembered Krita https://krita.org/ which started as… https://twitter.com/i/web/status/906956978139922432

Sun Sep 10 19:06:08 +0000 2017


@astrolabe_cat You can do simple things in complicated programs with access to appropriate learning materials. Mayb… https://twitter.com/i/web/status/906958035133845504

Sun Sep 10 19:10:20 +0000 2017


@astrolabe_cat I understand desire not to scare off; as a counterpoint, an overly simplified program can be frustra… https://twitter.com/i/web/status/906958342081441798

Sun Sep 10 19:11:33 +0000 2017


@astrolabe_cat I learned Photoshop and QuarkXPress (layout program) as a kid from "For Dummies" books bought by mom… https://twitter.com/i/web/status/906958656998166529

Sun Sep 10 19:12:48 +0000 2017


@astrolabe_cat I'd maybe check out Krita, then; it was designed from the start to have a more intuitive UI. Friendl… https://twitter.com/i/web/status/906959021655064577

Sun Sep 10 19:14:15 +0000 2017


@astrolabe_cat Here are some video tuts for Krita: https://docs.krita.org/Video_Tutorials

Sun Sep 10 19:16:49 +0000 2017


Thread. https://twitter.com/dammitZA/status/906955208776749056

Sun Sep 10 19:26:17 +0000 2017


New Humble Bundle of books themed around finding a tech job: https://www.humblebundle.com/books/tech-job-for-dummies-books Includes HTML5, JavaScript and Python development.

Mon Sep 11 18:47:33 +0000 2017


I have finally found a desktop pager for Fluxbox which isn't a thousand years old and broken: LXPanel with all other widgets removed.

Mon Sep 11 23:35:32 +0000 2017


It has window icons! It's being actively developed! It even has GUI config. Can't go in the slit, but I can position it underneath.

Mon Sep 11 23:38:45 +0000 2017


Sadly this is made up, but I desperately wanted it to be true. https://twitter.com/klaasm67/status/908654201844183042

Fri Sep 15 22:55:12 +0000 2017


Peacock spiders! https://www.sciencealert.com/seven-new-species-of-adorable-little-peacock-spiders-make-their-debut (via @hodgestar)

Sun Sep 17 09:08:43 +0000 2017


We have finally extracted our @Oglaf shirts and books from the ZA postal system, and they are glorious.

Wed Sep 20 09:06:58 +0000 2017


This is how I do it in my head. It only looks silly in writing because each step is made explicit (and there's a cl… https://twitter.com/i/web/status/911153225130299392

Fri Sep 22 09:00:31 +0000 2017


@astrolabe_cat @British_Airways Inexplicable airport bureaucracy is the worst. :( I'm so sorry to hear that; I hope… https://twitter.com/i/web/status/912369072477736960

Mon Sep 25 17:31:52 +0000 2017


The Emperor's Knife by Mazarkis Williams: interesting ideas; pacing and characterisation issues. Ultimately didn't hold my interest.

Fri Sep 29 07:54:35 +0000 2017


Is anyone planning to explain why @FNBSA has discontinued selling prepaid elec. for @CityofCT, or is one perfunctory SMS all we're getting?

Fri Sep 29 08:25:18 +0000 2017


@mariskaza @FNBSA @CityofCT I can only assume that it's some kind of acrimonious dispute over the service. The lack… https://twitter.com/i/web/status/913682718759714816

Fri Sep 29 08:31:50 +0000 2017


I don't want to DM you, @CityofCT. Why can't you explain this situation, which affects multiple Capetonians, in public?

Fri Sep 29 13:07:18 +0000 2017


@CityofCT I understand that. I'm not expecting a 47-part Twitter epic; more like a press release on your site that you can point to.

Fri Sep 29 16:48:20 +0000 2017


Apparently the decision has now been reversed. Hooray? ¯_(ツ)_/¯ https://twitter.com/FNBSA/status/913809757747720195

Fri Sep 29 17:43:06 +0000 2017

Tweets from August 2017

"People no longer trust each other. Why? And how can we fix it? An interactive guide to the game theory of trust": http://ncase.me/trust/

Wed Aug 02 17:51:57 +0000 2017


Happy #internationalcatday! DGvAX4VXgAA-rkH.jpg

Tue Aug 08 20:35:06 +0000 2017


@pierre_nel No idea; would not eat.

Thu Aug 10 21:21:25 +0000 2017


@Fancierfancier I'm really looking forward to book 3!

Fri Aug 11 22:12:11 +0000 2017


World's! Most! Epic! Fishing! Simulator! https://www.youtube.com/watch?v=X0uNhsLmGWA

Sun Aug 20 10:23:00 +0000 2017


Today I discovered @Miamaska, a really good webcomic.

Thu Aug 24 23:13:16 +0000 2017

Tweets from July 2017

@Nantalith OMG, kittens! :D

Mon Jul 03 11:01:56 +0000 2017


@rbjacobs Still no followup from CC division re: my card pin issues.

Wed Jul 05 09:54:24 +0000 2017


@ParadoxMirror I don't know whether to laugh or cry. I can't believe this isn't satire.

Wed Jul 05 09:58:53 +0000 2017


@ParadoxMirror The article is pretty old; I wonder if the trauma has faded after a mourning period of five years. ;)

Wed Jul 05 10:03:23 +0000 2017


@Rbjacobs No feedback since original reply. Have not tried phoning. I need to speak to a technical person who can a… https://twitter.com/i/web/status/882618875216633857

Wed Jul 05 15:15:12 +0000 2017


@Rbjacobs This is a long and complex issue which I would like to continue to discuss in writing, not re-state to a… https://twitter.com/i/web/status/882620287023550464

Wed Jul 05 15:20:49 +0000 2017


@Rbjacobs I'd be happy to speak to someone on the phone or in person if it's someone who is informed about my speci… https://twitter.com/i/web/status/882620629073244161

Wed Jul 05 15:22:10 +0000 2017


I remember the fake books released online @ same time as real books. And the people who read them all the way throu… https://twitter.com/i/web/status/884558949500035073

Mon Jul 10 23:44:22 +0000 2017


@fijall @europython Red and green lettuce? Luxury!

Wed Jul 12 11:59:39 +0000 2017


https://www.youtube.com/watch?v=ye6GCY_vqYk

Sat Jul 15 22:19:50 +0000 2017


https://www.youtube.com/watch?v=XFYWazblaUA

Thu Jul 20 16:46:53 +0000 2017


This looks amazing. https://www.youtube.com/watch?v=6EZCBSsBxko

Fri Jul 21 08:34:54 +0000 2017


@ParadoxMirror Yesss, it's awesome. I can't wait! I've been super excited since the teaser.

Fri Jul 21 09:34:22 +0000 2017


@ParadoxMirror Maybe a little bit typecast as the straight man in weird cop buddy movies? He's pretty good at it, though.

Fri Jul 21 10:38:48 +0000 2017


SARS eFiling finally moving away from Flash https://mybroadband.co.za/news/internet/221408-sars-efiling-will-move-away-from-adobe-flash.html

Wed Jul 26 10:25:39 +0000 2017

Tweets from June 2017

Facsimile Dust Jackets LLC sells reproduction dust jackets for old books: https://www.dustjackets.com

Sat Jun 03 15:27:17 +0000 2017


@pierre_nel Epic. :D

Sat Jun 03 23:02:31 +0000 2017


@pierre_nel On the other hand, that's what you get for using production credentials in docs for new devs & assuming… https://twitter.com/i/web/status/871140598203502592

Sat Jun 03 23:04:38 +0000 2017


:D https://www.youtube.com/watch?v=dxWvtMOGAhw

Sat Jun 10 06:50:22 +0000 2017


Today I discovered Voice of Baceprot, an Indonesian schoolgirl thrash metal band: https://www.youtube.com/channel/UCu3Moj3Nl7RPrk3or5GDQEw/videos

Sat Jun 10 13:44:35 +0000 2017


Sigh. I also only buy trades for a variety of reasons. I can't believe we're still having this conversation in 2017. http://www.blastr.com/2017-6-13/black-panther-world-wakanda-cancelled-im-reason-superhero-comics-are-struggling

Sat Jun 17 14:02:19 +0000 2017


@fnb you have blocked my credit card pin AGAIN while I am overseas. How do I proceed? About to board flight.

Mon Jun 19 13:20:21 +0000 2017


@rbjacobs @fnb you have blocked my credit card pin AGAIN while I am overseas. How do I proceed? About to board flight.

Mon Jun 19 13:23:56 +0000 2017


@rbjacobs I can see that you have changed the pin; I want assurance card won't be cancelled if I attempt to use it again.

Mon Jun 19 13:34:07 +0000 2017


@rbjacobs I went through the proper authorisation procedure before leaving. This is the second time FNB has done this to me.

Mon Jun 19 13:35:09 +0000 2017


@Rbjacobs already found new pin & used it. Issue is that my activity is triggering unexplained pin resets. Don't kn… https://twitter.com/i/web/status/876863184837234688

Mon Jun 19 18:04:09 +0000 2017


@Rbjacobs ... or if I risk complete cancellation. Last time I kept calling you, pin kept being reset, & nobody could explain why. 2/?

Mon Jun 19 18:06:01 +0000 2017


@Rbjacobs Last time problem persisted post my return to ZA until I got a new card. No explanation ever given. I nee… https://twitter.com/i/web/status/876864014273433600

Mon Jun 19 18:07:27 +0000 2017


AIMS Desktop, a distro customised for mathematics-focused higher ed., now based on Debian: https://jonathancarter.org/2017/06/18/aims-desktop-2017-1-is-available/ (via @highvoltage)

Mon Jun 19 22:12:06 +0000 2017


@Rbjacobs Explanation I was given makes no sense. Have followup questions; could you please escalate to someone w/… https://twitter.com/i/web/status/877503435796434944

Wed Jun 21 12:28:16 +0000 2017


Babel-17 by Samuel R. Delany: some parts have aged better than others; overall still a fun read full of interesting ideas about language.

Thu Jun 22 14:51:17 +0000 2017


@Rbjacobs Still no followup response since this tweet; still not sure whether it's safe for me to use my card.

Fri Jun 23 15:13:45 +0000 2017


Area X (Southern Reach trilogy) by Jeff Vandermeer: Lovecraftian bureaucracy. Not sure if ending is satisfying, but journey is interesting.

Tue Jun 27 06:10:42 +0000 2017


Constellation Games by Leonard Richardson: a traditional first contact story communicated mostly through alien video games. Recommended.

Tue Jun 27 06:22:16 +0000 2017


Raven Stratagem by Yoon Ha Lee: as good as the first book, so very good. Recommended. But start with the first book.

Tue Jun 27 06:23:04 +0000 2017


To everyone that I apparently persuaded to buy Ninefox Gambit: to avoid confusion, I suggest you actually start with http://clarkesworldmagazine.com/lee_10_12/

Tue Jun 27 13:27:21 +0000 2017


Interview with Yoon Ha Lee in Lightspeed Magazine: http://www.lightspeedmagazine.com/nonfiction/interview-yoon-ha-lee/

Wed Jun 28 07:47:29 +0000 2017


The Golem and the Djinni by Helene Wecker: pretty good historical fantasy, although a bit slow.

Wed Jun 28 16:32:58 +0000 2017


Trying out Mastodon. Come, we can have the whole place to ourselves for five seconds before the assholes arrive. ;) https://mastodon.xyz/@confluence

Wed Jun 28 20:47:37 +0000 2017


@HypnZA There are literally dozens of us!

Wed Jun 28 22:23:36 +0000 2017


It's possible for a publisher to insist that retailers sell their books without DRM. Tor has done it. Why can't O'Reilly?

Thu Jun 29 13:57:11 +0000 2017

Tweets from May 2017

https://www.youtube.com/watch?v=w_MSFkZHNi4

Thu May 04 14:31:47 +0000 2017


"The Amazon CloudFront distribution is configured to block access from your country." One way to make sure your local newspaper stays local.

Sun May 14 09:12:17 +0000 2017


Find international radio stations on a world map! http://radio.garden

Sun May 14 22:02:13 +0000 2017


Thread. :/ https://twitter.com/zeynep/status/866803106566295553

Tue May 23 12:26:43 +0000 2017


Ornithologist problems https://jads-abum-rat.tumblr.com/post/156735823429/i-cannot-believe-either-found-on-the-door-of

Thu May 25 13:58:43 +0000 2017


Interesting post by @MarissaLingen on volunteering, and when to stop http://www.marissalingen.com/blog/?p=1782

Mon May 29 10:22:45 +0000 2017


https://www.youtube.com/watch?v=zoiezEB9n2Q

Tue May 30 06:41:45 +0000 2017