Archiving Podcasts

2020-08-17

janprzy/podarchive on GitHub

I want to download and archive some podcasts I'm listening to. This isn't 100% reasonable, but audio files aren't that big, so it shouldn't be too much of an issue.
Unfortunately, none of the solutions I found fulfills all of my requirements.

The next step is obvious: I have to make something myself.

Basic Workings

It isn't anything fancy or "smart", but the basic idea is this:

  1. Download the RSS feed
  2. Cycle through the items:
    1. Decide how the file would be named. Does it already exist?
      If not:
      1. Download the audio file and adjust the name
      2. Write an HTML file containing the show notes and embedding the audio. It can be played when viewing this file in a web browser.
  3. Finally, write a list of all episodes to index.html.
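
Steps 1 and 2 can be sketched roughly as follows. This is an illustration in Python, not the Perl code of the actual script. The function names `fetch` and `archive`, the `.mp3` suffix, and the replacement of "/" in titles are assumptions made for the example.

```python
import html
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import quote


def fetch(url):
    """Return the raw bytes behind a URL (used for the feed and the audio)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def archive(feed_url, target_dir, fetch=fetch):
    """Download every episode that is not yet present in target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Step 1: download the RSS feed.
    root = ET.fromstring(fetch(feed_url))

    names = []
    # Step 2: cycle through the items.
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # this item has no audio attached

        title = item.findtext("title", default="untitled")
        # Decide how the file would be named ("/" cannot be part of a file name).
        name = title.replace("/", "-")
        names.append(name)

        # Assumption: every enclosure is an MP3 file.
        audio = target / (name + ".mp3")
        if audio.exists():
            continue  # already archived, nothing to do

        # Download the audio file under the adjusted name.
        audio.write_bytes(fetch(enclosure.get("url")))

        # Write an HTML page with the show notes and an embedded player.
        # The notes are inserted unescaped because feeds usually ship them as HTML.
        notes = item.findtext("description", default="")
        page = (
            f"<h1>{html.escape(title)}</h1>\n"
            f'<audio controls src="{quote(audio.name)}"></audio>\n'
            f"<div>{notes}</div>\n"
        )
        (target / (name + ".html")).write_text(page, encoding="utf-8")

    return names
```

Because existing files are skipped, running the function repeatedly only fetches the feed and any episodes published since the last run.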

Some Details

File Naming

By default, the episode's title is used as the filename. However, this doesn't always allow for easy sorting. Therefore, the flags "-d | --date" and "-e | --episode-number" can be used to prepend the publishing date and/or episode number to the filename.
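
To illustrate the idea, here is a small Python sketch of how such a name could be put together. It is not taken from the Perl script: the " - " separator, the ISO date format, the zero-padded number, and the order of number and date are all assumptions. The `date_behind` parameter mirrors the "--date-behind" flag that appears in the examples at the end of this post.

```python
from datetime import date


def episode_filename(title, published=None, number=None, date_behind=False):
    """Build a file name (without extension) from a title and optional prefixes."""
    parts = [title.replace("/", "-")]  # "/" cannot be part of a file name

    if published is not None:
        stamp = published.isoformat()  # e.g. 2020-08-17, sorts chronologically
        if date_behind:
            parts.append(stamp)
        else:
            parts.insert(0, stamp)

    if number is not None:
        # Zero-padding keeps 10 from sorting before 2.
        parts.insert(0, f"{number:03d}")

    return " - ".join(parts)


print(episode_filename("Pilot"))
print(episode_filename("Pilot", published=date(2020, 8, 17)))
print(episode_filename("Pilot", published=date(2020, 8, 17), number=7, date_behind=True))
```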

Configuration

I am not using configuration files; all preferences are set from the command line. Since I intend to run it as a cron job, the command doesn't need to be short, and config files would just add needless complexity.
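
A command line like this is easy to model. The following Python sketch uses only the flags mentioned in this post; how the Perl script actually parses its options is not shown here, and the names `build_parser`, `feed_url`, `target_dir` and `quiet` are made up for the example.

```python
import argparse


def build_parser():
    """Command-line interface with two positional arguments and a few switches."""
    parser = argparse.ArgumentParser(prog="podarchive")
    parser.add_argument("feed_url", help="URL of the podcast's RSS feed")
    parser.add_argument("target_dir", help="directory the episodes are stored in")
    parser.add_argument("-d", "--date", action="store_true",
                        help="add the publishing date to the filename")
    parser.add_argument("-e", "--episode-number", action="store_true",
                        help="prepend the episode number to the filename")
    parser.add_argument("--date-behind", action="store_true",
                        help="append the date instead of prepending it")
    # Only the short form -q is mentioned in the post, so no long form is defined.
    parser.add_argument("-q", dest="quiet", action="store_true",
                        help="only output error messages")
    return parser


args = build_parser().parse_args(
    ["https://atp.fm/rss", "/tank/Media/Podcast/Accidental Tech Podcast", "-q"]
)
print(args.feed_url, args.quiet)
```

Short switches can be combined, so "-de" is read the same way as "-d -e".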

Overview/Episode List

After downloading, an overview of all episodes is written to index.html. This is a simple list of all episodes with links to their individual files. It can be used to conveniently view the archive in a web browser.
This isn't an essential feature, but it was easy to implement and can sometimes be slightly nicer than the file browser.
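
Generating such a list takes only a few lines. The Python sketch below is an illustration, not the script's Perl code. It assumes the list is built by scanning the target directory for episode pages, which may differ from how the real script collects the episodes, and `write_index` is a made-up name.

```python
import html
from pathlib import Path
from urllib.parse import quote


def write_index(target_dir):
    """Write index.html with one link per episode page found in target_dir."""
    target = Path(target_dir)
    # Every HTML file except the index itself is treated as an episode page.
    pages = sorted(p for p in target.glob("*.html") if p.name != "index.html")

    items = "\n".join(
        # quote() makes spaces and similar characters safe inside the link target.
        f'  <li><a href="{quote(p.name)}">{html.escape(p.stem)}</a></li>'
        for p in pages
    )
    (target / "index.html").write_text(f"<ul>\n{items}\n</ul>\n", encoding="utf-8")
    return [p.name for p in pages]
```

Since the pages are sorted by filename, a date or episode number at the start of the name also determines the order of the list.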

For all details and usage instructions, please consult the README.

Name

Choosing a name did not involve much creativity. Something descriptive like "Podcast Archiver" seemed a little too clunky and boring, while another very obvious and unimaginative name, "podarchiver", was already taken by two GitHub repositories. Eventually, I settled on a slight variation of the latter: "podarchive".

Language

I've wanted to try Perl for a while, and this project seemed like a good opportunity. So far, I haven't run into any language-specific hurdles; let's hope it stays that way. The script is not complex enough to require things like object orientation, so it is mostly procedural.
Since this is my first Perl project, I probably violated every convention and followed many bad practices. You can complain on GitHub if you don't like something.

Examples

My NAS runs this every night at 2:30. The -q flag makes it only output error messages. Since different podcasts use different naming schemes, the filenames need to be adjusted in different ways.

The titles all neatly follow the same pattern, so they do not need to be altered:

podarchive "https://atp.fm/rss" "/tank/Media/Podcast/Accidental Tech Podcast" -q

The titles do not contain a number, so I'm prepending the episode number and appending the publishing date:

podarchive "https://rss.simplecast.com/podcasts/2389/rss" "/tank/Media/Podcast/Do By Friday" -de --date-behind -q

These two podcasts are similar in that the titles do not follow a pattern. Most contain the episode number, but occasionally "special" or "bonus" episodes that don't adhere to the regular numbering scheme are inserted in between. Simply prepending a sequential number would look ugly, since it would be out of sync with the numbers already included in some of the titles. Therefore, I'm using the publishing date instead:

podarchive "http://www.hellointernet.fm/podcast?format=rss" "/tank/Media/Podcast/Hello Internet" -d -q
podarchive "https://www.unmade.fm/episodes?format=rss" "/tank/Media/Podcast/Unmade Podcast" -d -q