Open-source News

How I use Groovy to analyze album art in my music directory

opensource.com - Sun, 08/28/2022 - 15:00
Chris Hermansen Sun, 08/28/2022 - 03:00

In this series, I'm developing several scripts to help clean up my music collection. In the last article, I used the framework I created for analyzing the directory and subdirectories of music files, together with the fine open source JAudiotagger library, to analyze the tags of the music files in the music directory and its subdirectories. In this article, I will do a simpler job:

  1. Use the framework we created in Part 1
  2. Make sure each album directory has a cover.jpg file
  3. Make a note of any other files in the album directory that aren't FLAC, MP3, or OGG

Music and metadata

If you haven't read Part 1 and Part 2 of this series, do that now so you understand the intended structure of my music directory, the framework created in those articles, and how to pick up FLAC, MP3, and OGG files.

One more thing. Most audio ripping applications and many downloads:

  • Don't come with a useful cover.jpg file
  • Even if they do come with a useful cover.jpg file, they don't link the media files to it
  • Carry in all sorts of other files of dubious utility (for example, playlist.m3u, which gets created by a tagging utility I've used in the past)

As I mentioned in my last article, the ultimate goal of this series is to create a few useful scripts to help identify missing or unusual tags and facilitate the creation of a work plan to fix tagging problems. This particular script looks for missing cover.jpg files and unwanted non-media files, and creates a CSV file that you can load into LibreOffice or OnlyOffice to look for problems. It won't look at the media files themselves, nor does it look for extraneous files left in the artist subdirectories (those are exercises left for the reader).

The framework and album files analysis

Start with the code. As before, I've incorporated comments in the script that reflect the (relatively abbreviated) "comment notes" that I typically leave for myself:

     1  // Define the music library directory
       
     2  def musicLibraryDirName = '/var/lib/mpd/music'
       
     3  // Print the CSV file header
       
     4  println "artist|album|cover|unwanted files"
       
     5  // Iterate over each directory in the music library directory
     6  // These are assumed to be artist directories

     7  new File(musicLibraryDirName).eachDir { artistDir ->
       
     8      // Iterate over each directory in the artist directory
     9      // These are assumed to be album directories
       
    10      artistDir.eachDir { albumDir ->
       
    11          // Iterate over each file in the album directory
    12          // These are assumed to be content or related
    13          // (cover.jpg, PDFs with liner notes etc)
       
    14          // Initialize the counter for cover.jpg
    15          // and the list for unwanted file names
       
    16          def coverCounter = 0
    17          def unwantedFileNames = []
       
    18          albumDir.eachFile { contentFile ->
       
    19              // Analyze the file
       
    20              if (contentFile.name ==~ /.*\.(flac|mp3|ogg)/) {
    21                  // nothing to do here
    22              } else if (contentFile.name == 'cover.jpg') {
    23                  coverCounter++
    24              } else {
    25                  unwantedFileNames << contentFile.name
    26              }
       
    27          }
    28          println "${artistDir.name}|${albumDir.name}|$coverCounter|${unwantedFileNames.join(',')}"
    29      }
    30  }

Lines 1-2 define the name of the music file directory.

Lines 3-4 print the CSV file header.

Lines 5-13 come from the framework created in Part 1 of this series and descend into the album sub-subdirectories.

Lines 14-17 set up the cover.jpg counter (should only ever be zero or one) and the empty list in which we will accumulate unwanted file names.

Lines 18-27 analyze any files found in the album directories:

Lines 20-21 use the Groovy match operator ==~ and a "slashy" regular expression to check file name patterns. Nothing is done with these media files here (see Part 2 for that).

Lines 22-23 count the instances of cover.jpg (it should only ever be zero or one).

Lines 24-26 record the names of any non-media, non-cover.jpg files to show potential cruft or who-knows-what in the album directories.

Line 28 prints the artist name, album name, cover.jpg count, and list of unwanted file names as one pipe-delimited row.
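To experiment with the per-album checks without touching a real library, here is a sketch that wraps lines 16-27 in a helper and runs it on a scratch directory. The names isMediaFile and analyzeAlbumDir are hypothetical, not part of the script above, and the file names are made up. It also shows two properties of ==~ worth knowing: the pattern must match the whole file name, and it is case-sensitive, so a file named SONG.FLAC would land in the unwanted list.

```groovy
import java.nio.file.Files

// ==~ requires the WHOLE name to match, so 'song.flac.bak' is not media;
// the match is also case-sensitive, so 'SONG.FLAC' is not media either
boolean isMediaFile(String name) {
    name ==~ /.*\.(flac|mp3|ogg)/
}

// Hypothetical helper mirroring lines 16-27: returns the cover.jpg count
// and the (sorted) names of any other non-media files
Map analyzeAlbumDir(File albumDir) {
    def coverCounter = 0
    def unwantedFileNames = []
    albumDir.eachFile { contentFile ->
        if (isMediaFile(contentFile.name)) {
            // media files are left for Part 2's tag analysis
        } else if (contentFile.name == 'cover.jpg') {
            coverCounter++
        } else {
            unwantedFileNames << contentFile.name
        }
    }
    // eachFile() makes no ordering promise, so sort for stable output
    [cover: coverCounter, unwanted: unwantedFileNames.sort()]
}

// Build a throwaway album directory with made-up contents
def albumDir = Files.createTempDirectory('album').toFile()
['01 Intro.flac', '02 Song.mp3', 'cover.jpg', 'playlist.m3u', 'notes.pdf'].each {
    new File(albumDir, it).createNewFile()
}

def report = analyzeAlbumDir(albumDir)
println report   // [cover:1, unwanted:[notes.pdf, playlist.m3u]]
albumDir.deleteDir()
```

If some rips use upper-case extensions, adding the (?i) flag at the start of the pattern is one way to make the match case-insensitive.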

That’s it!

Running the code

Typically, I run this as follows:

$ groovy TagAnalyzer3.groovy > tagAnalysis3.csv

Then I load the resulting CSV into a spreadsheet. For example, with LibreOffice Calc, go to the Sheet menu and select Insert sheet from file. When prompted, set the delimiter character to |. In my case, the results look like this:

Image by: Chris Hermansen, CC BY-SA 4.0

I've sorted this in increasing order of the "cover" column to show album sub-subdirectories that don't have cover.jpg files. Note that some have cover.png instead. My experience with music players is that at least some don't play well with PNG format cover images.
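If you'd rather filter the report in code than in a spreadsheet, the following sketch reads the pipe-delimited lines and lists albums whose cover count is 0, flagging those that have a cover.png. The function name albumsMissingCover and the sample rows are hypothetical, and it assumes no artist, album, or file name contains a | or a comma, since the script's output doesn't quote its fields.

```groovy
// Hypothetical helper: list albums with no cover.jpg from the script's CSV lines
List<String> albumsMissingCover(List<String> csvLines) {
    csvLines.drop(1)                        // skip the header row
        .collect { it.split(/\|/, -1) }     // -1 keeps a trailing empty column
        .findAll { it[2] == '0' }           // column 2 is the cover.jpg count
        .collect { cols ->
            // column 3 is the comma-separated list of unwanted files (may be empty)
            def unwanted = cols[3] ? cols[3].split(',').toList() : []
            def note = 'cover.png' in unwanted ? ' (has cover.png)' : ''
            "${cols[0]} / ${cols[1]}${note}".toString()
        }
}

// Made-up sample rows in the script's output format
def lines = [
    'artist|album|cover|unwanted files',
    'Artist A|Album 1|1|',
    'Artist B|Album 2|0|cover.png,playlist.m3u',
    'Artist C|Album 3|0|',
]
albumsMissingCover(lines).each { println it }
// Artist B / Album 2 (has cover.png)
// Artist C / Album 3
```

Against a real report, you could pass new File('tagAnalysis3.csv').readLines() instead of the sample list.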

Also, note that some of these have PDF liner notes, extra image files, M3U playlists, and so on. In my next article, I'll show you how to manage some of the cruft.

Here's how I use open source tools to analyze my music directory including album cover files.

Image by: Opensource.com

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

GIMP 2.99.12 Released - "A Huge Milestone Towards GIMP 3.0"

Phoronix - Sun, 08/28/2022 - 02:16
GIMP 2.99.12 is out as a weekend surprise as the newest development release towards the GIMP 3.0 image manipulation program's release...

Debian Begins A General Resolution To Decide What To Do With Non-Free Firmware

Phoronix - Sun, 08/28/2022 - 00:24
Debian has begun a general resolution process to solicit a vote by its stakeholders on what to do with non-free firmware...

AMD Zen 4 LbrExtV2 Feature Queued Ahead Of Linux 6.1

Phoronix - Sat, 08/27/2022 - 18:05
Earlier this month AMD posted Linux kernel patches preparing LbrExtV2 as updated Last Branch Record functionality being introduced with upcoming AMD Zen 4 processors. That LbrExtV2 support for the Linux kernel's "perf" subsystem has now been queued up in its respective branch ahead of the Linux 6.1 feature merge window beginning in early October...

Ubuntu 22.10 Optimizing OpenSSH Server Memory Use, Other RAM Optimizations Coming

Phoronix - Sat, 08/27/2022 - 17:52
As part of a broader effort to reduce system memory use on Ubuntu Linux particularly for server and container/cloud use-cases, Ubuntu 22.10's OpenSSH server has switched to using socket-based activation...

AMD Ryzen Threadripper 5965WX Cooling With The Dynatron A39 Heatsink

Phoronix - Sat, 08/27/2022 - 17:39
One of the questions that has come up following my AMD Ryzen Threadripper PRO 5965WX Linux testing has been how well air-cooling is working out for the 280 Watt workstation CPU. Water cooling is, of course, most ideal but there are air coolers that can work out sufficiently too. Here are some quick reference results...

GCC 13 Seeing Work On OpenMP 5.0 Reverse Offload Functionality

Phoronix - Sat, 08/27/2022 - 17:13
With the OpenMP 5.0 parallel programming specification there is the reverse offload capability for going from the offloaded device back to the system host. The GCC 13 open-source compiler is seeing work recently around supporting this functionality...

My favorite open source library for analyzing music files

opensource.com - Sat, 08/27/2022 - 15:00
Chris Hermansen Sat, 08/27/2022 - 03:00

In my previous article, I created a framework for analyzing the directories and subdirectories of music files, using the java.io.File class together with the extra methods Groovy adds to it (such as eachDir() and eachFile()), which streamline its use. In this article, I use the open source JAudiotagger library to analyze the tags of the music files in the music directory and subdirectories. Be sure to read the first article in this series if you intend to follow along.

Install Java and Groovy

Groovy is based on Java and requires a Java installation. Recent, decent versions of both Java and Groovy might be in your Linux distribution's repositories. Groovy can also be installed directly from the Apache Groovy website. A nice alternative for Linux users is SDKMan, which can be used to get multiple versions of Java, Groovy, and many other related tools. For this article, I use SDKMan's releases of:

  • Java: version 11.0.12-open of OpenJDK 11
  • Groovy: version 3.0.8

Back to the problem

In the 15 or so years that I've been carefully ripping my CD collection and increasingly buying digital downloads, I have found that ripping programs and digital music download vendors are all over the map when it comes to tagging music files. Sometimes, my files are missing tags that can be useful to music players, such as ALBUMSORT. Other times, my files are full of tags I don't care about, such as MUSICBRAINZ_DISCID, which cause some music players to change the order of presentation in obscure ways, so that one album appears to be many or sorts in a strange order.

Given that I have nearly 10,000 tracks in nearly 700 albums, it's quite nice when my music player manages to display my collection in a reasonably understandable order. Therefore, the ultimate goal of this series is to create a few useful scripts to help identify missing or unusual tags and facilitate the creation of a work plan to fix tagging problems. This particular script analyzes the tags of music files and creates a CSV file that I can load into LibreOffice or OnlyOffice to look for problems. It doesn't look for missing cover.jpg files or show album sub-subdirectories that contain other files, because those checks aren't relevant at the music file level.

My Groovy framework plus JAudiotagger

Once again, start with the code. As before, I've incorporated comments in the script that reflect the (relatively abbreviated) "comment notes" that I typically leave for myself:

     1  @Grab('net.jthink:jaudiotagger:3.0.1')
     2  import org.jaudiotagger.audio.*
       
     3  def logger = java.util.logging.Logger.getLogger('org.jaudiotagger');
     4  logger.setLevel(java.util.logging.Level.OFF);
       
     5  // Define the music library directory
       
     6  def musicLibraryDirName = '/var/lib/mpd/music'
       
     7  // These are the music file tags we are happy to see
     8  // Some tags can occur more than once in a given file
       
     9  def wantedFieldIdSet = ['ALBUM', 'ALBUMARTIST',
    10      'ALBUMARTISTSORT', 'ARTIST', 'ARTISTSORT',
    11      'COMPOSER', 'COMPOSERSORT', 'COVERART', 'DATE',
    12      'GENRE', 'TITLE', 'TITLESORT', 'TRACKNUMBER',
    13      'TRACKTOTAL', 'VENDOR', 'YEAR'] as LinkedHashSet
       
    14  // Print the CSV file header
       
    15  print "artistDir|albumDir|contentFile"
    16  print "|${wantedFieldIdSet*.toLowerCase().join('|')}"
    17  println "|other tags"
       
    18  // Iterate over each directory in the music library directory
    19  // These are assumed to be artist directories
       
    20  new File(musicLibraryDirName).eachDir { artistDir ->
       
    21      // Iterate over each directory in the artist directory
    22      // These are assumed to be album directories
       
    23      artistDir.eachDir { albumDir ->
       
    24          // Iterate over each file in the album directory
    25          // These are assumed to be content or related
    26          // (cover.jpg, PDFs with liner notes etc)
       
    27          albumDir.eachFile { contentFile ->
       
    28              // Initialize the counter map for tags we like
    29              // and the list for unwanted tags
       
    30              def fieldKeyCounters = wantedFieldIdSet.collectEntries { e ->
    31                  [(e): 0]
    32              }
    33              def unwantedFieldIds = []
       
    34              // Analyze the file and print the analysis
       
    35              if (contentFile.name ==~ /.*\.(flac|mp3|ogg)/) {
    36                  def af = AudioFileIO.read(contentFile)
    37                  af.tag?.fields?.each { tagField ->
    38                      if (tagField.id in wantedFieldIdSet)
    39                          fieldKeyCounters[tagField.id]++
    40                      else
    41                          unwantedFieldIds << tagField.id
    42                  }
    43                  print "${artistDir.name}|${albumDir.name}|${contentFile.name}"
    44                  wantedFieldIdSet.each { fieldId ->
    45                      print "|${fieldKeyCounters[fieldId]}"
    46                  }
    47                  println "|${unwantedFieldIds.join(',')}"
    48              }
       
    49          }
    50      }
    51  }

 

Line 1 is one of those awesomely lovely Groovy facilities that simplify life enormously. It turns out that the kind developer of JAudiotagger makes a compiled version available on the Maven Central repository. In Java, using it typically requires some XML ceremony and configuration (for example, a dependency entry in a Maven pom.xml). With Groovy, I just use the @Grab annotation, and Groovy's Grape dependency manager handles the rest behind the scenes.

Line 2 imports the classes in the JAudiotagger org.jaudiotagger.audio package, including AudioFileIO.

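Lines 3-4 configure the JAudiotagger library to turn off logging. In my own experiments, the default level is quite verbose, and the output of any script using JAudiotagger is filled with logging information. This works well because the logger variable holds a strong reference to the logger for the whole run of the script. The documentation for java.util.logging.Logger.getLogger() warns that the LogManager may retain only a weak reference to a logger, so a logger that nothing else references can be garbage collected along with its configuration. I'm sure I'm not the only one who has configured the logger in some instance method only to see the configuration garbage collected after the instance method returns.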
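The strong-reference point can be made more explicit. This sketch assumes you wrap the setup in a class rather than a script, and keeps the Logger in a static field so it can't be collected while the program runs; the class name QuietLogging is hypothetical. In the script above, the local variable logger plays the same role.

```groovy
import java.util.logging.Level
import java.util.logging.Logger

// Hypothetical holder class: the static field is a strong reference that
// lives as long as the class, so the OFF setting can't be lost to GC
class QuietLogging {
    static final Logger JAUDIOTAGGER_LOGGER = Logger.getLogger('org.jaudiotagger')

    static void silence() {
        JAUDIOTAGGER_LOGGER.level = Level.OFF
    }
}

QuietLogging.silence()

// Looking the logger up again by name returns the same, still-configured instance
println Logger.getLogger('org.jaudiotagger').level   // OFF
```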

Lines 5-6 are from the framework introduced in Part 1.

Lines 7-13 create a LinkedHashSet containing the tags that I hope will be in each file (or, at least, that I'm OK with having in each file). I use a LinkedHashSet here so that the tags keep the order in which they are listed, which in turn fixes the order of the columns in the CSV output.
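A quick illustration of why the ordering matters, using a shortened, hypothetical tag list: a LinkedHashSet iterates in insertion order, so joining it yields columns in the listed order, while in checks remain hash lookups. A plain HashSet makes no promise about iteration order.

```groovy
// Shortened, made-up version of the wanted-tag set
def wanted = ['TITLE', 'ALBUM', 'ARTIST', 'DATE'] as LinkedHashSet

println wanted.join('|')        // TITLE|ALBUM|ARTIST|DATE
println('ARTIST' in wanted)     // true
println('BPM' in wanted)        // false

// Adding a duplicate leaves both the size and the order unchanged
wanted << 'ALBUM'
println wanted.size()           // 4
```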

This is a good time to point out a discrepancy between the terminology I've been using up until now and the class definitions in the JAudiotagger library. What I have been calling "tags" are what JAudiotagger calls org.jaudiotagger.tag.TagField instances. These instances live within an instance of org.jaudiotagger.tag.Tag. So the "tag" from JAudiotagger's point of view is the collection of "tag fields". I'm going to follow their naming convention for the rest of this article.

This collection of strings reflects a bit of prior digging with metaflac. It's also worth mentioning that JAudiotagger's org.jaudiotagger.tag.FieldKey uses "_" to separate words in the field keys (for example, ALBUM_ARTIST), which seems incompatible with the IDs of the tag fields returned by org.jaudiotagger.tag.Tag.getFields() (ALBUMARTIST in my files), so I don't use FieldKey.

Lines 14-17 print the CSV file header. Note the use of Groovy's *. spread operator to apply toLowerCase() to each (upper case) string element of wantedFieldIdSet.
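Here is the spread operator on its own, with a shortened tag set (an assumption for brevity), alongside the equivalent collect() call and a single-string version of the header that lines 15-17 print in three pieces.

```groovy
def wantedFieldIdSet = ['ALBUM', 'ALBUMARTIST', 'TITLE'] as LinkedHashSet

// *. calls toLowerCase() on every element and collects the results into a list
def lowered = wantedFieldIdSet*.toLowerCase()
println lowered   // [album, albumartist, title]

// The long-hand equivalent using collect()
assert lowered == wantedFieldIdSet.collect { it.toLowerCase() }

// The same header as lines 15-17, assembled as one string
def header = "artistDir|albumDir|contentFile" +
        "|${wantedFieldIdSet*.toLowerCase().join('|')}" +
        "|other tags"
println header    // artistDir|albumDir|contentFile|album|albumartist|title|other tags
```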

Lines 18-27 are from the framework introduced in Part 1 and descend into the album sub-subdirectories where the music files are found.

Lines 28-32 initialize a map of counters for the desired fields. I use counters here because some tag fields can occur more than once in a given file. Note the use of wantedFieldIdSet.collectEntries to build a map using the set elements as keys (the key e is in parentheses so that Groovy uses its value; without them, every key would be the literal string 'e'). I explain this in more detail in this article about maps in Groovy.
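The parentheses matter. This small sketch, using a two-tag set for brevity, shows the difference and then increments one counter the way line 39 does.

```groovy
def wanted = ['ALBUM', 'TITLE'] as LinkedHashSet

// With parentheses, the key is the VALUE of e
def counters = wanted.collectEntries { e -> [(e): 0] }
println counters    // [ALBUM:0, TITLE:0]

// Without parentheses, e is taken as the literal string key 'e',
// so every element writes to the same key and the map collapses
def collapsed = wanted.collectEntries { e -> [e: 0] }
println collapsed   // [e:0]

// Incrementing one counter, as line 39 of the script does
counters['TITLE']++
println counters    // [ALBUM:0, TITLE:1]
```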

Line 33 initializes a list for accumulating unwanted tag field IDs.

Lines 34-48 analyze any FLAC, MP3, or OGG music files found:

  • Line 35 uses the Groovy match operator ==~ and a "slashy" regular expression to check file name patterns
  • Line 36 reads the music file using org.jaudiotagger.audio.AudioFileIO.read() into the variable af
  • Lines 37-42 loop over the tag fields found in the metadata:
    • Line 37 uses Groovy's each() method to iterate over the tag fields returned by af.tag.getFields(), which in Groovy can be abbreviated to af.tag.fields
    • Lines 38-39 count any occurrence of a wanted tag field ID
    • Lines 40-41 append an occurrence of an unwanted tag field ID to the unwanted list
  • Lines 43-47 print out the counts and unwanted fields (if any)
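Since JAudiotagger needs real audio files, this sketch runs the counting and printing logic of lines 37-47 against a hypothetical list of field ID strings standing in for the tagField.id values; the directory and file names in the output row are made up too.

```groovy
def wantedFieldIdSet = ['ALBUM', 'ARTIST', 'TITLE'] as LinkedHashSet

// Hypothetical stand-ins for the tagField.id values of one music file
def fieldIds = ['TITLE', 'ARTIST', 'ARTIST', 'BPM', 'ALBUM', 'ISRC']

def fieldKeyCounters = wantedFieldIdSet.collectEntries { e -> [(e): 0] }
def unwantedFieldIds = []
fieldIds.each { id ->
    if (id in wantedFieldIdSet) {
        fieldKeyCounters[id]++        // a wanted field, possibly repeated
    } else {
        unwantedFieldIds << id        // anything not on the wanted list
    }
}

// One CSV row in the script's format, with made-up names
def row = 'Artist|Album|01 Track.flac' +
        wantedFieldIdSet.collect { "|${fieldKeyCounters[it]}" }.join('') +
        "|${unwantedFieldIds.join(',')}"
println row   // Artist|Album|01 Track.flac|1|2|1|BPM,ISRC
```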

That's it!

Typically, I would run this as follows:

$ groovy TagAnalyzer2.groovy > tagAnalysis2.csv

And then I load the resulting CSV into a spreadsheet. For example, with LibreOffice Calc, I go to the Sheet menu and select Insert sheet from file. I set the delimiter character to |. In my case, the results look like this:

Image by: Chris Hermansen, CC BY-SA 4.0

I like to have the ALBUMARTIST defined as well as the ARTIST for some music players so that the files in an album are grouped together when artists on individual tracks vary. This happens in compilation albums, but also in some albums with guest artists where the ARTIST field might say, for example, "Tony Bennett and Snoop Dogg" (I made that up. I think.) Rows 22 and onward in the spreadsheet shown above don't specify the album artist, so I might want to fix that going forward.
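To pull out such rows without scrolling, a sketch like the following could help. It looks up the column by its header name rather than a fixed position, so it keeps working if the wanted list changes. The function name tracksMissingField and the sample rows are hypothetical; like the script's output, it assumes no name contains a |.

```groovy
// Hypothetical helper: list artist/album/file for rows where a column's count is 0
List<String> tracksMissingField(List<String> csvLines, String column) {
    def header = csvLines[0].split(/\|/, -1).toList()
    int idx = header.indexOf(column)
    if (idx < 0) {
        throw new IllegalArgumentException("No column named $column")
    }
    csvLines.drop(1)                              // skip the header row
        .collect { it.split(/\|/, -1) }
        .findAll { it[idx] == '0' }
        .collect { "${it[0]}/${it[1]}/${it[2]}".toString() }
}

// Made-up sample rows in the script's output format
def lines = [
    'artistDir|albumDir|contentFile|album|albumartist|title|other tags',
    'Artist A|Album 1|01.flac|1|1|1|',
    'Artist B|Album 2|01.flac|1|0|1|BPM',
]
println tracksMissingField(lines, 'albumartist')   // [Artist B/Album 2/01.flac]
```

With a real report, new File('tagAnalysis2.csv').readLines() could replace the sample list.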

Here is what the last column showing unwanted field IDs looks like:

Image by: Chris Hermansen, CC BY-SA 4.0

Note that some of these tags may be of interest, in which case the "wanted" list can be modified to include them. As for the others, I would set up some kind of script to delete the field IDs BPM, ARTWORKGUID, CATALOGUENUMBER, ISRC, and PUBLISHER.

Next steps

In the next article, I'll step back from tracks and check for cover.jpg and other non-music files lying around in artist subdirectories and album sub-subdirectories.

Here's how I use the JAudiotagger library with a Groovy script I created to analyze my music files.

Image by: Opensource.com

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

KDE Makes It Easy To Now Remap Extra Mouse Buttons, Discover Keeps Getting Better

Phoronix - Sat, 08/27/2022 - 06:30
KDE developer Nate Graham is out early with his usual weekly development summary that highlights the week's work on this prominent open-source desktop environment. Notable this week is KDE integrating support for re-binding extra mouse buttons as well as a lot of continued work on Discover...
