Monitoring a File for Freshness

The Initial Challenge

A long time ago, I set up a home media system that includes a TV antenna and a couple of network-connected digital tuner boxes. I integrated this into my Plex and Emby setups so that within those systems I could access live programming and record it.

I started with just Plex and ran into repeated issues where the EPG (Electronic Program Guide) that Plex provided would break my Live TV & DVR configuration and invalidate all of my show recordings. The issue happened enough times that I decided to switch to an XML-based EPG that I would source, configure, and download myself. I found a source that I liked and downloaded the content. I rebuilt my DVR configuration and all was well.

The Secondary Challenge

A single download of an XML EPG is easy, but the data the file contains only goes out a maximum of 14 days into the future. Every hour of every day, some of the data in that file becomes worthless (once a show has finished airing, its listing is no longer useful but it remains in the file). The next hurdle is automating the download so that the programming information available to Plex is continually updated.

There are a few different solutions to how to automate downloading this data regularly, but a common way to do it is with a Docker container that’s built for this specific purpose. Once you download and deploy the container, you configure the parameters, and let it run. Voila! Your XML EPG file is now updated on a regular basis and you always have the most current programming information available for your DVR software.
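
Deployment details vary by container, but as a rough illustration (the image name, environment variable, and paths below are placeholders rather than the actual container I use), it generally looks something like this:

# Hypothetical EPG downloader container; names, variables, and paths are placeholders
docker run -d \
  --name epg-downloader \
  -e UPDATE_INTERVAL=12h \
  -v /path/to/epg/output:/data \
  example/epg-downloader:latest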

Ongoing Challenge Number 1 (and Solution)

Once I automated the download, I thought I was all set. Until I realized I wasn’t… One day, I noticed that there was very little future schedule information in Plex and didn’t understand why. I kept tinkering with things but wasn’t able to get it updated to include more data. Then, one day, there was -no- programming schedule information in Plex at all.

What had happened was that my Docker container had shut itself down (apparently due to errors caused by something I had been doing on my home network that disrupted its communications) and stopped downloading updates. A simple restart fixed the issue at that point, but I realized I needed to monitor more of my home network and alert when things go offline. So, I installed Uptime Kuma and configured it to monitor my (other) Docker instances and send me alerts via Pushbullet if something went offline.
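
For reference, Uptime Kuma itself is easy to run as a container; a typical deployment looks roughly like this (adjust the port mapping and volume to suit your setup):

# Uptime Kuma monitoring dashboard; the web UI listens on port 3001 by default
docker run -d \
  --name uptime-kuma \
  -p 3001:3001 \
  -v uptime-kuma-data:/app/data \
  louislam/uptime-kuma:1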

Ongoing Challenge Number 2

Knowing the operational state of the container was a huge step forward, but I soon learned that it didn't completely close the gap. The short story is that the EPG provider I had been getting my data from (Zap2It) was apparently acquired by another company, and the entire web site and all of the APIs were abruptly shut down or changed with no warning to the public.

The longer story is that I saw the same thing happening as the first time: the programming guide information was not going as far out, and there were many "holes" in the schedule with "unknown" programs in the listings. This issue was NOT quickly solved because the author of the container I was using had not updated their scripts for the new formats (and trying to restart the container just resulted in it shutting down due to errors again), so I had to find a totally different container whose author HAD updated their scripts, which let me keep using the same underlying data formats. About 90 days after switching to this new container, the problem occurred AGAIN because more changes were apparently made to the structure of the site, and the container's scripts needed further updates.

Solution #2 – File Freshness Monitoring

If the container is running but not downloading new data, then the resultant XML file does not get rebuilt / replaced / updated on disk. And that means that the associated timestamp for the file does not change. Since my container scripts are configured to run every 12 hours, this should result in the file being updated twice per day (Plex can only be configured to ingest new guide data once per day) and I take that into consideration in my direct monitoring of the file.

What I ultimately did was write a script that does two checks on the final XML EPG file – it checks the age of the file and compares its content to two previous versions to ensure that data is being updated at each run. If the file is determined to be older than expected or does not appear to have updated data since the last two runs, then it sends me an alert to indicate that the file may be stale. The following sections break down the pieces of the script.

File Names and Variables

There are a total of four files used in the monitoring script: the actual XML file that gets created after a successful download, two files used to store the two previous versions of it, and a file used to store details about the differences between the current XML file and those previous versions. Here is what the beginning of the script looks like (replace the path information to suit your setup):

#!/bin/sh
# Current XML EPG file produced by the downloader container
filename="<path to>/xmltv.xml"
# Copy of the file from the previous run
filename2="<path to>/xmltv.old"
# Copy of the file from two runs ago
filename3="<path to>/xmltv.old2"
# Scratch file holding the differences found by the comparisons below
filename4="<path to>/xmltv.new"

Get Current File’s Timestamp and Calculate Age

This gets the current file’s timestamp, gets the current date and time, calculates the difference between the two, and then converts it to “whole days”. Under normal circumstances, the calculated age should always be 0 meaning that the file is less than 24 hours old.

# Modification time of the current XML file (seconds since the epoch)
file_mtime=$(stat --format='%Y' "$filename")
# Current time (seconds since the epoch)
current_time=$(date +%s)
# Age of the file in seconds, then converted to whole days
diff_seconds=$((current_time - file_mtime))
days_old=$((diff_seconds / 86400))
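
A quick portability note: the '--format' option is specific to GNU stat on Linux. If you ever run this on macOS or one of the BSDs, the equivalent call looks like this (a variant, not part of my script):

# BSD/macOS variant for getting the modification time in epoch seconds
file_mtime=$(stat -f '%m' "$filename")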

Determine if the Current File is Different

I use the 'comm' command to compare two text files. The '-3' parameter suppresses lines that exist in both files, leaving only the lines that appear in just one of the two files. The first comparison is between the current file and the previous version, and its output is redirected to a new file (overwriting the version that was there previously). I do the same comparison again between the current version and the version from two runs prior and redirect the output to the same file, but appended to the end (>> versus >). This ensures that the new file contains any entries unique to either of the last two runs. If the file goes three runs with no updates, there will be NO differences found here and that new file will be empty.

I then use the 'wc' command to count the number of lines in that comparison output file, and pass it through 'awk' to strip the filename from the output and leave just the numeric count.

# Lines unique to either the current file or the previous version
comm -3 "$filename" "$filename2" > "$filename4"
# Append lines unique to either the current file or the version from two runs ago
comm -3 "$filename" "$filename3" >> "$filename4"
# Count the lines found; awk strips the filename so only the number remains
line_count=$(wc -l "$filename4" | awk '{ print $1 }')
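
One caveat worth mentioning: 'comm' expects its input files to be sorted, and an XML EPG file is not, so GNU comm may print sort-order warnings to stderr. That does not affect this check, because identical files still produce no output, and empty-versus-not-empty is all the script cares about. If the warnings bother you, an alternative sketch (not what my script actually uses) is to lean on 'cmp', which only answers "identical or not":

# Alternative sketch using cmp (not part of the original script)
# cmp -s exits 0 only when the two files are byte-for-byte identical
if cmp -s "$filename" "$filename2" && cmp -s "$filename" "$filename3"; then
    line_count=0    # identical to both prior copies: no new data
else
    line_count=1    # at least one prior copy differs: data is updating
fi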

Checking Results and Taking Action

Now that the comparison data has been gathered, it can be checked and an alert can be sent if it turns out the file is not changing. The alert is sent via Pushbullet using 'curl'. I have the Pushbullet extension in my web browser and the app on my phone, so I will get the alert no matter what and can take action if I need to.

My decision points for sending an alert are when the current file is at least one full day old OR when there are zero differences between the current file and the two previous versions. Replace the token placeholder with the Pushbullet access token you generated.

# Alert if the file is at least one full day old OR nothing has changed in the last two runs
if [ "$days_old" -ge 1 ] || [ "$line_count" -eq 0 ]; then
    # Send a Pushbullet "note" push via the v2 pushes API
    curl --header 'Access-Token: <replace w/ your token>' --header 'Content-Type: application/json' --data-binary '{"body":"XMLTV Guide is Stale","title":"XMLTV","type":"note"}' --request POST https://api.pushbullet.com/v2/pushes
fi

Finishing Up

Once all of the evaluating, calculating, decision making, and alerting steps are done, it’s time to make some copies of the just-used files so they are available for the next round. Two quick copy commands set things up for the next run of the script.

# Order matters: first age the previous copy into the two-runs-ago slot...
cp "$filename2" "$filename3"
# ...then save the current file as the previous copy for the next run
cp "$filename" "$filename2"
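
With the script complete, it just needs to run on a schedule of its own. How you schedule it is up to you; as one example (the path, filename, and time here are placeholders), a daily cron entry does the job:

# Run the freshness check once a day, after the file's second expected update
0 6 * * * /path/to/check_xmltv_freshness.sh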

Final Thoughts

This is a very simple script to check file freshness from a couple of different perspectives, but it is very effective. The XML file that gets generated with each run of the container's script can vary by quite a bit, and when the current version is compared to even one prior version, there can be thousands of lines flagged as unique to one of the two files and written to the '.new' file. And since it takes until at least the second run AFTER downloads stop before the content comparison can catch a problem, both checks are likely to expose the issue on the same pass (and on all subsequent passes).

When the downloading stops working, restarting the Docker container and pulling a fresh copy from the hub site often fixes the problem. This is because the container's author has usually updated their scripts by then, and a fresh pull of the container brings all of the newest pieces into place so it runs correctly. As a result, the next update I make to the monitor script will include a way to restart the container and pull the newest image at the same time, so that step becomes part of the automated routine.
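
That future piece might look something like the following sketch (it assumes the downloader is managed with a compose file, and the path and names are placeholders):

# Possible future addition: pull the newest image and recreate the container
# so the latest downloader scripts are picked up
cd /path/to/epg-downloader && docker compose pull && docker compose up -d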

What scripts have you written to monitor and manage things in your home labs or even work environments?
