Posted 2022 August 17
AutoPkg download recipes commonly use one of four methods to determine the URL to supply to URLDownloader:
- Use a static link provided by the developer (which may then redirect),
- Obtain the link from some sort of feed (e.g., GitHubReleasesInfoProvider, SparkleUpdateInfoProvider),
- Use/write a custom processor specifically for that vendor or title (e.g., MozillaURLProvider, AppleSupportDownloadInfoProvider), or
- Scrape the URL from a web page using URLTextSearcher.
That list is arguably in priority order. Yet there are many, many recipes where URLTextSearcher is the chosen solution. Personally, I know the ins and outs of HTML much better than Python, so I became familiar with URLTextSearcher very quickly. What I love about that particular processor is that it leverages the power of regular expressions for searching and allows you to name the output variable for the string that you find with your regular expression.
I recently had a situation where one of my download recipes was not behaving as intended, likely due to a change in the headers on the vendor’s server. I was providing URLDownloader a static link, but since there was a redirect, I leveraged the prefetch_filename argument to have it fetch the proper filename for the download. After the change on the vendor’s end, the prefetch_filename option was fetching a lot of extraneous information, leaving me with a filename of about 100 characters in length. While I filed an AutoPkg issue for this (#818), I still needed to work around it for now. The desired filename was a subset of the filename fetched by the prefetch_filename option, so my mind immediately went to regular expressions.
While there are other custom processors out there that can manipulate filenames (e.g., FindAndReplace), URLTextSearcher had the features I really wanted. The problem was that it only searches the contents of a file it downloads, not a filename. So rather than reinvent the wheel, I decided to build a custom processor based on URLTextSearcher. I took its code and eliminated all the download functionality. I then changed it so that the text to be searched would be whatever you assigned to the variable text_in rather than a file. I uncreatively named this processor TextSearcher.
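The core idea can be sketched outside AutoPkg in a few lines of Python. This is not the processor's actual source: search_text is a hypothetical stand-alone function, and the parameter names other than text_in are borrowed from URLTextSearcher's re_pattern and result_output_var_name input variables on the assumption that TextSearcher keeps them.

```python
import re


def search_text(text_in, re_pattern, result_output_var_name="match"):
    """Search a string (not a downloaded file) and return named results.

    Hypothetical helper for illustration only; a real processor would read
    and write these values through AutoPkg's environment instead.
    """
    found = re.search(re_pattern, text_in)
    if found is None:
        raise ValueError(f"No match found in text: {text_in}")

    # Any named groups in the pattern become output variables of their own.
    results = found.groupdict()
    # The caller-chosen variable gets the last matched group, or the whole
    # match when the pattern has no capturing groups.
    results.setdefault(result_output_var_name, found.group(found.lastindex or 0))
    return results
```

Calling search_text("Example-2.5.1.pkg", r"-([\d.]+)\.pkg", "version") would hand back a dictionary whose version key holds the captured text, which is the "name your own output variable" behaviour described above.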
For the specific recipe I was dealing with, %pathname% was a good choice for the value of text_in, since it contained the filename I wanted to extract. Rather than construct a regular expression to extract the entire filename, however, I opted to grab the version number from the filename and assign that value to the variable version, since renaming the installer to %NAME%-%version%.pkg was very similar to what the vendor was using and met my internal needs. (Lazily, the regular expression was also much easier to construct this way.) Problem solved!
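Here is a small sketch of that version-grabbing approach in plain Python. The pathname, product name, and pattern are all invented for illustration; the real vendor's filename is not reproduced here.

```python
import re

# Hypothetical stand-in for %pathname%: an overly long filename that
# happens to contain the version number. Not the real vendor's filename.
pathname = (
    "downloads/Example_Installer_4.2.1_universal_signed_"
    "build-20220801_release-channel-stable.pkg"
)
name = "Example"  # stands in for %NAME%

# A named group pulls out just the version instead of trying to match
# the whole filename.
found = re.search(r"_(?P<version>\d+(?:\.\d+)+)_", pathname)
if found is None:
    raise ValueError("No version number found in pathname")

version = found.group("version")
new_filename = f"{name}-{version}.pkg"
print(new_filename)
```

Matching only the version keeps the pattern short and leaves it less sensitive to whatever else the server adds to the filename.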
I wasn’t going to write about this, thinking the case was too obscure or could be solved another way. But then, a possible use for this processor by someone else appeared in the MacAdmins Slack. In their case, Java was involved and the version info they wanted was reliably contained in a particular .jar filename within the app bundle. Hence this blog post.
If you have a use for TextSearcher, you can call it using the following shared processor syntax: com.github.jazzace.processors/TextSearcher. You can see a recipe that uses it in my repo.
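As a rough sketch of what such a recipe step might look like, the Python below builds one Process entry as a dictionary and serializes it to XML plist form with the standard library's plistlib. Only the processor identifier and text_in come from this post; the re_pattern and result_output_var_name argument names are assumed to carry over from URLTextSearcher, and the pattern itself is hypothetical.

```python
import plistlib

# One step of a recipe's Process array, written as a Python dict.
step = {
    "Processor": "com.github.jazzace.processors/TextSearcher",
    "Arguments": {
        "text_in": "%pathname%",
        # Hypothetical pattern; adjust it to the text you are searching.
        "re_pattern": r"_(\d+(?:\.\d+)+)_",
        # Assumed to behave as it does in URLTextSearcher.
        "result_output_var_name": "version",
    },
}

# Serialize to XML plist text; sort_keys=False preserves the order above.
step_xml = plistlib.dumps(step, sort_keys=False).decode("utf-8")
print(step_xml)
```

The printed output is a complete plist document; in an actual recipe, only the dict element would sit inside the Process array.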
AutoPkg’s default behaviour is to name the download after the part of the URL that follows the final slash of the original request, prior to any redirect. You can also give your download an arbitrary name by setting the processor’s filename Input Variable (e.g., to %NAME%.pkg).
Pro Tip: You can provide URLTextSearcher a file:/// URL if you wish to search the contents of a file on the local system, even one you had downloaded earlier in the recipe chain.