
Automating Screaming Frog Using Batch Scripts

By Tom Gregan, February 4, 2019

A simple batch script that creates a time- and date-stamped directory, runs Screaming Frog using a custom configuration, exports the desired reports and then renames the files.

I currently use a very similar approach to run morning crawls and archive crawls of our sites, and I'm looking into working crawl data into our build and deployment processes.

For now this script only works on Windows computers and laptops; if there's enough interest I'll replicate it for Macs when I get the time.

Preparing your environment

Before you begin with the script, you need to create the directory structure below:


/crawls
/crawls/results


In the crawls directory we will put our batch file as well as any spider configuration files we want to use.

In the results directory the script will create a date- and time-stamped directory and export the files from each crawl.
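For readers on Mac/Linux (the post mentions a Mac version may follow), the same structure can be created in one bash call; the relative path here is just a placeholder for wherever you keep your crawl files.

```shell
# Create the crawls directory and its results subdirectory in one call.
# -p creates missing parents and does not error if they already exist.
mkdir -p crawls/results
```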

 

A brief intro to Batch scripting

Below is a table explaining some of the basics of batch scripting you'll need to know to understand the script.

SET
Declares an environment variable. Any variable declared using SET expires once you close the command window or exit the batch file.

SET url=https://www.example.com
:: creates a variable called url and sets it to https://www.example.com

%variableName%
Once created, you can use the value of a variable by enclosing its name in % signs. To print the contents of a variable, use the ECHO command.

ECHO %url%
:: prints https://www.example.com

CD [dirname] or CD [drive]
CD, or Change Directory, allows you to navigate into specific directories and drives. Helpfully, when you do not give CD an argument, it just prints the directory you're in.

CD results
:: enter the results directory

CD ..
:: leave the current directory and go up to its parent

CD /d p:\
:: leave the current drive and enter the P: drive

:: or REM
You add comments using either :: or the REM statement. Each comment ends at the end of its line.

REM get today's date
:: get today's date

%DATE%
The date variable is a built-in environment variable: your system maintains certain variables which you can use without having to declare or create them.

ECHO %date%
:: prints today's date

%CD%
The built-in %cd% variable returns the current filepath as a string, which is helpful when working with generated filepaths.

SET cwd=%cd%
:: creates a variable called cwd holding the current filepath as a string

REN [oldfile/dirname] [newfile/dirname]
The REN command allows you to rename a file or folder.

REN oldfile.txt newfile.txt
:: renames oldfile.txt to newfile.txt

MD "dirname" or MKDIR
The MD command creates folders; one or several folders can be made in a single call.

MD "new file"
:: creates a directory called new file in the current working directory

MD "new file" "other new file"
:: creates two directories in the current working directory, one called new file and the other called other new file

 
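Since the post notes a Mac version may follow, here is a hedged sketch of the closest bash equivalents of the batch commands above; the paths and names are illustrative placeholders, not part of the original script.

```shell
#!/usr/bin/env bash
# Rough bash equivalents of the batch commands above, for Mac/Linux readers.

workdir="$(mktemp -d)"            # scratch directory so this is safe to run
cd "$workdir"

url="https://www.example.com"     # SET url=https://www.example.com
echo "$url"                       # ECHO %url%

# get today's date                # '#' replaces :: / REM for comments
date                              # ECHO %date%

cwd="$(pwd)"                      # SET cwd=%cd%

mkdir -p "new file" "other new file"   # MD "new file" "other new file"
mv "new file" "renamed file"           # REN "new file" "renamed file"
cd ..                                  # CD ..
```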

What this script does in plain English

  • Lines 1 – 11 declare variables holding filepaths and the target domain name.
  • Lines 12 – 17 create a string combining today's date and time; to standardise the periods, slashes and colons, they are all replaced with hyphens.
  • Lines 18 – 22 create a new directory, using the above timestamp as the folder name.
  • Lines 23 – 24 enter the Screaming Frog directory and provide the instructions for what to crawl, what to export and the target export directory.
  • Lines 25 – 28 navigate to the new directory and rename all of the files to indicate the domain being crawled.
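For comparison, the timestamp-and-directory steps above collapse to two lines in bash, since date(1) can emit a filesystem-safe stamp directly; a hedged sketch, with the results path as a placeholder:

```shell
# date(1) formats the stamp without the /, : and . characters that the
# batch script has to strip out of %DATE% and %TIME%.
stamp="$(date +%d-%m-%Y-%H-%M-%S)"

# Create the timestamped results folder in one call.
mkdir -p "results/$stamp"
```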

The code 

Replace the bolded text with the corresponding filepath/name based on your computer or preference.

Make sure not to include trailing slashes where indicated, and do not include spaces after the = operator.

All filepaths should be absolute; this script is only built to work with absolute paths.

It's not necessary to copy the Start of Script and End of Script comments; they're just there for clarity.


::**********Start of script**********

set results=Z:\My Documents\Scraping\OSEO\Results
:: %results%
:: do not add trailing slash

set sfclidir=C:\Program Files (x86)\Screaming Frog SEO Spider\
:: %sfclidir%
:: include trailing slash

set configFile=Z:\My Documents\Scraping\OSEO\Scrape Config.seospiderconfig
:: %configFile%
:: ought to have .seospiderconfig file extension

set domain=https://opensourceseo.org/
:: %domain%

::create date & time stamped directory
set dateString=%DATE:/=-%
set timeString=%TIME:~0,2%-%TIME:~3,2%-%TIME:~6,2%
set ToDaysDate=%dateString%%timeString: =-%
::%ToDaysDate%

set newFilePath="%ToDaysDate%"
chdir /d "%results%"
mkdir %newFilePath%
chdir /d %ToDaysDate%
set outputDir=%cd%
::%outputDir%

chdir /d "%sfclidir%"
ScreamingFrogSEOSpiderCli.exe --config "%configFile%" --crawl "%domain%"  --save-crawl --headless --output-folder "%outputDir%" --export-format "csv" --export-tabs "Internal:All,Response Codes:All"

chdir /d "%outputDir%"
REN *.csv *.
REN *. *-opensourceseo.
REN *. *.csv

::**********End of Script**********
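Since the intro mentions replicating this for Macs, here is a hedged bash port under two assumptions that you should verify against the Screaming Frog user guide: that the Mac CLI launcher lives inside the app bundle at the path below, and that it accepts the same flags as the Windows CLI. The crawl step is skipped if the binary isn't found, so the directory and rename logic can be tried on their own.

```shell
#!/usr/bin/env bash
# Hedged sketch of a Mac/Linux port of the batch script above.
# The Screaming Frog binary path below is an assumption; check where
# your install puts its CLI launcher before relying on it.

results="$HOME/crawls/results"                 # no trailing slash
configFile="$HOME/crawls/Scrape Config.seospiderconfig"
domain="https://opensourceseo.org/"

# A filesystem-safe timestamp replaces the %DATE%/%TIME% munging.
stamp="$(date +%d-%m-%Y-%H-%M-%S)"
outputDir="$results/$stamp"
mkdir -p "$outputDir"

# Assumed launcher location on macOS; flags mirror the Windows CLI call.
sfcli="/Applications/Screaming Frog SEO Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher"
if [ -x "$sfcli" ]; then
  "$sfcli" --config "$configFile" --crawl "$domain" --save-crawl --headless \
           --output-folder "$outputDir" --export-format "csv" \
           --export-tabs "Internal:All,Response Codes:All"
fi

# Rename exports to flag the crawled domain, mirroring the three REN calls.
cd "$outputDir"
for f in *.csv; do
  [ -e "$f" ] || continue                      # skip if no CSVs were produced
  mv "$f" "${f%.csv}-opensourceseo.csv"
done
```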

 

Here's a screencast showing how to edit the above


Creating the .bat file and adding a licence.txt

Copy the above code into Notepad and save it as sitename-crawl.bat (obviously changing sitename to reflect the site). You may need to select "All Files" rather than ".txt" in the Save as type dropdown when saving.

The Screaming Frog user guide recommends including a licence, an EULA acceptance flag and a preference for how you want your database storage; more information can be found at https://www.screamingfrog.co.uk/seo-spider/user-guide/general/#commandlineoptions

Task Scheduler setup

In order to schedule your crawls, you now need to set up Microsoft Task Scheduler.

Search on your computer for Task Scheduler and follow these instructions:

  1. In the Action pane, click on "Create Basic Task"
  2. Give the task a name and description, then click next
  3. Indicate how often you would like it to run, then click next (depending on your choice there may be a requirement to pick a day to run the crawls)
  4. Select "Start a program" as the action
  5. Under the "Start a program" view, click browse and select your newly created .bat file, click next
  6. Confirm details and click finish.
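On Mac/Linux the counterpart to Task Scheduler is cron; a hedged sketch, assuming your shell port of the script lives at ~/crawls/crawl.sh (a placeholder path):

```shell
# Add this line via `crontab -e` to run the crawl every day at 06:30.
# stdout/stderr are appended to a log file so failures are visible.
30 6 * * * /bin/bash "$HOME/crawls/crawl.sh" >> "$HOME/crawls/cron.log" 2>&1
```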

Once again here's a quick screencast of how to set this part up


About Tom Gregan

London-based organic search and content marketing expert, helping brands with their inbound marketing efforts through data-driven strategic consultancy - http://www.tom-gregan.co.uk/