Screaming Frog Custom Extractions: A Guide to Extracting Crawl Data
Screaming Frog is a powerful SEO tool with a wide range of features. One of the lesser-known ones, Custom Extraction, lets you pull specific data out of your crawls. This blog post discusses how Screaming Frog Custom Extraction works and how it can help your SEO efforts.
Websites hold a lot of useful information, but it is usually too laborious to visit every page and copy product data, metadata, title tags, or anchor text into a spreadsheet. This is where Screaming Frog’s custom extractions come in: they automate the process. Custom extraction is a form of web scraping (also called web harvesting or web data extraction) that pulls data from websites so you can store it locally on your computer.
If you are new to the tool, here are some questions you might have:
What is the Screaming Frog SEO Spider?
The Screaming Frog SEO Spider is a website crawler with a graphical user interface (GUI) that helps you improve onsite SEO by extracting and analyzing your website’s data.
What are custom extractions?
Custom extractions are a set of Screaming Frog SEO Spider functions for extracting specific information from web pages. They help you optimize your site for technical SEO and search results, gather essential data on your copy, and locate and fix errors.
How is Data Extraction done?
Data extraction means pulling the required data from your website using the Screaming Frog SEO Spider. The information is held in Screaming Frog’s memory, and you can export your crawl results to Excel or Google Sheets for further review.
Why is Data Extraction critical?
Data extraction allows you to harvest large amounts of data quickly and efficiently. Automating it gives you an immediate view of your site’s architecture, saving time and resources while providing the data you need to plan your search engine optimization strategy.
Screaming Frog is the go-to web scraper tool for SEOs. The options are nearly endless; the sections below list a range of custom extraction expressions.
How to Extract Custom Data using Screaming Frog
1. In Screaming Frog, go to Configuration > Custom > Extraction.
[Screenshot: Screaming Frog Custom Extraction]
2. Click +Add to set up your extraction rules.
[Screenshot: Select elements of internal HTML using the Custom Extraction tab]
3. Add a title for the extractor.
4. Select whether you need CSSPath, XPath, or Regex.
5. Add your search expression.
If you aren’t sure which selector or function you need, look at the examples below or use the Inspect function in Google Chrome Dev Tools. You can open Dev Tools by right-clicking the page in Google Chrome and choosing Inspect.
Example:
Here is an example of how you would scrape for a Facebook Pixel ID:
[Screenshot: Facebook Pixel ID Extraction]
In the results, you can see that one of my pages is missing a Facebook Pixel:
[Screenshot: Missing Facebook ID]
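The exact expression from the screenshot is not reproduced here, so the following is only a minimal sketch of how a regex-based extraction for a pixel ID could look. It assumes the page uses the standard pixel snippet, which calls fbq('init', '<numeric id>'). The function name and the sample page sources are hypothetical and exist only for illustration.

```python
import re

# Assumed pattern: the standard pixel snippet calls fbq('init', '<numeric id>').
# The capturing group holds the ID itself.
PIXEL_ID = re.compile(r"""fbq\(\s*['"]init['"]\s*,\s*['"](\d+)['"]""")


def extract_pixel_id(html):
    """Return the first pixel ID found in the page source, or None if absent."""
    match = PIXEL_ID.search(html)
    return match.group(1) if match else None


# Hypothetical page sources, for illustration only.
page_with_pixel = "<script>fbq('init', '1234567890'); fbq('track', 'PageView');</script>"
page_without_pixel = "<script>console.log('no pixel here');</script>"

print(extract_pixel_id(page_with_pixel))
print(extract_pixel_id(page_without_pixel))
```

A page that returns no match is the kind of page that would show up with an empty extraction column in the crawl results.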
Below are ready-made custom extraction expressions to get you started.
Basic XPath Syntax for Web Scraping
| SYNTAX | FUNCTION |
| --- | --- |
| // | Search anywhere in the document |
| / | Search from the root, or step down to a direct child |
| @ | Select a specific attribute of an element |
| * | Wildcard: select any element |
| [ ] | Predicate: filter elements by a condition or position |
| . | Specifies the current element |
| .. | Specifies the parent element |
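To try these building blocks outside of Screaming Frog, here is a small sketch using Python’s standard-library ElementTree, which supports a limited subset of XPath. The page fragment is hypothetical and deliberately well-formed; real HTML usually needs a proper HTML parser, and ElementTree paths start with a leading "." because they are relative to the element being searched.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <div class="author"><p class="bio">Jane writes about SEO.</p></div>
    <div><p>Second paragraph.</p></div>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())

# './/' searches anywhere below the current element ('.').
all_paragraphs = [p.text for p in root.findall(".//p")]

# '[@class=...]' is a predicate that keeps only matching elements.
bios = [p.text for p in root.findall(".//p[@class='bio']")]

# '*' matches an element with any tag name.
any_bio_tags = [el.tag for el in root.findall(".//*[@class='bio']")]

# '..' steps up to the parent of each matched element.
bio_parent_classes = [el.get("class") for el in root.findall(".//p[@class='bio']/..")]

print(all_paragraphs, bios, any_bio_tags, bio_parent_classes)
```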
XPath functions
| XPATH | OUTPUT |
| --- | --- |
| //h1 | Extract all H1 tags |
| //h3[1] | Extract every H3 that is the first H3 within its parent; use (//h3)[1] for only the first H3 on the page |
| //h3[2] | Extract every H3 that is the second H3 within its parent; use (//h3)[2] for only the second H3 on the page |
| //div/p | Extract any <p> that is a direct child of a <div> |
| //div[@class='author'] | Extract any <div> with class “author” |
| //p[@class='bio'] | Extract any <p> with class “bio” |
| //*[@class='bio'] | Extract any element with class “bio” |
| //ul/li[last()] | Extract the last <li> in each <ul> |
| //ol[@class='cat']/li[1] | Extract the first <li> in each <ol> with class “cat” |
| count(//h2) | Count the number of H2 tags (set the extraction filter to “Function Value”) |
| //a[contains(.,'click here')] | Extract any link with anchor text containing “click here” |
| //a[starts-with(@title,'Written by')] | Extract any link with a title starting with “Written by” |
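Positional predicates are easy to misread: //h3[1] matches the first H3 inside each parent, not only the first H3 on the page. The sketch below illustrates that difference with Python’s standard-library ElementTree on a hypothetical fragment. ElementTree does not implement count() or contains(), so those two are emulated in plain Python here as an assumption about equivalent behavior, not as real XPath evaluation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment used only for illustration.
PAGE = """
<body>
  <section><h3>A1</h3><h3>A2</h3></section>
  <section><h3>B1</h3></section>
  <ul><li>one</li><li>two</li><li>three</li></ul>
  <p><a href="/signup">click here to join</a> <a href="/about">About</a></p>
</body>
"""

root = ET.fromstring(PAGE.strip())

# //h3[1]: the first h3 within EACH parent element.
first_per_parent = [h.text for h in root.findall(".//h3[1]")]

# (//h3)[1]: emulated by taking the first match in document order.
all_h3 = root.findall(".//h3")
first_in_document = all_h3[0].text

# //ul/li[last()]: the last li of each ul.
last_items = [li.text for li in root.findall(".//ul/li[last()]")]

# count(//h3): emulated with len(), since ElementTree has no count().
h3_count = len(all_h3)

# //a[contains(.,'click here')]: emulated with a substring test on the link text.
click_here_links = [
    a.get("href") for a in root.findall(".//a")
    if "click here" in "".join(a.itertext())
]

print(first_per_parent, first_in_document, last_items, h3_count, click_here_links)
```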
How to Extract Common HTML Elements
| XPATH | OUTPUT |
| --- | --- |
| //@href | Extract all href attribute values (links) |
| //a[starts-with(@href,'mailto')]/@href | Extract links that start with “mailto” (email addresses) |
| //img/@src | Extract all image source URLs |
| //img[contains(@class,'aligncenter')]/@src | Extract the source URLs of images whose class name contains “aligncenter” |
| //link[@rel='alternate'] | Extract <link> elements with the rel attribute set to “alternate” |
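As a rough way to check what these attribute expressions would return, the sketch below reproduces them with Python’s standard-library ElementTree. ElementTree cannot select attributes directly with /@href or /@src, so the attribute values are read with .get() after selecting the elements, and starts-with() and contains() are emulated with string methods. The page fragment and its URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
PAGE = """
<html>
  <head><link rel="alternate" hreflang="fr" href="https://example.com/fr/"/></head>
  <body>
    <a href="mailto:hello@example.com">Email us</a>
    <a href="/contact">Contact</a>
    <img class="size-full aligncenter" src="/img/hero.png"/>
    <img class="alignleft" src="/img/side.png"/>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())

# //@href: every href attribute in the document, on any element.
hrefs = [el.get("href") for el in root.iter() if el.get("href") is not None]

# //a[starts-with(@href,'mailto')]/@href
mailto_links = [
    a.get("href") for a in root.findall(".//a")
    if a.get("href", "").startswith("mailto")
]

# //img/@src
image_sources = [img.get("src") for img in root.findall(".//img")]

# //img[contains(@class,'aligncenter')]/@src
centered_sources = [
    img.get("src") for img in root.findall(".//img")
    if "aligncenter" in img.get("class", "")
]

# //link[@rel='alternate'] (shown here by its href value)
alternate_hrefs = [l.get("href") for l in root.findall(".//link[@rel='alternate']")]

print(hrefs, mailto_links, image_sources, centered_sources, alternate_hrefs)
```

Note that //@href also picks up href attributes on <link> elements, not just on anchors, which is why the alternate URL appears in the first list.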
Isaac Adams-Hands is the SEO Director at SEO North, a company that provides Search Engine Optimization services. Isaac has considerable expertise in Search Engine Optimization, Server Administration, and Cyber Security, which gives him a leg up as a Google Algorithm Analyst and SEO Expert.