

How to monitor an entire website for changes automatically and track all pages & URLs with crawlers


AlertBits Team  •  15 May 2023


Do you want to monitor all the pages of a website?

You could manually find and add individual URLs to a service like AlertBits to monitor them...

...but that'd be slow and tedious.

Worry not, however, as it's possible to automate this process easily.

Whether you want to monitor webpages to detect changes or just want to keep an archive/snapshot of your site, you can use crawlers that will find and add URLs for you.

Monitoring individual pages for changes can already help you stay ahead of competitors, react faster to market changes, and gather important insights for better decision-making.

But by automating the process of finding and adding new URLs regularly, you can save even more time & money, make your life & job easier, and further increase productivity, ROI & growth.

And here's how you can do it...

Monitor all pages & URLs with crawlers

In AlertBits, you can manually add URLs to monitor so you can receive alerts & notifications whenever the page changes.

But to automate this process, you'll need to create a Crawler.

Go to "Crawlers", and then "Add Crawler".

The idea is simple: you provide the URL of a website that the crawler will periodically visit.

On that page, the crawler will collect all the links it finds.

Then the crawler will visit those links to find more links.

And through this process, the crawler will find all the links and URLs that lead to different pages on the site.

Then, the crawler will automatically add those found URLs to your AlertBits account to monitor.

Crawlers can also search through XML sitemaps to gather URLs.
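To make the process above concrete, here's a minimal Python sketch of how this kind of crawler works conceptually. This is not AlertBits code — the `crawl` and `fetch` names are made up for illustration — but it shows the same idea: visit a page, collect its same-origin links, then visit those links, up to a visit limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_visits=3):
    """Breadth-first crawl from start_url, returning discovered URLs.

    `fetch` is any callable that takes a URL and returns its HTML,
    so the traversal logic can be shown without real network calls.
    Only links on the same origin (scheme + host) are followed.
    """
    origin = urlparse(start_url)[:2]  # (scheme, netloc)
    found = {start_url}
    queue = deque([start_url])
    visited = 0
    while queue and visited < max_visits:
        page = queue.popleft()
        visited += 1
        parser = LinkParser()
        parser.feed(fetch(page))
        for href in parser.links:
            url = urljoin(page, href)  # resolve relative links
            if urlparse(url)[:2] == origin and url not in found:
                found.add(url)
                queue.append(url)
    return found
```

The `max_visits` parameter plays the same role as the "Max. URL Visits" setting described later: it caps how many pages are fetched, while `found` can still contain more URLs than that.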

Here are the details of each field in the Add Crawler form:

Starting URL

This is the first URL the crawler will visit.

Ideally, provide the URL of a page that links to many of the different pages on that website.

If the first visited page has a navigation bar or footer area that has a lot of links to other pages on that site, that will make things easier.

Do note that, if the starting URL has no other internal links (i.e. links to different pages on the same domain or origin) on the page, the crawler won't be able to continue.

You can also set the Starting URL to be a website's XML sitemap or sitemap index file, and the crawler should be able to find URLs from that as well.
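A sitemap is just an XML file listing a site's URLs in `<loc>` elements, which is why it makes a good starting point. As a rough illustration (again, not AlertBits internals — `urls_from_sitemap` is a name made up here), extracting those URLs can be as simple as:

```python
import xml.etree.ElementTree as ET

# Standard namespace used by sitemap.xml and sitemap index files
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Extract all <loc> values from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]
```

For a sitemap index file, the extracted `<loc>` values point to further sitemaps rather than pages, so a crawler would fetch and parse those in turn.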

Conditions (optional)

Sometimes, you don't want to monitor literally all the pages of a website.

Perhaps you only want to monitor all blog posts (e.g. all URLs that start with https://example.com/blog/), or not monitor any URLs that match a certain value.

For that, you can add Conditions to filter which URLs the crawler will find and add.

You can specify a "URL contains" condition and the crawler will only add URLs that contain your provided value.

Or, you can use a "URL doesn't contain" condition to filter out URLs that you don't care about monitoring.

Note: the crawler will only find and add URLs that are part of the same website (specifically the same HTTP origin). The crawler won't visit or collect links/URLs that point to a different website or domain.
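In effect, these conditions are simple substring filters on each discovered URL. A minimal sketch of that logic (a simplified, hypothetical helper supporting one condition of each type, not the product's actual implementation):

```python
def matches_conditions(url, contains=None, not_contains=None):
    """Return True if `url` passes the optional filter conditions.

    contains:     only accept URLs containing this substring
    not_contains: reject URLs containing this substring
    """
    if contains is not None and contains not in url:
        return False
    if not_contains is not None and not_contains in url:
        return False
    return True
```

So with `contains="/blog/"`, only blog-post URLs pass the filter, matching the example above.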

Crawling frequency

In the "Crawl every" field, you can specify how frequently the crawler will automatically run to crawl the site again (hopefully to find new URLs).

After the initial crawl, subsequent crawls will look for new URLs and only add new URLs that the crawler didn't previously find.

Pro tip: after creating a crawler and running it for the first time, you can disable the crawler to stop it from automatically running again in the future.

Max. URL Visits

This field specifies the maximum number of pages the crawler will visit on a site to find links/URLs before stopping.

This is because some sites can be very big and contain thousands of pages; crawling through all of them can take prohibitively long.

So to test things out, you can start with a low number like 3, meaning the crawler will visit the first 3 pages it finds on the site, gather all the links/URLs found on them, and stop there.

If the crawler fails to find some URLs you expected, you'll likely need to increase this number so the crawler can look through more pages on the site.

While it highly depends on the structure of the site, generally the more pages the crawler visits the more links/URLs it'll be able to find.

The maximum number you can set this field to will depend on your subscription plan.

Cost

Every time the crawler runs to look for URLs, a certain amount will be deducted from your account balance (i.e. "Checks").

Just as a standard capture of a webpage usually costs 1.00 in balance, a standard crawler run also costs 1.00.

So if your subscription plan gives you 1,000 checks per month, you can run a crawler that costs 1.00 per run a thousand times.

The cost goes up based on "Max. URL visits", and how many checks you're given per month/year depends on your subscription plan.
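The exact cost-scaling formula isn't spelled out here, but the basic balance arithmetic from the example above is straightforward (a hypothetical helper, purely for illustration):

```python
def runs_per_period(checks, cost_per_run):
    """How many crawler runs a given check balance covers."""
    return int(checks // cost_per_run)
```

For example, 1,000 checks cover 1,000 runs of a 1.00-cost crawler, but only 250 runs of a crawler that costs 4.00 per run.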

Page monitoring settings

Every URL the crawler finds will be added to your account and monitored based on the settings you specify.

So you'll get to choose settings such as checking frequency (e.g. how often that specific page/URL is checked for changes, not the crawling frequency) and your typical monitoring and alert configurations.

You can choose to monitor for Visual changes (either a specific area or the entire page) and modify the minimum change trigger.

Or, you can track the found URLs for Element changes (either a specific HTML element or the entire page) and configure conditions accordingly.

You can also modify advanced settings such as storage, device width, actions (like click, type, etc.), start & end times, location, cookies, and more.

Note: when fetching a page to preview, the page specified in the "Starting URL" field will be fetched and loaded. If you want, you can fetch a different page first to configure your monitoring settings and then change the "Starting URL" field later.

Managing crawlers

After you've created a crawler, you can enable/disable it and edit its settings as needed.

You can also see all the pages/URLs the crawler found and added to your account.

Crawler-added pages are just like manually-added pages, so you can enable/disable and edit them individually as well.

Conclusion

Crawlers can make large-scale website monitoring efforts really easy to set up and manage.

They can browse through webpages or sitemaps to find URLs that you care about so you can monitor information that's important to you.

And they do this all on autopilot!

That means you don't have to waste your precious time doing all of this and can instead spend more time on more valuable tasks.

So if you're interested in trying out AlertBits, sign up for an account here (it's completely free, after all): Get Started Now.

And if you want to test out our crawlers, check out the different plans available here: Pricing & ROI.

If you have any questions or feedback, definitely send them to us at hi@alertbits.com.


AlertBits Team

AlertBits content and editorial team.

Still have a question?

No worries, just send us an email at hi@alertbits.com