You can modify URLs as they are crawled using the ‘Remove URL Parameters’ and ‘URL Rewriting’ features in Advanced Settings, in step 4 of the crawl setup.
These features are useful for removing URL components that complicate analysis of your website, or for rewriting URLs to point to an external website such as a lookup service, e.g. to retrieve information from an API for a set of your page URLs.
Stripping URL Parameters
If you simply want to strip out parameters, list them on separate lines in the ‘Remove URL Parameters’ option in Advanced Settings. For example, adding ‘param1’ strips param1=1, param1=2, and any other value of param1.
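To make the effect concrete, here is a minimal sketch in Python of what stripping a listed parameter does to a URL. It is an illustration only, not Lumar's implementation; the function name `strip_params` and the example URL are hypothetical. Note that rebuilding the query with `urlencode` may normalise encoding (for example, spaces become `+`).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_params(url: str, params: list[str]) -> str:
    """Drop every query parameter whose name is in `params`, whatever its value."""
    parts = urlsplit(url)
    # Keep only the name/value pairs whose name was not listed for removal.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in params]
    # An empty query string is omitted entirely, so no dangling "?" remains.
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_params("https://www.example.com/page?param1=2&sort=asc", ["param1"]))
# https://www.example.com/page?sort=asc
```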
URL Rewriting
The URL rewriting feature allows you to use regular expressions to modify your page URLs in more complex ways.
Any part of the URL matched by the regular expression in the ‘Match From’ column is replaced with the value in the ‘Match To’ column.
If you use parentheses in the Match From setting, you can use the variables $1, $2, etc. in the Match To setting, and Lumar will insert whatever text matched the corresponding parenthetical group.
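The Match From / Match To mechanism can be mimicked in Python to test a rule before adding it to a crawl. This is a sketch under two assumptions: that every match in the URL is replaced (as Python's `re.sub` does), and that Lumar's regex flavour behaves like Python's for the patterns you use. Python writes group references as `\g<1>` rather than `$1`, so the hypothetical helper `apply_rule` converts them first. The `/products/` and `/item/` paths are placeholders.

```python
import re

def apply_rule(url: str, match_from: str, match_to: str) -> str:
    """Replace whatever `match_from` matches in `url` with `match_to`.

    `$1`, `$2`, ... in `match_to` are rewritten to Python's `\\g<1>`,
    `\\g<2>`, ... group syntax before the substitution runs.
    """
    replacement = re.sub(r"\$(\d+)", r"\\g<\1>", match_to)
    return re.sub(match_from, replacement, url)

print(apply_rule("https://www.example.com/products/42", r"/products/(\d+)", "/item/$1"))
# https://www.example.com/item/42
```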
Append a URL as a parameter (rules must be applied in this order):
Match From 1: (.+\?.+)
Match To 1: $1&url=someurl.com
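A quick way to check this rule is to run it in Python. The sketch assumes the `?` in the pattern is meant to be escaped (`\?`), so that it only matches URLs that already carry a query string; unescaped, `.+?` is a lazy quantifier and the pattern would match almost any URL. `$1` is written as `\g<1>` in Python, and `append_url` is a hypothetical name.

```python
import re

# Rule 1: only URLs that already have "?" followed by something are matched,
# so "&url=..." is appended to an existing query string.
MATCH_FROM = r"(.+\?.+)"
MATCH_TO = r"\g<1>&url=someurl.com"

def append_url(url: str) -> str:
    return re.sub(MATCH_FROM, MATCH_TO, url)

print(append_url("https://www.example.com/page?id=7"))
# https://www.example.com/page?id=7&url=someurl.com
print(append_url("https://www.example.com/page"))
# https://www.example.com/page
```

URLs without a query string pass through unchanged, which is why any rule handling them has to be considered separately.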
Replace a domain:
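As an illustration, the sketch below swaps a placeholder domain, old.example.com, for another placeholder, new.example.com, using Python's `re.sub`. The scheme is captured as group 1 (`\g<1>` in Python, `$1` in a Match To value). The lookahead after the host is an added safeguard so hosts such as old.example.community are left alone; it assumes a regex engine with lookahead support.

```python
import re

def replace_domain(url: str) -> str:
    # Group 1 keeps "http://" or "https://"; the lookahead requires the host
    # to end at "/", "?", "#", ":" (port) or the end of the URL.
    return re.sub(r"^(https?://)old\.example\.com(?=[/?#:]|$)",
                  r"\g<1>new.example.com", url)

print(replace_domain("https://old.example.com/a?b=1"))
# https://new.example.com/a?b=1
```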
Change the name of a parameter:
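A minimal Python sketch of renaming a parameter, using the hypothetical names `oldname` and `newname`. Capturing the preceding `?` or `&` as group 1 keeps the delimiter in place and stops the pattern from matching inside a longer name such as `xoldname`.

```python
import re

def rename_param(url: str) -> str:
    # ([?&]) captures the delimiter so it is written back unchanged.
    return re.sub(r"([?&])oldname=", r"\g<1>newname=", url)

print(rename_param("https://www.example.com/p?a=1&oldname=5"))
# https://www.example.com/p?a=1&newname=5
```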
Change the case of a parameter:
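One way to normalise the case of a single parameter name is a case-insensitive match that writes the name back in lowercase. The sketch uses Python's `re.IGNORECASE` flag and a hypothetical parameter called `sort`; whether Lumar's rules accept an equivalent flag is an assumption to check. Only the name is changed, and the value keeps its original case.

```python
import re

def lowercase_param(url: str) -> str:
    # Match "sort" in any case (Sort, SORT, ...) after "?" or "&"
    # and write it back in lowercase.
    return re.sub(r"([?&])sort=", r"\g<1>sort=", url, flags=re.IGNORECASE)

print(lowercase_param("https://www.example.com/p?SORT=Asc"))
# https://www.example.com/p?sort=Asc
```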
Change HTTPS to HTTP:
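A short Python sketch of the scheme change. Anchoring the pattern with `^` means only the scheme at the start of the URL is rewritten, so an `https://` inside a query value is left untouched. The function name is hypothetical.

```python
import re

def https_to_http(url: str) -> str:
    # "^" restricts the match to the very start of the URL.
    return re.sub(r"^https://", "http://", url)

print(https_to_http("https://www.example.com/a?next=https://x.example.com"))
# http://www.example.com/a?next=https://x.example.com
```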
Force Trailing Slash:
Match From: ([^/])$
Match To: $1/
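The rule above can be checked in Python, which also shows a side effect worth knowing: because it only looks at the final character, it appends a slash after a query string too. The second function is a hypothetical variant, not part of the original rule, that skips URLs containing `?` or `#`.

```python
import re

def force_slash(url: str) -> str:
    # The rule as given: append "/" after a final character that isn't "/".
    return re.sub(r"([^/])$", r"\g<1>/", url)

def force_slash_path_only(url: str) -> str:
    # Hypothetical variant: only match URLs with no "?" or "#",
    # so query strings and fragments are left intact.
    return re.sub(r"^([^?#]*[^/?#])$", r"\g<1>/", url)

print(force_slash("https://www.example.com/page"))
# https://www.example.com/page/
print(force_slash("https://www.example.com/page?id=7"))
# https://www.example.com/page?id=7/
print(force_slash_path_only("https://www.example.com/page?id=7"))
# https://www.example.com/page?id=7
```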
Writing regular expressions can get tricky, so contact us at email@example.com if you need any help with these features.