Sitemap Parser v0.7

What is Sitemap Parser?

Sitemap Parser is a simple command-line tool that takes a sitemap URI and returns a list of all the URLs it contains. It follows the links in the sitemap recursively: if the sitemap is an index, it iterates over every sitemap listed in that index, and it handles compressed (zip) sitemaps the same way.
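To make the recursive behavior concrete, here is a minimal Python sketch of the idea. It is not the tool's actual code (the tool relies on crawler-commons for parsing). The function name `collect_urls`, the `fetch` callback, and the in-memory `SAMPLE` data are all hypothetical, and the sketch assumes gzip compression, detected by its magic bytes.

```python
import gzip
import xml.etree.ElementTree as ET

XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = "{" + XMLNS + "}"


def collect_urls(uri, fetch, seen=None):
    """Return every page URL reachable from the sitemap at `uri`.

    `fetch` is a hypothetical callback that maps a URI to raw bytes.
    """
    seen = set() if seen is None else seen
    if uri in seen:  # skip sitemaps already visited (guards against loops)
        return []
    seen.add(uri)

    data = fetch(uri)
    if data[:2] == b"\x1f\x8b":  # gzip magic bytes: decompress first
        data = gzip.decompress(data)

    root = ET.fromstring(data)
    if root.tag == NS + "sitemapindex":
        # An index lists other sitemaps: recurse into each one.
        urls = []
        for loc in root.iter(NS + "loc"):
            urls.extend(collect_urls(loc.text.strip(), fetch, seen))
        return urls
    # A plain sitemap lists page URLs directly.
    return [loc.text.strip() for loc in root.iter(NS + "loc")]


def _urlset(*pages):
    """Build a tiny sitemap document for the demo data below."""
    body = "".join(f"<url><loc>{p}</loc></url>" for p in pages)
    return f'<urlset xmlns="{XMLNS}">{body}</urlset>'.encode("utf-8")


def _index(*sitemaps):
    """Build a tiny sitemap index document for the demo data below."""
    body = "".join(f"<sitemap><loc>{s}</loc></sitemap>" for s in sitemaps)
    return f'<sitemapindex xmlns="{XMLNS}">{body}</sitemapindex>'.encode("utf-8")


# Made-up in-memory "site" so the sketch runs without network access.
SAMPLE = {
    "https://example.com/sitemap.xml": _index(
        "https://example.com/a.xml", "https://example.com/b.xml.gz"
    ),
    "https://example.com/a.xml": _urlset(
        "https://example.com/page1", "https://example.com/page2"
    ),
    "https://example.com/b.xml.gz": gzip.compress(
        _urlset("https://example.com/page3")
    ),
}

if __name__ == "__main__":
    for url in collect_urls("https://example.com/sitemap.xml", SAMPLE.__getitem__):
        print(url)
```

Passing the fetcher in as a callback keeps the traversal logic separate from networking, so the same function could be pointed at real HTTP requests or at test data like the dictionary above.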

How is the Parsing Done?

The inner “engine” that Sitemap Parser uses is the useful crawler-commons library.

I am lucky to be part of that project too.

What's new in the v0.7 Release?

Upgraded Platform & Dependencies

  • Java LTS upgrade — Updated to the latest Java LTS version (#10)
  • Dependencies updated — All libraries bumped to their latest versions (#11)

Improved Developer Experience

  • README refresh — Revamped the README for better readability (#12)
  • Batch file improvement — The batch/run script now automatically picks up the latest SitemapParser version (#13)
  • Folder hierarchy reorg — Reorganized the project’s folder structure for clarity (#15)

Code Quality

  • Logging overhaul — Cleaned up and standardized the logging setup (#14)
  • Code cleanup — General code refactoring and housekeeping (#16)

Download Link

Chaiware