Tag: web-crawler

How to parse a sitemap index that has compressed links

I've made a program that reads the /robots.txt and the /sitemap.xml of a site, extracts the available sitemaps, and stores them in the siteMapsUnsorted list. From there I use the crawler-commons library to check whether each link is a SiteMap or a SiteMapIndex (a cluster of SiteMaps). When I use it on a normal SiteMapIndex it works; the problem occurs in some cases
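The excerpt does not say how the links are compressed. A minimal sketch, assuming they are gzip files (for example `.xml.gz`) and that the bytes have already been downloaded: check for the gzip magic number and decompress before passing the content to the parser. The class name `SitemapBytes` and the method `maybeGunzip` are hypothetical, and only the Java standard library is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of crawler-commons.
public class SitemapBytes {

    /**
     * Returns the payload unchanged unless it starts with the gzip
     * magic number (0x1F 0x8B), in which case it is decompressed first.
     */
    public static byte[] maybeGunzip(byte[] payload) throws IOException {
        boolean gzipped = payload.length >= 2
                && (payload[0] & 0xFF) == 0x1F
                && (payload[1] & 0xFF) == 0x8B;
        if (!gzipped) {
            return payload; // assumed to be plain XML already
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(payload));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            // Copy the decompressed stream into memory chunk by chunk.
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }
}
```

Checking the leading bytes instead of the file extension also covers servers that deliver gzip content under a plain `.xml` URL. The returned bytes can then go to whichever sitemap parser the program already uses.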

Insert data stored in an array list into Excel using Java

arrays excel java web-crawler

I have been trying for a while to create an Excel sheet that stores the crawled data in a table format. The data is fetched from a URL and stored in an array list; this data needs to be stored in an array list

Answer: Your code is globally correct; it has only