
How to extract the text, link, and other properties of all elements from HTML?

Note: I can extract these from a single element, but I need to extract them from all of the elements together.

Hi, I am trying to extract the text and link from a list of items on a page using Selenium and Java. I am able to extract all of the title text, but I cannot figure out how to extract the links. The HTML looks like this:

<div class="col-12">
        <a href="/category/agricultural-products-service">
                <img src="/assets/images/icon/1.jpg" alt="icon" class="img-fluid category_icon">
                    <h5 class="category_title">Agricultural </h5>
        </a>
 </div>
<div class="col-12">
        <a href="/category/products-service">
                <img src="/assets/images/icon/7.jpg" alt="icon" class="img-fluid category_icon">
                    <h5 class="category_title">Products</h5>
        </a>
 </div>

Using h5 I can extract all of the title elements, but I also need to extract the href of each of those elements.


Answer

To extract the text, link, or any other attribute value from several web elements, collect the elements in a list and then iterate over the list, extracting the desired value from each WebElement object.
For example, this prints the text of every h5 element:

List<WebElement> elements = driver.findElements(By.tagName("h5"));
for(WebElement element : elements){
    String value = element.getText();
    System.out.println(value);
}

Similarly, this prints the href of every a element located inside an element with the class top_cat:

List<WebElement> links = driver.findElements(By.cssSelector(".top_cat a"));
for(WebElement link : links){
    String value = link.getAttribute("href");
    System.out.println(value);
}
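
One detail that may be surprising: the HTML in the question contains relative links such as /category/products-service, but getAttribute("href") generally returns the fully resolved URL, because it reads the element's href property as the browser sees it. The sketch below does not use Selenium at all; it only illustrates that resolution step with java.net.URI. The page URL https://example.com/categories, the class name HrefResolution, and the helper resolveHref are assumptions made for illustration, since the real site address is not given in the question.

```java
import java.net.URI;

public class HrefResolution {

    // Resolves an href as written in the markup against the URL of the page
    // it appears on, which is roughly what a browser does for link targets.
    static String resolveHref(String pageUrl, String rawHref) {
        return URI.create(pageUrl).resolve(rawHref).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL; the real site address is not given in the question.
        String pageUrl = "https://example.com/categories";

        // The two href values exactly as they appear in the question's HTML.
        String[] rawHrefs = {
            "/category/agricultural-products-service",
            "/category/products-service"
        };

        for (String rawHref : rawHrefs) {
            System.out.println(rawHref + " -> " + resolveHref(pageUrl, rawHref));
        }
    }
}
```

If the value exactly as written in the markup is needed instead, Selenium 4 also provides getDomAttribute("href") on WebElement, which should return the unresolved attribute text.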

On this specific page, the structure is as follows:
There are several blocks, each defined by an element with class="col-12 col-sm-6 col-md-4 border all_cat". Each block contains several links and titles. Every a element sits inside one of these blocks, and each title (h5) sits inside its a element. So the links and titles can be extracted as follows:

List<WebElement> blocks = driver.findElements(By.cssSelector(".all_cat"));
for(WebElement block : blocks){
    List<WebElement> links = block.findElements(By.xpath(".//a"));
    for(WebElement link : links){
        String linkValue = link.getAttribute("href");
        System.out.println("The link is " + linkValue);
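        // The h5 title is nested inside this link, so search from the link and use findElement (single result)
        WebElement title = link.findElement(By.xpath(".//h5"));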
        String titleValue = title.getText();
        System.out.println("The title is " + titleValue);
    }
}
User contributions licensed under: CC BY-SA