Get all texts after and between by using Jsoup


<h2><span class="mw-headline" id="The_battle">The battle</span></h2>
<div class="thumb tright"></div>
<p>text I want</p>
<p>text I want</p>
<p>text I want</p>
<p>text I want</p>
<h2>Second Title I want to stop collecting p tags after</h2>

I am learning Jsoup by trying to scrape all the p tags, arranged by title, from a Wikipedia page. I can scrape all the p tags between h2 tags with the help of this question:
extract unidentified html content from between two tags, using jsoup? regex?

by using

Elements elements = doc.select("span.mw-headline, h2 ~ p");

but I can't scrape them when there is a <div> between them. Here is the Wikipedia page I am working on:

How can I grab all the p tags where they are between two specific h2 tags? Preferably ordered by id.


Try this option: Elements elements = doc.select("span.mw-headline, h2 ~ div, h2 ~ p");

Sample code:

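package jsoupex;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Example program to list links from a URL.
 */
public class stackoverflw {
    public static void main(String[] args) throws IOException {

        //Validate.isTrue(args.length == 1, "usage: supply url to fetch");
        //String url = "http://localhost/stov_wiki.html";
        String url = " "; // set this to the URL of the page to fetch
        System.out.printf("Fetching %s...%n", url);

        Document doc = Jsoup.connect(url).get();
        // Headline spans, plus every div and p that follows an h2 sibling.
        Elements elements = doc.select("span.mw-headline, h2 ~ div, h2 ~ p");

        for (Element elem : elements) {
            // The printing in both branches is an assumed completion of this loop.
            if (elem.hasClass("mw-headline")) {
                System.out.println("== " + elem.text() + " ==");
            } else {
                System.out.println(elem.text());
            }
        }
    }
}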
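
If the goal is only the paragraphs between two specific h2 tags, note that a selector such as h2 ~ p matches every p that follows any h2 sibling, so it cannot stop at the next heading by itself. One possible alternative is to walk the siblings of the chosen h2 by hand and stop at the next h2. The sketch below assumes the headline span carries an id (as in the snippet from the question) and that the paragraphs are direct siblings of the h2; the class name SectionParagraphs, the method paragraphsUnder, and the inline test HTML are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SectionParagraphs {

    /**
     * Collects the text of each <p> that follows the heading identified by
     * headlineId, stopping as soon as the next <h2> sibling is reached.
     * (Hypothetical helper, not part of Jsoup.)
     */
    static List<String> paragraphsUnder(Document doc, String headlineId) {
        List<String> result = new ArrayList<>();

        // The id sits on the span inside the h2, so step up to the h2 itself.
        Element headline = doc.getElementById(headlineId);
        if (headline == null) {
            return result;
        }
        Element heading = headline.tagName().equals("h2") ? headline : headline.parent();
        if (heading == null) {
            return result;
        }

        // Walk forward through the siblings; anything that is not a <p>
        // (for example the thumbnail div) is skipped, and the next <h2> ends the section.
        for (Element sib = heading.nextElementSibling();
             sib != null && !sib.tagName().equals("h2");
             sib = sib.nextElementSibling()) {
            if (sib.tagName().equals("p")) {
                result.add(sib.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Inline HTML modeled on the snippet in the question (assumed test data).
        String html =
              "<h2><span class=\"mw-headline\" id=\"The_battle\">The battle</span></h2>"
            + "<div class=\"thumb tright\"></div>"
            + "<p>first</p>"
            + "<p>second</p>"
            + "<h2>Second title</h2>"
            + "<p>not wanted</p>";

        Document doc = Jsoup.parse(html);
        System.out.println(paragraphsUnder(doc, "The_battle")); // prints [first, second]
    }
}
```

Calling paragraphsUnder once per headline id gives the paragraphs grouped by section. Paragraphs nested inside a div would need an extra sib.select("p") on the non-p siblings, which this sketch leaves out.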

Source: stackoverflow