
Get all text after and between headings by using Jsoup


I am learning Jsoup by trying to scrape all the <p> tags from a Wikipedia page, grouped by section title. I can scrape all the <p> tags between <h2> tags with the help of this question:
extract unidentified html content from between two tags, using jsoup? regex?

by using the approach suggested there, but it fails when there is a <div> between them. Here is the Wikipedia page I am working on: https://simple.wikipedia.org/wiki/Battle_of_Hastings

How can I get all the <p> tags that lie between two specific <h2> tags, preferably ordered by id?
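A minimal sketch of the sibling-walking approach (not the code from the linked question): start at each <h2>, step through nextElementSibling() until the next <h2>, keep the <p> elements and simply skip anything else, such as a <div>, instead of stopping at it. It assumes the <h2> elements are direct siblings of the paragraphs; if the page wraps each <h2> in another element, start the walk from that wrapper instead. The class and method names are hypothetical, and sections are keyed by heading text in document order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SectionParagraphs {

    // Hypothetical helper: maps each <h2> heading text to the text of the
    // <p> siblings that follow it, up to the next <h2>.
    // Assumes <h2> and <p> are siblings under the same parent.
    static Map<String, List<String>> paragraphsByHeading(Document doc) {
        // LinkedHashMap keeps the sections in document order
        Map<String, List<String>> sections = new LinkedHashMap<>();
        for (Element h2 : doc.select("h2")) {
            List<String> paragraphs = new ArrayList<>();
            Element sibling = h2.nextElementSibling();
            while (sibling != null && !sibling.tagName().equals("h2")) {
                // Only <p> siblings are collected; a <div> (image box,
                // infobox, etc.) is skipped instead of ending the section
                if (sibling.tagName().equals("p")) {
                    paragraphs.add(sibling.text());
                }
                sibling = sibling.nextElementSibling();
            }
            sections.put(h2.text(), paragraphs);
        }
        return sections;
    }

    public static void main(String[] args) {
        // Small inline document instead of fetching the live page
        String html = "<h2>Background</h2><p>One.</p><div class='thumb'><p>Caption.</p></div><p>Two.</p>"
                + "<h2>Battle</h2><p>Three.</p>";
        Document doc = Jsoup.parse(html);
        paragraphsByHeading(doc).forEach((heading, ps) -> System.out.println(heading + " -> " + ps));
    }
}
```

If paragraphs nested inside those divs should be included as well, the skipped branch could collect sibling.select("p") instead of ignoring the element.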


Answer

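Try this option: Elements elements = doc.select("span.mw-headline, h2 ~ div, h2 ~ p");

The selector matches each section headline (span.mw-headline) plus every <div> and <p> that comes after an <h2> sibling, so a <div> between paragraphs no longer ends the match.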
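One way to consume the result of that selector, sketched under a few assumptions: select() returns matches in document order (which is how current jsoup versions behave), each headline is a span.mw-headline carrying the section id (the markup Wikipedia used when this was asked; adjust the selector if the live page no longer has those spans), and only <p> text is wanted. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HeadlineSelector {

    // Hypothetical helper: groups <p> text under the id of the preceding
    // span.mw-headline, using the selector from the answer.
    static Map<String, List<String>> groupByHeadline(Document doc) {
        Elements elements = doc.select("span.mw-headline, h2 ~ div, h2 ~ p");
        // LinkedHashMap keeps the sections in the order they appear
        Map<String, List<String>> sections = new LinkedHashMap<>();
        List<String> current = null;
        for (Element el : elements) {
            if (el.is("span.mw-headline")) {
                // A new headline starts a new section keyed by its id
                current = new ArrayList<>();
                sections.put(el.id(), current);
            } else if (el.is("p") && current != null) {
                current.add(el.text());
            }
            // Elements matched by "h2 ~ div" (image boxes and similar) are
            // ignored here; add a branch for them if their text is needed
        }
        return sections;
    }

    public static void main(String[] args) {
        // Small inline document instead of fetching the live page
        String html = "<h2><span class='mw-headline' id='Background'>Background</span></h2>"
                + "<p>One.</p><div class='thumb'>Image</div><p>Two.</p>"
                + "<h2><span class='mw-headline' id='Battle'>Battle</span></h2><p>Three.</p>";
        Document doc = Jsoup.parse(html);
        groupByHeadline(doc).forEach((id, ps) -> System.out.println(id + " -> " + ps));
    }
}
```

Because "h2 ~ p" only matches paragraphs that follow an <h2> sibling, the introduction before the first heading is not collected.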

User contributions licensed under: CC BY-SA