
Tag: web-scraping

Exchange cookies between requests in OkHttp

I’m trying to scrape a website, and for that I need to carry the cookies and headers across all the requests. How can I achieve this cleanly, without manually copying the cookies and headers between the Request and Response objects each time? Answer: You’ll need a CookieJar. There’s an in-memory

Get all texts after and between by using Jsoup

I am learning Jsoup by trying to scrape all the p tags, arranged by title, from a Wikipedia page. I can scrape all the p tags between h2 tags, with the help of this question: extract unidentified html content from between two tags, using jsoup? regex? by using but I can’t scrape them when there is a <div> between them. Here is

How to connect via HTTPS using Jsoup?

It’s working fine over HTTP, but when I try to use an HTTPS source it throws the following exception: Here’s the relevant code: Answer: If you want to do it the right way, and/or you need to deal with only one site, then you basically need to grab the SSL certificate of the website in question and import it in
