I am trying to get prices of routes on a bus page using requests, BeautifulSoup, and re; this is the link: https://new.turbus.cl/turbuscl/inicio-compra. Right now I am able to find the input boxes of the search form, but I am not sure where to run the search, and I am getting this error: ValueError: too many values to unpack (expected 2) …
Tag: web-scraping
Selenium And Java: Exception in thread “main” org.openqa.selenium.NoSuchWindowException: no such window: target window already closed
I am accessing a Quebec laws website and I am trying to scrape all of its law names along with their associated PDFs. When doing this, I open a tab for each law and then go through all those tabs to get the information I am looking for. However, after a while of going through the tabs …
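A hedged sketch of the usual fix for this pattern, assuming Selenium 4 with ChromeDriver and a placeholder URL (not the real Quebec site): remember the original window handle, switch to each tab by handle before reading it, close only that tab, and switch back afterwards, so the driver is never left pointing at a window that has already been closed.

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;

    public class TabWalkSketch {
        public static void main(String[] args) {
            WebDriver driver = new ChromeDriver();
            driver.get("https://example.com/laws");           // placeholder URL

            String originalHandle = driver.getWindowHandle(); // remember the main window

            // ... clicks that open each law in its own tab would go here ...

            for (String handle : driver.getWindowHandles()) {
                if (handle.equals(originalHandle)) {
                    continue;                                 // skip the main window
                }
                driver.switchTo().window(handle);             // focus the tab before reading it
                System.out.println(driver.getTitle());        // scrape whatever is needed here
                driver.close();                               // close only the tab currently in focus
            }

            driver.switchTo().window(originalHandle);         // refocus so later calls don't hit a closed window
            driver.quit();
        }
    }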
How to get the only PDF URL from a web page?
I am trying to get some DOM elements using Selenium, and I am doing all of this in Java, but I am getting this error when trying it out: … I am still a newbie at all this, but the code I am using to retrieve the DOM element is: … I believe the error is that it cannot find the XPath.
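A minimal sketch of locating an element by XPath with an explicit wait, assuming Selenium 4; the URL and the XPath (a guess at a PDF link, to tie in with the question title above) are placeholders. The wait helps separate "the element is not there yet" from "the XPath is wrong".

    import java.time.Duration;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    public class XPathLookupSketch {
        public static void main(String[] args) {
            WebDriver driver = new ChromeDriver();
            try {
                driver.get("https://example.com");            // placeholder URL

                // Wait up to 10 seconds for the element instead of failing immediately.
                WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
                WebElement pdfLink = wait.until(
                        ExpectedConditions.presenceOfElementLocated(
                                By.xpath("//a[contains(@href, '.pdf')]")));   // hypothetical XPath

                System.out.println(pdfLink.getAttribute("href"));
            } finally {
                driver.quit();
            }
        }
    }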
Exchange cookies between requests in OkHttp
I’m trying to scrape one website, and for that, I need to exchange the cookies and headers between all the requests. The question is the following: how can I achieve such behaviour in a smart way, not by resetting the cookies and headers manually between the Request and Response objects each time? Answer You’ll need a CookieJar. There’s an in-memory …
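A minimal sketch of such a CookieJar, assuming OkHttp 3/4 and placeholder URLs: a naive in-memory jar keyed by host, so cookies set by one response are replayed automatically on the next request to the same site, with no manual copying between Request and Response objects.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import okhttp3.Cookie;
    import okhttp3.CookieJar;
    import okhttp3.HttpUrl;
    import okhttp3.OkHttpClient;
    import okhttp3.Request;
    import okhttp3.Response;

    public class CookieJarSketch {

        // Naive in-memory jar keyed by host; fine for a single-threaded scraper sketch.
        static class InMemoryCookieJar implements CookieJar {
            private final Map<String, List<Cookie>> store = new HashMap<>();

            @Override
            public void saveFromResponse(HttpUrl url, List<Cookie> cookies) {
                store.put(url.host(), new ArrayList<>(cookies));            // remember cookies per host
            }

            @Override
            public List<Cookie> loadForRequest(HttpUrl url) {
                return store.getOrDefault(url.host(), new ArrayList<>());   // replay them on later requests
            }
        }

        public static void main(String[] args) throws Exception {
            OkHttpClient client = new OkHttpClient.Builder()
                    .cookieJar(new InMemoryCookieJar())
                    .build();

            // Both calls share the same jar, so the session cookie from the first
            // response is sent automatically with the second request.
            Request login = new Request.Builder().url("https://example.com/login").build();     // placeholder URL
            try (Response response = client.newCall(login).execute()) {
                System.out.println("login status: " + response.code());
            }

            Request page = new Request.Builder().url("https://example.com/protected").build();  // placeholder URL
            try (Response response = client.newCall(page).execute()) {
                System.out.println("page status: " + response.code());
            }
        }
    }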
Using JSoup to parse a String with Clojure
Using JSoup to parse an HTML string with Clojure; the source is as follows. Dependencies: :dependencies [[org.clojure/clojure "1.10.1"] [org.jsoup/jsoup "1.13.1"]] …
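Since the thread is about calling Jsoup from Clojure via interop, here is a hedged Java sketch of the underlying Jsoup calls the Clojure code would wrap ((Jsoup/parse html) and .select); the HTML string and the selector are made up for illustration.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class ParseStringSketch {
        public static void main(String[] args) {
            // Made-up HTML string standing in for the source in the question.
            String html = "<html><body><p class='quote'>Hello</p><p class='quote'>World</p></body></html>";

            Document doc = Jsoup.parse(html);          // parse the string, no network call
            for (Element p : doc.select("p.quote")) {  // CSS selector, same call the Clojure interop makes
                System.out.println(p.text());
            }
        }
    }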
Why am I getting "TypeError: redeclaration of const e." while scraping with HtmlUnit?
I want to scrape the live Bitcoin price using HtmlUnit. I am running the following code to get the content of the website, but I am getting an error. The exception that I get: … Answer You got this error more or less because the JS engine used by HtmlUnit is not 100% compatible with the JavaScript of current browsers. The engine gets improved …
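A hedged workaround sketch, assuming a pre-3.x HtmlUnit (com.gargoylesoftware package) and a placeholder URL: tell the WebClient not to throw on script errors, or disable JavaScript entirely if the price is already present in the static HTML. This is a common mitigation for incompatible site scripts, not necessarily the fix from the original answer.

    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    public class HtmlUnitScriptErrorsSketch {
        public static void main(String[] args) throws Exception {
            try (WebClient webClient = new WebClient()) {
                // Don't let incompatible site JavaScript abort the whole page load.
                webClient.getOptions().setThrowExceptionOnScriptError(false);
                // If the price is in the static HTML, JavaScript can be switched off entirely:
                // webClient.getOptions().setJavaScriptEnabled(false);

                HtmlPage page = webClient.getPage("https://example.com/bitcoin-price");  // placeholder URL
                System.out.println(page.asXml());   // dump the page so the price element can be located
            }
        }
    }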
Get all texts after and between <h2> tags by using Jsoup
I am learning Jsoup by trying to scrape all the p tags, arranged by title, from a Wikipedia site. I can scrape all the p tags between h2 tags with the help of this question (extract unidentified html content from between two tags, using jsoup? regex?) by using … but I can’t scrape them when there is a <div> between them. Here is …
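A minimal sketch of one way to handle that (not necessarily the approach from the linked question), assuming a placeholder Wikipedia URL whose sections are flat siblings of the <h2> headings: walk the siblings that follow each <h2> and collect every <p>, descending into any <div> wrapper along the way.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class SectionParagraphsSketch {
        public static void main(String[] args) throws Exception {
            Document doc = Jsoup.connect("https://en.wikipedia.org/wiki/Web_scraping").get();  // placeholder page

            for (Element heading : doc.select("h2")) {
                System.out.println("== " + heading.text());

                // Walk siblings until the next h2; this still works when a <div> sits between the <p> tags.
                Element sibling = heading.nextElementSibling();
                while (sibling != null && !sibling.tagName().equals("h2")) {
                    if (sibling.tagName().equals("p")) {
                        System.out.println(sibling.text());
                    } else {
                        for (Element p : sibling.select("p")) {   // <p> tags wrapped inside a <div>
                            System.out.println(p.text());
                        }
                    }
                    sibling = sibling.nextElementSibling();
                }
            }
        }
    }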
Logging in to a website with Jsoup which redirects, and scraping a page that isn’t the redirect
This is the website I’m trying to scrape from. I’m able to log in to the website fairly easily. However, I’m unable to retrieve and reuse the cookies or session ID to scrape a page other than the one the login page redirects to; I receive a 403 every time. Here is an example of what I’ve tried: … Answer This code works …
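A hedged sketch of the usual Jsoup pattern, with placeholder URLs and form field names: perform the login as a Connection.Response so its cookies (including the session ID) can be captured, then pass those cookies to every later request instead of relying on the redirect; a browser-like user agent often avoids the 403.

    import java.util.Map;

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class LoginAndReuseCookiesSketch {
        public static void main(String[] args) throws Exception {
            // 1. Post the login form and capture the response so its cookies can be read.
            Connection.Response loginResponse = Jsoup.connect("https://example.com/login")  // placeholder URL
                    .data("username", "user")                                               // placeholder field names
                    .data("password", "secret")
                    .method(Connection.Method.POST)
                    .execute();

            Map<String, String> cookies = loginResponse.cookies();   // the session ID lives in here

            // 2. Reuse the same cookies for a page that is NOT the post-login redirect.
            Document page = Jsoup.connect("https://example.com/members/reports")            // placeholder URL
                    .cookies(cookies)
                    .userAgent("Mozilla/5.0")      // some sites return 403 for the default Java user agent
                    .get();

            System.out.println(page.title());
        }
    }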
How to connect via HTTPS using Jsoup?
It’s working fine over HTTP, but when I try to use an HTTPS source it throws the following exception: … Here’s the relevant code: … Answer If you want to do it the right way, and/or you need to deal with only one site, then you basically need to grab the SSL certificate of the website in question and import it in …
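A hedged sketch of that keystore route, assuming the JDK's keytool is available and using hypothetical file names and passwords: export the site's certificate, import it into a truststore, and point the JVM at that truststore before Jsoup opens the HTTPS connection.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class JsoupHttpsSketch {
        public static void main(String[] args) throws Exception {
            // Beforehand (shell, not Java): import the exported certificate into a truststore, e.g.
            //   keytool -importcert -file site.crt -alias site -keystore mytruststore.jks -storepass changeit
            // File names, alias, and password above are placeholders.

            // Tell the JVM to trust that store before the first HTTPS connection is made.
            System.setProperty("javax.net.ssl.trustStore", "mytruststore.jks");
            System.setProperty("javax.net.ssl.trustStorePassword", "changeit");

            Document doc = Jsoup.connect("https://example.com/").get();   // placeholder HTTPS URL
            System.out.println(doc.title());
        }
    }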
How can I efficiently parse HTML with Java?
I do a lot of HTML parsing in my line of work. Up until now, I have been using the HtmlUnit headless browser for parsing and browser automation. Now I want to separate the two tasks. I want to use a lightweight HTML parser, because it takes a lot of time in HtmlUnit to first load a page, then get the source, and …
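One hedged way to split the two jobs, assuming Java 11+ and Jsoup on the classpath and a placeholder URL: fetch the raw HTML with java.net.http.HttpClient (keeping HtmlUnit only where JavaScript execution is genuinely needed) and hand the string to Jsoup, which parses it without loading scripts, CSS, or images.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class FetchThenParseSketch {
        public static void main(String[] args) throws Exception {
            // Step 1: fetch only the raw HTML; no JavaScript engine, no extra resources.
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/"))  // placeholder URL
                    .header("User-Agent", "Mozilla/5.0")
                    .build();
            String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

            // Step 2: parse the string with a lightweight DOM parser.
            Document doc = Jsoup.parse(html, "https://example.com/");   // base URI resolves relative links
            for (Element link : doc.select("a[href]")) {
                System.out.println(link.absUrl("href"));
            }
        }
    }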