I am trying to get prices of routes on a bus page using requests, BeautifulSoup, and re; this is the link: https://new.turbus.cl/turbuscl/inicio-compra. Right now I am able to find the input boxes of the search form, but I am not sure where to run the search, and I am getting this error: ValueError: too many values to unpack (expected 2) …
Tag: web-scraping
Selenium And Java: Exception in thread “main” org.openqa.selenium.NoSuchWindowException: no such window: target window already closed
I am accessing a Quebec laws website and I am trying to scrape all of its law names along with their associated PDFs. When doing this, I open a tab for each law and then go through all those tabs to get the information I am looking for. However, after a while of going through the tabs …
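A hedged sketch of the usual fix for this pattern, assuming Selenium 4 with ChromeDriver and a placeholder URL (not the real Quebec site): remember the original window handle, switch to each tab by handle before reading it, close only that tab, and switch back afterwards, so the driver is never left pointing at a window that has already been closed.

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;

    public class TabWalkSketch {
        public static void main(String[] args) {
            WebDriver driver = new ChromeDriver();
            driver.get("https://example.com/laws");           // placeholder URL

            String originalHandle = driver.getWindowHandle(); // remember the main window

            // ... clicks that open each law in its own tab would go here ...

            for (String handle : driver.getWindowHandles()) {
                if (handle.equals(originalHandle)) {
                    continue;                                 // skip the main window
                }
                driver.switchTo().window(handle);             // focus the tab before reading it
                System.out.println(driver.getTitle());        // scrape whatever is needed here
                driver.close();                               // close only the tab currently in focus
            }

            driver.switchTo().window(originalHandle);         // refocus so later calls don't hit a closed window
            driver.quit();
        }
    }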
How to get the only PDF URL from a web page?
I am trying to get some DOM elements using Selenium, and I am doing all of this in Java, but I am getting this error when trying it out: … I am still a newbie at all this, but the code I am using to retrieve the DOM element is: … I believe the error is that it cannot find the XPath.
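A minimal sketch of locating an element by XPath with an explicit wait, assuming Selenium 4; the URL and the XPath (a guess at a PDF link, to tie in with the question title above) are placeholders. The wait helps separate "the element is not there yet" from "the XPath is wrong".

    import java.time.Duration;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    public class XPathLookupSketch {
        public static void main(String[] args) {
            WebDriver driver = new ChromeDriver();
            try {
                driver.get("https://example.com");            // placeholder URL

                // Wait up to 10 seconds for the element instead of failing immediately.
                WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
                WebElement pdfLink = wait.until(
                        ExpectedConditions.presenceOfElementLocated(
                                By.xpath("//a[contains(@href, '.pdf')]")));   // hypothetical XPath

                System.out.println(pdfLink.getAttribute("href"));
            } finally {
                driver.quit();
            }
        }
    }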
Exchange cookies between requests in OkHttp
I’m trying to scrape one website, and for that, I need to exchange the cookies and headers between all the requests. The question is the following: how can I achieve such behaviour in a smart way, not by resetting the cookies and headers manually between the Request and Response objects each time? Answer You’ll need a CookieJar. There’s an in-memory …
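A minimal sketch of such a CookieJar, assuming OkHttp 3/4 and placeholder URLs: a naive in-memory jar keyed by host, so cookies set by one response are replayed automatically on the next request to the same site, with no manual copying between Request and Response objects.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import okhttp3.Cookie;
    import okhttp3.CookieJar;
    import okhttp3.HttpUrl;
    import okhttp3.OkHttpClient;
    import okhttp3.Request;
    import okhttp3.Response;

    public class CookieJarSketch {

        // Naive in-memory jar keyed by host; fine for a single-threaded scraper sketch.
        static class InMemoryCookieJar implements CookieJar {
            private final Map<String, List<Cookie>> store = new HashMap<>();

            @Override
            public void saveFromResponse(HttpUrl url, List<Cookie> cookies) {
                store.put(url.host(), new ArrayList<>(cookies));            // remember cookies per host
            }

            @Override
            public List<Cookie> loadForRequest(HttpUrl url) {
                return store.getOrDefault(url.host(), new ArrayList<>());   // replay them on later requests
            }
        }

        public static void main(String[] args) throws Exception {
            OkHttpClient client = new OkHttpClient.Builder()
                    .cookieJar(new InMemoryCookieJar())
                    .build();

            // Both calls share the same jar, so the session cookie from the first
            // response is sent automatically with the second request.
            Request login = new Request.Builder().url("https://example.com/login").build();     // placeholder URL
            try (Response response = client.newCall(login).execute()) {
                System.out.println("login status: " + response.code());
            }

            Request page = new Request.Builder().url("https://example.com/protected").build();  // placeholder URL
            try (Response response = client.newCall(page).execute()) {
                System.out.println("page status: " + response.code());
            }
        }
    }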
Using JSoup to parse a String with Clojure
Using JSoup to parse an HTML string with Clojure; the source is as follows. Dependencies: :dependencies [[org.clojure/clojure "1.10.1"] [org.jsoup/jsoup "1.13.1"]] …
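Since the thread is about calling Jsoup from Clojure via interop, here is a hedged Java sketch of the underlying Jsoup calls the Clojure code would wrap ((Jsoup/parse html) and .select); the HTML string and the selector are made up for illustration.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class ParseStringSketch {
        public static void main(String[] args) {
            // Made-up HTML string standing in for the source in the question.
            String html = "<html><body><p class='quote'>Hello</p><p class='quote'>World</p></body></html>";

            Document doc = Jsoup.parse(html);          // parse the string, no network call
            for (Element p : doc.select("p.quote")) {  // CSS selector, same call the Clojure interop makes
                System.out.println(p.text());
            }
        }
    }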
Why am I getting "TypeError: redeclaration of const e." while scraping with HtmlUnit?
I want to scrape the live Bitcoin price using HtmlUnit. I am running the following code to get the content of the website, but I am getting an error. The exception that I get: … Answer You got this error more or less because the JS engine used by HtmlUnit is not 100% compatible with the JavaScript of current browsers. The engine gets improved …
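A hedged workaround sketch, assuming a pre-3.x HtmlUnit (com.gargoylesoftware package) and a placeholder URL: tell the WebClient not to throw on script errors, or disable JavaScript entirely if the price is already present in the static HTML. This is a common mitigation for incompatible site scripts, not necessarily the fix from the original answer.

    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    public class HtmlUnitScriptErrorsSketch {
        public static void main(String[] args) throws Exception {
            try (WebClient webClient = new WebClient()) {
                // Don't let incompatible site JavaScript abort the whole page load.
                webClient.getOptions().setThrowExceptionOnScriptError(false);
                // If the price is in the static HTML, JavaScript can be switched off entirely:
                // webClient.getOptions().setJavaScriptEnabled(false);

                HtmlPage page = webClient.getPage("https://example.com/bitcoin-price");  // placeholder URL
                System.out.println(page.asXml());   // dump the page so the price element can be located
            }
        }
    }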
Get all texts after and between <h2> tags by using Jsoup
I am learning Jsoup by trying to scrape all the p tags, arranged by title, from a Wikipedia site. I can scrape all the p tags between h2 tags with the help of this question (extract unidentified html content from between two tags, using jsoup? regex?) by using … but I can’t scrape them when there is a <div> between them. Here is …
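A minimal sketch of one way to handle that (not necessarily the approach from the linked question), assuming a placeholder Wikipedia URL whose sections are flat siblings of the <h2> headings: walk the siblings that follow each <h2> and collect every <p>, descending into any <div> wrapper along the way.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class SectionParagraphsSketch {
        public static void main(String[] args) throws Exception {
            Document doc = Jsoup.connect("https://en.wikipedia.org/wiki/Web_scraping").get();  // placeholder page

            for (Element heading : doc.select("h2")) {
                System.out.println("== " + heading.text());

                // Walk siblings until the next h2; this still works when a <div> sits between the <p> tags.
                Element sibling = heading.nextElementSibling();
                while (sibling != null && !sibling.tagName().equals("h2")) {
                    if (sibling.tagName().equals("p")) {
                        System.out.println(sibling.text());
                    } else {
                        for (Element p : sibling.select("p")) {   // <p> tags wrapped inside a <div>
                            System.out.println(p.text());
                        }
                    }
                    sibling = sibling.nextElementSibling();
                }
            }
        }
    }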
Logging in to a website with Jsoup which redirects, and scraping a page that isn’t the redirect
This is the website I’m trying to scrape from. I’m able to log in to the website fairly easily. However, I’m unable to retrieve and reuse the cookies or session ID to scrape a page other than the one the login page redirects to; I receive a 403 every time. Here is an example of what I’ve tried: … Answer This code works …
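A hedged sketch of the usual Jsoup pattern, with placeholder URLs and form field names: perform the login as a Connection.Response so its cookies (including the session ID) can be captured, then pass those cookies to every later request instead of relying on the redirect; a browser-like user agent often avoids the 403.

    import java.util.Map;

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class LoginAndReuseCookiesSketch {
        public static void main(String[] args) throws Exception {
            // 1. Post the login form and capture the response so its cookies can be read.
            Connection.Response loginResponse = Jsoup.connect("https://example.com/login")  // placeholder URL
                    .data("username", "user")                                               // placeholder field names
                    .data("password", "secret")
                    .method(Connection.Method.POST)
                    .execute();

            Map<String, String> cookies = loginResponse.cookies();   // the session ID lives in here

            // 2. Reuse the same cookies for a page that is NOT the post-login redirect.
            Document page = Jsoup.connect("https://example.com/members/reports")            // placeholder URL
                    .cookies(cookies)
                    .userAgent("Mozilla/5.0")      // some sites return 403 for the default Java user agent
                    .get();

            System.out.println(page.title());
        }
    }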
How to connect via HTTPS using Jsoup?
It’s working fine over HTTP, but when I try to use an HTTPS source it throws the following exception: … Here’s the relevant code: … Answer If you want to do it the right way, and/or you need to deal with only one site, then you basically need to grab the SSL certificate of the website in question and import it in …
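A hedged sketch of that keystore route, assuming the JDK's keytool is available and using hypothetical file names and passwords: export the site's certificate, import it into a truststore, and point the JVM at that truststore before Jsoup opens the HTTPS connection.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class JsoupHttpsSketch {
        public static void main(String[] args) throws Exception {
            // Beforehand (shell, not Java): import the exported certificate into a truststore, e.g.
            //   keytool -importcert -file site.crt -alias site -keystore mytruststore.jks -storepass changeit
            // File names, alias, and password above are placeholders.

            // Tell the JVM to trust that store before the first HTTPS connection is made.
            System.setProperty("javax.net.ssl.trustStore", "mytruststore.jks");
            System.setProperty("javax.net.ssl.trustStorePassword", "changeit");

            Document doc = Jsoup.connect("https://example.com/").get();   // placeholder HTTPS URL
            System.out.println(doc.title());
        }
    }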
How can I efficiently parse HTML with Java?
I do a lot of HTML parsing in my line of work. Up until now, I have been using the HtmlUnit headless browser for parsing and browser automation. Now I want to separate the two tasks. I want to use a lightweight HTML parser, because it takes a lot of time in HtmlUnit to first load a page, then get the source, and …
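One hedged way to split the two jobs, assuming Java 11+ and Jsoup on the classpath and a placeholder URL: fetch the raw HTML with java.net.http.HttpClient (keeping HtmlUnit only where JavaScript execution is genuinely needed) and hand the string to Jsoup, which parses it without loading scripts, CSS, or images.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class FetchThenParseSketch {
        public static void main(String[] args) throws Exception {
            // Step 1: fetch only the raw HTML; no JavaScript engine, no extra resources.
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/"))  // placeholder URL
                    .header("User-Agent", "Mozilla/5.0")
                    .build();
            String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

            // Step 2: parse the string with a lightweight DOM parser.
            Document doc = Jsoup.parse(html, "https://example.com/");   // base URI resolves relative links
            for (Element link : doc.select("a[href]")) {
                System.out.println(link.absUrl("href"));
            }
        }
    }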