How to get only the PDF URL from a web page?

Question

I am trying to get some DOM elements using Selenium with Java, but I get this error when I run it:

I am still a newbie at all this, but the code I am using to retrieve the DOM element is:

I believe the problem is that it cannot find the XPath.
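One way to check whether the XPath itself is the problem, before involving Selenium at all, is to evaluate it against a small copy of the markup with the JDK's built-in `javax.xml.xpath` API. The sketch below assumes a hypothetical, well-formed fragment modeled on a link that wraps an `<img alt="View PDF">`; real HTML is often not valid XML, so a saved page may need tidying first. The class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCheck {
    // Hypothetical, well-formed fragment modeled on a "View PDF" link.
    // Note the &amp; escape: raw HTML often needs tidying before an XML parser accepts it.
    static final String SAMPLE =
            "<html><body><p>"
            + "<a href=\"1266.cfm?page=2017ch18_unpr.cfm&amp;leg_type=Acts\">"
            + "<img alt=\"View PDF\" src=\"pdf.gif\"/></a>"
            + "</p></body></html>";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Number of nodes the expression selects in the given markup.
    static int countMatches(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPATH.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or markup: " + expression, e);
        }
    }

    // String value of the expression (e.g. an @href attribute), "" when nothing matches.
    static String stringValue(String xml, String expression) {
        try {
            return XPATH.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or markup: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches(SAMPLE, "//img[@alt='View PDF']/parent::a"));      // 1
        System.out.println(countMatches(SAMPLE, "//img[@alt='View PDF']//..//parent::a")); // 1
        System.out.println(countMatches(SAMPLE, "//img[@alt='Download PDF']/parent::a"));  // 0
        System.out.println(stringValue(SAMPLE, "//img[@alt='View PDF']/parent::a/@href"));
    }
}
```

A count of 0 means the expression does not match the markup at all. In Selenium, `driver.findElements(...)` returns an empty list in that situation, whereas `driver.findElement(...)` throws a `NoSuchElementException`, so `findElements` can be a convenient way to probe an XPath on the live page.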

Accepted Answer

There is an `href` attribute that holds the PDF URL, but that URL opens the PDF inside the web page. So I extracted the URL from the `href` attribute, took the PDF name from it, and concatenated that name with `https://www.qp.alberta.ca/documents/Acts/`. You can write code like the following to get the PDF URL.

Code to get PDF URL:

```java
WebDriver driver = new ChromeDriver();
/* I hard-coded the URL below. You need to parameterize it based on your requirements. */
driver.get("https://www.qp.alberta.ca/570.cfm?frm_isbn=9780779808571&search_by=link");
String pagePdfUrl = driver.findElement(By.xpath("//img[@alt='View PDF']//..//parent::a")).getAttribute("href");
System.out.println("Page PDF URL: " + pagePdfUrl);
// The PDF name sits between "page=" and ".cfm&" in the viewer link
String pdfName = StringUtils.substringBetween(pagePdfUrl, "page=", ".cfm&");
driver.get("https://www.qp.alberta.ca/documents/Acts/" + pdfName + ".pdf");
```

Code to download PDF:

Required ChromeOptions:

```java
ChromeOptions options = new ChromeOptions();
Map<String, Object> chromeOptionsMap = new HashMap<>();
// Open PDFs externally (i.e. download them) instead of in Chrome's built-in viewer
chromeOptionsMap.put("plugins.plugins_disabled", new String[] { "Chrome PDF Viewer" });
chromeOptionsMap.put("plugins.always_open_pdf_externally", true);
// Folder Chrome saves downloads to (backslashes must be escaped in Java strings)
chromeOptionsMap.put("download.default_directory", "C:\\Users\\Downloads\\test");
options.setExperimentalOption("prefs", chromeOptionsMap);
options.addArguments("--headless");
```

Accessing PDF:

```java
WebDriver driver = new ChromeDriver(options);
driver.get("https://www.qp.alberta.ca/570.cfm?frm_isbn=9780779808571&search_by=link");
String pagePdfUrl = driver.findElement(By.xpath("//img[@alt='View PDF']//..//parent::a")).getAttribute("href");
System.out.println("Page PDF URL: " + pagePdfUrl);
String pdfName = StringUtils.substringBetween(pagePdfUrl, "page=", ".cfm&");
System.out.println("Only PDF URL: " + "https://www.qp.alberta.ca/documents/Acts/" + pdfName + ".pdf");
driver.get("https://www.qp.alberta.ca/documents/Acts/" + pdfName + ".pdf");
```

Output:

```
Page PDF URL: https://www.qp.alberta.ca/1266.cfm?page=2017ch18_unpr.cfm&leg_type=Acts&isbncln=9780779808571
Only PDF URL: https://www.qp.alberta.ca/documents/Acts/2017ch18_unpr.pdf
```

Imports (`StringUtils` comes from Apache Commons Lang 3):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.lang3.StringUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
```
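If you would rather not depend on Commons Lang, or not drive Chrome just to save the file, the same idea can be sketched with the JDK alone: read the `page` query parameter with `java.net.URI`, swap `.cfm` for `.pdf`, and copy the file with `URL.openStream()`. This is a swapped-in technique (a plain HTTP download), not what the answer does. Like the answer, it assumes every Act's PDF lives under `https://www.qp.alberta.ca/documents/Acts/` with the same base name as the `page` value; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PdfLinkDownloader {
    // Base folder from the answer; other legislation types may use a different folder (assumption).
    static final String ACTS_BASE = "https://www.qp.alberta.ca/documents/Acts/";

    // Reads the "page" query parameter (e.g. "2017ch18_unpr.cfm") from the viewer link
    // and replaces the ".cfm" suffix with ".pdf". Returns null if there is no "page" parameter.
    static String directPdfUrl(String viewerHref) {
        String query = URI.create(viewerHref).getRawQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith("page=")) {
                String page = pair.substring("page=".length());
                String name = page.endsWith(".cfm") ? page.substring(0, page.length() - 4) : page;
                return ACTS_BASE + name + ".pdf";
            }
        }
        return null;
    }

    // Copies whatever the URL points at into targetDir, named after the URL's last path segment.
    static Path download(String url, Path targetDir) {
        try {
            URL source = URI.create(url).toURL();
            String path = source.getPath();
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            Files.createDirectories(targetDir);
            Path target = targetDir.resolve(fileName);
            try (InputStream in = source.openStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String href = "https://www.qp.alberta.ca/1266.cfm?page=2017ch18_unpr.cfm&leg_type=Acts&isbncln=9780779808571";
        String pdfUrl = directPdfUrl(href);
        System.out.println(pdfUrl); // https://www.qp.alberta.ca/documents/Acts/2017ch18_unpr.pdf
        // download(pdfUrl, Path.of("downloads")); // requires network access
    }
}
```

Unlike `substringBetween(pagePdfUrl, "page=", ".cfm&")`, this still works when `page` is the last query parameter. Because it fetches the file without a browser session, it may fail if the site requires cookies or other browser state; in that case the ChromeOptions approach in the answer is the safer choice.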
