Scrape an AngularJS website with Java

I need to scrape a website with content ‘inserted’ by AngularJS, and it needs to be done in Java.

I have tried Selenium Webdriver (as I have used Selenium before for scraping less dynamic webpages). But I have no idea how to deal with the Angular part. Apart from the script tags in the head section of the page, there is only one place in the site where there are Angular attributes:

<div data-ng-module="vindeenjob"><div data-ng-view=""></div></div>

I found this article here, but honestly… I can’t figure it out. It seems like the author is selecting (let’s call them) ‘ng-attributes’ like this

WebElement theForm = wd.findElement(By.cssSelector("div[ng-controller='UserForm']"));

but fails to explain why he does what he does. In the source code of his demo page, I can’t find anything that is called ‘UserForm’… So the why remains a mystery.
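For what it is worth, that selector is just a CSS attribute match: `div[ng-controller='UserForm']` finds a `div` whose `ng-controller` attribute names an Angular controller, so it only works if the page actually declares one. A sketch of the same idea adapted to this page (the selector targets the `data-ng-view` container from the snippet above, the URL is the placeholder from the question, and the 10-second wait is an arbitrary choice):

```java
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class NgScrapeSketch {
    public static void main(String[] args) {
        WebDriver wd = new FirefoxDriver();
        wd.get("https://www.myurltoscrape.com");

        // Explicitly wait until Angular has rendered something inside the
        // ng-view container, instead of relying on an implicit wait alone.
        WebDriverWait wait = new WebDriverWait(wd, 10);
        List<WebElement> rendered = wait.until(
            ExpectedConditions.presenceOfAllElementsLocatedBy(
                By.cssSelector("div[data-ng-view] *")));

        for (WebElement e : rendered) {
            System.out.println(e.getText());
        }
        wd.quit();
    }
}
```

The difference from an implicit wait is that `WebDriverWait.until` polls the DOM and returns as soon as the condition holds, rather than only padding failed lookups.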

Then I tried setting a time interval for Selenium, in the hope that the page would be rendered and that I could eventually grab the results after the wait period, like this:

    WebDriver webdriver = new HtmlUnitDriver();
    webdriver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
    webdriver.get("https://www.myurltoscrape.com");

But to no avail. Then there is also this article, which produces some interesting exceptions, such as Cannot set property [HTMLStyleElement].media that has only a getter, which basically means that something goes wrong in the JavaScript. However, HtmlUnit does seem to realize that there is JavaScript on the page, which is more than I got before. I do realize (having searched on the exceptions) that HtmlUnit has an option that should stop it from throwing JavaScript exceptions. I disabled the exception-throwing, but I get the exceptions anyway. Here is the code:

webClient.getOptions().setThrowExceptionOnScriptError(false); 
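For context, that option sits on a configured WebClient. A sketch of a fuller setup around it (the browser version, CSS toggle, and 10-second background-JavaScript wait are assumptions, not the question's actual code):

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitSketch {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // do not abort page processing on script errors
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            // CSS processing is not needed for scraping text
            webClient.getOptions().setCssEnabled(false);

            HtmlPage page = webClient.getPage("https://www.myurltoscrape.com");
            // give the Angular bootstrap scripts time to run
            webClient.waitForBackgroundJavaScript(10_000);

            System.out.println(page.asXml());
        }
    }
}
```

`waitForBackgroundJavaScript` is the knob that gives asynchronous scripts a chance to finish before the DOM is read; even so, HtmlUnit's JavaScript engine often falls short on heavy Angular pages, which matches the experience described here.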

I would post more code, but basically nothing scrapes the dynamic content, and I am pretty sure that it is not the code that is wrong; it merely is not the correct approach yet.

Can I get some help please?

Answer

In the end, I followed Madusudanan’s excellent advice and looked into the PhantomJS / Selenium combination. And there actually is a solution! It’s called PhantomJSDriver.

You can find the Maven dependency here. Here is more info on GhostDriver.

The setup in Maven: I have added the following:

<dependency>
    <groupId>net.sourceforge.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>2.41.0</version>
</dependency>
<dependency>
    <groupId>com.github.detro</groupId>
    <artifactId>phantomjsdriver</artifactId>
    <version>1.2.0</version>
</dependency>

It also runs with Selenium version 2.45, which is the latest version as of this writing. I mention this because of some articles I read in which people say that the PhantomJS driver isn’t compatible with every version of Selenium, but I guess they have addressed that problem in the meantime.
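For reference, the matching Selenium dependency would look like this (the coordinates are the standard ones on Maven Central; the version is the 2.45 mentioned above):

```xml
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>2.45.0</version>
</dependency>
```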

If you are already using a Selenium/PhantomJSDriver combination and you are getting ‘strict javascript errors’ on a certain site, update your version of Selenium. That will fix it.

And here is some sample code:

public void testPhantomDriver() throws Exception {
    DesiredCapabilities options = new DesiredCapabilities();
    // the website I am scraping uses SSL, but I don't know which version
    options.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS,
            new String[] { "--ssl-protocol=any" });

    PhantomJSDriver driver = new PhantomJSDriver(options);

    driver.get("https://www.mywebsite");

    List<WebElement> elements = driver.findElementsByClassName("media-title");

    for (WebElement element : elements) {
        System.out.println(element.getText());
    }

    driver.quit();
}
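One caveat worth adding: `driver.get(...)` returns when the initial page load finishes, which on an Angular site can be before `ng-view` has been populated, so `findElementsByClassName` may come back empty. An explicit wait is the safer pattern; a sketch reusing the class name from the sample above (the 10-second timeout is an arbitrary choice):

```java
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class PhantomWaitSketch {
    public static void main(String[] args) {
        PhantomJSDriver driver = new PhantomJSDriver(new DesiredCapabilities());
        driver.get("https://www.mywebsite");

        // Block until at least one Angular-rendered element is present,
        // polling for up to 10 seconds before giving up with a timeout.
        WebDriverWait wait = new WebDriverWait(driver, 10);
        List<WebElement> elements = wait.until(
            ExpectedConditions.presenceOfAllElementsLocatedBy(
                By.className("media-title")));

        for (WebElement element : elements) {
            System.out.println(element.getText());
        }
        driver.quit();
    }
}
```

If the condition never holds, `until` throws a `TimeoutException`, which is a much clearer failure signal than silently iterating over an empty list.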