
HtmlUnit Scraping Xpath from Div

I am trying to scrape the contents of the Google Movies page. I want the name of each theater, its address, and its showtimes. On that page, each block of information sits inside a div with the class theater, and that div contains the theater's name, address, and times.

So I used HtmlUnit to extract a List of the theater divs:


When I print the contents of the list, I get the expected result:


Now I want to split this information into name, address, and times. The problem is that when I do:


The result is the name of every single theater in the page:


How is it possible that I am getting all the theaters when I call getByXPath on an object that doesn't even contain that information?


Answer

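You need to add a dot (.) at the beginning of the XPath to indicate that it is meant to be relative to the current context element, which in this case is the first div (div.get(0)). Otherwise the XPath ignores the context element and searches for matching elements starting from the document root. For example, an expression that begins with `//` always searches the whole document, while the same expression written with a leading `.//` searches only the descendants of the element it is evaluated on.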
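The difference can be reproduced without HtmlUnit. The sketch below is only an illustration, not the asker's code: it uses the JDK's built-in `javax.xml.xpath` API as a stand-in, and the sample markup, the `name` and `info` class names, and the helper names `select` and `texts` are all assumptions made for the example. The XPath semantics are the same ones that apply when the expression is passed to `getByXPath` on a single theater div.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeXPathDemo {
    // Hypothetical markup standing in for the real page (class names are assumptions).
    static final String SAMPLE =
        "<html><body>"
        + "<div class='theater'><h2 class='name'>Cinema One</h2>"
        + "<div class='info'>1 Main St</div></div>"
        + "<div class='theater'><h2 class='name'>Cinema Two</h2>"
        + "<div class='info'>2 Oak Ave</div></div>"
        + "</body></html>";

    // Parses a well-formed XML/XHTML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates expr with ctx as the context node and returns the matching nodes.
    static NodeList select(String expr, Node ctx) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, ctx, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the text content of every node matched by expr from ctx.
    static List<String> texts(String expr, Node ctx) {
        NodeList nodes = select(expr, ctx);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        Node first = select("//div[@class='theater']", doc).item(0);

        // Absolute path: starts at the document root, so the context node is ignored.
        System.out.println(texts("//h2[@class='name']", first));
        // Relative path: the leading dot anchors the search at the context node.
        System.out.println(texts(".//h2[@class='name']", first));
    }
}
```

Evaluated on the first theater div, the absolute expression returns both theater names, while the dotted one returns only "Cinema One".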

User contributions licensed under: CC BY-SA