HtmlUnit – getByXPath – Get Values Back From Attribute List

I’m trying to get just the values of href attributes from an XPath query, but I can’t figure out how to write the query. At best I get the hrefs back as a list of DomAttr, and I then have to call getValue() on each one to get the actual link.

My very simple set-up is the following:

WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage(siteRef);
var hrefs = page.getByXPath("//@href"); // Returns a list of DomAttr

Edit: This returns the value itself, but only for the first element it finds:

var hrefs = page.getByXPath("string(//@href)");

Answer

I think you are right: getByXPath cannot return the attribute values directly as an array (or List) of String.

Nevertheless, you can get that result with Java streams. A stream also lets you process the result list further, e.g. filter it or apply extra steps such as toLowerCase to the Strings:

var hrefs = page.getByXPath("//@href")
                .stream()
                .filter(o -> o instanceof DomAttr) //to be sure you have the correct type
                .map(o -> ((DomAttr) o)) //cast the stream from Object to DomAttr
                .map(DomAttr::getValue) //get value of every DomAttr
                .collect(Collectors.toList()); //collect it to a list

hrefs now contains a List<String>.

Instead of collecting the results in the last step, you can keep working with the stream.
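As a rough sketch of what further work on the stream could look like, the example below keeps only distinct https links. The class name HrefValues and both method names are made up for illustration. So that it runs without HtmlUnit on the classpath, it uses the JDK's own XPath API as a stand-in for page.getByXPath("//@href") and filters on org.w3c.dom.Attr. As far as I know, HtmlUnit's DomAttr implements that interface, so the same helper should accept the list returned by getByXPath, but treat that as an assumption to verify against your HtmlUnit version.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HrefValues {

    // Hypothetical helper: turns a raw XPath result list into distinct https links.
    public static List<String> httpsLinks(List<?> xpathResult) {
        return xpathResult.stream()
                .filter(Attr.class::isInstance)   // skip anything that is not an attribute node
                .map(Attr.class::cast)            // cast from Object to Attr
                .map(Attr::getValue)              // read the attribute value
                .map(String::trim)                // extra processing step on the Strings
                .filter(v -> v.toLowerCase(Locale.ROOT).startsWith("https://"))
                .distinct()                       // drop repeated links
                .collect(Collectors.toList());
    }

    // Stand-in for page.getByXPath("//@href"), built on the JDK XPath API (assumption:
    // the input is well-formed XML, which real-world HTML often is not).
    public static List<Object> hrefAttributes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate("//@href", doc, XPathConstants.NODESET);
        List<Object> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body>"
                + "<a href='https://example.com/a'>A</a>"
                + "<a href='/relative'>B</a>"
                + "<a href='https://example.com/a'>A again</a>"
                + "</body></html>";
        System.out.println(httpsLinks(hrefAttributes(xml)));
    }
}
```

With HtmlUnit available, the call would presumably become HrefValues.httpsLinks(page.getByXPath("//@href")), leaving the stream pipeline unchanged.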

User contributions licensed under: CC BY-SA