I’m trying to get the links out of a site’s HTML using Jsoup, but I can’t get it to work.
This is the HTML:
<div class="anime_muti_link">
  <ul>
    <li><div class="doamin">Domain</div><div class="link">Link</div></li>
    <li class="anime">
      <a href="#" class="active" rel="1" data-video="example.com" ><div class="server m1">Server m1</div><span>Watch This Link</span></a>
    </li>
    <li class="anime">
      <a href="#" rel="1" data-video="example.com" ><div class="server m1">Server m2</div><span>Watch This Link</span></a>
    </li>
    <li class="xstreamcdn">
      <a href="#" rel="29" data-video="example.com">Xstreamcdn</div><span>Watch This Link</span></a>
    </li>
    <li class="mixdrop">
      <a href="#" rel="7" data-video="example.com"><div class="server mixdrop">Mixdrop</div><span>Watch This Link</span></a>
    </li>
    <li class="streamsb">
      <a href="#" rel="13" data-video="example.com">StreamSB</div><span>Watch This Link</span></a>
    </li>
    <li class="doodstream">
      <a href="#" rel="14" data-video="example.com">Doodstream</div><span>Watch This Link</span></a>
    </li>
  </ul>
</div>
This is the Android code I wrote, which doesn’t seem to work:
try {
    Document doc = Jsoup.connect(URL).get();
    Elements content = doc.getElementsByClass("anime_muti_link");
    Elements links = content.select("a");
    String[] urls = new String[links.size()];
    for (int i = 0; i < links.size(); i++) {
        urls[i] = links.get(i).attr("data-video");
        if (!urls[i].startsWith("https://")) {
            urls[i] = "https:" + urls[i];
        }
    }
    arrayList.addAll(Arrays.asList(urls));
    Log.d("CALLING_URL", "Links: " + Arrays.toString(urls));
} catch (IOException e) {
    e.getMessage();
}
Can someone please help me with this? Thanks
Edit: Basically, I’m trying to get those 6 links and add them to my list so I can use them within the app.
Edit 2:
I found another page whose HTML seems better suited:
<div class="heading-servers">
  <span><i class="fa fa-signal"></i> Servers</span>
  <ul class="servers">
    <li data-vs="https://example.com" class="server server-active" style="display: block;" onclick="return loadIframe('ifrm', this.getAttribute('data-vs'));">Netu</li>
    <li data-vs="https://example.com" class="server" style="display: block;" onclick="return loadIframe('ifrm', this.getAttribute('data-vs'));">VideoVard</li>
    <li data-vs="https://example.com" class="server" style="display: block;" onclick="return loadIframe('ifrm', this.getAttribute('data-vs'));">Doodstream</li>
    <li data-vs="https://example.com" class="server" style="display: block;" onclick="return loadIframe('ifrm', this.getAttribute('data-vs'));">Okstream</li>
  </ul>
</div>
Answer
As you can see, this li definition contains a closing </div> tag that has no matching opening <div>:
<li class="xstreamcdn"> <a href="#" rel="29" data-video="example.com">Xstreamcdn</div><span>Watch This Link</span></a> </li>
Jsoup’s parser treats that stray </div> as the end of the enclosing div.anime_muti_link, so the variable content, the HTML fragment with class anime_muti_link, ends up looking like this:
<div class="anime_muti_link">
  <ul>
    <li>
      <div class="doamin">Domain</div>
      <div class="link">Link</div>
    </li>
    <li class="anime">
      <a href="#" class="active" rel="1" data-video="example.com"><div class="server m1">Server m1</div><span>Watch This Link</span></a>
    </li>
    <li class="anime">
      <a href="#" rel="1" data-video="example.com"><div class="server m1">Server m2</div><span>Watch This Link</span></a>
    </li>
    <li class="xstreamcdn">
      <a href="#" rel="29" data-video="example.com">Xstreamcdn</a>
    </li>
  </ul>
</div>
You get a similar result even if you tidy your HTML first with JTidy (org.w3c.tidy.Tidy). I used this code from one of my previous answers:
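import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.w3c.tidy.Tidy;

// Configure JTidy to emit indented XHTML for the <body> content only
Tidy tidy = new Tidy();
tidy.setXHTML(true);
tidy.setIndentContent(true);
tidy.setPrintBodyOnly(true);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setSmartIndent(true);
tidy.setShowWarnings(false);
tidy.setQuiet(true);
tidy.setTidyMark(false);

// Parse the raw HTML into a W3C DOM (bytes encoded to match the UTF-8 input setting),
// then pretty-print it back into a UTF-8 string
org.w3c.dom.Document htmlDOM = tidy.parseDOM(
        new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)), null);
ByteArrayOutputStream out = new ByteArrayOutputStream();
tidy.pprint(htmlDOM, out);
String tidiedHtml = new String(out.toByteArray(), StandardCharsets.UTF_8);
// System.out.println(tidiedHtml);

// Re-parse the tidied markup with Jsoup and print the container div
Document document = Jsoup.parse(tidiedHtml);
Elements content = document.getElementsByClass("anime_muti_link");
System.out.println(content);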
That is why content.select("a") finds only three anchors.
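To spot this kind of mistake before blaming the parser, a rough stack-based tag-balance check over the raw markup can point at the stray end tag. This is only a sketch: TagBalanceCheck and findProblems are hypothetical names, the regex does not understand comments, scripts or a > inside attribute values, and it ignores HTML’s optional end tags (an omitted </li> or </p> would be reported too), so treat its output as a hint rather than a validator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagBalanceCheck {

    // Void elements never have an end tag, so they are never pushed on the stack.
    private static final Set<String> VOID_ELEMENTS = new HashSet<>(Arrays.asList(
            "area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"));

    // Group 1: "/" for end tags, group 2: tag name, group 3: "/" for self-closing tags.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>");

    /** Reports every end tag that does not close the innermost open element, plus unclosed tags. */
    public static List<String> findProblems(String html) {
        List<String> problems = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean isEnd = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (!isEnd) {
                if (!selfClosing && !VOID_ELEMENTS.contains(name)) {
                    open.push(name);
                }
            } else if (!name.equals(open.peek())) {
                // Mismatched end tag: report it and leave the stack untouched
                problems.add("unexpected </" + name + "> at offset " + m.start()
                        + " (innermost open element: " + open.peek() + ")");
            } else {
                open.pop();
            }
        }
        for (String name : open) {
            problems.add("<" + name + "> is never closed");
        }
        return problems;
    }

    public static void main(String[] args) {
        String li = "<li class=\"xstreamcdn\"> <a href=\"#\" rel=\"29\" data-video=\"example.com\">"
                + "Xstreamcdn</div><span>Watch This Link</span></a> </li>";
        findProblems(li).forEach(System.out::println);
    }
}
```

Run on the xstreamcdn li, it reports a single unexpected </div> while the innermost open element is the a, which is the tag that ends div.anime_muti_link early.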
Try correcting your HTML, or select the anchor tags at the document level instead:
Document document = Jsoup.parse(html);
// Elements content = document.getElementsByClass("anime_muti_link");
// System.out.println(content);
Elements links = document.select("a");
String[] urls = new String[links.size()];
for (int i = 0; i < links.size(); i++) {
    urls[i] = links.get(i).attr("data-video");
    if (!urls[i].startsWith("https://")) {
        urls[i] = "https://" + urls[i];
    }
}
System.out.println(Arrays.asList(urls));
If the result contains unwanted links, you can try narrowing the selector, for example:
document.select(".anime_muti_link a")
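If this doesn’t work (and with the HTML above it won’t, because the stray </div> leaves the later anchors outside div.anime_muti_link), another alternative is to select only the anchor elements that have a data-video attribute, a[data-video]: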
Document document = Jsoup.parse(html);
Elements videoLinks = document.select("a[data-video]");
String[] urls = new String[videoLinks.size()];
for (int i = 0; i < videoLinks.size(); i++) {
    urls[i] = videoLinks.get(i).attr("data-video");
    if (!urls[i].startsWith("https://")) {
        urls[i] = "https://" + urls[i];
    }
}
System.out.println(Arrays.asList(urls));
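The snippets above add the scheme by plain concatenation. That is fine for a bare value such as example.com, but if the real site uses protocol-relative values like //host/path (the question’s "https:" prefix hints at this, though that is a guess), "https://" + value yields https:////host/path. A slightly more defensive helper could look like this; VideoUrls and toHttps are hypothetical names, and the three input shapes it handles are assumptions to check against the real data-video values.

```java
public final class VideoUrls {

    private VideoUrls() {}

    /**
     * Turns a scraped attribute value into an absolute URL.
     * Assumed input shapes (hypothetical, verify against the real site):
     *   "https://host/path" or "http://host/path" -> returned unchanged
     *   "//host/path" (protocol-relative)          -> "https://host/path"
     *   "host/path" (no scheme at all)             -> "https://host/path"
     * Returns null for a blank value, e.g. when the attribute was missing.
     */
    public static String toHttps(String raw) {
        if (raw == null) {
            return null;
        }
        String value = raw.trim();
        if (value.isEmpty()) {
            return null;
        }
        if (value.startsWith("https://") || value.startsWith("http://")) {
            return value;
        }
        if (value.startsWith("//")) {
            return "https:" + value;
        }
        return "https://" + value;
    }

    public static void main(String[] args) {
        String[] samples = {"example.com", "//example.com/e/abc", "https://example.com", ""};
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + toHttps(s));
        }
    }
}
```

Inside the loop, urls[i] = VideoUrls.toHttps(videoLinks.get(i).attr("data-video")); would replace the startsWith check. Jsoup’s attr() returns an empty string when the attribute is absent, so a null result marks an element you probably want to skip before adding it to your list.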
With your new test case, you can get the information you need with very similar code:
String html = "<div class=\"heading-servers\">\n"
        + "  <span><i class=\"fa fa-signal\"></i> Servers</span>\n"
        + "  <ul class=\"servers\">\n"
        + "    <li data-vs=\"https://example.com\" class=\"server server-active\" style=\"display: block;\" onclick=\"return loadIframe('ifrm', this.getAttribute('data-vs'));\">Netu</li>\n"
        + "    <li data-vs=\"https://example.com\" class=\"server\" style=\"display: block;\" onclick=\"return loadIframe('ifrm', this.getAttribute('data-vs'));\">VideoVard</li>\n"
        + "    <li data-vs=\"https://example.com\" class=\"server\" style=\"display: block;\" onclick=\"return loadIframe('ifrm', this.getAttribute('data-vs'));\">Doodstream</li>\n"
        + "    <li data-vs=\"https://example.com\" class=\"server\" style=\"display: block;\" onclick=\"return loadIframe('ifrm', this.getAttribute('data-vs'));\">Okstream</li>\n"
        + "  </ul>\n"
        + "</div>";
Document document = Jsoup.parse(html);
// Every <li class="server"> inside ul.servers carries its link in data-vs
Elements videoLinks = document.select("div.heading-servers ul.servers li.server");
String[] urls = new String[videoLinks.size()];
for (int i = 0; i < videoLinks.size(); i++) {
    urls[i] = videoLinks.get(i).attr("data-vs");
    if (!urls[i].startsWith("https://")) {
        urls[i] = "https://" + urls[i];
    }
}
System.out.println(Arrays.asList(urls));
The most important part is the selector applied to the parsed document, div.heading-servers ul.servers li.server in our case.
I used a fairly specific selector, but depending on the actual HTML it could be simplified to ul.servers li.server or even li.server.