I’m trying to download images from Tumblr using Java and Selenium. I extracted the URL of each image from its src attribute and tried to download the image from that URL. But the saved images are not what I expected: they are in unsupported formats and smaller in size. How can I correct this? Please help.
This is my code:
public static void main(String[] args) throws InterruptedException, AWTException, IOException {
    WebDriver driver = new ChromeDriver();
    driver.manage().window().maximize();
    driver.get("https://artist-childe-hassam.tumblr.com/");
    Thread.sleep(5000);
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_END);
    robot.keyRelease(KeyEvent.VK_END);
    List<WebElement> list = driver.findElements(By.xpath("//img[@alt]"));
    int count;
    count = 1;
    for (WebElement element : list) {
        String srcs = element.getAttribute("src");
        String attribute = element.getAttribute("alt");
        System.out.println("title: " + attribute);
        System.out.println(" ");
        System.out.println("link " + srcs);
        URL url = new URL(srcs);
        InputStream in = new BufferedInputStream(url.openStream());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n = 0;
        while (-1 != (n = in.read(buf))) {
            out.write(buf, 0, n);
        }
        out.close();
        in.close();
        byte[] response = out.toByteArray();
        FileOutputStream fos = new FileOutputStream("path" + count + ".jpg");
        count++;
        fos.write(response);
        fos.close();
    }
}
}
Answer
I’m not familiar with using BufferedInputStream to download an image from a URL. Instead, I usually use curl to download from a URL. I have modified your code so that it calls curl, and it’s working fine for me.
public static void main(String[] args) throws InterruptedException, AWTException, IOException {
    WebDriverManager.chromedriver().setup();
    WebDriver driver = new ChromeDriver();
    driver.manage().window().maximize();
    driver.get("https://artist-childe-hassam.tumblr.com/");
    Thread.sleep(5000);
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_END);
    robot.keyRelease(KeyEvent.VK_END);
    List<WebElement> list = driver.findElements(By.xpath("//img[@alt]"));
    int count;
    count = 1;
    for (WebElement element : list) {
        String srcs = element.getAttribute("src");
        String attribute = element.getAttribute("alt");
        System.out.println("title: " + attribute);
        System.out.println(" ");
        System.out.println("link " + srcs);
        downloadFromUrl(srcs, "Path" + count + ".jpg", Duration.ofSeconds(20));
        count++;
    }
}
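public static boolean downloadFromUrl(String url, String fileNameWithPath, Duration timeoutDuration) {
    try {
        // Fall back to a 5-minute timeout when none is given.
        if (timeoutDuration == null) {
            timeoutDuration = Duration.ofMinutes(5);
        }
        // Pass the arguments as an array so that special characters in the URL
        // or the path are not split into separate arguments.
        String[] curlCmd = {"curl", url, "--output", fileNameWithPath};
        Process process = Runtime.getRuntime().exec(curlCmd);
        long totalSeconds = 0;
        System.out.println("Downloading file to " + fileNameWithPath + " ...");
        // Poll once per second until curl exits or the timeout is exceeded.
        while (process.isAlive()) {
            Thread.sleep(1000);
            totalSeconds++;
            if (totalSeconds > timeoutDuration.getSeconds()) {
                process.destroy(); // stop curl instead of leaving it running
                throw new Exception("Unable to download file even after "
                        + timeoutDuration.getSeconds() + " seconds of wait");
            }
        }
        // A non-zero exit code means curl itself reported a failure.
        if (process.exitValue() != 0) {
            throw new Exception("curl exited with code " + process.exitValue());
        }
        System.out.println(fileNameWithPath + " got downloaded in seconds - " + totalSeconds);
        return true;
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return false;
}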
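If you would rather not depend on an external curl binary, the same download can probably be done in plain Java. The following is only a sketch under some assumptions: the class name ImageDownloadSketch, the methods detectExtension and download, the User-Agent value and the timeout values are all made up for illustration, and it needs Java 9 or later for readAllBytes. It also picks the file extension from the first bytes of the response instead of always using .jpg, on the assumption that the "unsupported format" files may not contain JPEG data at all (for example WebP or PNG). I have not verified that this is what Tumblr actually serves.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the original answer.
class ImageDownloadSketch {

    // Guess a file extension from the leading "magic" bytes of the payload.
    static String detectExtension(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8 && (data[2] & 0xFF) == 0xFF) {
            return "jpg";
        }
        if (data.length >= 4 && (data[0] & 0xFF) == 0x89
                && data[1] == 'P' && data[2] == 'N' && data[3] == 'G') {
            return "png";
        }
        if (data.length >= 4 && data[0] == 'G' && data[1] == 'I'
                && data[2] == 'F' && data[3] == '8') {
            return "gif";
        }
        if (data.length >= 12 && data[0] == 'R' && data[1] == 'I'
                && data[2] == 'F' && data[3] == 'F' && data[8] == 'W'
                && data[9] == 'E' && data[10] == 'B' && data[11] == 'P') {
            return "webp";
        }
        return "bin"; // unknown content, keep it but do not pretend it is an image
    }

    // Download the URL and save it as basePath + "." + detected extension.
    static Path download(String url, String basePath) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(10_000); // assumed value, in milliseconds
        conn.setReadTimeout(20_000);    // assumed value, in milliseconds
        // Assumption: some servers answer differently without a browser-like agent.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        byte[] data;
        try (InputStream in = conn.getInputStream()) {
            data = in.readAllBytes(); // Java 9+
        }
        Path target = Paths.get(basePath + "." + detectExtension(data));
        return Files.write(target, data);
    }
}
```

Inside the loop you could then call ImageDownloadSketch.download(srcs, "Path" + count) in place of downloadFromUrl(...). If the saved files turn out to be .webp or .bin, that would at least show what the server is really sending.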