I am storing a TAR file in Google Cloud Storage. The file downloads successfully via gsutil
and extracts on my computer with macOS Archive Utility. However, the Java program I implemented always throws java.io.IOException: Corrupted TAR archive
when accessing the file. I have tried several approaches, all of which use the org.apache.commons:commons-compress
library. Can you give me some insight into how to fix this problem, or suggest something else I can try?
Here are the implementations that I have tried:
    Blob blob = storage.get(BUCKET_NAME, FILE_PATH);
    blob.downloadTo(Paths.get("filename.tar"));
    String contentType = blob.getContentType(); // application/x-tar

    InputStream is = Channels.newInputStream(blob.reader());
    String mime = URLConnection.guessContentTypeFromStream(is); // null
    TarArchiveInputStream ais = new TarArchiveInputStream(is);
    ais.getNextEntry(); // raise java.io.IOException: Corrupted TAR archive

    InputStream is2 = new ByteArrayInputStream(blob.getContent());
    String mime2 = URLConnection.guessContentTypeFromStream(is2); // null
    TarArchiveInputStream ais2 = new TarArchiveInputStream(is2);
    ais2.getNextEntry(); // raise java.io.IOException: Corrupted TAR archive

    InputStream is3 = new FileInputStream("filename.tar");
    String mime3 = URLConnection.guessContentTypeFromStream(is3); // null
    TarArchiveInputStream ais3 = new TarArchiveInputStream(is3);
    ais3.getNextEntry(); // raise java.io.IOException: Corrupted TAR archive

    TarFile file = new TarFile(blob.getContent()); // raise java.io.IOException: Corrupted TAR archive

    TarFile tarFile = new TarFile(Paths.get("filename.tar")); // raise java.io.IOException: Corrupted TAR archive
Addition: I have also tried parsing a JSON file from GCS, and that works fine.
    Blob blob = storage.get(BUCKET_NAME, FILE_PATH);
    JSONTokener jt = new JSONTokener(Channels.newInputStream(blob.reader()));
    JSONObject jo = new JSONObject(jt);
Answer
The problem is that your tar file is compressed: it is actually a tgz (gzip-compressed tar) file.
For that reason, you need to decompress the stream before reading the tar contents.
Consider the following example; note the use of the GzipCompressorInputStream class that ships with Commons Compress:
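    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    import org.apache.commons.compress.utils.IOUtils; // assumed: the Commons Compress IOUtils

    public static void main(String... args) {
      // Despite the .tar extension, this file is gzip-compressed
      final File archiveFile = new File("latest.tar");

      try (
          FileInputStream in = new FileInputStream(archiveFile);
          // Decompress first, then read the tar entries from the decompressed stream
          GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in);
          TarArchiveInputStream tarIn = new TarArchiveInputStream(gzIn)
      ) {
        TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
        while (tarEntry != null) {
          // Extract under /tmp; entry names are trusted here
          final File path = new File("/tmp", tarEntry.getName());
          if (!path.getParentFile().exists()) {
            path.getParentFile().mkdirs();
          }
          if (!tarEntry.isDirectory()) {
            try (OutputStream out = new FileOutputStream(path)) {
              // tarIn reports end-of-stream at the end of the current entry
              IOUtils.copy(tarIn, out);
            }
          }
          tarEntry = tarIn.getNextTarEntry();
        }
      } catch (FileNotFoundException e) {
        e.printStackTrace();
      } catch (IOException e) {
        e.printStackTrace();
      }
    }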
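If you are not sure whether a given file is gzip-compressed, one way to check before choosing how to wrap the stream is to peek at its first two bytes: gzip data starts with the magic bytes 0x1f 0x8b. The following is only a minimal sketch using the JDK alone; the class name GzipSniffer and the method looksLikeGzip are hypothetical names made up for this illustration, not part of Commons Compress or the GCS client.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipSniffer {

    /**
     * Returns true if the stream starts with the gzip magic bytes 0x1f 0x8b.
     * The stream must support mark/reset (for example a BufferedInputStream);
     * it is rewound afterwards, so it can still be read from the beginning.
     */
    public static boolean looksLikeGzip(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(2);
        int first = in.read();
        int second = in.read();
        in.reset();
        return first == 0x1f && second == 0x8b;
    }

    public static void main(String[] args) throws IOException {
        // Build a small gzip payload in memory to stand in for the downloaded file
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        InputStream gzipped =
            new BufferedInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        // A block of zero bytes stands in for data that is not gzip
        InputStream plain =
            new BufferedInputStream(new ByteArrayInputStream(new byte[512]));

        System.out.println(looksLikeGzip(gzipped)); // true
        System.out.println(looksLikeGzip(plain));   // false
    }
}
```

In your case you could wrap Channels.newInputStream(blob.reader()) in a BufferedInputStream, call the check, and only add the GzipCompressorInputStream layer when it returns true.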