
How to extract only the files from the last 2 days from tFTPFileList, based on modified time, without storing them in a tBufferOutput component (Talend job)

At the moment I iterate through all 5k files in the folder, store them in a tBufferOutput, read them back with a tBufferInput, sort them by mtime (the modified time on the FTP site) in descending order, and extract only the top 10 files.

Since the job iterates through all 5k files at once, it is time-consuming and causes unnecessary latency on the remote FTP site.

Is there a simpler way, without iterating, to get just the latest 10 files from the FTP site directly, sorted by mtime in descending order, and then perform operations on them?

My Talend job flow currently works as described above. Please advise on any other method that could improve the performance of the job.

Basically, I don't want to iterate through all the files on the FTP site. Instead, I want to get the top 10 directly from the remote FTP site (tFTPFileList), check them against the database, and download them later.

In short: is there any way, without iterating, to get only the latest 10 files using the modified timestamp in descending order? Alternatively, I want to extract only the files from the last 3 days from the remote FTP site.

File names are in this format: A_B_C_D_E_20200926053617.csv

Approach B (with Java): for flow B, I tried the following tJava code:

Date lastModifiedDate = TalendDate.parseDate("EEE MMM dd HH:mm:ss zzz yyyy", row2.mtime_string);
Date current_date = TalendDate.getCurrentDate();

System.out.println(lastModifiedDate);
System.out.println(current_date);
System.out.println(((String)globalMap.get("tFTPFileList_1_CURRENT_FILE")));

if (TalendDate.diffDate(current_date, lastModifiedDate, "dd") <= 1) {
    output_row.abs_path = input_row.abs_path;
    System.out.println(output_row.abs_path);
}

Now tLogRow_3 prints NULL values everywhere. Please suggest a fix.

Answer

Define 3 context variables (mask1, mask2 and mask3).

In a tJava component, compute the file mask (with wildcards) for each of the 3 days, starting at the current date:

Date currentDate = TalendDate.getCurrentDate();
Date currentDateMinus1 = TalendDate.addDate(currentDate, -1, "dd");
Date currentDateMinus2 = TalendDate.addDate(currentDate, -2, "dd");

context.mask1 = "*" + TalendDate.formatDate("yyyyMMdd", currentDate) + "*.csv";
context.mask2 = "*" + TalendDate.formatDate("yyyyMMdd", currentDateMinus1) + "*.csv";
context.mask3 = "*" + TalendDate.formatDate("yyyyMMdd", currentDateMinus2) + "*.csv";

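Then, in the tFTPFileList, use the 3 context variables as file masks, so that only the files from today and the 2 previous days are retrieved. Note that this filters on the date embedded in the file name (for example A_B_C_D_E_20200926053617.csv), not on the mtime reported by the FTP server.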
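To see what the masks look like and which file names they would select, here is a minimal standalone sketch in plain Java. It is not Talend code: it uses java.time instead of TalendDate so it can run outside a job, and it approximates the tFTPFileList wildcard matching with Java's glob syntax, which is an assumption. The class and method names (FileMaskSketch, buildMasks, matchesAny) are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Talend).
class FileMaskSketch {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Builds one mask per day, starting at `today` and going back `days - 1` days.
    static List<String> buildMasks(LocalDate today, int days) {
        List<String> masks = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            masks.add("*" + today.minusDays(i).format(DAY) + "*.csv");
        }
        return masks;
    }

    // Returns true if the file name matches at least one mask.
    // Assumption: Java glob semantics stand in for the FTP component's wildcards.
    static boolean matchesAny(String fileName, List<String> masks) {
        for (String mask : masks) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
            if (matcher.matches(Paths.get(fileName))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Fixed date so the result is reproducible; a job would use the current date.
        List<String> masks = buildMasks(LocalDate.of(2020, 9, 26), 3);
        System.out.println(masks);
        System.out.println(matchesAny("A_B_C_D_E_20200926053617.csv", masks));
        System.out.println(matchesAny("A_B_C_D_E_20200920053617.csv", masks));
    }
}
```

With the fixed date 2020-09-26, the three masks cover 26, 25 and 24 September, so the sample file name from the question is selected while a file stamped 20 September is not. Because the selection happens through the mask, the files outside the 3-day window never have to be iterated, buffered or sorted in the job.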

User contributions licensed under: CC BY-SA