
Why can’t Nextflow handle this awk phrase?

Background: Using a CSV file as input, I want to combine the first two columns into a new column (separated by an underscore) and append it as the last column of a new CSV.

Input:

column1,column2,column3
1,2,3
a,b,c

Desired output:

column1,column2,column3,column1_column2
1,2,3,1_2
a,b,c,a_b

The awk command below works from the command line:

awk 'BEGIN{FS=OFS=","} {print $0, (NR>1 ? $1"_"$2 : "column1_column2")}' file.csv > full_template.csv

However, when placed within a Nextflow script (below), it gives an error.

#!/usr/bin/env nextflow

params.input = '/file/location/here/file.csv'

process unique {
    input:
    path input from params.input

    output:
    path 'full_template.csv' into template

    """
    awk 'BEGIN{FS=OFS=","} {print \$0, (NR>1 ? \$1"_"\$2 : "combined_header")}' $input > full_template.csv
    """
}

Here is the error:

N E X T F L O W  ~  version 21.10.0
Launching `file.nf` [awesome_pike] - revision: 1b63d4b438
class groovyx.gpars.dataflow.expression.DataflowInvocationExpression cannot be cast to class java.nio.file.FileSystem (groovyx.gpars.dataflow.expression.DataflowInvocationExpression is in unnamed module of loader 'app'; java.nio.file.FileSystem is in module java.base of loader 'bootstrap')

I’m not sure what is causing this, and any help would be appreciated.

Thanks!

Edit:

Yes, it seems this was not the source of the error (sorry!). I’m trying to use splitCsv on the resulting CSV, and that appears to be what’s causing the error, like so:

Channel
    .fromPath(template)
    .splitCsv(header:true, sep:',')
    .map{ row -> tuple(row.column1, file(row.column2), file(row.column3)) }
    .set { split }

I expect my issue is that it’s not acceptable to use .fromPath on a channel, but I can’t figure out how else to do it.
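
(For reference, Channel.fromPath expects a file path or glob pattern string, not an existing channel, which would explain the cast error above. Typical usage looks like this, with a made-up glob:)

Channel
    .fromPath('/file/location/here/*.csv')
    .view()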


Edit 2:

So this was a stupid mistake. I simply needed to call .splitCsv directly on the channel where I invoke it in the input declaration. Hardly elegant, but it appears to be working great now.

process blah {

    input:
    tuple val(column1), path(column2), path(column3) from template.splitCsv(header: true, sep: ',').map { row -> tuple(row.column1, file(row.column2), file(row.column3)) }

    ...
}
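
A tidier arrangement might be to do the splitCsv/map step once at the channel level and feed the resulting channel into the process — something like this (the process name, variable names, and the echo command are just placeholders):

template
    .splitCsv(header: true, sep: ',')
    .map { row -> tuple(row.column1, file(row.column2), file(row.column3)) }
    .set { split_rows }

process do_something {

    input:
    tuple val(sample_id), path(file_a), path(file_b) from split_rows

    """
    echo "$sample_id $file_a $file_b"    # placeholder command, just to show the staged inputs
    """
}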


Answer

I was unable to reproduce the error you’re seeing with your example code and Nextflow version. In fact, I get the expected output. This shouldn’t be much of a surprise though, because you have correctly escaped the special dollar variables in your AWK command. The cause of the error is likely somewhere else in your code.
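
For reference, inside a script block defined with double quotes ("""..."""), Groovy interpolates dollar expressions, so awk’s positional variables need a leading backslash, while Nextflow variables such as $input are left unescaped:

    """
    awk 'BEGIN{FS=OFS=","} {print \$0, (NR>1 ? \$1"_"\$2 : "combined_header")}' $input > full_template.csv
    """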

If escaping the special characters gets tedious, another way is to use a shell block instead:

From the Nextflow documentation: "It is an alternative to the Script definition with an important difference, it uses the exclamation mark ! character as the variable placeholder for Nextflow variables in place of the usual dollar character."

The example becomes:

params.input_csv = '/file/location/here/file.csv'

input_csv = file(params.input_csv)


process unique {

    input:
    path input_csv

    output:
    path 'full_template.csv' into template

    shell:
    '''
    awk 'BEGIN { FS=OFS="," } { print $0, (NR>1 ? $1 "_" $2 : "combined_header") }' \
    "!{input_csv}" > "full_template.csv"
    '''
}

template.view { it.text }

Results:

$ nextflow run file.nf 
N E X T F L O W  ~  version 20.10.0
Launching `file.nf` [wise_hamilton] - revision: b71ff1eb03
executor >  local (1)
[76/ddbb87] process > unique [100%] 1 of 1 ✔
column1,column2,column3,combined_header
1,2,3,1_2
a,b,c,a_b