I have to convert a string of length k with 4 possible characters – {A, C, G, T} – into an integer between 0 and 4^k. The advice is to convert the {A, C, G, T} into {0, 1, 2, 3} respectively, but I do not know how to convert those numbers into a number between 0 and 4^k. For example, if the string “ACT” is given, I have to convert that to a number between 0 and 64.
Answer
You can do it like this:
char[] chars = yourString.toCharArray();
int result = 0;
for (char c : chars) {
    result *= 4; // shift the digits read so far one base-4 place to the left
    switch (c) {
        case 'A': result += 0; break;
        case 'C': result += 1; break;
        case 'G': result += 2; break;
        case 'T': result += 3; break;
    }
}
For every character, the result so far is first multiplied by 4 in order to make room for the next digit.
After that, a value from 0 to 3 is added to the result (depending on the character).
Note that this is not hashing because it can be reversed easily.
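To make the arithmetic concrete: "ACT" becomes the digits 0, 1, 3, and 0·16 + 1·4 + 3 = 7. Below is a minimal sketch of the same loop wrapped in a method that also rejects unexpected characters. The names KmerEncoder, digitOf and encode are only illustrative and not part of any library.

```java
final class KmerEncoder {
    // Maps one letter to its base-4 digit; anything else is rejected.
    static int digitOf(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:
                throw new IllegalArgumentException("Unexpected character: " + c);
        }
    }

    // Reads the string as a base-4 number, most significant digit first.
    static int encode(String kmer) {
        int result = 0;
        for (int i = 0; i < kmer.length(); i++) {
            result = result * 4 + digitOf(kmer.charAt(i));
        }
        return result;
    }
}
```

With this sketch, KmerEncoder.encode("ACT") gives 7 and KmerEncoder.encode("TTT") gives 63, the largest value for k = 3.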
A one-line version of the code would be:
Integer.parseInt(yourString.replace('A','0').replace('C','1').replace('G','2').replace('T','3'),4);
This replaces A/C/G/T with 0/1/2/3 and reads the result as a base-4 number.
You can also get the original String from the converted int:
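int intVal = 7; // whatever it is; 7 is only a placeholder value
StringBuilder sb = new StringBuilder();
while (intVal != 0) {
    switch (intVal % 4) {
        case 0: sb.append('A'); break;
        case 1: sb.append('C'); break;
        case 2: sb.append('G'); break;
        case 3: sb.append('T'); break;
    }
    intVal = intVal / 4;
}
// reverse the StringBuilder (not the int), since the last digit was appended first
String result = sb.reverse().toString();
This gets each digit one after another and adds the corresponding letter to the StringBuilder. Because it starts with the last digit, a reversal is needed. Note that leading 'A's are lost this way, because they correspond to leading zeros; if you know the length k, pad the result with 'A' on the left.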
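If the length k of the original sequence is known, the decoding can fill a fixed-size array from the right, which keeps leading 'A's (they are leading zeros in base 4 and would otherwise disappear). The following is only a sketch under that assumption; KmerDecoder and decode are made-up names.

```java
final class KmerDecoder {
    private static final char[] LETTERS = {'A', 'C', 'G', 'T'};

    // Rebuilds a string of exactly k letters from a non-negative code.
    static String decode(int code, int k) {
        if (code < 0) {
            throw new IllegalArgumentException("Code must be non-negative");
        }
        char[] out = new char[k];
        // Fill from the last position, because code % 4 is the last digit.
        for (int i = k - 1; i >= 0; i--) {
            out[i] = LETTERS[code % 4];
            code /= 4;
        }
        if (code != 0) {
            throw new IllegalArgumentException("Code does not fit into " + k + " letters");
        }
        return new String(out);
    }
}
```

For example, KmerDecoder.decode(7, 3) returns "ACT", whereas a decoder without k would return only "CT".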
It is also possible to create a one-liner for this:
Integer.toString(intVal,4).replace('0','A').replace('1','C').replace('2','G').replace('3','T');
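Note that you might want to use long/BigInteger for longer sequences, as you would reach the integer limit for those.
Since int has 32 bits of data, one of which is the sign bit, sequences of up to 15 characters always give a non-negative result. A 16-character sequence still fits into the 32 bits, but the loop version can then produce a negative number and Integer.parseInt throws a NumberFormatException for values above Integer.MAX_VALUE. With long, the same reasoning allows sequences of up to 31 characters (32 if negative results are acceptable). With BigInteger, you would likely reach the memory limit of the JVM with your sequence string or the char[] used in the calculation before the limit of BigInteger becomes a problem (the limit of BigInteger is 2 to the power of Integer.MAX_VALUE).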
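As a rough sketch of the wider variants: the same loop works with long (kept to 31 letters here so the result stays non-negative, since 4^31 = 2^62) and with BigInteger for arbitrary lengths. The class and method names are hypothetical, and "ACGT".indexOf is used as a compact stand-in for the switch.

```java
import java.math.BigInteger;

final class WideKmerEncoder {
    private static final String LETTERS = "ACGT";

    // Returns the base-4 digit of a letter, or throws for anything else.
    private static int digitOf(char c) {
        int digit = LETTERS.indexOf(c);
        if (digit < 0) {
            throw new IllegalArgumentException("Unexpected character: " + c);
        }
        return digit;
    }

    // long variant: up to 31 letters keeps the value below 2^62.
    static long encodeLong(String kmer) {
        if (kmer.length() > 31) {
            throw new IllegalArgumentException("Too long for a non-negative long");
        }
        long result = 0L;
        for (int i = 0; i < kmer.length(); i++) {
            result = result * 4 + digitOf(kmer.charAt(i));
        }
        return result;
    }

    // BigInteger variant: shifting left by 2 bits multiplies by 4.
    static BigInteger encodeBig(String kmer) {
        BigInteger result = BigInteger.ZERO;
        for (int i = 0; i < kmer.length(); i++) {
            result = result.shiftLeft(2).add(BigInteger.valueOf(digitOf(kmer.charAt(i))));
        }
        return result;
    }
}
```

A sequence of 16 'T's, which no longer fits into a non-negative int, encodes to 4294967295 with encodeLong.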