(Java) How do you check a string for two or more digit integers using .matches()?

The objective of this program is to return the atomic mass from a given formula (as a String), using a CSV file that contains elements and their atomic masses.

My code for this particular problem is this:

    double mass = 0.0;

    Map<String, Double> perTableMap = fileReader(periodicTable);

    Map<String, Integer> atomMap = getAtoms(formula);

    for (Map.Entry<String, Double> periEntry : perTableMap.entrySet()) {
        for (Map.Entry<String, Integer> atomEntry : atomMap.entrySet()) {
            if (atomEntry.getKey().equals(periEntry.getKey())) {
                mass += periEntry.getValue() * atomEntry.getValue();
            }
        }
    }

    return mass;

I have another method, “fileReader”, that reads the data from the file and returns a map with the elements as the keys and their masses as the values. This works fine.

This code also works well for “formulas” that have a single digit number of atoms, like “OCS”, “C4H4AsH”, and “Fe2O3”.

However, when the number of atoms has two or more digits, it only reads the first digit of that number. For example, “RuSH2112” is supposed to be Ru1 + S1 + H2112, but instead the output is Ru1 + S1 + H2.

I believe there is something wrong in my method “getAtoms”, at the spot marked “!!! HERE !!!”. Here is the code for that method:

public static Map<String, Integer> getAtoms(String formula) {

    // LinkedHashMap stores in insertion order
    Map<String, Integer> newMap = new LinkedHashMap<>();

    for (int i = 0; i < formula.length(); i++) {
        int count = 0;

        // convert string to char
        char c = formula.charAt(i);

        // convert char to string
        String a = String.valueOf(c);

        // check formula for upper case values
        if (a.matches("[A-Z]")) {

            for (int j = i + 1; j < formula.length(); j++) {
                char d = formula.charAt(j);
                String b = String.valueOf(d);

                // check formula for lower case values
                if (b.matches("[a-z]")) {
                    a += b;
                    if (newMap.get(a) == null)
                        newMap.put(a, 1);
                    else
                        newMap.put(a, newMap.get(a) + 1);
                    count = 1;
                }

                // check formula for integer values (the end of each molecule)

                // !!! HERE !!!
                else if (b.matches("[\\d]")) {
                    int k = Integer.parseInt(b);
                    newMap.put(a, k);
                    count = 1;
                }

                else {
                    i = j - 1;
                    break;
                }
            }

            // put values into a map
            if (count == 0) {
                if (newMap.get(a) == null)
                    newMap.put(a, 1);
                else
                    newMap.put(a, newMap.get(a) + 1);
            }
        }
    }
    return newMap;
}

Is there another way to write .matches("[\\d]")? I think that only matches one-digit numbers.

Answer

I’ve reimplemented your method getAtoms(). The main change is that instead of processing the formula character by character, it splits the formula into chunks, each of which is either a single uppercase letter, an uppercase letter followed by a lowercase letter, or a number.

Here is the code that does the split:

String[] elementsAndIndices = formula.split("(?<=\\p{Lower})(?=\\p{Upper})|(?<=\\p{Upper})(?=\\p{Upper})|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

Let’s examine what’s going on here:

  • \d – a digit: [0-9];
  • \D – a non-digit: [^0-9];
  • \p{Lower} – a lower-case alphabetic character: [a-z];
  • \p{Upper} – an upper-case alphabetic character: [A-Z];
  • \p{Alpha} – an alphabetic character: [\p{Lower}\p{Upper}].

The special constructs that start with a question mark are called lookbehind, such as (?<=\p{Lower}), and lookahead, such as (?=\p{Upper}). Both match a zero-length string, which makes it possible to split the formula in the manner described above without losing any character.

Meanings of the lookbehind and lookahead combinations that are used to split the formula:

  • (?<=\p{Lower})(?=\p{Upper}) – matches a zero-length string between a lower-case character \p{Lower} and an upper-case character \p{Upper};

  • (?<=\p{Upper})(?=\p{Upper}) – matches a zero-length string on the border between two upper-case characters \p{Upper};

  • (?<=\D)(?=\d) – matches a zero-length string between a non-digit \D and a digit \d;

  • (?<=\d)(?=\D) – matches a zero-length string between a digit \d and a non-digit \D.
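To see the four alternatives working together, here is a minimal sketch that applies the same split pattern to a couple of formulas. The class name FormulaSplitDemo, the constant BOUNDARY and the helper chunks() are made up for this illustration and are not part of the original code. The last line contrasts the lookaround pattern with a plain \d split, which consumes the digits it matches.

```java
import java.util.Arrays;
import java.util.List;

class FormulaSplitDemo {
    // Same pattern as in the answer, broken up so each alternative is visible.
    static final String BOUNDARY =
            "(?<=\\p{Lower})(?=\\p{Upper})"     // lower-case | upper-case
            + "|(?<=\\p{Upper})(?=\\p{Upper})"  // upper-case | upper-case
            + "|(?<=\\D)(?=\\d)"                // non-digit  | digit
            + "|(?<=\\d)(?=\\D)";               // digit      | non-digit

    // Hypothetical helper: returns the chunks as a list for easy printing.
    static List<String> chunks(String formula) {
        return Arrays.asList(formula.split(BOUNDARY));
    }

    public static void main(String[] args) {
        System.out.println(chunks("RuSH2112")); // [Ru, S, H, 2112]
        System.out.println(chunks("Fe2O3"));    // [Fe, 2, O, 3]

        // A pattern that consumes characters drops them from the result.
        System.out.println(Arrays.asList("Fe2O3".split("\\d"))); // [Fe, O]
    }
}
```

Because every alternative is zero-length, nothing is removed from the formula; the multi-digit count 2112 stays in one chunk since there is no boundary between two digits.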

The method merge() is used to update the map. It takes three arguments: a key, a value and a remapping function. If the given key is absent (or is associated with a null value), the key is associated with the given value. The remapping function, which merges the old and the new value, is evaluated only if the key is already present and its value is not null.

elements.merge(elementsAndIndices[i], 1, Integer::sum);

The line of code shown above associates the value 1 with the key elementsAndIndices[i] if that key is not present in the map; otherwise, it merges the existing value with 1, producing their sum. The method reference Integer::sum is equivalent to the lambda (val1, val2) -> val1 + val2.
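As a small illustration of how merge() accumulates counts, the following sketch tallies a sequence of symbols. MergeDemo and tally() are hypothetical names used only in this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MergeDemo {
    // Hypothetical helper: counts how often each symbol occurs, in insertion order.
    static Map<String, Integer> tally(String... symbols) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String symbol : symbols) {
            // First occurrence stores 1; later occurrences add 1 to the stored value.
            counts.merge(symbol, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(tally("C", "H", "H", "O", "H")); // {C=1, H=3, O=1}
    }
}
```

This is why a formula in which the same element appears more than once, such as “C4H4AsH”, ends up with a single summed entry for that element.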

    // requires: import java.util.LinkedHashMap; import java.util.Map;
    public static Map<String, Integer> getAtoms(String formula) {
        Map<String, Integer> elements = new LinkedHashMap<>();
        String[] elementsAndIndices =
                formula.split("(?<=\\p{Lower})(?=\\p{Upper})|(?<=\\p{Upper})(?=\\p{Upper})|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

        for (int i = 0; i < elementsAndIndices.length; i++) {
            if (elementsAndIndices[i].matches("\\p{Alpha}+")) {
                // no number follows the element: count it once
                if (i == elementsAndIndices.length - 1 || !elementsAndIndices[i + 1].matches("\\d+")) {
                    elements.merge(elementsAndIndices[i], 1, Integer::sum);
                } else {
                    // the next chunk is the (possibly multi-digit) count
                    elements.merge(elementsAndIndices[i], Integer.parseInt(elementsAndIndices[i + 1]), Integer::sum);
                    i++; // skip the number chunk
                }
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        System.out.println(getAtoms("C6H12O6"));
        System.out.println(getAtoms("NH3"));
        System.out.println(getAtoms("RuSH2112"));
    }

Output:

{C=6, H=12, O=6}
{N=1, H=3}
{Ru=1, S=1, H=2112}
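For comparison, the same result can probably be reached without split() by scanning the formula with java.util.regex.Matcher. This is a different technique from the one above, not a correction of it. The sketch assumes that every element symbol is one upper-case letter followed by zero or more lower-case letters, and the names MatcherAtoms and TOKEN are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherAtoms {
    // Group 1: element symbol (assumed shape); group 2: optional count with any number of digits.
    private static final Pattern TOKEN = Pattern.compile("([A-Z][a-z]*)(\\d*)");

    static Map<String, Integer> getAtoms(String formula) {
        Map<String, Integer> elements = new LinkedHashMap<>();
        Matcher matcher = TOKEN.matcher(formula);
        while (matcher.find()) {
            String digits = matcher.group(2);
            // An empty digit group means the element occurs once.
            int count = digits.isEmpty() ? 1 : Integer.parseInt(digits);
            elements.merge(matcher.group(1), count, Integer::sum);
        }
        return elements;
    }

    public static void main(String[] args) {
        System.out.println(getAtoms("RuSH2112")); // {Ru=1, S=1, H=2112}
        System.out.println(getAtoms("C4H4AsH"));  // {C=4, H=5, As=1}
    }
}
```

Here \d* plays the role the question was looking for: unlike [\d], which matches exactly one digit, the quantifier lets the group capture the whole number.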