
Java. Extracting character from array that isn’t ASCII

I’m trying to extract a non-ASCII character from a buffer. I’m reading in a file of movie names that has some non-ASCII characters sprinkled in, like so:

1|Tóy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Gét Shorty (1995)

I was able to pick out the lines that contain non-ASCII characters, but I’m trying to figure out how to get the particular non-ASCII character from each of those lines and replace it with an ASCII character from the map I’ve made.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {

        HashMap<Character, Character>Char_Map = new HashMap<>();
        Char_Map.put('o','ó');
        Char_Map.put('e','é');
        Char_Map.put('i','ï');

        for(Map.Entry<Character,Character> entry: Char_Map.entrySet())
        {
            System.out.println(entry.getKey() + " -> "+ entry.getValue());
        }

        try
        {
            BufferedReader br = new BufferedReader(new FileReader("movie-names.txt"));
            String contentLine= br.readLine();


            while(contentLine != null)
            {
                String[] contents = contentLine.split("\\|");
                boolean result = contents[1].matches("\\A\\p{ASCII}*\\z");

                if(!result)
                {
                    System.out.println(contentLine);

                    

                    //System.out.println();
                }

                contentLine= br.readLine();

            }
        }
        catch (IOException ioe)
        {
            System.out.println("Cannot open file as it doesn't exist");
        }
    }
}

I tried using something along the lines of:

if((contentLine.charAt(i) == something

But I’m not sure.


Answer

You can just use replaceAll. Put this in the while loop, so that it works on each line you read from the file. With this change, you won’t need the split and if (... matches) anymore.

contentLine = contentLine.replaceAll("ó", "o");
contentLine = contentLine.replaceAll("é", "e");
contentLine = contentLine.replaceAll("ï", "i");
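
Keep in mind that Java strings are immutable: replaceAll returns a new string, so the result has to be assigned back to contentLine to have any effect. Here is a minimal sketch of that point. It uses String.replace, which treats its arguments as literal text rather than a regular expression. The class name ReplaceDemo and the method toAscii are made up for illustration, and the accented characters are written as Unicode escapes so the example does not depend on the source file's encoding:

```java
public class ReplaceDemo {
    // Returns a copy of the line with the three accented characters replaced.
    // \u00f3 = o with acute, \u00e9 = e with acute, \u00ef = i with diaeresis.
    static String toAscii(String line) {
        // Each call returns a new String; chaining passes the result along.
        return line.replace('\u00f3', 'o')
                   .replace('\u00e9', 'e')
                   .replace('\u00ef', 'i');
    }

    public static void main(String[] args) {
        String contentLine = "1|T\u00f3y Story (1995)";

        // Result is discarded here, so contentLine is unchanged.
        contentLine.replace('\u00f3', 'o');
        System.out.println(contentLine.equals("1|Toy Story (1995)")); // prints false

        // Assigning the result back is what actually updates the variable.
        contentLine = toAscii(contentLine);
        System.out.println(contentLine); // prints 1|Toy Story (1995)
    }
}
```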

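If you want to keep a map, make the non-ASCII character the key and its ASCII replacement the value (the reverse of the map in your question), then iterate over the entries and replace each key with its value: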

Map<String, String> map = new HashMap<>();
map.put("ó", "o");
// ... and all the others

Later, in your loop reading the contents, you replace all the characters:

for (Map.Entry<String, String> entry : map.entrySet())
{
    String oldChar = entry.getKey();
    String newChar = entry.getValue();
    contentLine = contentLine.replaceAll(oldChar, newChar);
}
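
If you would rather keep a Map<Character, Character> as in the question, another option is to walk the line one character at a time and look each one up. This is only a sketch: the class name CharMapDemo and the methods defaultMap and replaceMapped are hypothetical, the map is keyed by the non-ASCII character, and the accented characters are written as Unicode escapes so the example does not depend on the source file's encoding.

```java
import java.util.HashMap;
import java.util.Map;

public class CharMapDemo {
    // Builds the illustrative mapping: non-ASCII character -> ASCII replacement.
    static Map<Character, Character> defaultMap() {
        Map<Character, Character> map = new HashMap<>();
        map.put('\u00f3', 'o'); // o with acute
        map.put('\u00e9', 'e'); // e with acute
        map.put('\u00ef', 'i'); // i with diaeresis
        return map;
    }

    // Replaces every character that appears as a key in the map with its value.
    static String replaceMapped(String line, Map<Character, Character> map) {
        StringBuilder sb = new StringBuilder(line.length());
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            // Characters without a mapping are copied through unchanged.
            char mapped = map.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cleaned = replaceMapped("4|G\u00e9t Shorty (1995)", defaultMap());
        System.out.println(cleaned); // prints 4|Get Shorty (1995)
    }
}
```

This makes a single pass over each line regardless of how many entries the map has, whereas the replaceAll loop scans the line once per entry.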

Here is a complete example:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) throws Exception {
        HashMap<String, String> nonAsciiToAscii = new HashMap<>();
        nonAsciiToAscii.put("ó", "o");
        nonAsciiToAscii.put("é", "e");
        nonAsciiToAscii.put("ï", "i");

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("movie-names.txt")))
        {
            String contentLine = br.readLine();
            while (contentLine != null)
            {
                for (Map.Entry<String, String> entry : nonAsciiToAscii.entrySet())
                {
                    String oldChar = entry.getKey();
                    String newChar = entry.getValue();
                    contentLine = contentLine.replaceAll(oldChar, newChar);
                }

                System.out.println(contentLine); // or whatever else you want to do with the cleaned lines

                contentLine = br.readLine();
            }
        }
    }
}

This prints:

robert:~$ javac Main.java && java Main
1|Toy Story (1995)
2|GoldenEye (1995)
3|Four Rooms (1995)
4|Get Shorty (1995)
robert:~$
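
If the list of accented characters is open-ended, maintaining the map by hand gets tedious. A different technique from the map-based replacement above is to decompose each character with java.text.Normalizer and then strip the combining marks. This is a hedged sketch, with StripAccents and stripAccents as made-up names. It only handles characters that decompose into a base letter plus marks, so letters such as ø or ß pass through unchanged.

```java
import java.text.Normalizer;

public class StripAccents {
    // Decomposes accented letters (NFD) and removes the combining marks,
    // leaving the base letters behind.
    static String stripAccents(String line) {
        String decomposed = Normalizer.normalize(line, Normalizer.Form.NFD);
        // \p{M} matches Unicode "Mark" characters such as the combining acute accent.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00f3 is o with acute; written as an escape to avoid source-encoding issues.
        System.out.println(stripAccents("1|T\u00f3y Story (1995)")); // prints 1|Toy Story (1995)
    }
}
```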
User contributions licensed under: CC BY-SA