I’m struggling to find the best way to merge arrays (or create new ones) by looking at their shared value.
List<String[]> dictionary = new ArrayList<String[]>();
This is my “dictionary”, filled with arrays of 2 words. For example, it contains the arrays:
["A","B"] ["B","C"] ["D","E"] ["F","C"] ["G","H"] ["T","D"]
I need to merge them by values they share, so for example the finished “dictionary” (or completely new list) would look like this:
["A","B","C","F"]; ["D","E","T"]; ["G","H"];
Also, the old arrays don’t have to be removed; they can stay in the “dictionary”. I just need the merged ones, and I’m having a hard time figuring it out.
The arrays don’t have to be sorted in any way.
This is what I have so far, and it is not working:
public static void SynonymsMerge(List<String[]> dictionary){
    ArrayList<ArrayList<String>> newDictionary = new ArrayList<ArrayList<String>>();
    for(int i=0;i < dictionary.size(); i++){
        ArrayList<String> synonyms = new ArrayList<String>();
        for(int j=0; j < dictionary.get(i).length; j++){
            synonyms.add(dictionary.get(i)[j]);
        }
        newDictionary.add(synonyms);
    }
    for(int i=0;i< newDictionary.size();i++){
        for(int j=0; j < newDictionary.size();j++){
            for (int k=0; k < newDictionary.get(j).size() ;k++) {
                if (newDictionary.get(i).equals(newDictionary.get(j)))
                    continue;
                if (newDictionary.get(i).contains(newDictionary.get(j).get(k)))
                    newDictionary.get(i).addAll(newDictionary.get(j));
Answer
First of all, here is the code. I changed the input type from List<String[]> to List<List<String>>, as it does not really make sense to mix up both Lists and arrays. This also applies to the output type.
The code
import java.util.ArrayList;
import java.util.List;

public static List<List<String>> merge(List<List<String>> dictionary) {
    List<List<String>> newDictionary = new ArrayList<>();
    for (List<String> stringPair : dictionary) {
        // Indices of the groups in newDictionary that already contain a string of this pair.
        List<Integer> matchIndices = new ArrayList<>();
        for (int i = 0; i < newDictionary.size(); i++) {
            List<String> newStrings = newDictionary.get(i);
            for (String str : stringPair) {
                if (newStrings.contains(str)) {
                    matchIndices.add(i);
                    break; // record each group once, even if it contains both strings
                }
            }
        }
        if (matchIndices.isEmpty()) {
            // No match: the pair starts a new group.
            newDictionary.add(new ArrayList<>(stringPair));
            continue;
        }
        matchIndices.sort(Integer::compareTo);
        if (matchIndices.size() == 1) {
            newDictionary.get(matchIndices.get(0)).addAll(stringPair);
        } else {
            // Several matches: fold every other matching group into the first one.
            int last = matchIndices.remove(0);
            while (!matchIndices.isEmpty()) {
                int i = matchIndices.remove(0);
                newDictionary.get(last).addAll(newDictionary.get(i));
                newDictionary.remove(i);
                // Removing group i shifts all later groups down by one.
                matchIndices.replaceAll(a -> a - 1);
            }
            // Add the pair too, in case it holds a string that no group contained yet.
            newDictionary.get(last).addAll(stringPair);
        }
    }
    // Remove duplicates inside each group (Stream.toList() requires Java 16+).
    newDictionary = newDictionary.stream()
            .map(strings -> strings.stream().distinct().toList())
            .toList();
    return newDictionary;
}
How does it work?
dictionary: the input, of type List<List<String>> (each inner List has a maximum size of 2, even though the function would work with more strings in theory)
newDictionary: the output of the function, of type List<List<String>>
The following steps are executed for every input pair (List of strings) in dictionary:
1. Get all the existing “groups” (their indices) in newDictionary in which the strings from the pair are already present. This List of indices is called matchIndices.
   Example: stringPair = ["A", "E"] and newDictionary = [["I", "A", "O"], ["P", "D"]] would result in matchIndices = [0], because only "A" is present, and only in the first element of newDictionary.
2. If matchIndices.size() is 0, create a new group in newDictionary with the string pair. Back to 1.
3. If matchIndices.size() is 1, append the strings from the pair to the newDictionary group with the index specified in matchIndices. Back to 1.
4. If matchIndices.size() is greater than 1, multiple groups from newDictionary, with the indices specified in matchIndices, have to be merged together in the while-loop. Back to 1.
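To make step 1 concrete, here is a minimal sketch of just the index lookup, run on the example from that step. The class MatchIndicesSketch and the method findMatches are hypothetical names used only for illustration; they are not part of the merge function itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class MatchIndicesSketch {

    // Returns the index of every group that contains at least one string of the pair.
    static List<Integer> findMatches(List<List<String>> groups, List<String> pair) {
        List<Integer> matchIndices = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            for (String str : pair) {
                if (groups.get(i).contains(str)) {
                    matchIndices.add(i);
                    break; // one entry per group is enough
                }
            }
        }
        return matchIndices;
    }

    public static void main(String[] args) {
        List<List<String>> newDictionary = List.of(
                List.of("I", "A", "O"),
                List.of("P", "D"));
        // "A" is found in group 0, "E" is found nowhere.
        System.out.println(findMatches(newDictionary, List.of("A", "E"))); // prints [0]
    }
}
```

The size of the returned list is what decides between steps 2, 3 and 4.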
In the end we have to make sure there are no duplicates in the Lists in newDictionary.
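As a small sketch of that last clean-up, this is what the duplicate removal does to a single group. The class DistinctSketch and the method dedupe are hypothetical names for illustration, and Stream.toList() assumes Java 16 or newer.

```java
import java.util.List;

// Hypothetical helper class, for illustration only.
class DistinctSketch {

    // Keeps the first occurrence of every string and preserves the order.
    static List<String> dedupe(List<String> group) {
        return group.stream().distinct().toList();
    }

    public static void main(String[] args) {
        System.out.println(dedupe(List.of("A", "B", "B", "C"))); // prints [A, B, C]
    }
}
```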
Main method
public static void main(String[] args) {
    List<List<String>> dictionary = new ArrayList<>(List.of(
            List.of("A", "B"),
            List.of("B", "C"),
            List.of("D", "E"),
            List.of("F", "C"),
            List.of("G", "H"),
            List.of("T", "D")));
    System.out.println(merge(dictionary));
}
Why do we need step 4?
In your specific example we don’t have to merge multiple groups.
But with input data like this
List<List<String>> dictionary = new ArrayList<>(List.of(
        List.of("A", "B"),
        List.of("B", "C"),
        List.of("D", "E"),
        List.of("F", "E"),
        List.of("E", "A")));
we eventually come to the point where newDictionary = [[A, B, B, C], [D, E, F, E]] and we have to try to insert [E, A]. Here both groups from newDictionary will have to be merged together.
This then results in the output [[A, B, C, D, E, F]], where both groups are merged and the duplicates are removed.
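The merge itself can be sketched in isolation. The version below is a variation, not the exact loop from the function: it removes groups from the highest index down, so the remaining indices never need adjusting. The class MergeGroupsSketch and the method mergeGroups are hypothetical names, and the sketch assumes the indices are ascending and contain no repeats.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class MergeGroupsSketch {

    // Folds every group listed in sortedIndices (ascending, no repeats)
    // into the first listed group and removes the others from the list.
    static void mergeGroups(List<List<String>> groups, List<Integer> sortedIndices) {
        List<String> target = groups.get(sortedIndices.get(0));
        // Go from the highest index down so that a removal never shifts
        // an index that still has to be processed.
        for (int k = sortedIndices.size() - 1; k >= 1; k--) {
            int index = sortedIndices.get(k);
            target.addAll(groups.remove(index));
        }
    }

    public static void main(String[] args) {
        List<List<String>> groups = new ArrayList<>();
        groups.add(new ArrayList<>(List.of("A", "B", "B", "C")));
        groups.add(new ArrayList<>(List.of("D", "E", "F", "E")));
        mergeGroups(groups, List.of(0, 1));
        System.out.println(groups); // prints [[A, B, B, C, D, E, F, E]]
    }
}
```

Removing the duplicates from that single remaining group then gives [A, B, C, D, E, F].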
P.s.
I am not really happy with this solution as it is not really clear what is actually going on, but I’m still posting this, because you said you would be happy with any solution. 🙂
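For comparison, here is a sketch of a different technique, union-find (disjoint sets), which is not what the code above does. Every word points to a parent word, the two words of a pair are put under the same root, and in the end the words are grouped by root. The class UnionFindMergeSketch and its methods are hypothetical names chosen for illustration, and the sketch leaves out the usual union-by-rank optimisation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class, for illustration only.
class UnionFindMergeSketch {

    // Maps every word to its parent; a word that maps to itself is a root.
    private final Map<String, String> parent = new LinkedHashMap<>();

    // Returns the root of the word's group, registering unseen words on the way.
    String find(String word) {
        parent.putIfAbsent(word, word);
        String root = word;
        while (!root.equals(parent.get(root))) {
            root = parent.get(root);
        }
        parent.put(word, root); // shorten the path for later lookups
        return root;
    }

    // Puts the groups of a and b under one common root.
    void union(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        if (!rootA.equals(rootB)) {
            parent.put(rootB, rootA);
        }
    }

    static List<List<String>> merge(List<List<String>> dictionary) {
        UnionFindMergeSketch sets = new UnionFindMergeSketch();
        for (List<String> words : dictionary) {
            for (String word : words) {
                sets.union(words.get(0), word);
            }
        }
        // Group the words by root, in the order the words were first seen.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String word : new ArrayList<>(sets.parent.keySet())) {
            groups.computeIfAbsent(sets.find(word), root -> new ArrayList<>()).add(word);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(
                List.of("A", "B"), List.of("B", "C"), List.of("D", "E"),
                List.of("F", "C"), List.of("G", "H"), List.of("T", "D"))));
        // prints [[A, B, C, F], [D, E, T], [G, H]]
    }
}
```

Because each word is stored only once, this variant needs neither the index bookkeeping nor a separate duplicate-removal pass.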