I was trying to construct an object of the MTree class (https://github.com/Waikato/moa/blob/master/moa/src/main/java/moa/clusterers/outliers/utils/mtree/MTree.java)
The constructor of MTree looks like this:
public MTree(DistanceFunction<? super DATA> distanceFunction, SplitFunction<DATA> splitFunction) { this(DEFAULT_MIN_NODE_CAPACITY, distanceFunction, splitFunction); }
The DistanceFunction here is an interface, the code of it is:
/** * An object that can calculate the distance between two data objects. * * @param <DATA> The type of the data objects. */ public interface DistanceFunction<DATA> { double calculate(DATA data1, DATA data2); }
And it’s implementation is:
import java.util.HashMap; import java.util.List; import java.util.Map; /** * Some pre-defined implementations of {@linkplain DistanceFunction distance * functions}. */ public final class DistanceFunctions { /** * Don't let anyone instantiate this class. */ private DistanceFunctions() {} /** * Creates a cached version of a {@linkplain DistanceFunction distance * function}. This method is used internally by {@link MTree} to create * a cached distance function to pass to the {@linkplain SplitFunction split * function}. * @param distanceFunction The distance function to create a cached version * of. * @return The cached distance function. */ public static <Data> DistanceFunction<Data> cached(final DistanceFunction<Data> distanceFunction) { return new DistanceFunction<Data>() { class Pair { Data data1; Data data2; public Pair(Data data1, Data data2) { this.data1 = data1; this.data2 = data2; } @Override public int hashCode() { return data1.hashCode() ^ data2.hashCode(); } @Override public boolean equals(Object arg0) { if(arg0 instanceof Pair) { Pair that = (Pair) arg0; return this.data1.equals(that.data1) && this.data2.equals(that.data2); } else { return false; } } } private final Map<Pair, Double> cache = new HashMap<Pair, Double>(); @Override public double calculate(Data data1, Data data2) { Pair pair1 = new Pair(data1, data2); Double distance = cache.get(pair1); if(distance != null) { return distance; } Pair pair2 = new Pair(data2, data1); distance = cache.get(pair2); if(distance != null) { return distance; } distance = distanceFunction.calculate(data1, data2); cache.put(pair1, distance); cache.put(pair2, distance); return distance; } }; } /** * An interface to represent coordinates in Euclidean spaces. * @see <a href="http://en.wikipedia.org/wiki/Euclidean_space">"Euclidean * Space" article at Wikipedia</a> */ public interface EuclideanCoordinate { /** * The number of dimensions. */ int dimensions(); /** * A method to access the {@code index}-th component of the coordinate. * * @param index The index of the component. Must be less than {@link * #dimensions()}. */ double get(int index); } /** * Calculates the distance between two {@linkplain EuclideanCoordinate * euclidean coordinates}. */ public static double euclidean(EuclideanCoordinate coord1, EuclideanCoordinate coord2) { int size = Math.min(coord1.dimensions(), coord2.dimensions()); double distance = 0; for(int i = 0; i < size; i++) { double diff = coord1.get(i) - coord2.get(i); distance += diff * diff; } distance = Math.sqrt(distance); return distance; } /** * A {@linkplain DistanceFunction distance function} object that calculates * the distance between two {@linkplain EuclideanCoordinate euclidean * coordinates}. */ public static final DistanceFunction<EuclideanCoordinate> EUCLIDEAN = new DistanceFunction<DistanceFunctions.EuclideanCoordinate>() { @Override public double calculate(EuclideanCoordinate coord1, EuclideanCoordinate coord2) { return DistanceFunctions.euclidean(coord1, coord2); } }; /** * A {@linkplain DistanceFunction distance function} object that calculates * the distance between two coordinates represented by {@linkplain * java.util.List lists} of {@link java.lang.Integer}s. */ public static final DistanceFunction<List<Integer>> EUCLIDEAN_INTEGER_LIST = new DistanceFunction<List<Integer>>() { @Override public double calculate(List<Integer> data1, List<Integer> data2) { class IntegerListEuclideanCoordinate implements EuclideanCoordinate { List<Integer> list; public IntegerListEuclideanCoordinate(List<Integer> list) { this.list = list; } @Override public int dimensions() { return list.size(); } @Override public double get(int index) { return list.get(index); } }; IntegerListEuclideanCoordinate coord1 = new IntegerListEuclideanCoordinate(data1); IntegerListEuclideanCoordinate coord2 = new IntegerListEuclideanCoordinate(data2); return DistanceFunctions.euclidean(coord1, coord2); } }; /** * A {@linkplain DistanceFunction distance function} object that calculates * the distance between two coordinates represented by {@linkplain * java.util.List lists} of {@link java.lang.Double}s. */ public static final DistanceFunction<List<Double>> EUCLIDEAN_DOUBLE_LIST = new DistanceFunction<List<Double>>() { @Override public double calculate(List<Double> data1, List<Double> data2) { class DoubleListEuclideanCoordinate implements EuclideanCoordinate { List<Double> list; public DoubleListEuclideanCoordinate(List<Double> list) { this.list = list; } @Override public int dimensions() { return list.size(); } @Override public double get(int index) { return list.get(index); } }; DoubleListEuclideanCoordinate coord1 = new DoubleListEuclideanCoordinate(data1); DoubleListEuclideanCoordinate coord2 = new DoubleListEuclideanCoordinate(data2); return DistanceFunctions.euclidean(coord1, coord2); } }; }
And my first question is what’s the meaning of return new DistanceFunction<Data>()
in the method public static <Data> DistanceFunction<Data> cached(final DistanceFunction<Data> distanceFunction)
[the method is in the class DistanceFunctions] I am just a beginner of Java and this one is a little bit hard for me to understand.
Also, to create an object of MTree, I should create an object of DistanceFunctions and an object of ComposedSplitFunction(Which is the implementation of SplitFunction interface) and input them as parameter for MTree constructor. But I really don’t know how to do that because in DistanceFunctions class, the constructor is private. So I cannot generate a parameter for the constructor of MTree. What should I do?
New Update: What I want to do is create a Junit Test for MTree, and I believe the first thing I need to do is create an object of MTree.
Advertisement
Answer
Interfaces can have multiple implementations. They just form the general contract implementations need to follow.
The cache
implementation here i.e. takes a DistanceFunction
as input and guarantees that distance values between A and B (or B and A) are only calculated once and thereafter served from the internal cache
map. The generic type of that cache
function just guarantees that you can literally pass any type to it. I.e. you could have an implementation that takes in its simplest form just two integers and calculates the difference of these like this:
DistanceFunction<Integer> func = (Integer a, Integer b) -> Math.abs(a - b);
which is a labmda expression which could be also written a bit more verbose like this
DistanceFunction<Integer> func = new DistanceFunction<Integer>() { @Override public double calculate(Integer data1, Integer data2) { return Math.abs(data1 - data2); } };
and then use it like that to cache the return value for the provided input parameters:
DistanceFunction<Integer> cache = DistanceFunctions.cached(func); double distance = cache.calculate(10, 5);
If you later on have a call like
distance = cache.calculate(10, 5);
again or even
distance = cache.calculate(5, 10);
The distance value in the above case is not recalculated but its value is returned from the internal cache
map as the distance for these parameters was already calculated before. This is especially benefitial if you have plenty of data points but only a limited number of combination of these and calculation is rather expensive.
If you further look into the DistanceFunctions
class you’ve provided you will see that it already provides some implementations for i.e. EUCLIDEAN
, EUCLIDEAN_INTEGER_LIST
and EUCLIDEAN_DOUBLE_LIST
implementations which due to their static final nature can be used as constant in your code directly. Here you just need to provide matching input arguments to the calculate(...)
method based on the implementation you’ve chosen.
In regards to Waikato’s MTree` initialization, a rough template may look like this:
MTree mTree = new MTree(EUCLIDEAN_INTEGER_LIST, new SplitFunction<List<Integer>>(...) { ... @Override public SplitResult<List<Integer>> process(Set<List<Integer>> dataSet, DistanceFunction<? super List<Integer>> distanceFunction) { Pair<List<Integer>> promoted = ... Pair<Set<List<Integer>>> partitions = ... return new SplitResult<List<Integer>>(promoted, partitions); } });
where the parts outlined by ...
need to be defined and implemented by you. The code in that package though provides i.e. a ComposedSplitFunction implementation already that requires PartitionFunction and PromotionFunction as inputs where implementations of these are already available in the PartitionFunctions and PromotionFunctions classes that just work the same way as the DistanceFunction
and DistanceFunctions
discussed here.