
Using SMOTE in Java raises “Comparison method violates its general contract”


I’m working on a project in Java and I need to use Weka’s API. I use Maven to manage dependencies and, in particular, I have the following one:
JavaScript

This version does not include the SMOTE class, which I really need, so I also added the following dependency to my pom.xml:

JavaScript

In my Java code, I am also implementing the walk-forward validation technique: I prepare a training set and a testing set for each step and use them in a loop that does the following:

JavaScript

trainingDataset and testingDataset are of type Instances, and their values change appropriately in each iteration. The first iteration runs without problems, but in the second one java.lang.IllegalArgumentException: Comparison method violates its general contract! is raised. The exception stack trace is:

JavaScript

Does anyone know how to solve the problem?
Thanks in advance.

EDIT: I forgot to say that I’m using Java 11.0.11.
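
For context, this message is thrown by the JDK’s sort implementation when it detects that a compare method gives inconsistent answers. Which comparison inside SMOTE triggers it is not shown in this post, so the following is only a generic sketch of what a contract violation looks like. The class name and both comparators are made up for illustration and are not Weka code.

```java
import java.util.Comparator;

public class ComparatorContractSketch {

    // Broken on purpose: it never returns 0, so two equal values
    // are each reported as "smaller" than the other.
    static final Comparator<Double> BROKEN = (a, b) -> a > b ? 1 : -1;

    // Contract-respecting alternative from the standard library.
    static final Comparator<Double> SAFE = Double::compare;

    // Checks one clause of the contract: sgn(compare(x, y)) == -sgn(compare(y, x)).
    static boolean isAntisymmetric(Comparator<Double> c, double x, double y) {
        return Integer.signum(c.compare(x, y)) == -Integer.signum(c.compare(y, x));
    }

    public static void main(String[] args) {
        System.out.println(isAntisymmetric(BROKEN, 1.0, 1.0)); // false
        System.out.println(isAntisymmetric(SAFE, 1.0, 1.0));   // true
    }
}
```

A comparator like BROKEN can sort small inputs without complaint and only fail on particular data, which would be consistent with an error that appears in some iterations and not in others.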

EDIT 2: Based on @fracpete’s answer, I suspect the problem may lie in how the sets are created. I’m trying to predict the bugginess of classes in another open-source project. Because of walk-forward validation, I have 19 steps, which would require 19 different training files and 19 testing files. To avoid this, I keep a list of InfoKeeper objects, each holding the training and testing Instances for one step. While building this list, I do the following:

  1. From the base ARFF file, I create 2 temporary files: a training set file holding version 1 data and a testing set file holding version 2 data. Then I read these temporary ARFFs to create the Instances objects, which are kept by the InfoKeeper for step 1.
  2. I append the testing set file’s rows (only the data, of course) to the training set file, so that it holds version 1 and version 2 data. Then I overwrite the testing set file so that it holds the version 3 data. I read these temporary ARFFs to get the Instances that are kept by the InfoKeeper for step 2.

The code repeats step 2 to create all the remaining InfoKeeper objects. Could this operation be the problem?
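
The walk-forward scheme described above (train on versions 1 to k, test on version k+1) can be sketched in memory, without temporary files. This is only a minimal illustration using plain lists instead of Weka’s Instances. The names WalkForwardSketch, Row, Step and walkForward are hypothetical and are not taken from the project’s code or from Weka.

```java
import java.util.ArrayList;
import java.util.List;

public class WalkForwardSketch {

    /** One data row tagged with the version it belongs to (hypothetical type). */
    static final class Row {
        final int version;
        final String data;

        Row(int version, String data) {
            this.version = version;
            this.data = data;
        }
    }

    /** Training and testing rows for a single walk-forward step. */
    static final class Step {
        final List<Row> training;
        final List<Row> testing;

        Step(List<Row> training, List<Row> testing) {
            this.training = training;
            this.testing = testing;
        }
    }

    /** Builds one step per test version, from version 2 up to lastVersion. */
    static List<Step> walkForward(List<Row> all, int lastVersion) {
        List<Step> steps = new ArrayList<>();
        for (int testVersion = 2; testVersion <= lastVersion; testVersion++) {
            List<Row> training = new ArrayList<>();
            List<Row> testing = new ArrayList<>();
            for (Row row : all) {
                if (row.version < testVersion) {
                    training.add(row);   // every earlier version goes into training
                } else if (row.version == testVersion) {
                    testing.add(row);    // only the next version is tested
                }
            }
            steps.add(new Step(training, testing));
        }
        return steps;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(1, "a"), new Row(1, "b"), new Row(2, "c"), new Row(3, "d"));
        List<Step> steps = walkForward(rows, 3);
        for (int i = 0; i < steps.size(); i++) {
            System.out.println("step " + (i + 1)
                    + ": train=" + steps.get(i).training.size()
                    + ", test=" + steps.get(i).testing.size());
        }
    }
}
```

With 19 steps, lastVersion would be 20 under this numbering. Building each step from the full data set, rather than appending to a file, also makes it harder to add the same rows twice by accident.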

I also tried @fracpete’s snippet, but the same error occurs. The files I used are the following:
training set file
testing set file

EDIT 3: This is how I create the files:

JavaScript

The last public method is invoked inside a loop in another class, running from 1 to 19:

JavaScript

EDIT 4: I removed duplicate instances from totalData in FilesCreator by doing the following:

JavaScript

I cannot modify it manually because it’s the output of a previous computation. The code now works for iteration 2, but I get the same error in iteration 3.
The files for this iteration are:
train_iteration4.arff
test_iteration4.arff
This is the complete ARFF file produced by the previous snippet; it is the one loaded by arffLoader.setSource(tmpFile);:
full.arff
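
The de-duplication mentioned in EDIT 4 can be illustrated on raw ARFF data rows. This is only a sketch of one possible approach and not the snippet actually used above: it treats each data row as a plain string, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupSketch {

    /**
     * Returns the rows with exact duplicates removed.
     * LinkedHashSet keeps the first occurrence of each row and preserves order.
     */
    static List<String> removeDuplicateRows(List<String> dataRows) {
        return new ArrayList<>(new LinkedHashSet<>(dataRows));
    }

    public static void main(String[] args) {
        List<String> rows = List.of("1,2,yes", "3,4,no", "1,2,yes");
        System.out.println(removeDuplicateRows(rows)); // [1,2,yes, 3,4,no]
    }
}
```

Note that this only catches rows that are textually identical; rows that differ in whitespace or number formatting would be kept as distinct.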


Answer

I solved the problem by changing the SMOTE dependency in my pom.xml to:

JavaScript

With this version, I no longer have any problems and my code runs as expected. I hope this helps others.

User contributions licensed under: CC BY-SA