I’m working on a project in Java and I need to use Weka’s API. I use Maven to manage dependencies and, in particular, I have the following one:
<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-stable</artifactId>
    <version>3.8.5</version>
</dependency>
In this version, the SMOTE class is not included, but I really need it; that’s why I also added the following dependency to my pom.xml:
<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>SMOTE</artifactId>
    <version>1.0.2</version>
</dependency>
In my Java code, I also implement the walk-forward validation technique: I can prepare the training set and the testing set for each step, so I can use them in a loop in which I do the following:
for (...) {
    var filtered = new FilteredClassifier();
    var smote = new SMOTE();
    filtered.setFilter(smote);
    filtered.setClassifier(new NaiveBayes());
    filtered.buildClassifier(trainingDataset);
    var currEvaluation = new Evaluation(testingDataset);
    currEvaluation.evaluateModel(filtered, testingDataset);
}
trainingDataset and testingDataset are of type Instances, and their values change appropriately in each iteration. In the first iteration no problem occurs, but in the second one a java.lang.IllegalArgumentException: Comparison method violates its general contract! is raised.
is raised. The exception stack trace is:
java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at java.base/java.util.TimSort.mergeLo(TimSort.java:781)
    at java.base/java.util.TimSort.mergeAt(TimSort.java:518)
    at java.base/java.util.TimSort.mergeCollapse(TimSort.java:448)
    at java.base/java.util.TimSort.sort(TimSort.java:245)
    at java.base/java.util.Arrays.sort(Arrays.java:1441)
    at java.base/java.util.List.sort(List.java:506)
    at java.base/java.util.Collections.sort(Collections.java:179)
    at weka.filters.supervised.instance.SMOTE.doSMOTE(SMOTE.java:637)
    at weka.filters.supervised.instance.SMOTE.batchFinished(SMOTE.java:489)
    at weka.filters.Filter.useFilter(Filter.java:708)
    at weka.classifiers.meta.FilteredClassifier.setUp(FilteredClassifier.java:719)
    at weka.classifiers.meta.FilteredClassifier.buildClassifier(FilteredClassifier.java:794)
Does anyone know how to solve the problem?
Thanks in advance.
EDIT: I forgot to say I’m using Java 11.0.11.
EDIT 2: Based on @fracpete’s answer, I deduce that the problem may be in the creation of the sets. For context, I’m trying to predict the bugginess of the classes of another open-source project. Because of walk forward, I have 19 steps and would therefore need 19 different training files and 19 testing files. To avoid this, I keep a list of InfoKeeper objects, each of which holds the training and testing Instances for one step. While building this list, I do the following:
- From the base ARFF file, I create 2 temporary files: a training set file keeping version 1 data and a testing set file keeping version 2 data. Then I read these temp ARFFs to create the Instances objects; these are kept by the InfoKeeper related to step 1.
- I append the testing set file’s rows (only the data, of course) to the training set file, so that it keeps version 1 and version 2 data. Then I overwrite the testing file so that it keeps the version 3 data. I read these temp ARFFs to get the Instances kept by the InfoKeeper related to step 2.
The code repeats the step 2 logic to create all the remaining InfoKeeper objects. May this operation be the problem?
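For clarity, this is roughly the split I am describing, written as an in-memory sketch rather than my actual file-based code (totalData is the full dataset loaded from the base ARFF file, k is the step index, and I assume the first attribute is the release index and the class attribute is the last one):

// Hypothetical sketch of one walk-forward step: train on releases 1..k, test on release k+1.
Instances trainingDataset = new Instances(totalData, 0); // empty copy with the same header
Instances testingDataset = new Instances(totalData, 0);
for (int j = 0; j < totalData.numInstances(); j++) {
    int release = (int) totalData.instance(j).value(0); // first attribute is the release index
    if (release <= k)
        trainingDataset.add(totalData.instance(j));
    else if (release == k + 1)
        testingDataset.add(totalData.instance(j));
}
trainingDataset.setClassIndex(trainingDataset.numAttributes() - 1);
testingDataset.setClassIndex(testingDataset.numAttributes() - 1);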
I also tried to use @fracpete’s snippet, but the same error occurs. The files I used are the following:
- training set file
- testing set file
EDIT 3: This is how I compute the files:
public class FilesCreator {

    private File basicArff;
    private Instances totalData;
    private ArrayList<Instance> testingInstances;
    private File testingSet;
    private File trainingSet;

    /* *******************************************************************/

    public FilesCreator(File csvFile, File arffFile, File training, File testing) throws IOException {
        var loader = new CSVLoader();
        loader.setSource(csvFile);
        this.totalData = loader.getDataSet(); // get instances object
        this.basicArff = arffFile;
        this.testingSet = testing;
        this.trainingSet = training;
    }

    private ArrayList<Attribute> getAttributesList() {
        var attributes = new ArrayList<Attribute>();
        int i;
        for (i = 0; i < this.totalData.numAttributes(); i++)
            attributes.add(this.totalData.attribute(i));
        return attributes;
    }

    private void writeHeader(PrintWriter pf) {
        // just write the attributes in the given file.
        // pf wraps either this.testingSet or this.trainingSet
        pf.append("@relation " + this.totalData.relationName() + "\n\n");
        pf.flush();
        var attributes = this.getAttributesList();
        for (Attribute line : attributes) {
            pf.append(line.toString() + "\n");
            pf.flush();
        }
        pf.append("\n@data\n");
        pf.flush();
    }

    /* *******************************************************************/
    /* testing file */

    // testing instances
    private void computeTestingSet(int indexRelease) {
        int i;
        int currIndex;
        // re-initialize the list
        this.testingInstances = new ArrayList<>();
        for (i = 0; i < this.totalData.numInstances(); i++) {
            // first attribute is the release index
            currIndex = (int) this.totalData.instance(i).value(0);
            if (currIndex == indexRelease)
                testingInstances.add(this.totalData.instance(i));
            else if (currIndex > indexRelease)
                break;
        }
    }

    // testing file
    private void computeTestingFile(int indexRelease) {
        this.computeTestingSet(indexRelease);
        try (var fp = new PrintWriter(this.testingSet)) {
            this.writeHeader(fp);
            for (Instance line : this.testingInstances) {
                fp.append(line.toString() + "\n");
                fp.flush();
            }
        } catch (IOException e) {
            var logger = Logger.getLogger(FilesCreator.class.getName());
            logger.log(Level.OFF, Arrays.toString(e.getStackTrace()));
        }
    }

    /* *******************************************************************/

    // training file
    private void computeTrainingFile(int indexRelease) {
        int i;
        try (var fw = new FileWriter(this.trainingSet, true);
             var fp = new PrintWriter(fw)) {
            if (indexRelease == 1) {
                // first iteration: need the header.
                fp.print("");
                fp.flush();
                this.writeHeader(fp);
                for (i = 0; i < this.totalData.numInstances(); i++) {
                    if ((int) this.totalData.instance(i).value(0) > indexRelease)
                        break;
                    fp.append(this.totalData.instance(i).toString() + "\n");
                    fp.flush();
                }
            } else {
                // in this case just append the testing instances, which
                // are the (indexRelease+1)-th data:
                for (Instance obj : this.testingInstances) {
                    fp.append(obj.toString() + "\n");
                    fp.flush();
                }
            }
        } catch (IOException e) {
            var logger = Logger.getLogger(FilesCreator.class.getName());
            logger.log(Level.OFF, Arrays.toString(e.getStackTrace()));
        }
    }

    /* *******************************************************************/

    // public method
    public void computeFiles(int indexRelease) {
        this.computeTrainingFile(indexRelease);
        this.computeTestingFile(indexRelease + 1);
    }
}
The last public method is invoked inside a loop in another class, iterating from 1 to 19:
FilesCreator filesCreator = new FilesCreator(csvFile, arffFile, training, testing);
for (i = 1; i < 20; i++) {
    filesCreator.computeFiles(i);
    /* do something with files, such as getting Instances and using them for the SMOTE computation */
}
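The part marked “do something with files” is roughly where I load the two generated ARFF files back into Instances and run the SMOTE loop shown at the top of the question. A minimal sketch of that loading (assuming the class attribute is the last one) is:

// Rough sketch: load the files written by FilesCreator back into Instances.
var trainLoader = new ArffLoader();
trainLoader.setSource(training);                      // training File written by FilesCreator
Instances trainingDataset = trainLoader.getDataSet();
trainingDataset.setClassIndex(trainingDataset.numAttributes() - 1);

var testLoader = new ArffLoader();
testLoader.setSource(testing);                        // testing File written by FilesCreator
Instances testingDataset = testLoader.getDataSet();
testingDataset.setClassIndex(testingDataset.numAttributes() - 1);

// Then the FilteredClassifier + SMOTE + NaiveBayes evaluation shown above runs on these two sets.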
EDIT 4: I removed duplicated instances from totalData in FilesCreator by doing the following:
var currDir = Paths.get(".").toAbsolutePath().normalize().toFile();
var ext = ".arff";
var tmpFile = File.createTempFile("without_replicated", ext, currDir);
RemoveDuplicates.main(new String[]{"-i", this.basicArff.toPath().toString(),
                                   "-o", tmpFile.toPath().toString()});
// output file has effectively 0 repeated instances
var arffLoader = new ArffLoader();
arffLoader.setSource(tmpFile);
this.totalData = arffLoader.getDataSet();
Files.delete(tmpFile.toPath());
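(As a side note, I believe the same deduplication could be done in memory through the filter API, without the temporary file; an untested sketch, assuming the data is already loaded into this.totalData:)

// Sketch: apply the RemoveDuplicates filter directly to the loaded Instances.
// Note: setInputFormat and useFilter declare "throws Exception".
var removeDuplicates = new RemoveDuplicates();        // weka.filters.unsupervised.instance.RemoveDuplicates
removeDuplicates.setInputFormat(this.totalData);
this.totalData = Filter.useFilter(this.totalData, removeDuplicates);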
I cannot modify the ARFF file manually because it’s the output of a previous computation. The code now works for iteration 2, but gets the same error for iteration 3. The files for this iteration are:
- train_iteration4.arff
- test_iteration4.arff
This is the complete ARFF file obtained by the previous snippet, i.e. the one loaded by arffLoader.setSource(tmpFile);:
- full.arff
Answer
I solved the problem by changing the SMOTE dependency in my pom.xml to:
<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>SMOTE</artifactId>
    <version>1.0.3</version>
</dependency>
With this version I don’t have any problem and my code runs as expected. I hope this will help others.