
Deeplearning4j – how to iterate over multiple DataSets for large data?

I’m studying Deeplearning4j (ver. 1.0.0-M1.1) for building neural networks.

I use the IrisClassifier example from Deeplearning4j as a starting point, and it works fine.

For my project, the input is about 30,000 records (the Iris example has 150), and each record is a vector of about 7,000 values (the Iris example has 4).

Obviously, I can't process all of the data in one DataSet; it would cause an OutOfMemoryError in the JVM.
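A quick estimate shows why. With the approximate sizes above, a dense 30,000 × 7,000 feature matrix holds 210,000,000 values: about 840 MB as 32-bit floats, or 1.68 GB as 64-bit doubles, before counting labels or any temporary Java arrays used to build it. The plain-Java sketch below does only that arithmetic; the class and method names are illustrative.

```java
public class DatasetMemoryEstimate {

    /** Bytes for a dense rows x cols array with the given element size (no labels, no overhead). */
    static long denseBytes(long rows, long cols, long bytesPerElement) {
        return rows * cols * bytesPerElement;
    }

    public static void main(String[] args) {
        long records = 30_000; // approximate record count from the question
        long width = 7_000;    // approximate feature-vector length from the question
        long f32 = denseBytes(records, width, Float.BYTES);
        long f64 = denseBytes(records, width, Double.BYTES);
        System.out.printf("float32 features: %,d bytes (~%.2f GB)%n", f32, f32 / 1e9);
        System.out.printf("float64 features: %,d bytes (~%.2f GB)%n", f64, f64 / 1e9);
    }
}
```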

How can I process the data in multiple DataSets?

I assume it should be something like this: store the DataSets in a List and iterate over them.

But when I start evaluation, I get a NullPointerException.

What is the correct way to iterate over multiple DataSets when training a network?

Thanks!


Answer

First, always create ndarrays with Nd4j.create(..). Never instantiate an implementation class directly. Going through the factory lets you safely create ndarrays that work whether you run on CPUs or GPUs.
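A minimal sketch of building a DataSet only through the factory, assuming ND4J 1.0.0-M1.1 with a CPU or GPU backend on the classpath. The class name, method name, and toy values are hypothetical. Passing a float[][] gives a FLOAT array, so the data type is set by the factory instead of being left undefined.

```java
import java.util.Arrays;

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;

public class CreateWithFactory {

    /** Builds a tiny features/labels DataSet using only Nd4j factory methods. */
    static DataSet toyDataSet() {
        // Two records with three features each (toy values, not real data).
        float[][] features = {
                {0.1f, 0.2f, 0.3f},
                {0.4f, 0.5f, 0.6f}
        };
        // One-hot labels for two classes.
        float[][] labels = {
                {1f, 0f},
                {0f, 1f}
        };
        // Nd4j.create(..) returns an INDArray for whichever backend is configured,
        // so this code does not depend on a CPU- or GPU-specific implementation class.
        INDArray x = Nd4j.create(features);
        INDArray y = Nd4j.create(labels);
        return new DataSet(x, y);
    }

    public static void main(String[] args) {
        DataSet ds = toyDataSet();
        System.out.println(ds.getFeatures().dataType() + " " + Arrays.toString(ds.getFeatures().shape()));
    }
}
```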

Second, always use RecordReaderDataSetIterator's builder rather than its constructor. The constructor has a long parameter list and is error-prone.

That is why we added the builder in the first place.
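A sketch of the builder in use, assuming DataVec's CSVRecordReader and a CSV file whose last column is an integer class label. The helper names (iteratorFor, countBatches, toyCsv) and the toy rows are hypothetical. With five rows and a batch size of 2, one pass should yield three mini-batches (2, 2, 1).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class BuilderInsteadOfConstructor {

    /** Wraps a CSV file (features..., label) in a mini-batch iterator via the builder. */
    static DataSetIterator iteratorFor(File csv, int batchSize, int labelIndex, int numClasses) {
        try {
            RecordReader reader = new CSVRecordReader(0, ','); // no header lines, comma-delimited
            reader.initialize(new FileSplit(csv));
            // Named builder calls instead of a long positional constructor call.
            return new RecordReaderDataSetIterator.Builder(reader, batchSize)
                    .classification(labelIndex, numClasses)
                    .build();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Counts how many mini-batches the iterator yields in one pass. */
    static int countBatches(DataSetIterator it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    /** Writes a throwaway five-row CSV: two feature columns, then a class label (toy values). */
    static File toyCsv() {
        try {
            File csv = File.createTempFile("toy", ".csv");
            csv.deleteOnExit();
            Files.write(csv.toPath(), Arrays.asList(
                    "0.0,0.1,0", "0.1,0.0,0", "0.9,1.0,1", "1.0,0.9,1", "0.5,0.5,1"));
            return csv;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DataSetIterator it = iteratorFor(toyCsv(), 2, 2, 2);
        System.out.println("batches: " + countBatches(it)); // 5 rows in batches of 2
    }
}
```

Here labelIndex and numClasses are set through named calls. With the positional constructor, several arguments are plain ints, so swapping two of them compiles without complaint.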

Your NullPointerException isn't actually coming from where you think it is; it's due to how you're creating the ndarray. Without a data type or anything similar, it can't know what to expect. Nd4j.create(..) will set up the ndarray properly for you.

Beyond that, you are doing things the right way. The record reader handles the batching for you.
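A sketch of the training side, assuming DL4J 1.0.0-M1.1: MultiLayerNetwork.fit(DataSetIterator, int) makes one pass over the iterator per epoch, and evaluate(DataSetIterator) scores the test iterator batch by batch, so the full dataset is never loaded into a single DataSet. The tiny network, the toy CSV, and all helper names are hypothetical placeholders; for real data you would point the reader at your own file and set nIn to your vector length (about 7,000 here).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class TrainFromIterator {

    /** Trains for the given number of epochs, then evaluates, consuming mini-batches from the iterators. */
    static Evaluation trainAndEvaluate(MultiLayerNetwork model, DataSetIterator train,
                                       DataSetIterator test, int epochs) {
        model.fit(train, epochs); // one pass over the training iterator per epoch
        test.reset();             // start evaluation from the first mini-batch
        return model.evaluate(test);
    }

    /** Toy network; layer sizes are placeholders, not tuned values. */
    static MultiLayerNetwork toyModel(int nIn, int nClasses) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(123)
                .list()
                .layer(new DenseLayer.Builder().nIn(nIn).nOut(8).activation(Activation.RELU).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(8).nOut(nClasses).activation(Activation.SOFTMAX).build())
                .build();
        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
        return model;
    }

    /** Writes a tiny CSV (two features, then a label) and wraps it in a batched iterator. */
    static DataSetIterator toyIterator(int batchSize) {
        try {
            File csv = File.createTempFile("toy", ".csv");
            csv.deleteOnExit();
            Files.write(csv.toPath(), Arrays.asList(
                    "0.0,0.1,0", "0.1,0.0,0", "0.9,1.0,1", "1.0,0.9,1", "0.2,0.1,0", "0.8,0.9,1"));
            RecordReader reader = new CSVRecordReader(0, ',');
            reader.initialize(new FileSplit(csv));
            return new RecordReaderDataSetIterator.Builder(reader, batchSize)
                    .classification(2, 2) // label in column 2, two classes
                    .build();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Evaluation eval = trainAndEvaluate(toyModel(2, 2), toyIterator(2), toyIterator(2), 5);
        System.out.println(eval.stats());
    }
}
```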

User contributions licensed under: CC BY-SA