public class SparkComputationGraph extends Object implements Serializable
| Modifier and Type | Field and Description |
|---|---|
| static String | ACCUM_GRADIENT |
| static String | AVERAGE_EACH_ITERATION |
| static int | DEFAULT_EVAL_SCORE_BATCH_SIZE |
| static String | DIVIDE_ACCUM_GRADIENT |
| Constructor and Description |
|---|
| SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext javaSparkContext, ComputationGraph network) |
| SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext sparkContext, ComputationGraphConfiguration conf) |
| SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraph network) Instantiate a ComputationGraph instance with the given context and network. |
| SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraphConfiguration conf) |
| Modifier and Type | Method and Description |
|---|---|
| double | calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average) |
| double | calculateScoreDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average) |
| ComputationGraph | fit(String path, int labelIndex, org.canova.api.records.reader.RecordReader recordReader, int examplesPerFit, int totalExamples, int numPartitions) Train a ComputationGraph network based on data loaded from a text file + RecordReader. |
| ComputationGraph | fitDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> rdd, int examplesPerFit, int totalExamples, int numPartitions) DataSet version of fitMultiDataSet(JavaRDD, int, int, int). |
| ComputationGraph | fitDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd) Fit the dataset rdd. |
| ComputationGraph | fitMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd, int examplesPerFit, int totalExamples, int numPartitions) Fit the data, splitting into smaller data subsets if necessary. |
| ComputationGraph | getNetwork() |
| double | getScore() Gets the last (average) minibatch score from calling fit. |
| protected void | invokeListeners(ComputationGraph network, int iteration) |
| protected void | runIteration(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd) |
| <K> org.apache.spark.api.java.JavaPairRDD<K,Double> | scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms) Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. |
| <K> org.apache.spark.api.java.JavaPairRDD<K,Double> | scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize) Score the examples individually, using a specified batch size. |
| org.apache.spark.api.java.JavaDoubleRDD | scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms) Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. |
| org.apache.spark.api.java.JavaDoubleRDD | scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize) Score the examples individually, using a specified batch size. |
| <K> org.apache.spark.api.java.JavaPairRDD<K,Double> | scoreExamplesDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms) DataSet version of scoreExamples(JavaPairRDD, boolean) |
| <K> org.apache.spark.api.java.JavaPairRDD<K,Double> | scoreExamplesDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize) DataSet version of scoreExamples(JavaPairRDD, boolean, int) |
| org.apache.spark.api.java.JavaDoubleRDD | scoreExamplesDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms) DataSet version of scoreExamples(JavaRDD, boolean) |
| org.apache.spark.api.java.JavaDoubleRDD | scoreExamplesDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize) DataSet version of scoreExamples(JavaPairRDD, boolean, int) |
| void | setListeners(Collection<IterationListener> listeners) This method allows you to specify IterationListeners for this model. |
| void | setNetwork(ComputationGraph network) |
public static final int DEFAULT_EVAL_SCORE_BATCH_SIZE
public static final String AVERAGE_EACH_ITERATION
public static final String ACCUM_GRADIENT
public static final String DIVIDE_ACCUM_GRADIENT
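The field names suggest two ways of combining per-executor results during distributed training: averaging the parameters each iteration (AVERAGE_EACH_ITERATION), or accumulating gradients (ACCUM_GRADIENT), optionally dividing the accumulated gradient by the number of contributions (DIVIDE_ACCUM_GRADIENT). A rough local sketch of those two combination rules, with illustrative class and method names that are not part of the SparkComputationGraph API:

```java
public class CombineRules {
    // Parameter averaging: element-wise mean of each worker's parameter vector.
    static double[] averageParameters(double[][] workerParams) {
        int n = workerParams.length;
        double[] avg = new double[workerParams[0].length];
        for (double[] p : workerParams)
            for (int i = 0; i < p.length; i++)
                avg[i] += p[i] / n;
        return avg;
    }

    // Gradient accumulation: element-wise sum of each worker's gradient,
    // optionally divided by the number of contributions.
    static double[] accumulateGradients(double[][] workerGrads, boolean divide) {
        int n = workerGrads.length;
        double[] acc = new double[workerGrads[0].length];
        for (double[] g : workerGrads)
            for (int i = 0; i < g.length; i++)
                acc[i] += g[i];
        if (divide)
            for (int i = 0; i < acc.length; i++)
                acc[i] /= n;
        return acc;
    }

    public static void main(String[] args) {
        double[][] w = {{1.0, 2.0}, {3.0, 4.0}};
        System.out.println(java.util.Arrays.toString(averageParameters(w)));
        System.out.println(java.util.Arrays.toString(accumulateGradients(w, false)));
    }
}
```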
public SparkComputationGraph(org.apache.spark.SparkContext sparkContext,
                             ComputationGraph network)
Instantiate a ComputationGraph instance with the given context and network.
Parameters:
sparkContext - the spark context to use
network - the network to use

public SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext javaSparkContext,
                             ComputationGraph network)

public SparkComputationGraph(org.apache.spark.SparkContext sparkContext,
                             ComputationGraphConfiguration conf)

public SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext sparkContext,
                             ComputationGraphConfiguration conf)
public ComputationGraph fit(String path,
                            int labelIndex,
                            org.canova.api.records.reader.RecordReader recordReader,
                            int examplesPerFit,
                            int totalExamples,
                            int numPartitions)
Train a ComputationGraph network based on data loaded from a text file + RecordReader. This method splits the data into approximately examplesPerFit-sized splits and trains on each split, one after the other. See fitDataSet(JavaRDD, int, int, int) for further details.
Parameters:
path - the path to the text file
labelIndex - the label index
recordReader - the record reader to parse results
examplesPerFit - number of examples to fit on at each iteration
totalExamples - total number of examples
numPartitions - number of partitions; usually set to the number of executors
See Also:
MultiLayerNetwork

public ComputationGraph getNetwork()
public void setNetwork(ComputationGraph network)
public ComputationGraph fitMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd, int examplesPerFit, int totalExamples, int numPartitions)
Fit the data, splitting into smaller data subsets if necessary. This allows large JavaRDD<DataSet>s to be trained as a set of smaller steps instead of all together. Training proceeds as: fit on examplesPerFit examples -> average parameters -> fit on the next examplesPerFit examples -> average parameters, etc., until the entire data set has been processed.
For example, suppose examplesPerFit=1000, with rdd.count()=1200. Then we round up to 2000 examples, and the network will be fit in two steps (as 2000/1000=2), with 1200/2=600 examples at each step. These 600 examples will then be distributed approximately equally (no guarantees) amongst each executor/core for training.
Parameters:
rdd - data to train on
examplesPerFit - number of examples to learn on (between averaging) across all executors. For example, if set to 1000 and rdd.count() == 10k, then we do 10 sets of learning, each on 1000 examples. To use all examples, set examplesPerFit to Integer.MAX_VALUE
totalExamples - total number of examples in the data RDD
numPartitions - number of partitions to divide the data into

public ComputationGraph fitDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> rdd, int examplesPerFit, int totalExamples, int numPartitions)
DataSet version of fitMultiDataSet(JavaRDD, int, int, int). Handles conversion from DataSet to MultiDataSet internally.

public ComputationGraph fitDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd)
Fit the dataset rdd.
Parameters:
rdd - the rdd to fit

protected void runIteration(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd)
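The splitting arithmetic described for fitMultiDataSet can be sketched in plain Java. The helper class below is hypothetical (not part of the API); it only reproduces the worked example: round the example count up to the next multiple of examplesPerFit to get the number of fit/averaging steps, then divide the real example count by that step count.

```java
public class FitSplitMath {
    // Number of fit/averaging steps: e.g. 1200 examples with
    // examplesPerFit=1000 rounds up to 2000, giving 2000/1000 = 2 steps.
    static int numSteps(int totalExamples, int examplesPerFit) {
        return (totalExamples + examplesPerFit - 1) / examplesPerFit;
    }

    // Examples actually trained on per step: 1200/2 = 600 in the example above.
    static int examplesPerStep(int totalExamples, int examplesPerFit) {
        return totalExamples / numSteps(totalExamples, examplesPerFit);
    }

    public static void main(String[] args) {
        System.out.println(numSteps(1200, 1000));        // 2
        System.out.println(examplesPerStep(1200, 1000)); // 600
    }
}
```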
public void setListeners(@NonNull Collection<IterationListener> listeners)
This method allows you to specify IterationListeners for this model.
Parameters:
listeners -

protected void invokeListeners(ComputationGraph network, int iteration)

public double getScore()
Gets the last (average) minibatch score from calling fit.
public double calculateScoreDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data,
boolean average)
public double calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data,
boolean average)
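The scoreExamples overloads below that take a JavaPairRDD pair each example with a caller-supplied key of type K, so per-example scores can be joined back to their source records. Stripped of Spark, the idea reduces to mapping each keyed example through a score function while preserving the keys; the example data and the "sum of values" score function here are purely illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

public class KeyedScoring {
    // Score each keyed example individually, keeping the key so callers can
    // match scores back to the examples that produced them.
    static <K, E> Map<K, Double> scoreExamples(Map<K, E> examples,
                                               ToDoubleFunction<E> scoreFn) {
        Map<K, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<K, E> e : examples.entrySet())
            scores.put(e.getKey(), scoreFn.applyAsDouble(e.getValue()));
        return scores;
    }

    public static void main(String[] args) {
        Map<String, double[]> data = new LinkedHashMap<>();
        data.put("rec-1", new double[]{1.0, 2.0});
        data.put("rec-2", new double[]{3.0, 4.0});
        // Hypothetical "score": sum of the example's values.
        Map<String, Double> s = scoreExamples(data,
                v -> { double t = 0; for (double x : v) t += x; return t; });
        System.out.println(s);
    }
}
```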
public org.apache.spark.api.java.JavaDoubleRDD scoreExamplesDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms)
DataSet version of scoreExamples(JavaRDD, boolean)

public org.apache.spark.api.java.JavaDoubleRDD scoreExamplesDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize)
DataSet version of scoreExamples(JavaPairRDD, boolean, int)

public <K> org.apache.spark.api.java.JavaPairRDD<K,Double> scoreExamplesDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms)
DataSet version of scoreExamples(JavaPairRDD, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,Double> scoreExamplesDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize)
DataSet version of scoreExamples(JavaPairRDD, boolean, int)

public org.apache.spark.api.java.JavaDoubleRDD scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms)
Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.
Parameters:
data - data to score
includeRegularizationTerms - if true: include the l1/l2 regularization terms with the score (if any)
See Also:
ComputationGraph.scoreExamples(MultiDataSet, boolean)

public org.apache.spark.api.java.JavaDoubleRDD scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize)
Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.
Parameters:
data - data to score
includeRegularizationTerms - if true: include the l1/l2 regularization terms with the score (if any)
batchSize - batch size to use when doing scoring
See Also:
ComputationGraph.scoreExamples(MultiDataSet, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,Double> scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms)
Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately.
Type Parameters:
K - key type
Parameters:
data - data to score
includeRegularizationTerms - if true: include the l1/l2 regularization terms with the score (if any)
Returns:
JavaPairRDD<K,Double> containing the scores of each example
See Also:
MultiLayerNetwork.scoreExamples(DataSet, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,Double> scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize)
Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately.
Type Parameters:
K - key type
Parameters:
data - data to score
includeRegularizationTerms - if true: include the l1/l2 regularization terms with the score (if any)
batchSize - batch size to use when doing scoring
Returns:
JavaPairRDD<K,Double> containing the scores of each example
See Also:
MultiLayerNetwork.scoreExamples(DataSet, boolean)

Copyright © 2016. All Rights Reserved.