This class dispatches columns from a DataFrame to Kafka clusters in Avro format.
The connection and dispatching of data are controlled by Spark.
The conversion to Avro is performed by ABRiS.
Specs:
1. The settings used to communicate with the Kafka cluster must be defined in a java.util.Properties file, whose path must
be passed as a parameter to the dispatcher method.
2. Two settings are mandatory: 'kafka.bootstrap.servers' and 'topic'.
3. Optional settings (e.g. SSL credentials) must be prefixed with 'option.' (e.g. option.kafka.security.protocol=SSL).
4. The DataFrame must have a schema; otherwise there is no way to convert its rows into Avro records.
5. Spark operations (e.g. DataFrame caching) must be performed outside this component.
6. No exception will be thrown if the properties file is invalid or if properties are misnamed. This is a functional
requirement: errors in the Kafka configuration must not break anything else (i.e. the whole job). HOWEVER, problems
related to Spark operations (e.g. wrong column names) are not covered by this guarantee and may throw, since they indicate
wrong assumptions about the system on the user's side and, as such, must be uncovered as soon as possible.
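Under these conventions, a minimal properties file might look like the following (broker addresses, topic name, and file paths are placeholders, not values from the real system):

```properties
# Mandatory settings
kafka.bootstrap.servers=broker1:9092,broker2:9092
topic=payments.avro

# Optional Kafka settings carry the 'option.' prefix
option.kafka.security.protocol=SSL
option.kafka.ssl.truststore.location=/etc/kafka/truststore.jks
```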
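The configuration rules above (mandatory keys, the 'option.' prefix, and the swallow-config-errors requirement of spec 6) can be sketched as a small settings reader. This is a hypothetical helper for illustration only; the class name `KafkaSettingsReader`, the method `readSettings`, and its return shape are assumptions, not the dispatcher's actual API.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the configuration conventions described above;
// the real dispatcher's reader may differ.
public class KafkaSettingsReader {

    private static final String OPTION_PREFIX = "option.";

    // Parses the properties text and returns the effective Kafka settings:
    // the two mandatory keys plus all 'option.'-prefixed entries with the
    // prefix stripped. Returns an empty map on any configuration error,
    // per spec 6 (bad Kafka configuration must never break the job).
    public static Map<String, String> readSettings(String propsText) {
        try {
            Properties props = new Properties();
            props.load(new StringReader(propsText));

            String servers = props.getProperty("kafka.bootstrap.servers");
            String topic = props.getProperty("topic");
            if (servers == null || topic == null) {
                return Collections.emptyMap(); // mandatory key missing
            }

            Map<String, String> settings = new HashMap<>();
            settings.put("kafka.bootstrap.servers", servers);
            settings.put("topic", topic);
            for (String name : props.stringPropertyNames()) {
                if (name.startsWith(OPTION_PREFIX)) {
                    settings.put(name.substring(OPTION_PREFIX.length()),
                                 props.getProperty(name));
                }
            }
            return settings;
        } catch (Exception e) {
            return Collections.emptyMap(); // swallow config errors (spec 6)
        }
    }
}
```

Note that spec 6 is deliberately asymmetric: configuration problems degrade to a silent no-op, while Spark-side mistakes (e.g. a wrong column name) are allowed to surface as exceptions.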