Dispatches columns from a Dataframe to a Kafka broker in Avro format. It also sends files to the executors and generates Avro schemas, if so specified in the .properties configuration file. It WILL NOT throw if something goes wrong, so the logs must be checked to find errors.
This class dispatches columns from a Dataframe to Kafka clusters in Avro format. The connection and dispatching of data are controlled by Spark. The conversion to Avro is performed by ABRiS.
Specs:
1. The settings used to communicate with the Kafka cluster must be defined in a java.util.Properties file, whose path must be passed as a parameter to the dispatcher method.
2. There are two mandatory parameters: 'kafka.bootstrap.servers' and 'topic'.
3. Optional settings (e.g. SSL credentials) must be prefixed with 'option.' (e.g. option.kafka.security.protocol=SSL).
4. The Dataframe must contain a schema, otherwise there is no way to convert it into an Avro record.
5. Spark operations (e.g. Dataframe caching) must be performed outside this component.
6. No exception is thrown if the properties file is invalid or if properties are misnamed. It is a functional requirement that nothing else (i.e. the whole job) breaks because of errors in the Kafka configuration. HOWEVER, problems related to Spark operations (e.g. wrong column names) are not covered by this guarantee and may throw, since they indicate wrong assumptions about the system on the user's side and, as such, must be uncovered as soon as possible.
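
The configuration rules above (two mandatory keys, 'option.'-prefixed extras, and no exception on bad input) can be illustrated with a minimal sketch. It uses only java.util and does not touch Spark or ABRiS. The class name KafkaSettings and its methods are hypothetical and are not part of this component's API. Stripping the 'option.' prefix before handing the settings on is an assumption about how the dispatcher treats optional settings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical helper, for illustration only; not the dispatcher's real API. */
final class KafkaSettings {
    static final String SERVERS_KEY = "kafka.bootstrap.servers";
    static final String TOPIC_KEY = "topic";
    static final String OPTION_PREFIX = "option.";

    final String servers;
    final String topic;
    final Map<String, String> options;

    private KafkaSettings(String servers, String topic, Map<String, String> options) {
        this.servers = servers;
        this.topic = topic;
        this.options = options;
    }

    /** Parses properties text; logs and returns whatever was read instead of throwing. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException | IllegalArgumentException e) {
            System.err.println("Could not read Kafka properties: " + e.getMessage());
        }
        return props;
    }

    /** Returns empty (and logs) when a mandatory key is missing or blank. */
    static Optional<KafkaSettings> from(Properties props) {
        String servers = props.getProperty(SERVERS_KEY);
        String topic = props.getProperty(TOPIC_KEY);
        if (isBlank(servers) || isBlank(topic)) {
            System.err.println("Missing mandatory setting: "
                    + SERVERS_KEY + " and " + TOPIC_KEY + " are both required");
            return Optional.empty();
        }
        // Assumption: optional settings are forwarded with the 'option.' prefix removed.
        Map<String, String> options = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(OPTION_PREFIX)) {
                options.put(name.substring(OPTION_PREFIX.length()), props.getProperty(name));
            }
        }
        return Optional.of(new KafkaSettings(servers.trim(), topic.trim(), options));
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

Under these assumptions, a file containing kafka.bootstrap.servers, topic and option.kafka.security.protocol=SSL would yield one optional setting keyed 'kafka.security.protocol', while a file lacking 'topic' would produce an empty result and a log line rather than an exception.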