JSON Format

To use the JSON format you need to add the Flink JSON dependency to your project:

  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-json</artifactId>
    <version>1.16.0</version>
    <scope>provided</scope>
  </dependency>

PyFlink users can use the JSON format directly in their jobs.

Flink supports reading and writing JSON records via the JsonDeserializationSchema and JsonSerializationSchema. Both use the Jackson library and support any type that Jackson supports, including, but not limited to, POJOs and ObjectNode.

The JsonDeserializationSchema can be used with any connector that supports the DeserializationSchema.

For example, this is how you use it with a KafkaSource to deserialize a POJO:

  JsonDeserializationSchema<SomePojo> jsonFormat = new JsonDeserializationSchema<>(SomePojo.class);
  KafkaSource<SomePojo> source =
      KafkaSource.<SomePojo>builder()
          .setValueOnlyDeserializer(jsonFormat)
          ...

The JsonSerializationSchema can be used with any connector that supports the SerializationSchema.

For example, this is how you use it with a KafkaSink to serialize a POJO:

  JsonSerializationSchema<SomePojo> jsonFormat = new JsonSerializationSchema<>();
  KafkaSink<SomePojo> sink =
      KafkaSink.<SomePojo>builder()
          .setRecordSerializer(
              new KafkaRecordSerializationSchemaBuilder<>()
                  .setValueSerializationSchema(jsonFormat)
                  ...

Custom Mapper

Both schemas have constructors that accept a SerializableSupplier<ObjectMapper>, which acts as a factory for object mappers. With this factory you gain full control over the created mapper: you can enable or disable Jackson features, or register modules to extend the set of supported types or add other functionality.

  JsonSerializationSchema<SomeClass> jsonFormat = new JsonSerializationSchema<>(
      () -> new ObjectMapper()
          .enable(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS)
          .registerModule(new ParameterNamesModule()));

Python

PyFlink provides the built-in JsonRowSerializationSchema and JsonRowDeserializationSchema for the Row type. For example, to use them with KafkaSource and KafkaSink:

  # Source: deserialize JSON records into rows.
  row_type_info = Types.ROW_NAMED(['name', 'age'], [Types.STRING(), Types.INT()])
  json_format = JsonRowDeserializationSchema.builder().type_info(row_type_info).build()
  source = KafkaSource.builder() \
      .set_value_only_deserializer(json_format) \
      .build()

  # Sink: serialize rows into JSON records.
  row_type_info = Types.ROW_NAMED(['name', 'age'], [Types.STRING(), Types.INT()])
  json_format = JsonRowSerializationSchema.builder().with_type_info(row_type_info).build()
  sink = KafkaSink.builder() \
      .set_record_serializer(
          KafkaRecordSerializationSchema.builder()
              .set_topic('test')
              .set_value_serialization_schema(json_format)
              .build()
      ) \
      .build()
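
To make the mapping concrete, the following sketch shows the shape of a JSON record that corresponds to the name/age row type used above: one JSON object per record, with keys matching the row's field names. It uses only Python's standard json module rather than PyFlink, and the sample values ("Alice", 30) are hypothetical, so treat it as an illustration of the data layout and not of the schemas' internals.

```python
import json

# Hypothetical sample record; the field names and types mirror the row type
# above: 'name' is a string and 'age' is an integer.
record = {"name": "Alice", "age": 30}

# One JSON object per record, with keys matching the row's field names.
payload = json.dumps(record)
print(payload)

# Parsing the payload back yields the same fields with the same types,
# which is the shape a row with fields (name, age) maps to.
parsed = json.loads(payload)
assert isinstance(parsed["name"], str)
assert isinstance(parsed["age"], int)
```

Messages with this layout on the Kafka topic line up with the field names declared in Types.ROW_NAMED.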