Flink 1.12 Serialization

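I. Tuples and Case Classes
For Java, Tuples are classes that ship with Flink. For Scala, Flink provides no Tuple-like classes of its own, because Scala already has a built-in construct for this purpose: the case class.
This section focuses on the Java Tuples. The Java API provides Tuple1 up to Tuple25. Each field of a tuple can be any Flink type, and the number 1 to 25 in the class name is the number of fields.
Tuple1<T0> t1;
Tuple2<T0, T1> t2;
Tuple3<T0, T1, T2> t3;
Flink offers two convenient ways to read the data in a Tuple: tuple.getField(int position), where the field index starts at 0, or the public fields tuple.fN, where N is the field number and also starts at 0.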

For example:
Tuple3<String, String, Integer> t3 = new Tuple3<>("Zhang San", "male", 20);
The age can be read in either of two ways:
  1. t3.f2
  2. t3.getField(2)
1. Scala version
case class WordCount(word: String, count: Int)
val input = env.fromElements(WordCount("hello", 1), WordCount("world", 2)) // Case Class Data Set

2. Java version
DataStream<Tuple2<String, Integer>> wordCounts = env.fromElements(
    new Tuple2<String, Integer>("hello", 1),
    new Tuple2<String, Integer>("world", 2));

wordCounts.map(new MapFunction<Tuple2<String, Integer>, Integer>() {
    @Override
    public Integer map(Tuple2<String, Integer> value) throws Exception {
        return value.f1;
    }
});

Now let's look at the Java Tuple class. Tuple is an abstract base class (not an interface) that implements Java's serialization interface, and Tuple2, Tuple3, … Tuple25 are the concrete classes that extend it:
public abstract class Tuple implements java.io.Serializable
Here is the source code of Tuple2:
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// --------------------------------------------------------------
//  THIS IS A GENERATED SOURCE FILE. DO NOT EDIT!
//  GENERATED FROM org.apache.flink.api.java.tuple.TupleGenerator.
// --------------------------------------------------------------

package org.apache.flink.api.java.tuple;

import org.apache.flink.annotation.Public;
import org.apache.flink.util.StringUtils;

/**
 * A tuple with 2 fields. Tuples are strongly typed; each field may be of a separate type. The
 * fields of the tuple can be accessed directly as public fields (f0, f1, ...) or via their position
 * through the {@link #getField(int)} method. The tuple field positions start at zero.
 *
 * Tuples are mutable types, meaning that their fields can be re-assigned. This allows functions
 * that work with Tuples to reuse objects in order to reduce pressure on the garbage collector.
 *
 * Warning: If you subclass Tuple2, then be sure to either
 *
 * <ul>
 *   <li>not add any new fields, or
 *   <li>make it a POJO, and always declare the element type of your DataStreams/DataSets to your
 *       descendant type. (That is, if you have a "class Foo extends Tuple2", then don't use
 *       instances of Foo in a DataStream<Tuple2> / DataSet<Tuple2>, but declare it as
 *       DataStream<Foo> / DataSet<Foo>.)
 * </ul>
 *
 * @see Tuple
 * @param <T0> The type of field 0
 * @param <T1> The type of field 1
 */
@Public
public class Tuple2<T0, T1> extends Tuple {

    private static final long serialVersionUID = 1L;

    /** Field 0 of the tuple. */
    public T0 f0;
    /** Field 1 of the tuple. */
    public T1 f1;

    /** Creates a new tuple where all fields are null. */
    public Tuple2() {}

    /**
     * Creates a new tuple and assigns the given values to the tuple's fields.
     *
     * @param f0 The value for field 0
     * @param f1 The value for field 1
     */
    public Tuple2(T0 f0, T1 f1) {
        this.f0 = f0;
        this.f1 = f1;
    }

    @Override
    public int getArity() {
        return 2;
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T getField(int pos) {
        switch (pos) {
            case 0:
                return (T) this.f0;
            case 1:
                return (T) this.f1;
            default:
                throw new IndexOutOfBoundsException(String.valueOf(pos));
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> void setField(T value, int pos) {
        switch (pos) {
            case 0:
                this.f0 = (T0) value;
                break;
            case 1:
                this.f1 = (T1) value;
                break;
            default:
                throw new IndexOutOfBoundsException(String.valueOf(pos));
        }
    }

    /**
     * Sets new values to all fields of the tuple.
     *
     * @param f0 The value for field 0
     * @param f1 The value for field 1
     */
    public void setFields(T0 f0, T1 f1) {
        this.f0 = f0;
        this.f1 = f1;
    }

    /**
     * Returns a shallow copy of the tuple with swapped values.
     *
     * @return shallow copy of the tuple with swapped values
     */
    public Tuple2<T1, T0> swap() {
        return new Tuple2<T1, T0>(f1, f0);
    }

    // -------------------------------------------------------------------------------------------------
    // standard utilities
    // -------------------------------------------------------------------------------------------------

    /**
     * Creates a string representation of the tuple in the form (f0, f1), where the individual
     * fields are the value returned by calling {@link Object#toString} on that field.
     *
     * @return The string representation of the tuple.
     */
    @Override
    public String toString() {
        return "("
                + StringUtils.arrayAwareToString(this.f0)
                + ","
                + StringUtils.arrayAwareToString(this.f1)
                + ")";
    }

    /**
     * Deep equality for tuples by calling equals() on the tuple members.
     *
     * @param o the object checked for equality
     * @return true if this is equal to o.
     */
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Tuple2)) {
            return false;
        }
        @SuppressWarnings("rawtypes")
        Tuple2 tuple = (Tuple2) o;
        if (f0 != null ? !f0.equals(tuple.f0) : tuple.f0 != null) {
            return false;
        }
        if (f1 != null ? !f1.equals(tuple.f1) : tuple.f1 != null) {
            return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        int result = f0 != null ? f0.hashCode() : 0;
        result = 31 * result + (f1 != null ? f1.hashCode() : 0);
        return result;
    }

    /**
     * Shallow tuple copy.
     *
     * @return A new Tuple with the same fields as this.
     */
    @Override
    @SuppressWarnings("unchecked")
    public Tuple2<T0, T1> copy() {
        return new Tuple2<>(this.f0, this.f1);
    }

    /**
     * Creates a new tuple and assigns the given values to the tuple's fields. This is more
     * convenient than using the constructor, because the compiler can infer the generic type
     * arguments implicitly. For example: {@code Tuple3.of(n, x, s)} instead of {@code new
     * Tuple3<Integer, Double, String>(n, x, s)}
     */
    public static <T0, T1> Tuple2<T0, T1> of(T0 f0, T1 f1) {
        return new Tuple2<>(f0, f1);
    }
}
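So a Tuple object can itself be serialized with plain Java serialization, since it implements java.io.Serializable. Note, however, that this is not how Flink writes tuple records when it moves data between operators: for that it uses its own TupleSerializer, which serializes a tuple field by field with the serializer of each field type, and no external serialization framework is involved.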
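Because Tuple implements java.io.Serializable, a tuple-like object can be round-tripped with ObjectOutputStream and ObjectInputStream. The sketch below shows that round trip and, for comparison, a manual field-by-field encoding with DataOutputStream. It uses only the JDK: Pair is a hypothetical stand-in for a two-field tuple, not Flink's Tuple2, and the class and method names are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationSketch {

    /** Hypothetical stand-in for a two-field tuple; this is NOT Flink's Tuple2. */
    public static class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        public String f0;
        public int f1;

        public Pair(String f0, int f1) {
            this.f0 = f0;
            this.f1 = f1;
        }
    }

    /** Writes the object with standard Java serialization (includes class metadata). */
    public static byte[] javaSerialize(Pair p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the object back with standard Java serialization. */
    public static Pair javaDeserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Pair) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes only the two field values, one after the other, without class metadata. */
    public static byte[] fieldByField(Pair p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(p.f0); // 2-byte length prefix followed by the encoded string
            out.writeInt(p.f1); // 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Pair original = new Pair("hello", 1);
        byte[] javaBytes = javaSerialize(original);
        Pair copy = javaDeserialize(javaBytes);
        System.out.println("round trip: (" + copy.f0 + "," + copy.f1 + ")");
        System.out.println("Java serialization: " + javaBytes.length + " bytes");
        System.out.println("field by field: " + fieldByField(original).length + " bytes");
    }
}
```

The Java-serialized form is noticeably larger because it also carries a stream header and a class descriptor, which is one reason a framework may prefer dedicated field-by-field serializers for records.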
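II. Java or Scala classes that follow the rules below (POJOs)
An ordinary class must meet these requirements:
  1. The class must be public.
  2. It must have a default constructor that takes no arguments.
  3. Its fields must be public, or be accessible through get/set methods.
  4. The type of each field must be supported by a registered serializer.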
Here is some example code:
public class WordWithCount {
    public String word;
    public int count;

    public WordWithCount() {}

    public WordWithCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

DataStream<WordWithCount> wordCounts = env.fromElements(
    new WordWithCount("hello", 1),
    new WordWithCount("world", 2));

wordCounts.keyBy(value -> value.word);

And here is the Scala version:
class WordWithCount(var word: String, var count: Int) {
    // auxiliary constructor without parameters
    def this() {
        this(null, -1)
    }
}

val input = env.fromElements(
    // these calls use the primary constructor directly; see my other articles for more on Scala constructors
    new WordWithCount("hello", 1),
    new WordWithCount("world", 2)) // Case Class Data Set

input.keyBy(_.word)

Now for how it works. For an ordinary class that you define yourself, Flink first inspects the class. For rule 1, for example, it checks whether the class is declared public -> Modifier.isPublic([class].getModifiers()). If the inspection finds that the class satisfies all four rules above, Flink handles the class with the PojoSerializer. Here is the inheritance chain:
public final class PojoSerializer<T> extends TypeSerializer<T> {…}
public abstract class TypeSerializer<T> implements Serializable {…}
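As the declarations show, the serializer object itself is java.io.Serializable, so Flink can distribute the serializer with Java serialization. The records, though, are written by the serializer's own serialize method rather than by Java serialization. TypeSerializer is an abstract base class (not an interface), and essentially every serializer class, PojoSerializer included, extends it. Some of the classes that extend TypeSerializer:
[Figure: classes that extend TypeSerializer]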
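The rule check described above (public class, no-argument constructor, accessible fields) can be approximated with plain reflection. The sketch below is a simplified illustration under stated assumptions, not Flink's actual type-extraction code: the class PojoRuleCheck, its methods, and the two negative example classes are hypothetical, rule 4 (serializer support for each field type) is not checked, and only getX/setX accessor names are recognized.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class PojoRuleCheck {

    /** Returns true if the class passes simplified versions of rules 1 to 3. */
    public static boolean looksLikePojo(Class<?> clazz) {
        // Rule 1: the class must be public.
        if (!Modifier.isPublic(clazz.getModifiers())) {
            return false;
        }
        // Rule 2: there must be a public constructor without parameters.
        try {
            clazz.getConstructor();
        } catch (NoSuchMethodException e) {
            return false;
        }
        // Rule 3: every instance field is public or has a getter and a setter.
        for (Field field : clazz.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic()) {
                continue; // not part of the serialized state in this sketch
            }
            if (!Modifier.isPublic(mods) && !hasGetterAndSetter(clazz, field)) {
                return false;
            }
        }
        return true;
    }

    /** Looks for public getX() and setX(type) methods matching the field name. */
    private static boolean hasGetterAndSetter(Class<?> clazz, Field field) {
        String name = field.getName();
        String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            clazz.getMethod("get" + suffix);
            clazz.getMethod("set" + suffix, field.getType());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Satisfies rules 1 to 3: public class, no-arg constructor, public fields. */
    public static class WordWithCount {
        public String word;
        public int count;

        public WordWithCount() {}

        public WordWithCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    /** Hypothetical counterexample: violates rule 2 (no no-arg constructor). */
    public static class NoDefaultConstructor {
        public String word;

        public NoDefaultConstructor(String word) {
            this.word = word;
        }
    }

    /** Hypothetical counterexample: violates rule 3 (private field, no accessors). */
    public static class HiddenField {
        private int secret;

        public HiddenField() {}
    }

    public static void main(String[] args) {
        System.out.println(looksLikePojo(WordWithCount.class));        // true
        System.out.println(looksLikePojo(NoDefaultConstructor.class)); // false
        System.out.println(looksLikePojo(HiddenField.class));          // false
    }
}
```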

If the inspection finds that the class does not satisfy the four rules above, Flink falls back to the KryoSerializer shown in the figure above, which is built on the Kryo framework. Opening the KryoSerializer class shows the following comment:
A type serializer that serializes its type using the Kryo serialization framework (https://github.com/EsotericSoftware/kryo).
This serializer is intended as a fallback serializer for the cases that are not covered by the basic types, tuples, and POJOs.
Type parameters:
<T>: The type to be serialized.
public class KryoSerializer<T> extends TypeSerializer<T> {… code omitted}
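III. Primitive Types
Flink supports all Java and Scala primitive types, for example Integer, String, and Double.
IV. General Class Types
If a Java or Scala class does not follow the rules described in section II, Flink serializes it in a uniform, generic way, and the serialization framework used for this is Kryo.
V. Flink's built-in Value types
You need to implement the read and write methods of the org.apache.flink.types.Value interface. and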