This is a complete example of using Protobuf with PySpark.
- Write the proto file
- Compile the proto file into a descriptor file, which is what Spark reads
# macOS: brew install protobuf (the apt-get commands below are for Debian/Ubuntu)
sudo apt-get update
sudo apt-get install -y protobuf-compiler
protoc --version
protoc \
--proto_path=. \
--include_imports \
--descriptor_set_out=person.desc \
person.proto
You don't need the generated pb2 module to parse the data in Spark; the descriptor file is enough.
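The parsing step could look roughly like the sketch below. It relies on `from_protobuf` from `pyspark.sql.protobuf.functions` (available since Spark 3.4), which also requires the `org.apache.spark:spark-protobuf_2.12` package matching your Spark version to be on the classpath. The helper name `parse_person_column`, the message name `Person`, and the binary column name `value` are assumptions for illustration; use whatever your person.proto and input data actually define.

```python
from pathlib import Path


def parse_person_column(df, binary_col="value", message_name="Person",
                        desc_path="person.desc"):
    """Decode a binary Protobuf column into a struct column named 'person'.

    Assumptions (not from the original notes): the message is called
    'Person', the raw bytes live in a column called 'value', and the
    descriptor file is on the local filesystem.
    """
    desc_file = Path(desc_path)
    # Fail early with a clear error if the descriptor was never compiled.
    if not desc_file.is_file():
        raise FileNotFoundError(f"descriptor file not found: {desc_file}")

    # Imported here so the helper can be loaded without Spark installed.
    from pyspark.sql.protobuf.functions import from_protobuf

    # Only the descriptor file is needed; no generated pb2 module is imported.
    decoded = from_protobuf(binary_col, message_name, descFilePath=str(desc_file))
    return df.select(decoded.alias("person"))


# Hypothetical usage, given a DataFrame `raw_df` with a binary `value` column:
#   people = parse_person_column(raw_df)
#   people.select("person.*").show()
```

Checking for the descriptor file before calling Spark is a design choice of this sketch, not a requirement of the API; it only makes a missing person.desc easier to diagnose.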