Fabrizio Lazzaretti

Talk: Bringing Avro and AsyncAPI Together - Pitfalls and Learnings

Abstract

AsyncAPI is the de facto standard for describing events in asynchronous communication and supports various payload formats, including JSON, XML, and Avro. Apache Avro, a binary serialization format that originated as a Hadoop sub-project, offers powerful schema evolution capabilities that are critical for event-driven architectures.

This presentation shares practical experiences and lessons learned from integrating Avro schemas into AsyncAPI specifications. We explore the challenges of combining these technologies, including handling optional fields, managing examples in the documentation, dealing with the separate versioning of AsyncAPI documents and Avro schemas, and extracting schemas for use with schema registries. The talk addresses common pitfalls such as the differences between AsyncAPI’s “description” and Avro’s “doc” fields, version compatibility issues, and the complexities of linking schemas by reference versus embedding them inline.
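A minimal sketch of the schema-extraction step, assuming an AsyncAPI 3.0 document that has already been parsed into a Python dict and whose message payload embeds an Avro schema through a Multi Format Schema Object (`schemaFormat` plus `schema`). The document content and the helper names (`extract_avro_schemas`, `description_to_doc`, `is_optional`) are hypothetical, and renaming an AsyncAPI-style `description` to Avro's `doc` is one possible normalization, not something either specification prescribes. The resulting JSON string is what one would hand to a schema registry; the registration call itself is left out.

```python
import copy
import json

# Hypothetical AsyncAPI 3.0 document, already parsed from YAML/JSON into a dict.
# The payload embeds an Avro schema via a Multi Format Schema Object.
ASYNCAPI_DOC = {
    "asyncapi": "3.0.0",
    "info": {"title": "Orders", "version": "1.0.0"},
    "components": {
        "messages": {
            "OrderCreated": {
                "payload": {
                    "schemaFormat": "application/vnd.apache.avro+json;version=1.9.0",
                    "schema": {
                        "type": "record",
                        "name": "OrderCreated",
                        "namespace": "com.example.orders",
                        "fields": [
                            # "description" is AsyncAPI-style; Avro itself uses "doc".
                            {"name": "orderId", "type": "string",
                             "description": "Business key of the order"},
                            # Common Avro "optional" idiom: null first, default null.
                            {"name": "comment", "type": ["null", "string"], "default": None},
                        ],
                    },
                }
            }
        }
    },
}


def description_to_doc(schema):
    """Return a copy of an Avro schema with 'description' keys renamed to 'doc'.

    An explicit 'doc' wins over 'description'. 'default' values are copied
    verbatim so that data inside defaults is never rewritten.
    """
    if isinstance(schema, dict):
        result = {}
        for key, value in schema.items():
            if key == "description":
                result.setdefault("doc", value)
            elif key == "default":
                result[key] = copy.deepcopy(value)
            else:
                result[key] = description_to_doc(value)
        return result
    if isinstance(schema, list):
        return [description_to_doc(item) for item in schema]
    return schema


def is_optional(field):
    """True if a field uses the ["null", T] union with a null default."""
    field_type = field.get("type")
    return (
        isinstance(field_type, list)
        and len(field_type) > 0
        and field_type[0] == "null"
        and "default" in field
        and field["default"] is None
    )


def extract_avro_schemas(doc):
    """Map each Avro-typed message name to its schema as a JSON string.

    Only embedded schemas are handled; schemas linked via $ref would have
    to be resolved before calling this function.
    """
    schemas = {}
    messages = doc.get("components", {}).get("messages", {})
    for name, message in messages.items():
        payload = message.get("payload", {})
        if payload.get("schemaFormat", "").startswith("application/vnd.apache.avro"):
            avro_schema = description_to_doc(payload["schema"])
            schemas[name] = json.dumps(avro_schema, sort_keys=True)
    return schemas


if __name__ == "__main__":
    for name, schema_json in extract_avro_schemas(ASYNCAPI_DOC).items():
        schema = json.loads(schema_json)
        optional = [f["name"] for f in schema["fields"] if is_optional(f)]
        print(name, "optional fields:", optional)
```

The Avro specification permits unknown attributes as metadata, so a leftover `description` would not break anything, but Avro tooling would not treat it as documentation either; copying it into `doc` keeps the text visible on both sides.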

Figure: Text-based vs. binary formats. Comparison of text-based formats (like JSON) and binary formats (like Avro) in data serialization.
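To make the text-versus-binary comparison concrete, here is a small sketch that hand-encodes one record following the binary encoding described in the Avro specification (zig-zag variable-length integers for `long`; a `string` as its byte length followed by UTF-8 bytes; record fields concatenated in schema order without field names) and compares it with compact JSON. The record, its field names, and the helper functions are hypothetical; real code would use an Avro library rather than this hand-rolled encoder.

```python
import json


def zigzag_varint(n):
    """Encode a signed 64-bit integer the way Avro encodes int/long:
    zig-zag mapping, then a little-endian base-128 variable-length integer."""
    z = (n << 1) ^ (n >> 63)  # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string(s):
    """Avro string: byte length encoded as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


# Hypothetical record schema, as (field name, Avro type) in declaration order.
ORDER_FIELDS = [("orderId", "string"), ("quantity", "long"), ("priceCents", "long")]


def encode_record(record, fields=ORDER_FIELDS):
    """Avro record: field values concatenated in schema order, no names."""
    out = b""
    for name, avro_type in fields:
        value = record[name]
        out += encode_string(value) if avro_type == "string" else zigzag_varint(value)
    return out


if __name__ == "__main__":
    order = {"orderId": "A-1001", "quantity": 3, "priceCents": 1999}
    avro_bytes = encode_record(order)
    json_bytes = json.dumps(order, separators=(",", ":")).encode("utf-8")
    print("Avro:", len(avro_bytes), "bytes", avro_bytes.hex())
    print("JSON:", len(json_bytes), "bytes", json_bytes.decode())
```

For this record the Avro encoding is 10 bytes against 51 bytes of compact JSON, but the Avro bytes carry no field names or types: a reader needs the writer's schema (or, with a schema registry, an identifier pointing to it) to decode them.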

Key Takeaways

  • Avro has a clear evolution strategy: Understanding schema evolution is critical for maintaining long-term compatibility in event-driven systems, especially in event sourcing, where you need to read both new and old events

  • Avro falls back to default values when wrong formats are provided: This behavior can mask errors but also provides resilience in distributed systems; understanding when and how this happens is crucial for robust implementations

  • JSON leaves grey zones in communication: While JSON is human-readable and great for debugging, its flexibility can lead to ambiguity; binary formats like Avro provide stricter type safety
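The evolution and default-value points can be illustrated with a deliberately simplified model of the record schema-resolution rules in the Avro specification: fields are matched by name, fields only the writer knows are ignored, fields only the reader knows are filled from their `default`, and a reader-only field without a default is an error. This is a sketch under stated assumptions: it works on already-decoded dicts rather than binary data, ignores aliases, type promotion, and nested types, and the `OrderCreated` schema versions and the `resolve_record` helper are hypothetical. It replays an event stored with an old schema using a newer one, as an event-sourced system would.

```python
class SchemaResolutionError(Exception):
    """Raised when a reader schema cannot be resolved against a writer schema."""


def resolve_record(writer_schema, reader_schema, datum):
    """Simplified Avro record resolution on an already-decoded dict.

    Fields are matched by name only (aliases, type promotion and nested
    types are ignored in this sketch).
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = datum[name]  # field exists in both schemas
        elif "default" in field:
            result[name] = field["default"]  # reader-only field: use its default
        else:
            raise SchemaResolutionError(
                f"reader field '{name}' is not in the writer schema and has no default"
            )
    # Writer-only fields (e.g. removed in a newer schema) are silently ignored.
    return result


# Hypothetical versions of the same event schema.
ORDER_V1 = {"type": "record", "name": "OrderCreated", "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "legacyCode", "type": "string"},
]}
ORDER_V2 = {"type": "record", "name": "OrderCreated", "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "currency", "type": "string", "default": "EUR"},
    {"name": "comment", "type": ["null", "string"], "default": None},
]}
# Like V2, but the new field was added without a default.
ORDER_V2_NO_DEFAULT = {"type": "record", "name": "OrderCreated", "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "currency", "type": "string"},
]}


if __name__ == "__main__":
    old_event = {"orderId": "A-1001", "legacyCode": "X7"}  # stored with V1
    print(resolve_record(ORDER_V1, ORDER_V2, old_event))
    try:
        resolve_record(ORDER_V1, ORDER_V2_NO_DEFAULT, old_event)
    except SchemaResolutionError as err:
        print("cannot replay old event:", err)
```

Note that `currency` comes back as `"EUR"` with no trace that the producer never sent it: the same mechanism that keeps old events readable can also hide a missing value, which is why it pays to know exactly when defaults apply.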

This talk was co-presented with Dr. Annegret Junker from Codecentric at APIdays London 2025 on September 23, 2025.