Skip to content

Comments

feat: Add centralized protobuf conversion system#5998

Draft
franciscojavierarceo wants to merge 5 commits intomasterfrom
feat/refactor-protobuf-conversion-system
Draft

feat: Add centralized protobuf conversion system#5998
franciscojavierarceo wants to merge 5 commits intomasterfrom
feat/refactor-protobuf-conversion-system

Conversation

@franciscojavierarceo
Copy link
Member

Summary

  • Introduces a new feast.proto_conversion module that provides a centralized, type-safe system for converting between Python objects and their protobuf representations
  • Replaces scattered conversion logic throughout the codebase with a unified, maintainable approach
  • Provides comprehensive unit tests (41 tests) for the new conversion system

Key Components

Core Architecture

  • ProtoConverter: Generic abstract base class for bidirectional proto conversion
  • ConverterRegistry: Centralized registry for all converters with type-safe lookup
  • ProtoSerializable: Mixin class for standardized to_proto()/from_proto() methods

Converters

  • ValueTypeConverter: Consolidated value conversion with type-specific handlers (scalars, lists, sets, maps)
  • DataFrameProtoConverter: Unified Arrow/DataFrame to proto conversion
  • EntityConverter: Entity <-> EntityProto conversion
  • FeatureViewConverter: FeatureView family conversions
  • FeatureServiceConverter: FeatureService <-> FeatureServiceProto conversion

Benefits

  • Eliminates code duplication (80% reduction in shared logic between _convert_arrow_* functions)
  • Reduces cyclomatic complexity (from 25+ branches to <10 per function)
  • Replaces duck-typing with proper isinstance() checks
  • Provides consistent error handling across all converters
  • Enables type-safe extension for custom converters
  • Maintains full backward compatibility with existing APIs via compat module

Test plan

  • All 41 unit tests pass
  • Tests cover value type conversion for all supported types
  • Tests verify Entity converter roundtrip
  • Tests validate error handling and edge cases
  • Tests confirm backward compatibility with type_map.py

🤖 Generated with Claude Code

franciscojavierarceo and others added 3 commits February 20, 2026 21:08
This commit introduces a new proto_conversion module that provides a
centralized, type-safe system for converting between Python objects
and their protobuf representations.

Key improvements:
- ProtoConverter: Generic abstract base class for all converters
- ConverterRegistry: Centralized registry with type-safe lookup
- ValueTypeConverter: Consolidated value conversion with type handlers
- DataFrameProtoConverter: Unified Arrow/DataFrame conversion
- Object converters: Entity, FeatureView, FeatureService converters
- Exception hierarchy: Clear, typed exceptions for error handling
- Backward compatibility: compat module for gradual migration

Benefits:
- Eliminates code duplication (80% reduction in shared logic)
- Reduces cyclomatic complexity (from 25+ branches to <10)
- Replaces duck-typing with proper isinstance() checks
- Provides consistent error handling across all converters
- Enables type-safe extension for custom converters
- Maintains full backward compatibility with existing APIs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add TYPE_CHECKING import for ProtoValue in converter.py
- Create separate TypeVar L for LegacyProtoSerializable methods
- Use cast() for return types in feature_view_converter.py
- Fix timestamp return type annotation in dataframe_converter.py
- Extract pa_type to variable before using in null_column creation
- Remove unused Any import from serializable.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new centralized protobuf conversion system for Feast, consolidating scattered conversion logic into a unified, type-safe architecture. The system provides abstract base classes (ProtoConverter, ValueConverter), a centralized registry, and specific converters for Entity, FeatureView, FeatureService, and value types.

Changes:

  • Adds new feast.proto_conversion module with core interfaces, registry, and error hierarchy
  • Implements converters for Entity, FeatureView, OnDemandFeatureView, FeatureService, DataFrame/Arrow, and value types
  • Provides backward compatibility layer through compat module
  • Includes comprehensive test suite with 41 unit tests

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
sdk/python/feast/proto_conversion/init.py Module initialization exporting public API
sdk/python/feast/proto_conversion/converter.py Abstract base classes ProtoConverter and ValueConverter
sdk/python/feast/proto_conversion/registry.py Thread-safe singleton registry for converter lookup
sdk/python/feast/proto_conversion/serializable.py Mixin classes for proto serialization support
sdk/python/feast/proto_conversion/errors.py Exception hierarchy for conversion errors
sdk/python/feast/proto_conversion/registration.py Converter registration utilities
sdk/python/feast/proto_conversion/compat.py Backward compatibility shims
sdk/python/feast/proto_conversion/converters/init.py Converter exports
sdk/python/feast/proto_conversion/converters/entity_converter.py Entity <-> EntityProto bidirectional conversion
sdk/python/feast/proto_conversion/converters/feature_view_converter.py FeatureView and OnDemandFeatureView converters
sdk/python/feast/proto_conversion/converters/feature_service_converter.py FeatureService <-> FeatureServiceProto conversion
sdk/python/feast/proto_conversion/converters/value_converter.py Consolidated value type conversion with type handlers
sdk/python/feast/proto_conversion/converters/dataframe_converter.py Arrow/DataFrame to proto conversion using strategy pattern
sdk/python/tests/unit/test_proto_conversion.py Comprehensive test suite with 41 tests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +358 to +362
if isinstance(table, pyarrow.Table):
batches = table.to_batches()
if not batches:
raise ArrowConversionError("Empty table provided")
return batches[0]
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential data loss: when converting a pyarrow.Table with multiple batches, only the first batch is used (line 362). If the table contains multiple batches, the data from subsequent batches will be silently dropped. This should either iterate over all batches or raise an error/warning if multiple batches are present.

Copilot uses AI. Check for mistakes.
result = handler.from_proto_value(proto)
if result is not None:
return result
except Exception:
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent exception swallowing in deserialization. The catch-all except Exception: continue at line 730-731 silently swallows all exceptions from handlers, making debugging difficult. At minimum, these exceptions should be logged. Additionally, checking if result is not None at line 728 means legitimate None values can't be distinguished from handler failures, which could lead to incorrect fallback behavior.

Suggested change
except Exception:
except Exception:
logger.exception(
"Error in %s.from_proto_value while deserializing proto %r; "
"falling back to next handler",
type(handler).__name__,
proto,
)

Copilot uses AI. Check for mistakes.
Comment on lines 123 to 167
def safe_to_proto(self, obj: T) -> P:
"""
Safely convert to proto with standardized error handling.

This wrapper catches any exceptions during conversion and
re-raises them as SerializationError for consistent handling.

Args:
obj: The Python object to convert.

Returns:
The protobuf message representation.

Raises:
SerializationError: If conversion fails for any reason.
"""
try:
return self.to_proto(obj)
except SerializationError:
raise
except Exception as e:
raise SerializationError(obj, self.proto_type, cause=e) from e

def safe_from_proto(self, proto: P) -> T:
"""
Safely convert from proto with standardized error handling.

This wrapper catches any exceptions during conversion and
re-raises them as DeserializationError for consistent handling.

Args:
proto: The protobuf message to convert.

Returns:
The Python object representation.

Raises:
DeserializationError: If conversion fails for any reason.
"""
try:
return self.from_proto(proto)
except DeserializationError:
raise
except Exception as e:
raise DeserializationError(proto, self.supported_type, cause=e) from e
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validate_object() and validate_proto() methods are defined but never called. The safe_to_proto() and safe_from_proto() methods should call these validation methods before attempting conversion to catch issues early. Currently validation is only useful if called manually by subclasses.

Copilot uses AI. Check for mistakes.
Comment on lines 48 to 71
global _initialized

if _initialized:
return

if registry is None:
registry = get_registry()

# Import converters here to avoid circular imports
from feast.proto_conversion.converters import (
EntityConverter,
FeatureServiceConverter,
FeatureViewConverter,
OnDemandFeatureViewConverter,
)

# Register object converters
registry.register(EntityConverter())
registry.register(FeatureViewConverter())
registry.register(OnDemandFeatureViewConverter())
registry.register(FeatureServiceConverter())

logger.debug("Registered all proto converters")
_initialized = True
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition in registration. The global _initialized flag is not thread-safe. If multiple threads call register_all_converters() or ensure_converters_registered() simultaneously, they could both see _initialized as False and attempt to register converters concurrently. This should use a Lock to protect both the check and set of _initialized.

Copilot uses AI. Check for mistakes.
else:
raise SerializationError(
item,
context=f"MapList can only contain dictionaries, got {type(item)}",
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SerializationError is being called with an invalid parameter. The SerializationError constructor takes (obj, target_proto_type, cause) but here it's being called with a 'context' keyword argument which doesn't exist. The error message should either be passed via a custom exception in the 'cause' parameter, or the code should be refactored to construct the error message differently.

Suggested change
context=f"MapList can only contain dictionaries, got {type(item)}",
MapList,
ValueError(
f"MapList can only contain dictionaries, got {type(item)}"
),

Copilot uses AI. Check for mistakes.

@property
@abstractmethod
def supported_types(self) -> tuple[Type, ...]:
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent type annotation style. The codebase uses Tuple from the typing module, but here lowercase tuple is used. For consistency with the existing codebase conventions, these should use Tuple[Type, ...] imported from typing. The same issue appears in multiple type mappings (lines 107, 217, 390).

Copilot uses AI. Check for mistakes.
Comment on lines 156 to 158
for registered_type, converter in self._converters.items():
if issubclass(obj_type, registered_type):
return converter
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issubclass() call can raise TypeError if obj_type is not a class (e.g., if someone passes a non-type object). This should be wrapped in a try-except to handle edge cases gracefully, especially since this is a public API. The same issue exists in has_converter() at line 222.

Suggested change
for registered_type, converter in self._converters.items():
if issubclass(obj_type, registered_type):
return converter
try:
for registered_type, converter in self._converters.items():
if issubclass(obj_type, registered_type):
return converter
except TypeError:
# obj_type is not a class; fall through to ConverterNotFoundError
logger.debug(
"get_converter called with non-type obj_type %r; "
"no converter can be resolved",
obj_type,
)

Copilot uses AI. Check for mistakes.
Comment on lines +748 to +753
def get_value_converter() -> ValueTypeConverter:
"""Get the global ValueTypeConverter instance."""
global _converter_instance
if _converter_instance is None:
_converter_instance = ValueTypeConverter()
return _converter_instance
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The singleton pattern for ValueTypeConverter is not thread-safe. If multiple threads call get_value_converter() simultaneously before initialization, they could create multiple instances. This should use double-checked locking with a Lock, similar to the ConverterRegistry implementation.

Copilot uses AI. Check for mistakes.
def __init__(
self,
value: Any,
expected_types: Optional[list[Type]] = None,
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent type annotation style. The codebase uses List, Dict, Optional from the typing module (as seen in entity.py and throughout the codebase), but here lowercase list[Type] is used. For consistency with the existing codebase, this should be List[Type] imported from typing. The same issue exists with tuple vs Tuple throughout the new code.

Copilot uses AI. Check for mistakes.
franciscojavierarceo and others added 2 commits February 22, 2026 15:38
Remove unnecessary complexity from proto conversion module:
- Remove ConverterRegistry singleton and registration system
- Remove ProtoSerializable/LegacyProtoSerializable mixins
- Remove compat.py backward compatibility layer
- Simplify converter.py to just the ProtoConverter base class
- Update tests to test converters directly without registry

The converters remain useful for consolidating duplicated logic:
- ValueTypeConverter: Unified value type conversion
- DataFrameProtoConverter: Arrow/DataFrame conversion
- EntityConverter, FeatureViewConverter, etc.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Simplify dataframe_converter.py by replacing the ArrowConversionStrategy
abstract class and its two implementations with straightforward conditionals.

Before: ~490 lines with 3 classes (ABC + 2 strategies)
After: ~300 lines with 1 class using is_odfv boolean flag

The strategy pattern added unnecessary indirection for just 2 variations.
Simple if/else branches are easier to follow and maintain.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant