Skip to content

Comments

feat: Add SQLAlchemy Offline Store Support#5993

Open
aniketpalu wants to merge 6 commits intofeast-dev:masterfrom
aniketpalu:feature/sqlalchemy-offline-store
Open

feat: Add SQLAlchemy Offline Store Support#5993
aniketpalu wants to merge 6 commits intofeast-dev:masterfrom
aniketpalu:feature/sqlalchemy-offline-store

Conversation

@aniketpalu
Copy link
Contributor

@aniketpalu aniketpalu commented Feb 19, 2026

What this PR does / why we need it:

This PR introduces a universal SQLAlchemy-based Offline Store for Feast, enabling historical feature retrieval from any SQL database supported by SQLAlchemy. Instead of maintaining separate implementations for each database (PostgreSQL, MySQL, Oracle, SQLite, etc.), this single implementation uses dialect-aware SQL generation to support multiple databases through one codebase.

Key Features

  • Dialect-Aware SQL Generation: Jinja2 macros automatically generate correct SQL syntax for each database
  • Point-in-Time Correctness: Full support for temporal joins ensuring no data leakage
  • TTL Support: Configurable time-to-live for feature freshness
  • Table & Query Sources: Support both direct table access and custom SQL queries
  • Connection Validation: validate() method verifies database connectivity and table existence
Tested Databases

Database Driver Connection String Format Status
PostgreSQL psycopg postgresql+psycopg://user:pass@host:port/db Tested
Oracle oracledb oracle+oracledb://user:pass@host:port/?service_name=XE Tested
MySQL pymysql mysql+pymysql://user:pass@host:port/db Tested
SQLite built-in sqlite:///path/to/database.db Tested
SQL Dialect Differences Handled

Feature PostgreSQL Oracle MySQL SQLite
Table Alias AS t t AS t AS t
Interval INTERVAL '604800' SECOND NUMTODSINTERVAL(604800, 'SECOND') INTERVAL 604800 SECOND datetime(ts, '-604800 seconds')
Boolean TRUE ON TRUE ON 1=1 ON TRUE ON 1=1
String Cast CAST(x AS VARCHAR) TO_CHAR(x) CAST(x AS CHAR) CAST(x AS TEXT)
Concatenation a || b a || b CONCAT(a, b) a || b
JOIN USING Supported Use ON Supported Supported
Identifiers lowercase UPPERCASE ANSI_QUOTES mode any case

Configure feature_store.yaml

PostgreSQL Example

project: my_project
registry: data/registry.db
provider: local
offline_store:  
  type: sqlalchemy  
  connection_string: postgresql+psycopg://user:password@localhost:5432/feast_db

Oracle Example

project: my_project
registry: data/registry.db
provider: local
offline_store:  
  type: sqlalchemy  
  connection_string: oracle+oracledb://system:password@localhost:1521/?service_name=FREEPDB1

MySQL Example

project: my_project
registry: data/registry.db
provider: local
offline_store:
  type: sqlalchemy
  connection_string: mysql+pymysql://root:password@localhost:3306/feast_db  
  sqlalchemy_config_kwargs:
    connect_args:
      init_command: "SET sql_mode='ANSI_QUOTES'"  # Required for MySQL

SQLite Example

project: my_project
registry: data/registry.db
provider: local
offline_store:
  type: sqlalchemy
  connection_string: sqlite:///data/features.db

Define Data Source

from feast import SQLAlchemySource, Entity, FeatureView, Field
from feast.types import Int64, Float64
from datetime import timedelta

# Define entity
user = Entity(name="user_id", join_keys=["user_id"])

# Define data source
transaction_source = SQLAlchemySource(
    name="transaction_source",
    table="transaction_features",           # Table name in database
    timestamp_field="feature_time",         # Event timestamp column
    created_timestamp_column="created_at",  # Optional: for deduplication
)

# Define feature view
transaction_fv = FeatureView(
    name="transaction_features",
    entities=[user],
    ttl=timedelta(days=7),
    schema=[
        Field(name="txn_count_24h", dtype=Int64),
        Field(name="txn_total_amount_24h", dtype=Float64),
    ],
    source=transaction_source,
)

Which issue(s) this PR fixes:

Misc


Open with Devin

…greSQL & SQLite

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>
Signed-off-by: Aniket Paluskar <apaluska@redhat.com>
#}

{% macro table_alias(alias) -%}
{% if dialect == 'oracle' %}{{ alias }}{% else %}AS {{ alias }}{% endif %}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oracle doesn't use AS keyword for table aliases.

@aniketpalu aniketpalu changed the title Feature/sqlalchemy offline store Feat: Add SQLAlchemy Offline Store Support Feb 22, 2026
@aniketpalu aniketpalu marked this pull request as ready for review February 22, 2026 14:41
@aniketpalu aniketpalu requested a review from a team as a code owner February 22, 2026 14:41
devin-ai-integration[bot]

This comment was marked as resolved.

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>
@aniketpalu aniketpalu changed the title Feat: Add SQLAlchemy Offline Store Support feat: Add SQLAlchemy Offline Store Support Feb 22, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

@tokoko
Copy link
Collaborator

tokoko commented Feb 22, 2026

@aniketpalu While I fully share the desire to simplify and somehow consolidate offline store implementations, the idea of using sqlalchemy has come up before and was (imho rightfully) rejected. sqlalchemy data transfer performance is simply unacceptable for anything other than a toy pipeline. To get the rows to arrow, you first go through row-by-row sqlalchemy python tuples, then use pandas as an intermediate step.

Previously I was trying to use ibis to accomplish something similar (both duckdb and mssql stores share same ibis implementation for example), but even if ibis sounds too complex, I'd strongly suggest exploring different technologies for this. For example you could generate sql in any dialect, transpile to target dialect with something like sqlglot and then use adbc for transport. you would avoid extra heavy dependencies and get probably the best performance one could realistically get with python.

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 10 additional findings in Devin Review.

Open in Devin Review

Comment on lines +818 to +824
WHERE fv_sub_{{ outer_loop_index }}.event_timestamp <= base.event_timestamp
{% if featureview.ttl != 0 %}
AND fv_sub_{{ outer_loop_index }}.event_timestamp >= {{ timestamp_minus_seconds('base.event_timestamp', featureview.ttl) }}
{% endif %}
{% for entity in featureview.entities %}
AND fv_sub_{{ outer_loop_index }}."{{ entity }}" = base."{{ entity }}"
{% endfor %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 PostgreSQL non-entity multi-FV path: correlated LEFT JOIN subquery without LATERAL keyword

When get_historical_features is called without an entity_df (non-entity retrieval mode) and there are multiple feature views, the Jinja2 template generates a LEFT JOIN with a derived table that references columns from the outer base_entities alias. In PostgreSQL, this is a correlated subquery pattern that requires the LATERAL keyword, but it is not included.

Root cause and generated SQL

The template at lines 807-825 generates SQL like:

FROM base_entities base
LEFT JOIN (
    SELECT DISTINCT ON ("entity_col")
        event_timestamp, "entity_col", "feature1"
    FROM "fv_0__data" AS fv_sub_0
    WHERE fv_sub_0.event_timestamp <= base.event_timestamp   -- references outer 'base'
    AND fv_sub_0."entity_col" = base."entity_col"             -- references outer 'base'
    ORDER BY "entity_col", event_timestamp DESC
) AS fv_0 ON TRUE

PostgreSQL prohibits referencing other FROM-clause entries inside a non-LATERAL derived table. This will fail at runtime with:
ERROR: invalid reference to FROM-clause entry for table "base"

The fix would require adding LATERAL to the LEFT JOIN (LEFT JOIN LATERAL (...)) when producing a correlated subquery for PostgreSQL.

Impact: All multi-feature-view non-entity historical feature retrievals on PostgreSQL will fail with a SQL error.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +842 to +845
) {{ table_alias('ranked') }}
WHERE ranked.rn = 1
{% endif %}
) {{ table_alias('fv_' ~ outer_loop_index) }} {{ on_true() }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Non-PostgreSQL non-entity multi-FV path: LEFT JOIN ON 1=1 with no entity matching produces cross join

For non-PostgreSQL dialects (Oracle, MySQL, SQLite, etc.) in the non-entity multi-feature-view path, the ROW_NUMBER subquery selects the globally latest row per entity, then joins to base_entities using only ON 1=1 (a cross join condition) with no entity key matching. This silently produces incorrect results.

Root cause and generated SQL

The template at lines 826-845 generates SQL like:

FROM base_entities base
LEFT JOIN (
    SELECT event_timestamp, "entity_col", "feature1"
    FROM (
        SELECT fv_sub_0.*,
            ROW_NUMBER() OVER(PARTITION BY "entity_col" ORDER BY event_timestamp DESC) AS rn
        FROM "fv_0__data" AS fv_sub_0
    ) AS ranked
    WHERE ranked.rn = 1
) AS fv_0 ON 1=1              -- no entity matching!

The inner subquery cannot reference base (since it's not a LATERAL join), so it returns only the single globally-latest row per entity. The outer ON 1=1 then cross-joins this with every row in base_entities. If base_entities has 10 entity+timestamp combinations, each will get the same feature values (the latest per entity, ignoring timestamp), which is both a cross-join explosion and semantically wrong.

Contrast this with the PostgreSQL path which at least attempts entity matching (but fails due to the missing LATERAL keyword, see BUG-0001).

Impact: Multi-feature-view non-entity historical retrievals on non-PostgreSQL databases will silently return incorrect, cross-joined data.

Prompt for agents
In sdk/python/feast/infra/offline_stores/contrib/sqlalchemy_offline_store/sqlalchemy.py, the Jinja2 template MULTIPLE_FEATURE_VIEW_POINT_IN_TIME_JOIN (lines 804-847) has broken LEFT JOIN logic in the non-entity multi-feature-view path (the path taken when start_date and end_date are set and there are multiple featureviews).

Two issues need to be fixed:

1. PostgreSQL path (lines 807-825): The DISTINCT ON subquery references base.event_timestamp and base.entity columns, but the derived table is not declared as LATERAL. Change LEFT JOIN to LEFT JOIN LATERAL for this path.

2. Non-PostgreSQL path (lines 826-844): The ROW_NUMBER subquery cannot correlate with the outer base table. Instead of using ON 1=1, add entity-matching conditions to the ON clause, e.g.: ON base.entity_col = fv_0.entity_col. The subquery should also be filtered to only include relevant rows (not just the globally latest), or the ON clause should handle the matching.

A clean approach would be:
- For PostgreSQL: Use LEFT JOIN LATERAL with the existing correlated DISTINCT ON subquery.
- For all other dialects: Either use a similar LATERAL pattern (if supported), or restructure the join to put entity matching conditions in the ON clause instead of ON 1=1. For example, change line 845 from ON 1=1 to ON base.entity_col = fv_N.entity_col for each entity column.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>
@aniketpalu
Copy link
Contributor Author

@tokoko

Thank you for your comment. It has been eye opening.

I agree that read_sql() is a bottleneck. My primary goal was to add support for OracleDB offline store. Since SQL query is 90% identical to PostgreSQL, or MySQL (unsupported), I thought of adding SQLAlchemy support would solve the issue for all SQL-based offline store issue.

I understand that ibis supports Oracle DB driver, so Oracle DB support using ibis-framework seems like easier way forward considering the functions are already available. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants