feat: Add SQLAlchemy Offline Store Support by aniketpalu · Pull Request #5993 · feast-dev/feast

aniketpalu · 2026-02-19T17:53:51Z

What this PR does / why we need it:

This PR introduces a universal SQLAlchemy-based Offline Store for Feast, enabling historical feature retrieval from any SQL database supported by SQLAlchemy. Instead of maintaining separate implementations for each database (PostgreSQL, MySQL, Oracle, SQLite, etc.), this single implementation uses dialect-aware SQL generation to support multiple databases through one codebase.

Key Features

Dialect-Aware SQL Generation: Jinja2 macros automatically generate correct SQL syntax for each database
Point-in-Time Correctness: Full support for temporal joins ensuring no data leakage
TTL Support: Configurable time-to-live for feature freshness
Table & Query Sources: Support both direct table access and custom SQL queries
Connection Validation: validate() method verifies database connectivity and table existence

Tested Databases

Database	Driver	Connection String Format	Status
PostgreSQL	psycopg	postgresql+psycopg://user:pass@host:port/db	Tested
Oracle	oracledb	oracle+oracledb://user:pass@host:port/?service_name=XE	Tested
MySQL	pymysql	mysql+pymysql://user:pass@host:port/db	Tested
SQLite	built-in	sqlite:///path/to/database.db	Tested

SQL Dialect Differences Handled

Feature	PostgreSQL	Oracle	MySQL	SQLite
Table Alias	AS t	t	AS t	AS t
Interval	INTERVAL '604800' SECOND	NUMTODSINTERVAL(604800, 'SECOND')	INTERVAL 604800 SECOND	datetime(ts, '-604800 seconds')
Boolean TRUE	ON TRUE	ON 1=1	ON TRUE	ON 1=1
String Cast	CAST(x AS VARCHAR)	TO_CHAR(x)	CAST(x AS CHAR)	CAST(x AS TEXT)
Concatenation	a \|\| b	a \|\| b	CONCAT(a, b)	a \|\| b
JOIN USING	Supported	Use ON	Supported	Supported
Identifiers	lowercase	UPPERCASE	ANSI_QUOTES mode	any case

Configure feature_store.yaml

PostgreSQL Example

project: my_project
registry: data/registry.db
provider: local
offline_store:  
  type: sqlalchemy  
  connection_string: postgresql+psycopg://user:password@localhost:5432/feast_db

Oracle Example

project: my_project
registry: data/registry.db
provider: local
offline_store:  
  type: sqlalchemy  
  connection_string: oracle+oracledb://system:password@localhost:1521/?service_name=FREEPDB1

MySQL Example

project: my_project
registry: data/registry.db
provider: local
offline_store:
  type: sqlalchemy
  connection_string: mysql+pymysql://root:password@localhost:3306/feast_db  
  sqlalchemy_config_kwargs:
    connect_args:
      init_command: "SET sql_mode='ANSI_QUOTES'"  # Required for MySQL

SQLite Example

project: my_project
registry: data/registry.db
provider: local
offline_store:
  type: sqlalchemy
  connection_string: sqlite:///data/features.db

Define Data Source

from feast import SQLAlchemySource, Entity, FeatureView, Field
from feast.types import Int64, Float64
from datetime import timedelta

# Define entity
user = Entity(name="user_id", join_keys=["user_id"])

# Define data source
transaction_source = SQLAlchemySource(
    name="transaction_source",
    table="transaction_features",           # Table name in database
    timestamp_field="feature_time",         # Event timestamp column
    created_timestamp_column="created_at",  # Optional: for deduplication
)

# Define feature view
transaction_fv = FeatureView(
    name="transaction_features",
    entities=[user],
    ttl=timedelta(days=7),
    schema=[
        Field(name="txn_count_24h", dtype=Int64),
        Field(name="txn_total_amount_24h", dtype=Float64),
    ],
    source=transaction_source,
)

Which issue(s) this PR fixes:

Misc

…greSQL & SQLite Signed-off-by: Aniket Paluskar <apaluska@redhat.com>

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>

aniketpalu · 2026-02-19T18:19:53Z

sdk/python/feast/infra/offline_stores/contrib/sqlalchemy_offline_store/sqlalchemy.py

+#}
+
+{% macro table_alias(alias) -%}
+{% if dialect == 'oracle' %}{{ alias }}{% else %}AS {{ alias }}{% endif %}


Oracle doesn't use AS keyword for table aliases.

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>

tokoko · 2026-02-22T19:01:09Z

@aniketpalu While I fully share the desire to simplify and somehow consolidate offline store implementations, the idea of using sqlalchemy has come up before and was (imho rightfully) rejected. sqlalchemy data transfer performance is simply unacceptable for anything other than a toy pipeline. To get the rows to arrow, you first go through row-by-row sqlalchemy python tuples, then use pandas as an intermediate step.

Previously I was trying to use ibis to accomplish something similar (both duckdb and mssql stores share same ibis implementation for example), but even if ibis sounds too complex, I'd strongly suggest exploring different technologies for this. For example you could generate sql in any dialect, transpile to target dialect with something like sqlglot and then use adbc for transport. you would avoid extra heavy dependencies and get probably the best performance one could realistically get with python.

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>

devin-ai-integration

Devin Review found 2 new potential issues.

View 10 additional findings in Devin Review.

devin-ai-integration · 2026-02-22T19:23:51Z

sdk/python/feast/infra/offline_stores/contrib/sqlalchemy_offline_store/sqlalchemy.py

+    WHERE fv_sub_{{ outer_loop_index }}.event_timestamp <= base.event_timestamp
+    {% if featureview.ttl != 0 %}
+    AND fv_sub_{{ outer_loop_index }}.event_timestamp >= {{ timestamp_minus_seconds('base.event_timestamp', featureview.ttl) }}
+    {% endif %}
+    {% for entity in featureview.entities %}
+    AND fv_sub_{{ outer_loop_index }}."{{ entity }}" = base."{{ entity }}"
+    {% endfor %}


🔴 PostgreSQL non-entity multi-FV path: correlated LEFT JOIN subquery without LATERAL keyword

When get_historical_features is called without an entity_df (non-entity retrieval mode) and there are multiple feature views, the Jinja2 template generates a LEFT JOIN with a derived table that references columns from the outer base_entities alias. In PostgreSQL, this is a correlated subquery pattern that requires the LATERAL keyword, but it is not included.

Root cause and generated SQL

The template at lines 807-825 generates SQL like:

FROM base_entities base LEFT JOIN ( SELECT DISTINCT ON ("entity_col") event_timestamp, "entity_col", "feature1" FROM "fv_0__data" AS fv_sub_0 WHERE fv_sub_0.event_timestamp <= base.event_timestamp -- references outer 'base' AND fv_sub_0."entity_col" = base."entity_col" -- references outer 'base' ORDER BY "entity_col", event_timestamp DESC ) AS fv_0 ON TRUE

PostgreSQL prohibits referencing other FROM-clause entries inside a non-LATERAL derived table. This will fail at runtime with:
ERROR: invalid reference to FROM-clause entry for table "base"

The fix would require adding LATERAL to the LEFT JOIN (LEFT JOIN LATERAL (...)) when producing a correlated subquery for PostgreSQL.

Impact: All multi-feature-view non-entity historical feature retrievals on PostgreSQL will fail with a SQL error.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-02-22T19:23:52Z

sdk/python/feast/infra/offline_stores/contrib/sqlalchemy_offline_store/sqlalchemy.py

+    ) {{ table_alias('ranked') }}
+    WHERE ranked.rn = 1
+    {% endif %}
+) {{ table_alias('fv_' ~ outer_loop_index) }} {{ on_true() }}


🔴 Non-PostgreSQL non-entity multi-FV path: LEFT JOIN ON 1=1 with no entity matching produces cross join

For non-PostgreSQL dialects (Oracle, MySQL, SQLite, etc.) in the non-entity multi-feature-view path, the ROW_NUMBER subquery selects the globally latest row per entity, then joins to base_entities using only ON 1=1 (a cross join condition) with no entity key matching. This silently produces incorrect results.

Root cause and generated SQL

The template at lines 826-845 generates SQL like:

FROM base_entities base LEFT JOIN ( SELECT event_timestamp, "entity_col", "feature1" FROM ( SELECT fv_sub_0.*, ROW_NUMBER() OVER(PARTITION BY "entity_col" ORDER BY event_timestamp DESC) AS rn FROM "fv_0__data" AS fv_sub_0 ) AS ranked WHERE ranked.rn = 1 ) AS fv_0 ON 1=1 -- no entity matching!

The inner subquery cannot reference base (since it's not a LATERAL join), so it returns only the single globally-latest row per entity. The outer ON 1=1 then cross-joins this with every row in base_entities. If base_entities has 10 entity+timestamp combinations, each will get the same feature values (the latest per entity, ignoring timestamp), which is both a cross-join explosion and semantically wrong.

Contrast this with the PostgreSQL path which at least attempts entity matching (but fails due to the missing LATERAL keyword, see BUG-0001).

Impact: Multi-feature-view non-entity historical retrievals on non-PostgreSQL databases will silently return incorrect, cross-joined data.

Prompt for agents

In sdk/python/feast/infra/offline_stores/contrib/sqlalchemy_offline_store/sqlalchemy.py, the Jinja2 template MULTIPLE_FEATURE_VIEW_POINT_IN_TIME_JOIN (lines 804-847) has broken LEFT JOIN logic in the non-entity multi-feature-view path (the path taken when start_date and end_date are set and there are multiple featureviews). Two issues need to be fixed: 1. PostgreSQL path (lines 807-825): The DISTINCT ON subquery references base.event_timestamp and base.entity columns, but the derived table is not declared as LATERAL. Change LEFT JOIN to LEFT JOIN LATERAL for this path. 2. Non-PostgreSQL path (lines 826-844): The ROW_NUMBER subquery cannot correlate with the outer base table. Instead of using ON 1=1, add entity-matching conditions to the ON clause, e.g.: ON base.entity_col = fv_0.entity_col. The subquery should also be filtered to only include relevant rows (not just the globally latest), or the ON clause should handle the matching. A clean approach would be: - For PostgreSQL: Use LEFT JOIN LATERAL with the existing correlated DISTINCT ON subquery. - For all other dialects: Either use a similar LATERAL pattern (if supported), or restructure the join to put entity matching conditions in the ON clause instead of ON 1=1. For example, change line 845 from ON 1=1 to ON base.entity_col = fv_N.entity_col for each entity column.

Was this helpful? React with 👍 or 👎 to provide feedback.

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>

aniketpalu · 2026-02-23T07:52:12Z

@tokoko

Thank you for your comment. It has been eye opening.

I agree that read_sql() is a bottleneck. My primary goal was to add support for OracleDB offline store. Since SQL query is 90% identical to PostgreSQL, or MySQL (unsupported), I thought of adding SQLAlchemy support would solve the issue for all SQL-based offline store issue.

I understand that ibis supports Oracle DB driver, so Oracle DB support using ibis-framework seems like easier way forward considering the functions are already available. WDYT?

aniketpalu added 2 commits February 19, 2026 23:21

feat: Added support for SQLAlchemy lib to support Oracle, MySQL, Post…

616004a

…greSQL & SQLite Signed-off-by: Aniket Paluskar <apaluska@redhat.com>

Minor formatting

fc70a48

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>

aniketpalu commented Feb 19, 2026

View reviewed changes

Merge branch 'master' into feature/sqlalchemy-offline-store

05a6719

aniketpalu changed the title ~~Feature/sqlalchemy offline store~~ Feat: Add SQLAlchemy Offline Store Support Feb 22, 2026

aniketpalu marked this pull request as ready for review February 22, 2026 14:41

aniketpalu requested a review from a team as a code owner February 22, 2026 14:41

This comment was marked as resolved.

Sign in to view

Added sqlalchemy in operator as offline store option

e7ccbd6

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>

aniketpalu changed the title ~~Feat: Add SQLAlchemy Offline Store Support~~ feat: Add SQLAlchemy Offline Store Support Feb 22, 2026

This comment was marked as resolved.

Sign in to view

Minor fixes, fixed materialization issue

aff4549

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>

devin-ai-integration bot reviewed Feb 22, 2026

View reviewed changes

Updated requirements hash

aff2852

Signed-off-by: Aniket Paluskar <apaluska@redhat.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

feat: Add SQLAlchemy Offline Store Support#5993

feat: Add SQLAlchemy Offline Store Support#5993
aniketpalu wants to merge 6 commits intofeast-dev:masterfrom
aniketpalu:feature/sqlalchemy-offline-store

aniketpalu commented Feb 19, 2026 •

edited by devin-ai-integration bot

Loading

Uh oh!

aniketpalu Feb 19, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

tokoko commented Feb 22, 2026

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

devin-ai-integration bot Feb 22, 2026

Uh oh!

devin-ai-integration bot Feb 22, 2026

Uh oh!

aniketpalu commented Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

aniketpalu commented Feb 19, 2026 • edited by devin-ai-integration bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it:

Key Features

Configure feature_store.yaml

PostgreSQL Example

Oracle Example

MySQL Example

SQLite Example

Define Data Source

Which issue(s) this PR fixes:

Misc

Uh oh!

aniketpalu Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

tokoko commented Feb 22, 2026

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Feb 22, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Feb 22, 2026

Choose a reason for hiding this comment

Uh oh!

aniketpalu commented Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aniketpalu commented Feb 19, 2026 •

edited by devin-ai-integration bot

Loading