fix: protobuf (de)ser for docvec #1639
Merged
JohannesMessner merged 9 commits into main on Jun 13, 2023
Signed-off-by: Johannes Messner <messnerjo@gmail.com>
JohannesMessner
commented
Jun 12, 2023
    # handle values that were None before serialization
    tensor_columns[tens_col_name] = None
else:
    # TODO(johannes): handle torch, tf, numpy
Member
Author
I will do this in a separate PR; it might require a proto change (but I'm not sure).
Member
I don't think it requires a proto change. I think here you should look at the tensor type of this field in the doc type and use it to load the proto
Member
the same way you do it for doc_columns
Member
Author
let's discuss that in a separate PR
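The reviewer's suggestion above (pick the tensor type from the field's declared type in the doc type, then use it to load the column) could look roughly like the sketch below. This is not docarray code: `NumpyLike`, `TorchLike`, `MyDoc`, and `load_tensor_column` are hypothetical stand-ins, and plain lists stand in for proto payloads.

```python
from typing import List, Optional, get_type_hints


class NumpyLike(list):
    """Hypothetical stand-in for a numpy-backed tensor type."""


class TorchLike(list):
    """Hypothetical stand-in for a torch-backed tensor type."""


class MyDoc:
    # The declared field types play the role of the doc type's schema.
    embedding: NumpyLike
    image: TorchLike


def load_tensor_column(doc_type: type, field: str, payload: Optional[List[float]]):
    """Build a column using the tensor type declared for `field` (assumed helper)."""
    if payload is None:
        # A column that was None before serialization stays None.
        return None
    tensor_type = get_type_hints(doc_type)[field]
    return tensor_type(payload)


embedding_col = load_tensor_column(MyDoc, "embedding", [0.1, 0.2])
image_col = load_tensor_column(MyDoc, "image", [1.0, 2.0])
missing_col = load_tensor_column(MyDoc, "image", None)
```

With this shape, the loader never needs a framework tag inside the serialized message, which matches the reviewer's point that a proto change may not be required.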
📝 Docs are deployed on https://ft-fix-docvec-proto--jina-docs.netlify.app 🎉
samsja
approved these changes
Jun 13, 2023
closes #1561
The problem here was that deserialization from proto to DocVec did not actually deserialize the columns: it stored the raw protos as columns instead of the deserialized values.
A special case is when a column is None. This PR introduces a convention for how such columns are represented in the proto, which avoids having to change the proto definition.
Limitation: Right now this only works for np columns, not tf or torch. Support for those two will come in a separate PR.
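As a rough illustration of the two points above (columns must be deserialized into values, and None columns need an agreed representation), here is a minimal sketch. It does not use docarray or protobuf: plain dicts stand in for proto messages, and the `"none"` marker is a hypothetical convention chosen for this example, not necessarily the one this PR introduces.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a proto message: a plain dict.
# The "none" marker is an illustrative convention, not docarray's actual one.
NONE_MARKER = {"kind": "none"}


def column_to_proto(column: Optional[List[float]]) -> dict:
    """Serialize one column; a None column becomes a marker message."""
    if column is None:
        return dict(NONE_MARKER)
    return {"kind": "ndarray", "data": list(column)}


def column_from_proto(proto: dict) -> Optional[List[float]]:
    """Deserialize one column back into a value, not the raw message."""
    if proto["kind"] == "none":
        # handle values that were None before serialization
        return None
    return list(proto["data"])


def columns_to_proto(columns: Dict[str, Optional[List[float]]]) -> Dict[str, dict]:
    return {name: column_to_proto(col) for name, col in columns.items()}


def columns_from_proto(protos: Dict[str, dict]) -> Dict[str, Optional[List[float]]]:
    # The bug described above amounts to returning `protos` unchanged here.
    return {name: column_from_proto(p) for name, p in protos.items()}


columns = {"embedding": [0.1, 0.2], "tensor": None}
roundtrip = columns_from_proto(columns_to_proto(columns))
```

In this sketch a round trip returns the original values, including the None column, rather than the intermediate message objects.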
Breaking Change: The .proto definition is changed in this PR. I added ListOfDocVecProto to properly represent a DocVec's doc_vec_columns, and adjusted DocVecProto accordingly. This will break backwards compatibility for DocVecs serialized using older versions of docarray. But those older versions were broken in that regard anyway, so it should not be an issue.
TODO:
None handling for