add OCR Decoding support - WIP #113

Open
wants to merge 1 commit into
base: dev

@N950 N950 commented Jun 10, 2024

This is a WIP to add CTC OCR recognition/decoding
Conformity to contribution guidelines will be fixed before closing

The next change will add a kpt label to the TEXT LabelType. Even though the goal is only OCR recognition, on the data side it makes sense to create the annotation/LabelType from the beginning so that it supports kpt annotations.

class_mapping: Dict[str, int],
**_,
) -> np.ndarray:
text_labels = None
Collaborator

text_labels could be initialized to np.zeros((len(annotations), ann.max_len)) up front, so there's no chance of returning None.
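
A minimal sketch of that suggestion, under assumptions: the function name `encode_text_labels`, the `texts` and `max_len` parameters, and the choice of 0 as the padding/unknown index are all hypothetical and not taken from the PR. It only illustrates pre-allocating the array so the return value is always an `np.ndarray`.

```python
from typing import Dict, List

import numpy as np


def encode_text_labels(
    texts: List[str],
    class_mapping: Dict[str, int],
    max_len: int,
) -> np.ndarray:
    """Encode strings as fixed-width rows of class indices (hypothetical helper)."""
    # Pre-allocate the output, so even an empty input yields an array, never None.
    text_labels = np.zeros((len(texts), max_len), dtype=np.int64)
    for row, text in enumerate(texts):
        # Truncate to max_len; characters missing from the mapping become 0
        # (assumed here to be the padding/unknown index).
        codes = [class_mapping.get(ch, 0) for ch in text[:max_len]]
        text_labels[row, : len(codes)] = codes
    return text_labels
```

With no annotations the result is an empty array of shape `(0, max_len)`, which downstream code can handle without a `None` check.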

@@ -174,6 +174,7 @@ def _load_image_with_annotations(self, idx: int) -> Tuple[np.ndarray, Labels]:

uuid = self.instances[idx]
df = self.df.loc[uuid]
print(df)
Collaborator

Forgotten debug print; please remove it.



def validate_text_value(
value: str,
Collaborator

The annotation of value seems to be incorrect.

Collaborator

@kozlov721 kozlov721 left a comment

Left some comments, otherwise looks good.

@type is_train: bool
"""
super(OCRAugmentation, self).__init__()
self.transforms = A.Compose(
Contributor

Is this a standard set of augmentations typically used for OCR tasks, or how was it chosen?

Collaborator

Also curious about this.

],
p=0.2
),
A.Compose( # resize to image_size with aspect ratio, pad if needed
Contributor

Resize and Normalize are already part of the default augmentations. Resize is always applied (you can control whether it keeps the aspect ratio), and Normalize is also appended to the list of augmentations (when used by luxonis-train; it can be deactivated through the config). So is this needed here?

@param is_train: True if image is train. False if image is val/test.
@type is_train: bool
"""
super(OCRAugmentation, self).__init__()
Collaborator

Let's keep it as just super().__init__(). The explicit arguments to super are a relic from Python 2.
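
For illustration, a sketch of the zero-argument form. The base class `BaseAugmentation` and its `initialized` attribute are hypothetical stand-ins, since the real parent class is not shown in this thread.

```python
class BaseAugmentation:
    """Hypothetical stand-in for the real parent class."""

    def __init__(self) -> None:
        self.initialized = True


class OCRAugmentation(BaseAugmentation):
    def __init__(self, is_train: bool = True) -> None:
        # Python 3: zero-argument super() resolves the class and instance itself,
        # equivalent to super(OCRAugmentation, self).__init__().
        super().__init__()
        self.is_train = is_train
```

The behavior is identical to the two-argument call; the shorter form also survives a class rename without edits.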

Collaborator

@conorsim conorsim left a comment

"The next change will add a kpt label to the TEXT LabelType. Even though the goal is only OCR recognition, on the data side it makes sense to create the annotation/LabelType from the beginning so that it supports kpt annotations."

What is meant by this? We already have LabelType.KEYPOINTS. We also plan to support nested annotations, so I think the final form for OCR + keypoints would be TEXT and KEYPOINTS nested within a BOUNDINGBOX.

Comment on lines +296 to +297
    def set_global_metadata(self, metadata: Dict[str, Any]) -> None:
        self.global_metadata = metadata
Collaborator

Because of GCS datasets, I think we need a way to persist this in storage rather than only in memory. Perhaps we could use the existing datasets.json or the metadata folder?
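
One possible shape for this, as a rough sketch only: the class name `MetadataStore`, the file name `global_metadata.json`, and the `get_global_metadata` method are hypothetical and not part of the PR. It writes to a local directory; syncing that directory to GCS or another remote store is left out.

```python
import json
from pathlib import Path
from typing import Any, Dict


class MetadataStore:
    """Hypothetical helper that keeps global metadata in a JSON file."""

    def __init__(self, metadata_dir: Path) -> None:
        # Assumed file name; the real dataset layout may differ.
        self.path = Path(metadata_dir) / "global_metadata.json"

    def set_global_metadata(self, metadata: Dict[str, Any]) -> None:
        # Write to disk so the value survives beyond the current process.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(metadata))

    def get_global_metadata(self) -> Dict[str, Any]:
        # Return an empty dict when nothing has been stored yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())
```

Because the state lives in a file, a second instance pointed at the same directory reads back what the first one wrote. Values must be JSON-serializable for this to work.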

Base automatically changed from dev to main July 1, 2024 23:35
@kozlov721 kozlov721 changed the base branch from main to dev July 8, 2024 14:30