There is extreme noise in the patient annotations: for instance, around 20% of the health rhythm labels in the dataset are contradictory.
- accurate diagnosis
- limiting “domain rule violations”
approach
- take your dataset and validate each sample against the domain rules
- IF the validation is successful, train the model normally with that sample
- IF the validation is unsuccessful, use the output samples as negative examples
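
One possible reading of the steps above, as a minimal sketch: each sample is checked against a list of rule functions, and samples that fail any rule are routed to a negative-example pool instead of the normal training set. The names `Sample`, `no_contradictory_rhythm`, and `split_by_rules`, and the "regular"/"irregular" labels, are hypothetical and only illustrate the idea; how the negative examples are then used in training is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    """Hypothetical container for one annotated patient record."""
    patient_id: str
    labels: frozenset  # rhythm labels attached to this record


# A rule returns True when the sample satisfies it.
Rule = Callable[[Sample], bool]


def no_contradictory_rhythm(sample: Sample) -> bool:
    """Assumed example rule: a record cannot be both 'regular' and 'irregular'."""
    return not {"regular", "irregular"} <= sample.labels


def split_by_rules(
    samples: List[Sample], rules: List[Rule]
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (valid, violating) according to the rules."""
    positives, negatives = [], []
    for sample in samples:
        if all(rule(sample) for rule in rules):
            positives.append(sample)  # validation successful: train normally
        else:
            negatives.append(sample)  # validation unsuccessful: negative example
    return positives, negatives


if __name__ == "__main__":
    data = [
        Sample("p1", frozenset({"regular"})),
        Sample("p2", frozenset({"regular", "irregular"})),
    ]
    valid, violating = split_by_rules(data, [no_contradictory_rhythm])
    print(len(valid), len(violating))
```

Keeping the rules as plain functions means new domain rules can be added to the list without touching the splitting logic.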