fixed wrong word limiter for answer warnings #2540
AlexanderSchicktanz wants to merge 10 commits into e-valuation:main
    let j;
    for (j = 0; j < sub.length; j++) {
        if (arr[i + j] !== sub[j]) break;
    }
    if (j == sub.length) return true;
I think this could be slightly nicer if you create another helper function arrayEquals(a: string[], b: string[]): boolean :)
Bonus points: Can you remove the string type and make a generic function instead? Take a look at our utils.ts file for examples, or ask if you need some input :)
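A minimal sketch of the generic helper suggested here. The name arraysEqual and the use of strict equality for elements are assumptions; the helpers in utils.ts may follow a different style.

```typescript
// Hypothetical helper (not from the PR): true if both arrays have the same
// length and their elements are pairwise strictly equal (===).
function arraysEqual<T>(a: readonly T[], b: readonly T[]): boolean {
    return a.length === b.length && a.every((value, index) => value === b[index]);
}

// arraysEqual(["see", "above"], ["see", "above"]) is true
// arraysEqual([1, 2, 3], [1, 2]) is false
```

Because the type parameter T is inferred from the arguments, the same function works for string[] as well as for any other element type that can be compared with ===.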
I still think that this change would make a nice improvement to the current implementation
    const words = text.split(" ");
    const triggerWords = triggerString.split(" ");
Splitting by a single space character means that we don't match if users accidentally type two spaces. We should split by any amount of whitespace (like what happens in Python when using str.split without passing any argument).
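For illustration, splitting on runs of whitespace could look roughly like this. The function name splitOnWhitespace is hypothetical, and trimming first is an assumption made to mimic Python's behaviour of ignoring leading and trailing whitespace.

```typescript
// Hypothetical helper (not from the PR): split on any run of whitespace,
// ignoring leading and trailing whitespace, similar to Python's str.split()
// called without arguments.
function splitOnWhitespace(text: string): string[] {
    const trimmed = text.trim();
    // "".split(/\s+/) would return [""], so handle the empty case explicitly.
    return trimmed === "" ? [] : trimmed.split(/\s+/);
}

// "see  above".split(" ") yields ["see", "", "above"] (note the empty string),
// whereas splitOnWhitespace("see  above") yields ["see", "above"].
```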
function matchesTriggerString(text: string, triggerString: string): boolean {
    const words = text.split(" ");
    const triggerWords = triggerString.split(" ");
    return containsSubArray(words, triggerWords);
This probably means that if users write something like "see above." at the end of a sentence, we will not show a warning because the dot is included in the last word.
Doing this completely correctly seems tough, unless we at some point change the UI so that staff users can enter arbitrary regexes as trigger words. What do you think we should aim for in this PR, @janno42?
We could first preprocess (before splitting) by regex-replacing all non-word characters with a delimiter, then splitting on the delimiter. Then, punctuation, arbitrarily weird (repeated) whitespace and other non-printable characters would not be an issue.
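A rough sketch of this preprocessing idea, assuming a single space as the delimiter. The function body is an assumption and not the PR's code. One caveat: in JavaScript, \W also matches non-ASCII letters such as German umlauts, so real answer texts would likely need a Unicode-aware pattern instead.

```typescript
// Sketch only: replace every run of non-word characters (punctuation,
// repeated whitespace, control characters) with one space, then split.
function extractWords(text: string): string[] {
    const normalized = text.replace(/\W+/g, " ").trim();
    // Avoid returning [""] for input that contains no word characters.
    return normalized === "" ? [] : normalized.split(" ");
}

// extractWords("Please   see above.") yields ["Please", "see", "above"]
// extractWords("s.o.") yields ["s", "o"], so trigger words that contain
// punctuation would need the same preprocessing to still match.
```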
It now matches strings that include dots, question marks, and exclamation marks by trying to match the start of the string and then checking whether the rest consists only of those characters.
This way, trigger words that themselves contain these delimiters (e.g. "s.o.") can also be matched.
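The check described here might look roughly like the following for a single word. The function name and the exact character set are assumptions; this is not the code from the PR.

```typescript
// Hypothetical helper: a word matches a trigger word if it starts with the
// trigger and everything after it consists only of '.', '?' or '!'.
function matchesWithTrailingPunctuation(word: string, trigger: string): boolean {
    if (word.slice(0, trigger.length) !== trigger) return false;
    const rest = word.slice(trigger.length);
    return /^[.?!]*$/.test(rest);
}

// matchesWithTrailingPunctuation("above.", "above") is true
// matchesWithTrailingPunctuation("s.o.", "s.o.") is true (the rest is empty)
// matchesWithTrailingPunctuation("aboveboard", "above") is false
```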
But there is still the problem that if there is more than one space character, we get an empty string as one of the words.
    return text.length > 0 && ["", "ka", "na", "none", "keine", "keines", "keiner"].includes(text.replace(/\W/g, ""));
}

function containsPhrase(arr: string[], sub: string[]): boolean {
This function is very difficult to read; it does too many things at once. We should probably have something like:
function matchesTriggerString(text: string, triggerString: string): boolean {
    const words = extractWords(text);
    const triggerWords = triggerString.split(" ");
    return isSubArray(triggerWords, words);
}
Then isSubArray should also not be this complicated; it should use another function, areArraysEqual, as I previously suggested.
Note that this also means that all of our custom "how do we handle weird text" logic is separated from the "do the arrays match" logic.
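One way the array side of this split could be sketched, with isSubArray built on areArraysEqual. Both bodies are assumptions written for illustration; only the function names and the argument order (sub-array first) come from the suggestion above.

```typescript
// Sketch only: element-wise comparison with strict equality.
function areArraysEqual<T>(a: readonly T[], b: readonly T[]): boolean {
    return a.length === b.length && a.every((value, index) => value === b[index]);
}

// Sketch only: true if `sub` occurs as a contiguous run inside `arr`.
// An empty `sub` is treated as contained in every array.
function isSubArray<T>(sub: readonly T[], arr: readonly T[]): boolean {
    for (let start = 0; start + sub.length <= arr.length; start++) {
        const window = arr.slice(start, start + sub.length);
        if (areArraysEqual(sub, window)) return true;
    }
    return false;
}

// isSubArray(["see", "above"], ["please", "see", "above"]) is true
// isSubArray(["see", "above"], ["see", "the", "above"]) is false
```

Neither function knows anything about text, punctuation, or whitespace, so the word extraction can change without touching the matching logic.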
fix #2516