I have a regex pattern created on regex101.com: https://regex101.com/r/cMvHlm/7/codegen?language=java
However, that regex does not seem to work in my Java program (I use Spring Tool Suite as my IDE):
@Test
public void testRegex() {
//Pattern referenceCodePattern = Pattern.compile("((\\h|\\:)+)(([\u00DFA-Za-z0-9-_#\\\\\\/])+)(([[:punct:]])?)");
Pattern pattern = Pattern.compile(""
+ "(?:\\s+|chiffre|job-id|job-nr[.]|job-nr|\\bjob id\\b|job nr[.]|jobnummer|jobnr[.]|jobid|jobcode|job nr.|ziffer|kennziffer|kennz.|referenz code|referenz-code|"
+ "referenzcode|ref[.] nr[.]|ref[.] id|ref id|ref[.]id|ref[.]-nr[.]|ref[.]- nr[.]|"
+ "referenz nummer|referenznummer|referenz nr[.]|stellenreferenz| referenz-nr[.]|referenznr[.]|referenz|referenznummer der stelle|id#|id #|stellenausschreibungen|"
+ "stellenausschreibungs\\s?nr[.]|stellenausschreibungs-nr[.]|stellenausschreibungsnr[.]|stellenangebots id|stellenangebots-id|stellenangebotsid|stellen id|stellen-id|stellenid|stellenreferenz|"
+ "stellen-referenz|ref[.]st[.]nr[.]|stellennumer|\\bst[.]-nr[.]\\b|\\bst[.] nr[.]\\b|kenn-nr[.]|positionsnummer|kennwort|stellenkey|stellencode|job-referenzcode|stellenausschreibung|"
+ "bewerbungskennziffer|projekt id|projekt-id|reference number|reference no[.]|reference code|job code|job id|job vacancy no[.]|job-ad-number|auto req id|job ref|\\bstellenausschreibung nr[.]\\b)"
+ ":?(?:\\w*)(?:\\s*)([A-Z]*\\s*)([!\"#$%&'()*+,\\-.\\/:;<=>?@[\\]^_`{|}~]*\\w*[!\"#$%&'()*+,\\-.\\/:;<=>?@[\\]^_`{|}~]*\\w*[!\"#$%&'()*+,\\-.\\/:;<=>?@[\\]^_`{|}~]*\\w*[!\"#$%&'()*+,\\-.\\/:;<=>?@[\\]^_`{|}~]*)?");
String line = "Referenznummer: INDUSTRY Kontakt: ZAsdfsdfS Herr Andrafgdh Neue Str. 7 21244 Buchholz +42341 22322 mdjob.bu44lz@zaqusssis.de Stellenanzeige teilen: Jetzt online bewerben! oder bewerben Sie sich mit\n" +
"Geben Sie bei Ihrer Bewerbung die Stellenreferenz und die Stellenbezeichnung an! \n" +
"Stellenreferenz: 21533448-JOtest\n\n" +
"Stellenausschreibung Nr. PD-666/19";
// Create a Pattern object
//Pattern r = Pattern.compile(pattern);
Matcher m = pattern.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );
}else {
System.out.println("NO MATCH");
}
}
I get the following error:
java.util.regex.PatternSyntaxException: Unclosed character class near index 1337
at java.util.regex.Pattern.error(Pattern.java:1957)
at java.util.regex.Pattern.clazz(Pattern.java:2550)
at java.util.regex.Pattern.clazz(Pattern.java:2506)
at java.util.regex.Pattern.clazz(Pattern.java:2506)
at java.util.regex.Pattern.clazz(Pattern.java:2506)
at java.util.regex.Pattern.sequence(Pattern.java:2065)
at java.util.regex.Pattern.expr(Pattern.java:1998)
at java.util.regex.Pattern.group0(Pattern.java:2907)
at java.util.regex.Pattern.sequence(Pattern.java:2053)
at java.util.regex.Pattern.expr(Pattern.java:1998)
at java.util.regex.Pattern.compile(Pattern.java:1698)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1028)
Is there a way to find out where index 1337 is?
Answer
The main problem with the regex is that both [ and ] must be escaped inside a character class in a Java regex: there they are “special”, because nested brackets are used to form character class unions and intersections.
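A minimal sketch of the bracket problem, using a deliberately shortened character class rather than the full pattern from the question. The class name BracketEscapeDemo and the helper compiles are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BracketEscapeDemo {

    // Hypothetical helper: true if the regex compiles, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unescaped '[' opens a nested class, so the last ']' only closes the inner one
        // and the outer class is left unclosed.
        String unescaped = "[@[\\]^_]";
        // Both brackets escaped: one flat class containing @ [ ] ^ _
        String escaped = "[@\\[\\]^_]";

        System.out.println(compiles(unescaped));               // false
        System.out.println(compiles(escaped));                 // true
        System.out.println("a[b]c".replaceAll(escaped, "#"));  // a#b#c
    }
}
```

The first pattern fails with the same kind of "Unclosed character class" error as in the question, which is why the fix below escapes both brackets.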
Another issue is that the [.]\b patterns won’t work as expected, because a word boundary after a non-word char requires a word char immediately to the right of the current position. You need a \B there, not \b.
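To illustrate the word boundary point, here is a small sketch with a shortened keyword (nr[.]) instead of the full alternation. The class name WordBoundaryDemo, the helper found and the sample inputs are assumptions made for the example.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "st. nr. PD-666/19";

        // After the dot comes a space: two non-word chars, so there is no word boundary.
        System.out.println(found("nr[.]\\b", input));    // false
        // \B asserts "not a word boundary", which holds between '.' and ' '.
        System.out.println(found("nr[.]\\B", input));    // true
        // \b after the dot only succeeds when a word char follows directly.
        System.out.println(found("nr[.]\\b", "nr.42"));  // true
    }
}
```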
You do not need to escape the / char in a Java regex pattern.
You do not have to repeat the pattern at the end of the regex: you may “repeat” it with a limiting {0,3} quantifier after wrapping the repeated pattern in a non-capturing group, (?:...).
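The equivalence can be sketched as follows. The punctuation class here is a hypothetical short stand-in ([-./#]) for the long one in the question, and the class and constant names are made up for the example.

```java
import java.util.regex.Pattern;

public class BoundedRepeatDemo {

    // Assumption: a shortened punctuation class standing in for the full one.
    static final String P = "[-./#]*";

    // The tail written out three times, as in the question.
    static final Pattern SPELLED_OUT =
            Pattern.compile(P + "\\w*" + P + "\\w*" + P + "\\w*" + P);

    // The same tail wrapped in a non-capturing group with a {0,3} quantifier.
    static final Pattern QUANTIFIED =
            Pattern.compile(P + "(?:\\w*" + P + "){0,3}");

    public static void main(String[] args) {
        String[] samples = {"PD-666/19", "21533448-JOtest", "a-b-c-d-e"};
        for (String s : samples) {
            // Both patterns should agree: the first two samples fit within three
            // word runs, the last one has five and is rejected by matches().
            System.out.println(s + " -> "
                    + SPELLED_OUT.matcher(s).matches() + " / "
                    + QUANTIFIED.matcher(s).matches());
        }
    }
}
```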
Consider a while block to get all matches. You may use a boolean flag to see if there were any matches or not.
Also, you probably want to use the \s+ alternative as the last one in the first group, since it is too generic, but I will leave it at the start for the time being.
Use
Pattern pattern = Pattern.compile(""
+ "(?:\\s+|chiffre|job-id|job-nr[.]|job-nr|\\bjob id\\b|job nr[.]|jobnummer|jobnr[.]|jobid|jobcode|job nr\\.|ziffer|kennziffer|kennz\\.|referenz code|referenz-code|"
+ "referenzcode|ref[.] nr[.]|ref[.] id|ref id|ref[.]id|ref[.]-nr[.]|ref[.]- nr[.]|"
+ "referenz nummer|referenznummer|referenz nr[.]|stellenreferenz| referenz-nr[.]|referenznr[.]|referenz|referenznummer der stelle|id#|id #|stellenausschreibungen|"
+ "stellenausschreibungs\\s?nr[.]|stellenausschreibungs-nr[.]|stellenausschreibungsnr[.]|stellenangebots id|stellenangebots-id|stellenangebotsid|stellen id|stellen-id|stellenid|stellenreferenz|"
+ "stellen-referenz|ref[.]st[.]nr[.]|stellennumer|\\bst[.]-nr[.]\\B|\\bst[.] nr[.]\\B|kenn-nr[.]|positionsnummer|kennwort|stellenkey|stellencode|job-referenzcode|stellenausschreibung|"
+ "bewerbungskennziffer|projekt id|projekt-id|reference number|reference no[.]|reference code|job code|job id|job vacancy no[.]|job-ad-number|auto req id|job ref|\\bstellenausschreibung nr[.]\\B)"
+ ":?\\w*\\s*([A-Z]*\\s*)([!\"#$%&'()*+,\\-./:;<=>?@\\[\\]^_`{|}~]*(?:\\w*[!\"#$%&'()*+,\\-./:;<=>?@\\[\\]^_`{|}~]*){0,3})?");
String line = "Referenznummer: INDUSTRY Kontakt: ZAsdfsdfS Herr Andrafgdh Neue Str. 7 21244 Buchholz +42341 22322 mdjob.bu44lz@zaqusssis.de Stellenanzeige teilen: Jetzt online bewerben! oder bewerben Sie sich mit\n" +
"Geben Sie bei Ihrer Bewerbung die Stellenreferenz und die Stellenbezeichnung an! \n" +
"Stellenreferenz: 21533448-JOtest\n\n" +
"Stellenausschreibung Nr. PD-666/19";
Matcher m = pattern.matcher(line);
boolean found = false;
while (m.find()) {
found = true;
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );
System.out.println(" ----------------------- " );
}
if (!found) {
System.out.println("NO MATCH");
}
See this Java demo.
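As for the original question of where index 1337 is: PatternSyntaxException exposes getIndex() and getPattern(), so the surrounding text can be printed directly. The sketch below is one possible way to do that; the class name PatternErrorLocator, the helper contextOfError and the short sample regex are assumptions made for illustration, not part of the question's code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternErrorLocator {

    // Hypothetical helper: returns the slice of the pattern around the reported
    // error index, or null if the regex compiles without error.
    static String contextOfError(String regex, int radius) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            int index = e.getIndex();  // approximate error index, -1 if unknown
            if (index < 0) {
                return "";
            }
            String p = e.getPattern();
            // Clamp both ends so substring never goes out of range.
            int from = Math.max(0, Math.min(p.length(), index - radius));
            int to = Math.min(p.length(), index + radius + 1);
            return p.substring(from, to);
        }
    }

    public static void main(String[] args) {
        // Shortened sample with the same unescaped '[' inside a character class.
        String regex = "id #|[@[\\]^_`{|}~]*";
        try {
            Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            System.out.println("description: " + e.getDescription());
            System.out.println("index:       " + e.getIndex());
            // getMessage() is multi-line: description, index, the pattern itself,
            // and a caret line pointing at the index.
            System.out.println(e.getMessage());
        }
        System.out.println("context: " + contextOfError(regex, 10));
    }
}
```

Note that the index counts characters of the regex as Java sees it at run time, after string-literal escapes such as \\ and \" have been processed, so it will not line up with column numbers in the source file.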