Hello, I am trying to capture a specific part of a text. I have tried different patterns, as well as the answers to similar questions, but without any luck. After struggling for a while I decided to ask. I am not sure whether it is possible at all; I am trying this in Java.
What I am trying to get is any URL for the caIssuers access method within the AuthorityInfoAccess extension. In this case that would be http://cacerts.digicert.com/DigiCertHighAssuranceEVRootCA.crt, but the value is dynamic.
The raw text is:
[ [ Version: V3 Subject: CN=DigiCert High Assurance TLS Hybrid ECC SHA256 2020 CA1, O="DigiCert, Inc.", C=US Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11 Key: Sun EC public key, 256 bits public x coord: 46922930096926857556524221823769659737755518953746800561114373165101317926430 public y coord: 28418761285841432519462039103521095162475800069609980635592577603211275159549 parameters: secp256r1 [NIST P-256, X9.62 prime256v1] (1.2.840.10045.3.1.7) Validity: [From: Thu Dec 17 01:00:00 CET 2020, To: Tue Dec 17 00:59:59 CET 2030] Issuer: CN=DigiCert High Assurance EV Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US SerialNumber: [ 0667035b bb14fd63 afc0d6a8 534efe16] Certificate Extensions: 8 [1]: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false AuthorityInfoAccess [ [ accessMethod: ocsp accessLocation: URIName: http://ocsp.digicert.com , accessMethod: caIssuers accessLocation: URIName: http://cacerts.digicert.com/DigiCertHighAssuranceEVRootCA.crt ] ] [2]: ObjectId: 2.5.29.35 Criticality=false AuthorityKeyIdentifier [ KeyIdentifier [ 0000: B1 3E C3 69 03 F8 BF 47 01 D4 98 26 1A 08 02 EF .>.i...G...&.... 0010: 63 64 2B C3 cd+. ] ] [3]: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:true PathLen:0 ] [4]: ObjectId: 2.5.29.31 Criticality=false CRLDistributionPoints [ [DistributionPoint: [URIName: http://crl3.digicert.com/DigiCertHighAssuranceEVRootCA.crl] ]] [5]: ObjectId: 2.5.29.32 Criticality=false CertificatePolicies [ [CertificatePolicyId: [2.23.140.1.2.2] [] ] [CertificatePolicyId: [2.23.140.1.2.3] [] ] [CertificatePolicyId: [2.23.140.1.1] [] ] [CertificatePolicyId: [2.23.140.1.2.1] [] ] ] [6]: ObjectId: 2.5.29.37 Criticality=false ExtendedKeyUsages [ serverAuth clientAuth ] [7]: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_CertSign Crl_Sign ] [8]: ObjectId: 2.5.29.14 Criticality=false SubjectKeyIdentifier [ KeyIdentifier [ 0000: 50 61 A6 A0 D2 35 C4 11 2A 20 8D 1F 0F AC 42 F0 Pa...5..* ....B. 
0010: CD 29 CF 4B .).K ] ] ] Algorithm: [SHA256withRSA] Signature: 0000: 73 10 1F C8 61 88 17 CD 6F 1C 04 C3 16 DB 4C 09 s...a...o.....L. 0010: EE 8C FC 94 87 FA 22 D0 9A DF 64 8D EE F4 9B A2 ......"...d..... 0020: 2E A7 1A EF 6D 03 E9 FA 12 FC 00 79 FB 81 08 C6 ....m......y.... 0030: 99 BB 08 C1 B8 31 D3 7F 97 BA 00 88 38 A9 68 23 .....1......8.h# 0040: EF 98 E9 A9 61 4A 67 4F B0 3A DC 2A F4 AB 88 3C ....aJgO.:.*...< 0050: E2 B2 35 66 67 6A 03 8D 25 55 45 1F EA A0 BA 13 ..5fgj..%UE..... 0060: 7E 2D 0B BD EA 0D 01 7C 4C 94 AB 7E C7 16 15 D0 .-......L....... 0070: A5 45 74 7D 27 84 06 AE 46 76 54 D3 12 0F 39 43 .Et.'...FvT...9C 0080: 47 35 82 68 0F 79 31 F3 BC C7 4D 65 F9 97 68 A5 G5.h.y1...Me..h. 0090: D1 3C 16 F3 3B F2 01 9D E3 3C 5E 59 BF 2F F7 DD .<..;....<^Y./.. 00A0: 7E 98 1C 53 0D EA 6A 2A EC BF 8C 5E 51 9B A0 61 ...S..j*...^Q..a 00B0: 7F 1A F7 DC 00 D1 B3 AD 2C D6 DD 7A 76 D6 77 A4 ........,..zv.w. 00C0: E6 0B 00 B0 53 3C 3E 4A 85 9E 9A FB F7 64 E5 D9 ....S<>J.....d.. 00D0: E1 E9 CE 0F 69 E6 50 60 15 00 87 E1 AE C5 F6 81 ....i.P`........ 00E0: 95 4E 2A 43 C1 2D 8C 13 02 40 7A DE 30 8C 17 1D .N*C.-...@z.0... 00F0: 81 D6 E4 54 58 1A 38 11 E0 D3 2E 68 8C 36 8C 3D ...TX.8....h.6.= ]
And a different one, with the same use case:
[ [ Version: V3 Subject: CN=ISRG Root X1, O=Internet Security Research Group, C=US Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11 Key: Sun RSA public key, 4096 bits modulus: 709477870415445373015359016562426660610553770685944520893298396600226760899977879191004898543350831842119174188613678136510262472550532722234131754439181090009824131001234702144200501816519311599904090606194984753842587622398776018408050245574116028550608708896478977104703101364577377554823893350339376892984086676842821506637376561471221178677513035811884589888230947855482554780924844280661412982827405878164907670403886160896655313460186264922042760067692235383478494519985672059698752915965998412445946254227413232257276525240006651483130792248112417425846451951438781260632137645358927568158361961710185115502577127010922344394993078948994750404287047493247048147066090211292167313905862438457453781042040498702821432013765502024105065778257759178356925494156447570322373310256999609083201778278588599854706241788119448943034477370959349516873162063461521707809689839710972753590949570167489887658749686740890549110678989462474318310617765270337415238713770800711236563610171101328052424145478220993016515262478543813796899677215192789612682845145008993144513547444131126029557147570005369943143213525671105288817016183804256755470528641042403865830064493168693765438364296560479053823886598989258655438933191724193029337334607 public exponent: 65537 Validity: [From: Wed Jan 20 20:14:03 CET 2021, To: Mon Sep 30 20:14:03 CEST 2024] Issuer: CN=DST Root CA X3, O=Digital Signature Trust Co. SerialNumber: [ 40017721 37d4e942 b8ee76aa 3c640ab7] Certificate Extensions: 7 [1]: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false AuthorityInfoAccess [ [ accessMethod: caIssuers accessLocation: URIName: http://apps.identrust.com/roots/dstrootcax3.p7c ] ] [2]: ObjectId: 2.5.29.35 Criticality=false AuthorityKeyIdentifier [ KeyIdentifier [ 0000: C4 A7 B1 A4 7B 2C 71 FA DB E1 4B 90 75 FF C4 15 .....,q...K.u... 
0010: 60 85 89 10 `... ] ] [3]: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:true PathLen:2147483647 ] [4]: ObjectId: 2.5.29.31 Criticality=false CRLDistributionPoints [ [DistributionPoint: [URIName: http://crl.identrust.com/DSTROOTCAX3CRL.crl] ]] [5]: ObjectId: 2.5.29.32 Criticality=false CertificatePolicies [ [CertificatePolicyId: [2.23.140.1.2.1] [] ] [CertificatePolicyId: [1.3.6.1.4.1.44947.1.1.1] [PolicyQualifierInfo: [ qualifierID: 1.3.6.1.5.5.7.2.1 qualifier: 0000: 16 22 68 74 74 70 3A 2F 2F 63 70 73 2E 72 6F 6F ."http://cps.roo 0010: 74 2D 78 31 2E 6C 65 74 73 65 6E 63 72 79 70 74 t-x1.letsencrypt 0020: 2E 6F 72 67 .org ]] ] ] [6]: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ Key_CertSign Crl_Sign ] [7]: ObjectId: 2.5.29.14 Criticality=false SubjectKeyIdentifier [ KeyIdentifier [ 0000: 79 B4 59 E6 7B B6 E5 E4 01 73 80 08 88 C8 1A 58 y.Y......s.....X 0010: F6 E9 9B 6E ...n ] ] ] Algorithm: [SHA256withRSA] Signature: 0000: 0A 73 00 6C 96 6E FF 0E 52 D0 AE DD 8C E7 5A 06 .s.l.n..R.....Z. 0010: AD 2F A8 E3 8F BF C9 0A 03 15 50 C2 E5 6C 42 BB ./........P..lB. 0020: 6F 9B F4 B4 4F C2 44 88 08 75 CC EB 07 9B 14 62 o...O.D..u.....b 0030: 6E 78 DE EC 27 BA 39 5C F5 A2 A1 6E 56 94 70 10 nx..'.9...nV.p. 0040: 53 B1 BB E4 AF D0 A2 C3 2B 01 D4 96 F4 C5 20 35 S.......+..... 5 0050: 33 F9 D8 61 36 E0 71 8D B4 B8 B5 AA 82 45 95 C0 3..a6.q......E.. 0060: F2 A9 23 28 E7 D6 A1 CB 67 08 DA A0 43 2C AA 1B ..#(....g...C,.. 0070: 93 1F C9 DE F5 AB 69 5D 13 F5 5B 86 58 22 CA 4D ......i]..[.X".M 0080: 55 E4 70 67 6D C2 57 C5 46 39 41 CF 8A 58 83 58 U.pgm.W.F9A..X.X 0090: 6D 99 FE 57 E8 36 0E F0 0E 23 AA FD 88 97 D0 E3 m..W.6...#...... 00A0: 5C 0E 94 49 B5 B5 17 35 D2 2E BF 4E 85 EF 18 E0 ..I...5...N.... 00B0: 85 92 EB 06 3B 6C 29 23 09 60 DC 45 02 4C 12 18 ....;l)#.`.E.L.. 00C0: 3B E9 FB 0E DE DC 44 F8 58 98 AE EA BD 45 45 A1 ;.....D.X....EE. 00D0: 88 5D 66 CA FE 10 E9 6F 82 C8 11 42 0D FB E9 EC .]f....o...B.... 
00E0: E3 86 00 DE 9D 10 E3 38 FA A4 7D B1 D8 E8 49 82 .......8......I. 00F0: 84 06 9B 2B E8 6B 4F 01 0C 38 77 2E F9 DD E7 39 ...+.kO..8w....9 ]
Any idea if that would be possible?
Answer
You might use a capture group:

\bAuthorityInfoAccess\s*\[\s*\[(?:\R(?!.*\baccessMethod: caIssuers).*)*\R.*\baccessMethod: caIssuers\R\s*accessLocation: URIName:\s*(https?://\S+)

The pattern matches:

\bAuthorityInfoAccess\s*\[\s*\[
    Match the word AuthorityInfoAccess followed by two opening square brackets, each preceded by optional whitespace chars
(?:
    Non-capturing group
\R(?!.*\baccessMethod: caIssuers).*
    Match a newline and the whole line if it does not contain accessMethod: caIssuers
)*
    Close the group and repeat it 0+ times
\R.*\baccessMethod: caIssuers\R
    Match a newline and accessMethod: caIssuers at the end of the line, followed by a newline
\s*accessLocation: URIName:\s*
    Match accessLocation: URIName: between optional whitespace chars
(https?://\S+)
    Capture group 1, match the URL starting with http:// or https://
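
As a rough illustration, the pattern from this answer could be used in Java roughly as follows. Each backslash is doubled for the string literal, and `\R` needs Java 8 or later. The class name `CaIssuersExtractor`, the method `extract`, and the line layout of the sample text are assumptions made for this sketch; the question only shows the certificate dump with its line breaks collapsed, so the real input may be laid out differently.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class CaIssuersExtractor {

    // The regex from the answer, split over several lines for readability.
    private static final Pattern CA_ISSUERS = Pattern.compile(
            "\\bAuthorityInfoAccess\\s*\\[\\s*\\["
            + "(?:\\R(?!.*\\baccessMethod: caIssuers).*)*"
            + "\\R.*\\baccessMethod: caIssuers\\R"
            + "\\s*accessLocation: URIName:\\s*(https?://\\S+)");

    // Returns the first caIssuers URL, or an empty Optional if there is none.
    static Optional<String> extract(String certificateText) {
        Matcher m = CA_ISSUERS.matcher(certificateText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed layout: one field per line, as the pattern expects.
        String sample = String.join("\n",
                "[1]: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false",
                "AuthorityInfoAccess [",
                "  [",
                "   accessMethod: ocsp",
                "   accessLocation: URIName: http://ocsp.digicert.com",
                ", ",
                "   accessMethod: caIssuers",
                "   accessLocation: URIName: http://cacerts.digicert.com/DigiCertHighAssuranceEVRootCA.crt",
                "]",
                "]");
        System.out.println(extract(sample).orElse("no caIssuers URL found"));
    }
}
```

The URL is read from group 1 after `find()` succeeds. Note that the pattern expects `accessMethod: caIssuers` to be directly followed by a line break, so trailing whitespace on that line would prevent a match.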