
Capture text between multiline text

Hello, I am trying to capture a specific part of a text. I have tried different patterns, and different answers to similar questions, but without any luck. After struggling for a while I wanted to ask. I am not sure whether it is possible at all, and I am trying this in Java…

What I am trying to get is any URL for the caIssuers access method within the AuthorityInfoAccess extension. In this case that would be http://cacerts.digicert.com/DigiCertHighAssuranceEVRootCA.crt, but the value is dynamic.

The raw text is:

[
[
  Version: V3
  Subject: CN=DigiCert High Assurance TLS Hybrid ECC SHA256 2020 CA1, O="DigiCert, Inc.", C=US
  Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11

  Key:  Sun EC public key, 256 bits
  public x coord: 46922930096926857556524221823769659737755518953746800561114373165101317926430
  public y coord: 28418761285841432519462039103521095162475800069609980635592577603211275159549
  parameters: secp256r1 [NIST P-256, X9.62 prime256v1] (1.2.840.10045.3.1.7)
  Validity: [From: Thu Dec 17 01:00:00 CET 2020,
               To: Tue Dec 17 00:59:59 CET 2030]
  Issuer: CN=DigiCert High Assurance EV Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
  SerialNumber: [    0667035b bb14fd63 afc0d6a8 534efe16]

Certificate Extensions: 8
[1]: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false
AuthorityInfoAccess [
  [
   accessMethod: ocsp
   accessLocation: URIName: http://ocsp.digicert.com
, 
   accessMethod: caIssuers
   accessLocation: URIName: http://cacerts.digicert.com/DigiCertHighAssuranceEVRootCA.crt
]
]

[2]: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: B1 3E C3 69 03 F8 BF 47   01 D4 98 26 1A 08 02 EF  .>.i...G...&....
0010: 63 64 2B C3                                        cd+.
]
]

[3]: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:true
  PathLen:0
]

[4]: ObjectId: 2.5.29.31 Criticality=false
CRLDistributionPoints [
  [DistributionPoint:
     [URIName: http://crl3.digicert.com/DigiCertHighAssuranceEVRootCA.crl]
]]

[5]: ObjectId: 2.5.29.32 Criticality=false
CertificatePolicies [
  [CertificatePolicyId: [2.23.140.1.2.2]
[]  ]
  [CertificatePolicyId: [2.23.140.1.2.3]
[]  ]
  [CertificatePolicyId: [2.23.140.1.1]
[]  ]
  [CertificatePolicyId: [2.23.140.1.2.1]
[]  ]
]

[6]: ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
  serverAuth
  clientAuth
]

[7]: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_CertSign
  Crl_Sign
]

[8]: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 50 61 A6 A0 D2 35 C4 11   2A 20 8D 1F 0F AC 42 F0  Pa...5..* ....B.
0010: CD 29 CF 4B                                        .).K
]
]

]
  Algorithm: [SHA256withRSA]
  Signature:
0000: 73 10 1F C8 61 88 17 CD   6F 1C 04 C3 16 DB 4C 09  s...a...o.....L.
0010: EE 8C FC 94 87 FA 22 D0   9A DF 64 8D EE F4 9B A2  ......"...d.....
0020: 2E A7 1A EF 6D 03 E9 FA   12 FC 00 79 FB 81 08 C6  ....m......y....
0030: 99 BB 08 C1 B8 31 D3 7F   97 BA 00 88 38 A9 68 23  .....1......8.h#
0040: EF 98 E9 A9 61 4A 67 4F   B0 3A DC 2A F4 AB 88 3C  ....aJgO.:.*...<
0050: E2 B2 35 66 67 6A 03 8D   25 55 45 1F EA A0 BA 13  ..5fgj..%UE.....
0060: 7E 2D 0B BD EA 0D 01 7C   4C 94 AB 7E C7 16 15 D0  .-......L.......
0070: A5 45 74 7D 27 84 06 AE   46 76 54 D3 12 0F 39 43  .Et.'...FvT...9C
0080: 47 35 82 68 0F 79 31 F3   BC C7 4D 65 F9 97 68 A5  G5.h.y1...Me..h.
0090: D1 3C 16 F3 3B F2 01 9D   E3 3C 5E 59 BF 2F F7 DD  .<..;....<^Y./..
00A0: 7E 98 1C 53 0D EA 6A 2A   EC BF 8C 5E 51 9B A0 61  ...S..j*...^Q..a
00B0: 7F 1A F7 DC 00 D1 B3 AD   2C D6 DD 7A 76 D6 77 A4  ........,..zv.w.
00C0: E6 0B 00 B0 53 3C 3E 4A   85 9E 9A FB F7 64 E5 D9  ....S<>J.....d..
00D0: E1 E9 CE 0F 69 E6 50 60   15 00 87 E1 AE C5 F6 81  ....i.P`........
00E0: 95 4E 2A 43 C1 2D 8C 13   02 40 7A DE 30 8C 17 1D  .N*C.-...@z.0...
00F0: 81 D6 E4 54 58 1A 38 11   E0 D3 2E 68 8C 36 8C 3D  ...TX.8....h.6.=

]

And a different one, with the same use case:

[
[
  Version: V3
  Subject: CN=ISRG Root X1, O=Internet Security Research Group, C=US
  Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11

  Key:  Sun RSA public key, 4096 bits
  modulus: 709477870415445373015359016562426660610553770685944520893298396600226760899977879191004898543350831842119174188613678136510262472550532722234131754439181090009824131001234702144200501816519311599904090606194984753842587622398776018408050245574116028550608708896478977104703101364577377554823893350339376892984086676842821506637376561471221178677513035811884589888230947855482554780924844280661412982827405878164907670403886160896655313460186264922042760067692235383478494519985672059698752915965998412445946254227413232257276525240006651483130792248112417425846451951438781260632137645358927568158361961710185115502577127010922344394993078948994750404287047493247048147066090211292167313905862438457453781042040498702821432013765502024105065778257759178356925494156447570322373310256999609083201778278588599854706241788119448943034477370959349516873162063461521707809689839710972753590949570167489887658749686740890549110678989462474318310617765270337415238713770800711236563610171101328052424145478220993016515262478543813796899677215192789612682845145008993144513547444131126029557147570005369943143213525671105288817016183804256755470528641042403865830064493168693765438364296560479053823886598989258655438933191724193029337334607
  public exponent: 65537
  Validity: [From: Wed Jan 20 20:14:03 CET 2021,
               To: Mon Sep 30 20:14:03 CEST 2024]
  Issuer: CN=DST Root CA X3, O=Digital Signature Trust Co.
  SerialNumber: [    40017721 37d4e942 b8ee76aa 3c640ab7]

Certificate Extensions: 7
[1]: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false
AuthorityInfoAccess [
  [
   accessMethod: caIssuers
   accessLocation: URIName: http://apps.identrust.com/roots/dstrootcax3.p7c
]
]

[2]: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: C4 A7 B1 A4 7B 2C 71 FA   DB E1 4B 90 75 FF C4 15  .....,q...K.u...
0010: 60 85 89 10                                        `...
]
]

[3]: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:true
  PathLen:2147483647
]

[4]: ObjectId: 2.5.29.31 Criticality=false
CRLDistributionPoints [
  [DistributionPoint:
     [URIName: http://crl.identrust.com/DSTROOTCAX3CRL.crl]
]]

[5]: ObjectId: 2.5.29.32 Criticality=false
CertificatePolicies [
  [CertificatePolicyId: [2.23.140.1.2.1]
[]  ]
  [CertificatePolicyId: [1.3.6.1.4.1.44947.1.1.1]
[PolicyQualifierInfo: [
  qualifierID: 1.3.6.1.5.5.7.2.1
  qualifier: 0000: 16 22 68 74 74 70 3A 2F   2F 63 70 73 2E 72 6F 6F  ."http://cps.roo
0010: 74 2D 78 31 2E 6C 65 74   73 65 6E 63 72 79 70 74  t-x1.letsencrypt
0020: 2E 6F 72 67                                        .org

]]  ]
]

[6]: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  Key_CertSign
  Crl_Sign
]

[7]: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 79 B4 59 E6 7B B6 E5 E4   01 73 80 08 88 C8 1A 58  y.Y......s.....X
0010: F6 E9 9B 6E                                        ...n
]
]

]
  Algorithm: [SHA256withRSA]
  Signature:
0000: 0A 73 00 6C 96 6E FF 0E   52 D0 AE DD 8C E7 5A 06  .s.l.n..R.....Z.
0010: AD 2F A8 E3 8F BF C9 0A   03 15 50 C2 E5 6C 42 BB  ./........P..lB.
0020: 6F 9B F4 B4 4F C2 44 88   08 75 CC EB 07 9B 14 62  o...O.D..u.....b
0030: 6E 78 DE EC 27 BA 39 5C   F5 A2 A1 6E 56 94 70 10  nx..'.9...nV.p.
0040: 53 B1 BB E4 AF D0 A2 C3   2B 01 D4 96 F4 C5 20 35  S.......+..... 5
0050: 33 F9 D8 61 36 E0 71 8D   B4 B8 B5 AA 82 45 95 C0  3..a6.q......E..
0060: F2 A9 23 28 E7 D6 A1 CB   67 08 DA A0 43 2C AA 1B  ..#(....g...C,..
0070: 93 1F C9 DE F5 AB 69 5D   13 F5 5B 86 58 22 CA 4D  ......i]..[.X".M
0080: 55 E4 70 67 6D C2 57 C5   46 39 41 CF 8A 58 83 58  U.pgm.W.F9A..X.X
0090: 6D 99 FE 57 E8 36 0E F0   0E 23 AA FD 88 97 D0 E3  m..W.6...#......
00A0: 5C 0E 94 49 B5 B5 17 35   D2 2E BF 4E 85 EF 18 E0  ..I...5...N....
00B0: 85 92 EB 06 3B 6C 29 23   09 60 DC 45 02 4C 12 18  ....;l)#.`.E.L..
00C0: 3B E9 FB 0E DE DC 44 F8   58 98 AE EA BD 45 45 A1  ;.....D.X....EE.
00D0: 88 5D 66 CA FE 10 E9 6F   82 C8 11 42 0D FB E9 EC  .]f....o...B....
00E0: E3 86 00 DE 9D 10 E3 38   FA A4 7D B1 D8 E8 49 82  .......8......I.
00F0: 84 06 9B 2B E8 6B 4F 01   0C 38 77 2E F9 DD E7 39  ...+.kO..8w....9

]

Any idea if that would be possible?


Answer

You might use a capture group:

\bAuthorityInfoAccess\s*\[\s*\[(?:\R(?!.*\baccessMethod: caIssuers).*)*\R.*\baccessMethod: caIssuers\R\s*accessLocation: URIName:\s*(https?://\S+)
  • \bAuthorityInfoAccess\s*\[\s*\[ Match AuthorityInfoAccess at a word boundary, followed by the two opening square brackets with optional whitespace chars (including newlines) in between
  • (?: Non capture group
    • \R(?!.*\baccessMethod: caIssuers).* Match a newline and the whole line if it does not contain accessMethod: caIssuers
  • )* Close group and repeat 0+ times
  • \R.*\baccessMethod: caIssuers\R Match a newline and accessMethod: caIssuers at the end of the line, followed by a newline
  • \s*accessLocation: URIName:\s* Match accessLocation: URIName: between optional whitespace chars
  • (https?://\S+) Capture group 1, match the url starting with http:// or https:// followed by 1+ non-whitespace chars
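As a minimal sketch of how the pattern above could be used from Java: the class name CaIssuersExtractor, the method name findCaIssuersUrl, and the shortened sample string are assumptions made for illustration and are not part of the original answer. Note that every backslash in the regex has to be doubled inside a Java string literal.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaIssuersExtractor {

    // Same pattern as in the answer, with backslashes escaped for a Java string.
    private static final Pattern CA_ISSUERS = Pattern.compile(
            "\\bAuthorityInfoAccess\\s*\\[\\s*\\["
            + "(?:\\R(?!.*\\baccessMethod: caIssuers).*)*"
            + "\\R.*\\baccessMethod: caIssuers\\R"
            + "\\s*accessLocation: URIName:\\s*(https?://\\S+)");

    // Hypothetical helper: returns the first caIssuers URL, or empty if there is none.
    static Optional<String> findCaIssuersUrl(String certificateText) {
        Matcher matcher = CA_ISSUERS.matcher(certificateText);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened excerpt of the first certificate dump from the question.
        String sample =
                "[1]: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false\n"
                + "AuthorityInfoAccess [\n"
                + "  [\n"
                + "   accessMethod: ocsp\n"
                + "   accessLocation: URIName: http://ocsp.digicert.com\n"
                + ", \n"
                + "   accessMethod: caIssuers\n"
                + "   accessLocation: URIName: http://cacerts.digicert.com/DigiCertHighAssuranceEVRootCA.crt\n"
                + "]\n"
                + "]\n";

        // Prints the captured URL, or "not found" when the extension has no caIssuers entry.
        System.out.println(findCaIssuersUrl(sample).orElse("not found"));
    }
}
```

Returning an Optional keeps the "no caIssuers entry" case explicit, which matters here because the extension may list only an ocsp entry or be absent altogether.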

Regex demo | Java demo
