I have an HTML form in an email body. How can I read the text content after converting the HTML form into text? Can anyone please help me?
Email Body – HTML Form Content:
<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <meta name="Generator" content="Microsoft Word 15 (filtered medium)"> <!--[if !mso]><style>v:* {behavior:url(#default#VML);} o:* {behavior:url(#default#VML);} w:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} </style><![endif]--><style><!-- /* Font Definitions */ @font-face {font-family:Helvetica; panose-1:2 11 6 4 2 2 2 2 2 4;} @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;} @font-face {font-family:"Century Gothic"; panose-1:2 11 5 2 2 2 2 2 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; margin-bottom:.0001pt; font-size:11.0pt; font-family:"Calibri",sans-serif;} span.EmailStyle17 {mso-style-type:personal-compose;} .MsoChpDefault {mso-style-type:export-only; font-family:"Calibri",sans-serif;} @page WordSection1 {size:8.5in 11.0in; margin:1.0in 1.0in 1.0in 1.0in;} div.WordSection1 {page:WordSection1;} --></style><!--[if gte mso 9]><xml> <o:shapedefaults v:ext="edit" spidmax="1026" /> </xml><![endif]--><!--[if gte mso 9]><xml> <o:shapelayout v:ext="edit"> <o:idmap v:ext="edit" data="1" /> </o:shapelayout></xml><![endif]--> </head> <body lang="EN-US" link="#0563C1" vlink="#954F72"> <div class="WordSection1"> <p class="MsoNormal"><img width="809" height="364" style="width:8.427in;height:3.7916in" id="Picture_x0020_4" src="cid:image001.jpg@01D609B1.5BB77760"><o:p></o:p></p> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal">Non-Contacted/Non-Qualified Leads from Ateco: <o:p></o:p></p> <p class="MsoNormal"><o:p> </o:p></p> <table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="1553" style="width:1165.0pt;margin-left:-.15pt;border-collapse:collapse"> <tbody> <tr style="height:15.0pt"> <td width="167" nowrap="" valign="bottom" style="width:125.0pt;border:solid #8EA9DB 
1.0pt;border-right:none;background:#4472C4;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Name<o:p></o:p></span></b></p> </td> <td width="111" nowrap="" valign="bottom" style="width:83.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;background ;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Mobile<o:p></o:p></span></b></p> </td> <td width="259" nowrap="" valign="bottom" style="width:194.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;backgroun 4;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Email<o:p></o:p></span></b></p> </td> <td width="103" nowrap="" valign="bottom" style="width:77.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;background ;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Postal Code<o:p></o:p></span></b></p> </td> <td width="109" nowrap="" valign="bottom" style="width:82.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;background ;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Enquiry Date<o:p></o:p></span></b></p> </td> <td width="239" nowrap="" valign="bottom" style="width:179.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;backgroun 4;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Lead Source<o:p></o:p></span></b></p> </td> <td width="261" nowrap="" 
valign="bottom" style="width:196.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;backgroun 4;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Dealer<o:p></o:p></span></b></p> </td> <td width="93" nowrap="" valign="bottom" style="width:70.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;background: padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Date Sent <o:p></o:p></span></b></p> </td> <td width="212" nowrap="" valign="bottom" style="width:159.0pt;border:solid #8EA9DB 1.0pt;border-left:none;background:#4472C4;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Preferred Model<o:p></o:p></span></b></p> </td> </tr> <tr style="height:15.0pt"> <td width="167" nowrap="" valign="bottom" style="width:125.0pt;border-top:none;border-left:solid #8EA9DB 1.0pt;border-bottom:solid #8EA9DB 1.0pt;border-right:none;backgroun 2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">Test Justin<o:p></o:p></span></p> </td> <td width="111" nowrap="" valign="bottom" style="width:83.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">+61 420 888 999<o:p></o:p></span></p> </td> <td width="259" nowrap="" valign="bottom" style="width:194.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">testmail@hotmail.com<o:p></o:p></span></p> </td> <td width="103" 
nowrap="" valign="bottom" style="width:77.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">4218<o:p></o:p></span></p> </td> <td width="109" nowrap="" valign="bottom" style="width:82.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">31-03-20<o:p></o:p></span></p> </td> <td width="239" nowrap="" valign="bottom" style="width:179.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">LDV Facebook - Book a Test Drive<o:p></o:p></span></p> </td> <td width="261" nowrap="" valign="bottom" style="width:196.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">QLD - Von Bibra Gold Coast - 554216<o:p></o:p></span></p> </td> <td width="93" nowrap="" valign="bottom" style="width:70.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">03-04-20<o:p></o:p></span></p> </td> <td width="212" nowrap="" valign="bottom" style="width:159.0pt;border-top:none;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:solid #8EA9DB 1.0pt;backgroun 2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt"> <p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">T60 4WD Diesel Dual Cab Ute<o:p></o:p></span></p> </td> </tr> </tbody> </table> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal">Thank you,<o:p></o:p></p> <p 
class="MsoNormal">Anna<o:p></o:p></p> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal"><b><span lang="EN-AU" style="color:black;mso-fareast-language:EN-AU">Anna Tupou</span></b><span lang="EN-AU" style="font-family:"Helvetica",s f;color:black;mso-fareast-language:EN-AU"><o:p></o:p></span></p> <p class="MsoNormal"><span lang="EN-AU" style="color:black;mso-fareast-language:EN-AU">Call Centre Supervisor ÔÇô Lead Management</span><span lang="EN-AU" style="font-famil Helvetica",sans-serif;color:black;mso-fareast-language:EN-AU"><o:p></o:p></span></p> <p class="MsoNormal"><span lang="EN-AU" style="font-size:10.5pt;color:black;mso-fareast-language:EN-AU"><br> </span><b><span lang="EN-AU" style="font-family:"Century Gothic",sans-serif;color:black;mso-fareast-language:EN-AU"><img width="294" height="34" style="width:3.06 ght:.3541in" id="_x0038_11B48E0-2644-4F0E-A8FF-F2DD7ECD462F" src="cid:image002.jpg@01D609B1.5BB77760" alt="cid:BD091752-D740-4B3A-B050-FF52A328E5C8"></span></b><b><span lan " style="font-family:"Century Gothic",sans-serif;color:black;mso-fareast-language:EN-AU"><o:p></o:p></span></b></p> <p class="MsoNormal"><span lang="EN-AU" style="mso-fareast-language:EN-AU"><o:p> </o:p></span></p> <p class="MsoNormal"><span lang="EN-AU" style="font-size:10.5pt;color:black;mso-fareast-language:EN-AU">2A Hill Rd Lidcombe NSW 2141 Australia</span><span lang="EN-AU" styl size:10.5pt;color:#005CFB;mso-fareast-language:EN-AU"> <o:p></o:p></span></p> <p class="MsoNormal"><span lang="EN-AU" style="font-size:10.5pt;color:#005CFB;mso-fareast-language:EN-AU">P</span><span lang="EN-AU" style="font-size:10.5pt;color:black;mso -language:EN-AU"> ÔÇé+61 2 8577 8097ÔÇé</span><span lang="EN-AU" style="font-size:10.5pt;color:#005CFB;mso-fareast-language:EN-AU">|</span><span lang="EN-AU" style="fon 0.5pt;color:black;mso-fareast-language:EN-AU">ÔÇé</span><span lang="EN-AU" style="font-size:10.5pt;color:#005CFB;mso-fareast-language:EN-AU"> 
E</span><span lang="EN-AU" style="font-size:10.5pt;color:black;mso-fareast-language:EN-AU">ÔÇé</span><u><span lang="EN-AU" style="font-size:10.5pt;color:blue;mso-fareast-l EN-AU">atupou@ateco.com.au<o:p></o:p></span></u></p> <p class="MsoNormal"><span lang="EN-AU" style="font-size:10.5pt;color:#005CFB;mso-fareast-language:EN-AU">M </span><span lang="EN-AU" style="font-size:10.5pt;mso-fareast-language:EN-AU">0407 588 506<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size:10.0pt"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size:10.0pt"><o:p> </o:p></span></p> </div> <div> <p><b><span style="font-size:13.5pt;font-family:webdings;color:green">P</span> <span style="font-size: 7.5pt;font-family:"Arial","sans-serif";color:gree <i>: Please consider the environment before printing this e-mail. </span></i></b></p> <p id="disclaimer-input" style="font-family: Helvetica,Arial,sans-serif; color: gray; font-size: 7.5pt;" class="txt"> IMPORTANT NOTICE: If this e-mail is received by other than the named addressee, please notify us immediately by telephone or return e-mail and delete all copies from your c system. This document contains information proprietary to Ateco Group and its affiliates or third parties to which Ateco may have a legal obligation to protect such information from unauthorised disclosure, use or duplication. Any disclosure, use or tion of this document or the information contained herein for other than the specific purpose for which it was disclosed by Ateco is expressly prohibited. It is the recipient's responsibility to check this message and attachments for viruses. </p> <div></div> </div> </body> </html> <br> <p>┬á</p> <p align="center" style="text-align:center">**********Disclaimer**********</p> <p style="text-align:justify">"This email and any attachments are confidential and are for the intended addressee[s] only. Unauthorised use of this communication is prohibited. 
If you have received this communication in error, please notify the sender and remove them from your system. Confidentiality is not waived or lost by reason of the mistaken delivery to you. Please scan this email and any attachment(s) for viruses. It is your responsibility to check them before opening" </p> <p align="center" style="text-align:center">********End of Disclaimer*********</p>
String content after conversion (email body):
Non-Contacted/Non-Qualified Leads from Ateco: Name Mobile Email Postal Code Enquiry Date Lead Source Dealer Date Sent Preferred Model Test Justin +61 420 888 999 testmail@hotmail.com 4218 31-03-20 LDV Facebook - Book a Test Drive QLD - Von Bibra Gold Coast - 554216 03-04-20 T60 4WD Diesel Dual Cab Ute Thank you, Anna Anna Tupou Call Centre Supervisor – Lead Management 2A Hill Rd Lidcombe NSW 2141 Australia P +61 2 8577 8097 | E atupou@ateco.com.au M 0407 588 506 P : Please consider the environment before printing this e-mail. IMPORTANT NOTICE: If this e-mail is received by other than the named addressee, please notify us immediately by telephone or return e-mail and delete all copies from your computer system. This document contains information proprietary to Ateco Group and its affiliates or third parties to which Ateco may have a legal obligation to protect such information from unauthorised disclosure, use or duplication. Any disclosure, use or duplication of this document or the information contained herein for other than the specific purpose for which it was disclosed by Ateco is expressly prohibited. It is the recipient's responsibility to check this message and attachments for viruses. **********Disclaimer********** "This email and any attachments are confidential and are for the intended addressee[s] only. Unauthorised use of this communication is prohibited. If you have received this communication in error, please notify the sender and remove them from your system. Confidentiality is not waived or lost by reason of the mistaken delivery to you. Please scan this email and any attachment(s) for viruses. It is your responsibility to check them before opening" ********End of Disclaimer*********
Note: I need to build exact key-value pairs, e.g. Postal Code: 4218.
Answer
You can use any DOM parser library to parse your HTML. If you give an id to an HTML tag, you can look that element up directly by its id. I suggest the Jsoup library.
The following example uses Jsoup.
Add an id in the HTML
<p id="POSTAL_CODE">4218</p>
Java Code
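import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.parse(htmlString);
// getElementById returns null when no element has this id
Element elPostalCode = doc.getElementById("POSTAL_CODE");
String postalCode = elPostalCode.text();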
You can also use attribute extraction on your HTML. For more information about attribute extraction using Jsoup, you can visit this page.
For more information, you can refer to this article, which covers multiple HTML parsing libraries for multiple programming languages.
Code for your exact problem
NOTE: This code only works when every row, including the header row, contains exactly the same number of p tags.
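import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(htmlString);
List<String> keys = new ArrayList<>();
List<Map<String, String>> dataPairs = new ArrayList<>();
Elements trElements = doc.getElementsByTag("tr");
for (int i = 0; i < trElements.size(); i++) {
    Elements pElements = trElements.get(i).getElementsByTag("p");
    if (i == 0) {
        // The first row holds the column headers, which become the keys
        for (Element p : pElements) {
            keys.add(p.text());
        }
        continue; // do not add an empty map for the header row
    }
    // Every later row is a data row: pair each cell with its header
    Map<String, String> map = new HashMap<>();
    for (int j = 0; j < pElements.size(); j++) {
        map.put(keys.get(j), pElements.get(j).text());
    }
    dataPairs.add(map);
}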
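The requirement in the note above (same number of p tags in every row) can be checked explicitly before pairing. The following is only a sketch using the standard library: `RowPairs` and `zip` are hypothetical names introduced here, not part of Jsoup, and the sketch assumes you have already collected the header texts and one row's cell texts into two lists. It uses a `LinkedHashMap` so the pairs keep the column order of the table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Jsoup): pairs header texts with one row's cell texts. */
final class RowPairs {
    private RowPairs() {}

    /**
     * Builds key-value pairs in column order.
     * Throws IllegalArgumentException when the row does not have one value per header.
     */
    static Map<String, String> zip(List<String> keys, List<String> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException(
                "Expected " + keys.size() + " cells but found " + values.size());
        }
        // LinkedHashMap keeps insertion order, so pairs follow the table columns
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            // trim() removes stray whitespace such as the trailing space in "Date Sent "
            row.put(keys.get(i).trim(), values.get(i).trim());
        }
        return row;
    }
}
```

For example, `RowPairs.zip(List.of("Name", "Postal Code"), List.of("Test Justin", "4218")).get("Postal Code")` returns `"4218"`. A row with a missing or extra cell fails with a clear message instead of silently producing shifted pairs.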