Reading String text content after converting HTML to text in Java



I have an HTML form in an email body. How can I read the text content as a String after converting the HTML form to text? Can anyone please help me?

Email Body – HTML Form: OldImage NewImage

Email Body – HTML Form Content:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v:* {behavior:url(#default#VML);}
o:* {behavior:url(#default#VML);}
w:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
        {font-family:Helvetica;
        panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:"Century Gothic";
        panose-1:2 11 5 2 2 2 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
span.EmailStyle17
        {mso-style-type:personal-compose;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><img width="809" height="364" style="width:8.427in;height:3.7916in" id="Picture_x0020_4" src="cid:image001.jpg@01D609B1.5BB77760"><o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">Non-Contacted/Non-Qualified Leads from Ateco:&nbsp; <o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="1553" style="width:1165.0pt;margin-left:-.15pt;border-collapse:collapse">
<tbody>
<tr style="height:15.0pt">
<td width="167" nowrap="" valign="bottom" style="width:125.0pt;border:solid #8EA9DB 1.0pt;border-right:none;background:#4472C4;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Name<o:p></o:p></span></b></p>
</td>
<td width="111" nowrap="" valign="bottom" style="width:83.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;background
;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Mobile<o:p></o:p></span></b></p>
</td>
<td width="259" nowrap="" valign="bottom" style="width:194.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;backgroun
4;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Email<o:p></o:p></span></b></p>
</td>
<td width="103" nowrap="" valign="bottom" style="width:77.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;background
;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Postal Code<o:p></o:p></span></b></p>
</td>
<td width="109" nowrap="" valign="bottom" style="width:82.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;background
;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Enquiry Date<o:p></o:p></span></b></p>
</td>
<td width="239" nowrap="" valign="bottom" style="width:179.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;backgroun
4;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Lead Source<o:p></o:p></span></b></p>
</td>
<td width="261" nowrap="" valign="bottom" style="width:196.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;backgroun
4;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Dealer<o:p></o:p></span></b></p>
</td>
<td width="93" nowrap="" valign="bottom" style="width:70.0pt;border-top:solid #8EA9DB 1.0pt;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:none;background:
padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Date Sent
<o:p></o:p></span></b></p>
</td>
<td width="212" nowrap="" valign="bottom" style="width:159.0pt;border:solid #8EA9DB 1.0pt;border-left:none;background:#4472C4;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><b><span style="color:white">Preferred Model<o:p></o:p></span></b></p>
</td>
</tr>
<tr style="height:15.0pt">
<td width="167" nowrap="" valign="bottom" style="width:125.0pt;border-top:none;border-left:solid #8EA9DB 1.0pt;border-bottom:solid #8EA9DB 1.0pt;border-right:none;backgroun
2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">Test Justin<o:p></o:p></span></p>
</td>
<td width="111" nowrap="" valign="bottom" style="width:83.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">&#43;61 420 888 999<o:p></o:p></span></p>
</td>
<td width="259" nowrap="" valign="bottom" style="width:194.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">testmail@hotmail.com<o:p></o:p></span></p>
</td>
<td width="103" nowrap="" valign="bottom" style="width:77.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">4218<o:p></o:p></span></p>
</td>
<td width="109" nowrap="" valign="bottom" style="width:82.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">31-03-20<o:p></o:p></span></p>
</td>
<td width="239" nowrap="" valign="bottom" style="width:179.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">LDV Facebook - Book a Test Drive<o:p></o:p></span></p>
</td>
<td width="261" nowrap="" valign="bottom" style="width:196.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">QLD - Von Bibra Gold Coast - 554216<o:p></o:p></span></p>
</td>
<td width="93" nowrap="" valign="bottom" style="width:70.0pt;border:none;border-bottom:solid #8EA9DB 1.0pt;background:#D9E1F2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">03-04-20<o:p></o:p></span></p>
</td>
<td width="212" nowrap="" valign="bottom" style="width:159.0pt;border-top:none;border-left:none;border-bottom:solid #8EA9DB 1.0pt;border-right:solid #8EA9DB 1.0pt;backgroun
2;padding:0in 5.4pt 0in 5.4pt;height:15.0pt">
<p class="MsoNormal" align="center" style="text-align:center"><span style="color:black">T60 4WD Diesel Dual Cab Ute<o:p></o:p></span></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">Thank you,<o:p></o:p></p>
<p class="MsoNormal">Anna<o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal"><b><span lang="EN-AU" style="color:black;mso-fareast-language:EN-AU">Anna Tupou</span></b><span lang="EN-AU" style="font-family:&quot;Helvetica&quot;,s
f;color:black;mso-fareast-language:EN-AU"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-AU" style="color:black;mso-fareast-language:EN-AU">Call Centre Supervisor – Lead Management</span><span lang="EN-AU" style="font-famil
Helvetica&quot;,sans-serif;color:black;mso-fareast-language:EN-AU"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-AU" style="font-size:10.5pt;color:black;mso-fareast-language:EN-AU"><br>
</span><b><span lang="EN-AU" style="font-family:&quot;Century Gothic&quot;,sans-serif;color:black;mso-fareast-language:EN-AU"><img width="294" height="34" style="width:3.06
ght:.3541in" id="_x0038_11B48E0-2644-4F0E-A8FF-F2DD7ECD462F" src="cid:image002.jpg@01D609B1.5BB77760" alt="cid:BD091752-D740-4B3A-B050-FF52A328E5C8"></span></b><b><span lan
" style="font-family:&quot;Century Gothic&quot;,sans-serif;color:black;mso-fareast-language:EN-AU"><o:p></o:p></span></b></p>
<p class="MsoNormal"><span lang="EN-AU" style="mso-fareast-language:EN-AU"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span lang="EN-AU" style="font-size:10.5pt;color:black;mso-fareast-language:EN-AU">2A Hill Rd Lidcombe NSW 2141 Australia</span><span lang="EN-AU" styl
size:10.5pt;color:#005CFB;mso-fareast-language:EN-AU">
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-AU" style="font-size:10.5pt;color:#005CFB;mso-fareast-language:EN-AU">P</span><span lang="EN-AU" style="font-size:10.5pt;color:black;mso
-language:EN-AU"> &ensp;&#43;61 2 8577 8097&ensp;</span><span lang="EN-AU" style="font-size:10.5pt;color:#005CFB;mso-fareast-language:EN-AU">|</span><span lang="EN-AU" style="fon
0.5pt;color:black;mso-fareast-language:EN-AU">&ensp;</span><span lang="EN-AU" style="font-size:10.5pt;color:#005CFB;mso-fareast-language:EN-AU">
 E</span><span lang="EN-AU" style="font-size:10.5pt;color:black;mso-fareast-language:EN-AU">&ensp;</span><u><span lang="EN-AU" style="font-size:10.5pt;color:blue;mso-fareast-l
EN-AU">atupou@ateco.com.au<o:p></o:p></span></u></p>
<p class="MsoNormal"><span lang="EN-AU" style="font-size:10.5pt;color:#005CFB;mso-fareast-language:EN-AU">M&nbsp;
</span><span lang="EN-AU" style="font-size:10.5pt;mso-fareast-language:EN-AU">0407 588 506<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt"><o:p>&nbsp;</o:p></span></p>
</div>
<div>
<p><b><span style="font-size:13.5pt;font-family:webdings;color:green">P</span> <span style="font-size: 7.5pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:gree
<i>: Please consider the environment before printing this e-mail. </span></i></b></p>
<p id="disclaimer-input" style="font-family: Helvetica,Arial,sans-serif; color: gray; font-size: 7.5pt;" class="txt">
IMPORTANT NOTICE: If this e-mail is received by other than the named addressee, please notify us immediately by telephone or return e-mail and delete all copies from your c
system. This document contains information proprietary to Ateco Group and its
 affiliates or third parties to which Ateco may have a legal obligation to protect such information from unauthorised disclosure, use or duplication. Any disclosure, use or
tion of this document or the information contained herein for other than the
 specific purpose for which it was disclosed by Ateco is expressly prohibited. It is the recipient's responsibility to check this message and attachments for viruses.
</p>
<div></div>
</div>
</body>
</html>

<br>
<p> </p>

<p align="center" style="text-align:center">**********Disclaimer**********</p>

<p style="text-align:justify">&quot;This email and any
attachments are confidential and are for the intended addressee[s] only.
Unauthorised use of this communication is prohibited. If you have received this
communication in error, please notify the sender and remove them from your
system. Confidentiality is not waived or lost by reason of the mistaken
delivery to you. Please scan this email and any attachment(s) for viruses. It
is your responsibility to check them before opening&quot; </p>

<p align="center" style="text-align:center">********End of
Disclaimer*********</p>

String content after conversion (email body):

Non-Contacted/Non-Qualified Leads from Ateco:

Name
Mobile
Email
Postal Code
Enquiry Date
Lead Source
Dealer
Date Sent
Preferred Model
Test Justin
+61 420 888 999
testmail@hotmail.com
4218
31-03-20
LDV Facebook - Book a Test Drive
QLD - Von Bibra Gold Coast - 554216
03-04-20
T60 4WD Diesel Dual Cab Ute

Thank you,
Anna


Anna Tupou
Call Centre Supervisor – Lead Management


2A Hill Rd Lidcombe NSW 2141 Australia
P +61 2 8577 8097 | E atupou@ateco.com.au
M 0407 588 506


P : Please consider the environment before printing this e-mail.
IMPORTANT NOTICE: If this e-mail is received by other than the named addressee, please notify us immediately by telephone or return e-mail and delete all copies from your computer system. This document contains information proprietary to Ateco Group and its affiliates or third parties to which Ateco may have a legal obligation to protect such information from unauthorised disclosure, use or duplication. Any disclosure, use or duplication of this document or the information contained herein for other than the specific purpose for which it was disclosed by Ateco is expressly prohibited. It is the recipient's responsibility to check this message and attachments for viruses.


**********Disclaimer**********
"This email and any attachments are confidential and are for the intended addressee[s] only. Unauthorised use of this communication is prohibited. If you have received this communication in error, please notify the sender and remove them from your system. Confidentiality is not waived or lost by reason of the mistaken delivery to you. Please scan this email and any attachment(s) for viruses. It is your responsibility to check them before opening"
********End of Disclaimer*********

Note: I need to build exact key-value pairs, e.g. Postal Code : 4218.

Answer

You can use any DOM parser library to parse your HTML. If you give an id to an HTML tag, you can then look that element up by its id. I suggest the Jsoup library.

Use the code below with the Jsoup library.

Add an ID in the HTML

<p id="POSTAL_CODE">4218</p>

Java Code

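import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.parse(htmlString);

// Look up the element by the id added above (getElementById returns null if the id is missing)
Element elPostalCode = doc.getElementById("POSTAL_CODE");
String postalCode = elPostalCode.text();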

You can also use attribute extraction on your HTML. For more information about attribute extraction with Jsoup, you can visit this page.

For more information, you can refer to this article, which covers multiple HTML parsing libraries for multiple programming languages.

Code for your exact problem

NOTE: This code only works when every row, including the header, has exactly the same number of p tags.

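import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

Document doc = Jsoup.parse(htmlString);

// Column names, taken from the first (header) row
List<String> keys = new ArrayList<>();
// One map of column name -> cell text per data row
List<Map<String, String>> dataPairs = new ArrayList<>();

Elements trElements = doc.getElementsByTag("tr");

for (int i = 0; i < trElements.size(); i++) {
    Elements pElements = trElements.get(i).getElementsByTag("p");

    Map<String, String> map = new HashMap<>();
    for (int j = 0; j < pElements.size(); j++) {
        String text = pElements.get(j).text();
        if (i == 0) {
            keys.add(text);              // header row: remember the key
        } else {
            map.put(keys.get(j), text);  // data row: pair the key with its value
        }
    }
    // Skip the header row so that no empty map is added
    if (i > 0) {
        dataPairs.add(map);
    }
}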
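
The NOTE above matters because keys.get(...) throws an IndexOutOfBoundsException as soon as a data row has more cells than the header. Below is a minimal sketch of a guarded pairing step that uses only the Java standard library. It assumes the cell texts have already been extracted (for example with p.text() as in the loop above); the class name RowPairing and the method pairRows are hypothetical names chosen for illustration, and skipping mismatched rows is only one possible policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowPairing {

    /**
     * Pairs each data row with the header names, column by column.
     * Rows whose cell count differs from the header count are skipped
     * instead of causing an IndexOutOfBoundsException.
     * (Hypothetical helper, not part of Jsoup.)
     */
    public static List<Map<String, String>> pairRows(List<String> headers,
                                                     List<List<String>> rows) {
        List<Map<String, String>> result = new ArrayList<>();
        for (List<String> row : rows) {
            if (row.size() != headers.size()) {
                continue; // malformed row: cannot be matched to the header safely
            }
            // LinkedHashMap keeps the columns in their original order
            Map<String, String> pairs = new LinkedHashMap<>();
            for (int col = 0; col < headers.size(); col++) {
                pairs.put(headers.get(col).trim(), row.get(col).trim());
            }
            result.add(pairs);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values taken from the email in the question
        List<String> headers = Arrays.asList("Name", "Mobile", "Email", "Postal Code");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Test Justin", "+61 420 888 999", "testmail@hotmail.com", "4218"),
                Arrays.asList("Incomplete row", "only two cells")); // skipped by the guard

        for (Map<String, String> pairs : pairRows(headers, rows)) {
            pairs.forEach((key, value) -> System.out.println(key + " : " + value));
        }
    }
}
```

With the sample data in main, the last line printed is "Postal Code : 4218", and the incomplete second row is ignored. Using a LinkedHashMap rather than a HashMap is a design choice, so that the pairs come back in the same order as the table columns.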


Source: stackoverflow