remove HTML tags in string

How do you remove HTML tags from content or a string?

Removing HTML tags from a string is fairly easy with Regex in .NET. Here is the code for stripping HTML tags from content or a string, written in C#.

First, import the System.Text.RegularExpressions namespace.
Code:-

using System.Text.RegularExpressions;

Then create a method like the one below.

Code :-


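#region reusable regexes
// Matches any HTML tag, or a literal &nbsp; entity.
protected static readonly Regex htmlRegex = new Regex(@"<[^>]+>|&nbsp;", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);
// Assumed intent: matches <script> and <style> blocks together with their contents,
// so that inline script or CSS text does not end up in the output.
protected static readonly Regex inlineTextHtmlRegex = new Regex(@"<script[^>]*>.*?</script>|<style[^>]*>.*?</style>", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);
// Matches runs of two or more whitespace characters.
protected static readonly Regex spacer = new Regex(@"\s{2,}", RegexOptions.Compiled);
#endregion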


public static string RemoveHtml(string html)
{
    if (string.IsNullOrEmpty(html))
        return string.Empty;
    string nonhtml = spacer.Replace(htmlRegex.Replace(inlineTextHtmlRegex.Replace(html, ""), " ").Trim(), " ");
    return nonhtml;
}
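
The nested call inside RemoveHtml can be hard to read at a glance. As a rough sketch, the same steps can be unrolled one per line as shown below. The class name HtmlStripSteps, the method name Strip and the sample input are made up for illustration, and the script/style pattern is an assumption about what the inline-text regex is meant to match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripSteps
{
    // Assumption: <script> and <style> blocks are dropped together with their contents.
    static readonly Regex ScriptOrStyle = new Regex(
        @"<script[^>]*>.*?</script>|<style[^>]*>.*?</style>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any tag, or a literal &nbsp; entity.
    static readonly Regex TagOrNbsp = new Regex(@"<[^>]+>|&nbsp;", RegexOptions.IgnoreCase);

    // Two or more whitespace characters in a row.
    static readonly Regex Spaces = new Regex(@"\s{2,}");

    public static string Strip(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        string step1 = ScriptOrStyle.Replace(html, "");  // 1. drop script/style blocks
        string step2 = TagOrNbsp.Replace(step1, " ");    // 2. each tag or &nbsp; becomes one space
        string step3 = step2.Trim();                     // 3. remove leading/trailing whitespace
        return Spaces.Replace(step3, " ");               // 4. collapse runs of whitespace
    }

    public static void Main()
    {
        string input = "<p>Hello,&nbsp;<b>world</b>!</p><script>alert('x');</script>";
        Console.WriteLine(Strip(input)); // prints: Hello, world !
    }
}
```

Note the space before the exclamation mark in the output: because every tag is replaced with a space rather than an empty string, inline tags such as <b> can leave a stray space next to punctuation. Replacing with a space does stop words in adjacent block elements from running together, so it is a trade-off.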

We can call this method as shown below.

Code:-

string htmlStr = "<a href='http://shareourideas.wordpress.com/' title='Share our ideas'> share our ideas</a>";
string textOnly = RemoveHtml(htmlStr);

It returns only the text: share our ideas.

Enjoy coding!

Thanks,
Naga Harish.
