Aboo Bolaky

{.NET, C#, Sitecore ...} Free your mind...

Best way to parse HTML content

September 22, 2009 23:26 by Aboo Bolaky


I'm going to keep this short and simple... rather short, actually.

There is no better way to parse HTML than the Html Agility Pack.

It's a lot simpler than Regex, which is a BIG no-no!

 



Comments

29. September 2009 09:36

Colin McClure

Regex is cool!


3. October 2009 15:35

Aboo Bolaky

@Colin McClure
Regex is NOT the right tool for parsing HTML. Regular expressions are an awfully complex subject (there are even books dedicated to them), and they are a poor fit for this scenario because:
1: It will take you a lot of time to write your monster regular expression.
2: Regex does not deal very well with nested data.
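
To make the nested-data point concrete, here is a minimal sketch. It is written in Python using only the standard library, purely as a language-neutral illustration; it is not the Html Agility Pack API, and the sample markup and the names `OuterDivText` and `outer_text` are made up for this example. A naive non-greedy pattern stops at the first closing tag, while a parser that tracks nesting depth recovers the full content.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: one <div> nested inside another.
HTML = '<div id="outer">a<div id="inner">b</div>c</div>'

# Naive regex: the non-greedy match ends at the FIRST </div>,
# which belongs to the inner element, not the outer one.
naive = re.search(r'<div id="outer">(.*?)</div>', HTML, re.S).group(1)


class OuterDivText(HTMLParser):
    """Collects the text inside <div id="outer">, tracking nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <div>s deep we are inside the target
        self.chunks = []  # text fragments found inside the target

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Start counting at the target div; count every nested div after that.
            if self.depth > 0 or dict(attrs).get("id") == "outer":
                self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def outer_text(html):
    """Return all text contained in the outer div, nested divs included."""
    parser = OuterDivText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


print(repr(naive))             # the regex result is cut off inside the inner div
print(repr(outer_text(HTML)))  # the parser sees all three text fragments
```

A dedicated HTML parser library does this depth bookkeeping for you, which is the whole argument for using one.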


8. November 2009 20:21

Colin McClure

Aboo,

You are right up to a point; however, possibly the best tool for the job is the "HTML Agility Pack", found on CodePlex at http://www.codeplex.com/htmlagilitypack.

Regex can deal with nested content using the concept of groups; however, I take your point on complexity. I am a Regex Ninja :-)

