Monday, January 22, 2007

Strip HTML Tags

It's trivial, really... but regexps are a bitch.

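#!/usr/bin/ruby -n
# -n wraps this script in a `while gets ... end` loop; each input line is in $_.
# Delete anything that looks like an opening or closing tag on that line.
puts $_.gsub(/<\/?[^>]*>/, "")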
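Because `-n` feeds the script one line at a time, a tag that is broken across two lines (say `<a` on one line and `href="...">` on the next) is never seen whole and slips through. Here is a minimal sketch that applies the same regex to the entire input instead. The helper name `strip_html` is hypothetical, chosen here for illustration (it is not the Rails `strip_tags` helper mentioned in the comments):

```ruby
# Hypothetical helper (not part of any library): applies the post's
# regex to a whole string instead of one line at a time. Because
# [^>] also matches "\n", a tag that spans several lines is removed too.
def strip_html(html)
  html.gsub(/<\/?[^>]*>/, "")
end

puts strip_html("<p>Hello, <b>world</b></p>")  # prints "Hello, world"
puts strip_html("<a\nhref=\"/x\">link</a>")    # prints "link"
```

To use it as a filter like the original script, drop the `-n` flag and read everything up front, e.g. `print strip_html(ARGF.read)`.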

3 comments:

hollaco said...

Thanks! I'll be using this to strip <font> tags from HTML-highlighted source code in Vim. I can now select the HTML source and then run :!killhtml to get rid of those tags and focus on the content UNDER the HTML.

Unknown said...

If you work in Ruby on Rails, you can use a text helper to accomplish this:

http://api.rubyonrails.com/classes/ActionView/Helpers/TextHelper.html#M000633

Note that this uses a tokenizer, which may be more robust than your regex. The regex has problems with things like:

input type="text" value="a=>b"

(The leading "<" and trailing ">" were removed so the blog's comment engine wouldn't strip the tag.)
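The problem is that `[^>]*` stops at the first `>`, even when it sits inside a quoted attribute value. Below is a minimal sketch of one workaround, assuming you want to stay with a regex: treat each quoted value as a single unit so a `>` inside quotes can't end the tag. The constant names are hypothetical. It is still a regex rather than a real tokenizer like the Rails helper, so things like `<!-- a > b -->` comments or `<script>` bodies remain out of scope.

```ruby
# Pattern from the post: the tag ends at the first ">" it sees.
NAIVE_TAG = /<\/?[^>]*>/

# Hypothetical quote-aware variant: inside a tag, a double- or
# single-quoted attribute value is consumed as a whole, so a ">"
# inside the quotes does not close the tag early.
QUOTED_TAG = /<\/?[^>"']*(?:(?:"[^"]*"|'[^']*')[^>"']*)*>/

html = %q(<input type="text" value="a=>b">after)

puts html.gsub(NAIVE_TAG, "")   # prints b">after (tail of the tag left behind)
puts html.gsub(QUOTED_TAG, "")  # prints after
```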

SoccerShoutPhil said...

Priceless - thanks!