Monday, January 22, 2007

Strip HTML Tags

It's trivial, really... but regexps are a bitch.

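#!/usr/bin/ruby -n
# -n wraps the script in a loop that reads each input line into $_.
# Delete everything from a "<" up to the next ">".
puts $_.gsub(/<\/?[^>]*>/, "")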

3 comments:

  1. Thanks! I'll be using this to strip <font> tags from HTML-highlighted source code in Vim. I can now select the HTML source and run :!killhtml to get rid of those tags and focus on the content under the HTML code.

  2. If you work in Ruby on Rails, you can use a text helper to accomplish this:

    http://api.rubyonrails.com/classes/ActionView/Helpers/TextHelper.html#M000633

    Note that this uses a tokenizer, which may be more effective than your regex, which has problems with input like:

    input type="text" value="a=>b"

    (leading/trailing "<" ">" marks removed to reduce tag stripping by blog comment engine)

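The quoted-attribute problem raised in the second comment is easy to reproduce. Here is a rough sketch that wraps the post's regex in a method and compares it with a small quote-aware scanner. The method names and the sample string are made up for illustration, and the scanner is not the Rails helper's tokenizer, just a minimal stand-in that assumes well-formed tags with balanced quotes.

```ruby
# The post's regex, wrapped in a method so the two approaches can be compared.
def strip_tags_regex(html)
  html.gsub(/<\/?[^>]*>/, "")
end

# Hypothetical quote-aware alternative: walk the string one character at a
# time and, while inside a tag, ignore any ">" that sits inside a quoted
# attribute value.
def strip_tags_quote_aware(html)
  out = +""
  in_tag = false
  quote = nil # the quote character currently open inside a tag, if any

  html.each_char do |ch|
    if !in_tag
      if ch == "<"
        in_tag = true   # start of a tag: stop copying characters
      else
        out << ch       # ordinary text: keep it
      end
    elsif quote
      quote = nil if ch == quote      # closing quote ends the attribute value
    elsif ch == '"' || ch == "'"
      quote = ch                      # opening quote inside a tag
    elsif ch == ">"
      in_tag = false                  # unquoted ">" ends the tag
    end
  end
  out
end

sample = %q{<p>Name: <input type="text" value="a=>b"> done</p>}
puts strip_tags_regex(sample)        # the tag is cut short at the ">" in "a=>b"
puts strip_tags_quote_aware(sample)  # the whole input tag is dropped
```

The regex version stops at the first ">" it sees, so the tail of the attribute leaks into the output. The scanner still has gaps of its own: an unclosed quote inside a tag swallows the rest of the input, and HTML comments or script blocks get no special handling.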