Last year I mentioned a nifty little command line tool called jq. It had a very specific purpose: make it easy to process JSON. That same line of thinking inspired pup, from Eric Chiang:
pup is a command line tool for processing HTML. It reads from stdin, prints to stdout, and allows the user to filter parts of the page using CSS selectors.
Here is a little taste of how this works:
[sourcecode]
curl -s https://wordpress.com/ |
pup link
[/sourcecode]
That will return all of the link elements on the WordPress.com home page. You can easily narrow it down to just the shortlink:
[sourcecode]
curl -s https://wordpress.com |
pup 'link[rel=shortlink]'
<link rel="shortlink" href="http://wp.me/1">
[/sourcecode]
Going one step further, you can get just the href value:
[sourcecode]
curl -s https://wordpress.com |
pup 'link[rel=shortlink] attr{href}'
http://wp.me/
[/sourcecode]
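If you end up doing this a lot, the selector-plus-attribute pattern above could be wrapped in a small shell function. This is only a sketch: the name html_attr is made up for illustration and is not something pup ships, and it assumes pup is installed and on your PATH.

```shell
# Hypothetical helper (not part of pup): print one attribute from every
# element matching a CSS selector. HTML is read from stdin.
# Assumes pup is installed and on PATH.
html_attr() {
  selector=$1
  attribute=$2
  # Pass the whole expression as one quoted argument so the shell never
  # treats the [] or {} characters as glob or brace patterns.
  pup "${selector} attr{${attribute}}"
}

# Example (needs network access):
#   curl -s https://wordpress.com/ | html_attr 'link[rel=shortlink]' href
```

Quoting the selector matters more than it looks: an unquoted link[rel=shortlink] is a glob pattern to the shell, so it only reaches pup untouched when no file in the current directory happens to match it.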
I like it.
Pup is open source, written in Go, and available at https://github.com/EricChiang/pup.
2 replies on “pup, Command Line HTML Processor”
I love jq and did not know about pup. jq is one of the “triggers” that helped us release this http://blog.superfeedr.com/json-path/
Being able to subscribe to specific fragments is really nice.