pup, Command Line HTML Processor

Last year I mentioned a nifty little command line tool called jq. It had a very specific purpose: make it easy to process JSON. That same line of thinking inspired pup, from Eric Chiang:

pup is a command line tool for processing HTML. It reads from stdin, prints to stdout, and allows the user to filter parts of the page using CSS selectors.

Here is a little taste of how this works:

[sourcecode]
curl -s https://wordpress.com/ |
pup link
[/sourcecode]

That will return all of the link elements on the WordPress.com home page. You can easily tighten it down to just the shortlink:

[sourcecode]
curl -s https://wordpress.com |
pup 'link[rel=shortlink]'

<link rel="shortlink" href="http://wp.me/1">
[/sourcecode]

Going one step further, you can get just the href value:

[sourcecode]
curl -s https://wordpress.com |
pup 'link[rel=shortlink] attr{href}'

http://wp.me/1
[/sourcecode]
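
Since pup reads from stdin, the same filter should work on any HTML you hand it, not just a curl response. Here is a minimal sketch, assuming pup is installed and on your PATH. The HTML snippet and the `html` variable are made up for illustration. The selector is quoted so the shell does not try to interpret the square brackets.

```shell
# A made-up HTML snippet standing in for a downloaded page (illustrative only).
html='<html><head><link rel="stylesheet" href="/style.css"><link rel="shortlink" href="http://wp.me/1"></head><body></body></html>'

# Same filter as above, fed from a variable instead of curl.
# The selector is single-quoted so the shell passes it to pup untouched.
if command -v pup >/dev/null 2>&1; then
  printf '%s\n' "$html" | pup 'link[rel=shortlink] attr{href}'
else
  echo "pup is not installed; skipping the example" >&2
fi
```

If pup is available, this should print only the href of the shortlink element and skip the stylesheet link.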

I like it.

pup is open source, written in Go, and available at https://github.com/EricChiang/pup.