Sed awesomeness and inline file inclusion.

Sed is one of my favourite tools. It goes into pretty much every one-liner I write. Did you know that sed is so powerful that a single sed statement can turn cat into cement? Try:

echo cat | sed statement
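The joke works because sed parses `statement` as an `s` command whose delimiter is `t`. Any character can follow `s` as the delimiter, so this is pattern `a`, replacement `emen`, and no flags. Spelled out with the usual slashes, both commands below should print the same word:

```shell
# "statement" = s t a t emen t: an s command using 't' as its delimiter,
# i.e. the same as s/a/emen/
echo cat | sed statement     # cement
echo cat | sed 's/a/emen/'   # cement
```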

Today I learnt something new about it, hence my first post in a few years (after this one I'll most likely go silent for another few). I wanted to replace part of a line of text with the contents of the file that the text referred to.

In other words, I wanted to turn:

blah blah INCLUDE:xxx blah blah

into

blah blah $(cat xxx) blah blah

Sed happens to have a built-in command for including files. From info sed:

`r FILENAME'
     As a GNU extension, this command accepts two addresses.

     Queue the contents of FILENAME to be read and inserted into the
     output stream at the end of the current cycle, or when the next
     input line is read.  Note that if FILENAME cannot be read, it is
     treated as if it were an empty file, without any error indication.

     As a GNU `sed' extension, the special value `/dev/stdin' is
     supported for the file name, which reads the contents of the
     standard input.

This works fine if the file name is static:

$ echo abc > f
$ echo foo REPLACEME bar | sed '/REPLACEME/ r f'
foo REPLACEME bar
abc
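Note that `r` appends the file after the matching line, so the marker itself stays in the output. If the marker sits on a line of its own, a common idiom is to pair `r` with `d`: the line is deleted, but the queued file contents are still written out before the next line. A minimal sketch, reusing `f` from above (the `before`/`after` lines are just sample input):

```shell
echo abc > f
# r queues the contents of f; d drops the marker line itself,
# and the queued contents are still output before the next line.
printf 'before\nREPLACEME\nafter\n' | sed '/REPLACEME/{
r f
d
}'
# before
# abc
# after
```

This only helps when the whole line is the directive; it cannot splice a file into the middle of a line.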

However, in my application I needed to use part of the matched text as the file name, so I tried something like:

$ echo foo REPLACEME:f bar \
  | sed '/REPLACEME:\(\S\+\)/ r \1'

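Unfortunately, backreferences only work inside the regular expression itself and in the replacement part of an s command; they are not available to the command that follows an address. The r command takes the rest of the line literally as a file name, so the above looks for a file called \1, which doesn't exist, is silently treated as empty, and the line is printed unchanged. I started digging in the sed manual and came across this awesome flag for the s command: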

`e'
     This command allows one to pipe input from a shell command into
     pattern space.  If a substitution was made, the command that is
     found in pattern space is executed and pattern space is replaced
     with its output.  A trailing newline is suppressed; results are
     undefined if the command to be executed contains a NUL character.
     This is a GNU `sed' extension.
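A tiny demonstration of the flag on its own (GNU sed only): the substitution turns the line into `echo world`, and sed then runs that pattern space through the shell and replaces it with the command's output.

```shell
# s rewrites "echo hello" to "echo world"; the e flag then executes
# the pattern space with the shell and keeps its output.
echo 'echo hello' | sed 's/hello/world/e'   # world
```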

So how does it work? sed performs the substitution and then executes the whole pattern space in the shell, not just the part that was replaced. This means the regular expression has to cover the entire line, starting from its beginning. If I wrote:

$ echo foo REPLACEME:f bar \
  | sed 's/REPLACEME:\(\S\+\)/cat \1/e'
sh: foo: command not found

then the substitution would turn the line into “foo cat f bar”, which (for most of us) is not a valid command.

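This is not quite what the manual's wording suggests (“the command that is found in pattern space is executed” reads as if only part of the line were run), but there are two easy workarounds.
The first is to insert newline characters before and after the matched pattern, so that the directive ends up on a line of its own, and to pipe the result into a second sed, which processes each of those lines separately. The downside is that the extra newlines have to be removed later (if that matters to you):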

$ echo foo REPLACEME:f bar \
  | sed 's/\(REPLACEME:\S\+\)/\n\1\n/g' \
  | sed 's/REPLACEME:\(\S\+\)$/cat \1/e'
foo 
abc
 bar

Alternatively, we can rebuild the whole line as a shell command that echoes the text around the directive together with the file's contents:

$ echo foo REPLACEME:f bar \
  | sed 's/^\(.*\)REPLACEME:\(\S\+\)\(.*\)$/echo "\1"`cat \2`"\3"/e'
foo abc bar
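One caveat: the greedy `^\(.*\)` means the expression only matches the last `REPLACEME:` on a line. A possible extension, sketched here with a second sample file `g` that is purely illustrative, is to loop with a label and `t` until nothing is left to replace. Be aware that if an included file itself contains `REPLACEME:`, it gets expanded too, and a file that includes itself would loop forever.

```shell
echo abc > f
echo xyz > g
# :a is a label; ta jumps back to it as long as the previous s///e
# replaced something, so every REPLACEME:<file> on the line gets expanded.
echo 'REPLACEME:f and REPLACEME:g' \
  | sed -e ':a' \
        -e 's/^\(.*\)REPLACEME:\(\S\+\)\(.*\)$/echo "\1"`cat \2`"\3"/e' \
        -e 'ta'
# abc and xyz
```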

Awesome, isn’t it?

Don’t do this on input you don’t trust! The whole line is executed by the shell.
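To see why, note that whatever ends up in \1 and \3 is pasted inside double quotes into a shell command, so command substitutions in the input get executed. A harmless demonstration, reusing `f` from above, where `echo pwned` stands in for anything an attacker might choose to run:

```shell
echo abc > f
# Single quotes stop the outer shell from expanding $(echo pwned),
# but the shell started by the e flag does expand it.
echo 'foo $(echo pwned) REPLACEME:f bar' \
  | sed 's/^\(.*\)REPLACEME:\(\S\+\)\(.*\)$/echo "\1"`cat \2`"\3"/e'
# foo pwned abc bar
```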
