sed and Multi-Line Search and Replace

by filosofo. Posted on April 26, 2008 at 1:21 pm

I’ve been experimenting with getting regular expression patterns to match over multiple lines using sed. For example, one might want to change

<p>previous text</p>
<h2>
<a href="http://some-link.com">A title here</a>
</h2>
<p>following text</p>

to

<p>previous text</p>
No title here
<p>following text</p>

sed processes its input one line at a time, so the most obvious way to match a pattern that extends over several lines is to concatenate all the lines in what is called sed’s “hold space,” then copy the result back into the pattern space and look for the pattern in that (long) string. That’s what I do in the following lines:

#!/bin/sh
sed -n '
# on the first line, copy the pattern space to the hold space
1h
# on every other line, append the pattern space to the hold space
1!H
# on the last line ...
$ {
        # copy the hold space back into the pattern space
        g
        # do the search and replace (the slash in </h2> must be escaped)
        s/<h2.*<\/h2>/No title here/g
        # print
        p
}
' sample.php > sample-edited.php
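To see what the substitution actually has to work with on the last line, it can help to dump the pattern space right after `g`. The following is only an illustrative sketch: the three-line input is made up, and sed's `l` command (print the pattern space in an unambiguous form) stands in for the search and replace.

```shell
#!/bin/sh
# Illustration only: a made-up three-line input instead of sample.php.
joined=$(printf 'one\ntwo\nthree\n' | sed -n '
1h
1!H
${
        g
        # l shows embedded newlines as \n and marks the end of the buffer with $
        l
}')
printf '%s\n' "$joined"
```

This should print `one\ntwo\nthree$`: by the last line, the whole input is one string with embedded newlines, which is why a single `s` command can match across what used to be line boundaries.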

A more compact version:


sed -n '1h;1!H;${;g;s/<h2.*<\/h2>/No title here/g;p;}' sample.php > sample-edited.php
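A quick way to try this without touching real files is to pipe the example markup through the same commands. In the sketch below, `join_and_replace` is a hypothetical helper name introduced only for the demonstration, and the second input (two headings) is made up to show a caveat: `.*` is greedy, so the match runs from the first `<h2` to the last `</h2>`.

```shell
#!/bin/sh
# join_and_replace is a hypothetical helper name; it wraps the same
# hold-space technique, with the slash in the closing tag escaped.
join_and_replace() {
    sed -n '
1h
1!H
${
        g
        s/<h2.*<\/h2>/No title here/g
        p
}' "$@"
}

# The example markup from the top of the post.
single=$(printf '%s\n' \
    '<p>previous text</p>' \
    '<h2>' \
    '<a href="http://some-link.com">A title here</a>' \
    '</h2>' \
    '<p>following text</p>' | join_and_replace)
printf '%s\n' "$single"

# Two headings: the greedy .* also swallows the paragraph between them.
double=$(printf '%s\n' \
    '<h2>' 'First' '</h2>' \
    '<p>between</p>' \
    '<h2>' 'Second' '</h2>' | join_and_replace)
printf '%s\n' "$double"
```

The first call prints the three-line result shown at the top of the post. The second prints only `No title here`, so files with more than one heading need a tighter pattern than `<h2.*</h2>`.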
 

As far as I can tell, that’s the most efficient way to match general multi-line patterns. I initially thought it might be more efficient not to keep the complete input in the hold buffer, so I modified the algorithm to print out the string whenever a match is found:


#!/bin/sh
sed -n '1h
1!{
        # if the sought-after regex is not found, append the pattern space to hold space
        /<h2.*<\/h2>/ !H
        # copy hold space into pattern space
        g
        # if the regex is found, then...
        /<h2.*<\/h2>/ {
                # do the search and replace
                s/<h2.*<\/h2>/No title here/g
                # print
                p
                # read the next line into the pattern space
                n
                # copy the pattern space into the hold space
                h
        }
        # copy pattern buffer into hold buffer
        h
}
# if the last line then print
$p
' sample.php > sample-edited.php
 

In this version, sed concatenates lines only until it finds a match, then prints the accumulated text (after making the substitution) and starts concatenating again from the following line.

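However, that approach is usually massively inefficient: the regex is re-run against the growing buffer on every input line, so the work grows far faster than linearly (roughly quadratically when matches are rare). Unless a sed guru can point out a better way, I’m going to continue using the first approach.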
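For markup shaped like the example, one alternative worth sketching (it is not the approach this post settles on) is a range address, which never holds more than one line at a time. It rests on assumptions the general hold-space method does not need: the opening `<h2` and the closing `</h2>` sit on different lines, and nothing else worth keeping shares those lines. `range_replace` is a hypothetical name used only here.

```shell
#!/bin/sh
# Sketch under assumptions: <h2 and </h2> are on separate lines, and
# any other text on those lines can be discarded.
range_replace() {
    sed '/<h2/,/<\/h2>/{
        # inside the range, drop every line except the closing one
        /<\/h2>/!d
        # turn the closing line into the replacement text
        s/.*/No title here/
}' "$@"
}

single=$(printf '%s\n' \
    '<p>previous text</p>' \
    '<h2>' \
    '<a href="http://some-link.com">A title here</a>' \
    '</h2>' \
    '<p>following text</p>' | range_replace)
printf '%s\n' "$single"

# Each heading is handled separately, so text between them survives.
double=$(printf '%s\n' \
    '<h2>' 'First' '</h2>' \
    '<p>between</p>' \
    '<h2>' 'Second' '</h2>' | range_replace)
printf '%s\n' "$double"
```

The trade-off is generality: a heading written on a single line (`<h2>...</h2>`) would open a range that only closes at the next `</h2>`, so this is not a drop-in replacement for arbitrary multi-line patterns.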

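I’ve put the following script, which I call “sedml” (for sed multi-line), on my PATH. It takes a file, a sed command, and an optional output file; with no third argument it overwrites the input file.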

#!/bin/sh
# usage: sedml file sed-command [outputfile]
if [ "$#" -lt 2 ]
then
        echo "usage: sedml file sed-command [outputfile]" >&2
        exit 1
fi

# overwrite the input file if there is no 3rd argument
if [ -z "$3" ]
then
        outputfile="$1"
else
        outputfile="$3"
fi
sed -n '
# on the first line, copy the pattern space to the hold space
1h
# on every other line, append the pattern space to the hold space
1!H
# on the last line ...
$ {
        # copy the hold space back into the pattern space
        g
        # do the search and replace
        '"$2"'
        # print
        p
}
' "$1" > "$1.tmp" &&
mv -f "$1.tmp" "$outputfile"
 

So I can replace multi-line patterns in multiple files like so:

 grep -rl '<h2' * | while read -r i; do sedml "$i" "s/<h2.*<\/h2>/No title here/g" "$i.new"; done

Comments

  1. David Runion commented on April 30, 2009 at 11:45 am | Permalink

    Thank you for this. You are a philosopher and a poet.

    -David

  2. bence commented on June 8, 2009 at 4:45 pm | Permalink

    Hi, filosofo

    Maybe I misunderstood something, but I created the following bash script:

    #!/bin/bash
    SRCH="a\nb"
    file="test.txt"
    sed -i.bak -n '
    # if the first line copy the pattern to the hold buffer
    1h
    # if not the first line then append the pattern to the hold buffer
    1!H
    # if the last line then ...
    $ {
    # copy from the hold to the pattern buffer
    g
    # do the search and replace
    '"$SRCH"'
    # print
    p
    }
    ' $file;

    …and it cannot find the pattern in test.txt:

    a
    b

    any tips?

  3. anku commented on June 19, 2009 at 6:00 pm | Permalink

    >> # do the search and replace
    >> '"$SRCH"'

    If you want to search and replace, do something like this:
    s/'"$SRCH"'/xxx/
