Sed awesomeness and inline file inclusion.

Sed is one of my favourite tools. It goes into pretty much every single one-liner I write. Did you know that sed is so powerful that a single sed statement can turn cat into cement? Try:

echo cat | sed statement

Today I learnt something new about it, hence my first post in a few years (after which I'll most likely go silent for another few). I wanted to replace part of a text with the contents of the file that the text referred to.

In other words I’d like to turn:

blah blah INCLUDE:xxx blah blah

into

blah blah $(cat xxx) blah blah

Sed happens to have a built-in command for including files. From info sed:

`r FILENAME'
     As a GNU extension, this command accepts two addresses.

     Queue the contents of FILENAME to be read and inserted into the
     output stream at the end of the current cycle, or when the next
     input line is read.  Note that if FILENAME cannot be read, it is
     treated as if it were an empty file, without any error indication.

     As a GNU `sed' extension, the special value `/dev/stdin' is
     supported for the file name, which reads the contents of the
     standard input.

This works fine if the file name is static:

$ echo abc > f
$ echo foo REPLACEME bar | sed '/REPLACEME/ r f'
foo REPLACEME bar
abc

However, in my application I needed to use part of the matched text as the file name. Something like:

$ echo foo REPLACEME:f bar \
  | sed '/REPLACEME:\(\S\+\)/ r \1'

Unfortunately, it seems that backreferences can't be used after the regular expression is terminated (at least, the above does not work). I started digging through the sed manual and came across this awesome flag to the s/// command:

`e'
     This command allows one to pipe input from a shell command into
     pattern space.  If a substitution was made, the command that is
     found in pattern space is executed and pattern space is replaced
     with its output.  A trailing newline is suppressed; results are
     undefined if the command to be executed contains a NUL character.
     This is a GNU `sed' extension.

So, how does it work? It applies the substitution and then evaluates the whole pattern space (the line) in a shell. This means we should match from the beginning of the line. If I wrote:

$ echo foo REPLACEME:f bar \
  | sed 's/REPLACEME:\(\S\+\)/cat \1/e'

then the shell would be asked to run “foo cat f bar”, which is (for most of us) not a valid command:

$ echo foo REPLACEME:f bar \
  | sed 's/REPLACEME:\(\S\+\)/cat \1/e'
sh: foo: command not found

This is not exactly what the manual says (note the “the command that is found in pattern space is executed” part), but there are two easy workarounds.
The first is to add newline characters before and after the matched pattern, though that requires removing them later (if you need to):

$ echo foo REPLACEME:f bar \
  | sed 's/\(REPLACEME:\S\+\)/\n\1\n/g' \
  | sed 's/REPLACEME:\(\S\+\)$/cat \1/e'
foo 
abc
 bar

We can also recreate the whole line in shell:

$ echo foo REPLACEME:f bar \
  | sed 's/^\(.*\)REPLACEME:\(\S\+\)\(.*\)$/echo "\1"`cat \2`"\3"/e'
foo abc bar
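
Putting it together for the original INCLUDE:xxx case (a sketch assuming GNU sed, one include per line, and no shell metacharacters in the surrounding text or file name):

$ echo abc > xxx
$ echo 'blah blah INCLUDE:xxx blah blah' \
  | sed 's/^\(.*\)INCLUDE:\(\S\+\)\(.*\)$/echo "\1"`cat \2`"\3"/e'
blah blah abc blah blah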

Awesome, isn’t it?

Don’t do it on input you don’t trust!


Elasticsearch – sorting on string types with more than one value per doc, or more than one token per field

We were experiencing problems with sorting results with a recent version of elasticsearch.

A typical sort on the name field (containing both first and last names) was coming back with an exception:

“Can’t sort on string types with more than one value per doc, or more than one token per field”

Other users also experienced the problem:

http://elasticsearch-users.115913.n3.nabble.com/Sorting-failing-in-latest-master-td967979.html

The solution was to add a multi_field mapping on the field. Initially the mapping was a plain string:

curl -s -XGET 'es:9200/users/journalist/_mapping' 

{ 
    "journalist": {
        "properties": {
            "name": {
                "type": "string"
            } 
        }
    }
} 

and search queries like the one below were failing:

curl -s -XGET es:9200/users/journalist/_search -d '
{
    "sort": [ { "name" : "asc" } ], 
    "fields": ["name"],
    "query" : {
        "query_string": {
            "query": "harry",
            "fields": [ "name" ] 
        }
    }
}'

Following the documentation, I replaced the mapping on the "name" field with a composite of the standard string type and a not-analyzed version of it:

curl -s -XPUT es:9200/users/journalist/_mapping -d '
{
    "journalist": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {
                        "type": "string"
                    },
                    "untouched": {
                        "type": "string",
                        "index": "not_analyzed"
                    }
                }
            }
        }
    }
}'

To verify:

curl -s -XGET 'es:9200/users/journalist/_mapping'
{
    "journalist": {
        "properties": {
            "name": {
                "fields": {
                    "name": {
                        "type": "string"
                    },
                    "untouched": {
                        "include_in_all": false,
                        "index": "not_analyzed",
                        "type": "string"
                    }
                },
                "type": "multi_field"
            }
        }
    }
}
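
Mapping changes like this are not applied retroactively, so existing documents need to be reindexed (for instance by re-putting them) before the new sub-field is populated. A minimal sketch with a hypothetical document id and name:

curl -s -XPUT es:9200/users/journalist/1 -d '
{
    "name": "Harry Smith"
}'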

After reindexing and changing the sort query (note the “name.untouched” part), the search went fine.

curl -s -XGET es:9200/users/journalist/_search -d '
{
    "sort": [ { "name.untouched" : "asc" } ],
    "fields": ["name"],
    "query" : {
        "query_string": {
            "query": "harry",
            "fields": [ "name" ]
        }
    }
}'

Performance issues with Debian’s timezone database PHP patch

I have recently been debugging an Apache2/PHP5 server (running with mpm_prefork, which turned out to be important later) suffering from very high load spikes. The load was entirely CPU related: there were almost no IO operations performed during peak times, and a surprisingly significant part of CPU time (about 20%) was spent in kernel mode. I examined an Apache2 process using strace and found that there were three main groups of syscalls:

strace -p 28105 -f -c

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.15    8.867324         304     29128      3861 stat
 30.30    5.357779        5299      1011       194 access
 19.05    3.368963          18    184918      4641 lstat

Tuning FollowSymLinks and AllowOverride allowed me to significantly cut down the number of lstat calls.
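
For illustration, the relevant directives could look something like this (the path is a placeholder): FollowSymLinks (without SymLinksIfOwnerMatch) lets Apache skip the per-component lstat() checks, and AllowOverride None stops it from probing for .htaccess files in every directory.

<Directory /var/www/example>
    Options FollowSymLinks
    AllowOverride None
</Directory>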

Before I examined stat() closely, I made the wrong assumption that APC (an opcode/user cache that utilizes SHM or MMAP shared memory segments) was responsible for the syscalls. APC will usually issue stat() on a PHP file prior to serving its opcode from cache. If the file has been modified since APC cached it, it is sent to the parser again. This is done to allow changing code on the fly, without restarting the web server (or other means of flushing the APC cache).

Since it was relatively safe to do, I tried reconfiguring APC not to check files' modification dates by adding:

apc.stat = 0

Unfortunately it didn't help much, so I ended up analyzing the syscalls in detail. After a quick analysis it turned out that occasionally I could find whole blocks (of over 600) calls checking the data in /usr/share/zoneinfo. This looked like some sort of initialization to me, and it seemed that only PHP content was causing it.

[pid 10293]      0.000121 stat("/usr/share/zoneinfo/Pacific/Funafuti",
{st_mode=S_IFREG|0644, st_size=141, ...}) = 0
[pid 10293]      0.000121 stat("/usr/share/zoneinfo/Pacific/Fiji",
{st_mode=S_IFREG|0644, st_size=296, ...}) = 0
[pid 10293]      0.000122 stat("/usr/share/zoneinfo/Pacific/Fakaofo",
{st_mode=S_IFREG|0644, st_size=140, ...}) = 0
[pid 10293]      0.000121 stat("/usr/share/zoneinfo/Pacific/Enderbury",
{st_mode=S_IFREG|0644, st_size=204, ...}) = 0
[pid 10293]      0.000122 stat("/usr/share/zoneinfo/Pacific/Efate",
{st_mode=S_IFREG|0644, st_size=464, ...}) = 0
[pid 10293]      0.000122 stat("/usr/share/zoneinfo/Pacific/Easter",
{st_mode=S_IFREG|0644, st_size=8971, ...}) = 0
[pid 10293]      0.000123 stat("/usr/share/zoneinfo/Pacific/Chuuk",
{st_mode=S_IFREG|0644, st_size=144, ...}) = 0
[pid 10293]      0.000121 stat("/usr/share/zoneinfo/Pacific/Chatham",
{st_mode=S_IFREG|0644, st_size=2018,

I went to apache2.conf and found the cause of the load – mpm_prefork was misconfigured:

 
   StartServers          8
   MinSpareServers      15
   MaxSpareServers      10
   MaxClients          150
   ServerLimit         300
   MaxClients          300
   MaxRequestsPerChild   4000

Because MinSpareServers was bigger than MaxSpareServers (which is illogical in itself), Apache kept forking off and terminating processes constantly – and every one of those short-lived processes had to initialize PHP if it served any. Changing MaxSpareServers to a more sensible value reduced both user and kernel CPU load (I should have read Apache's config properly in the first place ;) ).
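
A corrected block could look something like the one below (values are illustrative, not a tuning recommendation); the key points are that MaxSpareServers must not be lower than MinSpareServers and that MaxClients should appear only once:

   StartServers          8
   MinSpareServers      10
   MaxSpareServers      20
   ServerLimit         300
   MaxClients          300
   MaxRequestsPerChild  4000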

After that I kept playing, trying to narrow down the reason for all these syscalls. I got a copy of the application onto a spare box and set MaxRequestsPerChild to 1, which should make Apache initialize PHP on every single request. A few die()'s later, it turned out that the timezone data is being scanned on:

date_default_timezone_set('GMT');

It is easily reproducible from the CLI by running:

 strace -r php -n -r"date_default_timezone_set('GMT');" 2>&1 | grep zoneinfo

I initially submitted a bug against PHP; however, it was quickly straightened out that it is Debian patching its PHP package for the convenience of using the system's timezone updates (use_embedded_timezonedb.patch). The full discussion can be found here:

In the end, the whole timezone initialization is not really of much importance (at least with PHP served from Apache); the real problem the server I was looking at had was a misconfigured Apache. However, it seems that a lack of cooperation between PHP authors and package maintainers leads to performance regressions (slight, but still).


Amazon introduces DNS service.

Amazon has just launched a DNS service, which they've called Route 53 (why not Amazon Elastic DNS?). I find it quite funny, as just a day before I had said "I wish Amazon would do their own DNS". It seems to still be in beta, but it is publicly active and ready to test. A few things I've noticed so far:

  • They provide a nice API for the whole thing, making it easy to integrate and automate management.
  • Multiple geographic locations, providing redundancy and reliability, which is pretty standard with them. They actually list more locations than Dynect.
  • They don't seem to do geo-based anycast. Definitely a minus.
  • Flexible TTL settings, making it easier to move domains around, as opposed to some cheap and bad providers.
  • Allows round-robin DNS load balancing.

They charge $1.00 per zone per month, plus an additional $0.50 per million queries (up to the first billion, after which it gets cheaper). The product seems competitive with Custom DynDNS and comes at a lower price as well. A little comparison table (for a single domain; from what it seems, queries are charged per account, not per domain):

                              Amazon                           DynDNS
Cost of domain per year
(minimum amount of queries)   ($1.00 + $0.50) * 12 = $18.00    $29.95
Queries included in price     1,000,000                        600,000
Geographic locations          16                               5
Limit on zone entries         nothing mentioned                75
LOC record                    no                               yes
SOA record                    yes                              no
SPF record                    yes                              no

Generally it seems to be superior for less money, and quite a bit more scalable. I'm waiting for AWS to become a registrar ;) Considering their pace of rolling out new services, it shouldn't take long.


Tunnelling individual user’s traffic through remote server with Linux using GRE, NAT and iptables.

I needed to route a particular process's traffic through a remote server in order to be let in by third parties' firewalls. The most flexible way to do so was to mark the traffic on a per-account basis. Fortunately, Linux allows you to do this using the owner match in iptables. In my case the user account is called external.ip.

iptables -t mangle -A OUTPUT -m owner --uid-owner external.ip -j MARK --set-mark 5

The above marks all the traffic originating from that account on the local server with a mark value of 5 (if you use higher values, keep in mind that at the iptables level marks are parsed as hexadecimal, while iproute2 treats them as decimal).

To verify, run watch on the command below, then log into the account, generate some traffic and watch the counters grow.

# iptables --list OUTPUT -t mangle -v -n 
Chain OUTPUT (policy ACCEPT 111M packets, 72G bytes)
 pkts bytes target     prot opt in     out     source               destination         
66938 4605K MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0           owner UID match 1050 MARK xset 0x5/0xffffffff 
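
For example, to refresh the counters every second:

watch -n 1 'iptables --list OUTPUT -t mangle -v -n'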

Once packets are correctly marked, it is time to set up a tunnel. In my case I used GRE tunnelling, as it is quick and easy to set up. If it's meant to last longer than a couple of hours, I would suggest encrypting it and using racoon (IPsec) instead. In this setup both the client and the server need external IPs (local and remote in the example below). "p2p_local" and "p2p_remote" are the addresses of the two ends of the tunnel.

remote="5.5.5.5"
local="4.4.4.4"
p2p_local="10.0.1.1" 
p2p_remote="10.0.1.2" 
iface="gre_something" 

Run this on the local machine:

ip tunnel add $iface mode gre remote $remote local $local ttl 255 dev eth0 
ifconfig $iface $p2p_remote netmask 255.255.255.252 pointopoint $p2p_local mtu 1400 up

and on the remote:

ip tunnel add $iface mode gre local $remote remote $local ttl 255 dev eth0 
ifconfig $iface $p2p_local netmask 255.255.255.252 pointopoint $p2p_remote mtu 1400 up

Keep in mind the interface names; they can be different for you. The MTU may be set a little too small (I have stolen this from a googled example), but it does not really matter.

Once that's set, check that the tunnel is up:

# ip link show $iface
17: gre_something@eth0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN 
    link/gre 4.4.4.4 peer 5.5.5.5

You should see at least LOWER_UP. If you don't, try bringing the links up:

ip link set up dev $iface

on both boxes, and then ping each end of the tunnel.

In order to use the remote's IP address you'll need to set up NAT on it. Note that the POSTROUTING chain does not accept an input-interface (-i) match, so instead we masquerade everything leaving the remote's external interface (eth0 here, as in the tunnel definition above):

iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

Don’t forget to enable ip forwarding:

echo 1 > /proc/sys/net/ipv4/ip_forward
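
If the setup is meant to be permanent, the same setting can be persisted via sysctl (a sketch; the exact file location depends on your distribution) and loaded with sysctl -p:

# /etc/sysctl.conf (or a file under /etc/sysctl.d/)
net.ipv4.ip_forward = 1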

Now, when the remote receives a packet through the tunnel that is not addressed to itself, it will forward it and nicely translate its source into its own address. The final step is to route the previously marked packets through the tunnel. To do so we're going to need a separate routing table and an RPDB rule directing packets marked with 5 to it.

ip rule add fwmark 5 table 5
ip route add table 5 default via $p2p_local dev $iface
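
To confirm that the rule and the route are in place:

ip rule show
ip route show table 5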

At this stage packets should start flowing to the final destination; however, depending on your setup (as was the case with me), the remote may not know where to send the replies back. Using a src hint in the default route out via $p2p_local won't help, as the system doesn't yet know the packet will go through the tunnel at the stage when it generates it (it has to pass through netfilter first to get the mark). The way around it is to simply MASQUERADE again at the point where packets enter the tunnel:

iptables -t nat -A POSTROUTING  -o $iface -j MASQUERADE
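
To verify end to end, generate some traffic as the marked user and check which address the outside world sees, e.g. using any external "what is my IP" service (the one below is just an example):

sudo -u external.ip curl http://ifconfig.me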

Android handsets do not recognise GoDaddy’s SSL certificates

I've just spent some time figuring out why my Android mobile can't connect to a certain mail account (both IMAP and SMTP, delivered over SSL and TLS respectively). The problem is that the SSL certificates were signed by GoDaddy, which is not a trusted CA on the Android platform. The standard mail application does not allow you to accept untrusted certificates, so the easiest workaround was to use another email client. I came across K-9 Mail, which looks and feels way better than the one that ships with the phone (and does have an option to accept untrusted certificates).


Real time log based statistics

Some time ago I came across a need to generate real-time statistics on how many emails our Exim installations send per second. I didn't want to go into reconfiguring Exim itself, so I wrote a simple C program that follows a syslog-generated file, does a little parsing on the timestamp, and writes stats to disk every couple of seconds. It:

  • works only with one kind of timestamp (%Y-%m-%d %H:%M:%S)
  • accepts no parameters apart from the output file
  • reads from stdin in the main loop, and has another thread to write the stats to disk
  • could be written better, faster, cleaner, using less memory and having less IO impact
  • generally works… it has been running for months without any problems (it will reallocate more memory if needed and clean up after itself after a while)
  • can work on any log, as long as the timestamp is in the right format (unless you make it accept others ;) ) and as long as you can grep for what you want to count
  • is released under the WTFPL

When it comes to Exim, it would probably be more efficient (less IO overhead) to add a transport filter and count there, but then you'd have all the issues with locking (Exim sends emails with multiple queue runners), parallelism, etc. Plus, if your filter fails, you lose your MTA…

Once you compile it, simply tail -f any log file and pipe it through the meter. It will periodically rewrite the output file you specified as a parameter (which can later be fetched via HTTP) with three numbers:

  1. how many events per second were seen in the piped-in input, averaged over the last 10 seconds
  2. 10 (the length of the averaging window, COLLECTSTATS)
  3. the timestamp of the last update
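
For example, the output file might contain something like "23.4 10 1291980000" (illustrative values): roughly 23 events per second averaged over the last 10 seconds, written at that Unix timestamp.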
/**
 * Copyright (C) 2009 Maciej Wiercinski http://blog.wiercinski.net/
 *
 * This program is free software. It comes without any warranty, to
 * the extent permitted by applicable law. You can redistribute it
 * and/or modify it under the terms of the Do What The Fuck You Want
 * To Public License, Version 2, as published by Sam Hocevar. See
 * http://sam.zoy.org/wtfpl/COPYING for more details.
 */

#include <stdio.h>
#include <pthread.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define   READBUFSIZE 512
#define   QUEUE_BLOCK 128
#define   HITTIME 1
#define   COLLECTSTATS 10 

/**
Compilation:
    gcc monitor.c -o monitor -lpthread

To test:
    while [ 1 ]; do sleepenh 0.`echo "$RANDOM % 10000" | bc |  awk '{ printf "%05d" , $1; }'` > /dev/null; date +'%Y-%m-%d %H:%M:%S'; done | ./monitor output

Usage :
tail --follow=name /var/log/exim4/mainlog | grep Completed --line-buffered   | ./monitor stats.txt
*/

pthread_mutex_t mx_queue = PTHREAD_MUTEX_INITIALIZER;


struct rw_parms {
    int ** queue;
    int * queuesize;
    int * elems;
    char * filepath;
};


int comparator (const void *a, const void *b) {
    return (*(int*)b - *(int*)a );
}

void * writer(void *ptr) {
    struct rw_parms * rp;
    FILE * f;
    double result;
    rp = (struct rw_parms *) ptr;

    while(1) {
        sleep(HITTIME);
        pthread_mutex_lock(&mx_queue);
        // sort newest-first; entries older than the collection window end up at the tail
        qsort(*(rp->queue), *(rp->elems), sizeof(int), comparator);
        while(*(rp->elems) > 0 &&
              time(NULL) - (*(rp->queue))[*(rp->elems) - 1] > COLLECTSTATS) {
            (*(rp->elems))--;
        }
        result = (double)(*(rp->elems)) / (double) COLLECTSTATS;
        f = fopen(rp->filepath, "w");
        if(!f) {
            fprintf(stderr, "\n Failed to open a file");
            pthread_mutex_unlock(&mx_queue);
            return NULL;
        }
        fprintf(f, "%.1lf %d %d", result, COLLECTSTATS, (int) time(NULL));
        fclose(f);
        pthread_mutex_unlock(&mx_queue);
    }
}


/* Parse a "%Y-%m-%d %H:%M:%S" timestamp from the beginning of buf.
   Returns the Unix timestamp, or -1 if the line does not start with one. */
int get_time_stamp(char * buf) {
    int days = -1, months = -1, years = -1, hours = -1, minutes = -1, seconds = -1;
    struct tm timeinfo;
    int * ptrs[6];
    char bufatoi[5] = {0};
    int i;
    ptrs[0] = &years;
    ptrs[1] = &months;
    ptrs[2] = &days;
    ptrs[3] = &hours;
    ptrs[4] = &minutes;
    ptrs[5] = &seconds;
    for(i = 0; i < 6; i++) {
        *bufatoi = *(buf++);
        *(bufatoi+1) = *(buf++);
        if(i == 0) {
            // the year field is four digits long
            *(bufatoi+2) = *(buf++);
            *(bufatoi+3) = *(buf++);
        }
        if(i == 1) {
            *(bufatoi+2) = 0;
        }
        buf++; // skip the separator ('-', ' ' or ':')
        // every field is at least two digits
        if( *bufatoi < '0' || *bufatoi > '9' || *(bufatoi+1) < '0' || *(bufatoi+1) > '9' ) {
            return -1;
        }
        **(ptrs+i) = atoi(bufatoi);
    }
    timeinfo.tm_sec = seconds;
    timeinfo.tm_min = minutes;
    timeinfo.tm_hour = hours;
    timeinfo.tm_mday = days;
    timeinfo.tm_mon = months - 1;
    timeinfo.tm_year = years - 1900;
    timeinfo.tm_isdst = timeinfo.tm_wday = timeinfo.tm_yday = -1;
    return mktime(&timeinfo);
}

int main(int argc, char ** argv) {
    if(argc < 2) {
        fprintf(stdout, "\tUsage:\t%s <outputfilename>\n", *argv);
        return 1;
    }
    pthread_t th_writer;
    struct rw_parms rp;
    char readbuf[READBUFSIZE] = {0};
    int ** queue = 0;
    int ret_writer, lastts, *elems = 0, *queuesize = 0, iteration = 0;

    queuesize = (int *) malloc(sizeof(int));
    *queuesize = QUEUE_BLOCK;

    elems = (int *) malloc(sizeof(int));
    *elems = 0;

    queue = (int **) malloc(sizeof(int *));
    *queue = (int *) malloc(sizeof(int) * (*queuesize));
    rp.queue = queue;
    rp.queuesize = queuesize;
    rp.elems = elems;
    rp.filepath = argv[1];
    ret_writer = pthread_create(&th_writer, NULL,writer, (void*) &rp);

    while(!feof(stdin)) {
        if(fgets(readbuf, READBUFSIZE, stdin) == NULL) {
            break;
        }
        // fgets() stores at most READBUFSIZE-1 characters; discard the rest of over-long lines
        if(strlen(readbuf) == READBUFSIZE - 1 && readbuf[READBUFSIZE-2] != '\n') {
            while(!feof(stdin) && fgetc(stdin) != '\n');
        }
        lastts = get_time_stamp(readbuf);
        if(lastts < 0) {
            continue;
        }
        iteration++;
        pthread_mutex_lock(&mx_queue);
        if( *queuesize <= ((*elems) + 1)) {
            *queuesize += QUEUE_BLOCK;
            (*(rp.queue)) = realloc((*(rp.queue)), sizeof(int) * (*queuesize));
        }
        // periodically shrink the queue if it has grown much larger than needed
        if(iteration % (QUEUE_BLOCK * 4) == (QUEUE_BLOCK * 4 - 1)) {
            if((*queuesize) - QUEUE_BLOCK > (*elems) + 1) {
                (*queuesize) = *elems + QUEUE_BLOCK - (*elems % QUEUE_BLOCK);
                *rp.queue = realloc(*(rp.queue), sizeof(int) * (*queuesize));
            }
        }
        (*queue)[ (*elems)++] = lastts;
        pthread_mutex_unlock(&mx_queue);
    }

    fprintf(stderr, "\n Reader finished... ");
    pthread_join(th_writer, NULL);
    return 0;
}


MySQL Enterprise Monitor Agent having memory leaks

I've recently updated MySQL Enterprise Monitor on a few boxes to the most recent version. Apparently, since Oracle acquired Sun, the Enterprise team has lowered its quality standards. The agent (in agent/proxy mode) kept leaking memory at a pace of about 1 GB per hour (it got up to 5 GB before we noticed).

The offending binary was:

mysqlmonitoragent-2.1.1.1144-linux-glibc2.3-x86-64bit-installer.bin

I find it extremely disappointing. While I wouldn't care much about the Monitor Server going down, I expect agents to be extremely stable, as their failure can affect production services. With the agent deployed on the same box as the database, it's easy to redirect traffic to the port of the real database if the agent crashes; however, if the agent consumes memory dedicated to MySQL, it will cause the database and other services to crash.
