So I am having a problem with a Bash service on Debian 7 that I’ve been working on for quite a while and that has randomly started having trouble with its fifo, or so it seems. It is based on your classic fifo usage example and has worked fine for months, but today it suddenly started giving me trouble. It seems like whenever things like this happen, it is always something completely different from what I originally conclude, so I will present what I have and maybe somebody can point out the obvious bit I’m not seeing.
As I said, my code for reading from and writing to a named pipe is pretty standard. I made a boiled-down version (150-ish lines) that I thought I’d present but, of course, it worked fine and I have no idea why. So here is a super boiled-down version for reference:
#--------------------------------Writer Script--------------------------------------#
#!/bin/bash
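fifoIn=".../path/fifoIn"
# Read user input
IFS='' # Changed IFS so that spaces aren't trimmed from input
# read -e uses readline for editing; the loop ends at EOF (Ctrl-D)
while read -e line; do
  printf "%b\n" "$line" >&4 # fd 4 is the FIFO, opened once for the whole loop
done 4>"$fifoIn"
exit 0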
#--------------------------------Reader Script--------------------------------------#
#!/bin/bash
fifoIn=".../path/fifoIn"
LogFile=".../path/srvc.log"
[ -d ".../path" ] || mkdir -p ".../path"
[ -e "$fifoIn" ] || mkfifo "$fifoIn"
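printf "%b\n" "Flushing input pipe" >> "$LogFile"
# Drain anything already queued in the FIFO without blocking on it
dd if="$fifoIn" iflag=nonblock of=/dev/null >/dev/null 2>&1
while true; do
  # Poll for a line (0.1 s timeout) and split it into the array str
  if read -t 0.1 -a str; then
    printf "\n%s\n" "<${str[*]}>"
    case "${str[0]}" in
      "foo")
        printf '%b\n' "You said foo..."
        ;;
      "bar")
        printf '%b\n' "You said bar..."
        ;;
      "")
        ;;
      *)
        printf "%b\n" "${str[*]}:"
        printf "%b\n" "Uhhuh..."
        ;;
    esac
  fi
  # stdin reads the FIFO; fd 3 keeps a write end open so read never sees EOF.
  # Every command started inside the loop inherits fds 0 and 3.
done <"$fifoIn" >> "$LogFile" 2>&1 3>"$fifoIn"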
So you take ‘reader script’ and run it as a daemon, then talk to it by using echo, printf, or the writer script to send messages to the named pipe, fifoIn. This has worked great from the get-go, but today it got weird.
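Here is a minimal, self-contained sketch of that setup, assuming bash 4 on Linux. The temporary directory is a hypothetical stand-in for the elided .../path, and the two printf writers play the terminal and the cron job. Like the service, the reader also holds a write descriptor on fd 3, so one writer closing the pipe does not end its loop with EOF.

```shell
#!/bin/bash
# Sketch of the reader/writer pattern; the temp dir stands in for .../path.
dir=$(mktemp -d)
fifo="$dir/fifoIn"
log="$dir/srvc.log"
mkfifo "$fifo"

# Reader: stdin is the FIFO, and fd 3 holds a write end open (as in the
# service), so read does not see EOF between writers. Handles two messages.
for _ in 1 2; do
  IFS= read -r line
  printf '<%s>\n' "$line"
done <"$fifo" >>"$log" 3>"$fifo" &
reader=$!

printf '%s\n' foo >"$fifo"   # e.g. typed at a terminal
printf '%s\n' bar >"$fifo"   # e.g. sent by a cron job
wait "$reader"

result=$(cat "$log")
rm -rf "$dir"
printf '%s\n' "$result"
```

Without the 3>"$fifo" redirection, the second read could hit EOF as soon as the first writer closes, rather than waiting for the next message.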
For some reason, it started getting choosy about who could write to the pipe (or at least that’s what it looked like). I didn’t see any errors, but when I sent text to the pipe, nothing would happen on the service side. I have cron jobs set up to write to the pipe, and those would go through with no problem, while echoing from my terminal got nothing: no errors, not even permission-denied messages. The cron jobs run as the same user as my terminal anyway, so I don’t think this is a permissions issue.
It seems that every time I deleted the fifo and restarted the service, I could get a few messages through from the terminal. Then, usually but not always, the pipe would seem to block or otherwise stop working after a cron-originated message was sent to the service. From then on I could no longer send messages through the pipe myself, but the cron-originated messages would continue to go through just fine!
I did some googling and came across the strace command. I tried something like strace printf '%b\n' "foo" >> .../path/fifoIn and got a whole bunch of diagnostic system-call output that I don’t really understand, but it looks like it all worked, because there was nothing like “Hey! Right here! Something broke right here!!”, and it ended with:
...
write(1, "foo\n", 4)
close(1)
...
which I’m guessing is a good thing. Now the funny thing: the message went through and the daemon read it as expected! I removed strace from that exact line and, again, no dice.
So, all you folks who know way more about I/O operations and system calls than I do: what happens differently when a command is run with strace in front of it and when it isn’t? What, in general, can gum up a pipe without its having been closed for reading? Any other leads you can pick up on would be welcome, because I’m at a loss.
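As one illustration of how a pipe can swallow messages without anything being closed, here is a minimal sketch, assuming bash 4 on Linux: any other process that holds the FIFO’s read end can consume data meant for the service. The head command is a hypothetical stand-in for such a stray reader, and the temporary directory replaces .../path.

```shell
#!/bin/bash
# Sketch: a second holder of the FIFO's read end reads the data first.
dir=$(mktemp -d)
fifo="$dir/fifoIn"
mkfifo "$fifo"
exec 3<>"$fifo"    # hold both ends open, similar to the service's fd 3 (Linux)

head -n 1 <&3 >"$dir/stray.txt" &   # hypothetical stray reader
stray=$!

printf '%s\n' foo >"$fifo"          # message intended for the service
wait "$stray"
stolen=$(cat "$dir/stray.txt")

# Whatever is still queued is all the service could ever see.
IFS= read -r -t 0.5 leftover <&3 || leftover=""

exec 3>&-
rm -rf "$dir"
printf 'stray reader got: [%s]\nleft for service: [%s]\n' "$stolen" "$leftover"
```

No error is reported anywhere: the write succeeds, and the message simply ends up with whichever reader gets to it first.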
UPDATE
@Gilles, I think you’re on to something in suggesting that other processes trying to read the same pipe are causing problems… I wrote a new function that calls some instances of mutt, and those seem to have some association with fifoIn for some reason. I’m not super sure how to read the output of lsof, but this is what it shows after I execute that function (and consequently gum up my pipe):
COMMAND   PID TID USER   FD  TYPE DEVICE SIZE/OFF   NODE NAME
mutt    13874     uname  0r  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    13874     uname  3w  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    13897     uname  0r  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    13897     uname  3w  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    13932     uname  0r  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    13932     uname  3w  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    13971     uname  0r  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    13971     uname  3w  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    14012     uname  0r  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    14012     uname  3w  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    14051     uname  0r  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    14051     uname  3w  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    14096     uname  0r  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    14096     uname  3w  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    14124     uname  0r  FIFO   8,17      0t0 393222 .../path/fifoIn
mutt    14124     uname  3w  FIFO   8,17      0t0 393222 .../path/fifoIn
srvc    14298     uname  0r  FIFO   8,17      0t0 393222 .../path/fifoIn
srvc    14298     uname  3w  FIFO   8,17      0t0 393222 .../path/fifoIn
lsof    15587     uname  1w  FIFO    0,8      0t0 176516 pipe
lsof    15587     uname  5w  FIFO    0,8      0t0 176524 pipe
lsof    15587     uname  6r  FIFO    0,8      0t0 176525 pipe
grep    15588     uname  0r  FIFO    0,8      0t0 176516 pipe
lsof    15589     uname  4r  FIFO    0,8      0t0 176524 pipe
lsof    15589     uname  7w  FIFO    0,8      0t0 176525 pipe
I’m guessing I mis-wrote the mutt calls (which end up executed in subshells), and they are latching onto the inherited FDs because of whatever I did wrong with the command. I’d say that’s the answer, and I’ll take it from there! If you post an ‘answer’, I’d be happy to select it!
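To check which descriptors such a child actually inherits, here is a minimal sketch, assuming Linux (it reads /proc) and bash 4. The report string is a hypothetical stand-in for mutt: a separate program that prints which of its own fds point at the FIFO, roughly what the 0r/3w rows in the lsof output show. It is started twice inside a block redirected like the service loop, once as-is and once with stdin replaced and fd 3 closed.

```shell
#!/bin/bash
# Sketch (Linux /proc only): which FIFO fds a child program inherits.
dir=$(cd "$(mktemp -d)" && pwd -P)   # canonical path, to compare with readlink
fifo="$dir/fifoIn"
mkfifo "$fifo"

# Hypothetical stand-in for mutt: a separate program that prints which of
# its own fd numbers point at the FIFO passed as $1.
report='out=""
for fd in /proc/$$/fd/*; do
  if [ "$(readlink "$fd" 2>/dev/null)" = "$1" ]; then out+="${fd##*/} "; fi
done
printf "%s\n" "${out% }"'

# fd 3 is opened read-write first so that opening stdin read-only does not
# block waiting for a writer (Linux-specific FIFO behavior).
{
  naive=$(bash -c "$report" _ "$fifo")                 # inherits fd 0 and fd 3
  clean=$(bash -c "$report" _ "$fifo" </dev/null 3>&-) # stdin replaced, fd 3 closed
} 3<>"$fifo" <"$fifo"

rm -rf "$dir"
printf 'naive child: [%s]\nclean child: [%s]\n' "$naive" "$clean"
```

Under those assumptions, the mutt calls in the service would get the same treatment, for example by appending </dev/null 3>&- (or redirecting stdin from whatever file mutt should actually read) to each invocation.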