Hi.
The first thing you have to deal with is multicast. Spread is designed
so that you can multicast your data to multiple servers without having
to specify several destinations and send the traffic to each
separately. However, if your switches/routers don't support multicast,
it will automatically fall back to broadcasting, even if you only have
one server as the destination for the data. My web hosting provider
quickly cut off the server that was broadcasting so much data to
everyone on the subnet. Broadcasts, since they go to everyone, can
slow down all the servers on your network, as they all have to decide
what to do with the packets they're receiving. You can override this
behavior and use unicast if you only have one destination server, but
not if you have more than one.
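As far as I could tell, the broadcast-versus-multicast choice comes
from the segment address in spread.conf. A sketch, with invented
hostnames and addresses (225.0.1.1 and port 4803 are just the
conventional examples from the Spread docs):

    # multicast segment: only works if your network gear supports it
    Spread_Segment 225.0.1.1:4803 {
        web1     192.168.1.11
        loghost  192.168.1.10
    }

    # single-daemon segment using the host's own unicast address,
    # which is the unicast override for a one-server setup
    Spread_Segment 192.168.1.10:4803 {
        loghost  192.168.1.10
    }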
Second is the problem of getting it up and running in the first place:
setting up your various variables, compiling, and installing, none of
which is particularly simple.
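For reference, the basic install went something like this, assuming
the autoconf-based source tree (the prefix and daemon name here are
just examples; check the docs for your version):

    ./configure --prefix=/usr/local/spread
    make
    make install
    # edit /usr/local/spread/etc/spread.conf, then start the daemon
    /usr/local/spread/sbin/spread -n web1 -c /usr/local/spread/etc/spread.conf

and then the same dance on every machine in the segment.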
Then you have to decide: how is my data going to get into Spread?
You're going to need a program that sends the data in. Maybe you'll
use a perl script; that seems to be popular. One way to do that is to
pipe your log file output to a perl script. But what happens when the
perl script unexpectedly dies? The application that is piping out to
the script fails. You'll have to notice that this has occurred, kill
your hosting software (nginx, apache, squid, whatever) as well as the
perl script, and restart the perl script and then your hosting
software, in that order. Your perl script can die for all sorts of
reasons, not least of which that it lost contact with the Spread
server for too long, or its queue of messages to send across the pipe
got too long. Yeah, definitely what you want when you've got a backlog
of 10,000 requests to send across is to lose them all when your
logging program crashes.
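For the record, the feeder itself is only a few lines with the Spread
module from CPAN -- something like this sketch (untested; the daemon
port, private name, and 'weblogs' group are all invented for the
example):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Spread;  # the CPAN Spread module, a wrapper around libspread

    # connect to the local spread daemon (4803 is the port the docs use)
    my ($mbox, $private_group) = Spread::connect({
        spread_name  => '4803@localhost',
        private_name => "log$$",
    });
    defined $mbox or die "can't connect to spread: $sperrno";

    # read log lines from the pipe and multicast each to the 'weblogs'
    # group; when this die() fires, the pipe breaks and you're in the
    # kill-and-restart dance described above
    while (my $line = <STDIN>) {
        Spread::multicast($mbox, RELIABLE_MESS, 'weblogs', 0, $line)
            or die "multicast failed: $sperrno";
    }

    Spread::disconnect($mbox);

Under apache you'd wire that up with a piped log, something like
CustomLog "|/usr/local/bin/spread_feed.pl" combined (path invented) --
which is exactly why the web server and the script live and die
together.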
OK, so another method is to have your hosting platform log to a file
as normal, but have your perl script attach to the file with something
like tail -f access.log | whatever.pl. That can work, but it suffers
from the same problem of dying unexpectedly; the only difference is
that when the perl script dies, logging to the file continues and the
hosting platform doesn't die with it.
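If you go this route, at least wrap the pipeline in a restart loop,
and use tail -F (capital F) so log rotation doesn't silently detach
you. A sketch, with invented paths:

    while true; do
        tail -F /var/log/nginx/access.log | perl spread_feed.pl
        echo "feeder died, restarting" >&2
        sleep 1
    done

You still lose whatever the script had queued when it died, but at
least the web server keeps serving.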
The other issue is performance. The receiving Spread server has to be
able to process all these incoming "messages," as they're called, from
all your servers, do something with them, and be available to receive
more messages, without crashing. Again, this is probably a perl
script. And, again, I had no real trouble creating a huge load on the
receiving Spread server when it was receiving real-time log data from
just one server. I have 20. If I was lucky, I could have gotten it
handling the real-time log data of 2 or 3 servers. 20 was not going to
happen.
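The receiving side looks roughly like this with the same CPAN module
(again a sketch; the group name and output path are invented). Note
that this one loop is what has to keep pace with every web server at
once:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Spread;

    my ($mbox, $private_group) = Spread::connect({
        spread_name  => '4803@localhost',
        private_name => 'logsink',
    });
    defined $mbox or die "can't connect to spread: $sperrno";

    # join the group the web servers are multicasting to
    Spread::join($mbox, 'weblogs') or die "can't join group: $sperrno";

    open my $out, '>>', '/var/log/central/access.log' or die $!;

    # block on each incoming message and append it to the central log;
    # any real processing here slows the loop and the queues back up
    while (1) {
        my ($service_type, $sender, $groups, $mess_type, $endian, $message)
            = Spread::receive($mbox);
        print {$out} $message;
    }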
The whole thing with Spread is that, in theory, it is designed to be
robust, but in my experience it is far from it; the whole operation
seemed quite fragile. The amount of effort I was going to have to put
in to write programs to make sure that Spread was working properly,
and to work around potential failure conditions in an elegant way,
looked obscene. I'm sure Spread has a number of good uses, but I could
not recommend it for centralized logging. It does look interesting as
the basis of a program called Wackamole, which is designed to help you
set up your servers for high availability, but that involves far fewer
messages flying around than log files would.
SCP'ing or rsyncing a file from your source server to your centralized
logging server is a lot more robust. Those transfer programs have a
number of protections to make sure the file got there intact, can do
compression, and so on. And you don't have to transfer your log files
line-by-line with scp or rsync; you can dump huge amounts of data
across, and then have your central log server process them at its
leisure. If it can't keep up with peak demand, it can catch up when
the site isn't as busy, so you don't have an end-of-the-world scenario
if the centralized log server can't keep up with the generation of
real-time logs.
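Concretely, the whole replacement can be one cron line per web server
(paths and hostnames invented; -z is the compression I mentioned):

    # push logs to the central box every five minutes
    */5 * * * *  rsync -az /var/log/nginx/ loghost:/data/logs/web1/

rsync only transfers what changed, and if the log host is down it just
picks up again on the next run -- which is exactly the robustness
Spread was making me build by hand.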
On Fri, Apr 17, 2009 at 4:54 AM, Zev Blut <zblut@cerego.co.jp> wrote:
> Hello Gabriel,
>
> I'd like to know why.