If you're just looking for some sort of distribution, you could put
stuff into memcached or MySQL Cluster, or look at the Spread toolkit
(spread.org, I believe) and Gearman...
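
For the Gearman route, the shape of it would be something like this
(function name and paths invented; an untested sketch using the stock
gearman CLI tools):

  # central box: a worker that appends each job's payload to one file
  gearman -w -f append_log -- sh -c 'cat >> /var/log/central/all.log'

  # each web server: submit a log as a background job
  gearman -b -f append_log < /var/log/proxy/access.log
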
On Thu, Apr 16, 2009 at 8:55 PM, W. Andrew Loe III <andrew@andrewloe.com> wrote:
> I'm by no means a splunk expert, you should ask them, but I think it
> scales pretty well. You can use multiple masters to receive and
> load-balance logs, and you can distribute the searching map/reduce
> style to leverage more cores. Search speed seems to be much more CPU
> bound than I/O bound; the logs are packed pretty efficiently. *Works
> for me* with ~15-20 EC2 instances and one central logging server. It
> also keeps logs in tiered buckets, so things from 30 days ago are
> still available but slower to search, whereas yesterday's logs are
> 'hotter'.
>
> On Thu, Apr 16, 2009 at 8:41 PM, Gabriel Ramuglia <gabe@vtunnel.com> wrote:
>> Does this scale well? I'm running a web-based proxy that generates an
>> absolute ton of log files: easily 40 GB/week/server, across around 20
>> servers. I'm looking to store and search up to 7 days of
>> logs. Currently, I only move logs from the individual servers onto a
>> central server when I get a complaint, import them into MySQL, and
>> search them. The entire process, even for just one server, takes
>> forever.
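>>
>> (Roughly, per complaint and per server, the round-trip looks like this;
>> host and table names made up:
>>
>>   scp web01:/var/log/proxy/access.log /tmp/web01.log
>>   mysql logs -e "LOAD DATA LOCAL INFILE '/tmp/web01.log' INTO TABLE raw_lines (line)"
>>   mysql logs -e "SELECT line FROM raw_lines WHERE line LIKE '%1.2.3.4%'"
>>
>> ...and it's slow at every step.)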
>>
>> On Thu, Apr 16, 2009 at 7:37 PM, W. Andrew Loe III <andrew@andrewloe.com> wrote:
>>> It's commercial, but Splunk is amazing at this. I think you can process
>>> a few hundred MB/day on the free version. http://splunk.com/
>>>
>>> You set up a lightweight forwarder on every node you are interested
>>> in, and it slurps the files up and relays them to a central
>>> Splunk installation. It will queue internally if the master goes away.
>>> There's tons of support for sending different files in different
>>> directions, etc. We have it set up in the default Puppet payload so
>>> every log on every server is always centralized and searchable.
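>>>
>>> The per-node setup is tiny. From memory (hostname and port made up;
>>> check the docs for your version), it's roughly:
>>>
>>>   splunk enable app SplunkLightForwarder -auth admin:changeme
>>>   splunk add forward-server splunkmaster.example.com:9997
>>>   splunk add monitor /var/log
>>>
>>> with the central instance enabled as a receiver on the same port.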
>>>
>>> On Wed, Apr 15, 2009 at 8:44 AM, Michael Shadle <mike503@gmail.com> wrote:
>>>> On Wed, Apr 15, 2009 at 7:06 AM, Dave Cheney <dave@cheney.net> wrote:
>>>>
>>>>> What about
>>>>>
>>>>> cat *.log | sort -k 4
>>>>
>>>> or just
>>>>
>>>> cat *whatever.log >today.log
>>>>
>>>> I assume the processing script can handle out-of-order requests, but I
>>>> guess that might be an arrogant assumption. :)
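>>>>
>>>> (If it can't, merging the already-sorted per-server files should be
>>>> cheaper than re-sorting the whole pile:
>>>>
>>>>   sort -m -k 4 *whatever.log > today.log
>>>>
>>>> with the caveat that a lexical sort on the [day/Mon/year:time] field
>>>> only stays in true time order within a single month.)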
>>>>
>>>> I do basically the same thing Igor does, but would love to simplify it
>>>> by just having per-Host: header counts for bytes (sent/received/total
>>>> bytes used, basically) and how many HTTP requests. Logging just enough
>>>> of that to a file and parsing it each night seems kinda amateur...
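>>>>
>>>> (Half-baked sketch of what I mean, assuming nginx; the format and
>>>> file names are invented:
>>>>
>>>>   log_format hostbytes '$host $request_length $bytes_sent';
>>>>   access_log /var/log/nginx/hostbytes.log hostbytes;
>>>>
>>>> and then the nightly roll-up per Host: is just
>>>>
>>>>   awk '{ in_b[$1] += $2; out_b[$1] += $3; reqs[$1]++ }
>>>>        END { for (h in in_b) print h, in_b[h], out_b[h], reqs[h] }' \
>>>>       /var/log/nginx/hostbytes.log
>>>>
>>>> ...which is exactly the file-and-parse routine that feels amateur.)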