I've used Spread for centralized logging before, and it's a horrible
clusterF. If you want details as to why, let me know.
On Thu, Apr 16, 2009 at 9:09 PM, Michael Shadle <mike503@gmail.com> wrote:
> If you're just looking for some sort of distribution, you could put stuff
> into memcached or MySQL Cluster, or look at the Spread toolkit (spread.org,
> I believe) and Gearman...
>
> On Thu, Apr 16, 2009 at 8:55 PM, W. Andrew Loe III <andrew@andrewloe.com> wrote:
>> I'm by no means a Splunk expert, and you should ask them, but I think it
>> scales pretty well. You can use multiple masters to receive and
>> load-balance logs, and you can distribute the searching map/reduce
>> style to leverage more cores. Search speed seems to be much more CPU
>> bound than I/O bound; the logs are pretty efficiently packed. *Works
>> for me* with ~15-20 EC2 instances and one central logging server. It
>> also keeps logs in tiered buckets, so things from 30 days ago are
>> available but slower to search, whereas yesterday's logs are
>> 'hotter'.
>>
>> On Thu, Apr 16, 2009 at 8:41 PM, Gabriel Ramuglia <gabe@vtunnel.com> wrote:
>>> Does this scale well? I'm running a web-based proxy that generates an
>>> absolute ton of log files: easily 40 GB / week / server, with around 20
>>> servers. I'm looking to be able to store and search up to 7 days of
>>> logs. Currently, I only move logs from an individual server onto a
>>> central server when I get a complaint, import them into MySQL, and
>>> search them. The entire process, even for just one server, takes
>>> forever.
>>>
>>> On Thu, Apr 16, 2009 at 7:37 PM, W. Andrew Loe III <andrew@andrewloe.com> wrote:
>>>> It's commercial, but Splunk is amazing at this. I think you can process
>>>> a few hundred MB/day on the free version. http://splunk.com/
>>>>
>>>> You set up a lightweight forwarder on every node you are interested
>>>> in, and then it slurps the files up and relays them to a central
>>>> Splunk installation. It will queue internally if the master goes away.
>>>> There's tons of support for sending different files in different
>>>> directions, etc. We have it set up in the default Puppet payload, so
>>>> every log on every server is always centralized and searchable.
>>>>
>>>> On Wed, Apr 15, 2009 at 8:44 AM, Michael Shadle <mike503@gmail.com> wrote:
>>>>> On Wed, Apr 15, 2009 at 7:06 AM, Dave Cheney <dave@cheney.net> wrote:
>>>>>
>>>>>> What about
>>>>>>
>>>>>> cat *.log | sort -k 4
>>>>>
>>>>> or just
>>>>>
>>>>> cat *whatever.log >today.log
>>>>>
>>>>> I assume the processing script can handle out-of-order requests, but I
>>>>> guess that might be an arrogant assumption. :)
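
If the processing script turns out not to handle out-of-order requests, here is a rough Python sketch of a timestamp-aware merge. It assumes combined-log-style timestamps such as [15/Apr/2009:07:06:00 -0700] and that each per-server file is already in time order; neither is stated in the thread. The names timestamp_of and merge_logs are made up for illustration. Under that assumed format a plain text sort on the timestamp field would not be chronological across a month boundary (01/May sorts before 30/Apr), which is why the sketch parses the date.

```python
import heapq
import re
from datetime import datetime

# Assumption: timestamps look like [15/Apr/2009:07:06:00 -0700].
TIMESTAMP = re.compile(
    r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"
)


def timestamp_of(line):
    """Return the request time of one log line as an aware datetime."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError("no timestamp in line: %r" % (line,))
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*sources):
    """Lazily merge iterables of log lines into one time-ordered stream.

    Each source must already be sorted by time; heapq.merge then only
    keeps one pending line per source in memory.
    """
    return heapq.merge(*sources, key=timestamp_of)
```

To use it, open each log file and pass the file objects to merge_logs, then write the result out with writelines. Nothing is loaded wholesale, so it should cope with large files, but it only works if every input really is sorted.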
>>>>>
>>>>> I do basically the same thing Igor does, but would love to simplify it
>>>>> by just having Host: header counts for bytes (sent/received/total
>>>>> amount of bytes used, basically) and how many http requests. Logging
>>>>> just enough of that to a file and parsing it each night seems kinda
>>>>> amateur...
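
For what it's worth, the nightly parse can be quite small. Below is a minimal Python sketch, assuming a hypothetical log line of three whitespace-separated fields: the Host: value, bytes sent, and bytes received. That format and the function name summarize are my own invention for illustration, not something anyone in the thread described.

```python
from collections import defaultdict


def summarize(lines):
    """Count requests and bytes per host.

    Assumed (hypothetical) line format: "<host> <bytes_sent> <bytes_received>".
    Blank or malformed lines are skipped.
    """
    totals = defaultdict(lambda: {"requests": 0, "sent": 0, "received": 0})
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # wrong number of fields
        host, sent, received = fields
        try:
            sent, received = int(sent), int(received)
        except ValueError:
            continue  # byte counts were not integers
        entry = totals[host]
        entry["requests"] += 1
        entry["sent"] += sent
        entry["received"] += received
    return dict(totals)
```

Total bytes per host is then just sent plus received. The order of lines does not matter here, so this kind of accounting works on logs that were simply concatenated.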
>>>>>