<p><em>urandom Mangot ideas (tech.mangot.com)</em></p>
<h1>DevOpsDays 2012: "Event Detection" Open Space</h1>
<p><em>dave, 2012-08-02</em></p>
<p><em>A few weeks ago at <a href="http://devopsdays.org/events/2012-mountainview/">DevOpsDays</a> we were given the opportunity to propose topics to be discussed in the afternoon "open spaces". I was lucky enough to have my proposals chosen, on the condition that someone write a blog post to detail what was discussed during the session. This is the second of those posts...</em></p>
<p>The second discussion I facilitated was one on event detection (also known as anomaly detection). This is a project that had been started many times at Tagged and, like logging from the previous open space, there were no rock-solid recommended answers from the community. The discussion broke along three different lines: thresholding, complex event correlation, and more advanced signal processing/analysis.</p>
<p>The overriding theme of the discussion was looking for ways to realize state changes as events. These events could trigger alerting systems and be shown as annotations in Graphite, Ganglia, etc.</p>
<h2>Thresholding</h2>
<p>The discussion started off around the idea of automated thresholding as a first step toward event detection. There are many examples of this; the first one mentioned was the auto-baselining Etsy does with Holt-Winters in Graphite. For alerting, with Nagios as an example, the consensus was that individual plugins could do the thresholding themselves and even pull the data from RRD or Graphite, or be instrumented with something like <a href="http://erma.wikidot.com/">ERMA</a>. It was proposed that monitoring derivatives is superior to monitoring simple absolute values. Something like Nagios (and presumably other systems) could also take passive checks from a CEP (complex event processor) in addition to the thresholded alerts.</p>
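In Graphite, that auto-baselining is exposed through the Holt-Winters render functions; a sketch of the render targets (the metric name here is hypothetical):

```
# Forecast and confidence bands overlaid on the raw series
&target=stats.web01.requests
&target=holtWintersForecast(stats.web01.requests)
&target=holtWintersConfidenceBands(stats.web01.requests)

# Non-zero whenever the series escapes its predicted band
&target=holtWintersAberration(stats.web01.requests)
```

Alerting on the aberration series, rather than a fixed threshold, is one way to get the "automated thresholding" described above.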
<p>There are a number of other programs, like <a href="http://labs.omniti.com/labs/reconnoiter">Reconnoiter</a> and <a href="https://github.com/dallasmarlow/edgar">Edgar</a>, that have built-in facilities for trending/prediction and forecasting instead of relying on individual checks to implement this themselves. This seems to be an increasingly common feature in the industry, with forecasting even making it into the latest versions of Ganglia-web. With these types of systems, you can do forecasting on a given set of time series data.</p>
<p>People were also very high on <a href="https://github.com/aphyr/riemann">Riemann</a>, which is written in Clojure. The constructs of the Clojure language make it ideally suited to performing the kinds of functions (combining, filtering, alerting) you would want in this sort of monitoring and alerting system. The big question people had about this system was how well it would scale vs. other approaches.</p>
<h2>Complex Event Processing (CEP)</h2>
<p>When discussing complex event processors, the conversation immediately turned to <a href="http://esper.codehaus.org/">Esper</a>. The idea behind Esper is that, unlike a traditional database where you run your queries against stored data, with a stream processor you run your data against your stored queries. You can define windows of time over which you would like to look for specific events. It can be run "in process" or as its own instance. Many people were in favor of the latter approach so you do not need to restart your application when changing rules. It was also suggested that you run Esper in an active-active configuration so that when restarting, you don't lose visibility into your environment.</p>
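For a flavor of what those stored queries look like, here is a small EPL (Esper's query language) sketch; the event type and field names are invented for illustration:

```
// Emit a row whenever the average response time over a sliding
// five-minute window exceeds 500 ms
select avg(responseMillis) as avgResponse
from HttpRequestEvent.win:time(5 min)
having avg(responseMillis) > 500
```

The window and the query live in the engine; events stream through it, which is the inversion of the database model described above.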
<p>We spent some time discussing the history of CEPs, and a few people pointed out that most of the innovation here was driven by the financial industry and a language called <a href="http://en.wikipedia.org/wiki/K_(programming_language)">K</a>. There have also been a few other CEPs in the past, like SEC (simple event correlator), but as it was implemented in Perl, it had scaling problems and couldn't handle any significant load.</p>
<p>We talked about the idea that a CEP could actually take many forms. Instead of having fancy algorithms for determining error states, it could be as simple as codifying tribal knowledge so that a human does not need to sit and watch the state of the system.</p>
<p>We arrived at the conclusion that many had expressed at the beginning, which was that complex event processing is hard. There were only a few people in the group who had made a serious stab at it with something like Esper, and there were many that were looking for answers.</p>
<h2>Signal Analysis</h2>
<p>The discussion then turned to other ways of detecting events. Because this was a DevOps-heavy crowd accustomed to bridging gaps between disciplines, they started looking for answers elsewhere. People began to question whether or not this was simply a digital signal processing (DSP) problem. Should we be involving data scientists, signal processors, or mathematicians? Who could help us look at a pattern over the long term to be able to detect a memory leak?</p>
<p>One idea was to look at a baseline and apply filters to it to find deviations. Someone asked that a blog post be written describing how you could apply a filter to a stream of data to surface a failure.</p>
<p>Then <a href="https://twitter.com/jbergmans">John Bergman</a> proposed an interesting idea: what if we were to make a dataset of time series data with known failures available to the academic community? A little bit like a Human Genome Project for operations failures. Would that be enough to attract interest from academia? The hope would be that if this data were made available, data scientists and statisticians would have a large corpus on which to test and develop the filters and analysis tools that the DevOps community needs, which could then be incorporated into our analysis systems, much as basic auto-thresholding and trending/prediction are being incorporated now. We felt that in order to get a project like this off the ground, we would need a partner in the academic community who could curate this collection of data so that it could gain a critical mass of scholarly adoption.</p>
<h2>Conclusion</h2>
<p>Ultimately, this open space was really a confirmation of many of the concerns and fears that many who participated had already felt. Complex event processing and correlation is a hard problem, and nobody is doing it extremely well, yet. By coming at the problem from a variety of different approaches, we are getting closer all the time to something workable.</p>
<p>The notion of an Operations Failure Database was some great "outside the box" thinking that could bring two different yet overlapping communities together for a common purpose. If we could get enough people to contribute their data in the proper form, and enough interest from those who would like to analyze that data, there could quickly be some major advances in our currently primitive tooling for this purpose.</p>
<h1>DevOpsDays 2012: "Logging" Open Space</h1>
<p><em>dave, 2012-07-17</em></p>
<p><em>A few weeks ago at <a href="http://devopsdays.org/events/2012-mountainview/">DevOpsDays</a> we were given the opportunity to propose topics to be discussed in the afternoon "open spaces". I was lucky enough to have my proposals chosen, on the condition that someone write a blog post to detail what was discussed during the session. This is one of those posts...</em></p>
<p>We started out the discussion when I gave a short history of my experiences with logging over the years. It basically boiled down to the fact that there used to be a website and mailing list associated with LogAnalysis.Org (run by Tina Bird and Marcus Ranum). I remember reading, on the website or the mailing list (I don't recall which), that when you go looking for log analysis tools, you find logsurfer and swatch. Eventually you come to the realization that there are no good open source tools for this purpose, and to the conclusion that all that is left is <a href="http://www.splunk.com">Splunk</a>. Unfortunately I can't find this even in the <a href="http://www.archive.org">Internet Wayback Machine</a>.</p>
<p>The topic of <a href="http://www.splunk.com">Splunk</a> came up many, many times during the course of our discussion. I even came across it while researching this blog post, in a <a href="http://lists.jammed.com/loganalysis/2007/03/0023.html">mailing list post</a> by Tina Bird about how Splunk had graciously accepted the role of maintaining both the loganalysis mailing list and loganalysis.org. Sadly, or curiously, after Splunk took over management of the domain and mailing list, they disappeared, and the loganalysis.org domain has been taken over by squatters. The consensus around Splunk was that it is great, really great, at mining the data. But there need to be more competitors in the space than just one. The pricing model for Splunk actually punishes you for being successful with logging, and thus discourages people from doing lots of logging. This seemed wrong.</p>
<p>So what are the alternatives? We were lucky enough to have <a href="https://twitter.com/jordansissel">Jordan Sissel</a>, the author of <a href="http://www.logstash.net">LogStash</a>, join us as part of the discussion (he also made a pitch for this same open space). He began talking about open source alternatives to Splunk like Logstash, <a href="http://code.google.com/p/enterprise-log-search-and-archive/">ELSA</a>, and <a href="http://graylog2.org">Graylog2</a>. For more ideas, you can check out this <a href="http://www.delicious.com/stacks/view/JZbUrE">Delicious Stack</a>. He also described the problem space as breaking down into two main areas as he sees it: the Transport Problem and the Unstructured Data Problem. The group spent the rest of the time discussing each of these areas, as well as a third which I'll call the Presentation Problem.</p>
<h2>The Transport Problem</h2>
<p>This aspect focused on the idea that it would be great to both transport and process logging data in a common structured format like JSON. In fact, many projects do this, sending their logs over Scribe or Flume. The nice part is that you can still grep through the logs even if the JSON fields have changed, because a change in fields does not cause a fundamental change in the log structure. Basically, it will not break your fragile regexes. Also, the logs that are sent have to make sense and have value; there is no point in sending logs over the wire for no purpose. What a lot of companies have done to ensure this is to build standardized logging functions into their code so that each developer is not creating their own. This is an attempt to give at least some structure to the data while it is being transported, so that it is easier to handle when it reaches its destination.</p>
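A standardized logging function can be very small; here is a Python sketch that emits one JSON object per line. The field names (<code>@timestamp</code>, <code>event</code>) are illustrative choices, not any particular project's schema:

```python
import json
import time


def format_event(event, **fields):
    """Render a log record as a single-line JSON object.

    Consumers that parse whole objects keep working when fields
    are added or removed, unlike positional/regex-based formats.
    """
    record = {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "event": event,
    }
    record.update(fields)
    return json.dumps(record, sort_keys=True)


print(format_event("user_login", user_id=42, latency_ms=87))
```

Because every line is self-describing, a shipper or a plain `grep` both still work when a developer adds a new field.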
<h2>The Unstructured Data Problem</h2>
<p><em>"Logs are messages from developers to themselves"</em></p>
<p>A topic that was brought up repeatedly revolved around the question of why each company is doing this themselves. Why are there no standards about what is logged and what format it should be in? Is there potential to standardize some of these things? If so, how? Whose standard should we adopt? Should we choose some ITIL nomenclature? The purpose would be that if someone logged something with a level of ERROR or WARNING or INFO, everyone would actually know what it means. The problem is that it is hard for everyone to agree on the same standard. You can call it a style guide problem, or a people problem, but it all comes down to the fact that we are currently dealing with completely unstructured data.</p>
<p>With all that unstructured data to be handled, you come to realize that "logging is fundamentally a data mining problem", as one of our participants commented. Even if you're able to store the data, where do you put the secondary indices? Assuming you are indexing on time, if that is even a safe assumption, what's next? Application? Log source? "What do you do with apps you don't control?" How are you going to get their data into your structured log database?</p>
<p>Once the data is stored, how do we know what is actionable? Project managers only know one severity, URGENT!</p>
<h2>The Presentation Problem</h2>
<p><em>"Sending a CS person Postfix logs is actively hostile"</em></p>
<p>Once you've figured out how to transport the logs and store them, the final problem is presentation. How do you create something that is consumable by different end users? The folks at Etsy have come up with ways to try to make the data they are mining more meaningful. They have a standard format that allows for traceability, just like Google's Dapper or Twitter's Zipkin. Getting logs into these kinds of formats is useful for more than just developers. There was consensus that there needs to be feedback from Ops to the developers as well. Ops needs to have ways to know what is really an error. Having first-hand knowledge of this situation, where the logs were filled with errors and we were supposed to memorize which ones were real and which could be ignored, I can safely say this was not an ideal situation. Ops also needs to be able to specify what THEY want in the logs for an app (latencies?). </p>
<p><em>"Holt Winters and standard deviation are your friends"</em> </p>
<p>The final part of the presentation problem focused on what to do with the data. Etsy contributed Holt-Winters forecasting to the Graphite project because they felt it was so important to be able to make sense of the data you have collected. There were also suggestions to alert on rates over time, not on individual events. With all the disjointed tools out there, and the lack of any consensus on what form logs should take, presenting the data poses even more of a challenge.</p>
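As a sketch of the "alert on rates over time, not individual events" idea, here is a rolling-baseline check in Python; the window size and sigma threshold are arbitrary illustrative choices, not anything proposed in the session:

```python
from statistics import mean, stdev


def aberrant(points, window=30, nsigma=3.0):
    """Return indices of points that fall more than `nsigma` standard
    deviations from the mean of the preceding `window` points."""
    flagged = []
    for i in range(window, len(points)):
        baseline = points[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip perfectly flat baselines, where sigma is zero
        if sigma and abs(points[i] - mu) > nsigma * sigma:
            flagged.append(i)
    return flagged


# A slightly noisy flat series with one spike at index 40:
# only the spike is flagged, not the noise
series = [100 + (i % 2) for i in range(40)] + [900] + [100 + (i % 2) for i in range(10)]
print(aberrant(series))  # → [40]
```

A single noisy sample barely moves the baseline, while a sustained shift keeps tripping the check, which is the behavior you want when alerting on rates rather than events.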
<h2>The Future</h2>
<p>There seemed to be a fundamental feeling within the group that the tools we have now for log transport, collection, and analysis are just not sufficient, unless you are willing to buy Splunk. Also, as you can tell, the discussion raised many more questions than it answered. But despite that general tone, the space was not all dour or dire. Jordan made a really big pitch for his vision of Logstash's future. Luckily, he has reiterated that same sentiment in a <a href="https://gist.github.com/3088552">recent gist</a>, so you don't have to hear it from me! </p>
<p>Logstash actually tackles a number of these problem areas, so the future is potentially not as dark as it seems.</p>
<dl>
<dt>The Transport Problem</dt>
<dd>Logstash provides the logstash log shipper, which is basically logstash run with a special config file. Alternatively, there is the same idea implemented in Python by <a href="https://github.com/lusis/logstash-shipper">@lusis</a>.</dd>
<dt>The Unstructured Data Problem</dt>
<dd>This is the main problem that Logstash fixes. Logstash recognizes many common logfile formats and can translate them into the appropriate JSON. If it doesn't recognize yours, you can write your own. It can take many types of unstructured inputs, and send the now structured data to many different types of outputs. You can think of it like a neuron where the dendrites take input from multiple axons, and the axon can send the data to multiple dendrites across the synaptic cleft.</dd>
<dt>The Presentation Problem</dt>
<dd>Most of the time, you will send your log data into Elasticsearch (ES). Once in Elasticsearch, it can be queried using standard ES methods (e.g. REST). There is a great FOSS interface to ES called <a href="http://rashidkpc.github.com/Kibana/">Kibana</a> which allows you to search, graph, score, and stream your Logstash/Elasticsearch data.</dd>
</dl>
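To make the Unstructured Data point concrete, a minimal Logstash configuration from that era might look like the following; the file path and Elasticsearch host are assumptions, and the exact filter syntax varies between Logstash versions, so check the documentation for yours:

```
input {
  file {
    path => "/var/log/httpd/access_log"
    type => "apache-access"
  }
}
filter {
  # Parse the common Apache combined log format into named JSON fields
  grok {
    type    => "apache-access"
    pattern => "%{COMBINEDAPACHELOG}"
  }
}
output {
  elasticsearch { host => "localhost" }
}
```

Unstructured text goes in one side, structured events come out the other: the neuron analogy from the list above.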
<p>The community is potentially at a turning point. Accept the juggernaut that is Splunk and live with the currently lacking status quo, or get together and change it. Which path will we choose?</p>
<p><em>Quotes in this blog post are unattributed statements made during the discussion</em></p>
<h1>Ode to the External Node Classifier (ENC)</h1>
<p><em>dave, 2012-05-23</em></p>
<img src="/roller/dave/resource/blogimages/Puppet-Labs-Logo-Horizontal-Sm.png">
<p><em>External Node Classifier, how do I love thee? Let me count...</em> There
has been a great deal of attention lately being paid to the backends
that are available in my current configuration management tool of
choice, Puppet. I'm sure Chef must have some similar types of
constructs. The buzz is about
<a href="http://projects.puppetlabs.com/projects/hiera">Hiera</a>, which is a
pluggable hierarchical database for Puppet. Which means that when
Puppet is looking up information about a node, it can look in multiple
places. I think this is a great idea, and at
<a href="http://about.tagged.com">Tagged</a> we have been using something similar
that we are in love with for a few years, the <a href="http://docs.puppetlabs.com/guides/external_nodes.html">External Node
Classifier (ENC)</a>.</p>
<p>What the ENC allows us to do is make a call to our configuration
management database (CMDB) for each host that calls in for Puppet
configuration. We return a bit of YAML from our Perl script, and
Puppet uses that information to configure the node. Click on the
link above to find out more about how it works. The powerful thing
about this mechanism, is that we can return almost <strong>anything</strong> we
want for Puppet to use. Each variable that we return in the YAML can
be used as an actual variable in our Puppet manifests. This is
what's so amazing about the ENC, it allows us to organize our network
of hosts however we want it, with almost no preconceived notions of
what we are going to want to build next (within reason of course).</p>
<pre><code>
---
classes:
  - web
environment: production
parameters:
  SpecId: 6
  appType: web
  cabLocation: 98b
  cageLocation: 34
  consolePort: 19
  cores: 8
  cpuSpeed: 2.53GHz
  ganglia_cluster_name: Web
  ganglia_ip: 172.16.11.34
  ganglia_ip2: 172.16.11.35
  ganglia_port: 8670
  gen: 4
  portNumber: 19
  vendor: Dell
</code></pre>
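For illustration, here is a minimal ENC sketch in Python (ours is Perl, and the class names and CMDB lookup below are hard-coded stand-ins): Puppet invokes the script with the node name and reads YAML of the shape above from stdout.

```python
#!/usr/bin/env python
import sys


def cmdb_lookup(hostname):
    """Stand-in for the real CMDB query; values here are hypothetical."""
    return {
        "puppetclass": "web",
        "parameters": {"appType": "web", "cabLocation": "98b"},
    }


def enc_yaml(hostname):
    """Render the node's classification as the YAML Puppet expects."""
    node = cmdb_lookup(hostname)
    lines = [
        "---",
        "classes:",
        "  - %s" % node["puppetclass"],
        "environment: production",
        "parameters:",
    ]
    for key in sorted(node["parameters"]):
        lines.append("  %s: %s" % (key, node["parameters"][key]))
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(enc_yaml(sys.argv[1]))
```

Every key under <code>parameters</code> becomes a variable usable in the manifests, which is where the flexibility described below comes from.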
<p>We have been working with a network management tool lately that was
written by a bunch of network engineers, and it shows. Not that it
is a bad product by any means, in fact, it's quite useful if you put
the work into it. The part the Ops team found more troublesome
was that we were expected to slice our tiers (web, search,
memcached, etc.) by IP address. The thinking goes that if we have
different classes of server, they can be assigned to their own
subnets, and it should therefore be pretty easy to segment our
hosts along network lines. Except that it doesn't work like that
in real life.</p>
<p>Sure we could pre-allocate large chunks of the network for web
servers, and another chunk for security services. But what if we
guess wrong? What if they don't all fit into a /24 or a /23?
What if we allocate a /28 but it turns out we need a /25? This is a
problem. It gets worse if you consider that you actually don't want
your memcached servers to be in a different subnet than your web
servers. In a datacenter environment, latency is important, and
layer 2 is the only way to go for some applications. Routing will
kill you.</p>
<p>So, what to do? Enter the ENC. Our ENC returns lots of information,
but at a high level, it returns our Puppetclass and what we call an
AppType, which is like a subclass. For example, the Puppetclass may
be <em>web</em> and the AppType <em>imageserver</em>. Now, I can actually slice my
hosts any way <em>I</em> want to. The same group of engineers should be
able to log in to the <em>imageserver</em> hosts? No problem, distribute an
access control file based on AppType. All the <em>imageservers</em> should
get the same Apache configuration? Again, not a problem. If an
<em>imageserver</em> and a <em>PHPserver</em> are in sequential IPs, it does not
matter. If they have a memcached host situated on an IP in between?
Again, not a problem. Puppet will take care of ensuring each host
gets the proper configuration.</p>
<p>But it actually gets better. Using the ENC, we can group
hosts any way that we imagine. One thing we use very heavily at
Tagged is <a href="http://ganglia.sourceforge.net/">Ganglia</a>. Very simply,
we could map Ganglia clusters to AppTypes. We don't even need to
simply return the AppType, we could actually return the Ganglia
configuration for each host and plug that into a Puppet ERB template.
This is where it gets interesting. We actually combine multiple
AppTypes into Ganglia clusters in some cases. For example, our
security group has all kinds of different applications that they use to
keep our users safe and secure. Some are on one server, some are on
many, but it is very unlikely that our security group needs to get a
large "cluster-wide" view of an application tier. Very often they are
looking at the performance of individual hosts. If we were segmented
by IP address, we would have to guess how many applications they would
develop over some arbitrary time period. If we were segmented purely
on AppType, we might have 10 different Ganglia clusters with one or
two hosts each. </p>
<p>But because of the power of the External Node Classifier, we can
actually slice and group our network of hosts any way that we choose,
in ways that serve our purposes best. When we changed from
collecting our system information from Ganglia gmond to <a href="http://host-sflow.sourceforge.net/">Host
sFlow</a>, it was literally a change
to a few variables and templates, and within 30 minutes, we had a
completely different monitoring infrastructure. It was that simple.</p>
<p>If you haven't looked at the more capable backends to Puppet or your
current configuration management tool of choice, you should. Just
like "infrastructure as code", a little up front hacking, goes a long,
long way.</p>http://tech.mangot.com/roller/dave/entry/i_m_speaking_at_velocityI'm speaking at Velocity 2012!dave2012-04-16T22:17:23-07:002012-04-16T22:17:23-07:00<P>They want me to publicize it, so here goes, I'm speaking at Velocity this year. If you read my last blog post you know that I'm pretty excited about host sFlow and the amazing things it been able to do for us on our network at <A HREF="http://blog.tagged.com/">Tagged</A>.</P>
<P>This year, Peter Phaal and I will be presenting <A HREF="http://velocityconf.com/velocity2012/public/schedule/detail/23487">The sFlow standard: scalable, unified monitoring of networks, systems and applications</A>.</P>
<P>We'll be talking about:<BR>
<UL>
<LI>What sFlow is</LI>
<LI>What it can do for you</LI>
<LI>Integrating sFlow with Ganglia</LI>
<LI>What sFlow gives you outside of your graphs</LI>
<LI>Lots of cool examples from Tagged with real world data</LI>
</UL>
<P>If you were on the fence about attending Velocity, attend! Then, attend my talk! You can use the discount code FRIEND to get 20% off your registration. Cheers.
</P>
<h1>Host-based sFlow: a drop-in cloud-friendly monitoring standard</h1>
<p><em>dave, 2011-11-01</em></p>
<IMG SRC="/roller/dave/resource/blogimages/sflowlogo-med.gif"><BR>
<p>Everyone who is a professional sysadmin knows that part of the excitement and drain of our jobs is keeping track of all the different technologies out there, how and what to add to our toolbox, and what's coming next.</p>
<p>Sometimes we are lucky enough to bump into an old friend that has grown and matured over the years. I'm talking about technology, and in this case, <a href="http://www.sflow.org/">sFlow</a>. I used a number of Foundry (now Brocade) switches at different companies over the years, and they all implemented sFlow. I would send all my sFlow data to various collectors at different jobs, and was constantly amazed at the power and versatility of this technology.</p>
<p>One of the things in which sFlow really excels in the network space is doing things like showing you the "top talkers" on a network segment. It does this by sampling the packet stream and allowing you to see what it sees. This is much more efficient than trying to capture every packet. When you are able to adjust the amount of sampling you do based on the packet count you experience, you are able to handle much larger volumes of traffic with a high degree of confidence in your data. I always thought that it would be great if I could get this level of visibility on my application tier, and now I can.</p>
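The arithmetic behind sampled top-talker reports is simple: with 1-in-N sampling, each observed packet stands for roughly N packets on the wire. A toy Python illustration (the IPs and counts are made up):

```python
def estimate_totals(sampled_counts, sampling_rate):
    """Scale sampled per-source packet counts back up to an estimate
    of the true totals, given 1-in-N packet sampling."""
    return {src: count * sampling_rate
            for src, count in sampled_counts.items()}


# 1-in-400 sampling: 120 sampled packets implies ~48,000 real ones
samples = {"10.0.0.5": 120, "10.0.0.9": 30}
print(estimate_totals(samples, 400))  # → {'10.0.0.5': 48000, '10.0.0.9': 12000}
```

The estimates are statistical, but for ranking top talkers the relative ordering is what matters, and that survives sampling well.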
<p>The sFlow community has been making great strides with <a href="http://host-sflow.sourceforge.net/">Host sFlow</a> which takes some of the same great characteristics from the network sFlow standard and applies them on the host and application side. This means that you can actually find out which URLs are being hit the most, which memcache keys are the hottest, and how that correlates with what you are seeing on the network.</p>
<h2>Setup</h2>
<p>Setting up Host sFlow could not be much easier. First, download packages for FreeBSD, Linux, or Windows from the <a href="http://host-sflow.sourceforge.net">SourceForge site</a>. Once installed, when you start the daemon on Linux, it will check <em>/etc/hsflowd.conf</em> to find out where the sFlow collector(s) are located. This is where the daemon will send all the data. You can also set things like polling and sampling rates in this file. If you wish, you may also define these using the location services in DNS. That's it.</p>
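A minimal <em>/etc/hsflowd.conf</em> might look like the following; the collector address is made up, and exact option names vary by version, so check the sample config shipped with the package:

```
sflow {
  DNSSD = off          # use the static collector below, not DNS-SD
  polling = 20         # seconds between counter exports
  sampling = 400       # 1-in-400 packet sampling
  collector {
    ip = 10.0.0.50
    udpport = 6343     # the standard sFlow port
  }
}
```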
<p>You will also need a collector. The simplest collector is sflowtool, which captures the packets and presents them to you in various formats, all of which are consumable by your favorite scripting language. There are many <a href="http://www.sflow.org/products/collectors.php">collectors</a> to choose from. At <a href="http://about.tagged.com">Tagged</a>, one of our favorite collectors is <a href="http://ganglia.sourceforge.net/">Ganglia</a>!</p>
<p>As of Ganglia 3.2, it can understand and process sFlow packets. At Tagged, we have replaced all of our gmond processes with hsflowd.</p>
<h2>Efficiency</h2>
<p>One of the great things about replacing our gmond processes is that our monitoring infrastructure is now much more efficient. With gmond, every metric that you measure sends a packet across the wire. If you sample every 15 seconds, it sends a packet every 15 seconds for each metric that you monitor. With hsflowd, you can sample every 15 seconds, but hsflowd will batch all those metrics up into a single packet and send that across the wire. We are actually able to collect more metrics, more often, with fewer packets. On a big network like Tagged's, anything we can do to lower our packets per second is a big win. The difficult part was converting from multicast, which is a trivial setup, to unicast. We took it as an opportunity to templatize all our Puppet configs for this purpose based on our CMDB. Now we have a system that we really love.</p>
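The packets-per-second win from batching is easy to put numbers on. A toy Python sketch (the metric and host counts are invented, and in practice a very large metric set may still span multiple datagrams):

```python
def pps(metrics_per_host, hosts, interval_s, batched):
    """Cluster-wide packets per second: one packet per metric per
    interval unbatched, versus one packet per host per interval batched."""
    packets_per_interval = hosts * (1 if batched else metrics_per_host)
    return packets_per_interval / interval_s


# 50 metrics from 1,000 hosts every 15 seconds
print(round(pps(50, 1000, 15, batched=False)))  # → 3333
print(round(pps(50, 1000, 15, batched=True)))   # → 67
```

At this (hypothetical) scale, batching cuts monitoring traffic by roughly the number of metrics per host.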
<h2>A Standard, Really</h2>
<p>Perhaps one of the things that was most challenging to wrap our heads around is that sFlow is not a replacement for our <a href="http://ganglia.sourceforge.net/">Ganglia</a> or <a href="http://graphite.wikidot.com/">Graphite</a> tools. sFlow is a standard on switches, and it's a standard on the host side too. That does not mean you cannot instrument your own applications with sFlow; it just isn't the default configuration. It does mean that if you are going to look at your HTTP metrics, whether they come from Apache, Nginx, or Tomcat, they are going to be the same metrics.</p>
<p> If you want to monitor things like the number of active users on your site, you can still do that with gmetric or Graphite. However, if you want to find out how many of your HTTP requests have 200, 300, or 500 response codes, and you want to do that in real time across a huge web farm (which makes log analyzers and packet sniffers completely impractical), then you want <a href="http://code.google.com/p/mod-sflow/">mod-sflow</a> (for Apache).</p>
<IMG SRC="/roller/dave/resource/blogimages/apache.graph.png"><BR><BR>
<h2>Solves The Java JMX Problem</h2>
<p>There are a few other things that have me excited about sFlow. One is that it solves the JVM monitoring problem. Ops folks always want to know how their Tomcat or JBoss servers are running. You can buy fancy tools from Oracle to do this, or you can use the <a href="http://code.google.com/p/jmx-sflow-agent/">jmx-sflow-agent</a>. Typically, the way we solve this problem is that we either fire up a tool like <a href="http://exchange.nagios.org/directory/Plugins/Java-Applications-and-Servers/check_jmx/details">check_jmx</a>, which basically fires up a JVM each and every time it needs to check a metric (*shudder*), or we run a long-running Java process that we need to constantly update with a list of servers to poll in order to get graphs of our heap sizes.</p>
<p>Alternatively, you could run jmx-sflow-agent, which runs as a -javaagent argument on the JVM command line, and have all your JVMs automatically send their metrics to a central location the moment they start.</p>
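Wiring that in is the standard JVM <code>-javaagent</code> mechanism; the jar path and application jar below are placeholders, not the project's actual file names:

```
java -javaagent:/opt/sflow/jmx-sflow-agent.jar -jar myapp.jar
```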
<h2>Cloud-Friendly</h2>
<p>That's the thing. When applications start up, they start sending their data via sFlow to a central location for you. There is no polling. This is the same model as all the next generation of monitoring tools like Ganglia and Graphite. This is cloud-friendly.</p>
<p> Imagine you were Netflix, running thousands of instances on EC2. Would you rather have to update your config file every few seconds to make your monitoring systems aware of all the hosts that have been provisioned or destroyed, or would you like new hosts to just appear on your monitoring systems as they come up? At Tagged, we would be constantly updating our config files every time a disk failed, or when a tier was expanded or a new one provisioned. We would have to specify in the file which hosts were running Java, or Memcache, or Apache, or both.</p>
<p> Instead, in our world, if an application is running on a host, we see that application in our monitoring tools, instantly. Deploying mod-sflow to our Apache servers was as simple as creating an RPM and putting a few lines in Puppet. Awesome.</p>
<h2>The Future</h2>
<p>sFlow's relationship with the host side of the equation is just picking up steam now. We've been lucky enough to be at the leading edge of this, mostly through giving my <a href="http://tech.mangot.com/roller/dave/entry/graphite_as_presented_to_the">LSPE Meetup talk</a> on the right day, at the right time. In the coming weeks, we hope to share more with the world about what we're getting from using sFlow on our network, why we are loving it, and what problems it's helped us to solve.</p>
<h1>Graphite as presented to the LSPE Meetup, 16 June 2011</h1>
<p><em>dave, 2011-06-21</em></p>
<p>Last Thursday I had the opportunity to give a talk on one of my favorite visualization tools, <a href="http://graphite.readthedocs.org">Graphite</a>, at the <a href="http://www.meetup.com/SF-Bay-Area-Large-Scale-Production-Engineering/events/15481164/">Bay Area Large Scale Production Engineering Meetup</a>. I'm posting the slides and video here for those of you who missed it, and also for reference. Special thanks to the Chrises, Willis, and Westin for writing Graphite and putting together the meetup. I hope to do more blogging about Graphite and Ganglia soon!</p>
<p>My slides:</p>
<div style="width:425px" id="__ss_8372113"><strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/dmangot/lspe-meetup-talk-on-graphite" title="LSPE Meetup talk on Graphite">LSPE Meetup talk on Graphite</a></strong><object id="__sse8372113" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=lspemeetiup-110621011821-phpapp02&stripped_title=lspe-meetup-talk-on-graphite&userName=dmangot" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse8372113" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=lspemeetiup-110621011821-phpapp02&stripped_title=lspe-meetup-talk-on-graphite&userName=dmangot" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object><div style="padding:5px 0 12px">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/dmangot">dmangot</a>.</div></div>
<p>Video of my talk; I start speaking about 10 minutes and 30 seconds into the recording.</p>
<p><object type="application/x-shockwave-flash" height="300" width="400" id="clip_embed_player_flash" data="http://www.justin.tv/widgets/archive_embed_player.swf" bgcolor="#000000"><param name="movie" value="http://www.justin.tv/widgets/archive_embed_player.swf" /><param name="allowScriptAccess" value="always" /><param name="allowNetworking" value="all" /><param name="allowFullScreen" value="true" /><param name="flashvars" value="auto_play=false&start_volume=25&title=SF Bay Area Large Scale Production Engineering-Speakers Vladimir Vuksan,Dave Mangot,Alexis Le-Quoc, Video by KC Leung&channel=kctv88&archive_id=288229232" /></object><br /><a href="http://www.justin.tv/kctv88#r=-rid-&s=em" class="trk" style="padding:2px 0px 4px; display:block; width: 320px; font-weight:normal; font-size:10px; text-decoration:underline; text-align:center;">Watch live video from kctv88 on Justin.tv</a></p>http://tech.mangot.com/roller/dave/entry/the_graphite_cliThe Graphite CLIdave2011-03-02T10:17:45-08:002011-03-03T11:36:31-08:00<P>As part of working on a large scale website like <A
HREF="http://www.tagged.com">Tagged</A> we are constantly exploring
new technologies to see what might be advantageous to help with the
site. Whether it's exploring NoSQL technologies, new storage or
server hardware, or visualization tools, there is no shortage of
software and hardware to try.</P>
<P>Recently, we've been trying out the <A
HREF="http://graphite.wikidot.com/">Graphite Realtime
Graphing</A> system. It started as an experiment during our latest
Hackathon, and the more we've tried it, the more things there
are to like. Because the current Graphite documentation doesn't
include a CLI tutorial, I thought it might be nice to write one.</P>
<P>One of the first things you notice when using Graphite is how powerful
and flexible its graphing system is. Sometimes I feel like Tom
Cruise in the "Minority Report" being able to have complete control
over the way I manipulate and visualize my data. One of the great
things (I guess if you're a Unix geek) about Graphite is that it also
comes with a CLI. <BR>
To access the CLI, simply point your browser at
http://yourgraphiteinstall/cli or just click the link on the top of
the regular composer window. You will be presented with a prompt like
this:<BR>
<IMG SRC="/roller/dave/resource/blogimages/graphite.cmd.prompt.png"><BR>
If you haven't logged in yet (here I am logged in as "admin"), simply
type <B>'login'</B> at the command prompt and you will be taken to a screen
where you can login to Graphite. Without being logged in, we will be
unable to save our views, which we will get to a bit later.</P>
<P>While we can simply begin drawing graphs, right inside the actual
cli, I prefer to draw them inside windows. This way, if you create
multiple windows and save them as a view, you can move them around,
resize them, etc, independently of one another. To create a new
window, type <B>'create windowname'</B> and it pops up on your screen:<BR>
<IMG SRC="/roller/dave/resource/blogimages/graphite.open.window.png"><BR>
</P>
<P>Now we have somewhere to plot our data. For the purposes of this
tutorial, we have populated our datastore with some fictionalized data
about some Earth defense stations that are using their lasers to blast
invading ships out of the sky. Now we are going to plot how the
individual stations are doing. In reality, this data can be anything
that you can get into Graphite (networking data, number of users on a
site, temperature, stock prices, etc). In the CLI, I type <B>'draw
battle.ships.'</B> and as I type, Graphite automatically shows me below
the cursor all the possible completions for my namespace as I
type:<BR>
<IMG SRC="/roller/dave/resource/blogimages/graphite.draw.png"><BR>
</P>
<P>One of the nice things about using a CLI is that I can use
wildcards, so for this example, I type <B>'draw battle.ships.destroyed.*
in Earth_Defense'</B> and Graphite shows me my data in the window on a
smart looking graph. It has automatically matched all the different
parts of my namespace and plotted them individually on the graph (in
this case, SFO, JFK, and LAX).<BR>
<IMG SRC="/roller/dave/resource/blogimages/graphite.draw.in.window.png"><BR>
</P>
<P>That's pretty nice, but it's difficult to see how the individual
battle stations are doing because we've only recently been getting our
reports in from the stations. So, we need to change the timescale of
our graph. In the CLI, this is trivially easy. We type <B>'change
Earth_Defense from to -20min'</B> and the timescale on our window is
updated instantly.<BR>
<IMG SRC="/roller/dave/resource/blogimages/graphite.change.window.png"><BR>
</P>
<P>If we are going to send these reports to our superiors, they might
not be as tuned into our data collection methods as we are, so let's
add a title to our graph. We type <B>'change Earth_Defense title to
"Laser Batteries"'</B> and Graphite updates our graph once again.<BR>
<IMG SRC="/roller/dave/resource/blogimages/graphite.change.title.png"><BR></P>
<P>That is just one example of a way we can manipulate a graph in the
CLI, but in reality, we can not only change the chrome of our graph,
but also work with the data itself. Maybe the generals in charge of
our battle don't care about the individual battle stations; they want
to know about the total number of ships destroyed minute by minute.
So let's give them a total they can see on the graph. We type <B>'add
sum(battle.ships.destroyed.*) to Earth_Defense'</B> and our total
kills now appears on the graph:<BR>
<IMG SRC="/roller/dave/resource/blogimages/graphite.add.sum.png"><BR>
</P>
<P>There are lots more things we can do with the CLI, and many more
manipulations we can do on our graphs, but for now, we want to be able
to save our graph, so that we can return to it later. In the CLI
this is called a "view". You can have multiple windows/plots saved
in a view, but we are going to save our view now by typing <B>'save
laserreport'</B>.<BR>
<IMG SRC="/roller/dave/resource/blogimages/graphite.views.png"><BR>
If at some point in the future we wanted to get this report back, we
could simply type <B>'views'</B> to get a list of all the different
saved views, followed by <B>'load viewname'</B> to actually retrieve
the report.</P>
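<P>Pulling the session together, the complete set of commands typed at the CLI prompt in this tutorial was:</P>
<pre name="code" class="brush: plain;">
login
create Earth_Defense
draw battle.ships.destroyed.* in Earth_Defense
change Earth_Defense from to -20min
change Earth_Defense title to "Laser Batteries"
add sum(battle.ships.destroyed.*) to Earth_Defense
save laserreport
views
load laserreport
</pre>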
<P>Of course having an easy-to-use CLI like Graphite also allows us to do other powerful things. Having a CLI is almost like having a basic API, so we can actually script what we will cut and paste into the CLI in order to automate repetitive actions. We'll use this Ruby script as a basic example.<BR>
<pre name="code" class="brush: ruby;">
#!/usr/bin/ruby
graphite = { "Apple" => "AAPL", "Google" => "GOOG", "Visa" => "V" }
graphite.each { |key, value| puts "create #{key}\ndraw stocks.#{value} in #{key}\n" }
</pre>
Here we take a hash mapping window titles to ticker symbols and iterate through it, generating create and draw commands for the stock quotes we have in our Graphite installation. You can imagine the possibilities this gives you. Here's the output:<BR>
<pre name="code" class="brush: plain;">
create Visa
draw stocks.V in Visa
create Google
draw stocks.GOOG in Google
create Apple
draw stocks.AAPL in Apple
</pre>
<P>Graphite is an exciting tool that is becoming easier to use and more capable with every release. Currently we're trying to see how it compares to the
tools that we have internally (like you can see <A HREF="http://about-tagged.com/news/monitor">here</A>).
There are a number of compelling attributes to both tools and it may be the case that one tool will not be the best fit for all cases. In the few short weeks we've been experimenting with Graphite, at the very least, it's proven to be a great addition to the sysadmin's toolbox.</P>
http://tech.mangot.com/roller/dave/entry/back_on_the_blog_gangBack on the Blog Gangdave2011-03-02T10:16:17-08:002011-03-02T10:16:17-08:00It's been about a year and a half since I last put up a blog entry. A lot has happened in that time; most notably, with regard to blogging, I have a new job. After 4 awesome years at <A HREF="http://www.terracotta.org">Terracotta</A> learning about the latest in scaling technology, I moved to <A HREF="http://www.tagged.com">Tagged</A>, where I'm applying the lessons I learned every day. Tagged is the 3rd largest social network in the US and we're growing. I like to say that it's big enough that all significant failures are cascading failures. Because the environment is so large and so complex, there is lots of great stuff to blog about. So today, it's back on the blog gang, and here's to lots of great new stuff going forward.
http://tech.mangot.com/roller/dave/entry/on_running_terracotta_on_ec2A framework for running anything on EC2: Terracotta tests on the Cloud - Part 1dave2009-08-17T16:03:08-07:002009-08-18T18:30:26-07:00A framework for running Terracotta (or any other) software on EC2<P>One of the most fun things about working as the CTO's SWAT team is he's always thinking of new ideas and things he wants to do. If you've ever met <A HREF="http://www.pojomojo.com">Ari</A>, you know the guy is an idea fountain.</P>
<P>One recent project he asked me to do was run a large test of <A HREF="http://www.terracotta.org">Terracotta</A> on a cloud. Any cloud. So I set out to find an appropriate cloud environment and a way to run one of our session clustering tests. What I wound up with was a framework that can be used to scale anything on EC2 with relative ease. </P>
<P>I was able to leverage a few sysadmin tools that I already knew about and some I was just learning, to create an environment where running an arbitrary size test is as simple as saying "runTest < number >" and the appropriate number of AMIs would be launched, configured, and the test would be run. In the end, collecting the results was just as simple.</P>
<P>The nice thing about running Terracotta on the cloud is that it's dead simple. There are a few reasons for this. One is that all clustering is done over standard TCP/IP, no multicast. One of the initial parts of the task was to run a comparison of Tomcat session clustering with Terracotta but I couldn't get it to work. There is no multicast traffic on EC2 so that method was eliminated. I also tried Tomcat's support for unicast clustering but it is so poorly documented that I gave up pretty quickly. I don't know if it even works. I expect this is a problem for any clustering/datagrid/distributed cache product that relies heavily on multicast support. Another great thing, is that the recommended way for Terracotta clients to get their configurations is to pull it directly from the server. This is a huge advantage in the cloud and anywhere you are trying to configure massive numbers of machines. As you'll see below, Puppet uses the same model.</P>
<P>This first blog post will primarily cover what it took to come up with a machine image for running the simple tests. For the actual results of the tests themselves, you'll have to stay tuned for Part 2.</P>
<H1>The Building Blocks</H1>
<H2>CentOS</H2>
<P>The first task was to pick an operating system. I'd run Ubuntu on the cloud with the smart guys at <A HREF="http://www.tubemogul.com">Tubemogul</A> before. But I've been really intrigued with a lot of the stuff coming out of the Fedora labs, specifically, <A HREF="https://fedorahosted.org/cobbler/">Cobbler</A> and <A HREF="https://fedorahosted.org/func/">Func</A> (even though I ultimately did not wind up using either). CentOS meant those were a possibility, and RedHat is also a supported Terracotta platform. I chose to take one of the excellent CentOS images made public by the guys at <A HREF="http://www.rightscale.com">Rightscale</A> (who also have a REALLY impressive platform) and modify it so that it had all the software we needed, as well as any configurations necessary for management and security.</P>
<H2>EC2</H2>
<P>As you already know by now, we ran the tests on EC2. The reason I mention it here is that the framework behind the tests is probably applicable to any cloud computing environment as long as it has an API. There are a few code changes that would be required but it shouldn't be too hard to abstract the underlying cloud environment away so that requesting more machines is as simple as that.</P>
<H2>Puppet</H2>
<P>I must admit when I began the project I was a complete <A HREF="http://reductivelabs.com/products/puppet/">Puppet</A> newbie. After the project, I'd moved up to intermediate. This is probably the main reason that Puppet wasn't used even more. The great thing is now that all the machines in the cloud actually talk to the puppetmaster, it can be embraced and extended to take advantage of all the great things Puppet can do. (Puppet is my new sysadmin's best friend, seriously)</P>
<H2>PSSH</H2>
<P>Pssh is a great piece of Python written by Brent Chun over at <A HREF="http://www.theether.org/pssh/">The Ether.Org</A>. It is actually a suite of tools like pssh, pslurp, prsync, and pscp that will allow you to ssh onto dozens of machines in parallel (by default 32) and perform the above operations. We could have done some of these things with Puppet and puppetrun (though before 0.25 Puppet is terrible at delivering directory trees), but instead pssh was used to run the tests, whereas Puppet was used to configure the OS.</P>
<H2>Perl and Shell</H2>
<P>The last parts were plain old Perl and Shell taking advantage of the Net-Amazon-EC2 Perl module. Using some scripting in both languages, I was able to have the AMIs auto-configure themselves on launch to be ready for the cluster.</P>
<H1>Making it work</H1>
<P>The first task was to be able to have the machines join our "cluster" when they are launched. This process works like this:</P>
<OL>
<LI>The AMI is launched on EC2 (we used a c1.medium instance for this) to play the role of the Terracotta server/puppetmaster.</LI>
<LI>A credentials file is edited to contain the information needed to be able to spawn more copies of the AMI</LI>
<LI>A script is run that fixes puppet and creates a TC user, then a tarball is extracted as root that sets up the puppet config and some cron jobs necessary to make everything work. The reason this is done manually is that we don't want each system to try and become the puppetmaster or the munin-host. This one time operation allows us to use the same AMI for both master and workers.</LI>
<LI>The keys and software for the Terracotta user to run are copied onto the launched AMI. For security reasons we didn't want to store private ssh keys on the AMI. Additionally, we didn't want to have to build a new AMI every time a new version of Terracotta or Tomcat were released. With this method, we can stay current and the only limitation is the speed of the file copy. We only have to do this once however, as all the remaining file distribution is done from the master and occurs at gigabit (see my previous <A HREF="http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealed">post</A> on this topic) speeds within the cloud. There could definitely be a method developed where the host pulls files/configuration from S3 and gets the latest version, but for the purposes of our test, we went the simple route.</LI>
<LI>Now that all the pieces are in place, we launch the workers from the master using a Perl script and they all call in to the puppetmaster and are configured in such a way that they can be managed from the master itself. This step could also be performed instead of the previous one as there is no dependency on the Terracotta user for the machines to be configured. We used the m1.small image for these tests.</LI>
<LI>Additionally, all the workers are configured as Munin nodes so that we can track the various metrics of system activity during the lifetime of the launched AMIs.</LI>
<LI>Finally the tests can be run from the Terracotta user on the master.</LI>
</OL>
<P>After the AMI is launched, we logon as root and run a script (called newmaster.sh, listed below) that will setup the machine as the master, install some cronjobs, the puppet config, and launch the appropriate daemons. We also need to put the proper credentials in /mnt/creds.cfg (AWS keys, AMI type, keypair name). This file is used by the launching script. Here is the script we run to create the master out of our AMI.
</P>
<I>~root/newmaster.sh</I><BR>
<pre name="code" class="brush: shell;">
#!/bin/bash
# setup this machine as a new L2 master/puppetmaster/etc.
/sbin/service puppet stop
/sbin/service puppetmaster start
/usr/sbin/groupadd -g 481 tc
/usr/sbin/useradd -g 481 -u 481 -c "Terracotta user" -d /mnt/tc -m tc
/bin/cp /etc/ec2/creds.cfg.template /mnt/creds.cfg
echo "** You must edit /mnt/creds.cfg to be able to launch workers **"
cd /
tar -xf ~root/l2master.tar
</PRE>
<P>Here is the script used to launch the hosts:</P>
<I>/usr/local/bin/launch.EC2.workers.pl</I><P>
<pre name="code" class="brush: perl;">
#!/usr/bin/perl -w
use strict;
use Net::Amazon::EC2;
use MIME::Base64;
use Config::Tiny;
use LWP::Simple;
die "launch.EC2.workers.pl numinstances" unless (scalar @ARGV == 1);
my $Config = Config::Tiny->new();
$Config = Config::Tiny->read('/mnt/creds.cfg') || die "Can't read credentials file";
# get these from creds.cfg
my $ec2 = Net::Amazon::EC2->new(
AWSAccessKeyId => $Config->{_}->{accesskey},
SecretAccessKey =>$Config->{_}->{secretkey}
);
my $workernum=$ARGV[0];
my $url = "http://169.254.169.254/latest/meta-data/local-hostname";
my $hostname = get $url;
die "Couldn't get $url" unless defined $hostname;
print "Launching $workernum instance(s) with master node $hostname\n";
my $userdata = encode_base64("L2ip=$hostname");
# launch the workers
my $instance = $ec2->run_instances(ImageId => $Config->{_}->{workerami},
MinCount => $workernum, MaxCount => $workernum, KeyName => $Config->{_}->{keypair},
UserData => $userdata);
# should put some error checking here
</pre>
<P>The most important thing to note here is that we identify the master server with the user-data parameter. This is the key to the entire cloud deployment. When the machine boots, it executes the following code which populates the puppetmaster and munin host information on the worker. The munin configuration could be done by Puppet itself, and probably should be. This might require the creation of an Augeas lens for the munin config file.</P>
<I>/etc/rc.d/init.d/tcec2init</I><BR>
<pre name="code" class="brush: perl;">
#!/bin/bash
# tcec2init Init script for setting up customizations on EC2
#
# Author: Dave Mangot <dmangot at terracottatech.com>
#
# chkconfig: - 89 02
#
# description: gets system ready for TC/EC2 management
PATH=/usr/bin:/sbin:/bin:/usr/sbin
export PATH
RETVAL=0
# Source function library.
. /etc/rc.d/init.d/functions
start() {
echo -n $"Getting my configs: "
/usr/local/bin/get.EC2.userdata.pl
RETVAL=$?
echo
return $RETVAL
}
stop() {
echo -n $"Nothing"
RETVAL=$?
echo
[ $RETVAL = 0 ]
}
reload() {
echo -n $"Nothing"
RETVAL=$?
echo
return $RETVAL
}
restart() {
stop
start
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
restart
;;
reload|force-reload)
reload
;;
status)
echo ""
RETVAL=$?
;;
*)
echo $"Usage: $0 {start|stop|status|restart|reload|force-reload"
exit 1
esac
exit $RETVAL
</pre>
<P>Very simply this is an init.d script that starts early in the boot cycle. This way the values for the Puppet startup and Munin startup are already configured before those daemons try and start.</P>
<I>/usr/local/bin/get.EC2.userdata.pl</I><BR>
<pre name="code" class="brush: perl;">
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use Tie::File;
sub setupPuppet {
my $puppetMaster = shift;
my @lines;
tie @lines, 'Tie::File', "/etc/sysconfig/puppet" or die "Can't read file: $!\n";
foreach ( @lines ) {
s/\#PUPPET\_SERVER\=puppet/PUPPET_SERVER=$puppetMaster/;
}
untie @lines;
}
sub setupMunin {
# could have made this a function that took munin/puppet arg but, why bother, so trivial
my $notNode = shift;
# we want notNode as specially formatted IP, thankfully, Perl will allow us to construct this easily
my (undef,undef,undef,undef,@addrinfo) = gethostbyname($notNode);
my ($a, $b, $c, $d) = unpack('C4', $addrinfo[0]);
my $modNode= "^" . $a . "\\." . $b . "\\." . $c . "\\." . $d . "\$";
my @lines;
tie @lines, 'Tie::File', "/etc/munin/munin-node.conf" or die "Can't read file: $!\n";
foreach ( @lines ) {
s/\\.1\$/\\.1\$\nallow $modNode\n/;
}
untie @lines;
}
sub setupDaemons {
my $masterIP = shift;
setupMunin($masterIP);
setupPuppet($masterIP);
}
my $req = get 'http://169.254.169.254/latest/user-data';
my @returned = split /\|/, $req;
# here we store each keypair in /var/local/key
foreach my $item (@returned) {
my @datakey = split /=/, $item;
chomp($datakey[1]);
open EC2DATA , ">/var/local/$datakey[0]" or die "can't open datafile: $!";
print EC2DATA "$datakey[1]\n";
close EC2DATA;
setupDaemons($datakey[1]) if ($datakey[0] eq "L2ip");
}
</pre>
<P>This script is one of the keys to the entire infrastructure. We are passing in the IP address of the L2 (the master) but we could also pass in other data as key-value pairs. If an 'L2ip' key is detected then we will setup the Munin and Puppet daemons. If any other key-value pairs are passed in (by modifying the launch script) then they will also be placed as a file in /var/local/. This way if we have other software on our system that needs access to any of those values, they will always be available. In reality, we can simulate much of this functionality in Puppet if desired. We will use the L2ip file to identify the Terracotta server when we run our sessions test.</P>
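<P>As a quick illustration of the idea (not part of the original setup; the user-data string is a hypothetical sample and a temp directory stands in for /var/local), the same split-and-store logic looks like this in shell:</P>
<pre name="code" class="brush: shell;">
#!/bin/bash
# Split "key=value|key=value" user-data pairs and write one file per key,
# the way tcec2init stores them for other scripts to consume.
USERDATA="L2ip=10.249.194.244|role=worker"
VARDIR=$(mktemp -d)

IFS='|' read -ra PAIRS <<< "$USERDATA"
for pair in "${PAIRS[@]}"; do
  key=${pair%%=*}
  val=${pair#*=}
  echo "$val" > "$VARDIR/$key"
done

# Any later script can now pick up the master's address:
L2IP=$(cat "$VARDIR/L2ip")
echo "master is $L2IP"
</pre>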
<H2>Puppet</H2>
<H3>autosign</H3>
<P>While it's great that the workers know how to find the master, the master still has to recognize them as valid clients and accept their certificates as valid workers. For this we use the autosign.conf ability of Puppet. We populate <I>/etc/puppet/autosign.conf</I> using the following cron entry and script.</P>
Cronjob:
<pre name="code" class="brush: shell;">
* * * * * /usr/local/bin/get.EC2.workers.pl > /etc/puppet/autosign.conf
</pre>
Code:
<pre name="code" class="brush: perl;">
#!/usr/bin/perl -w
use strict;
use Net::Amazon::EC2;
use MIME::Base64;
use Config::Tiny;
use Sys::Hostname;
exit 1 unless (-e "/mnt/creds.cfg");
my $Config = Config::Tiny->new();
$Config = Config::Tiny->read('/mnt/creds.cfg') || die "Can't read credentials file";
my $ec2 = Net::Amazon::EC2->new(
AWSAccessKeyId => $Config->{_}->{accesskey},
SecretAccessKey =>$Config->{_}->{secretkey}
);
open MUNIN, ">/etc/munin/munin.conf" or die "cant't open munin.conf: $!";
print MUNIN <<EOT;
dbdir /var/lib/munin
htmldir /var/www/html/munin
logdir /var/log/munin
rundir /var/run/munin
tmpldir /etc/munin/templates
[localhost]
address 127.0.0.1
use_node_name yes
EOT
our $hostname = hostname();
my $running_instances = $ec2->describe_instances;
foreach my $reservation (@$running_instances) {
foreach my $instance ($reservation->instances_set) {
if ($instance->private_dns_name) {
# don't want to use the master as a worker node
next if ($instance->private_dns_name =~ /$hostname/); #. "compute-1.internal");
print $instance->private_dns_name . "\n";
my (undef,undef,undef,undef,@addrinfo) = gethostbyname($instance->private_dns_name);
my ($a, $b, $c, $d) = unpack('C4', $addrinfo[0]);
my $nodeaddress= $a . "\." . $b . "\." . $c . "\." . $d;
my $nodename = $instance->private_dns_name;
print MUNIN "[tcscale;$nodename]\n\taddress $nodeaddress\n\tuse_node_name yes\n\n";
}
}
}
close MUNIN;
</pre>
<P>This script updates the munin config on the master and prints out all the workers' hostnames to STDOUT. The output when run from cron is redirected to <I>/etc/puppet/autosign.conf</I> so that when the clients call in after booting, they will automatically be accepted by Puppet. From a security perspective, there is an obvious race condition if an attacker can guess the hostnames of the machines that will be assigned by EC2, or is able to determine those hostnames before those hosts can boot initially. One possibility would be to also write some iptables rules to wall off anyone but the proper IPs. This way, an attacker would also have to spoof an IP address, which could potentially be more difficult to execute successfully in the EC2 environment if it has proper security mechanisms in place. For the purposes of our test, an attacker could disrupt the test but they would not gain access to any of our virtual infrastructure as a result.</P>
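<P>For the curious, those firewall rules might look something like this (a hypothetical sketch; 8140 and 4949 are the default puppetmaster and munin-node ports, and the source range would need to match where your workers actually live):</P>
<pre name="code" class="brush: shell;">
# Hypothetical lockdown: accept puppet (8140) and munin (4949) traffic
# only from the internal EC2 range, drop everything else on those ports.
iptables -A INPUT -p tcp --dport 8140 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 8140 -j DROP
iptables -A INPUT -p tcp --dport 4949 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 4949 -j DROP
</pre>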
<H3>puppet config</H3>
/etc/puppet/manifests/site.pp
<pre name="code" class="brush: plain;">
import "classes/*.pp"
node default {
include tcuser
}
</pre>
<P>Very simple, all our test nodes are configured with a default configuration. Obviously we can add whatever classes we wish into this configuration to do things like install app servers, db server, load balancers, etc. or any of the configurations necessary to get those services up and running. This is what is great about puppet, and what makes it so powerful for these arbitrarily sized cloud deployments.</P>
<I>/etc/puppet/manifests/classes/tcuser.pp</I><BR>
<pre name="code" class="brush: plain;">
class tcuser {
group { "tc":
ensure => "present",
provider => 'groupadd',
gid => 481,
name => "tc"
}
user { "tc":
comment => "TC user",
gid => 481,
home => "/mnt/tc",
name => "tc",
shell => "/bin/bash",
managehome => true,
uid => 481
}
file { "mnttcssh":
name => "/mnt/tc/.ssh",
owner => tc,
group => tc,
ensure => directory,
mode => 700,
require => File["mnttc"];
}
file { "mnttc":
name => "/mnt/tc",
owner => tc,
group => tc,
ensure => directory,
mode => 700,
require => User["tc"];
}
file { "/mnt/tc/.ssh/authorized_keys":
content => "ssh-dss AAAA..truncated..meI8A==",
owner => tc,
group => tc,
mode => 700,
require => File["mnttc"]
}
}
</pre>
<P>Though we actually store the private key off the cloud for security reasons, this creates the user under which we will be running all our tests (not root!). Once this is created, we do all our management of the remote workers as the tc user.</P>
<H2>Munin</H2>
<P>One of the best things about Munin is that you get so many great graphs and data collected for very, very, little effort. On the master machine, our <I>/etc/munin/munin.conf</I> is updated with the same script as the autosign.conf to look something like this:</P>
<pre name="code" class="brush: plain;">
dbdir /var/lib/munin
htmldir /var/www/html/munin
logdir /var/log/munin
rundir /var/run/munin
tmpldir /etc/munin/templates
[localhost]
address 127.0.0.1
use_node_name yes
[tcscale;domU-12-31-39-03-C1-06.compute-1.internal]
address 10.249.194.244
use_node_name yes
</pre>
<P>The clients are just updated on boot to point at the munin master. I like to look at my munin graphs through SSH port forwarding (you will need to start Apache which is not started by default). You could also open up the EC2 firewall to allow traffic in from your location.</P>
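<P>The SSH port forward itself is a one-liner (the hostname below is a placeholder for your master's public DNS name):</P>
<pre name="code" class="brush: shell;">
# Forward local port 8080 to Apache on the master, then browse
# http://localhost:8080/munin/ over the tunnel.
ssh -L 8080:localhost:80 root@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
</pre>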
<H2>Additional software</H2>
<P>There are a few additional packages that are installed that are convenient for testing or that could be used by workers for testing or deployments. Some examples:</P>
<UL>
<LI>nttcp</LI>
<LI>haproxy</LI>
<LI>iozone</LI>
<LI>Sun JDK 1.6 (in /usr/java)</LI>
</UL>
<H1>Setting up the Sessions Test</H1>
<I>~tc/bin/runTC.sh</I><BR>
<pre name=code class="brush: shell;">
#!/bin/sh
# sudoers has a NOPASSWD field for this, want a clean environment for each test
sudo /usr/sbin/puppetca --clean --all > /dev/null
/usr/local/bin/launch.EC2.workers.pl $1
workerhosts=`cat /etc/puppet/autosign.conf | wc -l `
while [ $workerhosts -lt $1 ]
do
sleep 60
workerhosts=`cat /etc/puppet/autosign.conf | wc -l `
done
echo "determining L1s and load generators"
~/bin/partworkers.sh
rc=$?
if [ $rc -ne 0 ]; then
exit $rc
fi
~/bin/tcgo.sh
~/bin/tomcatgo.sh
~/bin/lggo.sh
</pre>
<P>We'll cover the actual content of these scripts in the next post, which will actually be running Terracotta on the cloud, as opposed to setting up the infrastructure. Briefly, we clear out our puppet clients between runs as we use fresh workers for each run of the test (though, this is by no means a requirement). We launch any number of workers and wait until they are all up and running. We partition our workers into load generators and Tomcat servers. We launch, Terracotta, then Tomcat, then the load generators. Simple!</P>
<H1>Conclusion</H1>
<P>As you can see, we have a very flexible infrastructure available to us now on EC2. We can launch any number of clients from our "master" server and control them as well. They can be partitioned any way we like to serve any number of roles and we keep track of them in a central location which allows us to make decisions about our "cluster" and take actions if we like. For the Terracotta test, we'll be able to show how Terracotta scales linearly in the cloud and just how simple it is to configure for this purpose. Obviously, this is not a complete solution for anything other than running our current test. Here are some things we could do to extend our functionality:
</P>
<UL>
<LI>The newest Puppet (0.25) release candidate has REST, so we could deploy directories that way instead of using prsync (which we'll see in part 2).</LI>
<LI>EC2 gives you the ability to add a "tag" description to your instances. We could add that capability to the Net::Amazon::EC2 module and classify servers in Puppet based on tag (e.g. load generator, load balancer, Tomcat instance). This would allow us to deploy only the software we need to each node.</LI>
<LI>We could create an Augeas lens for munin.conf. This way we wouldn't have to edit the file with Perl but simply define the configuration in Puppet. (Yes, I am currently in love with Puppet.)</LI>
<LI>The timescales for Munin are really long by default, and for some reason they cannot be chosen. For our short tests, that's not really effective; for a longer-running task, it would probably be fine. We could modify the Munin source code for a shorter timescale or just choose something else. I chose Munin to start with because it gives you so much for so little effort. Caveat emptor.</LI>
<LI>We could improve our error checking, of course.</LI>
<LI>Everything we need for autoscaling is already there. We can do node classification in Puppet to add webservers, app servers, etc. We could use metrics from Munin to determine when we need to add nodes. Currently there is no way to remove nodes, but that would be pretty quick to script. We could also monitor a load balancer like haproxy, either locally or remotely, and add nodes based on traffic.</LI>
</UL>
<P>There are probably a number of different directions this could take and I'm sure people have really good ideas. If you'd like to take the AMI for a spin, it's available on EC2 as ami-e14dac88 (search for tcscale). Enjoy, and stay tuned for part 2!
</P>http://tech.mangot.com/roller/dave/entry/a_trade_show_booth_partA Trade Show Booth: Part 2 - The Puppet Configdave2009-08-04T23:41:28-07:002009-08-04T23:41:29-07:00After writing the blog post about running a <A HREF="/roller/dave/entry/a_trade_show_booth_with">trade show booth with OpenBSD and
PF</A>, a few people asked about the <A HREF="http://reductivelabs.com/products/puppet/">puppet </A>configuration. The configuration is
actually dead simple. The idea is to configure the machines with the base configuration they need,
as well as provide for future management. Because this is such a simple environment, we are able to
keep the flexibility of configuration management coupled with instantaneous change.
The best way to understand this is just to have a look at the site.pp.
<pre name=code class='brush: plain;' >
import "classes/*.pp"
node default {
include rootssh
include tchomedir
include postfix
include mysql
}
</pre>
<P>Let's look at each file in turn</P>
<H2>rootssh</H2>
<pre name=code class="brush: plain;" >
class rootssh {
file { "/root/.ssh":
owner => root,
group => root,
mode => 700,
ensure => directory
}
file { "/root/.ssh/authorized_keys":
owner => root,
group => root,
mode => 700,
content => "ssh-dss AAAAB...truncated...N==",
require => File["/root/.ssh"]
}
}
</pre>
<P>I know there is a puppet type for handling ssh keys, but it didn't work, at least not in my setup. Using 'content'
with the 'file' type works well. This enables the puppetmaster (<A HREF="http://www.openbsd.org">OpenBSD</A> box) to
control all the puppet hosts as root. It turns out we didn't need this very much, but it's handy to have nonetheless.</P>
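<P>For reference, Puppet's built-in type for this is <i>ssh_authorized_key</i>; a minimal sketch of the equivalent (the resource title is a placeholder, and your mileage may vary by puppet version, as it did for me):</P>
<pre name=code class="brush: plain">
# hypothetical equivalent using the built-in type; key value as above
ssh_authorized_key { "puppetmaster-root":
    ensure => present,
    user   => "root",
    type   => "ssh-dss",
    key    => "AAAAB...truncated...N==",
}
</pre>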
<H2>tchomedir</H2>
<pre name=code class="brush: plain;">
class tchomedir {
file { "/home/terracotta/terracotta":
owner => terracotta,
group => terracotta,
ensure => "./terracotta-3.1.0-beta.jun2",
}
file { "/home/terracotta/examinator":
owner => terracotta,
group => terracotta,
ensure => "./examinator-cache-1.2.0-SNAPSHOT.jun2",
}
file { "/home/terracotta/Destkop/Demo.Deck.Simple.2.2.pdf ":
owner => terracotta,
group => terracotta,
mode => 755,
name => "/home/terracotta/Desktop/Demo.Deck.Simple.2.2.pdf",
source => "puppet:///dist/Demo.Deck.Simple.2.2.pdf"
}
file { "/home/terracotta/.bashrc":
owner => terracotta,
group => terracotta,
ensure => present,
source => "puppet:///dist/bashrc"
}
}
</pre>
<P>We already had the terracotta user and group created, though this could easily have been done in puppet; maybe next year.
Here we make sure that the "terracotta" and "examinator" symlinks are always pointing to the right kits. We distribute a PDF file
used for the demos, and make sure everyone has the same centrally configured .bashrc.</P>
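<P>The bare-path form of <i>ensure</i> used above is shorthand for a symlink; a more explicit equivalent (same effect, just easier to read, assuming your puppet version supports the link/target form) would be:</P>
<pre name=code class="brush: plain">
# explicit symlink form of the "terracotta" resource above
file { "/home/terracotta/terracotta":
    owner  => terracotta,
    group  => terracotta,
    ensure => link,
    target => "./terracotta-3.1.0-beta.jun2",
}
</pre>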
<H2>postfix</H2>
<PRE name=code class="brush: plain" >
class postfix {
package { "postfix":
name => postfix,
provider => apt,
ensure => present
}
file { "/etc/postfix/main.cf":
owner => root,
group => root,
source => "puppet:///dist/main.cf"
}
service { "postfix-svc":
ensure => running,
subscribe => File["/etc/postfix/main.cf"],
require => Package["postfix"],
name => postfix
}
}
</PRE>
<P>Pretty simple, and it demonstrates what is awesome about puppet. This definition retrieves and installs postfix, gets
the postfix configuration from the puppetmaster, and makes sure postfix is running. It will even reload the postfix
configuration if the main.cf on the puppetmaster changes.</P>
<H2>mysql</H2>
<pre name=code class="brush: plain">
class mysql {
file { "/etc/init.d/mysql":
owner => root,
group => root,
source => "puppet:///dist/mysql"
}
service { "mysql-svc":
ensure => running,
subscribe => File["/etc/init.d/mysql"],
name => mysql
}
}
</pre>
<P>Here we could have used puppet to install mysql just like we did for postfix, but it is already there from the
machine build. We replace the init script for mysql with our own, with some custom arguments, and then make sure the service
is running.</P>
<H2>Distributing the Terracotta kits</H2>
<P>To distribute the Examinator and Terracotta kits we used prsync from the PSSH kit by <A HREF="http://www.theether.org/pssh/">Brent Chun</A>. It can run commands, sync via rsync, and more, over ssh to 32 machines in parallel by default (more if configured that way). With it, we ran a single prsync command that distributed each directory to all the machines at the same time whenever we needed to update the kit.</P>
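<P>For illustration, the kit push looked something like this (the hostfile name and remote user here are placeholders, and flags may vary slightly between pssh releases):</P>
<pre name=code class="brush: plain">
# hosts.txt lists one booth machine per line
prsync -r -h hosts.txt -l terracotta \
    /home/terracotta/terracotta-3.1.0-beta.jun2 /home/terracotta/
</pre>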
<H2>Questions</H2>
There are two questions that might be asked about this configuration.
<DL>
<DT>
Why aren't you using puppet to distribute the kits?
</DT>
<DD>
Prior to the new Puppet 0.25 RC, just released with the beginnings of a move from XML-RPC toward REST, puppet has been
terrible at recursively copying large directory trees. We tried to distribute the <A href="http://www.terracotta.org">
Terracotta</A> kit with all its directories
and jars, and it blew up after running out of file descriptors. This is reportedly fixed in the RC mentioned above.
</DD>
<DT>
Why are you using prsync instead of puppetrun?
</DT>
<DD>
Because we couldn't use puppet to distribute the kits, triggering a run could only be used to shift the symlinks. We could
have used an <i>exec</i> directive in the puppet config to pull the file from somewhere and untar it, but that seems no less hackish.
</DD>
</DL>
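<P>For completeness, the <i>exec</i> alternative dismissed above would look roughly like this (the URL and paths are hypothetical):</P>
<pre name=code class="brush: plain">
# hackish alternative: fetch and untar the kit via exec resources
exec { "fetch-kit":
    command => "/usr/bin/wget -q -O /tmp/kit.tar.gz http://builds.example.com/kit.tar.gz",
    creates => "/tmp/kit.tar.gz",
}
exec { "untar-kit":
    command => "/bin/tar -C /home/terracotta -xzf /tmp/kit.tar.gz",
    creates => "/home/terracotta/kit",
    require => Exec["fetch-kit"],
}
</pre>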
<H2>Summary</H2>
<P>Using puppet definitely makes our lives easier when setting up a trade show booth. This is the first year we've used puppet; I've already learned a lot about the tool and will be using it even more extensively next year to separate the OS install from the booth dependencies (e.g. mysql). See you at JavaOne! </P>
http://tech.mangot.com/roller/dave/entry/intstalling_fedora_10_on_aInstalling Fedora 10 on a Mac Minidave2009-07-28T15:59:35-07:002009-07-28T16:04:30-07:00<P>I was on an interview once and the interviewer asked me what kind of Unix I run outside of work. I thought this was an excellent question and one that I often use myself when hiring sysadmins. With few exceptions, the people who are enthusiastic enough to have a machine at home that they can play with and learn from tend to make the best sysadmins.</P>
<P>I responded that I love to run operating systems on "exotic" hardware. I've run <A HREF="http://www.openbsd.org">OpenBSD</A> on both a Sun IPC and SparcStation 20 (with a 150 MHz Hypersparc processor mind you!) and am currently running it on a <A HREF="http://www.soekris.com/">Soekris</A> Net4801. I'm also running <A HREF="http://www.fedoraproject.org">Fedora</A> on a Mac Mini. </P><IMG SRC="/roller/dave/resource/blogimages/fedora.icon.png">
<P>At <A HREF="http://www.terracotta.org">Terracotta</A> we needed some extra machines for a project I'm working on (very hush, hush <img src="http://tech.mangot.com/roller/images/smileys/smile.gif" class="smiley" alt=":)" title=":)" /> ), so I decided to drop Fedora 10 on 2 Mac Minis we have that are currently unused. It was not without its tricks however, even though we have a pretty nice Kickstart setup which configures everything, including <A HREF="http://reductivelabs.com/products/puppet/">Puppet</A>.</P>
<P>Here are the steps:</P>
<OL>
<LI>Boot the Mac from an installer CD and open up the Disk Utility</LI>
<LI>Partition the drive to have 1 partition and under Options, choose MBR (master boot record)</LI>
<LI>Install Fedora 10 as normal</LI>
<LI>Boot up and find out that your machine won't take a DHCP address (arrrgggh!)</LI>
<LI><I>sudo yum -y erase NetworkManager</I></LI>
<LI><I>sudo /sbin/chkconfig network on</I></LI>
<LI><I>sudo reboot</I></LI>
</OL>
<P></P>
<P>For some reason I have yet to understand, the folks at Fedora and many other Linux distributions have this notion that everyone wants to run Linux as a desktop. While I do like my Ubuntu desktop (running on a boring PC), I have much more need for servers in my day job. Why Fedora seems to default to a desktop configuration is beyond me. (NetworkManager is for managing a desktop's network configuration.) After I turned on networking, everything behaved as normal.</P>
<P>Believe it or not, people actually do run Fedora on servers even though it's very frustrating to have to upgrade every 6 months. Maybe I'm used to it from the OpenBSD release cycle I've been following since OpenBSD 2.5. The reason we use it is that at a "cutting edge startup" like <A HREF="http://www.terracotta.org">Terracotta</A>, the developers like to have the latest everything (ruby, svn, etc.), and running on Fedora allows us to provide those things through the regular package management tools (in this case, yum) that come with the system. No need to search all over the Internet for RPMs or build our own; we have enough to do.</P>
http://tech.mangot.com/roller/dave/entry/a_trade_show_booth_withA Trade Show booth with PF and OpenBSDdave2009-06-23T10:04:22-07:002009-06-23T13:34:39-07:00<P>A few months after I started at <A HREF="http://www.terracottatech.com">Terracotta</A> I attended my first JavaOne conference. Not as an attendee, but as an exhibitor. The boss came and asked me to build up some infrastructure to run a booth. Over the years, the setup of the booth and some of the software and equipment has changed, but the primary design principles have not.
</P>
<UL>
<LI>Allow all machines in the booth to share a single Internet connection</LI>
<LI>Make it simple to setup and use</LI>
<LI>Allow employees to check their email, etc. from the booth</LI>
<LI>Allow the sales engineers to explore potential client websites</LI>
<LI>Do not allow demo stations to be used by conference attendees to check their email or hog the demo station trying to show us their website</LI>
<LI>Make it secure so that we don't have any demo "surprises"</LI>
<LI>Make sure all the demo stations are consistent</LI>
</UL>
<P>
I turned to one of my favorite operating systems to solve the problem, OpenBSD. Here is what the network looks like as of 2009.
<P>
<IMG SRC="/roller/dave/resource/blogimages/trade.show.top.png" ALT="tradeshow network diagram">
<P>
We get our Internet connection from <A HREF="http://www.prioritynetworks.com">Priority Networks</A> and every year it is rock solid, they are super easy to work with, and when you need help, they actually know what you're talking about!<BR>
As you can see, each daemon on the machine serves a purpose to running the overall network. Each daemon (other than PF) is only assigned to the internal interface.
</P>
<dl>
<dt>named</dt>
<dd>We run a private domain inside the booth (javaone.tc) and also need standard resolving for internal clients</dd>
<dt>dhcp</dt>
<dd>Demo machines are given static IPs, all other clients are assigned to a different part of the subnet, more on this later</dd>
<dt>puppetmasterd</dt>
<dd>Now that machines have gotten faster and we have fewer graphical demos, we can run all Unix demo stations. Puppet makes sure all the machines are 100% consistent and makes it much easier to set up machines initially or substitute in a new station in case of some kind of problem</dd>
<dt>PF</dt>
<dd>This is where all the magic happens, why you can type www.yahoo.com and wind up at Terracotta.org</dd>
<dt>httpd</dt>
<dd>This was more important before puppet and when we still had Windows, but Apache is still a great way to serve up files to any network</dd>
<dt>ntpd</dt>
<dd>We're a Java clustering company and it's very important to have synchronized clocks in a cluster, then again, isn't it always?</dd>
</dl>
<P>
As you can see above, we have a private domain inside the booth. It's just a simple /24 divided in two. Machines in the lower half of the subnet are assigned static IPs by MAC address; this is for the demo stations only. Machines in the top half of the subnet (129-254) are assigned IPs dynamically; this range is for any employee who brings their laptop to the booth and wants to log in to check email, fix a bug, etc. PF treats the two IP ranges differently.
</P>
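<P>In dhcpd.conf terms, the split looks something like this (the router address and MAC below are illustrative, not our real config):</P>
<pre name=code class="brush: plain">
subnet 192.168.100.0 netmask 255.255.255.0 {
    option routers 192.168.100.1;
    # employee laptops get the top half of the subnet dynamically
    range 192.168.100.129 192.168.100.254;
}
# demo stations are pinned by MAC address in the lower half
host demo1 {
    hardware ethernet 00:11:22:33:44:55;
    fixed-address 192.168.100.11;
}
</pre>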
<P>
Here is the firewall ruleset:
<pre name=code class="brush: plain">
ext_if="bge0"
int_if="dc0"
DEMOSTATIONS="192.168.100.0/25"
EMPLOYEES="192.168.100.128/25"
set skip on lo
# allow demo stations to access Terracotta and a few other websites we rely upon
table &lt;TCOK&gt; { 64.95.112.224/27, www.google-analytics.com, now.eloqua.com, secure.eloqua.com, download.terracotta.org }
scrub all
nat-anchor "ftp-proxy/*"
rdr-anchor "ftp-proxy/*"
nat on $ext_if from $int_if:network -> ($ext_if:0)
rdr pass on $int_if proto tcp from $int_if:network to port 21 -> 127.0.0.1 port 8021
rdr pass on $int_if proto tcp from $DEMOSTATIONS to ! &lt;TCOK&gt; port 80 -> 64.95.112.233 port 80
rdr pass on $int_if proto tcp from $DEMOSTATIONS to ! &lt;TCOK&gt; port 443 -> 64.95.112.233 port 80
anchor "ftp-proxy/*"
block log all
pass quick on $int_if no state
antispoof quick for { lo $ext_if }
# fw inbound - for remote admin when Priority Networks allows this
pass in quick on $ext_if proto tcp to ($ext_if) port ssh
# fw outbound
pass out quick on $ext_if proto tcp from ($ext_if) to any modulate state flags S/SA
pass out quick on $ext_if proto udp from ($ext_if) to any keep state
# int outbound
pass in quick proto tcp from $DEMOSTATIONS to any port { 22 25 80 443 8081 } modulate state flags S/SA
pass in quick proto udp from $DEMOSTATIONS to any port { 53 } keep state
pass in quick proto tcp from $EMPLOYEES to any modulate state flags S/SA
pass in quick proto udp from $EMPLOYEES to any keep state
</pre>
<P>
The only problem with this ruleset is that the name resolution for domains that are hardcoded in the ruleset (e.g. www.google-analytics.com) can only really happen after the OS has booted. Otherwise, the boot sequence stalls on name resolution. The workaround for this is to disable PF in /etc/rc.conf.local and enable it with <pre>pfctl -e -f /etc/pf.conf</pre> in /etc/rc.local. That is really the only necessary workaround.</P>
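<P>Concretely, the workaround amounts to the following (a sketch; the exact rc knob names vary a bit between OpenBSD releases):</P>
<pre name=code class="brush: plain">
# /etc/rc.conf.local -- keep PF from loading the ruleset at boot
pf=NO

# /etc/rc.local -- enable PF once name resolution is available
/sbin/pfctl -e -f /etc/pf.conf
</pre>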
<P>As you can see, it's actually a REALLY, REALLY permissive ruleset. Much more permissive than we allow in the office. Because there is rarely a Terracotta sysadmin on the show floor during the conference, and because there are tons of open access points which our employees would use if we locked them down too much anyway, we feel this is a pretty acceptable level of risk for the few days of the show. We could certainly lock down the ports employees could access, restrict to their MAC addresses, or even put in authpf for them to authenticate, but that would mean maintaining a password file outside the corporate office, or duplicating the LDAP server, or setting up an IPSEC tunnel, all of which are excessive for a few days of conference.
</P>
<P>That's really all there is to it (other than some GENERATE statements in the zone files). Free, functional, easy, and secure by <A HREF="http://www.openbsd.org">OpenBSD</A>.
http://tech.mangot.com/roller/dave/entry/ec2_variability_the_numbers_revealedEC2 Variability: The numbers revealeddave2009-05-13T15:22:57-07:002009-05-13T17:01:23-07:00<H1>Measuring EC2 system performance</H1>
<P>I've been spending a lot of time at <A HREF="http://www.terracotta.org">Terracotta</A> working on cloud deployments of Terracotta and the cloud in general. People have been asking me what the difference is running apps on the cloud, and specifically EC2. There are a number of differences (m1.small is a uni-processor machine!) but the number one answer is "Variability". You just cannot rely on getting a consistent level of performance in the cloud. At least not that I've been able to observe.
</P>
<P>I decided to put EC2 to the test and examine three different areas that were easy to measure: disk I/O, latency, and bandwidth.</P>
<BR>
<H2> Disk I/O </H2>
<UL>
<LI>Environment: EC2 m1.small
<LI>File size: 10 MB (mmap() files)
<LI>Mount point: /mnt
<LI>Testing software: <A HREF="http://www.iozone.org">iozone</A>
<LI>Duration: almost 24 hours
</UL>
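<P>The setup above corresponds to an invocation along these lines (a sketch; see the iozone docs for the full flag list):</P>
<pre name=code class="brush: plain">
# automatic mode over all tests, 10 MB mmap()'d file, on the /mnt ephemeral disk
cd /mnt && iozone -a -B -s 10m
</pre>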
<A HREF="http://tech.mangot.com/roller/dave/resource/blogimages/ec2.iozone.png"><IMG SRC="http://tech.mangot.com/roller/dave/resource/blogimages/ec2.iozone.png" alt="IO Zone graph" height="300" width="550"></A>
<P>
As you can see, the numbers vary a good deal. This is on an otherwise completely quiescent virtual machine, and with a 10 MB filesize, the tests themselves took almost no time to run. Most of the numbers actually look remarkably consistent, with the exception of Random Reads. Those numbers are all over the place, which you might expect from "random", but this looks to be a bit much. The numbers are actually pretty respectable, comparable to a 7200 RPM SATA drive. Certainly not the kind of machine you would use for performance benchmarks, but if you threw enough instances at a clustering problem, you could certainly get the job done.
</P>
<BR>
<H2> Latency </H2>
<UL>
<LI>Environment: EC2 m1.small
<LI>Datacenter: us-east-1b
<LI>Testing software: <A HREF="http://oss.oetiker.ch/smokeping/">smokeping</A>
<LI>Duration: about 20 hours
</UL>
<IMG SRC="http://tech.mangot.com/roller/dave/resource/blogimages/latency.ec2.png" ALT="ec2 latency graph">
<P>
Here, where networking between instances is involved, things start to get a little more varied. The median RTT is 0.3506 ms, which is about 3 times the latency you would get on a typical gigabit ethernet network. You can see the numbers hover there for the most part, but there is a tremendous amount of variability around that number. Smokeping shows outliers around 2 ms, but I have seen numbers as high as 65 ms or worse in ad hoc tests. I don't know what happened at 4 a.m. on this graph, but I'm glad I wasn't running a production application at the time. If you look closely, you can also see a few instances of packet loss, which is something we don't usually experience on a production network. Again, this is on an otherwise quiescent machine. For comparison's sake, here is the smokeping graph between Terracotta's San Francisco and India offices, which is actually carrying a fair bit of traffic. This is a LAN to WAN comparison, so the numbers will not look as exaggerated because they are running on a different scale, but the EC2 instance shows more than 5 times the variability in latency, which we don't see on the WAN segment (or ever on any of my lab switches, for that matter).
</P>
<IMG SRC="http://tech.mangot.com/roller/dave/resource/blogimages/sf.india.vpn.png" ALT="sf to india vpn graph">
<BR>
<H2> Bandwidth </H2>
<UL>
<LI>Environment: EC2 m1.small
<LI>Datacenter: us-east-1b
<LI>Testing software: nttcp
<LI>Duration: about 24 hours
</UL>
<IMG SRC="http://tech.mangot.com/roller/dave/resource/blogimages/ec2.bw.png" alt="bandwidth graph" height="400" width="700">
<P>
In this graph, we can see that the gigabit connection between EC2 instances is hardly gigabit at all. The numbers may trend upwards of 600 Mbps on average, but they fluctuate pretty wildly between real gigabit and barely faster than a 100 Mbps connection. This was run on an otherwise quiescent machine. In a real "production" environment we would expect much more consistency, especially if trying to run performance numbers.
</P>
<BR>
<H2>Conclusions</H2>
<P>
It is pretty safe to say that there won't be any vendors publishing performance numbers of what they are able to achieve with their software running on the cloud unless they have no other choice. You can easily get much more consistent, faster numbers running on dedicated hardware. In our tests with Terracotta, we've seen that you just have to throw that much more hardware at the problem to get the same kinds of numbers. It stands to reason, however, as a uni-processor m1.small instance is just not going to be as powerful as our quad-core Xeons. In throwing more instances at the problem, you start to introduce more of the network into the equation, which, as you can see, is a rather variable quantity. Thankfully, the amount of latency introduced even in this case is no big deal for the <A HREF="https://www.terracotta.org/web/display/orgsite/JVM+Level+Clustering">Terracotta server</A>, so I've been having a pretty good time running bigger and bigger clusters in the cloud.
<P>
http://tech.mangot.com/roller/dave/entry/linksys_wet54g_a_consumer_productLinksys WET54G, a consumer product?dave2009-03-30T22:33:17-07:002009-05-13T13:59:56-07:00<P>I recently bought a consumer electronic device that I wanted to hook up to the Internet. This device came with a hard wired Ethernet port, but of course, I had no Ethernet cable where I needed to hook it up and I also had no desire to run one.
</P>
<P>
There are various devices on the market that can turn a wired Ethernet jack into a wireless one, and the one I chose was the Linksys WET54G.</P>
I chose this device for a few reasons:
<OL>
<LI>All my network devices at home are already Linksys
<LI>Seemed small and priced no higher than any similar devices
<LI>I could get it at my local computer store for the same price as ordering it online
</OL>
<P>
Like most technical folk, I did a lot of reading before purchasing the device. Most of the reviews on Amazon were extremely negative, but I feel like I'm pretty good at sorting through the reviews of the inexperienced vs. the reviews of the knowledgeable. Big mistake! <img src="http://tech.mangot.com/roller/images/smileys/smile.gif" class="smiley" alt=":)" title=":)" /></P>
<img src="/roller/dave/resource/blogimages/wet54g.netdiagram.png" />
<BR>
I got the device and learned that because it is version 3.1 of the product, there are no firmware updates available on the Cisco/Linksys website. It is already a newer revision than anything that is even listed. All that was needed was to plug it in and configure it according to the instructions.
Here's what happened.
<OL>
<LI>Take the bridge (the WET54G is technically a wireless to Ethernet bridge) and plug it into my Linksys mini-hub. Run the Linksys provided utility on a PC (yeah, I still have an ancient XP box kicking around). Bridge not detected. Power cycle the hub and bridge a few times. Nothing.
<LI>Notice that it says the PC and the bridge must be plugged into the same hub. Ohhhh, that must be it. Wire the PC to the hub with a cable. Bridge not detected. Power cycle the hub and bridge a few times, still nothing.
<LI>Ok, I figure, how bad could it be to just use the web interface? I look at the docs and supposedly the bridge will autoconfigure at 192.168.100.121 or some weird address like that. Fine, I reconfigure the NIC on the XP machine and soon am on the web interface for the bridge
<LI>A few minutes later, the bridge is all configured with IP, netmask, gateway, etc. I should be good to go. At this point, my PC says "Duplicate IP detected on the network". Hmmm.
<LI>I unplug the PC from the hub and reconfigure everything back to normal (i.e. wireless). The PC is still complaining about duplicate IP and I can no longer ping my default route. Something is fishy.
<LI>I plug my media device into the bridge as had been the plan all along, and the device instantly recognizes the network and says it needs a firmware update. Success! I tell it to get the update and it just hangs there, forever.
<LI>My wife says that her PC is saying duplicate device detected as well and she can't get on the Internet. Huh?
<LI>Fire up the Mac and I get on the firewall. /var/log/messages tells me that another device on the network is advertising itself as the default route's IP. I check the MAC address and sure enough, it's the bridge!
<LI>I get on the web interface for the bridge and change its default route to a bogus address on the network. Why would my bridge need to get out on the Internet anyway? Instantly, all the devices in the house start working correctly.
<LI>I configure a static IP address on the media device and it is able to access the Internet without problems. I update its firmware through a USB key anyway.
</OL>
<P>So now the network is running fine and I haven't had any of the other issues people had described in their reviews. But the question remains: How is this a consumer product? I've designed LANs and WANs for multiple companies. I've configured networks on machines all the way on the other side of the world. I was stumped for a good 20 minutes as to why my network was behaving like it was drunk. What would your average gadgethead have done aside from sit on the phone with Cisco tech support for hours? Would they have figured it out? Crazy.</P>
<B>Update:</B> Yesterday my bridge "lost its mind" and I will have to reconfigure it from scratch. What a piece of junk.http://tech.mangot.com/roller/dave/entry/choosing_zimbra_as_told_toChoosing Zimbra as told to ex-Taosers@groups.yahoodave2008-02-07T15:41:16-08:002008-02-07T15:44:03-08:00<P>We're running Zimbra in production and have been for almost a year
now. We're running Network Edition (paid) with Mobile.</P>
<P>Eh. It's ok.</P>
<P>Of course our philosophy is to let anyone run anything they want.
Zimbra would seem to support this philosophy but I sometimes wonder if
it tries to do too much.</P>
<P>Messaging is pretty rock solid. Other than some stupid bugs, like
showing 20 messages in your Drafts folder and then finding nothing
there when you click on it, it works fine. I've never heard about
any lost messages or anything like that.</P>
<P>Calendar is wonky, however. You can't set variable reminders for
meetings. That means you either get a 5 (or 10, or 15) minute warning
for every meeting you have, or nothing. So if you have a sales meeting
across town that you want a 30 minute reminder for, and a
budget meeting down the hall that you want a 5 minute reminder for, you
can't do it. It's one of the most voted-on bugs in bugzilla and has been
for a while, but they say they'll fix it, maybe in 5.5. Weird.</P>
<P>Plus, the calendar is just buggy. Meetings don't show up all the
time. If you change a meeting time, free/busy gets screwed up. I've
had a user whose iSync just stopped putting on new meetings twice.
All kinds of headaches.</P>
<P>Of course we support Blackberrys (via NotifyLink Inc.), Treos,
iPhones, the Outlook connector, the iSync connector. Basically, every bell
and whistle you can ring or blow in Zimbra, so sometimes it's user
error, but sometimes it's Zimbra.</P>
<P>Despite this, I was planning on running Zimbra for mangot.com mail
because my wife and I won't have a fit if we miss a meeting. I
installed 5.0 and they've changed from Tomcat to Jetty, which is nice,
but you still can't bind Zimbra to a single IP without hacking up all
the conf files and putting some changes in LDAP. In fact, I had to
change my hostname(1) during install temporarily just to get it on the
right track. Then I went around hacking up the files to keep it from
trying to hog the entire machine.</P>
<P>Of course, Zimbra will tell you that you need to dedicate the whole
machine to ZCS. You would think the fact that it uses so many open
source components means you could hack on it, but in reality most of
your changes will be lost each time you upgrade and you will need to
re-apply them (we do exactly that for Mailman integration). Until
they get more of the config into the LDAP server, that's just what
you're going to have to live with. I guess the fact that it is OSS means
that's at least an option.</P>
<P>On the plus side, it does have some nice Gee Whiz factor to it and we
are talking about setting up a Zimbra server at a remote office that
shares the same config as our current one and it's supposed to be very
easy, but as with everything else, the devil is probably in the
details. Like I said, I'm still planning on running it for my
personal stuff, so I don't hate it as much as it might have come
across. I don't know if I'd want to run it at my business again,
except for the fact that there really is no one else who comes close.</P>
<P>(except Exchange, but we're not an MS-only shop, and please don't give
me any of that it runs IMAP garbage, because we all know that's not a
real solution)</P>
Cheers,
-Davehttp://tech.mangot.com/roller/dave/entry/information_security_magazine_chuckleInformation Security Magazine Chuckledave2008-02-07T15:32:27-08:002008-02-07T15:35:11-08:00<P>This made me laugh. I get a "renew your subscription" notification from Information Security Magazine even though I've only received one issue. Fair enough, it's free. I fill out their form and click submit...and I get a (buffer?) overflow in Visual Basic on some Microsoft server.</P>
<P>Yeah, I'm really going to trust what these guys have to say now! <img src="http://tech.mangot.com/roller/images/smileys/smile.gif" class="smiley" alt=":)" title=":)" /></P>
<P>(yes, I need a new blog template, maybe after I upgrade)</P>
<IMG SRC="/roller/dave/resource/blogimages/infosec.overflow.png">http://tech.mangot.com/roller/dave/entry/a_sysadmin_s_impressions_ofA SysAdmin's impressions of MacOS Leoparddave2007-12-19T17:53:16-08:002007-12-19T17:53:16-08:00<P>I've had the chance to use Leopard for a few weeks now on my primary work machine, a 12" G4 Powerbook. The results have been mixed. On the whole, Leopard is very good, as good as Tiger (which is excellent) but there have been a few problems being an early adopter.</P>
<P>The aim in this post is to cover just a few of the experiences I've had, good and bad.</P>
<H1> The bad</H1>
<UL>
<LI>1st boot is a BSOD!
<P>Ok, not a real BSOD in the Windows sense, but the kernel panic'd and the machine would not boot. This is a known issue and Apple has <A HREF="http://docs.info.apple.com/article.html?artnum=306857"> a fix.</A> Still, it is mildly disconcerting to boot your new OS and get a big blue screen of nothing. We have a number of users on different Mac hardware and it's been very hit or miss as to who has been affected. The solution is easy, but I'd rather not see it at all.</P>
</LI>
<LI> The X11 server is buggy.
I'm not the only one who has noticed this: <A HREF="http://boredzo.org/blog/archives/2007-10-29/x11-on-leopard-is-broken">Boredzo.org</A>
<P>I've had two issues with the X11 server.
<OL>
<LI> The first is my dueling X11 dock icons. I need an X11 server to display remote X11 apps like the <A HREF="http://www.terracotta.org">Terracotta</A> admin console. In Leopard I have two icons in the dock, one that the OS thinks is in some weird state, because it offers to let me "Force Quit" the application. The other icon seems normal. "Force Quitting" the app has no effect, incidentally; it remains. Lovely.
<IMG SRC="/roller/dave/resource/blogimages/x11.dock.png" ALT="x11 doc">
<LI> While trying to use Wireshark it wouldn't start. A couple of searches and the problem turned up as a bug with X11.app discussed on the <A HREF="http://www.wireshark.org/lists/wireshark-users/200710/msg00156.html">Wireshark Mailing Lists</A>. The fix is simple, I don't care about having millions of colors in Wireshark, I'm just happy when it doesn't get my machine r00ted. Still, it worked without incident in Tiger.
</OL>
<LI> The new firewall configuration.
<P>This has been discussed ad nauseam (trust me) on the <A HREF="http://www.securityfocus.com/archive/142/description#0.1.1">focus-apple </A> Securityfocus list. In 10.5.0, Apple had a setting where you could tell the OS to "Block all incoming connections". Sounds great, who doesn't like default deny? The problem was, that setting didn't block all incoming connections. Not even close. Anything that ran as root allowed incoming connections, plus anything Apple deemed essential, like Rendezvous. The wording was updated for 10.5.1 to be a bit more accurate.</P>
<IMG SRC="/roller/dave/resource/blogimages/firewall.png" ALT="firewall">
<LI> X-Lite is straight up busted
<P> We rely on Asterisk and X-Lite at work so we don't have to buy desk phones for all the engineers, since they rarely spend much time on the phone. Plus, a USB headset makes it convenient to talk on the phone while typing. The folks over at CounterPath have basically told everyone who is using X-Lite to <A HREF="http://support.counterpath.net/viewtopic.php?p=45099&sid=4e7e3ed2a1ca2c91d3036814c2c16fdc">
stick it</A>. We have bought a number of copies of eyeBeam because we had a good experience with X-Lite; I thought that was their model. I guess times are hard. The best part is they say emphatically that it is a bug in Leopard, yet my <A HREF="http://www.sjlabs.com/">SJphone</A> and <A HREF="http://xmeeting.sourceforge.net/pages/index.php">Xmeeting</A> work perfectly fine. Hmm.
<LI> General bugginess and insanity
In no particular order
<UL>
<LI> My machine can't eject FireWire disks: it can mount them, but they can't check out. A FireWire roach motel?
<LI> I can put my machine to sleep, and once in a while it just will not come back, forcing a reboot.
<LI> Since upgrading to Leopard, my machine will say I have over an hour of battery life left and then shut down without warning. Checking the battery shows it is dead. I know I probably just need to zap the PRAM, but I'm usually too lazy or too busy to do that.
<LI> Apple file sharing gets totally confused. I mounted a drive using AFP and then tried to unmount it. The system showed the filesystem as mounted in the Finder, but it wasn't listed under /Volumes. I had to reboot to make it go away. Yuck.
</UL>
</UL>
</UL>
<H1> The good</H1>
<LI>ssh-agent integration with iTerm works!
<P>One big annoyance with Tiger was that when I started up ssh-agent(1), it was only recognized in sessions I started from the default terminal profile; none of my bookmarks worked. I haven't upgraded <A HREF="http://iterm.sourceforge.net">iTerm</A>, but I have upgraded to Leopard, and all of a sudden my bookmarks recognize my ssh-agent. Sweet! iTerm is one of my favorite Mac programs, by the way. Great terminal, great support for tabs. A sysadmin essential!
</P>
<LI> The Cisco AnyConnect client works fine.
<P>Ok, someone else I know who upgraded to Leopard had trouble until he went to the latest version. Mine worked fine however, which is pretty remarkable considering it's a new kernel. I'm not sure whether to give kudos to Cisco or Apple, but in either case, I was pleasantly surprised.</P>
<LI> Otool is ldd!
<P>Ok, I don't think this is a Leopard thing, but on every other operating system I care about, you use ldd to find out what libraries a dynamically linked binary uses. Not so on the Mac. I did discover that you can do the same thing with otool.<BR><BR>
I wish I had known this when I was trying to find out whether my psql Postgres client had SSL support built in! (It does.)<BR>
<pre><code>
dmangot-laptop:~ $ otool -L /sw/bin/psql
/sw/bin/psql:
        /sw/lib/libpq.5.0.dylib (compatibility version 5.0.0, current version 5.0.0)
        /usr/lib/libpam.1.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libssl.0.9.7.dylib (compatibility version 0.9.7, current version 0.9.7)
        /usr/lib/libcrypto.0.9.7.dylib (compatibility version 0.9.7, current version 0.9.7)
        /System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 5.0.0)
        /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.3)
        /sw/lib/libreadline.5.dylib (compatibility version 5.0.0, current version 5.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.1.6)
        /usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
</code></pre>
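If you bounce between Macs and other Unixes, the idea generalizes to a tiny sketch like the one below. This is my own illustration, not anything shipped with either tool: the <code>list_libs</code> helper name is invented, and it simply picks <code>otool -L</code> when it exists and falls back to <code>ldd</code> otherwise.

```shell
# Hypothetical helper: list the shared libraries a binary links against.
# Uses otool -L on macOS, ldd everywhere else.
list_libs() {
  if command -v otool >/dev/null 2>&1; then
    otool -L "$1"
  else
    ldd "$1"
  fi
}

# Example: check whether /bin/sh links against libc (grep for the
# library you actually care about, e.g. 'ssl' for the psql case above).
if list_libs /bin/sh | grep -q 'libc'; then
  echo "links libc"
else
  echo "no libc found"
fi
```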
</UL>
<H1>Conclusion</H1>
That's it. I know it sounds like the OS has a number of problems, but on the whole it is pretty stable and doesn't do many things that make you say, "What?!?", unlike Windows (is wireless networking really that hard?).
As I mentioned earlier, that's the price you pay for being an early adopter. I'm sure there are a ton of people out there waiting for Vista SP1 before even trying to install that beast. With Leopard there were very few gotchas for a dot-oh release, and the best part was that the upgrade hardly slowed me down.
<hr>
<H1><A HREF="http://tech.mangot.com/roller/dave/entry/worlds_collide_rmi_vs_linux">Worlds collide: RMI vs. Linux localhost</A></H1>
<p><em>dave, 2007-11-02</em></p>
<p>A few times in my career as a Unix sysadmin working with Java applications, I've run into situations that use RMI remotely, and each time there have been problems. The two most recent that come to mind are jstatd and the Terracotta admin console.</p>
<p>Because I now work at <A HREF="http://www.terracotta.org">Terracotta</A>, I've had a chance to do some testing to find out where the problem arises. We've seen this come up numerous times on our <A HREF="http://forums.terracotta.org/forums/posts/list/342.page">forums</A>, so much so that my buddy Gary, who explained much of what is going on "behind the scenes", has changed our default behavior in recent versions of the software.</p>
<p>When clients connect to the RMI server, the server returns the address of the JMX server to query for our statistics. On clients where hostname(1) is set to localhost, that is what gets returned. Of course, when the client is told to connect to <b>localhost</b> by the remote server, it fails, because the JMX server is really remote. It turns out this is a pretty common problem in the Java world; you can find many examples of users complaining about it in the <A HREF="http://forum.java.sun.com/thread.jspa?forumID=58&tstart=0&threadID=288759&trange=15">Sun Java Forums</A>.<br />
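<p>The failure is easy to reproduce outside of Java. A minimal shell sketch (my own illustration, not Terracotta's code, and it assumes a Linux box with getent(1)): resolve the machine's hostname the way the RMI server effectively does when deciding what address to embed in the stub, and flag it if the answer comes back on loopback.</p>

```shell
# Resolve the local hostname -- roughly what the RMI server does when it
# picks the address to embed in the stub it hands to remote clients.
addr=$(getent hosts "$(hostname)" | awk '{print $1; exit}')
echo "hostname resolves to: ${addr:-unresolvable}"

# If /etc/hosts maps the hostname to 127.0.0.1, remote clients are told
# to connect to loopback and the connection fails.
case "$addr" in
  127.*) echo "stub would advertise loopback; remote clients will fail" ;;
esac
```

<p>The usual workarounds on the Java side are to fix /etc/hosts so the hostname resolves to an externally reachable address, or to start the JVM with <b>-Djava.rmi.server.hostname</b> set to that address, which overrides what gets embedded in the stub.</p>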
</p><h4>When it works</h4>
<P>
To get the Terracotta server to use the RMI stub, we have to put it into authentication mode. Details on how to do this can be found in the <A HREF="http://www.terracotta.org/confluence/display/docs1/Configuration+Guide+and+Reference#ConfigurationGuideandReference-%2Ftc%3Atcconfig%2Fservers%2Fserver%2Fauthentication">
Terracotta Configuration Guide</A></P>
<p>To actually see what is going wrong, first we'll look at a situation where it works correctly.<br />
First we take a <b>tcpdump</b> of the client connecting to the server where the server actually has its hostname mapped correctly to its externally accessible IP address. We feed it into <b>Wireshark</b> (security holes be damned!) and do a <i>Follow TCP Stream</i>.</p>
<IMG SRC="/roller/dave/resource/blogimages/rmi.remotehost.png" HEIGHT=80% WIDTH=80% ALT="Remote host image">
<P>Here we can see that the RMI server has returned the IP address of the machine on which it runs.</P>
<h4>When it fails<br /></h4>
<p align="left">
Next we change the hostname to <b>localhost</b>. We then do a <i>Follow TCP Stream </i>and see this:</p>
<IMG SRC="/roller/dave/resource/blogimages/rmi.localhost.png" HEIGHT=80% WIDTH=80% ALT="localhost image">
<p align="left">The RMI server has returned <b>127.0.0.1</b> (localhost) to our remote client, and we get connection failed on the client.</p>
<h4>Why are the hostnames of these machines being set to localhost?</h4>
<p align="left">I think what is happening is that many of our users are testing out Terracotta on their desktops, or on machines built from the install CDs. In that case, Linux assumes the machine will receive an IP address via DHCP, so it doesn't hard-code an IP into /etc/hosts. The only entry in /etc/hosts winds up being localhost, and the hostname is set to match. We see this kind of behavior to varying degrees even with our automated installs. Our kickstarts and autoyasts tend to put the correct name, hostname, and IP address information into /etc/hosts, presumably pulling the address from DHCP. If there is no DHCP entry for a host, autoyast will put an entry with 127.0.0.2 in /etc/hosts, which is odd.</p>
<hr>
<H1><A HREF="http://tech.mangot.com/roller/dave/entry/hello_world">Hello World</A></H1>
<p><em>dave, 2007-10-19</em></p>
<p>Finally got the blogging software up and running. I'll be posting mostly about systems administration. There seems to be a dearth of people out there writing about SysAdmins and the tasks/problems/solutions they deal with every day. Hopefully I can find others who are doing the same. Maybe we'll fill the void left by <A HREF="http://www.sysadminmag.com">SysAdmin Magazine</A> going the way of the dodo.</p>