August 27, 2014

Jeff Savit: Best Practices for Oracle Solaris Network Performance with Oracle VM Server for SPARC

August 27, 2014 22:11 GMT
A new document has been published on OTN: "How to Get the Best Performance from Oracle VM Server for SPARC" by Jon Anderson, Pradhap Devarajan, Darrin Johnson, Narayana Janga, Raghuram Kothakota, Justin Hatch, Ravi Nallan, and Jeff Savit.

August 26, 2014

Darryl Gove: My schedule for JavaOne and Oracle Open World

August 26, 2014 06:04 GMT

I'm very excited to have got my schedule for Open World and JavaOne:

CON8108: Engineering Insights: Best Practices for Optimizing Oracle Software for Oracle Hardware
Venue / Room: Intercontinental - Grand Ballroom C
Date and Time: 10/1/14, 16:45 - 17:30

CON2654: Java Performance: Hardware, Structures, and Algorithms
Venue / Room: Hilton - Imperial Ballroom A
Date and Time: 9/29/14, 17:30 - 18:30

The first talk will be about some of the techniques I use when performance tuning software. We get very involved in looking at how Oracle software works on Oracle hardware. The things we do work for any software, but we have the advantage of good working relationships with the critical teams.

The second talk is with Charlie Hunt; it's a follow-on from the talk we gave at JavaOne last year. We got Rock Star awards for that, so the pressure's on a bit for this sequel. Fortunately there's still plenty to talk about when you look at how Java programs interact with the hardware, and how careful choices of data structures and algorithms can have a significant impact on delivered performance.

Anyway, I hope to see a bunch of people there. If you're reading this, please come and introduce yourself. If you don't make it, I'm looking forward to putting links to the presentations up.

August 21, 2014

Garrett D'Amore: It's time already

August 21, 2014 05:15 GMT
(Sorry for the political/religious slant this post takes... I've been trying to stay focused on technology, but sometimes events are simply too large to ignore...)

The execution of James Foley is just the latest.  But for me, it's the straw that broke the camel's back.

Over the past weeks, I've become extremely frustrated and angry.  The "radical Islamists" have become the single biggest threat to world peace since Hitler's Nazis.  And they are worse than the Nazis.  Which takes some doing.  (The Nazis "merely" exterminated Jews.  The Islamists want to exterminate everyone who doesn't believe exactly their own particular version of extreme religion.)

I'm not a Muslim.  I'm probably not even a Christian when you get down to it.  I do believe in God, I suppose.  And I do believe that God certainly didn't intend for one group of believers to exterminate another simply because they have different beliefs.

Parts of the Muslim world claim that ISIS and those of its ilk are a scourge, primarily, I think, because they are turning the rest of the world against Islam.  If that's true, then the entire Muslim world who rejects ISIS and radical fundamentalist Islam (and it's not clear to me that rejecting one is the same as the other) needs to come together and eliminate ISIS, and those who follow its beliefs or even sympathize with it. 

That hasn't happened.  I don't see a huge military invasion of ISIS territory by forces from Arabia, Indonesia, and other Muslim nations.  Why not?

I don't believe it is possible to be a peace loving person (Muslim or otherwise), and stand idly by (or advocate standing by) while the terrorist forces who want nothing more than to destroy the very fabric of human society work to achieve their evil ends.

Just as Nazi Germany and Imperial Japan were an Axis of Evil during our grandparents' generation, so now we have a new Axis of Evil that has taken root in the middle east.

It's time now to recognize that there is absolutely no chance for a peaceful coexistence with these people.  They are, frankly, subhuman, and their very existence is at odds with that of everyone everywhere else in the world.

It's time for those of us in civilized nations to stop with our petty nonsense bickering.  The actions taking place in Ukraine, unless you live there (and in many cases even if you do live there), are a diversion.  Putin and Obama need to stop their petty bickering, and cooperate to eliminate the real threat to civilization, which is radical Islam.

To be clear, I believe that the time has now come for the rest of the world to pick up and take action, where the Muslim world has failed.  We need to clean house.  We can no longer cite "freedom of expression" and "freedom of religion" as reasons to let imams recruit young men into death cults.  We must recognize these acts of incitement to terrorism for what they are, and the perpetrators have no more right to life and liberty than Charles Manson.

These are forces that seek to overthrow from within, by recruitment, by terrorism, or by any means they can.  These are forces that place no value on human life.  These are forces that are inimical to the very concept of civilization.

There can be no tolerance for them.  None, whatsoever. 

To be clear, I'm advocating that when a member of one of these organizations willingly self-identifies as such, we should simply kill them.  Wherever they are.  These are the enemy, and there is no separate battlefield, and they do not recognize "civilians" or "innocents"; therefore, like a cancer, radical Islam must be purged from the very earth, by any means necessary.

The militaries of the world should unite and work together to eradicate entrenched forces of radical Islam wherever they exist in the world.  This includes all those forms that practice Sharia law, where a man and woman can be stoned to death simply for marrying without parental consent, as well as those groups that seek to eliminate the state of Israel, that seek to kill those who don't believe exactly as they do, that would issue a fatwa demanding the death of a cartoonist simply for depicting their prophet, and those who seek to reduce women to the status of mere cattle.

To be clear, we have to do the hard work, all nations of the world, to eliminate this scourge, and eliminate it at its source.  Mosques where radicalism is preached must no longer be sanctuaries.  Schools where "teachers" train their students in the killing of Christians and Jews, and that their God demands the death of "unbelievers" and rewards suicide bombers with paradise, need to be recognized as the training camps they are.  Even if the students are women and children.

Your right to free speech and to religion does not trump my right to live.  Nor, by the way, does it trump my own rights to free speech and religion.

I suppose this means that we have to be willing to accept some losses in combat, in the fight against radicalism.  We also have to accept that "collateral damage" is inevitable.  As with rooting out a cancer, some healthy cells are going to be destroyed.  But these losses have to be endured if the entire organism that is civilization is to survive.

If this sounds like I'm a hawk, perhaps that's true.  I think, rather, I'm merely someone who wants to survive, and wants the world to be a place where my own children and grandchildren can live without having to endure a constant fear of nut jobs who want to kill them simply because they exist and think differently.

Btw, if Islam as a religion is to survive in the long run, it must see these forces purged.  Because otherwise the only end result becomes an all out war of survival between Muslims and the rest of the world.  And guess which side has the biggest armies and weapons? And who will be the biggest losers in a conflict between Muslims and everyone else?

So, it's time to choose a side.  There is no middle ground.  Radical Islam tolerates no neutrality.  So, what's it going to be?

As for me, I choose civilization and survival.  That means a world without radical Islam.  Period.

August 15, 2014

Jeff Savit: Best Practices - Top Ten Tuning Tips Updated

August 15, 2014 20:59 GMT
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly called Logical Domains). This is an update to a previous entry on the same topic.

Top Ten Tuning Tips - Updated

Oracle VM Server for SPARC is a high performance virtualization technology for SPARC servers. It provides native CPU performance without the virtualization overhead typical of hypervisors. The way memory and CPU resources are assigned to domains avoids problems often seen in other virtual machine environments, and there are intentionally few "tuning knobs" to adjust.

However, there are best practices that can enhance or ensure performance. This blog post lists and briefly explains performance tips and best practices that should be used in most environments. Detailed instructions are in the Oracle VM Server for SPARC Administration Guide. Other important information is in the Release Notes. (The Oracle VM Server for SPARC documentation home page is here.)

Big Rules / General Advice

Some important notes first:
  1. "Best practices" may not apply to every situation. There are often exceptions or trade-offs to consider. We'll mention them so you can make informed decisions. Please evaluate these practices in the context of your requirements. There is no one "best way", since there is no single solution that is optimal for all workloads, platforms, and requirements.
  2. Best practices, and "rules of thumb" change over time as technology changes. What may be "best" at one time may not be the best answer later as features are added or enhanced.
  3. Continuously measure, tune, and allocate resources to meet service level objectives. Once objectives are met, do something else - it's rarely worth trying to squeeze out the last bit of performance when performance objectives have been achieved.
  4. Standard Solaris tools and tuning apply in a domain or virtual machine just as on bare metal: the *stat tools, DTrace, driver options, TCP window sizing, /etc/system settings, and so on, apply here as well.
  5. The answer to many performance questions is "it depends". Your mileage may vary. In other words: there are few fixed "rules" that say how much performance boost you'll achieve from a given practice.

Despite these disclaimers, there is advice that can be valuable for providing performance and availability:

The Tips

  1. Keep firmware, Logical Domains Manager, and Solaris up to date - Performance enhancements are continually added to Oracle VM Server for SPARC, so staying current is important. For example, Oracle VM Server for SPARC 3.1 and 3.1.1 both added important performance enhancements.

    That also means keeping firmware current. Firmware is easy to "install once and forget", but it contains much of the logical domains infrastructure, so it should be kept current too. The Release Notes list minimum and recommended firmware and software levels needed for each platform.

    Some enhancements improve performance automatically just by installing the new versions. Others require administrators to configure and enable new features. The following items will mention them as needed.

  2. Allocate sufficient CPU and memory resources to each domain, especially control, I/O and service domains - This cannot be overemphasized. If a service domain is short on CPU, then all of its clients are delayed. Don't starve service domains!

    For the control domain and other service domains, use a minimum of 1 core (8 vCPUs) and 4GB or 8GB of memory for small workloads. Use two cores and 16GB of RAM if there is substantial I/O load. Be prepared to allocate more resources as needed. Don't think of this as "waste". To a large extent this represents CPU load to drive physical devices shifted from the guest domain to the service domain.

    Actual requirements must be based on system load: small CPU and memory allocations were appropriate with older, smaller LDoms-capable systems, but larger values are better choices for the demanding, higher scaled systems and applications now used with domains. Today's faster CPUs and I/O devices are capable of generating much higher I/O rates than older systems, and service domains must be suitably provisioned to support the load. Control domain sizing suitable for a T2000 or T5220 will not be enough for a T5-8 or an M6-32! I/O devices matter too: a 10GbE network device driven at line speed can consume an entire CPU core, so add another core to drive that.

    How can you tell if you need more resources in the service domain? Within the domain you can use vmstat, mpstat, and prstat to see if there is pent-up demand for CPU. Alternatively, issue ldm list or ldm list -l from the control domain. If you consistently see high CPU utilization, add more CPU cores. You might not be observing the peak loads, so add capacity proactively.

    Good news: you can dynamically add and remove CPUs to meet changing load conditions, even for the control domain. You should leave some headroom on the server so you can allocate resources as needed. Tip: Rather than leave "extra" CPU cores unassigned, just give them to the service domains. They'll make use of them if needed, and you can remove them if they are excess capacity that is needed for another domain.

    You can allocate CPU resources manually via ldm set-core or automatically with the built-in policy-based resource manager. That's a Best Practice of its own, especially if you have guest domains with peak and idle periods.

    The same applies to memory. Again, the good news is that standard Solaris tools like vmstat can be used to see if a domain is low on memory, and memory can also be added to or removed from a domain. Applications need the same amount of RAM to run efficiently in a domain as they do on bare metal, so no guesswork or fudge-factor is required. Logical domains do not oversubscribe memory, which avoids problems like unpredictable thrashing.

    In summary, add another core if ldm list shows that the control domain is busy. Add more RAM if you are hosting lots of virtual devices or running agents, management software, or applications in the control domain and vmstat -p shows that you are short on memory. Both can be done dynamically without an outage, as in the sketch below.
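
    For illustration, here is a minimal sketch of checking utilization from the control domain and adding resources on the fly (the domain names and sizes are hypothetical):

    $ ldm list                  # check the UTIL column for busy domains
    $ ldm add-core 1 primary    # give the control domain one more core
    $ ldm add-mem 4G primary    # and 4GB more memory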

  3. Allocate domains on core boundaries - SPARC servers supporting logical domains have multiple CPU cores with 8 CPU threads each. (The exception is that Fujitsu M10 SPARC servers have 2 CPU threads per core. The considerations are similar, just substitute "2" for "8" as needed.) Avoid "split core" situations in which CPU cores are shared by more than one domain (different domains with CPU threads on the same core). This can reduce performance by causing "false cache sharing" in which domains compete for a core's Level 1 cache. The impact on performance is highly variable, depending on the domains' behavior.

    Split core situations are easily avoided by always assigning virtual CPUs in multiples of 8 (ldm set-vcpu 8 mydomain or ldm add-vcpu 24 mydomain). It is rarely good practice to give tiny allocations of 1 or 2 virtual CPUs, and definitely not for production workloads. If fine-grain CPU granularity is needed for multiple applications, deploy them in zones within a logical domain for sub-core resource control.

    The best method is to use the whole core constraint to assign CPU resources in increments of entire cores (ldm set-core 1 mydomain or ldm add-core 3 mydomain). The whole-core constraint requires a domain be given its own cores, or the bind operation will fail. This prevents unnoticed sub-optimal configurations, and also enables the critical thread optimization discussed below in the section Single Thread Performance. (A short sketch appears at the end of this tip.)

    In most cases the logical domain manager avoids split-core situations even if you allocate fewer than 8 virtual CPUs to a domain. The manager attempts to allocate different cores to different domains even when partial core allocations are used. It is not always possible, though, so the best practice is to allocate entire cores.

    For a slightly lengthier writeup, see Best Practices - Core allocation.
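
    As promised above, a brief sketch of whole-core allocation (the domain name is hypothetical; the ldm list -o core form is my assumption for verifying the resulting assignments):

    $ ldm set-core 2 mydomain       # constrain the domain to two whole cores
    $ ldm bind mydomain             # fails if two dedicated cores are unavailable
    $ ldm list -o core mydomain     # verify which physical cores were assigned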

  4. Use Solaris 11 in the control and service domains - Solaris 11 contains functional and performance improvements over Solaris 10 (some will be mentioned below), and will be where future enhancements are made. It is also required to use Oracle VM Manager with SPARC. Guest domains can be a mixture of Solaris 10 and Solaris 11, so there is no problem doing "mix and match" regardless of which version of Solaris is used in the control domain. It is a best practice to deploy Solaris 11 in the control domain even if you haven't upgraded the domains running applications.
  5. NUMA latency - Servers with more than one CPU socket, such as a T4-4, have non-uniform memory access (NUMA) latency between CPUs and RAM. "Local" memory access from CPUs on the same socket has lower latency than "remote". This can have an effect on applications, especially those with large memory footprints that do not fit in cache, or are otherwise sensitive to memory latency.

    Starting with release 3.0, the logical domains manager attempts to bind domains to CPU cores and RAM locations on the same CPU socket, making all memory references local. If this is not possible because of the domain's size or prior core assignments, the domain manager tries to distribute CPU core and RAM equally across sockets to prevent an unbalanced configuration. This optimization is automatically done at domain bind time, so subsequent reallocation of CPUs and memory may not be optimal. Keep in mind that this does not apply to single board servers, like a T4-1. In many cases, the best practice is to do nothing special.

    To further reduce the likelihood of NUMA latency, size domains so they don't unnecessarily span multiple sockets. This is unavoidable for very large domains that need more CPU cores or RAM than are available on a single socket, of course.

    If you must control this for the most stringent performance requirements, you can use "named resources" to allocate specific CPU and memory resources to the domain, using commands like ldm add-core cid=3 ldm1 and ldm add-mem mblock=PA-start:size ldm1. This technique is successfully used in the SPARC Supercluster engineered system, which is rigorously tested on a fixed number of configurations. This should be avoided in general purpose environments unless you are certain of your requirements and configuration, because it requires model-specific knowledge of CPU and memory topology, and increases administrative overhead.

  6. Single thread CPU performance - Starting with the T4 processor, SPARC servers can use a critical threading mode that delivers the highest single thread performance. This mode uses out-of-order (OOO) execution and dedicates all of a core's pipeline and cache resource to a software thread. Depending on the application, this can be several times faster than in the normal "throughput mode".

    Solaris will generally detect threads that will benefit from this mode and "do the right thing" with little or no administrative effort, whether in a domain or not. To explicitly set this for an application, set its scheduling class to FX with a priority of 60 or more. Several Oracle applications, like Oracle Database, automatically leverage this capability to get performance benefits not available on other platforms, as described in the section "Optimization #2: Critical Threads" in How Oracle Solaris Makes Oracle Database Fast. That's a serious example of the benefits of the combined software/hardware stack's synergy. An excellent writeup can be found in Critical Threads Optimization in the Observatory blog.

    This doesn't require setup at the logical domain level other than to use whole-core allocation, and to provide enough CPU cores so Solaris can dedicate a core to its critical applications. Consider that a domain with one full core or less cannot dedicate a core to 1 CPU thread, as it has other threads to dispatch. The chances of having enough cores to provide dedicated resources to critical threads get better as more cores are added to the domain, and this works best in domains with 4 or more cores. Other than that, there is little you need to do to enable this powerful capability of SPARC systems (tip of the hat to Bob Netherton for enlightening me on this area).

    Mentioned for completeness' sake: there is also a deprecated command to control this at the domain level by using ldm set-domain threading=max-ipc mydomain, but this is generally unnecessary and should not be done.
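
    To explicitly place a critical process in the FX class at a priority of 60, as described above, priocntl can be used; a minimal sketch (the process ID is hypothetical):

    $ priocntl -s -c FX -m 60 -p 60 -i pid 1234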

  7. Live Migration - Live migration is CPU intensive in the control domain of the source (sending) host. You must configure at least 1 core for the control domain in all cases, but an additional core will speed migration and reduce suspend time. The core can be added just before starting migration and removed afterwards (see the sketch at the end of this tip). If the machine is older than a T4, add crypto accelerators to the control domains. No such step is needed on later machines.

    Live migration also adds CPU load in the domain being migrated, so it's best to perform migrations during low activity periods. Guests that heavily modify their memory take more time to migrate since memory contents have to be retransmitted, possibly several times. The overhead of tracking changed pages also increases guest CPU utilization.

    Remember that live migration is not the answer to all questions. Some other platforms lack the ability to update system software without an outage, so they require "evacuating" the server via live migration. With Oracle VM Server for SPARC you should always have an alternate service domain for production systems, and then you can do "rolling upgrades" in place without having to evacuate the box. For example, you can pkg update Solaris in both the control domain and the service domains at the same time during normal operational hours, and then reboot them one at a time into the new Solaris level. While one service domain reboots, all I/O proceeds through the alternate, and you can cycle through all the service domains without any loss in application availability. Oracle VM Server for SPARC reduces the number of use cases in which live migration is the only answer.
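
    Here is a hedged sketch of the flow described above (host and domain names are hypothetical): temporarily boost the control domain, migrate, then remove the extra core:

    $ ldm add-core 1 primary
    $ ldm migrate-domain mydomain root@target-host
    $ ldm rm-core 1 primary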

  8. Network I/O - Configure aggregates, use multiple network links, and adjust TCP windows and other system settings the same way and for the same reasons as you would in a non-virtual environment. (A short sketch appears at the end of this tip.)

    Use RxDring support to substantially reduce network latency and CPU utilization. To turn this on, issue ldm set-domain extended-mapin-space=on mydomain for each of the involved domains. The domains must run Solaris 11 or Solaris 10 update 10 and later, and the involved domains (including the control domain) will require a domain reboot for the change to take effect. This also requires 4MB of RAM per guest.

    If you are using a Solaris 10 control or service domain for virtual network I/O, then it is important to plumb the virtual switch (vsw) as the network interface and not use the native NIC or aggregate (aggr) interface. If the native NIC or aggr interface is plumbed, there can be a performance impact since each packet may be duplicated to provide a packet to each client of the physical hardware. Avoid this by not plumbing the NIC and only plumbing the vsw. The vsw doesn't need to be plumbed either unless the guest domains need to communicate with the service domain. This isn't an issue for Solaris 11 - another reason to use that in the service domain. (Thanks to Raghuram for the great tip.)

    As an alternative to virtual network I/O, use Direct I/O (DIO) or Single Root I/O Virtualization (SR-IOV) to provide native-level network I/O performance. With physical I/O, there is no virtualization overhead at all, which improves bandwidth and latency, and eliminates load in the service domain. They currently have two main limitations: they cannot be used in conjunction with live migration, and they introduce a dependency on the domain owning the bus containing the SR-IOV physical device. In exchange, they provide superior performance. SR-IOV is described in an excellent blog article by Raghuram Kothakota.

    For the ultimate performance for large application or database domains, you can use a PCIe root complex domain for completely native performance for network and any other devices on the bus.
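
    As promised above, a sketch of the aggregate and virtual switch setup for a Solaris 11 service domain, plus enabling the extended mapin space (link, switch, and domain names are hypothetical):

    $ dladm create-aggr -l net0 -l net1 aggr0
    $ ldm add-vsw net-dev=aggr0 primary-vsw0 primary
    $ ldm set-domain extended-mapin-space=on mydomain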

  9. Disk I/O - For best performance, use a whole disk backend (a LUN or full disk). Use multiple LUNs to spread load across virtual and physical disks and reduce queueing (just as you would do in a non-virtual environment). Flat files in a file system are convenient and easy to set up as backends, but offer lower performance.

    Starting with Oracle VM Server for SPARC 3.1.1, you can also use SR-IOV for Fibre Channel devices, with the same benefits as with networking: native I/O performance. For completely native performance for all devices, use a PCIe root complex domain and exclusively use physical I/O.

    ZFS can also be used for disk backends. This provides flexibility and useful features (clones, snapshots, compression) but can impose overhead compared to a raw device. Note that local or SAN ZFS disk backends preclude live migration, because a zpool can be mounted to only one host at a time. When using ZFS backends for virtual disk, use a zvol rather than a flat file - it performs much better. Also, make sure that the ZFS recordsize for the ZFS dataset matches the application (again, just as in a non-virtual environment). This avoids read-modify-write cycles that inflate I/O counts and overhead. The default of 128K is not optimal for small random I/O. (A sketch follows.)
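
    For the zvol case, a minimal sketch (pool, volume, and domain names are hypothetical) that creates a backend with a block size matched to the application and exports it as a virtual disk:

    $ zfs create -V 64g -o volblocksize=8k rpool/vol0
    $ ldm add-vdsdev /dev/zvol/rdsk/rpool/vol0 vol0@primary-vds0
    $ ldm add-vdisk vdisk0 vol0@primary-vds0 mydomain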

  10. Networked disk on NFS and iSCSI - NFS and iSCSI can also perform quite well if an appropriately fast network is used. Apply the same network tuning you would use in non-virtual environments. For NFS, specify mount options to disable atime, use hard mounts, and set large read and write sizes (see the sketch at the end of this tip).

    If the NFS and iSCSI backends are provided by ZFS, such as in the ZFS Storage Appliance, provide lots of RAM for buffering, and install write-optimized solid-state disk (SSD) "logzilla" ZFS Intent Logs (ZIL) to speed up synchronous writes.
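
    A hedged sketch for the NFS case (the server name, paths, and transfer sizes are illustrative; here atime is disabled on the ZFS server's shared dataset rather than at the client):

    $ mount -F nfs -o hard,rsize=32768,wsize=32768 nas:/export/vdisks /vdisks
    $ zfs set atime=off tank/export/vdisks    # on the ZFS storage server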

Summary

By design, logical domains don't have a lot of "tuning knobs", and many tuning practices you would do for Solaris in a non-domained environment apply equally when domains are used. However, there are configuration best practices and tuning steps you can use to improve performance. This blog note itemizes some of the most effective (and least exotic) performance best practices.

Darryl Gove: Providing feedback on the Solaris Studio 12.4 Beta

August 15, 2014 16:55 GMT

Obviously, the point of the Solaris Studio 12.4 Beta programme was for everyone to try out the new version of the compiler and tools, and for us to gather feedback on what was working, what was broken, and what was missing. We've had lots of useful feedback - you can see some of it on the forums. But we're after more.

Hence we have a Solaris Studio 12.4 Beta survey where you can tell us more about your experiences. Your comments are really helpful to us. Thanks.

August 14, 2014

Joerg Moellenkamp: SPARC M7

August 14, 2014 10:29 GMT
A really interesting article about SPARC M7: Oracle Cranks Up The Cores To 32 With Sparc M7 Chip

July 31, 2014

Joerg Moellenkamp: Solaris 11.2 released

July 31, 2014 16:25 GMT
Solaris 11.2 has just been released. No beta, the real thing! You can download it here.

July 22, 2014

Jeff Savit: Announcing Oracle VM 3.3

July 22, 2014 15:25 GMT
Oracle VM 3.3 was announced today, providing substantial enhancements to Oracle's server virtualization product family. I'll focus on a few enhancements to Oracle VM Manager support for SPARC that will appeal to SPARC users:
  1. Improved storage support: The original Oracle VM Manager support for SPARC systems only supported NFS storage. While Oracle VM Server for SPARC has long supported other storage types (local disk, SAN LUNs, iSCSI), the support in the Manager did not. This restriction has been eliminated, so customers can use Oracle VM Manager with SPARC systems with their preferred storage types.
  2. Alternate Service Domain: A Best Practice for SPARC virtualization is to configure multiple service domains for resiliency. This was also not supported when under the control of Oracle VM Manager, but is now available. Customers can control their SPARC servers with Oracle VM Manager while using the recommended high availability configuration.
  3. Improved console: Oracle VM Manager provides a way to access the guest domain console without logging into the server's control domain. In Oracle VM Manager 3.2 this was provided by a Java remote access application that depended on Java WebStart, and required that the correct software be installed and configured on the client's desktop. The new virtual console just requires a web browser that correctly supports the HTML5 standards. The new console is more robust and launches much more quickly.
  4. Oracle VM High Availability (HA) support: This release adds SPARC support for Oracle VM HA. Servers in a pool of SPARC servers can be clustered, and VMs can be enabled for HA. If a server is restarted or shutdown, then HA-enabled VMs are migrated or restarted on other servers in the pool.

There are many other enhancements, and in general the other improvements in 3.3 are beneficial to SPARC systems too, but these are the top ones that stand out for SPARC customers.

For a video demonstrating this in action, please see Oracle VM Manager 3.3.1 with Oracle VM Server for SPARC

Installation/Documents

After posting this, I was asked how to install the Oracle VM Server agent on a SPARC system, and how to set up Oracle VM HA clustering. The basic flow is to install the Oracle VM Server agent on a control domain running Solaris 11.1 and Oracle VM Server for SPARC 3.1 or later, optionally installing the Distributed Lock Manager (DLM) first if you plan to use HA features.

Here are direct links to the software and documents:

July 12, 2014

Garrett D'Amore: POSIX 2008 locale support integrated (illumos)

July 12, 2014 03:54 GMT
A year in the making... and finally the code is pushed.  Hooray!

I've just pushed 2964 into illumos, which adds support for a bunch of new libc calls for thread safe and thread-specific locales, as well as explicit locale_t objects.   Some of the interfaces added fall under the BSD/MacOS X "xlocale" class of functions.

Note that not all of the xlocale functions supplied by BSD/MacOS are present.  However, all of the routines that were added by POSIX 2008 for this class are present, and should conform to the POSIX 2008 / XPG Issue 7 standards.  (Note that we are not yet compliant with POSIX 2008; this is just a first step -- albeit a rather major one.)

The webrev is also available for folks who want to look at the code.

The new APIs are documented in newlocale(3c), uselocale(3c), etc.   (Sadly, man pages are not indexed yet so I can't put links here.)

Also, documentation that was missing for some APIs (e.g. strfmon(3c)) has now been added.

This project has taken over a year to integrate, but I'm glad it is now done.

I want to say a big -- huge -- thank you to Robert Mustacchi, who not only code reviewed a huge amount of change (and provided numerous pieces of useful and constructive feedback), but also contributed a rather large swath of man page content in support of this effort, working in his own spare time.  Thanks Robert!

Also, thanks to both Gordon Ross and Dan McDonald who also contributed useful review feedback and facilitated the integration of this project.  Thanks guys!

Any errors in this effort are mine, of course.  I would be extremely interested in hearing constructive feedback.  I expect there will be some performance impact (unavoidable due to the way the standards were written, which requires a thread-specific check in all locale-sensitive routines), but I hope it will be minor.

I'm also extremely interested in feedback from folks who are making use of these new routines.  I'm told the Apache Standard C++ library depends on these interfaces -- I hope someone will try it out and let me know how it goes.   Also, if someone wants/needs xlocale interfaces that I didn't include in this effort, please drop me a note and I'll try to get to it.

As this is a big change, it is not entirely without risk.  I've done what I could to minimize that risk, and test as much as I could.  If I missed something, please let me know, and I'll attempt to fix in a timely fashion.

Thanks!

July 11, 2014

Darryl Gove: Studio 12.4 Beta Refresh, performance counters, and CPI

July 11, 2014 21:12 GMT

We've just released the refresh beta for Solaris Studio 12.4 - free download. This release features quite a lot of changes to a number of components. It's worth calling out improvements in the C++11 support and other tools. We've had a few comments and posts on the Studio forums, and a bunch of these have resulted in improvements in this refresh.

One of the features that is deserving of greater attention is default hardware counters in the Performance Analyzer.

Default hardware counters

There are a lot of potential hardware counters that you can profile your application on. Some of them are easy to understand, some require a bit more thought, and some are delightfully cryptic (for example, I'm sure that op_stv_wait_sxmiss_ex means something to someone). Consequently most people don't pay them much attention.

On the other hand, some of us get very excited about hardware performance counters, and the information that they can provide. It's good to be able to reveal that we've made some steps along the path of making that information more generally available.

The new feature in the Performance Analyzer is default hardware counters. For most platforms we've selected a set of meaningful performance counters. You get these if you add -h on to the flags passed to collect. For example:

$ collect -h on ./a.out

Using the counters

Typically the counters will gather cycles, instructions, and cache misses - these are relatively easy to understand and often provide very useful information. In particular, given a count of instructions and a count of cycles, it's easy to compute Cycles per Instruction (CPI) or Instructions per Cycle (IPC).
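
For example (numbers purely illustrative), a routine that consumed 1,000,000 cycles while issuing 250,000 instructions has a CPI of 4 - equivalently, an IPC of 0.25.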

I'm not a great fan of CPI or IPC as absolute measurements - working in the compiler team there are plenty of ways to change these metrics by controlling the I (instructions) when I really care most about the C (cycles). But, the two measurements have a very useful purpose when examining a profile.

A high CPI means lots of cycles were spent somewhere, and very few instructions were issued in that time. This means lots of stall, which means that there's some potential for performance gains. So a good rule of thumb for where to focus first is routines that take a lot of time, and have a high CPI.

IPC is useful for a different reason. A processor can issue a maximum number of instructions per cycle. For example, a T4 processor can issue two instructions per cycle. If I see an IPC of 2 for one routine, I know that the code is not stalled, and is limited by instruction count. So when I look at code with a high IPC I can focus on optimisations that reduce the instruction count.

So both IPC and CPI are meaningful metrics. Reflecting this, the Performance Analyzer will compute the metrics if the hardware counter data is available. Here's an example:


This code was deliberately contrived so that all the routines had ludicrously high CPI. But isn't that cool - I can immediately see what kinds of opportunities might be lurking in the code.

This is not restricted to just the functions view: CPI and/or IPC are presented in every view, so you can look at CPI for each thread, line of source, or line of disassembly. Of course, as the counter data gets spread over more "lines" you have less data per line, and consequently more noise. So CPI data at the disassembly level is not likely to be that useful for very short running experiments. But when aggregated, the CPI can often be meaningful even for short experiments.

July 01, 2014

Steve Tunstall: New ZS3-2 benchmark

July 01, 2014 15:28 GMT

Oracle released a new SPC2 benchmark today, which you can find on the Storage Performance Council website here: http://www.storageperformance.org/results/benchmark_results_spc2_active

As you can see, the ZS3-2 gave excellent results, with the best price/performance ratio on the entire website, and the third fastest score overall. Does the Kaminario still beat it on speed? Yep it sure does. However, you can buy FIVE Oracle ZS3-2 systems for the same price as the Kaminario.  :)

Storage Performance Council SPC2 Results

System                     SPC-2 MBPS™   SPC-2 Price-Performance   ASU Capacity (GB)   Total Price      Data Protection Level   Date Submitted
Kaminario K2               33,477.03     $29.79                    60,129.00           $997,348.00      Raid 10                 11/1/2013
HDS VSP                    13,147.87     $95.38                    129,111.99          $1,254,093.30    Raid 5                  9/1/2012
IBM DCS3700                4,018.59      $34.96                    14,374.22           $140,474.00      Raid 6                  3/1/2013
SGI InfiniteStorage 5600   8,855.70      $15.97                    28,748.43           $141,392.86      Raid 6                  5/1/2013
HP P9500 XP                13,147.87     $88.34                    129,111.99          $1,161,503.90    Raid 5                  3/7/2012
Oracle ZS3-4               17,244.22     $22.53                    31,610.96           $388,472.03      Raid 10                 9/1/2013
Oracle ZS3-2               16,212.66     $12.08                    24,186.84           $195,915.62      Raid 10                 6/1/2014

Results found on http://www.storageperformance.org/results/benchmark_results_spc2_active

June 23, 2014

Darryl Gove: Presenting at JavaOne and Oracle Open World

June 23, 2014 21:11 GMT

Once again I'll be presenting at Oracle Open World, and JavaOne. You can search the full catalogue on the web. The details of my two talks are:

Engineering Insights: Best Practices for Optimizing Oracle Software for Oracle Hardware [CON8108]

Oracle Solaris Studio is an indispensable toolset for optimizing key Oracle software running on Oracle hardware. This presentation steps through a series of case studies from real Oracle applications, illustrating how the various Oracle Solaris Studio development tools have proven instrumental in ensuring that Oracle software is fully tuned and optimized for Oracle hardware. Learn the secrets of how Oracle uses these powerful compilers and performance, memory, and thread analysis tools to write optimal, well-tested enterprise code for Oracle hardware, and hear about best practices you can use to optimize your existing applications for the latest Oracle systems.

Java Performance: Hardware, Structures, and Algorithms [CON2654]

Many developers consider the deployment platform to be a black box that the JVM abstracts away. In reality, this is not the case. The characteristics of the hardware do have a measurable impact on the performance of any Java application. In this session, two Java Rock Star presenters explore how hardware features influence the performance of your application. You will not only learn how to measure this impact but also find out how to improve the performance of your applications by writing hardware-friendly code.

June 20, 2014

Darryl Gove: What's happening

June 20, 2014 17:50 GMT

I've been isolating a behaviour difference, and used a couple of techniques to get traces of process activity. First off, tracing bash scripts by explicitly starting them with bash -x. For example, here's some tracing of xzless:

$ bash -x xzless
+ xz='xz --format=auto'
+ version='xzless (XZ Utils) 5.0.1'
+ usage='Usage: xzless [OPTION]... [FILE]...
...

Another favourite tool is truss, which does all kinds of amazing tracing. In this instance all I needed to do was to see what other commands were started, using -f to follow forked processes and -t execve to show calls to execve:

$ truss -f -t execve jcontrol
29211:  execve("/usr/bin/bash", 0xFFBFFAB4, 0xFFBFFAC0)  argc = 2
...

June 17, 2014

Adam Leventhal: Lessons from a decade of blogging

June 17, 2014 09:24 GMT

I started my blog June 17, 2004, tempted by the opportunity of Sun’s blogging policy, and cajoled by Bryan Cantrill’s presentation to the Solaris Kernel Team “Guerrilla Marketing” (net: Sun has forgotten about Solaris so let’s get the word out). I was a skeptical blogger. I even resisted the contraction “blog”, insisting on calling it “Adam Leventhal’s Weblog” as if linguistic purity would somehow elevate me above the vulgar blogspotter opining over toothpaste brands. (That linguistic purity did not, however, carry over into my early writing — my goodness it was painful to open that unearthed time capsule.)

A little about my blog. When I started blogging I was worried that I’d need to post frequently to build a readership. That was never going to happen. Fortunately aggregators (RSS feeds then; Twitter now) and web searches are far more relevant. My blog is narrow. There’s a lot about DTrace (a technology I helped develop), plenty in the last four years about Delphix (my employer), and samplings of flash memory, Galois fields, RAID, and musings on software and startups. The cumulative intersection consists of a single person. But — and this is hard to fathom — I’ve hosted a few hundred thousand unique visitors over the years. Aggregators pick up posts soon after posting; web searches drive traffic for years even on esoteric topics.

Ten years and 172 posts later, I wanted to see what lessons I could discern. So I turned to Google Analytics.

Most popular

3. I was surprised to see that my posts on double- and triple-parity RAID for ZFS have been among the most consistently read over the years since posting in 2006 and 2009 respectively. The former is almost exclusively an explanation of abstract algebra that I was taught in 2000, applied in 2006, and didn't understand properly until 2009 — when I wrote the post. The latter is catharsis from discovering errors in the published basis for our RAID implementation. I apparently considered it a personal affront.

2. When Oracle announced their DTrace port to Linux in 2011, a pair of posts broke the news and then deflated expectations — another personal affront — as the Oracle Linux efforts fell short (and continue to today). I had learned the lesson earlier that DTrace + a more popular operating system always garnered more interest.

1. In 2008 I posted about a defect in Apple's DTrace implementation that was the result of its paranoid DRM protection. This was my perfect storm of blogging popularity: DTrace, a more popular OS (Mac OS X!), Apple-bashing, and DRM! The story was snapped up by Slashdot (Reddit of the mid-2000s) as "Apple Crippled Its DTrace Port" and by The Register's Ashlee Vance (The Register's Chris Mellor of the mid-2000s) as "Apple cripples Sun's open source jewel: Hollywood love inspires DTrace bomb." It's safe to say that I'm not going to see another week with 49,312 unique visitors any time soon. And to be clear I'm deeply grateful to that original DTrace team at Apple — the subject of a different post.

And many more…

Some favorites of mine and of readers (views, time on site, and tweets) over the years:

2004 Solaris 10 11-20. Here was a fun one. Solaris 10 was a great release. Any of the top ten features would have been the headliner in a previous release so I did a series on some of the lesser features that deserved to make the marquee. (If anyone would like to fill in number 14, dynamic System V IPC, I’d welcome the submission.)

2004 Inside nohup -p. The nohup command had remained virtually untouched since being developed at Bell Labs by the late Joseph Ossanna (described as "a peach and a ramrod"). I enjoyed adding some 21st century magic, and suffocating the reader with the details.

2005 DTrace is open. It truly was an honor to have DTrace be the first open source component of Solaris. That I took the opportunity to descend to crush depth was a testament to the pride I took in that code. (tsj and Kamen, I’m seeing your comments now for the first time and will respond shortly.)

2005 Sanity and FUD. This one is honestly adorable. Only a naive believer could have been such a passionate defender of what would become Oracle Solaris.

2005 DTrace in the JavaOne Keynote. It was a trip to present to over 10,000 people at Moscone. I still haven’t brought myself to watch the video. Presentation tip: to get comfortable speaking to an audience of size N simply speak to an audience of size 10N.

2005 The mysteries of _init. I geeked out about some of the voodoo within the linker. And I’m glad I did because a few weeks ago that very post solved a problem for one of my colleagues. I found myself reading the post with fascination (of course having forgotten it completely).

2008 Hybrid Storage Pools in CACM. In one of my first published articles, I discussed how we were using flash memory — a niche product at the time — as a component in enterprise storage. Now, of course, flash has always been the obvious future of storage; no one had yet realized that at the time.

2012 Hardware Engineer. At Fishworks (building the ZFS Storage Appliance at Sun) I got the nickname “Adam Leventhal, Hardware Engineer” for my preternatural ability to fit round pegs in square holes; this post catalogued some of those experiments.

2013 The Holistic Engineer. My thoughts on what constitutes a great engineer; this has become a frequently referenced guidepost within Delphix engineering.

2013 Delphix plus three years. Obviously I enjoy anniversaries. This was both a fun one to plan and write, and the type of advice I wish I had taken to heart years ago.

You said something about lessons?

The popularity of those posts about DTrace for Mac OS X and Linux had suggested to me that controversy is more interesting than data. While that may be true, I think the real driver was news. With most tech publications regurgitating press releases, people appreciate real investigation and real analysis. (Though Google Analytics does show that popularity is inversely proportional to time on site i.e. thorough reading.)

If you want people to read (and understand) your posts, run a draft through one of those online grade-level calculators. Don’t be proud of writing at a 12th grade level; rewrite until 6th graders can understand. For complex subjects that may be difficult, but edit for clarity. Simpler is better.

Everyone needs an editor. I find accepting feedback to be incredibly difficult — painful — but it yields a better result. Find someone you trust to provide the right kind of feedback.

Early on blogging seemed hokey. Today it still can feel hokey — dispatches that feel directed at no one in particular. But I’d encourage just about any engineer to start a blog. It forces you to organize your ideas in a different and useful way, and it connects you with the broader community of users, developers, employees, and customers. For the past ten years I’ve walked into many customers who now start the conversation aware of topics and technology I care about.

Finally, reading those old blog posts was painful. I got (slightly) better the only way I knew how: repetition. Get the first 100 posts out of the way so that you can move on to the next 100. Don’t worry about readership. Don’t worry about popularity. Interesting content will find an audience, but think about your reader. Just start writing.

June 16, 2014

Jeff Savit: Virtual Disk Performance Improvement for Oracle VM Server for SPARC

June 16, 2014 15:25 GMT
A new Solaris update dramatically improves performance for virtual disks on Oracle VM Server for SPARC. Prior enhancements improved virtual network performance, and now the same has been done for disk I/O. Now, Oracle VM Server for SPARC can provide the flexibility of virtual I/O with near-native performance.

The background

First, a quick review of some performance points, the same ones I discuss in tuning tips posts:

Oracle VM Server for SPARC could provide excellent performance for virtual networks (in particular, since the virtual network performance enhancement was delivered). It could provide "good" performance for disk, given appropriately sized service domains and disk backends based on full disks or LUNs instead of convenient but slower file-based backends. However, there still was a substantial performance cost for virtual disk I/O, which became a significant factor for the more demanding applications increasingly deployed in logical domains.

The physical alternative

Oracle VM Server for SPARC addressed this by improving virtual I/O performance over time, and by offering physical I/O as a higher-performance alternative. This could be done by dedicating an entire PCIe bus and its host bus adapters to a domain, which yielded native I/O performance for every device on the bus. This is the highly effective method used with Oracle SuperCluster.

Oracle VM Server for SPARC 3.1.1 added the ability to use Single Root I/O Virtualization (SR-IOV) for Fibre Channel (FC) devices. This provides native performance with better resource granularity: there can be many SR-IOV devices to hand to domains.

Both provide native performance but have limitations: there are a fixed number of PCIe buses on each server based on the server model, so only a limited number of domains can be assigned a bus for their own use. SR-IOV provides much more resource granularity, as a single SR-IOV card can be presented as many "virtual functions", but is only supported for qualified FC cards. Both forms of physical I/O prevent the use of live migration, which only applies to domains that use virtual I/O. One had to either compromise on flexibility or on performance - but now you can have both together.

The virtual disk I/O performance boost

Just as this issue was largely addressed for virtual network devices, it has now been addressed for virtual disk devices. Solaris 11.1 SRU 19.6 introduces new algorithms that remove bottlenecks caused by serialization (Update: patch update 150400-13 provides the same improvement on Solaris 10 ). Each virtual disk now has multiple read and multiple write threads assigned to it - this is analogous to the "queue depth" seen for real enterprise-scale disks.

The result is sharply reduced I/O latency and increased I/O operations per second - close to the results that would be seen in a non-virtualized environment. This is especially effective for workloads with multiple readers and writers in parallel, rather than a simplistic dd test.

Want the numbers and more detailed explanation? Read Stefan's Blog!

Stefan Hinker has written an excellent blog entry Improved vDisk Performance for LDoms that quantifies the improvements. Rather than duplicate the material he put there, I strongly urge you to read his blog and then come back here. However, I can't resist "quoting" two of the graphics he produces:

I/O operations per second (IOPS)

This chart shows that delivered IOPS were essentially the same with the new virtual I/O and with bare-metal, exceeding 150K IOPS.

IO latency - response times

This chart shows that I/O response time is also essentially the same as bare metal:

This is a game-changing improvement - the flexibility of virtualization with the performance of bare-metal.

That said, I will emphasize some caveats: this will not solve I/O performance problems due to overloaded disks or LUNs. If the physical disk is saturated, then removing virtualization overhead won't solve the problem. A simple, single-threaded I/O program is not a good example to show the improvement, as it is really going to be gated by individual disk speeds. This enhancement provides I/O performance scalability for real workloads backed by appropriate disk subsystems.

How to implement the improvement

The main task to implement this improvement is to update Solaris 11 guest domains and the service domains they use to Solaris 11.1 SRU 19.6. Solaris 10 users should apply patch 150400-13, which was delivered June 16, 2014.

All of those domains have to be updated, or I/O will proceed using the prior algorithm. On Solaris 11, assuming that your systems are set up with the appropriate service repository, this is as simple as issuing the command: pkg update and rebooting. This is one of the things Solaris 11 makes really easy. The full dialog looks like this:

$ sudo pkg update
Password: 
           Packages to install:   1
            Packages to update:  76
       Create boot environment: Yes
Create backup boot environment:  No

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                              77/77     2859/2859  208.4/208.4  3.8M/s

PHASE                                          ITEMS
Removing old actions                         325/325
Installing new actions                       362/362
Updating modified actions                  4137/4137
Updating package state database                 Done 
Updating package cache                         76/76 
Updating image state                            Done 
Creating fast lookup database                   Done 

A clone of solaris-3 exists and has been updated and activated.
On the next boot the Boot Environment solaris-4 will be
mounted on '/'.  Reboot when ready to switch to this updated BE.

---------------------------------------------------------------------------
NOTE: Please review release notes posted at:

https://support.oracle.com/epmos/faces/DocContentDisplay?id=1501435.1
---------------------------------------------------------------------------

After that completes, just reboot by using init 6. That's all you have to do to install the software.

To gain the full performance benefits, it is still important to have properly sized service domains. The small allocations used for older servers and modest workloads, say one CPU core and 4GB of RAM, may not be enough. Consider boosting your control domain and other service domains to two cores and 8GB or 16GB of RAM: if the service domain is starved for resources, then all of the clients depending on it will be delayed. Use ldm list to see if the domains have high CPU utilization and adjust appropriately, as in the example below.
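
As an illustration (the domain names and figures are hypothetical, and the output is abbreviated), a persistently high UTIL value for a service domain in ldm list output suggests it needs another core:

$ ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  UART    16    16G      85%   21d
ldom1            active     -n----  5000    32    64G      12%   6d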

It's also essential to have appropriate virtual disk backends. No virtualization enhancement is going to make a single disk super-fast; a single spindle is going to max out at 150 to 300 IOPS no matter what you do. This is really intended for the robust disk resources needed for an I/O intensive application, just as would be the case for non-virtualized systems.

While there may be some benefits for virtual disks backed by files or ZFS 'zvols', the emphasis and measurements have focused on production I/O configurations based on enterprise storage arrays presenting many LUNs.

The big picture

Now, Oracle VM Server for SPARC can be used with virtual I/O that maintains flexibility without compromising on performance, for both network and disk I/O. This can be applied to the most demanding applications with full performance.

Properly configured systems, in terms of choice of device backends and domain configuration, can achieve performance comparable to what they would receive in a non-virtualized environment, while still maintaining the features of dynamic reconfiguration (add and remove virtual devices as needed) and live migration. For upwards compatibility, and for applications requiring the ultimate in performance, we continue the availability of physical I/O, using root complex domains that own entire PCIe buses, or using SR-IOV. That said, the improved performance of virtual I/O means that there will be fewer instances in which physical I/O is necessary - virtual I/O will increasingly be the recommended way to provide I/O without compromising performance or functionality.

June 13, 2014

Darryl Gove: Enabling large file support

June 13, 2014 16:25 GMT

For 32-bit apps the "default" maximum file size is 2GB. This is because the interfaces use the long data type, which is a signed 32-bit integer for 32-bit apps and a signed 64-bit integer for 64-bit apps. For many apps this is insufficient. Solaris already has huge numbers of large file aware commands; these are listed under man largefile.

For a developer wanting to support larger files, the obvious solution is to port to 64-bit; however, there is also a way to remain with 32-bit apps: compile with large file support.

Large file support provides a new set of interfaces that take 64-bit integers, enabling support of files greater than 2GB in size. In a number of cases these interfaces replace the existing ones, so you don't need to change the source. However, there are some interfaces where the long type is part of the ABI; in these cases there is a new interface to use.

The way to find out what flags to use is through the command getconf LFS_CFLAGS. The getconf command returns environment settings, and in this case we're asking it to provide the C flags needed to compile with large file support. It's useful to take a look at the other information that getconf can provide.

The documentation for compiling with large file support talks about both the flags that are needed and the functions that need to be changed. There are two functions that do not map directly onto large file equivalents because they have a long data type in their prototypes. These two functions are fseek and ftell; calls to them need to be replaced by calls to fseeko and ftello.
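Putting that together, a 32-bit large-file-aware build might look like this (a sketch; the file names are illustrative, and the exact flags printed by getconf vary by platform - on Solaris it is typically something like -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64):

  $ cc `getconf LFS_CFLAGS` -o myapp myapp.c    # source uses fseeko/ftello rather than fseek/ftell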

Alan HargreavesWhy you should Patch NTP

June 13, 2014 00:46 GMT

This story about massive DDoS attacks using monlist as a threat vector gives an excellent reason why you should apply the patches listed on the Sun Security Blog for NTP.


Adam LeventhalEnterprise support and the term abroad

June 13, 2014 00:03 GMT

Delphix customers include top companies across a wide range of industries, most of them operating around the clock. Should a problem arise they require support from Delphix around the clock as well. To serve our customers’ needs we’ve drawn from industry best-practices while recently mixing in an unconventional approach to providing the best possible customer service regardless of when a customer encounters a problem.

There are three common approaches to support: outsourcing, shifts, and “follow the sun”. Outsourcing is economical, but quality and consistency suffer, especially for difficult cases. Asking outstanding engineers to cover undesirable shifts is unappealing. An on-call rotation (shifts “lite”) may be more tolerable but can be inadequate — and stressful — in a crisis. Hiring a geographically dispersed team — whose natural work day “follows the sun” — provides a more durable solution but has its own challenges. Interviewing is tough. Training is tougher. And maintaining education and consistency across the globe is nearly impossible.

Live communication simplifies training. New support engineers learn faster with live — ideally local — mentors, experts on a wide range of relevant technologies. The team is more able to stay current on the product and tools by working collaboratively. In a traditional “follow the sun” model, the first support engineer in a new locale is doubly disadvantaged — the bulk of the team is unavailable during the work day, and there’s no local experienced team for collaboration.

At Delphix, we don’t outsource our support engineering. We do hire around the globe, and we do have an on-call schedule. We’ve also drawn inspiration from an innovative approach employed by Moneypenny, a UK-based call center. Moneypenny had resisted extending their service to off-hours because they didn’t want to incur the detrimental effects of shift work on employees’ health and attitude. They didn’t want to outsource work because they were afraid customer satisfaction would suffer. Instead they took the novel step of opening an Auckland office — 12 hours offset — and sending employees for 4-6 months on a voluntary basis.

I was idly listening to NPR in the car when I heard the BBC report on Moneypenny. Their customers and employees raved about the approach. It was such a simple and elegant solution to the problem of around the clock support; I pulled over to consider the implications for Delphix Support. The cost of sending a support engineer to a remote destination would be paltry compared with the negative consequences associated with other approaches to support: weak hires, inconsistent methodologies, insufficient mentorship, not to mention underserved, angry, or lost customers. And the benefits to customers and the rest of the team would again far exceed the expense.

We call it the Delphix Support “term abroad.” As with a term abroad in school, it’s an opportunity for one of our experienced support engineers to work in a foreign locale. Delphix provides lodging in a sufficiently remote timezone with the expectation of a fairly normal work schedule. As with Moneypenny, that means that Delphix is able to provide the same high level of technical support at all times of day. In addition, that temporarily remote engineer can help to build a local team by recruiting, interviewing, and mentoring.

David — the longest tenured member of the Delphix support team — recently returned from a term abroad to the UK where he joined Scott, a recent hire and UK native. Scott spent a month working with David and others at our Menlo Park headquarters. Then David joined Scott in the UK to continue his mentorship and training. Both worked cases that would have normally paged the on-call engineer. A day after arriving in the UK, in fact, David and Scott handled two cases that would have otherwise woken up an engineer based in the US.

Early results give us confidence that the term abroad is going to be a powerful and complementary tool. Delphix provides the same high quality support at all hours, while expanding globally and increasing the satisfaction of the team. And it makes Delphix Support an even more attractive place to work for those who want to opt in to a little global adventure.

June 12, 2014

Steve TunstallNew expansion for the ZS3-2

June 12, 2014 00:07 GMT

If you missed the announcement, the ZS3-2 can now grow to 16 disk trays, up from 8. It can now also support four I/O cards of any kind.

I know, I know, I have not done anything in this blog for a while now. That was not by design. There will be a nice upgrade for the 2013 code (OS8.2) coming soon. When it comes out I will certainly blog about it ASAP.

June 11, 2014

Bryan CantrillBroadening node.js contributions

June 11, 2014 16:15 GMT

Several years ago, I gave a presentation on corporate open source anti-patterns. Several of my anti-patterns were clear and unequivocal (e.g., don’t announce that you’re open sourcing something without making the source code available, dummy!), but others were more complicated. One of the more nuanced anti-patterns was around copyright assignment and contributor license agreements: while I believe these constructs to be well-intended (namely, to preserve relicensing options for the open source project and to protect that project from third-party claims of copyright and patent infringement), I believe that they are not without significant risks with respect to the health of the community. Even at their very best, CLAs and copyright assignments act as a drag on contributions as new corporate contributors are forced to seek out their legal department — which seems like asking people to go to the dentist before their pull request can be considered. And that’s the very best case; at worst, these agreements and assignments grant a corporate entity (or, as I have personally learned the hard way, its acquirer) the latitude for gross misbehavior. Because this very worst scenario had burned us in the illumos community, illumos has been without CLA and copyright assignment since its inception: as with Linux, contributors hold copyright to their own contributions and agree to license it under the prevailing terms of the source base. Further, we at Joyent have also adopted this approach in the many open source components we develop in the node.js ecosystem: like many (most?) GitHub-hosted projects, there is no CLA or copyright assignment for node-bunyan, node-restify, ldap.js, node-vasync, etc. But while many Joyent-led projects have been without copyright assignment and CLA, one very significant Joyent-led project has had a CLA: node.js itself.

While node.js is a Joyent-led project, I also believe that communities must make their own decisions — and a CLA is a sufficiently nuanced issue that reasonable people can disagree on its ultimate merits. That is, despite my own views on a CLA, I have viewed the responsibility for the CLA as residing with the node.js leadership team, not with me. The upshot has been that the node.js status quo of a CLA (one essentially inherited from Google’s CLA for V8) has remained in place for several years.

Given this background you can imagine that I found it very heartwarming that when node.js core lead TJ Fontaine returned from his recent Node on the Road tour, one of the conclusions he came to was that the CLA had outlived its usefulness — and that we should simply obliterate it. I am pleased to announce that today, we are doing just that: we have eliminated the CLA for node.js. Doing this lowers the barrier to entry for node.js contributors thereby broadening the contributor base. It also brings node.js in line with other projects that Joyent leads and (not unimportantly!) assures that we ourselves are not falling into corporate open source anti-patterns!

Darryl GoveArticle in Oracle Scene magazine

June 11, 2014 16:09 GMT

Oracle Scene is the quarterly for the UK Oracle User Group. For the current issue, I've contributed an article on developing with Solaris Studio.

June 06, 2014

Jeff SavitBest Practices - Top Ten Tuning Tips

June 06, 2014 23:47 GMT
This is the original version of this blog entry kept for reference. Please refer to the updated version.
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly called Logical Domains).

Top Ten Tuning Tips

Oracle VM Server for SPARC is a high performance virtualization technology for SPARC servers. It provides native CPU performance without the virtualization overhead typical of hypervisors. The way memory and CPU resources are assigned to domains avoids problems often seen in other virtual machine environments, and there are intentionally few "tuning knobs" to adjust.

However, there are best practices that can enhance or ensure performance. This blog post lists and briefly explains performance tips and best practices that should be used in most environments. Detailed instructions are in the Oracle VM Server for SPARC Administration Guide. Other important information is in the Release Notes. (The Oracle VM Server for SPARC documentation home page is here.)

Big Rules / General Advice

Some important notes first:
  1. "Best practices" may not apply to every situation. There are often exceptions or trade-offs to consider. We'll mention them so you can make informed decisions. Please evaluate these practices in the context of your requirements and systems.
  2. Best practices and "rules of thumb" change over time as technology changes. What may be "best" at one time may not be the best answer later as new features are added or enhanced.
  3. Continuously measure, then tune and allocate resources to meet service level objectives. Then do something else - it's rarely worth trying to squeeze out the last bit of performance once performance objectives have been achieved!
  4. Standard Solaris tools and tuning apply in a domain or virtual machine just as on bare metal: the *stat tools, DTrace, driver options, TCP window sizing, /etc/system settings, and so on.
  5. The answer to many performance questions is "it depends". Your mileage may vary. In other words: there are few fixed "rules" that say how much performance boost you'll achieve from a given practice.

The Tips

  1. Keep firmware, Logical Domains Manager, and Solaris up to date - Performance enhancements are continually added to Oracle VM Server for SPARC, so staying current is important.

    That includes the firmware, which is easy to "install once and forget". The firmware contains much of the logical domains infrastructure, so it should be kept current. The Release Notes list minimum and recommended firmware and software levels needed for each platform.

    Some enhancements improve performance automatically just by installing the new versions. Others require administrators to configure and enable new features. The following items will mention them as needed.

  2. Allocate sufficient CPU and memory resources to each domain, especially control, I/O and service domains - This should be obvious, but cannot be overemphasized. If a service domain is short on CPU, then all of its clients are delayed. Within the domain you can use vmstat, mpstat, and prstat to see if there is pent-up demand for CPU. Alternatively, issue ldm list or ldm list -l from the control domain.

    Good news: you can dynamically add and remove CPUs to meet changing load conditions, even on the control domain. You can do this manually or automatically with the built-in policy-based resource manager. That's a Best Practice of its own, especially if you have guest domains with peak and idle periods.

    The same applies to memory. Again, the good news is that standard Solaris tools can be used to see if a domain is low on memory, and memory can also be added to or removed from a domain. Applications need the same amount of RAM to run efficiently in a domain as they do on bare metal, so no guesswork or fudge-factor is required. Logical domains do not oversubscribe memory, which avoids problems like unpredictable thrashing.

    For the control domain and other service domains, a good starting point is at least 1 core (8 vCPUs) and 4GB or 8GB of memory. Actual requirements must be based on system load: small CPU and memory allocations were appropriate with older, smaller LDoms-capable systems, but larger values are better choices for the demanding, higher scaled systems and applications now used with domains. Today's faster CPUs are capable of generating much higher I/O rates than older systems, and service domains have to be suitably provisioned to support the load. Don't starve the service domains! Two cores and 8GB of RAM are a good starting point if there is substantial I/O load.

    Live migration is known to run much faster if the control domain has at least 2 cores, both for total migration time and suspend time, so don't run with a minimum-sized control domain if live migration times are important.

    In general, add another core if ldm list shows that the control domain is busy. Add more RAM if you are hosting lots of virtual devices or running agents, management software, or applications in the control domain and vmstat -p shows that you are short on memory. Both can be done dynamically without an outage.
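    As a sketch (the sizes are illustrative), growing a busy control domain on the fly might look like:

      # ldm add-core 1 primary      # more CPU if ldm list shows high UTIL
      # ldm add-memory 4G primary   # more RAM if vmstat -p shows a shortage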

  3. Allocate domains on core boundaries - SPARC servers supporting logical domains have multiple CPU cores with 8 CPU threads each. Avoid "split core" situations in which CPU cores are shared by more than one domain (different domains have CPU threads on the same core). This can reduce performance by causing "false cache sharing" in which domains compete for a core's Level 1 cache. The impact on performance is highly variable, depending on the domains' behavior.

    Split core situations are easily avoided by always assigning virtual CPUs in multiples of 8 (ldm set-vcpu 8 mydomain or ldm add-vcpu 24 mydomain). It is rarely good practice to give tiny allocations of 1 or 2 virtual CPUs, and definitely not for production workloads. If fine-grain CPU granularity is needed for multiple applications, deploy them in zones within a logical domain for sub-core resource control.

    Alternatively, use the whole core constraint (ldm set-core 1 mydomain or ldm add-core 3 mydomain). The whole-core constraint requires a domain be given its own cores, or the bind operation will fail. This prevents unnoticed sub-optimal configurations.

    In most cases the logical domain manager avoids split-core situations even if you allocate fewer than 8 virtual CPUs to a domain. The manager attempts to allocate different cores to different domains even when partial core allocations are used. It is not always possible, though, so the best practice is to allocate entire cores.

    For a slightly lengthier writeup, see Best Practices - Core allocation.

  4. Use Solaris 11 in the control and service domains - Solaris 11 contains functional and performance improvements over Solaris 10 (some will be mentioned below), and will be where future enhancements are made. It is also required to use Oracle VM Manager with SPARC. Guest domains can be a mixture of Solaris 10 and Solaris 11, so there is no problem doing "mix and match" regardless of which version of Solaris is used in the control domain. It is a best practice to deploy Solaris 11 in the control domain even if you haven't upgraded the domains running applications.
  5. NUMA latency - Servers with more than one CPU socket, such as a T4-4, have non-uniform memory access (NUMA) latency between CPUs and RAM. "Local" memory access from CPUs on the same socket has lower latency than "remote". This can have an effect on applications, especially those with large memory footprints that do not fit in cache, or are otherwise sensitive to memory latency.

    Starting with release 3.0, the logical domains manager attempts to bind domains to CPU cores and RAM locations on the same CPU socket, making all memory references local. If this is not possible because of the domain's size or prior core assignments, the domain manager tries to distribute CPU core and RAM equally across sockets to prevent an unbalanced configuration. This optimization is automatically done at domain bind time, so subsequent reallocation of CPUs and memory may not be optimal. Keep in mind that this does not apply to single board servers, like a T4-1. In many cases, the best practice is to do nothing special.

    To further reduce the likelihood of NUMA latency, size domains so they don't unnecessarily span multiple sockets. This is unavoidable for very large domains that need more CPU cores or RAM than are available on a single socket, of course.

    If you must control this for the most stringent performance requirements, you can use "named resources" to allocate specific CPU and memory resources to the domain, using commands like ldm add-core cid=3 ldm1 and ldm add-mem mblock=PA-start:size ldm1. This technique is successfully used in the SPARC Supercluster engineered system, which is rigorously tested on a fixed number of configurations. This should be avoided in general purpose environments unless you are certain of your requirements and configuration, because it requires model-specific knowledge of CPU and memory topology, and increases administrative overhead.

  6. Single thread CPU performance - Starting with the T4 processor, SPARC servers supporting domains can use a dynamic threading mode that allocates all of a core's resources to a thread for highest single thread performance. Solaris will generally detect threads that will benefit from this mode and "do the right thing" with little or no administrative effort, whether in a domain or not. An excellent writeup can be found in Critical Threads Optimization in the Observatory blog. Mentioned for completeness' sake: there is also a deprecated command to control this at the domain level by using ldm set-domain threading=max-ipc mydomain, but this is generally unnecessary and should not be done.
  7. Live Migration - Live migration is CPU intensive in the control domain of the source (sending) host. Configure at least 1 core (8 vCPUs) to the control domain in all cases, but optionally add an additional core to speed migration and reduce suspend time. The core can be added just before starting migration and removed afterwards. If the machine is older than T4, add crypto accelerators to the control domains. No such step is needed on later machines.

    Perform migrations during low activity periods. Guests that heavily modify their memory take more time to migrate since memory contents have to be retransmitted, possibly several times. The overhead of tracking changed pages also increases CPU utilization.
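    A migration using a temporary extra core in the control domain might look like this sketch (domain and host names are illustrative):

      # ldm add-core 1 primary                      # speed up the migration
      # ldm migrate-domain ldg1 root@target-host
      # ldm remove-core 1 primary                   # return the core afterwards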

  8. Network I/O - Configure aggregates, use multiple network links, use jumbo frames, and adjust TCP windows and other system settings the same way and for the same reasons as you would in a non-virtual environment.

    Use RxDring support to substantially reduce network latency and CPU utilization. To turn this on, issue ldm set-domain extended-mapin-space=on mydomain for each of the involved domains. The domains must run Solaris 11 or Solaris 10 update 10 or later, and the involved domains (including the control domain) will require a domain reboot for the change to take effect. This also requires 4MB of RAM per guest.

    If you are using a Solaris 10 control or service domain for virtual network I/O, then it is important to plumb the virtual switch (vsw) as the network interface and not use the native NIC or aggregate (aggr) interface. If the native NIC or aggr interface is plumbed, there can be a performance impact since each packet may be duplicated to provide a packet to each client of the physical hardware. Avoid this by not plumbing the NIC and only plumbing the vsw. The vsw doesn't need to be plumbed either unless the guest domains need to communicate with the service domain. This isn't an issue for Solaris 11 - another reason to use that in the service domain. (Thanks to Raghuram for the great tip.)

    As an alternative to virtual network I/O, use Direct I/O (DIO) or Single Root I/O Virtualization (SR-IOV) to provide native-level network I/O performance. They provide superior performance, but currently have two main limitations: they cannot be used in conjunction with live migration, and they cannot be dynamically added to or removed from a running domain. SR-IOV is described in an excellent blog article by Raghuram Kothakota.
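    As an example of the non-virtual-style network setup mentioned above, an aggregate with jumbo frames could be built in the service domain like this (a sketch; the link names are illustrative):

      # dladm create-aggr -l net0 -l net1 aggr0
      # dladm set-linkprop -p mtu=9000 aggr0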

  9. Disk I/O - For best performance, use a whole disk backend (a LUN or full disk). Use multiple LUNs to spread load across virtual and physical disks and reduce queueing (just as you would do in a non-virtual environment). Flat files in a file system are convenient and easy to set up as backends, but offer lower performance. For completely native performance, use a PCIe root complex domain and physical I/O.

    ZFS can also be used for disk backends. This provides flexibility and useful features (clones, snapshots, compression) but can impose overhead compared to a raw device. Note that local or SAN ZFS disk backends preclude live migration, because a zpool can be mounted to only one host at a time. When using ZFS backends for virtual disk, use a zvol rather than a flat file - it performs much better. Also: make sure that the ZFS recordsize for the ZFS dataset matches the application (also, just as in a non-virtual environment). This avoids read-modify-write cycles that inflate I/O counts and overhead. The default of 128K is not optimal for small random I/O.
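    A zvol-backed virtual disk could be set up like this (a sketch; the pool, volume, and domain names are illustrative). Note that for a zvol the block size is fixed at creation time with volblocksize, the zvol counterpart of recordsize:

      # zfs create -V 20g -o volblocksize=8k tank/ldoms/vol1
      # ldm add-vdsdev /dev/zvol/rdsk/tank/ldoms/vol1 vol1@primary-vds0
      # ldm add-vdisk vdisk1 vol1@primary-vds0 mydomain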

  10. Networked disk on NFS and iSCSI - NFS and iSCSI can also perform quite well if an appropriately fast network is used. Apply the same network tuning you would use in non-virtual applications. For NFS, specify mount options to disable atime, use hard mounts, and set large read and write sizes.

    If the NFS and iSCSI backends are provided by ZFS, such as in the ZFS Storage Appliance, provide lots of RAM for buffering, and install write-optimized solid-state disk (SSD) "logzilla" ZFS Intent Logs (ZIL) to speed up synchronous writes.
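    An NFS mount for disk backends might be specified like this (a sketch; the server, path, and sizes are illustrative - consult mount_nfs(1M) for the options your release supports):

      # mount -F nfs -o hard,noatime,rsize=1048576,wsize=1048576 nfssrv:/export/vdisks /vdisks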

Summary

By design, logical domains don't have a lot of "tuning knobs", and many tuning practices you would do for Solaris in a non-domained environment apply equally when domains are used. However, there are configuration best practices and tuning steps you can use to improve performance. This blog note itemizes some of the most effective (and least exotic) performance best practices.

June 04, 2014

Darryl GovePretty printing using indent

June 04, 2014 16:38 GMT

If you need to pretty-print some code, then the compiler comes with indent for exactly this purpose!
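For example (file names illustrative):

  $ indent messy.c tidy.c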

May 28, 2014

Joerg MoellenkampSolaris 11.2: Time based access limitations

May 28, 2014 12:58 GMT
Let's assume you want to limit ssh logins for user junior to a certain timespan, say weekdays between 13:10 and 17:00. With Solaris 11.2 it's really easy to limit access to certain services based on time.
Continue reading "Solaris 11.2: Time based access limitations "

Joerg MoellenkampUse ntp, not rdate

May 28, 2014 08:18 GMT
Just as I saw it in a blog on blogs.oracle.com (no link here), and since my comment wasn't published there: I wouldn't use rdate for syncing time between servers. rdate is widely considered obsolete (the fact that installing it requires pkg:/network/legacy-remote-utilities should tell you something just from the name of the package), and ntp is really easy to set up and has a much better mechanism for keeping time synced. I wrote a tutorial about that quite a while ago: "Configuring an NTP client in Solaris 11"
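A minimal client setup, assuming your system ships the /etc/inet/ntp.client template and can reach the public pool servers, might be:

  # cp /etc/inet/ntp.client /etc/inet/ntp.conf
  # echo "server 0.pool.ntp.org iburst" >> /etc/inet/ntp.conf
  # svcadm enable ntp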

May 27, 2014

Garrett D'Amoreillumos, identification, and versions

May 27, 2014 17:30 GMT
Recently, there has been a bit of a debate on the illumos mailing lists, beginning I suppose with an initial proposal I made concerning the output from the uname(1) command on illumos, which today, when presented with "-s -r" as options, emits "SunOS 5.11".

A lot of the debate centers on various ways that software can or cannot identify the platform it is running on, and use that information to make programmatic decisions about how to operate. Most often these are decisions made at compile time, by tools such as autoconf and automake.

Clearly, it would be better for software not to rely on uname for programmatic uses, and detractors of my original proposal are correct that even in the Linux community, the value from uname cannot be relied upon for such use.  There are indeed better mechanisms to use, such as sysconf(3c), or the ZFS feature flags, to determine individual capabilities.  In fact, the GNU autotools contain many individual tests for such things, and arguably discourage the use of uname except as a last resort.

Yet there can be no question that there are a number of packages that do make such use.  And changes to the output from uname become risky to such packages.

But perversely, not changing the output from uname also creates risk for such packages, as the various incarnations of SunOS 5.11 become ever less like one another.  Indeed, illumos != SunOS, and uname has become something of a lie over the past 4 years or so.

Clearly, the focus for programmatic platform determination -- particularly for enabling features or behaviors -- should be to move away from uname.  (Changing uname may actually help package maintainers identify this questionable behavior as questionable, although there is no doubt that such a change would be disruptive to them.)

But all this debate completely misses the other major purpose of uname's output, which is to identify the platform to humans.  Be they administrators, or developers, or distributors.  There is no question in my mind that illumos' failure to self-identify, and to have a regular "release" schedule (for whatever a release really means in this regard), is harmful.

The distributors, such as Joyent (SmartOS), would prefer that people only identify their particular distribution, and I believe that much of the current argument from them stems from two primary factors.  First, they see additional effort in any change, with no direct benefit to them.  Second, in some ways these distributors are disinclined to emphasize illumos itself.  Some of the messages sent (either over IRC or in the email thread) clearly portray some resentment towards the rest of the illumos ecosystem (especially some of the niche players), as well as a very heavy handed approach by one commercial concern towards the rest of the ecosystem.

Clearly, this flies in the spirit of community cooperation in which illumos was founded.  illumos should not be a slave to any single commercial concern, and nobody should be able to exercise a unilateral veto over illumos, or shut down conversations with a single "thermonuclear" post.

So, without regard to the merits, or lack thereof, of changing uname's output, I'm quite certain, with sufficient clear evidence of my own gathering, that illumos absolutely needs regularly scheduled "releases", with humanly understandable release identifiers.  The lack of both the releases, and the identifiers to go with them, hinders meaningful reviews, hurts distributors (particularly smaller ones), and makes support for the platform harder, particularly when the questions of support are considered across distributions.  All of these hinder adoption of the illumos platform; clearly an undesirable outcome.

Some would argue that we could use git tags for the identifiers.  From a programmatic standpoint, these would be easy to collect.  Although they have problems as well (particularly for distributions which either don't use git, or use a private fork that doesn't preserve our git versions), there are worse problems.

Specifically, humans aren't meant to derive meaning from something like "e3de96f25bd2ea4282eea2d1a86c1bebac8950cb".  While Neo could understand this, most of us mere mortals simply can't understand such tags.  Worse, there is no implied sequencing here.  Is "e3de96f25bd2ea4282eea2d1a86c1bebac8950cb" newer than "d1007364f5b14efdd7d6ba27aa458669a6365d48"?  You can't do a meaningful comparison without examining the actual git history.

This makes it hard to guess whether a given running release has a bug integrated or not.  It makes it hard to have conversations about the platform.  It even makes it hard for independent reviewers of the platform to identify anything meaningful about the platform in the context of reviewing a distribution.

Furthermore, when we talk about the platform, it's useful for version numbers to convey more than just serial meaning.  In particular, version numbers help set expectations for developers.  Historically, Solaris (or rather SunOS, from which illumos is derived) set those expectations in the form of stability guarantees.  A given library interface might be declared as Stable or Evolving (or Committed or Uncommitted), or Obsolete.  This was a way to convey to developers the relative risks of an interface (in terms of interface change), and it set some ground rules for rate of change.  Indeed, Solaris (and SunOS) relied upon a form of semantic versioning, and many of the debates in Sun's architectural leadership for Solaris (PSARC) revolved around these commitments.

Yet today, the illumos distributors seem all too willing to throw that bit of our heritage into the waste bin.  A trend, I fear, which ultimately leads to chaos, and an increase in the difficulty of adoption by ISVs and developers.

illumos is one component -- albeit probably the most important by far -- of a distribution.  Like all components, it is useful to be able to determine and talk about it.  This is not noticeably different than Linux, Xorg, gnome, or pretty much any of the other systems which you are likely to find as part of a complete Ubuntu or RedHat distribution.  In fact, our situation is entirely analogous other than we combine our libc and some key utilities and commands with the kernel.

Technically, in Solaris parlance, illumos is a consolidation.  In fact, this distinction has always been clear.  And historically the way the consolidation is identified is with the contents of the utsname structure, which is what is emitted by uname.

In summary, regardless of whether we feel uname should return illumos or not, there is a critical need, although one not necessarily agreed upon by certain commercial concerns, for the illumos platform to have a release number at a minimum, and this release number must convey meaningful information to end-users, developers, and distributors alike.  It would be useful if this release number were obtainable in the traditional fashion (uname), but it's more important that the numbers convey meaning in the same way across distributions (which means packaging metadata cannot be used, at least not exclusively).

Joerg MoellenkampThank you!

May 27, 2014 08:29 GMT
Before I forget someone: a very big "THANK YOU!" for all the nice wishes and congratulations on my birthday yesterday! :-)

May 25, 2014

Joerg MoellenkampNew Solaris 11.2 features: SMF stencils

May 25, 2014 22:01 GMT
As much as there is often a lot of discussion about configuration items inside the SMF repository (like the hostname), it brings an important advantage: it introduces the concept of dependencies to configuration changes. What services have to be restarted when I change a configuration item? Do you remember all the services that are dependent on the hostname and need a restart after changing it? SMF solves this by putting the information about dependencies into its configuration. You define it with the manifests.

However, as much configuration as you may put into SMF, most applications still insist on getting their configuration from traditional configuration files, like resolv.conf for the resolver or puppet.conf for Puppet. So you need a way to take the information in the SMF repository and generate a traditional config file from it. In the past, the way to do so was some scripting inside the start method that generated the config file before the service started.

Solaris 11.2 offers a new feature in this area. It introduces a generic method to enable you to create config files from SMF properties. It's called SMF stencils.
Continue reading "New Solaris 11.2 features: SMF stencils"

Joerg MoellenkampRFC7258

May 25, 2014 11:05 GMT
Pervasive monitoring is a technical attack that should be mitigated in the design of IETF protocols, where possible.

'nuff said.

Joerg MoellenkampChanges to Openssl in Solaris 11.2

May 25, 2014 10:44 GMT
This blog entry wasn't on my radar somehow; nevertheless, it reports on an important change to OpenSSL in Solaris 11.2. The most important change from my point of view is the inlining of T4/T5 crypto in the Solaris 11.2 openssl:
Years and years ago, I worked on the SPARC T2/T3 crypto drivers. On the SPARC T2/T3 processors, the crypto instructions are privileged; and therefore, the drivers are needed to access those instructions. Thus, to make use of T2/T3 crypto hardware, OpenSSL had to use pkcs11 engine which adds lots of cycles going through the thick PKCS#11 session/object management layer, Solaris kernel layer, hypervisor layer to the hardware, and all the way back. However, on SPARC T4/T4+ processors, crypto instructions are no longer privileged; and therefore, you can access them directly without drivers. [...]
What does that means to you? Much improved performance! No more PKCS#11 layer, no more copy-in/copy-out of the data from the userland to the kernel space, no more scheduling, no more hypervisor, NADA! [...]
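An easy way to see the effect on a T4 or later system is to run the built-in benchmark before and after updating, for example:

  $ openssl speed -evp aes-128-cbc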

May 23, 2014

Joerg MoellenkampDarren Moffat about Kernel Zone security

May 23, 2014 17:41 GMT
Darren Moffat wrote an interesting blog entry about the security concept of Kernel Zones in "Overview of Solaris Zones Security Models". He is especially talking about a very small, but very interesting detail:
Note that what follows is an out line of implementation details that are subject to change at any time: The kernel of a Solaris Kernel Zone is represented as a user land process in a Solaris non global zone. That non global zone is configured with less privilege than a normal non global zone would have and it is always configured as an immutable zone. So if there happened to be an exploit of the guest kernel that resulted in a VM break out you would end up in an immutable non global zone with lowered privilege.

Darryl GoveGeneric hardware counter events

May 23, 2014 16:45 GMT

A while back, Solaris introduced support for generic hardware counter events. This is a really useful feature because it enables you to specify a generic name for the kind of event you want to collect, and Solaris maps this onto the processor-specific hardware counter. The generic names come from PAPI - which is probably as close as we can get to a de-facto standard for performance counter naming. For performance counter geeks like me, this is not quite enough information: I actually want to know the names of the raw counters used. Fortunately this is provided in the generic_events man page:

$ man generic_events
Reformatting page.  Please Wait... done

CPU Performance Counters Library Functions   generic_events(3CPC)

NAME
     generic_events - generic performance counter events

DESCRIPTION
     The Solaris  cpc(3CPC)  subsystem  implements  a  number  of
     predefined, generic performance counter events. Each generic
...
   Intel Pentium Pro/II/III Processor
       Generic Event          Platform Event          Event Mask
     _____________________________________________________________
     PAPI_ca_shr          l2_ifetch                 0xf
     PAPI_ca_cln          bus_tran_rfo              0x0
     PAPI_ca_itv          bus_tran_inval            0x0
     PAPI_tlb_im          itlb_miss                 0x0
     PAPI_btac_m          btb_misses                0x0
     PAPI_hw_int          hw_int_rx                 0x0
...
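The generic names can be used directly with the counter tools, for example (event choice illustrative; see cpustat(1M) for the full eventspec syntax):

  # cpustat -c PAPI_tot_cyc 1 5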

Joerg MoellenkampA glimpse into Solaris 11.2 specific Puppet components

May 23, 2014 12:01 GMT
Now that you have a working Puppet testbed in your Solaris 11.2 beta installation, it's time to try some Solaris specific stuff. Oracle has added a number of additional components in order to control Solaris specifics like boot environments, VNICs or SMF. You can find the respective code at java.net.
Continue reading "A glimpse into Solaris 11.2 specific Puppet components"

May 22, 2014

Joerg MoellenkampEvent Announcement: Oracle Business Breakfast "Solaris 11.2" in Hamburg on June 12, 2014

May 22, 2014 08:09 GMT
On June 12, an Oracle Business Breakfast on the topic of Solaris 11.2 will also take place in Hamburg. I won't be presenting there; this time it's my colleague Stefan Hinker's turn. If you're in Hamburg and have the opportunity to travel to Berlin, I would suggest you register for Berlin (Stefan, no insult intended, but the VP of Solaris Core OS as a speaker is ... let's put it this way ... a rarer opportunity to give the person in charge a piece of your mind ;-) ). The Berlin talk takes place at the Oracle CVC in the city center and is quite easy to reach from the main train station. If you can't make it there, Hamburg is a good alternative; the agenda is identical for both events. The registration for Hamburg is available here.

May 21, 2014

Joerg MoellenkampBasic Puppet installation with Solaris 11.2 beta

May 21, 2014 19:56 GMT
At the recent announcement we talked a lot about the Puppet integration. But how do you set it up? That's what I want to show in this blog entry.

However, the example I'm using is also useful in practice. Due to the extremely low overhead of zones, I frequently see really large numbers of zones on a single system. Changing /etc/hosts or changing an SMF service property on 3 systems is not that hard. Doing it on a system with 500 zones is ... let's put it diplomatically ... a job you give to someone you want to punish.

Puppet can help in this case by managing the configuration and easing its distribution. You describe the changes you want to make in a file, or set of files, called a manifest in the Puppet world, and then roll them out to your servers, no matter whether they are virtual or physical. A warning first: Puppet is a really, really vast topic. This article is really basic; it doesn't go more than even toe-deep into the possibilities and capabilities of Puppet. It doesn't try to explain Puppet ... just how you get it up and running and do basic tests. There are many good books on Puppet. Please read one of them, and the concepts and the example will get much clearer immediately.

To show this, I have a relatively simple setup: a global zone and three non-global zones. In this example I will set up the global zone as a Puppet master and configure three non-global zones with the Puppet agent, the component that gathers information from the Puppet master in order to do something on the server running the agent.

The Puppet master is called master, the agents are called agent1, agent2 and agent3. This tutorial assumes that you have already set up the zones and that they have a working network connection.
Continue reading "Basic Puppet installation with Solaris 11.2 beta"