September 01, 2015

Glynn Foster: Applying read-only protection with Immutable Zones

September 01, 2015 00:12 GMT

A while back, I wrote an article that focused on how you can achieve a secure and compliant application deployment from development, through test, and into production. This article took advantage of a number of Oracle Solaris technologies including Oracle Solaris Zones, Unified Archives, Immutable Zones, Automated Installer and the integrated compliance framework.

We've had a number of customers get really excited by Immutable Zones and being able to lock down their environments. Not only does this provide an additional layer of security, it also protects against the potential cost of human error and ensures that organisations can meet their compliance requirements routinely. Darren Moffat and Casper Dik have already written great blog entries on how to do this, but I've also recently published another How-To article on Applying read-only protection with Oracle Solaris Immutable Zones. In this article we cover immutable non-global and global zones, and show how we can make administrative changes, such as applying critical security fixes, using the Trusted Path. Hope you find it useful!
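For readers who just want to see the shape of it, here is a minimal sketch (not from the article, and the zone name is hypothetical) of how a zone is typically made read-only by setting its MAC profile; the linked article covers the details, including making changes through the Trusted Path:

# zonecfg -z appzone set file-mac-profile=fixed-configuration    (hypothetical zone name)
# zoneadm -z appzone reboot

After the reboot, the zone's system binaries and configuration are protected from modification from inside the zone.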

August 30, 2015

Robert Milkowski: Remote Management of ZFS servers with Puppet and RAD

August 30, 2015 10:39 GMT
Manuel Zach blogs about how to use Puppet with the new REST interface to Solaris RAD introduced in Solaris 11.3. RAD is really interesting if you want to manage your servers via a programmatic interface.

August 29, 2015

Peter Tribble: Tribblix meets MATE

August 29, 2015 11:16 GMT
One of the things I've been working on in Tribblix is to ensure that there's a good choice of desktop options. This varies from traditional window managers (all the way back to the old awm), to modern lightweight desktop environments.

The primary desktop environment (because it's the one I use myself most of the time) is Xfce, but I've had Enlightenment available as well. Recently, I've added MATE as an additional option.

OK, here's the obligatory screenshot:


While it's not quite as retro as some of the other desktop options, MATE has a similar philosophy to Tribblix - maintaining a traditional environment in a modern context. As a continuation of GNOME 2, I find it to have a familiar look and feel, but I also find it to be much less demanding both at build and run time. In addition, it's quite happy with older hardware or with VNC.

Building MATE on Tribblix was very simple. The dependencies it has are fairly straightforward - there aren't that many, and most of them you would tend to have anyway as part of a modern system.

To give a few hints, I needed to add dconf, a modern intltool, itstool, iso-codes, libcanberra, zenity, and libxklavier. Having downloaded the source tarballs, I built packages in this order (this isn't necessarily the strict dependency order, it was simply the most convenient):
The code is pretty clean; I needed a couple of fixes, but overall very little needed to be changed for illumos.

The other thing I added was the murrine gtk2 theme engine. I had been getting odd warnings from applications for a while mentioning murrine, but MATE was competent enough to give me a meaningful warning.

I've been pretty impressed with MATE; it's a worthy addition to the available desktop environments, with a good philosophy and a clean implementation.

August 28, 2015

Stefan Hinker: Open a MOS Document by its DocID

August 28, 2015 09:28 GMT

I've always wondered whether there was an easier way to view a MOS document than going to the portal and "searching" for a DocID I already know.  Using a Firefox search plugin had been an idea for a while, and I finally found a few minutes to try it.  It works - find the plugin below.  Simply save it as "DocID.xml" in the searchplugins directory of your Firefox profile, restart Firefox and enter the DocID in the search field of the browser.

<SearchPlugin xmlns="http://www.mozilla.org/2006/browser/search/" xmlns:os="http://a9.com/-/spec/opensearch/1.1/">
<os:ShortName>MOS DocID</os:ShortName>
<os:Description>MOS DocID Search</os:Description>
<os:InputEncoding>UTF-8</os:InputEncoding>
<os:Image width="16" height="16">data:image/x-icon;base64,
iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAAAA3NCSVQICAjb4U/gAAAAU0lEQVQokWP8z0AaYCJRPekaWFB4/3E4kJERwfyPXymGNsb
/yKqRTMJuMyMjkh9wqUaVYsIUwq+H7GAl6GmYAiZMITyqGcgNVoKWYIk4ogHtEx8A02caF75DNeYAAAAASUVORK5CYII=</os:Image>
<SearchForm>http://support.oracle.com/epmos/faces/</SearchForm>
<os:Url type="text/html" method="GET" template="http://support.oracle.com/epmos/faces/DocumentDisplay?id={searchTerms}">
</os:Url>
</SearchPlugin> 

(The full text of the plugin might not be displayed, but copy & paste to a file will work.)
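In case it helps, a quick sketch of where that file typically lives on a Unix desktop (the profile directory name here is made up; use your own profile's directory):

$ mkdir -p ~/.mozilla/firefox/abcd1234.default/searchplugins
$ cp DocID.xml ~/.mozilla/firefox/abcd1234.default/searchplugins/

Restart Firefox and "MOS DocID" should appear in the list of search engines.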

Alan Hargreaves: An Interoperability Problem (WebLogic, Java and Solaris)

August 28, 2015 00:08 GMT

Last Friday I got pulled in to a very hot customer call.

The issue was best summarised as

Since migrating our WebLogic and database services from AIX to
Solaris, at random times we are seeing the WebLogic server
pause for a few minutes at a time. This takes down the back
office and client services part of the business for these
periods and is causing increasing frustration on the part of
staff and customers. This only occurs when a particular module
is enabled in WebLogic.

Before going on, I should note that there were other parts to this call which required option changes to WebLogic that also addressed major performance issues (such as the field iPads timing out talking to the WebLogic service) that were being seen, but it was these pauses that were the great concern to the customer.

I’d been given data from a GUDS run which initially made me concerned that we were having pstack(1M) run on the WebLogic java service. Pstack will stop the process while it walks all of the thread stacks. This could certainly have a nasty effect on accessibility to the service.

Unfortunately it was not to be that simple. The pstack collection was actually part of the data gathering process that the WebLogic folks were running. A great example of the Heisenberg Effect while looking at a problem. The effect of this data gathering masked out the possibility of seeing anything else.

I should also mention that, in order to keep business running, the customer had disabled the particular module, so we were very limited in when we could get it enabled. Data gathering also required them to send someone out to the site with an iPad (which was the field interface that seemed to be an enabler of the problem). We were pretty much getting one shot at data gathering in any given 24 hour period.

The next day we gathered data with the pstack command commented out.

This was a little more interesting; however, the small amount of time that the issue was present, and the fact that we were only gathering small lockstat profiles, meant that it was difficult to pin anything down - it was hit and miss whether we happened to be taking a profile while the issue was apparent. I did notice that we seemed to be spending more time in page-faulting than I would have expected (about 25% of available CPU at one point!), and about half of that time was being spent spinning on a cross-call mutex to flush the newly mapped addresses from all other CPU caches.

With the data from the next day's run I noticed that the kflt_user_evict() thread was also fighting for the same mutex. My thought at this time was to disable that thread, and for good measure also disable page coalescing, by adding the following lines to /etc/system and rebooting.

set kflt_disable=1
set mpss_coalesce_disable=1
set pg_contig_disable=1

It still felt like we were looking at addressing symptoms but not the cause.

We had our breakthrough on Tuesday when we got the following update from the customer:

The iPad transaction which triggers this issue has a label
printing as part of the transaction. The application uses the
Solaris print spooling mechanism to spool prints to a label
printer.  The application code spawns a lp process to do this
spooling. The code used is something like below:

Runtime.getRuntime().exec("lpr -P <destination> <file-name>");

We have observed that the CPU sys% spike behaviour seems not to
occur once we have disabled this print spooling functionality.
Is there any known issues in Oracle Java 1.7 with spawning
multiple processes from within the JVM ? Note that this
functionality as such has always been working fine on an IBM JVM
on AIX.

This suddenly made the page-faulting make sense.

On chasing up the Java folks, it looks like the default mechanism for this kind of operation is fork()/exec().

Now, fork will completely clone the address space of the parent process. This is what was causing all of the page-faults. The WebLogic Java process had a huge memory footprint and more than 600 threads.

Further discussion with the Java folks revealed that there was an option in later Java versions that we could have them use to force Java to use posix_spawn() rather than fork/exec, which would stop the address space duplication. The customer needed to start Java with the option:

-Djdk.lang.Process.launchMechanism=posix_spawn
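
As an illustration only (this is not from the original case), one way such a flag is often wired into a WebLogic domain is via JAVA_OPTIONS in the domain's bin/setDomainEnv.sh; the exact mechanism depends on how the servers are started:

# Hypothetical addition to a domain's bin/setDomainEnv.sh
JAVA_OPTIONS="${JAVA_OPTIONS} -Djdk.lang.Process.launchMechanism=posix_spawn"
export JAVA_OPTIONS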

They implemented this along with the other application changes and it looks to have been running acceptably now for a few days.

The hardest part of this call was that, had any one of the support groups (WebLogic, Java and Solaris) not been looking at it, it is a virtual certainty that we would not have gotten to root cause and found a solution.

Well done everyone.


August 27, 2015

Duncan Hardie: Rapid fire WebLogic instances

August 27, 2015 22:25 GMT

Today we have made available the latest in our series of VM Templates for Oracle Solaris and our Oracle SPARC systems. This latest set of templates allows you to get an installed and configured WebLogic 12.1.3 instance up and running quickly and with minimal effort. You can download instances for OVM Server for SPARC and Oracle Solaris Zones here.

VM Templates bring a preconfigured, pre-installed, ready-to-go, deploy-in-minutes environment to use as is, or as a base you can adapt for your production environments. This WebLogic template is simply the latest addition to the family; for example, we have VM Template images for Oracle Solaris, and you can see a full list here.

Oracle WebLogic Server 12c is a platform for building and deploying enterprise Java EE applications, with new features for lowering the cost of operations, improving performance, enhancing scalability and supporting the Oracle Applications portfolio.

August 18, 2015

Duncan Hardie: It's true - Docker coming to Oracle Solaris

August 18, 2015 14:24 GMT

Just a quick post: we announced a while ago that we will be bringing Docker to Oracle Solaris - you can read some background and find a link to the press release here. We'll hopefully be talking about developments at this year's Oracle OpenWorld, so maybe I'll see you there.

August 17, 2015

Marcel Hofstetter: How JomaSoft Recommends Sysadmins Virtualize Their Systems

August 17, 2015 06:53 GMT

Interview from Oracle OpenWorld 2013

https://www.youtube.com/watch?v=GuTCAhbFWUw&list=PLrEMJQQgYLXiZ8pxf1IZRPmoYsVlg4SJl&index=6

August 13, 2015

John L. Henning: GoveBye

August 13, 2015 00:49 GMT
When UltraSPARC IV was shiny and new
We waded through profiles looking for clues.

We talked to the masters of code generators,
"Can you run my job faster?" we begged and we whined,
"Not without data" they would remind,
"You may RFE us, but make it succinct,
with just the right proof and no extra stink."

Your vision was clear.  All data should be
Under your mouse, and ready to see:
The forest with one click, the tree with another;
Or the roots and the branches without any bother.

"What code snips are hot?"
Your answer was SPOT.

I'll stop rhyming soon; I shall not plow through
The detail of all our adventures with you.
Let it just be said that you guided, you taught,
You diagnosed and classified and problems were caught.

Our respect for you grew, more and more;
I cannot believe that you're 404.

August 11, 2015

OpenStack: Chat to us at OpenStack Silicon Valley 2015 Event!

August 11, 2015 22:31 GMT

Oracle is sponsoring the upcoming OpenStack Silicon Valley 2015 event in a couple of weeks' time. We're looking forward to participating in the discussions, and we will have a sponsored session with Markus Flierl, VP of Solaris engineering (not currently posted on the schedule).

We've made some pretty great progress in OpenStack over the past year across all of the software and hardware portfolios that I mentioned in my recent OpenStack Silicon Valley blog post. The IT industry is moving fast, and with the recent interest in containers, agile development and microservices, we're excited to see standardization through recent efforts, including the Open Container Initiative and our announcement that Docker is coming to Oracle Solaris. Come join us at booth E4 - we'd love to chat to you about what we're doing with OpenStack, our Software in Silicon strategy at Oracle, and some of the trends we're seeing in our customer base!

Darryl Gove: New blog location

August 11, 2015 17:00 GMT

This will be my last blog entry here. I'm leaving Oracle after 16 amazing years. I've had a fantastic time, and I'm glad to have been able to share some of that with you. If you wish to read about my continuing adventures please head along to http://www.darrylgove.com/.

All the best, Darryl.

August 10, 2015

OpenStack: Swift Object Storage with ZFS Storage Appliance

August 10, 2015 23:50 GMT

Jim Kremer has written a new blog that shows you how to configure Swift to take advantage of an Oracle ZFS Storage Appliance. Jim walks step by step through configuring OpenStack Swift as a highly available cluster, using an Oracle ZFS Storage Appliance as the backend storage over NFSv4.

Jim summarizes the unique benefits that using a ZFS Storage Appliance brings to OpenStack environments over a typical Swift deployment:

For more information, see Solaris Swift using ZFS Storage Appliance

Peter Tribble: Whither open source?

August 10, 2015 09:02 GMT
According to the Free Software Definition, there are 4 essential freedoms: the freedom to run the program as you wish, for any purpose (freedom 0); the freedom to study how the program works and change it (freedom 1); the freedom to redistribute copies (freedom 2); and the freedom to distribute copies of your modified versions (freedom 3).

Access to the source code and an open-source license are necessary preconditions for software freedom, but not sufficient.

And, unfortunately, we are living in an era where it is becoming ever more difficult to exercise the freedoms listed above.

Consider freedom 0. In the past, essentially all free software ran perfectly well on essentially every hardware platform and operating system. At the present time, much of what claims to be open-source software is horrendously platform-specific - sometimes by ignorance (I don't expect every developer to be able to test on all platforms), but there's a disturbing trend of deliberately excluding non-preferred platforms.

There is increasing use of new languages and runtimes, which are often very restricted in terms of platform support. If you look at some of the languages like Node.JS, Go, and Rust, you'll see that they explicitly target the common hardware architectures (x86 and ARM), deliberately and consciously excluding other platforms. Add to that the trend for self-referential bootstrapping (where you need X to build X) and you can see other platforms frozen out entirely.

So, much of freedom 0 has been emasculated. What of freedom 1?

Yes, I might be able to look at the source code. (Although, in many cases, it is opaque and undocumented.) And I might be able to crack open an editor and type in a modification. But actually being able to use that modification is a whole different ball game.

Actually building software from source often enters you into a world of pain and frustration. Fighting your way through Dependency Hell, struggling with arcane and opaque build systems, becoming frustrated with the vagaries of the autotools (remember how the configure script works - it makes a bunch of random and unsubstantiated guesses about the state of your system, then ignores half the results, and often needs explicitly overriding, making a mockery of the "auto" part), only to discover that "works on my system" is almost a religion.

Current trends like Docker make this problem worse. Rather than having to pay lip-service to portability by having to deal with the vagaries of multiple distributions, authors can now restrict the target environment even more narrowly - "works in my docker image" is the new normal. (I've had some developers come out and say this explicitly.)

The conclusion: open-source software is becoming increasingly narrow and proprietary, denying users the freedoms they deserve.

August 07, 2015

Jeff Savit: Oracle VM Performance and Tuning - Part 1

August 07, 2015 23:06 GMT

Oracle VM Performance

I've been interested in virtual machine performance a very long time, and while I've written Best Practices items that included performance, this post starts a series of articles exclusively focused on Oracle VM performance. Topics will include:

I intend this series to be relatively high-level in essays that relate concepts to think about, illustrated by some examples and with links to other resources for details.

The changed VM performance landscape

...then...

There was a time when working on virtual machine performance, or system performance in general, required fine tuning of many system parameters. CPU resources were scarce, with many virtual machines on servers with one or a few CPUs, so we carefully managed CPU scheduling parameters: priorities or weights for VMs, distinction of interactive vs. non-interactive VMs to prioritize access, duration of time slices. Memory was expensive and capacity was limited, so we aggressively over-subscribed memory (some VM systems still do but it's less a factor than it once was) and administered memory management: page locking and reservation, working set controls, life time of idle pages before displacement. Since we had to page (sometimes loosely referred to as "swapping", though there is a difference), we created hierarchies of page and swap devices and spread the load over multiple disks. Disks were slow and uncached, so we sometimes individually positioned files to reduce rotational and seek latency. It took a lot of skilled effort to have systems perform well.

...and now...

These items have less impact in modern virtualization systems. Many of these issues have been eliminated or rendered less important by architecture, product design, or the abundance of system resources seen with today's systems. In general, administering plentiful resources for optimal performance is very different from apportioning scarce resources. In particular, Oracle VM products eliminate as much of the effort as possible, and design in best practices and performance choices to fit today's applications and hardware to perform well "out of the box".

Today's servers have lots of CPU capacity, and we don't need to run at 95% CPU busy (though we can, of course) to make them cost effective, so we don't tweak CPU scheduling parameters to prevent starvation as we once did. Oracle VM Server for x86 lets you set CPU caps and weights as needed, and Oracle VM Server for SPARC dedicates CPU threads or cores directly to guests, so the topic simply evaporates on that platform. Having enough CPU cycles to get the job done is rarely the problem now. Instead, we now tune for scale and to handle Non Uniform Memory Access (NUMA) properties of large systems.

Neither Oracle VM platform over-subscribes memory, so we don't have to worry about managing working sets, virtual to real ratios, or paging performance. That's true in today's non-VM environments too, where it's safe to say that if you're swapping, you're already suffering a performance penalty and should just add memory. This eliminates an entire category of problematic performance management that often (in the bad old days) resulted in pathological performance issues. Friends don't let friends swap. Instead, what memory tuning remains is around NUMA latency and alignment with CPUs.

There are still performance problems and the need to manage performance remains - or why would I be writing this? Effort has moved to other parts of the technology stack - network performance is much more important than it once was, for instance. Workloads are more demanding, less predictable, and are subject to scale up/down requirements far beyond those of the earlier systems. There still are, and probably always will be, constraints on performance and scale that have to be understood and managed. That said, the goal of Oracle VM is to increasingly have systems "do the right thing" for performance with as little need for administrative effort as possible. I'll illustrate examples of that in this series of posts.

But first, a classic problem

Here's a problem that existed in the old days and persists today in different form, which makes it interesting to think about.

Let's say it was 8am on a Monday morning back when people logged into timeshared systems for their daily work. The first people to get to their desks, coffee in hand, would get really good response time reading their e-mail until the laggards showed up. Response time could degrade as users logged in if utilization of a key system component reached a performance sensitive level. However, suitably sized and tuned systems could handle this or other peaks related to business cycles. In my case it was tied to the open and close of the New York stock exchanges or timing of large derivative transactions.

Now suppose there was a system outage: when the system came back up, all the interrupted users would login at the same time to resume their work, and performance could be terrible. All the users would place demand on the system at the same time, instead of the normal mix of busy and idle (think time) users, placing pressure on CPU, memory, and I/O, and single-threading any serialized portion of the system. Viewed from the perspective of "getting work done", it could often be faster to have fewer active people logged on, so they could complete their effort and drop to idle state, than have everyone trying at once. You could even see this with plain old batch jobs: if there was excessive multiprogramming, reducing the number of tasks competing for resources could make the entire collection of work run faster. Classic congestion effect, like too many cars on the same highway. Systems of the day had clever controls to prevent excessive multiprogramming, usually driven by memory consumption to prevent thrashing on the small RAM capacities available: a subset of users whose working sets fit in RAM would be allowed to run, while other users had to wait. The fortunate users either finished their work and became idle so their memory could be re-used by others, or were evicted after a while to give somebody else a turn (that's where address space swapping comes in).

That applied to the old timesharing systems, and applies even more to today's virtual machine systems, especially since the CPU, memory, or I/O footprint of starting up a VM is substantial. This is such a well-known problem that it has its own name - the "boot storm" - and a Google search for the phrase yields over 40 million hits. Let's consider a few reasons this has such a powerful effect:

How can a boot storm be handled? A number of methods can be used:

These are well-known approaches, for a problem that has been around for a very long time in one form or another.

Summary

While the landscape has changed, and we no longer tune to the same factors we once managed, the need to manage performance has not disappeared. Further articles in this series will discuss different aspects of Oracle VM performance management. Some starting concepts:

Resources

For additional resources about Oracle VM Server

Robert Milkowski: Kernel Zones - Adding Local Disks

August 07, 2015 12:12 GMT
It is possible to add a disk drive to a kernel zone by specifying its physical location instead of its CTD name.


add device
set storage=dev:chassis/SYS/HDD23/disk
set id=1
end
This is very nice on servers like the X5-2L with a pass-through controller, where all 26 local disks are visible like this.
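
For example, a complete session might look like the following (the zone name kz1 is hypothetical); with Live Zone Reconfiguration in Solaris 11.3 the change can even be applied to the running kernel zone:

# zonecfg -z kz1
zonecfg:kz1> add device
zonecfg:kz1:device> set storage=dev:chassis/SYS/HDD23/disk
zonecfg:kz1:device> set id=1
zonecfg:kz1:device> end
zonecfg:kz1> exit
# zoneadm -z kz1 apply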

August 04, 2015

Roch Bourbonnais: Concurrent Metaslab Syncing

August 04, 2015 12:42 GMT
As hinted in my previous article, spa_sync() is the function that runs whenever a pool needs to update its internal state. That thread is the master of ceremonies for the whole TXG syncing process, and as such it is the most visible of threads. At the same time, it's the thread we actually want to see idling. The spa_sync thread is set up to generate work for taskqs and then wait for the work to happen. That's why we often see spa_sync waiting in zio_wait or taskq_wait. This is what we expect that thread to be doing.

Let's dig into this process a bit more. While we do expect spa_sync to mostly be waiting, it is not the only thing that it does. Before it waits, it has to farm out work to those taskqs. Every TXG, spa_sync wakes up and starts to create work for the zio taskq threads. Those threads immediately pick up the initial tasks posted by spa_sync and just as quickly generate load for pool devices. Our goal is just to keep the taskqs, and more importantly the devices, fed with work.

And so, we have this single spa_sync thread, quickly posting work to zio taskqs and threads working on checksum computation and other CPU intensive tasks. This model ensures that the disk queues are non-empty for the duration of the data update portion of a TXG.

In practice, that single spa_sync thread is able to generate the tasks to service the most demanding environment. When we hit some form of pool saturation, we typically see spa_sync waiting on a zio and that is just the expected sign that something at the I/O level below ZFS is the current limiting factor.
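
If you want to check this on your own system, here is a hedged DTrace sketch (not from the original article) that measures how long each spa_sync pass takes; on a healthy pool, most of that elapsed time is spent waiting rather than on CPU:

# dtrace -n 'fbt::spa_sync:entry { self->ts = timestamp; }
    fbt::spa_sync:return /self->ts/ {
        @["spa_sync duration (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0; }'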

But, not too long ago, there was a grain of sand in this beautiful clockwork. After spa_sync was all done with ... well waiting... it had a final step to run before updating the uberblock. It would walk through all the devices and process all the space map updates, keeping track of all the allocs and frees. In many cases, this was a quick on-CPU operation done by the spa_sync thread. But when dealing with a large amount of deletion it could show up as significant. It was definitely something that spa_sync was tackling itself as opposed to farming out to workers.

A project was spawned to fix this and during the evaluation the ZFS engineer figured out that a lot of the work could be handled in the earlier stages of the zio processing, further reducing the amount of work we could have to wait on in the later stages of spa_sync.

This fix was a very important step in making sure that the critical thread running spa_sync spends most of its time ...waiting.

August 02, 2015

Peter Tribble: Blank Zones

August 02, 2015 14:20 GMT
I've been playing around with various zone configurations on Tribblix. This is going beyond the normal sparse-root, whole-root, partial-root, and various other installation types, into thinking about other ways you can actually use zones to run software.

One possibility is what I'm tentatively calling a Blank zone. That is, a zone that has nothing running. Or, more precisely, just has an init process but not the normal array of miscellaneous processes that get started up by SMF in a normal boot.

You might be tempted to use 'zoneadm ready' rather than 'zoneadm boot'. This doesn't work, as you can't get into the zone:

zlogin: login allowed only to running zones (test1 is 'ready').
So you do actually need to boot the zone.

Why not simply disable the SMF services you don't need? This is fine if you still want SMF and most of the services, but SMF itself is quite a beast, and the minimal set of service dependencies is both large and extremely complex. In practice, you end up running most things just to keep the SMF dependencies happy.

Now, SMF is started by init using the following line (I've trimmed the redirections) from /etc/inittab

smf::sysinit:/lib/svc/bin/svc.startd

OK, so all we have to do is delete this entry, and we just get init. Right? Wrong! It's not quite that simple. If you try this then you get a boot failure:

INIT: Absent svc.startd entry or bad contract template.  Not starting svc.startd.
Requesting maintenance mode

In practice, this isn't fatal - the zone is still running, but apart from wondering why it's behaving like this it would be nice to have the zone boot without errors.

Looking at the source for init, it soon becomes clear what's happening. The init process is now intimately aware of SMF, so essentially it knows that its only job is to get startd running, and startd will do all the work. However, it's clear from the code that it's only looking for the smf id in the first field. So my solution here is to replace startd with an infinite sleep.

smf::sysinit:/usr/bin/sleep Inf

(As an aside, this led to illumos bug 6019, as the manpage for sleep(1) isn't correct. Using 'sleep infinite' as the manpage suggests led to other failures.)

Then, the zone boots up, and the process tree looks like this:

# ptree -z test1
10210 zsched
  10338 /sbin/init
    10343 /usr/bin/sleep Inf

To get into the zone, you just need to use zlogin. Without anything running, there aren't the normal daemons (like sshd) available for you to connect to. It's somewhat disconcerting to type 'netstat -a' and get nothing back.

For permanent services, you could run them from inittab (in the traditional way), or have an external system that creates the zones and uses zlogin to start the application. Of course, this means that you're responsible for any required system configuration and for getting any prerequisite services running.
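
For the inittab route, a hypothetical entry (the id and daemon path are made up) might look like this, letting init respawn the process if it dies:

myd:3:respawn:/opt/myapp/bin/myappd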

In particular, this sort of trick works better with shared-IP zones, in which the network is configured from the global zone. With an exclusive-IP zone, all the networking would need to be set up inside the zone, and there's nothing running to do that for you.

Another thought I had was to use a replacement init. The downside to this is that the name of the init process is baked into the brand definition, so I would have to create a duplicate of each brand to run it like this. Just tweaking the inittab inside a zone is far more flexible.

It would be nice to have more flexibility. At the present time, I either have just init, or the whole of SMF. There's a whole range of potentially useful configurations between these extremes.

The other thing is to come up with a better name. Blank zone. Null zone. Something else?

August 01, 2015

Peter Tribble: The lunacy of -Werror

August 01, 2015 15:59 GMT
First, a little history for those of you young enough not to have lived through perl. In the perl man page, there's a comment in the BUGS section that says:

The -w switch is not mandatory.

(The -w switch enables warnings about grotty code.) Unfortunately, many developers misunderstood this. They wrote their perl script, and then added the -w switch as though it was a magic bullet that fixed all the errors in your code, without even bothering to think about looking at the output it generated or - heaven forbid - actually fixing the problems. The result was that, with a CGI script, your apache error log was full of output that nobody ever read.

The correct approach, of course, is to develop with the -w switch, fix all the warnings it reports as part of development, and then turn it off. (Genuine errors will still be reported anyway, and you won't have to sift through garbage to find them, or worry about your service going down because the disk filled up.)

Move on a decade or two, and I'm starting to see a disturbing number of software packages being shipped that have -Werror in the default compilation flags. This almost always results in the build failing.

If you think about this for a moment, it should be obvious that enabling -Werror by default is a really dumb idea. There are two basic reasons:

  1. Warnings are horribly context sensitive. It's difficult enough to remove all the warnings given a single fully constrained environment. As soon as you start to vary the compiler version, the platform you're building on, or the versions of the (potentially many) prerequisites you're building against, getting accidental warnings is almost inevitable. (And you can't test against all possibilities, because some of those variations might not even exist at the point of software release.)
  2. The warnings are only meaningful to the original developer. The person who has downloaded the code and is trying to build it has no reason to be burdened by all the warnings, let alone be inconvenienced by unnecessary build failures.
To be clear, I'm not saying - at all - that the original developer shouldn't be using -Werror and fixing all the warnings (and you might want to enable it for your CI builds to be sure you catch regressions), but distributing code with it enabled is simply being rude to your users.

(Having a build target that generates a warning report that you can send back to the developer would be useful, though.)

July 31, 2015

Joerg Moellenkamp: Computerworld: "Oracle preps 'Sonoma' chip for low-priced Sparc servers"

July 31, 2015 08:56 GMT
Interesting read at Computerworld
Oracle is looking to expand the market for its Sparc-based servers with a new, low-cost processor dubbed Sonoma that its engineers will discuss publically for the first time later this month.
Later this month refers to the Hot Chips conference. There is a presentation called "Oracle’s Sonoma Processor: Advanced low-cost SPARC processor for enterprise workloads" in the agenda.

Glynn Foster: Secure Remote RESTful Administration with RAD

July 31, 2015 08:03 GMT

I've written before about the work we've been doing to provide a set of programmatic interfaces to Oracle Solaris using RAD. This allows developers and administrators to administer systems remotely over C, Java, Python and REST-based interfaces. For anyone wanting to get their hands dirty, I've written a useful article: Getting Started with the Remote Administration Daemon on Oracle Solaris 11.

One of the areas I didn't tackle in this initial article was providing secure REST based administration interfaces over TLS. Thanks to the help of Gary Pennington, we now have a new article: Secure Remote RESTful Administration with RAD. In this article we'll use the automatically generated self-signed certificates, but this could be easily changed to point to certificates that have been signed by a Certificate Authority.

With the various announcements that we've been making recently about Oracle joining the Open Container Initiative and bringing Docker into Oracle Solaris, we're in a great position of being able to design a platform to handle the next wave of cloud deployment and delivery - whether that's traditional enterprise applications or micro services. We see the huge advantage of streamlining IT operations and facilitating methodologies such as DevOps, and it's time to take Oracle Solaris into that next wave.

July 30, 2015

Security Blog: CVSS Version 3.0 Announced

July 30, 2015 22:04 GMT

Hello, this is Darius Wiles.

Version 3.0 of the Common Vulnerability Scoring System (CVSS) has been announced by the Forum of Incident Response and Security Teams (FIRST). Although there have been no high-level changes to the standard since the Preview 2 release which I discussed in a previous blog post, there have been a lot of improvements to the documentation.

Soon, Oracle will be using CVSS v3.0 to report CVSS Base scores in its security advisories. In order to facilitate this transition, Oracle plans to release two sets of risk matrices, covering both CVSS v2 and v3.0, in the first Critical Patch Update (Oracle’s security advisories) that provides CVSS version 3.0 Base scores. Subsequent Critical Patch Updates will only list CVSS version 3.0 scores.

While Oracle expects most vulnerabilities to have similar v2 and v3.0 Base Scores, certain types of vulnerabilities will experience a greater scoring difference. The CVSS v3.0 documentation includes a list of examples of public vulnerabilities scored using both v2 and v3.0, and this gives an insight into these scoring differences. Let’s now look at a couple of reasons for these differences.

The v3.0 standard provides a more precise assessment of risk because it considers more factors than the v2 standard. For example, the important impact of most cross-site scripting (XSS) vulnerabilities is that a victim's browser runs malicious code. v2 does not have a way to capture the change in impact from the vulnerable web server to the impacted browser; basically v2 just considers the impact to the former. In v3.0, the Scope metric allows us to score the impact to the browser, which in v3.0 terminology is the impacted component. v2 scores XSS as "no impact to confidentiality or availability, and partial impact to integrity", but in v3.0 we are free to score impacts to better fit each vulnerability. For example, a typical XSS vulnerability, CVE-2013-1937 is scored with a v2 Base Score of 4.3 and a v3.0 Base Score of 6.1. Most XSS vulnerabilities will experience a similar CVSS Base Score increase.
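
For illustration, the vectors behind those two numbers are roughly as follows (the typical scoring for a reflected XSS issue; see the CVSS v3.0 examples document for the authoritative values):

v2:   AV:N/AC:M/Au:N/C:N/I:P/A:N                    = 4.3
v3.0: CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N  = 6.1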

Until now, Oracle has used a proprietary Partial+ metric value for v2 impacts when a vulnerability "affects a wide range of resources, e.g., all database tables, or compromises an entire application or subsystem". We felt this extra information was useful because v2 always scores vulnerabilities relative to the "target host", but in cases where a host's main purpose is to run a single application, Oracle felt that a total compromise of that application warrants more than Partial. In v3.0, impacts are scored relative to the vulnerable component (assuming no scope change), so a total compromise of an application now leads to High impacts. Therefore, most Oracle vulnerabilities scored with Partial+ impacts under v2 are likely to be rated with High impacts, and therefore to have more precise v3.0 Base scores. For example, CVE-2015-1098 has a v2 Base score of 6.8 and a v3.0 Base score of 7.8. This is a good indication of the differences we are likely to see. Refer to the CVSS v3.0 list of examples for more details on the scoring of this vulnerability.

Overall, Oracle expects v3.0 Base scores to be higher than v2, but bear in mind that v2 scores are always relative to the "target host", whereas v3.0 scores are relative to the vulnerable component, or the impacted component if there is a scope change. In other words, CVSS v3.0 will provide a better indication of the relative severity of vulnerabilities because it better reflects the true impact of the vulnerability being rated in software components such as database servers or middleware.


For More Information

The CVSS v3.0 documents are located on FIRST's web site at http://www.first.org/cvss/

Oracle's use of CVSS [version 2], including a fuller explanation of Partial+ is located at http://www.oracle.com/technetwork/topics/security/cvssscoringsystem-091884.html

My previous blog post on CVSS v3.0 preview is located at https://blogs.oracle.com/security/entry/cvss_version_3_0_preview

Eric Maurice's blog post on Oracle's use of CVSS v2 is located at https://blogs.oracle.com/security/entry/understanding_the_common_vulne_2

Roch Bourbonnais: System Duty Cycle Scheduling Class

July 30, 2015 15:35 GMT
It's well known that ZFS uses a bulk update model to maintain the consistency of information stored on disk. This is referred to as a transaction group (TXG) update or internally as a spa_sync(), which is the name of the function that orchestrates this task. This task ultimately updates the uberblock between consistent ZFS states.

Today these tasks are expected to run on a 5-second schedule with some leeway. Internally, ZFS builds up the data structures such that when a new TXG is ready to be issued it can do so in the most efficient way possible. That method turned out to be a mixed blessing.

The story is that when ZFS is ready, it uses zio taskqs to execute all of the heavy lifting, CPU intensive jobs necessary to complete the TXG. This process includes the checksumming of every modified block and possibly compressing and encrypting them. It also does on-disk allocation and issues I/O to the disk drivers. This means there is a lot of CPU intensive work to do when a TXG is ready to go. The zio subsystem was crafted in such a way that when this activity does show up, the taskqs that manage the work never need to context switch out. The taskq threads can run on CPU for seconds on end. That created a new headache for the Solaris scheduler.

Things would not have been so bad if ZFS were the only service being provided. But our systems, of course, deliver a variety of services, and non-ZFS clients were being short-changed by the scheduler. It turns out that before this use case, most kernel threads had short spans of execution. Therefore kernel threads had never been made preemptable, and nothing would prevent them from continuous execution (seconds are the same as infinity for a computer). With ZFS, we now had a new type of kernel thread, one that frequently consumed significant amounts of CPU time.

A team of Solaris engineers went on to design a new scheduling class specifically targeting this kind of bulk processing. Putting the zio taskqs in this class allowed those threads to become preemptable when they used too much CPU. We also changed our model such that we limited the number of CPUs dedicated to these intensive taskqs. Today, each pool may use at most 50% of nCPUs to run these tasks. This is managed by the kernel parameter zio_taskq_batch_pct, which was reduced from 100% to 50%.
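
As an aside (not from the original article), on systems where this parameter is exposed you can inspect it with mdb and, if you really must, pin it in /etc/system; treat this as a sketch rather than a tuning recommendation:

# echo zio_taskq_batch_pct/D | mdb -k
# echo "set zfs:zio_taskq_batch_pct = 50" >> /etc/system    (takes effect at the next boot)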

Using these two features we are now much better equipped to allow the TXG to proceed at top speed without starving applications of CPU access - and in the end, running applications is all that matters.

Mike Gerdts: Live storage migration for kernel zones

July 30, 2015 14:30 GMT

From time to time every sysadmin realizes that something that is consuming a bunch of storage is sitting in the wrong place.  This could be because of a surprise conversion of proof of concept into proof of production or something more positive like ripping out old crusty storage for a nice new Oracle ZFS Storage Appliance. When you use kernel zones with Solaris 11.3, storage migration gets a lot easier.

As our fine manual says:

The Oracle Solaris 11.3 release introduces the Live Zone Reconfiguration feature for Oracle Solaris Kernel Zones. With this feature, you can reconfigure the network and the attached devices of a running kernel zone. Because the configuration changes are applied immediately without requiring a reboot, there is zero downtime service availability within the zone. You can use the standard zone utilities such as zonecfg and zoneadm to administer the Live Zone Reconfiguration. 

Well, we can combine this with other excellent features of Solaris to have no-outage storage migrations, even of the root zpool.

In this example, I have a kernel zone that was created with something like:

root@global:~# zonecfg -z kz1 create -t SYSsolaris-kz
root@global:~# zoneadm -z kz1 install -c <scprofile.xml>

That happened several weeks ago and now I really wish that I had installed it using an iSCSI LUN from my ZFS Storage Appliance. We can fix that with no outage.

First, I'll update the zone's configuration to add a bootable iscsi disk.

root@global:~# zonecfg -z kz1
zonecfg:kz1> add device
zonecfg:kz1:device> set storage=iscsi://zfssa/luname.naa.600144F0DBF8AF19000053879E9C0009 
zonecfg:kz1:device> set bootpri=0
zonecfg:kz1:device> end
zonecfg:kz1> exit

Next, I tell the system to add that disk to the running kernel zone.

root@global:~# zoneadm -z kz1 apply
zone 'kz1': Checking: Adding device storage=iscsi://zfssa/luname.naa.600144F0DBF8AF19000053879E9C0009
zone 'kz1': Applying the changes

Let's be sure we can see it and look at the current rpool layout.  Notice that this kernel zone is running Solaris 11.2 - I only need to have Solaris 11.3 in the global zone.

root@global:~# zlogin kz1
[Connected to zone 'kz1' pts/2]
Oracle Corporation      SunOS 5.11      11.2    May 2015
You have new mail.

root@kz1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1d0 <kz-vDisk-ZVOL-16.00GB>
          /kz-devices@ff/disk@0
       1. c1d1 <SUN-ZFS Storage 7120-1.0-120.00GB>
          /kz-devices@ff/zvblk@1
Specify disk (enter its number): ^D

root@kz1:~# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: none requested
config:
        NAME    STATE     READ WRITE CKSUM
        rpool   ONLINE       0     0     0
          c1d0  ONLINE       0     0     0
errors: No known data errors

Now, zpool replace can be used to migrate the root pool over to the new storage.

root@kz1:~# zpool replace rpool c1d0 c1d1
Make sure to wait until resilver is done before rebooting.

root@kz1:~# zpool status rpool
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
        Run 'zpool status -v' to see device specific details.
  scan: resilver in progress since Thu Jul 30 05:47:50 2015
    4.39G scanned
    143M resilvered at 24.7M/s, 3.22% done, 0h2m to go
config:
        NAME           STATE     READ WRITE CKSUM
        rpool          DEGRADED     0     0     0
          replacing-0  DEGRADED     0     0     0
            c1d0       ONLINE       0     0     0
            c1d1       DEGRADED     0     0     0  (resilvering)
errors: No known data errors

After a couple minutes, that completes.

root@kz1:~# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: resilvered 4.39G in 0h2m with 0 errors on Thu Jul 30 05:49:57 2015
config:
        NAME    STATE     READ WRITE CKSUM
        rpool   ONLINE       0     0     0
          c1d1  ONLINE       0     0     0
errors: No known data errors

root@kz1:~# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  15.9G  4.39G  11.5G  27%  1.00x  ONLINE  -

You may have noticed in the format output that I'm replacing a 16 GB zvol with a 120 GB disk.  However, the size of the zpool reported above doesn't reflect that it's on a bigger disk.  Let's fix that by setting the autoexpand property. 

root@kz1:~# zpool get autoexpand rpool
NAME   PROPERTY    VALUE  SOURCE
rpool  autoexpand  off    default

root@kz1:~# zpool set autoexpand=on rpool

root@kz1:~# zpool list
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  120G  4.39G  115G   3%  1.00x  ONLINE  -

To finish this off, all we need to do is remove the old disk from the kernel zone's configuration.  This happens back in the global zone.

root@global:~# zonecfg -z kz1
zonecfg:kz1> info device
device 0:
	match not specified
	storage: iscsi://zfssa/luname.naa.600144F0DBF8AF19000053879E9C0009
	id: 1
	bootpri: 0
device 1:
	match not specified
	storage.template: dev:/dev/zvol/dsk/%{global-rootzpool}/VARSHARE/zones/%{zonename}/disk%{id}
	storage: dev:/dev/zvol/dsk/rpool/VARSHARE/zones/kz1/disk0
	id: 0
	bootpri: 0
zonecfg:kz1> remove device id=0
zonecfg:kz1> exit

Now, let's apply that configuration. To show what it does, I run format in kz1 before and after applying the configuration.

root@global:~# zlogin kz1 format </dev/null
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1d0 <kz-vDisk-ZVOL-16.00GB>
          /kz-devices@ff/disk@0
       1. c1d1 <SUN-ZFS Storage 7120-1.0-120.00GB>
          /kz-devices@ff/zvblk@1
Specify disk (enter its number): 

root@global:~# zoneadm -z kz1 apply 
zone 'kz1': Checking: Removing device storage=dev:/dev/zvol/dsk/rpool/VARSHARE/zones/kz1/disk0
zone 'kz1': Applying the changes

root@global:~# zlogin kz1 format </dev/null
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c1d1 <SUN-ZFS Storage 7120-1.0-120.00GB>
          /kz-devices@ff/zvblk@1
Specify disk (enter its number): 

root@global:~# 

At this point the live (no outage) storage migration is complete and it is safe to destroy the old disk (rpool/VARSHARE/zones/kz1/disk0).

root@global:~# zfs destroy rpool/VARSHARE/zones/kz1/disk0

July 28, 2015

OpenStack: Migrating Neutron Database from sqlite to MySQL for Oracle OpenStack for Oracle Solaris

July 28, 2015 23:58 GMT

Many OpenStack development environments use sqlite as a backend to store data. However, in a production environment MySQL is widely used. Oracle also recommends using MySQL for its OpenStack services. For many of the OpenStack services (nova, cinder, neutron...) sqlite is the default backend. Oracle OpenStack for Oracle Solaris users may therefore want to migrate their backend database from sqlite to MySQL.

The general idea is to dump the sqlite database, translate the dumped SQL statements so that they are compatible with MySQL, stop the Neutron services, create the MySQL database, and replay the modified SQL statements in the MySQL database.

The details listed here are for the Juno release (integrated in Oracle Solaris 11.2 SRU 10.5 or newer) and Neutron is taken as an example use case.

Migrating neutron database from sqlite to MySQL

If not already installed, install MySQL

# pkg install --accept mysql-55 mysql-55/client python-mysql

Start the MySQL service
# svcadm enable -rs mysql

NOTE: If MySQL was already installed and running, then before running the next step double-check that the Neutron database on MySQL either does not yet exist or is empty. The next step will drop the existing MySQL Neutron database if it exists and recreate it. If the MySQL Neutron database is not empty, stop at this point. The following steps are limited to the case where the MySQL Neutron database is newly created or recreated.

Create Neutron database on MySQL

mysql -u root -p<<EOF
DROP DATABASE IF EXISTS neutron;
CREATE DATABASE neutron;
GRANT ALL PRIVILEGES ON neutron.* TO 'neutron'@'localhost' \
IDENTIFIED BY 'neutron';
FLUSH PRIVILEGES;
EOF

Enter the root password when prompted

Identify the Neutron services that are online:
# svcs -a | grep neutron | grep online | awk '{print $3}' > /tmp/neutron-svc

Disable the Neutron services:
# for item in `cat /tmp/neutron-svc`; do svcadm disable $item; done
Make a backup of Neutron sqlite database:
# cp /var/lib/neutron/neutron.sqlite \
    /var/lib/neutron/neutron.sqlite.ORIG
Get the db dump of Neutron from sqlite:
# /usr/bin/sqlite3 /var/lib/neutron/neutron.sqlite .dump \
       > /tmp/neutron-sqlite.sql

The following steps are run to create a neutron-mysql.sql file which will be compatible with MySQL database engine.

Suppress foreign key checks during create table/index
# echo 'SET foreign_key_checks = 0;' > /tmp/neutron-sqlite-schema.sql

Dump sqlite schema to a file
# /usr/bin/sqlite3 /var/lib/neutron/neutron.sqlite .dump | \
       grep -v 'INSERT INTO' >> /tmp/neutron-sqlite-schema.sql

 

Remove BEGIN/COMMIT/PRAGMA lines from the file.
(Oracle Solaris sed does not support the -i option, hence redirecting to a new file
and then renaming it to the original file.)
# sed '/BEGIN TRANSACTION;/d; /COMMIT;/d; /PRAGMA/d' \
    /tmp/neutron-sqlite-schema.sql > /tmp/neutron-sqlite-schema.sql.new \
    && mv /tmp/neutron-sqlite-schema.sql.new /tmp/neutron-sqlite-schema.sql


Replace some SQL identifiers that are enclosed in double quotes
so that they are enclosed in back quotes,
e.g. "limit" to `limit`
# for item in binary blob group key limit type; do \
    sed "s/\"$item\"/\`$item\`/g" /tmp/neutron-sqlite-schema.sql \
    > /tmp/neutron-sqlite-schema.sql.new \
    && mv /tmp/neutron-sqlite-schema.sql.new /tmp/neutron-sqlite-schema.sql; done

Enable foreign key checks at the end of the file

# echo 'SET foreign_key_checks = 1;' >> /tmp/neutron-sqlite-schema.sql 
Dump the data alone (INSERT statements) into another file

# /usr/bin/sqlite3 /var/lib/neutron/neutron.sqlite .dump \
| grep 'INSERT INTO' > /tmp/neutron-sqlite-data.sql
In INSERT statements, table names are in double quotes in sqlite,
but in MySQL there should be no double quotes.

# sed 's/INSERT INTO \"\(.*\)\"/INSERT INTO \1/g' \
/tmp/neutron-sqlite-data.sql > /tmp/neutron-sqlite-data.sql.new \
 && mv /tmp/neutron-sqlite-data.sql.new /tmp/neutron-sqlite-data.sql


Concat schema and data files to neutron-mysql.sql

# cat /tmp/neutron-sqlite-schema.sql \
/tmp/neutron-sqlite-data.sql > /tmp/neutron-mysql.sql 
Populate Neutron database in MySQL:
# mysql neutron < /tmp/neutron-mysql.sql
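
Before re-enabling the services, a quick sanity check (not part of the original procedure) can confirm that the schema and data landed, using the neutron account created above:

# mysql -u neutron -pneutron -e 'SHOW TABLES;' neutron
# mysql -u neutron -pneutron -e 'SELECT * FROM alembic_version;' neutron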

Specify the connection under [database] section of /etc/neutron/neutron.conf file:

The connection string format is as follows:
connection = mysql://%SERVICE_USER%:%SERVICE_PASSWORD%@hostname/neutron 
For example:
connection = mysql://neutron:neutron@localhost/neutron
Enable the Neutron services:
# for item in `cat /tmp/neutron-svc`; do svcadm enable -rs $item; done 
Cleanup:
# rm -f /var/lib/neutron/neutron.sqlite.ORIG \
    /tmp/neutron-sqlite-schema.sql \
    /tmp/neutron-sqlite-data.sql \
    /tmp/neutron-mysql.sql

Details about translating SQL statements to be compatible with MySQL

NOTE: /tmp/neutron-sqlite-schema.sql will contain the Neutron sqlite database schema as SQL statements, and /tmp/neutron-sqlite-data.sql will contain the data in the Neutron sqlite database, which can be replayed to recreate the database. The SQL statements in neutron-sqlite-schema.sql and neutron-sqlite-data.sql must be MySQL compatible so that they can be replayed on the MySQL Neutron database. The sed commands listed above are used to create MySQL-compatible SQL statements. The following text provides detailed information about the differences between sqlite and MySQL that have to be dealt with.

There are some differences in the way sqlite and MySQL expect the SQL statements to be written, as shown below:

sqlite:
- Reserved words are in double quotes, e.g. "blob", "type", "key", "group", "binary", "limit"
- Table names in INSERT statements are in quotes, e.g. INSERT INTO "alembic_version" VALUES('juno');

MySQL:
- Reserved words are in back quotes, e.g. `blob`, `type`, `key`, `group`, `binary`, `limit`
- Table names in INSERT statements are without quotes, e.g. INSERT INTO alembic_version VALUES('juno');

Apart from the above, the following requirements are to be met before running neutron-mysql.sql on MySQL:

The lines containing PRAGMA, 'BEGIN TRANSACTION', 'COMMIT' are to be removed from the file.

 

The CREATE TABLE statements with FOREIGN KEY references are to be rearranged (or ordered) in such a way that a table that is REFERENCED is created earlier than the table REFERRING to it. The indices on tables which are referenced by FOREIGN KEY statements are created soon after those tables are created. These last two requirements are not necessary if the FOREIGN KEY check is disabled; hence foreign_key_checks is set to 0 at the beginning of neutron-mysql.sql and enabled again by setting foreign_key_checks to 1 before the INSERT statements in the neutron-mysql.sql file.

OpenStack: New Oracle University course for Oracle OpenStack!

July 28, 2015 21:10 GMT

A new Oracle University course is now available: OpenStack Administration Using Oracle Solaris (Ed 1). This is a great way to get yourself up to speed on OpenStack, especially if you're thinking about getting a proof of concept, development or test, or even production environments online!

The course is based on OpenStack Juno in Oracle Solaris 11.2 SRU 10.5. Through a series of guided hands-on labs you will learn to:

The course is 3 days long and we recommend that you have taken a previous Oracle Solaris 11 administration course. This is an excellent introduction to OpenStack that you'll not want to miss!

Mike Gerdts: A trip down memory lane

July 28, 2015 15:05 GMT
In Scott Lynn's announcement of Oracle's membership in the Open Container Initiative, he gives a great summary of how Solaris virtualization got to the point it's at.  Quite an interesting read!

July 24, 2015

Peter Tribble: boot2docker on Tribblix

July 24, 2015 22:24 GMT
Containers are the new hype, and Docker is the Poster Child. OK, I've been running containerized workloads on Solaris with zones for over a decade, so some of the ideas behind all this are good; I'm not so sure about the implementation.

The fact that there's a lot of buzz is unmistakeable, though. So being familiar with the technology can't be a bad idea.

I'm running Tribblix, so running Docker natively is just a little tricky. (Although if you actually wanted to do that, then Triton from Joyent is how to do it.)

But there's boot2docker, which allows you to run Docker on a machine - by spinning up a copy of VirtualBox for you and getting that to actually do the work. The next thought is obvious - if you can make that work on MacOS X or Windows, why not on any other OS that also supports VirtualBox?

So, off we go. First port of call is to get VirtualBox installed on Tribblix. It's an SVR4 package, so should be easy enough. Ah, but, it has special-case handling for various Solaris releases that cause it to derail quite badly on illumos.

Turns out that Jim Klimov has a patchset to fix this. It doesn't handle Tribblix (yet), but you can take the same idea - and the same instructions - to fix it here. Unpack the SUNWvbox package from datastream to filesystem format, edit the file SUNWvbox/root/opt/VirtualBox/vboxconfig.sh, replacing the lines

             # S11 without 'pkg'?? Something's wrong... bail.
             errorprint "Solaris $HOST_OS_MAJORVERSION detected without executable $BIN_PKG !? I are confused."
             exit 1

with

         # S11 without 'pkg'?? Likely an illumos variant
         HOST_OS_MINORVERSION="152"

and follow Jim's instructions for updating the pkgmap, then just pkgadd from the filesystem image.
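Roughly, the unpack-edit-install dance looks something like this (the datastream file name is a placeholder, and the pkgmap update from Jim's instructions still has to be applied by hand):

# Unpack the SVR4 datastream into filesystem format (file name is a placeholder)
mkdir /var/tmp/vboxpkg
pkgtrans VirtualBox-*-SunOS.pkg /var/tmp/vboxpkg SUNWvbox
# Edit vboxconfig.sh as shown above, and update SUNWvbox/pkgmap per Jim's notes
vi /var/tmp/vboxpkg/SUNWvbox/root/opt/VirtualBox/vboxconfig.sh
# Install from the unpacked filesystem image
pkgadd -d /var/tmp/vboxpkg SUNWvbox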

Next, the boot2docker cli. I'm assuming you have go installed already - on Tribblix, "zap install go" will do the trick. Then, in a convenient new directory,

env GOPATH=`pwd` go get github.com/boot2docker/boot2docker-cli

That won't quite work as is. There are a couple of patches. The first is to the file src/github.com/boot2docker/boot2docker-cli/virtualbox/hostonlynet.go. Look for the CreateHostonlyNet() function, and replace

    out, err := vbmOut("hostonlyif", "create")
    if err != nil {
        return nil, err
    }


with

    out, err := vbmOut("hostonlyif", "create")
    if err != nil {
               // default to vboxnet0
        return &HostonlyNet{Name: "vboxnet0"}, nil
    }


The point here is that, on a Solaris platform, you always get a hostonly network - that's what vboxnet0 is - so you don't need to create one; in fact the create option doesn't even exist, so it errors out.

The second little patch is that the arguments to SSH don't quite match the SunSSH that comes with illumos, so we need to remove one of the arguments. In the file src/github.com/boot2docker/boot2docker-cli/util.go, look for DefaultSSHArgs and delete the line containing IdentitiesOnly=yes (which is the option SunSSH doesn't recognize).
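If GNU sed happens to be installed (it isn't guaranteed to be), that edit could be made with a one-liner like the following; otherwise just delete the line in an editor:

# Remove the SSH option that SunSSH doesn't recognize (assumes GNU sed)
gsed -i '/IdentitiesOnly=yes/d' \
    src/github.com/boot2docker/boot2docker-cli/util.go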

Then you need to rebuild the project.

env GOPATH=`pwd` go clean github.com/boot2docker/boot2docker-cli
env GOPATH=`pwd` go build github.com/boot2docker/boot2docker-cli

Then you should be able to play around. First, download the base VM image it'll run:

./boot2docker-cli download

Configure VirtualBox

./boot2docker-cli init

Start the VM

./boot2docker-cli up

Log into it

./boot2docker-cli ssh

Once in the VM you can run docker commands (I'm doing it this way at the moment, rather than running a docker client on the host). For example

docker run hello-world

Or,

docker run -d -P --name web nginx
Shut the VM down

./boot2docker-cli down

While this is interesting and reasonably functional - certainly to the level of being useful for testing - a sign of the churn in the current container world is that the boot2docker cli is deprecated in favour of Docker Machine; building that, though, looks to be rather more involved.

Roch Bourbonnais - Scalable Reader/Writer Locks

July 24, 2015 14:42 GMT
ZFS is designed as a highly scalable storage pool kernel module.

Behind that simple idea are a lot of subsystems, internal to ZFS, which are cleverly designed to deliver high performance for the most demanding environments. But as computer systems grow in size and as demand for performance follows that growth, we are bound to hit scalability limits (at some point) that we had not anticipated at first.

ZFS easily scales in capacity by aggregating 100s of hard disks into a single administration domain. From that single pool, 100s or even 1000s of filesystems can be trivially created for a variety of purposes. But then people got crazy (rightly so) and we started to see performance tests running on a single filesystem. That scenario raised an interesting scalability limit for us...something had to be done.

Filesystems are kernel objects that get mounted once at some point (often at boot). Then, they are used over and over again, millions or even billions of times. To simplify: each read/write system call uses the filesystem object for a few milliseconds, and then, days or weeks later, a system administrator wants the filesystem unmounted and that's that. Filesystem modules, ZFS or otherwise, need to manage this dance in which the kernel object representing a mount point is in use for the duration of a system call and so must prevent that object from disappearing. Only when there are no more system calls using a mountpoint can a request to unmount be processed. This is implemented simply using a basic reader/writer lock, rwlock(3C): a read or write system call acquires a read lock on the filesystem object and holds it for the duration of the call, while a umount(2) acquires a write lock on the object.

For many years, individual filesystems from a ZFS pool were protected by a standard Solaris rwlock. And while this could handle 100s of thousands of read/write calls per second through a single filesystem, eventually people wanted more.

Rather than depart from the basic kernel rwlock, the Solaris team decided to tackle the scalability of the rwlock code itself. By taking advantage of visibility into a system's architecture, Solaris is able to use multiple counters in a way that scales with the system's size. A small system can use a simple counter to track readers, while a large system can use multiple counters, each stored on a separate cache line, for better scaling. As a bonus, they were able to deliver this feature without changing the rwlock function signature. For the ZFS code, just a simple rwlock initialization change was needed to open up the benefit of this scalable rwlock.

We also found that, in addition to protecting the filesystem object itself, another structure called a ZAP object used to manage directories was also hitting the rwlock scalability limit and that was changed too.

Since the new locks have been put into action, they have delivered absolutely superb, scalable performance into single filesystems. While the French explorer Jean-Louis Etienne claims that "On ne repousse pas ses limites, on les découvre" ("you don't push your limits, you discover them"), from the comfort of my air-conditioned office I conclude that we are pushing the limits out of harm's way.

July 23, 2015

OpenStack - OpenStack Summit Tokyo - Voting Begins!

July 23, 2015 23:56 GMT

It's voting time! The next OpenStack Summit will be held in Tokyo, October 27-30.

The Oracle OpenStack team have submitted a few papers for the summit that can now be voted for:

If you'd like to see these talks, please Vote Now!

July 22, 2015

Jeff Savit - Availability Best Practices - Example configuring a T5-8

July 22, 2015 21:36 GMT
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly named Logical Domains).
This article continues the series on availability best practices. In this post we will show each step used to configure a T5-8 for availability with redundant network and disk I/O, using multiple service domains.

Overview of T5

The SPARC T5 servers are a powerful addition to the SPARC line. Details on the product can be seen at SPARC T5-8 Server, SPARC T5-8 Server Documentation, The SPARC T5 Servers have landed, and other locations.

For this discussion, the important things to know are:

The following graphic shows T5-8 server resources. This picture labels each chip as a CPU, and shows CPU0 through CPU7 on their respective Processor Modules (PM) and the associated buses. On-board devices are connected to buses on CPU0 and CPU7.

Initial configuration

This demo is done on a lab system with a limited I/O configuration, but enough to show availability practices. Real T5-8 systems would typically have much richer I/O. The system is delivered with a single control domain owning all CPU, I/O and memory resources. Let's view the resources bound to the control domain (the only domain at this time). Wow, that's a lot of CPUs and memory. Some output and whitespace snipped out for brevity.

primary# ldm list -l
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
primary          active     -n-c--  UART    1024  1047296M 0.0%  0.0%  2d 5h 11m

----snip----

CORE
    CID    CPUSET
    0      (0, 1, 2, 3, 4, 5, 6, 7)
    1      (8, 9, 10, 11, 12, 13, 14, 15)
    2      (16, 17, 18, 19, 20, 21, 22, 23)
    3      (24, 25, 26, 27, 28, 29, 30, 31)
----snip----
    124    (992, 993, 994, 995, 996, 997, 998, 999)
    125    (1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007)
    126    (1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015)
    127    (1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023)
VCPU
    VID    PID    CID    UTIL NORM STRAND
    0      0      0      4.7% 0.2%   100%
    1      1      0      1.3% 0.1%   100%
    2      2      0      0.2% 0.0%   100%
    3      3      0      0.1% 0.0%   100%
----snip----
    1020   1020   127    0.0% 0.0%   100%
    1021   1021   127    0.0% 0.0%   100%
    1022   1022   127    0.0% 0.0%   100%
    1023   1023   127    0.0% 0.0%   100%
----snip----
IO
    DEVICE                           PSEUDONYM        OPTIONS
    pci@300                          pci_0           
    pci@340                          pci_1           
    pci@380                          pci_2           
    pci@3c0                          pci_3           
    pci@400                          pci_4           
    pci@440                          pci_5           
    pci@480                          pci_6           
    pci@4c0                          pci_7           
    pci@500                          pci_8           
    pci@540                          pci_9           
    pci@580                          pci_10          
    pci@5c0                          pci_11          
    pci@600                          pci_12          
    pci@640                          pci_13          
    pci@680                          pci_14          
    pci@6c0                          pci_15    
----snip----
Let's also look at the bus device names and pseudonyms:
primary# ldm list -l -o physio primary
NAME             
primary          

IO
    DEVICE                           PSEUDONYM        OPTIONS
    pci@300                          pci_0           
    pci@340                          pci_1           
    pci@380                          pci_2           
    pci@3c0                          pci_3           
    pci@400                          pci_4           
    pci@440                          pci_5           
    pci@480                          pci_6           
    pci@4c0                          pci_7           
    pci@500                          pci_8           
    pci@540                          pci_9           
    pci@580                          pci_10          
    pci@5c0                          pci_11          
    pci@600                          pci_12          
    pci@640                          pci_13          
    pci@680                          pci_14          
    pci@6c0                          pci_15
----snip----          

Basic domain configuration

The following commands are basic configuration steps to define virtual disk, console and network services and resize the control domain. They are shown for completeness but are not specifically about configuring for availability.

primary# ldm add-vds primary-vds0 primary
primary# ldm add-vcc port-range=5000-5100 primary-vcc0 primary
primary# ldm add-vswitch net-dev=net0 primary-vsw0 primary
primary# ldm set-core 2 primary
primary# svcadm enable vntsd
primary# ldm start-reconf primary
primary# ldm set-mem 16g primary
primary# shutdown -y -g0 -i6

This is standard control domain configuration. After reboot, we have a resized control domain, and save the configuration to the service processor.

primary# ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
primary          active     -n-cv-  UART    16    16G      3.3%  2.5%  4m
primary# ldm add-spconfig initial

Determine which buses to reassign

This step follows the same procedure as in the previous article to determine which buses must be kept on the control domain and which can be assigned to an alternate service domain. The official documentation is at Assigning PCIe Buses in the Oracle VM Server for SPARC 3.0 Administration Guide.

First, identify the bus used for the root pool disk (in a production environment this would be mirrored) by getting the device name and then using the mpathadm command.

primary# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: none requested
config:
        NAME                       STATE     READ WRITE CKSUM
        rpool                      ONLINE       0     0     0
          c0t5000CCA01605A11Cd0s0  ONLINE       0     0     0
errors: No known data errors
primary# mpathadm show lu /dev/rdsk/c0t5000CCA01605A11Cd0s0
Logical Unit:  /dev/rdsk/c0t5000CCA01605A11Cd0s2
----snip----        
        Paths:  
                Initiator Port Name:  w508002000145d1b1
----snip----

primary# mpathadm show initiator-port w508002000145d1b1
Initiator Port:  w508002000145d1b1
        Transport Type:  unknown
        OS Device File:  /devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@c/scsi@0/iport@1

That shows that the boot disk is on bus pci@300 (pci_0).

Next, determine which bus is used for network. Interface net0 (based on ixgbe0) is our primary interface and hosts a virtual switch, so we need to keep its bus.

primary# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net1              Ethernet             unknown    0      unknown   ixgbe1
net2              Ethernet             unknown    0      unknown   ixgbe2
net0              Ethernet             up         100    full      ixgbe0
net3              Ethernet             unknown    0      unknown   ixgbe3
net4              Ethernet             up         10     full      usbecm2
primary# ls -l /dev/ix*
lrwxrwxrwx   1 root     root     31 Jun 21 12:04 /dev/ixgbe -> ../devices/pseudo/clone@0:ixgbe
lrwxrwxrwx   1 root     root     65 Jun 21 12:04 /dev/ixgbe0 -> ../devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0:ixgbe0
lrwxrwxrwx   1 root     root     67 Jun 21 12:04 /dev/ixgbe1 -> ../devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0,1:ixgbe1
lrwxrwxrwx   1 root     root     65 Jun 21 12:04 /dev/ixgbe2 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0:ixgbe2
lrwxrwxrwx   1 root     root     67 Jun 21 12:04 /dev/ixgbe3 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0,1:ixgbe3

Both disk and network are on bus pci@300 (pci_0), and there are network devices on pci@6c0 (pci_15) that we can give to an alternate service domain.

Let's determine which buses are needed to give that service domain access to disk. Previously we saw that the control domain's root pool was on c0t5000CCA01605A11Cd0s0 on pci@300. The control domain currently has access to all buses and devices, so we can use the format command to see what other disks are available. There is a second disk, and it's on bus pci@6c0:

primary# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c0t5000CCA01605A11Cd0 <HITACHI-H109060SESUN600G-A244 cyl 64986 alt 2 hd 27 sec 66>
          /scsi_vhci/disk@g5000cca01605a11c
          /dev/chassis/SPARC_T5-8.1239BDC0F9//SYS/SASBP0/HDD0/disk
       1. c0t5000CCA016066100d0 <HITACHI-H109060SESUN600G-A244 cyl 64986 alt 2 hd 27 sec 668>
          /scsi_vhci/disk@g5000cca016066100
          /dev/chassis/SPARC_T5-8.1239BDC0F9//SYS/SASBP1/HDD4/disk
Specify disk (enter its number): ^C
primary# mpathadm show lu /dev/dsk/c0t5000CCA016066100d0s0
Logical Unit:  /dev/rdsk/c0t5000CCA016066100d0s2
----snip----
        Paths:  
                Initiator Port Name:  w508002000145d1b0
                Target Port Name:  w5000cca016066101
----snip----
primary# mpathadm show initiator-port w508002000145d1b0
Initiator Port:  w508002000145d1b0
        Transport Type:  unknown
        OS Device File:  /devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0/iport@1

This provides the information needed to reassign buses.

Define alternate service domain and reassign buses

We now define an alternate service domain, remove the bus identified above from the control domain, and assign it to the alternate domain. Buses cannot be reassigned dynamically (that is, added to or removed from a running domain), so another reboot is required. If I had planned ahead and obtained the bus information earlier, I could have done this when I resized the domain's memory and avoided the second reboot.

primary# ldm add-dom alternate
primary# ldm set-core 2 alternate
primary# ldm set-mem 16g alternate
primary# ldm start-reconf primary
primary# ldm rm-io pci_15 primary
primary# init 6

After rebooting the control domain, I give the unassigned bus pci_15 to the alternate domain. At this point I could install Solaris in the alternate domain using a network install server, but for convenience I use a virtual CD image in a .iso file on the control domain. Normally you do not use virtual I/O devices in the alternate service domain because that introduces a dependency on the control domain, but this is temporary and will be removed after Solaris is installed.

primary# ldm add-io pci_15 alternate
primary# ldm add-vdsdev /export/home/iso/sol-11-sparc.iso s11iso@primary-vds0
primary# ldm add-vdisk s11isodisk s11iso@primary-vds0 alternate
primary# ldm bind alternate
primary# ldm start alternate

At this point, I installed Solaris in the domain. When the install was complete, I removed the Solaris install CD image, and saved the configuration to the service processor:

primary# ldm rm-vdisk s11isodisk alternate
primary# ldm add-spconfig 20130621-split

Note that the network devices on pci@6c0 are enumerated starting at ixgbe0, even though they were ixgbe2 and ixgbe3 on the control domain, which had all four interfaces installed.

alternate# ls -l /dev/ixgb*
lrwxrwxrwx   1 root     root     31 Jun 21 10:34 /dev/ixgbe -> ../devices/pseudo/clone@0:ixgbe
lrwxrwxrwx   1 root     root     65 Jun 21 10:34 /dev/ixgbe0 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0:ixgbe0
lrwxrwxrwx   1 root     root     67 Jun 21 10:34 /dev/ixgbe1 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0,1:ixgbe1

Define redundant services

We've split up the bus configuration and defined an I/O domain that can boot and run independently on its own PCIe bus. All that remains is to define redundant disk and network services to pair with the ones defined above in the control domain:

primary# ldm add-vds alternate-vds0 alternate
primary# ldm add-vsw net-dev=net0 alternate-vsw0 alternate

Note that we could increase resiliency, and potentially performance as well, by using a Solaris 11 network aggregate as the net-dev for each virtual switch. That would provide additional insulation: if a single network device fails the aggregate can continue operation without requiring IPMP failover in the guest.
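As a rough sketch only (link and aggregate names are made up, and the links must not already be in use by a virtual switch or IP interface), the aggregate would be created in the service domain with dladm and then supplied as the virtual switch's net-dev:

# In the service domain: aggregate two physical links (names are examples)
alternate# dladm create-aggr -l net0 -l net1 aggr0
# In the control domain: back the virtual switch with the aggregate
primary# ldm set-vsw net-dev=aggr0 alternate-vsw0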

In this exercise we use a ZFS storage appliance as an NFS server to host guest disk images, so we mount it on both the control and alternate domain, and then create a directory and boot disk for a guest domain. The following two commands are executed in both the primary and alternate domains:

# mkdir /ldoms				 
# mount zfssa:/export/mylab /ldoms  

Those are the only configuration commands run in the alternate domain. All other commands in this exercise are run only from the control domain.

Define a guest domain

A guest domain will be defined with two network devices so it can use IP Multipathing (IPMP) and two virtual disks for a mirrored root pool, each with a path from both the control and alternate domains. This pattern can be repeated as needed for multiple guest domains, as shown in the following graphic with two guests.

primary# ldm add-dom ldg1
primary# ldm set-core 16 ldg1
primary# ldm set-mem 64g ldg1
primary# ldm add-vnet linkprop=phys-state ldg1net0 primary-vsw0 ldg1 
primary# ldm add-vnet linkprop=phys-state ldg1net1 alternate-vsw0 ldg1
primary# ldm add-vdisk s11isodisk s11iso@primary-vds0 ldg1
primary# mkdir /ldoms/ldg1
primary# mkfile -n 20g /ldoms/ldg1/disk0.img
primary# ldm add-vdsdev mpgroup=ldg1group /ldoms/ldg1/disk0.img ldg1disk0@primary-vds0
primary# ldm add-vdsdev mpgroup=ldg1group /ldoms/ldg1/disk0.img ldg1disk0@alternate-vds0
primary# ldm add-vdisk ldg1disk0 ldg1disk0@primary-vds0 ldg1
primary# mkfile -n 20g /ldoms/ldg1/disk1.img
primary# ldm add-vdsdev mpgroup=ldg1group1 /ldoms/ldg1/disk1.img ldg1disk1@primary-vds0
primary# ldm add-vdsdev mpgroup=ldg1group1 /ldoms/ldg1/disk1.img ldg1disk1@alternate-vds0
primary# ldm add-vdisk ldg1disk1 ldg1disk1@alternate-vds0 ldg1
primary# ldm bind ldg1
primary# ldm start ldg1

Note the use of linkprop=phys-state on the virtual network definitions: this indicates that changes in physical link state should be passed to the virtual device so it can perform a failover.

Also note mpgroup on the virtual disk definitions. The ldm add-vdsdev commands define a virtual disk exported by a service domain, and the mpgroup pair indicates they are the same disk (the administrator must ensure they are different paths to the same disk) accessible by multiple paths. A different mpgroup pair is used for each multi-path disk. For each actual disk there are two "add-vdsdev" commands, and one ldm add-vdisk command that adds the multi-path disk to the guest. Each disk can be accessed from either the control domain or the alternate domain, transparent to the guest. This is documented in the Oracle VM Server for SPARC 3.0 Administration Guide at Configuring Virtual Disk Multipathing.

At this point, Solaris is installed in the guest domain without any special procedures. It will have a mirrored ZFS root pool, and each disk is available from both service domains. It also has two network devices, one from each service domain. This provides resiliency for device failure, and in case either the control domain or alternate domain is rebooted.
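If the installer was pointed at only the first disk, the mirror can be attached afterwards from within the guest; a hypothetical example (device names will differ on your system):

ldg1# zpool attach rpool c1d0s0 c1d1s0
ldg1# zpool status rpool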

Configuring and testing redundancy

Multipath disk I/O is transparent to the guest domain. This was tested by serially rebooting the control domain and the alternate domain, and observing that disk I/O simply proceeded without any noticeable effect.

Network redundancy required configuring IP Multipathing (IPMP) in the guest domain. The guest has two network devices, net0 provided by the control domain, and net1 provided by the alternate domain. The process is documented at Configuring IPMP in a Logical Domains Environment.

The following commands are executed in the guest domain to make a redundant network connection:

ldg1# ipadm create-ipmp ipmp0
ldg1# ipadm add-ipmp -i net0 -i net1 ipmp0
ldg1# ipadm create-addr -T static -a 10.134.116.224/24 ipmp0/v4addr1
ldg1# ipadm create-addr -T static -a 10.134.116.225/24 ipmp0/v4addr2
ldg1# ipadm show-if
IFNAME     CLASS    STATE    ACTIVE OVER
lo0        loopback ok       yes    --
net0       ip       ok       yes    --
net1       ip       ok       yes    --
ipmp0      ipmp     ok       yes    net0 net1

This was tested by bouncing the alternate service domain and control domain (one at a time) and noting that network sessions remained intact. The guest domain console displayed messages when one link failed and was restored:

Jul  9 10:35:51 ldg1 in.mpathd[107]: The link has gone down on net1
Jul  9 10:35:51 ldg1 in.mpathd[107]: IP interface failure detected on net1 of group ipmp0
Jul  9 10:37:37 ldg1 in.mpathd[107]: The link has come up on net1

While one of the service domains was down, dladm and ipadm showed link status:

ldg1# ipadm show-if
IFNAME     CLASS    STATE    ACTIVE OVER
lo0        loopback ok       yes    --
net0       ip       ok       yes    --
net1       ip       failed   no     --
ipmp0      ipmp     ok       yes    net0 net1
ldg1# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         0      unknown   vnet0
net1              Ethernet             down       0      unknown   vnet1
ldg1# dladm show-link
LINK                CLASS     MTU    STATE    OVER
net0                phys      1500   up       --
net1                phys      1500   down     --

When the service domain finished rebooting, the "down" status returned to "up". There was no outage at any time.

Summary

This article showed how to configure a T5-8 with an alternate service domain, and define services for redundant I/O access. This was tested by rebooting each service domain one at a time, and observing that guest operation continued without interruption. This is a very powerful Oracle VM Server for SPARC capability for configuring highly available virtualized compute environments.

July 21, 2015

Bryan Cantrill - The foundation of cloud-native computing

July 21, 2015 16:56 GMT

The older I get, the more engineering values matter to me — and the more I seek out shared values in those with whom I endeavor to build things. For us at Joyent, those engineering values reflect that we operate the software we make: we believe that foundational systems must be designed to be robust and high-performing — and when they fail in this regard, it is incumbent upon the system itself to provide the tooling to diagnose the errant behavior. These values are not new (indeed, they are some of the oldest in computing), but there are times when they can feel endangered. It is our belief that the rise of cloud computing has — if anything — made the traditional values of systems software robustness more important. Recently, I’ve had the opportunity to get to know some of the Google engineers involved in the Kubernetes effort, and I have found that they broadly share Joyent’s engineering values — that they too seek to build a robust software substrate, as informed by their (substantial) experience operating systems at scale. Given our shared values, I was particularly pleased to learn of Google’s desire to create a new kind of foundation with their formation of the Cloud-native Computing Foundation. Today, I am excited to announce that Joyent is a charter member of the Cloud-native Computing Foundation, as it represents the values we sought to embody in the Triton stack — and I am honored to have been personally asked to serve on the foundation’s technical steering committee. We believe that we haven’t just joined a(nother) foundation, we have joined with those who share the mission that we have always had for ourselves: to help effect the next revolution in computing.

That I could possibly be so enthusiastic for a foundation merits further explanation, as I have historically been very forthright with my skepticism about foundations with respect to open source: three years ago, in a presentation on Corporate Open Source Anti-patterns (video), I described the insistence of giving newly-opened source code to a foundation as an anti-pattern, noting that giving up ownership also eschews leadership. I further cautioned that many underestimate the complexity and constraints of a 501(c)(3) — while overestimating the need for an explicitly non-profit organization’s involvement in a company’s open source efforts. While these statements about foundations were unequivocal, I also ended that presentation by saying that my observations shouldn’t be perceived as hard rules — and implied that the thinking may change over time as we continue to learn from our own experiences.

Three years after that presentation, I still broadly stand by my claims — but (as my enthusiasm for the Cloud Native Computing Foundation indicates) foundations are one area where my thinking has definitely shifted. In particular, in those rare instances when an open source technology reaches a level of ubiquity such as to sediment into collective bedrock, I believe that it actually does belong in a foundation. How do you know if your open source project is in this category? If multiple companies are betting their future on your open source project, congratulate yourself for laying down the bedrock upon which others are building — and then get it into a foundation to assure its future. This can be hard to internalize (after all, you have almost certainly put more resources into it than anyone else; why should you be expected to simply give that away?!), but the reality is that the commercial pressures that are now being exerted on your (incredibly popular!) technology will rip it apart if you don’t preserve its fate. This can be doubly frustrating when you feel you are acting in the community’s best interests, but as soon as that community includes rival commercial interests, only a foundation can provide the necessary (but not sufficient!) neutrality to assure the community that the technology’s future transcends the fate of any one company. Certainly, we learned all this the hard way with node.js — but the problem is in no way unique to node.js or to Joyent. Indeed, with open source now essentially a constraint on new infrastructure software, we can expect this transition (from corporate-owned open source to foundation-owned open source) will happen with increasing frequency. (Should you find yourself at OSCON this week, this trend and its ramifications is the subject of my talk on Thursday.)

In this regard, the Docker world has been particularly interesting of late: the domain is entirely open source, with many companies (including Joyent!) betting their futures not just on Docker, but on the many other technologies in the ecosystem. With so much bedrock suddenly forming, foundations were practically preordained — so it was no surprise to see the announcement of the Open Container Project at DockerCon just a few weeks ago. We at Joyent applaud these developments (and we are a charter member of the OCP), but I confess that the sprouting of foundations has left me feeling somewhat underwhelmed: are we really to have a foundation for every GitHub repo that reaches a certain level of popularity? To be clear, I don’t object to the foundations in the abstract so much as the cacophony of their putative missions: having the mission of a foundation being merely to promote a particular technology feels like it’s aiming a bit low in Maslow’s hierarchy of needs. Now, one can certainly collect open source software into a foundation like the Apache Foundation — but as we move to a world where an increasing amount of software is open source, what becomes of their mission? Foundations that are amalgamations of otherwise unrelated software seem to me to run the risk of becoming open source orphanages: providing shelter and a modicum of structure, perhaps, but lacking a sense of collective purpose.

The promise of the Cloud-native Computing Foundation is that it offers a potential third model: while the foundation will serve as the new home for Kubernetes, it’s not limited to Kubernetes — nor is it an open source dumping ground. Rather, this foundation is dedicated to a particular ethos: the creation of the new kinds of application and (especially) service stacks that represent modern, server-side computing. That is, it is a foundation with a true mission: to advance key open source technologies that constitute modern, elastic computing. As such, it seeks to transcend any single technology — it has a raison d’être that runs deeper than mere self-preservation. I would like to think that this third path can serve as a model in the new, all-open world: foundations as entities that don’t let their corporate neutrality prevent them from being opinionated as to their mission, their constituent technologies or — importantly — their engineering values!

July 20, 2015

OpenStack - OpenStack and Hadoop

July 20, 2015 23:46 GMT

It's always interesting to see how technologies get tied together in the industry. Orgad Kimchi from the Oracle Solaris ISV engineering group has blogged about the combination of OpenStack and Hadoop. Hadoop is an open source project run by the Apache Foundation that provides distributed storage and compute for large data sets - in essence, the very heart of big data. In this technical How To, Orgad shows how to set up a multi-node Hadoop cluster using OpenStack by creating a pre-configured Unified Archive that can be uploaded to the Glance Image Repository for deployment across VMs created with Nova.

Check out: How to Build a Hadoop 2.6 Cluster Using Oracle OpenStack

OpenStack - Flat networks and fixed IPs with OpenStack Juno

July 20, 2015 03:07 GMT

Girish blogged previously on the work that we've been doing to support new features with the Solaris integrations into the Neutron OpenStack networking project. One of these features provides a flat network topology, allowing administrators to plumb VMs created through an OpenStack environment directly into an existing network infrastructure. This essentially gives administrators a choice between a more secure, dynamic network using either VLAN or VXLAN and a pool of floating IP addresses, or an untagged static or 'flat' network with a set of allocated fixed IP addresses.

Scott Dickson has blogged about flat networks, along with the steps required to set up a flat network with OpenStack, using our driver integration into Neutron based on Elastic Virtual Switch. Check it out!

July 17, 2015

Joerg Moellenkamp - Less known Solaris features: Dump device estimates in Solaris 11.2

July 17, 2015 14:19 GMT
One recurring question from customers is "How large should I size the dump device?". Since Solaris 11.2 there is a really convenient way to get a number.
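The convenient way in question should be the -e option that dumpadm(1M) gained in Solaris 11.2, which estimates the dump size for the currently running system (treat this as an assumption until you've read the full post):

# Estimate the required dump device size on a running Solaris 11.2 system
root# dumpadm -e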
Continue reading "Less known Solaris features: Dump device estimates in Solaris 11.2"

Robert Milkowski - Solaris 11.3 Beta

July 17, 2015 09:48 GMT
Solaris 11.3 Beta is available now. There are many interesting new features and lots of improvements.
Some of them have already been available if you had access to the Solaris Support repository, but if not, now you can play with ZFS persistent L2ARC, which can also hold compressed blocks, or ZFS lz4 compression, or perhaps you fancy the new (to Solaris) OpenBSD Packet Filter, or... see What's New for more details on all the new features.
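For instance, assuming an 11.3 beta system and a placeholder dataset name, trying out lz4 compression should be as simple as:

# Enable lz4 compression on a dataset (dataset name is a placeholder)
root# zfs set compression=lz4 tank/data
root# zfs get compression,compressratio tank/data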

Also see a collection of blog posts with more technical details about the new features, and a new batch of blogs about the update.