December 15, 2014

Joerg Moellenkamp: Registration for Solaris Tech Day on January 13th 2015 in Cologne online

December 15, 2014 13:47 GMT
The registration for the "Oracle Solaris TechDay: Sharing Experiences, Engineering Insights and Outlook" event is now online, so you can register for the event. I think it's a really interesting opportunity to learn about the new stuff in Solaris and where the operating environment is heading.

PS: The headline initially stated "February 13th". This is incorrect. It's January 13th.

December 12, 2014

Joerg Moellenkamp: Agenda for Solaris Tech Day in Cologne on January 13th, 2015

December 12, 2014 20:04 GMT
I still don't have the registration page for the Solaris Tech Day in Cologne, but my colleague Franz Haberhauer has already put the agenda for the event online in the "Solarium Blog". The event takes place in the Maritim Hotel Köln (Heumarkt 20, 50667 Köln). The agenda is as follows:

Time   Theme
09:00  Registration and Coffee
09:45  Welcome & Introduction
       Franz Haberhauer, Chief Technologist
       Markus Flierl, VP Software Development
09:55  OpenStack
       Eric Saxe, Director Software Development
       Joost Pronk van Hoogeven, Senior Principal Product Strategy Manager
11:10  Coffee
11:30  Software Defined Networking
       Jörg Möllenkamp, Senior Account Architect
12:15  Reduce Risk, Deliver Secure Services, and Monitor Compliance with Solaris Security Technologies
       Darren Moffat, Senior Principal Software Engineer
13:00  Lunch
13:50  Solaris 11.2 Server Virtualization
       Duncan Hardie, Principal Product Manager
       Bart Smalders, Senior Principal Software Engineer
14:35  Solaris Data Management – Local and in the Cloud
       Cindy Swearingen, Product Manager
       Thomas Nau, University of Ulm
15:20  Coffee
15:40  Solaris 11.2 Provisioning and SMF – Completing the Vision with Unified Archives and First Boot Services
       Bart Smalders, Senior Principal Software Engineer
       Liane Praza, Senior Principal Software Engineer
16:25  Oracle Solaris Update and Strategy
       Markus Flierl, VP Software Development
17:10  Q&A panel - All presenters and Solaris engineers
17:30  End of Public Event
       Presenters and Engineers Available for Personal Discussions

This should be a really interesting event, so please block January 13th. I will post the link to the registration as soon as it is operational.

December 11, 2014

Darryl Gove: Checking whether hardware supports crypto instructions

December 11, 2014 18:12 GMT

A quick example of how to tell if the machine that you're running on supports crypto instructions.

The 2011 SPARC Architecture manual tells you to read the cfr register before using the instruction. The cfr register contains a bit for every implemented crypto instruction. However, the cfr register is not implemented on all processors. So you would need to check whether this register is implemented before reading it....

So there has to be a better way. Fortunately, Solaris implements a getisax() call which provides this information without the user needing to muck around with the low level details. The following code shows how this call can be used to check whether the AES instruction is implemented or not:

#include <sys/auxv.h>
#include <stdio.h>

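int main(void)
{
  unsigned int array[10];
  /* getisax() fills the array with capability bits and returns the number of words used */
  unsigned int count = getisax(array, 10);
  if (count > 0)
  {
    printf(" AES: ");
    if (array[0] & AV_SPARC_AES) { printf("Yes\n"); } else { printf("No\n"); }
    return 0;
  }
  printf("Error: getisax() call returned no results\n");
  return 1;
}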

December 03, 2014

Joerg Moellenkamp: Event announcement - Solaris Tech Day in Cologne on January 13th, 2015

December 03, 2014 14:51 GMT
On January 13th, 2015 (yeah, it's really that late in the year that we are talking about schedules for next year ... time flies like an arrow, fruit flies like a banana) there will be a Solaris Tech Day in Cologne. A number of colleagues from Solaris Engineering and Solaris Product Management will be in Germany, and the opportunity should be used. For the moment, just reserve January 13th; more information will follow. There is a blog entry with some additional information, in German, in the Solarium Blog.

Joerg Moellenkamp: Roch Bourbonnais about Performance Improvements to ZFS.

December 03, 2014 09:14 GMT
Roch Bourbonnais started a series of blog articles about changes to ZFS in order to improve performance with his article "ZFS Performance boosts since 2010". He has already published the first article of this series; it is about reARC, a major rearchitecture of the subsystem that manages the ZFS in-memory cache along with its interface to the DMU.

December 01, 2014

Joerg Moellenkamp: IPS with CVE numbers

December 01, 2014 09:25 GMT
A few days ago, Darren Moffat wrote an interesting article about the inclusion of CVE numbers in the IPS packages. You can read the article here. I just want to give a short example by citing Darren. For more information, just go to his blog post.

If we simply want to know if the fix for a given CVE-ID is installed, then using 'pkg search -l' with the CVE-ID is sufficient, e.g.:

# pkg search -l CVE-2014-7187
info.cve set CVE-2014-7187 pkg:/support/critical-patch-update/solaris-11-cpu@2014.10-1

Joerg Moellenkamp: Event announcement - Oracle Business Breakfast - "Service Management Facility"

December 01, 2014 08:53 GMT
This event takes place in Germany and will be held in German.

On December 16, 2014, another Business Breakfast takes place in Düsseldorf. Besides the news from Oracle, the topic is an introduction to the Service Management Facility. My colleague Michael Färber will present the former, and I will present the latter. You can register via this link.
The Service Management Facility (SMF) of Solaris, although included since version 10, is for most customers still a field that is rarely entered and often sidestepped by writing an init.d script. However, you lose functionality that way. This breakfast aims to refresh the basics of SMF, explain what is new in SMF, give tips and tricks for working with SMF, and explain some features that are rarely associated with it. It will also answer the question of what the /system/contract mountpoint is about and how the feature behind it can be used outside of SMF.

In particular, I will cover SMF stencils, a new Solaris 11.2 feature that is still unknown to many.

November 28, 2014

Joerg Moellenkamp: Next try ... Event Announcement: Business Breakfast "Erste Praxiserfahrungen mit Solaris 11.2" in Hamburg on December 18, 2014

November 28, 2014 13:37 GMT
Unfortunately, the event on November 6 had to be canceled because I fell ill. So here is the replacement date: on Thursday, December 18, 2014, our Business Breakfast takes place again at the Oracle office in Hamburg. This time the motto of the event is "Erste Praxiserfahrungen mit Solaris 11.2" (first practical experiences with Solaris 11.2). The event starts at 9:30 and ends around 13:30.

In this talk I will report on the following areas:

Registration works a bit differently this time. Please send an email to this mail address. It is a forwarder to the colleague organizing the event, so that his email address does not appear here in the article for spammers to harvest.

November 19, 2014

Darryl Gove: Writing inline templates

November 19, 2014 17:48 GMT

Writing some inline templates today... I've written about doing this kind of stuff in the past here and, in more detail, here.

I happen to need to pass a bundle of parameters on to the routine. The best way of checking how the parameters will be passed is to get the compiler to provide some initial template. Here's an example routine:

int parameters (int p0, int * p1, int * p2, int* p3, int * p4, int * p5, int * p6, int p7)
{
  return p0 + *p1 + *p2 + *p3 + *p4 + ((*p5)<<2) + ((*p6)<<3) + p7*p7;
}

In the routine I've tried to handle some of the parameters differently. I know that the first parameters get passed in registers, and then the later ones get passed on the stack. By handling them differently I can work out which loads from the stack correspond to which variables. The disassembly looks like:

-bash-4.1$ cc -g -O parameters.c -c
-bash-4.1$ dis -F parameters parameters.o
disassembly for parameters.o

    parameters:             ca 02 60 00  ld        [%o1], %g5
    parameters+0x4:         c4 02 e0 00  ld        [%o3], %g2
    parameters+0x8:         c2 02 a0 00  ld        [%o2], %g1
    parameters+0xc:         c6 03 a0 60  ld        [%sp + 0x60], %g3  // load of p7
    parameters+0x10:        88 02 00 05  add       %o0, %g5, %g4
    parameters+0x14:        d0 03 60 00  ld        [%o5], %o0
    parameters+0x18:        ca 03 20 00  ld        [%o4], %g5
    parameters+0x1c:        92 00 80 01  add       %g2, %g1, %o1
    parameters+0x20:        87 38 e0 00  sra       %g3, 0x0, %g3
    parameters+0x24:        82 01 00 09  add       %g4, %o1, %g1
    parameters+0x28:        d2 03 a0 5c  ld        [%sp + 0x5c], %o1 // load of p6
    parameters+0x2c:        88 48 c0 03  mulx      %g3, %g3, %g4     // %g4 = %g3*%g3
    parameters+0x30:        97 2a 20 02  sll       %o0, 0x2, %o3
    parameters+0x34:        94 00 40 05  add       %g1, %g5, %o2
    parameters+0x38:        da 02 60 00  ld        [%o1], %o5       
    parameters+0x3c:        84 02 c0 0a  add       %o3, %o2, %g2
    parameters+0x40:        99 2b 60 03  sll       %o5, 0x3, %o4     // %o4 = %o5<<3
    parameters+0x44:        90 00 80 0c  add       %g2, %o4, %o0
    parameters+0x48:        81 c3 e0 08  retl
    parameters+0x4c:        90 02 00 04  add       %o0, %g4, %o0
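
Once a hand-written template exists, one way to check it is to keep the C routine around as a reference and call it with distinct values, so that a parameter picked up from the wrong register or stack slot changes the result. The following is only a sketch of that idea: the helper name check_parameters() and the input values 1 to 8 are made up for this example.

```c
#include <stdio.h>

/* Reference routine from above: each parameter is handled differently
   so that its register or stack slot can be identified. */
int parameters(int p0, int *p1, int *p2, int *p3, int *p4,
               int *p5, int *p6, int p7)
{
  return p0 + *p1 + *p2 + *p3 + *p4 + ((*p5) << 2) + ((*p6) << 3) + p7 * p7;
}

/* Hypothetical helper: calls the routine with the values 1 to 8. */
int check_parameters(void)
{
  int v1 = 2, v2 = 3, v3 = 4, v4 = 5, v5 = 6, v6 = 7;
  int result = parameters(1, &v1, &v2, &v3, &v4, &v5, &v6, 8);
  printf("parameters() returned %d\n", result);
  return result;
}
```

With these inputs the expected value is 1+2+3+4+5 + (6<<2) + (7<<3) + 8*8 = 159, so a template version that returns anything else has read at least one parameter from the wrong place.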

November 13, 2014

Garrett D'Amore: A better illumos...

November 13, 2014 17:33 GMT
If you follow illumos very closely, you may already know some of this.

A New Fork

Several months ago, I forked illumos-gate (the primary source code repository for the kernel and system components of illumos) into illumos-core.

I had started upstreaming my work from illumos-core into illumos-gate.  I've since ceased that effort, largely because I simply have no time for the various arguments that my work often generates.  I think this is largely because my vision for illumos is somewhat different from that of other folks, and sadly illumos proper lacks anything resembling a guiding vision now, which means that only entirely non-contentious changes can get integrated into illumos.

However, I still want to proceed apace with illumos-core, because I believe that work has real value, and I firmly believe that my vision for illumos is the one that will lead to greater adoption by users, and by distributors as well, since much of what I'm trying to achieve in illumos-gate is aimed at reducing barriers to adoption and to developers both of illumos itself and of systems that want to build on top of or integrate illumos.  (An example of reducing barriers to adoption -- I recently implemented a BSD compatible flock() within libc, which is sometimes used by applications developed for BSD or Linux.)

Relationship to Upstream

I do also invite other parties to cherry-pick from illumos-core into illumos-gate.  I suspect that a large number of the enhancements I've made, such as the support for the fexecve() function specified by POSIX 2008, are likely to be more widely useful.  Within illumos-core, I want to retain a high standard of quality, and facilitate the effort of upstreaming for those who want to make the effort to do so.

I do want to reiterate that unlike other projects that have forked from illumos, it is not my intent to divorce myself from the community -- rather I see this illumos-core as an experimental branch aimed at exploring new directions that I ultimately hope will be embraced by the wider illumos community some day; by doing this in a separate repository/branch/fork, illumos-core can drive towards these goals without getting mired in questions that would prevent progress on these goals within illumos-gate proper.

The focus here is on delivery, rather than on discussion.  (In fact, one of my taglines on social media has for many years been "Code first, questions later."  The illumos-core effort represents a return to that core value.)

Call for Participation

I'm also interested in having co-collaborators on this project.  The goals are large, and while I hope to achieve them someday even if I have to do it all myself, I'm certain that the project will move quite a lot faster with help.  Also, because of our lack of bureaucracy, I hope that illumos-core can be an easier path to integration than illumos-gate.  I just use a simple github pull-request for integration at present.

There is an opportunity for folks at all different technical levels to participate.  We need work that involves systems programming, but also there is work around documentation, research, shell scripting, test development and release engineering to be performed.  I'm happy to mentor folks who want to help out, based on their skill level.

And, of course, for folks who want to focus primarily on improving illumos-gate upstream, there is effort that could be spent to figure out what to cherry-pick and to do the various illumos-gate process wrangling steps to get those bits integrated.

Darryl Gove: Software in Silicon Cloud

November 13, 2014 16:00 GMT

I missed this press release about Software in Silicon Cloud. It's the announcement for a service where you can try out a SPARC M7 processor. There's an accompanying website which has the sign up plus some more information about the service.

What's particularly exciting is that it talks a bit more about Application Data Integrity (ADI). Larry Ellison called this "the most important piece of engineering we’ve done in a long, long time."

Incorrect handling of pointers is a large contributor to bugs in software. ADI tackles this by making the hardware check that the pointer being used is valid for the region of memory it is pointing to. If it's not valid the hardware flags it as an error. Since it's done by hardware, there's minimal performance impact - it's at hardware speed, so developers can check their application in realtime.

There's a nice demo of how ADI protects against exploits like HeartBleed.

November 12, 2014

Darryl Gove: Oracle Solaris Studio playlist

November 12, 2014 16:00 GMT

There's an extensive list of Solaris Studio videos on youtube. In particular there's a bunch of tutorials covering the features of the IDE. The IDE doesn't often get the attention it deserves. It's based off NetBeans, and is full of useful code refactoring tools, navigation tools, etc. To find out more, take a look at some of the videos.

Darryl Gove: New Performance Analyzer Overview screen

November 12, 2014 00:20 GMT

I love using the Performance Analyzer, but the question I often get when I show it to people, is "Where do I start?". So one of the improvements in Solaris Studio 12.4 is an Overview screen to help people get started with the tool. Here's what it looks like:

The reason this is important is that many applications spend time in various places - like waiting on disk, or in user locks - and it's not always obvious where the most effective place to look for performance gains is going to be.

The Overview screen is meant to be the "one-stop" place where people can find out what their application is doing. When we put it back into the product I expected it to be the screen that I glanced at then never went back to. I was most surprised when this turned out not to be the case.

During performance analysis, I'm often exploring different ideas as to where it might be possible to get performance improvements. The Overview screen allows me to select the metrics that I'm interested in, then take a look at the resulting profiles. So I might start with system time, and just enable the system time metrics. Once I'm done with that, I might move on to user time, and select those metrics. So what was surprising about the Overview screen was how often I returned to it to change the metrics I was using.

So what does the screen contain? The overview shows all the available metrics. The bars indicate which metrics contribute the most time. So it's easy to pick (and explore) the metrics that contribute the most time.

If the profile contains performance counter metrics, then those also appear. If the counters include instructions and cycles, then the synthetic CPI/IPC metrics are also available. The Overview screen is really useful for hardware counter metrics.
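
For reference, the two synthetic metrics are just ratios of the cycle and instruction counts. A minimal sketch, with function names chosen for this example rather than taken from the Performance Analyzer:

```c
/* Cycles per instruction; returns 0.0 if no instructions were counted. */
double cpi(unsigned long long cycles, unsigned long long instructions)
{
  return instructions ? (double)cycles / (double)instructions : 0.0;
}

/* Instructions per cycle, the reciprocal of CPI; returns 0.0 if no cycles were counted. */
double ipc(unsigned long long cycles, unsigned long long instructions)
{
  return cycles ? (double)instructions / (double)cycles : 0.0;
}
```

For example, 3000 cycles spent on 1500 instructions gives a CPI of 2.0 and an IPC of 0.5.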

I use performance counters in a couple of ways: to confirm a hypothesis about performance or to estimate time spent on a type of event. For example, if I think a load is taking a lot of time due to TLB misses, then profiling on the TLB miss performance counter will tell me whether that load has a lot of misses or not. Alternatively, if I've got TLB miss counter data, then I can scale this by the cost per TLB miss, and get an estimate of the total runtime lost to TLB misses.
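
The scaling step is simple arithmetic: event count times cost per event, divided by the clock rate. The sketch below assumes the cost per event is known in cycles; the function name and any numbers plugged into it are illustrative, not measured values.

```c
/* Estimated seconds lost to an event:
   events * cycles per event / clock rate in Hz. */
double estimated_seconds_lost(unsigned long long events,
                              double cycles_per_event,
                              double clock_hz)
{
  return (double)events * cycles_per_event / clock_hz;
}
```

As an example under assumed numbers, 2,000,000 TLB misses at 100 cycles each on a 1 GHz clock come to about 0.2 seconds, which can then be compared against the total runtime of the profile.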

Where the Overview screen comes into this is that I will often want to minimise the number of columns of data that are shown (to fit everything onto my monitor), but sometimes I want to quickly enable a counter to see whether that event happens at the bit of code where I'm looking. Hence I end up flipping to the Overview screen and then returning to the code.

So what I thought would be a nice feature, actually became pretty central to my work-flow.

I should have a more detailed paper about the Overview screen up on OTN soon.

November 11, 2014

Darryl Gove: Performance made easy

November 11, 2014 22:47 GMT

The big news of the day is that Oracle Solaris Studio 12.4 is available for download. I'd like to thank all those people who tried out the beta releases and gave us feedback.

There are a number of things that are new in this release. The most obvious one is C++11 support; I've written a bit about the lambda expression support, tuples, and unordered containers.

My favourite tool, the Performance Analyzer, has also had a bit of a facelift. I'll talk about the Overview screen in a separate post (and in an article), but there's some other fantastic features. The syntax highlighting, and hyperlinking, has made navigating profiles much easier. There's been a large number of improvements in filtering - a feature that's been in the product a long time, but these changes elevate it to being much more accessible (an article on filtering is long overdue!). There's also the default hardware counters - which makes it a no-brainer to get hardware counter data, which is really helpful in understanding exactly what an application is doing.

Over the development cycle I've made much use of the other tools. The Thread Analyzer for identifying data races has had some improvements. The Code Analyzer tools have made some great gains in rapidly identifying potential coding errors. And so on....

Anyway, please download the new version, try it out, try out the tools, and let us know what you think of it.

November 06, 2014

Steve Tunstall: New Logzilla Drives for your ZFSSA

November 06, 2014 16:58 GMT

Yes, the new, larger Logzilla SSD drives for your ZFSSA systems are now out. They are 200GB usable, up from the 73GB usable drives. 

Yes, you will sometimes see them referred to in some marketing literature as 400GB. This is because there is extra room in enterprise SSD chips to allow for cell burnout and keep their 5-year lifetime. Make no mistake, they will give you 200GB of actual capacity in the ZFSSA systems.

Yes, they are compatible with the current 73GB version. You can mix and match. The one thing to look out for is in a 'mirrored' log profile. If you mix a new one with an old one in a mirrored log profile, then the new one will size down to 73GB to match it. In a striped profile, it doesn't matter, nor will it matter if you have 2 or more of each.

One last thing-- They are almost twice as fast as the older 73GB version. If you mix them, you will get faster, but not as fast as if you had all 200GB versions. Diminishing returns. Talk to your local SC on whether your Logzilla workload is so great that either adding some new ones or even changing out your old ones would help your performance. Not every workload needs Logzillas, but there are built-in analytics that can tell us if yours is a good fit.


November 05, 2014

Joerg Moellenkamp: Cancelation: Business Breakfast "Erste Praxiserfahrungen mit Solaris 11.2" in Hamburg on November 6, 2014

November 05, 2014 21:51 GMT
The event tomorrow is canceled because of the illness of the presenter (me; I caught a bad cold during my vacation). I will keep you updated about a new date.

November 04, 2014

Darryl Gove: SPARC Software in Silicon

November 04, 2014 17:48 GMT

Short video by Juan Loaiza about the Software in Silicon work in the upcoming SPARC processor.

Bryan Cantrill: SmartDataCenter and Manta are now open source

November 04, 2014 00:16 GMT

Today we are announcing that we are open sourcing the two systems at the heart of our business: SmartDataCenter and the Manta object storage platform. SmartDataCenter is the container-based orchestration software that runs the Joyent public cloud; we have used it for the better half of a decade to run on-the-metal OS containers — securely and at scale. Manta is our multi-tenant ZFS-based object storage platform that provides first-class compute by allowing OS containers to be spun up directly upon objects — effecting arbitrary computation at scale without data movement. The unifying technological foundation beneath both SmartDataCenter and Manta is OS-based virtualization, a technology that Joyent pioneered in the cloud way back in 2006. We have long known the transformative power of OS containers, so it has been both exciting and validating for us to see the rise of Docker and the broadening of appreciation for OS-based virtualization. SmartDataCenter and Manta show that containers aren’t merely a fad or developer plaything but rather a fundamental technological advance that represents the foundation for the next generation of computing — and we believe that open sourcing them advances the adoption of container-based architectures more broadly.

Without any further ado — and to assure that we don’t fall into the most prominent of my own corporate open source anti-patterns — here is the source for SmartDataCenter and the source for Manta. These are sophisticated systems with many moving parts, and you’ll see that these two repositories are in fact meta-repositories that explain the design of each of the systems and then point to the (many) components that comprise them (all now open source, natch). We believe that some of these subcomponents will likely find use entirely outside of SDC and Manta. For example, Manatee is a ZooKeeper-based system that manages Postgres replication and automates failover; Moray is a key-value service that lives on top of Postgres. Taken together, Manatee and Moray implement a highly-available key-value service that we use as the foundation for many other components in SDC and Manta — and one that we think others will find useful as well.

In terms of source code mechanics, you’ll see that many of the components are implemented in either node.js or by extending C-based systems. This is not by fiat but rather by the choices of individual engineers; over the past four years, as we learned about the nuances of node.js error handling and as we invested heavily in tooling for running node.js in production, node.js became the right tool for many of our jobs — and we used it for many of the services that constitute SDC and Manta.

And because any conversation about open source has to address licensing at some point or another, let’s get that out of the way: we opted for the Mozilla Public License 2.0. While relatively new, there is a lot to like about this license: its file-based copyleft allows it to be proprietary-friendly while also forcing certain kinds of derived work to be contributed back; its explicit patent license discourages litigation, offering some measure of troll protection; its explicit warranting of original work obviates the need for a contributor license agreement (we’re not so into CLAs); and (best of all, in my opinion), it has been explicitly designed to co-exist with other open source licenses in larger derived works. Mozilla did terrific work on MPL 2.0, and we hope to see it adopted by other companies that share our thinking around open source!

In terms of the business ramifications, at Joyent we have long been believers in open source as a business model; as the leaders of the node.js and SmartOS projects, we have seen the power of open source to start new conversations, open up new markets and (importantly) yield new customers. Ten years ago, I wrote that open source is “a loss leader — minus the loss, of course”; after a decade of experience with open source business models, I would add that open source also serves as sales outreach without cold calls, as a channel without loss of margin, and as a marketing campaign without advertisements. But while we have directly experienced the business advantages of open source, we at Joyent have also lived something of a dual life: node.js and SmartOS have been open source, but the distributed systems that we have built using these core technologies have remained largely behind our walls. So that these systems are now open source does not change the fundamentals of our business model: if you would like to consume SmartDataCenter or Manta as a service, you can spin up an instance on the public cloud or use our Manta storage service. Similarly, if you want a support contract and/or professional services to run either SmartDataCenter or Manta on-premises, we’ll sell them to you. Based on our past experiences with open source, we do know that there will be one important change: these technologies will find their way into the hands of those that we have no other way of reaching — and that some fraction of these will become customers. Also based on past experience, we know that some (presumably much smaller) fraction of these new technologists will — by merits of their interest in and contributions to these projects — one day join us as engineers at Joyent. Bluntly, open source is our farm system, and broadening our hiring channel during a blazingly hot market for software talent is playing no small role in our decision here. 
In short, this is not an act of altruism: it is a business decision — if a multifaceted one that we believe has benefits beyond the balance sheet.

Welcome to open source SDC and Manta — and long-live the container revolution!

October 23, 2014

Joerg Moellenkamp: Event Announcement: Business Breakfast "Erste Praxiserfahrungen mit Solaris 11.2" in Hamburg on November 6, 2014

October 23, 2014 08:22 GMT
I'm doing a business breakfast at the beginning of November. The event will be held in German.

On Thursday, November 6, 2014, our Business Breakfast takes place again at the Oracle office in Hamburg. This time the motto of the event is "Erste Praxiserfahrungen mit Solaris 11.2" (first practical experiences with Solaris 11.2).

In this talk I will report on the following areas:

Registration works a bit differently this time. Please send an email to this mail address. It is a forwarder to the colleague organizing the event, so that his email address does not appear here in the article for spammers to harvest.

October 18, 2014

Garrett D'Amore: Your language sucks...

October 18, 2014 06:20 GMT
As a result of work I've been doing for illumos, I've recently gotten re-engaged with internationalization, and the support for this in libc and localedef (I am the original author for our localedef.)

I've decided that human languages suck.  Some suck worse than others though, so I thought I'd write up a guide.  You can take this as "your language sucks if...", or perhaps a better view might be "your program sucks if you make assumptions this breaks..."

(Full disclosure, I'm spoiled.  I am a native speaker of English.  English is pretty awesome for data-processing, at least at the written level.  I'm not going to concern myself with questions about deeper issues like grammar, natural language recognition, speech synthesis, or recognition, automatic translation, etc.  Instead this is focused strictly on the most basic display and simple operations like collation (sorting), case conversion, and character classification.)

1. Too many code points. 

Some languages (from Eastern Asia) have way way too many code points.  There are so many that these languages can't actually fit into 16 bits all by themselves.  Yes, I'm saying that there are languages with over 65,000 characters in them!  This explosion means that generating data for languages results in intermediate lookup tables that are megabytes in size.  For Unicode, this impacts all languages.  The intermediate sources for the Unicode supported in illumos blow up to over 2GB when support for the additional code planes is included.
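
To make the 16-bit limit concrete: any code point above U+FFFF, such as U+20000 at the start of the CJK Extension B block, cannot be held in a single 16-bit unit and takes four bytes in UTF-8. Here is a minimal sketch of a UTF-8 encoder that shows this; the function name utf8_encode() is made up for this example and is not the code used in illumos.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
   Returns the number of bytes written to out (1 to 4), or 0 if cp is
   a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
  if (cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return 0;                       /* surrogates are not characters */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {
    /* Supplementary planes: everything that does not fit into 16 bits. */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}
```

Calling utf8_encode(0x20000, buf) returns 4 and fills buf with F0 A0 80 80, while plain ASCII such as 0x41 still takes a single byte.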

2. Your language requires me to write custom code for symbol names. 

Hangul Jamo, I'm looking at you.  Of all the languages in Unicode, only this one is so bizarre that it requires multiple lookup tables to determine the names of the characters, because the characters are made up of smaller bits of phonetic portions (vowels and consonants.)  It even has its own section in the basic conformance document for Unicode (section 3.12).  I don't speak Korean, but I had to learn about Jamo.
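
For the curious, the name derivation described in that section works roughly as follows: the precomposed syllable is split arithmetically into a leading consonant, a vowel, and an optional trailing consonant, and the three short names are concatenated. This is a sketch of that algorithm as I understand it from the Unicode standard, not the localedef code; the function name hangul_syllable_name() is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Jamo short names: 19 leading consonants, 21 vowels, 28 trailing
   consonants (the first trailing entry means "none"). */
static const char *const JAMO_L[19] = {
  "G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
  "SS", "", "J", "JJ", "C", "K", "T", "P", "H"
};
static const char *const JAMO_V[21] = {
  "A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
  "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"
};
static const char *const JAMO_T[28] = {
  "", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
  "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
  "K", "T", "P", "H"
};

/* Writes the character name of a precomposed Hangul syllable into buf.
   Returns 0 on success, -1 if cp is outside U+AC00..U+D7A3 or buf is
   too small. */
int hangul_syllable_name(uint32_t cp, char *buf, size_t buflen)
{
  const uint32_t s_base = 0xAC00, v_count = 21, t_count = 28;
  const uint32_t n_count = v_count * t_count;   /* 588 */
  const uint32_t s_count = 19 * n_count;        /* 11172 syllables */

  if (cp < s_base || cp >= s_base + s_count)
    return -1;

  uint32_t s_index = cp - s_base;
  uint32_t l = s_index / n_count;               /* leading consonant */
  uint32_t v = (s_index % n_count) / t_count;   /* vowel */
  uint32_t t = s_index % t_count;               /* trailing consonant */

  int n = snprintf(buf, buflen, "HANGUL SYLLABLE %s%s%s",
                   JAMO_L[l], JAMO_V[v], JAMO_T[t]);
  return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

U+D4DB, for instance, splits into indices 17, 16, and 15, which yields the name HANGUL SYLLABLE PWILH. Every other named character in Unicode gets by with a flat table lookup.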

3. Your language's character set is continuing to evolve. 

Yes, that's Asia again (mostly China I think).   The rate at which new Asian characters are added rivals that of updates to the timezone database.  The approach your language uses is wrong!

4. Characters in your language are of multiple different cell widths. 

Again, this is mostly, but not exclusively, Asian languages.  Asian languages require 2 cells to display many of their characters.  But, to make matters far far worse, sometimes the number of code points used to represent a character is more than one, which means that the width of a character when displayed may be 0, 1, or 2 cells.   Worse, some languages have both half- and full-width forms for many common symbols.  Argh.

5. The width of the character depends on the context. 

Some widths depend on the encoding because of historical practice (Asia again!), but then you have composite characters as well.  For example, a Jamo vowel sound could in theory be displayed on its own.  But if it follows a leading consonant, then it changes the consonant character and they become a new character (at least to the human viewer).

6. Your language has unstable case conversions.

There are some evil ones here, and thankfully they are rare.  But some languages have case conversions which are not reversible!  Case itself is kind of silly, but this is just insane!  Armenian has a letter with this property, I believe.
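
Here is a small sketch of what "not reversible" means in code. It does not use the Armenian letter mentioned above; instead it hard-codes two Latin-script cases from the Unicode simple case mappings that I am sure of (dotless i, U+0131, and long s, U+017F). The demo_ functions are a toy subset written for this example, not libc's towupper() and towlower().

```c
#include <stdint.h>

/* Toy subset of the Unicode simple uppercase mapping. */
uint32_t demo_toupper(uint32_t cp)
{
  if (cp >= 'a' && cp <= 'z')
    return cp - 'a' + 'A';
  if (cp == 0x0131)       /* dotless i uppercases to plain I */
    return 'I';
  if (cp == 0x017F)       /* long s uppercases to plain S */
    return 'S';
  return cp;
}

/* Toy subset of the Unicode simple lowercase mapping. */
uint32_t demo_tolower(uint32_t cp)
{
  if (cp >= 'A' && cp <= 'Z')
    return cp - 'A' + 'a';
  return cp;
}

/* Returns 1 if lowercasing the uppercase form gives cp back, else 0. */
int case_round_trips(uint32_t cp)
{
  return demo_tolower(demo_toupper(cp)) == cp;
}
```

An ordinary 'a' survives the round trip, but U+0131 comes back as a dotted 'i' and U+017F comes back as 's', so any code that assumes lower(upper(c)) == c quietly changes the text.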

7. Your language's collation order is context-dependent. 

(French, I'm looking at you!)  Some languages have sorting orders that depend not just on the character itself, but on the characters that precede or follow it.  Some of the rules are really hard.  The collation code required to deal with this generally is really really scary looking.

8. Your language has equivalent alternates (ligatures). 

German, your ß character, which stands in for "ss", is a poster child here.  This is a single code point, but for sorting it is equivalent to "ss".  This is just historical decoration, because it's "fancy".  Stop making my programming life hard.

9. Your language can't decide on a script. 

Some languages can be written in more than one script.  For example, Mongolian can be written using Mongolian script or Cyrillic.  But the winner (loser?) here is Serbian, which in some places uses both Latin and Cyrillic characters interchangeably! Pick a script already! I think the people who live like this are just schizophrenic.  (Given all the political nonsense surrounding language in these places, that's no real surprise.)

10. Your language has Titlecase. 

POSIX doesn't do Titlecase.  This happens because your language also uses ligatures instead of just allocating a separate cell and code point for each character.  Most people talk about titlecase used in a phrase or string of words.  But yes, titlecase can apply to a SINGLE CHARACTER.  For example, Dž is just such a character.
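
Here is a small Python check of that single-character titlecase. The digraph has three separate code points, one per case, and the middle one carries the Unicode general category "Lt" (titlecase letter).

```python
import unicodedata

lower = "\u01c6"   # LATIN SMALL LETTER DZ WITH CARON
title = lower.title()
upper = lower.upper()

for label, ch in (("lower", lower), ("title", title), ("upper", upper)):
    print(f"{label}: U+{ord(ch):04X} category {unicodedata.category(ch)}")
```

A two-way islower/isupper model, which is all POSIX offers, has no slot for the middle form.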

11. Your language doesn't use the same display / ordering we expect.

So some languages use right to left, which is backwards, but whatever.   Others, crazy ones (but maybe crazy smart, if you think about it) use back and forth bidirectional.  And still others use vertical ordering.  But the worst of them are those languages (Asia again, dammit!) where the orientation of text can change.  Worse, some cases even rotate individual characters, depending upon context (e.g. titles are rotated 90 degrees and placed on the right edge).  How did you ever figure out how to use a computer with this crazy stuff?
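
Direction is a per-character property in Unicode, which a short Python sketch can show. It only prints the bidirectional class of a few sample characters. It does not implement the reordering algorithm, and it says nothing about vertical layout or rotation.

```python
import unicodedata

samples = {
    "Latin a": "a",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
    "digit 1": "1",
}

for name, ch in samples.items():
    # L = left-to-right, R = right-to-left, AL = Arabic letter, EN = European number
    print(f"{name}: {unicodedata.bidirectional(ch)}")
```

Mixed text such as a Hebrew sentence with an embedded number has to be reordered for display based on these classes, which is where much of the complexity comes from.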

12. Your encoding collides control codes.

We use the first 32 or so character codes to mean special things for terminal control, etc.  If we can't use these, your language is going to suck over certain kinds of communication lines.

13. Your encoding uses conflicting values at ASCII code points.

ASCII is universal.  Why did you fight it?  But that's probably just me being mostly Anglo-centric / bigoted.

14. Your language encoding uses shift characters. 

(Code page, etc.)  Some East Asian languages used this hack in the old days.  Stateful encodings are JUST HORRIBLY BROKEN.   A given sequence of characters should not depend on some state value that was sent a long time earlier.
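
ISO-2022-JP is one example of such a stateful encoding (my choice of example, not necessarily the one meant above), and Python ships a codec for it. The sketch shows that the bytes for two kanji are wrapped in escape sequences, and that the bytes in between are ordinary 7-bit values that mean something else entirely once the shift state is lost.

```python
text = "\u65e5\u672c"                 # two kanji
encoded = text.encode("iso2022_jp")

shift_in = encoded[:3]                # escape sequence switching to JIS X 0208
payload = encoded[3:-3]               # the actual character data
shift_out = encoded[-3:]              # escape sequence switching back to ASCII

print(encoded)
print("payload read without the shift state:", payload.decode("ascii"))
print("payload read with the shift state:   ", encoded.decode("iso2022_jp"))
```

Cut the stream after the first escape sequence, or start reading in the middle, and the same payload bytes silently decode as ASCII punctuation and letters.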

15. Your language encoding uses zero values in the middle of valid characters. 

Thankfully this doesn't happen with modern encodings in common use anymore.  (Or maybe I just have decided that I won't support any encoding system this busted.  Such an encoding is so broken that I just flat out refuse to work with it.)

Non-Broken Languages

So, there are some good examples of languages that are famously not broken.

a. English.  Written English has simple sorting rules, and a very simple character set.  Diphthongs are never ligatures.  This is so useful for data processing that I think it has had a great deal to do with why English is the common language for computer scientists around the world.  US-ASCII -- an English character set -- is the "base" character set for Unicode, and pretty much all other encodings use ASCII encodings in the lower 7 bits.

b. Russian.  (And likely others that use Cyrillic, but not all of them!)  Russian has a very simple alphabet, strictly phonetic.  The number of characters is small, there are no composite characters, and no special sorting rules.  Hmm... I seem to recall that Russia (Soviet era) had a pretty robust computing industry.  And these days Russians mostly own the Internet, right?  Coincidence?  Or maybe they just don't have to waste a lot of time fighting with the language just to get stuff done?

I think there are probably others.  At a glance, Georgian looks pretty straightforward.  I suspect that there are languages using both Cyrillic and Latin character sets that are sane.  Ethiopic actually looks pretty simple and sane too.  (Again, just from a text processing standpoint.)

But sadly, the vast majority of natural languages have written forms & rules that completely and utterly suck for text processing.

October 17, 2014

Jeff SavitOracle VM Server for SPARC Best Practices White Paper

October 17, 2014 23:02 GMT
I'm very pleased to announce a new white paper has been published: Oracle VM Server for SPARC Best Practices.

This paper shows how to configure to meet demanding performance and availability requirements. Topics include:

The paper includes specific recommendations, describes the reasons behind them, and illustrates them with examples taken from actual systems.

October 13, 2014

Garrett D'AmoreMy Problem with Feminism

October 13, 2014 23:03 GMT
I'm going to say some things here that may be controversial.  Certainly that headline is.  But please, bear with me, and read this before you judge too harshly.

As another writer said, 2014 has been a terrible year for women in tech.  (Whether in the industry, or in gaming.)  Arguably, this is not a new thing, but rather events are reaching a head.  Women (some at any rate) are being more vocal, and awareness of women's issues is up.  On the face of it, this should be a good thing.

And yet, we have incredible conflict between women and men.  And this is at the heart of my problem with "Feminism".

The F-Word

Don't get me wrong.  I strongly believe that women should be treated fairly and with respect; in the professional place they should receive the same level of professional respect -- and compensation! -- as their male counterparts can expect.  I believe this passionately -- as a nerd, I prefer to judge people on the merits of their work, rather than on their race, creed, gender, or sexual preference.  A similar principle applies to gaming -- after all, how do you really know the gender of the player on the other side of the MMO?  Does it even matter?  When did gaming become a venue for channeling hate instead of fun?

The problem with "feminism" is that instead of repairing inequality and trying to bring men and women closer together, so much of it seems to be divisive.  The very word itself basically suggests a gender based conflict, and I think this, as well as much of the recent approach, is counterproductive.

Instead of calling attention to inequalities and improper behaviors (let's face it, nobody wants to deal with sexual harassment, discrimination, or some of the very much worse behavior that a few terribly bad actors are guilty of), we've become focused on gender bias and "fixing" gender bias as a goal in and of itself, rather than focusing on fair and equal treatment for all.

Every day I'm inundated with tweets and Facebook postings extolling the terrible plight of women at the expense of men.  Many of these posts seem intended to make me either angry at men, or ashamed of being one.  This basically drives a wedge between people, even unconsciously, to the point that it has become impossible to avoid being a soldier on one side or the other of this war.  And don't get me wrong, it has indeed degenerated to a total war.

I don't think this is what most feminists or their advocates really want.  (Though, I think it is what some of them want.  The side of feminism has its bad actors who thrive on conflict just as much as the other side has.  Extremism is gender and color and religion blind, as we've ample evidence of.)

I think one thing that advocates for women in tech can do, is to pick a different term, and a different way of stating their goals, and perhaps a different approach.  I think we've reached the critical mass necessary for awareness, so the constant tweets about how terrible it is to be a woman are no longer helpful.

I'm not sure what "term" should replace feminism -- in the workplace I'd suggest "professionalism".  After all, everyone wants to be treated professionally, not just women.  (Btw, I'd say that in the gaming community, the value should be "sportsmanship".  Sadly some will see that word as gender biased, but I don't subscribe to the notion that we have to completely change our language in order to be more politically correct.  You know what I mean.)

Likewise, instead of dog piling on someone who doesn't immediately appear to support the feminist agenda (as I'm sure will happen in response to this post), perhaps a little more tolerance and education should be used in the approach.  Focus should, IMO, be on public praise for the parties who are working to make conditions better.

Educate instead of punish.  Make allies instead of enemies.

Salary Gap

The salary gap issue that was raised recently by Microsoft is another case in point.

I don't agree with Satya Nadella's comments saying that women should not ask for raises, but I think many women are nearly as likely to get a raise upon requesting one as a man of similar accomplishments.  (Yes, it would be better if this statement could have been said without "nearly".)   Far too few women feel comfortable asking for a merit based raise in the first place -- that is something that should change. But using race or gender as a bias to demand pay increases is a recipe for further division.  Indeed, men may begin to wonder if women are being compensated unfairly because they are women, but in the reverse direction. 

Likewise, bringing up discrimination in a salary discussion puts the other party on the defensive.  It presumes to imply prior wrong-doing.  This may be the case, but it may well not be.  After all, I've known many men that were under compensated simply because they sold themselves short, or were not comfortable asking for more money.   Why look for a fight when there isn't one?  (I suspect this is what Satya was really trying to get at.)

None of this helps the cause of "professionalism", and probably not the cause of "feminism".

Average tech salary figures are easily obtainable.  If a worker, man or woman, feels under compensated -- for any reason -- then they should take it to their employer and ask for a correction.  But to presume that the reason is gender starts the conversation from a point of conflict.

Far far better is to demand fair pay based on work performance and merit, relative to industry norms as appropriate.   If an employer won't compensate fairly, just leave.  There is no shortage of tech jobs in the industry.  If you're a woman, maybe look for jobs at companies that employ (and successfully retain) women.  Ask the people who work at a prospective employer about conditions, etc.  That's true for minorities too!  Ultimately, an employer who discriminates will find itself at a severe competitive disadvantage, as both the discriminated-against parties and their allies refuse to do business with them.

An employer is not obligated to pay you "more" because of your gender.  But they must also not pay you less because of gender.  And yet every company will generally try to pay as little as they think they can get away with.  So don't let them -- but keep discrimination out of the conversation unless there is really compelling proof of wrongdoing.  (And if there is such evidence, I'd recommend looking elsewhere, and possibly exploring stronger legal measures.)

And yes, I strongly strongly believe that most men feel as I do.  They support the notion that everyone should be treated equally and professionally, and would like to stamp out sexism in the workplace, but many of us are starting to show symptoms of battle fatigue, and even more of us just don't want to be involved in a conflict at all.   Frankly, I think a lot of us are annoyed at feminist attempts to draw us into the conflict, even though we do support many of the stated goals of equal pay, fair treatment, etc. etc.

Closing Thoughts

As for me, I support the plight of women who find themselves discriminated against based on their gender, and I would like to see more women in my industry.  And I've put my money where my mouth is. 

But at the same time, you won't find me supporting "feminism".  I want to heal the rift, and work with awesome people -- and I happen to believe at least half of the awesome people in the world are of a different gender than I am.  Why would I want to alienate them?
I happen to believe that many well meaning people of many causes damage their cause by basically forcing people to deal with their "diversity" first, instead of being able to deal with people as people on their own merit.  It's so much harder to appreciate a person on her own merits, when at least half of what she is saying is that she's unfairly treated because of gender, race, sexual preference, etc.  This is true for everyone.  Show me how you're excellent, and I promise to appreciate you for your awesomeness, and to treat you fairly and with the same respect I would for anyone of my own gender/race/sexual preference.

You are awesome because of your accomplishments/innovations/contributions, not because of your gender or race or sexual preference.

But, if you won't let me look past your race/gender/etc. identity, then please don't be offended if I don't see anything else.  If you want to be treated like a "person", then let me see the person instead of just some classification in an equal opportunity survey.

October 11, 2014

Jeff SavitAvailability Best Practices - Example configuring a T5-8

October 11, 2014 00:05 GMT
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly named Logical Domains).
This article continues the series on availability best practices. In this post we will show each step used to configure a T5-8 for availability with redundant network and disk I/O, using multiple service domains.

Overview of T5

The SPARC T5 servers are a powerful addition to the SPARC line. Details on the product can be seen at SPARC T5-8 Server, SPARC T5-8 Server Documentation, The SPARC T5 Servers have landed, and other locations.

For this discussion, the important things to know are:

The following graphic shows T5-8 server resources. This picture labels each chip as a CPU, and shows CPU0 through CPU7 on their respective Processor Modules (PM) and the associated buses. On-board devices are connected to buses on CPU0 and CPU7.

Initial configuration

This demo is done on a lab system with a limited I/O configuration, but enough to show availability practices. Real T5-8 systems would typically have much richer I/O. The system is delivered with a single control domain owning all CPU, I/O and memory resources. Let's view the resources bound to the control domain (the only domain at this time). Wow, that's a lot of CPUs and memory. Some output and whitespace snipped out for brevity.

primary# ldm list -l
primary          active     -n-c--  UART    1024  1047296M 0.0%  0.0%  2d 5h 11m


    0      (0, 1, 2, 3, 4, 5, 6, 7)
    1      (8, 9, 10, 11, 12, 13, 14, 15)
    2      (16, 17, 18, 19, 20, 21, 22, 23)
    3      (24, 25, 26, 27, 28, 29, 30, 31)
    124    (992, 993, 994, 995, 996, 997, 998, 999)
    125    (1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007)
    126    (1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015)
    127    (1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023)
    0      0      0      4.7% 0.2%   100%
    1      1      0      1.3% 0.1%   100%
    2      2      0      0.2% 0.0%   100%
    3      3      0      0.1% 0.0%   100%
    1020   1020   127    0.0% 0.0%   100%
    1021   1021   127    0.0% 0.0%   100%
    1022   1022   127    0.0% 0.0%   100%
    1023   1023   127    0.0% 0.0%   100%
    DEVICE                           PSEUDONYM        OPTIONS
    pci@300                          pci_0           
    pci@340                          pci_1           
    pci@380                          pci_2           
    pci@3c0                          pci_3           
    pci@400                          pci_4           
    pci@440                          pci_5           
    pci@480                          pci_6           
    pci@4c0                          pci_7           
    pci@500                          pci_8           
    pci@540                          pci_9           
    pci@580                          pci_10          
    pci@5c0                          pci_11          
    pci@600                          pci_12          
    pci@640                          pci_13          
    pci@680                          pci_14          
    pci@6c0                          pci_15    
Let's also look at the bus device names and pseudonyms:
primary# ldm list -l -o physio primary

    DEVICE                           PSEUDONYM        OPTIONS
    pci@300                          pci_0           
    pci@340                          pci_1           
    pci@380                          pci_2           
    pci@3c0                          pci_3           
    pci@400                          pci_4           
    pci@440                          pci_5           
    pci@480                          pci_6           
    pci@4c0                          pci_7           
    pci@500                          pci_8           
    pci@540                          pci_9           
    pci@580                          pci_10          
    pci@5c0                          pci_11          
    pci@600                          pci_12          
    pci@640                          pci_13          
    pci@680                          pci_14          
    pci@6c0                          pci_15

Basic domain configuration

The following commands are basic configuration steps to define virtual disk, console and network services and resize the control domain. They are shown for completeness but are not specifically about configuring for availability.

primary# ldm add-vds primary-vds0 primary
primary# ldm add-vcc port-range=5000-5100 primary-vcc0 primary
primary# ldm add-vswitch net-dev=net0 primary-vsw0 primary
primary# ldm set-core 2 primary
primary# svcadm enable vntsd
primary# ldm start-reconf primary
primary# ldm set-mem 16g primary
primary# shutdown -y -g0 -i6

This is standard control domain configuration. After reboot, we have a resized control domain, and save the configuration to the service processor.

primary# ldm list
primary          active     -n-cv-  UART    16    16G      3.3%  2.5%  4m
primary# ldm add-spconfig initial

Determine which buses to reassign

This step follows the same procedure as in the previous article to determine which buses must be kept on the control domain and which can be assigned to an alternate service domain. The official documentation is at Assigning PCIe Buses in the Oracle VM Server for SPARC 3.0 Administration Guide.

First, identify the bus used for the root pool disk (in a production environment this would be mirrored) by getting the device name and then using the mpathadm command.

primary# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: none requested
        NAME                       STATE     READ WRITE CKSUM
        rpool                      ONLINE       0     0     0
          c0t5000CCA01605A11Cd0s0  ONLINE       0     0     0
errors: No known data errors
primary# mpathadm show lu /dev/rdsk/c0t5000CCA01605A11Cd0s0
Logical Unit:  /dev/rdsk/c0t5000CCA01605A11Cd0s2
                Initiator Port Name:  w508002000145d1b1

primary# mpathadm show initiator-port w508002000145d1b1
Initiator Port:  w508002000145d1b1
        Transport Type:  unknown
        OS Device File:  /devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@c/scsi@0/iport@1

That shows that the boot disk is on bus pci@300 (pci_0).

Next, determine which bus is used for network. Interface net0 (based on ixgbe0) is our primary interface and hosts a virtual switch, so we need to keep its bus.

primary# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net1              Ethernet             unknown    0      unknown   ixgbe1
net2              Ethernet             unknown    0      unknown   ixgbe2
net0              Ethernet             up         100    full      ixgbe0
net3              Ethernet             unknown    0      unknown   ixgbe3
net4              Ethernet             up         10     full      usbecm2
primary# ls -l /dev/ix*
lrwxrwxrwx   1 root     root     31 Jun 21 12:04 /dev/ixgbe -> ../devices/pseudo/clone@0:ixgbe
lrwxrwxrwx   1 root     root     65 Jun 21 12:04 /dev/ixgbe0 -> ../devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0:ixgbe0
lrwxrwxrwx   1 root     root     67 Jun 21 12:04 /dev/ixgbe1 -> ../devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0,1:ixgbe1
lrwxrwxrwx   1 root     root     65 Jun 21 12:04 /dev/ixgbe2 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0:ixgbe2
lrwxrwxrwx   1 root     root     67 Jun 21 12:04 /dev/ixgbe3 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0,1:ixgbe3

Both disk and network are on bus pci@300 (pci_0), and there are network devices on pci@6c0 (pci_15) that we can give to an alternate service domain.

Let's determine which buses are needed to give that service domain access to disk. Previously we saw that the control domain's root pool was on c0t5000CCA01605A11Cd0s0 on pci@300. The control domain currently has access to all buses and devices, so we can use the format command to see what other disks are available. There is a second disk, and it's on bus pci@6c0:

primary# format
Searching for disks...done
       0. c0t5000CCA01605A11Cd0 <HITACHI-H109060SESUN600G-A244 cyl 64986 alt 2 hd 27 sec 66>
       1. c0t5000CCA016066100d0 <HITACHI-H109060SESUN600G-A244 cyl 64986 alt 2 hd 27 sec 668>
Specify disk (enter its number): ^C
primary# mpathadm show lu /dev/dsk/c0t5000CCA016066100d0s0
Logical Unit:  /dev/rdsk/c0t5000CCA016066100d0s2
                Initiator Port Name:  w508002000145d1b0
                Target Port Name:  w5000cca016066101
primary# mpathadm show initiator-port w508002000145d1b0
Initiator Port:  w508002000145d1b0
        Transport Type:  unknown
        OS Device File:  /devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0/iport@1

This provides the information needed to reassign buses.
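
The bus bookkeeping above can be sketched in a few lines of Python. This is only an illustration: the device paths and the (abbreviated) pseudonym listing are copied from the output shown in this post, while the helper names parse_pseudonyms and root_bus are invented here and are not part of any Oracle tool.

```python
import re

# Excerpt of the `ldm list -l -o physio primary` listing shown earlier.
PHYSIO_LISTING = """\
    pci@300                          pci_0
    pci@340                          pci_1
    pci@680                          pci_14
    pci@6c0                          pci_15
"""

def parse_pseudonyms(listing):
    """Map bus device names (pci@300) to pseudonyms (pci_0)."""
    table = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("pci@"):
            table[fields[0]] = fields[1]
    return table

def root_bus(device_path):
    """Return the first pci@... component of a /devices path."""
    match = re.search(r"/(pci@[0-9a-f]+)", device_path)
    if match is None:
        raise ValueError(f"no PCIe bus found in {device_path!r}")
    return match.group(1)

pseudonyms = parse_pseudonyms(PHYSIO_LISTING)

# Devices the control domain depends on: boot disk and net0 (ixgbe0).
control_paths = [
    "/devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@c/scsi@0/iport@1",
    "../devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0:ixgbe0",
]
# Devices intended for the alternate service domain: second disk and ixgbe2.
alternate_paths = [
    "/devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0/iport@1",
    "../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0:ixgbe2",
]

keep = {pseudonyms[root_bus(p)] for p in control_paths}
give = {pseudonyms[root_bus(p)] for p in alternate_paths} - keep

print("keep on control domain:", sorted(keep))
print("can move to alternate: ", sorted(give))
```

Subtracting keep from give guards against handing away a bus that the control domain still needs for its own boot disk or primary network.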

Define alternate service domain and reassign buses

We now define an alternate service domain, remove the above buses from the control domain and assign them to the alternate. Removing the buses cannot be done dynamically (add to or remove from a running domain). If I had planned ahead and obtained bus information earlier, I could have done this when I resized the domain's memory and avoided the second reboot.

primary# ldm add-dom alternate
primary# ldm set-core 2 alternate
primary# ldm set-mem 16g alternate
primary# ldm start-reconf primary
primary# ldm rm-io pci_15 primary
primary# init 6

After rebooting the control domain, I give the unassigned bus pci_15 to the alternate domain. At this point I could install Solaris in the alternate domain using a network install server, but for convenience I use a virtual CD image in a .iso file on the control domain. Normally you do not use virtual I/O devices in the alternate service domain because that introduces a dependency on the control domain, but this is temporary and will be removed after Solaris is installed.

primary# ldm add-io pci_15 alternate
primary# ldm add-vdsdev /export/home/iso/sol-11-sparc.iso s11iso@primary-vds0
primary# ldm add-vdisk s11isodisk s11iso@primary-vds0 alternate
primary# ldm bind alternate
primary# ldm start alternate

At this point, I installed Solaris in the domain. When the install was complete, I removed the Solaris install CD image, and saved the configuration to the service processor:

primary# ldm rm-vdisk s11isodisk alternate
primary# ldm add-spconfig 20130621-split
Note that the network devices on pci@6c0 are enumerated starting at ixgbe0, even though they were ixgbe2 and ixgbe3 when on the control domain that had all 4 installed interfaces.
alternate# ls -l /dev/ixgb*
lrwxrwxrwx   1 root     root     31 Jun 21 10:34 /dev/ixgbe -> ../devices/pseudo/clone@0:ixgbe
lrwxrwxrwx   1 root     root     65 Jun 21 10:34 /dev/ixgbe0 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0:ixgbe0
lrwxrwxrwx   1 root     root     67 Jun 21 10:34 /dev/ixgbe1 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0,1:ixgbe1

Define redundant services

We've split up the bus configuration and defined an I/O domain that can boot and run independently on its own PCIe bus. All that remains is to define redundant disk and network services to pair with the ones defined above in the control domain:

primary# ldm add-vds alternate-vds0 alternate
primary# ldm add-vsw net-dev=net0 alternate-vsw0 alternate

Note that we could increase resiliency, and potentially performance as well, by using a Solaris 11 network aggregate as the net-dev for each virtual switch. That would provide additional insulation: if a single network device fails the aggregate can continue operation without requiring IPMP failover in the guest.

In this exercise we use a ZFS storage appliance as an NFS server to host guest disk images, so we mount it on both the control and alternate domain, and then create a directory and boot disk for a guest domain. The following two commands are executed in both the primary and alternate domains:

# mkdir /ldoms				 
# mount zfssa:/export/mylab /ldoms  
Those are the only configuration commands run in the alternate domain. All other commands in this exercise are only run from the control domain.

Define a guest domain

A guest domain will be defined with two network devices so it can use IP Multipathing (IPMP) and two virtual disks for a mirrored root pool, each with a path from both the control and alternate domains. This pattern can be repeated as needed for multiple guest domains, as shown in the following graphic with two guests.

primary# ldm add-dom ldg1
primary# ldm set-core 16 ldg1
primary# ldm set-mem 64g ldg1
primary# ldm add-vnet linkprop=phys-state ldg1net0 primary-vsw0 ldg1 
primary# ldm add-vnet linkprop=phys-state ldg1net1 alternate-vsw0 ldg1
primary# ldm add-vdisk s11isodisk s11iso@primary-vds0 ldg1
primary# mkdir /ldoms/ldg1
primary# mkfile -n 20g /ldoms/ldg1/disk0.img
primary# ldm add-vdsdev mpgroup=ldg1group /ldoms/ldg1/disk0.img ldg1disk0@primary-vds0
primary# ldm add-vdsdev mpgroup=ldg1group /ldoms/ldg1/disk0.img ldg1disk0@alternate-vds0
primary# ldm add-vdisk ldg1disk0 ldg1disk0@primary-vds0 ldg1
primary# mkfile -n 20g /ldoms/ldg1/disk1.img
primary# ldm add-vdsdev mpgroup=ldg1group1 /ldoms/ldg1/disk1.img ldg1disk1@primary-vds0
primary# ldm add-vdsdev mpgroup=ldg1group1 /ldoms/ldg1/disk1.img ldg1disk1@alternate-vds0
primary# ldm add-vdisk ldg1disk1 ldg1disk1@alternate-vds0 ldg1
primary# ldm bind ldg1
primary# ldm start ldg1

Note the use of linkprop=phys-state on the virtual network definitions: this indicates that changes in physical link state should be passed to the virtual device so it can perform a failover.

Also note mpgroup on the virtual disk definitions. The ldm add-vdsdev commands define a virtual disk exported by a service domain, and the mpgroup pair indicates they are the same disk (the administrator must ensure they are different paths to the same disk) accessible by multiple paths. A different mpgroup pair is used for each multi-path disk. For each actual disk there are two "add-vdsdev" commands, and one ldm add-vdisk command that adds the multi-path disk to the guest. Each disk can be accessed from either the control domain or the alternate domain, transparent to the guest. This is documented in the Oracle VM Server for SPARC 3.0 Administration Guide at Configuring Virtual Disk Multipathing.
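
Since the administrator is responsible for keeping each mpgroup consistent, a small sanity check can help. The following Python sketch parses the four ldm add-vdsdev commands used above and verifies that every mpgroup names exactly one backend file and is exported by two different virtual disk services. The parsing is an assumption based only on the command shape shown in this post, and the function names are hypothetical.

```python
from collections import defaultdict

COMMANDS = [
    "ldm add-vdsdev mpgroup=ldg1group /ldoms/ldg1/disk0.img ldg1disk0@primary-vds0",
    "ldm add-vdsdev mpgroup=ldg1group /ldoms/ldg1/disk0.img ldg1disk0@alternate-vds0",
    "ldm add-vdsdev mpgroup=ldg1group1 /ldoms/ldg1/disk1.img ldg1disk1@primary-vds0",
    "ldm add-vdsdev mpgroup=ldg1group1 /ldoms/ldg1/disk1.img ldg1disk1@alternate-vds0",
]

def parse_vdsdev(command):
    """Split one add-vdsdev command into (mpgroup, backend, service)."""
    args = command.split()[2:]                       # drop "ldm add-vdsdev"
    options = dict(a.split("=", 1) for a in args if "=" in a)
    backend, volume = [a for a in args if "=" not in a]
    _name, service = volume.split("@", 1)
    return options.get("mpgroup"), backend, service

def check_mpgroups(commands):
    """Return a dict of mpgroup -> list of problems (empty list means OK)."""
    backends = defaultdict(set)
    services = defaultdict(set)
    for command in commands:
        group, backend, service = parse_vdsdev(command)
        backends[group].add(backend)
        services[group].add(service)
    problems = {}
    for group in backends:
        found = []
        if len(backends[group]) != 1:
            found.append("paths point at different backends")
        if len(services[group]) < 2:
            found.append("only one service domain exports this disk")
        problems[group] = found
    return problems

for group, found in check_mpgroups(COMMANDS).items():
    print(group, "OK" if not found else found)
```

The check compares path strings only. It cannot tell whether two different paths actually reach the same physical disk, which remains the administrator's job.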

At this point, Solaris is installed in the guest domain without any special procedures. It will have a mirrored ZFS root pool, and each disk is available from both service domains. It also has two network devices, one from each service domain. This provides resiliency for device failure, and in case either the control domain or alternate domain is rebooted.

Configuring and testing redundancy

Multipath disk I/O is transparent to the guest domain. This was tested by serially rebooting the control domain or the alternate domain, and observing that disk I/O operation just proceeded without noticeable effect.

Network redundancy required configuring IP Multipathing (IPMP) in the guest domain. The guest has two network devices, net0 provided by the control domain, and net1 provided by the alternate domain. The process is documented at Configuring IPMP in a Logical Domains Environment.

The following commands are executed in the guest domain to make a redundant network connection:

ldg1# ipadm create-ipmp ipmp0
ldg1# ipadm add-ipmp -i net0 -i net1 ipmp0
ldg1# ipadm create-addr -T static -a ipmp0/v4addr1
ldg1# ipadm create-addr -T static -a ipmp0/v4addr2
ldg1# ipadm show-if
lo0        loopback ok       yes    --
net0       ip       ok       yes    --
net1       ip       ok       yes    --
ipmp0      ipmp     ok       yes    net0 net1

This was tested by bouncing the alternate service domain and control domain (one at a time) and noting that network sessions remained intact. The guest domain console displayed messages when one link failed and was restored:

Jul  9 10:35:51 ldg1 in.mpathd[107]: The link has gone down on net1
Jul  9 10:35:51 ldg1 in.mpathd[107]: IP interface failure detected on net1 of group ipmp0
Jul  9 10:37:37 ldg1 in.mpathd[107]: The link has come up on net1
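
For a rough measure of how long a path was down, the timestamps in those in.mpathd messages can be subtracted. This Python sketch is an illustration only: it matches just the two "link has gone down / come up" message texts shown above, and because syslog lines carry no year, the year parameter is a placeholder assumption that does not affect the result.

```python
import re
from datetime import datetime

LOG = """\
Jul  9 10:35:51 ldg1 in.mpathd[107]: The link has gone down on net1
Jul  9 10:35:51 ldg1 in.mpathd[107]: IP interface failure detected on net1 of group ipmp0
Jul  9 10:37:37 ldg1 in.mpathd[107]: The link has come up on net1
"""

PATTERN = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d) .*The link has (gone down|come up) on (\S+)"
)

def link_outages(log, year=2014):
    """Return {interface: seconds down} for each down/up pair in the log."""
    went_down = {}
    outages = {}
    for line in log.splitlines():
        match = PATTERN.match(line)
        if not match:
            continue
        stamp, event, link = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if event == "gone down":
            went_down[link] = when
        elif link in went_down:
            outages[link] = (when - went_down.pop(link)).total_seconds()
    return outages

print(link_outages(LOG))
```

For the messages above this reports that net1 was down for a little under two minutes, during which traffic continued over net0.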

While one of the service domains was down, dladm and ipadm showed link status:

ldg1# ipadm show-if
lo0        loopback ok       yes    --
net0       ip       ok       yes    --
net1       ip       failed   no     --
ipmp0      ipmp     ok       yes    net0 net1
ldg1# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         0      unknown   vnet0
net1              Ethernet             down       0      unknown   vnet1
ldg1# dladm show-link
LINK                CLASS     MTU    STATE    OVER
net0                phys      1500   up       --
net1                phys      1500   down     --
When the service domain finished rebooting, the "down" status returned to "up". There was no outage at any time.
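
If you want to watch for this condition from a script, the ipadm show-if output shown above is easy to parse. The sketch below is a hypothetical helper, not an Oracle-provided tool, and it assumes the five columns are interface name, class, state, active flag, and underlying interfaces, as in the sample output.

```python
SHOW_IF = """\
lo0        loopback ok       yes    --
net0       ip       ok       yes    --
net1       ip       failed   no     --
ipmp0      ipmp     ok       yes    net0 net1
"""

def unhealthy_interfaces(show_if_output):
    """Return names of interfaces whose state column is not 'ok'."""
    bad = []
    for line in show_if_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "IFNAME":   # skip blanks and a header row
            continue
        name, _ifclass, state = fields[0], fields[1], fields[2]
        if state != "ok":
            bad.append(name)
    return bad

print(unhealthy_interfaces(SHOW_IF))
```

Note that ipmp0 itself stays "ok" while net1 is failed, which matches the observation that the guest saw no outage.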


This article showed how to configure a T5-8 with an alternate service domain, and define services for redundant I/O access. This was tested by rebooting each service domain one at a time, and observing that guest operation continued without interruption. This is a very powerful Oracle VM Server for SPARC capability for configuring highly available virtualized compute environments.

October 10, 2014

Darryl GoveOpenWorld and JavaOne slides available for download

October 10, 2014 23:46 GMT

Thanks everyone who attended my talks last week. My slides for OpenWorld and JavaOne are available for download:

October 09, 2014

Joerg MoellenkampEvent announcement - Solaris Lounge: Why Oracle DB 12c runs best on Oracle Systems

October 09, 2014 15:48 GMT
Next week an interesting event takes place in Vienna on October 16th, 2014: "Solaris Lounge: Why Oracle DB 12c runs best on Oracle Systems". I will give two presentations there: "Why the Oracle Database runs best on SPARC and Solaris" and "LiveDemo: Solaris 11.2 features: Kernel Zones, Unified Archives, SDN, puppet".

Just to cite from the invitation:
This event follows up on the success of the TechDay Vienna event series, this time with emphasis on Oracle Platform advantages for the Oracle Database. We will focus on the practical implementations of the integration between the Database and the Systems layers, discussing the technical background, providing detailed examples as well as live demonstration of the mentioned technologies.

Learn through what methods the right systems and engineering methods can supercharge your environment, find out what unique Oracle Database 12c technologies are available while running Oracle on Oracle, consider virtualization management tools for your IaaS platform and hear customer case studies!
You can view the agenda and the link to register here.

October 08, 2014

Joerg MoellenkampReally interesting week

October 08, 2014 14:20 GMT

October 02, 2014

Garrett D'AmoreSupporting Women in Open Source

October 02, 2014 21:24 GMT
Please have a look at Sage Weil's blog post on supporting the Ada Initiative, which supports women in open source development.

Sage is sponsoring an $8192 matching grant, to support women in open source development of open storage technology.

You may have heard my talk recently, where I expressed that there have been no female contributions to illumos (that includes ZFS by the way!)  This is kind of a tragedy; the intelligence and creativity of at least half the population are simply not represented here, and we are worse for it.

If you want to try to do something about it, heres a small thing.  There's a week remaining to do so, so I encourage folks to step up.  ($3392 has already been granted.)

I'm making a donation myself, if you think supporting more women in open source is a worthwhile cause, please join me!

September 28, 2014

Darryl GoveSPARC Processor Documentation

September 28, 2014 21:57 GMT

I'm pretty excited, we've now got documentation up for the SPARC processors. Take a look at the SPARC T4 supplement, the SPARC T4 performance instrumentation supplement, the SPARC M5 supplement, or the familiar SPARC 2011 Architecture.

September 27, 2014

Joerg Moellenkamp2014/7169 aka ShellShock

September 27, 2014 07:44 GMT
I got quite a number of questions from readers in the last few days regarding ShellShock (also known as CVE-2014-7169 and CVE-2014-6271) and what they could do about it. To answer this I would like to point to the official blog entry "Security Alert CVE-2014-7169 Released", which in turn points to the advisory. To highlight the urgency of this alert I will just cite a single sentence of the advisory:
Due to the severity, public disclosure, and reports of active exploitation of CVE-2014-7169, Oracle strongly recommends that customers apply the fixes provided by this Security Alert as soon as they are released by Oracle.
For any further questions, please contact Oracle Support.

September 24, 2014

Jeff SavitOracle VM Server for SPARC Released

September 24, 2014 01:12 GMT
A new maintenance release to Oracle VM Server for SPARC has been released, providing several enhancements described in the What's New page. This update adds support for private VLANs and relieves virtual I/O scalability constraints. This was already announced in the Virtualization Blog, but the I/O scalability improvement deserves further discussion.

Previous blog entries have described scalability improvements that improve virtual disk and network I/O performance. This new update adds scalability in a different context, by increasing the number of virtual I/O devices a domain can have.

Every virtual I/O device requires a Logical Domain Channel (LDC) endpoint. Previous product versions had a limit of 768 LDCs (or 512 on UltraSPARC T2 systems) per domain (not per system) that constrained growth. This set a maximum number of virtual I/O devices in a domain, which impeded migration of large configurations that might have hundreds of disk devices or network connections. While this could be addressed in a number of ways, such as using physical I/O or consolidating many small LUNs onto fewer large LUNs, it was an impediment to adopting Oracle VM Server for SPARC. It especially affected how service domains could be used, since each service domain has LDC endpoints for each of the virtual devices it provides to guests.

With this new update, and with associated system firmware levels, LDC endpoints are arranged into a large pool which can be shared among domains. As described in Using Logical Domain Channels, each domain can have 1,984 LDC endpoints on SPARC T4, SPARC T5, M5, and M6 systems, out of a pool of 98,304 LDC endpoints in total. The required system firmware to support the LDC endpoint pool is 8.5.1.b for SPARC T4 and 9.2.1.b for SPARC T5, SPARC M5, and SPARC M6.

This more than doubles the number of I/O devices available to a guest domain, and can be implemented by installing the current firmware and moving to the Oracle VM Server for SPARC update.
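
To put the numbers above side by side, here is a minimal sketch of the arithmetic in Python. It uses only the figures quoted in this post; the constant and function names are illustrative, not part of any Oracle VM Server for SPARC tool, and the "domains at full limit" figure is plain division rather than a documented product limit.

```python
# Figures quoted in the post above; the names are illustrative only.
OLD_LIMIT_PER_DOMAIN = 768    # previous per-domain LDC limit (non-T2 systems)
NEW_LIMIT_PER_DOMAIN = 1984   # per-domain limit with the LDC endpoint pool
POOL_SIZE = 98304             # total LDC endpoints in the shared pool


def growth_factor(old_limit, new_limit):
    """How many times larger the new per-domain limit is than the old one."""
    return new_limit / old_limit


def domains_at_full_limit(pool_size, per_domain_limit):
    """Whole number of domains that could each use the full per-domain limit."""
    return pool_size // per_domain_limit


if __name__ == "__main__":
    factor = growth_factor(OLD_LIMIT_PER_DOMAIN, NEW_LIMIT_PER_DOMAIN)
    print(f"per-domain limit grows by a factor of {factor:.2f}")
    print("domains that could all reach the limit:",
          domains_at_full_limit(POOL_SIZE, NEW_LIMIT_PER_DOMAIN))
```

The factor of roughly 2.58 is where "more than doubles" comes from. The pool is large enough that only a configuration with dozens of fully loaded domains would be constrained by the pool rather than by the per-domain limit.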

September 23, 2014

Darryl GoveComparing constant duration profiles

September 23, 2014 18:58 GMT

I was putting together my slides for OpenWorld, and in one of them I'm showing profile data from a server-style workload, i.e. one that keeps running until stopped. In this case the profile can be an arbitrary duration, and it's the work done in that time which is the important metric, not the total amount of time taken.

Profiling for a constant duration is a slightly unusual situation. We normally profile a workload that takes N seconds, do some tuning, and it now takes (N-S) seconds, and we can say that we reduced the runtime by S/N, or 100*S/N percent. This is represented by the left pair of boxes in the following diagram:

In the diagram you can see that the routine B got optimised and therefore the entire runtime, for completing the same amount of work, reduced by an amount corresponding to the performance improvement for B.

Let's run through the same scenario, but instead of profiling for a constant amount of work, we profile for a constant duration. In the diagram this is represented by the outermost pair of boxes.

Both profiles run for the same total amount of time, but the right hand profile has less time spent in routine B() than the left profile. Because the time in B() has reduced, more time is spent in A(). This is natural: I've made some part of the code more efficient, I'm observing for the same amount of time, so I must spend more time in the part of the code that I've not optimised.

So what's the performance gain? In this case we're more likely to look at the gain in throughput. It's a safe assumption that the amount of time in A() corresponds to the amount of work done, i.e. that if we did T units of work, then the average cost per unit of work, A()/T, is the same across the pair of experiments. So if we did T units of work in the first experiment, then in the second experiment we'd do T * A'()/A(), i.e. the throughput increases by S = A'()/A(), where S is the scaling factor (a ratio, not the S seconds saved above). What is interesting about this is that A() represents any measure of time spent in code which was not optimised. So A() could be a single routine or it could be all the routines that are untouched by the optimisation.
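
As a worked example of the two calculations, here is a small Python sketch. The function names and the sample timings (a 60 second profile with 40 seconds in A() before tuning and 50 seconds after) are made up for illustration; they are not measurements from the slides.

```python
def fixed_work_gain(n_seconds, saved_seconds):
    """Fixed amount of work: fraction of runtime saved, S/N."""
    if n_seconds <= 0:
        raise ValueError("original runtime must be positive")
    return saved_seconds / n_seconds


def constant_duration_scaling(time_in_a_before, time_in_a_after):
    """Constant duration: throughput scaling factor A'()/A().

    Both arguments are the time spent in the code that was NOT optimised,
    measured over profiles of the same total duration.
    """
    if time_in_a_before <= 0:
        raise ValueError("time in unoptimised code must be positive")
    return time_in_a_after / time_in_a_before


if __name__ == "__main__":
    # Hypothetical 60 second profiles: A() = 40s, B() = 20s before tuning,
    # A'() = 50s, B'() = 10s after tuning B().
    scaling = constant_duration_scaling(40.0, 50.0)
    work_before = 1000  # T units of work in the first experiment (made up)
    print(f"throughput scaling: {scaling:.2f}")
    print(f"estimated work in second experiment: {work_before * scaling:.0f}")
```

Note that the time in B() never appears in the constant duration calculation. Only the unoptimised time is needed, which is what makes the approach convenient when the optimised code is spread over many routines.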

September 17, 2014

Jeff SavitIf You're Going to San Francisco... Oracle OpenWorld 2014

September 17, 2014 22:29 GMT

Oracle Virtualization at Oracle OpenWorld

There is a rich set of virtualization sessions at Oracle OpenWorld, with presentations by experts, and with customer experience and insight. That starts with the General Session with Wim Coekaerts, Senior VP of Linux and Virtualization Engineering, on his virtualization strategy and roadmap.

I recommend the sessions on Oracle Virtual Compute Appliance (VCA). I've been working with this product for the past year, and will be presenting at one of the following sessions:

First, there's VCA's product roadmap and cloud implementations - 10:15 am Wednesday, Oct. 1st. Then stay in the same room for Customer Insights, followed by Best Practices for Deploying Oracle Software on VCA. (I'll be presenting at this session along with a customer to discuss their experiences). Especially if you are working with partners, see the session Data Center Optimization with VCA by Centroid (VCA partner) and ITC Holdings (the customer) on Thursday, Oct. 2nd at 10:45 am.

All VCA sessions are in the Intercontinental - Grand Ballroom B.

It won't just be about the Oracle Virtual Compute Appliance, of course. There will be plenty of sessions highlighting developments with Oracle VM on x86 and SPARC. I'll also be doing a session, Using Oracle VM VirtualBox as Your Development Platform. So, please, if you're coming to San Francisco for Oracle OpenWorld, be sure to attend these virtualization sessions. Wearing flowers in your hair is completely optional.

September 08, 2014

Garrett D'AmoreModernizing "less"

September 08, 2014 01:31 GMT
I have just spent an all-nighter doing something I didn't expect to do.

I've "modernized" less(1).  (That link is to the changeset.)

First off, let me explain the motivation.  We need a pager for illumos that can meet the requirements for POSIX (IEEE 1003.1-2008) more(1).  We have a suitable pager (barely), in closed source form only, delivered into /usr/xpg4/bin/more.  We have an open source /usr/bin/more, but it is incredibly limited, hearkening back to the days of printed hard copy I think.  (It even has Microsoft copyrights in it!)

So closed source is kind of a no go for me.

less(1) looks attractive.  It's widely used, and has been used to fill in for more(1) to achieve POSIX compliance on other systems (such as Mac OS X).

So I started by unpacking it into our tree, and trying to get it to work with an illumos build system.

That's when I discovered the crazy contortions autoconf was doing that basically wound up leaving it with just legacy BSD termcap.   Ewww.   I wanted it to use X/Open Curses.

When I started trying to do that, I found that there were severe crasher bugs in less, involving the way it uses scratch buffer space.  I started trying to debug just that problem, but pretty soon the effort mushroomed.

Legacy less supports all kinds of crufty and ancient systems.   Systems like MS-DOS (actually many different versions with different compiler options!) and Ultrix and OS/2 and OS9, and OSK, etc.  In fact, it apparently had to support systems where the C preprocessor didn't understand #elif, so the #ifdef maze was truly nightmarish.  The code is K&R style C even.

I decided it was high time to modernize this application for POSIX systems.  So I went ahead and did a sweeping update.  In the process I ditched thousands of lines of code (the screen handling bits in screen.c are less than half as big as they were).

So, now it:

There is more work to do in the future if someone wants to.  Here are the ideas for the future:

If someone wants to pick up any of this work, let me know.  I'm happy to advise.  Oh, and this isn't in illumos proper yet.  It's unclear when, if ever, it will get into illumos -- I expect a lot of grief from people who think I shouldn't have forked this project, and I'm not interested in having  a battle with them.  The upstream has to be a crazy maze because of the platforms it has to support.  We can do better, and I think this was a worthwhile case.  (In any event, I now know quite a lot more about less internals than I did before.  Not that this is a good thing.)

September 07, 2014

Steve TunstallVMWare with the ZFSSA

September 07, 2014 16:17 GMT

So we have been saying how well the ZFSSA works in a VM environment for years. We tested and wrote a white paper on VMWare running on the ZFSSA back at Sun Microsystems, well before being bought by Oracle. People still assume that now that we are Oracle, we must only work with Oracle's own virtual machine products and not with VMWare... I do hope our presence at VMWorld and this blog can help put those fears to rest. The ZFSSA KILLS the VMWare workload and we fully test and support it.

Check this out... 

Oracle Claims ZFS ZS3 Storage boots 16,000 VMs in under 7 mins., outperforms NetApp’s FAS6000

September 05, 2014

Darryl GoveFun with signal handlers

September 05, 2014 15:00 GMT

I recently had a couple of projects where I needed to write some signal handling code. I figured it would be helpful to write up a short article on my experiences.

The article contains two examples. The first is using a timer to write a simple profiler for an application - so you can find out what code is currently being executed. The second is potentially more esoteric - handling illegal instructions. This is probably worth explaining a bit.

When a SPARC processor hits an instruction that it does not understand, it traps. You typically see this if an application has gone off into the weeds and started executing the data segment or something. However, you can use this feature for doing something whenever the processor encounters an illegal instruction. If it's a valid instruction that isn't available on the processor, you could write emulation code. Or you could use it as a kind of break point that you insert into the code. Or you could use it to make up your own instruction set. That bit's left as an exercise for you. The article provides the template of how to do it.
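
To give a flavour of the first example, here is a minimal sketch of a timer-driven sampling profiler. It is not the code from the article: it is written in Python rather than C, it assumes a Unix-like system, and the names record_sample, busy_work and profile are made up for illustration. The idea is the same though: a profiling timer delivers SIGPROF at a fixed interval of CPU time, and the handler notes what was executing when the signal arrived.

```python
import collections
import signal

# Number of samples seen per function name.
samples = collections.Counter()


def record_sample(signum, frame):
    """SIGPROF handler: note which function was running when the timer fired."""
    if frame is not None:
        samples[frame.f_code.co_name] += 1


def busy_work(iterations):
    """Stand-in workload that burns CPU time."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def profile(func, *args, interval=0.005):
    """Run func(*args) while sampling it every `interval` seconds of CPU time."""
    samples.clear()
    previous = signal.signal(signal.SIGPROF, record_sample)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        return func(*args)
    finally:
        # Stop the timer first, then put the old handler back.
        signal.setitimer(signal.ITIMER_PROF, 0, 0)
        signal.signal(signal.SIGPROF, previous)


if __name__ == "__main__":
    profile(busy_work, 5_000_000)
    for name, count in samples.most_common():
        print(f"{name}: {count} samples")
```

Python only runs signal handlers in the main thread and between bytecode instructions, so this is much coarser than a C handler inspecting the interrupted program counter. It is still enough to show where the time goes.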