January 23, 2015

Steve Tunstall: ZFS Improvements

January 23, 2015 14:52 GMT

This is a wonderful article by Roch (rhymes with Spock) that I thought you all might find interesting: Roch's top ten list of the ways ZFS performance has improved.


January 13, 2015

Darryl Gove: Missing semi-colon

January 13, 2015 20:52 GMT

Thought I'd highlight this error message:

class foo
{
  foo();
}

foo::foo()
{ }
$ CC -c c.cpp
"c.cpp", line 6: Error: A constructor may not have a return type specification.
1 Error(s) detected.

The problem is that the class definition is not terminated with a semi-colon. It should be:

class foo
{
  foo();
};  // Semi-colon

foo::foo()
{ }

January 07, 2015

Darryl Gove: Behaviour of std::list::splice in the 2003 and 2011 C++ standards

January 07, 2015 17:17 GMT

There's an interesting corner case in the behaviour of std::list::splice. In the C++98/C++03 standards it is defined such that iterators referring to the spliced element(s) are invalidated. This behaviour changes in the C++11 standard, where iterators remain valid.

The text of the 2003 standard (list operations section, p2, p7, p12) describes the splice operation as "destructively" moving elements from one list to another. If one list is spliced into another, then all iterators and references to that list become invalid. If an element is spliced into a list, then any iterators and references to that element become invalid; similarly, if a range of elements is spliced, then iterators and references to those elements become invalid.

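This is changed in the 2011 standard (list operations section, p2, p4, p7, p12). There, the operation is still described as destructive, but all iterators and references to the spliced element(s) remain valid. They now refer to those elements as members of the destination list rather than the source list.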

The following code demonstrates the problem:

#include <list>
#include <iostream>

int main()
{
  std::list<int> list;
  std::list<int>::iterator i;

  list.push_back(5);
  list.push_back(10);
  list.push_back(3);
  i = list.end();
  list.insert(i,4); // i points to end
  // list contains 5 10 3 4
  i--; // i points to 4
  i--; // i points to 3
  i--; // i points to 10

  std::cout << " List contains: ";
  for (std::list<int>::iterator l=list.begin(); l!=list.end(); l++)
    std::cout << " >" << *l << "< ";
  std::cout << "\n element at i = " << *i << "\n";

  std::list<int>::iterator element;
  element = list.begin();
  element++; // points to 10
  element++; // points to 3
  std::cout << " element at element = " << *element << "\n";

  list.splice(i,list,element); // Swap 10 and 3

  std::cout << " List contains :";
  for (std::list<int>::iterator l=list.begin(); l!=list.end(); l++)
    std::cout << " >" << *l << "< ";

  std::cout << "\n element at element = " << *element << '\n';
  element++; // C++03: access to invalid iterator (undefined behaviour)
  std::cout << " element at element = " << *element << '\n';
  return 0;
}

When compiled under the 2011 standard, the code is expected to work and produce output like:

 List contains:  >5<  >10<  >3<  >4<
 element at i = 10
 element at element = 3
 List contains : >5<  >3<  >10<  >4<
 element at element = 3
 element at element = 10

However, when compiled under the 2003 standard, the behaviour is undefined. It might work if the iterator happens to remain valid, but it could also fail:

 List contains:  >5<  >10<  >3<  >4<
 element at i = 10
 element at element = 3
 List contains : >5<  >3<  >10<  >4<
 element at element = 3
 element at element = Segmentation Fault (core dumped)

January 06, 2015

Bryan Cantrill: Predicteria 2015

January 06, 2015 22:14 GMT

Fifteen years ago, I initiated a time-honored tradition among my colleagues in kernel development at Sun: shortly after the first of every year, we would get together at our favorite local restaurant to form predictions for the coming year. We made one-year, three-year and six-year predictions for both our technologies and more broadly for the industry. We did this for nine years running — from 2000 to 2008 inclusive — and came to know the annual ritual as a play on the restaurant name: Predicteria.

I have always been interested in our past notions of the future (hoverboards and self-lacing shoes FTW!), and looking back now at nearly a decade of our predictions has led me to an inescapable (and perhaps obvious) conclusion: predictions tell you more about the present than the future. That is, predictions reflect the zeitgeist of the day, both in substance and in tone: in good years, people predict halcyon days; in bad ones, the apocalypse. And when a particular company or technology happened to be in the news or otherwise on the collective mind, predictions tended to be centered around it: it was often the case that several people would predict that a certain company would be acquired or that a certain technology would flourish, or perish. (Let the record reflect that the demise of Itanium was accurately predicted many times over.)

Which is not to say that we never made gutsy predictions; in 2006, a colleague made a one-year prediction that “GOOG embarrassed by revelation of unauthorized US government spying at Gmail.” The timing may have been off, but the concern was disturbingly prescient. Sometimes the predictions were right, but for the wrong reasons: in 2003, one of my three-year predictions was that “Apple develops new ‘must-have’ gadget called the iPhone, a digital camera/MP3 player/cell phone.” This turned out to be stunningly accurate, even down to the timing (and it was by far my most accurate big prediction over the years), but if you can’t tell by the snide tone, I thought that such a thing would be Glass-like in its ludicrousness; I had not an inkling as to its transformative power. (And indeed, when the iPhone did in fact emerge a few years later, several at Predicteria predicted that it would be a disappointing flop.)

But accurate predictions were the exception, not the rule; our predictions were usually wrong — often wildly so. Evergreen wildly wrong predictions included: the rise of carbon nanotube-based memory, the relevance of quantum computing, and the death of tape, disk or volatile DRAM (each predicted several times over). We were also wrong by our omissions: as a group, we entirely failed to predict cloud computing — or even the rise of hardware-based virtualization.

I give all of this as a backdrop to some predictions for the coming year. If my experience taught me anything, it’s that these predictions may very well be right on trajectory, but wrong on timing — and that they may well capture current thinking more than they meaningfully predict the future. They also may be (rightfully) criticized for, as they say, talking our book — but we have made our bets based on where we think things are going, not vice versa. And finally, I apologize that these are somewhat milquetoast predictions; I’m afraid that practical concerns muffle the gutsy predictions that name names and boldly predict their fates!

Without further ado, looking forward to 2015:

Right or wrong, these predictions point to an exciting 2015. And if nothing else, you can rely on me for a candid self-assessment of my predictions; you’ll just need to wait fifteen years or so!

Darryl Gove: New articles about Solaris Studio

January 06, 2015 21:26 GMT

We've started posting new articles directly into the communities section of the Oracle website. If you're not familiar with this location, it's also where you can post questions on languages or tools.

With the change, it should be easier to find articles relevant to developers, and it should be easy to comment on them. So hopefully this works out well. There are currently three articles listed on the content page. I've already posted about the article on the Performance Analyzer Overview screen, so I'll quickly highlight the other two:

Darryl Gove: The Performance Analyzer Overview screen

January 06, 2015 21:12 GMT

A while back I promised a more complete article about the Performance Analyzer Overview screen introduced in Studio 12.4. Well, here it is!

Just to recap, the Overview screen displays a summary of the contents of the experiment. This lets you pick the appropriate metrics to display, so you can quickly find where the time is being spent and then use the rest of the tool to drill down into what is taking the time.

January 03, 2015

Bryan Cantrill: 2014 in review: Docker rising

January 03, 2015 00:03 GMT

When looking back on 2014 from an infrastructure perspective, it’s hard not to have one word on the lips: Docker. (Or, as we are wont to do in Silicon Valley when a technology is particularly hot, have the same word on the lips three times over à la Gabbo: “Docker, Docker, DOCKER!”) While Docker has existed since 2013, 2014 was indisputably the year in which it transcended from an interesting project to a transformative technology — a shift which had profound ramifications for us at Joyent.

The enthusiasm for Docker has been invigorating: it validates Joyent’s core hypothesis that OS-based virtualization is the infrastructure substrate of the future. That said, going into 2014, there was also a clear impedance mismatch: while Docker was refreshingly open to being cross-platform, the reality is that it was being deployed exclusively on Linux, and that the budding encyclopedia of Docker images was exclusively Linux-based. Our operating system, SmartOS, is an illumos derivative that in many ways is similar to Linux (they’re both essentially Unix, after all), but it’s also different enough to be an impediment. So the arrival of Docker in 2013 left us headed into 2014 with a kind of dilemma: how can we enable Docker on our proven SmartOS-based substrate for OS containers while still allowing existing Linux-based images to function?

Into this quandary came a happy accident: David Mackay, an illumos community member, revived lx branded zones, work that had been explored some number of years ago to execute complete Linux binary environments in an illumos zone. This work was so old that, frankly, we didn’t feel it was likely to be salvageable — but we were pleasantly surprised when it seemed to still function for some modern binaries. (If it needs to be said, this is yet another example of why we so fervently believe in open source: it allows for others to explore ideas that may seem too radical for commercial entities with mortgages to pay and mouths to feed.)

Energized by the community, Joyent engineer Jerry Jelinek went to work in the spring, bolstering the emulation layer and getting it to work with progressively more and more modern Linux systems. By late summer, 32-bit was working remarkably well on Ubuntu 14.04 (an odyssey that I detailed in my illumos day Surge presentation) and we were ready to make an attempt at the summit: 64-bit Linux emulation. Like much bringup work, the 64-bit work was excruciating because it was very hard to forecast: you can be one bug away from a functioning system or a hundred — and the only way to really know is to grind through them all. Fortunately, we are nothing if not persistent, and by late fall we had 64-bit working on most stuff — and thanks to early adopter community members like Jorge Schrauwen, we were able to quickly find increasingly obscure software to validate it against. (Notes to self: (1) “Cabal hell” is a thing and (2) I bet HHVM is unaware of the implicit dependency they have on Linux address space layout.)

With the LX branded zone work looking very promising, Joyent engineer Josh Wilsdon led a team studying Docker to determine the best way to implement it on SmartOS for our orchestration software, SmartDataCenter. In doing this, we learned about a great Docker strength: its remote API. This API allows us to do exactly what robust APIs have allowed us to do for time immemorial: replace one implementation with a different one without breaking upstack software. Implementing a Docker API endpoint would also allow for a datacenter-wide Docker view that would solve many other problems for us as well; in late autumn, we set out building sdc-docker, a Docker engine for SDC that we have been developing in the open. As with the LX branded zone work, we are far enough along to validate the approach: we know that we can make this work.

In parallel to these two bodies of work, a third group of Joyent engineers led by Robert Mustacchi was tackling a long-standing problem: extending the infrastructure present in SmartOS for robust (and secure!) network virtualization for OS containers to the formation of virtual layer two networks that can span an entire datacenter (that is, finally breaking the shackles of .1q VLANs). We have wanted to do this for quite some time, but the rise of Docker has given this work a new urgency: of the Linux problems with respect to OS-based containers, network virtualization is clearly among the most acute — and we have heard over and over again that it has become an impediment to Docker in production. Robert and team have made great progress and by the end of 2014 had the first signs of life from the SDC integration point for this work.

The SmartDataCenter-based aspects of our Docker and network virtualization work embody an important point of distinction: while OpenStack has been accused of being “a software particle-board designed by committee”, SDC has been deliberately engineered based on our experience actually running a public cloud at scale. That said, OpenStack has had one (and arguably, only one) historic advantage: it is open source. While the components of SDC (namely, SmartOS and node.js) have been open, SDC itself was not. The rise of Docker — and the clear need for an open, container-based stack instead of some committee-designed VMware retread — allowed us to summon the organizational will to take an essential leap: on November 6th, we open sourced SDC and Manta.

Speaking of Manta: with respect to containers, Joyent has been living in the future (which, in case it sounds awesome, is actually very difficult; being ahead of the vanguard is a decidedly mixed blessing). If the broader world is finally understanding the merits of OS-based virtualization with respect to standing compute, it still hasn’t figured out that it has profound ramifications for scale-out storage. However, with the rise of Docker in 2014, we have more confidence than ever that this understanding will come in due time — and by open sourcing Manta we hope to accelerate it. (And certainly, you can imagine that we’ll help connect the dots by allowing Manta jobs to be phrased as Docker containers in 2015.)

Add it all up — the enthusiasm for Docker, the great progress of the LX-branded zone work, the Docker engine for SDC, the first-class network virtualization that we’re building into the system — and then give it the kicker of an entirely open source SmartDataCenter and Manta, and you can see that it’s been a hell of a 2014 for us. Indeed, it’s been a hell of a 2014 for the entire Docker community, and we believe that Matt Asay got it exactly right when he wrote that “Docker, hot as it was in 2014, will be even hotter in 2015.”

So here’s to a hot 2014 — and even hotter 2015!

December 27, 2014

Adam Leventhal: DTrace OEL Dynamic Language Support

December 27, 2014 20:27 GMT

We built DTrace to solve problems; at the start, the problems we understood best were our own. In the Solaris Kernel Group we started by instrumenting the kernel and system calls, the user/kernel boundary. Early use required detailed knowledge of kernel internals. As DTrace use grew—within the team, in Sun and then beyond—we extended DTrace to turn every function and every instruction in user programs into probes. We added stable points of instrumentation both in the kernel and in user-land so that no deep knowledge of program or kernel internals would be required.

Oracle has been evolving their port of DTrace to OEL, prioritizing the stable points of instrumentation most relevant for the widest group of users. While DTrace started with providers that unlocked tens of thousands of points of instrumentation, the Oracle port enables a small number of comprehensible probes. Since I last tried out their port, they’ve fixed some bugs and added support for stable I/O and process probes, as well as user-land static probes.

[root@screven ~]# uname -a
Linux screven 3.8.13-16.el6uek.x86_64 #1 SMP Fri Sep 20 11:54:42 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@screven ~]# cat test.d
provider test {
        probe foo(int);
};
[root@screven ~]# cat main.c
#include "test.h"

int
main(int argc, char **argv)
{
        TEST_FOO(100);

        return (0);
}
[root@screven ~]# dtrace -h -s test.d
[root@screven ~]# gcc -c main.c
[root@screven ~]# dtrace -G -s test.d main.o
[root@screven ~]# gcc -o main main.o test.o
[root@screven ~]# dtrace -c ./main -n 'test$target:::foo{ trace(arg0); }'
dtrace: description 'test$target:::foo' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    643                         main:foo               100

USDT, as it’s called, was a relatively late addition in the initial development of DTrace. We added it initially to support probes in user-land locking primitives (the plockstat(1M) command uses it just as the lockstat(1M) command was converted to use kernel SDT probes). We were right in thinking that USDT would be useful for providing probes in infrastructure software such as Apache and MySQL; we didn’t anticipate how incredibly valuable it would be for supporting dynamic languages such as JavaScript (including Node), Python, Java, and bash.

USDT built on both the learning and code from years of DTrace development. By effectively starting there, OEL benefits from a decade of integrations and investigations. DTrace users on all platforms will benefit from the growth of our community. I look forward to seeing the new investigations on OEL and new integrations in all types of applications.

December 24, 2014

Adam Leventhal: Delphix Week of Giving

December 24, 2014 22:09 GMT

In the frenzied, insular world of a Silicon Valley startup it can be easy to lose perspective on the broader community in which we live and work. Among the great hackathon projects to come from our bi-annual engineering event was the idea of “Angel Sharks”, a group of volunteers at Delphix who provide opportunities for volunteering and community giving. Earlier this year, this group organized volunteer events around the launch of new Delphix releases.

We just completed our first “Week of Giving”. While many at Delphix already donate their time and money, the Angel Sharks organized giving and corporate matching. Our theme for 2014 was hunger; we focused on the SF-Marin Food Bank as our featured organization.

Over 50% of Delphix employees participated worldwide; a high bar that I’d like to see us exceed next year. Some activities of note were volunteering at food banks in the SF Bay Area, Atlanta and Boulder, toy donations to Toys for Tots, the Salvation Army Giving Tree, and the Starlight Foundation, and a silent auction that both brought the Delphix community closer together and raised over $3,000 for the SF-Marin Food Bank. More than $21,000 was raised in total with 30% of employees making matching requests in just three weeks! The Week of Giving brought a great energy and community spirit to the company; I’m excited to have giving as part of our DNA as a young company.

The SF-Marin Food Bank feeds 225,000 people annually with 47m lbs of food, and 96% of donations go directly to their programs. Donations are down for the year while need has increased by 1m lbs. You can donate here. I volunteered twice this year with my Delphix colleagues, and once with my wife and son (8 years old); I highly recommend it for both corporate and family outings.

Happy holidays from the Delphix family!


December 22, 2014

Marcelo Leal: Packt $5 eBook Bonanza!

December 22, 2014 14:16 GMT

Hi there! Long time, no see… but I’m back for a good reason! Packt Publishing has released its SUPER PROMO: the $5 E-Book Bonanza! This means that you can buy many titles on many topics for just $5! It’s a festive campaign that runs from 18th December to 6th January. Don’t...

December 15, 2014

Joerg Moellenkamp: Registration for Solaris Tech Day on January 13th 2015 in Cologne online

December 15, 2014 13:47 GMT
Registration for the "Oracle Solaris TechDay: Sharing Experiences, Engineering Insights and Outlook" event is now online, so you can register now. I think it's a really interesting opportunity to learn about what's new in Solaris and where the operating environment is heading.

PS: The headline initially stated "February 13th". This is incorrect. It's January 13th.

December 12, 2014

Joerg Moellenkamp: Agenda for Solaris Tech Day in Cologne on January 13th, 2015

December 12, 2014 20:04 GMT
I still don't have the registration page for the Solaris Tech Day in Cologne, but my colleague Franz Haberhauer has already put the agenda online in the "Solarium Blog". The event takes place in the Maritim Hotel Köln (Heumarkt 20, 50667 Köln). The agenda is as follows:

Time   Topic
09:00  Registration and Coffee
09:45  Welcome & Introduction
       Franz Haberhauer, Chief Technologist
       Markus Flierl, VP Software Development
09:55  OpenStack
       Eric Saxe, Director Software Development
       Joost Pronk van Hoogeveen, Senior Principal Product Strategy Manager
11:10  Coffee
11:30  Software Defined Networking
       Jörg Möllenkamp, Senior Account Architect
12:15  Reduce Risk, Deliver Secure Services, and Monitor Compliance with Solaris Security Technologies
       Darren Moffat, Senior Principal Software Engineer
13:00  Lunch
13:50  Solaris 11.2 Server Virtualization
       Duncan Hardie, Principal Product Manager
       Bart Smaalders, Senior Principal Software Engineer
14:35  Solaris Data Management – Local and in the Cloud
       Cindy Swearingen, Product Manager
       Thomas Nau, University of Ulm
15:20  Coffee
15:40  Solaris 11.2 Provisioning and SMF – Completing the Vision with Unified Archives and First Boot Services
       Bart Smaalders, Senior Principal Software Engineer
       Liane Praza, Senior Principal Software Engineer
16:25  Oracle Solaris Update and Strategy
       Markus Flierl, VP Software Development
17:10  Q&A panel - All presenters and Solaris engineers
17:30  End of Public Event
       Presenters and Engineers Available for Personal Discussions

This should be a really interesting event, so please block January 13th. I will post the link to the registration as soon as it is operational.

December 11, 2014

Darryl Gove: Checking whether hardware supports crypto instructions

December 11, 2014 18:12 GMT

A quick example of how to tell if the machine that you're running on supports crypto instructions.

The 2011 SPARC Architecture manual tells you to read the cfr register before using the instruction. The cfr register contains a bit for every implemented crypto instruction. However, the cfr register is not implemented on all processors. So you would need to check whether this register is implemented before reading it....

So there has to be a better way. Fortunately, Solaris implements a getisax() call which provides this information without the user needing to muck around with the low level details. The following code shows how this call can be used to check whether the AES instruction is implemented or not:

#include <sys/auxv.h>
#include <stdio.h>

int main()
{
  unsigned int array[10];
  unsigned int count = getisax(array,10);
  if (count>0)
  {
    printf(" AES: ");
    if (array[0] & AV_SPARC_AES) { printf("Yes\n"); } else { printf("No\n"); }
  }
  else
  {
    printf("Error: getisax() call returned no results\n");
  }
  return 0;
}

December 03, 2014

Joerg Moellenkamp: Event announcement - Solaris Tech Day in Cologne on January 13th, 2015

December 03, 2014 14:51 GMT
On January 13th, 2015 (yeah, it's really so late in the year that we are already talking about next year's schedules ... time flies like an arrow, fruit flies like a banana) there will be a Solaris Tech Day in Cologne. A number of colleagues from Solaris Engineering and Solaris Product Management will be in Germany, and we want to use that opportunity. Just reserve January 13th for now; more information will follow. There is a blog entry with some additional information in German in the Solarium Blog.

Joerg Moellenkamp: Roch Bourbonnais about Performance Improvements to ZFS.

December 03, 2014 09:14 GMT
Roch Bourbonnais has started a series of blog articles about changes made to ZFS to improve performance, beginning with his article "ZFS Performance boosts since 2010". He has already published the first article in the series. It is about reARC, a major rearchitecture of the subsystem that manages the ZFS in-memory cache, along with its interface to the DMU.

December 01, 2014

Joerg Moellenkamp: IPS with CVE numbers

December 01, 2014 09:25 GMT
A few days ago, Darren Moffat wrote an interesting article about the inclusion of CVE numbers in the IPS packages. You can read the article here. I just want to give a short example by citing Darren. For more information, just go to his blog post.

If we simply want to know if the fix for a given CVE-ID is installed, then using 'pkg search -l' with the CVE-ID is sufficient, e.g.:

# pkg search -l CVE-2014-7187
info.cve set CVE-2014-7187 pkg:/support/critical-patch-update/solaris-11-cpu@2014.10-1

Joerg Moellenkamp: Event announcement - Oracle Business Breakfast - "Service Management Facility"

December 01, 2014 08:53 GMT
This event takes place in Germany and is held in German.

On December 16, 2014, there will be another Business Breakfast in Düsseldorf. Besides the latest news from Oracle, the topic is an introduction to the Service Management Facility. The former will be presented by my colleague Michael Färber, the latter by me. You can register via this link.
Although the Solaris Service Management Facility (SMF) has been included since version 10, for most customers it is still rarely explored territory, and it is often bypassed by writing an init.d script. That way, however, you lose functionality. This breakfast aims to refresh the basics of SMF, explain the new features that have been added to SMF, offer tips and tricks for working with SMF, and explain some features that are rarely associated with it. Among other things, it will answer the question of what the /system/contract mountpoint is about and how the feature behind it can be used outside of SMF as well.

In particular, I will cover the new Solaris 11.2 feature of SMF stencils, which many people do not know about yet.

November 28, 2014

Joerg Moellenkamp: Next try ... Event Announcement: Business Breakfast "First Practical Experiences with Solaris 11.2" in Hamburg on December 18, 2014

November 28, 2014 13:37 GMT
Unfortunately, the event on November 6 had to be cancelled because I fell ill, so here is the make-up date. On Thursday, December 18, 2014, our Business Breakfast takes place again at the Oracle office in Hamburg. This time the theme of the event is "First Practical Experiences with Solaris 11.2". The event starts at 9:30 and ends around 13:30.

In this talk I will report on the following areas:

Registration works a bit differently this time: please send an email to this mail address. It forwards to the colleague organizing the event, which keeps his own email address out of this article where spammers could harvest it.

November 19, 2014

Darryl Gove: Writing inline templates

November 19, 2014 17:48 GMT

Writing some inline templates today... I've written about doing this kind of stuff in the past here and, in more detail, here.

I happen to need to pass a bundle of parameters on to the routine. The best way of checking how the parameters will be passed is to get the compiler to provide some initial template. Here's an example routine:

int parameters (int p0, int * p1, int * p2, int* p3, int * p4, int * p5, int * p6, int p7)
{
  return p0 + *p1 + *p2 + *p3 + *p4 + ((*p5)<<2) + ((*p6)<<3) + p7*p7;
}

In the routine I've tried to handle some of the parameters differently. I know that the first six parameters get passed in registers (%o0 to %o5), and the later ones get passed on the stack. By handling them differently I can work out which loads from the stack correspond to which variables. The disassembly looks like:

-bash-4.1$ cc -g -O parameters.c -c
-bash-4.1$ dis -F parameters parameters.o
disassembly for parameters.o

    parameters:             ca 02 60 00  ld        [%o1], %g5
    parameters+0x4:         c4 02 e0 00  ld        [%o3], %g2
    parameters+0x8:         c2 02 a0 00  ld        [%o2], %g1
    parameters+0xc:         c6 03 a0 60  ld        [%sp + 0x60], %g3  // load of p7
    parameters+0x10:        88 02 00 05  add       %o0, %g5, %g4
    parameters+0x14:        d0 03 60 00  ld        [%o5], %o0
    parameters+0x18:        ca 03 20 00  ld        [%o4], %g5
    parameters+0x1c:        92 00 80 01  add       %g2, %g1, %o1
    parameters+0x20:        87 38 e0 00  sra       %g3, 0x0, %g3
    parameters+0x24:        82 01 00 09  add       %g4, %o1, %g1
    parameters+0x28:        d2 03 a0 5c  ld        [%sp + 0x5c], %o1 // load of p6
    parameters+0x2c:        88 48 c0 03  mulx      %g3, %g3, %g4     // %g4 = %g3*%g3
    parameters+0x30:        97 2a 20 02  sll       %o0, 0x2, %o3
    parameters+0x34:        94 00 40 05  add       %g1, %g5, %o2
    parameters+0x38:        da 02 60 00  ld        [%o1], %o5       
    parameters+0x3c:        84 02 c0 0a  add       %o3, %o2, %g2
    parameters+0x40:        99 2b 60 03  sll       %o5, 0x3, %o4     // %o4 = %o5<<3
    parameters+0x44:        90 00 80 0c  add       %g2, %o4, %o0
    parameters+0x48:        81 c3 e0 08  retl
    parameters+0x4c:        90 02 00 04  add       %o0, %g4, %o0

November 13, 2014

Garrett D'Amore: A better illumos...

November 13, 2014 17:33 GMT
If you follow illumos very closely, you may already know some of this.

A New Fork

Several months ago, I forked illumos-gate (the primary source code repository for the kernel and system components of illumos) into illumos-core.

I had started upstreaming my work from illumos-core into illumos-gate.  I've since ceased that effort, largely because I simply have no time for the various arguments that my work often generates.  I think this is largely because my vision for illumos is somewhat different from that of other folks, and sadly illumos proper lacks anything resembling a guiding vision now, which means that only entirely non-contentious changes can get integrated into illumos.

However, I still want to proceed apace with illumos-core, because I believe that work has real value, and I firmly believe that my vision for illumos is the one that will lead to greater adoption by users, and by distributors as well, since much of what I'm trying to achieve in illumos-gate is aimed at reducing barriers to adoption and to developers both of illumos itself and of systems that want to build on top of or integrate illumos.  (An example of reducing barriers to adoption -- I recently implemented a BSD compatible flock() within libc, which is sometimes used by applications developed for BSD or Linux.)

Relationship to Upstream

I do also invite other parties to cherry-pick from illumos-core into illumos-gate.  I suspect that a large number of the enhancements I've made, such as the support for the fexecve() function specified by POSIX 2008, are likely to be more widely useful.  Within illumos-core, I want to retain a high standard of quality, and facilitate the effort of upstreaming for those who want to make the effort to do so.

I do want to reiterate that unlike other projects that have forked from illumos, it is not my intent to divorce myself from the community -- rather, I see illumos-core as an experimental branch aimed at exploring new directions that I ultimately hope will be embraced by the wider illumos community some day; by doing this in a separate repository/branch/fork, illumos-core can drive towards these goals without getting mired in questions that would prevent progress on these goals within illumos-gate proper.

The focus here is on delivery, rather than on discussion.  (In fact, one of my taglines on social media has for many years been "Code first, questions later."  The illumos-core effort represents a return to that core value.)

Call for Participation

I'm also interested in having collaborators on this project.  The goals are large, and while I hope to achieve them someday even if I have to do it all myself, I'm certain that the project will move quite a lot faster with help.  Also, because of our lack of bureaucracy, I hope that illumos-core can be an easier path to integration than illumos-gate.  At present I just use a simple GitHub pull request for integration.

There is an opportunity for folks at all technical levels to participate.  We need help with systems programming, but there is also work to be done on documentation, research, shell scripting, test development and release engineering.  I'm happy to mentor folks who want to help out, based on their skill level.

And, of course, for folks who want to focus primarily on improving illumos-gate upstream, there is effort that could be spent to figure out what to cherry-pick and to do the various illumos-gate process wrangling steps to get those bits integrated.

Darryl GoveSoftware in Silicon Cloud

November 13, 2014 16:00 GMT

I missed this press release about Software in Silicon Cloud. It's the announcement for a service where you can try out a SPARC M7 processor. There's an accompanying website which has the sign up plus some more information about the service.

What's particularly exciting is that it talks a bit more about Application Data Integrity (ADI). Larry Ellison called this "the most important piece of engineering we’ve done in a long, long time."

Incorrect handling of pointers is a large contributor to bugs in software. ADI tackles this by making the hardware check that the pointer being used is valid for the region of memory it is pointing to. If it's not valid, the hardware flags it as an error. Because the check is done in hardware, it runs at hardware speed with minimal performance impact, so developers can check their applications in real time.
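ADI is a hardware feature, but the idea behind it can be modeled in a few lines of software. The toy below is a simulation, not the ADI programming interface: it assumes every 64-byte block of a small pretend heap carries a 4-bit version tag, and every pointer carries a version in its top four bits. An access is allowed only when the two match, which catches both overruns into a differently tagged block and stale pointers to memory that has since been re-tagged. The block size, bit positions, and all names here are assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SIZE  64u   /* assumed tagging granularity, in bytes */
#define HEAP_BLOCKS 16u   /* size of the pretend heap, in blocks */
#define TAG_SHIFT   60    /* assumed: pointer version lives in bits 60..63 */

static uint8_t block_tag[HEAP_BLOCKS];   /* one version tag per block */

/* Build a "versioned pointer": an offset into the toy heap plus a tag. */
static uint64_t make_ptr(uint64_t offset, uint8_t version)
{
    return ((uint64_t)(version & 0xF) << TAG_SHIFT) | offset;
}

/* "Allocate" nbytes starting at block first_block: stamp every block the
 * allocation covers with version and return the matching pointer. */
uint64_t toy_alloc(unsigned first_block, unsigned nbytes, uint8_t version)
{
    unsigned nblocks = (nbytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    for (unsigned b = first_block; b < first_block + nblocks && b < HEAP_BLOCKS; b++)
        block_tag[b] = version & 0xF;
    return make_ptr((uint64_t)first_block * BLOCK_SIZE, version);
}

/* The check modeled on every load/store: does the pointer's version
 * match the tag of the block it points into? */
bool toy_access_ok(uint64_t ptr)
{
    uint8_t  version = (uint8_t)(ptr >> TAG_SHIFT);
    uint64_t offset  = ptr & ((UINT64_C(1) << TAG_SHIFT) - 1);
    uint64_t block   = offset / BLOCK_SIZE;
    if (block >= HEAP_BLOCKS)
        return false;
    return block_tag[block] == version;
}
```

A 128-byte allocation tagged 3 covers blocks 0 and 1, so reading at offset 128 (block 2) fails the check, and after block 0 is re-allocated with tag 5 the old pointer fails too.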

There's a nice demo of how ADI protects against exploits like Heartbleed.

November 12, 2014

Darryl GoveOracle Solaris Studio playlist

November 12, 2014 16:00 GMT

There's an extensive list of Solaris Studio videos on YouTube. In particular, there's a bunch of tutorials covering the features of the IDE. The IDE doesn't often get the attention it deserves. It's based on NetBeans, and is full of useful code refactoring tools, navigation tools, etc. To find out more, take a look at some of the videos.

Darryl GoveNew Performance Analyzer Overview screen

November 12, 2014 00:20 GMT

I love using the Performance Analyzer, but the question I often get when I show it to people is "Where do I start?". So one of the improvements in Solaris Studio 12.4 is an Overview screen to help people get started with the tool. Here's what it looks like:

This is important because many applications spend time in various places, like waiting on disk or in user locks, and it's not always obvious where the most effective place to look for performance gains will be.

The Overview screen is meant to be the "one-stop" place where people can find out what their application is doing. When we put it back into the product I expected it to be the screen that I glanced at then never went back to. I was most surprised when this turned out not to be the case.

During performance analysis, I'm often exploring different ideas as to where it might be possible to get performance improvements. The Overview screen allows me to select the metrics that I'm interested in, then take a look at the resulting profiles. So I might start with system time, and just enable the system time metrics. Once I'm done with that, I might move on to user time, and select those metrics. So what was surprising about the Overview screen was how often I returned to it to change the metrics I was using.

So what does the screen contain? The overview shows all the available metrics. The bars indicate which metrics contribute the most time. So it's easy to pick (and explore) the metrics that contribute the most time.

If the profile contains performance counter metrics, then those also appear. If the counters include instructions and cycles, then the synthetic CPI/IPC metrics are also available. The Overview screen is really useful for hardware counter metrics.

I use performance counters in a couple of ways: to confirm a hypothesis about performance or to estimate time spent on a type of event. For example, if I think a load is taking a lot of time due to TLB misses, then profiling on the TLB miss performance counter will tell me whether that load has a lot of misses or not. Alternatively, if I've got TLB miss counter data, then I can scale this by the cost per TLB miss, and get an estimate of the total runtime lost to TLB misses.
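The arithmetic behind the synthetic metrics and the TLB estimate is simple enough to show directly. In this sketch, CPI is cycles divided by instructions (IPC is its reciprocal), and the time lost to TLB misses is the miss count multiplied by a per-miss cost in cycles, divided by the clock rate. The function names are hypothetical, and the per-miss cost and clock rate are inputs you would have to supply for your own processor.

```c
/* Cycles per instruction: the synthetic CPI metric. */
double cpi(double cycles, double instructions)
{
    return instructions > 0 ? cycles / instructions : 0.0;
}

/* Instructions per cycle: the reciprocal, IPC. */
double ipc(double cycles, double instructions)
{
    return cycles > 0 ? instructions / cycles : 0.0;
}

/* Estimated seconds lost to TLB misses: misses x cost per miss (in
 * cycles), converted to seconds at the given clock rate (in Hz). The
 * per-miss cost is an assumption that has to come from processor
 * documentation or a microbenchmark. */
double tlb_seconds_lost(double misses, double cycles_per_miss, double clock_hz)
{
    return misses * cycles_per_miss / clock_hz;
}
```

With made-up figures of 2.0e8 misses at an assumed 100 cycles each on a 4 GHz part, tlb_seconds_lost(2.0e8, 100, 4.0e9) gives 5 seconds.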

Where the Overview screen comes into this is that I will often want to minimise the number of columns of data that are shown (to fit everything onto my monitor), but sometimes I want to quickly enable a counter to see whether that event happens at the bit of code where I'm looking. Hence I end up flipping to the Overview screen and then returning to the code.

So what I thought would be a nice feature actually became pretty central to my workflow.

I should have a more detailed paper about the Overview screen up on OTN soon.

November 11, 2014

Darryl GovePerformance made easy

November 11, 2014 22:47 GMT

The big news of the day is that Oracle Solaris Studio 12.4 is available for download. I'd like to thank all those people who tried out the beta releases and gave us feedback.

There are a number of things that are new in this release. The most obvious one is C++11 support; I've written a bit about the lambda expression support, tuples, and unordered containers.

My favourite tool, the Performance Analyzer, has also had a bit of a facelift. I'll talk about the Overview screen in a separate post (and in an article), but there are some other fantastic features. Syntax highlighting and hyperlinking have made navigating profiles much easier. There have been a large number of improvements in filtering, a feature that's been in the product a long time; these changes make it much more accessible (an article on filtering is long overdue!). There are also default hardware counters, which make it a no-brainer to get hardware counter data; that data is really helpful in understanding exactly what an application is doing.

Over the development cycle I've made much use of the other tools. The Thread Analyzer for identifying data races has had some improvements. The Code Analyzer tools have made some great gains in rapidly identifying potential coding errors. And so on....

Anyway, please download the new version, try it out, try out the tools, and let us know what you think of it.

November 06, 2014

Steve TunstallNew Logzilla Drives for your ZFSSA

November 06, 2014 16:58 GMT

Yes, the new, larger Logzilla SSD drives for your ZFSSA systems are now out. They are 200GB usable, up from the 73GB usable drives. 

Yes, you will sometimes see them referred to in some marketing literature as 400GB. This is because enterprise SSDs reserve extra flash capacity to allow for cell burnout and keep their five-year lifetime. Make no mistake, they will give you 200GB of actual capacity in the ZFSSA systems.

Yes, they are compatible with the current 73GB version. You can mix and match. The one thing to look out for is in a 'mirrored' log profile. If you mix a new one with an old one in a mirrored log profile, then the new one will size down to 73GB to match it. In a striped profile, it doesn't matter, nor will it matter if you have 2 or more of each.
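The sizing rule above can be captured in a few lines. This is a simplified model under stated assumptions (it ignores any metadata overhead, and the function names are hypothetical): a mirrored pair contributes the capacity of its smaller member, while striped devices simply add up.

```c
#include <stddef.h>

/* Usable log capacity (GB) of one mirrored pair: the larger device is
 * sized down to match the smaller one. */
unsigned mirrored_pair_gb(unsigned a_gb, unsigned b_gb)
{
    return a_gb < b_gb ? a_gb : b_gb;
}

/* Usable log capacity (GB) of a striped profile: every device counts in full. */
unsigned striped_gb(const unsigned *devices_gb, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += devices_gb[i];
    return total;
}
```

So a new 200GB device mirrored with an old 73GB one yields 73GB, while the same two devices striped yield 273GB.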

One last thing: they are almost twice as fast as the older 73GB version. If you mix them, you will get faster, but not as fast as if you had all 200GB versions. Diminishing returns. Talk to your local SC about whether your Logzilla workload is heavy enough that either adding some new ones or even replacing your old ones would help your performance. Not every workload needs Logzillas, but there are built-in analytics that can tell us if yours is a good fit.


November 05, 2014

Joerg MoellenkampCancelation: Business Breakfast "Erste Praxiserfahrungen mit Solaris 11.2" (First Practical Experiences with Solaris 11.2) in Hamburg on 6 November 2014

November 05, 2014 21:51 GMT
Tomorrow's event is canceled because the presenter is ill (that's me: I caught a bad cold on vacation). I will keep you updated about a new date.

November 04, 2014

Darryl GoveSPARC Software in Silicon

November 04, 2014 17:48 GMT

Short video by Juan Loaiza about the Software in Silicon work in the upcoming SPARC processor.

Bryan CantrillSmartDataCenter and Manta are now open source

November 04, 2014 00:16 GMT

Today we are announcing that we are open sourcing the two systems at the heart of our business: SmartDataCenter and the Manta object storage platform. SmartDataCenter is the container-based orchestration software that runs the Joyent public cloud; we have used it for the better half of a decade to run on-the-metal OS containers — securely and at scale. Manta is our multi-tenant ZFS-based object storage platform that provides first-class compute by allowing OS containers to be spun up directly upon objects — effecting arbitrary computation at scale without data movement. The unifying technological foundation beneath both SmartDataCenter and Manta is OS-based virtualization, a technology that Joyent pioneered in the cloud way back in 2006. We have long known the transformative power of OS containers, so it has been both exciting and validating for us to see the rise of Docker and the broadening of appreciation for OS-based virtualization. SmartDataCenter and Manta show that containers aren’t merely a fad or developer plaything but rather a fundamental technological advance that represents the foundation for the next generation of computing — and we believe that open sourcing them advances the adoption of container-based architectures more broadly.

Without any further ado — and to assure that we don’t fall into the most prominent of my own corporate open source anti-patterns — here is the source for SmartDataCenter and the source for Manta. These are sophisticated systems with many moving parts, and you’ll see that these two repositories are in fact meta-repositories that explain the design of each of the systems and then point to the (many) components that comprise them (all now open source, natch). We believe that some of these subcomponents will likely find use entirely outside of SDC and Manta. For example, Manatee is a ZooKeeper-based system that manages Postgres replication and automates failover; Moray is a key-value service that lives on top of Postgres. Taken together, Manatee and Moray implement a highly-available key-value service that we use as the foundation for many other components in SDC and Manta — and one that we think others will find useful as well.

In terms of source code mechanics, you’ll see that many of the components are implemented in either node.js or by extending C-based systems. This is not by fiat but rather by the choices of individual engineers; over the past four years, as we learned about the nuances of node.js error handling and as we invested heavily in tooling for running node.js in production, node.js became the right tool for many of our jobs — and we used it for many of the services that constitute SDC and Manta.

And because any conversation about open source has to address licensing at some point or another, let’s get that out of the way: we opted for the Mozilla Public License 2.0. While relatively new, there is a lot to like about this license: its file-based copyleft allows it to be proprietary-friendly while also forcing certain kinds of derived work to be contributed back; its explicit patent license discourages litigation, offering some measure of troll protection; its explicit warranting of original work obviates the need for a contributor license agreement (we’re not so into CLAs); and (best of all, in my opinion), it has been explicitly designed to co-exist with other open source licenses in larger derived works. Mozilla did terrific work on MPL 2.0, and we hope to see it adopted by other companies that share our thinking around open source!

In terms of the business ramifications, at Joyent we have long been believers in open source as a business model; as the leaders of the node.js and SmartOS projects, we have seen the power of open source to start new conversations, open up new markets and (importantly) yield new customers. Ten years ago, I wrote that open source is “a loss leader — minus the loss, of course”; after a decade of experience with open source business models, I would add that open source also serves as sales outreach without cold calls, as a channel without loss of margin, and as a marketing campaign without advertisements. But while we have directly experienced the business advantages of open source, we at Joyent have also lived something of a dual life: node.js and SmartOS have been open source, but the distributed systems that we have built using these core technologies have remained largely behind our walls. So that these systems are now open source does not change the fundamentals of our business model: if you would like to consume SmartDataCenter or Manta as a service, you can spin up an instance on the public cloud or use our Manta storage service. Similarly, if you want a support contract and/or professional services to run either SmartDataCenter or Manta on-premises, we’ll sell them to you. Based on our past experiences with open source, we do know that there will be one important change: these technologies will find their way into the hands of those that we have no other way of reaching — and that some fraction of these will become customers. Also based on past experience, we know that some (presumably much smaller) fraction of these new technologists will — by merits of their interest in and contributions to these projects — one day join us as engineers at Joyent. Bluntly, open source is our farm system, and broadening our hiring channel during a blazingly hot market for software talent is playing no small role in our decision here. 
In short, this is not an act of altruism: it is a business decision — if a multifaceted one that we believe has benefits beyond the balance sheet.

Welcome to open source SDC and Manta — and long-live the container revolution!

October 23, 2014

Joerg MoellenkampEvent Announcement: Business Breakfast "Erste Praxiserfahrungen mit Solaris 11.2" (First Practical Experiences with Solaris 11.2) in Hamburg on 6 November 2014

October 23, 2014 08:22 GMT
I'm hosting a business breakfast at the beginning of November. The event itself will be held in German.

On Thursday, 6 November 2014, our Business Breakfast takes place again at the Oracle office in Hamburg. This time the event's theme is "Erste Praxiserfahrungen mit Solaris 11.2" (First Practical Experiences with Solaris 11.2).

In this talk I will report on the following areas:

Registration works a little differently this time. Please send an email to this address. It forwards to the colleague who is organizing the event, so that his own email address doesn't appear in this article where spammers could harvest it.

October 18, 2014

Garrett D'AmoreYour language sucks...

October 18, 2014 06:20 GMT
As a result of work I've been doing for illumos, I've recently gotten re-engaged with internationalization, and the support for this in libc and localedef.  (I am the original author of our localedef.)

I've decided that human languages suck.  Some suck worse than others though, so I thought I'd write up a guide.  You can take this as "your language sucks if...", or perhaps a better view might be "your program sucks if it makes assumptions that this breaks..."

(Full disclosure, I'm spoiled.  I am a native speaker of English.  English is pretty awesome for data-processing, at least at the written level.  I'm not going to concern myself with questions about deeper issues like grammar, natural language recognition, speech synthesis or recognition, automatic translation, etc.  Instead this is focused strictly on the most basic display and simple operations like collation (sorting), case conversion, and character classification.)

1. Too many code points. 

Some languages (from Eastern Asia) have way way too many code points.  There are so many that these languages can't actually fit into 16 bits all by themselves.  Yes, I'm saying that there are languages with over 65,000 characters in them!  This explosion means that generating data for languages results in intermediate lookup tables that are megabytes in size.  For Unicode, this impacts all languages.  The intermediate sources for the Unicode supported in illumos blow up to over 2GB when support for the additional code planes is included.
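To make the 16-bit limit concrete: a code point above U+FFFF cannot be held in one 16-bit unit, so UTF-16 splits it into a surrogate pair, and UTF-8 needs four bytes for it. The sketch below computes both for a single code point; U+20000, the first character of CJK Unified Ideographs Extension B, serves as the example. The function names are illustrative, and the code assumes its input is a valid Unicode scalar value.

```c
#include <stdint.h>
#include <stddef.h>

/* Encode a code point as UTF-16; returns the number of 16-bit units (1 or 2).
 * Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}

/* Number of bytes needed to encode a code point in UTF-8. */
size_t utf8_length(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}
```

U+4E00 fits in one UTF-16 unit (three UTF-8 bytes), while U+20000 becomes the pair D840 DC00 (four UTF-8 bytes).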

2. Your language requires me to write custom code for symbol names. 

Hangul Jamo, I'm looking at you.  Of all the languages in Unicode, only this one is so bizarre that it requires multiple lookup tables to determine the names of its characters, because each syllable is made up of smaller phonetic pieces (the jamo: vowels and consonants).  It even has its own section in the basic conformance document for Unicode (section 3.12).  I don't speak Korean, but I had to learn about Jamo.
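Here is a sketch of the naming arithmetic from section 3.12: the syllable's offset from U+AC00 is split into leading-consonant, vowel, and trailing-consonant indices, and each index selects a short jamo name from its own table. The constants and tables follow the standard; the function name hangul_syllable_name() is hypothetical.

```c
#include <stdio.h>     /* snprintf() */
#include <string.h>
#include <stdint.h>
#include <stdbool.h>

#define S_BASE  0xAC00
#define L_COUNT 19
#define V_COUNT 21
#define T_COUNT 28
#define N_COUNT (V_COUNT * T_COUNT)   /* 588 */
#define S_COUNT (L_COUNT * N_COUNT)   /* 11172 precomposed syllables */

/* Short names of leading consonants, vowels, and trailing consonants. */
static const char *const jamo_l[L_COUNT] = {
    "G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
    "SS", "", "J", "JJ", "C", "K", "T", "P", "H"
};
static const char *const jamo_v[V_COUNT] = {
    "A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
    "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"
};
static const char *const jamo_t[T_COUNT] = {
    "", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
    "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
    "K", "T", "P", "H"
};

/* Write the character name of a precomposed Hangul syllable into buf.
 * Returns false if cp is not a precomposed Hangul syllable. */
bool hangul_syllable_name(uint32_t cp, char *buf, size_t len)
{
    if (cp < S_BASE || cp >= S_BASE + S_COUNT)
        return false;
    uint32_t s = cp - S_BASE;
    uint32_t l = s / N_COUNT;                /* leading consonant index */
    uint32_t v = (s % N_COUNT) / T_COUNT;    /* vowel index */
    uint32_t t = s % T_COUNT;                /* trailing consonant index (0 = none) */
    snprintf(buf, len, "HANGUL SYLLABLE %s%s%s", jamo_l[l], jamo_v[v], jamo_t[t]);
    return true;
}
```

U+AC00 comes out as HANGUL SYLLABLE GA and U+D4DB as HANGUL SYLLABLE PWILH, the worked example the standard itself uses.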

3. Your language's character set is continuing to evolve. 

Yes, that's Asia again (mostly China I think).   The rate at which new Asian characters are added rivals that of updates to the timezone database.  The approach your language uses is wrong!

4. Characters in your language are of multiple different cell widths. 

Again, this is mostly, but not exclusively, Asian languages.  Asian languages require 2 cells to display many of their characters.  But, to make matters far far worse, sometimes the number of code points used to represent a character is more than one, which means that the display width of a single code point may be 0, 1, or 2 cells.   Worse, some languages have both half- and full-width forms for many common symbols.  Argh.
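A sketch of how a terminal program might total up display cells shows why width has to be assigned per code point. The range table below is a tiny hypothetical subset chosen for illustration; real tables, such as those behind wcwidth(), are generated from Unicode data and are far larger.

```c
#include <stdint.h>
#include <stddef.h>

/* Deliberately tiny, hypothetical width table: returns the number of
 * terminal cells a code point occupies (0, 1 or 2). */
int toy_cell_width(uint32_t cp)
{
    if ((cp >= 0x0300 && cp <= 0x036F) ||   /* combining diacritical marks */
        (cp >= 0x1160 && cp <= 0x11FF))     /* Hangul Jamo vowels and finals: join the preceding consonant */
        return 0;
    if ((cp >= 0x1100 && cp <= 0x115F) ||   /* Hangul Jamo leading consonants */
        (cp >= 0x4E00 && cp <= 0x9FFF) ||   /* CJK Unified Ideographs */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* precomposed Hangul syllables */
        (cp >= 0xFF01 && cp <= 0xFF60))     /* fullwidth forms */
        return 2;
    return 1;
}

/* Total display cells for a sequence of code points. */
int toy_string_width(const uint32_t *cps, size_t n)
{
    int w = 0;
    for (size_t i = 0; i < n; i++)
        w += toy_cell_width(cps[i]);
    return w;
}
```

"A" (U+0041) takes one cell while its fullwidth form U+FF21 takes two, and "e" followed by a combining acute accent is two code points that together occupy one cell.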

5. The width of the character depends on the context. 

Some widths depend on the encoding because of historical practice (Asia again!), but then you have composite characters as well.  For example, a Jamo vowel sound could in theory be displayed on its own.  But if it follows a leading consonant, then it changes the consonant character and they become a new character (at least to the human viewer).
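The Jamo case is at least regular: Unicode section 3.12 defines the composition arithmetically. A leading consonant, a vowel, and an optional trailing consonant map to a single precomposed syllable. The sketch below follows that arithmetic; compose_hangul() is a hypothetical name, and it returns 0 for inputs that are not conjoining jamo of the right kinds.

```c
#include <stdint.h>

#define S_BASE  0xAC00   /* first precomposed syllable */
#define L_BASE  0x1100   /* first leading consonant */
#define V_BASE  0x1161   /* first vowel */
#define T_BASE  0x11A7   /* one before the first trailing consonant */
#define L_COUNT 19
#define V_COUNT 21
#define T_COUNT 28

/* Compose leading consonant l, vowel v, and trailing consonant t
 * (pass 0 for no trailing consonant) into one syllable. */
uint32_t compose_hangul(uint32_t l, uint32_t v, uint32_t t)
{
    if (l < L_BASE || l >= L_BASE + L_COUNT) return 0;
    if (v < V_BASE || v >= V_BASE + V_COUNT) return 0;
    uint32_t t_index = 0;
    if (t != 0) {
        if (t <= T_BASE || t >= T_BASE + T_COUNT) return 0;
        t_index = t - T_BASE;
    }
    uint32_t l_index = l - L_BASE, v_index = v - V_BASE;
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index;
}
```

For example, U+1100 (G) followed by U+1161 (A) composes to U+AC00, and U+1112, U+1161, U+11AB (H, A, N) compose to U+D55C.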

6. Your language has unstable case conversions.

There are some evil ones here, and thankfully they are rare.  But some languages have case conversions which are not reversible!  Case itself is kind of silly, but this is just insane!  Armenian has a letter with this property, I believe.
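One concrete case is the Armenian ligature ech yiwn, U+0587. In Unicode's full case mappings it has no single uppercase form, so uppercasing expands it to two capitals (U+0535 U+0552), and lowercasing those gives back two separate letters (U+0565 U+0582) rather than the ligature. The sketch below uses a hypothetical three-entry excerpt of the case tables that covers just this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical excerpt of full case mappings, limited to the example;
 * real data comes from UnicodeData.txt and SpecialCasing.txt. */
static size_t toy_upper(uint32_t cp, uint32_t out[2])
{
    switch (cp) {
    case 0x0587: out[0] = 0x0535; out[1] = 0x0552; return 2;  /* ligature -> ECH + YIWN */
    case 0x0565: out[0] = 0x0535; return 1;                   /* ech */
    case 0x0582: out[0] = 0x0552; return 1;                   /* yiwn */
    default:     out[0] = cp;     return 1;
    }
}

static uint32_t toy_lower(uint32_t cp)
{
    switch (cp) {
    case 0x0535: return 0x0565;
    case 0x0552: return 0x0582;
    default:     return cp;
    }
}

/* Uppercase then lowercase a code point string; out needs room for 2*n
 * entries. Returns the resulting length. */
size_t upper_then_lower(const uint32_t *in, size_t n, uint32_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t tmp[2];
        size_t m = toy_upper(in[i], tmp);
        for (size_t j = 0; j < m; j++)
            out[k++] = toy_lower(tmp[j]);
    }
    return k;
}
```

The round trip turns one code point into two, so the original string cannot be recovered from its uppercase form.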

7. Your language's collation order is context-dependent. 

(French, I'm looking at you!)  Some languages have sorting orders that depend not just on the character itself, but on the characters that precede or follow it.  Some of the rules are really hard.  The collation code required to deal with this generally is really really scary looking.
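One well-known context-dependent rule is the traditional French treatment of accents: words are compared first ignoring accents, and any tie is broken by comparing accents starting from the end of the word rather than the beginning. The toy comparator below illustrates just that rule on a handful of accented letters. The accent weights and the tiny decomposition table are simplified assumptions, not the real Unicode Collation Algorithm data.

```c
#include <wchar.h>
#include <stddef.h>

/* Toy decomposition: base letter plus an accent weight
 * (0 = none, 1 = acute, 2 = grave, 3 = circumflex). */
static wchar_t base_of(wchar_t c, int *accent)
{
    switch (c) {
    case L'\u00e9': *accent = 1; return L'e';   /* e acute */
    case L'\u00e8': *accent = 2; return L'e';   /* e grave */
    case L'\u00ea': *accent = 3; return L'e';   /* e circumflex */
    case L'\u00f4': *accent = 3; return L'o';   /* o circumflex */
    default:        *accent = 0; return c;
    }
}

/* Compare on base letters first (primary level), then on accents
 * (secondary level). If backwards is nonzero, accents are compared
 * from the end of the string, as traditional French ordering does. */
int toy_collate(const wchar_t *a, const wchar_t *b, int backwards)
{
    size_t na = wcslen(a), nb = wcslen(b);
    int acc_a, acc_b;

    for (size_t i = 0; i < na && i < nb; i++) {
        wchar_t ba = base_of(a[i], &acc_a), bb = base_of(b[i], &acc_b);
        if (ba != bb)
            return ba < bb ? -1 : 1;
    }
    if (na != nb)
        return na < nb ? -1 : 1;

    for (size_t k = 0; k < na; k++) {
        size_t i = backwards ? na - 1 - k : k;
        base_of(a[i], &acc_a);
        base_of(b[i], &acc_b);
        if (acc_a != acc_b)
            return acc_a < acc_b ? -1 : 1;
    }
    return 0;
}
```

With backwards accent comparison the words sort as cote, côte, coté, côté; comparing accents forwards instead puts coté before côte.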

8. Your language has equivalent alternates (ligatures). 

German, your ß character, which stands in for "ss", is a poster child here.  This is a single code point, but for sorting it is equivalent to "ss".  This is just historical decoration, because it's "fancy".  Stop making my programming life hard.
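Collation code usually handles this with an expansion: before comparing, ß is treated as the two letters "ss". The sketch below hard-codes that single rule and compares at the primary level only; the function names are hypothetical, and the fixed 64-element buffers assume short inputs (under 32 characters).

```c
#include <wchar.h>
#include <stddef.h>

/* Expand sharp s (U+00DF) to "ss". out needs room for 2*wcslen(in)+1
 * wide characters. */
void expand_sharp_s(const wchar_t *in, wchar_t *out)
{
    for (; *in; in++) {
        if (*in == L'\u00df') {
            *out++ = L's';
            *out++ = L's';
        } else {
            *out++ = *in;
        }
    }
    *out = L'\0';
}

/* Compare two strings after expansion (primary level only).
 * Assumes inputs shorter than 32 characters. */
int compare_expanded(const wchar_t *a, const wchar_t *b)
{
    wchar_t ea[64], eb[64];
    expand_sharp_s(a, ea);
    expand_sharp_s(b, eb);
    return wcscmp(ea, eb);
}
```

A plain wcscmp() treats "Straße" and "Strasse" as different, while compare_expanded() finds them equal.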

9. Your language can't decide on a script. 

Some languages can be written in more than one script.  For example, Mongolian can be written using Mongolian script or Cyrillic.  But the winner (loser?) here is Serbian, which in some places uses both Latin and Cyrillic characters interchangeably! Pick a script already! I think the people who live like this are just schizophrenic.  (Given all the political nonsense surrounding language in these places, that's no real surprise.)

10. Your language has Titlecase. 

POSIX doesn't do Titlecase.  This happens because your language also uses ligatures instead of just allocating a separate cell and code point for each character.  Most people talk about titlecase used in a phrase or string of words.  But yes, titlecase can apply to a SINGLE CHARACTER.  For example, Dž is just such a character.
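The digraph DŽ shows why two mappings aren't enough: Unicode encodes three case forms, U+01C4 (DŽ), U+01C5 (Dž) and U+01C6 (dž), while POSIX only offers toupper()/towupper() and tolower()/towlower(). The sketch below adds the missing third mapping for just this one digraph; map_dz_caron() and the enum are hypothetical names, and any other code point is returned unchanged.

```c
#include <stdint.h>

enum case_kind { CASE_LOWER, CASE_TITLE, CASE_UPPER };

/* Map any form of the DŽ digraph (U+01C4..U+01C6) to the requested case. */
uint32_t map_dz_caron(uint32_t cp, enum case_kind kind)
{
    if (cp < 0x01C4 || cp > 0x01C6)
        return cp;                  /* not part of this digraph */
    switch (kind) {
    case CASE_UPPER: return 0x01C4; /* DŽ */
    case CASE_TITLE: return 0x01C5; /* Dž */
    case CASE_LOWER: return 0x01C6; /* dž */
    }
    return cp;
}
```

A title-casing routine that only knows upper and lower case would turn dž into DŽ at the start of a word, which is wrong.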

11. Your language doesn't use the same display / ordering we expect.

So some languages use right to left, which is backwards, but whatever.   Others, crazy ones (but maybe crazy smart, if you think about it) use back and forth bidirectional.  And still others use vertical ordering.  But the worst of them are those languages (Asia again, dammit!) where the orientation of text can change.  Worse, some cases even rotate individual characters, depending upon context (e.g. titles are rotated 90 degrees and placed on the right edge).  How did you ever figure out how to use a computer with this crazy stuff?

12. Your encoding collides control codes.

We use the first 32 or so character codes to mean special things for terminal control, etc.  If we can't use these, your language is going to suck over certain kinds of communication lines.

13. Your encoding uses conflicting values at ASCII code points.

ASCII is universal.  Why did you fight it?  But that's probably just me being mostly Anglo-centric / bigoted.
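Shift_JIS is the classic example: the trail byte of a two-byte character can fall in the ASCII range, so 表 is encoded as 0x95 0x5C, and 0x5C is the ASCII backslash. Code that scans bytes for '\' finds one that isn't there. The sketch below contrasts a naive scan with one that skips trail bytes; the lead-byte ranges are the standard Shift_JIS ones, and the function names are hypothetical.

```c
#include <stddef.h>

/* Does byte b start a two-byte Shift_JIS character?
 * (lead byte ranges 0x81-0x9F and 0xE0-0xFC) */
static int sjis_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Naive scan: counts every 0x5C byte as a backslash. */
size_t count_backslash_naive(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i] == 0x5C)
            count++;
    return count;
}

/* Encoding-aware scan: skips the trail byte of each two-byte character. */
size_t count_backslash_sjis(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (sjis_lead(s[i]) && i + 1 < n)
            i++;                    /* skip the trail byte, whatever its value */
        else if (s[i] == 0x5C)
            count++;
    }
    return count;
}
```

For the bytes 0x95 0x5C 0x5C (表 followed by a real backslash), the naive scan reports two backslashes and the encoding-aware scan reports one.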

14. Your language encoding uses shift characters. 

(Code page, etc.)  Some East Asian languages used this hack in the old days.  Stateful encodings are JUST HORRIBLY BROKEN.   The meaning of a given sequence of bytes should not depend on some state value that was sent a long time earlier.
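ISO-2022-JP, for example, switches modes with escape sequences: ESC $ B selects JIS X 0208 (two bytes per character) and ESC ( B returns to ASCII. The sketch below counts characters while tracking that shift state, showing that the same bytes decode differently depending on an earlier escape. It is a simplification: the function name is hypothetical, and the other designations ISO-2022-JP allows are skipped without changing state.

```c
#include <stddef.h>

/* Count characters in an ISO-2022-JP byte string, tracking shift state:
 *   ESC $ B  -> JIS X 0208, two bytes per character
 *   ESC ( B  -> ASCII, one byte per character
 * Other three-byte escapes are skipped without changing state. */
size_t iso2022jp_count_chars(const unsigned char *s, size_t n)
{
    int two_byte = 0;   /* the shift state, starting in ASCII */
    size_t chars = 0;
    for (size_t i = 0; i < n; ) {
        if (s[i] == 0x1B && i + 2 < n) {
            if (s[i + 1] == '$' && s[i + 2] == 'B')
                two_byte = 1;
            else if (s[i + 1] == '(' && s[i + 2] == 'B')
                two_byte = 0;
            i += 3;     /* escape sequences are not characters */
            continue;
        }
        i += two_byte ? 2 : 1;
        chars++;
    }
    return chars;
}
```

The bytes "$B0!" are four ASCII characters on their own, but only two JIS X 0208 characters after an ESC $ B.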

15. Your language encoding uses zero values in the middle of valid characters. 

Thankfully this doesn't happen with modern encodings in common use anymore.  (Or maybe I just have decided that I won't support any encoding system this busted.  Such an encoding is so broken that I just flat out refuse to work with it.)

Non-Broken Languages

So, there are some good examples of languages that are famously not broken.

a. English.  Written English has simple sorting rules, and a very simple character set.  Diphthongs are never ligatures.  This is so useful for data processing that I think it has had a great deal to do with why English is the common language for computer scientists around the world.  US-ASCII, an English character set, is the "base" character set for Unicode, and pretty much all other encodings use ASCII encodings in the lower 7 bits.

b. Russian.  (And likely others that use Cyrillic, but not all of them!)  Russian has a very simple alphabet, strictly phonetic.  The number of characters is small, there are no composite characters, and no special sorting rules.  Hmm... I seem to recall that Russia (Soviet era) had a pretty robust computing industry.  And these days Russians mostly own the Internet, right?  Coincidence?  Or maybe they just don't have to waste a lot of time fighting with the language just to get stuff done?

I think there are probably others.  At a glance, Georgian looks pretty straightforward.   I suspect that there are languages using both Cyrillic and Latin character sets that are sane.  Ethiopic actually looks pretty simple and sane too.  (Again, just from a text processing standpoint.)

But sadly, the vast majority of natural languages have written forms & rules that completely and utterly suck for text processing.

October 17, 2014

Jeff SavitOracle VM Server for SPARC Best Practices White Paper

October 17, 2014 23:02 GMT
I'm very pleased to announce a new white paper has been published: Oracle VM Server for SPARC Best Practices.

This paper shows how to configure Oracle VM Server for SPARC to meet demanding performance and availability requirements. Topics include:

The paper includes specific recommendations, describes the reasons behind them, and illustrates them with examples taken from actual systems.

October 13, 2014

Garrett D'AmoreMy Problem with Feminism

October 13, 2014 23:03 GMT
I'm going to say some things here that may be controversial.  Certainly that headline is.  But please, bear with me, and read this before you judge too harshly.

As another writer said, 2014 has been a terrible year for women in tech.  (Whether in the industry, or in gaming.)  Arguably, this is not a new thing, but rather events are reaching a head.  Women (some at any rate) are being more vocal, and awareness of women's issues is up.  On the face of it, this should be a good thing.

And yet, we have incredible conflict between women and men.  And this is at the heart of my problem with "Feminism".

The F-Word

Don't get me wrong.  I strongly believe that women should be treated fairly and with respect; in the professional place they should receive the same level of professional respect -- and compensation! -- as their male counterparts can expect.  I believe this passionately -- as a nerd, I prefer to judge people on the merits of their work, rather than on their race, creed, gender, or sexual preference.  A similar principle applies to gaming -- after all, how do you really know the gender of the player on the other side of the MMO?  Does it even matter?  When did gaming become a venue for channeling hate instead of fun?

The problem with "feminism" is that instead of repairing inequality and trying to bring men and women closer together, so much of it seems to be divisive.  The very word itself basically suggests a gender based conflict, and I think this, as well as much of the recent approach, is counterproductive.

Instead of calling attention to inequalities and improper behaviors (let's face it, nobody wants to deal with sexual harassment, discrimination, or some of the very much worse behavior that a few terribly bad actors are guilty of), we've become focused on gender bias and "fixing" gender bias as a goal in and of itself, rather than instead focusing on fair and equal treatment for all.

Every day I'm inundated with tweets and Facebook postings extolling the terrible plight of women at the expense of men.  Many of these posts seem intended to make me either angry at men, or ashamed of being one.  This basically drives a wedge between people, even unconsciously, to the point that it has become impossible to avoid being a soldier on one side or the other of this war.  And don't get me wrong, it has indeed degenerated to a total war.

I don't think this is what most feminists or their advocates really want.  (Though, I think it is what some of them want.  The side of feminism has its bad actors who thrive on conflict just as much as the other side has.  Extremism is gender and color and religion blind, as we've ample evidence of.)

I think one thing that advocates for women in tech can do, is to pick a different term, and a different way of stating their goals, and perhaps a different approach.  I think we've reached the critical mass necessary for awareness, so the constant tweets about how terrible it is to be a woman are no longer helpful.

I'm not sure what "term" should replace feminism -- in the workplace I'd suggest "professionalism".  After all everyone wants to be treated professionally, not just women.  (Btw, I'd say that in the gaming community, the value should be "sportsmanship".  Sadly some will see that word as gender biased, but I don't subscribe to the notion that we have to completely change our language in order to be more politically correct.  You know what I mean.)

Likewise, instead of dog-piling (as I'm sure will happen in response to this post) on someone who doesn't immediately appear to support the feminist agenda, perhaps a little more tolerance and education should be used in the approach.  Focus should, IMO, be on public praise for the parties who are working to make conditions better.

Educate instead of punish.  Make allies instead of enemies.

Salary Gap

The salary gap issue that was raised recently by Microsoft is another case in point.

I don't agree with Satya Nadella's comments saying that women should not ask for raises, but I think many women are nearly as likely to get a raise upon requesting one as a man of similar accomplishments.  (Yes, it would be better if this statement could have been said without "nearly".)   Far too few women feel comfortable asking for a merit based raise in the first place -- that is something that should change. But using race or gender as a basis to demand pay increases is a recipe for further division.  Indeed, men may begin to wonder if women are being compensated unfairly because they are women, but in the reverse direction. 

Likewise, bringing up discrimination in a salary discussion puts the other party on the defensive.  It implies prior wrongdoing.  This may be the case, but it may well not be.  After all, I've known many men that were undercompensated simply because they sold themselves short, or were not comfortable asking for more money.   Why look for a fight when there isn't one?  (I suspect this is what Satya was really trying to get at.)

None of this helps the cause of "professionalism", and probably not the cause of "feminism".

Average tech salary figures are easily obtainable.  If a worker, man or woman, feels undercompensated -- for any reason -- then they should take it to their employer and ask for a correction.  But to presume that the reason is gender starts the conversation from a point of conflict.

Far far better is to demand fair pay based on work performance and merit, relative to industry norms as appropriate.   If an employer won't compensate fairly, just leave.  There is no shortage of tech jobs in the industry.  If you're a woman, maybe look for jobs at companies that employ (and successfully retain) women.  Ask the people who work at a prospective employer about conditions, etc.  That's true for minorities too!  Ultimately, an employer who discriminates will find itself at a severe competitive disadvantage, as both the discriminated-against parties and their allies refuse to do business with them.

An employer is not obligated to pay you "more" because of your gender.  But they must also not pay you less because of gender.  And yet every company will generally try to pay as little as they think they can get away with.  So don't let them -- but keep discrimination out of the conversation unless there is really compelling proof of wrongdoing.  (And if there is such evidence, I'd recommend looking elsewhere, and possibly exploring stronger legal measures.)

And yes, I strongly strongly believe that most men feel as I do.  They support the notion that everyone should be treated equally and professionally, and would like to stamp out sexism in the workplace, but many of us are starting to show symptoms of battle fatigue, and even more of us just don't want to be involved in a conflict at all.   Frankly, I think a lot of us are annoyed at feminist attempts to draw us into the conflict, even though we do support many of the stated goals of equal pay, fair treatment, etc. etc.

Closing Thoughts

As for me, I support the plight of women who find themselves discriminated against based on their gender, and I would like to see more women in my industry.  And I've put my money where my mouth is. 

But at the same time, you won't find me supporting "feminism".  I want to heal the rift, and work with awesome people -- and I happen to believe at least half of the awesome people in the world are of a different gender than I am.  Why would I want to alienate them?
I happen to believe that many well-meaning people of many causes damage their cause by basically forcing people to deal with their "diversity" first, instead of being able to deal with people as people on their own merit.  It's so much harder to appreciate a person on her own merits, when at least half of what she is saying is that she's unfairly treated because of gender, race, sexual preference, etc.  This is true for everyone.  Show me how you're excellent, and I promise to appreciate you for your awesomeness, and to treat you fairly and with the same respect I would for anyone of my own gender/race/sexual preference.

You are awesome because of your accomplishments/innovations/contributions, not because of your gender or race or sexual preference.

But, if you won't let me look past your race/gender/etc. identity, then please don't be offended if I don't see anything else.  If you want to be treated like a "person", then let me see the person instead of just some classification in an equal opportunity survey.

October 11, 2014

Jeff Savit: Availability Best Practices - Example configuring a T5-8

October 11, 2014 00:05 GMT
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly named Logical Domains).
This article continues the series on availability best practices. In this post we will show each step used to configure a T5-8 for availability with redundant network and disk I/O, using multiple service domains.

Overview of T5

The SPARC T5 servers are a powerful addition to the SPARC line. Details on the product can be seen at SPARC T5-8 Server, SPARC T5-8 Server Documentation, The SPARC T5 Servers have landed, and other locations.

For this discussion, the important things to know are:

The following graphic shows T5-8 server resources. This picture labels each chip as a CPU, and shows CPU0 through CPU7 on their respective Processor Modules (PM) and the associated buses. On-board devices are connected to buses on CPU0 and CPU7.

Initial configuration

This demo is done on a lab system with a limited I/O configuration, but enough to show availability practices. Real T5-8 systems would typically have much richer I/O. The system is delivered with a single control domain owning all CPU, I/O and memory resources. Let's view the resources bound to the control domain (the only domain at this time). Wow, that's a lot of CPUs and memory. Some output and whitespace snipped out for brevity.

primary# ldm list -l
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
primary          active     -n-c--  UART    1024  1047296M 0.0%  0.0%  2d 5h 11m

CORE
    CID    CPUSET
    0      (0, 1, 2, 3, 4, 5, 6, 7)
    1      (8, 9, 10, 11, 12, 13, 14, 15)
    2      (16, 17, 18, 19, 20, 21, 22, 23)
    3      (24, 25, 26, 27, 28, 29, 30, 31)
    ...
    124    (992, 993, 994, 995, 996, 997, 998, 999)
    125    (1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007)
    126    (1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015)
    127    (1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023)

VCPU
    VID    PID    CID    UTIL NORM STRAND
    0      0      0      4.7% 0.2%   100%
    1      1      0      1.3% 0.1%   100%
    2      2      0      0.2% 0.0%   100%
    3      3      0      0.1% 0.0%   100%
    ...
    1020   1020   127    0.0% 0.0%   100%
    1021   1021   127    0.0% 0.0%   100%
    1022   1022   127    0.0% 0.0%   100%
    1023   1023   127    0.0% 0.0%   100%

IO
    DEVICE                           PSEUDONYM        OPTIONS
    pci@300                          pci_0
    pci@340                          pci_1
    pci@380                          pci_2
    pci@3c0                          pci_3
    pci@400                          pci_4
    pci@440                          pci_5
    pci@480                          pci_6
    pci@4c0                          pci_7
    pci@500                          pci_8
    pci@540                          pci_9
    pci@580                          pci_10
    pci@5c0                          pci_11
    pci@600                          pci_12
    pci@640                          pci_13
    pci@680                          pci_14
    pci@6c0                          pci_15
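In the listing above, each core ID is followed by the eight consecutive vCPU IDs it contains: core 0 holds vCPUs 0 through 7, and core 127 holds 1016 through 1023. The 128 cores therefore account for the 1024 vCPUs owned by the control domain. Below is a minimal Python sketch of that arithmetic. It assumes eight strands per core, as this listing shows, and the helper names are hypothetical rather than part of ldm.

```python
# Minimal sketch: core/vCPU arithmetic for the T5-8 listing above.
# Assumption: 8 strands (vCPUs) per core, as shown by the listing.
THREADS_PER_CORE = 8


def vcpus_of_core(cid: int, threads: int = THREADS_PER_CORE) -> list[int]:
    """Return the vCPU IDs belonging to core `cid` (consecutive IDs)."""
    first = cid * threads
    return list(range(first, first + threads))


def core_of_vcpu(vid: int, threads: int = THREADS_PER_CORE) -> int:
    """Return the core ID that contains vCPU `vid`."""
    return vid // threads


def vcpus_for_cores(ncores: int, threads: int = THREADS_PER_CORE) -> int:
    """Number of vCPUs a domain gets when whole cores are assigned."""
    return ncores * threads


if __name__ == "__main__":
    print(vcpus_of_core(127))    # [1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023]
    print(core_of_vcpu(1020))    # 127
    print(vcpus_for_cores(128))  # 1024, the whole T5-8
    print(vcpus_for_cores(2))    # 16
```

The last line matches the 16 vCPUs the control domain reports after it is resized with `ldm set-core 2 primary`.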
Let's also look at the bus device names and pseudonyms:
primary# ldm list -l -o physio primary

    DEVICE                           PSEUDONYM        OPTIONS
    pci@300                          pci_0           
    pci@340                          pci_1           
    pci@380                          pci_2           
    pci@3c0                          pci_3           
    pci@400                          pci_4           
    pci@440                          pci_5           
    pci@480                          pci_6           
    pci@4c0                          pci_7           
    pci@500                          pci_8           
    pci@540                          pci_9           
    pci@580                          pci_10          
    pci@5c0                          pci_11          
    pci@600                          pci_12          
    pci@640                          pci_13          
    pci@680                          pci_14          
    pci@6c0                          pci_15
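If you script the bus split, it helps to have the DEVICE-to-PSEUDONYM mapping in a form a program can use. Below is a minimal Python sketch. It assumes the text of `ldm list -l -o physio primary` has been captured into a string (here, a trimmed copy of the rows above) and that bus rows are the ones whose first column starts with `pci@`. `parse_physio` is a hypothetical helper, not an ldm feature.

```python
# Minimal sketch: turn the physio listing into a lookup table.
# Assumption: bus rows start with "pci@"; header and blank lines are ignored.
def parse_physio(text: str) -> dict[str, str]:
    """Map root complex device names (pci@...) to ldm pseudonyms (pci_N)."""
    buses: dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("pci@"):
            buses[fields[0]] = fields[1]  # DEVICE -> PSEUDONYM
    return buses


SAMPLE = """\
    DEVICE                           PSEUDONYM        OPTIONS
    pci@300                          pci_0
    pci@340                          pci_1
    pci@6c0                          pci_15
"""

if __name__ == "__main__":
    table = parse_physio(SAMPLE)
    print(table["pci@300"])  # pci_0
    print(table["pci@6c0"])  # pci_15
```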

Basic domain configuration

The following commands are basic configuration steps to define virtual disk, console and network services and resize the control domain. They are shown for completeness but are not specifically about configuring for availability.

primary# ldm add-vds primary-vds0 primary
primary# ldm add-vcc port-range=5000-5100 primary-vcc0 primary
primary# ldm add-vswitch net-dev=net0 primary-vsw0 primary
primary# ldm set-core 2 primary
primary# svcadm enable vntsd
primary# ldm start-reconf primary
primary# ldm set-mem 16g primary
primary# shutdown -y -g0 -i6

This is standard control domain configuration. After the reboot, the control domain is resized, and we save the configuration to the service processor.

primary# ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
primary          active     -n-cv-  UART    16    16G      3.3%  2.5%  4m
primary# ldm add-spconfig initial

Determine which buses to reassign

This step follows the same procedure as in the previous article to determine which buses must be kept on the control domain and which can be assigned to an alternate service domain. The official documentation is at Assigning PCIe Buses in the Oracle VM Server for SPARC 3.0 Administration Guide.

First, identify the bus used for the root pool disk (in a production environment this would be mirrored) by getting the device name and then using the mpathadm command.

primary# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: none requested
        NAME                       STATE     READ WRITE CKSUM
        rpool                      ONLINE       0     0     0
          c0t5000CCA01605A11Cd0s0  ONLINE       0     0     0
errors: No known data errors
primary# mpathadm show lu /dev/rdsk/c0t5000CCA01605A11Cd0s0
Logical Unit:  /dev/rdsk/c0t5000CCA01605A11Cd0s2
                Initiator Port Name:  w508002000145d1b1

primary# mpathadm show initiator-port w508002000145d1b1
Initiator Port:  w508002000145d1b1
        Transport Type:  unknown
        OS Device File:  /devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@c/scsi@0/iport@1

That shows that the boot disk is on bus pci@300 (pci_0).
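Going from an OS device path to a bus is a mechanical step. The first `pci@` component under `/devices` is the PCIe root complex, and the physio listing maps it to a pseudonym. Below is a minimal Python sketch. It assumes the path format shown by `mpathadm show initiator-port`, and `root_complex` is a hypothetical helper. The dictionary holds only the two physio rows used in this article.

```python
# Minimal sketch: find the PCIe root complex of a /devices path.
# Assumption: the first "pci@" component is the root complex ldm reports.
def root_complex(device_path: str) -> str:
    """Return the first pci@... component of a device path."""
    for component in device_path.split("/"):
        if component.startswith("pci@"):
            return component
    raise ValueError(f"no pci@ component in {device_path!r}")


# Subset of the physio table (device name -> pseudonym).
PSEUDONYMS = {"pci@300": "pci_0", "pci@6c0": "pci_15"}

if __name__ == "__main__":
    boot_port = "/devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@c/scsi@0/iport@1"
    bus = root_complex(boot_port)
    print(bus, PSEUDONYMS[bus])  # pci@300 pci_0
```

The same function works on the relative `../devices/...` targets of `/dev` symbolic links, because it only looks for the first `pci@` component.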

Next, determine which bus is used for network. Interface net0 (based on ixgbe0) is our primary interface and hosts a virtual switch, so we need to keep its bus.

primary# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net1              Ethernet             unknown    0      unknown   ixgbe1
net2              Ethernet             unknown    0      unknown   ixgbe2
net0              Ethernet             up         100    full      ixgbe0
net3              Ethernet             unknown    0      unknown   ixgbe3
net4              Ethernet             up         10     full      usbecm2
primary# ls -l /dev/ix*
lrwxrwxrwx   1 root     root     31 Jun 21 12:04 /dev/ixgbe -> ../devices/pseudo/clone@0:ixgbe
lrwxrwxrwx   1 root     root     65 Jun 21 12:04 /dev/ixgbe0 -> ../devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0:ixgbe0
lrwxrwxrwx   1 root     root     67 Jun 21 12:04 /dev/ixgbe1 -> ../devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0,1:ixgbe1
lrwxrwxrwx   1 root     root     65 Jun 21 12:04 /dev/ixgbe2 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0:ixgbe2
lrwxrwxrwx   1 root     root     67 Jun 21 12:04 /dev/ixgbe3 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0,1:ixgbe3

Both disk and network are on bus pci@300 (pci_0), and there are network devices on pci@6c0 (pci_15) that we can give to an alternate service domain.
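The same reasoning applies to the network links. Each `/dev/ixgbeN` symbolic link points into a `/devices` path whose first `pci@` component names the bus. Below is a minimal Python sketch that groups the links by bus. It assumes the `ls -l /dev/ix*` output above has been captured as text, and `nics_by_bus` is a hypothetical helper.

```python
# Minimal sketch: group /dev/ixgbeN links by PCIe root complex.
# Assumption: input is `ls -l /dev/ix*` output with "link -> target" lines.
from collections import defaultdict


def nics_by_bus(ls_output: str) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for line in ls_output.splitlines():
        if " -> " not in line:
            continue
        link, target = line.split(" -> ", 1)
        name = link.split()[-1].rsplit("/", 1)[-1]  # "/dev/ixgbe0" -> "ixgbe0"
        # The first pci@ component of the link target is the root complex.
        bus = next((c for c in target.split("/") if c.startswith("pci@")), None)
        if bus is not None:  # skips the pseudo clone device
            groups[bus].append(name)
    return dict(groups)


LS_OUTPUT = """\
lrwxrwxrwx   1 root     root     31 Jun 21 12:04 /dev/ixgbe -> ../devices/pseudo/clone@0:ixgbe
lrwxrwxrwx   1 root     root     65 Jun 21 12:04 /dev/ixgbe0 -> ../devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0:ixgbe0
lrwxrwxrwx   1 root     root     67 Jun 21 12:04 /dev/ixgbe1 -> ../devices/pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0,1:ixgbe1
lrwxrwxrwx   1 root     root     65 Jun 21 12:04 /dev/ixgbe2 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0:ixgbe2
lrwxrwxrwx   1 root     root     67 Jun 21 12:04 /dev/ixgbe3 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0,1:ixgbe3
"""

if __name__ == "__main__":
    for bus, nics in nics_by_bus(LS_OUTPUT).items():
        print(bus, nics)
```

On this output, the sketch reports ixgbe0 and ixgbe1 on pci@300 and ixgbe2 and ixgbe3 on pci@6c0, which is the split described above.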

Let's determine which buses are needed to give that service domain access to disk. Previously we saw that the control domain's root pool was on c0t5000CCA01605A11Cd0s0 on pci@300. The control domain currently has access to all buses and devices, so we can use the format command to see what other disks are available. There is a second disk, and it's on bus pci@6c0:

primary# format
Searching for disks...done
       0. c0t5000CCA01605A11Cd0 <HITACHI-H109060SESUN600G-A244 cyl 64986 alt 2 hd 27 sec 668>
       1. c0t5000CCA016066100d0 <HITACHI-H109060SESUN600G-A244 cyl 64986 alt 2 hd 27 sec 668>
Specify disk (enter its number): ^C
primary# mpathadm show lu /dev/dsk/c0t5000CCA016066100d0s0
Logical Unit:  /dev/rdsk/c0t5000CCA016066100d0s2
                Initiator Port Name:  w508002000145d1b0
                Target Port Name:  w5000cca016066101
primary# mpathadm show initiator-port w508002000145d1b0
Initiator Port:  w508002000145d1b0
        Transport Type:  unknown
        OS Device File:  /devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0/iport@1

This provides the information needed to reassign buses.
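The decision here can be reduced to a simple rule. This is a simplification, not something ldm enforces. The control domain keeps every bus that hosts its boot disk or the network device behind its virtual switch. Any bus it does not need is a good candidate for the alternate service domain if that bus carries both a disk and a network device. Below is a minimal Python sketch of that rule. The data structures and function names are hypothetical, and the generated strings mirror the bus-related ldm commands used in this article.

```python
# Simplified sketch of the bus-selection reasoning (hypothetical model,
# not an ldm feature). Findings are recorded per bus pseudonym.
CONTROL_NEEDS = {
    "boot disk c0t5000CCA01605A11Cd0": "pci_0",
    "net0 (ixgbe0) behind primary-vsw0": "pci_0",
}
DEVICES_BY_BUS = {
    "pci_0": {"disk", "network"},   # rpool disk, ixgbe0/ixgbe1
    "pci_15": {"disk", "network"},  # c0t5000CCA016066100d0, ixgbe2/ixgbe3
}


def buses_for_alternate(control_needs, devices_by_bus):
    """Buses the control domain does not need that carry disk and network."""
    keep = set(control_needs.values())
    return sorted(
        bus for bus, kinds in devices_by_bus.items()
        if bus not in keep and {"disk", "network"} <= kinds
    )


def split_commands(buses, alternate="alternate"):
    """Bus removal needs a delayed reconfiguration plus reboot of primary."""
    before_reboot = ["ldm start-reconf primary"]
    before_reboot += [f"ldm rm-io {bus} primary" for bus in buses]
    after_reboot = [f"ldm add-io {bus} {alternate}" for bus in buses]
    return before_reboot, after_reboot


if __name__ == "__main__":
    chosen = buses_for_alternate(CONTROL_NEEDS, DEVICES_BY_BUS)
    print(chosen)  # ['pci_15']
    before, after = split_commands(chosen)
    print("\n".join(before + after))
```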

Define alternate service domain and reassign buses

We now define an alternate service domain, remove the above bus from the control domain, and assign it to the alternate. Buses cannot be removed dynamically (that is, removed from or added to a running domain), so this requires a delayed reconfiguration and a reboot of the control domain. If I had planned ahead and obtained bus information earlier, I could have done this when I resized the domain's memory and avoided the second reboot.

primary# ldm add-dom alternate
primary# ldm set-core 2 alternate
primary# ldm set-mem 16g alternate
primary# ldm start-reconf primary
primary# ldm rm-io pci_15 primary
primary# init 6

After rebooting the control domain, I give the unassigned bus pci_15 to the alternate domain. At this point I could install Solaris in the alternate domain using a network install server, but for convenience I use a virtual CD image in a .iso file on the control domain. Normally you do not use virtual I/O devices in the alternate service domain because that introduces a dependency on the control domain, but this is temporary and will be removed after Solaris is installed.

primary# ldm add-io pci_15 alternate
primary# ldm add-vdsdev /export/home/iso/sol-11-sparc.iso s11iso@primary-vds0
primary# ldm add-vdisk s11isodisk s11iso@primary-vds0 alternate
primary# ldm bind alternate
primary# ldm start alternate

At this point, I installed Solaris in the domain. When the install was complete, I removed the Solaris install CD image, and saved the configuration to the service processor:

primary# ldm rm-vdisk s11isodisk alternate
primary# ldm add-spconfig 20130621-split
Note that the network devices on pci@6c0 are enumerated starting at ixgbe0, even though they were ixgbe2 and ixgbe3 when the control domain owned all four installed interfaces.
alternate# ls -l /dev/ixgb*
lrwxrwxrwx   1 root     root     31 Jun 21 10:34 /dev/ixgbe -> ../devices/pseudo/clone@0:ixgbe
lrwxrwxrwx   1 root     root     65 Jun 21 10:34 /dev/ixgbe0 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0:ixgbe0
lrwxrwxrwx   1 root     root     67 Jun 21 10:34 /dev/ixgbe1 -> ../devices/pci@6c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0,1:ixgbe1

Define redundant services

We've split up the bus configuration and defined an I/O domain that can boot and run independently on its own PCIe bus. All that remains is to define redundant disk and network services to pair with the ones defined above in the control domain:

primary# ldm add-vds alternate-vds0 alternate
primary# ldm add-vsw net-dev=net0 alternate-vsw0 alternate

Note that we could increase resiliency, and potentially performance as well, by using a Solaris 11 link aggregation as the net-dev for each virtual switch. That would provide additional insulation: if a single network device fails, the aggregation can continue operating without requiring IPMP failover in the guest.

In this exercise we use a ZFS storage appliance as an NFS server to host guest disk images, so we mount it on both the control and alternate domain, and then create a directory and boot disk for a guest domain. The following two commands are executed in both the primary and alternate domains:

# mkdir /ldoms				 
# mount zfssa:/export/mylab /ldoms  
Those are the only configuration commands run in the alternate domain. All other commands in this exercise are only run from the control domain.

Define a guest domain

A guest domain will be defined with two network devices so it can use IP Multipathing (IPMP) and two virtual disks for a mirrored root pool, each with a path from both the control and alternate domains. This pattern can be repeated as needed for multiple guest domains, as shown in the following graphic with two guests.

primary# ldm add-dom ldg1
primary# ldm set-core 16 ldg1
primary# ldm set-mem 64g ldg1
primary# ldm add-vnet linkprop=phys-state ldg1net0 primary-vsw0 ldg1 
primary# ldm add-vnet linkprop=phys-state ldg1net1 alternate-vsw0 ldg1
primary# ldm add-vdisk s11isodisk s11iso@primary-vds0 ldg1
primary# mkdir /ldoms/ldg1
primary# mkfile -n 20g /ldoms/ldg1/disk0.img
primary# ldm add-vdsdev mpgroup=ldg1group /ldoms/ldg1/disk0.img ldg1disk0@primary-vds0
primary# ldm add-vdsdev mpgroup=ldg1group /ldoms/ldg1/disk0.img ldg1disk0@alternate-vds0
primary# ldm add-vdisk ldg1disk0 ldg1disk0@primary-vds0 ldg1
primary# mkfile -n 20g /ldoms/ldg1/disk1.img
primary# ldm add-vdsdev mpgroup=ldg1group1 /ldoms/ldg1/disk1.img ldg1disk1@primary-vds0
primary# ldm add-vdsdev mpgroup=ldg1group1 /ldoms/ldg1/disk1.img ldg1disk1@alternate-vds0
primary# ldm add-vdisk ldg1disk1 ldg1disk1@alternate-vds0 ldg1
primary# ldm bind ldg1
primary# ldm start ldg1

Note the use of linkprop=phys-state on the virtual network definitions: this indicates that changes in physical link state should be passed to the virtual device so it can perform a failover.

Also note mpgroup on the virtual disk definitions. The ldm add-vdsdev commands define a virtual disk exported by a service domain. Giving both exports the same mpgroup value indicates that they are the same disk, accessible by multiple paths (the administrator must ensure they really are different paths to the same disk). A different mpgroup name is used for each multipath disk: ldg1group for disk0 and ldg1group1 for disk1. For each actual disk there are two ldm add-vdsdev commands, and one ldm add-vdisk command that adds the multipath disk to the guest. Each disk can be accessed from either the control domain or the alternate domain, transparently to the guest. This is documented in the Oracle VM Server for SPARC 3.0 Administration Guide at Configuring Virtual Disk Multipathing.

At this point, Solaris is installed in the guest domain without any special procedures. It will have a mirrored ZFS root pool, and each disk is available from both service domains. It also has two network devices, one from each service domain. This provides resiliency for device failure, and in case either the control domain or alternate domain is rebooted.
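The resiliency claim can be checked with a simplified model: the guest keeps a resource as long as at least one service domain that provides a path to it is up. Below is a minimal Python sketch under that assumption. It abstracts away how the virtual network, IPMP and virtual disk multipathing failovers actually work, and `GUEST_PATHS` and `unavailable` are hypothetical names.

```python
# Simplified availability model (not how the vnet/vdisk drivers work).
# Each guest resource maps to the service domains that provide a path to it.
GUEST_PATHS: dict[str, set[str]] = {
    "network (ipmp0 over net0 + net1)": {"primary", "alternate"},
    "ldg1disk0 (mpgroup ldg1group)": {"primary", "alternate"},
    "ldg1disk1 (mpgroup ldg1group1)": {"primary", "alternate"},
}


def unavailable(paths: dict[str, set[str]], down: set[str]) -> list[str]:
    """Resources with no remaining path when the domains in `down` are offline."""
    return sorted(name for name, providers in paths.items() if not providers - down)


if __name__ == "__main__":
    for down in ({"primary"}, {"alternate"}, {"primary", "alternate"}):
        print(sorted(down), unavailable(GUEST_PATHS, down) or "no outage")
```

Rebooting either service domain alone leaves every resource reachable. A resource with a single path, such as a disk exported only from the control domain, would be lost whenever that domain goes down.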

Configuring and testing redundancy

Multipath disk I/O is transparent to the guest domain. This was tested by rebooting the control domain and the alternate domain one at a time, and observing that disk I/O continued without noticeable effect.

Network redundancy required configuring IP Multipathing (IPMP) in the guest domain. The guest has two network devices, net0 provided by the control domain, and net1 provided by the alternate domain. The process is documented at Configuring IPMP in a Logical Domains Environment.

The following commands are executed in the guest domain to make a redundant network connection:

ldg1# ipadm create-ipmp ipmp0
ldg1# ipadm add-ipmp -i net0 -i net1 ipmp0
ldg1# ipadm create-addr -T static -a ipmp0/v4addr1
ldg1# ipadm create-addr -T static -a ipmp0/v4addr2
ldg1# ipadm show-if
IFNAME     CLASS    STATE    ACTIVE OVER
lo0        loopback ok       yes    --
net0       ip       ok       yes    --
net1       ip       ok       yes    --
ipmp0      ipmp     ok       yes    net0 net1

This was tested by bouncing the alternate service domain and control domain (one at a time) and noting that network sessions remained intact. The guest domain console displayed messages when one link failed and was restored:

Jul  9 10:35:51 ldg1 in.mpathd[107]: The link has gone down on net1
Jul  9 10:35:51 ldg1 in.mpathd[107]: IP interface failure detected on net1 of group ipmp0
Jul  9 10:37:37 ldg1 in.mpathd[107]: The link has come up on net1
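These messages also show how long the link was down. Below is a minimal Python sketch that pairs each "gone down" message with the next "come up" message for the same interface. It assumes the classic syslog prefix shown here. Because syslog omits the year, the code uses a fixed placeholder year, so intervals that cross New Year would be wrong. `link_down_intervals` is a hypothetical helper.

```python
# Minimal sketch: measure link-down time from in.mpathd messages.
# Assumptions: "Mon DD HH:MM:SS host ..." prefix; placeholder year (2000,
# a leap year, so Feb 29 still parses) because syslog does not log it.
from datetime import datetime

LOG = """\
Jul  9 10:35:51 ldg1 in.mpathd[107]: The link has gone down on net1
Jul  9 10:35:51 ldg1 in.mpathd[107]: IP interface failure detected on net1 of group ipmp0
Jul  9 10:37:37 ldg1 in.mpathd[107]: The link has come up on net1
"""


def link_down_intervals(log: str, year: int = 2000) -> list[tuple[str, float]]:
    """Return (interface, seconds down) for each down/up message pair."""
    went_down: dict[str, datetime] = {}
    intervals: list[tuple[str, float]] = []
    for line in log.splitlines():
        fields = line.split()
        if len(fields) < 4 or "in.mpathd" not in line:
            continue  # skip blank lines and unrelated messages
        stamp = datetime.strptime(f"{year} {' '.join(fields[:3])}", "%Y %b %d %H:%M:%S")
        iface = fields[-1]
        if "The link has gone down on" in line:
            went_down[iface] = stamp
        elif "The link has come up on" in line and iface in went_down:
            intervals.append((iface, (stamp - went_down.pop(iface)).total_seconds()))
    return intervals


if __name__ == "__main__":
    print(link_down_intervals(LOG))  # [('net1', 106.0)]
```

Here net1 was down for 106 seconds (10:35:51 to 10:37:37) while the IPMP group kept the guest's network sessions alive.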

While one of the service domains was down, dladm and ipadm showed link status:

ldg1# ipadm show-if
IFNAME     CLASS    STATE    ACTIVE OVER
lo0        loopback ok       yes    --
net0       ip       ok       yes    --
net1       ip       failed   no     --
ipmp0      ipmp     ok       yes    net0 net1
ldg1# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         0      unknown   vnet0
net1              Ethernet             down       0      unknown   vnet1
ldg1# dladm show-link
LINK                CLASS     MTU    STATE    OVER
net0                phys      1500   up       --
net1                phys      1500   down     --
When the service domain finished rebooting, the "down" status returned to "up". There was no outage at any time.
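A monitoring script could flag the degraded state shown above before anyone notices it. Below is a minimal Python sketch that reports interfaces whose STATE column is not "ok". It assumes the `ipadm show-if` column layout (IFNAME, CLASS, STATE, ACTIVE, OVER), and `failed_interfaces` is a hypothetical helper.

```python
# Minimal sketch: flag interfaces that ipadm reports as not "ok".
# Assumption: columns are IFNAME CLASS STATE ACTIVE OVER.
SHOW_IF = """\
IFNAME     CLASS    STATE    ACTIVE OVER
lo0        loopback ok       yes    --
net0       ip       ok       yes    --
net1       ip       failed   no     --
ipmp0      ipmp     ok       yes    net0 net1
"""


def failed_interfaces(show_if: str) -> list[str]:
    failed = []
    for line in show_if.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "IFNAME":
            continue  # skip blank lines and the header
        if fields[2] != "ok":  # STATE column
            failed.append(fields[0])
    return failed


if __name__ == "__main__":
    print(failed_interfaces(SHOW_IF))  # ['net1']
```

Run against output captured while a service domain is down, the sketch reports net1. After the reboot, it would return an empty list. The IPMP group itself (ipmp0) stays "ok" throughout, which is why the guest saw no outage.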


This article showed how to configure a T5-8 with an alternate service domain and define services for redundant I/O access. This was tested by rebooting each service domain one at a time and observing that guest operation continued without interruption. This is a very powerful Oracle VM Server for SPARC capability for configuring highly available virtualized compute environments.

October 10, 2014

Darryl Gove: OpenWorld and JavaOne slides available for download

October 10, 2014 23:46 GMT

Thanks to everyone who attended my talks last week. My slides for OpenWorld and JavaOne are available for download:

October 09, 2014

Joerg Moellenkamp: Event announcement - Solaris Lounge: Why Oracle DB 12c runs best on Oracle Systems

October 09, 2014 15:48 GMT
Next week, on October 16th, 2014, an interesting event takes place in Vienna: "Solaris Lounge: Why Oracle DB 12c runs best on Oracle Systems". I will give two presentations there: "Why the Oracle Database runs best on SPARC and Solaris" and "LiveDemo: Solaris 11.2 features: Kernel Zones, Unified Archives, SDN, puppet".

Just to cite from the invitation:
This event follows up on the success of the TechDay Vienna event series, this time with emphasis on Oracle Platform advantages for the Oracle Database. We will focus on the practical implementations of the integration between the Database and the Systems layers, discussing the technical background, providing detailed examples as well as live demonstration of the mentioned technologies.

Learn through what methods the right systems and engineering methods can supercharge your environment, find out what unique Oracle Database 12c technologies are available while running Oracle on Oracle, consider virtualization management tools for your IaaS platform and hear customer case studies!
You can view the agenda and the link to register here.