January 25, 2012

Blog O' Matty: Free video tutorials for C, Java, PHP, HTML5, Python, MySQL and more …

January 25, 2012 12:36 GMT
I just came across the new boston video tutorial series. I’ve watched 20 of the PHP videos and am hooked. The production quality is great, and the content is really, really good! Once I finish the 200 PHP videos I plan to watch their MySQL and HTML5 videos. Can’t recommend these videos enough, and the [...]

January 24, 2012

Blog O' Matty: The importance of keeping your storage array firmware up to date

January 24, 2012 13:02 GMT
A couple of weeks back I attempted to migrate a pair of clustered Solaris 10 servers to a new disk storage array. After rebooting into single user mode to pick up the new devices, I went to add the new quorum disk with clquorum. This resulted in both nodes panicking with the following panic string: [...]

January 23, 2012

Constantin Gonzalez: I am a Mobile Sensor Network, Collecting Big Data

January 23, 2012 15:40 GMT
Running stats over running path

Don’t worry, this is not a desperate attempt at SEO for my blog (although I do appreciate your likes, Tweets, RSS subscriptions and other ways you help me reach a wider audience), nor is this my entry into the latest contest of IT BS Bingo.

It just occurred to me yesterday that Big Data is everywhere. Even during your weekend jogging run.

Collecting Fitness Data, Step by Step, Heartbeat by Heartbeat, on Your Phone

For Christmas, I bought myself a Wahoo Fitness Key* and its matching ANT+ heart rate monitor (HRM)*. The key plugs into your iPhone and provides connectivity to the ANT+ wireless sensor protocol. The HRM is another dongle that straps around your chest and electrically registers every heartbeat, then transmits the data to the Wahoo key. If you have an iPhone 4S, you can do without the key and just buy a Bluetooth HRM like the Wahoo BlueHR, because the iPhone 4S supports Bluetooth 4.0, which includes a low-power version of the protocol that supports sensor collection devices such as HRMs that run off a coin cell.

So iPhone + Wahoo + HRM = Wireless Sensor Network. And if your idea of a network involves more than two participants, Wahoo also sells an ANT+ pedometer* to measure your stepping frequency along with your heartbeat data.

(Android users: I'm sure you'll find a similar solution for yourselves as well. I just happen to prefer quality over popularity.)

Running 2.0

Thanks to modern gadgetry, apps like iSmoothRun on my phone can now tell me how I'm doing while I’m running, including time, distance (thanks to GPS, which is another sensor), pace, cadence (using the phone’s accelerometer or a wireless pedometer*) and heart rate. I can also set up a target running profile (like “No more than 70% of max. heart rate so I can stay in the aerobic zone, please.”) and my phone will duck the music and tell me to slow down whenever I go beyond target heart rate.

Pretty cool.

Social Network Running

But we live in the age of web 2.0 so there's obviously more to do if you want to maintain your running geek-cred: The iPhone also collects all data (position, heart-beats, and steps) over time and at the end of the run, it will not only present me with my running statistics, possibly spiced up with current weather data etc., it will also offer to upload the data to one of the emerging fitness social networks, such as RunKeeper.com.

Sites like RunKeeper take the data and create web maps with my running path, complete with nice graphs that I can dive into for analyzing my own running behavior, including altitude, pace, heart rate, cadence etc. They also collect other data such as weight and body fat percentage (yes, using a Withings Scale* for example, you can track weight/body fat data too, and even data from a sleep tracking system* can be collected!) and show you your running (or fat loss) progress over time.

And thanks to social network goodness, you can run with friends over the network and compare statistics even if you’re not physically running at the same time. Or the same place.

And this is where Big Data comes into play, but what is it and how does it work?

The Advent of Big Data

The first time I heard about big data was during an internal workshop about the Sun Cloud in 2009 (you know, the old Sun habit of being way ahead of our time). While we contemplated the implications of cloud computing for enterprises, someone mentioned that this would be nothing compared to the implications of Big Data. Back then, Big Data was reserved for web giants like Google and Yahoo! and the occasional large research institute such as CERN.

Big Data is the art of handling (surprise!) large amounts of data. “Large” can be anywhere from a dozen terabytes to a couple of petabytes, or any amount that no one in their right mind would place into a single database on a single server.

Big Data has been made popular by innovations from web companies like Google, Yahoo, Facebook or Twitter, who pioneered new ways of handling huge amounts of data.

Today, Big Data is about to cross the chasm* from the domain of a few innovators and early adopters to the early majority, as businesses start to realize its value.

The Four V’s of Big Data

Big Data is typically associated with four V’s:

RunKeeper and Big Data

Let’s come back to our running example: RunKeeper is a Big Data company because it collects GPS, heart rate, cadence and other data from its millions of users. Assuming that only half of their 6 million users actually use the service for real, and that they run once a week and assuming a data size of 50 KByte per run (including GPS positions), we get 7.8 TByte of data per year. This is not a lot by Big Data standards, and it is quite structured, but when you combine this data with Tweets, Facebook status updates, other exercise data and nutrition/sleep data (RunKeeper does all of the above), then data volume easily increases to more than 10 TB per year, which is quite a lot to wade through.

And if you start counting records, the complexity is overwhelming: Each GPS sample is about 100 Bytes, which means that RunKeeper’s 10TB per year translates into roughly 100 Billion records to correlate, analyze and create meaning from.
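
The two estimates above can be checked with a few lines of Python. This is only a back-of-envelope sketch using the figures quoted in the text; the function names are made up for illustration, and decimal units (1 KByte = 1,000 bytes, 1 TByte = 10^12 bytes) are an assumption.

```python
# Back-of-envelope check of the estimates quoted in the text.
# Assumption: decimal units, i.e. 1 KByte = 1,000 bytes and 1 TByte = 10**12 bytes.

KB = 1_000
TB = 10**12


def yearly_volume_bytes(active_users, runs_per_year, bytes_per_run):
    """Total bytes collected per year for a given number of active users."""
    return active_users * runs_per_year * bytes_per_run


def record_count(total_bytes, bytes_per_record):
    """Number of fixed-size records contained in total_bytes."""
    return total_bytes // bytes_per_record


active_users = 6_000_000 // 2   # only half of the 6 million users are active
runs_per_year = 52              # one run per week
volume = yearly_volume_bytes(active_users, runs_per_year, 50 * KB)
records = record_count(10 * TB, 100)   # 10 TB of samples at about 100 bytes each

print(f"{volume / TB:.1f} TByte per year")   # 7.8 TByte per year
print(f"{records:,} records")                # 100,000,000,000 records
```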

What meaning?

The Meaning of Big Data

And that is the goal of Big Data: To create meaning out of billions of records that seem so innocent, if looked at individually. In the RunKeeper example, they create graphs of your running history and help you analyze and optimize your fitness either for free or as a paid, “pro” service. And thanks to their Health Graph API, an eco-system of other applications and companies emerges who slice and dice RunKeeper’s data in other creative ways, trying to create valuable (and monetizable) meaning out of it. Example: World-Rank.in collects data from RunKeeper and Twitter, then ranks runners into its own top 30 lists.

Other companies use Big Data to identify patterns in their customers’ behavior, find threats or opportunities to act upon, or simply alert hospitals that a new flu epidemic is about to hit them.

How Big Data Works

Most Big Data use cases work around the same pattern:

Oracle and Big Data

Don’t worry, this commercial break will be brief, but interesting:

Oracle’s big strengths of course are in handling commercial data warehouses and analyzing business information data, as well as building Engineered Systems that remove the pain of setting up an IT shop while optimizing the usefulness you get from your systems.

Big Data’s strength lies in its innovation to handle and organize unstructured, large data sets, through the Hadoop filesystem, the MapReduce framework, the R statistical language and other emerging technologies. But analyzing data after these steps is still in its infancy.

By combining the worlds of Big Data, Data Warehousing and Business Intelligence, running on Engineered Systems, Oracle can offer unique value to businesses who want to leverage Big Data for their benefit, without going through the trial/error/research of running their own Big Data development operations.

Learn more from Oracle’s Big Data White Paper, it’s really good, and check out Oracle's Big Data home page.

Building your own Sensor Driven Big Data Collection Network

As you can see, Big Data is fun and healthy. Here are some gadgets* to get your own Sensor Network based Big Data collection infrastructure set up that feeds into RunKeeper and other Big Data collecting social networks for your analytical pleasure:

Big Data and You

What are your favorite Big Data examples? Do you see Big Data being used in your company? Have you played with collecting, organizing and analyzing Big Data yourself? Leave a comment and share!

Finally, here's a video that shows the beauty of collecting, organizing, analyzing and visualizing of Big Data:

And if you want to see my own small chunks of running data, feel free to join my Street Team on RunKeeper.

Disclaimer: Neither I nor Oracle is affiliated with RunKeeper (not that I know of). I just think it’s a cool service.

*Disclosure: Some product links in this article are affiliate links. If you buy through them, I’ll get a small kickback to help with hosting costs for this blog at no extra charge to you.

Joerg Moellenkamp: Oracle Solaris 11 Tech Days 2012

January 23, 2012 14:18 GMT
In February a series of events on Solaris 11 is taking place all over Germany. The events promise to be technically very interesting, since each of the speakers knows the subject in great depth. Detlef Drewanz, who will be at all of the events, probably needs no further introduction since the container guide ("Containerleitfaden"), and the same goes for Uli Gräf (who speaks at some, but not all, locations). Christian Ritzka and Elke Freymann are proven experts on Ops Center. And yes, I'm giving a talk as well, on data management in Solaris 11. And since I've already seen the question twice: the event is free of charge :-)

08:30 - 09:00 Registration
09:00 - 09:15 Welcome

09:15 - 10:00 What's New in Oracle Solaris 11
Many features that flowed into Solaris 11 Express in the course of Solaris 10 development are also found in Solaris 11. This presentation gives an overview of the newest features.

10:00 - 11:00 Oracle Solaris 11 Installation
Probably the most outstanding characteristics of Oracle Solaris 11 are the new packaging system IPS and the Automated Installer, which simplify the installation and management of Oracle Solaris 11. Get to know the new techniques and see how easy patching is under Oracle Solaris 11.

11:00 - 11:30 Break

11:30 - 12:30 Oracle Virtualization
Oracle Solaris 11 integrates extensive virtualization technologies. Learn all about the new network virtualization in Oracle Solaris 11 and how complete multi-tier hardware infrastructures can be realized in a single machine together with the Oracle Virtual Machine framework and Solaris Zones.

12:30 - 13:30 Lunch

13:30 - 14:15 Management of IT Infrastructures
Virtualization does not just mean "hypervisor". In this talk we show how virtualized Oracle Solaris 11 environments can be managed centrally.

14:15 - 14:45 The Solaris Training Program
Oracle University, together with our training partners, provides a comprehensive program for deepening Solaris knowledge. This talk covers the training paths, courses and certifications for Solaris 11 and presents the available learning formats.

14:45 - 15:15 Break

15:15 - 15:45 Oracle Solaris 11 Data Management
Oracle Solaris 11 has comprehensive data management functions built in. Get to know the newest ZFS features such as data encryption and deduplication, and learn how you can make these functions available to other platforms through the in-kernel CIFS integration.

15:45 - 16:15 Panel, Q&A

16:15 - 16:45 Refreshments, time for discussion with the experts

You will find the exact agenda with the speakers at each location, and a way to register, on the event pages:
Please come in large numbers! :-)

Blog O' Matty: How to figure out if a process has been chroot()’ed

January 23, 2012 13:07 GMT
A number of applications (e.g., custom chroot jails, openssh, vsftp, apache) support the ability to chroot themselves. To find out if a process called chroot() at startup, you can check the /proc/<pid>/root entry for the process. For non-chrooted processes this entry will point to /: $ ps auxwww | grep [s]endmail root 3643 0.0 0.1 [...]
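
The excerpt is cut off before the full example, so here is a minimal sketch of the same check in Python. It assumes a Linux-style /proc where /proc/<pid>/root is a symbolic link; the function name and the proc_dir parameter are made up for illustration (the parameter exists only so the function can be pointed at a test directory).

```python
import os


def chroot_dir(pid, proc_dir="/proc"):
    """Return the directory a process is chrooted to, or None if its root is /.

    Assumes a Linux-style /proc in which <proc_dir>/<pid>/root is a symlink
    to the root directory of the process.
    """
    root = os.readlink(os.path.join(proc_dir, str(pid), "root"))
    return None if root == "/" else root


if __name__ == "__main__":
    try:
        jail = chroot_dir(os.getpid())
        print("not chrooted" if jail is None else f"chrooted to {jail}")
    except OSError as err:
        # No /proc on this system, or not enough privilege to read the link.
        print(f"could not read root link: {err}")
```

Reading the link for a process owned by another user normally requires root privileges, so expect a permission error for those when running as an ordinary user.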

January 22, 2012

Blog O' Matty: Learn Python video series from Google

January 22, 2012 14:03 GMT
I’ve been trying to expand my Python knowledge and recently came across Nick Parlante’s 6-part learn Python series on Youtube. I’ve watched several of the videos, and I am impressed with Nick’s teaching ability. Here are links to the 6-part series: Day 1 part 1: Introduction and Strings Day 1 part 2: Lists, Sorting and [...]

January 21, 2012

Blog O' Matty: A couple of gotchas with the OpenSSH chroot() implementation

January 21, 2012 14:09 GMT
I previously discussed the OpenSSH Match directive, and how it can be used to chroot SSH and SFTP users. Over the past couple of months I’ve encountered some gotchas with the chroot implementation in OpenSSH. Since I had to figure these items out myself, I figured I would share my findings here so folks wouldn’t [...]

January 20, 2012

Adam Leventhal: ZFS+10: illumos meetup

January 20, 2012 21:39 GMT

ZFS recently celebrated its informal 10th anniversary; to mark the occasion, Delphix hosted a ZFS-themed meetup for the illumos community (sponsored generously by Joyent). Many thanks to Deirdre Straughan, the new illumos community manager, for helping to organize and for filming the event. Three of my colleagues at Delphix presented work they’ve been doing in the ZFS ecosystem.

Matt Ahrens, who (with Jeff Bonwick) invented ZFS back in 2001, started the program with a discussion of a new stable interface for ZFS. Initially libzfs had been designed as a set of helper functions in support of the zfs(1M) and zpool(1M) commands; since then, it has outgrown those humble ambitions and a new, simple, stable interface is needed for programmatic consumers of ZFS. In Matt’s talk and blog post, he lays out a series of guiding principles for the new libzfs_core library; he’s already started to implement these ideas for new ZFS features in development at Delphix.

John Kennedy has been working on a relatively neglected part of illumos: automated testing. At the meetup John spoke about the work he’s been doing to revitalize the ZFS test suite, and to build a unit testing framework for illumos at large. I found the questions and enthusiasm from the people in the room particularly encouraging — everyone knows that we need to be doing more testing, but until John stepped up, no one was leading the charge. The ZFS test suite is available on github. Take a look at John’s blog post to see how to execute the ZFS test suite, and how you can contribute to illumos by helping him diagnose and fix the 60+ outstanding failures.

Chris Siden has been at Delphix only since he graduated from Brown University this past spring, but he’s already made a tremendous impact on ZFS. Chris presented the work he’s done to finish what Basil Crow (also of Brown, and soon full-time at Delphix) started on ZFS feature flags (originally presented to the ZFS community by Matt back in May). Previously, ZFS features followed a single, linear versioning; with Chris and Basil’s work it’s no longer a land-grab for the next version number; rather, each feature can be enabled discretely. Chris also implemented the world’s first flagged ZFS feature, Async Destroy (also known to ZFS feature flags as com.delphix:async_destroy), which allows datasets to be destroyed asynchronously in the background, a huge boon when destroying gigantic ZFS datasets. Chris also presented some work he’s been doing on backward compatibility testing for ZFS; check out his blog post on both subjects.
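
To make the difference between linear versioning and feature flags concrete, here is a deliberately simplified toy model in Python. It is not ZFS code and does not reflect the real on-disk format or import rules; apart from com.delphix:async_destroy, every feature name below is hypothetical.

```python
# Toy model only: contrasts a linear version check with per-feature flags.
# Every feature name except com.delphix:async_destroy is hypothetical.


def can_open_linear(pool_version, supported_version):
    """Linear versioning: the software must support the pool's version number,
    which implies supporting every feature introduced before it."""
    return pool_version <= supported_version


def can_open_flags(pool_features, supported_features):
    """Feature flags: the software only needs to know the features
    that are actually in use on the pool."""
    return set(pool_features) <= set(supported_features)


pool = {"com.delphix:async_destroy"}
vendor_a = {"com.delphix:async_destroy", "org.example:feature_x"}
vendor_b = {"org.example:feature_y"}

print(can_open_flags(pool, vendor_a))   # True
print(can_open_flags(pool, vendor_b))   # False
```

With named flags, two implementations can each add features without fighting over who gets the next version number, which is the land-grab mentioned above.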

The illumos meetup was a great success. Thank you to everyone who attended in person or on the web. To get involved with the ZFS community, join the illumos ZFS mailing list, and for information on the next illumos meetup, join the group.

Jeff Savit: Solaris 10 branded zone VM Templates for Solaris 11 on OTN

January 20, 2012 21:34 GMT

Early this year I wrote the article Ours Goes To 11, which describes the ability to import Solaris 10 systems into a "Solaris 10 branded zone" under Oracle Solaris 11. I did this using Solaris 11 Express, and the capability remains in Solaris 11 with only slight changes. This important tool lets you painlessly inhale a Solaris Container from Solaris 10, or an entire Solaris 10 system ("the global zone"), into a virtualized environment on a Solaris 11 OS.

Just recently, Oracle provided Oracle VM Templates for Oracle Solaris 10 Zones to let you create Solaris 10 branded zones for Solaris 11 even if you don't currently have access to install media or a running Solaris 10 system. To use this, just download the Oracle VM Template for Oracle Solaris 10 Zones from OTN at http://www.oracle.com/technetwork/server-storage/solaris11/downloads/virtual-machines-1355605.html. This page contains images of Oracle Solaris 10 8/11 (the recent update to Solaris 10) in SPARC and x86 formats suitable for creating branded zones. The same page also has a VirtualBox image you can download for a complete Solaris 10 install in a guest virtual machine you can run on any host OS that supports VirtualBox. Both sets of downloads provide a quick - and extremely easy - way to set up a virtual Solaris 10 environment. In the case of the Oracle VM Templates, they illustrate several advanced features of Solaris 11.

To start, just go to the above link, download the template for the hardware platform (SPARC or x86) you want, and download the README file also linked from that page.

Install prerequisites

The README file tells you to install the prerequisite Solaris 11 package that implements the Solaris 10 brand. Then you can install instances of zones with that brand.

# pkg install pkg:/system/zones/brand/brand-solaris10
           Packages to install:   1
       Create boot environment:  No
Create backup boot environment: Yes

DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                  1/1       44/44      0.4/0.4

PHASE                                        ACTIONS
Install Phase                                  74/74 

PHASE                                          ITEMS
Package State Update Phase                       1/1 
Image State Update Phase                         2/2 

That took only a few minutes, and didn't require a reboot.

Install the Solaris 10 zone

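Now it's time to run the downloaded template file. First make it executable via the chmod command, of course. I found that (contrary to what the README states) there was no need to rename the downloaded file to remove the .bin. When you run it you provide several parameters to describe the zone configuration. In the run below, -p names the directory that will hold the zone path, -a the zone's IP address, -i the network interface, and -z the zone name.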

Kicking it off, you will see a copyright message, and then messages showing progress building the zone, which only takes a few minutes.

# ./solaris-10u10-x86.bin -p /zones -a 192.168.1.100 -i rge0 -z s10u10

...
...

Checking disk-space for extraction
  Ok

Extracting in /export/home/CDimages/s10zone/bootimage.ihaqvh ...
100% [===============================>]

Checking data integrity
  Ok

Checking platform compatibility
      The host  and  the image  do not have  the same Solaris release:
        host  Solaris release:   5.11
        image Solaris release:   5.10

      Will create a Solaris 10 branded zone.

  Warning: could not find a defaultrouter
  Zone won't have any defaultrouter configured


IMAGE:      ./solaris-10u10-x86.bin
ZONE:       s10u10
ZONEPATH:   /zones/s10u10
INTERFACE:  rge0
VNIC:       vnicZBI13379
MAC ADDR:   2:8:20:5c:1a:cc
IP ADDR:    192.168.1.100
NETMASK:    255.255.255.0
DEFROUTER:  NONE
TIMEZONE:   US/Arizona

Checking disk-space for installation
  Ok

Installing in /zones/s10u10 ...
100% [===============================>]

Using a static exclusive-IP

Attaching s10u10

Booting s10u10

  Waiting for boot to complete
  booting...
  booting...
  booting...

Zone s10u10 booted

The zone's root password  has been set using the
root password of the local host.
You  can  change  the  zone's  root password  to
further harden  the security of the zone:  being
root,  log  into the zone  from  the  local host
with  the command 'zlogin s10u10'.
Once logged in, change the root password with the
command 'passwd'.

The nifty part in my opinion (besides being so easy), is that the zone was created as an exclusive-IP zone on a virtual NIC. This network configuration lets you enforce traffic isolation from other zones, enforce network Quality of Service, and even let the zone set its own characteristics like IP address and packet size.

Independence of the zone's network characteristics from the global zone is one of the enhancements in Solaris 11 that make it easier to consolidate zones while preserving their autonomy, yet still provide control in a consolidated environment.

Let's see what the virtual network environment looks like by issuing commands from the Solaris 11 global zone. First I'll use Old School ifconfig, and then I'll use the new ipadm and dladm commands.

# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
	inet 127.0.0.1 netmask ff000000 
rge0: flags=1004943<UP,BROADCAST,RUNNING,PROMISC,MULTICAST,DHCP,IPv4> mtu 1500 index 2
	inet 192.168.1.3 netmask ffffff00 broadcast 192.168.1.255
	ether 0:14:d1:18:ac:bc 
vboxnet0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
	inet 192.168.56.1 netmask ffffff00 broadcast 192.168.56.255
	ether 8:0:27:f8:62:1c 
# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
yge0              Ethernet             unknown    0      unknown   yge0
yge1              Ethernet             unknown    0      unknown   yge1
rge0              Ethernet             up         1000   full      rge0
vboxnet0          Ethernet             up         1000   full      vboxnet0
# dladm show-link
LINK                CLASS     MTU    STATE    OVER
yge0                phys      1500   unknown  --
yge1                phys      1500   unknown  --
rge0                phys      1500   up       --
vboxnet0            phys      1500   up       --
vnicZBI13379        vnic      1500   up       rge0
s10u10/vnicZBI13379 vnic      1500   up       rge0
s10u10/net0         vnic      1500   up       rge0
# dladm show-vnic
LINK                OVER         SPEED  MACADDRESS        MACADDRTYPE       VID
vnicZBI13379        rge0         1000   2:8:20:5c:1a:cc   random            0
s10u10/vnicZBI13379 rge0         1000   2:8:20:5c:1a:cc   random            0
s10u10/net0         rge0         1000   2:8:20:9d:d0:79   random            0
# ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
rge0/_a           dhcp     ok           192.168.1.3/24
vboxnet0/_a       static   ok           192.168.56.1/24
lo0/v6            static   ok           ::1/128
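
If you want to pick the VNICs out of output like this in a script, a small amount of parsing is enough. The following Python sketch works on the dladm show-link output captured above; the parse_table helper is made up for illustration and assumes whitespace-separated columns with no empty fields.

```python
# Output of "dladm show-link" as captured above.
SHOW_LINK = """\
LINK                CLASS     MTU    STATE    OVER
yge0                phys      1500   unknown  --
yge1                phys      1500   unknown  --
rge0                phys      1500   up       --
vboxnet0            phys      1500   up       --
vnicZBI13379        vnic      1500   up       rge0
s10u10/vnicZBI13379 vnic      1500   up       rge0
s10u10/net0         vnic      1500   up       rge0
"""


def parse_table(text):
    """Turn a whitespace-separated table with a header row into a list of dicts.

    Assumes no column is ever empty, which holds for the output above.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Names of all virtual NICs layered over the physical link rge0.
vnics = [row["LINK"] for row in parse_table(SHOW_LINK)
         if row["CLASS"] == "vnic" and row["OVER"] == "rge0"]
print(vnics)   # ['vnicZBI13379', 's10u10/vnicZBI13379', 's10u10/net0']
```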

Log into the zone

The install step already booted the zone, so let's log into it. Notice how you have to be appropriately privileged to log into a zone. This is my home system so I'm being a bit cavalier, but in a production environment you can give granular control over who can log in to which zones. Voila! A Solaris 10 environment under a Solaris 11 kernel. Notice the output from the uname -a and ifconfig commands, and the output from a ping to a nearby host.

$ zlogin s10u10
zlogin: You lack sufficient privilege to run this command (all privs required)
savit@home:~$ sudo zlogin s10u10
Password: 
[Connected to zone 's10u10' pts/5]
Oracle Corporation	SunOS 5.10	Generic Patch	January 2005
# uname -a
SunOS s10u10 5.10 Generic_Virtual i86pc i386 i86pc
# ifconfig -a4
lo0: flags=2001000849 mtu 8232 index 1
	inet 127.0.0.1 netmask ff000000 
vnicZBI13379: flags=1000843 mtu 1500 index 2
	inet 192.168.1.100 netmask ffffff00 broadcast 192.168.1.255
	ether 2:8:20:5c:1a:cc 
# bash
bash-3.2# ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
	inet 127.0.0.1 netmask ff000000 
vnicZBI13379: flags=1000843 mtu 1500 index 2
	inet 192.168.1.100 netmask ffffff00 broadcast 192.168.1.255
	ether 2:8:20:5c:1a:cc 
bash-3.2# ping 192.168.1.2
192.168.1.2 is alive

For fun, I configured Apache (setting its configuration file in /etc/apache2) and brought it up. Easy - took just a few minutes.

bash-3.2# svcs  apache2
STATE          STIME    FMRI
disabled       12:38:46 svc:/network/http:apache2
bash-3.2# svcadm enable apache2

Summary

In just a few minutes, I built a functioning virtual Solaris 10 environment under my Solaris 11 system. It was... easy! While I can still do it the manual way (creating and using a system archive), this is a low-effort way to create a Solaris 10 zone on Solaris 11.

Jeff Savit: New wiki article: Exploring Oracle Solaris 11 Express

January 20, 2012 21:33 GMT

Recommended: Exploring Oracle Solaris 11 Express

There's a very useful new wiki article at http://wikis.sun.com/display/solaris/Exploring+the+World%27s+First+Fully+Virtualized+Operating+System titled Exploring the World's First Fully Virtualized Operating System.

This covers material similar to what I discussed in http://blogs.oracle.com/jsavit/entry/flow_control_in_solaris_11 "Flow control in Solaris 11 Express Network virtualization", but goes further. Instead of just adding a flow to an existing physical network interface as I did, the wiki illustrates creating virtual network interfaces with the dladm create-vnic and ipadm commands. In its second example, the wiki shows how to create a zone using the virtual nic.

No need to trade off shared vs. exclusive

That brings up an important new capability of Oracle Solaris 11. In Solaris 10, a zone (aka Solaris Container) could have a shared network interface or an exclusive IP. The shared model works well for most use-cases, typically many virtual environments on the same host and same network, with individual IP addresses and efficient off-box and inter-zone networking. But, that didn't allow zones to do things like assign their own IP address, or individually set network characteristics like turning on jumbo frames.

Exclusive IP was invented for cases where some zones had to have control over their own network interfaces (even issuing ndd commands if they want), and where some zones had to exist on separate networks from other zones, especially for hosts residing on a DMZ or the Internet along with a company's internal network. However, exclusive IP required, well - exclusive access to a physical network device, restricting how many exclusive IP zones could be hosted on a server. Now, you can create an arbitrary number of virtual interfaces.

Recommended reading - several tasty recipes in one serving

In addition to the above features, the wiki article illustrates several other tasty items: the exclusive-IP zone is created using ZFS compression to save disk space, and sudo is used for commands that (traditionally, or by habit) would have implied becoming root. Switching to an all-powerful root userid is so last-century. Userids are created within the zone (names that will be familiar to viewers of a recent pair of movies about a high-tech superhero). Software is added to the zone (Solaris 11 zones start with a minimized install), the Apache web server is set up, and then the whole thing is cloned to make a new zone. Great stuff, and a good illustration of ways that Oracle Solaris 11 Express provides new, flexible, and more secure administration. For a further illustration, see Jeff Victor's blog at http://blogs.oracle.com/JeffV/entry/virtual_network_part_3

Don't Fear The Reaper

My opportunity for a little joke: Sun blogs were on blogs.sun.com, sometimes referred to by us bloggers as "b.s.c". Now that we're on blogs.oracle.com (this is my first post in the new name), I expect to see references to "b.o.c". Which makes me think of Blue Öyster Cult. Naturally!


The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

(that goes for musical taste, too...)

Blog O' Matty: How to encrypt an SSH private key

January 20, 2012 20:55 GMT
If you are using SSH key-based authentication you should be encrypting your private key. This ensures that if someone breaks into your server and steals your keys, they won’t be able to utilize them to access other systems. If your private key isn’t encrypted you can use the ssh-keygen utility’s “-p” option to do so: [...]
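
The excerpt is cut off before the command itself, so the following is not necessarily the author's exact invocation. As a sketch, the -p option can be driven from Python as shown below; -p (change passphrase) and -f (key file) are real ssh-keygen options, while the function names and the key path are made up for illustration.

```python
import subprocess


def passphrase_change_command(key_path):
    """Build the ssh-keygen command line that changes a key's passphrase.

    -p asks ssh-keygen to change the passphrase of an existing private key,
    -f names the key file to operate on.
    """
    return ["ssh-keygen", "-p", "-f", key_path]


def encrypt_private_key(key_path):
    """Run ssh-keygen interactively; it prompts for the new passphrase itself."""
    subprocess.run(passphrase_change_command(key_path), check=True)
```

Calling encrypt_private_key with the path to your own key (for example a hypothetical /home/user/.ssh/id_rsa) rewrites the key file in place, protected by the passphrase you enter at the prompt.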

Jeff Savit: Dedicated CPUs in zones - a small RM exercise

January 20, 2012 20:54 GMT

A small RM exercise

Today's blog is about an exercise with resource management using Logical Domains and Solaris Containers. Nothing earth-shattering, or even novel, but an illustration of how these technologies interact, and how resource management looks when dedicated CPUs are used with Containers.

The problem statement

I needed to demonstrate the interaction of Solaris Containers and dedicated CPUs for a customer. They wanted zones to be set up with dedicated CPUs so they could see what visibility zones had to CPU resources.

Lab environment

Fortunately I have access to a small T1000 server and logical domains, and set up a domain with multiple CPUs. (In these examples, "primary" in screen-scraped text indicates that the terminal session is in the Control Domain, and "global" indicates that the terminal session is in the guest domain's global zone.)

primary # ldm set-mem 2g ldom1
primary # ldm set-vcpu 8 ldom1
primary # ldm bind ldom1
primary # ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  SP      4     3G       0.5%  2d 11h 12m
ldom1            bound      ------  5000    8     2G             
ldom2            inactive   ------          4     1G             
ldom3            inactive   ------          4     1G             
primary # ldm start ldom1
LDom ldom1 started

Boot up the lab system

After firing up the domain, I connected to its console (as shown above, its virtual console is connected to port 5000), logged into it and displayed some virtual configuration data. Note that I have this domain set up to require manual boot. That's useful in a lab or training scenario, but normally you would let the domain boot up Solaris on the ldm start. In that case I could have just waited a few seconds for boot to complete (booting a logical domain is very fast since physical devices don't have to be probed) and simply used the ssh command to connect directly to the domain. Here, I get the OpenBoot "ok" prompt and then boot Solaris.

As expected, the domain sees the 8 virtual CPUs defined to it (look for the psrinfo output below). It also has 2 (virtual) NICs bound to different virtual switches connected to different physical networks. The network configuration isn't germane to today's exercise, but it's worth mentioning because it illustrates how you can pass separate physical network connections to nested virtual environments.

primary $ telnet localhost 5000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Connecting to console "ldom1" in group "ldom1" ....
Press ~? for control options ..

Sun Fire(TM) T1000, No Keyboard
Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.30.3, 2048 MB memory available, Serial #83492552.
Ethernet address 0:14:4f:f9:fe:c8, Host ID: 84f9fec8.

{0} ok boot
Boot device: /virtual-devices@100/channel-devices@200/disk@0:a  File and args: 
SunOS Release 5.10 Version Generic_139555-08 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: t1ldom1
Reading ZFS config: done.
Mounting ZFS filesystems: (8/8)

t1ldom1 console login: root
Password: 
Last login: Tue Sep 15 17:31:05 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
global # psrinfo
0       on-line   since 10/01/2009 19:23:09
1       on-line   since 10/01/2009 19:23:11
2       on-line   since 10/01/2009 19:23:11
3       on-line   since 10/01/2009 19:23:11
4       on-line   since 10/01/2009 19:23:11
5       on-line   since 10/01/2009 19:23:11
6       on-line   since 10/01/2009 19:23:11
7       on-line   since 10/01/2009 19:23:11
global # ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
vnet0: flags=201000843 mtu 1500 index 2
        inet 192.168.2.101 netmask ffffff00 broadcast 192.168.2.255
        ether 0:14:4f:fb:f8:a4 
vnet1: flags=201000843 mtu 1500 index 3
        inet 129.153.20.144 netmask ffffff00 broadcast 129.153.20.255
        ether 0:14:4f:fa:3b:c9 

This domain also has a zone named u4z1 which I had migrated (via zoneadm detach and zoneadm attach) from an older update level of Solaris 10. The zone has shared IP access to each of the logical domain's virtual network devices, hence access to the different physical networks the machine is connected to.

global # zoneadm list -civ
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   - u4z1             installed  /zones/u4z1                    native   shared
global # zonecfg -z u4z1 info
zonename: u4z1
zonepath: /zones/u4z1
brand: native
autoboot: false
bootargs: 
pool: 
limitpriv: 
scheduling-class: 
ip-type: shared
inherit-pkg-dir:
	dir: /lib
inherit-pkg-dir:
	dir: /platform
inherit-pkg-dir:
	dir: /sbin
inherit-pkg-dir:
	dir: /usr
net:
	address: 192.168.2.222
	physical: vnet0
	defrouter not specified
net:
	address: 129.153.20.232
	physical: vnet1
	defrouter not specified

View from within the zone before resource management applied

At this point, I'll boot up the zone u4z1 and demonstrate that it has access to all the CPUs defined for this logical domain, and coincidentally, access to the network devices.

global # zlogin -C u4z1
[Connected to zone 'u4z1' console]

[NOTICE: Zone booting up]


SunOS Release 5.10 Version Generic_139555-08 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: u4z1
Reading ZFS config: done.

u4z1 console login: root
Password: 
Last login: Thu Aug 27 16:47:17 on console
Oct  1 19:27:05 u4z1 login: ROOT LOGIN /dev/console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# ifconfig -a
lo0:1: flags=2001000849 mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
vnet0:1: flags=201000843 mtu 1500 index 2
        inet 192.168.2.222 netmask ffffff00 broadcast 192.168.2.255
vnet1:1: flags=201000843 mtu 1500 index 3
        inet 129.153.20.232 netmask ffffff00 broadcast 129.153.20.255
# psrinfo
0       on-line   since 10/01/2009 19:23:09
1       on-line   since 10/01/2009 19:23:11
2       on-line   since 10/01/2009 19:23:11
3       on-line   since 10/01/2009 19:23:11
4       on-line   since 10/01/2009 19:23:11
5       on-line   since 10/01/2009 19:23:11
6       on-line   since 10/01/2009 19:23:11
7       on-line   since 10/01/2009 19:23:11

If you're keeping score: the physical machine has 24 CPUs (6 cores of 4 virtual CPUs each), and this domain has 8 of those CPUs, and the zone within it can see all of them.

Apply dedicated CPUs to the zone

Now, I go back to the global zone (still in the logical domain, remember) and add a dedicated-cpu stanza to the definition of the u4z1 zone. This sets up the zone so it has between 1 and 4 CPUs for its exclusive use.

global # zonecfg -z u4z1
zonecfg:u4z1> add dedicated-cpu
zonecfg:u4z1:dedicated-cpu> set ncpus=1-4
zonecfg:u4z1:dedicated-cpu> set importance=2
zonecfg:u4z1:dedicated-cpu> end
zonecfg:u4z1> verify
zonecfg:u4z1> commit
zonecfg:u4z1> exit
global # zonecfg -z u4z1 info
zonename: u4z1
zonepath: /zones/u4z1
brand: native
autoboot: false
bootargs: 
pool: 
limitpriv: 
scheduling-class: 
ip-type: shared
inherit-pkg-dir:
	dir: /lib
inherit-pkg-dir:
	dir: /platform
inherit-pkg-dir:
	dir: /sbin
inherit-pkg-dir:
	dir: /usr
net:
	address: 192.168.2.222
	physical: vnet0
	defrouter not specified
net:
	address: 129.153.20.232
	physical: vnet1
	defrouter not specified
dedicated-cpu:
	ncpus: 1-4
	importance: 2

Okay, I've changed the definition, so let's recycle the zone. Oh, I forgot to enable the service that automatically shifts the number of CPUs owned by the zone between its lower and upper bounds. This is a really helpful feature: when the zone is CPU-busy, Solaris provides it CPUs up to the specified maximum number. When the zone is idle, it removes CPUs until it reaches the lower limit, which makes the CPUs available to other zones. Without the svc:/system/pools/dynamic service turned on, the zone gets the upper bound of dedicated CPUs. I can turn on the dynamic pool service some other time, as it's not needed for this demo.

global # zoneadm -z u4z1 halt
global # zoneadm -z u4z1 boot
zoneadm: zone 'u4z1': WARNING: A range of dedicated-cpus has been specified
zoneadm: zone 'u4z1': but the dynamic pool service is not enabled.
zoneadm: zone 'u4z1': The system will not dynamically adjust the
zoneadm: zone 'u4z1': processor allocation within the specified range
zoneadm: zone 'u4z1': until svc:/system/pools/dynamic is enabled.
zoneadm: zone 'u4z1': See poold(1M).
global # svcs -xv svc:/system/pools/dynamic
svc:/system/pools/dynamic:default (dynamic resource pools)
 State: disabled since Thu Oct 01 19:23:30 2009
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: man -M /usr/share/ man -s 1M poold
Impact: This service is not running. 

The pool is under the covers

Under the covers, Solaris is building a "resource pool" that exists for the duration of the zone being booted up. You can do the same thing with the pooladm and poolcfg commands, but the dedicated-cpu syntax does it for you with much less effort on your part. This usability enhancement was delivered to Solaris 10 some two years ago!

Here's a view from the global zone of the resource pool environment created for you. There's a resource pool created by appending the name of the zone to the string SUNWtmp_, bound to a like-named processor set ("pset") with between 1 and 4 CPUs. Four of the eight CPUs owned by the domain are associated with this processor set, and the remaining CPUs are owned by a default resource pool and processor set.

global # poolcfg -c 'info' -d           

system default
	string	system.comment 
	int	system.version 1
	boolean	system.bind-default true
	string	system.poold.objectives wt-load

	pool SUNWtmp_u4z1
		int	pool.sys_id 1
		boolean	pool.active true
		boolean	pool.default false
		int	pool.importance 2
		string	pool.comment 
		boolean	pool.temporary true
		pset	SUNWtmp_u4z1

	pool pool_default
		int	pool.sys_id 0
		boolean	pool.active true
		boolean	pool.default true
		int	pool.importance 1
		string	pool.comment 
		pset	pset_default

	pset SUNWtmp_u4z1
		int	pset.sys_id 1
		boolean	pset.default false
		uint	pset.min 1
		uint	pset.max 4
		string	pset.units population
		uint	pset.load 361
		uint	pset.size 4
		string	pset.comment 
		boolean	pset.temporary true

		cpu
			int	cpu.sys_id 1
			string	cpu.comment 
			string	cpu.status on-line

		cpu
			int	cpu.sys_id 0
			string	cpu.comment 
			string	cpu.status on-line

		cpu
			int	cpu.sys_id 3
			string	cpu.comment 
			string	cpu.status on-line

		cpu
			int	cpu.sys_id 2
			string	cpu.comment 
			string	cpu.status on-line

	pset pset_default
		int	pset.sys_id -1
		boolean	pset.default true
		uint	pset.min 1
		uint	pset.max 65536
		string	pset.units population
		uint	pset.load 2
		uint	pset.size 4
		string	pset.comment 

		cpu
			int	cpu.sys_id 5
			string	cpu.comment 
			string	cpu.status on-line

		cpu
			int	cpu.sys_id 4
			string	cpu.comment 
			string	cpu.status on-line

		cpu
			int	cpu.sys_id 7
			string	cpu.comment 
			string	cpu.status on-line

		cpu
			int	cpu.sys_id 6
			string	cpu.comment 
			string	cpu.status on-line

Now when I boot the zone up it has access to only 4 of the 8 CPUs defined for this logical domain. You can use this to control the resources allocated to a zone, or to limit the number of CPUs it has for software products that are licensed per CPU.

[NOTICE: Zone halted]
[NOTICE: Zone booting up]

SunOS Release 5.10 Version Generic_139555-08 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: u4z1
Reading ZFS config: done.

u4z1 console login: root
Password: 
Oct  1 19:34:20 u4z1 login: ROOT LOGIN /dev/console
Last login: Thu Oct  1 19:27:04 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# psrinfo
0       on-line   since 10/01/2009 19:23:09
1       on-line   since 10/01/2009 19:23:11
2       on-line   since 10/01/2009 19:23:11
3       on-line   since 10/01/2009 19:23:11
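
If you want to check the visible CPU count from a script rather than by eye, the psrinfo output is easy to parse. Here's a minimal sketch in Python, assuming the default output format shown above (processor ID, state, "since", timestamp); the function name is made up for illustration.

```python
def online_cpus(psrinfo_output):
    """Return the IDs of processors that psrinfo reports as on-line.

    Assumes the default format: "<id>  <state>  since <date> <time>".
    """
    ids = []
    for line in psrinfo_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a processor entry.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == "on-line":
            ids.append(int(fields[0]))
    return ids


if __name__ == "__main__":
    import sys
    # Example use inside the zone: psrinfo | python online_cpus.py
    cpus = online_cpus(sys.stdin.read())
    print(len(cpus), "CPUs visible:", cpus)
```

Run inside the zone, the count should match the size of the processor set the zone is bound to.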

Unlike many things with computers, CPU allocation doesn't have to be on a power-of-two basis:

global # zonecfg -z u4z1
zonecfg:u4z1> remove dedicated-cpu
zonecfg:u4z1> add dedicated-cpu
zonecfg:u4z1:dedicated-cpu> set ncpus=2-3
zonecfg:u4z1:dedicated-cpu> end
zonecfg:u4z1> verify
zonecfg:u4z1> commit
zonecfg:u4z1> exit

I restart the zone, and it again has the limited number of CPUs for its dedicated use.

[NOTICE: Zone halted]
[NOTICE: Zone booting up]

SunOS Release 5.10 Version Generic_139555-08 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: u4z1
Reading ZFS config: done.

u4z1 console login: root
Password: 
Last login: Thu Oct  1 19:34:20 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# psrinfo
0       on-line   since 10/01/2009 19:23:09
1       on-line   since 10/01/2009 19:23:11
2       on-line   since 10/01/2009 19:23:11

Can you combine dedicated CPUs and the Fair Share Scheduler?

See what happens if I try to use the Fair Share Scheduler (FSS) to assign CPU resources to this zone.

global # zonecfg -z u4z1
zonecfg:u4z1> set cpu-shares=5
zonecfg:u4z1> verify
rctl zone.cpu-shares and dedicated-cpu are incompatible.
u4z1: Incompatible settings
zonecfg:u4z1> remove dedicated-cpu
zonecfg:u4z1> set cpu-shares=5
zonecfg:u4z1> verify
zonecfg:u4z1> commit
zonecfg:u4z1> exit

It's not permitted: either you dedicate CPUs to a zone or you assign CPUs based on relative shares.

However, within a zone, the zone's root can use FSS to suballocate its CPU resources to projects using the project command. That's useful when a single zone hosts multiple applications.

CPU visibility in unmanaged zones

In this example, I booted up a second zone u4z2 (cloned from u4z1), and it did not have CPUs dedicated to it. When u4z1 had 3 dedicated CPUs, u4z2 had visibility to the remaining 5, as you would expect.

u7z2 console login: root
Password: 
Oct  1 20:10:58 u7z2 login: ROOT LOGIN /dev/console
Last login: Thu Oct  1 19:43:39 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# psrinfo
3       on-line   since 10/01/2009 19:23:11
4       on-line   since 10/01/2009 19:23:11
5       on-line   since 10/01/2009 19:23:11
6       on-line   since 10/01/2009 19:23:11
7       on-line   since 10/01/2009 19:23:11

I removed the dedicated-cpu setting from zone u4z1 and rebooted it, and zone u4z2 immediately saw the full set of 8 CPUs:

# psrinfo
0       on-line   since 10/01/2009 19:23:09
1       on-line   since 10/01/2009 19:23:11
2       on-line   since 10/01/2009 19:23:11
3       on-line   since 10/01/2009 19:23:11
4       on-line   since 10/01/2009 19:23:11
5       on-line   since 10/01/2009 19:23:11
6       on-line   since 10/01/2009 19:23:11
7       on-line   since 10/01/2009 19:23:11

What has happened is that the SUNWtmp_u4z1 resource pool has been removed and all of its CPUs have been returned to the default pool, so they are available to all the zones bound to it.

Summary

In this exercise we used dedicated CPUs to allocate CPU resources to zones. This can be used to provide predictable service to an application in a zone by giving it exclusive access to CPUs. It may also be easier to explain to IT clients than other resource management methods, since users can easily see that some number of CPUs "belong" to them, and their performance isn't dependent on the resource requirements of other applications running on the same server. Dedicated CPUs also can save considerable amounts of money for software licenses, for products that are licensed by the number of CPUs they run on. Dedicated CPUs are especially attractive on Sun's Chip Multithreading (CMT) servers, since they provide many CPUs at an extremely low price point with low space and environmental requirements.

The alternative to dedicating CPUs is to use the Fair Share Scheduler, which provides CPU power to a zone proportional to the number of shares the zone has, divided by the sum of shares given to all zones. Everything else being equal, if one zone has 10 shares and another zone has 20 shares, then the zone with 20 shares will get about twice the CPU power of the zone with 10. This only takes effect if there is no excess CPU capacity, and if both zones are able to consume all the CPU cycles made available to them.
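
The share arithmetic is simple enough to sketch in a few lines of Python. This only models the fully contended case described above (every zone busy, no spare capacity); it is an illustration of the proportions, not of how the scheduler is implemented. The zone names are placeholders.

```python
from fractions import Fraction


def fss_entitlements(shares):
    """Map each zone to its entitlement: its shares / sum of all shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one zone needs a positive share count")
    return {zone: Fraction(n, total) for zone, n in shares.items()}


if __name__ == "__main__":
    # Hypothetical zone names; the 10 and 20 share counts are from the text.
    for zone, frac in fss_entitlements({"zoneA": 10, "zoneB": 20}).items():
        print(f"{zone}: {frac} of the contended CPU ({float(frac):.0%})")
```

Adding a third busy zone changes everyone's entitlement, because the denominator is the sum of shares across all zones competing for the CPUs.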

The choice between using FSS or dedicated CPUs is based on both technology and policy: dedicated CPUs can be deterministic, easily explained, and save license fees for third-party software products, but can waste CPU power if a zone doesn't use the CPUs assigned to it. FSS is more flexible and provides more granular CPU resource allocation, but it doesn't provide guaranteed access. Solaris supports both styles of CPU resource management, in order to handle different customers' priorities and business requirements.

Blog O' Matty
Book review: Pulling Strings With Puppet

January 20, 2012 19:01 GMT
The devops movement (if you haven’t seen Ben Rockwood’s presentation on devops you should go watch it now) has been gaining steam over the past few years, and the movement has led to a lot of organizations adopting automation solutions like CFEngine, Chef or Puppet. I’ve had great success with puppet so far, and my [...]

Steve Tunstall
New Storage Magazine awards for NAS... Check this out...

January 20, 2012 17:55 GMT

Well, it's hard to be quiet about this. Storage Magazine just came out with the January 2012 issue, showing Oracle Storage doing quite well (#1) with the Oracle ZFSSA 7420 and 7320 family. Check out pages 37-43 of this month's Storage Magazine.

Storage Magazine: http://docs.media.bitpipe.com/io_10x/io_103104/item_494970/StoragemagOnlineJan2012final2.pdf (pages 37-43)


January 19, 2012

Blog O' Matty
Using the automated bug-reporting tool (abrt) to generate core dumps when a Linux process fails

January 19, 2012 16:20 GMT
Software fails, and it often occurs at the wrong time. When failures occur I want to understand why, and will usually start putting together the events that led up to the issue. Some application issues can be root caused by reviewing logs, but catastrophic crashes will often require the admin to sit down with gdb [...]

Ben Rockwood
Sending Email with Attachments from the Command Line

January 19, 2012 07:46 GMT

I have lots of awesome CLI based reporting tools. One was so awesome that other people in the company wanted to get it on a regular basis, but they preferred to see it as CSV so it could be manipulated in Numbers or Excel. Modifying my report to output CSV was easy: I just added a conditional that replaced my pretty column-formatted printf() with an ugly comma-separated printf(). Sending CSV in email is easy too: just pump it into "sendmail -t".

I quickly realized that using sendmail “as usual” sucked, because the CSV was in the body of the message, not an attachment. The solution was to send a Multi-Part MIME message. Doing so is easier than you think.

Let's look at a template example, piece by piece:

From: $FROM
To: $TO
Date: $DATE
Subject: $SUBJECT
Mime-Version: 1.0
Content-Type: Multipart/Mixed; boundary="ATTACHMENT-BOUNDRY"
Return-Receipt-To: $FROM

Some body stuff here, this is your message

Notice above that From, To, and Date are all pretty standard stuff. What is special is that we specify the MIME Version (1.0) and then set the content-type to “multipart/mixed”. Following that is a boundary string. A boundary string is an arbitrary string that separates the different parts of your message. In our case, it will separate the body from the attachments, but it can also be used for providing both HTML and Plain Text versions of a message in a single mail.

--ATTACHMENT-BOUNDRY
Content-Disposition: attachment;
        filename="$FILENAME1"
Content-type: text/plain;
        charset=US-ASCII;
        name="$FILENAME1"
Content-Transfer-Encoding: quoted-printable

$ATTACHMENT_DATA1

The next section of our message is marked by the boundary string prefixed by two dashes (--). Note that they are before but not after the boundary string! Next is the metadata about this portion of the message, namely the Content-type, encoding, and disposition.

It is important to note that Mail.app (OS X) is more strict about attachments than Thunderbird or Gmail. If you do not include a content-disposition it will register the section as just another part of the body. Mail.app requires that you be very careful about syntax, whereas Thunderbird and Gmail have an "I know what you meant" attitude.

--ATTACHMENT-BOUNDRY
Content-Disposition: attachment;
        filename="$FILENAME2"
Content-type: text/plain;
        charset=US-ASCII;
        name="$FILENAME2"
Content-Transfer-Encoding: quoted-printable

$ATTACHMENT_DATA2

--ATTACHMENT-BOUNDRY--

Here we have a second attachment. We could add as many as we wish, but notice that it ends with our boundary string again, and now it's surrounded by dashes front and back. This signifies the end of our parts.

That's really about it. Pump all this into "sendmail -t" (i.e. cat mymail.txt | sendmail -t, or equivalent) and away your mail goes.
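
If you would rather generate the message than maintain the template by hand, most scripting languages can build the same structure for you. Here's a minimal sketch using Python's standard email package; the addresses, file name and CSV content are placeholders, and the function name is just for illustration. The library picks its own boundary string and transfer encoding.

```python
from email.message import EmailMessage


def build_report_mail(sender, recipient, subject, body, csv_text, filename):
    """Build a multipart/mixed message with one CSV attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)  # the plain-text body part
    # Adding an attachment converts the message to multipart/mixed and
    # marks the new part with Content-Disposition: attachment.
    msg.add_attachment(csv_text, subtype="csv", filename=filename)
    return msg


if __name__ == "__main__":
    mail = build_report_mail(
        "reports@example.com", "team@example.com", "Weekly report",
        "Report attached.\n", "host,cpu\nweb1,42\n", "report.csv")
    # Print the message so it can be piped into "sendmail -t".
    print(mail.as_string())
```

Something like python build_mail.py | sendmail -t would then send it, where build_mail.py is whatever you named the script.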

One word about attachment encoding. Above, the Content-Transfer-Encoding of the attachments was "quoted-printable". That or 8bit is fine for normal text such as CSV, but if you wish to send binary data you will want to base64 encode it (see BASE64(1) for syntax) and set the Content-Transfer-Encoding to "base64".
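
For the binary case, here is a small sketch of the encoding step in Python, as an alternative to the base64 command. The helper name is hypothetical; the output is wrapped at 76 characters per line, which keeps it within MIME's line-length limit for encoded data.

```python
import base64


def encode_attachment(data: bytes) -> str:
    """Base64-encode binary data as newline-wrapped ASCII text."""
    # encodebytes() inserts a newline after every 76 output characters.
    return base64.encodebytes(data).decode("ascii")


if __name__ == "__main__":
    import sys
    # Example: python encode.py < picture.png > picture.b64
    sys.stdout.write(encode_attachment(sys.stdin.buffer.read()))
```

The encoded text then takes the place of the attachment data in the template.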

January 17, 2012

Darryl Gove
Separation of debug and executable

January 17, 2012 17:16 GMT

To reduce the size of shipped binaries it can be useful to separate the debug information into a separate file. This procedure is covered in the dbx manual. We can use objcopy to extract the debug information and then to link the executable with the extracted data.

Here's a short example executable:

#include <stdio.h>
#include <math.h>

int main()
{
  double d=1.0;
  d = sin(d);
  printf("sin(1.0) = %f\n",d);
}

Compiled with debug:

$ cc -g hello.c -lm
$ ./a.out
sin(1.0) = 0.841471

We can debug this executable with dbx. Note that, in this case, we compiled without optimisation in order to get the best debug information. Doing this does potentially sacrifice some performance. We can follow the same procedure with optimised code.

$ dbx ./a.out
Reading ld.so.1
Reading libm.so.2
Reading libc.so.1
(dbx) stop in main
(2) stop in main
(dbx) run
Running: a.out
(process id 53296)
stopped in main at line 6 in file "hello.c"
    6     double d=1.0;
(dbx) step
stopped in main at line 7 in file "hello.c"
    7     d = sin(d);
(dbx) print d
d = 1.0
(dbx) cont
Reading libc_psr.so.1
sin(1.0) = 0.841471

First of all we are going to use objcopy to extract the debug information from ./a.out and place it into ./a.out.debug:

$ /usr/sfw/bin/gobjcopy --only-keep-debug ./a.out ./a.out.debug

Now we can strip a.out of debug information:

$ strip ./a.out

To prove that this has removed the debug information we can try running under dbx:

$ dbx  ./a.out
Reading ld.so.1
Reading libm.so.2
Reading libc.so.1
(dbx) stop in main
dbx: warning: 'main' has no debugger info -- will trigger on first instruction
(2) stop in main
(dbx) quit

Now we want to use objcopy to make a link between the executable and its debug information:

$ /usr/sfw/bin/gobjcopy --add-gnu-debuglink=./a.out.debug ./a.out

Now when we debug the executable we are back to full debug:

$ dbx ./a.out
Reading ld.so.1
Reading libm.so.2
Reading libc.so.1
(dbx) stop  in main
(2) stop in main
(dbx) run
Running: a.out
(process id 58837)
stopped in main at line 6 in file "hello.c"
    6     double d=1.0;
(dbx) next
stopped in main at line 7 in file "hello.c"
    7     d = sin(d);
(dbx) print d
d = 1.0
(dbx) cont
Reading libc_psr.so.1
sin(1.0) = 0.841471

execution completed, exit code is 0
(dbx) quit

January 16, 2012

Constantin Gonzalez
Engineered Systems and Enterprise Architecture (or: How to Sell Dog Food Online)

January 16, 2012 16:21 GMT
A dog. And the TOGAF ADM cycle.

One of the first things that customers and sales teams realize when dealing with Engineered Systems is: They fundamentally change the IT architecture of a business.

Change is good, it means progress. But change is sometimes seen as a bad thing: Change comes with fear.

The truth is that Engineered Systems really empower IT architects to add value to their business, application and data architectures, without worrying about the technology architecture.

To understand this, we need to dig a bit deeper into Enterprise Architecture, specifically the TOGAF flavor of it.

One of my first tasks as a member of my new team was to get TOGAF certified. TOGAF is “The Open Group Architecture Framework” which is a fancy name for a giant document full of definitions, models, and methods for creating IT architecture out of a given business capability need.

Let’s look at a simple example: Suppose you’re the owner of a big dog food business empire. Your core competency is to know all about creating the best dog food, knowing your customer’s dog feeding preferences and habits, and having the best suppliers of dog food ingredients lined up from partners to help you excel at selling dog food.

And now you want to take your business to the next level and start selling dog food online. Here’s the typical progression of steps, assuming your company has successfully introduced TOGAF as your method of doing Enterprise Architecture, according to the TOGAF Architecture Development Method (ADM):

TOGAF ADM for Dummies

  1. Architecture Vision: Your boss told you to come up with an architecture for adding an online shop to your dog food business. The first step then is to create an architecture vision: What does it look like to sell dog food online? Who are the stakeholders for such a capability and what are their concerns?
  2. Business Architecture: Your enterprise architects sit down through numerous meetings with your business people, identifying a business architecture for selling dog food online. This is a bunch of diagrams, catalogs and matrices describing exactly how your online shop capability ties into the rest of your dog food business: What’s your online dog food business model? What are the use cases you want to support through your new capability? What existing processes does the new capability tie into? What new processes need to be defined to add the online shop capability?
    The output of this stage uses business building blocks: Business processes and capabilities, modeled at a very high level, and technology-agnostic, specific to your enterprise.
  3. Information Systems Architecture: Your enterprise architects translate the business architecture created in step 2 into an information system representation, which is composed of two architectures, a data architecture and an application architecture:
    1. The Data Architecture describes what kind of data you need to take care of for your online shop capability. Customer data, product data, payment data, shipping data, dog pictures, food pictures, dog food ingredient lists, all of it. It also describes where the data comes from, where it flows to, who needs what data when and so on. We see two types of data here: Structured data (what you'd place into a database) and unstructured data (e.g., flat files).
    2. The Application Architecture describes what applications you need to handle your data: Web portals, CRM applications, supply chain management applications, shipping and tracking applications, order management applications, payment processing applications, dog food personalization and mixing order processing applications etc.
      More diagrams, more catalogs and more matrices are created to describe this stage of building your dog food online shopping empire. Notice that there are no particular technologies and no products considered at this stage either.

    The building blocks at this stage: Applications, database schemas, flat file specs. All particular to your enterprise.

  4. Technology Architecture: The next step is to translate the information systems architecture into a technology architecture. Now we’re getting somewhere: This is where our enterprise architecture heroes get to describe what systems are needed where, how to map applications and data to those systems, what networks to use, which servers are connected how at what locations, etc.
    And here comes the cut: The building blocks are now generic! A database server is a database server, no matter what it stores, be it dog food recipes or car configurations. An app server is an app server, no matter what app runs on top of it, same goes for a file server, a firewall or a router. All components of Technology Architecture are generic, there's hardly anything specific about selling dog food online here.
  5. Opportunities and Solutions: This is the earliest stage at which particular technologies (such as the Oracle database) come into play and where specific instances of a technology ("Our main CRM system running on Oracle on SPARC SuperCluster.") are mentioned.
  6. Migration Planning: Here's where the hand-over from the enterprise architects to the project managers occurs: Implementation and migration plans are forged, projects are created and prioritized, resources are allocated, hands become dirty.
  7. Implementation Governance: This stage ensures synchronization between Enterprise Architecture and its implementation, as carried out by your project managers, through running reviews and monitoring progress.
  8. Architecture Change Management: Finally the project is done, the dog food online shop goes live and hopefully the first orders come in. Parties are thrown and people get drunk. The next day, lessons are learned and captured, and fed into the next ADM cycle, starting again with an architecture vision. Perhaps the beta users identified a few bugs or RFEs, or the product managers have new ideas for what else the online dog food store could sell, but that would require a new capability to be added to the architecture: The cycle starts again.

The Limits of Creating Business Specific Value

What does this have to do with Oracle's Engineered Systems? The answer is that it's all about who creates unique value at what level of the architecture and who doesn't:

And yet, businesses spend enormous amounts of resources for coming up with their own custom made database cluster design, their own custom made app server farm design, their own custom made file server infrastructure design, their own custom made web infrastructure design and so on.

This is where the rise of Engineered Systems takes place.

The Engineered System: TOGAF Stage D in a Box

Oracle's Engineered Systems (you know, Exadata, Exalogic, SPARC SuperCluster, ZFS Storage Appliance and so on) are nothing more than the TOGAF stage D of the ADM in a few, easy to plan, buy, install, manage boxes, without the usual headaches that used to occur when planning Technology Architecture.

Want a new database for your dog food recipes? Just ask the DBA to allocate one for you out of an Exadata. Want to host the entire chain of app servers and portal that forms the application layer of your dog food shopping empire? Push a button on your PaaS portal to instantiate a virtual assembly on one of your Exalogic boxes. Want something more powerful? Perhaps with more transactional crypto-oomph? Ask your friendly PaaS portal for a slice off of your shiny new SPARC SuperCluster. Wanna analyze heaps of personalized dog food recipes and correlate them with mailman assault reports, dog flu epidemic data and weather forecasts? That's what the big data appliance and Exalytics are for.

The point here is: No business should spend time, money and resources for creating yet another personalized flavor of a RAC cluster, an app server tier or a web portal infrastructure. This is what IT companies like Oracle can do better than anyone else: Design Technology Architecture.

Instead, use your energy and resources to create real business value. What's your online dog food architecture?

P.S.: Please forgive my dog food analogy. Here's the Guy who planted it into my mind:

TOGAF ADM picture taken from Wikipedia, used under NASA public domain policy.
Dog picture by digital_image_fan on Flickr, used under Creative Commons license.

Blog O' Matty
Bind’s strict zone checking feature is part of CentOS 6

January 16, 2012 13:51 GMT
I recently moved a bind installation from CentOS 5 to CentOS 6. As part of the move I built out a new server with CentOS 6, staged the bind chroot packages and then proceeded to copy all of the zone files from the CentOS 5 server to the CentOS 6 server. Once all the pieces [...]

January 15, 2012

Blog O' Matty
Locating Linux LVM (Logical Volume Manager) free space

January 15, 2012 13:24 GMT
The Linux Logical Volume Manager (LVM) provides a relatively easy way to combine block devices into a pool of storage that you can allocate storage out of. In LVM terminology, there are three main concepts: Physical Volumes – A sequence of sectors on a physical device. Volume Groups – A group of physical volumes. Logical [...]

January 14, 2012

Blog O' Matty
Using exec-shield to protect your Linux servers from stack, heap and integer overflows

January 14, 2012 15:08 GMT
I’ve been a long time follower of the OpenBSD project, and their amazing work on detecting and protecting the kernel and applications from stack and heap overflows. Several of the concepts that were developed by the OpenBSD team were made available in Linux, and came by way of the exec-shield project. Of the many useful [...]

Blog O' Matty
Fcron, a feature rich cron and anacron replacement

January 14, 2012 14:34 GMT
I’ve been looking at some opensource scheduling packages, and while doing my research I came across the fcron package. Fcron is a replacement for vixie cron and anacron, and provides a number of super useful features: - Run jobs based on the system load average. - Serialize jobs. - Set the nice value of the [...]

January 13, 2012

Darryl Gove
C++ and inline templates

January 13, 2012 16:00 GMT

A while back I wrote an article on using inline templates. It's a bit of a niche article as I would generally advise people to write in C/C++, and tune the compiler flags and source code until the compiler generates the code that they want to see.

However, one thing that I didn't mention in the article, it's implied but not stated, is that inline templates are defined as C functions. When used from C++ they need to be declared as extern "C", otherwise you get linker errors. Here's an example template:

.inline nothing
  nop
.end

And here's some code that calls it:

void nothing();

int main()
{
  nothing();
}

The code works when compiled as C, but not as C++:

$ cc i.c i.il
$ ./a.out
$ CC i.c i.il
Undefined                       first referenced
 symbol                             in file
void nothing()                   i.o
ld: fatal: Symbol referencing errors. No output written to a.out

To fix this, and make the code compilable as both C and C++, we use the __cplusplus feature test macro to conditionally include extern "C". Here's the modified source:

#ifdef __cplusplus
  extern "C"
  {
#endif
    void nothing();
#ifdef __cplusplus
  }
#endif

int main()
{
  nothing();
}

January 12, 2012

Darryl Gove: Please mind the gap

January 12, 2012 16:00 GMT

I find the timeline view in the Performance Analyzer incredibly useful, but I've often been puzzled by what causes the gaps - like those in the example below:

Timeline view

One of my colleagues pointed out that it is possible to figure out what is causing the gaps: the call stack responsible is shown by the event immediately after the gap. This makes sense. The Performance Analyzer works by sending a profiling signal to the thread multiple times a second. If the thread is not scheduled on the CPU then it doesn't get a signal. The first thing that the thread does when it is put back onto the CPU is to respond to the signals that it missed. Here's some example code so that you can try it out.

#include <stdio.h>

void write_file()
{
  char block[8192];
  FILE * file = fopen("./text.txt", "w");
  for (int i=0;i<1024; i++)
  {
    fwrite(block, sizeof(block), 1, file);
  }
  fclose(file);
}

void read_file()
{
  char block[8192];
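  FILE * file = fopen("./text.txt", "r+");  /* "r+" opens for update; "rw" is not a standard mode */
  for (int i=0;i<1024; i++)
  {
    fread(block,sizeof(block),1,file);
    fseek(file,-(long)sizeof(block),SEEK_CUR);  /* cast first: negating a size_t wraps around */
    fwrite(block, sizeof(block), 1, file);
    fseek(file,0,SEEK_CUR);  /* a positioning call is required before reading again */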
  }
  fclose(file);
}

int main()
{
  for (int i=0; i<100; i++)
  {
    write_file();
    read_file();
  }
}

This is the code that generated the timeline shown above, so you know that the profile will have some gaps in it. If we select the event after a gap, we can see that the gaps are caused by the application either opening or closing the file.

_close

But that is not all that is going on. If we look at the Duration of the event in the Timeline details panel, we can see that it spent 210ms in the "Other Wait" microstate. So we've now got a pretty clear idea of where the time is going.
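
The idea of locating gaps can also be sketched outside the tool. The following Python sketch is not how the Performance Analyzer works internally; it simply assumes you have a list of profile sample timestamps and treats any spacing much larger than the sampling interval as a gap. The 10 ms interval, the factor of 3 and the sample times are all made-up assumptions.

```python
def find_gaps(timestamps_ms, interval_ms=10.0, factor=3.0):
    """Return (start, end, duration) for each gap between consecutive samples.

    A gap is any spacing larger than factor * interval_ms. Both parameters
    are illustrative assumptions, not Performance Analyzer settings.
    """
    gaps = []
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        if cur - prev > factor * interval_ms:
            gaps.append((prev, cur, cur - prev))
    return gaps


# Made-up sample times: steady 10 ms profiling, then a 210 ms hole.
samples = [0, 10, 20, 30, 240, 250, 260]
for start, end, duration in find_gaps(samples):
    # The event at `end` is the one whose call stack explains the gap.
    print(f"gap of {duration} ms, inspect the event at t={end} ms")
```

The event that closes each gap is the one to select in the timeline, which matches the advice above.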

Steve Tunstall: New ZFSSA simulator download

January 12, 2012 10:34 GMT

I've just been informed that the simulator download has been updated to the latest version, 2011.1.1.

So instead of trying to upgrade your older simulator, you can download and install the new one, which already runs the latest code. Mine upgraded just fine, but some people report errors during the upgrade, which can be caused by a computer or laptop without enough memory or by a variety of other problems. You can get the simulator here:

http://www.oracle.com/webapps/dialogue/ns/dlgwelcome.jsp?p_ext=Y&p_dlg_id=10521841&src=7299332&Act=45

January 11, 2012

Darryl Gove: A static function, an inline function, and a static variable walked into a bar....

January 11, 2012 16:00 GMT

... well, not really. I was hacking around with some library code, so I thought I'd write up a quick refresher on scoping. Steve Clamage and I cover scoping in more detail in the series on libraries and linking. For the code I was working on today, the problem was much more limited.

I had a single file containing all the source code. I wanted to export only the minimal number of symbols that were needed to act as an interface for the library. You can imagine it being something like:

#include <stdio.h>

int count=0;

inline void printcount()
{
  printf("Count = %i\n",count);
  asm("nop");
}

void next()
{
  count++;
  printcount();
}

If I compile this, and then use nm to inspect the resulting library, I can see a global symbol for count. The function printcount() is defined with local scope. However, the only interface I want to export is next().

bash-3.00$ cc -g -G -O -o libt.so t.c
bash-3.00$ nm libt.so|grep GLOB
...
[45]    |     66468|       4|OBJT |GLOB |0    |11     |count
[43]    |       724|      40|FUNC |GLOB |0    |5      |next
[42]    |         0|       0|FUNC |GLOB |0    |UNDEF  |printf
bash-3.00$ nm libt.so |grep count
[44]    |     66460|       4|OBJT |GLOB |0    |11     |count
[32]    |       672|      52|FUNC |LOCL |0    |5      |printcount

So I can define count as a static variable, which reduces its scope to the file in which it is defined. However, this does not actually make it disappear: it is still there as a global symbol, but with a mangled name:

bash-3.00$ nm libt.so|grep count
[40]    |     66476|       4|OBJT |GLOB |0    |11     |$XAS4IkBuA_CPGtc.count
[33]    |       688|      52|FUNC |LOCL |0    |5      |printcount

The reason for this is that I'm building with debug (-g). With debug, I get a local version of the routine printcount(), and I get a globalised version of the variable count. If I remove -g, I get the following output from nm:

bash-3.00$ nm libt.so|grep count
[29]    |     66316|       4|OBJT |LOCL |0    |11     |count
[36]    |         0|       0|FUNC |GLOB |0    |UNDEF  |printcount

The variable count has local scope, which is what we expected - it is no longer exported from the file, so we have avoided possible name conflicts there. However, printcount() is now no longer defined. That might be ok so long as we don't actually call the routine:

bash-3.00$ dis libt.so|grep printcount
printcount()
         2e4:  7f ff ff ef  call        printcount      ! 0x2a0

Oops. We've hit the rule about needing to provide an extern version of any inline functions. Once again, I suggest reading Douglas Walls' discussion of the topic for the gory details. Anyhow, the upshot is that this library wouldn't work. The fix is trivial: declare printcount() to be static inline, and the compiler will generate a local version of the function:

bash-3.00$ cc -G -O -o libt.so t.c
bash-3.00$ nm libt.so |grep count
[29]    |     66448|       4|OBJT |LOCL |0    |11     |count
[30]    |       664|      52|FUNC |LOCL |0    |5      |printcount

With these fixes the library no longer exports any functions but the ones I left with external linkage. This substantially reduces the risk of "undefined behaviour".
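
Checking the export list by eye gets tedious for larger libraries. As a rough sketch (in Python, and assuming the pipe-separated column layout of the nm listings in this post), the defined global symbols can be filtered out like this. The function name exported_symbols is hypothetical, and the sample rows are copied from the listings above.

```python
def exported_symbols(nm_output):
    """Return names of defined global symbols from Solaris-style nm output.

    Rows look like: [index] | value | size | type | bind | other | shndx | name
    """
    names = []
    for line in nm_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 8:
            continue  # skip headers, "..." and anything that is not a symbol row
        bind, shndx, name = fields[4], fields[6], fields[7]
        if bind == "GLOB" and shndx != "UNDEF":
            names.append(name)
    return names


# Rows copied from the nm listings in the post (count still global).
listing = """\
[45]    |     66468|       4|OBJT |GLOB |0    |11     |count
[43]    |       724|      40|FUNC |GLOB |0    |5      |next
[42]    |         0|       0|FUNC |GLOB |0    |UNDEF  |printf
[32]    |       672|      52|FUNC |LOCL |0    |5      |printcount
"""
print(exported_symbols(listing))
```

Undefined references such as printf are skipped because they are imports rather than exports.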

Steve Tunstall: Even more ZFSSA announcements

January 11, 2012 07:53 GMT

The new announcements for the ZFSSA just keep on coming.

Today Oracle released 3TB drives for the 7420 and 7320 disk trays. So you can now choose between 2TB and 3TB 7,200 RPM drives and between 300GB and 600GB 15,000 RPM drives in your 7420 and 7320 systems.

Now, the 2TB drives have a last order date of May 31, 2012, so after that it will be 3TB only for the slower-speed drives.

Also, has anyone checked out the new local replication feature that just came out in the 2011.1.1 software release? I'm going to play with it this week and I'll do a write up on it soon.

Steve 

January 10, 2012

Darryl Gove: What's inlined by -xlibmil

January 10, 2012 16:00 GMT

The compiler flag -xlibmil provides inline templates for some critical maths functions, but it comes with the optimisation that it does not set errno for these functions. The functions it inlines can vary from release to release, so it's useful to be able to see which functions are inlined, and determine whether you care that they don't set errno. You can see the list of functions using the command:

grep inline /compilerpath/prod/lib/libm.il
        .inline sqrtf,1
        .inline sqrt,2
        .inline ceil,2
        .inline ceilf,1
        .inline floor,2
        .inline floorf,1
        .inline rint,2
        .inline rintf,1
...

From a cursory glance at the list I got when I did this just now, sqrt is the only function I can see that sets errno. So if you use sqrt and you care about whether it sets errno, then don't use -xlibmil.
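
If you want to compare the list between compiler releases, the grep output can be reduced to plain function names. This is a small Python sketch under the assumption that every template starts with a line of the form ".inline name,size", as in the output above. The sample text is copied from that output, and inlined_functions is a hypothetical helper name.

```python
import re

# Matches lines such as "        .inline sqrtf,1" and captures the name.
INLINE_RE = re.compile(r"^\s*\.inline\s+([A-Za-z_]\w*)")


def inlined_functions(il_text):
    """Return the template names declared by .inline lines, in file order."""
    names = []
    for line in il_text.splitlines():
        match = INLINE_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


# Sample lines copied from the grep output; a real libm.il has more entries.
sample = """\
        .inline sqrtf,1
        .inline sqrt,2
        .inline ceil,2
        .inline ceilf,1
"""
names = inlined_functions(sample)
print(names)
# The one function the post flags as setting errno.
print([n for n in names if n == "sqrt"])
```

Saving the name list per release and diffing the files would show which functions were added or dropped.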

Joerg Moellenkamp: Tracing ZFS

January 10, 2012 10:34 GMT
Brendan Gregg wrote a really interesting article about tracing ZFS: Activity of the ZFS ARC. Really worth a read.

January 09, 2012

Darryl Gove: Understanding binary size

January 09, 2012 21:20 GMT

One of my colleagues, Miriam Blatt, has written a great article about understanding the size of binary objects. This is worth a read because it describes both what goes into the objects and what tools you can use to discover this information.

January 08, 2012

Joerg Moellenkamp: Extermination

January 08, 2012 13:12 GMT
Buffer Extermination? WTF? Normally I see wait events like "buffer busy", "log sync" or "db file sequential read" in the Top 5 events when I research an Oracle installation after being called in because performance is not quite at the level the customer wants. I was sitting in front of the console of a system still using 10g as its database.

I want to add that the performance problem had its root somewhere else and was quickly found. This wait event sparked my interest for a much simpler reason: curiosity, afterwards, about why the wait event statistics showed peaks at regular intervals for a wait event I had never seen before.

"Buffer exterminate"? WTF ... again. Sounds dangerous. I had never seen that in the list before, and then my brain started rotating … what the heck is "buffer exterminate"? I had an idea, something was ringing in my head, but somehow the long-term memory management unit of my brain was unable to stage this information into the current working set. Okay … ask Dr. Google.

Metalink [ID 259137.1] is of great help here. The "buffer exterminate" wait occurs (and can only occur) when the buffer cache is shrunk dynamically and a session wants to access data in a granule that Oracle has chosen for removal from the buffer cache. The session wanting the block has to wait until the buffer being removed has been freed, and can then read the block from disk. You can't simply read the block from disk without waiting, as the block in the granule may represent a newer state of the block than the one on disk, and reading the one from disk would yield outdated data. So all you can do is wait until the granule has been released.

Before you ask what a granule is: Oracle doesn't allocate memory in the SGA bytewise, but in so-called granules. A granule is 4 MB of memory when your SGA is up to 1 GB at instance startup, and 16 MB when your SGA is larger than 1 GB at startup.
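
To make the granule rule concrete, here is a minimal Python sketch. The size rule is the one quoted above; rounding a request up to whole granules is my assumption about what "not bytewise" implies, and both function names are hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB


def granule_size(sga_bytes_at_startup):
    """Granule size per the rule quoted in the post: 4 MB up to 1 GB, else 16 MB."""
    return 4 * MB if sga_bytes_at_startup <= GB else 16 * MB


def round_up_to_granules(request_bytes, granule_bytes):
    """Round a requested size up to a whole number of granules (an assumption)."""
    granules = -(-request_bytes // granule_bytes)  # ceiling division
    return granules * granule_bytes


g = granule_size(4 * GB)
print(g // MB, "MB granules")
print(round_up_to_granules(100 * MB, g) // MB, "MB allocated for a 100 MB request")
```

With 16 MB granules, moving memory between components always happens in fairly large steps, which is why a session can end up waiting on a whole granule.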

In Oracle DB 10g, there is a feature called "Automatic Shared Memory Management" (ASMM). The idea is that Oracle itself monitors the load and configures the layout of the SGA. I think of such automatic mechanisms as a very good feature. It's like manual and automatic gearboxes. Surely, a good driver can accelerate faster with a manual gearbox than with an automatic one; however, an automatic gearbox is faster and better than 99% of all drivers. That said, given the existence of behaviour patterns explained by the Dunning-Kruger effect (h/t to Chris Colomb for pointing me to this interesting phenomenon), 99% of drivers think they are part of the 1%. This is especially epidemic in Germany. But back to the issue. It's the same with tuning of systems ;-)

You activate ASMM by setting the parameter SGA_TARGET to a nonzero value. The system then sizes the buffer cache (DB_CACHE_SIZE), shared pool (SHARED_POOL_SIZE), large pool (LARGE_POOL_SIZE) and Java pool (JAVA_POOL_SIZE) automatically within the limit set by SGA_TARGET. If one of the parameters controlling these memory areas is set to a value other than 0, that value is taken as the minimum amount of memory for the area.

Of course, when SGA_TARGET is fixed and you want to grow one part, another has to shrink. Obviously you can't shrink simply by throwing the blocks out of memory: there may be dirty blocks in that granule (changed blocks that have so far been recorded only in the redo logs and not yet written to the database files by the database writer).

This works really well and relieves the admin from investing time in finding good values for some of the most important SGA parameters.

However, if your database moves memory back and forth from one kind of shared memory to another tens of times per hour, this is surely not without impact on your database performance. I had such a situation in this case. The system started to move memory around at intervals of a minute, just to move it back a minute or two later. Like most automatic systems, ASMM works perfectly within its specification, but you may hit a situation where it tries to get the most out of restricted resources: all components of the SGA want more memory, and as soon as you remove memory from one part, that part cries and wants its memory back. That's similar to the argument with your significant other about which half of the blanket is whose. Better have two blankets ;-) Or to get back to the topic: Have enough SGA ...

How do you find out how many resizing operations took place? You can look that up with a select statement as described in this blog:
select start_time, component, oper_type, oper_mode, status,
       initial_size/1024/1024 "INITIAL", target_size/1024/1024 "TARGET",
       final_size/1024/1024 "FINAL", end_time
from v$sga_resize_ops
order by start_time, component;

With this statement you will see the recent history of resizings.
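
To judge whether the resizing is excessive, it helps to count operations per hour and component. The following Python sketch assumes the rows have already been fetched into tuples of (start_time, component, oper_type); that tuple layout, the function name and all sample rows are invented for illustration and are not output from the system described here.

```python
from collections import Counter
from datetime import datetime


def resizes_per_hour(rows):
    """Count resize operations per (hour, component).

    rows: iterable of (start_time, component, oper_type) tuples, a
    hypothetical subset of the columns selected from v$sga_resize_ops.
    """
    counts = Counter()
    for start_time, component, _oper_type in rows:
        hour = start_time.replace(minute=0, second=0, microsecond=0)
        counts[(hour, component)] += 1
    return counts


# Invented rows for illustration only, not output from a real instance.
rows = [
    (datetime(2012, 1, 5, 10, 1), "DEFAULT buffer cache", "SHRINK"),
    (datetime(2012, 1, 5, 10, 3), "DEFAULT buffer cache", "GROW"),
    (datetime(2012, 1, 5, 10, 4), "shared pool", "SHRINK"),
    (datetime(2012, 1, 5, 11, 2), "DEFAULT buffer cache", "SHRINK"),
]
for (hour, component), n in sorted(resizes_per_hour(rows).items()):
    print(hour.strftime("%Y-%m-%d %H:00"), component, n)
```

Tens of operations per hour for the same component, alternating between GROW and SHRINK, would be the pattern described above.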

In this case a slight increase (4 gigs) of the SGA target size stopped the system from growing and shrinking the buffers back and forth. Afterwards not a single "buffer exterminate" wait was seen, there were no peaks in the wait time statistics, and the number of resizing operations was down to one per hour. And that was more than okay.

Other solutions would be deactivating ASMM (by setting SGA_TARGET to zero) and configuring everything manually (doing it the old way), or setting some reasonable minima for the values controlled by ASMM. Important to know: the amount specified in SGA_TARGET is not only the memory for the four parts mentioned before, it covers the complete SGA. So the memory used by the parts of the SGA that are not managed by ASMM has to be deducted from the SGA_TARGET size, and only this reduced amount is available for the SGA areas managed by ASMM.
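
The deduction can be written down as simple arithmetic. This Python sketch is only a way to reason about the budget, not an Oracle tool; the function name and every number in it are made up for illustration.

```python
def auto_tuned_budget(sga_target_mb, other_sga_mb, minimums_mb):
    """Work out how much memory ASMM can actually move around.

    sga_target_mb: value of SGA_TARGET
    other_sga_mb:  SGA areas not managed by ASMM (deducted first)
    minimums_mb:   explicitly set minimum sizes for ASMM-managed components
    Returns (managed_total, freely_movable).
    """
    managed_total = sga_target_mb - other_sga_mb
    freely_movable = managed_total - sum(minimums_mb.values())
    if freely_movable < 0:
        raise ValueError("minimums exceed the memory available to ASMM")
    return managed_total, freely_movable


# Illustrative figures only; none of these come from the system in the post.
managed, movable = auto_tuned_budget(
    sga_target_mb=8192,
    other_sga_mb=512,
    minimums_mb={"DB_CACHE_SIZE": 4096, "SHARED_POOL_SIZE": 1024},
)
print(managed, movable)
```

The smaller the freely movable part, the more likely it is that components end up fighting over the same granules.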

January 06, 2012

Jeff Savit: T4 arrives!

January 06, 2012 19:44 GMT

I was eagerly waiting for the announcement made last week on the new SPARC T4 processor and servers. The T4 provides landmark performance (see Bestperf blog), with world records beating systems based on IBM Power7, IBM mainframe, and Intel Westmere. The T4 adds world-class single CPU thread performance to the throughput computing performance T-series systems are known for. It has a 2.85 or 3.0GHz clock rate, branch prediction, longer pipelines, and out-of-order execution, for up to 5x better per-CPU performance than its predecessor. Forget bogus old cliches like "SPARC is slow" or "T-series is slow"!

Product evolution

The first generation T1 chip provided up to 8 cores, each with 4 CPU threads (hence the name CMT for Chip MultiThreading). Each core ran at the same time as the others (a chip could retire 8 instructions per clock cycle), providing round robin service to its CPU threads. On a cache miss or after a quantum of clock cycles, the core would switch to the next CPU thread. This technique is extremely effective because most workloads spend a lot of time - often estimated at 2/3 to 3/4 of the time - suffering from cache misses. During a cache miss an instruction "stalls" until RAM responds with a cache line of data. T-series uses otherwise-wasted stall time in one thread to run a different CPU thread's instructions. You can contrive instruction kernels whose working sets always fit in cache, but that's not Real World.
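
A back-of-the-envelope model shows why this works. The Python sketch below is my own crude simplification, not a description of the real pipeline: it assumes each thread is runnable for a fixed fraction of the time, that stalls are independent, and that switching threads costs nothing.

```python
def core_utilization(threads, stall_fraction):
    """Crude upper bound on how busy one core's pipeline is.

    Each thread is assumed to be runnable for (1 - stall_fraction) of the
    time, with stalls independent of each other and switching free of cost.
    """
    return min(1.0, threads * (1.0 - stall_fraction))


for stall in (0.5, 0.75):
    for threads in (1, 4):
        busy = core_utilization(threads, stall)
        print(f"stall={stall:.2f} threads={threads} core busy={busy:.0%}")
```

Under these assumptions, a single thread that stalls 3/4 of the time keeps the core only 25% busy, while four such threads are enough to fill it.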

The T1 effectively provided a 32-way multiprocessor. No individual processor was particularly fast because transistors were spent on creating more (simple) threads rather than fast clocks and deep pipelines. In aggregate, the many CPUs provided excellent throughput. Subsequent designs had 8 cores with 8 CPU threads per core (T2 and T2+) for 64 threads/chip or 16 cores with 8 threads per core (T3) for 128 threads/chip. These dramatically increased compute density but had only modest improvements for single-thread applications - except for floating point and crypto, which were dramatically sped up.

Now, the T4 has 8 cores with 8 threads each, but with much faster per-thread performance.

Application suitability

T-series products always provided great throughput performance and price/performance, but you had to select applications that matched the machines' characteristics. Ideally that meant multi-threaded applications with good parallelism. Fortunately, a lot of workloads fit that thread-rich profile: web servers, messaging servers, Java application servers, and some database and middleware applications. Another approach is consolidation of multiple (even non-threaded) workloads, using T-series' built-in virtualization. Applications requiring single-CPU performance were better suited for M-series, which is designed for vertical scaling but doesn't have hardware crypto or a built-in hypervisor. A trade-off.

The T4 removes the constraint on single-CPU performance, and T-series can be used for parallel applications that use many CPUs, consolidation workloads, and apps requiring hot single CPU performance.

Measurement pitfall #1

A common situation is that somebody would say "My application isn't going fast enough, but vmstat says that the CPU is almost completely idle. What's happening?" Closer inspection would reveal that CPU utilization was indeed very low - 1% to 3% - but mpstat would show that one or two of the CPU threads were working as hard as they could. Consider a 128-thread T3-1 with only 1 active thread: vmstat will show average CPU utilization of 1/128, which is about 0.8%, even when 1 thread is 100% busy. The answer: run more threads! The box is almost completely idle, and adding more compute load won't slow down the existing application.
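
The arithmetic behind this pitfall is easy to reproduce. This Python sketch does not read vmstat or mpstat output; it just averages a hypothetical list of per-thread busy fractions, and the 0.95 cut-off for "saturated" is an arbitrary choice of mine.

```python
def average_utilization(per_thread_busy):
    """System-wide CPU utilization as the mean of per-thread busy fractions."""
    return sum(per_thread_busy) / len(per_thread_busy)


def saturated_threads(per_thread_busy, threshold=0.95):
    """Indices of CPU threads at or above the threshold (an arbitrary cut-off)."""
    return [i for i, busy in enumerate(per_thread_busy) if busy >= threshold]


# A 128-thread T3-1 with one fully busy thread and 127 idle ones.
busy = [1.0] + [0.0] * 127
print(f"average: {average_utilization(busy):.1%}")
print("saturated threads:", saturated_threads(busy))
```

The average hides the one thread that matters, which is why the per-thread view from mpstat is the one to check.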

Measurement pitfall #2

Another pitfall happens when people measure performance of a single transaction on an empty system. Sometimes developers even compare response time on their laptops to the production servers. This gives a distorted view of performance unless your production systems are idle at peak load!

Consider this hypothetical (and rather simplified) example. Let's assume that CPU service time for a transaction on a 1.65GHz T3 chip is twice the time of a product with a deep pipeline and 2 CPUs running at 3GHz, and that response time is solely due to CPU service time. If response time on the T3 is 0.6 seconds, response time for a single transaction on the faster clock machine is 0.3 seconds. If the service level agreement requires 1 second response time, then both products are acceptable even though the faster clock produced faster response time.

What happens if we add concurrent transactions, as would happen in a real workload? Under our simple assumptions, the 2-CPU machine will still have 0.3 second response with 2 concurrent transactions (each gets 100% of one of two CPUs). But at 40 concurrent transactions, each transaction has the equivalent of only 5% of a CPU (2 CPUs divided by 40), and CPU service time grows to 6 seconds. On the T3 server, each of the 40 concurrent transactions will have 100% of a CPU, and response time will still be 0.6 seconds, even up to 128 concurrent transactions. At 100 transactions the 2-CPU system responds 50x slower than it does when idle (15 seconds instead of 0.3), while the T3 is still subsecond. That's the scalability of throughput computing: under load, the T-series system performs much better. (Yes, I know I'm oversimplifying, but at a crude level that's how it works.) Don't measure single transactions on idle systems!
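
The same simplified model fits in a few lines of Python. It keeps the assumptions of the example (response time is pure CPU service time, CPUs are shared equally, one transaction cannot use more than one CPU); the function name is hypothetical.

```python
def response_time(service_time_s, cpus, concurrent):
    """Response time when `concurrent` transactions share `cpus` equally.

    Assumes, as the example does, that response time is pure CPU service
    time and that no transaction can use more than one CPU.
    """
    slowdown = max(1.0, concurrent / cpus)
    return service_time_s * slowdown


for n in (1, 2, 40, 100, 128):
    fast = response_time(0.3, cpus=2, concurrent=n)    # 2-CPU, fast clock
    t3 = response_time(0.6, cpus=128, concurrent=n)    # 128-thread T3
    print(f"{n:3d} transactions: 2-CPU {fast:5.1f} s, T3 {t3:4.1f} s")
```

The crossover comes early: under this model the 2-CPU machine matches the T3's 0.6 seconds at only 4 concurrent transactions and falls behind from there.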

What has changed

The big difference with the T4 is that it provides both the throughput of the earlier T-series chips (with networking, crypto, and virtualization enhancements I'll discuss at a later time) and the single-CPU performance that wasn't previously available on T-series. No more need to carefully select multi-threaded workloads - the T4 chip is a powerhouse for a very broad range of applications.

Which server should I pick?

A natural question for SPARC and Solaris customers would be "should I use a T4, a T3, or an M-series product?" Now that T-series has a broader range of applicability, there's more choice in platform selection: a T4 can be used in cases where M-series would have been the only answer. There's more overlap.

In general, the M-series will still have the advantage for vertically scaling workloads that need massive CPU, memory, and I/O capacity, that need the higher redundancy and reliability features, and depend on the ability to add capacity to a running system by inserting CPU boards when needed. The T3 product will still find use in pure throughput computing applications because it has the higher core density and lower software license core factor (0.25 instead of 0.5).

So, there's still room for the different models - but the best news is that it all remains completely compatible SPARC and Solaris, so systems and applications can be deployed (and redeployed) without concerns about compatibility.

Summary

The T4 processor and the servers based on it mark a new level of performance for SPARC processors. With record performance it changes the game (and turns over stale assumptions) about SPARC performance. It also illustrates the commitment Oracle has to SPARC and Solaris, and our increased ability to execute on delivering faster system products. By adding single CPU performance to T-series, it extends the ability to leverage Oracle VM Server for SPARC (LDoms) for a broader range of applications. Big news indeed - and Oracle Open World is just starting up, so watch Oracle.com and blogs.oracle.com closely the next few days.

Jeff Savit: Secure administration of Oracle VM Server for SPARC (Logical Domains)

January 06, 2012 19:43 GMT
A few days ago I was showing new features of Oracle VM Server for SPARC (informally shortened to "OVMSS", or still called LDoms) to a customer, and during the demo he asked "Why are you using root userid for this? Is that necessary?"

I guess the most literal answers would have been "Uh, because I never bothered about it before," and "No, root is not necessary." You can manage OVMSS without root by using Solaris' Role Based Access Control to assign just the needed authorizations to a non-root userid. In real deployments (unlike my little demo lab) that's really the best way to go.

(Irrelevant aside) I'm forcing myself to use the Chicago Manual of Style convention in which punctuation goes inside quoted text. I dislike it, myself. No, it's not an Oracle standard, AFAIK, but publishers seem to insist on it.

Why this is important

A bit of history and editorialization...

The all-powerful root 'super-user' is an artifact of Unix from its earliest days. In the original Unix security model, a userid was either root ("uid 0"), which can run any command, read, write, or remove any file, kill any process, and shut down the system, or a regular user (uid!=0), which is subject to authorization checks, restricted to its own playpen, and can't do any of those other fun things.

While it was convenient to concentrate all power in a system administrator userid, it was also risky. It's too easy to do a destructive "oops" while logged in as root, and the security implications are horrid. Anyone who obtained the root password or otherwise managed to fool the system into thinking he or she was root could do anything. The root password had to be shared among administrators so they could log in to do their administrative tasks - making it easy to compromise the password, and impossible to audit. If a root user accidentally ran a malicious binary (say, by not setting PATH carefully to include only trusted directories) that command would run with root privileges and could in turn do evil things - including setting trap doors that might swing open later.

I always felt that Unix 'root' was a mistake, and that separation of functions should have been considered from the outset. In all fairness, Unix grew up in trusting environments where this wasn't a consideration. For what it's worth, several operating systems have a similar history ("Whee! I can kill login sessions, shut the system down, and store arbitrary values into any location of RAM!" - from another OS), and evolved granular control over administrative privileges over time.

Solaris has, of course, provided rigorous security features for many years - which I leverage in this article.

OVMSS Security

Security is even more important in a virtual machine environment, since compromising a virtual machine monitor compromises the guests running underneath it. Fortunately, Oracle VM Server for SPARC was designed with security in mind. Some security features are provided by the underlying architecture, and others leverage Solaris security capabilities.

Separation of function

OVMSS architecture provides separation of function, using a firmware-based hypervisor that runs on a processor invisible to guest domains. Other functions (administration, virtual devices) are delegated to a control domain that serves as an administrative control point, and service domains that provide virtual I/O to guest domains that run applications. (To fill out the picture and use the proper definitions: an I/O domain has access to a PCI bus and its devices; a service domain is therefore usually an I/O domain, and since a control domain needs a bus and I/O devices to boot, it is usually used as a service and I/O domain as well.) This is an architectural advantage over designs where all administrative power and access to all system resources resides in a single monolithic hypervisor.

Domain isolation

All domains run on their own CPU threads and RAM, providing a high degree of physical isolation. Each has a separate Solaris instance, so they are separate in terms of security scope.

Since the control domain is the administrative control point for the server, it is further protected by making it inaccessible to network access from guest domains. Guest domains cannot even ping the control domain via the virtual switch! This is by design, and prevents a compromised guest from mounting a network-based attack on the control domain. This default behavior can be changed by plumbing the virtual switch (as documented in the OVMSS administrative guide). After that, the guest domain and control domain can access one another via TCP/IP as usual. Still, the default behavior is to start with strict isolation.

Securing the control domain

How then, do we secure the control domain? The first thing is to apply whatever site-specific Solaris standards are applicable. Next advice: don't permit login by arbitrary users who may otherwise have legitimate access to your servers, since the only purpose for the control domain is to administer the virtualization environment or get access to guest domain consoles. If somebody has no business being on the control domain, they shouldn't even be able to get on.

No clear text password is allowed for authorized users - that's so last-century! Instead, we always log in via ssh so passwords and session contents fly across the wire encrypted. Which reminds me: some popular virtual machine products do not encrypt memory contents during virtual machine migration - which exposes their contents (which may include passwords, Social Security ids, credit card numbers) to snooping. Be wary!

Deploying RBAC

Now to the meat of things: using RBAC to authorize selected non-root users to issue commands to the logical domain manager.

Authorization comes at two levels: read access, to view the configuration, and read/write access which lets you read or alter the domain environment. The corresponding Solaris authorizations are solaris.ldoms.read and solaris.ldoms.write. These authorizations are defined on the Solaris instance, stored in /etc/security/auth_attr when the LDoms manager software is installed. You can see that there are related authorizations, such as the one to manage the domain service, and authorizations for guest domain consoles ("vntsd" stands for "virtual network terminal server daemon" - quite a mouthful to pronounce.) Note that in the examples below (captured from terminal sessions I've just done), a prompt sequence with "#" indicates I'm logged in as root, and anything else indicates I'm logged in as a "regular" user.

# cat /etc/security/auth_attr |grep LDoms
solaris.ldoms.:::LDoms Administration::
solaris.ldoms.grant:::Delegate LDoms Configuration::
solaris.ldoms.read:::View LDoms Configuration::
solaris.ldoms.write:::Manage LDoms Configuration::
solaris.smf.manage.ldoms:::Manage Start/Stop LDoms::
solaris.vntsd.:::LDoms vntsd Administration::
solaris.vntsd.consoles:::Access All LDoms Guest Consoles::
solaris.vntsd.grant:::Delegate LDoms vntsd Administration::
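
Each line in that file is a colon-separated record. As an illustration (in Python, not a Solaris tool), the entries can be split into named fields. The field order follows the auth_attr(4) layout of name:res1:res2:short_desc:long_desc:attr; parse_auth_attr is a hypothetical helper, and the sample lines are copied from the listing above.

```python
from collections import namedtuple

# Field order as documented for auth_attr(4):
# name:res1:res2:short_desc:long_desc:attr
AuthEntry = namedtuple("AuthEntry", "name res1 res2 short_desc long_desc attr")


def parse_auth_attr(text):
    """Parse auth_attr-style lines, skipping blanks and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 6:
            continue  # not a well-formed entry for this simple sketch
        entries.append(AuthEntry(*fields))
    return entries


# Lines copied from the listing above.
sample = """\
solaris.ldoms.read:::View LDoms Configuration::
solaris.ldoms.write:::Manage LDoms Configuration::
solaris.vntsd.consoles:::Access All LDoms Guest Consoles::
"""
for entry in parse_auth_attr(sample):
    print(f"{entry.name:26s} {entry.short_desc}")
```

An entry whose name ends in a dot, such as solaris.ldoms., is a heading for a group of authorizations rather than something you assign directly.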

Further, these authorizations are collected into profiles stored in /etc/security/prof_attr:

# cat /etc/security/prof_attr |grep ^LDoms
LDoms Management:::Manage LDoms domains:auths=solaris.ldoms.*
LDoms Review:::Review LDoms configuration:auths=solaris.ldoms.read

We haven't been consistent with upper and lower case, have we? Well, each file is consistent with its own stylebook.
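
To see how a profile maps to authorizations, here is a small Python sketch. It assumes the prof_attr layout shown above (name:res1:res2:description:attributes) and treats a trailing "*" as a prefix wildcard, which is my reading of how solaris.ldoms.* is meant; the real Solaris lookup may differ in details. All function names are hypothetical.

```python
def parse_prof_attr_line(line):
    """Split one prof_attr-style line into (profile_name, [authorizations])."""
    name, _res1, _res2, _desc, attr = line.strip().split(":", 4)
    auths = []
    for pair in attr.split(";"):
        key, _, value = pair.partition("=")
        if key == "auths":
            auths = [a for a in value.split(",") if a]
    return name, auths


def grants(pattern, auth):
    """True if a granted authorization (possibly ending in '*') covers auth."""
    if pattern.endswith("*"):
        return auth.startswith(pattern[:-1])
    return pattern == auth


def profile_allows(line, auth):
    """True if any authorization listed in the profile line covers auth."""
    _name, auths = parse_prof_attr_line(line)
    return any(grants(p, auth) for p in auths)


management = "LDoms Management:::Manage LDoms domains:auths=solaris.ldoms.*"
review = "LDoms Review:::Review LDoms configuration:auths=solaris.ldoms.read"
print(profile_allows(management, "solaris.ldoms.write"))
print(profile_allows(review, "solaris.ldoms.write"))
```

Under this reading, the Management profile covers both read and write, while the Review profile covers read only.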

Now, I'll create two plain old userids using the normal commands:

# useradd -d /export/home/ldmuser1 -s /bin/bash ldmuser1
# zfs create rpool/export/home/ldmuser1
# chown -R ldmuser1 /export/home/ldmuser1
# passwd ldmuser1
New Password: 
Re-enter new Password: 
passwd: password successfully changed for ldmuser1

I do the same for user ldmuser2. So far, so boring - this is SA 101. I'll log into one of them and show that by default it cannot execute ldm commands.

-bash-3.00$ export PATH=/usr/sbin:$PATH
-bash-3.00$ ldm list
Authorization failed

Now, back as root, I'll add the read authorization

# usermod -A solaris.ldoms.read ldmuser1
UX: usermod: ldmuser1 is currently logged in, some changes may not take effect until next login.

Despite the above warning, it works right away:

-bash-3.00$ ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  SP      16    2G       2.6%  13d 5h 52m
rover            active     -n----  5000    8     1G       0.2%  13d 18h 5m

I can read, but can I also modify? Let's try to change the domain I used in my previous few articles.

-bash-3.00$ ldm set-vcpu 16 rover
Authorization failed

No problem - working as desired, and we can change that easily.

# usermod -A solaris.ldoms.write ldmuser1
UX: usermod: ldmuser1 is currently logged in, some changes may not take effect until next login.

And sure enough, on user ldmuser1:

-bash-3.00$ ldm set-vcpu 16 rover
-bash-3.00$ ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  SP      16    2G       0.6%  13d 5h 54m
rover            active     -n----  5000    16    1G       0.0%  13d 18h 7m
-bash-3.00$ ldm set-vcpu 8 rover

That was easy. If I want to retract the ability I can do that easily too.

# usermod -A "" ldmuser1
UX: usermod: ldmuser1 is currently logged in, some changes may not take effect until next login.

And again on user ldmuser1:

-bash-3.00$ ldm list
Authorization failed

There are several ways of adding magic powers to a userid. In the preceding example I added the specific authorizations, but I can also add a profile to the user, and the user then gets the authorizations defined for that profile in /etc/security/prof_attr. Note the change to the user entry in /etc/user_attr:

# usermod -P "LDoms Management" ldmuser1
UX: usermod: ldmuser1 is currently logged in, some changes may not take effect until next login.
# cat /etc/user_attr|grep ldmuser
ldmuser1::::type=normal;profiles=LDoms Management

Sure enough, we're back in business:

-bash-3.00$ ldm set-vcpu 16 rover
-bash-3.00$ ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  SP      16    2G       1.0%  13d 6h 27m
rover            active     -n----  5000    16    1G       0.1%  13d 18h 40m
-bash-3.00$ ldm set-vcpu 8 rover
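
The user_attr entry shown above keeps its attributes in the fifth colon-separated field, as semicolon-delimited key=value pairs. As a minimal sketch (the helper name user_attr_get is made up for illustration, and it assumes no colons appear inside the attribute values), plain awk can pull a single attribute out of such a line:

```shell
# Hypothetical helper: print the value of one key from a user_attr-style line.
# Assumed layout: user:qualifier:res1:res2:attr, where attr is "key=value;key=value".
user_attr_get() {
    # $1 = full entry line, $2 = key to look up
    printf '%s\n' "$1" | awk -F: -v key="$2" '{
        n = split($5, kv, ";")
        for (i = 1; i <= n; i++) {
            eq = index(kv[i], "=")
            if (substr(kv[i], 1, eq - 1) == key) print substr(kv[i], eq + 1)
        }
    }'
}

# Entry copied from the transcript above.
entry='ldmuser1::::type=normal;profiles=LDoms Management'
user_attr_get "$entry" profiles   # prints: LDoms Management
```

On a live system the entry would come from a grep of /etc/user_attr rather than a hard-coded string.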

Would you like a Role in your RBAC?

Finally, we can do this with roles. Roles are a special type of user account that you don't directly log into. Instead, they are associated with a profile (see above), and users are designated as being able to assume that role.

The benefit is that the user assumes the role only at the specific times when they need to perform the relevant task, instead of running with "extra power" at all times. This enhances both safety (protection against "oops!") and security, since the user has to explicitly assume the role and authenticate with a password.

In this case, I define a role called LDomDemo, assign it the 'LDoms Management' profile, and then set ldmuser2 to be able to switch into that role. Since LDomDemo is a role, not a regular user, you can't log into it - but it gets a password anyway to guard switching into it via su.

# roleadd LDomDemo
# rolemod -P 'LDoms Management' LDomDemo
# usermod -R LDomDemo ldmuser2
# passwd LDomDemo 
New Password: 
Re-enter new Password: 
passwd: password successfully changed for LDomDemo

Now I log into ldmuser2 to try it out. Note that it initially has no additional profiles, and fails to run an ldm command, until I assume the LDomDemo role via su.

-bash-3.00$ profiles
Basic Solaris User
All
-bash-3.00$ ldm list
Authorization failed
-bash-3.00$ roles
LDomDemo
-bash-3.00$ su LDomDemo
Password: 
$ ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  SP      16    2G       0.4%  13d 7h 44m
rover            active     -n----  5000    8     1G       0.1%  13d 19h 56m
$ exit
-bash-3.00$ 
Note the difference in shell prompt: while logged in as myself I'm running bash, but the role runs a profile shell (the "$" prompt).

Again, the advantage of this method is that your userid doesn't have additional powers until you assume the relevant role, which helps protect you from mistakes that accidentally use super-powers when you don't mean to. Using a role also provides additional password protection to complement role assignment.

Let's migrate a domain

Now that this has been done, let's log into ldmuser1 and migrate the domain rover to our neighbor machine (which I've also set up with the equivalent userids):

-bash-3.00$ ldm migrate rover ldmuser1@192.168.100.24
Target Password: 
Cannot enable FILE_DAC_READ privilege
Huh? I try every combination of command syntax (use the -p option, omit the target userid, whatever) and it makes no difference. Okay, I can take a hint, and I add the required privilege. This is strong medicine, because it lets a user account read /etc/shadow.
# cat user_attr|grep ldmuser1
ldmuser1::::type=normal;defaultpriv=basic,file_dac_read;profiles=LDoms Management
Let's try again:
-bash-3.00$ ldm migrate rover ldmuser1@192.168.100.24
Target Password: 
Cannot enable FILE_DAC_SEARCH privilege
That's progress, I suppose. Let's add the remaining privilege that it is explicitly telling me to add - no investigation is needed!
# cat user_attr|grep ldmuser1
ldmuser1::::type=normal;defaultpriv=basic,file_dac_read,file_dac_search;profiles=LDoms Management
I now go back to my terminal window for ldmuser1 and try again, and it works fine. Note that the nifty ppriv command tells me what privileges my shell enjoys.
-bash-3.00$ ppriv $$
27728:  -bash
flags = 
        E: basic,file_dac_read,file_dac_search
        I: basic,file_dac_read,file_dac_search
        P: basic,file_dac_read,file_dac_search
        L: all
-bash-3.00$ ldm migrate rover ldmuser1@192.168.100.24
Target Password: 

It works, but it's really not the right way, as I'll explain next.
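
Before moving on: the ppriv output above can also be checked mechanically. This is only a sketch, with a made-up function name, and it does a literal match against the names listed on the E: line, so it would not expand aliases such as "basic" or "all". It reports whether a named privilege appears in the effective set:

```shell
# Hypothetical helper: succeed if privilege $2 is listed literally in the
# effective (E:) set of ppriv-style output passed as $1.
has_effective_priv() {
    printf '%s\n' "$1" | awk -v want="$2" '
        $1 == "E:" {
            n = split($2, p, ",")
            for (i = 1; i <= n; i++) if (p[i] == want) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Sample output copied from the transcript above; on a live Solaris system
# this would be "$(ppriv $$)" instead.
sample='27728:  -bash
flags =
        E: basic,file_dac_read,file_dac_search
        I: basic,file_dac_read,file_dac_search
        P: basic,file_dac_read,file_dac_search
        L: all'

for priv in file_dac_read file_dac_search; do
    if has_effective_priv "$sample" "$priv"; then
        echo "$priv: present"
    else
        echo "$priv: missing"
    fi
done
```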

Let's migrate a domain using a role, and why roles are better

I didn't understand why the Admin Guides for OVMSS 2.1 and 2.0 do not mention the requirement for file_dac_read and file_dac_search, while the older document for Logical Domains 1.3 (the version before the product was renamed) does, and tells you how to add them. The requirement was easy to figure out, since the command tells you exactly what it is missing, but the omission was puzzling.

Menno Lageman explained this to me (thanks, Menno!). Correct practice is to use a role, so the guide doesn't illustrate directly adding the powerful file_dac_read and file_dac_search privileges to a user account. Domain migration uses these privileges to read the root-owned private key and certificate files used to set up the SSL connection between the source and target hosts, so they are security-sensitive and should be carefully controlled. Directly adding file_dac_read and file_dac_search to a userid as I did above means that it has those powers all the time, when running any binary! Instead, leveraging a role means that the privileges are only set when running the ldm binary, which has an execution attribute associated with the "LDoms Management" RBAC profile. This adds a layer of protection: gaining the privilege requires running the binary, and running the binary requires assuming the password-protected RBAC role and running under a profile-aware shell or pfexec.
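
On Solaris 10, execution attributes like the one described above are recorded in /etc/security/exec_attr, one entry per line in the form name:policy:type:res1:res2:id:attr. The following is a rough sketch rather than anything from the Admin Guide: the function name is invented, and it assumes the command path in field 6 ends in /ldm. It lists which profiles attach attributes to the ldm binary:

```shell
# Hypothetical helper: print exec_attr entries whose command path ends in /ldm.
# exec_attr fields: name:policy:type:res1:res2:id:attr (field 6 is the command path).
ldm_exec_attrs() {
    file="${1:-/etc/security/exec_attr}"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    awk -F: '$3 == "cmd" && $6 ~ /\/ldm$/ { print $1 " -> " $6 " [" $7 "]" }' "$file"
}

# Only meaningful on a Solaris control domain; elsewhere the file will not exist.
if [ -r /etc/security/exec_attr ]; then
    ldm_exec_attrs
fi
```

Each line of output names a profile, the command it covers, and the attributes (such as a privs= list) granted while that command runs under the profile.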

While using a non-root userid to manage domains is better than using all-powerful root, a userid that can't exercise special powers until you assume the password-protected relevant role and run the correct binary is even better.

So, here's how it looks using ldmuser2, which I previously set up to use the role LDomDemo. I've logged into ldmuser2, and show that it can't issue an ldm command until I assume the LDomDemo role (which requires an additional password. Good). After that I can issue the migrate command without any extra magic incantations.

-bash-3.00$ id
uid=103814(ldmuser2) gid=1(other)
-bash-3.00$ profiles
Basic Solaris User
All
-bash-3.00$ ldm list
Authorization failed
-bash-3.00$ roles
LDomDemo
-bash-3.00$ su LDomDemo
Password: 
$ ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  SP      16    2G       1.2%  15d 6h 59m
atl-sewr-pool-155 active     -n----  5001    8     2G       0.1%  18d 18m
rover            active     -n----  5000    8     1G       0.1%  15d 19h 11m
$ ldm migrate rover 192.168.100.24
Target Password: 
$ ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  SP      16    2G       0.3%  15d 7h 1m
atl-sewr-pool-155 active     -n----  5001    8     2G       0.2%  18d 20m

So, the moral of the story is: use the RBAC roles - that's what the "R" stands for!

Summary, and where to learn more

This article describes how to use RBAC to secure an Oracle VM Server for SPARC system by eliminating use of the root userid and granting power only to specific users and roles, and only when they need it. That, along with restricting which userids can log into a control domain in the first place, should be considered for any domain environment. Other tasks you may wish to consider include using RBAC to control access to guest domain consoles and enabling security auditing.

Reference information for these tasks can be found in Chapter 3 of the Oracle VM Server for SPARC Administration Guide. The OVM for SPARC document library can be found at http://www.oracle.com/technetwork/documentation/vm-sparc-194287.html.

Jeff SavitEngineered and General Purpose Systems

January 06, 2012 19:42 GMT

One thing I learned on joining Oracle is that the company likes to make a big splash at Oracle OpenWorld (though we did announce big items like the new T4 platform beforehand), and this year's event fit the pattern. (Oh yeah, before I get distracted: Solaris 11 is coming! Be there or be square!) This OOW highlighted the increased shift towards "engineered systems", a dramatic change in how systems will be designed and delivered. I've been working in this area for some time now, in particular with Exalogic, and want to share my impressions.

Current state

Today, many servers are high-touch systems designed and configured by the customer. This places the burden of integration on the customer, who has to invest substantial staff expertise on non-revenue generating efforts, and has to take on the risk of misconfiguration. Because it's so expensive in terms of risk, staff time and expertise, there's a strong incentive to hand craft just a few "standard model" configurations at a time. These models must be used until the next refresh and are often not optimal for any of the workloads they are expected to run.

Plus, since so many configuration and part selection details (this NIC, that amount of RAM using these DIMMs, those switches, at these OS and app software levels) exist only at that customer's site, customers risk discovering corner conditions because they are the only people in the world with that combination.

Engineered systems

In contrast, engineered systems are designed to be optimal for a particular workload class, validated and proven by the vendor (that's us at Oracle, if you're still following) to be reliable, simple to purchase, configure and manage, and built to deliver dramatically superior performance for their target purpose. At the same time, these systems are built on industry-standard components rather than rare or exotic chips, in order to take advantage of price/performance advances.

This started with databases, unsurprising for Oracle, with Exadata the optimal platform for running Oracle RAC, and subsequently Exalogic for Java middleware and other applications. The idea has scaled to other workload types: Exadata has proven to be the premier platform for both OLTP and DSS, and Exalogic provides dramatic performance improvements for applications, not exclusively in the Java app-server space it was first aimed at, but also for PeopleSoft, Siebel, JD Edwards, E-Business Suite and Tuxedo.

New members of the family show that the architectural concept scales further: Exalytics for on-line data analytics, and SPARC SuperCluster. Oracle's engineered systems are on both Solaris and Oracle Linux, on both x86 and SPARC. This is a concept that has legs.

Performance

The most visible selling point for these systems is performance. Unlike general-purpose platforms, engineered systems are, well, engineered for a purpose. Rather than being designed to be adequate for everything, they are built to provide outstanding performance for a selected category of work.

For example, Exadata is designed for databases, so it has a tremendous amount of disk I/O capacity, using SSD devices for optimal latency and IOPS, backed by rotating media for capacity. I/O is done by storage nodes to offload I/O work from compute nodes, connected via a 40Gb Infiniband network for lowest latency and highest bandwidth. Unique optimizations yield further performance gains: storage cells take on part of the burden of selecting rows ("Exadata Smart Scan") rather than blindly transmitting all data to compute nodes just so unneeded rows can be discarded at the destination. Another Exadata optimization, hybrid columnar compression, uses column value compression to reduce disk space requirements. Consider a database with a LASTNAME column: you might have a lot of "Johnson" values in column order. Compressing common values saves disk space and reduces disk I/O time.
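
To make the LASTNAME example concrete, here is a toy run-length encoding of a column stored in column order. This is only an illustration of why repeated adjacent values compress well. It is not how Exadata implements hybrid columnar compression, and the function name is made up.

```shell
# Toy run-length encoder: reads one column value per line on stdin and
# prints "count value" for each run of identical adjacent values.
# Illustrative only; not the Exadata algorithm.
rle_column() {
    awk 'NR > 1 && $0 != prev { print n, prev; n = 0 }
         { prev = $0; n++ }
         END { if (NR) print n, prev }'
}

# Seven stored values collapse to three (count, value) pairs.
printf '%s\n' Johnson Johnson Johnson Johnson Smith Smith Jones | rle_column
```

The same values scattered across row-ordered storage would sit next to unrelated fields, so the runs would never form.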

In contrast, Exalogic is designed for the application "middle tier" (between presentation and data persistence), and therefore has different requirements. For example, Java performance is very much affected by RAM speed and quantity, so compute node processors are configured with the maximum RAM that can be deployed - consistent with the memory needs of a JVM - without sacrificing RAM latency. Performance of modern applications is typically constrained by network latency (consider how Java application servers transmit changed state between nodes), so Exalogic is configured with the same Infiniband network as Exadata and has optimized database access. Further - and an advantage of Oracle owning the software and hardware stack - WebLogic and other application products have specific optimizations for Exalogic that reduce kernel pathlength for network access.

These are just a few examples of the "special sauce" that let different parts of the Oracle hardware and software stack combine for better performance and management. This is a blog entry, not a book (not yet, at least) so I have to restrain myself a little.

Arguably the biggest benefit is a less exotic one: these systems are built for balanced performance. So many times I've seen systems (on many platform types) with unbalanced configurations: They might have excess CPU but were hopelessly I/O bound - and the CPUs spent all their time waiting. Or they had plenty of I/O and CPU, but not enough RAM. Understanding workload characteristics so you can build systems that can scale as work grows - it's not so easy. With engineered systems we've been able to create systems that don't run into system bottlenecks due to unbalanced capacity.

The published results show performance that is in many cases several times better than comparable kit (similar chip and clock speeds - we're not gaming the system with 2011 gear compared to antiques). This works.

Faster networking eliminates limits of horizontal scale

The biggest constraint on performance in many networked applications is (duh...) network latency. Exa products essentially solve this problem by using Infiniband connections for low latency, high bandwidth interconnects. The Infiniband fabric provides the kind of bandwidth and latency you would previously see on the backplane within a server. Exadata and Exalogic systems can be configured with up to 8 full racks of servers, each with many compute nodes, on a single Infiniband network. Software optimizations bypass the kernel TCP/IP stack to put data directly on the wire and prevent the CPU from becoming the bottleneck.

This removes the primary traditional constraint on horizontal scale - delay caused by the "chattiness" between computers hosting a networked application. When applications on a network can talk to one another with latencies that approximate RAM DMA times (indeed, access can be categorized as "remote DMA") then you can for the first time link together many systems with linear scale.

Eliminating complexity

Performance is an infinite source of computing fun, but it's not always the most important issue. Real world pain points are often about complexity and management, rather than speeds and feeds. The first part is getting rid of the 6-month science project that starts when a pile of components shows up on the loading dock, replacing it with a system that can literally be up and running in a day. The entire platform is integrated and tested at the factory. Components and assembly at the customer site are the same as at the support center and product engineering. This cuts part and configuration-based problems, and ensures that problems discovered on-site can be reproduced at the factory.

On an ongoing basis, the benefit is a system where you can manage and monitor everything from apps down to storage from a single browser window - with multiple nodes seamlessly managed as one system at different levels of abstraction. That's provided by Oracle Enterprise Manager, which lets you manage networked systems as a coordinated whole - "the network is the computer". Catchy, huh? But this time, all the way up to the application level where business value resides. This is also the foundation for a complete cloud lifecycle which would have virtual system slicing, self service, assembly deployment, automatic scale up, scale down, metering and chargeback. Heady capabilities.

A metaphor

Some people have referred to Exa* products as a "new version of the mainframe". I get the "it's been tested and purchased together" aspect, but that's been possible in open systems where the option to buy preconfigured reference architecture implementations has always been available (if not always used). The scalable systems aspect also is understandable, but open systems platforms have outscaled mainframes in most aspects for many years. But, okay - engineered systems have properties that can be compared to mainframes.

The analogy falls apart elsewhere: mainframes are general purpose systems that quite easily can have unbalanced performance (this is not intended to be partisan - I'm not attacking the mainframe, just pointing out that it can be configured unbalanced just as easily as any other platform. Much of my career was spent on mainframe systems fighting problems due to unbalanced performance). The other difference is that Oracle engineered systems are built from standard platform components: they run Oracle Linux or Solaris on x86 or (in SPARC SuperCluster) SPARC processors. They run standard application APIs and components, like Java application servers based on WebLogic Server. So, there's no lock-in to proprietary hardware or APIs or operating systems that (for whatever merits they might have) look like no other systems and have high barriers to exit.

Do General Purpose Systems Disappear?

I don't think GP or "non-engineered" systems go away. Systems are often purchased to support a variety of workloads which may not be fully known in advance, and not everybody will buy into the concept of engineered systems. There will also need to be component systems to build from - so "best of breed" systems will be around for a long time. Still, it's going to be an easier choice to run engineered systems proven to work reliably and at scale for known and important workloads.

Closing

Engineered systems are a big change in how systems have been built for years, and a wave of the future. They make it possible to offer dramatically superior performance while reducing customer risk and complexity. I expect this will be a growing trend.

Jeff SavitWhy Solaris 11 is being released *before* 11/11/11

January 06, 2012 19:41 GMT
It would have been very elegant to release Solaris 11 on 11/11/11 (after all, how many chances for that kind of symmetry do you get?), but there are reasons it's not happening: see "11 Reasons Why Oracle Solaris 11 11/11 Isn't Being Released on 11/11/11".

Aw, darn. Still, it would be nice if we could do Solaris 12 on 12/12/12. After that, we run out of months!