Saturday, September 23, 2006

Google testing OpenSolaris

There are rumors running all over the place that Google is testing Sun's OpenSolaris. In fact even Digg featured the article, and the comments are quite interesting. Leaving out the Linux fanatics who seem blindfolded in their love for their OS, the reasonable folks among them could argue about the actual benefits, and that would be a really fruitful discussion, which unfortunately never happens. Most threads of this nature are dominated by the "I am good and all else is bad" comments.

But taking a look at this from an unbiased perspective (I am really trying to be unbiased here), Google can seriously benefit from a lot of what OpenSolaris has to offer.

OpenSolaris is a BSD variant
All the Unix and Linux faithful can't deny the fact that the BSD-descended variants are for some reason the sturdier class of *nixes. Be it FreeBSD, NetBSD, OpenBSD, Solaris (whose BSD blood goes back to SunOS, even if it is SVR4-based these days) or Apple's Darwin, all of these have been very successful on high-end systems. And high end today does not mean high cost; it's more a matter of their scale. The uptimes shown by Solaris servers are legendary in their own right. And of course OpenSolaris is now Trusted by default.

OpenSolaris has the bells and whistles
DTrace, Zones, ZFS, FMA, x64 support and a lot more. Sun has gone all out while developing Solaris and has gone ahead and released all of this as OpenSolaris. Yes, they are currently using the same code base for both Solaris and OpenSolaris. So someone like Google can benefit by using ZFS to scale their mammoth data centers even further, while running DTrace to figure out the real-time usage patterns and the code segments behind them. Catch those algorithms while they are lazing around or eating more resources than they should, and improve them.
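Just to make the DTrace point concrete, this is the kind of one-liner I have in mind; purely a sketch, not tied to anything Google actually runs:

# sample on-CPU user stacks 997 times a second for ten seconds,
# then report which processes and code paths ate the time
dtrace -n 'profile-997 /arg1/ { @[execname, ustack()] = count(); } tick-10s { exit(0); }'

Run that on a busy box and the fattest stacks at the bottom of the output are your candidates for a closer look.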

OpenSolaris provides for binary compatibility
Now I am sure the Linux guys don't know what that means, but it has been an assurance given by Sun on all Solaris systems for nearly two decades or more now. And hey, it's not just OpenSolaris itself that will let you maintain binary compatibility; because of the way the code is structured, there should be no reason why this guarantee can't be maintained between OpenSolaris variants as well.

OpenSolaris is Open
Ignore the GPL devotees. If you don't like the CDDL, tough luck. But even with the CDDL in place, Google can go ahead and create a custom version of OpenSolaris for themselves, deploy it all over, and heck, even redistribute it if they like. "Golaris" sounds kiddish, but well, you never know :)

Suggestion to Google: Use Niagara (UltraSPARC T1) processors
The Niagara family of processors is targeted at customers, such as Google, that need higher throughput on their front ends. With 8 cores and 4 threads per core in the Niagara 1 family, these processors can go ahead and eat up web queries on the Google website while the back-end servers churn out the results. And no, the cost of Niagara-based systems is not prohibitive at all. In fact you could compare them with many desktop processors on cost/GHz, and they beat all of them in the cost/watt category. I am sure Google will be thinking about energy costs in the huge facility they are supposedly setting up in collaboration with the US government.

Netcraft already shows Gmail running some servers on Solaris 8, so I guess in a couple of months we should know if these rumors have any meat to them at all. And, being realistic, even if Google adopts OpenSolaris it will be for very specific reasons. Their Linux deployment is humongously distributed and very customized, from what I have heard, so we won't see them going 100% OpenSolaris in the near future. And of course they will have to douse the in-house Linux flame throwers too, right? That may be the toughest challenge!

Well, moving to OpenSolaris would simply mean that Google is definitely not resting on its laurels and is in constant search of whatever will let it improve further.



Tuesday, September 19, 2006

Nevada on VMWare

It's really exciting to see OpenSolaris build 46, i.e. Nevada, running smoothly on my laptop inside a VMware Workstation environment. I have dedicated an 8GB disk and 512MB of RAM to its running instance, and hopefully that will suffice. The main idea here is to keep up to speed with the new features constantly being integrated into Nevada while still doing my routine job. It was so much more fun doing all this on the 7th floor of Divyasree Chambers (Sun Microsystems IEC, Bangalore).
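For what it's worth, the RAM allocation boils down to a single line in the VM's .vmx file (value in MB; this is just how it looks in my config, so treat it as illustrative):

memsize = "512"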

Anyway, the setup procedure in VMware turned out to be pretty straightforward. The only hiccup was while selecting the disk for the installation: you actually need to uncheck and re-check the box, and only then do things go on smoothly.

I have used VMware a couple of times before and it seems to work all right most of the time. However, there is a new kid on the block called Parallels, and there were rave reviews about it some time ago. Hope I can check that out sometime soon too.

For VMware there is a wonderful set of slides on how to go about doing this right here. So there is no reason not to be running OpenSolaris, even if you can't dedicate a complete box to it. Get onto the train now!


Tuesday, September 05, 2006

PostgreSQL Mod - Backing up parts of a table

Well, I got a chance to get into the Postgres code, albeit for a minor code addition. It turned out to be pretty cool just trying to understand even a small part of the source code.

Requirement
1] Dump out only part of a database table, e.g. the oldest entries first.

Approach
1] There are multiple ways to do it. For example, you can make a duplicate temp table holding just the rows you want to dump out, and then use the pg_dump tool to get you a copy as a binary file (see the sketch after this list). That turns out to be a good idea only if the amount of data to be moved around is relatively small compared to the database, since the duplication itself requires a proportionate amount of space, right?
2] The other way is to go ahead and modify the pg_dump tool itself to provide you with a partial dump.
Of course, keeping with tradition, we take the tougher approach, i.e. the second one.
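To make approach 1 concrete, it would go roughly like this; the table, column, database and file names here are all invented for the example:

-- stage only the slice we care about into a scratch table
CREATE TABLE mytable_old AS
    SELECT * FROM mytable WHERE created_at < '2006-01-01';

# then point pg_dump at just the scratch table, custom (binary) format
pg_dump -t mytable_old -Fc mydb > mytable_old.dump

Works fine, but you pay for the duplicated rows in disk space while the dump runs.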


Once you get into the pg_dump source you realise that all it does is issue a dumb COPY command, which does the real man's job of accessing the data for a given table/database and putting it out to a given file. So, on to the COPY source code in $PGSOURCE/src/backend/commands/copy.c.

What the COPY command does internally is get to the requested database/table and redirect the binary output into the specified file. Now if you want to dump out only part of a table, you need some sort of query to carve out the portion of the table that you want. However, doing this turns out to be very inefficient, since it would require accessing the table row by row, running comparisons on each row, and selectively dumping to the file. Not a good thing at all considering the large number of entries we have.
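For reference, the unfiltered statement that pg_dump ends up driving looks something like this (table and file names made up):

-- whole-table dump, OIDs included
COPY mytable TO '/tmp/mytable.copy' WITH OIDS;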

Now since our requirement says we only need part of the table dumped out, more specifically the older data, we can work with the table if it has some sort of time-ordered index. Searching high and low in the docs, I came across the funda of OIDs. OIDs, or Object Identifiers, are 4-byte (on a 32-bit machine) integers that are unique per tuple for every table within the database. Moreover, OIDs can be included in dumps; it is a parameter to pg_dump, lucky us. And since OIDs are attached to tuples by the database itself on insertion, we can rely on an efficient mechanism being used in there. Something better than having triggers on each insert, I hope!
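A quick way to actually see them, assuming the table was created WITH OIDS (the table name below is again just for illustration):

-- the oid column is hidden unless you ask for it explicitly
SELECT oid, * FROM mytable ORDER BY oid LIMIT 5;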

So I go ahead and add two more parameters to the COPY command:
1. from_oid
2. upto_oid
which give us the range of OIDs that we are interested in. You also need to make sure the parser recognises the new options, so head off to $PGSOURCE/src/backend/parser/gram.y and good luck. As for the two values themselves, we should be able to get them by checking the min and max OIDs for the given table and simply doing a percentage addition on the min value to get the upto_oid, as in the queries below.
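Say we want roughly the oldest quarter of the table (table name made up again, and this leans on OIDs having been handed out in roughly insertion order):

-- current OID range on the table
SELECT min(oid) AS from_oid, max(oid) AS max_oid FROM mytable;
-- e.g. if that returns 1634 and 2034, the spread is 400, so 25% of it
-- puts upto_oid at 1634 + 100 = 1734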

So now my customised Postgres works as such:
pg_dump from_oid 1634 upto_oid 1734
(internally)
COPY tablename TO filename WITH OIDS FROM_OID 1634 UPTO_OID 1734;


And if you really want to do some meaningful stuff, take a look at this dude's webpage. Thanks, Neil, the tips were really helpful.
