MySQL now supports NoSQL with Cluster 7.2 Release

February 16, 2012

In a brilliant move by Oracle, MySQL’s parent company, MySQL Cluster 7.2 was released yesterday with NoSQL support. This is a major development: many developers who previously had to rely on multiple vendors for applications using both SQL and NoSQL now have the option to stay with a single vendor. Technically, MySQL has had a NoSQL option for a few years through the third-party HandlerSocket plugin, but there was no official NoSQL option directly from MySQL until this release.

Summary of Key Enhancements

The MySQL Cluster 7.2 Development Milestone Release and latest labs.mysql.com builds deliver enhancements based on input from the community and customers, including support for the memcached NoSQL API, faster JOIN performance and simplified administration:

  • NoSQL with Memcached support enables users to extend memcached by deploying a scalable, persistent, highly available data-store supporting high volumes of reads and writes with real-time performance, all accessed via the trusted, proven and popular Memcached API.
  • Adaptive Query Localization delivers over 20x higher performance when executing complex queries.
  • Shared User Privilege Tables radically simplify the provisioning and administration of MySQL Cluster by consolidating previously distributed user privilege tables into the data nodes, making them accessible from all MySQL Servers.

In addition to memcached access, the engineering team also previewed JSON as an additional NoSQL interface to MySQL Cluster, allowing applications to directly query and modify the database, and return results directly to a browser, eliminating transformations to SQL. Expect to hear more about this shortly.

NoSQL Support:  Implementation of memcached Access to MySQL Cluster

The Memcached API adds a NoSQL access method to MySQL Cluster, which already includes C++ (NDB API), Java, JPA, LDAP and HTTP/REST APIs, all of which can be used concurrently with SQL to serve a broad range of web, telecoms and embedded use-cases handling the simplest to the most complex queries.

The Memcached API enables web services to directly access the MySQL Cluster database without transformations to SQL, ensuring low latency and high throughput for read/write operations.
 

Figure 1: Implementation of memcached Access to MySQL Cluster
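
To give a feel for what "no transformations to SQL" means on the wire, here is a minimal sketch of the standard memcached text protocol that a memcached-compatible endpoint speaks. The function names and the `user:1` key are made up for illustration, and the host and port in the commented usage line are assumptions (11211 is memcached's conventional default port), not details taken from the MySQL Cluster release.

```shell
#!/bin/sh
# Build a memcached text-protocol "set" request for a key/value pair.
# Wire format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
memcached_set_request() {
    key=$1
    value=$2
    # ${#value} is the character count, which equals the byte count
    # only for plain ASCII values.
    printf 'set %s 0 0 %d\r\n%s\r\n' "$key" "${#value}" "$value"
}

# Build a memcached text-protocol "get" request for a key.
memcached_get_request() {
    printf 'get %s\r\n' "$1"
}

# Hypothetical usage against a memcached-compatible server (assumes nc is installed):
# { memcached_set_request user:1 alice; memcached_get_request user:1; } | nc localhost 11211
```

The application only ever sends keys and values; how those map onto tables and columns is left to the server-side configuration.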

Links:

MySQL Cluster 7.2 Labs & Development Milestone Release – NoSQL with Memcached and 20x Higher JOIN Performance
http://dev.mysql.com/tech-resources/articles/mysql-cluster-labs-dev-milestone-release.html

70x Faster Joins with AQL now GA with MySQL Cluster 7.2
http://www.clusterdb.com/mysql-cluster/70x-faster-joins-with-aql-in-mysql-cluster-7-2/


IDT Partners:

IDT Partners is a New York City based Web Application Development and Technology Solutions firm providing solutions that leverage MySQL and the latest NoSQL technology to build highly scalable web applications and custom products.

Amazon DynamoDB Cost Examples ($32/mo, $400 / 4 Billion Writes)

February 15, 2012

Amazon DynamoDB seems to be very cost effective. Give it a shot if you still haven’t tried it.

If you are in New York City and would like to learn more about DynamoDB, RSVP for the DynamoDB Presentation at The New York PHP Meetup @ http://www.meetup.com/nycphp


Here are some numbers & cost examples:

A typical Web Application Example with a significant load @ $32/month

Customer use case with 100,000 writes/second over 4 hours; design to production in 3 days; 4 Billion Writes @ $400
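
As a quick sanity check on the second example, the unit cost implied by the two quoted totals can be worked out with a couple of lines of shell. This is a sketch that uses only the $400 and 4 billion figures from the caption above; it is not taken from Amazon's price list, and the function name is made up.

```shell
#!/bin/sh
# Cost in US cents per one million writes, given a total bill in whole
# dollars and a total number of writes (integer arithmetic only).
cents_per_million_writes() {
    total_dollars=$1
    total_writes=$2
    echo $(( $total_dollars * 100 * 1000000 / $total_writes ))
}

# Figures quoted above: $400 for 4 billion writes.
cents_per_million_writes 400 4000000000   # prints 10
```

In other words, the quoted totals work out to about 10 cents per million writes.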

Current Customer List (Just 1 Month after the release).

Amazon DynamoDB @ The New York PHP Meetup (March 6th, 2012)

February 14, 2012

Join us on Tuesday, March 6th, 2012 to learn about DynamoDB, Amazon’s NoSQL service, directly from Amazon. Siva Raghupathy, an Amazon Web Services (AWS) Enterprise Solutions Architect, will present DynamoDB to The New York PHP Meetup group.

RSVP @ http://www.meetup.com/nycphp


About Amazon DynamoDB

Amazon DynamoDB is a highly scalable, SSD-based, zero administration database service in the Amazon Web Services (AWS) Cloud. Amazon launched this service in January 2012.

The Amazon DynamoDB service is designed to address the core problems of database management, performance, scalability, and reliability. Developers can create a database table that can store and retrieve any amount of data, and serve any level of request traffic. DynamoDB automatically spreads the data and traffic for the table over a sufficient number of servers to handle the request capacity specified by the customer and the amount of data stored, while maintaining consistent, fast performance. All data items are stored on Solid State Drives (SSDs) and are automatically replicated across multiple Availability Zones in a Region to provide built-in high availability and data durability. According to Amazon, a “perpetual” free tier of the DynamoDB service with storage and a set number of reads and writes per month will be offered.


NoSQL Industry Outlook by GigaOM

Should NoSQL startups be afraid of DynamoDB? With AWS and DynamoDB, other NoSQL companies find themselves fighting for the websites and other web-based customers that are now their bread and butter. Sid Anand, who helped transition Netflix from Oracle to AWS’s SimpleDB to Cassandra and who now is on the LinkedIn infrastructure team, wrote on his blog earlier this week that “[i]f [your NoSQL database] is not hosted (e.g. by AWS), be prepared to hire a fleet of ops folks to support it yourself. If you don’t have the manpower, I recommend AWS’[s] DynamoDB.”


Amazon DynamoDB Presentation Overview:

  1. Motivation and Design Goals
  2. Data Model & API
  3. Characteristics
  4. Query and Update Patterns
  5. Best Practices
  6. PHP Demo!
  7. Q&A


Speaker Profile

Siva Raghupathy is an AWS Enterprise Solutions Architect. He guides developers and architects to build successful solutions using AWS.

As a Principal Technical Program Manager for Amazon SimpleDB, Siva Raghupathy gathered emerging NoSQL requirements and wrote the first version of the DynamoDB product specification. Later, he was a development manager for Amazon Relational Database Service (RDS) and drove several enhancements. Prior to joining Amazon, he spent several years at Microsoft, roughly half of that time building SQL Server and the other half guiding ISVs and developers to build successful enterprise software solutions.

Sponsors

IDT Partners is an NYC based Web Application Development & Technology Solutions firm specializing in enterprise-level web application development, custom product development, and technology solutions that help customers accelerate growth, capitalize on new market opportunities, and optimize operational efficiency.

ICS is the event venue sponsor. ICS provides streamlined staffing services that help clients focus on their core business competencies. Their mission is to deliver the best talent without compromising a company’s most precious asset — time.

Big Data for Investment Research Management (White Paper)

January 13, 2012



IDT Partners is pleased to publish Big Data for Investment Research Management, a white paper focused on a big-data-based Research Management System built by IDT Partners for a financial client. The IDT Partners solution helped the client achieve the following:

  • 8,000% Larger Dataset
  • 500% Faster Data Processing
  • 100% ROI within 12 months
  • 20% Increase in Operational Efficiency
  • Improved User Interface (UI) & Reporting
  • Automated Trend Spotting & Monitoring

Download the white paper to learn more about this solution:

        

For additional information or to learn more about IDT Partners’ capabilities, contact info@idtpartners.com.

IDT Partners is Hiring

November 21, 2011

IDT Partners is looking for talented people to join the team.

Our employees have done work for the world’s leading companies and have developed products, platforms, and solutions used by millions of people globally.

IDT Partners values entrepreneurship and collaboration in a fast-paced environment. We are a small, results-driven organization with a focus on empowering employees to do what they do best. We offer flexible work hours and a choice of working remotely or from our New York City office. IDT provides an environment that fosters individual development and facilitates collaboration and teamwork.

Current Job Openings:

MongoDB 2.0 Upgrade Guide & Profile Viewer Recommendation

October 4, 2011

MongoDB 2.0 turned stable last month, and earlier today I spent about 10 minutes upgrading a small Fedora cluster (one of the development environments). See the step-by-step guide below for upgrading a pre-1.8 version on Fedora, and check out all the new features at http://blog.mongodb.org/post/10126837729/mongodb-2-0-released.

Thanks to Mongo’s improved profiler, you can now also make more sense of the logs. I highly recommend using a profile viewer called Professor (written in Python and Flask by one of the 10gen developers).

Upgrade to MongoDB 2.0 in 9 steps

# 1. start mongo if it's not already running
/etc/init.d/mongod start

# 2. make a backup of the existing collections
#    (mongodump writes to ./dump in the current directory by default)
mongodump

# 3. stop mongo
/etc/init.d/mongod stop

# 4. remove the previous version
rpm -e mongodb mongodb-server

# 5. add one of the following yum repos
vi /etc/yum.repos.d/10gen.repo

	# 64-bit
	[10gen]
	name=10gen Repository
	baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
	gpgcheck=0
# OR
	# 32-bit
	[10gen]
	name=10gen Repository
	baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
	gpgcheck=0

# 6. update all packages and install the new version of MongoDB
yum update
yum install mongo-10gen
yum install mongo-10gen-server

# 7. start the mongo service
/etc/init.d/mongod start

# 8. restore from the dump
#    (run from the directory that contains ./dump from step 2)
mongorestore

# 9. confirm version 2.0 & restored backups via the mongo cli
mongo
# the next two lines are typed at the mongo shell prompt
db.version()
show dbs
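
Since step 4 removes the old packages, it may be worth confirming that the backup from step 2 actually exists before going any further. The helper below is a small sketch of such a guard; the function name is made up, and it only checks that at least one `.bson` file (the per-collection file format mongodump writes) is present, not that the dump is complete.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the given directory contains at least one
# .bson file; defaults to ./dump, the mongodump default output directory.
dump_has_data() {
    dump_dir=${1:-dump}
    [ -d "$dump_dir" ] || return 1
    [ -n "$(find "$dump_dir" -type f -name '*.bson' | head -n 1)" ]
}

# Example guard to run between steps 2 and 4:
# dump_has_data ./dump || { echo "no backup found, aborting" >&2; exit 1; }
```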

TCP-seg-offload-related network latency issue resolution between VM guest & host

September 6, 2011

A neat little Linux trick to resolve tcp-seg-offload-related network latency issues (especially for smb and rsync) between VM guest & host:

# set the tcp segmentation offload to off
sudo ethtool -K eth0 tso off
# verify that it’s off
sudo ethtool -k eth0
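
The verification step prints the full offload feature list, so it can be handy to pull out just the TSO line. Here is a minimal sketch that reads `ethtool -k` output on stdin; the function name is made up, and it assumes the feature is reported on a line starting with `tcp-segmentation-offload:`, which is how ethtool labels it.

```shell
#!/bin/sh
# Read `ethtool -k <iface>` output on stdin and print the TSO state
# ("on" or "off"). Prints nothing if the line is not present.
tso_state() {
    awk -F': *' '/^[ \t]*tcp-segmentation-offload:/ {
        # Drop any trailing annotation such as "[fixed]".
        split($2, parts, " ")
        print parts[1]
        exit
    }'
}

# Usage (same interface name as above):
# sudo ethtool -k eth0 | tso_state
```

Keep in mind that settings applied with `ethtool -K` generally do not survive a reboot unless they are also added to the distribution's network configuration.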

Further thoughts on Google+

July 11, 2011


As a follow-up to a previous post on Google+, here are my observations and thoughts on Google’s Facebook and Twitter problem, the impact of Google+ on Google’s Advertising Business and its Search Result Relevancy Algorithms, possible Usability Improvements to the Google+ Circles & Contact Management, the recent Google+ Glitches and SPAM issues, and existing Google Product Integration into Google+.

Google+ should Help Google Address its Facebook & Twitter Problem

Prior to Google+, Google faced serious competition for ad dollars and search from two rivals that had created walled gardens with massive amounts of social / user-generated content. Both Facebook and Twitter were growing like weeds, and Google was unable to index the user-generated content within these walled gardens in order to improve its search & ad businesses. Now, thanks to Google’s social efforts with Google+, the company will be able both to significantly improve its search advertising platform by leveraging Google+ user-generated content for better targeting and to provide more relevant, real-time search results by integrating a social layer into search and its search ranking algorithms.

At its core, Google+ is comparable to Facebook with a few important improvements. One such improvement in particular — the ability to add people to circles without their permission — will help Google address the Twitter problem. The ability to add anyone to your circles allows users to “follow” anyone just like on Twitter. This makes Twitter less useful given that you can now have nearly identical functionality within a Facebook-like social network.

Google+ renders Facebook and Twitter less useful by allowing people to do similar things plus a few more fun ones, such as Hangouts and Huddle, all in a single place. Provided that Google succeeds in making Google+ a hit (having a 200M+ Gmail user base should help with that), starts innovating at a rapid pace, and counteracts innovation from the competition in a timely fashion, both Facebook and Twitter should become less of a problem for the company long term.

Possible Improvements to the Circles & Contact Management

Google+ uses Gmail to help users populate their social graph. This is great for over 200 million Gmail users given that you don’t have to start from scratch. However, the social graph creation and maintenance process could be much easier to manage if Google+ supported nested circles and an ability to de-dupe contacts on the fly without having to manually go through all your contacts in Gmail. Otherwise, given a few thousand or even a few hundred contacts, it’s tough to first organize everyone into a single level of circles and then use these limited single-level groups to disseminate information to your social graph.

Moreover, it is quite frustrating to manage contacts with multiple records/email addresses, given that the “suggestions” box for circle contacts does not show the actual email address behind each record. As a result, when you add duplicate contacts, you may send multiple content emails or invites to the same person.

Comments on Recent Glitches & SPAM

In addition to the recent privacy issues, Google+ had an issue with SPAM yesterday: it sent multiple copies of the same invites due to a technical glitch. Google’s SVP of Social posted a note about it this weekend:

“Please accept our apologies for the spam we caused this afternoon. For about 80 minutes we ran out of disk space on the service that keeps track of notifications. Hence our system continued to try sending notifications. Over, and over again. Yikes.” - Vic Gundotra

The above technical glitch is to be expected since the product is still in beta and is not available to the general public. However, it’s also a bit odd given that Google obviously has quite a bit of expertise with systems monitoring and capacity planning. I’m guessing that Google+ significantly exceeded Google’s best-case scenario capacity plans due to huge demand for the new product and/or that the Google+ team wasn’t able to put proper capacity plans and systems monitoring in place due to a premature product release. The latter could possibly be attributed to the fact that under its new CEO, Larry Page, Google is now trying to bring new products to market much faster by releasing half-finished products into closed beta much earlier than it would have under Eric Schmidt.

Existing Google Product Integration into Google+

It’s exciting to see the various existing standalone Google products getting rolled up into Google+. Picasa and Blogger are getting phased out as standalone products and are getting integrated as Photos and Blogs within Google+. These product integrations should allow Google+ to better compete with Facebook Photos and WordPress / Tumblr.

Google is fighting a battle on quite a few fronts right now and it looks like Google+ is poised to help Google better position itself in the social networking space. The Google+ rollout is clearly a significant milestone for Google and I wish Vic Gundotra and the Google Social team success!

Start Your Own Circle of Trust

Google Finally Rolls Out Google+

June 28, 2011

Google’s answer to Facebook is finally getting rolled out today. Google+ looks promising — the UI is very well thought out and just feels right.

Though Google is downplaying this new product rollout and their marketing refers to Google+ simply as “a few new thoughts on sharing”, this is a major milestone for Google and part of their bigger strategy to combat Facebook. Specifically, Facebook passed Google in “time spent on site” for the first time ever in August 2010.


In less than a year since then, Facebook has gained momentum and left Google in the dust.

 

This is one of the reasons why Google+ needs to be a success for Google (unlike other half-baked products like Google Buzz). From what I’ve seen thus far, I think Google+ has a fighting chance against Facebook. It will be interesting to see how this plays out for the “time spent on site” comScore metric for the two sites in the next 12 months.

New Product Development Process — Building & Launching Products the Right Way

June 10, 2011

I’ve nearly always been involved in the product development process and have helped build and launch over 30 B2B and B2C products across half a dozen industries, first as a developer and eventually wearing the product development / management hat while serving as a director and then as a CTO.

Through the years, I’ve come across a number of organizations that have tried to bring new products to market without following any standard product development practices & processes. Unsurprisingly, more than 65% of all new products are commercial failures.

For example, some organizations are driven by a belief that if a handful of customers are asking for a new product, or if a competitor has a new product that’s well received by its customers, then a comparable product/service/feature should just be built after some minimal due diligence & internal checks. This often leads to products that may be well received by clients but have unsustainable margins, cannibalize existing services & offerings, and deliver poor overall ROI.

Another example might be organizations that get carried away with the “big idea”, where one or more senior executives have a grand product vision and do not follow any standard product development processes. For example, an organization may fail to solicit customer feedback during the concept development and testing phase based on a very strong belief in their big idea and some positive internal feedback. This often results in customers finding severe issues with the product at launch and may cause irreparable damage to the company and/or product’s reputation.

The following is an 8-step New Product Development Process Guideline that has worked well for me in the past and has been used by numerous organizations to successfully launch new products.

New Product Development Process

  1. Ideation
    Idea generation.
  2. Screening
    Idea screening to eliminate unsound concepts.
  3. Concept Development & Testing
    Develop marketing & engineering details, collect customer feedback.
  4. Business Analysis
    Customer feedback on pricing, sales & profitability estimation.
  5. Market & Beta Testing
    Prototype, beta & customer acceptance testing.
  6. Technical Implementation
    Plan & build.
  7. Commercialization
    Launch & market.
  8. Product Pricing
    Assess portfolio impact, internal/external value & cost analysis, financial forecasting.

The above steps are a guideline. Each step should be expanded to encompass sub-steps as required for specific products by your product organization.

Additional Notes:

Product Development / Management @ Google

See the Google Product Development / Management Process presentation by Marissa Mayer. Though this presentation was given in 2003, it is still relevant today and has a number of good ideas for any product organization. Quote: “Formula: Smart people + creative environment + outlet for ideas = innovation”.

Book Recommendation

For more information on the subject, I highly recommend a great book by Karl Ulrich and Steven Eppinger titled Product Design and Development.