GigaSpaces owns an amazing set of technologies. They have created a general-purpose, infinitely scalable application platform. This is an incredibly difficult problem to solve, but they have managed to solve it. In other words, they provide Google-like technology for the rest of us. Here is an exclusive interview with Geva Perry, Chief Marketing Officer at GigaSpaces.
With so many technologies out there what does GigaSpaces offer?
The GigaSpaces eXtreme Application Platform (XAP) is a general-purpose application platform (what some people call infrastructure software or middleware) that enables linear scalability (horizontal scaling) across many machines and lets you parallelize your applications even when they are stateful (transactional or data-intensive). It can handle a variety of application scenarios, including those that need high performance (low latency and/or high throughput). These applications range from what Gartner calls Extreme Transaction Processing (XTP, the next generation of OLTP, following mainframes, TPMs and J2EE app servers) to real-time analytics, demanding web apps, and service-oriented and event-driven environments.
What is unique about GigaSpaces XAP is that it is a single platform that handles the business logic, the data and the messaging/event processing. Unlike a typical n-tier architecture, where you require three different tiers (and usually three different products) to build your application, with XAP you can create a unique architecture in which the tiers are collapsed and multiple services that make up an application are co-located. In essence, you create a mini-application that can complete the transaction or analytical process end-to-end all within the same process and using a shared memory address space. We call this mini-application a Processing Unit. This approach offers many advantages:
§ Low-Latency: Because the entire transaction is completed within the same process, no network hops across the tiers are required. You don’t need to write transaction state in the database (and deal with OR Mapping, DB contention and so on) – everything is done in memory, locally.
§ Linear Scalability: Because each of the Processing Units is completely self-sufficient, scaling is unencumbered. When my data or transaction volumes grow, I simply add more Processing Units (either within the same big box, or across additional commodity boxes) – without changing a single line of code. There is no dependency among these Processing Units – each completes the transaction from start-to-finish, so scalability is linear.
§ Eliminate Complexity: If you think about the N-Tier Architecture most people are accustomed to, there is a lot of complexity there in using multiple tiers and multiple products. Besides the integration, let’s take an issue such as maintaining high-availability. Each tier/product uses its own model for it. Some may use hardware, some use software clustering models. You now have to maintain a separate HA model for each tier. The same goes for scalability. You end up with a very complex many-to-many-to-many picture that looks like the spaghetti diagram below (note that the “V”, “C”, and “E” represent three services that comprise my application Verify, Check, and Execute – just an example).
In our approach, which we call Space-Based Architecture (more on that later), all of this complexity is eliminated. The Processing Unit, including multiple services, the data and the messages/events, is scaled and kept reliable using a single model. So the end result is something that is very fast and very clean that looks like this:
Finally, the processing unit is managed by something we call an SLA-Driven Container. The container manages the high-availability and scalability policies (in short, clustering) you set dynamically. It is an extremely powerful and elegant model.
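The collapsed-tier idea described above can be illustrated with a minimal sketch. This is not GigaSpaces code: the `ProcessingUnit` and `Cluster` classes, the account-balance data and the hash-based routing are all assumptions made for illustration. It shows the Verify, Check and Execute services from the example running in one process next to the data they need, with each order routed to the one unit that owns its account.

```python
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class ProcessingUnit:
    """Hypothetical stand-in for a Processing Unit: state and logic live together."""
    unit_id: int
    balances: dict = field(default_factory=dict)  # in-memory state owned by this unit

    # The three example services (Verify, Check, Execute) run in the same process.
    def verify(self, order):
        return order["amount"] > 0

    def check(self, order):
        return self.balances.get(order["account"], 0) >= order["amount"]

    def execute(self, order):
        self.balances[order["account"]] -= order["amount"]
        return self.balances[order["account"]]

    def process(self, order):
        """Complete the whole transaction locally, with no call to another tier."""
        if not (self.verify(order) and self.check(order)):
            return None
        return self.execute(order)


class Cluster:
    """Routes each order to the single unit that owns its account."""

    def __init__(self, unit_count):
        self.units = [ProcessingUnit(i) for i in range(unit_count)]

    def unit_for(self, account):
        # Stable hash so the same account always lands on the same unit.
        return self.units[crc32(account.encode()) % len(self.units)]

    def deposit(self, account, amount):
        unit = self.unit_for(account)
        unit.balances[account] = unit.balances.get(account, 0) + amount

    def submit(self, order):
        return self.unit_for(order["account"]).process(order)


cluster = Cluster(unit_count=4)
cluster.deposit("acct-1", 100)
result = cluster.submit({"account": "acct-1", "amount": 30})     # accepted
rejected = cluster.submit({"account": "acct-1", "amount": 500})  # fails the balance check
```

Because no unit ever reads another unit's state, adding capacity in this sketch only means creating more units. Note that changing `unit_count` on a live cluster would remap accounts; a real platform has to handle that rebalancing, which is omitted here.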
What we are seeing is that more and more people are moving to an approach similar to ours: the use of in-memory partitioned data stores, as opposed to databases, for in-flight transaction state management; the collapsing of the tiers; and so on. Google, Amazon, eBay, MySpace and others have all built these systems for themselves because they realized the old approach doesn’t scale. Most people cannot afford to build such a complex infrastructure by themselves; that’s what GigaSpaces is for: Google-like scalability for the rest of us.
Also, the big guys built their systems to fit their own very unique needs and infrastructures. GigaSpaces was made to be used by everyone and works in virtually any environment.
Finally, although the architecture we use is unique, the programming model is not. We have taken great care and put in a lot of effort to abstract the underlying sophistication of the run-time environment from the programming model. Developers can use GigaSpaces XAP with familiar APIs, such as JDBC, JMS, Map/JCache and, most importantly, a POJO model using our Spring support – a framework we call OpenSpaces.
There is a “secret sauce” if you will that allows us to provide all of these capabilities, and that is the tuple space paradigm, and specifically the Sun specification for it in Java called JavaSpaces (now under the Apache River project). The space-model is unique in that it essentially handles data, events and business logic coordination (both parallelization as well as workflow) – as one.
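For readers new to the tuple space paradigm, here is a toy sketch of its three core operations: write, read and take. It is an illustration only, not the JavaSpaces API; the `TupleSpace` class, the use of plain tuples as entries and the `None` wildcard convention are assumptions chosen to keep the example short.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space; not the JavaSpaces API, just the same idea."""

    def __init__(self):
        self._entries = []
        self._cond = threading.Condition()

    def write(self, entry):
        with self._cond:
            self._entries.append(entry)
            self._cond.notify_all()  # wake any reader blocked on a match

    def _find(self, template):
        # None in a template field is a wildcard; other fields must match exactly.
        for entry in self._entries:
            if len(entry) == len(template) and all(
                t is None or t == e for t, e in zip(template, entry)
            ):
                return entry
        return None

    def read(self, template, timeout=None):
        """Return a matching entry without removing it, or None on timeout."""
        with self._cond:
            self._cond.wait_for(lambda: self._find(template) is not None, timeout)
            return self._find(template)

    def take(self, template, timeout=None):
        """Return a matching entry and remove it, or None on timeout."""
        with self._cond:
            self._cond.wait_for(lambda: self._find(template) is not None, timeout)
            entry = self._find(template)
            if entry is not None:
                self._entries.remove(entry)
            return entry


space = TupleSpace()
space.write(("order", 42, "NEW"))
peeked = space.read(("order", None, "NEW"), timeout=0)  # entry stays in the space
taken = space.take(("order", None, None), timeout=0)    # entry is removed
gone = space.take(("order", None, None), timeout=0)     # nothing left to match
```

The same structure serves as a data store (entries stay until taken), a messaging channel (a blocked `take` wakes up when a matching `write` arrives) and a work queue (each entry is taken by exactly one worker), which is the sense in which the space model handles data, events and coordination as one.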
To reiterate, GigaSpaces is the next-generation application platform for scale-out environments.
What are some areas in which GigaSpaces is heavily used?
When I joined GigaSpaces in the beginning of 2004 and we started selling in earnest, we went to the guys with the most pain. At the time that was Wall Street. It was undergoing many changes, such as the move to automated trading where you basically have thousands of machines buying and selling securities from and to other machines. In this environment, latency becomes the number one issue and they needed to shave off every millisecond – if not microsecond – from the process. They also needed scalability because data and trade volumes are growing exponentially. So we’ve had a lot of success there and our customers include six of the top ten U.S. investment banks, as well as large exchanges such as NYSE and CME, and large European banks such as Monte Paschi, Societe Generale, Commerzbank and others.
But the same challenges faced by the securities industry were quickly spreading to telcos (particularly wireless carriers), federal government agencies, retail, manufacturing, online gaming, travel & logistics, pharma – and now are widely prevalent in web applications, particularly Web 2.0, SaaS, utility computing and so on. So we have a wide range of customers today including Virgin Mobile, the Gallup Organization, Hutchison 3G, British American Tobacco, online ad distributors and a long list of ISVs – just to name a few.
What is the difference between your technology and, say, a database, filesystem, or messaging service?
GigaSpaces is particularly effective at real-time transactions and data analysis in a streaming manner. Because GigaSpaces XAP uses an in-memory solution, I wouldn’t recommend it for storing very large volumes of long-lived data, or very large files. Databases and file systems respectively are best suited for that. Although databases are not going away any time soon as archiving and data warehousing technologies, we certainly do think, and many agree with us, that databases are misused as the system-of-record for in-flight transactional and analytical data – or state. They are just too slow, don’t scale and are generally too expensive to manage and maintain in today’s environment.
One thing that is unique about GigaSpaces is that we play nice with persistent systems. As I show in the diagram above, you can still have a persistent store (RDBMS or file system) in your environment for archiving and audit trails. What’s more, GigaSpaces will take care of mapping your objects and placing them in the database – as a background process that does not slow down the transactional or analytical process you are running. We call this feature Persistence as a Service (PaaS) – it’s something the developer doesn’t need to worry about. You can read more about it here.
We also realize that many of our customers have legacy systems that are using a database and cannot be changed to work through the GigaSpaces environment. That’s fine. We have a feature called the Mirror Service – which continuously synchronizes data from the “space” into the database (or file system) and vice versa.
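The idea of persisting in the background, without slowing the in-memory path, is often called write-behind (that label is ours, not a GigaSpaces term). The sketch below is a rough illustration under several assumptions: the `WriteBehindStore` class, the `kv` table and the use of SQLite are all hypothetical, and it only covers the memory-to-database direction, not object mapping or synchronization from the database back into memory.

```python
import os
import queue
import sqlite3
import tempfile
import threading


class WriteBehindStore:
    """In-memory store that persists changes to a database on a background thread."""

    def __init__(self, db_path):
        self._data = {}
        self._pending = queue.Queue()
        self._db_path = db_path
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def put(self, key, value):
        self._data[key] = value          # fast path: memory only
        self._pending.put((key, value))  # persistence is deferred

    def get(self, key):
        return self._data.get(key)

    def _drain(self):
        # sqlite3 connections are bound to the creating thread by default,
        # so the worker opens its own connection.
        db = sqlite3.connect(self._db_path)
        db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        db.commit()
        while True:
            item = self._pending.get()
            if item is None:             # sentinel: shut down the worker
                self._pending.task_done()
                break
            db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", item)
            db.commit()
            self._pending.task_done()
        db.close()

    def flush(self):
        """Block until every queued change has been written to the database."""
        self._pending.join()

    def close(self):
        self._pending.put(None)
        self._worker.join()


path = os.path.join(tempfile.mkdtemp(), "audit.db")
store = WriteBehindStore(path)
store.put("trade-1", "FILLED")
in_memory = store.get("trade-1")  # available immediately, before any disk write
store.flush()
store.close()

check = sqlite3.connect(path)
persisted = check.execute("SELECT v FROM kv WHERE k = ?", ("trade-1",)).fetchone()[0]
check.close()
```

The trade-off in any design like this is that changes still sitting in the queue are lost if the process dies before they are written, so a production system needs replication or a durable log in front of the database.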
How is this different than MapReduce? Is MapReduce a competing technology?
What MapReduce does is a small sub-set of what GigaSpaces XAP can do. To put it simply, MapReduce takes a large task, breaks it down into sub-tasks, processes the sub-tasks in parallel across several machines and then aggregates the results. This is something that GigaSpaces XAP can definitely provide. However, we also deal with issues such as keeping the data in-memory and partitioned (I believe that Google has a similar approach for a disk-based file system they call BigTable); message and event handling; dynamic SLA management; transactional semantics; and on and on. And we do all of this with standard APIs (which I mentioned above).
We don’t consider MapReduce a competing technology. It addresses a different problem which is processing large files in what I would consider more of a batch mode than a real-time, streaming mode, which is what GigaSpaces is focused on.
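The split, process-in-parallel and aggregate pattern described in this answer can be sketched in a few lines. This is a simplified illustration, not Google's implementation or anything from GigaSpaces: the word-count task is an assumed example, and a local thread pool stands in for the several machines a real deployment would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(lines, parts):
    """Break the task into roughly equal sub-tasks."""
    return [lines[i::parts] for i in range(parts)]


def count_words(chunk):
    """Map step: process one sub-task independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def map_reduce(lines, workers=4):
    chunks = split(lines, workers)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    # Reduce step: merge the partial results into one answer.
    return reduce(lambda a, b: a + b, partials, Counter())


totals = map_reduce(["buy IBM", "sell IBM", "buy MSFT"], workers=2)
```

Note that the whole input must be available before the job starts and the answer only exists once every sub-task has finished, which matches the batch character of MapReduce described above.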
Does GigaSpaces work with Amazon’s EC2 and S3 services?
Yes, it does. As a matter of fact we have made a GigaSpaces Amazon Machine Image (AMI) publicly available on the Amazon web site. You can find it here, and a white paper that describes the solution here. We believe that running an application based on GigaSpaces on the Amazon EC2 service makes a lot of sense. It is really the natural home for something like GigaSpaces, and vice versa – an application built with the parallel, dynamic approach of GigaSpaces can truly take advantage of the economic flexibility of a service such as Amazon EC2.
What is the scalability of the GigaSpaces platform?
As I already mentioned, GigaSpaces XAP provides for linear scalability. Simply put, if I can process 10,000 transactions on one machine (for example), I can process 20,000 transactions with two machines and so on. There is no theoretical limit to the scalability because there are no diminishing returns. We have basically overcome Amdahl’s Law by using a Shared-Nothing approach.
The largest system we’ve ever tested had 2048 CPUs with over 2 TB of data in-memory (RAM) at any given time. This was real-time streaming data. Of course, we could have gone larger, but we ran out of hardware to test.
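The arithmetic behind the linear-scalability claim can be made concrete with Amdahl's Law, which gives the speedup on n machines as 1 / (s + (1 - s) / n), where s is the fraction of the work that cannot be parallelized. The sketch below is only a calculation aid; the function names and the 5% figure are assumptions for illustration, not measured GigaSpaces numbers.

```python
def amdahl_speedup(machines, serial_fraction):
    """Speedup over one machine when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / machines)


def throughput(machines, per_machine_tps, serial_fraction=0.0):
    """Total transactions processed, given what a single machine can handle."""
    return per_machine_tps * amdahl_speedup(machines, serial_fraction)


# Shared-nothing case: the serial fraction is assumed to be 0, so scaling is linear.
ideal = throughput(2, 10_000)

# With even 5% shared work, speedup stays below 1 / 0.05 = 20 however many machines are added.
capped = amdahl_speedup(2048, 0.05)
```

Seen this way, a shared-nothing design does not contradict Amdahl's Law so much as drive the serial fraction toward zero, which is the one case where the law itself predicts linear speedup.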
What programming languages does GigaSpaces support? I heard it was mainly for Java and .NET.
GigaSpaces has very powerful support for Java (including the Spring Framework and POJOs), .NET (specifically C#) and C++. One of the nice things about it is that all of these languages are handled in that same single run-time environment. I basically get complete interoperability for “free” – I can have one process (or service in SOA terms) write a Java object and a .NET service read it as C# – completely transparent to the developer.
Also, GigaSpaces is completely platform agnostic. It can run in an entirely heterogeneous environment with a mix of virtually any hardware and operating system combination.
How much does it cost?
We are in the process of re-structuring our pricing model at the moment so I would rather not get into it right now. However, I will say this. We are very interested in getting start-ups and smaller companies on board. The whole point about GigaSpaces is scalability. This means that when you build your application on GigaSpaces to handle your 1,000 users today, you won’t have to change anything in your application when you have a million users later.
With that in mind, we are making available various special programs. For one, we have a Community Edition, which is freely available (download here) and can be used for everything including production. It is limited to one server-side node, but is a great way to start at NO COST, and then grow to the full-blown product when necessary.
We are also launching a Start-Up Program in October. Companies (or individuals) with less than $5 million in revenue can use our full-blown product for everything, including production, at no cost. Once it’s running, it will be available at this URL: http://gigaspaces.com/startup.
What is next?
A big direction for us going forward is working with the presentation tier, things like Adobe Flash/Flex, AJAX, Microsoft Silverlight and various web servers. We are working very closely with Microsoft and created a joint solution around MS Excel – a product that is widely used but was never meant to deal with the scalable systems it does today. So GigaSpaces acts as the back-end and Excel is the front-end. You can read our joint white paper about it here.
Is there anything you would like to add?
Just that I would encourage all of your readers to download the product and play around with it. It is available here. We have a great Getting Started guide with very easy to follow tutorials, screencast walkthroughs, and some really good white papers for those not quite ready to get their hands dirty.