Terracotta provides a world-class JVM clustering solution. It gives Java developers a way to quickly scale their existing applications from a single JVM to many JVMs with minimal or no code changes. Here is an exclusive interview with Ari Zilka, Founder and Chief Technology Officer at Terracotta.
With so many technologies out there what does Terracotta offer?
Terracotta gives developers and operations a very simple way to get high availability (HA) and scale, reduce the load on databases, cluster without a lot of custom code, and reduce development costs.
From a technical perspective, Terracotta works with existing applications to make critical parts of memory / heap durable and shared. It is infrastructure meant to make the Java runtime more scalable and highly available. Developers are coming to rely on this infrastructure approach because it dramatically reduces the development time required to scale applications. Instead of O/R-mapping Java memory to the database on every change to memory, developers use Java objects that they treat as only in memory. Then Terracotta makes memory durable. This is done using what we call Network Attached Memory (NAM).
Unlike a database acting as the storage center for an application cluster, Terracotta’s NAM approach is purpose-built to deal with the many fine-grained updates that Java programs tend to produce. The analogy is to how Network Attached Storage (NAS) works: the application writes to what seems like a local file but which is actually a network resource. With NAM, the same applies to those portions of the application’s Java heap that have been marked to be shared or coordinated using Terracotta. Pieces of the application’s memory can be selectively shared between application servers, and locks and methods can be extended across the whole cluster as well. One user pointed out that Terracotta is like a form of virtualization in reverse: rather than slicing one node up to run many different operating systems, it makes many nodes, which you are already running for availability and scalability, appear as one to the application. And the application can be written in standard Java, with no Terracotta-specific API needed to get these capabilities.
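The “standard Java, no Terracotta-specific API” point can be sketched with a plain POJO. The class below is my own illustration, not from the interview: it uses only ordinary synchronized methods, and the idea is that an external Terracotta configuration file (not shown) would declare an instance of this class a shared root, turning the monitor into a cluster-wide lock and replicating field changes. On a single JVM it is just plain Java:

```java
// Plain Java, no Terracotta imports. Under Terracotta, external config
// would mark an instance of this class as a shared "root"; the
// synchronized methods would then act as cluster-wide locks, and only
// the changed field (a delta) would cross the network on each update.
public class HitCounter {
    private long hits; // in a cluster, changes to this field replicate

    public synchronized void increment() {
        hits++;
    }

    public synchronized long get() {
        return hits;
    }

    public static void main(String[] args) {
        HitCounter counter = new HitCounter();
        counter.increment();
        counter.increment();
        System.out.println(counter.get()); // prints 2
    }
}
```

The class compiles and runs without any clustering library present, which is the point: the clustering concern lives in configuration, not in the application code.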
In a nutshell, Terracotta provides a run-time infrastructure that clusters at the level of the JVM, which can be used to distribute workloads, caches, and applications with minimal change. In other words, distributed caches, grids, and middleware can be built using Terracotta technology.
What are some areas in which Terracotta is heavily used?
In terms of use cases, Terracotta is used most frequently for distributed caches, accelerating Hibernate apps, POJO clustering, high performance HTTP session replication, and clustering Open Source frameworks and products like Spring, Wicket, RIFE, and EHcache.
Right now what we see a lot of people doing is using Terracotta to reduce the load on their databases: they store temporary application state, like where a customer is in a web-based order workflow, in Terracotta, and use the database for data related to what we call “completed workflows,” like saving a customer order once it is complete. This “database deflection” capability has been really popular of late, with a number of customers using it to break out of some nasty database scalability bottlenecks. In many cases they’ve saved a bundle of cash too, by avoiding expensive purchases of database license upgrades and bigger boxes to run an expanded database.
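The database-deflection pattern can be sketched in plain Java. The class and method names below are my own illustration, assuming a setup where in-flight workflow state lives in an in-memory map (which Terracotta would make shared and durable across the cluster) and only completed work is persisted to the database:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "database deflection": in-flight workflow steps live in a
// plain in-memory map rather than the database. With Terracotta, this
// map could be a shared root visible to every app server; here it is
// a local map, which is enough to show the shape of the pattern.
public class OrderWorkflow {
    private final Map<String, String> inFlight = new ConcurrentHashMap<>();

    // Each step is a cheap in-memory write, not a database round trip.
    public void advance(String orderId, String step) {
        inFlight.put(orderId, step);
    }

    public String currentStep(String orderId) {
        return inFlight.get(orderId);
    }

    // Only on completion would the application write to the RDBMS.
    public String complete(String orderId) {
        return inFlight.remove(orderId);
    }
}
```

The database then sees one write per completed order instead of one per workflow step, which is where the utilization drop comes from.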
Industries that are emerging as heavy users include telecommunications, gaming, financial services and e-commerce.
Has going open-source affected your business model?
Terracotta is infrastructure technology. Our approach to building and running Java applications is tried and tested at very large scale on the Web, but we needed to enable trial so that the approach could be vetted by the larger Java community. We knew that going open source would trigger the large-scale trial and feedback loop that would then lead to tremendous adoption of the product, and this strategic assessment has turned out to be true, well beyond our expectations.
We have many thousands of visitors every week now, our forum posts have doubled just in the last couple of months, and we have many deployed customers. The exact customer count is hard to say given that some deploy using community support and without purchasing an Enterprise Subscription, but we estimate the figure to be around 100 – and that number is moving upward fast. Of this number, the “take rate” on the Enterprise Subscription has been very strong as well.
We’ve also fostered the growth of a very vibrant community, so much so that community members outside the company are now helping other people deploy Terracotta through our forums and mail lists. We have a number of partnerships with other fantastic open source communities as well, like Liferay, Jetty, Tomcat, Geronimo, and GlassFish. Excited developers who have used our infrastructure approach to HA and scale are coming forth every week, asking to sign up as contributors as well.
Having this community, and having a great product easily available for full-featured production use, has allowed us to dramatically reduce our cost of sales as well. Customers now approach us when they are ready to go into production and would like to consider our Enterprise Subscription. We don’t waste everyone’s time knocking on doors asking if there is a need for our technology. Our users find our solutions on their own, evaluate in a safe and fast community-supported environment (of course we are there to help everyone) and they can approach us when they are integrated and satisfied. So yes, it has changed the business model – dramatically, and for the better.
Does your technology compete with Sun Microsystems’ MVM initiative?
As far as I know, MVM is a research project and not a shipping product. It’s an interesting way to efficiently run multiple applications within a JVM, but it doesn’t have the Network-attached memory capability that allows state to be shared between JVMs.
What is the difference between your technology and grid-computing?
Terracotta is in many ways more of the fundamental type of IT infrastructure, like the RDBMS or NAS, that you can use with various algorithmic approaches – so in essence Terracotta can be part of a grid computing solution, but it is not restricted to that use case, nor does using Terracotta exclude the use of various grid computing tools. Some of the grid framework teams are in fact choosing to focus on the API and to partner with Terracotta to provide the data sharing and work coordination infrastructure they need.
What is the scalability of the Terracotta technology?
Terracotta is highly scalable because the Network Attached Memory approach allows us to observe and propagate fine-grained changes to Java objects. For instance, if a service changes 4 bytes in a user profile, that’s all that needs to move around the network to other nodes. There is no dependence on serialization, where you’re pretty much saving whole copies of objects even if only a tiny fraction of the object changes, and then sending around a lot more information than just the changes you’re really concerned about. Also, only the nodes that are currently referencing the object actually get the changes. So essentially, only the fine-grained changes traverse the network, and they only go where they’re needed. If another node references the object in question, Terracotta will pass it the object in a just-in-time manner. Note, I said it passes the object, and not a copy of the object.
With Terracotta, object identity is preserved across the cluster, so there is no need to reconcile a number of copies that might be running around. By maintaining object identity, Terracotta allows application logic to hold on to references (no more copy-on-read / deserialization semantics). This allows object reads to be served from the local JVM’s memory, with no network call and no latency – you get automatic locality essentially.
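The identity point can be illustrated in plain single-JVM Java (the UserProfile class below is my own illustration, not Terracotta code): two references to one object observe a field change with no copying and no deserialization. Terracotta’s claim is that it extends exactly this semantic across JVMs, shipping only the changed field rather than a serialized copy:

```java
// Plain Java demonstration of object identity: there is one object,
// so a change made through one reference is visible through the other
// with no serialization and no copy. Terracotta extends this semantic
// across JVMs, moving only the changed field (here, one int).
public class UserProfile {
    private int loginCount;

    public synchronized void recordLogin() { loginCount++; }
    public synchronized int getLoginCount() { return loginCount; }

    public static void main(String[] args) {
        UserProfile shared = new UserProfile();
        UserProfile sameObject = shared; // a reference, not a copy
        shared.recordLogin();
        // The read is served from local memory through either reference.
        System.out.println(sameObject.getLoginCount()); // prints 1
    }
}
```

Contrast this with serialization-based replication, where each node holds its own deserialized copy and the copies must be reconciled after every update.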
From a numbers perspective, our users have reported large gains in scalability due to this approach. Those with clustered applications are able to remove previous architectures’ network bottlenecks and transition from handling hundreds of application requests per second to handling thousands on the same hardware. But the most dramatic scalability impact is for applications that previously relied on a database as a central storage point for stateless applications. In most cases, database utilization is reduced from the 80% range to the 10% range while CPU utilization in the Java servers drops from 50% to more like 10% as well. This means an existing application cluster can be driven much harder before buying more hardware for the database and app servers.
We have a very complete discussion of the scalability advantages Terracotta offers here.
To avoid single-point of failure have you considered using peer-to-peer (P2P) technology?
We don’t believe that peer-to-peer is for server-side application development. P2P is for broadcasting to many nodes simultaneously. If you get clustered application design correct, you will end up with nearly zero broadcasting, and hence nearly zero bottlenecking. Locality is the key to proper clustered behavior, not broadcasting. As I see it, locality and P2P are in opposition since locality dictates that data should stay close to the processing context in a particular JVM whereas broadcasting implies that you have failed to achieve locality and instead every piece of data is on every node. With Terracotta, data is copied to other Terracotta servers in our cluster for availability, as opposed to scalability.
We have 2 or more copies of the data in our server cluster, but the number of copies is tightly controlled. As a result, our approach doesn’t present a single point of failure, since each of the Terracotta servers can itself be made highly available, either using shared disk or TCP/IP.
There is another point to be made about P2P systems that scale. Such architectures tend to ask developers to partition data themselves so that locality can be preserved, whereas Terracotta delivers locality out of the box. Developers achieve locality with P2P systems through some scheme where all writes to an object are sent to a particular node in the cluster. The Terracotta server approach spares application servers from retaining a registry of where each object is homed and from updating objects in various places based on that registry. Also, having a central, highly available Terracotta server cluster means that we can keep track of all the application servers’ heaps, tracking which objects are resident on which nodes, so order-N network conversations need not occur, even though Terracotta can share data between an arbitrary number of JVMs. This approach decouples operation execution time from the number of nodes in the application server cluster, which forms a firm foundation for scalability as demand on an application grows.
How much does it cost?
The product is available for free unlimited use, and if you redistribute it as part of a derivative product, all you have to do is acknowledge us. We have seen our subscription pricing work out to anywhere from $4,000 per node to $10,000 per node, depending on the level of support and the application functionality involved.
What is next?
We’re planning to add some exciting new capabilities in the coming months: a clustered lock analyzer; some really cool cluster visualization capabilities, so application teams will soon be able to visualize the relative amounts of data moving around the cluster under various load balancing strategies, and tune GC across a cluster, not just on a single node; and new operational tools for monitoring and managing large-scale applications running on top of Terracotta. Stay tuned.
Is there anything you would like to add?
It’s been a pleasure to speak with you Jesse and thanks for your interest in Terracotta. Good luck with the FishTrain concept.
One reply on “Interview with Terracotta”
It is worthwhile to point out that not all P2P solutions require a “registry” where the location of objects is kept.
Coherence is pure P2P (no registry) and every node knows where every object is on the grid, without a lookup. This removes a potential bottleneck.
Cluster-wide locking is also a potential bottleneck, and here Coherence at least offers the notion of “lock-free computing.” If your application uses global locks (which then become cluster-wide locks), it will scale very poorly with Coherence.
However, if the programming paradigm is changed to use EntryProcessors (Coherence’s recommended approach), this restriction is removed.
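The EntryProcessor idea (the mutation is a small function applied atomically to a single entry where it lives, rather than the caller taking a global lock) has a close analogy in standard Java. The sketch below is not Coherence’s API; it uses `ConcurrentHashMap.compute`, which similarly performs an atomic per-entry update with no global lock:

```java
import java.util.concurrent.ConcurrentHashMap;

// Not Coherence code: a plain-Java analogy for the EntryProcessor
// pattern. The update is expressed as a function applied atomically to
// one entry, so no cluster-wide (here, map-wide) lock is ever taken.
// In Coherence, the processor additionally runs on the node that owns
// the entry, preserving locality.
public class LockFreeUpdate {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
        cache.put("balance", 100);

        // Atomic per-entry update, analogous to invoking a processor
        // against a single cache key.
        cache.compute("balance", (key, value) -> value + 25);

        System.out.println(cache.get("balance")); // prints 125
    }
}
```

The design point is the same in both systems: shipping the operation to the data keeps contention scoped to one entry instead of serializing the whole cluster behind a global lock.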
I think that global distributed locks, in any implementation no matter how well-executed, in the end will prove to be performance bottlenecks.
For many use cases, however (which are read-mostly), concurrency and high-performance lock-free computing are simply not an issue.