Random Notes: August 2014

Servers are becoming more and more powerful, with large amounts of RAM (hundreds to thousands of gigabytes). However, Java applications still cannot use most of this capacity directly because of inherent limitations of the JVM's garbage collector (GC), which may pause the application for a long time (even several minutes) while it moves objects between generations.

What follows is a description and comparison of some solutions to this problem (some of them known as data grids), such as JBoss's Infinispan project (formerly JBoss Cache), DirectMemory (an Apache proposal) and Terracotta's EhCache.

Caches

1. Infinispan (JBoss Data Grid Platform)

Does not support expiration events, as discussed in the forum.
SingleFileCacheStore: a file-based cache store that manages data activation (loading entries from the store into the cache) and passivation (saving entries from the cache to the store).
List of possible attributes in the XML configuration for Infinispan 4.0 and Infinispan 6.0.
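To make activation and passivation concrete, here is a minimal JDK-only sketch of the idea; it does not use Infinispan's API. The hypothetical `PassivatingCache` keeps at most `maxInMemory` entries in an LRU map, saves an evicted entry to the store (passivation), and on a memory miss moves the entry back from the store into memory (activation). A plain `Map` stands in for the file-based store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-tier cache illustrating passivation/activation (not Infinispan's API).
// Null values are not supported.
class PassivatingCache<K, V> {
    private final Map<K, V> store;            // stands in for a file-based cache store
    private final LinkedHashMap<K, V> memory; // in-memory tier, kept in LRU order

    PassivatingCache(int maxInMemory, Map<K, V> store) {
        this.store = store;
        // accessOrder = true makes iteration order least-recently-used first
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxInMemory) {
                    // Passivation: the evicted entry is saved to the store
                    PassivatingCache.this.store.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        store.remove(key); // any stored copy is now stale
        memory.put(key, value);
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null && store.containsKey(key)) {
            // Activation: move the entry from the store back into memory
            value = store.remove(key);
            memory.put(key, value);
        }
        return value;
    }

    int inMemorySize() {
        return memory.size();
    }
}
```

For example, with `maxInMemory = 2`, putting `a`, `b` and `c` passivates `a`; a later `get("a")` activates it, removes it from the store, and passivates `b` in turn.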

2. MapDB

Exists only in embedded mode.
Supports on-heap and off-heap collections (maps, queues), as well as file-backed collections.
Listeners registered for cache events are notified synchronously, in the calling thread (i.e. asynchronous notification has to be implemented by the listener itself).
Can be used for lazy loading (e.g. Lazily_Loaded_Records.java).
Provides a way to pump all the data held in memory to disk (e.g. Pump_InMemory_Import_Then_Save_To_Disk.java).
The transaction isolation level is Serializable, the strictest level; here it means a new transaction can start only after the previous one has committed.
Transactions use a global lock, which considerably reduces cache performance.
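Since listeners run on the thread that performs the update, a slow listener slows down every write. One workaround, sketched below with JDK classes only, is to register a thin listener that just hands each event to an executor, where the real work happens. The `CacheListener` interface and the `AsyncListener` wrapper are hypothetical and are not MapDB's API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical listener contract: called with the old and new value of a key.
interface CacheListener<K, V> {
    void update(K key, V oldValue, V newValue);
}

// Wraps a slow listener so that the writing thread only pays for a queue insertion.
class AsyncListener<K, V> implements CacheListener<K, V>, AutoCloseable {
    private final CacheListener<K, V> delegate;
    // A single worker thread delivers events in the order they were fired
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "cache-listener");
        t.setDaemon(true); // do not keep the JVM alive just for notifications
        return t;
    });

    AsyncListener(CacheListener<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void update(K key, V oldValue, V newValue) {
        // Runs on the writer's thread: enqueue the event and return immediately
        executor.execute(() -> delegate.update(key, oldValue, newValue));
    }

    @Override
    public void close() throws InterruptedException {
        // Stop accepting events, then wait for the queued ones to be delivered
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

A single worker thread keeps events ordered; a larger pool would deliver faster but could reorder events for the same key.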

3. Akiban's Persistit - github

A key/value data storage library.
Transactions are based on the Snapshot Isolation algorithm to provide high concurrency.
Used by Titan (a distributed graph database) as a storage backend.
For custom objects, users should provide serializers:

for keys, by implementing com.persistit.encoding.KeyCoder;
for values, by implementing com.persistit.encoding.ValueCoder;
and register these coders with the coder manager.
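Persistit's `KeyCoder` and `ValueCoder` interfaces are not reproduced here. The JDK-only sketch below only shows the general shape of the work such a coder does: turning a custom object into bytes field by field and reading it back in the same order. The `Person` and `PersonCoder` classes are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical custom type that the store cannot serialize on its own.
final class Person {
    final long id;
    final String name;

    Person(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical coder: writes the fields in a fixed order and reads them back
// in the same order. A real Persistit coder implements Persistit's own interfaces.
final class PersonCoder {
    static byte[] encode(Person p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(p.id);
            out.writeUTF(p.name); // 2-byte length prefix + modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Person decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long id = in.readLong();
            String name = in.readUTF();
            return new Person(id, name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The decoder must read the fields in exactly the order the encoder wrote them; with `writeUTF` storing a 2-byte length before the string, `new Person(1L, "Ada")` encodes to 13 bytes.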

Samples: Index and Search 2.3 Million Freebase Person Records with Persistit, and Simple Blog Application with Akiban and JugglingDB.

4. JCS (Java Caching System)

Build faster Web applications with caching - developerWorks
Caching with JCS - Object Partners
JCS event handling examples on Stack Overflow and SPOCS.
Configuring a JCS Cache - InformIT
Introduction, Using, Developing Web applications and Java Object Caching with Java Caching System (JCS) - bhaveshthaker.com.

5. Hazelcast

Can be backed by different kinds of persistent stores (MySQL, HBase, etc.).
A case study of processing Mozilla's very large crash reports - highscalability.com
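Hazelcast plugs such stores in through its `MapStore` interface. The JDK-only sketch below is not that interface; it only illustrates the read-through/write-through pattern a backing store enables: writes go to the store and then to memory, and a read miss is loaded from the store. `BackingStore`, `ReadWriteThroughMap` and `MapBackingStore` are hypothetical, and the `HashMap`-based store stands in for MySQL or HBase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical persistence contract (backed by MySQL, HBase, ... in practice).
interface BackingStore<K, V> {
    V load(K key); // returns null if the key is absent
    void store(K key, V value);
    void delete(K key);
}

// Read-through / write-through map in front of a BackingStore. Null values are not supported.
class ReadWriteThroughMap<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final BackingStore<K, V> backing;

    ReadWriteThroughMap(BackingStore<K, V> backing) {
        this.backing = backing;
    }

    V get(K key) {
        // On a miss, load from the store; a null result is not cached
        return memory.computeIfAbsent(key, backing::load);
    }

    void put(K key, V value) {
        backing.store(key, value); // write-through: persist first
        memory.put(key, value);
    }

    void remove(K key) {
        backing.delete(key);
        memory.remove(key);
    }
}

// In-memory stand-in for a database-backed store; counts loads for illustration.
class MapBackingStore<K, V> implements BackingStore<K, V> {
    final Map<K, V> rows = new HashMap<>();
    int loads = 0;

    public V load(K key) { loads++; return rows.get(key); }
    public void store(K key, V value) { rows.put(key, value); }
    public void delete(K key) { rows.remove(key); }
}
```

A read miss costs one `load` call, after which the entry is served from memory; since writes reach the store before the in-memory map, a failing store leaves the cache unchanged.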

6. GridGain

Resources: gridgain.blogspot.com

7. Others: LArray; Cache2K; DirectMemory (initial project on GitHub, Apache incubation proposal), an off-heap memory store; MVStore, the storage subsystem of the H2 database; Spring cache; HugeCollections.

Search

Integrating Lucene with HBase - an article explaining the implementation of a Lucene backend based on HBase; the code is on GitHub (https://github.com/akkumar/hbasene). Other implementations: Solbase.
Lucandra / Solandra: A Cassandra-based Lucene backend - an article explaining the implementation of a Lucene backend based on Cassandra. The project source code is on GitHub.
Create Lucene Index in database using JdbcDirectory - an article explaining how to use a database as a Lucene backend.
The Compass project provides a Java-friendly API that wraps the Lucene API for better integration with Java/J2EE applications.
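The HBase and Cassandra backends above keep index data in a key/value store rather than in local files. As a rough JDK-only illustration of that idea (not the schema any of these projects actually uses), the sketch below keeps an inverted index in a sorted map whose keys combine field and term, much like a row key; the `TreeMap` stands in for the store, and both the `KeyValueInvertedIndex` class and the `field/term` key format are assumptions.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical inverted index on top of a sorted key/value store.
// Row key: "field/term"; row value: the sorted set of matching document ids.
class KeyValueInvertedIndex {
    private final NavigableMap<String, SortedSet<Integer>> rows = new TreeMap<>();

    void addDocument(int docId, String field, String text) {
        // Naive tokenizer: lower-case, split on non-word characters
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            rows.computeIfAbsent(field + "/" + token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Exact term lookup: a single-row read
    SortedSet<Integer> search(String field, String term) {
        return rows.getOrDefault(field + "/" + term.toLowerCase(), new TreeSet<>());
    }

    // Prefix query: since keys are sorted, it becomes a short range scan
    SortedSet<Integer> searchPrefix(String field, String prefix) {
        String start = field + "/" + prefix.toLowerCase();
        SortedSet<Integer> result = new TreeSet<>();
        for (Map.Entry<String, SortedSet<Integer>> row : rows.tailMap(start, true).entrySet()) {
            if (!row.getKey().startsWith(start)) break; // left the prefix range
            result.addAll(row.getValue());
        }
        return result;
    }
}
```

Because rows are sorted by key, `searchPrefix("body", "cach")` only scans the contiguous rows starting at `body/cach` instead of the whole index.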

Resources

A good explanation of using ByteBuffer to build off-heap memory caches, by Keith Gregory: blog post, JUG presentation, another one.
An InfoQ article about an off-heap HashMap implementation.
An IBM Redbook on capacity for big data and off-heap memory.
Examples related to the use of EhCache from a Devoxx 2014 presentation.
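Following the ByteBuffer approach mentioned above, here is a minimal sketch of one possible layout (an assumption, not Keith Gregory's code): values are copied into a single direct buffer allocated outside the Java heap, and only a small on-heap index of offsets and lengths is kept, so the payload bytes are never scanned or moved by the GC. The `OffHeapStore` class is hypothetical; it is append-only and not thread-safe.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical append-only off-heap store: payload bytes live in a direct
// ByteBuffer (native memory); only the small index stays on the Java heap.
// Space used by overwritten keys is never reclaimed.
class OffHeapStore {
    private final ByteBuffer data;
    private final Map<String, int[]> index = new HashMap<>(); // key -> {offset, length}

    OffHeapStore(int capacityBytes) {
        this.data = ByteBuffer.allocateDirect(capacityBytes);
    }

    void put(String key, byte[] value) {
        if (value.length > data.remaining()) {
            throw new IllegalStateException("off-heap buffer is full");
        }
        int offset = data.position();
        data.put(value); // copies the payload out of the heap and advances the position
        index.put(key, new int[] {offset, value.length});
    }

    byte[] get(String key) {
        int[] entry = index.get(key);
        if (entry == null) {
            return null;
        }
        byte[] out = new byte[entry[1]];
        // Read through a duplicate so the shared write position is left untouched
        ByteBuffer view = data.duplicate();
        view.position(entry[0]);
        view.get(out);
        return out;
    }
}

class OffHeapDemo {
    public static void main(String[] args) {
        OffHeapStore store = new OffHeapStore(1024);
        store.put("greeting", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get("greeting"), StandardCharsets.UTF_8)); // prints hello
    }
}
```

A production cache would also need a free-space strategy (compaction or fixed-size slots) and synchronization; both are left out here to keep the layout visible.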

Benchmarks

Cache2K vs Infinispan/EhCache/JCS - benchmark
RadarGun: a framework for benchmarking data grids

Memory storage

In-memory databases (a detailed description can be found at Information Week):

NoSQL approaches (the class of non-relational, horizontally scalable databases), such as Aerospike.
NewSQL approaches (emerging databases offering NoSQL scalability together with familiar, SQL-compliant query capabilities), such as VoltDB, Oracle TimesTen, IBM solidDB and MemSQL.

Companies like Microsoft, Oracle and IBM chose to add in-memory support to their traditional databases (e.g. moving tables into memory), whereas SAP took another approach with its HANA platform, which aims to put everything in memory.

Some traditional RDBMSs, such as SQLite and MySQL, can be configured to store their data in memory instead of on disk.

Monday, 18 August 2014

Comparison between caching systems for Java