Servers keep getting more powerful and now ship with large amounts of RAM (from hundreds to thousands of gigabytes). However, Java applications still cannot use most of that capacity directly as heap because of inherent limitations of the JVM garbage collector (GC), which may pause the application for a long time (even up to many minutes) while it moves objects between generations.
Below is a description and comparison of some solutions, also called data grids, that can be used to address this problem, such as the Infinispan project from JBoss (formerly JBoss Cache), DirectMemory (an Apache proposal), EhCache (from Terracotta), etc.
Caches
1. Infinispan (JBoss Data Grid Platform)
- Does not provide support for expiration events, as discussed in the forum.
- SingleFileCacheStore: a file-based cache loader/store that manages data activation (loading data from the store into the cache) and passivation (saving data from the cache to the store).
- List of possible attributes in the XML configuration for Infinispan 4.0 and Infinispan 6.0.
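The activation/passivation cycle mentioned above can be illustrated with a minimal sketch. This is not Infinispan code: the class `PassivatingCacheSketch`, its methods, and the one-file-per-key layout are hypothetical, chosen only to show the idea with the standard library. It assumes keys are valid file names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not Infinispan code): an LRU map that passivates evicted entries to files. */
final class PassivatingCacheSketch {
    private final Path storeDir;               // assumed to be an existing, writable directory
    private final Map<String, String> memory;  // the in-memory part of the cache

    PassivatingCacheSketch(Path storeDir, final int maxInMemory) {
        this.storeDir = storeDir;
        // accessOrder = true: the eldest entry is the least recently used one
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() <= maxInMemory) {
                    return false;
                }
                passivate(eldest.getKey(), eldest.getValue()); // save before eviction
                return true;
            }
        };
    }

    void put(String key, String value) {
        memory.put(key, value);
    }

    String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = activate(key);          // try to load the entry back from the store
            if (value != null) {
                memory.put(key, value);     // may in turn passivate another entry
            }
        }
        return value;
    }

    boolean isInMemory(String key) {
        return memory.containsKey(key);
    }

    /** Passivation: write the evicted value to a file named after the key. */
    private void passivate(String key, String value) {
        try {
            Files.write(storeDir.resolve(key), value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Activation: read the value back and remove it from the store; null if absent. */
    private String activate(String key) {
        Path file = storeDir.resolve(key);
        try {
            if (!Files.exists(file)) {
                return null;
            }
            String value = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With `maxInMemory` set to 2, putting three entries pushes the least recently used one to disk, and a later `get` on that key brings it back into memory. A real cache store would also handle concurrency, key encoding, and crash safety, which this sketch ignores.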
2. MapDB
- Available only in embedded mode.
- Enables the creation of on-heap and off-heap collections (map, queue), as well as file-backed collections.
- Listeners registered for cache events are notified in the main thread (i.e. users should implement asynchronous notifications themselves).
- Can be used for lazy loading (e.g. Lazily_Loaded_Records.java).
- Provides a means of pumping all the data held in memory to disk (e.g. Pump_InMemory_Import_Then_Save_To_Disk.java).
- The transaction isolation level is Serializable, the highest level, which means that a new transaction can be started only after the previous one has been committed.
- Transactions use a global lock, which considerably reduces cache performance.
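Regarding the note above that listeners run in the notifying thread, one possible way to make notifications asynchronous is to wrap the listener so that it only hands the event over to an executor. The sketch below uses only the standard library. `AsyncListenerSketch` is a hypothetical name and is not part of the MapDB API, and representing a listener as a `BiConsumer` of key and value is an assumption made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.function.BiConsumer;

/** Hypothetical wrapper (not part of MapDB): runs a listener callback on an executor thread. */
final class AsyncListenerSketch<K, V> implements BiConsumer<K, V> {
    private final BiConsumer<K, V> delegate;   // the real, possibly slow, listener
    private final ExecutorService executor;    // threads that will run the listener

    AsyncListenerSketch(BiConsumer<K, V> delegate, ExecutorService executor) {
        this.delegate = delegate;
        this.executor = executor;
    }

    /** Returns as soon as the event is queued; the delegate runs later on the executor. */
    @Override
    public void accept(K key, V value) {
        executor.execute(() -> delegate.accept(key, value));
    }
}
```

A single-thread executor keeps events in submission order. A pool with several threads gives more throughput but no ordering guarantee, so the choice depends on what the listener does.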
3. Akiban's Persistit - github
- Key/value data storage library.
- Transactions are based on the Snapshot Isolation algorithm to provide high concurrency.
- Used by Titan (a distributed graph database) for its storage layer.
- For custom objects, users should provide a serializer for keys by implementing com.persistit.encoding.KeyCoder and for values by implementing com.persistit.encoding.ValueCoder, and declare both to the coder manager.
- Samples can be found in "Index and Search 2.3 Million Freebase Person Records with Persistit" and "Simple Blog Application with Akiban and JugglingDB".
4. JCS (Java Caching System)
- Build faster Web applications with caching - developerWorks
- Caching with JCS - Object Partners
- JCS event handling examples on Stack Overflow and SPOCS.
- Configuring a JCS Cache - InformIT
- Introduction, Using, Developing Web applications and Java Object Caching with Java Caching System (JCS) - bhaveshthaker.com.
- Can be backed by different kinds of stores (MySQL, HBase, etc.).
- A case of processing Mozilla very large crash reports - highscalability.com
- Resources: gridgain.blogspot.com
5. Others: LArray; Cache2K; DirectMemory (initial project on GitHub, Apache proposal for incubation), an off-heap memory store; MVStore, the storage subsystem of the H2 database; Spring cache; HugeCollections.
Search
- Integrating Lucene with HBase - an article explaining the implementation of a Lucene backend based on HBase; the code is on [[github>>https://github.com/akkumar/hbasene]]. Other implementations: Solbase.
- Lucandra / Solandra: A Cassandra-based Lucene backend - an article explaining the implementation of a Lucene backend based on Cassandra. The project source code is on GitHub.
- Create Lucene Index in database using JdbcDirectory - an article explaining the use of a database as a Lucene backend.
- The Compass project provides a Java-friendly API that wraps the Lucene API for better integration with Java/J2EE applications.
- A good explanation of the use of ByteBuffer to build non-heap memory caches by Keith Gregory: blog post, JUG presentation, another one.
- An article on InfoQ about HashMap implementation for off-heap map.
- An IBM Redbook on capacity for big data and off-heap memory.
- Examples related to the use of EhCache from a Devoxx 2014 presentation.
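The ByteBuffer-based approach to non-heap caches mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the design from any of the linked articles: `OffHeapStoreSketch` is a hypothetical name, values are plain strings, and the store is append-only (overwritten or removed values never free their space). Only the small index lives on the Java heap, while the value bytes sit in a direct buffer that the GC does not move or scan.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical append-only store: values live in one direct buffer, only the index is on heap. */
final class OffHeapStoreSketch {
    private final ByteBuffer data;                                  // memory outside the Java heap
    private final Map<String, int[]> index = new HashMap<>();       // key -> {offset, length}

    OffHeapStoreSketch(int capacityBytes) {
        this.data = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Appends the value bytes to the buffer; returns false when there is not enough room. */
    boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > data.remaining()) {
            return false;
        }
        int offset = data.position();
        data.put(bytes);
        index.put(key, new int[] {offset, bytes.length});
        return true;
    }

    /** Copies the value back onto the heap; returns null for an unknown key. */
    String get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] bytes = new byte[location[1]];
        ByteBuffer view = data.duplicate();   // same memory, independent position and limit
        view.position(location[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Every `get` pays for a copy and a decode, which is the usual trade-off of off-heap storage: less GC pressure in exchange for serialization cost on each access. The sketch is not thread-safe.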
Memory storage
In-memory databases (a detailed description can be found at Information Week):
- NoSQL approaches (covering the class of non-relational, horizontally scalable databases) like Aerospike.
- NewSQL approaches (emerging databases offering NoSQL scalability but with familiar SQL query capabilities, i.e. SQL-compliant) like VoltDB, Oracle TimesTen, IBM solidDB, MemSQL.
Some traditional RDBMSs, such as SQLite and MySQL, can be configured to store their data in memory instead of on disk.