
Persistence Context

entity lifecycle, L1/L2 cache, dirty checking, flush strategies, optimistic locking

The Persistence Context is the central mechanism of JPA: the EntityManager's identity map that tracks entity states, performs dirty checking, and manages a multi-level cache system.


1. Definition

  • What is it? — The Persistence Context (PC) is an in-memory store bound to the EntityManager that keeps track of managed entities. It functions as an identity map: it holds exactly one managed instance per entity identity (class + primary key) and guarantees repeatable reads.
  • Why does it exist? — The PC reduces DB queries (L1 cache), automatically detects changes (dirty checking), and provides consistent entity references within a transaction.
  • Where does it fit? — The PC lives inside the EntityManager. In Spring, it typically binds to a transaction (@Transactional). At the end of the transaction, the PC flushes and closes.
EntityManager
  └── Persistence Context (L1 Cache)
        ├── User#1 → managed instance
        ├── User#2 → managed instance
        └── Order#5 → managed instance
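
The identity-map idea above can be sketched in a few lines of plain Java. This is a hypothetical illustration (the class IdentityMapSketch is made up, and a String stands in for an entity), not Hibernate's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the identity-map idea behind the Persistence Context:
// one managed instance per (type, id) within the same context.
public class IdentityMapSketch {
    private final Map<String, Object> managed = new HashMap<>();

    // find(): return the cached instance if present, otherwise "load" and cache it
    public <T> T find(Class<T> type, Object id, Function<Object, T> loader) {
        String key = type.getName() + "#" + id;
        return type.cast(managed.computeIfAbsent(key, k -> loader.apply(id)));
    }

    public void clear() { managed.clear(); }  // em.clear() analogue

    public static void main(String[] args) {
        IdentityMapSketch pc = new IdentityMapSketch();
        // The "loader" stands in for a DB SELECT; it runs only on a cache miss
        String u1 = pc.find(String.class, 1L, id -> new String("Alice"));
        String u2 = pc.find(String.class, 1L, id -> new String("Alice"));
        System.out.println(u1 == u2);  // true — repeatable read, same reference
    }
}
```

The second find() never invokes the loader, which is the same reason the real L1 cache avoids a second SELECT.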

2. Core Concepts

Entity states (lifecycle)

State            Description                        PC aware?                 Has ID?
New (Transient)  New object, new User()             ❌                        ❌ (or has one, but the PC doesn't know about it)
Managed          After persist() or find()          ✅                        ✅
Detached         After session/transaction close    ❌                        ✅
Removed          After remove(), DELETE on flush    ✅ (marked for deletion)  ✅

New ──persist()──→ Managed ──flush()──→ DB INSERT
                     │
                   remove() ──→ Removed ──flush()──→ DB DELETE
                     │
              session close ──→ Detached
                     ↑
              merge() ←──────── Detached

State transitions in detail

// New → Managed
User user = new User("Alice", "alice@mail.com"); // New
entityManager.persist(user);                       // Managed (ID generated)

// Managed → DB synchronization
user.setName("Bob");  // dirty checking monitors this
// on flush: UPDATE users SET name='Bob' WHERE id=1

// Managed → Detached
entityManager.detach(user);  // or: transaction ends
user.setName("Charlie");     // NOT detected! Not managed.

// Detached → Managed
User mergedUser = entityManager.merge(user);  // Managed copy
// Note: user ≠ mergedUser (merge returns a copy)

// Managed → Removed
entityManager.remove(user);  // Removed
// on flush: DELETE FROM users WHERE id=1

First-Level Cache (L1)

The L1 cache is part of the Persistence Context, always active and cannot be disabled.

User u1 = em.find(User.class, 1L); // SQL SELECT → DB hit
User u2 = em.find(User.class, 1L); // No SQL! → cache hit
assert u1 == u2;                     // true — same reference

// Repeatable read guarantee:
// The same entity in the same PC is always represented
// by the same Java object.

The L1 cache is cleared at transaction end. Its size is unlimited — manual clear() is needed for large batch operations.

Dirty checking

Dirty checking is automatic change detection at flush time:

  1. When an entity is loaded, Hibernate stores a snapshot of its field values (the "loaded state")
  2. At flush time, it compares current field values with the snapshot
  3. If there's a difference → UPDATE SQL generation
  4. Without @DynamicUpdate, all columns are included in UPDATE
  5. With @DynamicUpdate, only changed columns
@Transactional
public void updateUserName(Long id, String newName) {
    User user = userRepository.findById(id).orElseThrow();
    user.setName(newName);  // dirty
    // No save() needed — dirty checking auto-UPDATEs on flush
}

Flush strategies

FlushMode        When it flushes                          Usage
AUTO (default)   Before queries + at transaction commit   ✅ Most cases
COMMIT           Only at transaction commit               Performance optimization
MANUAL           Only on explicit em.flush()              Special batch operations (Hibernate-specific)

// AUTO: automatically flushes before queries for consistency
user.setName("Updated");
List<User> users = em.createQuery("SELECT u FROM User u", User.class)
    .getResultList();
// ↑ Flush happens before the query so "Updated" name appears in results

3. Practical Usage

When conscious PC management matters

  • Batch INSERT/UPDATE (>1000 items): regular flush() + clear() needed, otherwise OutOfMemoryError
  • Read-only queries: @Transactional(readOnly = true) → Hibernate skips dirty checking snapshot
  • Detached entity handling: DTO pattern or merge() usage
  • Long conversations: Extended Persistence Context or explicit merge

Batch processing pattern

@Transactional
public void batchInsert(List<UserDto> dtos) {
    int batchSize = 50;  // align with hibernate.jdbc.batch_size for real JDBC batching
    for (int i = 0; i < dtos.size(); i++) {
        User user = new User(dtos.get(i).getName(), dtos.get(i).getEmail());
        entityManager.persist(user);

        if ((i + 1) % batchSize == 0) {
            entityManager.flush();  // write pending SQL statements
            entityManager.clear();  // empty the L1 cache → free memory
        }
    }
    // the remaining partial batch is flushed at transaction commit
}

Read-only optimization

@Service
public class ReportService {

    @Transactional(readOnly = true)
    public List<UserSummary> getReport() {
        // readOnly = true benefits:
        // 1. No dirty checking snapshot → less memory
        // 2. Hibernate switches to FLUSH_MODE_MANUAL → no auto-flush
        // 3. DB-level read-only transaction hint
        return userRepository.findAllProjectedBy();
    }
}

Detached entity and the merge() pattern

// Controller → Service → DB flow:
@RestController
public class UserController {
    @PutMapping("/users/{id}")
    public UserDto updateUser(@PathVariable Long id, @RequestBody UserDto dto) {
        return userService.update(id, dto);
    }
}

@Service
public class UserService {
    @Transactional
    public UserDto update(Long id, UserDto dto) {
        User user = userRepository.findById(id).orElseThrow();
        // ✅ Modifying the managed entity → dirty checking handles it
        user.setName(dto.getName());
        user.setEmail(dto.getEmail());
        // No save() needed — auto-flush at transaction end
        return UserDto.from(user);
    }
}

4. Code Examples

Second-Level Cache (L2)

The L2 cache is application-level, bound to the SessionFactory. It persists across transactions.

@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Country {
    @Id
    private String code;
    private String name;
}
# application.yml
spring:
  jpa:
    properties:
      hibernate:
        cache:
          use_second_level_cache: true
          region.factory_class: org.hibernate.cache.jcache.JCacheRegionFactory
        javax:
          cache:
            provider: org.ehcache.jsr107.EhcacheCachingProvider

Cache levels summary

Level        Scope                         Default                Content
L1 (PC)      EntityManager / transaction   Always active          Entity references
L2           SessionFactory / application  Manual configuration   Entity snapshots
Query Cache  SessionFactory                Manual configuration   Query result IDs

// L2 cache behavior:
// Transaction A:
Country hu = em.find(Country.class, "HU"); // L1 miss → L2 miss → DB SELECT
// Transaction A ends → L1 cleared, but saved to L2

// Transaction B:
Country hu = em.find(Country.class, "HU"); // L1 miss → L2 HIT! → no DB

Cache concurrency strategies

Strategy              Consistency   Performance  When
READ_ONLY             ✅ Strong     ✅ Best      Immutable data (Country, Currency)
NONSTRICT_READ_WRITE  ⚠️ Eventual   ✅ Good      Rarely changed, not critical
READ_WRITE            ✅ Strong     ⚠️ Medium    Frequently read, occasionally written
TRANSACTIONAL         ✅ ACID       ❌ Slow      JTA transaction required

Optimistic Locking (@Version)

@Entity
public class Product {
    @Id @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    @Version
    private Integer version;

    private String name;
    private BigDecimal price;
}

How it works:

  1. When an entity is loaded, the version value is also loaded (e.g., version=3)
  2. UPDATE SQL: UPDATE product SET name=?, price=?, version=4 WHERE id=? AND version=3
  3. If WHERE affects 0 rows → OptimisticLockException
  4. The client can retry or merge
@Service
public class ProductService {
    @Transactional
    public void updatePrice(Long id, BigDecimal newPrice) {
        Product product = productRepository.findById(id).orElseThrow();
        product.setPrice(newPrice);
        // on flush: UPDATE ... WHERE version=? → on conflict: OptimisticLockException
    }
}

// Retry pattern for exception handling
// (note: Spring's exception translation wraps OptimisticLockException in
// ObjectOptimisticLockingFailureException, so retry on both):
@Retryable(value = {OptimisticLockException.class,
        ObjectOptimisticLockingFailureException.class}, maxAttempts = 3)
@Transactional
public void updatePriceWithRetry(Long id, BigDecimal newPrice) {
    Product product = productRepository.findById(id).orElseThrow();
    product.setPrice(newPrice);
}
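
The version check itself can be simulated in plain Java. The sketch below (OptimisticLockSketch, a made-up in-memory "table") mimics the UPDATE ... WHERE id=? AND version=? pattern: a stale version affects 0 rows, which is exactly the condition under which Hibernate throws OptimisticLockException:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of the versioned UPDATE produced by @Version.
// The "table" maps id → {price, version}; an update succeeds only if the
// expected version still matches.
public class OptimisticLockSketch {
    static class Row {
        long price; int version;
        Row(long p, int v) { price = p; version = v; }
    }

    private final Map<Long, Row> table = new HashMap<>();

    public void insert(long id, long price) { table.put(id, new Row(price, 0)); }

    public int readVersion(long id) { return table.get(id).version; }

    // Returns the number of affected rows, like a JDBC executeUpdate()
    public int update(long id, long newPrice, int expectedVersion) {
        Row row = table.get(id);
        if (row == null || row.version != expectedVersion) return 0; // stale version
        row.price = newPrice;
        row.version++;  // SET version = expectedVersion + 1
        return 1;
    }

    public static void main(String[] args) {
        OptimisticLockSketch db = new OptimisticLockSketch();
        db.insert(1L, 100);
        int v = db.readVersion(1L);        // both "transactions" load version=0
        db.update(1L, 120, v);             // first writer wins → 1 row
        int rows = db.update(1L, 130, v);  // second writer → 0 rows
        if (rows == 0) System.out.println("OptimisticLockException"); // what Hibernate throws here
    }
}
```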

Pessimistic Locking

// SELECT ... FOR UPDATE
@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT p FROM Product p WHERE p.id = :id")
Optional<Product> findByIdWithLock(@Param("id") Long id);

Locking      Mechanism                       When
Optimistic   @Version + application level    Low contention (most CRUD)
Pessimistic  SELECT FOR UPDATE (DB lock)     High contention (inventory, booking)

5. Trade-offs

Aspect               Advantage                                      Disadvantage
L1 Cache             Automatic, no configuration, repeatable read   Memory issues with large batches
Dirty checking       No explicit save/update needed                 Hidden UPDATEs, performance overhead
L2 Cache             Cross-transaction cache, reduces DB load       Complex cache invalidation, consistency risk
Optimistic locking   No DB lock, free reads                         Retry needed on conflict
Pessimistic locking  Guaranteed exclusive access                    Deadlock risk, slow
readOnly=true        No snapshot, less memory                       Writes are silently not flushed

6. Common Mistakes

❌ Unnecessary save() on managed entity

// BAD — unnecessary DB roundtrip
@Transactional
public void updateUser(Long id, String name) {
    User user = userRepository.findById(id).orElseThrow();
    user.setName(name);
    userRepository.save(user);  // ← unnecessary! Dirty checking handles it
}

// GOOD — dirty checking auto-UPDATEs
@Transactional
public void updateUser(Long id, String name) {
    User user = userRepository.findById(id).orElseThrow();
    user.setName(name);
    // auto-UPDATE on flush
}

❌ Modifying detached entity without flush

// BAD — changes are lost
public void updateUser(Long id, String name) {  // no @Transactional!
    User user = userRepository.findById(id).orElseThrow();
    user.setName(name);
    // Session/PC already closed → no flush → changes lost
}

❌ Batch processing without clear()

// BAD — OutOfMemoryError on large datasets
@Transactional
public void importAll(List<Data> items) {
    for (Data d : items) {
        entityManager.persist(new Record(d));
        // L1 cache grows unbounded → OOM
    }
}

❌ L2 cache on frequently changing data

If an entity changes frequently, L2 cache invalidation is more expensive than the database query itself. Use L2 cache for rarely-changing, frequently-read data (e.g., Country, Category, Configuration).

❌ Ignoring @Version in update DTOs

// BAD — frontend doesn't send version → optimistic lock lost
@PutMapping("/products/{id}")
public void update(@RequestBody ProductDto dto) {
    Product p = productRepository.findById(dto.getId()).orElseThrow();
    p.setName(dto.getName());
    // version field not synchronized!
}

// GOOD — DTO includes version
public class ProductDto {
    private Long id;
    private String name;
    private Integer version;  // ← frontend sends it back
}

❌ Misunderstanding merge() vs persist()

// persist(): New → Managed (manages the original object)
User user = new User("Alice");
em.persist(user);  // user is now managed

// merge(): Detached → Managed COPY (returns a new managed instance!)
User detached = /* ... */;
User managed = em.merge(detached);  // managed ≠ detached!
detached.setName("X");  // Does NOT affect DB — still detached
managed.setName("Y");   // This will UPDATE

7. Deep Dive

Hibernate internal: snapshot array

Hibernate uses an Object[] array as a snapshot for dirty checking, not a deep clone. Every managed entity has two arrays: the current state and the loaded state. At flush time, these are compared field by field.

This means:

  • For primitives and Strings: simple equals() comparison
  • For mutable objects (Date, Collection): deeper comparison
  • For large entities (30+ fields): dirty checking overhead is noticeable
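
The flush-time comparison can be sketched as a field-by-field diff of two arrays. This hypothetical DirtyCheckSketch is not Hibernate's code, only an illustration of the loaded-state vs current-state comparison described above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of Hibernate's flush-time comparison: the loaded state
// and the current state are both plain Object[] arrays, compared field by field.
public class DirtyCheckSketch {
    // Returns the names of fields whose current value differs from the snapshot
    public static List<String> dirtyFields(String[] names, Object[] loaded, Object[] current) {
        List<String> dirty = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            if (!Objects.equals(loaded[i], current[i])) dirty.add(names[i]);
        }
        return dirty;
    }

    public static void main(String[] args) {
        String[] names  = { "name", "email", "age" };
        Object[] loaded  = { "Alice", "alice@mail.com", 30 };  // snapshot at load time
        Object[] current = { "Bob",   "alice@mail.com", 30 };  // state at flush time
        System.out.println(dirtyFields(names, loaded, current)); // [name]
        // With @DynamicUpdate, only this list would appear in the SET clause
    }
}
```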

@DynamicUpdate optimization

@Entity
@DynamicUpdate  // Only UPDATEs changed columns
public class Product {
    // 20+ fields...
}

// WITHOUT @DynamicUpdate (default):
// UPDATE product SET name=?, price=?, description=?, ... WHERE id=?
// All columns included even if only name changed

// WITH @DynamicUpdate:
// UPDATE product SET name=? WHERE id=?
// Only changed columns

When to use: entities with many columns where typically 1-2 fields change. Downside: Hibernate can't cache the prepared statement (each UPDATE has different SQL).

Extended Persistence Context

By default, the PC closes at transaction end. Extended PC lives beyond the transaction — typically in @Stateful EJBs or Spring @Scope("session") beans.

// Rarely used in Spring — prefer DTO pattern + merge()
@PersistenceContext(type = PersistenceContextType.EXTENDED)
private EntityManager em;

⚠️ Extended PC problems: memory leaks, stale data, complex lifecycle management. Avoid in most Spring applications.

Query Cache in detail

The Query Cache caches entity IDs, not the entities themselves:

Query: "SELECT u FROM User u WHERE u.status = 'ACTIVE'"
Query Cache: [1, 5, 12, 34]  ← entity IDs

After query cache hit → loads entities from L2 cache by ID

When to use:

  • Fixed-parameter, frequently-run queries
  • Only if affected entities are also in L2 cache
  • hibernate.cache.use_query_cache=true required
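
The two-step lookup can be illustrated with a plain-Java sketch (QueryCacheSketch is hypothetical, not Hibernate's implementation): the query cache maps a query to a list of IDs, and the entities are then resolved from the L2 cache:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two-step query cache lookup: the query cache
// holds only entity IDs; the entities themselves come from the L2 cache.
public class QueryCacheSketch {
    private final Map<String, List<Long>> queryCache = new LinkedHashMap<>(); // query → IDs
    private final Map<Long, String> l2Cache = new LinkedHashMap<>();          // id → entity

    public void putResult(String query, List<Long> ids) { queryCache.put(query, ids); }
    public void putEntity(long id, String entity) { l2Cache.put(id, entity); }

    // A query-cache hit resolves IDs, then loads each entity from L2;
    // a real provider falls back to the DB for entities missing from L2.
    public List<String> run(String query) {
        List<Long> ids = queryCache.get(query);
        if (ids == null) return null; // query-cache miss → would execute SQL
        List<String> result = new ArrayList<>();
        for (Long id : ids) result.add(l2Cache.get(id));
        return result;
    }

    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch();
        cache.putEntity(1L, "User#1");
        cache.putEntity(5L, "User#5");
        cache.putResult("active users", List.of(1L, 5L));
        System.out.println(cache.run("active users")); // [User#1, User#5]
    }
}
```

This is also why the query cache is only useful when the affected entities are themselves L2-cached: otherwise every ID resolves via a separate DB hit.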

Flush and auto-flush behavior

AUTO flush mode ensures pending changes are written before queries:

user.setName("Updated");  // dirty, but no SQL yet

// This JPQL query triggers a flush (because the User table is affected):
List<User> result = em.createQuery("SELECT u FROM User u").getResultList();
// ↑ Before: flush() → UPDATE users SET name='Updated'
// ↑ After: SELECT * FROM users → "Updated" name is visible

// This query does NOT trigger a flush (different table):
List<Order> orders = em.createQuery("SELECT o FROM Order o").getResultList();
// ↑ Does not affect User table → no flush

8. Interview Questions

Q: What is the Persistence Context and how does the L1 cache work?
A: The PC is the EntityManager's identity map. The same entity within a single transaction is always represented by the same Java reference. The L1 cache is always active and is cleared at transaction end.

Q: When are explicit flush() and clear() needed?
A: In batch processing (>1000 items), to prevent unbounded L1 cache growth. Periodic flush() writes pending SQL, clear() frees memory.

Q: What's the difference between persist() and merge()?
A: persist() transitions New → Managed and operates on the original object. merge() creates a Detached → Managed COPY — the original stays detached and a new managed instance is returned.

Q: How does dirty checking work?
A: When an entity is loaded, a snapshot of its field values is taken. At flush time, Hibernate compares the current state with the snapshot. If there's a difference, UPDATE SQL is generated. With @DynamicUpdate, only the changed columns are included.

Q: What is @Version and how does it handle concurrent modifications?
A: @Version is an optimistic locking field. It's included in the UPDATE SQL's WHERE clause. If two transactions modify the same entity, the second gets an OptimisticLockException because the version no longer matches.

Q: What's the difference between the L1 and L2 cache?
A: L1: EntityManager/transaction scope, always active, holds entity references. L2: SessionFactory scope, manually configured, holds entity snapshots, persists across transactions.

Q: Why is @Transactional(readOnly = true) beneficial?
A: Hibernate skips the dirty checking snapshot → less memory, no auto-flush, and the DB can optimize for reads (read-only hint).


9. Glossary

Term                 Meaning
Persistence Context  EntityManager's identity map, entity references
L1 Cache             Same as the PC — transaction scope, always active
L2 Cache             SessionFactory-scoped, manually configured cache
Query Cache          Caching of query result IDs
Managed              Entity state: tracked by the PC, monitored by dirty checking
Detached             Entity state: after session close, has ID, no tracking
Dirty checking       Automatic change detection at flush time
Flush                Writing pending changes to the database as SQL
Snapshot             Copy of the entity's field values at load time
@Version             Optimistic locking version field
@DynamicUpdate       UPDATEs only the changed columns
Extended PC          Persistence Context that lives beyond the transaction

10. Cheatsheet

ENTITY LIFECYCLE:
  New       → persist() → Managed
  Managed   → remove()  → Removed → flush() → DELETE
  Managed   → close()   → Detached
  Detached  → merge()   → Managed (COPY!)
  persist() ≠ merge()   → persist manages original, merge returns copy

L1 CACHE (Persistence Context):
  Always active, transaction scope
  find() twice → 1 SQL
  u1 == u2 → true (identity guaranteed)
  Batch: flush() + clear() regularly

DIRTY CHECKING:
  Snapshot on load → comparison on flush
  @DynamicUpdate → only changed columns
  readOnly=true → no snapshot → less memory

FLUSH MODE:
  AUTO     before queries + at commit (default)
  COMMIT   only at commit
  MANUAL   only on explicit em.flush()

L2 CACHE:
  @Cacheable + @Cache(usage=...)
  READ_ONLY      immutable data
  READ_WRITE     frequently read
  SessionFactory scope, requires configuration

LOCKING:
  @Version           optimistic (no DB lock)
  @Lock(PESSIMISTIC)  SELECT FOR UPDATE
  Optimistic → low contention (CRUD)
  Pessimistic → high contention (inventory)

TIPS:
  Managed entity → no save() needed
  Batch 1000+ → flush/clear in cycles
  readOnly=true → for report queries
  merge() returns copy, not original!
