Spring Data | Developer Knowledge Base

Spring Data elevates the repository pattern to framework level: it generates implementations from interface declarations and derives queries from method names.

1. Definition

Spring Data is an umbrella project that provides a unified, repository-based programming model for various data stores (JPA, MongoDB, Redis, Elasticsearch, JDBC, R2DBC). The developer declares an interface, and Spring Data implements CRUD operations and queries at runtime through a proxy.

The central idea: eliminate persistence layer boilerplate by having the framework generate queries from method names, annotations, or Specifications.

Interface declaration → Spring Data proxy → JPA/JDBC/Mongo query
     UserRepository       SimpleJpaRepository        SELECT ...

With the spring-boot-starter-data-jpa starter on the classpath, Spring Boot auto-configures the DataSource, EntityManagerFactory, and transaction manager.

2. Core Concepts

Repository hierarchy

Interface	Role
Repository<T,ID>	Marker interface, no methods
CrudRepository<T,ID>	`save`, `findById`, `findAll`, `deleteById`, `count`
ListCrudRepository<T,ID>	Like CrudRepository but with `List<T>` return types
PagingAndSortingRepository<T,ID>	`findAll(Pageable)`, `findAll(Sort)`
JpaRepository<T,ID>	Flush, batch delete, `getReferenceById`, JPA-specific

JpaRepository combines ListCrudRepository + ListPagingAndSortingRepository + JPA-specific methods. Most projects extend this interface.

Query derivation — SQL from method names

Spring Data JPA parses the keywords in a method name and builds the matching query. The comments show the JPQL each method corresponds to:

public interface UserRepository extends JpaRepository<User, Long> {
    // SELECT u FROM User u WHERE u.email = ?1
    Optional<User> findByEmail(String email);

    // SELECT u FROM User u WHERE u.lastName = ?1 AND u.active = ?2
    List<User> findByLastNameAndActive(String lastName, boolean active);

    // SELECT u FROM User u WHERE u.age > ?1 ORDER BY u.lastName ASC
    List<User> findByAgeGreaterThanOrderByLastNameAsc(int age);

    // SELECT COUNT(u) FROM User u WHERE u.active = ?1
    long countByActive(boolean active);

    // Loads the inactive users, then removes them one by one (not a single bulk DELETE)
    void deleteByActiveFalse();
}

Keywords: And, Or, Between, LessThan, GreaterThan, Like, In, IsNull, IsNotNull, OrderBy, Not, True, False, Top, First, Distinct.
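
To make the derivation idea concrete, here is a minimal sketch of how a method name can be split into keywords and turned into a JPQL-like string. It is a toy model, not Spring Data's actual parser: the class name DerivedQuerySketch, the alias e, and the limited keyword set (findBy, And, Or, GreaterThan) are assumptions made for illustration.

```java
import java.util.Locale;

// Toy model of query derivation. This is NOT Spring Data's real parser; it only
// understands the findBy prefix, And/Or connectors, and the GreaterThan keyword.
final class DerivedQuerySketch {

    static String toJpql(String entity, String methodName) {
        if (!methodName.startsWith("findBy") || methodName.length() == "findBy".length()) {
            throw new IllegalArgumentException("Expected findBy<Property>...: " + methodName);
        }
        String criteria = methodName.substring("findBy".length());
        // Split in front of every And/Or that is followed by a capitalized property name.
        String[] parts = criteria.split("(?=(And|Or)[A-Z])");
        StringBuilder jpql = new StringBuilder("SELECT e FROM " + entity + " e WHERE ");
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            if (i > 0) {                               // later parts start with the connector
                boolean and = part.startsWith("And");
                jpql.append(and ? " AND " : " OR ");
                part = part.substring(and ? 3 : 2);
            }
            String operator = "=";
            if (part.endsWith("GreaterThan")) {
                operator = ">";
                part = part.substring(0, part.length() - "GreaterThan".length());
            }
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Missing property name in: " + methodName);
            }
            String property = part.substring(0, 1).toLowerCase(Locale.ROOT) + part.substring(1);
            jpql.append("e.").append(property).append(' ').append(operator)
                .append(" ?").append(i + 1);           // one positional parameter per part
        }
        return jpql.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJpql("User", "findByLastNameAndActive"));
    }
}
```

The real parser supports many more keywords and validates property names against the entity, which this sketch does not attempt.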

@Query — manual JPQL/SQL

@Query("SELECT u FROM User u WHERE u.email LIKE %:domain")
List<User> findByEmailDomain(@Param("domain") String domain);

@Query(value = "SELECT * FROM users WHERE status = :status", nativeQuery = true)
List<User> findByStatusNative(@Param("status") String status);

@Modifying
@Query("UPDATE User u SET u.active = false WHERE u.lastLogin < :date")
int deactivateInactiveUsers(@Param("date") LocalDate date);

Projection — selective field retrieval

// Interface-based (closed) projection
public interface UserSummary {
    String getName();
    String getEmail();
}
List<UserSummary> findByActive(boolean active);

// DTO-based (class) projection
@Query("SELECT new com.example.dto.UserDto(u.name, u.email) FROM User u")
List<UserDto> findAllAsDto();

3. Practical Usage

Defining a repository

@Repository
public interface OrderRepository extends JpaRepository<Order, UUID> {

    List<Order> findByCustomerIdAndStatus(Long customerId, OrderStatus status);

    @Query("SELECT o FROM Order o JOIN FETCH o.items WHERE o.id = :id")
    Optional<Order> findByIdWithItems(@Param("id") UUID id);

    Page<Order> findByStatus(OrderStatus status, Pageable pageable);
}

@Repository is optional on Spring Data repository interfaces: they are detected by their interface type, and persistence exception translation is applied to the generated proxy either way. The explicit annotation only signals intent.

Pageable and Sort

@GetMapping("/orders")
public Page<OrderDto> findAll(
        @RequestParam(defaultValue = "0") int page,
        @RequestParam(defaultValue = "20") int size,
        @RequestParam(defaultValue = "createdAt") String sortBy) {

    Pageable pageable = PageRequest.of(page, size, Sort.by(sortBy).descending());
    return orderRepository.findByStatus(OrderStatus.ACTIVE, pageable)
            .map(OrderDto::from);
}

Page<T> contains the data, the total element count, and pagination metadata.
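
The pagination metadata follows from three numbers: the zero-based page index, the page size, and the total element count. The sketch below shows that arithmetic. PageMath is a hypothetical stand-in, not Spring Data's Page type.

```java
// Hypothetical stand-in for the metadata a Page<T> exposes; not Spring Data's own type.
final class PageMath {
    final int page;            // zero-based page index, as in PageRequest.of(page, size)
    final int size;            // requested page size
    final long totalElements;  // what the COUNT query would return

    PageMath(int page, int size, long totalElements) {
        this.page = page;
        this.size = size;
        this.totalElements = totalElements;
    }

    // Number of rows skipped before the first row of this page.
    long offset() {
        return (long) page * size;
    }

    // Total number of pages, rounding the last partial page up.
    int totalPages() {
        return size == 0 ? 1 : (int) Math.ceil((double) totalElements / size);
    }

    boolean hasNext() {
        return page + 1 < totalPages();
    }

    public static void main(String[] args) {
        PageMath third = new PageMath(2, 20, 95);
        System.out.println(third.offset() + " " + third.totalPages() + " " + third.hasNext());
    }
}
```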

Custom repository implementation

public interface OrderRepositoryCustom {
    List<Order> findByComplexCriteria(OrderSearchCriteria criteria);
}

@Repository
public class OrderRepositoryCustomImpl implements OrderRepositoryCustom {

    private final EntityManager em;

    public OrderRepositoryCustomImpl(EntityManager em) {
        this.em = em;
    }

    @Override
    public List<Order> findByComplexCriteria(OrderSearchCriteria criteria) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<Order> cq = cb.createQuery(Order.class);
        Root<Order> root = cq.from(Order.class);

        List<Predicate> predicates = new ArrayList<>();
        if (criteria.getStatus() != null) {
            predicates.add(cb.equal(root.get("status"), criteria.getStatus()));
        }
        if (criteria.getMinAmount() != null) {
            predicates.add(cb.ge(root.get("totalAmount"), criteria.getMinAmount()));
        }
        cq.where(predicates.toArray(new Predicate[0]));
        return em.createQuery(cq).getResultList();
    }
}

// OrderRepository extends both:
public interface OrderRepository
        extends JpaRepository<Order, UUID>, OrderRepositoryCustom {}

Specification — dynamic queries

public class OrderSpecs {
    public static Specification<Order> hasStatus(OrderStatus status) {
        return (root, query, cb) -> cb.equal(root.get("status"), status);
    }

    public static Specification<Order> createdAfter(LocalDate date) {
        return (root, query, cb) -> cb.greaterThan(root.get("createdAt"), date);
    }
}

// Usage:
public interface OrderRepository
        extends JpaRepository<Order, UUID>, JpaSpecificationExecutor<Order> {}

List<Order> orders = orderRepository.findAll(
    OrderSpecs.hasStatus(ACTIVE).and(OrderSpecs.createdAfter(cutoff))
);
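
The value of a Specification is that small predicates can be combined at runtime. The sketch below shows that composition idea with an in-memory analogy only: Spec, ToyOrder, and the helper methods are hypothetical, and a real Specification builds a JPA Criteria Predicate rather than filtering a list.

```java
import java.util.List;
import java.util.stream.Collectors;

// In-memory analogy only: the real Specification<T> produces a JPA Criteria Predicate.
interface Spec<T> {
    boolean matches(T candidate);

    default Spec<T> and(Spec<T> other) {
        return candidate -> matches(candidate) && other.matches(candidate);
    }
}

final class SpecSketch {
    // Hypothetical order type used only by this sketch.
    static final class ToyOrder {
        final String status;
        final int total;

        ToyOrder(String status, int total) {
            this.status = status;
            this.total = total;
        }
    }

    static Spec<ToyOrder> hasStatus(String status) {
        return order -> order.status.equals(status);
    }

    static Spec<ToyOrder> totalAtLeast(int min) {
        return order -> order.total >= min;
    }

    // Builds the filter at runtime: criteria that are null are simply skipped.
    static Spec<ToyOrder> build(String status, Integer minTotal) {
        Spec<ToyOrder> spec = order -> true;
        if (status != null) {
            spec = spec.and(hasStatus(status));
        }
        if (minTotal != null) {
            spec = spec.and(totalAtLeast(minTotal));
        }
        return spec;
    }

    static List<ToyOrder> findAll(List<ToyOrder> all, Spec<ToyOrder> spec) {
        return all.stream().filter(spec::matches).collect(Collectors.toList());
    }
}
```

This is the pattern a search form typically needs: each filled-in field contributes one predicate, and empty fields contribute nothing.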

4. Code Examples

Auditing — automatic created/modified timestamps

@MappedSuperclass
@EntityListeners(AuditingEntityListener.class)
public abstract class BaseEntity {

    @CreatedDate
    @Column(updatable = false)
    private LocalDateTime createdAt;

    @LastModifiedDate
    private LocalDateTime updatedAt;

    @CreatedBy
    @Column(updatable = false)
    private String createdBy;

    @LastModifiedBy
    private String updatedBy;
}

@Configuration
@EnableJpaAuditing
public class AuditConfig {
    @Bean
    public AuditorAware<String> auditorProvider() {
        return () -> Optional.ofNullable(
            SecurityContextHolder.getContext().getAuthentication())
            .map(Authentication::getName);
    }
}
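
The rule the auditing annotations implement is simple: creation fields are written once, modification fields on every save. A minimal sketch of that rule in plain Java follows. AuditSketch and AuditedRow are hypothetical names, and the timestamp and auditor are passed in explicitly rather than taken from a clock or security context.

```java
import java.time.Instant;

final class AuditSketch {
    // Hypothetical row type with the same four audit fields as BaseEntity.
    static final class AuditedRow {
        Instant createdAt;
        Instant updatedAt;
        String createdBy;
        String updatedBy;
    }

    // Creation fields are set only on the first save; modification fields every time.
    static void stamp(AuditedRow row, Instant now, String auditor) {
        if (row.createdAt == null) {
            row.createdAt = now;
            row.createdBy = auditor;
        }
        row.updatedAt = now;
        row.updatedBy = auditor;
    }

    public static void main(String[] args) {
        AuditedRow row = new AuditedRow();
        stamp(row, Instant.parse("2024-01-01T00:00:00Z"), "ann");
        stamp(row, Instant.parse("2024-02-01T00:00:00Z"), "bob");
        System.out.println(row.createdBy + " / " + row.updatedBy);
    }
}
```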

QueryByExample — simple dynamic search

User probe = new User();
probe.setActive(true);
probe.setRole("ADMIN");

ExampleMatcher matcher = ExampleMatcher.matching()
        .withIgnoreCase()
        .withStringMatcher(StringMatcher.CONTAINING);

List<User> admins = userRepository.findAll(Example.of(probe, matcher));
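
The matching semantics are easy to misjudge, so here is a simplified in-memory sketch of what a probe with a case-insensitive CONTAINING string matcher means: null probe fields are ignored, strings match on substring, other values on equality. QbeSketch and ToyUser are hypothetical. The real implementation translates the probe into a query instead of comparing objects.

```java
import java.util.Locale;

final class QbeSketch {
    // Hypothetical user type; Boolean (not boolean) so that "not set" can be expressed as null.
    static final class ToyUser {
        final String name;
        final String role;
        final Boolean active;

        ToyUser(String name, String role, Boolean active) {
            this.name = name;
            this.role = role;
            this.active = active;
        }
    }

    static boolean matches(ToyUser probe, ToyUser candidate) {
        return stringMatches(probe.name, candidate.name)
                && stringMatches(probe.role, candidate.role)
                && (probe.active == null || probe.active.equals(candidate.active));
    }

    // Null probe values are ignored; otherwise a case-insensitive substring match is used.
    private static boolean stringMatches(String probeValue, String value) {
        if (probeValue == null) {
            return true;
        }
        return value != null
                && value.toLowerCase(Locale.ROOT).contains(probeValue.toLowerCase(Locale.ROOT));
    }
}
```

Note the consequence of substring matching: a probe role of "ADMIN" also matches a hypothetical "SUPERADMIN" role, which may or may not be intended.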

Stream for large dataset processing

@Transactional(readOnly = true)
public void exportAllUsers(Writer writer) {
    try (Stream<User> stream = userRepository.streamAllBy()) {
        stream.map(UserDto::from)
              .forEach(dto -> writeCsv(writer, dto));
    }
}

⚠️ Stream<T> requires an open transaction and a database cursor. Always close it with try-with-resources.
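
Why try-with-resources matters can be shown with the JDK alone: a terminal operation such as forEach does not close a stream, so any close handler (standing in here for releasing a database cursor) only runs when close() is called. StreamCloseSketch and its method names are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

final class StreamCloseSketch {
    // Returns a stream that flips 'released' when closed, standing in for a DB cursor.
    static Stream<String> openCursor(AtomicBoolean released) {
        return Stream.of("ann", "bob").onClose(() -> released.set(true));
    }

    // Consumes the stream without closing it: the close handler never runs.
    static boolean releasedWithoutClose() {
        AtomicBoolean released = new AtomicBoolean(false);
        openCursor(released).forEach(name -> { });
        return released.get();
    }

    // try-with-resources calls close(), which runs the handler.
    static boolean releasedWithTryWithResources() {
        AtomicBoolean released = new AtomicBoolean(false);
        try (Stream<String> names = openCursor(released)) {
            names.forEach(name -> { });
        }
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println(releasedWithoutClose() + " " + releasedWithTryWithResources());
    }
}
```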

Soft delete with Specification

public class SoftDeleteSpec {
    public static <T> Specification<T> notDeleted() {
        return (root, query, cb) -> cb.isFalse(root.get("deleted"));
    }
}

// The filter is NOT applied automatically; compose it into every query that needs it:
List<Order> activeOrders = orderRepository.findAll(
    OrderSpecs.hasStatus(ACTIVE).and(SoftDeleteSpec.notDeleted())
);

5. Trade-offs

Advantage	Disadvantage
No boilerplate CRUD code	Generated queries are not always optimal
Query derivation for fast prototyping	Long method names become unreadable
Unified API (JPA, Mongo, Redis)	Store-specific optimizations are lost
Built-in Pageable/Sort	COUNT queries are expensive on large tables
Audit, Specification, Projection	Learning curve for advanced features

When to use Spring Data

Standard CRUD + queries on a relational database
Rapid prototyping with a simple domain model
Unified access across multiple data stores

When NOT to use Spring Data (repository abstraction)

Complex analytical queries (jOOQ or native SQL is better)
Extreme performance requirements (JdbcTemplate, raw SQL)
Non-relational models where the repository pattern does not fit

6. Common Mistakes

❌ Excessively long derived query names

// BAD: unreadable
List<User> findByActiveAndRoleAndCreatedAtAfterAndEmailContaining(
    boolean active, String role, LocalDate date, String email);

// GOOD: use @Query or Specification
@Query("SELECT u FROM User u WHERE u.active = :active AND u.role = :role " +
       "AND u.createdAt > :date AND u.email LIKE %:email%")
List<User> searchUsers(@Param("active") boolean active,
                       @Param("role") String role,
                       @Param("date") LocalDate date,
                       @Param("email") String email);

❌ Using Page when you don't need total count

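// BAD: Page<T> normally triggers an additional COUNT query to compute the total
Page<Order> page = orderRepository.findAll(pageable);

// GOOD: declare the method with a Slice return type if the total count is not needed
// Slice<Order> findByStatus(OrderStatus status, Pageable pageable);
Slice<Order> slice = orderRepository.findByStatus(status, pageable);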
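
A Slice can skip the COUNT query because "is there a next page?" can be answered by reading one row more than requested. The sketch below shows that technique over an in-memory list. SliceSketch and SimpleSlice are hypothetical names, and the list stands in for a table that a real query would read with a limit of size + 1.

```java
import java.util.ArrayList;
import java.util.List;

final class SliceSketch {
    // Hypothetical minimal slice holder; not Spring Data's Slice type.
    static final class SimpleSlice<T> {
        final List<T> content;
        final boolean hasNext;

        SimpleSlice(List<T> content, boolean hasNext) {
            this.content = content;
            this.hasNext = hasNext;
        }
    }

    // 'rows' stands in for a table; a real query would use LIMIT size + 1 and an OFFSET.
    static <T> SimpleSlice<T> fetch(List<T> rows, int page, int size) {
        int from = (int) Math.min((long) page * size, rows.size());
        int to = (int) Math.min((long) from + size + 1, rows.size());
        List<T> window = rows.subList(from, to);       // at most size + 1 rows
        boolean hasNext = window.size() > size;        // the extra row proves a next page exists
        List<T> content = new ArrayList<>(hasNext ? window.subList(0, size) : window);
        return new SimpleSlice<>(content, hasNext);
    }

    public static void main(String[] args) {
        SimpleSlice<Integer> first = fetch(List.of(1, 2, 3, 4, 5), 0, 2);
        System.out.println(first.content + " hasNext=" + first.hasNext);
    }
}
```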

❌ Missing @Modifying on UPDATE/DELETE @Query

// BAD: throws exception — not a SELECT query
@Query("DELETE FROM User u WHERE u.active = false")
void deleteInactive();

// GOOD: @Modifying signals a write operation
@Modifying
@Query("DELETE FROM User u WHERE u.active = false")
void deleteInactive();

❌ N+1 queries in findAll

// BAD: if Order has a LAZY items collection
List<Order> orders = orderRepository.findAll();
orders.forEach(o -> o.getItems().size()); // N extra queries!

// GOOD: JOIN FETCH with custom query
@Query("SELECT o FROM Order o JOIN FETCH o.items")
List<Order> findAllWithItems();
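
The cost difference is easiest to see by counting statements. The following sketch uses a fake in-memory store in which every method call stands for one SQL round trip. CountingOrderStore and its data are invented for illustration and involve no real database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Fake data source that counts round trips; each method call stands for one SQL statement.
final class CountingOrderStore {
    private final Map<Integer, List<String>> itemsByOrderId = new LinkedHashMap<>();
    int statements = 0;

    CountingOrderStore() {
        itemsByOrderId.put(1, List.of("pen"));
        itemsByOrderId.put(2, List.of("ink", "pad"));
        itemsByOrderId.put(3, List.of("cap"));
    }

    List<Integer> findOrderIds() {
        statements++;
        return new ArrayList<>(itemsByOrderId.keySet());
    }

    List<String> findItems(int orderId) {
        statements++;
        return itemsByOrderId.get(orderId);
    }

    Map<Integer, List<String>> findOrdersWithItems() {
        statements++;
        return new LinkedHashMap<>(itemsByOrderId);
    }

    // Lazy style: one statement for the orders plus one per order for its items.
    static int lazyStatements() {
        CountingOrderStore store = new CountingOrderStore();
        for (int id : store.findOrderIds()) {
            store.findItems(id).size();
        }
        return store.statements;
    }

    // Fetch-join style: orders and items arrive together in a single statement.
    static int fetchJoinStatements() {
        CountingOrderStore store = new CountingOrderStore();
        store.findOrdersWithItems().values().forEach(List::size);
        return store.statements;
    }

    public static void main(String[] args) {
        System.out.println(lazyStatements() + " vs " + fetchJoinStatements());
    }
}
```

With three orders the lazy path issues four statements and the fetch-join path one; the gap grows linearly with the number of orders.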

❌ Unnecessary flush() and saveAndFlush()

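JPA flushes automatically at transaction commit and, with the default flush mode, before executing queries that could be affected by pending changes. An explicit flush() is only needed when you must see a database-generated value or a DB constraint violation immediately.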

7. Deep Dive

The proxy mechanism under the hood

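Repository scanning (set up by Spring Boot auto-configuration or @EnableJpaRepositories) discovers interfaces that extend Repository, such as JpaRepository subinterfaces
JpaRepositoryFactoryBean creates a proxy for each interface, backed by a SimpleJpaRepository instance
Query derivation tokenizes the method name and converts it to a CriteriaQuery
@Query annotations are turned into JPQL queries or, with nativeQuery = true, native SQL queries
Every call goes through a shared EntityManager proxy (created by SharedEntityManagerCreator), which delegates to the EntityManager bound to the current transaction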
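
The core trick behind these steps, implementing an interface at runtime without a hand-written class, can be sketched with the JDK's own dynamic proxies. This is a deliberately tiny model: ToyRepository, ProxySketch, and the in-memory map are assumptions for illustration, and Spring Data's real proxy setup is considerably more involved.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class ProxySketch {

    // Creates a runtime implementation of ToyRepository backed by an in-memory map.
    @SuppressWarnings("unchecked")
    static <T, ID> ToyRepository<T, ID> create() {
        Map<ID, T> store = new HashMap<>();
        // Every interface call lands here; the method name decides what happens.
        // Object methods such as toString() are not handled in this sketch.
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "save":
                    store.put((ID) args[0], (T) args[1]);
                    return args[1];
                case "findById":
                    return Optional.ofNullable(store.get(args[0]));
                case "count":
                    return (long) store.size();
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (ToyRepository<T, ID>) Proxy.newProxyInstance(
                ToyRepository.class.getClassLoader(),
                new Class<?>[] { ToyRepository.class },
                handler);
    }

    public static void main(String[] args) {
        ToyRepository<String, Long> repo = create();
        repo.save(1L, "ann");
        System.out.println(repo.findById(1L).isPresent() + " " + repo.count());
    }
}

// Hypothetical repository contract: only the interface is written by the developer.
interface ToyRepository<T, ID> {
    T save(ID id, T entity);

    Optional<T> findById(ID id);

    long count();
}
```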

Specification vs QueryDSL vs @Query

Approach	Type safety	Dynamic	Complexity
Query derivation	None at compile time (names validated at startup)	No	Low
@Query JPQL	None (string)	No	Medium
Specification	Predicate-level	Yes	Medium
QueryDSL	Q-class level	Yes	Medium
Criteria API	Metamodel level	Yes	High

Spring Data JDBC vs JPA

Aspect	Spring Data JPA	Spring Data JDBC
ORM	Hibernate (full ORM)	No ORM, aggregate root
Lazy loading	Yes	No
Cache	1st + 2nd level	None
Dirty checking	Automatic	None
Complexity	High	Low

Spring Data JDBC is a good choice when you don't need Hibernate complexity but the repository pattern is still useful.

Derived delete gotcha

Derived deleteBy... methods load the matching entities first, then delete them one by one so that cascades and lifecycle callbacks run. For many rows, a @Modifying @Query is more efficient because it runs a single SQL DELETE, but it bypasses those cascades and callbacks.

8. Interview Questions

What is the difference between CrudRepository and JpaRepository? CrudRepository: basic CRUD (save, findById, delete, count). JpaRepository: CrudRepository + flush, batch delete, getReferenceById, Pageable/Sort. Most projects use JpaRepository.
How does query derivation work? Spring Data tokenizes the method name (findBy, And, OrderBy, etc.) and generates a JPQL/Criteria query. It validates entity properties at startup.
When do you use @Query instead of derivation? When the method name is too long (3+ conditions), when JOIN FETCH is needed, when native SQL is needed, or when JPQL aggregation is needed (SUM, AVG).
What is Specification and when do you use it? A type-safe, reusable predicate builder for dynamic queries. Best for search forms where conditions are combined at runtime.
What is the difference between Page and Slice? Page: includes totalCount (extra COUNT query). Slice: only indicates if there is a next page. Slice is faster on large tables.
How do you solve the N+1 problem with Spring Data? JOIN FETCH with a custom @Query, @EntityGraph, or @BatchSize on the collection.
When is Spring Data JDBC better than JPA? When you don't need lazy loading, dirty checking, L2 cache, and prefer a simpler aggregate root model.

9. Glossary

Term	Meaning
Repository	Data access interface implemented by Spring Data proxy
Query derivation	Automatic query generation from method names
@Query	Manual JPQL or native SQL annotation
Projection	Selective field retrieval via interface or DTO
Specification	Type-safe, composable predicate builder
Pageable	Pagination request (page, size, sort)
Page<T>	Pagination response with totalCount
Slice<T>	Pagination response without totalCount
@Modifying	Marks a @Query as a write operation (UPDATE/DELETE)
Auditing	@CreatedDate, @LastModifiedDate automatic timestamps
SimpleJpaRepository	Default implementation class behind the JpaRepository proxy
EntityGraph	Declarative fetch strategy specification

10. Cheatsheet

REPOSITORY HIERARCHY:
  Repository           Marker interface
  CrudRepository       save, findById, findAll, deleteById, count
  JpaRepository        + flush, batch, getReferenceById, Pageable

QUERY METHODS:
  findByX()            Query derivation (from method name)
  @Query("JPQL")       Manual JPQL
  @Query(nativeQuery)  Native SQL
  Specification        Dynamic, composable predicates
  QueryByExample       Search from probe object

PAGINATION:
  Pageable             PageRequest.of(page, size, Sort)
  Page<T>              Data + totalCount (COUNT query)
  Slice<T>             Data + hasNext (no COUNT)

WRITE QUERIES:
  @Modifying           UPDATE/DELETE @Query marker
  @Modifying(...)      Attributes: flushAutomatically, clearAutomatically

AUDITING:
  @EnableJpaAuditing   Configuration activation
  @CreatedDate         Auto creation timestamp
  @LastModifiedDate    Auto modification timestamp
  AuditorAware<T>      Supplies the current auditor for @CreatedBy/@LastModifiedBy

CUSTOM REPOSITORY:
  XxxRepositoryCustom          Interface
  XxxRepositoryCustomImpl      Implementation (EntityManager)
  JpaSpecificationExecutor     Specification support