Spring Boot Performance Tuning: A Practical Guide

Spring Boot makes it easy to build production-ready applications, but out-of-the-box defaults are not always optimal for high-performance scenarios. After years of optimizing enterprise applications — including our own backend at CodingAlphas — we have compiled the most impactful tuning strategies with real code examples and before/after metrics.

JVM Configuration: The Foundation

The JVM is the foundation of your Spring Boot application's performance. In our experience, getting these settings right can yield a 20-40% improvement before you touch a single line of application code.

Heap Sizing

Set -Xms and -Xmx to the same value to avoid runtime heap resizing overhead. For most applications, 2-4GB is a good starting point.

# Production JVM flags for Spring Boot 3.x on Java 21
# Note: -XX:MaxGCPauseMillis is a pause-time goal for G1 and is not used by ZGC, so it is omitted here.
JAVA_OPTS="\
  -Xms4g -Xmx4g \
  -XX:+UseZGC \
  -XX:+ZGenerational \
  -XX:+UseStringDeduplication \
  -XX:+OptimizeStringConcat \
  -XX:MetaspaceSize=256m \
  -XX:MaxMetaspaceSize=256m \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/app/heapdump.hprof \
  -Djava.security.egd=file:/dev/urandom"
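
It is worth confirming that the flags actually reached the JVM, since a wrapper script or container entrypoint can silently drop them. The following is a minimal sketch using only the standard management API; the class name HeapCheck is a placeholder, and the reported values can differ slightly from the flags depending on the collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Reports the heap limits the running JVM actually applied. */
final class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** True when initial and maximum heap match, which is what -Xms == -Xmx should produce. */
    static boolean isFixedSize(long initBytes, long maxBytes) {
        return initBytes > 0 && initBytes == maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getInit() and getMax() return -1 when the value is undefined.
        System.out.printf("heap init=%d MiB, max=%d MiB, fixed size=%b%n",
            toMiB(heap.getInit()), toMiB(heap.getMax()),
            isFixedSize(heap.getInit(), heap.getMax()));
    }
}
```

Running this with the same JAVA_OPTS as the application should report matching init and max values of roughly 4096 MiB.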

Garbage Collector Selection

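Java 21 ships several production-ready garbage collectors. The three below are the most relevant for server-side Spring Boot workloads (Shenandoah is not included in every JDK distribution). Choose based on your latency requirements: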

GC	Max Pause	Throughput	Best For
G1GC	50-200ms	High	General purpose, balanced
ZGC	<10ms	Medium-High	Low-latency APIs, real-time
Shenandoah	<10ms	Medium	Ultra-low pause, large heaps

Key Takeaway

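At CodingAlphas, we switched from G1GC to ZGC (Generational) on our API server and reduced P99 latency from 180ms to 45ms. ZGC is our default recommendation for any new Spring Boot project on Java 21+. Note that G1 remains the JVM's built-in default, so ZGC has to be enabled explicitly with -XX:+UseZGC.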
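
Because the collector is chosen by a command-line flag, it is easy to believe a service runs on ZGC when it does not. As a rough check, the sketch below lists the collectors the running JVM registered, using the standard management API. The class name ActiveGcCheck is a placeholder, and the exact collector names vary by JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/** Lists the garbage collectors the running JVM registered. */
final class ActiveGcCheck {

    /** Returns the names of the collector MXBeans, e.g. entries containing "G1" or "ZGC". */
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
            .map(GarbageCollectorMXBean::getName)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Exact names vary by JDK version and collector.
        collectorNames().forEach(name -> System.out.println("collector: " + name));
    }
}
```

Logging these names once at startup makes a missing or overridden GC flag visible right away.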

Virtual Threads (Project Loom)

Virtual threads are the most significant JVM innovation in a decade. They allow you to write simple blocking code that scales like reactive/async code — without the complexity of reactive programming.

Enabling Virtual Threads in Spring Boot 3.2+

# application.yml
spring:
  threads:
    virtual:
      enabled: true

# That is it. On Java 21+, Spring Boot 3.2+ handles the rest:
# - Tomcat uses virtual threads for request handling
# - @Async methods run on virtual threads
# - Scheduled tasks use virtual threads

Before vs After Virtual Threads

Here is real benchmark data from our API server handling concurrent database queries:

Metric	Platform Threads	Virtual Threads
Max concurrent requests	200 (thread pool limit)	10,000+
Memory per thread	~1MB stack	~1KB initially
P99 under load (1K rps)	850ms	120ms
Thread creation time	~1ms	~1 microsecond
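
To see outside Spring why blocking code scales on virtual threads, here is a minimal plain-Java sketch (Java 21 required). It starts one virtual thread per task and lets each one block, with a sleep standing in for a JDBC or HTTP call. The class and method names are illustrative and not part of our benchmark setup.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Runs many blocking tasks, each on its own virtual thread. */
final class VirtualThreadDemo {

    /** Returns how many tasks completed on a virtual thread. */
    static int runBlockingTasks(int taskCount, Duration blockTime) {
        List<Future<Boolean>> futures = new ArrayList<>(taskCount);
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                futures.add(executor.submit(() -> {
                    Thread.sleep(blockTime); // stands in for a blocking JDBC or HTTP call
                    return Thread.currentThread().isVirtual();
                }));
            }
        } // close() waits for all submitted tasks to finish

        int completedOnVirtual = 0;
        for (Future<Boolean> future : futures) {
            try {
                if (future.get()) {
                    completedOnVirtual++;
                }
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("task failed", e);
            }
        }
        return completedOnVirtual;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000, Duration.ofMillis(100));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " blocking tasks finished in " + elapsedMs + " ms");
    }
}
```

With a fixed pool of 200 platform threads, the same 10,000 tasks would run in 50 sequential batches. With virtual threads they all block concurrently.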

Virtual Threads, ZGC, and GraalVM in Production

Combined with the switch from G1GC to ZGC (Generational) described earlier, which cut P99 latency on our API server from 180ms to 45ms, virtual threads let our Spring Boot backend handle 10,000+ concurrent requests with sub-120ms response times.

Connection Pool Optimization

Database connections are often the bottleneck. HikariCP, the default in Spring Boot, is already one of the fastest Java connection pools, but proper configuration is critical.

# application.yml - Optimized HikariCP settings
spring:
  datasource:
    hikari:
      # Pool size formula (from the HikariCP wiki): (2 * DB server CPU cores) + effective_spindle_count
      # For a 4-core cloud DB on SSD: (2 * 4) + 1 = 9, rounded up to 10
      maximum-pool-size: 10
      minimum-idle: 5

      # Connection lifecycle
      connection-timeout: 30000     # 30s (HikariCP default) - max wait for a free connection; lower it to fail fast
      idle-timeout: 600000          # 10min - release idle connections
      max-lifetime: 1800000         # 30min - prevent stale connections
      validation-timeout: 5000      # 5s - health check timeout

      # Leak detection (CRITICAL for development)
      leak-detection-threshold: 60000  # 60s - warn if conn held too long

      # Prepared statement caching (these are MySQL Connector/J properties;
      # other drivers, such as PostgreSQL's, use different property names)
      data-source-properties:
        cachePrepStmts: true
        prepStmtCacheSize: 250
        prepStmtCacheSqlLimit: 2048
        useServerPrepStmts: true

Connection Pool Sizing: The Math

The most common mistake is over-provisioning connections. PostgreSQL, for example, tends to perform poorly with more than about 100 active connections. The optimal pool size is usually much smaller than you think.

Key Takeaway

A pool of 10 connections can handle 10,000 requests per second if each request runs a single query that holds its connection for 1ms on average (10 connections × 1,000 queries per second each). More connections means more contention, not more throughput. Start small and measure.
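
The arithmetic behind this takeaway can be written down as a small helper. This is a sketch under simplifying assumptions (one query per request, a steady average hold time, no queueing effects), and the class and method names are illustrative.

```java
/** Back-of-the-envelope connection pool math. */
final class PoolSizing {

    /** Upper bound on requests per second when each request holds one connection for avgHoldMillis. */
    static double maxRequestsPerSecond(int poolSize, double avgHoldMillis) {
        return poolSize * (1000.0 / avgHoldMillis);
    }

    /** Little's Law: average connections in use = arrival rate * average hold time. */
    static double connectionsInUse(double requestsPerSecond, double avgHoldMillis) {
        return requestsPerSecond * avgHoldMillis / 1000.0;
    }

    /** Starting-point formula from the HikariCP settings: (2 * cores) + effective spindle count. */
    static int suggestedPoolSize(int cpuCores, int effectiveSpindles) {
        return 2 * cpuCores + effectiveSpindles;
    }

    public static void main(String[] args) {
        System.out.println(maxRequestsPerSecond(10, 1.0));  // 10 connections, 1ms queries
        System.out.println(connectionsInUse(2_000, 5.0));   // 2,000 rps, 5ms queries
        System.out.println(suggestedPoolSize(4, 1));        // 4 cores, SSD-backed
    }
}
```

If connectionsInUse for your measured traffic is well below the configured pool size, adding connections is unlikely to help. Reducing query time usually will.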

Caching Strategies

For read-heavy workloads, effective caching can reduce database load by 80% or more. Spring Boot makes it easy with annotation-driven caching:

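// CacheConfig.java - Caffeine-backed cache manager (named caches are created on demand)
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;

@Configuration
@EnableCaching
@EnableScheduling  // Needed for the @Scheduled eviction in ProductService
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager();
        manager.setCaffeine(Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofMinutes(10))
            .recordStats());  // Enable cache hit/miss metrics
        return manager;
    }
}

// ProductService.java - usage in service layer
// Product, ProductRepository, and NotFoundException are application classes
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

    private static final Logger log = LoggerFactory.getLogger(ProductService.class);
    private final ProductRepository productRepository;

    public ProductService(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }

    @Cacheable(value = "products", key = "#id")
    public Product getProduct(Long id) {
        return productRepository.findById(id)
            .orElseThrow(() -> new NotFoundException("Product not found"));
    }

    @CacheEvict(value = "products", key = "#product.id")
    public Product updateProduct(Product product) {
        return productRepository.save(product);
    }

    @CacheEvict(value = "products", allEntries = true)
    @Scheduled(fixedRate = 3600000)  // Clear cache every hour
    public void evictAllProducts() {
        log.info("Product cache cleared");
    }
}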

Multi-Layer Caching

L1 - Caffeine (in-process): Sub-microsecond access. Use for frequently accessed, small datasets.
L2 - Redis (distributed): ~1ms access. Use for shared state across multiple app instances.
L3 - HTTP caching: ETags and Cache-Control headers for API responses. Lets clients and proxies reuse responses, so repeat requests either never reach the server or are answered with a cheap 304.
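
The L1/L2 lookup order can be sketched in plain Java. In this illustration, in-memory maps stand in for Caffeine and Redis, and the loader function stands in for a database query. TwoTierCache is a hypothetical class, not a Spring or Caffeine API, and it leaves out expiry and size limits that a real cache needs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Read-through lookup: L1 in-process map, then L2 shared store, then the source of record. */
final class TwoTierCache<K, V> {

    private final Map<K, V> l1 = new ConcurrentHashMap<>(); // stands in for Caffeine
    private final Map<K, V> l2;                              // stands in for Redis
    private final Function<K, V> loader;                     // e.g. a database query

    TwoTierCache(Map<K, V> l2, Function<K, V> loader) {
        this.l2 = l2;
        this.loader = loader;
    }

    /** Returns the cached value, loading and populating both tiers on a full miss. */
    V get(K key) {
        V value = l1.get(key);
        if (value != null) {
            return value; // L1 hit: no network round trip
        }
        value = l2.get(key);
        if (value == null) {
            value = loader.apply(key); // full miss: go to the source of record
            if (value == null) {
                return null; // nothing to cache
            }
            l2.put(key, value);
        }
        l1.put(key, value); // promote to L1 for the next lookup
        return value;
    }

    /** Removes the key from both tiers, e.g. after an update. */
    void evict(K key) {
        l1.remove(key);
        l2.remove(key);
    }
}
```

Evicting from both tiers on update matters: clearing only L1 would let the next read repopulate it from a stale L2 entry.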

Native Compilation with GraalVM

GraalVM native images compile your Spring Boot application ahead-of-time into a standalone executable. The benefits are dramatic:

Metric	JVM	Native Image
Startup time	3.2 seconds	0.08 seconds
Memory at idle	380MB	62MB
Docker image size	320MB	85MB
Build time	30 seconds	5-10 minutes

<!-- pom.xml - GraalVM Native Image plugin -->
<plugin>
    <groupId>org.graalvm.buildtools</groupId>
    <artifactId>native-maven-plugin</artifactId>
    <configuration>
        <buildArgs>
            <arg>--initialize-at-build-time</arg>
            <arg>-H:+ReportExceptionStackTraces</arg>
        </buildArgs>
    </configuration>
</plugin>

<!-- Build with: mvn -Pnative native:compile -->

Native images are ideal for serverless (AWS Lambda, Cloud Functions) and containerized microservices where startup time and memory footprint matter. For long-running monoliths, the JVM's JIT compilation still delivers better peak throughput.

Async Processing

Move non-critical operations off the request thread to improve response times:

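// OrderPostProcessor.java
// @Async is applied by a Spring proxy, so the async method must live in a different bean
// than its caller. It also requires @EnableAsync on a @Configuration class.
import java.util.concurrent.CompletableFuture;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class OrderPostProcessor {

    private final EmailService emailService;
    private final AnalyticsService analyticsService;
    private final InvoiceService invoiceService;

    public OrderPostProcessor(EmailService emailService, AnalyticsService analyticsService,
                              InvoiceService invoiceService) {
        this.emailService = emailService;
        this.analyticsService = analyticsService;
        this.invoiceService = invoiceService;
    }

    @Async  // Runs on a virtual thread when virtual threads are enabled (Spring Boot 3.2+)
    public CompletableFuture<Void> processOrderAsync(Order order) {
        emailService.sendOrderConfirmation(order);  // Send confirmation email
        analyticsService.trackOrder(order);         // Update analytics
        invoiceService.generateInvoice(order);      // Generate invoice PDF
        return CompletableFuture.completedFuture(null);
    }
}

// OrderService.java
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final OrderRepository orderRepository;
    private final OrderPostProcessor postProcessor;

    public OrderService(OrderRepository orderRepository, OrderPostProcessor postProcessor) {
        this.orderRepository = orderRepository;
        this.postProcessor = postProcessor;
    }

    public Order createOrder(OrderRequest request) {
        Order order = orderRepository.save(toOrder(request));  // toOrder: application mapping helper

        // Non-critical work happens async (the call goes through the Spring proxy)
        postProcessor.processOrderAsync(order);

        // Return immediately - user does not wait for emails/PDFs
        return order;
    }
}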
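
One thing a fire-and-forget call leaves implicit is what happens when the background work fails: if the caller ignores the returned future, the exception is easy to lose. The following plain-Java sketch shows the same handoff with an explicit failure handler. The class name, the event strings, and the events list are illustrative stand-ins for real email, invoice, and logging calls.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hands non-critical work to a background executor and records failures instead of losing them. */
final class AsyncHandoff {

    /**
     * Schedules post-processing for an order and returns immediately.
     * The events list must be thread-safe because it is written from the executor's thread.
     */
    static CompletableFuture<Void> submitPostProcessing(
            String orderId, Executor executor, List<String> events) {
        return CompletableFuture
            .runAsync(() -> {
                events.add("email:" + orderId);   // stands in for sending the confirmation
                events.add("invoice:" + orderId); // stands in for generating the PDF
            }, executor)
            .exceptionally(ex -> {
                // Without a handler like this, a failure in ignored async work goes unnoticed.
                events.add("failed:" + orderId);
                return null;
            });
    }
}
```

The request thread can discard the returned future and respond right away. Tests, or callers that care about completion, can call join() on it.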

Benchmarking Methodology

You cannot optimize what you do not measure. Here is the benchmarking methodology we use at CodingAlphas:

Tools

wrk / wrk2: HTTP benchmarking. wrk2 adds a constant-throughput mode for accurate latency measurements.
JMH (Java Microbenchmark Harness): For micro-benchmarking specific code paths.
async-profiler: Low-overhead CPU and allocation profiling in production.
Gatling: For realistic load testing with user journey simulation.

# wrk2 benchmark (the wrk2 binary is also named wrk; -R is a wrk2-only flag): constant throughput of 1000 req/s for 60 seconds
wrk -t4 -c100 -d60s -R1000 --latency http://localhost:8080/api/products

# Results interpretation:
# P50 = median user experience
# P99 = worst 1% experience (SLA target)
# P99.9 = tail latency (indicator of GC pauses or lock contention)
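
To make these percentiles concrete, the sketch below computes them from raw latency samples using the nearest-rank definition. This is one common definition and not necessarily the one wrk2 uses internally, since it reports from a histogram. The class name is illustrative.

```java
import java.util.Arrays;

/** Computes latency percentiles from raw samples using the nearest-rank method. */
final class LatencyPercentiles {

    /** Returns the smallest sample such that at least p percent of all samples are <= it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] samples = {10, 10, 10, 10, 10, 10, 10, 10, 10, 500};
        System.out.println("P50 = " + percentile(samples, 50) + " ms");
        System.out.println("P99 = " + percentile(samples, 99) + " ms");
    }
}
```

In the sample data, a single slow request out of ten leaves the median at 10 ms but pushes P99 to 500 ms, which is why averages and medians alone hide tail problems.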

Key Takeaway

Always benchmark with a constant-throughput (open-loop) load generator such as wrk2, not a closed-loop one such as wrk. Closed-loop benchmarks hide latency issues because they reduce the request rate as the server slows down (a problem known as coordinated omission), which is the exact opposite of real-world traffic patterns.

Monitoring and Profiling

Set up observability before you need it. Our standard Spring Boot monitoring stack:

# application.yml - Comprehensive monitoring setup
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus
  metrics:
    tags:
      application: my-service
    distribution:
      percentiles-histogram:
        http.server.requests: true
      slo:  # this key was named "sla" before Spring Boot 2.3
        http.server.requests: 50ms,100ms,200ms,500ms

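// BusinessMetrics.java - custom business metrics
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

@Component
public class BusinessMetrics {
    private final Counter orderCounter;
    private final Timer paymentTimer;

    public BusinessMetrics(MeterRegistry registry) {
        this.orderCounter = Counter.builder("business.orders.created")
            .tag("tier", "unknown")
            .register(registry);
        this.paymentTimer = Timer.builder("business.payment.processing")
            .publishPercentileHistogram()
            .register(registry);
    }

    // Call once per created order
    public void recordOrderCreated() {
        orderCounter.increment();
    }

    // Runs the given payment call and records how long it took
    public void recordPayment(Runnable paymentCall) {
        paymentTimer.record(paymentCall);
    }
}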

Conclusion and Next Steps

Performance tuning is iterative. Here is the order of operations we recommend:

1. Enable virtual threads: one line of config, massive concurrency improvement.
2. Switch to ZGC: eliminates GC pause spikes with minimal throughput cost.
3. Profile and right-size: use async-profiler to find actual bottlenecks before optimizing.
4. Optimize database access: connection pool tuning + caching delivers the biggest real-world gains.
5. Add monitoring: you cannot maintain performance without visibility.
6. Consider native images: for serverless or microservices where startup time matters.

Need help optimizing your Spring Boot application? At CodingAlphas, we have tuned Java backends handling millions of requests per day. Get a quote for a performance audit, or explore our guide on Kubernetes cost optimization to reduce your infrastructure spend alongside your application tuning.
