Spring Boot Performance Tuning: A Practical Guide
Spring Boot makes it easy to build production-ready applications, but out-of-the-box defaults are not always optimal for high-performance scenarios. After years of optimizing enterprise applications — including our own backend at CodingAlphas — we have compiled the most impactful tuning strategies with real code examples and before/after metrics.
JVM Configuration: The Foundation
The JVM is the foundation of your Spring Boot application's performance. In our experience, getting these settings right can yield a 20-40% performance improvement before you touch a single line of application code.
Heap Sizing
Set -Xms and -Xmx to the same value to avoid runtime heap resizing overhead. For most applications, 2-4GB is a good starting point.
# Production JVM flags for Spring Boot 3.x on Java 21
# -XX:MaxGCPauseMillis is omitted: it is a pause-time goal for G1 and Parallel GC, and ZGC does not use it.
JAVA_OPTS="\
  -Xms4g -Xmx4g \
  -XX:+UseZGC \
  -XX:+ZGenerational \
  -XX:+UseStringDeduplication \
  -XX:+OptimizeStringConcat \
  -XX:MetaspaceSize=256m \
  -XX:MaxMetaspaceSize=256m \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/app/heapdump.hprof \
  -Djava.security.egd=file:/dev/urandom"
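Container entrypoints and wrapper scripts sometimes drop JAVA_OPTS, so it can be worth confirming that the flags actually reached the JVM. The following is a minimal sketch that uses only the standard java.lang.management APIs; the class name JvmSettingsCheck is our own placeholder and not part of Spring Boot.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not from any library) that reports the heap and GC settings in effect.
class JvmSettingsCheck {

    // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx or the JVM's own default).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Names of the active collectors as reported by the JVM's management beans.
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Collectors:     " + collectorNames());
        System.out.println("JVM arguments:  "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Running it with the same JAVA_OPTS as the application shows at a glance whether the heap size and collector match what you intended.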
Garbage Collector Selection
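Java 21 ships several production-ready garbage collectors. For latency-sensitive services, these three are the main candidates (Shenandoah is not included in every JDK distribution, Oracle's builds among them). Choose based on your latency requirements: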
| GC | Max Pause | Throughput | Best For |
|---|---|---|---|
| G1GC | 50-200ms | High | General purpose, balanced |
| ZGC | <10ms | Medium-High | Low-latency APIs, real-time |
| Shenandoah | <10ms | Medium | Ultra-low pause, large heaps |
Key Takeaway
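At CodingAlphas, we switched from G1GC to ZGC (Generational) on our API server and reduced P99 latency from 180ms to 45ms. ZGC is now our default choice for any new Spring Boot project on Java 21+. Note that the JVM itself still defaults to G1 on typical server hardware, so ZGC has to be enabled explicitly with -XX:+UseZGC.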
Virtual Threads (Project Loom)
Virtual threads are the most significant JVM innovation in a decade. They allow you to write simple blocking code that scales like reactive/async code — without the complexity of reactive programming.
Enabling Virtual Threads in Spring Boot 3.2+
# application.yml
spring:
  threads:
    virtual:
      enabled: true
# That is it. Spring Boot 3.2+ handles the rest:
# - Tomcat uses virtual threads for request handling
# - @Async methods run on virtual threads
# - Scheduled tasks use virtual threads
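To see why simple blocking code scales on virtual threads, here is a minimal plain-Java sketch that is independent of Spring. It assumes Java 21; the class name VirtualThreadDemo is hypothetical, and Thread.sleep stands in for a blocking database or HTTP call.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: Thread.sleep stands in for a blocking JDBC or HTTP call.
class VirtualThreadDemo {

    // Runs `tasks` blocking tasks, one virtual thread each, and returns how many
    // of them completed on a virtual thread.
    static int runBlockingTasks(int tasks, Duration blockTime) {
        AtomicInteger completedOnVirtualThreads = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        // Blocking parks the virtual thread and frees its carrier thread.
                        Thread.sleep(blockTime);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    if (Thread.currentThread().isVirtual()) {
                        completedOnVirtualThreads.incrementAndGet();
                    }
                });
            }
        } // close() waits until every submitted task has finished
        return completedOnVirtualThreads.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000, Duration.ofMillis(100));
        long elapsedMs = Duration.ofNanos(System.nanoTime() - start).toMillis();
        System.out.println(done + " blocking tasks finished in " + elapsedMs + " ms");
    }
}
```

Because a parked virtual thread does not hold an OS thread, the 10,000 sleeping tasks overlap instead of queueing behind a fixed-size pool. The exact timing depends on your hardware, so treat the printed number as a rough indication only.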
Before vs After Virtual Threads
Here is real benchmark data from our API server handling concurrent database queries:
| Metric | Platform Threads | Virtual Threads |
|---|---|---|
| Max concurrent requests | 200 (thread pool limit) | 10,000+ |
| Memory per thread | ~1MB stack | ~1KB initially |
| P99 under load (1K rps) | 850ms | 120ms |
| Thread creation time | ~1ms | ~1 microsecond |
Virtual Threads, ZGC, and GraalVM in Production
Switching from G1GC to ZGC (Generational) on our API server reduced P99 latency from 180ms to 45ms. Combined with virtual threads, our Spring Boot backend handles 10,000+ concurrent requests with sub-120ms response times.
Connection Pool Optimization
Database connections are often the bottleneck. HikariCP, the default in Spring Boot, is already one of the fastest Java connection pools, but proper configuration is critical.
# application.yml - Optimized HikariCP settings
spring:
  datasource:
    hikari:
      # Pool size formula: (2 * CPU cores) + effective_spindle_count
      # For cloud DB with SSD: (2 * 4 cores) + 1 = 9, round to 10
      maximum-pool-size: 10
      minimum-idle: 5
      # Connection lifecycle
      connection-timeout: 30000  # 30s (the HikariCP default) - lower it to fail faster when the pool is exhausted
      idle-timeout: 600000       # 10min - release idle connections
      max-lifetime: 1800000      # 30min - prevent stale connections
      validation-timeout: 5000   # 5s - health check timeout
      # Leak detection (CRITICAL for development)
      leak-detection-threshold: 60000  # 60s - warn if conn held too long
      # Prepared statement caching
      # (these are MySQL Connector/J driver properties; other drivers use different names)
      data-source-properties:
        cachePrepStmts: true
        prepStmtCacheSize: 250
        prepStmtCacheSqlLimit: 2048
        useServerPrepStmts: true
Connection Pool Sizing: The Math
The most common mistake is over-provisioning connections. PostgreSQL, for example, serves every connection from its own backend process, so throughput tends to degrade once the connection count climbs past roughly 100. The optimal pool size is usually much smaller than you think.
Key Takeaway
A pool of 10 connections can handle 10,000 requests per second if your average query time is 1ms. More connections means more contention, not more throughput. Start small and measure.
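The arithmetic behind that claim is Little's Law: connections in use equals request rate multiplied by the time each request holds a connection. The sketch below works through it, assuming one query per request; PoolSizing and its method names are hypothetical helpers, not HikariCP APIs.

```java
// Hypothetical helper that illustrates the pool-sizing arithmetic; not part of HikariCP.
class PoolSizing {

    // Little's Law: busy connections = arrival rate (req/s) * hold time per request (s).
    static int connectionsNeeded(double requestsPerSecond, double avgQueryMillis) {
        return (int) Math.ceil(requestsPerSecond * avgQueryMillis / 1000.0);
    }

    // Upper bound on throughput if every connection in the pool is busy all the time.
    static double maxRequestsPerSecond(int poolSize, double avgQueryMillis) {
        return poolSize * (1000.0 / avgQueryMillis);
    }

    // Starting-point formula from the config comments: (2 * cores) + effective spindle count.
    static int startingPoolSize(int cpuCores, int effectiveSpindleCount) {
        return 2 * cpuCores + effectiveSpindleCount;
    }

    public static void main(String[] args) {
        System.out.println("Connections for 10,000 req/s at 1 ms: " + connectionsNeeded(10_000, 1.0));
        System.out.println("Connections for 10,000 req/s at 5 ms: " + connectionsNeeded(10_000, 5.0));
        System.out.println("Max req/s with 10 connections at 1 ms: " + maxRequestsPerSecond(10, 1.0));
        System.out.println("Starting size for 4 cores, 1 spindle:  " + startingPoolSize(4, 1));
    }
}
```

The same formula shows where the real lever is: at a 5 ms average query time the same traffic needs 50 connections, so shaving query time usually helps more than growing the pool.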
Caching Strategies
For read-heavy workloads, effective caching can reduce database load by 80% or more. Spring Boot makes it easy with annotation-driven caching:
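// CacheConfig.java - Caffeine cache manager; named caches are created on first use
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class CacheConfig {
    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager();
        manager.setCaffeine(Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(Duration.ofMinutes(10))
                .recordStats()); // Enable cache hit/miss metrics
        return manager;
    }
}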
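// Usage in service layer
// (Product, ProductRepository, and NotFoundException are application classes; the logger is SLF4J)
@Service
public class ProductService {
    private static final Logger log = LoggerFactory.getLogger(ProductService.class);

    private final ProductRepository productRepository;

    public ProductService(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }

    @Cacheable(value = "products", key = "#id")
    public Product getProduct(Long id) {
        return productRepository.findById(id)
                .orElseThrow(() -> new NotFoundException("Product not found"));
    }

    @CacheEvict(value = "products", key = "#product.id")
    public Product updateProduct(Product product) {
        return productRepository.save(product);
    }

    // @Scheduled only fires if @EnableScheduling is present on a configuration class
    @CacheEvict(value = "products", allEntries = true)
    @Scheduled(fixedRate = 3600000) // Clear cache every hour
    public void evictAllProducts() {
        log.info("Product cache cleared");
    }
}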
Multi-Layer Caching
- L1 - Caffeine (in-process): Sub-microsecond access. Use for frequently accessed, small datasets.
- L2 - Redis (distributed): ~1ms access. Use for shared state across multiple app instances.
- L3 - HTTP caching: ETags and Cache-Control headers for API responses. Clients and CDNs can reuse responses, so many requests never reach the application at all.
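The lookup order across the first two layers can be sketched in plain Java. This is an illustration under stated assumptions, not a production cache: both tiers are ordinary maps so the example runs without Caffeine or Redis, TwoTierCache is a hypothetical class, eviction and expiry are left out, and the loader is assumed never to return null.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch only: plain maps stand in for Caffeine (L1) and Redis (L2); no eviction or TTL.
class TwoTierCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>(); // per-instance, in-process tier
    private final Map<K, V> shared;                            // tier shared by all instances
    private final Function<K, V> loader;                       // e.g. the database lookup

    TwoTierCache(Map<K, V> shared, Function<K, V> loader) {
        this.shared = shared;
        this.loader = loader;
    }

    V get(K key) {
        V value = local.get(key);
        if (value != null) {
            return value;                 // L1 hit: no network, no database
        }
        value = shared.get(key);
        if (value == null) {
            value = loader.apply(key);    // miss in both tiers: load once and publish to L2
            shared.put(key, value);
        }
        local.put(key, value);            // populate L1 for the next call on this instance
        return value;
    }

    public static void main(String[] args) {
        Map<Long, String> redisStandIn = new ConcurrentHashMap<>();
        AtomicInteger databaseCalls = new AtomicInteger();
        Function<Long, String> loader = id -> {
            databaseCalls.incrementAndGet();
            return "product-" + id;
        };
        TwoTierCache<Long, String> instanceA = new TwoTierCache<>(redisStandIn, loader);
        TwoTierCache<Long, String> instanceB = new TwoTierCache<>(redisStandIn, loader);

        instanceA.get(1L); // loads from the "database"
        instanceA.get(1L); // served from instance A's local tier
        instanceB.get(1L); // served from the shared tier
        System.out.println("Database calls: " + databaseCalls.get());
    }
}
```

In this run the loader is invoked once for three reads: the second read is an L1 hit, and the third is answered by the shared tier even though it arrives at a different instance.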
Native Compilation with GraalVM
GraalVM native images compile your Spring Boot application ahead-of-time into a standalone executable. The benefits are dramatic:
| Metric | JVM | Native Image |
|---|---|---|
| Startup time | 3.2 seconds | 0.08 seconds |
| Memory at idle | 380MB | 62MB |
| Docker image size | 320MB | 85MB |
| Build time | 30 seconds | 5-10 minutes |
<!-- pom.xml - GraalVM Native Image plugin -->
<plugin>
    <groupId>org.graalvm.buildtools</groupId>
    <artifactId>native-maven-plugin</artifactId>
    <configuration>
        <buildArgs>
            <arg>--initialize-at-build-time</arg>
            <arg>-H:+ReportExceptionStackTraces</arg>
        </buildArgs>
    </configuration>
</plugin>
<!-- Build with: mvn -Pnative native:compile -->
Native images are ideal for serverless (AWS Lambda, Cloud Functions) and containerized microservices where startup time and memory footprint matter. For long-running monoliths, the JVM's JIT compilation still delivers better peak throughput.
Async Processing
Move non-critical operations off the request thread to improve response times:
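// @Async is applied by a Spring proxy, so the async method lives in its own bean
// (OrderPostProcessor is an illustrative name). Calling it on `this` would run synchronously.
// Requires @EnableAsync on a configuration class; collaborator fields omitted for brevity.
@Service
public class OrderPostProcessor {
    @Async // Runs on a virtual thread with Spring Boot 3.2+ when virtual threads are enabled
    public CompletableFuture<Void> processOrderAsync(Order order) {
        // Send confirmation email
        emailService.sendOrderConfirmation(order);
        // Update analytics
        analyticsService.trackOrder(order);
        // Generate invoice PDF
        invoiceService.generateInvoice(order);
        return CompletableFuture.completedFuture(null);
    }
}

@Service
public class OrderService {
    private final OrderRepository orderRepository;
    private final OrderPostProcessor postProcessor;

    public OrderService(OrderRepository orderRepository, OrderPostProcessor postProcessor) {
        this.orderRepository = orderRepository;
        this.postProcessor = postProcessor;
    }

    public Order createOrder(OrderRequest request) {
        Order order = orderRepository.save(toOrder(request));
        // Non-critical work happens async
        postProcessor.processOrderAsync(order);
        // Return immediately - user does not wait for emails/PDFs
        return order;
    }
}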
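One thing to watch with fire-and-forget work is that failures are easy to lose, because nobody waits on the returned future. The plain-Java sketch below shows one way to attach a failure handler to background work. It is independent of Spring, BackgroundWork is a hypothetical helper, and the in-memory failure list stands in for whatever logging or metrics you would use in practice.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: runs a task off the calling thread and records failures.
class BackgroundWork {

    // Stand-in for logging or metrics; a real service would not keep failures in memory.
    static final List<String> failures = new CopyOnWriteArrayList<>();

    static CompletableFuture<Void> runInBackground(String name, Runnable task, ExecutorService executor) {
        return CompletableFuture.runAsync(task, executor)
                .exceptionally(error -> {
                    // Async failures arrive wrapped in CompletionException, so unwrap the cause.
                    Throwable cause = error.getCause() != null ? error.getCause() : error;
                    failures.add(name + ": " + cause.getMessage());
                    return null;
                });
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        CompletableFuture<Void> email = runInBackground("email",
                () -> { throw new IllegalStateException("smtp down"); }, executor);
        CompletableFuture<Void> invoice = runInBackground("invoice",
                () -> System.out.println("invoice generated"), executor);

        // The caller would normally return here; join() is only used so the demo can print results.
        CompletableFuture.allOf(email, invoice).join();
        System.out.println("Recorded failures: " + failures);
        executor.shutdown();
    }
}
```

With Spring's @Async the equivalent hook is to handle the returned CompletableFuture, or to catch and log inside the async method itself.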
Benchmarking Methodology
You cannot optimize what you do not measure. Here is the benchmarking methodology we use at CodingAlphas:
Tools
- wrk / wrk2: HTTP benchmarking. wrk2 adds a constant-throughput mode (the -R flag) for accurate latency measurements.
- JMH (Java Microbenchmark Harness): For micro-benchmarking specific code paths.
- async-profiler: Low-overhead CPU and allocation profiling in production.
- Gatling: For realistic load testing with user journey simulation.
# wrk2 benchmark: constant throughput of 1000 req/s for 60 seconds
# (the -R flag exists only in wrk2, whose binary is also named wrk)
wrk -t4 -c100 -d60s -R1000 --latency http://localhost:8080/api/products
# Results interpretation:
# P50 = median user experience
# P99 = latency that 99% of requests beat; the slowest 1% are worse (common SLA target)
# P99.9 = tail latency (indicator of GC pauses or lock contention)
Key Takeaway
Always benchmark with a constant-throughput load generator (wrk2), not a closed-loop one (wrk). Closed-loop benchmarks hide latency issues because each connection waits for a response before sending its next request, so the request rate drops as the server slows down, which is the exact opposite of real-world traffic patterns.
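To make the percentile definitions concrete, here is a small sketch that computes them from raw latency samples using the nearest-rank method. LatencyPercentiles is a hypothetical helper; wrk2 and Gatling report these values for you, and other tools may use slightly different interpolation rules.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentiles over raw latency samples.
class LatencyPercentiles {

    // Returns the smallest sample such that at least p percent of all samples are <= it.
    static long percentile(long[] latenciesMillis, double p) {
        if (latenciesMillis.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        long[] sorted = latenciesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // 1,000 requests: 985 fast ones at 20 ms and 15 slow ones at 800 ms.
        long[] samples = new long[1000];
        Arrays.fill(samples, 0, 985, 20L);
        Arrays.fill(samples, 985, 1000, 800L);

        System.out.println("P50: " + percentile(samples, 50) + " ms");
        System.out.println("P99: " + percentile(samples, 99) + " ms");
    }
}
```

In this sample set the median stays at 20 ms while P99 jumps to 800 ms, even though only 1.5% of requests were slow. That gap is why averages and medians alone are not enough to judge a latency-sensitive service.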
Monitoring and Profiling
Set up observability before you need it. Our standard Spring Boot monitoring stack:
# application.yml - Comprehensive monitoring setup
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus
  metrics:
    tags:
      application: my-service
    distribution:
      percentiles-histogram:
        http.server.requests: true
      # "slo" replaced the older "sla" key, which current Spring Boot versions no longer accept
      slo:
        http.server.requests: 50ms,100ms,200ms,500ms
// BusinessMetrics.java - Custom business metrics
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

@Component
public class BusinessMetrics {
    private final MeterRegistry registry;
    private final Counter orderCounter;
    private final Timer paymentTimer;

    public BusinessMetrics(MeterRegistry registry) {
        this.registry = registry;
        this.orderCounter = Counter.builder("business.orders.created")
                .tag("tier", "unknown")
                .register(registry);
        this.paymentTimer = Timer.builder("business.payment.processing")
                .publishPercentileHistogram()
                .register(registry);
    }
}
Conclusion and Next Steps
Performance tuning is iterative. Here is the order of operations we recommend:
1. Enable virtual threads: one line of config, massive concurrency improvement.
2. Switch to ZGC: eliminates GC pause spikes with minimal throughput cost.
3. Profile and right-size: use async-profiler to find actual bottlenecks before optimizing.
4. Optimize database access: connection pool tuning + caching delivers the biggest real-world gains.
5. Add monitoring: you cannot maintain performance without visibility.
6. Consider native images: for serverless or microservices where startup time matters.
Need help optimizing your Spring Boot application? At CodingAlphas, we have tuned Java backends handling millions of requests per day. Get a quote for a performance audit, or explore our guide on Kubernetes cost optimization to reduce your infrastructure spend alongside your application tuning.
Written by
CodingAlphas Team
The CodingAlphas engineering team specializes in high-performance Java applications. Our backend runs on Spring Boot 3.2, and we practice what we preach.