Elasticsearch Setup Guide
This guide documents the Elasticsearch search engine setup for full-text search, multilingual content indexing, and location-based queries in the application.
SECURITY
Elasticsearch is a DATABASE and must NEVER be exposed to the internet without proper security!
Default Security Risks
By default, Elasticsearch 7.x ships with:
- NO authentication - anyone can read/write/delete all data
- NO encryption - all data transmitted in plain text
- NO access control - full admin access for anyone who connects
Required Security Configuration
For Development (Local Machine Only):
# /etc/elasticsearch/elasticsearch.yml
network.host: 127.0.0.1 # ONLY localhost - NOT 0.0.0.0!
http.port: 9200
xpack.security.enabled: false # OK for localhost-only
For Production/Remote Servers:
# /etc/elasticsearch/elasticsearch.yml
network.host: 127.0.0.1 # ONLY localhost - use reverse proxy if needed
http.port: 9200
xpack.security.enabled: true # REQUIRED for any server accessible remotely
xpack.security.transport.ssl.enabled: true
Verify Your Server Is NOT Exposed
Check what interface Elasticsearch is listening on:
ss -tlnp | grep 9200
SAFE - Should show ONLY localhost addresses:
127.0.0.1:9200 # IPv4 localhost
[::1]:9200 # IPv6 localhost
[::ffff:127.0.0.1]:9200 # IPv6-mapped IPv4 localhost (also safe!)
Note: The [::ffff:127.0.0.1] format is the IPv6 representation of IPv4 localhost - it's still localhost-only and secure.
DANGER - If you see any of these, YOU ARE EXPOSED:
0.0.0.0:9200 # Listening on ALL interfaces - EXPOSED!
*:9200 # Listening on ALL interfaces - EXPOSED!
YOUR_PUBLIC_IP:9200 # Listening on public IP - EXPOSED!
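The safe and dangerous patterns above can be checked mechanically. The following is a minimal sketch in POSIX shell, not part of the application: the function name classify_listen_addr is hypothetical, and it assumes the default port 9200 and the address formats that ss prints in its local address column.

```shell
#!/bin/sh
# Classify one "Local Address:Port" value from `ss -tln` output.
# Loopback forms listed in this guide are SAFE; everything else is EXPOSED.
# (Hypothetical helper; assumes Elasticsearch listens on port 9200.)
classify_listen_addr() {
    case "$1" in
        127.*:9200|"[::1]:9200"|"[::ffff:127."*"]:9200")
            echo "SAFE" ;;
        *)
            echo "EXPOSED" ;;
    esac
}

# Possible usage on the server (column 4 of `ss -tln` is the local address):
#   ss -tln | awk '$4 ~ /:9200$/ {print $4}' | while read -r addr; do
#       echo "$addr $(classify_listen_addr "$addr")"
#   done
```

Treating anything that is not a known loopback form as EXPOSED is a deliberately conservative choice: an unfamiliar address format is flagged for review rather than silently accepted.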
Test external accessibility:
# From another machine or from the internet
curl http://YOUR_SERVER_IP:9200
# Should get: Connection refused (GOOD!)
# If you get a JSON response - YOU ARE EXPOSED TO THE INTERNET!
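When scripting the external test, curl's exit status is easier to act on than its printed output. The sketch below maps the most relevant exit codes to a verdict. The function name exposure_verdict is hypothetical; the exit codes themselves (0 for success, 7 for failure to connect, 28 for timeout) are curl's documented values.

```shell
#!/bin/sh
# Translate a curl exit status from the external test into a verdict.
#   0  = curl received an HTTP response, so the port is reachable
#   7  = failed to connect (for example, connection refused)
#   28 = operation timed out (often a firewall dropping packets)
# (Hypothetical helper, not part of the application.)
exposure_verdict() {
    case "$1" in
        0)  echo "EXPOSED: port 9200 answered" ;;
        7)  echo "OK: connection refused" ;;
        28) echo "OK: timed out" ;;
        *)  echo "UNKNOWN: curl exit $1" ;;
    esac
}

# Possible usage from another machine (YOUR_SERVER_IP is a placeholder):
#   curl -s -o /dev/null --max-time 5 http://YOUR_SERVER_IP:9200
#   exposure_verdict $?
```

Note that exit status 0 only means something answered on the port. With authentication enabled the answer may be a 401 error, but the port is still reachable and should be closed off.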
What Happens If Exposed?
If Elasticsearch is exposed to the internet without authentication:
- Attackers can read all your data (users, emails, private information)
- Attackers can delete all your indices (all search data gone)
- Attackers can modify data (corrupt your search results)
- Attackers can execute scripts (potential remote code execution)
Real-world attacks:
- Ransomware attacks encrypting Elasticsearch data
- Mass data exfiltration of exposed databases
- Bitcoin mining malware installation
- Complete data deletion with ransom demands
Immediate Actions If You Discover Exposure
- IMMEDIATELY stop Elasticsearch:
sudo systemctl stop elasticsearch
- Fix the configuration:
sudo nano /etc/elasticsearch/elasticsearch.yml
# Set: network.host: 127.0.0.1
# Set: xpack.security.enabled: true
- Restart with fixed configuration:
sudo systemctl start elasticsearch
- Set passwords (the tool connects to the running node, so Elasticsearch must already be started with security enabled):
sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
- Verify it's no longer accessible:
curl http://YOUR_SERVER_IP:9200
# Should show: Connection refused
- Review logs for unauthorized access:
sudo grep -i "unauthorized\|access denied\|failed\|401\|403" /var/log/elasticsearch/*.log
Overview
The application uses Elasticsearch 7.17.24 with Laravel Scout for:
- Full-text search across Users, Organizations, Banks, and Posts
- Multilingual search with language-specific analyzers (EN, NL, DE, ES, FR)
- Location-based search with edge n-gram tokenization
- Skill and tag matching with boost factors
- Autocomplete suggestions
- Custom search optimization with configurable boost factors
Scout Driver: matchish/laravel-scout-elasticsearch v7.12.0
Elasticsearch Client: elasticsearch/elasticsearch v8.19.0
Prerequisites
- PHP 8.3+ with required extensions
- MySQL/MariaDB database (primary data source)
- Redis server (for Scout queue)
- Java: the Elasticsearch 7.x packages bundle their own JDK, so a separately installed JRE (11+) is only needed if you override the bundled one
- At least 4GB RAM available for Elasticsearch (8GB+ recommended for production)
Installation
1. Install Elasticsearch
On Ubuntu/Debian:
# Import the Elasticsearch GPG key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
# Add the Elasticsearch repository
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
# Update package list and install
sudo apt-get update
sudo apt-get install elasticsearch=7.17.24
# Hold the package to prevent unwanted upgrades
sudo apt-mark hold elasticsearch
On CentOS/RHEL:
# Import the GPG key
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
# Create repository file
cat <<EOF | sudo tee /etc/yum.repos.d/elasticsearch.repo
[elasticsearch-7.x]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
# Install specific version
sudo yum install elasticsearch-7.17.24
2. Configure Elasticsearch
Basic Configuration
Edit /etc/elasticsearch/elasticsearch.yml:
# Cluster name (single-node setup)
cluster.name: elasticsearch
# Node name
node.name: node-1
# Network settings for local development
network.host: 127.0.0.1
http.port: 9200
# Discovery settings (single-node)
discovery.type: single-node
# Path settings (default, can be customized)
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
# Security (disabled for local development, enable for production)
xpack.security.enabled: false
Memory Configuration
Configure JVM heap size in /etc/elasticsearch/jvm.options.d/heap.options:
# Development: 2-4GB
-Xms2g
-Xmx2g
# Production: 8-16GB (50% of system RAM, max 32GB)
# -Xms16g
# -Xmx16g
Important Memory Guidelines:
- Set -Xms and -Xmx to the same value
- Never exceed 50% of total system RAM
- Never exceed 32GB (compressed oops limit)
- Leave at least 50% of RAM for the OS file cache
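The guidelines above reduce to simple arithmetic. The following is a minimal sketch that turns total system RAM into a heap suggestion. The function name suggest_heap_gb is hypothetical, and capping at 31 GB rather than exactly 32 GB is a conservative assumption made here to stay under the compressed oops limit.

```shell
#!/bin/sh
# Suggest a JVM heap size in whole GB from total system RAM in GB.
# Rule: half of RAM, capped at 31 GB, with a floor of 1 GB.
# (Hypothetical helper; the 31 GB cap is an assumption, see the lead-in.)
suggest_heap_gb() {
    total_gb=$1
    heap=$((total_gb / 2))
    if [ "$heap" -gt 31 ]; then
        heap=31
    fi
    if [ "$heap" -lt 1 ]; then
        heap=1
    fi
    echo "$heap"
}

# Possible usage: print the two lines for heap.options on an 8 GB machine.
#   h=$(suggest_heap_gb 8); printf -- '-Xms%sg\n-Xmx%sg\n' "$h" "$h"
```

For example, an 8 GB development machine gets a 4 GB heap, while a 128 GB server is capped at 31 GB and leaves the remainder to the OS file cache.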
System Limits
The systemd service already configures these limits:
LimitNOFILE=65535
LimitNPROC=4096
LimitAS=infinity
If running manually, also set in /etc/security/limits.conf:
elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch soft nproc 4096
elasticsearch hard nproc 4096
3. Start and Enable Elasticsearch
# Start Elasticsearch
sudo systemctl start elasticsearch
# Enable to start on boot
sudo systemctl enable elasticsearch
# Check status
sudo systemctl status elasticsearch
# View logs
sudo journalctl -u elasticsearch -f
4. Verify Installation
# Test connection
curl http://localhost:9200
# Expected output:
# {
# "name" : "node-1",
# "cluster_name" : "elasticsearch",
# "version" : {
# "number" : "7.17.24",
# ...
# },
# "tagline" : "You Know, for Search"
# }
# Check cluster health
curl http://localhost:9200/_cluster/health?pretty
# Check available indices
curl http://localhost:9200/_cat/indices?v
Laravel Application Configuration
1. Environment Variables
Configure Elasticsearch connection in .env:
# Search configuration
SCOUT_DRIVER=matchish-elasticsearch
SCOUT_QUEUE=true
SCOUT_PREFIX=
# Elasticsearch connection
ELASTICSEARCH_HOST=localhost:9200
# ELASTICSEARCH_USER=elastic # Uncomment for production with auth
# ELASTICSEARCH_PASSWORD=your_password # Uncomment for production with auth
# Queue for background indexing (recommended)
QUEUE_CONNECTION=redis
2. Configuration Files
The application has extensive Elasticsearch configuration:
config/scout.php
- Driver: matchish-elasticsearch
- Queue enabled for async indexing
- Chunk size: 500 records per batch
- Soft deletes: Not kept in search index
config/elasticsearch.php
- Index mappings for all searchable models (825 lines!)
- Language-specific analyzers (NL, EN, FR, DE, ES)
- Custom analyzers for names and locations
- Date format handling
- Field boost configuration
config/timebank-cc.php (search section)
- Boost factors for fields and models
- Search behavior (type, fragment size, highlighting)
- Maximum results and caching
- Model indices to search
- Suggestion count
3. Searchable Models
The following models use Scout's Searchable trait:
- User → users_index
- Organization → organizations_index
- Bank → banks_index
- Post → posts_index
- Transaction → transactions_index
- Tag → tags_index
Each model defines:
- searchableAs(): Index name
- toSearchableArray(): Data structure for indexing
Index Management
Creating Indices
Indices are automatically created when you import data:
# Import all models (creates indices with timestamps)
php artisan scout:import "App\Models\User"
php artisan scout:import "App\Models\Organization"
php artisan scout:import "App\Models\Bank"
php artisan scout:import "App\Models\Post"
# Queue-based import (recommended for large datasets)
php artisan scout:queue-import "App\Models\User"
Index Naming: Indices are created with timestamps (e.g., users_index_1758826582) and aliases are used for stable names.
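To illustrate how a stable alias can be moved from one timestamped index to the next, here is a minimal sketch that builds the request body for Elasticsearch's _aliases endpoint. This is not necessarily how the Scout driver or the reindexing script does it. The function name alias_swap_body and the second index name are hypothetical.

```shell
#!/bin/sh
# Build the JSON body for an atomic alias swap via POST /_aliases:
# remove the alias from the old index and add it to the new one.
# (Hypothetical helper; index names are passed in by the caller.)
alias_swap_body() {
    alias_name=$1
    old_index=$2
    new_index=$3
    printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
        "$old_index" "$alias_name" "$new_index" "$alias_name"
}

# Possible usage (users_index_1758830000 is a made-up newer index):
#   curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' \
#        -d "$(alias_swap_body users_index users_index_1758826582 users_index_1758830000)"
```

Because both actions are sent in one request, searches against the alias never see a moment where it points at no index.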
Reindexing Script
The application includes a comprehensive reindexing script at re-index-search.sh:
# Run the reindexing script
./re-index-search.sh
What it does:
- Cleans up old indices and removes conflicts
- Waits for cluster health
- Imports all models (Users, Organizations, Banks, Posts)
- Creates stable aliases pointing to latest timestamped indices
- Shows final index and alias status
Important: The script uses SCOUT_QUEUE=false to force immediate indexing, bypassing the queue for reliable completion.
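As an illustration of the SCOUT_QUEUE=false override, the sketch below prints the import commands such a script might run, one per model. It is a dry-run under stated assumptions and not the contents of re-index-search.sh: the MODELS list and the function name import_commands are hypothetical. Laravel's .env loader normally leaves variables already set in the process environment untouched, so the inline value should take effect unless the configuration has been cached.

```shell
#!/bin/sh
# Models to import; this list is an assumption based on the models
# the guide says the script imports.
MODELS='User Organization Bank Post'

# Print (do not run) one synchronous import command per model.
# Prefixing a command with SCOUT_QUEUE=false sets the variable for that
# single command only.
import_commands() {
    for m in $MODELS; do
        printf 'SCOUT_QUEUE=false php artisan scout:import "App\\Models\\%s"\n' "$m"
    done
}

# Possible usage: review the commands, then pipe them to sh to execute.
#   import_commands
#   import_commands | sh
```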
Manual Index Operations
# Flush: remove all of a model's records from its index
php artisan scout:flush "App\Models\User"
# Delete a specific index
php artisan scout:delete-index users_index_1758826582
# Delete all indices
php artisan scout:delete-all-indexes
# Create a new index
php artisan scout:index users_index
# Check indices via curl
curl http://localhost:9200/_cat/indices?v
# Check aliases
curl http://localhost:9200/_cat/aliases?v
Search Features
Multilingual Search
The configuration supports 5 languages with dedicated analyzers:
Language Analyzers:
- analyzer_nl: Dutch (stop words + stemming)
- analyzer_en: English (stop words + stemming)
- analyzer_fr: French (stop words + stemming)
- analyzer_de: German (stop words + stemming)
- analyzer_es: Spanish (stop words + stemming)
Special Analyzers:
- name_analyzer: For profile names with edge n-grams (autocomplete)
- locations_analyzer: For cities/districts with custom stop words
- analyzer_general: Generic tokenization for general text
Boost Configuration
Field boost factors (configured in config/timebank-cc.php):
Profile Fields:
'name' => 1,
'full_name' => 1,
'cyclos_skills' => 1.5,
'tags' => 2, // Highest boost
'tag_categories' => 1.4,
'motivation' => 1,
'about_short' => 1,
'about' => 1,
Post Fields:
'title' => 2, // Highest boost
'excerpt' => 1.5,
'content' => 1,
'post_category_name' => 2, // High boost
Model Boost (score multipliers):
'user' => 1, // Baseline
'organization' => 3, // 3x boost
'bank' => 3, // 3x boost
'post' => 4, // 4x boost (highest)
Location-Based Search
The application has advanced location boost factors:
'same_district' => 5.0, // Highest boost
'same_city' => 3.0, // High boost
'same_division' => 2.0, // Medium boost
'same_country' => 1.5, // Base boost
'different_country' => 1.0, // Neutral
'no_location' => 0.9, // Slight penalty
Search Highlighting
Search results include highlighted matches:
'fragment_size' => 80, // Characters per fragment
'number_of_fragments' => 2, // Max fragments
'pre-tags' => '<span class="font-semibold text-white leading-tight">',
'post-tags' => '</span>',
Caching
Search results are cached for performance:
'cache_results' => 5, // TTL in minutes
Index Structure Examples
Users Index Mapping
{
  "users_index": {
    "properties": {
      "id": { "type": "keyword" },
      "name": {
        "type": "text",
        "analyzer": "name_analyzer",
        "fields": {
          "keyword": { "type": "keyword" },
          "suggest": { "type": "completion" }
        }
      },
      "about_nl": { "type": "text", "analyzer": "analyzer_nl" },
      "about_en": { "type": "text", "analyzer": "analyzer_en" },
      "about_fr": { "type": "text", "analyzer": "analyzer_fr" },
      "about_de": { "type": "text", "analyzer": "analyzer_de" },
      "about_es": { "type": "text", "analyzer": "analyzer_es" },
      "locations": {
        "properties": {
          "district": { "type": "text", "analyzer": "locations_analyzer" },
          "city": { "type": "text", "analyzer": "locations_analyzer" },
          "division": { "type": "text", "analyzer": "locations_analyzer" },
          "country": { "type": "text", "analyzer": "locations_analyzer" }
        }
      },
      "tags": {
        "properties": {
          "contexts": {
            "properties": {
              "tags": {
                "properties": {
                  "name_nl": { "type": "text", "analyzer": "analyzer_nl" },
                  "name_en": { "type": "text", "analyzer": "analyzer_en" }
                  // ... other languages
                }
              }
            }
          }
        }
      }
    }
  }
}
Posts Index Mapping
{
  "posts_index": {
    "properties": {
      "id": { "type": "keyword" },
      "category_id": { "type": "integer" },
      "status": { "type": "keyword" },
      "featured": { "type": "boolean" },
      "post_translations": {
        "properties": {
          "title_nl": {
            "type": "text",
            "analyzer": "analyzer_nl",
            "fields": {
              "keyword": { "type": "keyword" },
              "suggest": { "type": "completion" }
            }
          },
          "title_en": {
            "type": "text",
            "analyzer": "analyzer_en",
            "fields": {
              "keyword": { "type": "keyword" },
              "suggest": { "type": "completion" }
            }
          },
          "content_nl": { "type": "text", "analyzer": "analyzer_nl" },
          "content_en": { "type": "text", "analyzer": "analyzer_en" },
          "from_nl": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||strict_date_optional_time||epoch_millis"
          }
          // ... other languages and fields
        }
      }
    }
  }
}
Troubleshooting
Elasticsearch Won't Start
Problem: Service fails to start
Solutions:
- Check memory settings:
# View JVM settings
cat /etc/elasticsearch/jvm.options.d/heap.options
# Check available system memory
free -h
# Ensure heap size doesn't exceed 50% of RAM
- Check disk space:
df -h /var/lib/elasticsearch
- Check logs:
sudo journalctl -u elasticsearch -n 100 --no-pager
sudo tail -f /var/log/elasticsearch/elasticsearch.log
- Check Java installation:
java -version
Connection Refused
Problem: Cannot connect to Elasticsearch
Solutions:
- Verify Elasticsearch is running:
sudo systemctl status elasticsearch
- Check port binding:
ss -tlnp | grep 9200
- Check configuration:
sudo grep -E "^network.host|^http.port" /etc/elasticsearch/elasticsearch.yml
- Test connection:
curl http://localhost:9200
Index Not Found
Problem: index_not_found_exception when searching
Solutions:
- Check if indices exist:
curl http://localhost:9200/_cat/indices?v
- Check if aliases exist:
curl http://localhost:9200/_cat/aliases?v
- Reimport the model:
php artisan scout:import "App\Models\User"
- Or run the full reindex script:
./re-index-search.sh
Slow Indexing / High Memory Usage
Problem: Indexing takes too long or uses excessive memory
Solutions:
- Enable queue for async indexing in .env:
SCOUT_QUEUE=true
QUEUE_CONNECTION=redis
- Start queue worker:
php artisan queue:work --queue=high,default
- Reduce chunk size in config/scout.php:
'chunk' => [
'searchable' => 250, // Reduced from 500
],
- Monitor Elasticsearch memory:
curl http://localhost:9200/_nodes/stats/jvm?pretty
Search Results Are Incorrect
Problem: Search doesn't return expected results
Solutions:
- Check index mapping:
curl http://localhost:9200/users_index/_mapping?pretty
- Test query directly:
curl -X GET "localhost:9200/users_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"name": "test"
}
}
}
'
- Clear and rebuild index:
php artisan scout:flush "App\Models\User"
php artisan scout:import "App\Models\User"
- Check Scout queue jobs:
php artisan queue:failed
php artisan queue:retry all
Out of Memory Errors
Problem: OutOfMemoryError in Elasticsearch logs
Solutions:
- Increase JVM heap (but respect limits):
# Edit /etc/elasticsearch/jvm.options.d/heap.options
-Xms4g
-Xmx4g
- Restart Elasticsearch:
sudo systemctl restart elasticsearch
- Monitor memory usage:
# The URL is quoted so the shell that watch spawns does not treat & as a background operator
watch -n 1 'curl -s "http://localhost:9200/_cat/nodes?v&h=heap.percent,ram.percent"'
- Clear fielddata cache:
curl -X POST "localhost:9200/_cache/clear?fielddata=true"
Shards Unassigned
Problem: Yellow or red cluster health
Solutions:
- Check cluster health:
curl http://localhost:9200/_cluster/health?pretty
- Check shard allocation:
curl http://localhost:9200/_cat/shards?v
- For single-node setup, set replicas to 0:
curl -X PUT "localhost:9200/_settings" -H 'Content-Type: application/json' -d'
{
"index": {
"number_of_replicas": 0
}
}
'
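When checking health from a script, it helps to extract the status field and translate it. On a single node, yellow usually means replica shards have nowhere to go, which is what setting replicas to 0 addresses. The following is a minimal sketch without jq. The function names health_status and explain_health are hypothetical.

```shell
#!/bin/sh
# Read cluster health JSON on stdin and print the value of "status".
# Works on both compact and ?pretty output. (Hypothetical helper.)
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Explain what each cluster health color means.
explain_health() {
    case "$1" in
        green)  echo "all primary and replica shards are allocated" ;;
        yellow) echo "all primaries allocated, some replicas unassigned" ;;
        red)    echo "at least one primary shard is unassigned" ;;
        *)      echo "unknown status: $1" ;;
    esac
}

# Possible usage (wait_for_status blocks until the status is reached
# or the timeout expires):
#   s=$(curl -s "http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s" | health_status)
#   echo "$s: $(explain_health "$s")"
```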
Production Recommendations
Security
- Enable X-Pack Security:
Edit /etc/elasticsearch/elasticsearch.yml:
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
- Restart Elasticsearch so the security settings take effect, then set passwords:
/usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto
- Update .env:
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=generated_password
Performance Optimization
- Increase file descriptors:
# /etc/security/limits.conf
elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
- Disable swapping:
# /etc/elasticsearch/elasticsearch.yml
bootstrap.memory_lock: true
Edit /etc/systemd/system/elasticsearch.service.d/override.conf:
[Service]
LimitMEMLOCK=infinity
- Use SSD for data directory:
# /etc/elasticsearch/elasticsearch.yml
path.data: /mnt/ssd/elasticsearch
- Set appropriate refresh interval:
curl -X PUT "localhost:9200/users_index/_settings" -H 'Content-Type: application/json' -d'
{
"index": {
"refresh_interval": "30s"
}
}
'
Backup and Restore
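- Configure snapshot repository (the location must first be listed under path.repo in elasticsearch.yml, followed by a restart, or registration will fail):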
curl -X PUT "localhost:9200/_snapshot/backup_repo" -H 'Content-Type: application/json' -d'
{
"type": "fs",
"settings": {
"location": "/var/backups/elasticsearch",
"compress": true
}
}
'
- Create snapshot:
curl -X PUT "localhost:9200/_snapshot/backup_repo/snapshot_1?wait_for_completion=true"
- Restore snapshot:
curl -X POST "localhost:9200/_snapshot/backup_repo/snapshot_1/_restore"
Monitoring
- Check cluster stats:
curl http://localhost:9200/_cluster/stats?pretty
- Monitor node stats:
curl http://localhost:9200/_nodes/stats?pretty
- Check index stats:
curl http://localhost:9200/_stats?pretty
- Set up monitoring with Kibana (optional):
sudo apt-get install kibana=7.17.24
sudo systemctl enable kibana
sudo systemctl start kibana
Quick Reference
Essential Commands
# Service management
sudo systemctl start elasticsearch
sudo systemctl stop elasticsearch
sudo systemctl restart elasticsearch
sudo systemctl status elasticsearch
# Check health
curl http://localhost:9200
curl http://localhost:9200/_cluster/health?pretty
curl http://localhost:9200/_cat/indices?v
# Laravel Scout commands
php artisan scout:import "App\Models\User"
php artisan scout:flush "App\Models\User"
php artisan scout:delete-all-indexes
# Reindex everything
./re-index-search.sh
# Queue worker for async indexing
php artisan queue:work --queue=high,default
Configuration Files
- .env - Connection and driver configuration
- config/scout.php - Laravel Scout settings
- config/elasticsearch.php - Index mappings and analyzers (825 lines!)
- config/timebank-cc.php - Search boost factors and behavior
- /etc/elasticsearch/elasticsearch.yml - Elasticsearch server config
- /etc/elasticsearch/jvm.options.d/heap.options - JVM memory settings
- /usr/lib/systemd/system/elasticsearch.service - systemd service
Important Paths
- Data: /var/lib/elasticsearch
- Logs: /var/log/elasticsearch
- Config: /etc/elasticsearch
- Binary: /usr/share/elasticsearch
Additional Resources
- Elasticsearch Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/
- Laravel Scout: https://laravel.com/docs/10.x/scout
- Matchish Scout Elasticsearch: https://github.com/matchish/laravel-scout-elasticsearch
- Elasticsearch DSL: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl.html
- Language Analyzers: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-lang-analyzer.html
Notes
- This application uses a multilingual search setup with custom analyzers
- The config/elasticsearch.php file is extensive (825 lines) with detailed field mappings
- Location-based search uses edge n-grams for autocomplete functionality
- Tags and categories have hierarchical support with multilingual translations
- The reindexing script handles index versioning and aliasing automatically
- Memory requirements are significant during indexing (plan accordingly)