# Elasticsearch Setup Guide

This guide documents the Elasticsearch search engine setup for full-text search, multilingual content indexing, and location-based queries in the application.

## SECURITY

**Elasticsearch is a DATABASE and must NEVER be exposed to the internet without proper security!**

### Default Security Risks

By default, Elasticsearch 7.x ships with:

- **NO authentication** - anyone can read/write/delete all data
- **NO encryption** - all data transmitted in plain text
- **NO access control** - full admin access for anyone who connects

### Required Security Configuration

**For Development (Local Machine Only):**

```yaml
# /etc/elasticsearch/elasticsearch.yml
network.host: 127.0.0.1          # ONLY localhost - NOT 0.0.0.0!
http.port: 9200
xpack.security.enabled: false    # OK for localhost-only
```

**For Production/Remote Servers:**

```yaml
# /etc/elasticsearch/elasticsearch.yml
network.host: 127.0.0.1          # ONLY localhost - use reverse proxy if needed
http.port: 9200
xpack.security.enabled: true     # REQUIRED for any server accessible remotely
xpack.security.transport.ssl.enabled: true
```

### Verify Your Server Is NOT Exposed

**Check what interface Elasticsearch is listening on:**

```bash
ss -tlnp | grep 9200
```

**SAFE** - Should show ONLY localhost addresses:

```
127.0.0.1:9200              # IPv4 localhost
[::1]:9200                  # IPv6 localhost
[::ffff:127.0.0.1]:9200     # IPv6-mapped IPv4 localhost (also safe!)
```

**Note**: The `[::ffff:127.0.0.1]` format is the IPv6 representation of IPv4 localhost - it's still localhost-only and secure.

**DANGER** - If you see any of these, YOU ARE EXPOSED:

```
0.0.0.0:9200           # Listening on ALL interfaces - EXPOSED!
*:9200                 # Listening on ALL interfaces - EXPOSED!
YOUR_PUBLIC_IP:9200    # Listening on public IP - EXPOSED!
```

**Test external accessibility:**

```bash
# From another machine or from the internet
curl http://YOUR_SERVER_IP:9200

# Should get: Connection refused (GOOD!)
# If you get a JSON response - YOU ARE EXPOSED TO THE INTERNET!
```

### What Happens If Exposed?

If Elasticsearch is exposed to the internet without authentication:

1. Attackers can **read all your data** (users, emails, private information)
2. Attackers can **delete all your indices** (all search data gone)
3. Attackers can **modify data** (corrupt your search results)
4. Attackers can **execute scripts** (potential remote code execution)

**Real-world attacks:**

- Ransomware attacks encrypting Elasticsearch data
- Mass data exfiltration of exposed databases
- Bitcoin mining malware installation
- Complete data deletion with ransom demands

### Immediate Actions If You Discover Exposure

1. **IMMEDIATELY stop Elasticsearch:**
   ```bash
   sudo systemctl stop elasticsearch
   ```

2. **Fix the configuration:**
   ```bash
   sudo nano /etc/elasticsearch/elasticsearch.yml
   # Set: network.host: 127.0.0.1
   # Set: xpack.security.enabled: true
   ```

3. **Restart with the fixed configuration:**
   ```bash
   sudo systemctl start elasticsearch
   ```

4. **Set passwords for the built-in users** (the tool needs a running node with security enabled, so run it after the restart):
   ```bash
   sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
   ```

5. **Verify it's no longer accessible:**
   ```bash
   curl http://YOUR_SERVER_IP:9200
   # Should show: Connection refused
   ```

6.
   **Review logs for unauthorized access:**
   ```bash
   sudo grep -i "unauthorized\|access denied\|failed\|401\|403" /var/log/elasticsearch/*.log
   ```

---

## Overview

The application uses **Elasticsearch 7.17.24** with Laravel Scout for:

- Full-text search across Users, Organizations, Banks, and Posts
- Multilingual search with language-specific analyzers (EN, NL, DE, ES, FR)
- Location-based search with edge n-gram tokenization
- Skill and tag matching with boost factors
- Autocomplete suggestions
- Custom search optimization with configurable boost factors

**Scout Driver**: `matchish/laravel-scout-elasticsearch` v7.12.0

**Elasticsearch Client**: `elasticsearch/elasticsearch` v8.19.0

## Prerequisites

- PHP 8.3+ with required extensions
- MySQL/MariaDB database (primary data source)
- Redis server (for Scout queue)
- Java Runtime Environment (JRE) 11+ for Elasticsearch
- At least 4GB RAM available for Elasticsearch (8GB+ recommended for production)

## Installation

### 1. Install Elasticsearch

#### On Ubuntu/Debian:

```bash
# Import the Elasticsearch GPG key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

# Add the Elasticsearch repository
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list

# Update package list and install
sudo apt-get update
sudo apt-get install elasticsearch=7.17.24

# Hold the package to prevent unwanted upgrades
sudo apt-mark hold elasticsearch
```

#### On CentOS/RHEL:

```bash
# Import the GPG key
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

# Create repository file
cat < 1,
'full_name' => 1,
'cyclos_skills' => 1.5,
'tags' => 2,                       // Highest boost
'tag_categories' => 1.4,
'motivation' => 1,
'about_short' => 1,
'about' => 1,
```

**Post Fields:**

```php
'title' => 2,                      // Highest boost
'excerpt' => 1.5,
'content' => 1,
'post_category_name' => 2,         // High boost
```

**Model Boost (score multipliers):**

```php
'user' => 1,                       // Baseline
'organization' => 3,               // 3x boost
'bank' => 3,                       // 3x boost
'post' => 4,                       // 4x boost (highest)
```

### Location-Based Search

The application has advanced location boost factors:

```php
'same_district' => 5.0,            // Highest boost
'same_city' => 3.0,                // High boost
'same_division' => 2.0,            // Medium boost
'same_country' => 1.5,             // Base boost
'different_country' => 1.0,        // Neutral
'no_location' => 0.9,              // Slight penalty
```

### Search Highlighting

Search results include highlighted matches:

```php
'fragment_size' => 80,             // Characters per fragment
'number_of_fragments' => 2,        // Max fragments
'pre-tags' => '',
'post-tags' => '',
```

### Caching

Search results are cached for performance:

```php
'cache_results' => 5,              // TTL in minutes
```

## Index Structure Examples

### Users Index Mapping

```json
{
  "users_index": {
    "properties": {
      "id": { "type": "keyword" },
      "name": {
        "type": "text",
        "analyzer": "name_analyzer",
        "fields": {
          "keyword": { "type": "keyword" },
          "suggest": { "type": "completion" }
        }
      },
      "about_nl": { "type": "text", "analyzer": "analyzer_nl" },
      "about_en": { "type": "text", "analyzer": "analyzer_en" },
      "about_fr": { "type": "text", "analyzer": "analyzer_fr" },
      "about_de": { "type": "text", "analyzer": "analyzer_de" },
      "about_es": { "type": "text", "analyzer": "analyzer_es" },
      "locations": {
        "properties": {
          "district": { "type": "text", "analyzer": "locations_analyzer" },
          "city": { "type": "text", "analyzer": "locations_analyzer" },
          "division": { "type": "text", "analyzer": "locations_analyzer" },
          "country": { "type": "text", "analyzer": "locations_analyzer" }
        }
      },
      "tags": {
        "properties": {
          "contexts": {
            "properties": {
              "tags": {
                "properties": {
                  "name_nl": { "type": "text", "analyzer": "analyzer_nl" },
                  "name_en": { "type": "text", "analyzer": "analyzer_en" }
                  // ... other languages
                }
              }
            }
          }
        }
      }
    }
  }
}
```

### Posts Index Mapping

```json
{
  "posts_index": {
    "properties": {
      "id": { "type": "keyword" },
      "category_id": { "type": "integer" },
      "status": { "type": "keyword" },
      "featured": { "type": "boolean" },
      "post_translations": {
        "properties": {
          "title_nl": {
            "type": "text",
            "analyzer": "analyzer_nl",
            "fields": {
              "keyword": { "type": "keyword" },
              "suggest": { "type": "completion" }
            }
          },
          "title_en": {
            "type": "text",
            "analyzer": "analyzer_en",
            "fields": {
              "keyword": { "type": "keyword" },
              "suggest": { "type": "completion" }
            }
          },
          "content_nl": { "type": "text", "analyzer": "analyzer_nl" },
          "content_en": { "type": "text", "analyzer": "analyzer_en" },
          "from_nl": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||strict_date_optional_time||epoch_millis"
          }
          // ... other languages and fields
        }
      }
    }
  }
}
```

## Troubleshooting

### Elasticsearch Won't Start

**Problem**: Service fails to start

**Solutions**:

1. Check memory settings:
   ```bash
   # View JVM settings
   cat /etc/elasticsearch/jvm.options.d/heap.options

   # Check available system memory
   free -h

   # Ensure heap size doesn't exceed 50% of RAM
   ```

2. Check disk space:
   ```bash
   df -h /var/lib/elasticsearch
   ```

3. Check logs:
   ```bash
   sudo journalctl -u elasticsearch -n 100 --no-pager
   sudo tail -f /var/log/elasticsearch/elasticsearch.log
   ```

4. Check Java installation:
   ```bash
   java -version
   ```

### Connection Refused

**Problem**: Cannot connect to Elasticsearch

**Solutions**:

1. Verify Elasticsearch is running:
   ```bash
   sudo systemctl status elasticsearch
   ```

2. Check port binding:
   ```bash
   ss -tlnp | grep 9200
   ```

3. Check configuration:
   ```bash
   sudo grep -E "^network.host|^http.port" /etc/elasticsearch/elasticsearch.yml
   ```

4. Test connection:
   ```bash
   curl http://localhost:9200
   ```

### Index Not Found

**Problem**: `index_not_found_exception` when searching

**Solutions**:

1. Check if indices exist:
   ```bash
   curl "http://localhost:9200/_cat/indices?v"
   ```

2.
   Check if aliases exist:
   ```bash
   curl "http://localhost:9200/_cat/aliases?v"
   ```

3. Reimport the model:
   ```bash
   php artisan scout:import "App\Models\User"
   ```

4. Or run the full reindex script:
   ```bash
   ./re-index-search.sh
   ```

### Slow Indexing / High Memory Usage

**Problem**: Indexing takes too long or uses excessive memory

**Solutions**:

1. Enable queue for async indexing in `.env`:
   ```env
   SCOUT_QUEUE=true
   QUEUE_CONNECTION=redis
   ```

2. Start queue worker:
   ```bash
   php artisan queue:work --queue=high,default
   ```

3. Reduce chunk size in `config/scout.php`:
   ```php
   'chunk' => [
       'searchable' => 250,  // Reduced from 500
   ],
   ```

4. Monitor Elasticsearch memory:
   ```bash
   curl "http://localhost:9200/_nodes/stats/jvm?pretty"
   ```

### Search Results Are Incorrect

**Problem**: Search doesn't return expected results

**Solutions**:

1. Check index mapping:
   ```bash
   curl "http://localhost:9200/users_index/_mapping?pretty"
   ```

2. Test query directly:
   ```bash
   curl -X GET "localhost:9200/users_index/_search?pretty" -H 'Content-Type: application/json' -d'
   {
     "query": {
       "match": {
         "name": "test"
       }
     }
   }
   '
   ```

3. Clear and rebuild index:
   ```bash
   php artisan scout:flush "App\Models\User"
   php artisan scout:import "App\Models\User"
   ```

4. Check Scout queue jobs:
   ```bash
   php artisan queue:failed
   php artisan queue:retry all
   ```

### Out of Memory Errors

**Problem**: `OutOfMemoryError` in Elasticsearch logs

**Solutions**:

1. Increase JVM heap (but respect limits):
   ```bash
   # Edit /etc/elasticsearch/jvm.options.d/heap.options
   -Xms4g
   -Xmx4g
   ```

2. Restart Elasticsearch:
   ```bash
   sudo systemctl restart elasticsearch
   ```

3. Monitor memory usage (the URL is quoted so the shell does not treat `&` as a background operator):
   ```bash
   watch -n 1 "curl -s 'http://localhost:9200/_cat/nodes?v&h=heap.percent,ram.percent'"
   ```

4. Clear fielddata cache:
   ```bash
   curl -X POST "localhost:9200/_cache/clear?fielddata=true"
   ```

### Shards Unassigned

**Problem**: Yellow or red cluster health

**Solutions**:

1. Check cluster health:
   ```bash
   curl "http://localhost:9200/_cluster/health?pretty"
   ```

2.
   Check shard allocation:
   ```bash
   curl "http://localhost:9200/_cat/shards?v"
   ```

3. For single-node setup, set replicas to 0:
   ```bash
   curl -X PUT "localhost:9200/_settings" -H 'Content-Type: application/json' -d'
   {
     "index": {
       "number_of_replicas": 0
     }
   }
   '
   ```

## Production Recommendations

### Security

1. **Enable X-Pack Security**:

   Edit `/etc/elasticsearch/elasticsearch.yml`:
   ```yaml
   xpack.security.enabled: true
   xpack.security.transport.ssl.enabled: true
   ```

2. **Set passwords** (restart Elasticsearch first; the tool needs a running node with security enabled):
   ```bash
   /usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto
   ```

3. **Update `.env`**:
   ```env
   ELASTICSEARCH_USER=elastic
   ELASTICSEARCH_PASSWORD=generated_password
   ```

### Performance Optimization

1. **Increase file descriptors**:
   ```bash
   # /etc/security/limits.conf
   elasticsearch soft nofile 65535
   elasticsearch hard nofile 65535
   ```

   Note: when Elasticsearch runs under systemd (as with the DEB/RPM packages), `/etc/security/limits.conf` is not applied to the service; the limit then comes from `LimitNOFILE` in the systemd unit or an override file.

2. **Disable swapping**:
   ```yaml
   # /etc/elasticsearch/elasticsearch.yml
   bootstrap.memory_lock: true
   ```

   Edit `/etc/systemd/system/elasticsearch.service.d/override.conf`:
   ```ini
   [Service]
   LimitMEMLOCK=infinity
   ```

3. **Use SSD for data directory**:
   ```yaml
   # /etc/elasticsearch/elasticsearch.yml
   path.data: /mnt/ssd/elasticsearch
   ```

4. **Set appropriate refresh interval**:
   ```bash
   curl -X PUT "localhost:9200/users_index/_settings" -H 'Content-Type: application/json' -d'
   {
     "index": {
       "refresh_interval": "30s"
     }
   }
   '
   ```

### Backup and Restore

1. **Configure snapshot repository** (the `location` must be listed under `path.repo` in `/etc/elasticsearch/elasticsearch.yml` and be writable by the `elasticsearch` user; changing `path.repo` requires a restart):
   ```bash
   curl -X PUT "localhost:9200/_snapshot/backup_repo" -H 'Content-Type: application/json' -d'
   {
     "type": "fs",
     "settings": {
       "location": "/var/backups/elasticsearch",
       "compress": true
     }
   }
   '
   ```

2. **Create snapshot**:
   ```bash
   curl -X PUT "localhost:9200/_snapshot/backup_repo/snapshot_1?wait_for_completion=true"
   ```

3. **Restore snapshot**:
   ```bash
   curl -X POST "localhost:9200/_snapshot/backup_repo/snapshot_1/_restore"
   ```

### Monitoring

1. **Check cluster stats**:
   ```bash
   curl "http://localhost:9200/_cluster/stats?pretty"
   ```

2.
   **Monitor node stats**:
   ```bash
   curl "http://localhost:9200/_nodes/stats?pretty"
   ```

3. **Check index stats**:
   ```bash
   curl "http://localhost:9200/_stats?pretty"
   ```

4. **Set up monitoring with Kibana** (optional):
   ```bash
   sudo apt-get install kibana=7.17.24
   sudo systemctl enable kibana
   sudo systemctl start kibana
   ```

## Quick Reference

### Essential Commands

```bash
# Service management
sudo systemctl start elasticsearch
sudo systemctl stop elasticsearch
sudo systemctl restart elasticsearch
sudo systemctl status elasticsearch

# Check health
curl http://localhost:9200
curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cat/indices?v"

# Laravel Scout commands
php artisan scout:import "App\Models\User"
php artisan scout:flush "App\Models\User"
php artisan scout:delete-all-indexes

# Reindex everything
./re-index-search.sh

# Queue worker for async indexing
php artisan queue:work --queue=high,default
```

### Configuration Files

- `.env` - Connection and driver configuration
- `config/scout.php` - Laravel Scout settings
- `config/elasticsearch.php` - Index mappings and analyzers (825 lines!)
- `config/timebank-cc.php` - Search boost factors and behavior
- `/etc/elasticsearch/elasticsearch.yml` - Elasticsearch server config
- `/etc/elasticsearch/jvm.options.d/heap.options` - JVM memory settings
- `/usr/lib/systemd/system/elasticsearch.service` - systemd service

### Important Paths

- **Data**: `/var/lib/elasticsearch`
- **Logs**: `/var/log/elasticsearch`
- **Config**: `/etc/elasticsearch`
- **Binary**: `/usr/share/elasticsearch`

## Additional Resources

- **Elasticsearch Documentation**: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/
- **Laravel Scout**: https://laravel.com/docs/10.x/scout
- **Matchish Scout Elasticsearch**: https://github.com/matchish/laravel-scout-elasticsearch
- **Elasticsearch DSL**: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl.html
- **Language Analyzers**: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-lang-analyzer.html

## Notes

- This application uses a multilingual search setup with custom analyzers
- The `config/elasticsearch.php` file is extensive (825 lines) with detailed field mappings
- Location-based search uses edge n-grams for autocomplete functionality
- Tags and categories have hierarchical support with multilingual translations
- The reindexing script handles index versioning and aliasing automatically
- Memory requirements are significant during indexing (plan accordingly)
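
To make the memory planning above concrete, the heap rule used in this guide (keep the JVM heap at or below 50% of RAM) can be turned into a small sizing helper. This is only a sketch, not part of the application: the function name `recommended_heap_mb` is hypothetical, and the 31 GB cap is an added assumption based on the common advice to stay below the JVM compressed object pointer threshold.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest a JVM heap size (in MB) for Elasticsearch.
# Rule from this guide: the heap must not exceed 50% of RAM.
# Assumption (not from this guide): cap the result at 31 GB.
recommended_heap_mb() {
  local total_mb=$1
  local cap_mb=$((31 * 1024))
  local half_mb=$((total_mb / 2))
  if [ "$half_mb" -gt "$cap_mb" ]; then
    echo "$cap_mb"
  else
    echo "$half_mb"
  fi
}

# Usage (Linux only): read total RAM and print candidate heap.options lines.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  heap_mb=$(recommended_heap_mb $((total_kb / 1024)))
  printf '%s\n' "-Xms${heap_mb}m" "-Xmx${heap_mb}m"
fi
```

The printed values are a starting point for `/etc/elasticsearch/jvm.options.d/heap.options`; on a host that also runs MySQL, Redis, and PHP, a smaller heap than the 50% ceiling may be more appropriate.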