Use Case: Private AI Development Environment
Imagine you’re working on a project that involves sensitive code, proprietary algorithms, or customer data. You want to use AI assistance for code generation, debugging, and documentation, but you’re concerned about:
- Data Privacy: Sending code to cloud-based AI services means your proprietary code leaves your environment
- Cost Control: Cloud AI services charge per token, and costs can add up quickly during active development
- Offline Development: You need AI assistance even when internet connectivity is unreliable
- Custom Models: You want to fine-tune models on your specific codebase or use specialized models
- Regulatory Compliance: Your organization requires data to stay on-premises for compliance reasons
The Challenge: You need an AI coding assistant that:
- Runs entirely on your local machine
- Provides fast, responsive code suggestions
- Integrates seamlessly with your development workflow
- Doesn’t require constant internet connectivity
- Keeps all your code and data private
The Solution: Ollama provides an easy way to run large language models locally, and Cursor IDE can be configured to use these local models instead of cloud-based services. This gives you the benefits of AI-assisted coding while maintaining complete control over your data and infrastructure.
This guide walks through setting up Ollama, running local models, and configuring Cursor to use them as your AI coding assistant.
Prerequisites
Before getting started, ensure you have:
- System Requirements (a quick way to check these follows this list):
  - macOS, Linux, or Windows
  - At least 16GB RAM (32GB recommended for larger models)
  - 20GB+ free disk space for models
  - Modern CPU (a GPU is optional but recommended for better performance)
- Software:
  - Cursor IDE installed (cursor.sh)
  - Terminal/command line access
  - Homebrew (macOS) or a package manager (Linux)
- Basic Knowledge:
  - Command line usage
  - Understanding of AI models and their capabilities
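A quick way to check the RAM and disk requirements from a terminal (standard system tools; models are stored under ~/.ollama by default):
# Check installed RAM
sysctl -n hw.memsize | awk '{print $1/1073741824 " GB"}'   # macOS
free -h                                                     # Linux
# Check free disk space in your home directory (models live under ~/.ollama by default)
df -h ~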
Installing Ollama
1. macOS Installation
# Install using Homebrew
brew install ollama
# Or download from official website
# Visit https://ollama.ai/download
2. Linux Installation
# Install using the official script
curl -fsSL https://ollama.ai/install.sh | sh
# Or using package manager (Ubuntu/Debian)
# Download .deb package from https://ollama.ai/download
3. Windows Installation
# Download installer from https://ollama.ai/download
# Run the installer executable
4. Verify Installation
# Check Ollama version
ollama --version
# Start Ollama service (if not running automatically)
ollama serve
# In another terminal, test the installation
ollama list
Setting Up Local Models
1. Available Models
Ollama supports various open-source models. Popular choices for coding:
- CodeLlama: Specialized for code generation (7B, 13B, 34B variants)
- Llama 2: General-purpose model (7B, 13B, 70B variants)
- Mistral: Efficient and capable (7B variant)
- DeepSeek Coder: Code-focused model
- StarCoder: Code generation specialist
2. Pulling Models
# Pull CodeLlama 7B (good balance of performance and speed)
ollama pull codellama:7b
# Pull CodeLlama 13B (better quality, slower)
ollama pull codellama:13b
# Pull Mistral (efficient and fast)
ollama pull mistral:7b
# Pull DeepSeek Coder (specialized for coding)
ollama pull deepseek-coder:6.7b
# List available models
ollama list
3. Testing Models
# Test CodeLlama with a simple prompt
ollama run codellama:7b "Write a Python function to calculate fibonacci numbers"
# Interactive mode
ollama run codellama:7b
# Then type your prompts interactively
# Type /bye to exit
4. Model Management
# Show model information
ollama show codellama:7b
# Copy a model
ollama cp codellama:7b my-custom-codellama
# Remove a model (frees disk space)
ollama rm codellama:7b
# List all models
ollama list
Configuring Ollama Server
1. Server Configuration
Ollama runs a local server by default. Configure it for optimal performance:
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Set environment variables for configuration
export OLLAMA_HOST=0.0.0.0:11434    # listen on all interfaces; use 127.0.0.1:11434 to stay local-only
export OLLAMA_NUM_PARALLEL=1        # number of requests served in parallel
export OLLAMA_MAX_LOADED_MODELS=1   # keep at most one model loaded in memory
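These exports only affect the current shell. If Ollama was installed with the Linux install script, it typically runs as a systemd service named ollama; in that case the variables can be set persistently through a service override (a sketch assuming that default setup):
# Open an override file for the service
sudo systemctl edit ollama
# Add the following under [Service], then save and exit:
#   Environment="OLLAMA_HOST=127.0.0.1:11434"
#   Environment="OLLAMA_MAX_LOADED_MODELS=1"
# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ollama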
2. API Endpoint
Ollama provides a REST API that Cursor can connect to:
# Test API endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "codellama:7b",
  "prompt": "Hello, how are you?",
  "stream": false
}'
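Ollama also exposes a chat endpoint, /api/chat, which takes a messages array instead of a single prompt; the Python and Node.js clients later in this guide use it. A quick smoke test:
curl http://localhost:11434/api/chat -d '{
  "model": "codellama:7b",
  "messages": [
    {"role": "user", "content": "Write a one-line Python hello world"}
  ],
  "stream": false
}'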
3. Performance Optimization
# For systems with GPU (CUDA)
export CUDA_VISIBLE_DEVICES=0
# For systems with Apple Silicon (M1/M2/M3)
# Ollama automatically uses Metal acceleration
# Limit memory usage by keeping only one model loaded at a time
export OLLAMA_MAX_LOADED_MODELS=1
Connecting Cursor to Ollama
1. Cursor Settings Configuration
Cursor can be configured to use local models via the settings:
- Open Cursor Settings:
  - macOS: Cmd + , or Cursor > Settings
  - Windows/Linux: Ctrl + , or File > Preferences > Settings
- Navigate to AI Settings:
  - Search for “AI” or “Model” in settings
  - Look for “Model Provider” or “AI Provider” settings
- Configure Custom Model:
  - Find the “Custom Model” or “Local Model” option
  - Set the API endpoint: http://localhost:11434
  - Set the model name: codellama:7b (or your preferred model)
2. Using Cursor Settings JSON
Alternatively, edit Cursor’s settings JSON directly. The exact keys vary between Cursor versions, so treat the snippet below as a template rather than an exact schema:
{
  "cursor.ai.model": "codellama:7b",
  "cursor.ai.provider": "custom",
  "cursor.ai.endpoint": "http://localhost:11434/api/generate",
  "cursor.ai.apiKey": "",
  "cursor.ai.temperature": 0.7,
  "cursor.ai.maxTokens": 2048
}
3. Cursor Configuration File
If your Cursor version supports project-level configuration, create or edit .cursor/config.json in your project (again, the keys shown are illustrative):
{
  "ai": {
    "provider": "ollama",
    "model": "codellama:7b",
    "endpoint": "http://localhost:11434",
    "temperature": 0.7,
    "maxTokens": 2048,
    "stream": true
  }
}
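Depending on the Cursor version, the custom-model settings may expect an OpenAI-compatible API rather than Ollama’s native one (for example, an “Override OpenAI Base URL” field). Ollama serves an OpenAI-compatible API under /v1, which you can verify before pointing Cursor at it; check your Cursor version’s documentation for the exact setting names:
# Verify Ollama's OpenAI-compatible endpoint (no API key is needed for a direct call)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codellama:7b",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'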
Advanced Configuration
1. Custom Model Configuration
Create a custom model configuration file:
# Create Modelfile
cat > Modelfile << EOF
FROM codellama:7b
# Set custom parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 4096
# Set system prompt for coding
SYSTEM """You are a helpful coding assistant.
You write clean, efficient, and well-documented code.
Always explain your code and suggest best practices."""
# Set a prompt template (Ollama Modelfile Go-template syntax)
TEMPLATE """{{ .System }}

User: {{ .Prompt }}
Assistant: """
EOF
# Create custom model
ollama create my-coder -f Modelfile
# Use the custom model
ollama run my-coder
2. Python Integration
# ollama_client.py
import requests
import json


class OllamaClient:
    def __init__(self, base_url="http://localhost:11434"):
        self.base_url = base_url

    def generate(self, model: str, prompt: str, stream: bool = False):
        """Generate text using Ollama's /api/generate endpoint."""
        url = f"{self.base_url}/api/generate"
        payload = {
            "model": model,
            "prompt": prompt,
            "stream": stream
        }
        # stream=stream keeps the connection open so chunks can be read as they arrive
        response = requests.post(url, json=payload, stream=stream)
        if stream:
            return self._handle_stream(response)
        else:
            return response.json()

    def chat(self, model: str, messages: list, stream: bool = False):
        """Chat with a model using the messages format (/api/chat)."""
        url = f"{self.base_url}/api/chat"
        payload = {
            "model": model,
            "messages": messages,
            "stream": stream
        }
        response = requests.post(url, json=payload, stream=stream)
        if stream:
            return self._handle_stream(response)
        else:
            return response.json()

    def _handle_stream(self, response):
        """Yield each JSON chunk of a streaming response."""
        for line in response.iter_lines():
            if line:
                data = json.loads(line)
                yield data

    def list_models(self):
        """List locally available models (/api/tags)."""
        url = f"{self.base_url}/api/tags"
        response = requests.get(url)
        return response.json()


# Usage example
if __name__ == "__main__":
    client = OllamaClient()

    # List models
    models = client.list_models()
    print("Available models:", models)

    # Generate code
    response = client.generate(
        model="codellama:7b",
        prompt="Write a Python function to sort a list of dictionaries by a key"
    )
    print(response['response'])
3. Node.js Integration
// ollama-client.js
const axios = require('axios');

class OllamaClient {
  constructor(baseUrl = 'http://localhost:11434') {
    this.baseUrl = baseUrl;
  }

  async generate(model, prompt, stream = false) {
    const url = `${this.baseUrl}/api/generate`;
    const response = await axios.post(url, {
      model,
      prompt,
      stream
    }, {
      responseType: stream ? 'stream' : 'json'
    });
    if (stream) {
      return this.handleStream(response.data);
    }
    return response.data;
  }

  async chat(model, messages, stream = false) {
    const url = `${this.baseUrl}/api/chat`;
    const response = await axios.post(url, {
      model,
      messages,
      stream
    }, {
      responseType: stream ? 'stream' : 'json'
    });
    if (stream) {
      return this.handleStream(response.data);
    }
    return response.data;
  }

  handleStream(stream) {
    return new Promise((resolve, reject) => {
      let fullResponse = '';
      stream.on('data', (chunk) => {
        const lines = chunk.toString().split('\n').filter(line => line.trim());
        lines.forEach(line => {
          try {
            const data = JSON.parse(line);
            if (data.response) {
              fullResponse += data.response;
              process.stdout.write(data.response);
            }
            if (data.done) {
              resolve(fullResponse);
            }
          } catch (e) {
            // Skip invalid JSON
          }
        });
      });
      stream.on('error', reject);
    });
  }

  async listModels() {
    const url = `${this.baseUrl}/api/tags`;
    const response = await axios.get(url);
    return response.data;
  }
}

// Usage example
async function main() {
  const client = new OllamaClient();

  // List models
  const models = await client.listModels();
  console.log('Available models:', models);

  // Generate code
  const response = await client.generate(
    'codellama:7b',
    'Write a JavaScript function to debounce a function call'
  );
  console.log('\nResponse:', response.response);
}

main().catch(console.error);
Testing the Integration
1. Verify Ollama is Running
# Check if Ollama service is running
curl http://localhost:11434/api/tags
# Expected output: JSON with list of models
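If jq is available, you can reduce that JSON to just the model names (this assumes the default response shape: a top-level models array with name fields):
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'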
2. Test Model Response
# Test model directly
ollama run codellama:7b "Write a hello world function in Python"
# Test via API
curl http://localhost:11434/api/generate -d '{
  "model": "codellama:7b",
  "prompt": "Write a hello world function in Python",
  "stream": false
}'
3. Test Cursor Integration
- Open Cursor IDE
- Open any code file
- Use Cursor’s AI features (Cmd/Ctrl + K for inline edit, Cmd/Ctrl + L for chat)
- Verify that responses are coming from your local Ollama model (the monitoring commands in the next step help confirm that requests are reaching Ollama)
4. Monitor Performance
# Check Ollama logs
# macOS
tail -f ~/.ollama/logs/server.log
# Linux (systemd service)
journalctl -u ollama -f
# Check system resources
# macOS
top -pid $(pgrep ollama)
# Linux
top -p $(pgrep ollama)
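ollama ps is also useful for monitoring: it lists the models currently loaded in memory, their size, and whether they are running on CPU or GPU.
# Show loaded models, their memory footprint, and CPU/GPU placement
ollama ps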
Performance Optimization
1. Model Selection
Choose models based on your hardware:
# For 16GB RAM systems
ollama pull codellama:7b # ~4GB, fast
ollama pull mistral:7b # ~4GB, efficient
# For 32GB+ RAM systems
ollama pull codellama:13b # ~7GB, better quality
ollama pull llama2:13b # ~7GB, general purpose
# For systems with powerful GPUs
ollama pull codellama:34b # ~20GB, best quality
2. GPU Acceleration
# Check whether a loaded model is running on CPU or GPU (PROCESSOR column)
ollama ps
# For NVIDIA GPUs, ensure CUDA is available
nvidia-smi
# Ollama should automatically use GPU if available
3. Memory Management
# Limit number of loaded models
export OLLAMA_MAX_LOADED_MODELS=1
# Set context window size (affects memory usage)
# Edit model's Modelfile
PARAMETER num_ctx 2048 # Lower = less memory
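The API also accepts an options object per request, so num_ctx can be overridden without maintaining a separate Modelfile (a quick sketch using the generate endpoint):
curl http://localhost:11434/api/generate -d '{
  "model": "codellama:7b",
  "prompt": "Write a short docstring for a sort function",
  "options": {"num_ctx": 2048},
  "stream": false
}'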
4. Response Speed
# Use smaller models for faster responses
ollama pull codellama:7b
# Reduce context window for faster processing
PARAMETER num_ctx 1024
# Adjust temperature for more deterministic output
PARAMETER temperature 0.5 # Lower = more deterministic; little effect on speed
Troubleshooting
1. Common Issues
Issue: Ollama not starting
# Check if port is already in use
lsof -i :11434
# Kill existing process
killall ollama
# Restart Ollama
ollama serve
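On Linux, the install script typically registers Ollama as a systemd service, so a manual ollama serve can fail because the service already owns port 11434. In that case, manage it through systemd instead (assuming the default service name):
# Check and restart the background service
sudo systemctl status ollama
sudo systemctl restart ollama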
Issue: Model not found
# List available models
ollama list
# Pull the model again
ollama pull codellama:7b
# Check model name spelling
ollama show codellama:7b
Issue: Slow responses
# Check system resources
top
# Use smaller model
ollama pull codellama:7b # Instead of 13b or 34b
# Reduce context window
# Edit Modelfile with: PARAMETER num_ctx 1024
Issue: Cursor not connecting
# Verify Ollama API is accessible
curl http://localhost:11434/api/tags
# Check Cursor settings
# Ensure endpoint is: http://localhost:11434
# Ensure model name matches: codellama:7b
# Check Cursor logs
# macOS: ~/Library/Logs/Cursor/
# Linux: ~/.config/Cursor/logs/
# Windows: %APPDATA%\Cursor\logs\
2. Debug Mode
# Run Ollama in debug mode
OLLAMA_DEBUG=1 ollama serve
# Check detailed logs
tail -f ~/.ollama/logs/server.log   # macOS
journalctl -u ollama -f             # Linux (systemd service)
Best Practices
1. Model Management
# Keep only models you actively use
ollama list
ollama rm unused-model
# Regularly update models
ollama pull codellama:7b # Re-pull to get updates
2. Resource Management
# Monitor disk usage
du -sh ~/.ollama/models/
# Clean up unused models
ollama list
ollama rm old-model-name
3. Security Considerations
# If exposing Ollama over network, use authentication
# Set OLLAMA_HOST to specific interface
export OLLAMA_HOST=127.0.0.1:11434 # Local only
# For remote access, use reverse proxy with authentication
# Example: nginx with basic auth
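As a rough sketch of the reverse-proxy approach, the commands below put HTTP basic auth in front of Ollama with nginx; the username, port, server_name, and file paths are placeholders to adapt to your environment:
# Create a credentials file (htpasswd comes from apache2-utils / httpd-tools)
sudo htpasswd -c /etc/nginx/.htpasswd devuser
# Write a minimal proxy config
sudo tee /etc/nginx/conf.d/ollama.conf << 'EOF'
server {
    listen 8443;
    server_name ollama.internal.example;
    location / {
        auth_basic           "Ollama";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:11434;
    }
}
EOF
sudo nginx -s reload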
4. Development Workflow
- Start with smaller models for faster iteration
- Use larger models for complex code generation
- Test prompts before using in production code
- Monitor performance and adjust model selection
- Keep models updated for latest improvements
Complete Setup Script
#!/bin/bash
# setup-ollama-cursor.sh
set -e

echo "Setting up Ollama for Cursor integration..."

# Install Ollama
if [[ "$OSTYPE" == "darwin"* ]]; then
    # macOS
    if ! command -v ollama &> /dev/null; then
        echo "Installing Ollama via Homebrew..."
        brew install ollama
    fi
elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
    # Linux
    if ! command -v ollama &> /dev/null; then
        echo "Installing Ollama..."
        curl -fsSL https://ollama.ai/install.sh | sh
    fi
fi

# Start the Ollama service if it is not already running
if ! curl -s http://localhost:11434/api/tags > /dev/null; then
    echo "Starting Ollama service..."
    ollama serve &
    # Wait for the service to start
    sleep 5
fi

# Pull recommended model
echo "Pulling CodeLlama 7B model..."
ollama pull codellama:7b

# Verify installation
echo "Verifying installation..."
ollama list

echo ""
echo "Setup complete!"
echo ""
echo "Next steps:"
echo "1. Configure Cursor to use: http://localhost:11434"
echo "2. Set model to: codellama:7b"
echo "3. Test the integration in Cursor"
echo ""
echo "To test: ollama run codellama:7b 'Hello, world!'"
Conclusion
Running local AI models with Ollama and connecting them to Cursor provides:
- Complete Privacy: All code and data stays on your machine
- Cost Control: No per-token charges, just hardware costs
- Offline Capability: Works without internet connectivity
- Customization: Fine-tune models for your specific needs
- Performance: Fast responses with local processing
Key takeaways:
- Ollama makes it easy to run large language models locally
- Choose model size based on your hardware capabilities
- Cursor can be configured to use local models via API endpoint
- Start with smaller models (7B) for faster iteration
- Monitor system resources and adjust accordingly
- Keep models updated for latest improvements
By following this guide, you can set up a private, cost-effective AI coding assistant that runs entirely on your local machine while maintaining the benefits of AI-assisted development.