Integrating AI music generation into your applications opens up exciting possibilities for creating dynamic, personalized, and engaging user experiences. Whether you're building a game that needs adaptive background music, a fitness app that generates motivational tracks, or a creative tool for musicians, this comprehensive guide will walk you through the entire process.
In this tutorial, we'll build a complete AI music application using ACE-Step, covering everything from basic setup to advanced customization techniques. By the end, you'll have a working application that can generate high-quality music based on user input.
1. Prerequisites and Setup
System Requirements
Before we begin, ensure your development environment meets these requirements:
- Python 3.8+: We'll be using modern Python features
- GPU Support (Recommended): NVIDIA GPU with CUDA support for faster generation
- Memory: At least 8GB RAM (16GB recommended)
- Storage: 10GB free space for models and generated audio
Installing Dependencies
First, create a virtual environment and install the required packages:
# Create and activate virtual environment
python -m venv ace_step_env
source ace_step_env/bin/activate # On Windows: ace_step_env\Scripts\activate
# Install core dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install transformers
pip install librosa
pip install soundfile
pip install gradio
pip install flask
pip install requests
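Before going further, it's worth confirming that the environment is healthy and that PyTorch can see your GPU. A quick sanity check (save as verify_setup.py or run in a REPL):

# Sanity-check the environment before loading any models
import torch
import soundfile  # noqa: F401 - import succeeds only if libsndfile is available
import librosa    # noqa: F401

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

If CUDA shows as unavailable despite having an NVIDIA GPU, double-check that the installed torch build matches your CUDA version (the cu118 index URL above targets CUDA 11.8).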
Cloning ACE-Step
# Clone the ACE-Step repository
git clone https://github.com/ace-step/ACE-Step.git
cd ACE-Step
# Install ACE-Step specific requirements
pip install -r requirements.txt
GPU Setup Note
If you don't have a GPU, the model will run on CPU, but generation will be significantly slower (5-10 minutes vs 30 seconds). Consider using cloud services like Google Colab or AWS for GPU access.
2. Understanding the ACE-Step API
Core Components
ACE-Step provides several key components for music generation:
- Model Pipeline: The main generation engine
- Audio Processor: Handles audio encoding/decoding
- Conditioning System: Manages text and parameter inputs
- Generation Configuration: Controls output parameters
Basic Generation Example
Let's start with a simple music generation example:
import torch
import soundfile as sf
from ace_step import ACEStepPipeline

# Initialize the pipeline
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline = ACEStepPipeline.from_pretrained(
    "ACE-Step/ACE-Step-v1-3.5B",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32
).to(device)

# Generate music
prompt = "A peaceful acoustic guitar melody in major key, slow tempo"
audio = pipeline(
    prompt=prompt,
    duration=30,  # 30 seconds
    temperature=0.8,
    top_k=50
)

# Save the generated audio
sf.write("generated_music.wav", audio.cpu().numpy(), 16000)
print("Music generated and saved to generated_music.wav")
3. Building a Web Interface
Creating a Simple Flask Application
Now let's create a web interface that allows users to generate music through a browser:
from flask import Flask, render_template, request, send_file, jsonify
import os
import uuid
import threading
from ace_step import ACEStepPipeline
import torch
import soundfile as sf
app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = 'generated_audio'
# Ensure the upload folder exists
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
# Initialize the pipeline (this may take a few minutes)
print("Loading ACE-Step model...")
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline = ACEStepPipeline.from_pretrained(
    "ACE-Step/ACE-Step-v1-3.5B",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32
).to(device)
print("Model loaded successfully!")

# Store generation tasks
generation_tasks = {}

@app.route('/')
def index():
    return render_template('index.html')
@app.route('/generate', methods=['POST'])
def generate_music():
    data = request.json
    prompt = data.get('prompt', '')
    duration = min(int(data.get('duration', 30)), 120)  # Cap at 2 minutes
    temperature = float(data.get('temperature', 0.8))

    # Generate a unique task ID
    task_id = str(uuid.uuid4())
    generation_tasks[task_id] = {'status': 'processing', 'progress': 0}

    # Start generation in a separate thread
    thread = threading.Thread(
        target=generate_audio_task,
        args=(task_id, prompt, duration, temperature)
    )
    thread.start()

    return jsonify({'task_id': task_id})
def generate_audio_task(task_id, prompt, duration, temperature):
    try:
        # Update progress
        generation_tasks[task_id]['progress'] = 25

        # Generate audio
        audio = pipeline(
            prompt=prompt,
            duration=duration,
            temperature=temperature,
            top_k=50
        )
        generation_tasks[task_id]['progress'] = 75

        # Save audio file
        filename = f"{task_id}.wav"
        filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        sf.write(filepath, audio.cpu().numpy(), 16000)

        generation_tasks[task_id] = {
            'status': 'completed',
            'progress': 100,
            'filename': filename
        }
    except Exception as e:
        generation_tasks[task_id] = {
            'status': 'error',
            'error': str(e)
        }
@app.route('/status/<task_id>')
def get_status(task_id):
    return jsonify(generation_tasks.get(task_id, {'status': 'not_found'}))

@app.route('/download/<filename>')
def download_file(filename):
    # Note: in production, validate filename to prevent path traversal
    return send_file(
        os.path.join(app.config['UPLOAD_FOLDER'], filename),
        as_attachment=True
    )
if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)
HTML Template
Create a `templates/index.html` file for the user interface:
(The template markup was not preserved in this rendering. The page is titled "AI Music Generator" with a "🎵 AI Music Generator" heading, form controls that map to the prompt, duration, and temperature parameters read by the /generate endpoint, and a "Generation Progress" section that shows "Processing..." while a track renders.)
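Since the template's JavaScript is not reproduced above, the same request flow can be exercised from Python using the requests package installed earlier. This is a hypothetical client script, assuming the Flask app is running locally on port 5000:

# client.py - exercise the /generate, /status, and /download endpoints
import time
import requests

BASE_URL = "http://localhost:5000"

# Submit a generation request
resp = requests.post(f"{BASE_URL}/generate", json={
    "prompt": "Lo-fi hip hop beat, mellow and relaxed",
    "duration": 30,
    "temperature": 0.8
})
task_id = resp.json()["task_id"]

# Poll until the task finishes
while True:
    status = requests.get(f"{BASE_URL}/status/{task_id}").json()
    if status["status"] in ("completed", "error"):
        break
    time.sleep(2)

# Download the finished track
if status["status"] == "completed":
    audio = requests.get(f"{BASE_URL}/download/{status['filename']}")
    with open("result.wav", "wb") as f:
        f.write(audio.content)

This is also the same polling loop the page's JavaScript would implement.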
4. Advanced Features and Customization
Adding Style Presets
Create predefined styles to make the interface more user-friendly:
MUSIC_PRESETS = {
    "ambient": {
        "prompt": "Ambient atmospheric pad, ethereal and dreamy, slow evolution",
        "temperature": 0.7,
        "duration": 60
    },
    "upbeat": {
        "prompt": "Energetic electronic dance music, 128 BPM, synthesizers",
        "temperature": 0.9,
        "duration": 45
    },
    "classical": {
        "prompt": "Classical piano composition, gentle and melodic, romantic style",
        "temperature": 0.6,
        "duration": 90
    },
    "jazz": {
        "prompt": "Smooth jazz trio, walking bass, soft drums, improvisation",
        "temperature": 1.0,
        "duration": 75
    }
}
@app.route('/preset/<preset_name>')
def apply_preset(preset_name):
    if preset_name in MUSIC_PRESETS:
        return jsonify(MUSIC_PRESETS[preset_name])
    return jsonify({"error": "Preset not found"}), 404
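On the server side, a preset can also be merged with whatever the user supplies, with explicit values winning over the preset. A small sketch (resolve_params is a hypothetical helper, not part of ACE-Step):

# Merge a preset with user-supplied overrides before generation
def resolve_params(data):
    params = {"prompt": "", "duration": 30, "temperature": 0.8}
    preset = MUSIC_PRESETS.get(data.get("preset", ""))
    if preset:
        params.update(preset)  # preset fills in defaults
    for key in ("prompt", "duration", "temperature"):
        if key in data:
            params[key] = data[key]  # explicit user values win
    return params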
Implementing Progress Callbacks
Track generation progress and optionally capture intermediate audio while a track renders:
class AdvancedMusicGenerator:
    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.current_params = {}

    def generate_with_callbacks(self, prompt, duration, **kwargs):
        # Custom generation with a progress callback
        def progress_callback(step, total_steps, intermediate_audio=None):
            progress = (step / total_steps) * 100
            print(f"Generation progress: {progress:.1f}%")
            # Optional: save intermediate results every 10 steps
            if intermediate_audio is not None and step % 10 == 0:
                filename = f"intermediate_{step}.wav"
                sf.write(filename, intermediate_audio.cpu().numpy(), 16000)

        return self.pipeline(
            prompt=prompt,
            duration=duration,
            callback=progress_callback,
            **kwargs
        )

    def batch_generate(self, prompts, **kwargs):
        """Generate one track per prompt, sequentially."""
        results = []
        for prompt in prompts:
            audio = self.pipeline(prompt=prompt, **kwargs)
            results.append(audio)
        return results
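Using the class is straightforward. Note that the callback keyword assumes the pipeline version you have installed supports step callbacks, which is worth verifying before relying on it:

# Example usage (prompts and parameters are illustrative)
generator = AdvancedMusicGenerator(pipeline)
audio = generator.generate_with_callbacks(
    prompt="Cinematic orchestral theme, building tension",
    duration=45,
    temperature=0.8
)
tracks = generator.batch_generate(
    ["Rainy day jazz", "Retro synthwave"], duration=30
)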
Audio Post-Processing
Add effects and enhancements to generated audio:
import numpy as np
from scipy import signal

class AudioProcessor:
    SAMPLE_RATE = 16000  # must match the rate used when saving generated audio

    @staticmethod
    def apply_reverb(audio, reverb_amount=0.3):
        """Add a simple echo-style reverb (a single delayed copy, no feedback)."""
        delay_samples = int(0.05 * AudioProcessor.SAMPLE_RATE)  # 50ms delay
        reverb_audio = np.zeros_like(audio)
        reverb_audio[delay_samples:] = audio[:-delay_samples] * reverb_amount
        return audio + reverb_audio

    @staticmethod
    def normalize_audio(audio, headroom=0.8):
        """Peak-normalize audio. (True LUFS loudness normalization would
        require a dedicated loudness-metering library.)"""
        max_val = np.max(np.abs(audio))
        if max_val > 0:
            return audio / max_val * headroom
        return audio

    @staticmethod
    def apply_eq(audio, low_gain=1.0, mid_gain=1.0, high_gain=1.0):
        """Apply a 3-band EQ built from Butterworth filters."""
        nyquist = AudioProcessor.SAMPLE_RATE / 2
        low_freq = 300 / nyquist
        high_freq = 3000 / nyquist

        # Low band (below 300 Hz)
        b_low, a_low = signal.butter(2, low_freq, 'low')
        low_band = signal.filtfilt(b_low, a_low, audio) * low_gain

        # High band (above 3 kHz)
        b_high, a_high = signal.butter(2, high_freq, 'high')
        high_band = signal.filtfilt(b_high, a_high, audio) * high_gain

        # Mid band (300 Hz - 3 kHz)
        b_mid, a_mid = signal.butter(2, [low_freq, high_freq], 'band')
        mid_band = signal.filtfilt(b_mid, a_mid, audio) * mid_gain

        return low_band + mid_band + high_band
# Usage in your Flask app
processor = AudioProcessor()

def generate_and_process_audio(task_id, prompt, duration, temperature, effects=None):
    try:
        # Generate audio
        audio = pipeline(prompt=prompt, duration=duration, temperature=temperature)
        audio_np = audio.cpu().numpy()

        # Apply effects if specified
        if effects:
            if effects.get('reverb'):
                audio_np = processor.apply_reverb(audio_np, effects['reverb'])
            if effects.get('eq'):
                eq = effects['eq']
                audio_np = processor.apply_eq(
                    audio_np,
                    eq.get('low', 1.0),
                    eq.get('mid', 1.0),
                    eq.get('high', 1.0)
                )

        # Normalize
        audio_np = processor.normalize_audio(audio_np)

        # Save processed audio
        filename = f"{task_id}.wav"
        filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        sf.write(filepath, audio_np, 16000)

        generation_tasks[task_id] = {
            'status': 'completed',
            'progress': 100,
            'filename': filename
        }
    except Exception as e:
        generation_tasks[task_id] = {
            'status': 'error',
            'error': str(e)
        }
5. Performance Optimization
Model Caching and Memory Management
import torch
import gc
import time
from contextlib import contextmanager
from ace_step import ACEStepPipeline

class OptimizedPipeline:
    def __init__(self, model_name, device):
        self.model_name = model_name
        self.device = device
        self.pipeline = None
        self.last_used = 0

    def load_model(self):
        if self.pipeline is None:
            print("Loading model...")
            self.pipeline = ACEStepPipeline.from_pretrained(
                self.model_name,
                torch_dtype=torch.float16 if self.device == "cuda" else torch.float32
            ).to(self.device)
            print("Model loaded!")

    def unload_model(self):
        if self.pipeline is not None:
            del self.pipeline
            self.pipeline = None
            if self.device == "cuda":
                torch.cuda.empty_cache()
            gc.collect()
            print("Model unloaded to free memory")

    @contextmanager
    def model_context(self):
        self.load_model()
        try:
            yield self.pipeline
        finally:
            # Keep the model loaded for subsequent requests;
            # call unload_model() here instead to reclaim memory after each use.
            pass

    def generate(self, **kwargs):
        self.last_used = time.time()  # record activity for idle-unload logic
        with self.model_context() as pipeline:
            return pipeline(**kwargs)

# Use the optimized pipeline
optimized_pipeline = OptimizedPipeline("ACE-Step/ACE-Step-v1-3.5B", device)
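The last_used field can drive an idle timeout, so the model is evicted after a quiet period instead of after every request. One possible sketch (the threshold and polling interval are arbitrary):

# Background watchdog that unloads the model after a period of inactivity
import time
import threading

IDLE_SECONDS = 600  # unload after 10 minutes without a generate() call

def _idle_watchdog(pipeline_wrapper, interval=60):
    """Periodically check the wrapper and unload its model when idle."""
    while True:
        time.sleep(interval)
        if (pipeline_wrapper.pipeline is not None
                and time.time() - pipeline_wrapper.last_used > IDLE_SECONDS):
            pipeline_wrapper.unload_model()

# generate() above records self.last_used on each call, so this thread
# only fires when no requests have arrived for IDLE_SECONDS.
threading.Thread(target=_idle_watchdog, args=(optimized_pipeline,), daemon=True).start()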
Asynchronous Generation with a Queue
from concurrent.futures import ThreadPoolExecutor
import queue
import threading

class GenerationQueue:
    def __init__(self, max_workers=2):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)
        self.task_queue = queue.Queue()
        self.results = {}
        self.worker_thread = threading.Thread(target=self._worker)
        self.worker_thread.daemon = True
        self.worker_thread.start()

    def _worker(self):
        while True:
            try:
                task = self.task_queue.get(timeout=1)
                if task is None:
                    break
                task_id, prompt, duration, temperature = task
                self.results[task_id] = {'status': 'processing', 'progress': 0}
                # Submit to the thread pool; outcomes are reported via self.results
                self.executor.submit(
                    self._generate_audio,
                    task_id, prompt, duration, temperature
                )
                self.task_queue.task_done()
            except queue.Empty:
                continue

    def _generate_audio(self, task_id, prompt, duration, temperature):
        try:
            audio = optimized_pipeline.generate(
                prompt=prompt,
                duration=duration,
                temperature=temperature
            )
            filename = f"{task_id}.wav"
            filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
            sf.write(filepath, audio.cpu().numpy(), 16000)
            self.results[task_id] = {
                'status': 'completed',
                'progress': 100,
                'filename': filename
            }
        except Exception as e:
            self.results[task_id] = {
                'status': 'error',
                'error': str(e)
            }

    def add_task(self, task_id, prompt, duration, temperature):
        self.task_queue.put((task_id, prompt, duration, temperature))
        return task_id

    def get_result(self, task_id):
        return self.results.get(task_id, {'status': 'not_found'})

# Initialize the queue
generation_queue = GenerationQueue(max_workers=2)
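With the queue in place, the /generate and /status routes from Section 3 can be rewritten to delegate to it instead of spawning a raw thread per request. A sketch of the replacement handlers:

# Replace the earlier thread-per-request handlers with queue-backed versions
@app.route('/generate', methods=['POST'])
def generate_music():
    data = request.json
    task_id = str(uuid.uuid4())
    generation_queue.add_task(
        task_id,
        data.get('prompt', ''),
        min(int(data.get('duration', 30)), 120),
        float(data.get('temperature', 0.8))
    )
    return jsonify({'task_id': task_id})

@app.route('/status/<task_id>')
def get_status(task_id):
    return jsonify(generation_queue.get_result(task_id))

The max_workers setting bounds how many generations run concurrently, which matters most on a single GPU where parallel jobs compete for memory.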
6. Testing and Deployment
Unit Tests
import unittest
import tempfile
import shutil
import numpy as np

class TestMusicGenerator(unittest.TestCase):
    def setUp(self):
        self.temp_dir = tempfile.mkdtemp()
        # Initialize a test pipeline here (a smaller model keeps tests fast)

    def test_basic_generation(self):
        """Test basic music generation"""
        prompt = "Simple piano melody"
        duration = 5  # Short for testing
        # This would normally use your pipeline:
        # audio = pipeline(prompt=prompt, duration=duration)
        # self.assertIsNotNone(audio)
        # self.assertGreater(len(audio), 0)
        pass

    def test_invalid_parameters(self):
        """Test handling of invalid parameters"""
        # Enable once the pipeline is wired in:
        # with self.assertRaises(ValueError):
        #     pipeline(prompt="x", duration=-1)
        pass

    def test_audio_processing(self):
        """Test audio post-processing functions"""
        # Create dummy audio
        audio = np.random.randn(16000)  # 1 second of noise at 16 kHz

        # Test normalization
        normalized = AudioProcessor.normalize_audio(audio)
        self.assertLessEqual(np.max(np.abs(normalized)), 1.0)

    def tearDown(self):
        # Clean up temp files
        shutil.rmtree(self.temp_dir)

if __name__ == '__main__':
    unittest.main()
Docker Deployment
FROM python:3.9-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    libsndfile1 \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Create directory for generated audio
RUN mkdir -p generated_audio
# Expose port
EXPOSE 5000
# Set environment variables
ENV FLASK_APP=app.py
ENV FLASK_ENV=production
# Run the application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
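One thing to verify: gunicorn must be listed in requirements.txt, since the CMD line invokes it. Building and running the image then looks like this (the image tag is arbitrary; --gpus all requires the NVIDIA Container Toolkit on the host):

# Build and run the container
docker build -t ace-step-app .
docker run --gpus all -p 5000:5000 ace-step-app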
Production Deployment Tips
- Use a proper WSGI server like Gunicorn or uWSGI instead of Flask's development server
- Implement caching for frequently requested generations
- Add rate limiting to prevent abuse (see the sketch after this list)
- Use cloud storage for generated audio files
- Monitor GPU memory and implement proper cleanup
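As promised above, here is a naive per-IP rate limiter for the /generate endpoint. It is a minimal in-memory sketch with arbitrary limits; for production, a maintained package such as Flask-Limiter (or a reverse-proxy limit) is the better choice:

# Naive per-IP rate limiter (in-memory; resets on restart, not multi-process safe)
import time
from collections import defaultdict
from flask import request, jsonify

REQUEST_LOG = defaultdict(list)
MAX_REQUESTS = 5      # requests allowed per window
WINDOW_SECONDS = 60

@app.before_request
def rate_limit():
    if request.path != '/generate':
        return None
    now = time.time()
    ip = request.remote_addr
    # Keep only requests still inside the window
    REQUEST_LOG[ip] = [t for t in REQUEST_LOG[ip] if now - t < WINDOW_SECONDS]
    if len(REQUEST_LOG[ip]) >= MAX_REQUESTS:
        return jsonify({'error': 'Rate limit exceeded'}), 429
    REQUEST_LOG[ip].append(now)
    return None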
Best Practices and Considerations
User Experience
- Provide clear feedback: Show progress bars and estimated completion times
- Set realistic expectations: Inform users about generation time and quality
- Offer presets: Make it easy for users to get started with common styles
- Allow fine-tuning: Provide advanced options for power users
Performance Optimization
- Batch processing: Process multiple requests together when possible
- Model compression: Use quantized or distilled models for faster inference
- Caching: Store common generations to avoid recomputing (a sketch follows this list)
- Async processing: Use background tasks for long-running generations
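For the caching point above, one simple scheme keys the output file on a hash of the generation parameters. This is a sketch using hypothetical helper names, reusing the pipeline and upload folder from earlier:

# Cache lookup keyed on the generation parameters
import hashlib
import os

def cache_key(prompt, duration, temperature):
    """Derive a stable filename from the generation parameters."""
    raw = f"{prompt}|{duration}|{temperature}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest() + ".wav"

def generate_cached(prompt, duration, temperature):
    path = os.path.join(app.config['UPLOAD_FOLDER'],
                        cache_key(prompt, duration, temperature))
    if not os.path.exists(path):
        audio = pipeline(prompt=prompt, duration=duration, temperature=temperature)
        sf.write(path, audio.cpu().numpy(), 16000)
    return path

Because generation is stochastic, a cache hit returns the identical track for identical parameters; decide whether that trade-off suits your application before enabling it.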
Security and Ethics
- Rate limiting: Prevent abuse and ensure fair usage
- Content filtering: Avoid generating inappropriate content
- Copyright awareness: Be mindful of training data and potential copyright issues
- User privacy: Handle user data and generated content responsibly
Security Considerations
Always validate user inputs, require proper authentication in production systems, and be aware of the security implications of exposing AI models to arbitrary user input.
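As a starting point, validating the generation parameters before they reach the model might look like this (the limits are illustrative, not ACE-Step requirements):

# Validate incoming generation parameters before calling the pipeline
MAX_PROMPT_LENGTH = 500

def validate_request(data):
    """Return (params, error); error is None when the input is acceptable."""
    prompt = str(data.get('prompt', '')).strip()
    if not prompt or len(prompt) > MAX_PROMPT_LENGTH:
        return None, "Prompt must be 1-500 characters"
    try:
        duration = int(data.get('duration', 30))
        temperature = float(data.get('temperature', 0.8))
    except (TypeError, ValueError):
        return None, "duration and temperature must be numeric"
    if not 5 <= duration <= 120:
        return None, "Duration must be between 5 and 120 seconds"
    if not 0.1 <= temperature <= 1.5:
        return None, "Temperature must be between 0.1 and 1.5"
    return {'prompt': prompt, 'duration': duration, 'temperature': temperature}, None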
Conclusion and Next Steps
Congratulations! You've built a complete AI music generation application using ACE-Step. This foundation gives you the tools to create sophisticated music applications tailored to your specific needs.
Potential Enhancements
- Multi-track generation: Generate separate instruments simultaneously
- Real-time streaming: Stream audio as it's being generated
- Collaborative features: Allow multiple users to collaborate on music
- Mobile app: Create native mobile applications
- Plugin development: Create plugins for DAWs like Ableton or Logic
Community and Support
Join the ACE-Step community to share your creations, get help, and contribute to the project:
- GitHub: Report issues and contribute code
- Discord/Forums: Get help from the community
- Documentation: Comprehensive guides and API references
- Examples: Community-contributed examples and tutorials
The possibilities with AI music generation are endless, and we're excited to see what you'll build. Happy coding, and enjoy creating amazing music with AI!