mirror of https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-21 03:26:41 +02:00
refactor: restructure monorepo with apps/ and services/ directories
parent 25824ed0ac
commit ff80aeec1f
4062 changed files with 2592 additions and 1278 deletions
96
apps/picture/docs/models/README.md
Normal file
@ -0,0 +1,96 @@
# Image Generation Models

This directory contains documentation for all supported image generation models in the Picture app.

## Available Models

### 1. [Ideogram V3 Turbo](./ideogram-v3-turbo.md)
- **Best for**: Text rendering, logos, marketing materials
- **Speed**: Fast (10s)
- **Cost**: $0.02
- **Special**: Excellent text generation capabilities

### 2. [Google Imagen 4 Fast](./imagen-4-fast.md)
- **Best for**: Photorealistic images, portraits, product shots
- **Speed**: Very Fast (8s)
- **Cost**: $0.03
- **Special**: Superior photorealism and coherence

### 3. [ByteDance SeeDream 3](./seedream-3.md)
- **Best for**: Creative artwork, style mixing, illustrations
- **Speed**: Moderate (12s)
- **Cost**: $0.025
- **Special**: Excellent artistic versatility

### 4. [FLUX Schnell](./flux-schnell.md)
- **Best for**: Rapid prototyping, quick iterations
- **Speed**: Ultra-fast (4s)
- **Cost**: $0.003
- **Special**: Fastest generation time

### 5. [FLUX Krea Dev](./flux-krea-dev.md)
- **Best for**: Creative development, concept art
- **Speed**: Moderate (15s)
- **Cost**: $0.04
- **Special**: Enhanced for creative workflows

### 6. [Recraft V3 SVG](./recraft-v3-svg.md)
- **Best for**: Vector graphics, logos, icons
- **Speed**: Moderate (20s)
- **Cost**: $0.05
- **Special**: Generates scalable SVG files

### 7. [Qwen Image](./qwen-image.md)
- **Best for**: Multilingual content, Asian markets
- **Speed**: Fast (10s)
- **Cost**: $0.03
- **Special**: Excellent multilingual support

## Quick Comparison

| Model | Speed | Quality | Text | Realism | Art | Aspect Ratios | Cost |
|-------|-------|---------|------|---------|-----|---------------|------|
| Ideogram V3 Turbo | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | 15 ratios | $0.02 |
| Imagen 4 Fast | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 5 ratios | $0.03 |
| SeeDream 3 | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 9 ratios + custom | $0.025 |
| FLUX Schnell | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | 11 ratios | $0.003 |
| FLUX Krea Dev | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | 11 ratios | $0.04 |
| Recraft V3 SVG | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | N/A | ⭐⭐⭐⭐ | 16 ratios | $0.05 |
| Qwen Image | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 7 ratios | $0.03 |

## Choosing the Right Model

### For Text in Images
Choose **Ideogram V3 Turbo** - It has the best text rendering capabilities

### For Photorealism
Choose **Google Imagen 4 Fast** - Best for realistic photographs

### For Speed
Choose **FLUX Schnell** - Ultra-fast 4-second generation

### For Artistic Work
Choose **SeeDream 3** - Most versatile for creative styles

### For Logos/Icons
Choose **Recraft V3 SVG** - Only model that generates scalable vectors

### For Multilingual
Choose **Qwen Image** - Best for non-English prompts

### For Budget
Choose **FLUX Schnell** - Most cost-effective at $0.003
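
The guidance above amounts to a lookup from use case to model. A minimal sketch of that lookup follows; the key names (`"text"`, `"vector"`, and so on) and the function `recommend_model` are hypothetical and not part of the Picture app.

```python
# Hypothetical lookup table: keys are illustrative labels, values are the
# model names recommended in "Choosing the Right Model".
RECOMMENDED_MODEL = {
    "text": "Ideogram V3 Turbo",
    "photorealism": "Google Imagen 4 Fast",
    "speed": "FLUX Schnell",
    "art": "SeeDream 3",
    "vector": "Recraft V3 SVG",
    "multilingual": "Qwen Image",
    "budget": "FLUX Schnell",
}


def recommend_model(use_case: str) -> str:
    """Return the recommended model name for a use-case label."""
    key = use_case.strip().lower()
    if key not in RECOMMENDED_MODEL:
        known = ", ".join(sorted(RECOMMENDED_MODEL))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}")
    return RECOMMENDED_MODEL[key]
```

A caller would pass a label such as `"vector"` and receive the model name to preselect in a picker.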

## API Integration

All models are integrated through the Replicate API and can be selected via the model picker in the app. Each model is configured with default parameters and allows customization of the following, where the model supports them:

- Resolution (width/height)
- Number of inference steps
- Guidance scale
- Negative prompts (where supported)
- Random seed for reproducibility
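
The customizable parameters listed above could be collected into a single input payload along these lines. This is a sketch under assumptions: the field names (`num_inference_steps`, `guidance_scale`, `negative_prompt`, `seed`) and the function `build_input` are illustrative, actual input names differ per model, and the default values are placeholders taken from the per-model pages rather than from the app's code.

```python
from typing import Any, Optional


def build_input(
    prompt: str,
    *,
    supports_negative_prompt: bool,
    width: int = 1024,
    height: int = 1024,
    steps: int = 30,
    guidance_scale: float = 7.5,
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a model input dict (field names are assumed, not confirmed)."""
    payload: dict[str, Any] = {
        "prompt": prompt,
        "width": width,
        "height": height,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }
    # Only send a negative prompt to models that accept one.
    if negative_prompt and supports_negative_prompt:
        payload["negative_prompt"] = negative_prompt
    # Omitting the seed leaves generation random; a fixed seed aids reproducibility.
    if seed is not None:
        payload["seed"] = seed
    return payload
```

Dropping unsupported fields before the request keeps one code path for all models instead of one per provider.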

## Model Updates

Models are regularly updated by their providers. Version numbers are tracked in the database to ensure consistency. Check individual model documentation for specific capabilities and limitations.
222
apps/picture/docs/models/flux-1-1-pro.md
Normal file
@ -0,0 +1,222 @@
# FLUX 1.1 Pro

## Overview
FLUX 1.1 Pro is Black Forest Labs' flagship professional image generation model for 2025. It represents the pinnacle of the FLUX model family, delivering state-of-the-art image quality, exceptional prompt adherence, and unprecedented generation speed. This model is 6x faster than its predecessor while producing even higher quality results up to 4 megapixels.

## Model Details
- **Provider**: Black Forest Labs
- **Replicate ID**: `black-forest-labs/flux-1.1-pro`
- **Version**: Latest stable version (1.1)
- **Release**: 2025
- **Status**: Production-ready, industry standard

## Key Features
- **Ultra-High Quality**: Best-in-class image generation quality
- **6x Faster**: Significantly improved inference speed over FLUX 1.0 Pro
- **High Resolution**: Up to 4 megapixel (2048x2048) output
- **Exceptional Prompt Adherence**: Industry-leading prompt following accuracy
- **Output Diversity**: Generates highly diverse results from the same prompt
- **Professional Grade**: Optimized for commercial and production use
- **Compositional Guidance**: Supports image prompts for layout control

## Default Parameters
- **Resolution**: 1024x1024
- **Steps**: 1 (single-step generation for speed)
- **Guidance Scale**: 3.5
- **Supports Negative Prompts**: No
- **Supports Seed**: Yes (for reproducibility)
- **Supports Image-to-Image**: Yes (via image prompt)

## Supported Aspect Ratios
**9 professional aspect ratios**:
- **Square**: 1:1
- **Standard**: 4:3, 3:4
- **Photo**: 3:2, 2:3
- **Social Media**: 5:4, 4:5
- **Widescreen**: 16:9, 9:16

## Supported Resolutions
- **Width Range**: 256px - 1440px
- **Height Range**: 256px - 1440px
- **Constraint**: Both dimensions must be multiples of 32
- **Maximum Output**: Up to 4 megapixels
- **Recommended**: 1024x1024 for balanced quality and speed
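
The width and height constraints above can be checked before a request is sent. The sketch below covers only the two documented rules (each side within 256 to 1440 pixels, each side a multiple of 32); the function names are hypothetical, and rounding down in `snap_to_valid` is an assumption about how an invalid value should be corrected.

```python
MIN_SIDE = 256   # documented lower bound for width and height
MAX_SIDE = 1440  # documented upper bound for width and height
STEP = 32        # both dimensions must be multiples of this


def validate_dimensions(width: int, height: int) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for label, value in (("width", width), ("height", height)):
        if not MIN_SIDE <= value <= MAX_SIDE:
            errors.append(f"{label} {value} is outside {MIN_SIDE}-{MAX_SIDE}")
        if value % STEP != 0:
            errors.append(f"{label} {value} is not a multiple of {STEP}")
    return errors


def snap_to_valid(value: int) -> int:
    """Clamp into range, then round down to a multiple of STEP (assumed policy)."""
    clamped = max(MIN_SIDE, min(MAX_SIDE, value))
    return (clamped // STEP) * STEP
```

For example, `validate_dimensions(1024, 1024)` returns an empty list, while a width of 1000 is reported because it is not a multiple of 32.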

## Advanced Features

### Image Prompts (Compositional Guidance)
Use reference images to guide the composition and structure of generated images while allowing the model to reinterpret the content based on your text prompt.

### Safety Tolerance
Configurable safety filter (1-6 scale):
- **1**: Strictest filtering
- **2**: Default, balanced filtering
- **6**: Most permissive

### Prompt Upsampling
Optional feature that automatically enhances and expands your prompt for potentially better results.

### Output Formats
- **WebP**: Default, best compression
- **JPG**: Wide compatibility
- **PNG**: Lossless quality

## Best Use Cases
- Professional marketing materials
- High-quality product photography
- Advertising campaigns
- Editorial illustrations
- Brand identity design
- Social media content
- E-commerce imagery
- Art direction and concept art
- Time-sensitive projects requiring both speed and quality
- Production environments with high output demands

## Example Prompts

### Professional Photography
```
A professional product photograph of a luxury watch on black marble,
studio lighting, macro details, reflections, high-end commercial style
```

### Editorial Illustration
```
Editorial illustration for tech magazine cover, AI and human collaboration,
modern minimalist style, vibrant colors, geometric elements
```

### Brand Marketing
```
Lifestyle photograph of a young professional using a laptop in a modern
coffee shop, natural morning light, candid moment, warm tones
```

### Creative Concept
```
Surreal scene of a floating island with waterfalls cascading into clouds,
cinematic lighting, photorealistic style, dramatic atmosphere
```

## Tips for Best Results

### Prompt Engineering
- Be specific and descriptive about desired style
- Include lighting and atmosphere details
- Specify composition and framing when needed
- Mention art style or photography type explicitly
- Use professional terminology for technical accuracy

### Quality Optimization
- Use 1024x1024 or higher for best detail
- Enable prompt upsampling for complex scenes
- Specify output format based on use case (PNG for highest quality)
- Use seed values for consistent results across iterations

### Speed vs. Quality
- Default settings already optimized for both
- Single-step generation is surprisingly high quality
- For absolute best results, use maximum resolution
- Image prompts add minimal processing time

### Composition Control
- Use image prompts to maintain consistent layouts
- Combine with detailed text prompts for best results
- Reference images guide structure, not style

## Strengths
- **Industry-Leading Quality**: Consistently produces professional-grade images
- **Exceptional Speed**: 6x faster than previous generation
- **Prompt Understanding**: Superior interpretation of complex prompts
- **Versatility**: Excellent across photography, illustration, and art styles
- **Reliability**: Consistent, predictable results
- **Production-Ready**: Stable and dependable for commercial use
- **High Resolution**: Up to 4MP output for print-quality work
- **Fine Details**: Captures intricate textures and subtle elements

## Limitations
- **No Negative Prompts**: Cannot explicitly exclude elements
- **Premium Pricing**: Higher cost reflects professional quality
- **Single-Step Only**: Fixed at 1 step (though this is optimized)
- **Safety Filter**: May occasionally block artistic nudity or violence

## Performance Metrics
- **Generation Time**: ~4 seconds average (at 1024x1024)
- **Quality Score**: Top performer in industry benchmarks
- **Prompt Adherence**: Highest accuracy in Text-to-Image Benchmark 2025
- **Success Rate**: >99% successful generations

## Cost
**$0.04 per generation** (regardless of resolution)

*Premium pricing for professional-grade quality and speed*

## Comparison with Other FLUX Models

| Feature | FLUX 1.1 Pro | FLUX Dev | FLUX Schnell |
|---------|--------------|----------|--------------|
| Quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Speed | 6x faster | Baseline | 8x faster |
| Steps | 1 | 50 | 4 |
| Resolution | Up to 4MP | Up to 1MP | Up to 1MP |
| Cost | $0.04 | $0.025 | $0.003 |
| Use Case | Professional | Development | Budget/Volume |

## When to Use FLUX 1.1 Pro

### ✅ Choose FLUX 1.1 Pro When:
- Quality is the top priority
- You need professional, client-ready results
- Time-sensitive projects requiring both speed and quality
- Commercial/production environments
- High-resolution output needed
- Brand-critical imagery
- Maximum prompt adherence required

### ❌ Consider Alternatives When:
- Budget is extremely limited → use FLUX Schnell ($0.003 vs $0.04)
- High-volume generation (1000+ images) → use FLUX Schnell
- Rapid prototyping only → use FLUX Schnell
- Non-commercial experiments → use FLUX Dev
- Need negative prompts → use Stable Diffusion 3.5 Large

## Technical Details

### Model Architecture
- 12 billion parameter rectified flow transformer
- Optimized inference pipeline for 6x speed improvement
- Enhanced prompt encoding for better adherence
- Advanced attention mechanisms for fine details

### Optimization
- Single-step distillation from multi-step model
- Hardware acceleration optimized
- Efficient memory usage
- Parallel processing capabilities

## Best Practices for Production

1. **Set Explicit Seeds**: Use fixed seeds for consistent brand imagery
2. **Test Aspect Ratios**: Verify compositions work across different ratios
3. **Quality Control**: Review outputs before client delivery
4. **Backup Plans**: Have alternative models ready (SD 3.5, Recraft V3)
5. **Cost Monitoring**: Track usage for budget management
6. **Prompt Library**: Build reusable prompt templates for brand consistency

## Integration Notes

### API Usage
Works seamlessly with Replicate's standard API structure. No special configuration needed.

### Batch Processing
Can be parallelized for high-volume generation. Recommended for production workflows.
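
One way to parallelize a batch is a thread pool, since each generation is an I/O-bound network call. The sketch below is illustrative only: `fake_generate` is a stand-in that makes no network request, and `generate_batch` and `max_workers=4` are hypothetical choices, not settings from the Picture app.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def fake_generate(prompt: str) -> str:
    """Stand-in for a real generation call; returns a placeholder identifier."""
    return f"image-for:{prompt}"


def generate_batch(
    prompts: Iterable[str],
    generate: Callable[[str], str] = fake_generate,
    max_workers: int = 4,
) -> list[str]:
    """Run `generate` over all prompts concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(generate, prompts))
```

In a real workflow the worker count would need to respect the provider's rate limits, which are not stated here.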

### Caching
Use seed values and exact prompts for cacheable, reproducible results.
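
A cache key for this purpose can be derived by hashing the model ID, the exact prompt, the seed, and any other parameters. The following is a minimal sketch; `cache_key` and its payload layout are hypothetical, and the prompt is deliberately not normalized so that only exact matches share a key.

```python
import hashlib
import json


def cache_key(model_id: str, prompt: str, seed: int, **params) -> str:
    """Derive a stable SHA-256 key from the inputs that determine the output."""
    payload = {"model": model_id, "prompt": prompt, "seed": seed, "params": params}
    # sort_keys makes the serialization independent of keyword-argument order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests with the same model, prompt, seed, and parameters produce the same key, so a stored result can be returned instead of paying for a new generation.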

## Conclusion

FLUX 1.1 Pro represents the current state-of-the-art in AI image generation. Its combination of exceptional quality, industry-leading speed, and reliable performance makes it the top choice for professional applications where results matter. While the premium pricing reflects its capabilities, the value delivered in terms of quality and time savings makes it an excellent investment for serious projects.

**Recommended as the default model for production use.**
65
apps/picture/docs/models/flux-krea-dev.md
Normal file
@ -0,0 +1,65 @@
# FLUX Krea Dev

## Overview
FLUX Krea Dev is an enhanced version of the FLUX model optimized for creative development. It combines the flexibility of the FLUX architecture with Krea's improvements for artistic and development workflows.

## Model Details
- **Provider**: Black Forest Labs
- **Replicate ID**: `black-forest-labs/flux-krea-dev`
- **Version**: `c63e8a1037b9e90ce614e30bb44c837e1b1e86bb1f0adc6f1bb7f0e3ad088e3f`

## Key Features
- **Creative Enhancement**: Optimized for artistic workflows
- **Developer-Friendly**: Designed with API integration in mind
- **Style Flexibility**: Excellent at various artistic styles
- **Quality Balance**: Good balance between speed and quality

## Default Parameters
- **Resolution**: 1024x1024
- **Steps**: 30
- **Guidance Scale**: 7.5
- **Supports Negative Prompts**: Yes
- **Supports Seed**: Yes

## Supported Aspect Ratios
**11 aspect ratios**:
- **Square**: 1:1
- **Landscape**: 4:3, 3:2, 5:4, 16:9, 21:9
- **Portrait**: 3:4, 2:3, 4:5, 9:16, 9:21

## Supported Resolutions
- **Megapixel Options**: 0.25 MP or 1 MP (default)
- **Maximum**: 1440x1440 pixels in any dimension
- Dimensions automatically rounded to multiples of 32
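
To illustrate how an aspect ratio and a megapixel option could translate into pixel dimensions under the rules above, here is a rough sketch. It rests on several assumptions that the documentation does not confirm: 1 MP is taken as 1024x1024 pixels (so that 1:1 at 1 MP matches the 1024x1024 default), rounding goes to the nearest multiple of 32, and any side above 1440 is clamped. The function name `dimensions_for` is hypothetical.

```python
import math

MAX_SIDE = 1440  # documented per-dimension maximum
STEP = 32        # dimensions are rounded to multiples of this


def dimensions_for(aspect_ratio: str, megapixels: float = 1.0) -> tuple[int, int]:
    """Estimate (width, height) for a ratio like "16:9" at a megapixel budget."""
    w_part, h_part = (int(part) for part in aspect_ratio.split(":"))
    ratio = w_part / h_part
    target_pixels = megapixels * 1024 * 1024  # assumption: 1 MP = 1024 * 1024
    width = math.sqrt(target_pixels * ratio)
    height = width / ratio

    def snap(value: float) -> int:
        # Nearest multiple of STEP (assumed), kept within [STEP, MAX_SIDE].
        return min(MAX_SIDE, max(STEP, round(value / STEP) * STEP))

    return snap(width), snap(height)
```

Under these assumptions a 16:9 request at 1 MP comes out at 1376x768, and a 21:9 request has its width clamped to 1440; the real service may compute different values.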

## Best Use Cases
- Creative development and prototyping
- Artistic experimentation
- Style exploration
- Professional creative workflows
- Game and media asset generation

## Example Prompts
1. "Concept art for a steampunk airship with brass details and Victorian aesthetics"
2. "A surreal landscape with floating islands and bioluminescent plants"
3. "Character design sheet for a fantasy warrior with multiple poses and expressions"

## Tips for Best Results
- Take advantage of the model's creative flexibility
- Experiment with unusual style combinations
- Use detailed artistic terminology
- Great for iterative creative development
- Excellent for mood boards and concept work

## Strengths
- Enhanced creative capabilities
- Good at understanding artistic concepts
- Reliable for professional workflows
- Balanced performance

## Limitations
- Slower than the Schnell variant (about 15 seconds per generation)
- May require more detailed prompts for specific outcomes

## Cost
Estimated at $0.04 per generation
69
apps/picture/docs/models/flux-schnell.md
Normal file
@ -0,0 +1,69 @@
# FLUX Schnell

## Overview
FLUX Schnell (German for "fast") is Black Forest Labs' speed-optimized image generation model. It delivers high-quality results in record time while maintaining the artistic excellence of the FLUX model family. **With the lowest cost per generation at just $0.003, it's the most economical choice for high-volume projects.**

## Model Details
- **Provider**: Black Forest Labs
- **Replicate ID**: `black-forest-labs/flux-schnell`
- **Version**: Latest stable version

## Key Features
- **Ultra-Fast Generation**: One of the fastest models available
- **Consistent Quality**: Maintains high quality despite speed
- **Prompt Adherence**: Excellent understanding of prompt instructions
- **Efficient Processing**: Low computational requirements

## Default Parameters
- **Resolution**: 1024x1024
- **Steps**: 4 (optimized for speed)
- **Guidance Scale**: 3.5
- **Supports Negative Prompts**: Yes
- **Supports Seed**: Yes

## Supported Aspect Ratios
**11 aspect ratios**:
- **Square**: 1:1
- **Landscape**: 4:3, 3:2, 5:4, 16:9, 21:9
- **Portrait**: 3:4, 2:3, 4:5, 9:16, 9:21

## Supported Resolutions
- **Megapixel Options**: 0.25 MP (fast) or 1 MP (standard)
- Automatically calculated based on aspect ratio
- All dimensions must be multiples of 32

## Best Use Cases
- Rapid prototyping and iteration
- Real-time applications
- High-volume generation needs
- Quick concept visualization
- Testing prompt variations

## Example Prompts
1. "A minimalist logo design for a tech startup, geometric shapes, blue and white"
2. "Portrait of a robot chef cooking in a futuristic kitchen"
3. "Abstract art piece with flowing colors representing music and rhythm"

## Tips for Best Results
- Keep prompts clear and concise for speed
- Use simple, direct descriptions
- Ideal for iterative workflows
- Great for A/B testing different concepts
- Perfect for time-sensitive projects

## Strengths
- **Cheapest model available** ($0.003 per generation)
- Extremely fast generation (~5 seconds)
- Reliable and consistent
- Good general-purpose model
- Excellent for rapid iteration
- Perfect for high-volume/budget-conscious projects

## Limitations
- May sacrifice some fine details for speed
- Best for standard styles rather than highly specialized ones

## Cost
**$0.003 per generation** (~333 images for $1)

*The most cost-effective model available - over 6x cheaper than most alternatives!*
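
The cost figures above follow from simple arithmetic, sketched here with `Decimal` to avoid floating-point rounding on currency values. The helper names are hypothetical, and the $0.02 comparison price is taken from the Ideogram V3 Turbo page as one example of an alternative.

```python
from decimal import Decimal


def images_per_budget(cost_per_image: Decimal, budget: Decimal = Decimal("1")) -> int:
    """Whole number of images a budget covers at a given per-image cost."""
    return int(budget // cost_per_image)


def batch_cost(count: int, cost_per_image: Decimal) -> Decimal:
    """Total cost of generating `count` images."""
    return count * cost_per_image


def price_ratio(other_cost: Decimal, this_cost: Decimal) -> Decimal:
    """How many times more expensive `other_cost` is than `this_cost`."""
    return other_cost / this_cost


schnell = Decimal("0.003")
print(images_per_budget(schnell))            # whole images for $1
print(batch_cost(1000, schnell))             # cost of a 1000-image batch
print(price_ratio(Decimal("0.02"), schnell)) # ratio against a $0.02 model
```

At $0.003 per image, one dollar covers 333 whole images and a 1000-image batch costs $3, while a $0.02 model is between six and seven times more expensive per image.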
61
apps/picture/docs/models/ideogram-v3-turbo.md
Normal file
@ -0,0 +1,61 @@
# Ideogram V3 Turbo

## Overview
Ideogram V3 Turbo is a fast, high-quality text-to-image generation model with exceptional text rendering capabilities. This model excels at generating images with readable, accurate text embedded within them.

## Model Details
- **Provider**: Ideogram AI
- **Replicate ID**: `ideogram-ai/ideogram-v3-turbo`
- **Version**: `adfd685c1f08e0a1091e8c3e2e1c8c1c6aca2cb1c73cf37e982b965fb40e5c42`

## Key Features
- **Excellent Text Rendering**: Superior ability to generate readable text within images
- **Fast Generation**: Optimized for quick results (typically 10 seconds)
- **High Quality**: Produces professional-quality images
- **Versatile Styles**: Supports various artistic and photographic styles

## Default Parameters
- **Resolution**: 1024x1024
- **Steps**: 30
- **Guidance Scale**: 7.5
- **Supports Negative Prompts**: Yes
- **Supports Seed**: Yes

## Supported Aspect Ratios
**Extensive Support** - 15 different aspect ratios:
- **Square**: 1:1
- **Landscape**: 3:2, 4:3, 5:4, 16:10, 16:9, 2:1, 3:1
- **Portrait**: 2:3, 3:4, 4:5, 10:16, 9:16, 1:2, 1:3
- **Ultra-wide**: 21:9 (custom)

## Supported Resolutions
- **Minimum**: 512x512
- **Maximum**: 1536x1536 (in any dimension)
- Flexible resolution combinations from 512x1536 to 1536x512

## Best Use Cases
- Marketing materials with text overlays
- Logo designs and branding
- Posters and advertisements
- Social media graphics
- Any image requiring embedded text

## Example Prompts
1. "A vintage travel poster for Paris with bold text saying 'Visit Paris' in art deco style"
2. "A modern tech company logo with the text 'TechCorp' in sleek metallic letters"
3. "A coffee shop menu board with handwritten chalk text listing various drinks"

## Tips for Best Results
- Be specific about text placement and style
- Describe the font style you want (bold, handwritten, serif, etc.)
- Include context about the overall image composition
- Use quotation marks around the exact text you want to appear
- Specify text color and background contrast for readability
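
The tips above can be applied mechanically when prompts are assembled in code. The helper below is a hypothetical sketch, not part of the app: it places the exact text in quotation marks and adds the font style, color, and placement when they are given.

```python
from typing import Optional


def build_text_prompt(
    scene: str,
    text: str,
    font_style: Optional[str] = None,
    text_color: Optional[str] = None,
    placement: Optional[str] = None,
) -> str:
    """Compose a prompt that quotes the exact text and describes its styling."""
    descriptor = " ".join(part for part in (font_style, text_color) if part)
    clause = f"with {descriptor} text" if descriptor else "with text"
    # Quotation marks mark the literal text that should appear in the image.
    clause += f' saying "{text}"'
    if placement:
        clause += f" {placement}"
    return f"{scene.strip()} {clause}"
```

For example, `build_text_prompt("A vintage travel poster for Paris", "Visit Paris", font_style="bold art deco", placement="across the top")` yields a single prompt string with the quoted text and its styling in one clause.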

## Limitations
- Complex multi-paragraph text may be challenging
- Very small text might not be perfectly legible
- Special characters and non-Latin scripts may have varying results

## Cost
Estimated at $0.02 per generation
65
apps/picture/docs/models/imagen-4-fast.md
Normal file
@ -0,0 +1,65 @@
# Google Imagen 4 Fast

## Overview
Google's Imagen 4 Fast is a state-of-the-art image generation model that balances speed with exceptional quality and coherence. It leverages Google's advanced AI research to produce highly realistic and contextually accurate images.

## Model Details
- **Provider**: Google
- **Replicate ID**: `google/imagen-4-fast`
- **Version**: `39d3ddaf89f8eadd0f728bb96f6c1a95e99a0e06f3bb4e893d7a039f69a04f94`

## Key Features
- **Photorealistic Quality**: Excels at generating realistic photographs
- **Semantic Understanding**: Strong comprehension of complex prompts
- **Fast Processing**: Optimized for speed (typically 8 seconds)
- **Consistent Results**: Reliable output quality across various prompts

## Default Parameters
- **Resolution**: 1024x1024
- **Steps**: 30
- **Guidance Scale**: 7.5
- **Supports Negative Prompts**: Yes
- **Supports Seed**: Yes

## Supported Aspect Ratios
**5 standard ratios**:
- **Square**: 1:1
- **Landscape**: 16:9, 4:3
- **Portrait**: 9:16, 3:4

## Supported Resolutions
- Automatically determined by aspect ratio selection
- High-quality output at all supported ratios
- Optimized for each aspect ratio

## Best Use Cases
- Photorealistic portraits and scenes
- Product photography
- Architectural visualizations
- Nature and landscape photography
- Editorial and documentary-style images

## Example Prompts
1. "A professional headshot of a business executive in a modern office, soft natural lighting"
2. "A hyperrealistic product shot of a luxury watch on black velvet background"
3. "An aerial view of a sustainable city with green rooftops and solar panels"

## Tips for Best Results
- Use detailed descriptions for photorealistic results
- Specify lighting conditions (golden hour, studio lighting, etc.)
- Include camera settings for photography-style shots
- Mention specific details about textures and materials
- Use professional photography terminology

## Strengths
- Excellent at human faces and expressions
- Superior understanding of spatial relationships
- High-quality texture rendering
- Natural lighting and shadows

## Limitations
- May require more specific prompting for artistic styles
- Best suited for realistic rather than abstract content

## Cost
Estimated at $0.03 per generation
72
apps/picture/docs/models/qwen-image.md
Normal file
@ -0,0 +1,72 @@
# Qwen Image

## Overview
Qwen Image is Alibaba's advanced image generation model that combines strong multilingual understanding with high-quality image generation capabilities. It's particularly notable for its excellent handling of Asian languages and cultural contexts.

## Model Details
- **Provider**: Qwen (Alibaba)
- **Replicate ID**: `qwen/qwen-image`
- **Version**: `9bc5cb891bfe948b11c7bb9e63ccb1c7e03c4cf53e89b963a99e673f84c5d8ef`

## Key Features
- **Multilingual Excellence**: Superior understanding of Chinese, Japanese, Korean, and other languages
- **Cultural Awareness**: Strong understanding of diverse cultural contexts
- **Balanced Quality**: Good balance of speed and image quality
- **Versatile Styles**: Handles both Eastern and Western artistic styles

## Default Parameters
- **Resolution**: 1024x1024
- **Steps**: 30
- **Guidance Scale**: 7.5
- **Supports Negative Prompts**: Yes
- **Supports Seed**: Yes

## Supported Aspect Ratios
**7 aspect ratios**:
- **Square**: 1:1
- **Landscape**: 4:3, 3:2, 16:9
- **Portrait**: 3:4, 2:3, 9:16

## Supported Resolutions
- **Custom Range**: 512x512 to 2048x2048
- **Quality Modes**:
  - "optimize_for_quality" (higher resolution)
  - "optimize_for_speed" (lower resolution)
- Custom width/height override available

## Best Use Cases
- Multilingual content creation
- Asian market visuals
- Cultural and traditional artwork
- E-commerce product images
- Educational illustrations

## Example Prompts
1. "Traditional Chinese garden with pavilion, koi pond, and cherry blossoms in spring"
2. "Modern Tokyo street fashion, young person in Harajuku style clothing"
3. "Korean traditional hanbok in modern minimalist style illustration"

## Tips for Best Results
- Can handle prompts in multiple languages effectively
- Excellent for culture-specific imagery
- Good at combining traditional and modern elements
- Specify regional artistic styles when needed
- Works well with detailed scene descriptions

## Strengths
- Best-in-class for Asian language prompts
- Excellent cultural representation
- Good at traditional art styles
- Reliable and consistent output

## Limitations
- May require more specific prompting for Western styles
- Generation time moderate (10 seconds)

## Special Features
- Accepts prompts in Chinese, Japanese, Korean, and English
- Understands cultural nuances and symbols
- Good at generating text in Asian languages

## Cost
Estimated at $0.03 per generation
79
apps/picture/docs/models/recraft-v3-svg.md
Normal file
@ -0,0 +1,79 @@
# Recraft V3 SVG

## Overview
Recraft V3 SVG is a unique model specialized in generating vector graphics and illustrations in SVG format. Unlike raster-based models, it creates scalable vector graphics perfect for logos, icons, and illustrations that need to work at any size.

## Model Details
- **Provider**: Recraft AI
- **Replicate ID**: `recraft-ai/recraft-v3-svg`
- **Version**: `4747c02d57e6a055f96a74e5c6e7f9dd72e6f9c49a08f802e03f42b2c59e2bbf`

## Key Features
- **Vector Output**: Generates true SVG files, not raster images
- **Infinite Scalability**: Images can be scaled to any size without quality loss
- **Clean Graphics**: Produces clean, professional vector illustrations
- **Design-Ready**: Output ready for use in design software

## Default Parameters
- **Resolution**: 1024x1024 (initial render size)
- **Steps**: 30
- **Guidance Scale**: 7.5
- **Supports Negative Prompts**: No
- **Supports Seed**: Yes

## Supported Aspect Ratios
**16 aspect ratios**:
- **Square**: 1:1
- **Landscape**: 4:3, 3:2, 16:9, 2:1, 7:5, 5:4, 5:3
- **Portrait**: 3:4, 2:3, 9:16, 1:2, 5:7, 4:5, 3:5
- **Custom**: "Not set" option available

## Supported Resolutions
**Preset resolutions based on aspect ratio**:
- 1024x1024 (1:1)
- 1365x1024 (4:3), 1024x1365 (3:4)
- 1536x1024 (3:2), 1024x1536 (2:3)
- 1820x1024 (16:9), 1024x1820 (9:16)
- 2048x1024 (2:1), 1024x2048 (1:2)
- And more specialized ratios
- Note: As SVG, output can be scaled infinitely
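
The presets listed above are consistent with holding the short side at 1024 pixels and scaling the long side by the aspect ratio, rounded to the nearest pixel. The sketch below reproduces the listed values under that observed pattern; it is an inference from the numbers, not a documented rule, and the unlisted "specialized" ratios may not follow it. The function name is hypothetical.

```python
SHORT_SIDE = 1024  # short side shared by every preset listed above


def preset_resolution(aspect_ratio: str) -> tuple[int, int]:
    """Return (width, height) for a ratio like "16:9", short side fixed at 1024."""
    w_part, h_part = (int(part) for part in aspect_ratio.split(":"))
    if w_part >= h_part:
        # Landscape or square: height is the short side.
        return round(SHORT_SIDE * w_part / h_part), SHORT_SIDE
    # Portrait: width is the short side.
    return SHORT_SIDE, round(SHORT_SIDE * h_part / w_part)
```

For instance, `preset_resolution("4:3")` gives 1365x1024 and `preset_resolution("9:16")` gives 1024x1820, matching the list.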
|
||||
|
||||
## Best Use Cases
|
||||
- Logo design and branding
|
||||
- Icon sets and UI elements
|
||||
- Technical illustrations
|
||||
- Infographics and diagrams
|
||||
- Print-ready graphics
|
||||
- Web illustrations
|
||||
|
||||
## Example Prompts
|
||||
1. "A minimalist logo of a mountain with sunrise, flat design, vector style"
|
||||
2. "Set of weather icons in outlined style, simple and clean"
|
||||
3. "Abstract geometric pattern with circles and triangles, modern art style"
|
||||
|
||||
## Tips for Best Results
|
||||
- Use terms like "vector", "flat design", "minimalist"
|
||||
- Specify simple, clean compositions
|
||||
- Avoid requesting photorealistic details
|
||||
- Think in terms of shapes and paths
|
||||
- Request "icon style" or "logo style" for best results
|
||||
|
||||
## Strengths

- The only supported model that generates true vector graphics
- Perfect for scalable designs
- Clean, professional output
- Ideal for commercial design work

## Limitations

- Cannot generate photorealistic images
- Limited to vector-appropriate styles
- No support for negative prompts
- Works best for designs of simple to moderate complexity

## Output Format

- SVG (Scalable Vector Graphics)
- Can be edited in Adobe Illustrator, Inkscape, etc.
- Web-ready and print-ready
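
Because the output is SVG, a downstream step can check that a result really is vector data before saving or editing it. A minimal sketch using only the Python standard library follows; the helper name `looks_like_svg` is hypothetical, and the check only confirms that the root element is `<svg>`, not that the drawing is valid.

```python
import xml.etree.ElementTree as ET

SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def looks_like_svg(data: str) -> bool:
    """Return True if `data` parses as XML with an <svg> root element."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    # ElementTree writes namespaced tags as "{namespace}name".
    return root.tag in ("svg", f"{{{SVG_NAMESPACE}}}svg")
```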

## Cost

Estimated at $0.05 per generation

**File:** `apps/picture/docs/models/seedream-3.md` (new file, 69 lines)

# ByteDance SeeDream 3

## Overview

SeeDream 3 is ByteDance's advanced image generation model known for its creative capabilities and artistic flexibility. It excels at producing diverse styles ranging from photorealistic to highly stylized artwork.

## Model Details

- **Provider**: ByteDance
- **Replicate ID**: `bytedance/seedream-3`
- **Version**: `3c96fbed56fa0e9c6c06bb014f8be529821f5ea8e37e887fb20d3fb2fe10e1e8`

## Key Features

- **Creative Versatility**: Excellent at both realistic and artistic styles
- **Style Mixing**: Can blend multiple artistic styles effectively
- **Detail Richness**: Produces images with intricate details
- **Cultural Diversity**: Strong understanding of diverse cultural contexts

## Default Parameters

- **Resolution**: 1024x1024
- **Steps**: 30
- **Guidance Scale**: 7.5
- **Supports Negative Prompts**: Yes
- **Supports Seed**: Yes

## Supported Aspect Ratios

**9 aspect ratios including custom**:
- **Square**: 1:1
- **Landscape**: 4:3, 3:2, 16:9, 21:9
- **Portrait**: 3:4, 2:3, 9:16
- **Custom**: Any ratio within resolution limits

## Supported Resolutions

- **Minimum**: 512x512
- **Maximum**: 2048x2048
- **Size Presets**:
  - Big: Longest dimension 2048px
  - Regular: 1 megapixel (balanced)
  - Small: Shortest dimension 512px
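
The three size presets can be turned into pixel dimensions for a given aspect ratio. The sketch below is illustrative only: the function name `preset_size`, the lowercase preset labels, the reading of "1 megapixel" as 1024x1024 pixels, and rounding to the nearest pixel are all assumptions, and the model may round differently. It also does not clamp extreme ratios to the 512 to 2048 limits.

```python
import math
from typing import Tuple

MIN_SIDE, MAX_SIDE = 512, 2048  # documented resolution limits
ONE_MEGAPIXEL = 1024 * 1024     # assumption: "1 megapixel" means 1024x1024 pixels


def preset_size(preset: str, ratio_w: int, ratio_h: int) -> Tuple[int, int]:
    """Return (width, height) for a size preset at aspect ratio ratio_w:ratio_h."""
    aspect = ratio_w / ratio_h
    stretch = max(aspect, 1 / aspect)  # long side divided by short side, >= 1
    if preset == "big":
        # Longest dimension pinned to 2048px.
        long_side = float(MAX_SIDE)
        short_side = long_side / stretch
    elif preset == "small":
        # Shortest dimension pinned to 512px.
        short_side = float(MIN_SIDE)
        long_side = short_side * stretch
    elif preset == "regular":
        # Solve long * short = ONE_MEGAPIXEL with long / short = stretch.
        long_side = math.sqrt(ONE_MEGAPIXEL * stretch)
        short_side = long_side / stretch
    else:
        raise ValueError(f"unknown preset: {preset!r}")
    long_px, short_px = round(long_side), round(short_side)
    return (long_px, short_px) if aspect >= 1 else (short_px, long_px)
```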

## Best Use Cases

- Digital artwork and illustrations
- Character design and concept art
- Fantasy and sci-fi scenes
- Cultural and traditional art styles
- Creative advertising visuals

## Example Prompts

1. "A cyberpunk street market in Tokyo at night, neon lights reflecting on wet pavement"
2. "Traditional Chinese ink painting of mountains with modern city skyline in background"
3. "A whimsical illustration of a tea party in an enchanted forest, Studio Ghibli style"

## Tips for Best Results

- Experiment with style combinations (e.g., "watercolor and digital art")
- Include atmospheric descriptions for mood
- Specify color palettes for consistent aesthetics
- Use cultural references for authentic representations
- Combine realistic elements with fantastical concepts

## Strengths

- Excellent style transfer and mixing
- Strong at creating atmospheric scenes
- Good understanding of artistic movements
- Handles complex compositions well

## Limitations

- Generation time slightly longer than some alternatives (12 seconds)
- May need refinement for ultra-photorealistic results

## Cost

Estimated at $0.025 per generation

**File:** `apps/picture/docs/models/seedream-4.md` (new file, 85 lines)

# ByteDance SeeDream 4

## Overview

SeeDream 4 is ByteDance's latest generation image model featuring unified text-to-image generation and precise single-sentence editing capabilities. It offers significant improvements over SeeDream 3 with higher resolution support and more flexible workflows.

## Model Details

- **Provider**: ByteDance
- **Replicate ID**: `bytedance/seedream-4`
- **Version**: `054cd8c667f535616fd66710ce20c8949bf64ac3d9a3459e338f026424be8bec`

## Key Features

- **Unified Architecture**: Single model for both generation and editing
- **Ultra High Resolution**: Up to 4K (4096x4096) output
- **Multi-Reference Support**: Use up to 10 reference images
- **Batch Generation**: Generate up to 15 images in one request
- **Precise Editing**: Natural language prompt-based editing
- **Consistent Characters**: Maintains character consistency across multiple outputs

## Default Parameters

- **Resolution**: 2048x2048 (2K preset)
- **Steps**: 50 (automatic based on size preset)
- **Guidance Scale**: 7.5 (automatic)
- **Supports Negative Prompts**: No
- **Supports Seed**: No
- **Supports Image-to-Image**: Yes (via image_input array)
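
Since SeeDream 3 and SeeDream 4 differ in which options they accept, a caller may want to check a request before sending it. The sketch below encodes only the negative-prompt and seed flags from the two "Default Parameters" lists; the option keys `negative_prompt` and `seed` and the helper `unsupported_options` are hypothetical names, not the confirmed input schema of either model.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Capabilities:
    negative_prompt: bool
    seed: bool


# Flags copied from the "Default Parameters" lists; keys are the Replicate IDs.
CAPABILITIES: Dict[str, Capabilities] = {
    "bytedance/seedream-3": Capabilities(negative_prompt=True, seed=True),
    "bytedance/seedream-4": Capabilities(negative_prompt=False, seed=False),
}


def unsupported_options(model_id: str, options: Dict[str, Any]) -> List[str]:
    """Return the option names in `options` that `model_id` does not support.

    The keys "negative_prompt" and "seed" are assumed names for illustration.
    """
    caps = CAPABILITIES[model_id]
    rejected: List[str] = []
    if "negative_prompt" in options and not caps.negative_prompt:
        rejected.append("negative_prompt")
    if "seed" in options and not caps.seed:
        rejected.append("seed")
    return rejected
```

A caller could either strip the returned keys or surface them to the user as a warning.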

## Supported Aspect Ratios

**8 fixed ratios**:
- **Square**: 1:1
- **Landscape**: 4:3, 16:9, 3:2, 21:9
- **Portrait**: 3:4, 9:16, 2:3

Additionally supports "match_input_image" when using reference images.
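
When the desired dimensions do not match one of the eight fixed ratios, one option is to snap to the closest one. This is a minimal sketch, assuming that "closest" is measured on a logarithmic scale so that landscape and portrait deviations are treated symmetrically; the function name `nearest_fixed_ratio` is hypothetical.

```python
import math

# The eight fixed ratios listed above.
FIXED_RATIOS = ["1:1", "4:3", "16:9", "3:2", "21:9", "3:4", "9:16", "2:3"]


def nearest_fixed_ratio(width: int, height: int) -> str:
    """Return the fixed ratio label closest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = (int(part) for part in label.split(":"))
        return abs(math.log(w / h) - target)

    return min(FIXED_RATIOS, key=distance)
```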

## Size Presets

- **1K**: Best for quick previews (1024-2047px)
- **2K**: Balanced quality and speed (2048-3071px) - Default
- **4K**: Maximum quality (4096px)
- **Custom**: Specify exact width/height (1024-4096px range)
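
The preset ranges above can be read as a lookup from a target size to a preset name. The sketch below assumes the pixel ranges refer to the longest side of the image, which the list does not state explicitly. Sizes from 3072px to 4095px are not covered by a named preset, so this hypothetical helper falls back to "Custom" for them.

```python
def choose_size_preset(longest_side: int) -> str:
    """Map a longest-side length in pixels to a size preset name.

    Assumption: the documented ranges describe the longest side.
    """
    if not 1024 <= longest_side <= 4096:
        raise ValueError("longest side must be between 1024 and 4096 pixels")
    if longest_side <= 2047:
        return "1K"
    if longest_side <= 3071:
        return "2K"
    if longest_side == 4096:
        return "4K"
    # 3072-4095px has no named preset in the list above.
    return "Custom"
```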

## Best Use Cases

- Character consistency across multiple scenes
- High-resolution commercial imagery
- Image editing with natural language prompts
- Multi-view generation from single prompt
- Reference-based generation
- Batch creation of variations

## Example Prompts

1. "A professional portrait of a woman in business attire, modern office background, natural lighting"
2. "A selection of photos of this character [reference] exploring a bookshop called 'SeeDream 4'"
3. "Multiple views of a futuristic car design, different angles and lighting"

## Tips for Best Results

- Use the 2K preset for optimal balance of quality and speed
- Leverage multi-reference input for character consistency
- Use natural language for precise editing instructions
- Request multiple outputs for variations in one go
- Specify detailed scene descriptions for better results
- For ultra-high quality, use 4K preset

## Strengths

- Exceptional high-resolution output (up to 4K)
- Unified generation and editing workflow
- Excellent character consistency
- Multi-reference and batch capabilities
- Fast inference relative to its output quality
- Natural language editing

## Limitations

- No manual seed control
- No negative prompt support
- Fixed aspect ratios only (no completely custom ratios)
- Slightly higher cost than SeeDream 3 ($0.03 vs $0.025)

## Cost

$0.03 per generation (regardless of resolution)

## Migration from SeeDream 3

If you're upgrading from SeeDream 3:
- Resolution limits increased: 2048x2048 → 4096x4096
- New parameter structure (size presets instead of raw dimensions)
- Removed: seed support, negative prompts
- Added: image_input array, multi-image generation, higher resolutions
- Slightly higher cost but significantly more features
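
The migration points above can be sketched as a small conversion step. Everything about the input shape here is an assumption for illustration: the keys `width`, `height`, `seed`, `negative_prompt`, `size`, and `aspect_ratio` are hypothetical names, and only the removed features and the move from raw dimensions to size presets come from the list above.

```python
from fractions import Fraction
from typing import Any, Dict, List, Optional, Tuple

REMOVED_IN_V4 = ("seed", "negative_prompt")
FIXED_RATIOS = ("1:1", "4:3", "16:9", "3:2", "21:9", "3:4", "9:16", "2:3")


def _ratio_label(width: int, height: int) -> Optional[str]:
    """Return the fixed ratio that exactly matches width:height, if any."""
    target = Fraction(width, height)
    for label in FIXED_RATIOS:
        w, h = (int(part) for part in label.split(":"))
        if Fraction(w, h) == target:
            return label
    return None


def migrate_seedream3_input(old: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (new_input, dropped_keys) for a SeeDream 4 style request."""
    skipped = REMOVED_IN_V4 + ("width", "height")
    new_input = {k: v for k, v in old.items() if k not in skipped}
    dropped = [k for k in REMOVED_IN_V4 if k in old]

    width, height = old.get("width", 1024), old.get("height", 1024)
    # SeeDream 3 tops out at 2048px, so only the 1K and 2K presets can apply.
    new_input["size"] = "2K" if max(width, height) >= 2048 else "1K"

    label = _ratio_label(width, height)
    if label is not None:
        new_input["aspect_ratio"] = label
    else:
        # No exact fixed ratio: report the dimensions as dropped so the
        # caller can decide how to handle them.
        dropped += [k for k in ("width", "height") if k in old]
    return new_input, dropped
```

Returning the dropped keys instead of silently discarding them lets the caller warn users whose workflows relied on seeds or negative prompts.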