# Image Description AI - System Architecture

## High-Level System Design

The Image Description AI is a client-side single-page application (SPA) that lets users upload an image and receive an AI-generated description from the Minimax API. It follows a responsive web application pattern that separates UI components, application state, processing logic, and the external API call into distinct layers.

```
┌─────────────────────────────────────────────────────────────┐
│                   Client-Side Application                   │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Upload    │  │   Preview   │  │   AI Description    │  │
│  │  Component  │  │  Component  │  │       Display       │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│                      Application State                      │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │    File     │  │    Image    │  │    API Response     │  │
│  │ Validation  │  │ Processing  │  │       Handler       │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│                         Minimax API                         │
│      https://api.minimax.io/v1/text/chatcompletion_v2       │
└─────────────────────────────────────────────────────────────┘
```

## Technology Choices and Rationale

### Core Technologies
- **HTML5**: Semantic markup for accessibility and modern web standards
- **CSS3**: Modern styling with Flexbox/Grid for responsive layouts
- **Vanilla JavaScript**: Lightweight, no framework overhead, fast loading
- **File API**: Native browser API for file handling and validation

### UI Framework Decision: Vanilla JavaScript vs React
**Chosen: Vanilla JavaScript**
- **Rationale**:
  - Single HTML file requirement simplifies deployment
  - Minimal bundle size improves load times
  - No build process needed
  - Direct DOM manipulation gives precise control
  - Sufficient for the application's complexity level

### Styling Approach
- **CSS Grid & Flexbox**: Modern, flexible layouts
- **CSS Custom Properties**: Maintainable theming
- **Mobile-First Responsive Design**: Works across all device sizes
- **CSS Animations**: Smooth transitions and loading states

## Database Schema

**Not Applicable**: This is a client-side-only application with no persistent data storage. All processing is transient and happens in memory.

## API Endpoints

### External API Integration

**Endpoint**: `https://api.minimax.io/v1/text/chatcompletion_v2`
- **Method**: POST
- **Authentication**: Bearer token (API key)
- **Content-Type**: application/json

**Request Structure**:
```json
{
  "model": "MiniMax-M2",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Please provide a detailed description of this image in English"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "..."
          }
        }
      ]
    }
  ],
  "max_tokens": 500,
  "temperature": 0.7
}
```
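
The request above could be assembled in the browser roughly as follows. This is a minimal sketch, not the application's actual code: `buildDescriptionRequest` is a hypothetical helper name, and the value passed as `imageUrl` is left to the caller because the document elides what goes in `image_url.url` (a Base64 data URL is one plausible choice, given the upload flow).

```javascript
// Endpoint as given in this document.
const MINIMAX_ENDPOINT = 'https://api.minimax.io/v1/text/chatcompletion_v2';

// Hypothetical helper: returns the arguments for fetch(url, options).
// `imageUrl` is whatever belongs in image_url.url (assumed: a data URL).
function buildDescriptionRequest(imageUrl, apiKey) {
  const payload = {
    model: 'MiniMax-M2',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'Please provide a detailed description of this image in English'
          },
          { type: 'image_url', image_url: { url: imageUrl } }
        ]
      }
    ],
    max_tokens: 500,
    temperature: 0.7
  };

  return {
    url: MINIMAX_ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(payload)
    }
  };
}
```

A caller would destructure the result and pass it to `fetch(url, options)`. Keeping the builder free of network calls makes the payload easy to inspect and test.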

**Response Structure**:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "MiniMax-M2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This image shows a beautiful sunset over..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 32,
    "total_tokens": 47
  }
}
```
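
Given a response of this shape, the description text sits at `choices[0].message.content`. The sketch below shows one defensive way to read it; `extractDescription` is a hypothetical name, and the error message is illustrative.

```javascript
// Hypothetical helper: pulls the description out of a parsed JSON response.
// Throws if the response lacks choices[0].message.content as a non-empty string.
function extractDescription(response) {
  const choice =
    response && Array.isArray(response.choices) ? response.choices[0] : undefined;
  const content = choice && choice.message ? choice.message.content : undefined;

  if (typeof content !== 'string' || content.trim() === '') {
    throw new Error('API response did not contain a description');
  }
  return content.trim();
}
```

Throwing on an unexpected shape lets the caller route the failure through the same error handling as network and API errors.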

## File Structure

```
image-description-ai/
├── index.html              # Main application file (self-contained)
├── assets/
│   ├── styles/
│   │   └── main.css        # (Optional separate CSS file)
│   └── scripts/
│       └── main.js         # (Optional separate JS file)
├── README.md               # Project documentation
└── docs/
    ├── ARCHITECTURE.md     # This file
    └── TASKS.md            # Implementation tasks
```

### Single File Implementation
For production deployment as specified, the complete application lives in one HTML file:
- `index.html` contains all HTML, CSS, and JavaScript inline
- No external dependencies or build process are required
- The file can be served or opened in a browser as-is

## Component Interactions

### 1. File Upload Flow
```
User Input  →  File Selection  →  Validation       →  Preview Display
    ↓               ↓                 ↓                    ↓
Drag&Drop   →  File API        →  Size/Type Check  →  Image Preview
Click       →  Reader          →  Convert to       →  Update UI
            →  Base64          →  Base64           →  State Update
```
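
The validation and Base64 conversion steps in this flow might look like the sketch below. The helper names are hypothetical, the allow-list of MIME types is an assumption (the document does not name the accepted formats), and the 5MB limit mentioned elsewhere in this document is interpreted here as 5 × 1024 × 1024 bytes.

```javascript
const MAX_FILE_SIZE = 5 * 1024 * 1024; // 5MB limit, read as binary megabytes (assumption)
// Assumed allow-list; adjust to the formats the application actually accepts.
const ALLOWED_TYPES = ['image/jpeg', 'image/png', 'image/webp', 'image/gif'];

// Accepts a File (or any object with `type` and `size`) and returns a result object.
function validateImageFile(file) {
  if (!file) {
    return { valid: false, error: 'No file selected.' };
  }
  if (!ALLOWED_TYPES.includes(file.type)) {
    return { valid: false, error: `Unsupported file type: ${file.type || 'unknown'}.` };
  }
  if (file.size > MAX_FILE_SIZE) {
    return { valid: false, error: 'File is larger than 5MB.' };
  }
  return { valid: true, error: null };
}

// Browser-only: reads a File as a Base64 data URL using the FileReader API.
function readAsDataUrl(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}
```

Validating before reading avoids converting a file that would be rejected anyway. The resulting data URL can serve as both the preview source and the image payload.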

### 2. AI Processing Flow
```
Generate Click  →  API Request  →  Loading State  →  Response Handling
      ↓                ↓                ↓                   ↓
Validate        →  Construct    →  Show Spinner   →  Display Result
Image           →  Payload      →  Disable UI     →  Handle Errors
      ↓                ↓                ↓                   ↓
State Check     →  Send POST    →  Timeout        →  Success/Failure
                →  Minimax API  →  Management     →  UI Update
```
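
The timeout management step can be sketched with the standard `AbortController` API. `fetchWithTimeout` is a hypothetical helper name, and the optional `fetchFn` parameter is an illustrative addition that allows a stand-in for `fetch` during testing.

```javascript
// Hypothetical helper: runs a fetch-like call and aborts it after `timeoutMs`.
// `fetchFn` defaults to the global fetch; pass a stub for testing.
function fetchWithTimeout(url, options, timeoutMs, fetchFn) {
  const doFetch = fetchFn || fetch;
  const controller = new AbortController();

  // When the timer fires, the pending request rejects with an AbortError.
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  return doFetch(url, { ...options, signal: controller.signal })
    // Clear the timer whether the request succeeded or failed.
    .finally(() => clearTimeout(timer));
}
```

In the flow above, the caller would set the loading state before calling this helper and restore the UI in its own `finally` handler, so the spinner is removed on success, failure, and timeout alike.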

### 3. Error Handling Flow
```
Any Error   →  Error Handler  →  User Notification  →  Recovery Option
    ↓               ↓                  ↓                    ↓
Network     →  Categorize     →  Clear Message      →  Reset/Retry
API         →  Error Type     →  Visual Alert       →  State Reset
File        →  Log Details    →  Action Required    →  User Guidance
Validation  →  Store State    →  UX Feedback        →  Continue Flow
```
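
The categorization step could be sketched as a single function that maps any error to one of the four categories above plus a recovery hint. This is illustrative only: `categorizeError` is a hypothetical name, and the `status` and `code` fields are assumptions about how the application would tag its own errors.

```javascript
// Hypothetical categorizer: maps an error to a category, a user-facing
// message, and a suggested recovery action ('retry' or 'reset').
function categorizeError(error) {
  // An aborted fetch (for example, after a timeout) rejects with an AbortError.
  if (error && error.name === 'AbortError') {
    return { category: 'network', message: 'The request timed out. Please try again.', recovery: 'retry' };
  }
  // fetch() rejects with a TypeError when the network request itself fails.
  if (error instanceof TypeError) {
    return { category: 'network', message: 'Network error. Check your connection and retry.', recovery: 'retry' };
  }
  // Assumed convention: API failures carry the HTTP status code.
  if (error && typeof error.status === 'number') {
    return { category: 'api', message: `The API returned an error (HTTP ${error.status}).`, recovery: 'retry' };
  }
  // Assumed convention: application errors carry a string `code`.
  if (error && error.code === 'FILE_READ') {
    return { category: 'file', message: 'The file could not be read. Please choose another image.', recovery: 'reset' };
  }
  if (error && error.code === 'VALIDATION') {
    return { category: 'validation', message: error.message, recovery: 'reset' };
  }
  return { category: 'unknown', message: 'Something went wrong. Please try again.', recovery: 'reset' };
}
```

Because a `TypeError` can also come from a programming mistake, a real implementation would likely log the original error alongside the category.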

## Data Flow Architecture

### Application State Management
```javascript
// Initial application state. Trailing comments give each field's expected type.
const appState = {
  currentImage: {
    file: null,          // File | null
    base64: null,        // string | null
    preview: null,       // string | null
    metadata: {
      name: '',          // string
      size: 0,           // number
      type: ''           // string
    }
  },
  apiStatus: {
    isProcessing: false, // boolean
    lastError: null,     // string | null
    requestId: null      // string | null
  },
  ui: {
    dragOver: false,     // boolean
    showPreview: false,  // boolean
    showResults: false   // boolean
  }
};
```
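
One way to manage updates to a state object shaped like the one above is a small store that merges changes one section at a time and notifies listeners. The sketch below is an assumption about how this could be done, not the application's actual mechanism; `createStore`, `update`, and `subscribe` are hypothetical names.

```javascript
// Hypothetical minimal store: holds state, shallow-merges updates into one
// top-level section at a time, and notifies subscribers after each change.
function createStore(initialState) {
  let state = initialState;
  const listeners = new Set();

  return {
    getState: () => state,

    // Example: update('ui', { dragOver: true })
    update(section, patch) {
      state = { ...state, [section]: { ...state[section], ...patch } };
      listeners.forEach((listener) => listener(state));
    },

    // Registers a listener and returns a function that removes it.
    subscribe(listener) {
      listeners.add(listener);
      return () => listeners.delete(listener);
    }
  };
}
```

A render function subscribed to the store can then redraw the preview, spinner, and results from the current state instead of each event handler touching the DOM directly.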

### Event-Driven Architecture
- **File Input Events**: Handle drag-and-drop, click-to-browse, and file selection
- **Validation Events**: File size, type, and format checking
- **API Events**: Request initiation, response handling, error management
- **UI Events**: Loading states, animations, user feedback

## Security Considerations

### Client-Side Security
- **Input Validation**: Strict file type and size checking
- **XSS Prevention**: Sanitized content display
- **API Key Management**: The API key is exposed in client-side code; production deployments should route requests through a server-side proxy
- **HTTPS Only**: Secure transmission to the Minimax API
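
For the XSS prevention point, the API's description text should never be inserted as raw HTML. The sketch below shows two common approaches; `escapeHtml` and `showDescription` are hypothetical helper names.

```javascript
// Escapes the five characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  const replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
  };
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// Browser alternative: assigning to textContent never parses the value as markup.
function showDescription(element, description) {
  element.textContent = description;
}
```

Where the description is shown as plain text, `textContent` is the simpler option. Escaping is only needed when the text must be embedded in an HTML string.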

### Production Recommendations
1. **Server-Side API Proxy**: Move API calls to backend to hide API keys
2. **Rate Limiting**: Prevent API abuse
3. **File Scanning**: Server-side malware detection
4. **Content Security Policy**: Additional XSS protection

## Performance Optimizations

### Client-Side Optimizations
- **Lazy Loading**: Load UI components on demand
- **Debounced Validation**: Reduce unnecessary processing
- **Memory Management**: Release Base64 strings once they are no longer needed
- **Progressive Enhancement**: Start from a basic file input and layer on drag-and-drop and animations; the core upload and API flow itself requires JavaScript

### API Optimizations
- **Request Compression**: Minimize payload size
- **Timeout Management**: Prevent hanging requests
- **Retry Logic**: Handle transient network failures
- **Caching**: Avoid duplicate API calls for the same image
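
The retry logic could be implemented with exponential backoff, as in the sketch below. The names `backoffDelay` and `withRetry`, the default delays, and the retry count are all assumptions chosen for illustration. The optional `sleep` parameter exists only so the wait can be replaced in tests.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: calls `task` until it resolves, `retries` retries have
// been used, or `shouldRetry` reports that the error is not transient.
async function withRetry(task, { retries = 3, shouldRetry = () => true, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= retries || !shouldRetry(error)) {
        throw error;
      }
      await wait(backoffDelay(attempt));
    }
  }
}
```

Passing a `shouldRetry` predicate keeps non-transient failures, such as an invalid API key or a rejected file, from being retried.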

## Scalability Considerations

### Current Architecture Limits
- **Client-Only Processing**: Limited by user's device capabilities
- **File Size Constraints**: 5MB limit for practical performance
- **API Rate Limits**: Dependent on Minimax service limits

### Future Enhancements
- **Backend Integration**: Server-side processing and API management
- **Batch Processing**: Multiple image handling
- **User Accounts**: Save and manage image descriptions
- **Advanced Features**: Multiple language support, custom prompts

## Browser Compatibility

### Supported Features
- **File API**: IE10+ and all modern browsers
- **Base64 Encoding**: Universal browser support
- **CSS Grid/Flexbox**: All modern browsers; IE11 supports Flexbox with known bugs and only an older, prefixed version of Grid
- **Fetch API**: All modern browsers; not available in Internet Explorer, where a polyfill is required

### Fallback Strategies
- **Older Browsers**: Graceful degradation with polyfills
- **No JavaScript**: The application cannot run, since all logic is client-side; show a `<noscript>` notice (a basic form submission fallback would require a backend)
- **Network Issues**: Offline mode with queued requests

## Deployment Architecture

### Static File Deployment
- **CDN Ready**: Single HTML file suitable for any CDN
- **Zero Dependencies**: No npm packages or build process
- **Instant Deployment**: Upload and serve immediately
- **Version Control**: Simple Git-based version management

### Environment Configuration
- **Development**: Direct API calls with test keys
- **Staging**: Mirror production with environment-specific settings
- **Production**: Server-side API proxy recommended for security