commit cf33cc08c18cfd2ffcac84af03b29a5381c905ee
Author: AI Dev Factory
Date:   Sat Dec 6 00:03:36 2025 +0000

    Initial commit - MVP project setup

    Created by AI Dev Factory init-mvp-project.sh

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..3dd19a0
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,31 @@
+# Dependencies
+node_modules/
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.Python
+venv/
+.venv/
+env/
+.env
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Build
+dist/
+build/
+*.egg-info/
+
+# Test
+.coverage
+.pytest_cache/
+*.log
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
new file mode 100644
index 0000000..d19bbda
--- /dev/null
+++ b/ARCHITECTURE.md
@@ -0,0 +1,268 @@
+# Image Description AI - System Architecture
+
+## High-Level System Design
+
+The Image Description AI is a client-side single-page application (SPA) that enables users to upload images and receive AI-generated descriptions using the Minimax API. The architecture follows a modern, responsive web application pattern with clean separation of concerns.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                   Client-Side Application                   │
+├─────────────────────────────────────────────────────────────┤
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
+│  │   Upload    │  │   Preview   │  │   AI Description    │  │
+│  │  Component  │  │  Component  │  │      Display        │  │
+│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
+├─────────────────────────────────────────────────────────────┤
+│                      Application State                      │
+├─────────────────────────────────────────────────────────────┤
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
+│  │    File     │  │    Image    │  │    API Response     │  │
+│  │ Validation  │  │ Processing  │  │      Handler        │  │
+│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
+├─────────────────────────────────────────────────────────────┤
+│                         Minimax API                         │
+│      https://api.minimax.io/v1/text/chatcompletion_v2       │
+└─────────────────────────────────────────────────────────────┘
+```
+
+## Technology Choices and Rationale
+
+### Core Technologies
+- **HTML5**: Semantic markup for accessibility and modern web standards
+- **CSS3**: Modern styling with Flexbox/Grid for responsive layouts
+- **Vanilla JavaScript**: Lightweight, no framework overhead, fast loading
+- **File API**: Native browser API for file handling and validation
+
+### UI Framework Decision: Vanilla JavaScript vs React
+**Chosen: Vanilla JavaScript**
+- **Rationale**:
+  - Single HTML file requirement simplifies deployment
+  - Minimal bundle size improves load times
+  - No build process needed
+  - Direct DOM manipulation gives precise control
+  - Sufficient for the application's complexity level
+
+### Styling Approach
+- **CSS Grid & Flexbox**: Modern, flexible layouts
+- **CSS Custom Properties**: Maintainable theming
+- **Mobile-First Responsive Design**: Works across all device sizes
+- **CSS Animations**: Smooth transitions and loading states
+
+## Database Schema
+
+**Not Applicable**: This is a client-side only application with no persistent
data storage. All processing is transient and happens in memory.
+
+## API Endpoints
+
+### External API Integration
+
+**Endpoint**: `https://api.minimax.io/v1/text/chatcompletion_v2`
+- **Method**: POST
+- **Authentication**: Bearer token (API key)
+- **Content-Type**: application/json
+
+**Request Structure**:
+```json
+{
+  "model": "MiniMax-M2",
+  "messages": [
+    {
+      "role": "user",
+      "content": [
+        {
+          "type": "text",
+          "text": "Please provide a detailed description of this image in English"
+        },
+        {
+          "type": "image_url",
+          "image_url": {
+            "url": "..."
+          }
+        }
+      ]
+    }
+  ],
+  "max_tokens": 500,
+  "temperature": 0.7
+}
+```
+
+**Response Structure**:
+```json
+{
+  "id": "chatcmpl-abc123",
+  "object": "chat.completion",
+  "created": 1677652288,
+  "model": "MiniMax-M2",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "This image shows a beautiful sunset over..."
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 15,
+    "completion_tokens": 32,
+    "total_tokens": 47
+  }
+}
+```
+
+## File Structure
+
+```
+image-description-ai/
+├── index.html              # Main application file (self-contained)
+├── assets/
+│   ├── styles/
+│   │   └── main.css        # (Optional separate CSS file)
+│   └── scripts/
+│       └── main.js         # (Optional separate JS file)
+├── README.md               # Project documentation
+└── docs/
+    ├── ARCHITECTURE.md     # This file
+    └── TASKS.md            # Implementation tasks
+```
+
+### Single File Implementation
+For production deployment as specified, the complete application exists in one HTML file:
+- `index.html` - Contains all HTML, CSS, and JavaScript inline
+- No external dependencies or build process required
+- Immediate browser deployment ready
+
+## Component Interactions
+
+### 1. File Upload Flow
+```
+User Input → File Selection → Validation      → Preview Display
+    ↓              ↓               ↓                  ↓
+Drag&Drop  → File API       → Size/Type Check → Image Preview
+Click      → Reader         → Convert to      → Update UI
+           → Base64         → Base64          → State Update
+```
+
+### 2.
AI Processing Flow
+```
+Generate Click → API Request  → Loading State → Response Handling
+      ↓               ↓              ↓                ↓
+Validate       → Construct    → Show Spinner  → Display Result
+Image          → Payload      → Disable UI    → Handle Errors
+      ↓               ↓              ↓                ↓
+State Check    → Send POST    → Timeout       → Success/Failure
+               → Minimax API  → Management    → UI Update
+```
+
+### 3. Error Handling Flow
+```
+Any Error  → Error Handler → User Notification → Recovery Option
+    ↓              ↓                ↓                   ↓
+Network    → Categorize    → Clear Message     → Reset/Retry
+API        → Error Type    → Visual Alert      → State Reset
+File       → Log Details   → Action Required   → User Guidance
+Validation → Store State   → UX Feedback       → Continue Flow
+```
+
+## Data Flow Architecture
+
+### Application State Management
+```javascript
+appState = {
+  currentImage: {
+    file: File | null,
+    base64: string | null,
+    preview: string | null,
+    metadata: {
+      name: string,
+      size: number,
+      type: string
+    }
+  },
+  apiStatus: {
+    isProcessing: boolean,
+    lastError: string | null,
+    requestId: string | null
+  },
+  ui: {
+    dragOver: boolean,
+    showPreview: boolean,
+    showResults: boolean
+  }
+}
+```
+
+### Event-Driven Architecture
+- **File Input Events**: Handle drag&drop, click-to-browse, file selection
+- **Validation Events**: File size, type, and format checking
+- **API Events**: Request initiation, response handling, error management
+- **UI Events**: Loading states, animations, user feedback
+
+## Security Considerations
+
+### Client-Side Security
+- **Input Validation**: Strict file type and size checking
+- **XSS Prevention**: Sanitized content display
+- **API Key Management**: Client-side exposure (note: production should use server-side proxy)
+- **HTTPS Only**: Secure transmission to Minimax API
+
+### Production Recommendations
+1. **Server-Side API Proxy**: Move API calls to backend to hide API keys
+2. **Rate Limiting**: Prevent API abuse
+3. **File Scanning**: Server-side malware detection
+4.
**Content Security Policy**: Additional XSS protection
+
+## Performance Optimizations
+
+### Client-Side Optimizations
+- **Lazy Loading**: Load UI components on demand
+- **Debounced Validation**: Reduce unnecessary processing
+- **Memory Management**: Clean up base64 strings after use
+- **Progressive Enhancement**: Core functionality works without JavaScript
+
+### API Optimizations
+- **Request Compression**: Minimize payload size
+- **Timeout Management**: Prevent hanging requests
+- **Retry Logic**: Handle transient network failures
+- **Caching**: Avoid duplicate API calls for same images
+
+## Scalability Considerations
+
+### Current Architecture Limits
+- **Client-Only Processing**: Limited by user's device capabilities
+- **File Size Constraints**: 5MB limit for practical performance
+- **API Rate Limits**: Dependent on Minimax service limits
+
+### Future Enhancements
+- **Backend Integration**: Server-side processing and API management
+- **Batch Processing**: Multiple image handling
+- **User Accounts**: Save and manage image descriptions
+- **Advanced Features**: Multiple language support, custom prompts
+
+## Browser Compatibility
+
+### Supported Features
+- **File API**: IE10+ and all modern browsers
+- **Base64 Encoding**: Universal browser support
+- **CSS Grid/Flexbox**: All modern browsers (IE11 supports Flexbox with known bugs and only the older, prefixed Grid syntax)
+- **Fetch API**: All modern browsers (not available in IE11; polyfill required)
+
+### Fallback Strategies
+- **Older Browsers**: Graceful degradation with polyfills
+- **No JavaScript**: Basic form submission (limited functionality)
+- **Network Issues**: Offline mode with queued requests
+
+## Deployment Architecture
+
+### Static File Deployment
+- **CDN Ready**: Single HTML file suitable for any CDN
+- **Zero Dependencies**: No npm packages or build process
+- **Instant Deployment**: Upload and serve immediately
+- **Version Control**: Simple Git-based version management
+
+### Environment Configuration
+- **Development**: Direct API calls with
test keys
+- **Staging**: Mirror production with environment-specific settings
+- **Production**: Server-side API proxy recommended for security
\ No newline at end of file
diff --git a/PROJECT_SPEC.md b/PROJECT_SPEC.md
new file mode 100644
index 0000000..ca2150b
--- /dev/null
+++ b/PROJECT_SPEC.md
@@ -0,0 +1,43 @@
+Create a clean, modern web application that allows users to upload an image and get an AI-generated description using the Minimax API.
+
+**Requirements:**
+
+1. **Frontend Interface:**
+   - Single page application with a clean, centered layout
+   - File upload area (drag-and-drop support + click to browse)
+   - Image preview after upload
+   - "Generate Description" button
+   - Loading state while processing
+   - Display area for the AI-generated description
+   - Option to upload a new image after getting results
+
+2. **Technical Implementation:**
+   - Use vanilla JavaScript, HTML, and CSS (or React if you prefer)
+   - Handle image file validation (accept common formats: jpg, png, webp)
+   - Convert uploaded image to base64 for API submission
+   - Make POST request to Minimax API endpoint
+   - Handle API responses and errors gracefully
+   - Display clear error messages if something goes wrong
+
+3. **Minimax API Integration:**
+   - Endpoint: `https://api.minimax.io/v1/text/chatcompletion_v2`
+   - Use model: `MiniMax-M2`
+   - Send image as base64 in the messages array
+   - Prompt: "Please provide a detailed description of this image in English"
+   - Handle API key securely (note: for production, this should be handled server-side)
+
+4. **UI/UX Details:**
+   - Responsive design that works on mobile and desktop
+   - Professional color scheme (suggest modern blues/grays)
+   - Smooth transitions and loading animations
+   - Clear visual feedback for all user actions
+
+5.
**Error Handling:**
+   - File size validation (max 5MB recommended)
+   - File type validation
+   - API error handling with user-friendly messages
+   - Network error handling
+
+Please create a complete, production-ready single HTML file with inline CSS and JavaScript that I can immediately use in a browser.
+
+
diff --git a/TASKS.md b/TASKS.md
new file mode 100644
index 0000000..70022ad
--- /dev/null
+++ b/TASKS.md
@@ -0,0 +1,174 @@
+# Image Description AI - Implementation Tasks
+
+## Task 1: Create Basic HTML Structure and Styling Foundation
+**Objective**: Establish the foundational HTML structure with semantic markup and responsive CSS framework
+**Deliverables**:
+- Complete HTML skeleton with proper DOCTYPE and meta tags
+- Responsive CSS Grid layout system for the main application container
+- Modern color scheme implementation (blues/grays as specified)
+- Typography system with readable fonts and proper spacing
+- Mobile-first responsive breakpoints
+
+**Acceptance Criteria**:
+- [ ] HTML validates as HTML5 standard
+- [ ] Layout is responsive across mobile (320px+), tablet (768px+), and desktop (1024px+)
+- [ ] Color scheme uses professional blue/gray palette with proper contrast ratios
+- [ ] Typography is legible across all device sizes
+- [ ] Basic layout structure includes: header, main upload area, preview section, results section
+- [ ] CSS is modular with clear section organization (layout, components, utilities)
+
+## Task 2: Implement File Upload Interface with Drag & Drop
+**Objective**: Create an intuitive file upload interface supporting both drag-and-drop and click-to-browse functionality
+**Deliverables**:
+- Drag-and-drop zone with visual feedback states (drag over, drop, default)
+- Hidden file input element for click-to-browse functionality
+- Visual upload area with icon, instructional text, and file format specifications
+- File type icon display for different image formats
+- Hover and focus states for accessibility
+
+**Acceptance
Criteria**:
+- [ ] Drag-and-drop zone visually responds to drag events with proper styling
+- [ ] Click-to-browse opens file dialog and triggers file selection
+- [ ] Upload area shows clear instructions: "Drag an image here or click to browse"
+- [ ] Supported formats are displayed: JPG, PNG, WebP
+- [ ] Accessibility: Keyboard navigation and screen reader support
+- [ ] Visual feedback on hover/focus with smooth transitions
+
+## Task 3: Add File Validation and Error Handling System
+**Objective**: Implement comprehensive file validation to ensure only valid images are processed
+**Deliverables**:
+- File type validation (accept only jpg, jpeg, png, webp)
+- File size validation (maximum 5MB with user-friendly error messages)
+- Validation feedback system with clear error messages
+- File metadata extraction (name, size, type)
+- Reset functionality to clear errors and start over
+
+**Acceptance Criteria**:
+- [ ] Invalid file types show error: "Please select a valid image file (JPG, PNG, WebP)"
+- [ ] Files over 5MB show error: "File size must be less than 5MB"
+- [ ] Error messages display in red text below upload area
+- [ ] Successful validation clears previous errors
+- [ ] Validation occurs immediately upon file selection
+- [ ] Files with valid extensions but invalid content are caught
+- [ ] Reset button clears all errors and upload area state
+
+## Task 4: Implement Image Preview Functionality
+**Objective**: Display uploaded image with proper sizing and formatting for user confirmation
+**Deliverables**:
+- Image preview container with proper aspect ratio handling
+- Image resizing and optimization for preview (max 400px width/height)
+- Base64 encoding of the selected image for API submission
+- Metadata display (filename, file size, dimensions)
+- Replace/change image functionality
+
+**Acceptance Criteria**:
+- [ ] Image preview displays within 2 seconds of file selection
+- [ ] Preview maintains aspect ratio without distortion
+- [ ] Large images
are scaled appropriately for preview
+- [ ] Base64 encoding completes successfully and is stored in memory
+- [ ] Image metadata is extracted and displayed (filename, size in MB, dimensions)
+- [ ] "Change Image" button allows uploading a different file
+- [ ] Preview clears when starting over
+
+## Task 5: Integrate Minimax API for Image Description Generation
+**Objective**: Connect to Minimax API and implement the core AI description generation functionality
+**Deliverables**:
+- API request construction with proper payload format
+- Base64 image embedding in the request body
+- Proper error handling for network issues and API responses
+- Response parsing and extraction of the AI-generated description
+- API key management (client-side with security notes)
+
+**Acceptance Criteria**:
+- [ ] API request uses correct endpoint: `https://api.minimax.io/v1/text/chatcompletion_v2`
+- [ ] Request payload includes model: "MiniMax-M2"
+- [ ] Base64 image is properly formatted in the messages array
+- [ ] Prompt "Please provide a detailed description of this image in English" is included
+- [ ] Successful API response extracts the description text
+- [ ] API errors are handled gracefully with user-friendly messages
+- [ ] Network timeouts are handled with appropriate error messaging
+- [ ] Loading state is shown during API calls
+
+## Task 6: Create Loading States and User Feedback System
+**Objective**: Implement comprehensive loading and feedback mechanisms to enhance user experience
+**Deliverables**:
+- Loading spinner and progress indicator during API calls
+- Status messages for different processing stages
+- Button state management (disabled during processing)
+- Timeout handling with user notification
+- Success and error state animations
+
+**Acceptance Criteria**:
+- [ ] Loading spinner appears immediately when "Generate Description" is clicked
+- [ ] "Generate Description" button is disabled during processing to prevent duplicate requests
+- [ ] Status message
shows: "Analyzing image with AI..."
+- [ ] Processing timeout (30 seconds) shows: "Processing is taking longer than expected"
+- [ ] Success animation plays when description is generated
+- [ ] Error state shows appropriate error message with red styling
+- [ ] Loading states have smooth transitions and professional appearance
+
+## Task 7: Display AI Description Results with Formatting
+**Objective**: Present the AI-generated description in a clean, readable format with additional functionality
+**Deliverables**:
+- Results display area with proper typography and spacing
+- Text formatting and paragraph handling for long descriptions
+- Option to copy description to clipboard
+- "Generate New Description" functionality for the same image
+- "Upload New Image" reset functionality
+- Responsive results layout
+
+**Acceptance Criteria**:
+- [ ] Description displays in a readable format with proper line breaks
+- [ ] Long descriptions are scrollable if they exceed viewport
+- [ ] "Copy to Clipboard" button works and shows confirmation
+- [ ] "Generate New Description" button triggers new API call
+- [ ] "Upload New Image" button clears all data and returns to upload state
+- [ ] Results area is visually distinct from upload area
+- [ ] Typography is large enough to read comfortably on mobile devices
+
+## Task 8: Final Polish, Testing, and Production Readiness
+**Objective**: Complete final testing, optimizations, and prepare for immediate browser deployment
+**Deliverables**:
+- Comprehensive error handling for all edge cases
+- Performance optimization for large images and slow networks
+- Cross-browser compatibility testing
+- Accessibility improvements (ARIA labels, keyboard navigation)
+- Final code organization and documentation
+- Single HTML file consolidation with inline CSS and JavaScript
+
+**Acceptance Criteria**:
+- [ ] Application works in Chrome, Firefox, Safari, and Edge
+- [ ] All functionality works on mobile devices (iOS and Android)
+- [ ] Images
up to 5MB process within 30 seconds on average connections
+- [ ] Clear error messages for all failure scenarios (network, API, validation)
+- [ ] Complete keyboard navigation support
+- [ ] ARIA labels and semantic HTML for screen readers
+- [ ] No console errors or warnings
+- [ ] Single HTML file contains all code and loads immediately in any modern browser
+- [ ] Professional appearance with smooth animations and transitions
+- [ ] File size under 50KB for fast loading
+
+## Implementation Notes
+
+### Task Dependencies
+- Tasks 1-2 can be implemented in parallel
+- Task 3 depends on Task 2 (validation needs upload interface)
+- Task 4 depends on Task 3 (preview needs validation)
+- Task 5 depends on Task 4 (API needs base64 image)
+- Task 6 depends on Task 5 (loading states during API calls)
+- Task 7 depends on Task 6 (results display after processing)
+- Task 8 depends on all previous tasks (final testing and polish)
+
+### Technical Considerations
+- Use vanilla JavaScript ES6+ features for modern browser support
+- Implement CSS custom properties for maintainable theming
+- Follow progressive enhancement principles
+- Maintain separation of concerns within the single file
+- Include comprehensive error boundaries
+
+### Testing Approach
+- Test with various image sizes and formats
+- Test error scenarios (large files, invalid types, network issues)
+- Verify responsive behavior across devices
+- Validate accessibility with screen readers
+- Performance test with slow network connections
\ No newline at end of file
diff --git a/prompt.md b/prompt.md
new file mode 100644
index 0000000..ca2150b
--- /dev/null
+++ b/prompt.md
@@ -0,0 +1,43 @@
+Create a clean, modern web application that allows users to upload an image and get an AI-generated description using the Minimax API.
+
+**Requirements:**
+
+1.
**Frontend Interface:**
+   - Single page application with a clean, centered layout
+   - File upload area (drag-and-drop support + click to browse)
+   - Image preview after upload
+   - "Generate Description" button
+   - Loading state while processing
+   - Display area for the AI-generated description
+   - Option to upload a new image after getting results
+
+2. **Technical Implementation:**
+   - Use vanilla JavaScript, HTML, and CSS (or React if you prefer)
+   - Handle image file validation (accept common formats: jpg, png, webp)
+   - Convert uploaded image to base64 for API submission
+   - Make POST request to Minimax API endpoint
+   - Handle API responses and errors gracefully
+   - Display clear error messages if something goes wrong
+
+3. **Minimax API Integration:**
+   - Endpoint: `https://api.minimax.io/v1/text/chatcompletion_v2`
+   - Use model: `MiniMax-M2`
+   - Send image as base64 in the messages array
+   - Prompt: "Please provide a detailed description of this image in English"
+   - Handle API key securely (note: for production, this should be handled server-side)
+
+4. **UI/UX Details:**
+   - Responsive design that works on mobile and desktop
+   - Professional color scheme (suggest modern blues/grays)
+   - Smooth transitions and loading animations
+   - Clear visual feedback for all user actions
+
+5. **Error Handling:**
+   - File size validation (max 5MB recommended)
+   - File type validation
+   - API error handling with user-friendly messages
+   - Network error handling
+
+Please create a complete, production-ready single HTML file with inline CSS and JavaScript that I can immediately use in a browser.
+
+
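The validation and request-building requirements in this spec could be sketched roughly as follows. This is a minimal illustration, not code from the commit: the helper names `validateImageFile` and `buildDescriptionRequest` and the two constants are hypothetical, the error strings are taken from the TASKS.md acceptance criteria, and the payload shape is copied from the Request Structure in ARCHITECTURE.md. Whether the Minimax endpoint accepts this exact payload is not verified here.

```javascript
// Hypothetical constants; the 5MB limit and formats come from the spec.
const MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024;
const ACCEPTED_MIME_TYPES = ["image/jpeg", "image/png", "image/webp"];

// Returns an error message for an invalid file, or null when it passes.
// Only `type` and `size` are read, so a browser File object works as input.
function validateImageFile(file) {
  if (!file || !ACCEPTED_MIME_TYPES.includes(file.type)) {
    return "Please select a valid image file (JPG, PNG, WebP)";
  }
  if (file.size > MAX_FILE_SIZE_BYTES) {
    return "File size must be less than 5MB";
  }
  return null;
}

// Builds the JSON request body for a base64 data URL of the image.
// Assumption: field names mirror the documented Request Structure.
function buildDescriptionRequest(imageDataUrl) {
  return {
    model: "MiniMax-M2",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Please provide a detailed description of this image in English",
          },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
    max_tokens: 500,
    temperature: 0.7,
  };
}
```

In the browser, the `File` from the upload input would go through `validateImageFile` first, and the string produced by `FileReader.readAsDataURL` would be passed to `buildDescriptionRequest` before the POST. Keeping both helpers free of DOM access makes them easy to test in isolation.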