Initial commit - MVP project setup

Created by AI Dev Factory init-mvp-project.sh
2025-12-06 00:03:36 +00:00 · commit cf33cc08c1
5 changed files with 559 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,31 @@
+# Dependencies
+node_modules/
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.Python
+venv/
+.venv/
+env/
+.env
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Build
+dist/
+build/
+*.egg-info/
+
+# Test
+.coverage
+.pytest_cache/
+*.log
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -0,0 +1,268 @@
+# Image Description AI - System Architecture
+
+## High-Level System Design
+
+The Image Description AI is a client-side single-page application (SPA) that enables users to upload images and receive AI-generated descriptions using the Minimax API. The architecture follows a modern, responsive web application pattern with clean separation of concerns.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                     Client-Side Application                  │
+├─────────────────────────────────────────────────────────────┤
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
+│  │   Upload    │  │   Preview   │  │   AI Description    │  │
+│  │  Component  │  │  Component  │  │     Display         │  │
+│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
+├─────────────────────────────────────────────────────────────┤
+│                   Application State                         │
+├─────────────────────────────────────────────────────────────┤
+│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
+│  │    File     │  │   Image     │  │   API Response      │  │
+│  │ Validation  │  │ Processing  │  │     Handler         │  │
+│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
+├─────────────────────────────────────────────────────────────┤
+│                     Minimax API                             │
+│      https://api.minimax.io/v1/text/chatcompletion_v2       │
+└─────────────────────────────────────────────────────────────┘
+```
+
+## Technology Choices and Rationale
+
+### Core Technologies
+- **HTML5**: Semantic markup for accessibility and modern web standards
+- **CSS3**: Modern styling with Flexbox/Grid for responsive layouts
+- **Vanilla JavaScript**: Lightweight, no framework overhead, fast loading
+- **File API**: Native browser API for file handling and validation
+
+### UI Framework Decision: Vanilla JavaScript vs React
+**Chosen: Vanilla JavaScript**
+- **Rationale**: 
+  - Single HTML file requirement simplifies deployment
+  - Minimal bundle size improves load times
+  - No build process needed
+  - Direct DOM manipulation gives precise control
+  - Sufficient for the application's complexity level
+
+### Styling Approach
+- **CSS Grid & Flexbox**: Modern, flexible layouts
+- **CSS Custom Properties**: Maintainable theming
+- **Mobile-First Responsive Design**: Works across all device sizes
+- **CSS Animations**: Smooth transitions and loading states
+
+## Database Schema
+
+**Not Applicable**: This is a client-side only application with no persistent data storage. All processing is transient and happens in memory.
+
+## API Endpoints
+
+### External API Integration
+
+**Endpoint**: `https://api.minimax.io/v1/text/chatcompletion_v2`
+- **Method**: POST
+- **Authentication**: Bearer token (API key)
+- **Content-Type**: application/json
+
+**Request Structure**:
+```json
+{
+  "model": "MiniMax-M2",
+  "messages": [
+    {
+      "role": "user",
+      "content": [
+        {
+          "type": "text",
+          "text": "Please provide a detailed description of this image in English"
+        },
+        {
+          "type": "image_url",
+          "image_url": {
+            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ..."
+          }
+        }
+      ]
+    }
+  ],
+  "max_tokens": 500,
+  "temperature": 0.7
+}
+```
+
+**Response Structure**:
+```json
+{
+  "id": "chatcmpl-abc123",
+  "object": "chat.completion",
+  "created": 1677652288,
+  "model": "MiniMax-M2",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "This image shows a beautiful sunset over..."
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 15,
+    "completion_tokens": 32,
+    "total_tokens": 47
+  }
+}
+```
+
+## File Structure
+
+```
+image-description-ai/
+├── index.html                 # Main application file (self-contained)
+├── assets/
+│   ├── styles/
+│   │   └── main.css          # (Optional separate CSS file)
+│   └── scripts/
+│       └── main.js           # (Optional separate JS file)
+├── README.md                 # Project documentation
+└── docs/
+    ├── ARCHITECTURE.md       # This file
+    └── TASKS.md              # Implementation tasks
+```
+
+### Single File Implementation
+For production deployment as specified, the complete application exists in one HTML file:
+- `index.html` - Contains all HTML, CSS, and JavaScript inline
+- No external dependencies or build process required
+- Immediate browser deployment ready
+
+## Component Interactions
+
+### 1. File Upload Flow
+```
+User Input → File Selection → Validation → Preview Display
+     ↓            ↓              ↓            ↓
+ Drag&Drop → File API → Size/Type Check → Image Preview
+   Click    → Reader   → Convert to    → Update UI
+             → Base64   → Base64        → State Update
+```
+
+### 2. AI Processing Flow
+```
+Generate Click → API Request → Loading State → Response Handling
+       ↓            ↓             ↓              ↓
+  Validate     → Construct    → Show Spinner → Display Result
+  Image       → Payload       → Disable UI   → Handle Errors
+       ↓            ↓             ↓              ↓
+  State Check → Send POST     → Timeout      → Success/Failure
+               → Minimax API  → Management    → UI Update
+```
+
+### 3. Error Handling Flow
+```
+Any Error → Error Handler → User Notification → Recovery Option
+    ↓           ↓               ↓                  ↓
+Network   → Categorize    → Clear Message    → Reset/Retry
+API      → Error Type     → Visual Alert     → State Reset
+File     → Log Details    → Action Required  → User Guidance
+Validation → Store State  → UX Feedback      → Continue Flow
+```
+
+## Data Flow Architecture
+
+### Application State Management
+```javascript
+const appState = {
+  currentImage: {
+    file: null,          // File | null
+    base64: null,        // string | null
+    preview: null,       // string | null
+    metadata: {
+      name: '',          // string
+      size: 0,           // number (bytes)
+      type: ''           // string (MIME type)
+    }
+  },
+  apiStatus: {
+    isProcessing: false, // boolean
+    lastError: null,     // string | null
+    requestId: null      // string | null
+  },
+  ui: {
+    dragOver: false,     // boolean
+    showPreview: false,  // boolean
+    showResults: false   // boolean
+  }
+};
+```
+
+### Event-Driven Architecture
+- **File Input Events**: Handle drag&drop, click-to-browse, file selection
+- **Validation Events**: File size, type, and format checking
+- **API Events**: Request initiation, response handling, error management
+- **UI Events**: Loading states, animations, user feedback
+
+## Security Considerations
+
+### Client-Side Security
+- **Input Validation**: Strict file type and size checking
+- **XSS Prevention**: Sanitized content display
+- **API Key Management**: The API key is exposed in client-side code; production deployments should route requests through a server-side proxy
+- **HTTPS Only**: Secure transmission to Minimax API
+
+### Production Recommendations
+1. **Server-Side API Proxy**: Move API calls to backend to hide API keys
+2. **Rate Limiting**: Prevent API abuse
+3. **File Scanning**: Server-side malware detection
+4. **Content Security Policy**: Additional XSS protection
+
+## Performance Optimizations
+
+### Client-Side Optimizations
+- **Lazy Loading**: Load UI components on demand
+- **Debounced Validation**: Reduce unnecessary processing
+- **Memory Management**: Clean up base64 strings after use
+- **Progressive Enhancement**: Static content and a "JavaScript required" notice render without JavaScript; upload and description need it
+
+### API Optimizations
+- **Request Compression**: Minimize payload size
+- **Timeout Management**: Prevent hanging requests
+- **Retry Logic**: Handle transient network failures
+- **Caching**: Avoid duplicate API calls for same images
+
+## Scalability Considerations
+
+### Current Architecture Limits
+- **Client-Only Processing**: Limited by user's device capabilities
+- **File Size Constraints**: 5MB limit for practical performance
+- **API Rate Limits**: Dependent on Minimax service limits
+
+### Future Enhancements
+- **Backend Integration**: Server-side processing and API management
+- **Batch Processing**: Multiple image handling
+- **User Accounts**: Save and manage image descriptions
+- **Advanced Features**: Multiple language support, custom prompts
+
+## Browser Compatibility
+
+### Supported Features
+- **File API**: IE10+ and all modern browsers
+- **Base64 Encoding**: Universal browser support
+- **CSS Grid/Flexbox**: All modern browsers; IE11 supports Flexbox and only an older, prefixed Grid syntax
+- **Fetch API**: All modern browsers; IE11 requires a polyfill
+
+### Fallback Strategies
+- **Older Browsers**: Graceful degradation with polyfills
+- **No JavaScript**: Show a notice that JavaScript is required; with no backend, there is nothing for a plain form to submit to
+- **Network Issues**: Offline mode with queued requests
+
+## Deployment Architecture
+
+### Static File Deployment
+- **CDN Ready**: Single HTML file suitable for any CDN
+- **Zero Dependencies**: No npm packages or build process
+- **Instant Deployment**: Upload and serve immediately
+- **Version Control**: Simple Git-based version management
+
+### Environment Configuration
+- **Development**: Direct API calls with test keys
+- **Staging**: Mirror production with environment-specific settings  
+- **Production**: Server-side API proxy recommended for security
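Note on ARCHITECTURE.md (illustration only, not part of this commit): the "Request Structure" and "Response Structure" JSON above could be produced and consumed roughly as follows. This is a minimal sketch; the helper names `buildDescriptionRequest` and `extractDescription` are hypothetical, and whether the endpoint accepts this exact payload for the named model is taken from the document, not verified here.

```javascript
// Hypothetical helpers; names are illustrative and do not appear in the commit.
const MINIMAX_ENDPOINT = 'https://api.minimax.io/v1/text/chatcompletion_v2';
const DESCRIPTION_PROMPT =
  'Please provide a detailed description of this image in English';

// Build a request body matching the "Request Structure" example.
// `dataUrl` is expected to look like "data:image/jpeg;base64,...".
function buildDescriptionRequest(dataUrl) {
  return {
    model: 'MiniMax-M2',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: DESCRIPTION_PROMPT },
          { type: 'image_url', image_url: { url: dataUrl } },
        ],
      },
    ],
    max_tokens: 500,
    temperature: 0.7,
  };
}

// Read the description out of a body shaped like the "Response Structure"
// example. Returns null when the expected fields are missing or empty.
function extractDescription(responseBody) {
  const content = responseBody?.choices?.[0]?.message?.content;
  return typeof content === 'string' && content.length > 0 ? content : null;
}
```

In the browser, the object returned by `buildDescriptionRequest` would be serialized with `JSON.stringify` and sent as the body of a POST to `MINIMAX_ENDPOINT` with the Bearer token header the document describes.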
--- a/PROJECT_SPEC.md
+++ b/PROJECT_SPEC.md
@@ -0,0 +1,43 @@
+Create a clean, modern web application that allows users to upload an image and get an AI-generated description using the Minimax API.
+
+**Requirements:**
+
+1. **Frontend Interface:**
+   - Single page application with a clean, centered layout
+   - File upload area (drag-and-drop support + click to browse)
+   - Image preview after upload
+   - "Generate Description" button
+   - Loading state while processing
+   - Display area for the AI-generated description
+   - Option to upload a new image after getting results
+
+2. **Technical Implementation:**
+   - Use vanilla JavaScript, HTML, and CSS (or React if you prefer)
+   - Handle image file validation (accept common formats: jpg, png, webp)
+   - Convert uploaded image to base64 for API submission
+   - Make POST request to Minimax API endpoint
+   - Handle API responses and errors gracefully
+   - Display clear error messages if something goes wrong
+
+3. **Minimax API Integration:**
+   - Endpoint: `https://api.minimax.io/v1/text/chatcompletion_v2`
+   - Use model: `MiniMax-M2` 
+   - Send image as base64 in the messages array
+   - Prompt: "Please provide a detailed description of this image in English"
+   - Handle API key securely (note: for production, this should be handled server-side)
+
+4. **UI/UX Details:**
+   - Responsive design that works on mobile and desktop
+   - Professional color scheme (suggest modern blues/grays)
+   - Smooth transitions and loading animations
+   - Clear visual feedback for all user actions
+
+5. **Error Handling:**
+   - File size validation (max 5MB recommended)
+   - File type validation
+   - API error handling with user-friendly messages
+   - Network error handling
+
+Please create a complete, production-ready single HTML file with inline CSS and JavaScript that I can immediately use in a browser.
+
+
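Note on PROJECT_SPEC.md (illustration only, not part of this commit): the requirement to "convert uploaded image to base64 for API submission" could be met along these lines. A minimal sketch, assuming the image bytes are available as a `Uint8Array`; `bytesToDataUrl` and `fileToDataUrl` are hypothetical names. In a browser the same result is commonly obtained with `FileReader.readAsDataURL`.

```javascript
// Hypothetical helper: encode raw bytes as a base64 data URL.
function bytesToDataUrl(bytes, mimeType) {
  let binary = '';
  const chunkSize = 0x8000; // encode in chunks to keep argument lists small
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Hypothetical helper: read a File or Blob and return its data URL.
async function fileToDataUrl(file) {
  const buffer = await file.arrayBuffer();
  return bytesToDataUrl(new Uint8Array(buffer), file.type);
}
```

Base64 output is about one third larger than the input, so a file near the 5MB limit yields a noticeably larger request body.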
--- a/TASKS.md
+++ b/TASKS.md
@@ -0,0 +1,174 @@
+# Image Description AI - Implementation Tasks
+
+## Task 1: Create Basic HTML Structure and Styling Foundation
+**Objective**: Establish the foundational HTML structure with semantic markup and responsive CSS framework
+**Deliverables**: 
+- Complete HTML skeleton with proper DOCTYPE and meta tags
+- Responsive CSS Grid layout system for the main application container
+- Modern color scheme implementation (blues/grays as specified)
+- Typography system with readable fonts and proper spacing
+- Mobile-first responsive breakpoints
+
+**Acceptance Criteria**:
+- [ ] HTML validates as HTML5 standard
+- [ ] Layout is responsive across mobile (320px+), tablet (768px+), and desktop (1024px+)
+- [ ] Color scheme uses professional blue/gray palette with proper contrast ratios
+- [ ] Typography is legible across all device sizes
+- [ ] Basic layout structure includes: header, main upload area, preview section, results section
+- [ ] CSS is modular with clear section organization (layout, components, utilities)
+
+## Task 2: Implement File Upload Interface with Drag & Drop
+**Objective**: Create an intuitive file upload interface supporting both drag-and-drop and click-to-browse functionality
+**Deliverables**:
+- Drag-and-drop zone with visual feedback states (drag over, drop, default)
+- Hidden file input element for click-to-browse functionality
+- Visual upload area with icon, instructional text, and file format specifications
+- File type icon display for different image formats
+- Hover and focus states for accessibility
+
+**Acceptance Criteria**:
+- [ ] Drag-and-drop zone visually responds to drag events with proper styling
+- [ ] Click-to-browse opens file dialog and triggers file selection
+- [ ] Upload area shows clear instructions: "Drag an image here or click to browse"
+- [ ] Supported formats are displayed: JPG, PNG, WebP
+- [ ] Accessibility: Keyboard navigation and screen reader support
+- [ ] Visual feedback on hover/focus with smooth transitions
+
+## Task 3: Add File Validation and Error Handling System
+**Objective**: Implement comprehensive file validation to ensure only valid images are processed
+**Deliverables**:
+- File type validation (accept only jpg, jpeg, png, webp)
+- File size validation (maximum 5MB with user-friendly error messages)
+- Validation feedback system with clear error messages
+- File metadata extraction (name, size, type)
+- Reset functionality to clear errors and start over
+
+**Acceptance Criteria**:
+- [ ] Invalid file types show error: "Please select a valid image file (JPG, PNG, WebP)"
+- [ ] Files over 5MB show error: "File size must be less than 5MB"
+- [ ] Error messages display in red text below upload area
+- [ ] Successful validation clears previous errors
+- [ ] Validation occurs immediately upon file selection
+- [ ] Files with valid extensions but invalid content are caught (for example, when the browser fails to decode them as an image)
+- [ ] Reset button clears all errors and upload area state
+
+## Task 4: Implement Image Preview Functionality
+**Objective**: Display uploaded image with proper sizing and formatting for user confirmation
+**Deliverables**:
+- Image preview container with proper aspect ratio handling
+- Image resizing and optimization for preview (max 400px width/height)
+- Base64 encoding of the selected image for API submission
+- Metadata display (filename, file size, dimensions)
+- Replace/change image functionality
+
+**Acceptance Criteria**:
+- [ ] Image preview displays within 2 seconds of file selection
+- [ ] Preview maintains aspect ratio without distortion
+- [ ] Large images are scaled appropriately for preview
+- [ ] Base64 encoding completes successfully and is stored in memory
+- [ ] Image metadata is extracted and displayed (filename, size in MB, dimensions)
+- [ ] "Change Image" button allows uploading a different file
+- [ ] Preview clears when starting over
+
+## Task 5: Integrate Minimax API for Image Description Generation
+**Objective**: Connect to Minimax API and implement the core AI description generation functionality
+**Deliverables**:
+- API request construction with proper payload format
+- Base64 image embedding in the request body
+- Proper error handling for network issues and API responses
+- Response parsing and extraction of the AI-generated description
+- API key management (client-side with security notes)
+
+**Acceptance Criteria**:
+- [ ] API request uses correct endpoint: `https://api.minimax.io/v1/text/chatcompletion_v2`
+- [ ] Request payload includes model: "MiniMax-M2"
+- [ ] Base64 image is properly formatted in the messages array
+- [ ] Prompt "Please provide a detailed description of this image in English" is included
+- [ ] Successful API response extracts the description text
+- [ ] API errors are handled gracefully with user-friendly messages
+- [ ] Network timeouts are handled with appropriate error messaging
+- [ ] Loading state is shown during API calls
+
+## Task 6: Create Loading States and User Feedback System
+**Objective**: Implement comprehensive loading and feedback mechanisms to enhance user experience
+**Deliverables**:
+- Loading spinner and progress indicator during API calls
+- Status messages for different processing stages
+- Button state management (disabled during processing)
+- Timeout handling with user notification
+- Success and error state animations
+
+**Acceptance Criteria**:
+- [ ] Loading spinner appears immediately when "Generate Description" is clicked
+- [ ] "Generate Description" button is disabled during processing to prevent duplicate requests
+- [ ] Status message shows: "Analyzing image with AI..."
+- [ ] Processing timeout (30 seconds) shows: "Processing is taking longer than expected"
+- [ ] Success animation plays when description is generated
+- [ ] Error state shows appropriate error message with red styling
+- [ ] Loading states have smooth transitions and professional appearance
+
+## Task 7: Display AI Description Results with Formatting
+**Objective**: Present the AI-generated description in a clean, readable format with additional functionality
+**Deliverables**:
+- Results display area with proper typography and spacing
+- Text formatting and paragraph handling for long descriptions
+- Option to copy description to clipboard
+- "Generate New Description" functionality for the same image
+- "Upload New Image" reset functionality
+- Responsive results layout
+
+**Acceptance Criteria**:
+- [ ] Description displays in a readable format with proper line breaks
+- [ ] Long descriptions are scrollable if they exceed viewport
+- [ ] "Copy to Clipboard" button works and shows confirmation
+- [ ] "Generate New Description" button triggers new API call
+- [ ] "Upload New Image" button clears all data and returns to upload state
+- [ ] Results area is visually distinct from upload area
+- [ ] Typography is large enough to read comfortably on mobile devices
+
+## Task 8: Final Polish, Testing, and Production Readiness
+**Objective**: Complete final testing, optimizations, and prepare for immediate browser deployment
+**Deliverables**:
+- Comprehensive error handling for all edge cases
+- Performance optimization for large images and slow networks
+- Cross-browser compatibility testing
+- Accessibility improvements (ARIA labels, keyboard navigation)
+- Final code organization and documentation
+- Single HTML file consolidation with inline CSS and JavaScript
+
+**Acceptance Criteria**:
+- [ ] Application works in Chrome, Firefox, Safari, and Edge
+- [ ] All functionality works on mobile devices (iOS and Android)
+- [ ] Images up to 5MB process within 30 seconds on average connections
+- [ ] Clear error messages for all failure scenarios (network, API, validation)
+- [ ] Complete keyboard navigation support
+- [ ] ARIA labels and semantic HTML for screen readers
+- [ ] No console errors or warnings
+- [ ] Single HTML file contains all code and loads immediately in any modern browser
+- [ ] Professional appearance with smooth animations and transitions
+- [ ] File size under 50KB for fast loading
+
+## Implementation Notes
+
+### Task Dependencies
+- Tasks 1-2 can be implemented in parallel
+- Task 3 depends on Task 2 (validation needs upload interface)
+- Task 4 depends on Task 3 (preview needs validation)
+- Task 5 depends on Task 4 (API needs base64 image)
+- Task 6 depends on Task 5 (loading states during API calls)
+- Task 7 depends on Task 6 (results display after processing)
+- Task 8 depends on all previous tasks (final testing and polish)
+
+### Technical Considerations
+- Use vanilla JavaScript ES6+ features for modern browser support
+- Implement CSS custom properties for maintainable theming
+- Follow progressive enhancement principles
+- Maintain separation of concerns within the single file
+- Catch errors at every asynchronous boundary (file reading, API calls) so failures surface as user-facing messages
+
+### Testing Approach
+- Test with various image sizes and formats
+- Test error scenarios (large files, invalid types, network issues)
+- Verify responsive behavior across devices
+- Validate accessibility with screen readers
+- Performance test with slow network connections
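Note on TASKS.md (illustration only, not part of this commit): the type and size checks from Task 3 could look roughly like this. A minimal sketch; `validateImageFile` is a hypothetical name, the error strings are the ones quoted in the Task 3 acceptance criteria, and reading "5MB" as 5 × 1024 × 1024 bytes is an assumption.

```javascript
// Assumption: "5MB" is interpreted as 5 MiB.
const MAX_BYTES = 5 * 1024 * 1024;
// MIME types for the formats listed in Task 3 (jpg/jpeg, png, webp).
const ALLOWED_TYPES = ['image/jpeg', 'image/png', 'image/webp'];

// Hypothetical helper: accepts a File, or any object with `type` and `size`.
function validateImageFile(file) {
  if (!file) {
    return { ok: false, error: 'Please select a valid image file (JPG, PNG, WebP)' };
  }
  if (!ALLOWED_TYPES.includes(file.type)) {
    return { ok: false, error: 'Please select a valid image file (JPG, PNG, WebP)' };
  }
  if (file.size > MAX_BYTES) {
    return { ok: false, error: 'File size must be less than 5MB' };
  }
  return { ok: true, error: null };
}
```

The reported `type` of a `File` is typically inferred from the file name, so this check alone does not satisfy the "valid extensions but invalid content" criterion; that would need a separate decode step.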
--- a/prompt.md
+++ b/prompt.md
@@ -0,0 +1,43 @@
+Create a clean, modern web application that allows users to upload an image and get an AI-generated description using the Minimax API.
+
+**Requirements:**
+
+1. **Frontend Interface:**
+   - Single page application with a clean, centered layout
+   - File upload area (drag-and-drop support + click to browse)
+   - Image preview after upload
+   - "Generate Description" button
+   - Loading state while processing
+   - Display area for the AI-generated description
+   - Option to upload a new image after getting results
+
+2. **Technical Implementation:**
+   - Use vanilla JavaScript, HTML, and CSS (or React if you prefer)
+   - Handle image file validation (accept common formats: jpg, png, webp)
+   - Convert uploaded image to base64 for API submission
+   - Make POST request to Minimax API endpoint
+   - Handle API responses and errors gracefully
+   - Display clear error messages if something goes wrong
+
+3. **Minimax API Integration:**
+   - Endpoint: `https://api.minimax.io/v1/text/chatcompletion_v2`
+   - Use model: `MiniMax-M2` 
+   - Send image as base64 in the messages array
+   - Prompt: "Please provide a detailed description of this image in English"
+   - Handle API key securely (note: for production, this should be handled server-side)
+
+4. **UI/UX Details:**
+   - Responsive design that works on mobile and desktop
+   - Professional color scheme (suggest modern blues/grays)
+   - Smooth transitions and loading animations
+   - Clear visual feedback for all user actions
+
+5. **Error Handling:**
+   - File size validation (max 5MB recommended)
+   - File type validation
+   - API error handling with user-friendly messages
+   - Network error handling
+
+Please create a complete, production-ready single HTML file with inline CSS and JavaScript that I can immediately use in a browser.
+
+
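Note on prompt.md (illustration only, not part of this commit): the "Error Handling" requirements ask for user-friendly messages for API and network failures. One way to sketch that mapping is shown below. `toUserMessage` is a hypothetical name, the message wording is invented for illustration, and attaching a numeric `status` to errors thrown for non-2xx responses is an assumed convention. The sketch relies on two documented behaviors: `fetch` rejects with a `TypeError` on network failure, and a request aborted through `AbortController` rejects with an error named `AbortError`.

```javascript
// Hypothetical helper: turn a caught error into a message for the UI.
function toUserMessage(error) {
  // Aborted request, e.g. by a timeout that calls AbortController.abort().
  if (error && error.name === 'AbortError') {
    return 'The request timed out. Please try again.';
  }
  // fetch() rejects with TypeError when the network request itself fails.
  if (error instanceof TypeError) {
    return 'Network error. Please check your connection and try again.';
  }
  // Assumed convention: HTTP failures carry a numeric `status` property.
  if (error && typeof error.status === 'number') {
    if (error.status === 401 || error.status === 403) {
      return 'The API key was rejected. Please check your credentials.';
    }
    if (error.status === 429) {
      return 'Too many requests. Please wait a moment and try again.';
    }
    if (error.status >= 500) {
      return 'The description service is temporarily unavailable.';
    }
    return 'The request could not be processed.';
  }
  return 'Something went wrong. Please try again.';
}
```

Keeping the mapping in one function means the UI code only ever displays the returned string, while the original error can still be logged for debugging.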