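diff --git a/.gitignore b/.gitignore
index 4c49bd7..03a21e9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,3 @@
 .env
+rate
+hwserver
diff --git a/.history/README_20250115141957.md b/.history/README_20250115141957.md
new file mode 100644
index 0000000..7000334
--- /dev/null
+++ b/.history/README_20250115141957.md
@@ -0,0 +1,44 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
\ No newline at end of file
diff --git a/.history/README_20250115142949.md b/.history/README_20250115142949.md
new file mode 100644
index 0000000..fd40d78
--- /dev/null
+++ b/.history/README_20250115142949.md
@@ -0,0 +1,79 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. 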
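Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
\ No newline at end of file
diff --git a/.history/README_20250115143018.md b/.history/README_20250115143018.md
new file mode 100644
index 0000000..442c0f2
--- /dev/null
+++ b/.history/README_20250115143018.md
@@ -0,0 +1,85 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 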
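Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+ash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
\ No newline at end of file
diff --git a/.history/README_20250115143037.md b/.history/README_20250115143037.md
new file mode 100644
index 0000000..86892a9
--- /dev/null
+++ b/.history/README_20250115143037.md
@@ -0,0 +1,87 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 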
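Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
diff --git a/.history/README_20250115143045.md b/.history/README_20250115143045.md
new file mode 100644
index 0000000..86892a9
--- /dev/null
+++ b/.history/README_20250115143045.md
@@ -0,0 +1,87 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 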
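Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
diff --git a/.history/README_20250115143103.md b/.history/README_20250115143103.md
new file mode 100644
index 0000000..514acce
--- /dev/null
+++ b/.history/README_20250115143103.md
@@ -0,0 +1,88 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 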
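Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
diff --git a/.history/README_20250115143114.md b/.history/README_20250115143114.md
new file mode 100644
index 0000000..3ed4331
--- /dev/null
+++ b/.history/README_20250115143114.md
@@ -0,0 +1,92 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 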
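Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
diff --git a/.history/README_20250115143127.md b/.history/README_20250115143127.md
new file mode 100644
index 0000000..3263eb9
--- /dev/null
+++ b/.history/README_20250115143127.md
@@ -0,0 +1,97 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 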
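Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
+"https://your-domain/image2.jpg"
+],
+"text": "Organized text content",
+"success": true
+}
\ No newline at end of file
diff --git a/.history/README_20250115143139.md b/.history/README_20250115143139.md
new file mode 100644
index 0000000..94bfac9
--- /dev/null
+++ b/.history/README_20250115143139.md
@@ -0,0 +1,98 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 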
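Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+```json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
+"https://your-domain/image2.jpg"
+],
+"text": "Organized text content",
+"success": true
+}
+```
diff --git a/.history/README_20250115143149.md b/.history/README_20250115143149.md
new file mode 100644
index 0000000..3ca09d2
--- /dev/null
+++ b/.history/README_20250115143149.md
@@ -0,0 +1,105 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 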
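Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+```json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
+"https://your-domain/image2.jpg"
+],
+"text": "Organized text content",
+"success": true
+}
+```
+### 2. OCR Recognition Endpoint
+
+**Endpoint**: `/ocr`
+**Method**: POST
+**Content-Type**: application/json
+
+**Request parameters**:
\ No newline at end of file
diff --git a/.history/README_20250115143209.md b/.history/README_20250115143209.md
new file mode 100644
index 0000000..04cc9cf
--- /dev/null
+++ b/.history/README_20250115143209.md
@@ -0,0 +1,106 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 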
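Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+```json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
+"https://your-domain/image2.jpg"
+],
+"text": "Organized text content",
+"success": true
+}
+```
+### 2. OCR Recognition Endpoint
+
+**Endpoint**: `/ocr`
+**Method**: POST
+**Content-Type**: application/json
+
+**Request parameters**:
+```json
diff --git a/.history/README_20250115143224.md b/.history/README_20250115143224.md
new file mode 100644
index 0000000..2404539
--- /dev/null
+++ b/.history/README_20250115143224.md
@@ -0,0 +1,113 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 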
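Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+```json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
+"https://your-domain/image2.jpg"
+],
+"text": "Organized text content",
+"success": true
+}
+```
+### 2. OCR Recognition Endpoint
+
+**Endpoint**: `/ocr`
+**Method**: POST
+**Content-Type**: application/json
+
+**Request parameters**:
+```json
+{
+"image_base64": "Base64-encoded image content",
+"image_url": "Image URL (optional; image_base64 takes precedence)",
+"scene": "Scene type (optional)",
+"apikey": "Your API key"
+}
+```
diff --git a/.history/README_20250115143237.md b/.history/README_20250115143237.md
new file mode 100644
index 0000000..ccaa964
--- /dev/null
+++ b/.history/README_20250115143237.md
@@ -0,0 +1,114 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. 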
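Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+```json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
+"https://your-domain/image2.jpg"
+],
+"text": "Organized text content",
+"success": true
+}
+```
+### 2. OCR Recognition Endpoint
+
+**Endpoint**: `/ocr`
+**Method**: POST
+**Content-Type**: application/json
+
+**Request parameters**:
+```json
+{
+"image_base64": "Base64-encoded image content",
+"image_url": "Image URL (optional; image_base64 takes precedence)",
+"scene": "Scene type (optional)",
+"apikey": "Your API key"
+}
+```
+**Response format**:
diff --git a/.history/README_20250115143250.md b/.history/README_20250115143250.md
new file mode 100644
index 0000000..e244416
--- /dev/null
+++ b/.history/README_20250115143250.md
@@ -0,0 +1,121 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. 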
Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+```json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
+"https://your-domain/image2.jpg"
+],
+"text": "Organized text content",
+"success": true
+}
+```
+### 2. OCR Recognition Endpoint
+
+**Endpoint**: `/ocr`
+**Method**: POST
+**Content-Type**: application/json
+
+**Request parameters**:
+```json
+{
+"image_base64": "Base64-encoded image content",
+"image_url": "Image URL (optional; image_base64 takes precedence)",
+"scene": "Scene type (optional)",
+"apikey": "Your API key"
+}
+```
+**Response format**:
+```json
+{
+"original_text": "Raw recognized text",
+"result": "Processed text",
+"success": true
+}
+```
\ No newline at end of file
diff --git a/.history/README_20250115143303.md b/.history/README_20250115143303.md
new file mode 100644
index 0000000..d396b6a
--- /dev/null
+++ b/.history/README_20250115143303.md
@@ -0,0 +1,160 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. 
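Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+```json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
+"https://your-domain/image2.jpg"
+],
+"text": "Organized text content",
+"success": true
+}
+```
+### 2. 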
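OCR Recognition Endpoint
+
+**Endpoint**: `/ocr`
+**Method**: POST
+**Content-Type**: application/json
+
+**Request parameters**:
+```json
+{
+"image_base64": "Base64-encoded image content",
+"image_url": "Image URL (optional; image_base64 takes precedence)",
+"scene": "Scene type (optional)",
+"apikey": "Your API key"
+}
+```
+**Response format**:
+```json
+{
+"original_text": "Raw recognized text",
+"result": "Processed text",
+"success": true
+}
+```
+
+## Error Codes
+
+| HTTP status code | Error description | Possible cause |
+|------------|----------|----------|
+| 400 | Invalid request format | Malformed request |
+| 400 | No files uploaded | No file was uploaded |
+| 400 | Maximum 5 files allowed | Maximum file count exceeded |
+| 400 | File size exceeds the limit of 10MB | File size over the limit |
+| 400 | Invalid file type | Unsupported file type |
+| 401 | Invalid API key | Invalid API key |
+| 500 | OCR processing failed | OCR processing failed |
+| 500 | Text processing failed | Text processing failed |
+
+## Notes
+
+1. Image upload recommendations:
+   - Make sure images are clear and legible
+   - Keep each image under 10MB
+   - Use a supported image format
+
+2. OCR recognition recommendations:
+   - For multi-image input, the system automatically organizes the text logically
+   - For single-image input, the recognition result is returned directly
+
+3. API usage limits:
+   - A valid API key is required
+   - Limiting the number of concurrent requests is recommended
+
+## Deployment Requirements
+
+- Go 1.16+
+- The configuration file must set:
+  - Tencent Cloud OCR configuration
+  - Cloudflare R2 storage configuration
+  - Gemini API configuration
+  - API key
+
+## Environment Variables
diff --git a/.history/README_20250115143315.md b/.history/README_20250115143315.md
new file mode 100644
index 0000000..be2f5ee
--- /dev/null
+++ b/.history/README_20250115143315.md
@@ -0,0 +1,171 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. 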
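Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+```json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
+"https://your-domain/image2.jpg"
+],
+"text": "Organized text content",
+"success": true
+}
+```
+### 2. OCR Recognition Endpoint
+
+**Endpoint**: `/ocr`
+**Method**: POST
+**Content-Type**: application/json
+
+**Request parameters**:
+```json
+{
+"image_base64": "Base64-encoded image content",
+"image_url": "Image URL (optional; image_base64 takes precedence)",
+"scene": "Scene type (optional)",
+"apikey": "Your API key"
+}
+```
+**Response format**:
+```json
+{
+"original_text": "Raw recognized text",
+"result": "Processed text",
+"success": true
+}
+```
+
+## Error Codes
+
+| HTTP status code | Error description | Possible cause |
+|------------|----------|----------|
+| 400 | Invalid request format | Malformed request |
+| 400 | No files uploaded | No file was uploaded |
+| 400 | Maximum 5 files allowed | Maximum file count exceeded |
+| 400 | File size exceeds the limit of 10MB | File size over the limit |
+| 400 | Invalid file type | Unsupported file type |
+| 401 | Invalid API key | Invalid API key |
+| 500 | OCR processing failed | OCR processing failed |
+| 500 | Text processing failed | Text processing failed |
+
+## Notes
+
+1. Image upload recommendations:
+   - Make sure images are clear and legible
+   - Keep each image under 10MB
+   - Use a supported image format
+
+2. OCR recognition recommendations:
+   - For multi-image input, the system automatically organizes the text logically
+   - For single-image input, the recognition result is returned directly
+
+3. 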
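API usage limits:
+   - A valid API key is required
+   - Limiting the number of concurrent requests is recommended
+
+## Deployment Requirements
+
+- Go 1.16+
+- The configuration file must set:
+  - Tencent Cloud OCR configuration
+  - Cloudflare R2 storage configuration
+  - Gemini API configuration
+  - API key
+
+## Environment Variables
+```env
+TENCENT_SECRET_ID=your_secret_id
+TENCENT_SECRET_KEY=your_secret_key
+GEMINI_API_KEY=your_gemini_api_key
+API_KEY=your_api_key
+R2_ACCESS_KEY=your_r2_access_key
+R2_SECRET_KEY=your_r2_secret_key
+R2_BUCKET=your_bucket_name
+R2_ENDPOINT=your_r2_endpoint
+R2_CUSTOM_DOMAIN=your_custom_domain
+```
diff --git a/.history/README_20250115143318.md b/.history/README_20250115143318.md
new file mode 100644
index 0000000..be2f5ee
--- /dev/null
+++ b/.history/README_20250115143318.md
@@ -0,0 +1,171 @@
+# Tencent Handwriting Recognition API Relay
+
+1. Takes the Base64 of an image as input and returns the recognition result
+
+2. Uses JSON over POST and returns JSON, following RESTful style
+3. Input parameters:
+   - Base64 of the image, string
+   - Scene: the scene; defaults to null, optionally only_hw; string
+   - apikey: fixed to 1234567890 during testing, string
+4. Output parameters:
+   - Recognition result, string
+   - Success flag, boolean
+
+6. Uses the Tencent general handwriting OCR SDK for image recognition; developed in Go with the gin framework;
+7. Flow:
+   - After receiving the POST data, the application validates it (JSON format, Base64 format, etc.);
+   - It calls the Tencent general handwriting OCR SDK to recognize the image;
+   - It then calls the Google Gemini API to tidy up the wording and remove likely recognition errors, using the following prompt:
+     ```
+     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
+     ```
+   - It returns the recognition result.
+
+8. Google Gemini API key: "your key"
+9. tencentSecretId = "your id", tencentSecretKey = "your secret"
+
+10. Keys are stored in the .env file and loaded with the dotenv library.
+11. Add the rate feature for grading essays
+```
+Project structure
+```
+tencenthw/
+├── go.mod
+├── go.sum
+├── cmd/
+│   └── server/
+│       └── main.go
+└── pkg/
+    ├── config/
+    │   └── config.go
+    └── handler/
+        └── ocr.go
+        └── rate.go
+```
+
+# OCR Image Processing Service
+
+This service combines OCR recognition, image storage, and text processing. It supports multi-image upload with automatic OCR and can intelligently organize the recognized text.
+
+## Features
+
+- Multi-image upload (up to 5 images)
+- Automatic OCR text recognition
+- Intelligent text organization (multi-image scenario)
+- Cloud image storage
+- Support for multiple image formats
+
+## API Reference
+
+### 1. 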
Multi-Image Upload Endpoint
+
+**Endpoint**: `/upload`
+**Method**: POST
+**Content-Type**: multipart/form-data
+
+**Request parameters**:
+- `files`: array of image files (1-5 images supported)
+
+**Supported image formats**:
+- JPEG/JPG
+- PNG
+- GIF
+- BMP
+- TIFF
+- WEBP
+
+**File size limit**: 10MB maximum per file
+
+**Request example**:
+
+```bash
+curl -X POST \
+'http://your-domain/upload' \
+-H 'Content-Type: multipart/form-data' \
+-F 'files=@image1.jpg' \
+-F 'files=@image2.jpg'
+```
+**Response format**:
+```json
+{
+"image_urls": [
+"https://your-domain/image1.jpg",
+"https://your-domain/image2.jpg"
+],
+"text": "Organized text content",
+"success": true
+}
+```
+### 2. OCR Recognition Endpoint
+
+**Endpoint**: `/ocr`
+**Method**: POST
+**Content-Type**: application/json
+
+**Request parameters**:
+```json
+{
+"image_base64": "Base64-encoded image content",
+"image_url": "Image URL (optional; image_base64 takes precedence)",
+"scene": "Scene type (optional)",
+"apikey": "Your API key"
+}
+```
+**Response format**:
+```json
+{
+"original_text": "Raw recognized text",
+"result": "Processed text",
+"success": true
+}
+```
+
+## Error Codes
+
+| HTTP status code | Error description | Possible cause |
+|------------|----------|----------|
+| 400 | Invalid request format | Malformed request |
+| 400 | No files uploaded | No file was uploaded |
+| 400 | Maximum 5 files allowed | Maximum file count exceeded |
+| 400 | File size exceeds the limit of 10MB | File size over the limit |
+| 400 | Invalid file type | Unsupported file type |
+| 401 | Invalid API key | Invalid API key |
+| 500 | OCR processing failed | OCR processing failed |
+| 500 | Text processing failed | Text processing failed |
+
+## Notes
+
+1. Image upload recommendations:
+   - Make sure images are clear and legible
+   - Keep each image under 10MB
+   - Use a supported image format
+
+2. OCR recognition recommendations:
+   - For multi-image input, the system automatically organizes the text logically
+   - For single-image input, the recognition result is returned directly
+
+3. 
API call limits: + - A valid API key is required + - Keeping the number of concurrent requests under control is recommended + +## Deployment requirements + +- Go 1.16+ +- The configuration file must set: + - Tencent Cloud OCR configuration + - Cloudflare R2 storage configuration + - Gemini API configuration + - API key + +## Environment variables +```env +TENCENT_SECRET_ID=your_secret_id +TENCENT_SECRET_KEY=your_secret_key +GEMINI_API_KEY=your_gemini_api_key +API_KEY=your_api_key +R2_ACCESS_KEY=your_r2_access_key +R2_SECRET_KEY=your_r2_secret_key +R2_BUCKET=your_bucket_name +R2_ENDPOINT=your_r2_endpoint +R2_CUSTOM_DOMAIN=your_custom_domain +``` diff --git a/.history/cmd/main_20250115142528.go b/.history/cmd/main_20250115142528.go new file mode 100644 index 0000000..0519ecb --- /dev/null +++ b/.history/cmd/main_20250115142528.go @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/.history/cmd/main_20250115142536.go b/.history/cmd/main_20250115142536.go new file mode 100644 index 0000000..e09b2f0 --- /dev/null +++ b/.history/cmd/main_20250115142536.go @@ -0,0 +1,25 @@ +// Initialize services +geminiService, err := service.NewGeminiService(config.GeminiAPIKey) +if err != nil { + log.Fatal(err) +} +defer geminiService.Close() + +ocrService := handler.NewOCRService( + config.TencentSecretID, + config.TencentSecretKey, + geminiService, +) + +uploadHandler := handler.NewUploadHandler( + config.AccessKey, + config.SecretKey, + config.Bucket, + config.Endpoint, + config.CustomDomain, + ocrService, + geminiService, +) + +// Setup routes +router.POST("/upload", uploadHandler.HandleMultiUpload) \ No newline at end of file diff --git a/.history/cmd/main_20250115142615.go b/.history/cmd/main_20250115142615.go new file mode 100644 index 0000000..e09b2f0 --- /dev/null +++ b/.history/cmd/main_20250115142615.go @@ -0,0 +1,25 @@ +// Initialize services +geminiService, err := service.NewGeminiService(config.GeminiAPIKey) +if err != nil { + log.Fatal(err) +} +defer geminiService.Close() + +ocrService := handler.NewOCRService( + config.TencentSecretID, + config.TencentSecretKey, + geminiService, +) + +uploadHandler := handler.NewUploadHandler( + config.AccessKey, 
+ config.SecretKey, + config.Bucket, + config.Endpoint, + config.CustomDomain, + ocrService, + geminiService, +) + +// Setup routes +router.POST("/upload", uploadHandler.HandleMultiUpload) \ No newline at end of file diff --git a/.history/cmd/main_20250115155220.go b/.history/cmd/main_20250115155220.go new file mode 100644 index 0000000..077af79 --- /dev/null +++ b/.history/cmd/main_20250115155220.go @@ -0,0 +1,77 @@ +package main + +import ( + "log" + + "github.com/gin-gonic/gin" + "tencenthw/pkg/config" + "tencenthw/pkg/handler" +) + +func main() { + // Load configuration + cfg, err := config.LoadConfig() + if err != nil { + log.Fatalf("Failed to load configuration: %v", err) + } + + // Initialize handlers + ocrHandler := handler.NewOCRHandler( + cfg.TencentSecretID, + cfg.TencentSecretKey, + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + rateHandler := handler.NewRateHandler( + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + uploadHandler := handler.NewUploadHandler( + cfg.R2AccessKey, + cfg.R2SecretKey, + cfg.R2Bucket, + cfg.R2Endpoint, + cfg.R2CustomDomain, + ) + + // Setup Gin router + r := gin.Default() + + // Register routes + r.POST("/ocr", ocrHandler.HandleOCR) + r.POST("/rate", rateHandler.HandleRate) + // upload file to server + r.POST("/upload", uploadHandler.HandleUpload) + + // Start server + if err := r.Run("localhost:8080"); err != nil { + log.Fatalf("Failed to start server: %v", err) + } +} +// Initialize services +geminiService, err := service.NewGeminiService(config.GeminiAPIKey) +if err != nil { + log.Fatal(err) +} +defer geminiService.Close() + +ocrService := handler.NewOCRService( + config.TencentSecretID, + config.TencentSecretKey, + geminiService, +) + +uploadHandler := handler.NewUploadHandler( + config.AccessKey, + config.SecretKey, + config.Bucket, + config.Endpoint, + config.CustomDomain, + ocrService, + geminiService, +) + +// Setup routes +router.POST("/upload", uploadHandler.HandleMultiUpload) \ No newline at end of file diff --git 
a/.history/cmd/main_20250115155312.go b/.history/cmd/main_20250115155312.go new file mode 100644 index 0000000..aec9d04 --- /dev/null +++ b/.history/cmd/main_20250115155312.go @@ -0,0 +1,81 @@ +package main + +import ( + "log" + + "github.com/gin-gonic/gin" + "tencenthw/pkg/config" + "tencenthw/pkg/handler" +) + +func main() { + // Load configuration + cfg, err := config.LoadConfig() + if err != nil { + log.Fatalf("Failed to load configuration: %v", err) + } + // Initialize services + geminiService, err := service.NewGeminiService(config.GeminiAPIKey) + if err != nil { + log.Fatal(err) + } + // Initialize handlers + ocrHandler := handler.NewOCRHandler( + cfg.TencentSecretID, + cfg.TencentSecretKey, + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + rateHandler := handler.NewRateHandler( + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + uploadHandler := handler.NewUploadHandler( + cfg.R2AccessKey, + cfg.R2SecretKey, + cfg.R2Bucket, + cfg.R2Endpoint, + cfg.R2CustomDomain, + ) + + // Setup Gin router + r := gin.Default() + + // Register routes + r.POST("/ocr", ocrHandler.HandleOCR) + r.POST("/rate", rateHandler.HandleRate) + // upload file to server + r.POST("/upload", uploadHandler.HandleUpload) + + // Start server + if err := r.Run("localhost:8080"); err != nil { + log.Fatalf("Failed to start server: %v", err) + } +} +// Initialize services +geminiService, err := service.NewGeminiService(config.GeminiAPIKey) +if err != nil { + log.Fatal(err) +} +defer geminiService.Close() + +ocrService := handler.NewOCRService( + config.TencentSecretID, + config.TencentSecretKey, + geminiService, +) + +uploadHandler := handler.NewUploadHandler( + config.AccessKey, + config.SecretKey, + config.Bucket, + config.Endpoint, + config.CustomDomain, + ocrService, + geminiService, +) + +// Setup routes +router.POST("/upload", uploadHandler.HandleMultiUpload) \ No newline at end of file diff --git a/.history/cmd/main_20250115155347.go b/.history/cmd/main_20250115155347.go new file mode 100644 index 
0000000..7a9a75c --- /dev/null +++ b/.history/cmd/main_20250115155347.go @@ -0,0 +1,89 @@ +package main + +import ( + "log" + + "github.com/gin-gonic/gin" + "tencenthw/pkg/config" + "tencenthw/pkg/handler" +) + +func main() { + // Load configuration + cfg, err := config.LoadConfig() + if err != nil { + log.Fatalf("Failed to load configuration: %v", err) + } + // Initialize services + geminiService, err := service.NewGeminiService(config.GeminiAPIKey) + if err != nil { + log.Fatal(err) + } + defer geminiService.Close() + + ocrService := handler.NewOCRService( + config.TencentSecretID, + config.TencentSecretKey, + geminiService, + ) + + // Initialize handlers + ocrHandler := handler.NewOCRHandler( + cfg.TencentSecretID, + cfg.TencentSecretKey, + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + rateHandler := handler.NewRateHandler( + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + uploadHandler := handler.NewUploadHandler( + cfg.R2AccessKey, + cfg.R2SecretKey, + cfg.R2Bucket, + cfg.R2Endpoint, + cfg.R2CustomDomain, + ) + + // Setup Gin router + r := gin.Default() + + // Register routes + r.POST("/ocr", ocrHandler.HandleOCR) + r.POST("/rate", rateHandler.HandleRate) + // upload file to server + r.POST("/upload", uploadHandler.HandleUpload) + + // Start server + if err := r.Run("localhost:8080"); err != nil { + log.Fatalf("Failed to start server: %v", err) + } +} +// Initialize services +geminiService, err := service.NewGeminiService(config.GeminiAPIKey) +if err != nil { + log.Fatal(err) +} +defer geminiService.Close() + +ocrService := handler.NewOCRService( + config.TencentSecretID, + config.TencentSecretKey, + geminiService, +) + +uploadHandler := handler.NewUploadHandler( + config.AccessKey, + config.SecretKey, + config.Bucket, + config.Endpoint, + config.CustomDomain, + ocrService, + geminiService, +) + +// Setup routes +router.POST("/upload", uploadHandler.HandleMultiUpload) \ No newline at end of file diff --git a/.history/cmd/main_20250115155425.go 
b/.history/cmd/main_20250115155425.go new file mode 100644 index 0000000..7205004 --- /dev/null +++ b/.history/cmd/main_20250115155425.go @@ -0,0 +1,91 @@ +package main + +import ( + "log" + + "github.com/gin-gonic/gin" + "tencenthw/pkg/config" + "tencenthw/pkg/handler" +) + +func main() { + // Load configuration + cfg, err := config.LoadConfig() + if err != nil { + log.Fatalf("Failed to load configuration: %v", err) + } + // Initialize services + geminiService, err := service.NewGeminiService(config.GeminiAPIKey) + if err != nil { + log.Fatal(err) + } + defer geminiService.Close() + + ocrService := handler.NewOCRService( + config.TencentSecretID, + config.TencentSecretKey, + geminiService, + ) + + // Initialize handlers + ocrHandler := handler.NewOCRHandler( + cfg.TencentSecretID, + cfg.TencentSecretKey, + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + rateHandler := handler.NewRateHandler( + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + uploadHandler := handler.NewUploadHandler( + cfg.R2AccessKey, + cfg.R2SecretKey, + cfg.R2Bucket, + cfg.R2Endpoint, + cfg.R2CustomDomain, + ocrService, + geminiService, + ) + + // Setup Gin router + r := gin.Default() + + // Register routes + r.POST("/ocr", ocrHandler.HandleOCR) + r.POST("/rate", rateHandler.HandleRate) + // upload file to server + r.POST("/upload", uploadHandler.HandleUpload) + + // Start server + if err := r.Run("localhost:8080"); err != nil { + log.Fatalf("Failed to start server: %v", err) + } +} +// Initialize services +geminiService, err := service.NewGeminiService(config.GeminiAPIKey) +if err != nil { + log.Fatal(err) +} +defer geminiService.Close() + +ocrService := handler.NewOCRService( + config.TencentSecretID, + config.TencentSecretKey, + geminiService, +) + +uploadHandler := handler.NewUploadHandler( + config.AccessKey, + config.SecretKey, + config.Bucket, + config.Endpoint, + config.CustomDomain, + ocrService, + geminiService, +) + +// Setup routes +router.POST("/upload", uploadHandler.HandleMultiUpload) \ No 
newline at end of file diff --git a/.history/cmd/main_20250115155453.go b/.history/cmd/main_20250115155453.go new file mode 100644 index 0000000..b25350a --- /dev/null +++ b/.history/cmd/main_20250115155453.go @@ -0,0 +1,66 @@ +package main + +import ( + "log" + + "github.com/gin-gonic/gin" + "tencenthw/pkg/config" + "tencenthw/pkg/handler" +) + +func main() { + // Load configuration + cfg, err := config.LoadConfig() + if err != nil { + log.Fatalf("Failed to load configuration: %v", err) + } + // Initialize services + geminiService, err := service.NewGeminiService(config.GeminiAPIKey) + if err != nil { + log.Fatal(err) + } + defer geminiService.Close() + + ocrService := handler.NewOCRService( + config.TencentSecretID, + config.TencentSecretKey, + geminiService, + ) + + // Initialize handlers + ocrHandler := handler.NewOCRHandler( + cfg.TencentSecretID, + cfg.TencentSecretKey, + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + rateHandler := handler.NewRateHandler( + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + uploadHandler := handler.NewUploadHandler( + cfg.R2AccessKey, + cfg.R2SecretKey, + cfg.R2Bucket, + cfg.R2Endpoint, + cfg.R2CustomDomain, + ocrService, + geminiService, + ) + + // Setup Gin router + r := gin.Default() + + // Register routes + r.POST("/ocr", ocrHandler.HandleOCR) + r.POST("/rate", rateHandler.HandleRate) + // upload file to server + r.POST("/upload", uploadHandler.HandleUpload) + + // Start server + if err := r.Run("localhost:8080"); err != nil { + log.Fatalf("Failed to start server: %v", err) + } +} \ No newline at end of file diff --git a/.history/cmd/main_20250115155512.go b/.history/cmd/main_20250115155512.go new file mode 100644 index 0000000..5ebf673 --- /dev/null +++ b/.history/cmd/main_20250115155512.go @@ -0,0 +1,66 @@ +package main + +import ( + "log" + + "github.com/gin-gonic/gin" + "tencenthw/pkg/config" + "tencenthw/pkg/handler" +) + +func main() { + // Load configuration + cfg, err := config.LoadConfig() + if err != nil { + log.Fatalf("Failed 
to load configuration: %v", err) + } + // Initialize services + geminiService, err := service.NewGeminiService(cfg.GeminiAPIKey) + if err != nil { + log.Fatal(err) + } + defer geminiService.Close() + + ocrService := handler.NewOCRService( + config.TencentSecretID, + config.TencentSecretKey, + geminiService, + ) + + // Initialize handlers + ocrHandler := handler.NewOCRHandler( + cfg.TencentSecretID, + cfg.TencentSecretKey, + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + rateHandler := handler.NewRateHandler( + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + uploadHandler := handler.NewUploadHandler( + cfg.R2AccessKey, + cfg.R2SecretKey, + cfg.R2Bucket, + cfg.R2Endpoint, + cfg.R2CustomDomain, + ocrService, + geminiService, + ) + + // Setup Gin router + r := gin.Default() + + // Register routes + r.POST("/ocr", ocrHandler.HandleOCR) + r.POST("/rate", rateHandler.HandleRate) + // upload file to server + r.POST("/upload", uploadHandler.HandleUpload) + + // Start server + if err := r.Run("localhost:8080"); err != nil { + log.Fatalf("Failed to start server: %v", err) + } +} \ No newline at end of file diff --git a/.history/cmd/main_20250115155521.go b/.history/cmd/main_20250115155521.go new file mode 100644 index 0000000..5460f36 --- /dev/null +++ b/.history/cmd/main_20250115155521.go @@ -0,0 +1,66 @@ +package main + +import ( + "log" + + "github.com/gin-gonic/gin" + "tencenthw/pkg/config" + "tencenthw/pkg/handler" +) + +func main() { + // Load configuration + cfg, err := config.LoadConfig() + if err != nil { + log.Fatalf("Failed to load configuration: %v", err) + } + // Initialize services + geminiService, err := service.NewGeminiService(cfg.GeminiAPIKey) + if err != nil { + log.Fatal(err) + } + defer geminiService.Close() + + ocrService := handler.NewOCRService( + cfg.TencentSecretID, + cfg.TencentSecretKey, + geminiService, + ) + + // Initialize handlers + ocrHandler := handler.NewOCRHandler( + cfg.TencentSecretID, + cfg.TencentSecretKey, + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + 
rateHandler := handler.NewRateHandler( + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + uploadHandler := handler.NewUploadHandler( + cfg.R2AccessKey, + cfg.R2SecretKey, + cfg.R2Bucket, + cfg.R2Endpoint, + cfg.R2CustomDomain, + ocrService, + geminiService, + ) + + // Setup Gin router + r := gin.Default() + + // Register routes + r.POST("/ocr", ocrHandler.HandleOCR) + r.POST("/rate", rateHandler.HandleRate) + // upload file to server + r.POST("/upload", uploadHandler.HandleUpload) + + // Start server + if err := r.Run("localhost:8080"); err != nil { + log.Fatalf("Failed to start server: %v", err) + } +} \ No newline at end of file diff --git a/.history/pkg/handler/ocr_20250115141957.go b/.history/pkg/handler/ocr_20250115141957.go new file mode 100644 index 0000000..bbf9be0 --- /dev/null +++ b/.history/pkg/handler/ocr_20250115141957.go @@ -0,0 +1,166 @@ +package handler + +import ( + "encoding/base64" + "net/http" + "strings" + + "github.com/gin-gonic/gin" + "github.com/google/generative-ai-go/genai" + "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common" + "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/profile" + ocr "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/ocr/v20181119" + "google.golang.org/api/option" +) + +type OCRHandler struct { + tencentSecretID string + tencentSecretKey string + geminiAPIKey string + apiKey string +} + +type OCRRequest struct { + ImageBase64 string `json:"image_base64"` + ImageURL string `json:"image_url"` + Scene string `json:"scene"` + APIKey string `json:"apikey" binding:"required"` +} + +type OCRResponse struct { + OriginalText string `json:"original_text"` + Result string `json:"result"` + Success bool `json:"success"` +} + +func NewOCRHandler(tencentSecretID, tencentSecretKey, geminiAPIKey, apiKey string) *OCRHandler { + return &OCRHandler{ + tencentSecretID: tencentSecretID, + tencentSecretKey: tencentSecretKey, + geminiAPIKey: geminiAPIKey, + apiKey: apiKey, + } +} + +func (h *OCRHandler) 
HandleOCR(c *gin.Context) { + var req OCRRequest + if err := c.ShouldBindJSON(&req); err != nil { + c.JSON(http.StatusBadRequest, OCRResponse{ + Success: false, + Result: "Invalid request format", + }) + return + } + + // Validate API key + if req.APIKey != h.apiKey { + c.JSON(http.StatusUnauthorized, OCRResponse{ + Success: false, + Result: "Invalid API key", + }) + return + } + + // Validate that at least one of ImageURL or ImageBase64 is provided + if req.ImageURL == "" && req.ImageBase64 == "" { + c.JSON(http.StatusBadRequest, OCRResponse{ + Success: false, + Result: "Either image_url or image_base64 must be provided", + }) + return + } + + // Initialize Tencent Cloud client + credential := common.NewCredential(h.tencentSecretID, h.tencentSecretKey) + cpf := profile.NewClientProfile() + cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com" + client, err := ocr.NewClient(credential, "", cpf) + if err != nil { + c.JSON(http.StatusInternalServerError, OCRResponse{ + Success: false, + Result: "Failed to initialize OCR client", + }) + return + } + + // Create OCR request + request := ocr.NewGeneralHandwritingOCRRequest() + + // Prioritize ImageURL if both are provided + if req.ImageURL != "" { + request.ImageUrl = common.StringPtr(req.ImageURL) + } else { + // Remove base64 prefix if exists + imageBase64 := req.ImageBase64 + if idx := strings.Index(imageBase64, "base64,"); idx != -1 { + imageBase64 = imageBase64[idx+7:] // 7 is the length of "base64," + } + + // Validate base64 + if _, err := base64.StdEncoding.DecodeString(imageBase64); err != nil { + c.JSON(http.StatusBadRequest, OCRResponse{ + Success: false, + Result: "Invalid base64 image", + }) + return + } + request.ImageBase64 = common.StringPtr(imageBase64) + } + + if req.Scene != "" { + request.Scene = common.StringPtr(req.Scene) + } + + // Perform OCR + response, err := client.GeneralHandwritingOCR(request) + if err != nil { + c.JSON(http.StatusInternalServerError, OCRResponse{ + Success: false, + Result: 
"OCR processing failed", + }) + return + } + + // Extract text from OCR response + var ocrText string + for _, textDetection := range response.Response.TextDetections { + ocrText += *textDetection.DetectedText + "\n" + } + + // Process with Gemini + ctx := c.Request.Context() + client2, err := genai.NewClient(ctx, option.WithAPIKey(h.geminiAPIKey)) + if err != nil { + c.JSON(http.StatusInternalServerError, OCRResponse{ + Success: false, + Result: "Failed to initialize Gemini client", + }) + return + } + defer client2.Close() + + model := client2.GenerativeModel("gemini-2.0-flash-exp") + prompt := "你是一个专业的助手,负责纠正OCR识别结果中的文本。只需要输出识别结果,不需要输出任何解释。\n\n" + ocrText + resp, err := model.GenerateContent(ctx, genai.Text(prompt)) + if err != nil { + c.JSON(http.StatusInternalServerError, OCRResponse{ + Success: false, + Result: "Text processing failed", + }) + return + } + + // Get the processed text from Gemini response + processedText := "" + if len(resp.Candidates) > 0 && len(resp.Candidates[0].Content.Parts) > 0 { + if textPart, ok := resp.Candidates[0].Content.Parts[0].(genai.Text); ok { + processedText = string(textPart) + } + } + + c.JSON(http.StatusOK, OCRResponse{ + Success: true, + OriginalText: ocrText, + Result: processedText, + }) +} \ No newline at end of file diff --git a/.history/pkg/handler/ocr_20250115142525.go b/.history/pkg/handler/ocr_20250115142525.go new file mode 100644 index 0000000..9f2d252 --- /dev/null +++ b/.history/pkg/handler/ocr_20250115142525.go @@ -0,0 +1,127 @@ +package handler + +import ( + "context" + "encoding/base64" + "net/http" + "strings" + + "github.com/gin-gonic/gin" + "github.com/google/generative-ai-go/genai" + "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common" + "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/profile" + ocr "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/ocr/v20181119" + "google.golang.org/api/option" + "pkg/service" +) + +type OCRService struct { + tencentSecretID string 
+ tencentSecretKey string + geminiService *service.GeminiService +} + +func NewOCRService(tencentSecretID, tencentSecretKey string, geminiService *service.GeminiService) *OCRService { + return &OCRService{ + tencentSecretID: tencentSecretID, + tencentSecretKey: tencentSecretKey, + geminiService: geminiService, + } +} + +func (s *OCRService) ProcessImage(ctx context.Context, imageBase64 string) (string, error) { + // Initialize Tencent Cloud client + credential := common.NewCredential(s.tencentSecretID, s.tencentSecretKey) + cpf := profile.NewClientProfile() + cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com" + client, err := ocr.NewClient(credential, "", cpf) + if err != nil { + return "", err + } + + // Create OCR request + request := ocr.NewGeneralHandwritingOCRRequest() + request.ImageBase64 = common.StringPtr(imageBase64) + + // Perform OCR + response, err := client.GeneralHandwritingOCR(request) + if err != nil { + return "", err + } + + // Extract text from OCR response + var ocrText string + for _, textDetection := range response.Response.TextDetections { + ocrText += *textDetection.DetectedText + "\n" + } + + return ocrText, nil +} + +type OCRRequest struct { + ImageBase64 string `json:"image_base64"` + ImageURL string `json:"image_url"` + Scene string `json:"scene"` + APIKey string `json:"apikey" binding:"required"` +} + +type OCRResponse struct { + OriginalText string `json:"original_text"` + Result string `json:"result"` + Success bool `json:"success"` +} + +func (h *OCRService) HandleOCR(c *gin.Context) { + var req OCRRequest + if err := c.ShouldBindJSON(&req); err != nil { + c.JSON(http.StatusBadRequest, OCRResponse{ + Success: false, + Result: "Invalid request format", + }) + return + } + + // Validate API key + if req.APIKey != h.geminiService.APIKey { + c.JSON(http.StatusUnauthorized, OCRResponse{ + Success: false, + Result: "Invalid API key", + }) + return + } + + // Validate that at least one of ImageURL or ImageBase64 is provided + if 
req.ImageURL == "" && req.ImageBase64 == "" { + c.JSON(http.StatusBadRequest, OCRResponse{ + Success: false, + Result: "Either image_url or image_base64 must be provided", + }) + return + } + + // Process image + ocrText, err := h.ProcessImage(c.Request.Context(), req.ImageBase64) + if err != nil { + c.JSON(http.StatusInternalServerError, OCRResponse{ + Success: false, + Result: "OCR processing failed", + }) + return + } + + // Process with Gemini + processedText, err := h.geminiService.ProcessText(c.Request.Context(), ocrText) + if err != nil { + c.JSON(http.StatusInternalServerError, OCRResponse{ + Success: false, + Result: "Text processing failed", + }) + return + } + + c.JSON(http.StatusOK, OCRResponse{ + Success: true, + OriginalText: ocrText, + Result: processedText, + }) +} \ No newline at end of file diff --git a/.history/pkg/handler/ocr_20250115142558.go b/.history/pkg/handler/ocr_20250115142558.go new file mode 100644 index 0000000..9f2d252 --- /dev/null +++ b/.history/pkg/handler/ocr_20250115142558.go @@ -0,0 +1,127 @@ +package handler + +import ( + "context" + "encoding/base64" + "net/http" + "strings" + + "github.com/gin-gonic/gin" + "github.com/google/generative-ai-go/genai" + "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common" + "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/profile" + ocr "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/ocr/v20181119" + "google.golang.org/api/option" + "pkg/service" +) + +type OCRService struct { + tencentSecretID string + tencentSecretKey string + geminiService *service.GeminiService +} + +func NewOCRService(tencentSecretID, tencentSecretKey string, geminiService *service.GeminiService) *OCRService { + return &OCRService{ + tencentSecretID: tencentSecretID, + tencentSecretKey: tencentSecretKey, + geminiService: geminiService, + } +} + +func (s *OCRService) ProcessImage(ctx context.Context, imageBase64 string) (string, error) { + // Initialize Tencent Cloud client + 
credential := common.NewCredential(s.tencentSecretID, s.tencentSecretKey) + cpf := profile.NewClientProfile() + cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com" + client, err := ocr.NewClient(credential, "", cpf) + if err != nil { + return "", err + } + + // Create OCR request + request := ocr.NewGeneralHandwritingOCRRequest() + request.ImageBase64 = common.StringPtr(imageBase64) + + // Perform OCR + response, err := client.GeneralHandwritingOCR(request) + if err != nil { + return "", err + } + + // Extract text from OCR response + var ocrText string + for _, textDetection := range response.Response.TextDetections { + ocrText += *textDetection.DetectedText + "\n" + } + + return ocrText, nil +} + +type OCRRequest struct { + ImageBase64 string `json:"image_base64"` + ImageURL string `json:"image_url"` + Scene string `json:"scene"` + APIKey string `json:"apikey" binding:"required"` +} + +type OCRResponse struct { + OriginalText string `json:"original_text"` + Result string `json:"result"` + Success bool `json:"success"` +} + +func (h *OCRService) HandleOCR(c *gin.Context) { + var req OCRRequest + if err := c.ShouldBindJSON(&req); err != nil { + c.JSON(http.StatusBadRequest, OCRResponse{ + Success: false, + Result: "Invalid request format", + }) + return + } + + // Validate API key + if req.APIKey != h.geminiService.APIKey { + c.JSON(http.StatusUnauthorized, OCRResponse{ + Success: false, + Result: "Invalid API key", + }) + return + } + + // Validate that at least one of ImageURL or ImageBase64 is provided + if req.ImageURL == "" && req.ImageBase64 == "" { + c.JSON(http.StatusBadRequest, OCRResponse{ + Success: false, + Result: "Either image_url or image_base64 must be provided", + }) + return + } + + // Process image + ocrText, err := h.ProcessImage(c.Request.Context(), req.ImageBase64) + if err != nil { + c.JSON(http.StatusInternalServerError, OCRResponse{ + Success: false, + Result: "OCR processing failed", + }) + return + } + + // Process with Gemini + 
processedText, err := h.geminiService.ProcessText(c.Request.Context(), ocrText) + if err != nil { + c.JSON(http.StatusInternalServerError, OCRResponse{ + Success: false, + Result: "Text processing failed", + }) + return + } + + c.JSON(http.StatusOK, OCRResponse{ + Success: true, + OriginalText: ocrText, + Result: processedText, + }) +} \ No newline at end of file diff --git a/.history/pkg/handler/upload_20250115141957.go b/.history/pkg/handler/upload_20250115141957.go new file mode 100644 index 0000000..7f6ac4e --- /dev/null +++ b/.history/pkg/handler/upload_20250115141957.go @@ -0,0 +1,130 @@ +// Upload files to Cloudflare R2 +package handler +import ( + "bytes" + "fmt" + "net/http" + "github.com/gin-gonic/gin" + "github.com/aws/aws-sdk-go/aws" + "github.com/aws/aws-sdk-go/aws/credentials" + "github.com/aws/aws-sdk-go/aws/session" + "github.com/aws/aws-sdk-go/service/s3" +) + +type UploadHandler struct { + accessKey string + secretKey string + bucket string + endpoint string + customDomain string +} + +type UploadRequest struct { + File string `json:"file" binding:"required"` + APIKey string `json:"apikey" binding:"required"` +} + +type UploadResponse struct { + ImageURL string `json:"image_url"` + Success bool `json:"success"` +} + +func NewUploadHandler(accessKey, secretKey, bucket, endpoint, customDomain string) *UploadHandler { + return &UploadHandler{ + accessKey: accessKey, + secretKey: secretKey, + bucket: bucket, + endpoint: endpoint, + customDomain: customDomain, + } +} +// Upload a file to Cloudflare R2. Check whether the file is an image; if it is, upload it to R2 and return the image URL; otherwise return an error. +// Image size is limited to 10 MB; allowed formats are jpg, jpeg, png, gif, bmp, tiff, webp +// HandleUpload uploads a file to Cloudflare R2 +func (h *UploadHandler) HandleUpload(c *gin.Context) { + // Parse the request body + file, header, err := c.Request.FormFile("file") + if err != nil { + c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to read file from request"}) + return + } + defer file.Close() + + // Read the file content + fileBuffer := make([]byte, header.Size) + _, err = file.Read(fileBuffer) 
+ if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read file content"}) + return + } + + // Verify the file type + contentType := http.DetectContentType(fileBuffer) + if !isImage(contentType) { + c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid file type. Only images are allowed"}) + return + } + + // Verify the file size + if header.Size > 10<<20 { // 10MB + c.JSON(http.StatusBadRequest, gin.H{"error": "File size exceeds the limit of 10MB"}) + return + } + + // Upload the file to R2 + imageURL, err := h.uploadToR2(fileBuffer, header.Filename, contentType) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("Failed to upload file to R2: %v", err)}) + return + } + + // Return the result + response := UploadResponse{ + ImageURL: imageURL, + Success: true, + } + c.JSON(http.StatusOK, response) +} + +// uploadToR2 uploads a file to Cloudflare R2 +func (h *UploadHandler) uploadToR2(file []byte, fileName, contentType string) (string, error) { + // Create the S3 session + sess, err := session.NewSession(&aws.Config{ + Endpoint: aws.String(h.endpoint), + Region: aws.String("auto"), + Credentials: credentials.NewStaticCredentials(h.accessKey, h.secretKey, ""), + }) + if err != nil { + return "", fmt.Errorf("failed to create S3 session: %v", err) + } + + // Create the S3 service client + svc := s3.New(sess) + + // Upload the file to R2 + _, err = svc.PutObject(&s3.PutObjectInput{ + Bucket: aws.String(h.bucket), + Key: aws.String(fileName), + Body: bytes.NewReader(file), + ContentType: aws.String(contentType), + ACL: aws.String("public-read"), // Make the file publicly readable + }) + if err != nil { + return "", fmt.Errorf("failed to upload file to R2: %v", err) + } + + // Build the file URL + imageURL := fmt.Sprintf("https://%s/%s", h.customDomain, fileName) + return imageURL, nil +} + +// isImage checks whether the file is an image +func isImage(contentType string) bool { + allowedTypes := []string{"image/jpeg", "image/png", "image/gif", "image/bmp", "image/tiff", "image/webp"} + for _, t := range allowedTypes { + if contentType == t { + return true + } + } + return false 
+} \ No newline at end of file diff --git a/.history/pkg/handler/upload_20250115142533.go b/.history/pkg/handler/upload_20250115142533.go new file mode 100644 index 0000000..630add1 --- /dev/null +++ b/.history/pkg/handler/upload_20250115142533.go @@ -0,0 +1,162 @@ +// Upload files to Cloudflare R2 +package handler +import ( + "bytes" + "fmt" + "net/http" + "github.com/gin-gonic/gin" + "github.com/aws/aws-sdk-go/aws" + "github.com/aws/aws-sdk-go/aws/credentials" + "github.com/aws/aws-sdk-go/aws/session" + "github.com/aws/aws-sdk-go/service/s3" + "encoding/base64" + "io" + "strings" + "your-project/pkg/service" +) + +type UploadHandler struct { + accessKey string + secretKey string + bucket string + endpoint string + customDomain string + ocrService *OCRService + geminiService *service.GeminiService +} + +type MultiUploadResponse struct { + ImageURLs []string `json:"image_urls"` + Text string `json:"text"` + Success bool `json:"success"` +} + +func (h *UploadHandler) HandleMultiUpload(c *gin.Context) { + form, err := c.MultipartForm() + if err != nil { + c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to parse form"}) + return + } + + files := form.File["files"] + if len(files) == 0 { + c.JSON(http.StatusBadRequest, gin.H{"error": "No files uploaded"}) + return + } + + if len(files) > 5 { + c.JSON(http.StatusBadRequest, gin.H{"error": "Maximum 5 files allowed"}) + return + } + + var imageURLs []string + var ocrTexts []string + + for _, fileHeader := range files { + if fileHeader.Size > 10<<20 { // 10MB + c.JSON(http.StatusBadRequest, gin.H{"error": "File size exceeds the limit of 10MB"}) + return + } + + file, err := fileHeader.Open() + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to open file"}) + return + } + defer file.Close() + + // Read file content + fileBytes, err := io.ReadAll(file) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read file"}) + return + } + + // Verify file type + contentType 
:= http.DetectContentType(fileBytes) + if !isImage(contentType) { + c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid file type. Only images are allowed"}) + return + } + + // Convert to base64 + base64Str := base64.StdEncoding.EncodeToString(fileBytes) + + // Process OCR + ocrText, err := h.ocrService.ProcessImage(c.Request.Context(), base64Str) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "OCR processing failed"}) + return + } + ocrTexts = append(ocrTexts, ocrText) + + // Upload to R2 + imageURL, err := h.uploadToR2(fileBytes, fileHeader.Filename, contentType) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to upload file"}) + return + } + imageURLs = append(imageURLs, imageURL) + } + + // Process combined text with Gemini if multiple images + finalText := strings.Join(ocrTexts, "\n") + if len(ocrTexts) > 1 { + prompt := "请将以下多段文字重新组织成一段通顺的文字,保持原意的同时确保语法和逻辑正确:\n\n" + finalText + processedText, err := h.geminiService.ProcessText(c.Request.Context(), prompt) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Text processing failed"}) + return + } + finalText = processedText + } + + c.JSON(http.StatusOK, MultiUploadResponse{ + ImageURLs: imageURLs, + Text: finalText, + Success: true, + }) +} + +// uploadToR2 uploads a file to Cloudflare R2 +func (h *UploadHandler) uploadToR2(file []byte, fileName, contentType string) (string, error) { + // Create the S3 session + sess, err := session.NewSession(&aws.Config{ + Endpoint: aws.String(h.endpoint), + Region: aws.String("auto"), + Credentials: credentials.NewStaticCredentials(h.accessKey, h.secretKey, ""), + }) + if err != nil { + return "", fmt.Errorf("failed to create S3 session: %v", err) + } + + // Create the S3 service client + svc := s3.New(sess) + + // Upload the file to R2 + _, err = svc.PutObject(&s3.PutObjectInput{ + Bucket: aws.String(h.bucket), + Key: aws.String(fileName), + Body: bytes.NewReader(file), + ContentType: aws.String(contentType), + ACL: aws.String("public-read"), // 
Make the file publicly readable + }) + if err != nil { + return "", fmt.Errorf("failed to upload file to R2: %v", err) + } + + // Build the file's URL + imageURL := fmt.Sprintf("https://%s/%s", h.customDomain, fileName) + return imageURL, nil +} + +// isImage reports whether the content type is an allowed image type +func isImage(contentType string) bool { + allowedTypes := []string{"image/jpeg", "image/png", "image/gif", "image/bmp", "image/tiff", "image/webp"} + for _, t := range allowedTypes { + if contentType == t { + return true + } + } + return false +} \ No newline at end of file diff --git a/.history/pkg/handler/upload_20250115142606.go b/.history/pkg/handler/upload_20250115142606.go new file mode 100644 index 0000000..630add1 --- /dev/null +++ b/.history/pkg/handler/upload_20250115142606.go @@ -0,0 +1,162 @@ +// Upload files to Cloudflare R2 +package handler +import ( + "bytes" + "fmt" + "net/http" + "github.com/gin-gonic/gin" + "github.com/aws/aws-sdk-go/aws" + "github.com/aws/aws-sdk-go/aws/credentials" + "github.com/aws/aws-sdk-go/aws/session" + "github.com/aws/aws-sdk-go/service/s3" + "encoding/base64" + "io" + "strings" + "tencenthw/pkg/service" +) + +type UploadHandler struct { + accessKey string + secretKey string + bucket string + endpoint string + customDomain string + ocrService *OCRService + geminiService *service.GeminiService +} + +type MultiUploadResponse struct { + ImageURLs []string `json:"image_urls"` + Text string `json:"text"` + Success bool `json:"success"` +} + +func (h *UploadHandler) HandleMultiUpload(c *gin.Context) { + form, err := c.MultipartForm() + if err != nil { + c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to parse form"}) + return + } + + files := form.File["files"] + if len(files) == 0 { + c.JSON(http.StatusBadRequest, gin.H{"error": "No files uploaded"}) + return + } + + if len(files) > 5 { + c.JSON(http.StatusBadRequest, gin.H{"error": "Maximum 5 files allowed"}) + return + } + + var imageURLs []string + var ocrTexts []string + + for _, fileHeader := range files { + if fileHeader.Size > 10<<20 { //
10MB + c.JSON(http.StatusBadRequest, gin.H{"error": "File size exceeds the limit of 10MB"}) + return + } + + file, err := fileHeader.Open() + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to open file"}) + return + } + defer file.Close() + + // Read file content + fileBytes, err := io.ReadAll(file) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read file"}) + return + } + + // Verify file type + contentType := http.DetectContentType(fileBytes) + if !isImage(contentType) { + c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid file type. Only images are allowed"}) + return + } + + // Convert to base64 + base64Str := base64.StdEncoding.EncodeToString(fileBytes) + + // Process OCR + ocrText, err := h.ocrService.ProcessImage(c.Request.Context(), base64Str) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "OCR processing failed"}) + return + } + ocrTexts = append(ocrTexts, ocrText) + + // Upload to R2 + imageURL, err := h.uploadToR2(fileBytes, fileHeader.Filename, contentType) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to upload file"}) + return + } + imageURLs = append(imageURLs, imageURL) + } + + // Process combined text with Gemini if multiple images + finalText := strings.Join(ocrTexts, "\n") + if len(ocrTexts) > 1 { + prompt := "请将以下多段文字重新组织成一段通顺的文字,保持原意的同时确保语法和逻辑正确:\n\n" + finalText + processedText, err := h.geminiService.ProcessText(c.Request.Context(), prompt) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Text processing failed"}) + return + } + finalText = processedText + } + + c.JSON(http.StatusOK, MultiUploadResponse{ + ImageURLs: imageURLs, + Text: finalText, + Success: true, + }) +} + +// uploadToR2 uploads a file to Cloudflare R2 +func (h *UploadHandler) uploadToR2(file []byte, fileName, contentType string) (string, error) { + // Create the S3 session + sess, err := session.NewSession(&aws.Config{ + Endpoint:
aws.String(h.endpoint), + Region: aws.String("auto"), + Credentials: credentials.NewStaticCredentials(h.accessKey, h.secretKey, ""), + }) + if err != nil { + return "", fmt.Errorf("failed to create S3 session: %v", err) + } + + // Create the S3 service client + svc := s3.New(sess) + + // Upload the file to R2 + _, err = svc.PutObject(&s3.PutObjectInput{ + Bucket: aws.String(h.bucket), + Key: aws.String(fileName), + Body: bytes.NewReader(file), + ContentType: aws.String(contentType), + ACL: aws.String("public-read"), // Make the file publicly readable + }) + if err != nil { + return "", fmt.Errorf("failed to upload file to R2: %v", err) + } + + // Build the file's URL + imageURL := fmt.Sprintf("https://%s/%s", h.customDomain, fileName) + return imageURL, nil +} + +// isImage reports whether the content type is an allowed image type +func isImage(contentType string) bool { + allowedTypes := []string{"image/jpeg", "image/png", "image/gif", "image/bmp", "image/tiff", "image/webp"} + for _, t := range allowedTypes { + if contentType == t { + return true + } + } + return false +} \ No newline at end of file diff --git a/.history/pkg/service/gemini_20250115142509.go b/.history/pkg/service/gemini_20250115142509.go new file mode 100644 index 0000000..0519ecb --- /dev/null +++ b/.history/pkg/service/gemini_20250115142509.go @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/.history/pkg/service/gemini_20250115142516.go b/.history/pkg/service/gemini_20250115142516.go new file mode 100644 index 0000000..4d7bc8b --- /dev/null +++ b/.history/pkg/service/gemini_20250115142516.go @@ -0,0 +1,46 @@ +package service + +import ( + "context" + "github.com/google/generative-ai-go/genai" + "google.golang.org/api/option" +) + +type GeminiService struct { + apiKey string + client *genai.Client +} + +func NewGeminiService(apiKey string) (*GeminiService, error) { + ctx := context.Background() + client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey)) + if err != nil { + return nil, err + } + + return &GeminiService{ + apiKey: apiKey, + client: client, + }, nil +} + +func (s *GeminiService)
Close() { + if s.client != nil { + s.client.Close() + } +} + +func (s *GeminiService) ProcessText(ctx context.Context, prompt string) (string, error) { + model := s.client.GenerativeModel("gemini-2.0-flash-exp") + resp, err := model.GenerateContent(ctx, genai.Text(prompt)) + if err != nil { + return "", err + } + + if len(resp.Candidates) > 0 && len(resp.Candidates[0].Content.Parts) > 0 { + if textPart, ok := resp.Candidates[0].Content.Parts[0].(genai.Text); ok { + return string(textPart), nil + } + } + return "", nil +} \ No newline at end of file diff --git a/.history/pkg/service/gemini_20250115142545.go b/.history/pkg/service/gemini_20250115142545.go new file mode 100644 index 0000000..4d7bc8b --- /dev/null +++ b/.history/pkg/service/gemini_20250115142545.go @@ -0,0 +1,46 @@ +package service + +import ( + "context" + "github.com/google/generative-ai-go/genai" + "google.golang.org/api/option" +) + +type GeminiService struct { + apiKey string + client *genai.Client +} + +func NewGeminiService(apiKey string) (*GeminiService, error) { + ctx := context.Background() + client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey)) + if err != nil { + return nil, err + } + + return &GeminiService{ + apiKey: apiKey, + client: client, + }, nil +} + +func (s *GeminiService) Close() { + if s.client != nil { + s.client.Close() + } +} + +func (s *GeminiService) ProcessText(ctx context.Context, prompt string) (string, error) { + model := s.client.GenerativeModel("gemini-2.0-flash-exp") + resp, err := model.GenerateContent(ctx, genai.Text(prompt)) + if err != nil { + return "", err + } + + if len(resp.Candidates) > 0 && len(resp.Candidates[0].Content.Parts) > 0 { + if textPart, ok := resp.Candidates[0].Content.Parts[0].(genai.Text); ok { + return string(textPart), nil + } + } + return "", nil +} \ No newline at end of file diff --git a/README.md b/README.md index 7000334..be2f5ee 100644 --- a/README.md +++ b/README.md @@ -41,4 +41,131 @@ tencenthw/ └── handler/ └── ocr.go └── 
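rate.go -``` \ No newline at end of file +``` + +# OCR Image Processing Service + +This service combines OCR recognition, image storage, and text processing. It accepts multi-image uploads, runs OCR on each image automatically, and can reorganize the recognized text into a coherent passage. + +## Features + +- Multi-image upload (up to 5 images) +- Automatic OCR text recognition +- Intelligent text reorganization (multi-image requests) +- Cloud image storage +- Support for multiple image formats + +## API Reference + +### 1. Multi-image upload endpoint + +**Endpoint**: `/upload` +**Method**: POST +**Content-Type**: multipart/form-data + +**Request parameters**: +- `files`: array of image files (1 to 5 images) + +**Supported image formats**: +- JPEG/JPG +- PNG +- GIF +- BMP +- TIFF +- WEBP + +**File size limit**: 10MB per file + +**Example request**: + +```bash +curl -X POST \ +'http://your-domain/upload' \ +-H 'Content-Type: multipart/form-data' \ +-F 'files=@image1.jpg' \ +-F 'files=@image2.jpg' +``` +**Response format**: +```json +{ +"image_urls": [ +"https://your-domain/image1.jpg", +"https://your-domain/image2.jpg" +], +"text": "reorganized text", +"success": true +} +``` +### 2. OCR recognition endpoint + +**Endpoint**: `/ocr` +**Method**: POST +**Content-Type**: application/json + +**Request parameters**: +```json +{ +"image_base64": "base64-encoded image content", +"image_url": "image URL (optional; image_base64 takes precedence)", +"scene": "scene type (optional)", +"apikey": "your API key" +} +``` +**Response format**: +```json +{ +"original_text": "raw recognized text", +"result": "processed text", +"success": true +} +``` + +## Error codes + +| HTTP status | Error message | Likely cause | +|------------|----------|----------| +| 400 | Invalid request format | Malformed request | +| 400 | No files uploaded | No file was uploaded | +| 400 | Maximum 5 files allowed | Too many files in one request | +| 400 | File size exceeds the limit of 10MB | File is too large | +| 400 | Invalid file type | Unsupported file type | +| 401 | Invalid API key | The API key is invalid | +| 500 | OCR processing failed | OCR processing failed | +| 500 | Text processing failed | Text processing failed | + +## Notes + +1. Image upload recommendations: + - Make sure the image is clear and legible + - Keep each image under 10MB + - Use a supported image format + +2. OCR recommendations: + - For multi-image requests, the service reorganizes the combined text automatically + - For a single image, the recognition result is returned directly + +3. 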
API call limits: + - A valid API key is required + - Keep the number of concurrent requests under control + +## Deployment requirements + +- Go 1.16+ +- The configuration must provide: + - Tencent Cloud OCR settings + - Cloudflare R2 storage settings + - Gemini API settings + - API key + +## Environment variables +```env +TENCENT_SECRET_ID=your_secret_id +TENCENT_SECRET_KEY=your_secret_key +GEMINI_API_KEY=your_gemini_api_key +API_KEY=your_api_key +R2_ACCESS_KEY=your_r2_access_key +R2_SECRET_KEY=your_r2_secret_key +R2_BUCKET=your_bucket_name +R2_ENDPOINT=your_r2_endpoint +R2_CUSTOM_DOMAIN=your_custom_domain +``` diff --git a/cmd/main.go b/cmd/main.go new file mode 100644 index 0000000..5460f36 --- /dev/null +++ b/cmd/main.go @@ -0,0 +1,67 @@ +package main + +import ( + "log" + + "github.com/gin-gonic/gin" + "tencenthw/pkg/config" + "tencenthw/pkg/handler" + "tencenthw/pkg/service" +) + +func main() { + // Load configuration + cfg, err := config.LoadConfig() + if err != nil { + log.Fatalf("Failed to load configuration: %v", err) + } + // Initialize services + geminiService, err := service.NewGeminiService(cfg.GeminiAPIKey) + if err != nil { + log.Fatal(err) + } + defer geminiService.Close() + + ocrService := handler.NewOCRService( + cfg.TencentSecretID, + cfg.TencentSecretKey, + geminiService, + ) + + // Initialize handlers + ocrHandler := handler.NewOCRHandler( + cfg.TencentSecretID, + cfg.TencentSecretKey, + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + rateHandler := handler.NewRateHandler( + cfg.GeminiAPIKey, + cfg.APIKey, + ) + + uploadHandler := handler.NewUploadHandler( + cfg.R2AccessKey, + cfg.R2SecretKey, + cfg.R2Bucket, + cfg.R2Endpoint, + cfg.R2CustomDomain, + ocrService, + geminiService, + ) + + // Setup Gin router + r := gin.Default() + + // Register routes + r.POST("/ocr", ocrHandler.HandleOCR) + r.POST("/rate", rateHandler.HandleRate) + // upload file to server + r.POST("/upload", uploadHandler.HandleMultiUpload) + + // Start server + if err := r.Run("localhost:8080"); err != nil { + log.Fatalf("Failed to start server: %v", err) + } +} \ No newline at end of file diff --git a/hwserver b/hwserver deleted file mode 100755 index 
4e1bad9..0000000 Binary files a/hwserver and /dev/null differ diff --git a/pkg/handler/ocr.go b/pkg/handler/ocr.go index bbf9be0..9f2d252 100644 --- a/pkg/handler/ocr.go +++ b/pkg/handler/ocr.go @@ -1,6 +1,7 @@ package handler import ( + "context" "encoding/base64" "net/http" "strings" @@ -11,13 +12,50 @@ import ( "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/profile" ocr "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/ocr/v20181119" "google.golang.org/api/option" + "tencenthw/pkg/service" ) -type OCRHandler struct { +type OCRService struct { tencentSecretID string tencentSecretKey string - geminiAPIKey string - apiKey string + geminiService *service.GeminiService +} + +func NewOCRService(tencentSecretID, tencentSecretKey string, geminiService *service.GeminiService) *OCRService { + return &OCRService{ + tencentSecretID: tencentSecretID, + tencentSecretKey: tencentSecretKey, + geminiService: geminiService, + } +} + +func (s *OCRService) ProcessImage(ctx context.Context, imageBase64 string) (string, error) { + // Initialize Tencent Cloud client + credential := common.NewCredential(s.tencentSecretID, s.tencentSecretKey) + cpf := profile.NewClientProfile() + cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com" + client, err := ocr.NewClient(credential, "", cpf) + if err != nil { + return "", err + } + + // Create OCR request + request := ocr.NewGeneralHandwritingOCRRequest() + request.ImageBase64 = common.StringPtr(imageBase64) + + // Perform OCR + response, err := client.GeneralHandwritingOCR(request) + if err != nil { + return "", err + } + + // Extract text from OCR response + var ocrText string + for _, textDetection := range response.Response.TextDetections { + ocrText += *textDetection.DetectedText + "\n" + } + + return ocrText, nil } type OCRRequest struct { @@ -33,16 +71,7 @@ type OCRResponse struct { Success bool `json:"success"` } -func NewOCRHandler(tencentSecretID, tencentSecretKey, geminiAPIKey, apiKey string) *OCRHandler { - return 
&OCRHandler{ - tencentSecretID: tencentSecretID, - tencentSecretKey: tencentSecretKey, - geminiAPIKey: geminiAPIKey, - apiKey: apiKey, - } -} - -func (h *OCRHandler) HandleOCR(c *gin.Context) { +func (h *OCRService) HandleOCR(c *gin.Context) { var req OCRRequest if err := c.ShouldBindJSON(&req); err != nil { c.JSON(http.StatusBadRequest, OCRResponse{ @@ -53,7 +82,7 @@ func (h *OCRHandler) HandleOCR(c *gin.Context) { } // Validate API key - if req.APIKey != h.apiKey { + if req.APIKey != h.geminiService.APIKey { c.JSON(http.StatusUnauthorized, OCRResponse{ Success: false, Result: "Invalid API key", @@ -70,49 +99,8 @@ func (h *OCRHandler) HandleOCR(c *gin.Context) { return } - // Initialize Tencent Cloud client - credential := common.NewCredential(h.tencentSecretID, h.tencentSecretKey) - cpf := profile.NewClientProfile() - cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com" - client, err := ocr.NewClient(credential, "", cpf) - if err != nil { - c.JSON(http.StatusInternalServerError, OCRResponse{ - Success: false, - Result: "Failed to initialize OCR client", - }) - return - } - - // Create OCR request - request := ocr.NewGeneralHandwritingOCRRequest() - - // Prioritize ImageURL if both are provided - if req.ImageURL != "" { - request.ImageUrl = common.StringPtr(req.ImageURL) - } else { - // Remove base64 prefix if exists - imageBase64 := req.ImageBase64 - if idx := strings.Index(imageBase64, "base64,"); idx != -1 { - imageBase64 = imageBase64[idx+7:] // 7 is the length of "base64," - } - - // Validate base64 - if _, err := base64.StdEncoding.DecodeString(imageBase64); err != nil { - c.JSON(http.StatusBadRequest, OCRResponse{ - Success: false, - Result: "Invalid base64 image", - }) - return - } - request.ImageBase64 = common.StringPtr(imageBase64) - } - - if req.Scene != "" { - request.Scene = common.StringPtr(req.Scene) - } - - // Perform OCR - response, err := client.GeneralHandwritingOCR(request) + // Process image + ocrText, err := 
h.ProcessImage(c.Request.Context(), req.ImageBase64) if err != nil { c.JSON(http.StatusInternalServerError, OCRResponse{ Success: false, @@ -121,27 +109,8 @@ func (h *OCRHandler) HandleOCR(c *gin.Context) { return } - // Extract text from OCR response - var ocrText string - for _, textDetection := range response.Response.TextDetections { - ocrText += *textDetection.DetectedText + "\n" - } - // Process with Gemini - ctx := c.Request.Context() - client2, err := genai.NewClient(ctx, option.WithAPIKey(h.geminiAPIKey)) - if err != nil { - c.JSON(http.StatusInternalServerError, OCRResponse{ - Success: false, - Result: "Failed to initialize Gemini client", - }) - return - } - defer client2.Close() - - model := client2.GenerativeModel("gemini-2.0-flash-exp") - prompt := "你是一个专业的助手,负责纠正OCR识别结果中的文本。只需要输出识别结果,不需要输出任何解释。\n\n" + ocrText - resp, err := model.GenerateContent(ctx, genai.Text(prompt)) + processedText, err := h.geminiService.ProcessText(c.Request.Context(), ocrText) if err != nil { c.JSON(http.StatusInternalServerError, OCRResponse{ Success: false, @@ -150,14 +119,6 @@ func (h *OCRHandler) HandleOCR(c *gin.Context) { return } - // Get the processed text from Gemini response - processedText := "" - if len(resp.Candidates) > 0 && len(resp.Candidates[0].Content.Parts) > 0 { - if textPart, ok := resp.Candidates[0].Content.Parts[0].(genai.Text); ok { - processedText = string(textPart) - } - } - c.JSON(http.StatusOK, OCRResponse{ Success: true, OriginalText: ocrText, diff --git a/pkg/handler/upload.go b/pkg/handler/upload.go index 7f6ac4e..630add1 100644 --- a/pkg/handler/upload.go +++ b/pkg/handler/upload.go @@ -9,81 +9,113 @@ import ( "github.com/aws/aws-sdk-go/aws/credentials" "github.com/aws/aws-sdk-go/aws/session" "github.com/aws/aws-sdk-go/service/s3" + "encoding/base64" + "io" + "strings" + "tencenthw/pkg/service" ) type UploadHandler struct { - accessKey string - secretKey string - bucket string - endpoint string - customDomain string + accessKey string + 
secretKey string + bucket string + endpoint string + customDomain string + ocrService *OCRService + geminiService *service.GeminiService } -type UploadRequest struct { - File string `json:"file" binding:"required"` - APIKey string `json:"apikey" binding:"required"` +type MultiUploadResponse struct { + ImageURLs []string `json:"image_urls"` + Text string `json:"text"` + Success bool `json:"success"` } -type UploadResponse struct { - ImageURL string `json:"image_url"` - Success bool `json:"success"` -} - -func NewUploadHandler(accessKey, secretKey, bucket, endpoint, customDomain string) *UploadHandler { - return &UploadHandler{ - accessKey: accessKey, - secretKey: secretKey, - bucket: bucket, - endpoint: endpoint, - customDomain: customDomain, - } -} -// Upload a file to Cloudflare R2. If the file is an image, upload it to R2 and return its URL; otherwise return an error. -// Images are limited to 10MB; accepted formats are jpg, jpeg, png, gif, bmp, tiff, webp -// HandleUpload uploads a file to Cloudflare R2 -func (h *UploadHandler) HandleUpload(c *gin.Context) { - // Parse the request body - file, header, err := c.Request.FormFile("file") +func (h *UploadHandler) HandleMultiUpload(c *gin.Context) { + form, err := c.MultipartForm() if err != nil { - c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to read file from request"}) - return - } - defer file.Close() - - // Read the file content - fileBuffer := make([]byte, header.Size) - _, err = file.Read(fileBuffer) - if err != nil { - c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read file content"}) + c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to parse form"}) return } - // Verify the file type - contentType := http.DetectContentType(fileBuffer) - if !isImage(contentType) { - c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid file type. 
Only images are allowed"}) + files := form.File["files"] + if len(files) == 0 { + c.JSON(http.StatusBadRequest, gin.H{"error": "No files uploaded"}) return } - // Verify the file size - if header.Size > 10<<20 { // 10MB - c.JSON(http.StatusBadRequest, gin.H{"error": "File size exceeds the limit of 10MB"}) + if len(files) > 5 { + c.JSON(http.StatusBadRequest, gin.H{"error": "Maximum 5 files allowed"}) return } - // Upload the file to R2 - imageURL, err := h.uploadToR2(fileBuffer, header.Filename, contentType) - if err != nil { - c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("Failed to upload file to R2: %v", err)}) - return + var imageURLs []string + var ocrTexts []string + + for _, fileHeader := range files { + if fileHeader.Size > 10<<20 { // 10MB + c.JSON(http.StatusBadRequest, gin.H{"error": "File size exceeds the limit of 10MB"}) + return + } + + file, err := fileHeader.Open() + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to open file"}) + return + } + defer file.Close() + + // Read file content + fileBytes, err := io.ReadAll(file) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read file"}) + return + } + + // Verify file type + contentType := http.DetectContentType(fileBytes) + if !isImage(contentType) { + c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid file type. 
Only images are allowed"}) + return + } + + // Convert to base64 + base64Str := base64.StdEncoding.EncodeToString(fileBytes) + + // Process OCR + ocrText, err := h.ocrService.ProcessImage(c.Request.Context(), base64Str) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "OCR processing failed"}) + return + } + ocrTexts = append(ocrTexts, ocrText) + + // Upload to R2 + imageURL, err := h.uploadToR2(fileBytes, fileHeader.Filename, contentType) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to upload file"}) + return + } + imageURLs = append(imageURLs, imageURL) } - // Return the result - response := UploadResponse{ - ImageURL: imageURL, - Success: true, + // Process combined text with Gemini if multiple images + finalText := strings.Join(ocrTexts, "\n") + if len(ocrTexts) > 1 { + prompt := "请将以下多段文字重新组织成一段通顺的文字,保持原意的同时确保语法和逻辑正确:\n\n" + finalText + processedText, err := h.geminiService.ProcessText(c.Request.Context(), prompt) + if err != nil { + c.JSON(http.StatusInternalServerError, gin.H{"error": "Text processing failed"}) + return + } + finalText = processedText } - c.JSON(http.StatusOK, response) + + c.JSON(http.StatusOK, MultiUploadResponse{ + ImageURLs: imageURLs, + Text: finalText, + Success: true, + }) } // uploadToR2 uploads a file to Cloudflare R2 diff --git a/pkg/service/gemini.go b/pkg/service/gemini.go new file mode 100644 index 0000000..4d7bc8b --- /dev/null +++ b/pkg/service/gemini.go @@ -0,0 +1,46 @@ +package service + +import ( + "context" + "github.com/google/generative-ai-go/genai" + "google.golang.org/api/option" +) + +type GeminiService struct { + apiKey string + client *genai.Client +} + +func NewGeminiService(apiKey string) (*GeminiService, error) { + ctx := context.Background() + client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey)) + if err != nil { + return nil, err + } + + return &GeminiService{ + apiKey: apiKey, + client: client, + }, nil +} + +func (s *GeminiService) Close() { + if s.client != nil { 
+ s.client.Close() + } +} + +func (s *GeminiService) ProcessText(ctx context.Context, prompt string) (string, error) { + model := s.client.GenerativeModel("gemini-2.0-flash-exp") + resp, err := model.GenerateContent(ctx, genai.Text(prompt)) + if err != nil { + return "", err + } + + if len(resp.Candidates) > 0 && len(resp.Candidates[0].Content.Parts) > 0 { + if textPart, ok := resp.Candidates[0].Content.Parts[0].(genai.Text); ok { + return string(textPart), nil + } + } + return "", nil +} \ No newline at end of file diff --git a/rate b/rate deleted file mode 100755 index 543694b..0000000 Binary files a/rate and /dev/null differ
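
A note on the per-file loop in `HandleMultiUpload`: `defer file.Close()` inside a `for` loop keeps every uploaded file open until the handler returns. One possible way to avoid that, sketched here under assumptions, is to move the read-and-validate step into a helper so each file is closed before the next one is opened. The names `readImage`, `sniffImage`, `errTooLarge`, `allowedImageTypes`, and `maxUploadBytes` are hypothetical and do not appear in the diff; only standard-library calls are used. As far as I know, `http.DetectContentType` has no TIFF signature, so the `image/tiff` entry may never match; that is worth verifying before the README advertises TIFF support.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
)

// maxUploadBytes mirrors the 10MB per-file limit checked in HandleMultiUpload.
const maxUploadBytes = 10 << 20

// errTooLarge is a hypothetical sentinel error; the diff reports this case inline.
var errTooLarge = errors.New("file size exceeds the limit of 10MB")

// allowedImageTypes copies the allow-list from isImage in the diff.
// Assumption: image/tiff is kept for parity, although DetectContentType
// may never report it.
var allowedImageTypes = map[string]bool{
	"image/jpeg": true,
	"image/png":  true,
	"image/gif":  true,
	"image/bmp":  true,
	"image/tiff": true,
	"image/webp": true,
}

// sniffImage returns the sniffed content type when it is on the allow-list.
func sniffImage(data []byte) (string, error) {
	contentType := http.DetectContentType(data)
	if !allowedImageTypes[contentType] {
		return "", fmt.Errorf("invalid file type %q", contentType)
	}
	return contentType, nil
}

// readImage reads one upload and closes it before returning, so a loop over
// several files never holds more than one of them open at a time.
func readImage(f io.ReadCloser) ([]byte, string, error) {
	defer f.Close()

	// Read at most one byte past the limit: enough to detect an oversized
	// body without buffering all of it.
	data, err := io.ReadAll(io.LimitReader(f, maxUploadBytes+1))
	if err != nil {
		return nil, "", err
	}
	if len(data) > maxUploadBytes {
		return nil, "", errTooLarge
	}

	contentType, err := sniffImage(data)
	if err != nil {
		return nil, "", err
	}
	return data, contentType, nil
}
```

Inside the loop, the handler would call `readImage(file)` right after `fileHeader.Open()` and map the returned error to the matching HTTP status, instead of deferring the close for the lifetime of the request.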