update

parent bc9ba4855a
commit 5fbac922be

2
.gitignore
vendored
@ -1 +1,3 @@
.env
rate
hwserver

44
.history/README_20250115141957.md
Normal file

79
.history/README_20250115142949.md
Normal file

85
.history/README_20250115143018.md
Normal file

87
.history/README_20250115143037.md
Normal file

87
.history/README_20250115143045.md
Normal file

88
.history/README_20250115143103.md
Normal file

92
.history/README_20250115143114.md
Normal file

97
.history/README_20250115143127.md
Normal file

98
.history/README_20250115143139.md
Normal file

105
.history/README_20250115143149.md
Normal file

106
.history/README_20250115143209.md
Normal file

113
.history/README_20250115143224.md
Normal file

114
.history/README_20250115143237.md
Normal file

121
.history/README_20250115143250.md
Normal file

160
.history/README_20250115143303.md
Normal file
@ -0,0 +1,160 @@

# Tencent Handwriting Recognition API Proxy

1. Takes a Base64-encoded image as input and returns the recognition result.

2. Accepts JSON via POST and returns JSON, following RESTful conventions.

3. Input parameters:
   - Image Base64, string
   - Scene: scene type, defaults to null; optional value `only_hw`, string
   - apikey: fixed to `1234567890` during testing, string

4. Output parameters:
   - Recognition result, string
   - Success flag, boolean

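5. Image recognition uses the Tencent Cloud general handwriting OCR SDK. The service is built in Go with the Gin framework.

6. Flow:
   - After receiving the POST data, validate it (JSON format, Base64 format, etc.).
   - Call the Tencent general handwriting OCR SDK to recognize the image.
   - Call the Google Gemini API to polish the wording and remove likely recognition errors, using the following prompt:

     ```
     You are a professional assistant responsible for correcting the text in OCR recognition results. Output only the recognition result, without any explanation.
     ```

   - Return the recognition result.

7. Google Gemini API key: "your key"

8. tencentSecretId = "your id", tencentSecretKey = "your secret"

9. The keys are stored in a `.env` file and loaded with a dotenv library.

10. Add a rate feature for grading essays.
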
Project structure:

```
tencenthw/
├── go.mod
├── go.sum
├── cmd/
│   └── server/
│       └── main.go
└── pkg/
    ├── config/
    │   └── config.go
    └── handler/
        ├── ocr.go
        └── rate.go
```

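`pkg/config/config.go` is a natural place to load the keys from `.env`. The README says a dotenv library does the loading. To stay dependency-free, the sketch below parses simple `KEY=VALUE` lines itself, so it only illustrates the idea. The variable names `GEMINI_API_KEY`, `TENCENT_SECRET_ID` and `TENCENT_SECRET_KEY` are assumptions, not names taken from the project.

```go
package main

import (
	"bufio"
	"errors"
	"strings"
)

// Config holds the credentials listed in the README (field names are illustrative).
type Config struct {
	GeminiAPIKey     string
	TencentSecretID  string
	TencentSecretKey string
}

// parseDotEnv reads simple KEY=VALUE lines, skipping blank lines and # comments
// and stripping one pair of surrounding double quotes from values.
func parseDotEnv(content string) map[string]string {
	vars := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		if len(value) >= 2 && strings.HasPrefix(value, `"`) && strings.HasSuffix(value, `"`) {
			value = value[1 : len(value)-1]
		}
		vars[strings.TrimSpace(key)] = value
	}
	return vars
}

// loadConfig builds a Config from parsed variables and fails if any key is missing.
func loadConfig(vars map[string]string) (Config, error) {
	cfg := Config{
		GeminiAPIKey:     vars["GEMINI_API_KEY"],
		TencentSecretID:  vars["TENCENT_SECRET_ID"],
		TencentSecretKey: vars["TENCENT_SECRET_KEY"],
	}
	if cfg.GeminiAPIKey == "" || cfg.TencentSecretID == "" || cfg.TencentSecretKey == "" {
		return Config{}, errors.New("missing required keys in .env")
	}
	return cfg, nil
}
```

A library such as `github.com/joho/godotenv` works differently: `godotenv.Load()` copies the file into the process environment, and the same fields are then read with `os.Getenv`.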
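# OCR Image Processing Service

This service combines OCR, image storage, and text processing. It accepts uploads of multiple images, runs OCR on them automatically, and can intelligently organize the recognized text.

## Features

- Multi-image upload (up to 5 images)
- Automatic OCR text recognition
- Smart text organization (multi-image scenarios)
- Cloud image storage
- Support for multiple image formats
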
## API 接口说明
|
||||
|
||||
### 1. 多图片上传接口
|
||||
|
||||
**接口地址**: `/upload`
|
||||
**请求方法**: POST
|
||||
**Content-Type**: multipart/form-data
|
||||
|
||||
**请求参数**:
|
||||
- `files`: 图片文件数组(支持1-5张图片)
|
||||
|
||||
**支持的图片格式**:
|
||||
- JPEG/JPG
|
||||
- PNG
|
||||
- GIF
|
||||
- BMP
|
||||
- TIFF
|
||||
- WEBP
|
||||
|
||||
**文件大小限制**: 每个文件最大10MB
|
||||
|
||||
**请求示例**:
|
||||
|
||||
```bash
|
||||
curl -X POST \
|
||||
'http://your-domain/upload' \
|
||||
-H 'Content-Type: multipart/form-data' \
|
||||
-F 'files=@image1.jpg' \
|
||||
-F 'files=@image2.jpg'
|
||||
```
|
||||
**响应格式**:
|
||||
```json
|
||||
{
|
||||
"image_urls": [
|
||||
"https://your-domain/image1.jpg",
|
||||
"https://your-domain/image2.jpg"
|
||||
],
|
||||
"text": "整理后的文本内容",
|
||||
"success": true
|
||||
}
|
||||
```
|
||||
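Below is a minimal Go client sketch for the `/upload` endpoint, built on the standard `mime/multipart` package. `imageFile` and `newUploadRequest` are hypothetical names, and the base URL is a placeholder. Each image goes under the repeated `files` field. The `Content-Type` comes from `FormDataContentType()` so that it includes the multipart boundary; in Go, setting a bare `multipart/form-data` header by hand would leave the boundary out.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// imageFile is a hypothetical in-memory image: a filename plus its bytes.
type imageFile struct {
	Name string
	Data []byte
}

// newUploadRequest builds a multipart/form-data POST to baseURL+"/upload",
// adding every image under the repeated "files" field.
func newUploadRequest(baseURL string, images []imageFile) (*http.Request, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	for _, img := range images {
		part, err := w.CreateFormFile("files", img.Name)
		if err != nil {
			return nil, err
		}
		if _, err := part.Write(img.Data); err != nil {
			return nil, err
		}
	}
	// Close writes the terminating boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/upload", &buf)
	if err != nil {
		return nil, err
	}
	// FormDataContentType returns "multipart/form-data; boundary=...".
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newUploadRequest("http://localhost:8080", []imageFile{
		{Name: "image1.jpg", Data: []byte("fake-jpeg-1")},
		{Name: "image2.jpg", Data: []byte("fake-jpeg-2")},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Content-Type"))
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```

`CreateFormFile` labels each part as `application/octet-stream`. The server's file-type check is not shown here. If it relies on the part's `Content-Type` header rather than the file name or contents, build each part with `CreatePart` and an explicit header instead.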
### 2. OCR Endpoint

**Endpoint**: `/ocr`
**Method**: POST
**Content-Type**: application/json

**Request parameters**:
```json
{
  "image_base64": "Base64-encoded image content",
  "image_url": "Image URL (optional; image_base64 takes priority)",
  "scene": "Scene type (optional)",
  "apikey": "Your API key"
}
```
**Response format**:
```json
{
  "original_text": "Raw OCR text",
  "result": "Processed text",
  "success": true
}
```

## Error Codes

| HTTP status | Error message | Likely cause |
|-------------|---------------|--------------|
| 400 | Invalid request format | Malformed request |
| 400 | No files uploaded | No file was uploaded |
| 400 | Maximum 5 files allowed | Too many files |
| 400 | File size exceeds the limit of 10MB | A file is too large |
| 400 | Invalid file type | Unsupported file type |
| 401 | Invalid API key | The API key is wrong |
| 500 | OCR processing failed | The OCR call failed |
| 500 | Text processing failed | Text post-processing failed |

## Notes

1. Image upload tips:
   - Make sure images are clear and legible
   - Keep each image under 10MB
   - Use a supported image format

2. OCR tips:
   - For multiple images, the system automatically organizes the text into a logical order
   - For a single image, the recognition result is returned directly

3. API usage limits:
   - A valid API key is required
   - Keep the number of concurrent requests under control

## Deployment Requirements

- Go 1.16+
- The configuration must set:
  - Tencent Cloud OCR settings
  - Cloudflare R2 storage settings
  - Gemini API settings
  - API key

## Environment Variables
171
.history/README_20250115143315.md
Normal file
@ -0,0 +1,171 @@
# Tencent Handwriting Recognition API Proxy

1. Takes a Base64-encoded image and returns the recognition result.
2. Requests are sent as JSON via POST and responses are JSON, in RESTful style.
3. Input parameters:
   - Image Base64, string
   - Scene: recognition scene, null by default, optionally `only_hw`, string
   - apikey: fixed to `1234567890` during testing, string
4. Output parameters:
   - Recognition result, string
   - Whether the call succeeded, boolean
5. Image recognition uses the Tencent Cloud general handwriting OCR SDK; the service is written in Go with the Gin framework.
6. Processing flow:
   - After receiving the POST data, validate it (JSON format, Base64 format, etc.).
   - Call the Tencent Cloud general handwriting OCR SDK to recognize the image.
   - Call the Google Gemini API to tidy the wording and remove likely recognition errors, using the prompt below (in English: "You are a professional assistant responsible for correcting text in OCR results. Output only the recognition result, without any explanation."):
     ```
     你是一个专业的助手,负责纠正OCR识别结果中的文本。只需要输出识别结果,不需要输出任何解释。
     ```
   - Return the recognition result.
7. Google Gemini API key: "your key"
8. tencentSecretId = "your id", tencentSecretKey = "your secret"
9. Keys are stored in a `.env` file and loaded with a dotenv library.
10. Add a `rate` feature for grading essays.

Project structure:

```
tencenthw/
├── go.mod
├── go.sum
├── cmd/
│   └── server/
│       └── main.go
└── pkg/
    ├── config/
    │   └── config.go
    └── handler/
        ├── ocr.go
        └── rate.go
```

# OCR Image Processing Service

This service combines OCR, image storage, and text processing. It accepts multi-image uploads, runs OCR on them automatically, and can intelligently organize the recognized text.

## Features

- Multi-image upload (up to 5 images)
- Automatic OCR text recognition
- Intelligent text organization (multi-image requests)
- Cloud image storage
- Multiple image formats

## API Reference

### 1. Multi-Image Upload Endpoint

**Endpoint**: `/upload`
**Method**: POST
**Content-Type**: multipart/form-data

**Request parameters**:
- `files`: array of image files (1 to 5 images)

**Supported image formats**:
- JPEG/JPG
- PNG
- GIF
- BMP
- TIFF
- WEBP

**File size limit**: at most 10MB per file

**Example request**:

```bash
curl -X POST \
  'http://your-domain/upload' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@image1.jpg' \
  -F 'files=@image2.jpg'
```
**Response format**:
```json
{
  "image_urls": [
    "https://your-domain/image1.jpg",
    "https://your-domain/image2.jpg"
  ],
  "text": "Organized text content",
  "success": true
}
```
### 2. OCR Endpoint

**Endpoint**: `/ocr`
**Method**: POST
**Content-Type**: application/json

**Request parameters**:
```json
{
  "image_base64": "Base64-encoded image content",
  "image_url": "Image URL (optional; image_base64 takes priority)",
  "scene": "Scene type (optional)",
  "apikey": "Your API key"
}
```
**Response format**:
```json
{
  "original_text": "Raw OCR text",
  "result": "Processed text",
  "success": true
}
```

## Error Codes

| HTTP status | Error message | Likely cause |
|-------------|---------------|--------------|
| 400 | Invalid request format | Malformed request |
| 400 | No files uploaded | No file was uploaded |
| 400 | Maximum 5 files allowed | Too many files |
| 400 | File size exceeds the limit of 10MB | A file is too large |
| 400 | Invalid file type | Unsupported file type |
| 401 | Invalid API key | The API key is wrong |
| 500 | OCR processing failed | The OCR call failed |
| 500 | Text processing failed | Text post-processing failed |

## Notes

1. Image upload tips:
   - Make sure images are clear and legible
   - Keep each image under 10MB
   - Use a supported image format

2. OCR tips:
   - For multiple images, the system automatically organizes the text into a logical order
   - For a single image, the recognition result is returned directly

3. API usage limits:
   - A valid API key is required
   - Keep the number of concurrent requests under control

## Deployment Requirements

- Go 1.16+
- The configuration must set:
  - Tencent Cloud OCR settings
  - Cloudflare R2 storage settings
  - Gemini API settings
  - API key

## Environment Variables

```env
TENCENT_SECRET_ID=your_secret_id
TENCENT_SECRET_KEY=your_secret_key
GEMINI_API_KEY=your_gemini_api_key
API_KEY=your_api_key
R2_ACCESS_KEY=your_r2_access_key
R2_SECRET_KEY=your_r2_secret_key
R2_BUCKET=your_bucket_name
R2_ENDPOINT=your_r2_endpoint
R2_CUSTOM_DOMAIN=your_custom_domain
```
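These keys are loaded from `.env` with a dotenv library. The Go sketch below is a dependency-free illustration only; it is not the project's `config.LoadConfig`. It parses simple `KEY=VALUE` lines and reports which of the variables listed above are missing. It deliberately skips the quoting, escapes, and `export` prefixes that a real dotenv parser handles. It uses `strings.SplitN` rather than `strings.Cut` so that it also builds on Go 1.16.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// requiredKeys are the variables listed in the .env example above.
var requiredKeys = []string{
	"TENCENT_SECRET_ID", "TENCENT_SECRET_KEY", "GEMINI_API_KEY", "API_KEY",
	"R2_ACCESS_KEY", "R2_SECRET_KEY", "R2_BUCKET", "R2_ENDPOINT", "R2_CUSTOM_DOMAIN",
}

// parseEnv is a simplified stand-in for a dotenv library: it reads KEY=VALUE
// lines and skips blank lines and # comments. Quotes, escapes, "export"
// prefixes, and multi-line values are NOT handled.
func parseEnv(content string) (map[string]string, error) {
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		kv := strings.SplitN(line, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed line: %q", line)
		}
		vars[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
	}
	return vars, sc.Err()
}

// missingKeys lists required variables that are absent or empty.
func missingKeys(vars map[string]string) []string {
	var missing []string
	for _, k := range requiredKeys {
		if vars[k] == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	vars, err := parseEnv("# sample\nAPI_KEY=your_api_key\nR2_BUCKET=your_bucket_name\n")
	if err != nil {
		panic(err)
	}
	fmt.Println("missing:", missingKeys(vars))
}
```

Failing fast at startup when `missingKeys` is non-empty is usually clearer than discovering an empty credential on the first OCR request.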
171
.history/README_20250115143318.md
Normal file
@ -0,0 +1,171 @@
# Tencent Handwriting Recognition API Proxy

1. Takes a Base64-encoded image and returns the recognition result.
2. Requests are sent as JSON via POST and responses are JSON, in RESTful style.
3. Input parameters:
   - Image Base64, string
   - Scene: recognition scene, null by default, optionally `only_hw`, string
   - apikey: fixed to `1234567890` during testing, string
4. Output parameters:
   - Recognition result, string
   - Whether the call succeeded, boolean
5. Image recognition uses the Tencent Cloud general handwriting OCR SDK; the service is written in Go with the Gin framework.
6. Processing flow:
   - After receiving the POST data, validate it (JSON format, Base64 format, etc.).
   - Call the Tencent Cloud general handwriting OCR SDK to recognize the image.
   - Call the Google Gemini API to tidy the wording and remove likely recognition errors, using the prompt below (in English: "You are a professional assistant responsible for correcting text in OCR results. Output only the recognition result, without any explanation."):
     ```
     你是一个专业的助手,负责纠正OCR识别结果中的文本。只需要输出识别结果,不需要输出任何解释。
     ```
   - Return the recognition result.
7. Google Gemini API key: "your key"
8. tencentSecretId = "your id", tencentSecretKey = "your secret"
9. Keys are stored in a `.env` file and loaded with a dotenv library.
10. Add a `rate` feature for grading essays.

Project structure:

```
tencenthw/
├── go.mod
├── go.sum
├── cmd/
│   └── server/
│       └── main.go
└── pkg/
    ├── config/
    │   └── config.go
    └── handler/
        ├── ocr.go
        └── rate.go
```

# OCR Image Processing Service

This service combines OCR, image storage, and text processing. It accepts multi-image uploads, runs OCR on them automatically, and can intelligently organize the recognized text.

## Features

- Multi-image upload (up to 5 images)
- Automatic OCR text recognition
- Intelligent text organization (multi-image requests)
- Cloud image storage
- Multiple image formats

## API Reference

### 1. Multi-Image Upload Endpoint

**Endpoint**: `/upload`
**Method**: POST
**Content-Type**: multipart/form-data

**Request parameters**:
- `files`: array of image files (1 to 5 images)

**Supported image formats**:
- JPEG/JPG
- PNG
- GIF
- BMP
- TIFF
- WEBP

**File size limit**: at most 10MB per file

**Example request**:

```bash
curl -X POST \
  'http://your-domain/upload' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@image1.jpg' \
  -F 'files=@image2.jpg'
```
**Response format**:
```json
{
  "image_urls": [
    "https://your-domain/image1.jpg",
    "https://your-domain/image2.jpg"
  ],
  "text": "Organized text content",
  "success": true
}
```
### 2. OCR Endpoint

**Endpoint**: `/ocr`
**Method**: POST
**Content-Type**: application/json

**Request parameters**:
```json
{
  "image_base64": "Base64-encoded image content",
  "image_url": "Image URL (optional; image_base64 takes priority)",
  "scene": "Scene type (optional)",
  "apikey": "Your API key"
}
```
**Response format**:
```json
{
  "original_text": "Raw OCR text",
  "result": "Processed text",
  "success": true
}
```
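An `image_base64` value produced in a browser (for example with `FileReader.readAsDataURL`) usually starts with a `data:image/...;base64,` prefix. The OCR handler strips everything up to `base64,` before validating. The Go helper below does the same and adds an emptiness check. `normalizeImageBase64` is a hypothetical name, and this is a sketch rather than the handler's own code.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// normalizeImageBase64 drops everything up to and including "base64," (the
// data-URL prefix, if any) and checks that the rest is non-empty, valid
// standard Base64.
func normalizeImageBase64(s string) (string, error) {
	const marker = "base64,"
	if idx := strings.Index(s, marker); idx != -1 {
		s = s[idx+len(marker):]
	}
	if s == "" {
		return "", errors.New("empty image data")
	}
	if _, err := base64.StdEncoding.DecodeString(s); err != nil {
		return "", fmt.Errorf("invalid base64 image: %w", err)
	}
	return s, nil
}

func main() {
	out, err := normalizeImageBase64("data:image/png;base64,aGVsbG8=")
	fmt.Println(out, err)
}
```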

## Error Codes

| HTTP status | Error message | Likely cause |
|-------------|---------------|--------------|
| 400 | Invalid request format | Malformed request |
| 400 | No files uploaded | No file was uploaded |
| 400 | Maximum 5 files allowed | Too many files |
| 400 | File size exceeds the limit of 10MB | A file is too large |
| 400 | Invalid file type | Unsupported file type |
| 401 | Invalid API key | The API key is wrong |
| 500 | OCR processing failed | The OCR call failed |
| 500 | Text processing failed | Text post-processing failed |
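The 400-level upload errors above correspond to a few simple checks. The Go sketch below implements that validation under two assumptions, since the upload handler is not shown here: files are checked by extension, and "10MB" means 10 × 2^20 bytes. `upload` and `validateUploads` are hypothetical names. The error strings are copied from the table, which is why they are capitalized.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

const (
	maxFiles    = 5
	maxFileSize = 10 << 20 // assumed: 10MB read as 10 * 2^20 bytes
)

// allowedExt covers the documented formats by extension (an assumption);
// ".tif" is included as the common short form of TIFF.
var allowedExt = map[string]bool{
	".jpg": true, ".jpeg": true, ".png": true, ".gif": true,
	".bmp": true, ".tif": true, ".tiff": true, ".webp": true,
}

// upload is a hypothetical summary of one uploaded file.
type upload struct {
	Name string
	Size int64
}

// validateUploads returns the first violation found, using the messages
// from the error table.
func validateUploads(files []upload) error {
	if len(files) == 0 {
		return errors.New("No files uploaded")
	}
	if len(files) > maxFiles {
		return errors.New("Maximum 5 files allowed")
	}
	for _, f := range files {
		if f.Size > maxFileSize {
			return errors.New("File size exceeds the limit of 10MB")
		}
		if !allowedExt[strings.ToLower(filepath.Ext(f.Name))] {
			return errors.New("Invalid file type")
		}
	}
	return nil
}

func main() {
	fmt.Println(validateUploads([]upload{{Name: "page1.JPG", Size: 2 << 20}}))
	fmt.Println(validateUploads([]upload{{Name: "notes.pdf", Size: 1024}}))
}
```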

## Notes

1. Image upload tips:
   - Make sure images are clear and legible
   - Keep each image under 10MB
   - Use a supported image format

2. OCR tips:
   - For multiple images, the system automatically organizes the text into a logical order
   - For a single image, the recognition result is returned directly

3. API usage limits:
   - A valid API key is required
   - Keep the number of concurrent requests under control

## Deployment Requirements

- Go 1.16+
- The configuration must set:
  - Tencent Cloud OCR settings
  - Cloudflare R2 storage settings
  - Gemini API settings
  - API key

## Environment Variables

```env
TENCENT_SECRET_ID=your_secret_id
TENCENT_SECRET_KEY=your_secret_key
GEMINI_API_KEY=your_gemini_api_key
API_KEY=your_api_key
R2_ACCESS_KEY=your_r2_access_key
R2_SECRET_KEY=your_r2_secret_key
R2_BUCKET=your_bucket_name
R2_ENDPOINT=your_r2_endpoint
R2_CUSTOM_DOMAIN=your_custom_domain
```
1
.history/cmd/main_20250115142528.go
Normal file
@ -0,0 +1 @@
25
.history/cmd/main_20250115142536.go
Normal file
@ -0,0 +1,25 @@
	// Initialize services
	geminiService, err := service.NewGeminiService(config.GeminiAPIKey)
	if err != nil {
		log.Fatal(err)
	}
	defer geminiService.Close()

	ocrService := handler.NewOCRService(
		config.TencentSecretID,
		config.TencentSecretKey,
		geminiService,
	)

	uploadHandler := handler.NewUploadHandler(
		config.AccessKey,
		config.SecretKey,
		config.Bucket,
		config.Endpoint,
		config.CustomDomain,
		ocrService,
		geminiService,
	)

	// Setup routes
	router.POST("/upload", uploadHandler.HandleMultiUpload)
25
.history/cmd/main_20250115142615.go
Normal file
@ -0,0 +1,25 @@
	// Initialize services
	geminiService, err := service.NewGeminiService(config.GeminiAPIKey)
	if err != nil {
		log.Fatal(err)
	}
	defer geminiService.Close()

	ocrService := handler.NewOCRService(
		config.TencentSecretID,
		config.TencentSecretKey,
		geminiService,
	)

	uploadHandler := handler.NewUploadHandler(
		config.AccessKey,
		config.SecretKey,
		config.Bucket,
		config.Endpoint,
		config.CustomDomain,
		ocrService,
		geminiService,
	)

	// Setup routes
	router.POST("/upload", uploadHandler.HandleMultiUpload)
77
.history/cmd/main_20250115155220.go
Normal file
@ -0,0 +1,77 @@
package main

import (
	"log"

	"github.com/gin-gonic/gin"
	"tencenthw/pkg/config"
	"tencenthw/pkg/handler"
)

func main() {
	// Load configuration
	cfg, err := config.LoadConfig()
	if err != nil {
		log.Fatalf("Failed to load configuration: %v", err)
	}

	// Initialize handlers
	ocrHandler := handler.NewOCRHandler(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	rateHandler := handler.NewRateHandler(
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	uploadHandler := handler.NewUploadHandler(
		cfg.R2AccessKey,
		cfg.R2SecretKey,
		cfg.R2Bucket,
		cfg.R2Endpoint,
		cfg.R2CustomDomain,
	)

	// Setup Gin router
	r := gin.Default()

	// Register routes
	r.POST("/ocr", ocrHandler.HandleOCR)
	r.POST("/rate", rateHandler.HandleRate)
	// Upload files to the server
	r.POST("/upload", uploadHandler.HandleUpload)

	// Start server
	if err := r.Run("localhost:8080"); err != nil {
		log.Fatalf("Failed to start server: %v", err)
	}
}
81
.history/cmd/main_20250115155312.go
Normal file
@ -0,0 +1,81 @@
package main

import (
	"log"

	"github.com/gin-gonic/gin"
	"tencenthw/pkg/config"
	"tencenthw/pkg/handler"
	"tencenthw/pkg/service"
)

func main() {
	// Load configuration
	cfg, err := config.LoadConfig()
	if err != nil {
		log.Fatalf("Failed to load configuration: %v", err)
	}

	// Initialize services
	geminiService, err := service.NewGeminiService(cfg.GeminiAPIKey)
	if err != nil {
		log.Fatal(err)
	}
	defer geminiService.Close()

	// Initialize handlers
	ocrHandler := handler.NewOCRHandler(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	rateHandler := handler.NewRateHandler(
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	uploadHandler := handler.NewUploadHandler(
		cfg.R2AccessKey,
		cfg.R2SecretKey,
		cfg.R2Bucket,
		cfg.R2Endpoint,
		cfg.R2CustomDomain,
	)

	// Setup Gin router
	r := gin.Default()

	// Register routes
	r.POST("/ocr", ocrHandler.HandleOCR)
	r.POST("/rate", rateHandler.HandleRate)
	// Upload files to the server
	r.POST("/upload", uploadHandler.HandleUpload)

	// Start server
	if err := r.Run("localhost:8080"); err != nil {
		log.Fatalf("Failed to start server: %v", err)
	}
}
89
.history/cmd/main_20250115155347.go
Normal file
@ -0,0 +1,89 @@
package main

import (
	"log"

	"github.com/gin-gonic/gin"
	"tencenthw/pkg/config"
	"tencenthw/pkg/handler"
	"tencenthw/pkg/service"
)

func main() {
	// Load configuration
	cfg, err := config.LoadConfig()
	if err != nil {
		log.Fatalf("Failed to load configuration: %v", err)
	}

	// Initialize services
	geminiService, err := service.NewGeminiService(cfg.GeminiAPIKey)
	if err != nil {
		log.Fatal(err)
	}
	defer geminiService.Close()

	ocrService := handler.NewOCRService(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		geminiService,
	)

	// Initialize handlers
	ocrHandler := handler.NewOCRHandler(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	rateHandler := handler.NewRateHandler(
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	uploadHandler := handler.NewUploadHandler(
		cfg.R2AccessKey,
		cfg.R2SecretKey,
		cfg.R2Bucket,
		cfg.R2Endpoint,
		cfg.R2CustomDomain,
		ocrService,
		geminiService,
	)

	// Setup Gin router
	r := gin.Default()

	// Register routes
	r.POST("/ocr", ocrHandler.HandleOCR)
	r.POST("/rate", rateHandler.HandleRate)
	// Upload files to the server
	r.POST("/upload", uploadHandler.HandleUpload)

	// Start server
	if err := r.Run("localhost:8080"); err != nil {
		log.Fatalf("Failed to start server: %v", err)
	}
}
91
.history/cmd/main_20250115155425.go
Normal file
@ -0,0 +1,91 @@
package main

import (
	"log"

	"github.com/gin-gonic/gin"
	"tencenthw/pkg/config"
	"tencenthw/pkg/handler"
	"tencenthw/pkg/service"
)

func main() {
	// Load configuration
	cfg, err := config.LoadConfig()
	if err != nil {
		log.Fatalf("Failed to load configuration: %v", err)
	}

	// Initialize services
	geminiService, err := service.NewGeminiService(cfg.GeminiAPIKey)
	if err != nil {
		log.Fatal(err)
	}
	defer geminiService.Close()

	ocrService := handler.NewOCRService(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		geminiService,
	)

	// Initialize handlers
	ocrHandler := handler.NewOCRHandler(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	rateHandler := handler.NewRateHandler(
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	uploadHandler := handler.NewUploadHandler(
		cfg.R2AccessKey,
		cfg.R2SecretKey,
		cfg.R2Bucket,
		cfg.R2Endpoint,
		cfg.R2CustomDomain,
		ocrService,
		geminiService,
	)

	// Setup Gin router
	r := gin.Default()

	// Register routes
	r.POST("/ocr", ocrHandler.HandleOCR)
	r.POST("/rate", rateHandler.HandleRate)
	// Upload files to the server
	r.POST("/upload", uploadHandler.HandleUpload)

	// Start server
	if err := r.Run("localhost:8080"); err != nil {
		log.Fatalf("Failed to start server: %v", err)
	}
}
66
.history/cmd/main_20250115155453.go
Normal file
@ -0,0 +1,66 @@
package main

import (
	"log"

	"github.com/gin-gonic/gin"
	"tencenthw/pkg/config"
	"tencenthw/pkg/handler"
	"tencenthw/pkg/service"
)

func main() {
	// Load configuration
	cfg, err := config.LoadConfig()
	if err != nil {
		log.Fatalf("Failed to load configuration: %v", err)
	}

	// Initialize services
	geminiService, err := service.NewGeminiService(cfg.GeminiAPIKey)
	if err != nil {
		log.Fatal(err)
	}
	defer geminiService.Close()

	ocrService := handler.NewOCRService(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		geminiService,
	)

	// Initialize handlers
	ocrHandler := handler.NewOCRHandler(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	rateHandler := handler.NewRateHandler(
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	uploadHandler := handler.NewUploadHandler(
		cfg.R2AccessKey,
		cfg.R2SecretKey,
		cfg.R2Bucket,
		cfg.R2Endpoint,
		cfg.R2CustomDomain,
		ocrService,
		geminiService,
	)

	// Setup Gin router
	r := gin.Default()

	// Register routes
	r.POST("/ocr", ocrHandler.HandleOCR)
	r.POST("/rate", rateHandler.HandleRate)
	// Upload files to the server
	r.POST("/upload", uploadHandler.HandleUpload)

	// Start server
	if err := r.Run("localhost:8080"); err != nil {
		log.Fatalf("Failed to start server: %v", err)
	}
}
66
.history/cmd/main_20250115155512.go
Normal file
@ -0,0 +1,66 @@
package main

import (
	"log"

	"github.com/gin-gonic/gin"
	"tencenthw/pkg/config"
	"tencenthw/pkg/handler"
	"tencenthw/pkg/service"
)

func main() {
	// Load configuration
	cfg, err := config.LoadConfig()
	if err != nil {
		log.Fatalf("Failed to load configuration: %v", err)
	}

	// Initialize services
	geminiService, err := service.NewGeminiService(cfg.GeminiAPIKey)
	if err != nil {
		log.Fatal(err)
	}
	defer geminiService.Close()

	ocrService := handler.NewOCRService(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		geminiService,
	)

	// Initialize handlers
	ocrHandler := handler.NewOCRHandler(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	rateHandler := handler.NewRateHandler(
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	uploadHandler := handler.NewUploadHandler(
		cfg.R2AccessKey,
		cfg.R2SecretKey,
		cfg.R2Bucket,
		cfg.R2Endpoint,
		cfg.R2CustomDomain,
		ocrService,
		geminiService,
	)

	// Setup Gin router
	r := gin.Default()

	// Register routes
	r.POST("/ocr", ocrHandler.HandleOCR)
	r.POST("/rate", rateHandler.HandleRate)
	// Upload files to the server
	r.POST("/upload", uploadHandler.HandleUpload)

	// Start server
	if err := r.Run("localhost:8080"); err != nil {
		log.Fatalf("Failed to start server: %v", err)
	}
}
66
.history/cmd/main_20250115155521.go
Normal file
@ -0,0 +1,66 @@
package main

import (
	"log"

	"github.com/gin-gonic/gin"
	"tencenthw/pkg/config"
	"tencenthw/pkg/handler"
	"tencenthw/pkg/service"
)

func main() {
	// Load configuration
	cfg, err := config.LoadConfig()
	if err != nil {
		log.Fatalf("Failed to load configuration: %v", err)
	}

	// Initialize services
	geminiService, err := service.NewGeminiService(cfg.GeminiAPIKey)
	if err != nil {
		log.Fatal(err)
	}
	defer geminiService.Close()

	ocrService := handler.NewOCRService(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		geminiService,
	)

	// Initialize handlers
	ocrHandler := handler.NewOCRHandler(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	rateHandler := handler.NewRateHandler(
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	uploadHandler := handler.NewUploadHandler(
		cfg.R2AccessKey,
		cfg.R2SecretKey,
		cfg.R2Bucket,
		cfg.R2Endpoint,
		cfg.R2CustomDomain,
		ocrService,
		geminiService,
	)

	// Setup Gin router
	r := gin.Default()

	// Register routes
	r.POST("/ocr", ocrHandler.HandleOCR)
	r.POST("/rate", rateHandler.HandleRate)
	// Upload files to the server
	r.POST("/upload", uploadHandler.HandleUpload)

	// Start server
	if err := r.Run("localhost:8080"); err != nil {
		log.Fatalf("Failed to start server: %v", err)
	}
}
166
.history/pkg/handler/ocr_20250115141957.go
Normal file
@ -0,0 +1,166 @@
package handler

import (
	"encoding/base64"
	"net/http"
	"strings"

	"github.com/gin-gonic/gin"
	"github.com/google/generative-ai-go/genai"
	"github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common"
	"github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/profile"
	ocr "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/ocr/v20181119"
	"google.golang.org/api/option"
)

type OCRHandler struct {
	tencentSecretID  string
	tencentSecretKey string
	geminiAPIKey     string
	apiKey           string
}

type OCRRequest struct {
	ImageBase64 string `json:"image_base64"`
	ImageURL    string `json:"image_url"`
	Scene       string `json:"scene"`
	APIKey      string `json:"apikey" binding:"required"`
}

type OCRResponse struct {
	OriginalText string `json:"original_text"`
	Result       string `json:"result"`
	Success      bool   `json:"success"`
}

func NewOCRHandler(tencentSecretID, tencentSecretKey, geminiAPIKey, apiKey string) *OCRHandler {
	return &OCRHandler{
		tencentSecretID:  tencentSecretID,
		tencentSecretKey: tencentSecretKey,
		geminiAPIKey:     geminiAPIKey,
		apiKey:           apiKey,
	}
}

func (h *OCRHandler) HandleOCR(c *gin.Context) {
	var req OCRRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, OCRResponse{
			Success: false,
			Result:  "Invalid request format",
		})
		return
	}

	// Validate API key
	if req.APIKey != h.apiKey {
		c.JSON(http.StatusUnauthorized, OCRResponse{
			Success: false,
			Result:  "Invalid API key",
		})
		return
	}

	// Validate that at least one of ImageURL or ImageBase64 is provided
	if req.ImageURL == "" && req.ImageBase64 == "" {
		c.JSON(http.StatusBadRequest, OCRResponse{
			Success: false,
			Result:  "Either image_url or image_base64 must be provided",
		})
		return
	}

	// Initialize Tencent Cloud client
	credential := common.NewCredential(h.tencentSecretID, h.tencentSecretKey)
	cpf := profile.NewClientProfile()
	cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com"
	client, err := ocr.NewClient(credential, "", cpf)
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
			Result:  "Failed to initialize OCR client",
		})
		return
	}

	// Create OCR request
	request := ocr.NewGeneralHandwritingOCRRequest()

	// Prioritize ImageURL if both are provided
	if req.ImageURL != "" {
		request.ImageUrl = common.StringPtr(req.ImageURL)
	} else {
		// Remove a data-URL prefix such as "data:image/png;base64," if present
		const marker = "base64,"
		imageBase64 := req.ImageBase64
		if idx := strings.Index(imageBase64, marker); idx != -1 {
			imageBase64 = imageBase64[idx+len(marker):]
		}

		// Validate base64
		if _, err := base64.StdEncoding.DecodeString(imageBase64); err != nil {
			c.JSON(http.StatusBadRequest, OCRResponse{
				Success: false,
				Result:  "Invalid base64 image",
			})
			return
		}
		request.ImageBase64 = common.StringPtr(imageBase64)
	}

	if req.Scene != "" {
		request.Scene = common.StringPtr(req.Scene)
	}

	// Perform OCR
	response, err := client.GeneralHandwritingOCR(request)
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
			Result:  "OCR processing failed",
		})
		return
	}

	// Extract text from the OCR response, one detection per line (skip nil entries)
	var sb strings.Builder
	for _, textDetection := range response.Response.TextDetections {
		if textDetection == nil || textDetection.DetectedText == nil {
			continue
		}
		sb.WriteString(*textDetection.DetectedText)
		sb.WriteString("\n")
	}
	ocrText := sb.String()

	// Process with Gemini
	ctx := c.Request.Context()
	client2, err := genai.NewClient(ctx, option.WithAPIKey(h.geminiAPIKey))
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
			Result:  "Failed to initialize Gemini client",
		})
		return
	}
	defer client2.Close()

	model := client2.GenerativeModel("gemini-2.0-flash-exp")
	prompt := "你是一个专业的助手,负责纠正OCR识别结果中的文本。只需要输出识别结果,不需要输出任何解释。\n\n" + ocrText
	resp, err := model.GenerateContent(ctx, genai.Text(prompt))
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
			Result:  "Text processing failed",
		})
		return
	}

	// Get the processed text from the Gemini response; a candidate's Content can be nil
	processedText := ""
	if len(resp.Candidates) > 0 && resp.Candidates[0].Content != nil &&
		len(resp.Candidates[0].Content.Parts) > 0 {
		if textPart, ok := resp.Candidates[0].Content.Parts[0].(genai.Text); ok {
			processedText = string(textPart)
		}
	}

	c.JSON(http.StatusOK, OCRResponse{
		Success:      true,
		OriginalText: ocrText,
		Result:       processedText,
	})
}
127
.history/pkg/handler/ocr_20250115142525.go
Normal file
@ -0,0 +1,127 @@
package handler

import (
	"context"
	"encoding/base64"
	"net/http"
	"strings"

	"github.com/gin-gonic/gin"
	"github.com/google/generative-ai-go/genai"
	"github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common"
	"github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/profile"
	ocr "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/ocr/v20181119"
	"google.golang.org/api/option"
	"pkg/service"
)

type OCRService struct {
	tencentSecretID  string
	tencentSecretKey string
	geminiService    *service.GeminiService
}

func NewOCRService(tencentSecretID, tencentSecretKey string, geminiService *service.GeminiService) *OCRService {
	return &OCRService{
		tencentSecretID:  tencentSecretID,
		tencentSecretKey: tencentSecretKey,
		geminiService:    geminiService,
	}
}

func (s *OCRService) ProcessImage(ctx context.Context, imageBase64 string) (string, error) {
	// Initialize Tencent Cloud client
	credential := common.NewCredential(s.tencentSecretID, s.tencentSecretKey)
	cpf := profile.NewClientProfile()
	cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com"
	client, err := ocr.NewClient(credential, "", cpf)
	if err != nil {
		return "", err
	}

	// Create OCR request
	request := ocr.NewGeneralHandwritingOCRRequest()
	request.ImageBase64 = common.StringPtr(imageBase64)

	// Perform OCR
	response, err := client.GeneralHandwritingOCR(request)
	if err != nil {
		return "", err
	}

	// Extract text from OCR response
	var ocrText string
	for _, textDetection := range response.Response.TextDetections {
		ocrText += *textDetection.DetectedText + "\n"
	}

	return ocrText, nil
}

type OCRRequest struct {
	ImageBase64 string `json:"image_base64"`
	ImageURL    string `json:"image_url"`
	Scene       string `json:"scene"`
	APIKey      string `json:"apikey" binding:"required"`
}

type OCRResponse struct {
	OriginalText string `json:"original_text"`
	Result       string `json:"result"`
	Success      bool   `json:"success"`
}

func (h *OCRService) HandleOCR(c *gin.Context) {
	var req OCRRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, OCRResponse{
			Success: false,
			Result:  "Invalid request format",
		})
		return
	}

	// Validate API key
	if req.APIKey != h.geminiService.APIKey {
		c.JSON(http.StatusUnauthorized, OCRResponse{
			Success: false,
			Result:  "Invalid API key",
		})
		return
	}

	// Validate that at least one of ImageURL or ImageBase64 is provided
	if req.ImageURL == "" && req.ImageBase64 == "" {
		c.JSON(http.StatusBadRequest, OCRResponse{
			Success: false,
			Result:  "Either image_url or image_base64 must be provided",
		})
		return
	}

	// Process image
	ocrText, err := h.ProcessImage(c.Request.Context(), req.ImageBase64)
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
			Result:  "OCR processing failed",
		})
		return
	}

	// Process with Gemini
	processedText, err := h.geminiService.ProcessText(c.Request.Context(), ocrText)
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
			Result:  "Text processing failed",
		})
		return
	}

	c.JSON(http.StatusOK, OCRResponse{
		Success:      true,
		OriginalText: ocrText,
		Result:       processedText,
	})
}
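In this snapshot, `HandleOCR` compares `req.APIKey` with `h.geminiService.APIKey`. However, `GeminiService` in `gemini.go` only has an unexported `apiKey` field, and that field holds the Gemini key rather than the client-facing `API_KEY`. Below is a minimal sketch of a separate check. It assumes the expected client key is passed in from configuration; `validAPIKey` is a hypothetical helper. It uses `crypto/subtle` so that, for keys of equal length, the comparison time does not depend on where they differ (the length itself can still leak).

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// validAPIKey reports whether provided matches expected. An empty expected
// key is treated as "not configured" and always rejects, so a missing
// environment variable cannot accidentally allow every request.
func validAPIKey(provided, expected string) bool {
	if expected == "" {
		return false
	}
	// ConstantTimeCompare returns 1 only when both slices are equal.
	return subtle.ConstantTimeCompare([]byte(provided), []byte(expected)) == 1
}

func main() {
	fmt.Println(validAPIKey("1234567890", "1234567890")) // true
	fmt.Println(validAPIKey("wrong", "1234567890"))      // false
}
```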
127
.history/pkg/handler/ocr_20250115142558.go
Normal file
@ -0,0 +1,127 @@
package handler

import (
	"context"
	"encoding/base64"
	"net/http"
	"strings"

	"github.com/gin-gonic/gin"
	"github.com/google/generative-ai-go/genai"
	"github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common"
	"github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/profile"
	ocr "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/ocr/v20181119"
	"google.golang.org/api/option"
	"pkg/service"
)

type OCRService struct {
	tencentSecretID  string
	tencentSecretKey string
	geminiService    *service.GeminiService
}

func NewOCRService(tencentSecretID, tencentSecretKey string, geminiService *service.GeminiService) *OCRService {
	return &OCRService{
		tencentSecretID:  tencentSecretID,
		tencentSecretKey: tencentSecretKey,
		geminiService:    geminiService,
	}
}

func (s *OCRService) ProcessImage(ctx context.Context, imageBase64 string) (string, error) {
	// Initialize Tencent Cloud client
	credential := common.NewCredential(s.tencentSecretID, s.tencentSecretKey)
	cpf := profile.NewClientProfile()
	cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com"
	client, err := ocr.NewClient(credential, "", cpf)
	if err != nil {
		return "", err
	}

	// Create OCR request
	request := ocr.NewGeneralHandwritingOCRRequest()
	request.ImageBase64 = common.StringPtr(imageBase64)

	// Perform OCR
	response, err := client.GeneralHandwritingOCR(request)
	if err != nil {
		return "", err
	}

	// Extract text from OCR response
	var ocrText string
	for _, textDetection := range response.Response.TextDetections {
		ocrText += *textDetection.DetectedText + "\n"
	}

	return ocrText, nil
}

type OCRRequest struct {
	ImageBase64 string `json:"image_base64"`
	ImageURL    string `json:"image_url"`
	Scene       string `json:"scene"`
	APIKey      string `json:"apikey" binding:"required"`
}

type OCRResponse struct {
	OriginalText string `json:"original_text"`
	Result       string `json:"result"`
	Success      bool   `json:"success"`
}

func (h *OCRService) HandleOCR(c *gin.Context) {
	var req OCRRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, OCRResponse{
			Success: false,
			Result:  "Invalid request format",
		})
		return
	}

	// Validate API key
	if req.APIKey != h.geminiService.APIKey {
		c.JSON(http.StatusUnauthorized, OCRResponse{
			Success: false,
			Result:  "Invalid API key",
		})
		return
	}

	// Validate that at least one of ImageURL or ImageBase64 is provided
	if req.ImageURL == "" && req.ImageBase64 == "" {
		c.JSON(http.StatusBadRequest, OCRResponse{
			Success: false,
			Result:  "Either image_url or image_base64 must be provided",
		})
		return
	}

	// Process image
	ocrText, err := h.ProcessImage(c.Request.Context(), req.ImageBase64)
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
			Result:  "OCR processing failed",
		})
		return
	}

	// Process with Gemini
	processedText, err := h.geminiService.ProcessText(c.Request.Context(), ocrText)
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
			Result:  "Text processing failed",
		})
		return
	}

	c.JSON(http.StatusOK, OCRResponse{
		Success:      true,
		OriginalText: ocrText,
		Result:       processedText,
	})
}
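The original spec asks the service to validate the base64 payload, and the earlier `OCRHandler` stripped a `data:...;base64,` prefix before calling the SDK. This snapshot forwards `req.ImageBase64` unchanged. Below is a minimal sketch of that normalization step using only the standard library; `normalizeBase64` is a hypothetical helper name.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// normalizeBase64 removes an optional data-URI prefix such as
// "data:image/png;base64," and checks that the remainder decodes as standard
// base64. It returns the bare base64 string for the OCR request.
func normalizeBase64(s string) (string, error) {
	s = strings.TrimSpace(s)
	if idx := strings.Index(s, "base64,"); idx != -1 {
		s = s[idx+len("base64,"):]
	}
	if s == "" {
		return "", errors.New("empty image payload")
	}
	if _, err := base64.StdEncoding.DecodeString(s); err != nil {
		return "", fmt.Errorf("invalid base64: %w", err)
	}
	return s, nil
}

func main() {
	out, err := normalizeBase64("data:image/png;base64,aGVsbG8=")
	fmt.Println(out, err) // aGVsbG8= <nil>
}
```

Decoding the whole payload only to validate it costs one extra allocation. If that matters, `base64.StdEncoding.DecodedLen` can be used to reject oversized input first.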
130
.history/pkg/handler/upload_20250115141957.go
Normal file
@ -0,0 +1,130 @@
// Upload files to Cloudflare R2
package handler

import (
	"bytes"
	"fmt"
	"net/http"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/gin-gonic/gin"
)

type UploadHandler struct {
	accessKey    string
	secretKey    string
	bucket       string
	endpoint     string
	customDomain string
}

type UploadRequest struct {
	File   string `json:"file" binding:"required"`
	APIKey string `json:"apikey" binding:"required"`
}

type UploadResponse struct {
	ImageURL string `json:"image_url"`
	Success  bool   `json:"success"`
}

func NewUploadHandler(accessKey, secretKey, bucket, endpoint, customDomain string) *UploadHandler {
	return &UploadHandler{
		accessKey:    accessKey,
		secretKey:    secretKey,
		bucket:       bucket,
		endpoint:     endpoint,
		customDomain: customDomain,
	}
}

// HandleUpload uploads a file to Cloudflare R2. It checks whether the file is an
// image; if it is, the file is uploaded to R2 and the image URL is returned,
// otherwise an error is returned.
// Images are limited to 10 MB; allowed formats are jpg, jpeg, png, gif, bmp, tiff and webp.
func (h *UploadHandler) HandleUpload(c *gin.Context) {
	// Parse the request body
	file, header, err := c.Request.FormFile("file")
	if err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to read file from request"})
		return
	}
	defer file.Close()

	// Read the file content
	fileBuffer := make([]byte, header.Size)
	_, err = file.Read(fileBuffer)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read file content"})
		return
	}

	// Validate the file type
	contentType := http.DetectContentType(fileBuffer)
	if !isImage(contentType) {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid file type. Only images are allowed"})
		return
	}

	// Validate the file size
	if header.Size > 10<<20 { // 10MB
		c.JSON(http.StatusBadRequest, gin.H{"error": "File size exceeds the limit of 10MB"})
		return
	}

	// Upload the file to R2
	imageURL, err := h.uploadToR2(fileBuffer, header.Filename, contentType)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("Failed to upload file to R2: %v", err)})
		return
	}

	// Return the result
	response := UploadResponse{
		ImageURL: imageURL,
		Success:  true,
	}
	c.JSON(http.StatusOK, response)
}

// uploadToR2 uploads a file to Cloudflare R2.
func (h *UploadHandler) uploadToR2(file []byte, fileName, contentType string) (string, error) {
	// Create an S3 session
	sess, err := session.NewSession(&aws.Config{
		Endpoint:    aws.String(h.endpoint),
		Region:      aws.String("auto"),
		Credentials: credentials.NewStaticCredentials(h.accessKey, h.secretKey, ""),
	})
	if err != nil {
		return "", fmt.Errorf("failed to create S3 session: %v", err)
	}

	// Create the S3 service client
	svc := s3.New(sess)

	// Upload the file to R2
	_, err = svc.PutObject(&s3.PutObjectInput{
		Bucket:      aws.String(h.bucket),
		Key:         aws.String(fileName),
		Body:        bytes.NewReader(file),
		ContentType: aws.String(contentType),
		ACL:         aws.String("public-read"), // make the object publicly readable
	})
	if err != nil {
		return "", fmt.Errorf("failed to upload file to R2: %v", err)
	}

	// Build the file URL
	imageURL := fmt.Sprintf("https://%s/%s", h.customDomain, fileName)
	return imageURL, nil
}

// isImage reports whether the content type is one of the allowed image types.
func isImage(contentType string) bool {
	allowedTypes := []string{"image/jpeg", "image/png", "image/gif", "image/bmp", "image/tiff", "image/webp"}
	for _, t := range allowedTypes {
		if contentType == t {
			return true
		}
	}
	return false
}
162
.history/pkg/handler/upload_20250115142533.go
Normal file
@ -0,0 +1,162 @@
// Upload files to Cloudflare R2
package handler

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/gin-gonic/gin"

	"your-project/pkg/service"
)

type UploadHandler struct {
	accessKey     string
	secretKey     string
	bucket        string
	endpoint      string
	customDomain  string
	ocrService    *OCRService
	geminiService *service.GeminiService
}

type MultiUploadResponse struct {
	ImageURLs []string `json:"image_urls"`
	Text      string   `json:"text"`
	Success   bool     `json:"success"`
}

func (h *UploadHandler) HandleMultiUpload(c *gin.Context) {
	form, err := c.MultipartForm()
	if err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to parse form"})
		return
	}

	files := form.File["files"]
	if len(files) == 0 {
		c.JSON(http.StatusBadRequest, gin.H{"error": "No files uploaded"})
		return
	}

	if len(files) > 5 {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Maximum 5 files allowed"})
		return
	}

	var imageURLs []string
	var ocrTexts []string

	for _, fileHeader := range files {
		if fileHeader.Size > 10<<20 { // 10MB
			c.JSON(http.StatusBadRequest, gin.H{"error": "File size exceeds the limit of 10MB"})
			return
		}

		file, err := fileHeader.Open()
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to open file"})
			return
		}
		defer file.Close()

		// Read file content
		fileBytes, err := io.ReadAll(file)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read file"})
			return
		}

		// Verify file type
		contentType := http.DetectContentType(fileBytes)
		if !isImage(contentType) {
			c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid file type. Only images are allowed"})
			return
		}

		// Convert to base64
		base64Str := base64.StdEncoding.EncodeToString(fileBytes)

		// Process OCR
		ocrText, err := h.ocrService.ProcessImage(c.Request.Context(), base64Str)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "OCR processing failed"})
			return
		}
		ocrTexts = append(ocrTexts, ocrText)

		// Upload to R2
		imageURL, err := h.uploadToR2(fileBytes, fileHeader.Filename, contentType)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to upload file"})
			return
		}
		imageURLs = append(imageURLs, imageURL)
	}

	// Process combined text with Gemini if multiple images
	finalText := strings.Join(ocrTexts, "\n")
	if len(ocrTexts) > 1 {
		// Prompt (Chinese): "Reorganize the following passages into one fluent text,
		// keeping the original meaning while ensuring correct grammar and logic:"
		prompt := "请将以下多段文字重新组织成一段通顺的文字,保持原意的同时确保语法和逻辑正确:\n\n" + finalText
		processedText, err := h.geminiService.ProcessText(c.Request.Context(), prompt)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Text processing failed"})
			return
		}
		finalText = processedText
	}

	c.JSON(http.StatusOK, MultiUploadResponse{
		ImageURLs: imageURLs,
		Text:      finalText,
		Success:   true,
	})
}

// uploadToR2 uploads a file to Cloudflare R2.
func (h *UploadHandler) uploadToR2(file []byte, fileName, contentType string) (string, error) {
	// Create an S3 session
	sess, err := session.NewSession(&aws.Config{
		Endpoint:    aws.String(h.endpoint),
		Region:      aws.String("auto"),
		Credentials: credentials.NewStaticCredentials(h.accessKey, h.secretKey, ""),
	})
	if err != nil {
		return "", fmt.Errorf("failed to create S3 session: %v", err)
	}

	// Create the S3 service client
	svc := s3.New(sess)

	// Upload the file to R2
	_, err = svc.PutObject(&s3.PutObjectInput{
		Bucket:      aws.String(h.bucket),
		Key:         aws.String(fileName),
		Body:        bytes.NewReader(file),
		ContentType: aws.String(contentType),
		ACL:         aws.String("public-read"), // make the object publicly readable
	})
	if err != nil {
		return "", fmt.Errorf("failed to upload file to R2: %v", err)
	}

	// Build the file URL
	imageURL := fmt.Sprintf("https://%s/%s", h.customDomain, fileName)
	return imageURL, nil
}

// isImage reports whether the content type is one of the allowed image types.
func isImage(contentType string) bool {
	allowedTypes := []string{"image/jpeg", "image/png", "image/gif", "image/bmp", "image/tiff", "image/webp"}
	for _, t := range allowedTypes {
		if contentType == t {
			return true
		}
	}
	return false
}
162
.history/pkg/handler/upload_20250115142606.go
Normal file
@ -0,0 +1,162 @@
// Upload files to Cloudflare R2
package handler

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/gin-gonic/gin"

	"your-project/pkg/service"
)

type UploadHandler struct {
	accessKey     string
	secretKey     string
	bucket        string
	endpoint      string
	customDomain  string
	ocrService    *OCRService
	geminiService *service.GeminiService
}

type MultiUploadResponse struct {
	ImageURLs []string `json:"image_urls"`
	Text      string   `json:"text"`
	Success   bool     `json:"success"`
}

func (h *UploadHandler) HandleMultiUpload(c *gin.Context) {
	form, err := c.MultipartForm()
	if err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to parse form"})
		return
	}

	files := form.File["files"]
	if len(files) == 0 {
		c.JSON(http.StatusBadRequest, gin.H{"error": "No files uploaded"})
		return
	}

	if len(files) > 5 {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Maximum 5 files allowed"})
		return
	}

	var imageURLs []string
	var ocrTexts []string

	for _, fileHeader := range files {
		if fileHeader.Size > 10<<20 { // 10MB
			c.JSON(http.StatusBadRequest, gin.H{"error": "File size exceeds the limit of 10MB"})
			return
		}

		file, err := fileHeader.Open()
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to open file"})
			return
		}
		defer file.Close()

		// Read file content
		fileBytes, err := io.ReadAll(file)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read file"})
			return
		}

		// Verify file type
		contentType := http.DetectContentType(fileBytes)
		if !isImage(contentType) {
			c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid file type. Only images are allowed"})
			return
		}

		// Convert to base64
		base64Str := base64.StdEncoding.EncodeToString(fileBytes)

		// Process OCR
		ocrText, err := h.ocrService.ProcessImage(c.Request.Context(), base64Str)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "OCR processing failed"})
			return
		}
		ocrTexts = append(ocrTexts, ocrText)

		// Upload to R2
		imageURL, err := h.uploadToR2(fileBytes, fileHeader.Filename, contentType)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to upload file"})
			return
		}
		imageURLs = append(imageURLs, imageURL)
	}

	// Process combined text with Gemini if multiple images
	finalText := strings.Join(ocrTexts, "\n")
	if len(ocrTexts) > 1 {
		// Prompt (Chinese): "Reorganize the following passages into one fluent text,
		// keeping the original meaning while ensuring correct grammar and logic:"
		prompt := "请将以下多段文字重新组织成一段通顺的文字,保持原意的同时确保语法和逻辑正确:\n\n" + finalText
		processedText, err := h.geminiService.ProcessText(c.Request.Context(), prompt)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Text processing failed"})
			return
		}
		finalText = processedText
	}

	c.JSON(http.StatusOK, MultiUploadResponse{
		ImageURLs: imageURLs,
		Text:      finalText,
		Success:   true,
	})
}

// uploadToR2 uploads a file to Cloudflare R2.
func (h *UploadHandler) uploadToR2(file []byte, fileName, contentType string) (string, error) {
	// Create an S3 session
	sess, err := session.NewSession(&aws.Config{
		Endpoint:    aws.String(h.endpoint),
		Region:      aws.String("auto"),
		Credentials: credentials.NewStaticCredentials(h.accessKey, h.secretKey, ""),
	})
	if err != nil {
		return "", fmt.Errorf("failed to create S3 session: %v", err)
	}

	// Create the S3 service client
	svc := s3.New(sess)

	// Upload the file to R2
	_, err = svc.PutObject(&s3.PutObjectInput{
		Bucket:      aws.String(h.bucket),
		Key:         aws.String(fileName),
		Body:        bytes.NewReader(file),
		ContentType: aws.String(contentType),
		ACL:         aws.String("public-read"), // make the object publicly readable
	})
	if err != nil {
		return "", fmt.Errorf("failed to upload file to R2: %v", err)
	}

	// Build the file URL
	imageURL := fmt.Sprintf("https://%s/%s", h.customDomain, fileName)
	return imageURL, nil
}

// isImage reports whether the content type is one of the allowed image types.
func isImage(contentType string) bool {
	allowedTypes := []string{"image/jpeg", "image/png", "image/gif", "image/bmp", "image/tiff", "image/webp"}
	for _, t := range allowedTypes {
		if contentType == t {
			return true
		}
	}
	return false
}
1
.history/pkg/service/gemini_20250115142509.go
Normal file
@ -0,0 +1 @@
46
.history/pkg/service/gemini_20250115142516.go
Normal file
@ -0,0 +1,46 @@
package service

import (
	"context"

	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

type GeminiService struct {
	apiKey string
	client *genai.Client
}

func NewGeminiService(apiKey string) (*GeminiService, error) {
	ctx := context.Background()
	client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey))
	if err != nil {
		return nil, err
	}

	return &GeminiService{
		apiKey: apiKey,
		client: client,
	}, nil
}

func (s *GeminiService) Close() {
	if s.client != nil {
		s.client.Close()
	}
}

func (s *GeminiService) ProcessText(ctx context.Context, prompt string) (string, error) {
	model := s.client.GenerativeModel("gemini-2.0-flash-exp")
	resp, err := model.GenerateContent(ctx, genai.Text(prompt))
	if err != nil {
		return "", err
	}

	if len(resp.Candidates) > 0 && len(resp.Candidates[0].Content.Parts) > 0 {
		if textPart, ok := resp.Candidates[0].Content.Parts[0].(genai.Text); ok {
			return string(textPart), nil
		}
	}
	return "", nil
}
46
.history/pkg/service/gemini_20250115142545.go
Normal file
@ -0,0 +1,46 @@
package service

import (
	"context"

	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

type GeminiService struct {
	apiKey string
	client *genai.Client
}

func NewGeminiService(apiKey string) (*GeminiService, error) {
	ctx := context.Background()
	client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey))
	if err != nil {
		return nil, err
	}

	return &GeminiService{
		apiKey: apiKey,
		client: client,
	}, nil
}

func (s *GeminiService) Close() {
	if s.client != nil {
		s.client.Close()
	}
}

func (s *GeminiService) ProcessText(ctx context.Context, prompt string) (string, error) {
	model := s.client.GenerativeModel("gemini-2.0-flash-exp")
	resp, err := model.GenerateContent(ctx, genai.Text(prompt))
	if err != nil {
		return "", err
	}

	if len(resp.Candidates) > 0 && len(resp.Candidates[0].Content.Parts) > 0 {
		if textPart, ok := resp.Candidates[0].Content.Parts[0].(genai.Text); ok {
			return string(textPart), nil
		}
	}
	return "", nil
}
129
README.md
@ -41,4 +41,131 @@ tencenthw/
    └── handler/
        └── ocr.go
        └── rate.go
```
```

# OCR Image Processing Service

This service combines OCR text recognition, image storage, and text processing. It accepts multiple uploaded images, runs OCR on them automatically, and can intelligently organize the recognized text.

## Features

- Multi-image upload (up to 5 images)
- Automatic OCR text recognition
- Intelligent text organization (multi-image uploads)
- Cloud image storage
- Support for multiple image formats

## API Reference

### 1. Multi-Image Upload

**Endpoint**: `/upload`
**Method**: POST
**Content-Type**: multipart/form-data

**Request parameters**:
- `files`: array of image files (1 to 5 images)

**Supported image formats**:
- JPEG/JPG
- PNG
- GIF
- BMP
- TIFF
- WEBP

**File size limit**: 10 MB per file

**Request example**:

```bash
curl -X POST \
  'http://your-domain/upload' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@image1.jpg' \
  -F 'files=@image2.jpg'
```
**Response format**:
```json
{
  "image_urls": [
    "https://your-domain/image1.jpg",
    "https://your-domain/image2.jpg"
  ],
  "text": "Organized text content",
  "success": true
}
```

### 2. OCR Recognition

**Endpoint**: `/ocr`
**Method**: POST
**Content-Type**: application/json

**Request parameters**:
```json
{
  "image_base64": "Base64-encoded image content",
  "image_url": "Image URL (optional; image_base64 takes precedence)",
  "scene": "Scene type (optional)",
  "apikey": "Your API key"
}
```
**Response format**:
```json
{
  "original_text": "Raw OCR text",
  "result": "Processed text",
  "success": true
}
```

## Error Codes

| HTTP status | Error message | Likely cause |
|-------------|---------------|--------------|
| 400 | Invalid request format | Malformed request |
| 400 | No files uploaded | No file was uploaded |
| 400 | Maximum 5 files allowed | More files than the maximum allowed |
| 400 | File size exceeds the limit of 10MB | A file is larger than 10 MB |
| 400 | Invalid file type. Only images are allowed | Unsupported file type |
| 401 | Invalid API key | The API key is invalid |
| 500 | OCR processing failed | OCR recognition failed |
| 500 | Text processing failed | Gemini text processing failed |

`/upload` returns errors as `{"error": "<message>"}`, while `/ocr` returns them as `{"success": false, "result": "<message>"}`.

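The 400-level upload errors in the table correspond to checks spread across `HandleMultiUpload`. As a sketch, those checks can be collected into one pure function that is easy to unit-test. `validateUploads` and the `upload` type are hypothetical. The messages are copied from the table, the limits (5 files, 10 MB) come from this README, and the allowed types mirror `isImage`.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxFiles    = 5
	maxFileSize = 10 << 20 // 10 MB
)

// allowedTypes mirrors the list used by isImage in the upload handler.
var allowedTypes = map[string]bool{
	"image/jpeg": true, "image/png": true, "image/gif": true,
	"image/bmp": true, "image/tiff": true, "image/webp": true,
}

// upload describes one received file: size in bytes and sniffed content type.
type upload struct {
	Size        int64
	ContentType string
}

// validateUploads returns the API error message for the first violated rule.
// Messages are copied verbatim from the API responses, hence the capitals.
func validateUploads(files []upload) error {
	switch {
	case len(files) == 0:
		return errors.New("No files uploaded")
	case len(files) > maxFiles:
		return errors.New("Maximum 5 files allowed")
	}
	for _, f := range files {
		if f.Size > maxFileSize {
			return errors.New("File size exceeds the limit of 10MB")
		}
		if !allowedTypes[f.ContentType] {
			return errors.New("Invalid file type. Only images are allowed")
		}
	}
	return nil
}

func main() {
	fmt.Println(validateUploads([]upload{{Size: 1 << 20, ContentType: "image/png"}}))  // <nil>
	fmt.Println(validateUploads([]upload{{Size: 11 << 20, ContentType: "image/png"}})) // File size exceeds the limit of 10MB
}
```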
## Notes

1. Image upload tips:
   - Make sure images are clear and legible
   - Keep each image under 10 MB
   - Use a supported image format

2. OCR recognition notes:
   - For multi-image uploads, the service automatically reorganizes the combined text so it reads logically
   - For a single image, the recognition result is returned directly

3. API usage limits:
   - A valid API key is required
   - Keep the number of concurrent requests under control

## Deployment Requirements

- Go 1.16+
- The following must be set in the configuration:
  - Tencent Cloud OCR settings
  - Cloudflare R2 storage settings
  - Gemini API settings
  - API key

## Environment Variables
```env
TENCENT_SECRET_ID=your_secret_id
TENCENT_SECRET_KEY=your_secret_key
GEMINI_API_KEY=your_gemini_api_key
API_KEY=your_api_key
R2_ACCESS_KEY=your_r2_access_key
R2_SECRET_KEY=your_r2_secret_key
R2_BUCKET=your_bucket_name
R2_ENDPOINT=your_r2_endpoint
R2_CUSTOM_DOMAIN=your_custom_domain
```

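The variables above are read by `config.LoadConfig()`, whose source is not part of this diff. The original spec says they are loaded from `.env` with a dotenv library. Below is a standard-library sketch of the checking step only. It assumes the `.env` file has already been loaded into the process environment. `loadConfig` is hypothetical, and the `Config` field names are illustrative, chosen to match the `cfg.*` fields used in `cmd/main.go`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Config holds the settings listed in the README.
type Config struct {
	TencentSecretID  string
	TencentSecretKey string
	GeminiAPIKey     string
	APIKey           string
	R2AccessKey      string
	R2SecretKey      string
	R2Bucket         string
	R2Endpoint       string
	R2CustomDomain   string
}

// loadConfig reads every variable through getenv (os.Getenv in production)
// and reports all missing ones at once instead of failing on the first.
func loadConfig(getenv func(string) string) (*Config, error) {
	var missing []string
	get := func(name string) string {
		v := getenv(name)
		if v == "" {
			missing = append(missing, name)
		}
		return v
	}
	// Composite literal elements are evaluated left to right, so missing
	// names are reported in field order.
	cfg := &Config{
		TencentSecretID:  get("TENCENT_SECRET_ID"),
		TencentSecretKey: get("TENCENT_SECRET_KEY"),
		GeminiAPIKey:     get("GEMINI_API_KEY"),
		APIKey:           get("API_KEY"),
		R2AccessKey:      get("R2_ACCESS_KEY"),
		R2SecretKey:      get("R2_SECRET_KEY"),
		R2Bucket:         get("R2_BUCKET"),
		R2Endpoint:       get("R2_ENDPOINT"),
		R2CustomDomain:   get("R2_CUSTOM_DOMAIN"),
	}
	if len(missing) > 0 {
		return nil, fmt.Errorf("missing environment variables: %s", strings.Join(missing, ", "))
	}
	return cfg, nil
}

func main() {
	_, err := loadConfig(os.Getenv)
	fmt.Println(err) // lists whichever variables are unset in this shell
}
```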
66
cmd/main.go
Normal file
@ -0,0 +1,66 @@
package main

import (
	"log"

	"github.com/gin-gonic/gin"
	"tencenthw/pkg/config"
	"tencenthw/pkg/handler"
	"tencenthw/pkg/service"
)

func main() {
	// Load configuration
	cfg, err := config.LoadConfig()
	if err != nil {
		log.Fatalf("Failed to load configuration: %v", err)
	}

	// Initialize services
	geminiService, err := service.NewGeminiService(cfg.GeminiAPIKey)
	if err != nil {
		log.Fatal(err)
	}
	defer geminiService.Close()

	// OCRService also provides the /ocr handler (HandleOCR).
	ocrService := handler.NewOCRService(
		cfg.TencentSecretID,
		cfg.TencentSecretKey,
		geminiService,
	)

	// Initialize handlers
	rateHandler := handler.NewRateHandler(
		cfg.GeminiAPIKey,
		cfg.APIKey,
	)

	uploadHandler := handler.NewUploadHandler(
		cfg.R2AccessKey,
		cfg.R2SecretKey,
		cfg.R2Bucket,
		cfg.R2Endpoint,
		cfg.R2CustomDomain,
		ocrService,
		geminiService,
	)

	// Set up the Gin router
	r := gin.Default()

	// Register routes
	r.POST("/ocr", ocrService.HandleOCR)
	r.POST("/rate", rateHandler.HandleRate)
	// Upload files to the server
	r.POST("/upload", uploadHandler.HandleUpload)

	// Start server
	if err := r.Run("localhost:8080"); err != nil {
		log.Fatalf("Failed to start server: %v", err)
	}
}
@ -1,6 +1,7 @@
|
||||
package handler
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/base64"
|
||||
"net/http"
|
||||
"strings"
|
||||
@ -11,13 +12,50 @@ import (
	"github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common/profile"
	ocr "github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/ocr/v20181119"
	"google.golang.org/api/option"
	"pkg/service"
)

type OCRHandler struct {
type OCRService struct {
	tencentSecretID  string
	tencentSecretKey string
	geminiAPIKey     string
	apiKey           string
	geminiService    *service.GeminiService
}

func NewOCRService(tencentSecretID, tencentSecretKey string, geminiService *service.GeminiService) *OCRService {
	return &OCRService{
		tencentSecretID:  tencentSecretID,
		tencentSecretKey: tencentSecretKey,
		geminiService:    geminiService,
	}
}

func (s *OCRService) ProcessImage(ctx context.Context, imageBase64 string) (string, error) {
	// Initialize Tencent Cloud client
	credential := common.NewCredential(s.tencentSecretID, s.tencentSecretKey)
	cpf := profile.NewClientProfile()
	cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com"
	client, err := ocr.NewClient(credential, "", cpf)
	if err != nil {
		return "", err
	}

	// Create OCR request
	request := ocr.NewGeneralHandwritingOCRRequest()
	request.ImageBase64 = common.StringPtr(imageBase64)

	// Perform OCR
	response, err := client.GeneralHandwritingOCR(request)
	if err != nil {
		return "", err
	}

	// Extract text from OCR response
	var ocrText string
	for _, textDetection := range response.Response.TextDetections {
		ocrText += *textDetection.DetectedText + "\n"
	}

	return ocrText, nil
}
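`ProcessImage` grows `ocrText` with `+=` for every detection, which leaves a trailing newline. It also dereferences `DetectedText` without a nil check. Below is a sketch of the same extraction using `strings.Join`. The `detection` type is a hypothetical stand-in for the SDK's text-detection type, and the slice-of-pointers shape is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// detection is a hypothetical stand-in for the SDK's TextDetection type;
// only the DetectedText field used by ProcessImage is modeled.
type detection struct {
	DetectedText *string
}

// joinDetections returns the recognized lines separated by "\n",
// skipping nil detections and nil DetectedText pointers instead of
// dereferencing them. Unlike the += loop, it adds no trailing newline.
func joinDetections(dets []*detection) string {
	lines := make([]string, 0, len(dets))
	for _, d := range dets {
		if d == nil || d.DetectedText == nil {
			continue
		}
		lines = append(lines, *d.DetectedText)
	}
	return strings.Join(lines, "\n")
}

func main() {
	first, second := "first line", "second line"
	dets := []*detection{{DetectedText: &first}, nil, {}, {DetectedText: &second}}
	fmt.Printf("%q\n", joinDetections(dets))
}
```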
type OCRRequest struct {
@ -33,16 +71,7 @@ type OCRResponse struct {
	Success bool `json:"success"`
}

func NewOCRHandler(tencentSecretID, tencentSecretKey, geminiAPIKey, apiKey string) *OCRHandler {
	return &OCRHandler{
		tencentSecretID:  tencentSecretID,
		tencentSecretKey: tencentSecretKey,
		geminiAPIKey:     geminiAPIKey,
		apiKey:           apiKey,
	}
}

func (h *OCRHandler) HandleOCR(c *gin.Context) {
func (h *OCRService) HandleOCR(c *gin.Context) {
	var req OCRRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, OCRResponse{
@ -53,7 +82,7 @@ func (h *OCRHandler) HandleOCR(c *gin.Context) {
	}

	// Validate API key
	if req.APIKey != h.apiKey {
	if req.APIKey != h.geminiService.APIKey {
		c.JSON(http.StatusUnauthorized, OCRResponse{
			Success: false,
			Result:  "Invalid API key",
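The new check reads `h.geminiService.APIKey`. However, `GeminiService` in `pkg/service/gemini.go`, which this commit adds, has only an unexported `apiKey` field, so this line does not compile in package `handler`. Even if the field were exported, the check would require clients to send the Gemini secret as their API key. Meanwhile, `OCRService` keeps an `apiKey` field that `NewOCRService` never sets. Below is a sketch of checking the request against a separately configured service key using `crypto/subtle`. `validAPIKey` is a hypothetical helper:

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// validAPIKey reports whether the client-supplied key matches the
// configured service key. An empty configured key rejects every request,
// so a missing setting fails closed.
func validAPIKey(got, want string) bool {
	if want == "" {
		return false
	}
	// ConstantTimeCompare returns 1 only for equal contents. Its timing
	// depends on the slice lengths, not on where the bytes differ, and it
	// returns 0 immediately when the lengths differ.
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

func main() {
	fmt.Println(validAPIKey("1234567890", "1234567890"))
	fmt.Println(validAPIKey("wrong", "1234567890"))
}
```

In `HandleOCR`, the condition would become `!validAPIKey(req.APIKey, h.apiKey)`, with `apiKey` passed into `NewOCRService`.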
@ -70,49 +99,8 @@ func (h *OCRHandler) HandleOCR(c *gin.Context) {
		return
	}

	// Initialize Tencent Cloud client
	credential := common.NewCredential(h.tencentSecretID, h.tencentSecretKey)
	cpf := profile.NewClientProfile()
	cpf.HttpProfile.Endpoint = "ocr.tencentcloudapi.com"
	client, err := ocr.NewClient(credential, "", cpf)
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
			Result:  "Failed to initialize OCR client",
		})
		return
	}

	// Create OCR request
	request := ocr.NewGeneralHandwritingOCRRequest()

	// Prioritize ImageURL if both are provided
	if req.ImageURL != "" {
		request.ImageUrl = common.StringPtr(req.ImageURL)
	} else {
		// Remove base64 prefix if exists
		imageBase64 := req.ImageBase64
		if idx := strings.Index(imageBase64, "base64,"); idx != -1 {
			imageBase64 = imageBase64[idx+7:] // 7 is the length of "base64,"
		}

		// Validate base64
		if _, err := base64.StdEncoding.DecodeString(imageBase64); err != nil {
			c.JSON(http.StatusBadRequest, OCRResponse{
				Success: false,
				Result:  "Invalid base64 image",
			})
			return
		}
		request.ImageBase64 = common.StringPtr(imageBase64)
	}

	if req.Scene != "" {
		request.Scene = common.StringPtr(req.Scene)
	}

	// Perform OCR
	response, err := client.GeneralHandwritingOCR(request)
	// Process image
	ocrText, err := h.ProcessImage(c.Request.Context(), req.ImageBase64)
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
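This hunk removes the data-URL prefix stripping and base64 validation, so `ProcessImage` now receives `req.ImageBase64` exactly as sent. `req.ImageURL` and `req.Scene` are no longer applied either. Unless they are used elsewhere in the file, the `strings` and `encoding/base64` imports kept in the first hunk become unused, and Go rejects unused imports at compile time. Below is a sketch of the removed normalization as a standalone helper. `normalizeBase64` is a hypothetical name:

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// normalizeBase64 strips an optional data-URL prefix such as
// "data:image/png;base64," and checks that the remainder is valid
// standard base64 before it is sent to the OCR service.
func normalizeBase64(s string) (string, error) {
	const marker = "base64,"
	if idx := strings.Index(s, marker); idx != -1 {
		s = s[idx+len(marker):]
	}
	if s == "" {
		return "", errors.New("empty image data")
	}
	if _, err := base64.StdEncoding.DecodeString(s); err != nil {
		return "", fmt.Errorf("invalid base64 image: %w", err)
	}
	return s, nil
}

func main() {
	out, err := normalizeBase64("data:image/png;base64,aGVsbG8=")
	fmt.Println(out, err)
}
```

In `HandleOCR`, the helper's result would replace `req.ImageBase64` in the `ProcessImage` call, and an error would produce a `400 Bad Request`.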
@ -121,27 +109,8 @@ func (h *OCRHandler) HandleOCR(c *gin.Context) {
		return
	}

	// Extract text from OCR response
	var ocrText string
	for _, textDetection := range response.Response.TextDetections {
		ocrText += *textDetection.DetectedText + "\n"
	}

	// Process with Gemini
	ctx := c.Request.Context()
	client2, err := genai.NewClient(ctx, option.WithAPIKey(h.geminiAPIKey))
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
			Result:  "Failed to initialize Gemini client",
		})
		return
	}
	defer client2.Close()

	model := client2.GenerativeModel("gemini-2.0-flash-exp")
	prompt := "你是一个专业的助手,负责纠正OCR识别结果中的文本。只需要输出识别结果,不需要输出任何解释。\n\n" + ocrText
	resp, err := model.GenerateContent(ctx, genai.Text(prompt))
	processedText, err := h.geminiService.ProcessText(c.Request.Context(), ocrText)
	if err != nil {
		c.JSON(http.StatusInternalServerError, OCRResponse{
			Success: false,
@ -150,14 +119,6 @@ func (h *OCRHandler) HandleOCR(c *gin.Context) {
		return
	}

	// Get the processed text from Gemini response
	processedText := ""
	if len(resp.Candidates) > 0 && len(resp.Candidates[0].Content.Parts) > 0 {
		if textPart, ok := resp.Candidates[0].Content.Parts[0].(genai.Text); ok {
			processedText = string(textPart)
		}
	}

	c.JSON(http.StatusOK, OCRResponse{
		Success: true,
		OriginalText: ocrText,
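`GeminiService.ProcessText` sends its argument to the model as the entire prompt. Passing `ocrText` alone therefore drops the correction instruction that the removed code prepended. In English, that instruction reads: "You are a professional assistant responsible for correcting text in OCR results. Output only the recognition result, without any explanation." Below is a sketch that keeps the instruction in one place. `correctionPrompt` and `buildCorrectionPrompt` are hypothetical names:

```go
package main

import "fmt"

// correctionPrompt is the instruction from the README and the removed
// handler code, kept verbatim because it is sent to the model as-is.
const correctionPrompt = "你是一个专业的助手,负责纠正OCR识别结果中的文本。只需要输出识别结果,不需要输出任何解释。"

// buildCorrectionPrompt prepends the instruction to the raw OCR text,
// separated by a blank line, as the pre-commit code did.
func buildCorrectionPrompt(ocrText string) string {
	return correctionPrompt + "\n\n" + ocrText
}

func main() {
	fmt.Println(buildCorrectionPrompt("recognized text"))
}
```

With this helper, the handler call would become `h.geminiService.ProcessText(c.Request.Context(), buildCorrectionPrompt(ocrText))`.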
@ -9,81 +9,113 @@ import (
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"encoding/base64"
	"io"
	"strings"
	"your-project/pkg/service"
)

type UploadHandler struct {
	accessKey    string
	secretKey    string
	bucket       string
	endpoint     string
	customDomain string
	accessKey     string
	secretKey     string
	bucket        string
	endpoint      string
	customDomain  string
	ocrService    *OCRService
	geminiService *service.GeminiService
}

type UploadRequest struct {
	File   string `json:"file" binding:"required"`
	APIKey string `json:"apikey" binding:"required"`
type MultiUploadResponse struct {
	ImageURLs []string `json:"image_urls"`
	Text      string   `json:"text"`
	Success   bool     `json:"success"`
}

type UploadResponse struct {
	ImageURL string `json:"image_url"`
	Success  bool   `json:"success"`
}

func NewUploadHandler(accessKey, secretKey, bucket, endpoint, customDomain string) *UploadHandler {
	return &UploadHandler{
		accessKey:    accessKey,
		secretKey:    secretKey,
		bucket:       bucket,
		endpoint:     endpoint,
		customDomain: customDomain,
	}
}
// Upload a file to Cloudflare R2. Check whether the file is an image: if it is, upload it to R2 and return the image URL; otherwise return an error.
// Images are limited to 10 MB; allowed formats are jpg, jpeg, png, gif, bmp, tiff, webp
// HandleUpload uploads a file to Cloudflare R2
func (h *UploadHandler) HandleUpload(c *gin.Context) {
	// Parse the request body
	file, header, err := c.Request.FormFile("file")
func (h *UploadHandler) HandleMultiUpload(c *gin.Context) {
	form, err := c.MultipartForm()
	if err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to read file from request"})
		return
	}
	defer file.Close()

	// Read the file content
	fileBuffer := make([]byte, header.Size)
	_, err = file.Read(fileBuffer)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read file content"})
		c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to parse form"})
		return
	}

	// Validate the file type
	contentType := http.DetectContentType(fileBuffer)
	if !isImage(contentType) {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid file type. Only images are allowed"})
	files := form.File["files"]
	if len(files) == 0 {
		c.JSON(http.StatusBadRequest, gin.H{"error": "No files uploaded"})
		return
	}

	// Validate the file size
	if header.Size > 10<<20 { // 10MB
		c.JSON(http.StatusBadRequest, gin.H{"error": "File size exceeds the limit of 10MB"})
	if len(files) > 5 {
		c.JSON(http.StatusBadRequest, gin.H{"error": "Maximum 5 files allowed"})
		return
	}

	// Upload the file to R2
	imageURL, err := h.uploadToR2(fileBuffer, header.Filename, contentType)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("Failed to upload file to R2: %v", err)})
		return
	var imageURLs []string
	var ocrTexts []string

	for _, fileHeader := range files {
		if fileHeader.Size > 10<<20 { // 10MB
			c.JSON(http.StatusBadRequest, gin.H{"error": "File size exceeds the limit of 10MB"})
			return
		}

		file, err := fileHeader.Open()
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to open file"})
			return
		}
		defer file.Close()

		// Read file content
		fileBytes, err := io.ReadAll(file)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read file"})
			return
		}

		// Verify file type
		contentType := http.DetectContentType(fileBytes)
		if !isImage(contentType) {
			c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid file type. Only images are allowed"})
			return
		}

		// Convert to base64
		base64Str := base64.StdEncoding.EncodeToString(fileBytes)

		// Process OCR
		ocrText, err := h.ocrService.ProcessImage(c.Request.Context(), base64Str)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "OCR processing failed"})
			return
		}
		ocrTexts = append(ocrTexts, ocrText)

		// Upload to R2
		imageURL, err := h.uploadToR2(fileBytes, fileHeader.Filename, contentType)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to upload file"})
			return
		}
		imageURLs = append(imageURLs, imageURL)
	}

	// Return the result
	response := UploadResponse{
		ImageURL: imageURL,
		Success:  true,
	// Process combined text with Gemini if multiple images
	finalText := strings.Join(ocrTexts, "\n")
	if len(ocrTexts) > 1 {
		prompt := "请将以下多段文字重新组织成一段通顺的文字,保持原意的同时确保语法和逻辑正确:\n\n" + finalText
		processedText, err := h.geminiService.ProcessText(c.Request.Context(), prompt)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Text processing failed"})
			return
		}
		finalText = processedText
	}
	c.JSON(http.StatusOK, response)

	c.JSON(http.StatusOK, MultiUploadResponse{
		ImageURLs: imageURLs,
		Text:      finalText,
		Success:   true,
	})
}
// uploadToR2 uploads a file to Cloudflare R2

46
pkg/service/gemini.go
Normal file
@ -0,0 +1,46 @@
package service

import (
	"context"
	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

type GeminiService struct {
	apiKey string
	client *genai.Client
}

func NewGeminiService(apiKey string) (*GeminiService, error) {
	ctx := context.Background()
	client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey))
	if err != nil {
		return nil, err
	}

	return &GeminiService{
		apiKey: apiKey,
		client: client,
	}, nil
}

func (s *GeminiService) Close() {
	if s.client != nil {
		s.client.Close()
	}
}

func (s *GeminiService) ProcessText(ctx context.Context, prompt string) (string, error) {
	model := s.client.GenerativeModel("gemini-2.0-flash-exp")
	resp, err := model.GenerateContent(ctx, genai.Text(prompt))
	if err != nil {
		return "", err
	}

	if len(resp.Candidates) > 0 && len(resp.Candidates[0].Content.Parts) > 0 {
		if textPart, ok := resp.Candidates[0].Content.Parts[0].(genai.Text); ok {
			return string(textPart), nil
		}
	}
	return "", nil
}