Commit 94acecc

v1.0.1
1 parent 4a8f6c1 commit 94acecc

14 files changed: 242 additions and 55 deletions

Dockerfile

Lines changed: 20 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -1,25 +1,34 @@
1-
# Set the base image; Python 3.8 is used here
2-
FROM python:3.8.19
1+
# Use Ubuntu 22.04 as the base image
2+
FROM ubuntu:22.04
33

44
# Set the working directory
55
WORKDIR /app
66

77
# Copy the project files into the container's working directory
88
COPY . /app
99

10-
# Install Python dependencies
11-
RUN pip install --no-cache-dir -r requirements.txt
10+
# Install system dependencies
11+
RUN apt-get update && \
12+
apt-get install -y libgl1-mesa-glx libpython3-dev
1213

13-
# Install Node.js and npm
14-
RUN curl -fsSL https://deb.nodesource.com/setup_14.x | bash - \
15-
&& apt-get install -y nodejs \
16-
&& rm -rf /var/lib/apt/lists/*
14+
# Install Python 3.9
15+
RUN apt-get install -y python3.9
1716

18-
# Install npm dependencies and the Playwright browsers
19-
RUN npm install && npx playwright install
17+
# Install pip
18+
RUN apt-get install -y python3-pip
19+
20+
# Install Python dependencies
21+
RUN pip3 install --no-cache-dir -r requirements.txt
22+
23+
# Install Playwright and its dependencies
24+
RUN playwright install --with-deps chromium
2025

2126
# Expose the port
2227
EXPOSE 8501
2328

29+
# Set an environment variable that identifies the operating system
30+
ENV OS_TYPE="linux"
31+
2432
# Run the Streamlit app
25-
CMD ["streamlit", "run", "web_ui.py", "--server.port", "8501"]
33+
CMD ["python3", "-m", "streamlit", "run", "web_ui.py", "--server.port", "8501"]
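The `crawler_modules.py` hunk in this commit picks its event loop from `sys.platform` and does not read the `OS_TYPE` variable set here. If the application were to consult `OS_TYPE`, one possible approach is sketched below. The helper name `detect_os_type` and the accepted values `"windows"` and `"linux"` are assumptions made for illustration; they do not come from the repository.

```python
import os
import sys


def detect_os_type() -> str:
    """Return "windows" or "linux", preferring the OS_TYPE environment variable.

    Hypothetical helper: the name and the accepted values are assumptions.
    """
    override = os.environ.get("OS_TYPE", "").strip().lower()
    if override in ("windows", "linux"):
        return override
    # Fall back to the interpreter's own platform report. Any non-Windows
    # platform is treated as "linux" here, since only Windows needs special handling.
    return "windows" if sys.platform == "win32" else "linux"
```

Falling back to `sys.platform` keeps the behaviour unchanged when the variable is unset, for example when the app runs outside Docker.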
34+

README.md

Lines changed: 41 additions & 8 deletions
@@ -1,4 +1,4 @@
1-
### An AI Bot Based on Large Language Models (LLMs)
1+
# An AI Bot Based on Large Language Models (LLMs)
22
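An open-source AI language-model bot that integrates conversational chat, information retrieval and generation, and chat over parsed PDFs and URLs. Its main advantage is that it relies entirely on free, open APIs, delivering customized LLM features at minimal cost.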
33

44
## Tools and Platforms
@@ -7,8 +7,6 @@ Langchain, Streamlit, Oracle Cloud, Groq,Google cloud, Baidu Cloud, Docker
77
## Demo Link
88
[Link](http://168.138.28.54:8501)
99

10-
## Demo Link
11-
[Link](http://168.138.28.54:8501)
1210
## File Structure
1311
<pre>
1412
.
@@ -39,12 +37,12 @@ Langchain, Streamlit, Oracle Cloud, Groq,Google cloud, Baidu Cloud, Docker
3937
├── web_ui.py # main interface
4038
</pre>
4139

42-
Below is the optimized Markdown:
4340

4441
## Features
4542

4643
### Crawler Module
4744

45+
4846
* This module provides three crawling methods: [Selenium](https://selenium-python.readthedocs.io/), [Playwright](https://playwright.dev/python/docs/intro), and [DuckDuckGo via Langchain](https://api.python.langchain.com/en/latest/tools/langchain_community.tools.ddg_search.tool.DuckDuckGoSearchResults.html)
4947

5048
* In testing, Playwright took less than half as long as Selenium:
@@ -53,7 +51,8 @@ Langchain, Streamlit, Oracle Cloud, Groq,Google cloud, Baidu Cloud, Docker
5351
| selenium_url_crawler | 27s |
5452
| playwright_url_crawler | 11s |
5553

56-
* Because Playwright's synchronous API conflicts with Streamlit, the asynchronous API should be used. [Reference](https://discuss.streamlit.io/t/using-playwright-with-streamlit/28380/5)
54+
* Because Playwright's synchronous API conflicts with Streamlit, the asynchronous API should be used.[Reference](https://discuss.streamlit.io/t/using-playwright-with-streamlit/28380/5)
55+
5756

5857
### Chat Module (Online and Offline)
5958

@@ -92,6 +91,13 @@ Langchain, Streamlit, Oracle Cloud, Groq,Google cloud, Baidu Cloud, Docker
9291
3. Retrieve the top_k documents most relevant to the question.
9392
4. Answer the question based on the document contents.
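The retrieval and answering steps can be illustrated with a small, dependency-free sketch. It ranks documents by bag-of-words cosine similarity, which is a stand-in chosen for illustration and not the project's actual retriever (the requirements file lists `chromadb`, so the real code presumably uses a vector store). The names `retrieve_top_k` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    """Turn text into a lowercase bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the question."""
    q_vec = _vectorize(question)
    ranked = sorted(documents, key=lambda d: _cosine(q_vec, _vectorize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_docs: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real pipeline the string returned by `build_prompt` would be passed to the chat model; swapping the similarity function for embedding-based search leaves the rest of the flow unchanged.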
9493

94+
### PDF Parsing Module
95+
1. Built on [Streamlit-PDF-API](https://discuss.streamlit.io/t/display-pdf-in-streamlit/62274) and [Langchain-PDFMinerLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PDFMinerLoader.html)
96+
2. Workflow:
97+
   1. Upload a PDF
98+
   2. Parse the PDF content; the LLM then summarizes the PDF based on a prompt
99+
   3. Answer questions based on the question and the PDF content
100+
95101
## Usage Guide
96102

97103
### Local Deployment
@@ -125,11 +131,37 @@ Langchain, Streamlit, Oracle Cloud, Groq,Google cloud, Baidu Cloud, Docker
125131
streamlit run web_ui.py
126132
```
127133
### Server Deployment
128-
1. [Docker link](https://hub.docker.com/repository/docker/jiyuanc1/aibot/general)
129-
2. Server deployment guide: [wiki link](https://github.com/Boomm-shakalaka/AIBot-LLM/wiki/Oracle%E6%9C%8D%E5%8A%A1%E5%99%A8%E6%90%AD%E5%BB%BA%E6%95%99%E7%A8%8B)
134+
Method 1: Install and run Docker locally in a Linux environment
135+
* Clone the GitHub repository onto the server
136+
* Build the image
137+
138+
Method 2: Pull and run the image from Docker Hub
139+
* [Docker Hub link](https://hub.docker.com/repository/docker/jiyuanc1/aibot/general)
140+
141+
Deployment guide
142+
* Server deployment guide: [wiki link](https://github.com/Boomm-shakalaka/AIBot-LLM/wiki/Oracle%E6%9C%8D%E5%8A%A1%E5%99%A8%E6%90%AD%E5%BB%BA%E6%95%99%E7%A8%8B)
143+
144+
## Known Issues When Building the Docker Image
145+
1. Installing Google-genai fails during the build; the cause has not been identified
146+
```bash
147+
ERROR: Could not find a version that satisfies the requirement langchain-google-genai (from -r requirements.txt (line 11)) (from >versions: none)
148+
ERROR: No matching distribution found for langchain-google-genai (from -r requirements.txt (line 11))
149+
```
150+
2. The asyncio event loop to use differs between Windows and Linux [Reference](https://stackoverflow.com/questions/67964463/what-are-selectoreventloop-and-proactoreventloop-in-python-asyncio)
151+
```python
152+
if sys.platform == "win32":
153+
    loop = asyncio.ProactorEventLoop()  # Windows
154+
else:
155+
    loop = asyncio.SelectorEventLoop()  # Linux
156+
```
157+
3. Playwright cannot be packaged into Docker directly! It needs an Ubuntu-based image [Reference](https://stackoverflow.com/questions/72181737/issue-running-playwright-python-in-docker-container)
158+
130159
131160
## Changelog
132-
v1.0.0 (oracle cloud)
161+
v1.0.1 (oracle)
162+
1. Fixed the Docker image build issues and handled the OS-specific asyncio event loops
163+
164+
v1.0.0
133165
1. Improved the resume evaluation feature in PDF chat and added conversation support
134166
2. Added the Playwright crawler module and improved async calls
135167
3. Added crawler module invocation and a source retrieval option to URL chat
@@ -139,6 +171,7 @@ v1.0.0 (oracle cloud)
139171
7. Consolidated the prompt configuration
140172
8. Polished the page styling
141173
9. Added an About page
174+
10. Updated the Dockerfile
142175
143176
v0.0.5
144177
1. Added Baidu Qianfan models (ERNIE-Lite-8K and ERNIE-Speed-128K, both free to use)
100 Bytes
Binary file not shown.

crawler_modules.py

Lines changed: 15 additions & 5 deletions
@@ -1,5 +1,6 @@
11
import asyncio
22
import re
3+
import sys
34
from bs4 import BeautifulSoup
45
import requests
56
from langchain_core.documents import Document
@@ -169,10 +170,13 @@ async def playwright_crawler_async(url):
169170

170171
def selenium_url_crawler(url):
171172
    options = Options()
172-
    options.add_argument('--headless')
173+
    options.add_argument("--headless")  # Run Chrome in headless mode
174+
    options.add_argument("--no-sandbox")  # Disable Chrome's sandbox (often required when running as root in a container)
175+
    options.add_argument("--disable-dev-shm-usage")  # Use /tmp instead of /dev/shm, which is small in containers
173176
    # options.add_argument('--window-size=1920x1080')
174177

175-
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
178+
    # driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
179+
    driver = webdriver.Chrome(options=options)
176180
    driver.get(url)
177181
    # time.sleep(2)
178182

@@ -241,16 +245,22 @@ def duckduck_search(question):
241245
# print(data_playwright)
242246

243247
'''playwright_async'''
244-
# loop = asyncio.ProactorEventLoop()
248+
# if sys.platform == "win32":
249+
# loop = asyncio.ProactorEventLoop()
250+
# else:
251+
# loop = asyncio.SelectorEventLoop()
245252
# data_playwright_async = loop.run_until_complete(playwright_crawler_async('https://www.google.com/search?q=墨尔本天气'))
246253
# print(data_playwright_async)
247254

248255
'''google_search_sync'''
249256
# data_sync = google_search_sync(question)
250257
# print(data_sync)
251258

252-
'''google_search_async'''
253-
loop = asyncio.ProactorEventLoop()
259+
'''google_search_async'''
260+
if sys.platform == "win32":
261+
    loop = asyncio.ProactorEventLoop()
262+
else:
263+
    loop = asyncio.SelectorEventLoop()
254264
data_async = loop.run_until_complete(google_search_async(question))
255265
print(data_async)
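The platform check in this hunk can be wrapped in a helper that also closes the loop once the coroutine finishes. The following is a minimal sketch under stated assumptions: `new_event_loop_for_platform`, `run_in_fresh_loop`, and the stand-in coroutine `fake_search` are hypothetical names that do not exist in the repository, and `fake_search` performs no network I/O.

```python
import asyncio
import sys


def new_event_loop_for_platform() -> asyncio.AbstractEventLoop:
    """Create an event loop suited to the current OS (hypothetical helper)."""
    if sys.platform == "win32":
        # The proactor loop supports subprocesses on Windows,
        # which browser automation relies on.
        return asyncio.ProactorEventLoop()
    # The selector loop is the standard choice on Linux.
    return asyncio.SelectorEventLoop()


def run_in_fresh_loop(coro):
    """Run a coroutine to completion in its own loop, then close the loop."""
    loop = new_event_loop_for_platform()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


async def fake_search(question: str) -> str:
    """Stand-in for google_search_async; yields control once and returns a string."""
    await asyncio.sleep(0)
    return f"results for {question}"
```

Calling `run_in_fresh_loop(fake_search("weather"))` mirrors the `loop.run_until_complete(google_search_async(question))` call in the diff; the `finally` block is the only behavioural addition.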
256266

requirements.txt

Lines changed: 6 additions & 2 deletions
@@ -8,6 +8,10 @@ BeautifulSoup4
88
langchain_cohere
99
chromadb
1010
duckduckgo-search
11-
langchain-google-genai
11+
qianfan
12+
asyncio
13+
webdriver-manager
14+
# langchain-google-genai
1215
pdfminer.six
13-
selenium
16+
selenium
17+
playwright
-259 Bytes
Binary file not shown.
-59 Bytes
Binary file not shown.
-69 Bytes
Binary file not shown.
-128 Bytes
Binary file not shown.

web_pages/about_page.py

Lines changed: 2 additions & 2 deletions
@@ -40,9 +40,9 @@ def about_page():
4040
* [Bootstrap official website](https://getbootstrap.com/)
4141
4242
Author: Boomm-shakalaka
43-
Version: 1.0
43+
Version: 1.1
4444
GitHub repository: [AIBot-LLM](https://github.com/Boomm-shakalaka/AIBot-LLM)
45-
'Report a Bug': [Github Issues](https://github.com/Boomm-shakalaka/AIBot-LLM/issues)
45+
Report a Bug: [Github Issues](https://github.com/Boomm-shakalaka/AIBot-LLM/issues)
4646
"""
4747
)
4848
