
Commit f8d4d8e — v0.0.5
1 parent 9157b3e commit f8d4d8e

21 files changed: +650 −125 lines

README.md — 18 additions, 8 deletions

@@ -1,8 +1,8 @@
  ### An AI Bot Based on Large Language Models (LLMs)
- An AI language-model bot built on open-source frameworks and platforms, integrating human–machine chat, retrieval-augmented generation, and PDF/URL parsing chat.
+ An AI language-model bot built on open-source frameworks and platforms, integrating human–machine chat, retrieval-augmented generation, and PDF/URL parsing chat. It relies entirely on free, open APIs, delivering customized LLM functionality at minimal cost.

  ## Tools and Platforms
- Langchain, Streamlit, Oracle Cloud, Groq, Docker
+ Langchain, Streamlit, Oracle Cloud, Groq, Docker, Baidu Cloud

  ## File Structure
  <pre>
@@ -18,15 +18,15 @@ Langchain, Streamlit, Oracle Cloud, Groq, Docker
  ├── README.md
  ├── .gitgnore
  ├── config_setting/
- │   ├── model_config.py
+ │   ├── model_config.py    # all models
  │   └── prompt_config.py
  ├── about_page.py
  ├── chat_page.py
  ├── Dockerfile
  ├── pdf_page.py
  ├── requirements.txt
  ├── summary_page.py
- ├── url_page.py
+ ├── url_page.py            # ui
  </pre>

  ## Usage
@@ -42,22 +42,34 @@ Langchain, Streamlit, Oracle Cloud, Groq, Docker
  |----------------|-------------------------------------------------|
  | Groq API KEY | [Groq console](https://console.groq.com/playground) |
  | COHERE API KEY | [COHERE dashboard](https://dashboard.cohere.com/) |
+ | Gemini API KEY | [Google AI](https://ai.google.dev/) |
+ | BaiduQianfan API KEY | [Baidu Cloud](https://cloud.baidu.com/) |

  3. Create a .env file in the project root
  ```bash
- GROQ_API_KEY=<Groq-API-KEY>
+ GROQ_API_KEY= <Groq-API-KEY>
  COHERE_API_KEY= <COHERE-API-KEY>
+ GOOGLE_API_KEY= <GOOGLE-API-KEY>
+ QIANFAN_AK= <QIANFAN-AK>
+ QIANFAN_SK= <QIANFAN-SK>
  ```
  4. Run
  ```bash
- streamlit run chat_page.py
+ streamlit run web_ui.py
  ```
  ### Server Deployment
  [Wiki link](https://github.com/Boomm-shakalaka/AIBot-LLM/wiki/Oracle%E6%9C%8D%E5%8A%A1%E5%99%A8%E6%90%AD%E5%BB%BA%E6%95%99%E7%A8%8B)

  ## Changelog
+ v0.0.5
+ 1. Added Baidu Qianfan models (ERNIE-Lite-8K and ERNIE-Speed-128K are available for free)
+ 2. Added Gemini models (Gemini does not support streaming output, so it is not enabled yet)
+ 3. Added an online chat feature that searches the web via duckduckgo-search
+ 4. Improved how online search is invoked
+ 5. Implemented resume evaluation in the PDF chat feature
  v0.0.4.1
  1. Added a selenium crawler to improve web-page parsing
  2. Refactored the urlbot architecture
@@ -92,8 +104,6 @@ v0.0.2
  4. Added URLBot for URL-based retrieval
  5. Improved the URL-parsing animation
-
-
  v0.0.1
  1. Built the basic Streamlit page framework
  2. Added the chatBot page with a chat window and sidebar
(3 binary files changed: −100 Bytes, 152 Bytes, 2.64 KB — contents not shown)

config_setting/model_config.py — 8 additions, 0 deletions

@@ -0,0 +1,8 @@
+ model_ls = {
+     "百度千帆大模型": {"name": "ERNIE-Lite-8K", "tokens": 8192, "developer": "Baidu"},
+     "谷歌Gemma大模型": {"name": "gemma-7b-it", "tokens": 8192, "developer": "Google"},
+     "谷歌gemini大模型": {"name": "gemini-1.5-flash-latest", "tokens": 8192, "developer": "Google"},
+     "Llama3-70b大模型": {"name": "llama3-70b-8192", "tokens": 8192, "developer": "Meta"},
+     "Llama3-8b大模型": {"name": "llama3-8b-8192", "tokens": 8192, "developer": "Meta"},
+     "Mixtral大模型": {"name": "mixtral-8x7b-32768", "tokens": 32768, "developer": "Mistral"},
+ }
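The registry above maps display labels (in Chinese) to model metadata such as the provider model name and context size. A minimal sketch of how a page could resolve a label to a registry entry — the `pick_model` helper is hypothetical, not part of this commit:

```python
model_ls = {
    "百度千帆大模型": {"name": "ERNIE-Lite-8K", "tokens": 8192, "developer": "Baidu"},
    "Llama3-70b大模型": {"name": "llama3-70b-8192", "tokens": 8192, "developer": "Meta"},
}

def pick_model(label: str) -> dict:
    # Fall back to the first registry entry when the label is unknown.
    return model_ls.get(label, next(iter(model_ls.values())))

info = pick_model("Llama3-70b大模型")
print(info["name"], info["tokens"])  # llama3-70b-8192 8192
```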

config_setting/prompt_config.py — 26 additions, 1 deletion

@@ -61,4 +61,29 @@
  question: {question}.
  chat_history:{chat_history}.
  search_result:{search_result}.
- """
+ """
+
+ resume_summary_prompt="""
+ You are a human-resources professional and you need to comment on the resume content provided. Your review should adhere to the STAR principle, using the following framework:
+ [Overall evaluation]
+ Evaluate the resume as a whole. Look at the qualifications and relevant experience to determine whether there is a clear job intention, and give your advice.
+ [Score]
+ Give a score from 0-100. The higher the score, the better the resume.
+ [Personal information]
+ List the personal information included in the resume, such as name, email address, phone number, LinkedIn, etc., and determine whether it is complete.
+ [Educational background]
+ Check whether the resume includes a full educational background: a clear school name, major, start and graduation dates, and optionally the school location. In some cases a description of the relevant major may be included.
+ [Work experience]
+ Determine whether the resume includes job descriptions covering dates, position, company, and location, and whether each description is clear and complete.
+ [Internship experience]
+ Determine whether the resume describes internship experience covering dates, position, company, and location, and whether each description is clear and complete.
+ [Project research experience]
+ Determine whether the resume describes project or research experience, including dates, role, company or project name, and location, and whether each description is clear and complete.
+ [Social activity experience]
+ Determine whether the resume includes social or campus activities, with dates, role, activity name, location, etc. This part is optional.
+ [Optimization and modification suggestions]
+ Give specific suggestions for changes.\n
+ If the resume content is Chinese, you should also give your comments in Chinese.
+ Resume content: {resume_content}
+ """
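The prompt's single `{resume_content}` placeholder is filled with the extracted PDF text before the model call. A minimal sketch of the fill step — the `build_resume_prompt` helper and the 6000-character truncation limit are assumptions, and the prompt string is a shortened stand-in:

```python
# Shortened stand-in for the full resume_summary_prompt added in this commit.
resume_summary_prompt = (
    "You are a human-resources professional...\n"
    "Resume content: {resume_content}\n"
)

def build_resume_prompt(resume_text: str, max_chars: int = 6000) -> str:
    # Truncate long resumes so the filled prompt fits the model's context window.
    return resume_summary_prompt.format(resume_content=resume_text[:max_chars])

filled = build_resume_prompt("Jane Doe - Data Analyst - 5 years SQL/Python")
print(filled.endswith("Jane Doe - Data Analyst - 5 years SQL/Python\n"))  # True
```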

requirements.txt — 2 additions, 5 deletions

@@ -8,8 +8,5 @@ BeautifulSoup4
  langchain_cohere
  chromadb
  duckduckgo-search
- selenium
- # webdriver-manager
- playwright
- # playwright install
- lxml
+ langchain-google-genai
+ pdfminer.six

test/dockdockgo_test.py — 160 additions, 23 deletions

@@ -1,25 +1,162 @@
@@ -1,25 +1,162 @@
1-
# from langchain_community.tools import DuckDuckGoSearchResults
2-
# import asyncio
3-
# async def get_search_results(question):
4-
# search = DuckDuckGoSearchResults()
5-
# results=search.run(question)
6-
# return results
7-
8-
# async def duckduck_search(question):
9-
# ddgs_results = await get_search_results(question)
10-
# return ddgs_results
11-
12-
13-
# from duckduckgo_search import ddg
14-
# r = ddg("the latest cnn news", max_results=5)
15-
# for page in r:
16-
# print(page)
17-
# question='the latest cnn news'
18-
# search_result=asyncio.run(duckduck_search(question))
19-
# print(search_result)
1+
import random
2+
import time
203
from duckduckgo_search import DDGS
4+
from langchain_community.document_loaders import WebBaseLoader
5+
import re
6+
from langchain_groq import ChatGroq
7+
import requests
8+
import streamlit as st
9+
from langchain_core.messages import AIMessage, HumanMessage
10+
from langchain_core.prompts import ChatPromptTemplate
11+
from langchain_core.output_parsers import StrOutputParser
12+
from langchain_groq import ChatGroq
13+
from langchain_core.prompts import PromptTemplate
14+
from langchain_google_genai import ChatGoogleGenerativeAI
15+
from langchain_cohere import ChatCohere
16+
from langchain_community.chat_models import QianfanChatEndpoint
17+
from langchain_community.tools import DuckDuckGoSearchResults
18+
from dotenv import find_dotenv, load_dotenv
19+
20+
def format_text(text):
21+
# 用正则表达式将连续多个制表符替换为一个制表符
22+
text = re.sub(r'\t+', '\t', text)
23+
# 用正则表达式将连续多个空格替换为一个空格
24+
text = re.sub(r' +', ' ', text)
25+
# 用正则表达式将多个换行符和空白字符的组合替换为一个换行符
26+
text = re.sub(r'\n\s*\n+', '\n', text)
27+
# 用正则表达式将单个换行符和空白字符的组合替换为一个换行符
28+
text = re.sub(r'\n\s+', '\n', text)
29+
return text
30+
31+
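The whitespace normalization above can be exercised standalone; a quick check of what `format_text` does to the kind of ragged text that comes back from scraped HTML:

```python
import re

def format_text(text):
    # Same normalization as in test/dockdockgo_test.py: collapse tab runs,
    # space runs, and blank-line runs left over from scraped pages.
    text = re.sub(r'\t+', '\t', text)
    text = re.sub(r' +', ' ', text)
    text = re.sub(r'\n\s*\n+', '\n', text)
    text = re.sub(r'\n\s+', '\n', text)
    return text

print(repr(format_text("News\n\n\n  Today:\t\tsunny   and  warm\n")))
# 'News\nToday:\tsunny and warm\n'
```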
+
+ def duckduck_search(question):
+     headers = {
+         'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36',
+     }
+     search = DuckDuckGoSearchResults()
+     results = search.run(question)
+     time.sleep(2)
+     content = [results]
+     # Pull the result links out of the tool's text output and fetch up to
+     # three pages, keeping the first 2000 characters of each.
+     links = re.findall(r'link: (https?://[^\s\]]+)', results)
+     count = 0
+     for url in links:
+         response = requests.get(url, headers=headers, timeout=10)
+         if response.status_code == 200:
+             loader = WebBaseLoader(url)
+             docs = loader.load()
+             for doc in docs:
+                 page_text = format_text(doc.page_content)[:2000]
+                 content.append(page_text)
+             count += 1
+             if count >= 3:
+                 break
+     return content
+
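The `link:` extraction above depends on the text format that `DuckDuckGoSearchResults` emits (`[snippet: ..., title: ..., link: ...]`); a standalone check of the regex against a sample in that format (the sample string is illustrative, not real tool output):

```python
import re

sample = ("[snippet: Latest scores and highlights, title: NBA Scores, "
          "link: https://www.espn.com/nba/scoreboard], "
          "[snippet: Current standings, title: NBA Standings, "
          "link: https://www.nba.com/standings]")

# Capture each URL after "link: ", stopping at whitespace or the closing bracket.
links = re.findall(r'link: (https?://[^\s\]]+)', sample)
print(links)
# ['https://www.espn.com/nba/scoreboard', 'https://www.nba.com/standings']
```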
+
+ def judge_search(question, chat_history, llm):
+     prompt_template = """
+     Given a question, you need to judge whether real-time information is needed to answer it.\n
+     If you think you need more real-time information (it may involve today, weather, location, new concepts, names, non-existent concepts, etc.) to answer the question better,
+     output "[search]" with a standalone query. The query must be understandable without the Chat History.\n
+     If you can answer the question from the chat history without real-time information, just output your answer.\n
+     Do not explain your decision process.
+     Output format: "[search]: your query" or your answer.
+     User Question: {question}.
+     Chat History: {chat_history}.
+     """
+     prompt = PromptTemplate.from_template(prompt_template)
+     chain = prompt | llm | StrOutputParser()
+     response = chain.invoke({
+         "chat_history": chat_history,
+         "question": question,
+     })
+     return response
+
+ def generate_based_history_query(question, chat_history, llm):
+     based_history_prompt = """
+     Use the following latest User Question to formulate a standalone query.
+     The query must be understandable without the Chat History.
+     The output should be just the sentence suitable for a query.
+     If you are confused, just output the latest User Question.
+     Do not provide any answer.
+     User Question: '''{question}'''
+     Chat History: '''{chat_history}'''
+     query:
+     """
+     rag_chain = PromptTemplate.from_template(based_history_prompt) | llm | StrOutputParser()
+     result = rag_chain.invoke({
+         "chat_history": chat_history,
+         "question": question,
+     })
+     return result
+
+ def chat_response(question, chat_history, llm, content):
+     try:
+         chatBot_template_prompt = """
+         You are a chat assistant. Please answer User Questions to the best of your ability.
+         If the User Questions are asked in Chinese, then your answers must also be in Chinese.
+         You can use the context of the Chat History to help you understand the user's question.
+         If your understanding conflicts with the Search Context, use the Search Context first to answer the question.
+         If you think the Search Context is not helpful, answer based on your own understanding.
+         If appropriate, include useful links from the Search Context at the end.
+         User Questions: {question}.
+         Chat History: {chat_history}.
+         Search Context: {content}.
+         """
+         prompt = ChatPromptTemplate.from_template(chatBot_template_prompt)
+         chain = prompt | llm | StrOutputParser()
+         result = chain.invoke({
+             "chat_history": chat_history,
+             "question": question,
+             "content": content,
+         })
+         return result
+     except Exception:
+         # "The current model is temporarily unavailable; please try again later."
+         return "当前模型暂不可用,请稍后尝试。"
+
+ if __name__ == "__main__":
+     load_dotenv(find_dotenv())
+     chat_history = []
+     question = input("请输入问题:")
+     while True:
+         llm = QianfanChatEndpoint(model='ERNIE-Lite-8K')
+         judge_result = judge_search(question, chat_history, llm)
+         if '[search]' in judge_result:
+             # Split only on the first colon so colons inside the query survive.
+             query = judge_result.split(":", 1)[1].strip()
+             content = duckduck_search(query)
+             llm = ChatGroq(model_name='mixtral-8x7b-32768', temperature=0.1)
+             response = chat_response(question, chat_history, llm, content)
+         else:
+             response = judge_result
+         chat_history.extend([HumanMessage(content=question), AIMessage(content=response)])
+         question = input("请输入问题:")

- # results = DDGS().text("德国时间", max_results=5)
- # print(results)
- results = DDGS().text("NBA比赛")
- print(results)

test/duck_search.py — 10 additions, 0 deletions

@@ -0,0 +1,10 @@
+ # from langchain_community.tools import DuckDuckGoSearchRun
+ # search = DuckDuckGoSearchRun()
+ # result = search.run("NBA今日赛况")
+ # print(result)
+
+ # from langchain_community.tools import DuckDuckGoSearchResults
+ # search = DuckDuckGoSearchResults()
+ # result = search.run("介绍一下gpt-4o")
+ # print(result)

test/llm_test.py — 23 additions, 0 deletions

@@ -0,0 +1,23 @@
+ import random
+ import time
+ from langchain_groq import ChatGroq
+ from langchain_google_genai import ChatGoogleGenerativeAI
+ from dotenv import find_dotenv, load_dotenv
+
+ # llm = ChatGroq(model_name='gemma-7b-it', temperature=1)
+ # result = llm.invoke("宁诺附中老师")
+ # print(result)
+
+ load_dotenv(find_dotenv())
+ num = 0
+ while True:
+     # Rotate randomly across Gemini variants to probe per-model rate limits.
+     model = random.choice(["gemini-1.5-flash-latest", "gemini-1.0-pro-001", "gemini-1.5-pro-latest", "gemini-1.0-pro"])
+     print(model)
+     llm = ChatGoogleGenerativeAI(model=model, temperature=1)
+     result = llm.invoke("你是谁")
+     print(num)
+     time.sleep(1)
+     num += 1
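The rotation loop above can be generalized into a retry helper that cycles models until one call succeeds. A minimal sketch with a stubbed model call — `invoke_with_rotation`, `fake_call`, and the quota-failure behavior are hypothetical, not part of this commit:

```python
import random

MODELS = ["gemini-1.5-flash-latest", "gemini-1.0-pro-001",
          "gemini-1.5-pro-latest", "gemini-1.0-pro"]

def invoke_with_rotation(question, call_model, models=MODELS):
    # Try each model once, in random order, until one answers.
    for model in random.sample(models, len(models)):
        try:
            return model, call_model(model, question)
        except RuntimeError:
            continue  # e.g. a quota error on this model; try the next one
    raise RuntimeError("all models exhausted")

# Stub: pretend every model except the flash variant is over quota.
def fake_call(model, question):
    if model != "gemini-1.5-flash-latest":
        raise RuntimeError("quota exceeded")
    return "ok"

model, answer = invoke_with_rotation("你是谁", fake_call)
print(model, answer)  # gemini-1.5-flash-latest ok
```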
