Streamlit User Guide

1. Preface

How to build an LLM-powered ChatBot with Streamlit

1.1 The low-code trend

Low-code is not no-code.

Low-code is a visual approach to software development that enables faster delivery of applications through minimal hand-coding.

from IBM

A low-code development platform (LCDP) provides a development environment used to create application software through a graphical user interface.

from wiki

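Goals of low-code:

  • Reduce the amount of code.
  • Make development partly or fully graphical (the so-called "drag-and-drop" style).
  • Speed up project delivery.
  • Lower technical complexity.
  • Lower the barrier to working across languages.

Low-code is not a new concept. In essence it is no different from the emergence of high-level languages: imagine that every project had to be built up step by step from a low-level language; the workload would be unimaginable.

To some extent, a high-level language can be seen as a wrapper around a low-level one, expressing logic in a form that is easier for humans to follow. Building on that, related work can be packaged further into relatively independent libraries and frameworks, distributed through tools such as pip for Python and npm for Node.js, which cuts the amount of code again.

Go one step further and bundle whole categories of large, specific tasks together, and you have the foundation of what is called low-code.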

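import requests

res = requests.get(url, headers = {}...)

This is the classic snippet that every newcomer to Python web scraping meets: in just two lines, the seemingly mysterious crawler is reduced to its simplest form.

(Of course, the core of scraping lies elsewhere: reverse-engineering JavaScript, evading anti-scraping measures (disguising the client), distributed crawling, thread and process pools, proxy pools, and so on.)

Put simply, low-code means solving a specific problem with a ready-made solution (a template), where the process itself becomes simple and may even be driven graphically.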

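A common form of low-code is the macro facility that ships with many large pieces of industrial software. In Excel, for example, you can record a macro: the (mouse) operations are captured and code that reproduces them is generated in the background. Code generated this way has obvious limitations:

  • It is template-bound: flexibility is very low and execution is inefficient. A recorded Excel macro contains many statements that activate specific objects; these are redundant, and they make the program run very slowly.
  • You have to adapt to the template's fixed way of thinking (a relatively rigid workflow), so it may not be a good choice for complex functionality.

The same limitation shows up in general-purpose third-party ERP systems, which tend to be sprawling and extremely tedious to operate. Rolling one out usually requires the company to reshape its own business processes to fit the system's relatively fixed framework, which is not easy. It involves not only technical issues but also how well the company understands digital management as a whole and how well it executes.

A comment that gives low-code qualified approval:

Low-code platforms are certainly useful. Within specific domains they can greatly improve development efficiency and lower the barrier to entry. But calling them "the next hot career"? Well, bragging is not against the law...

from zhihu, @金旭亮, Beijing Institute of Technology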

1.2 Why Streamlit?

1.2.1 Introduction

Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In just a few minutes you can build and deploy powerful data apps. So let's get started!

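  • Open source
  • Easy to use
  • Shareable
  • A front end (cross-platform, very simple to deploy)
  • Aimed at machine learning and other data-science fields
  • Makes data more powerful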

Roadmap - Streamlit

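Example: Streamlit (30days.streamlit.app), a website built with Streamlit.

Take building a GUI program in VBA as an example: you drag the components you need into place with the mouse and can complete everything from layout to basic event handlers inside the IDE. This saves a great deal of time and lets you concentrate on the actual task.

By contrast, take GUI development in Python. The bundled tkinter library offers no comparable drag-and-drop designer, so everything starts from the lowest level: the window size, the relative position of every widget, and a pile of other chores. For someone who only wants a simple, usable little tool, this repetitive work adds almost no value.

Streamlit is a front-end library built to solve this basic problem. Its aim is to lower the barrier for people who are not front-end developers but need a front end: for example, quickly setting up a data-query terminal for the LAN of a small or medium-sized company, a demo of a complex (data) product, or a dashboard that needs rich interaction. You clearly do not want to build these up step by step from raw JavaScript, CSS and HTML. Such projects usually need neither very complex behaviour (JavaScript) nor dazzling effects (CSS), and nobody wants to spend time writing the basic markup (HTML). They only need to deliver a specific function, go live quickly and be easy to maintain, so that a small number of engineers can handle full-stack development and maintenance.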

1.2.2 Comparing front-end libraries for data visualization

Python has a large number of data-visualization front-end libraries of uneven quality. Only a few that are relatively mature and already have a sizeable user base are considered here; see What are some alternatives to Streamlit? - StackShare.

Three libraries are picked for comparison. Here are their taglines:

  • Streamlit

    A faster way to build and share data apps

  • Gradio

    Build Machine Learning Web Apps - in Python

  • Dash

    Dash is the most downloaded, trusted Python framework for building ML & data science web apps.

    Note that alongside Dash there is also Plotly, an extremely powerful data-visualization library.

The article Gradio vs Streamlit vs Dash vs Flask sums them up as follows:

Gradio: Gradio is specifically built with machine learning models in mind. So if you want to create a web UI specifically for a machine learning model that you built, Gradio’s simple syntax and setup is the way to go.

In short: built for machine learning.

Streamlit: Streamlit is useful if you want to get a dashboard up and running quickly, and have the flexibility to add lots of components and controls. As well, Streamlit allows you to build a web UI or a dashboard much faster than Dash or Flask.

In short: built for development speed. It can also be read as made for small and medium-sized companies or organizations, giving full-stack engineers front-end capability.

Dash: Choose Dash if you want to be a production-ready dashboard for a larger company, since it’s mainly tailored for enterprise companies.

In short: built for large companies, which either have enough developers to polish the generated charts or have a product mature enough to go into production.

Flask: Choose Flask if you have knowledge of Python/HTML/CSS programming and you want to build your own solution completely from scratch.

In short: a lightweight web framework for Python developers.

Flask serves a completely different purpose from the other three (it is closer to a traditional web framework) and is used quite differently, so it is left out of the discussion here.

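All three grew out of data science (machine learning and deep learning). Compared with traditional data visualization as represented by Excel and Power BI, the newer kind driven by machine learning and big data has more complex requirements and stricter constraints:

  • More complex data processing.

    Data may come from databases of many kinds, from many file formats (no longer just csv, Excel or txt), from web scraping, and so on.

  • Much larger data volumes (traditional tools quickly hit a wall on data size).

    [Figure: Power BI memory usage after loading a csv file]

    (Power BI uses roughly 1 GB of memory when idle; after loading a 13.2 MB csv file of about 120,000 rows, usage grew by nearly 400 MB.)

  • More complex visualization and interaction.

    An indicator system may contain hundreds or thousands of metrics, and a single, uniform style of chart struggles to show the detail in some of them.

  • Cross-platform access.

    With remote meetings and working from home, data is no longer confined to a particular system or platform.

[Figure: map]

The web solves the cross-platform problem well. It is also a very good choice (with more freedom) for complex or flashy visuals, for example Baidu Maps' Spring Festival migration dashboard, which is built on Baidu's open-source ECharts (pyecharts is the Python counterpart).

Data analysis and data mining should focus on extracting value from data. Clearly not too much time should go into fancy charts: visualization is an indispensable part of data processing, but it is not the core.

Charts used for productive work should be simple and clear, with moderate polish. Their main purpose is to explain something relatively complex in a relatively simple way, unlike charts in the media, which chase visual impact.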

1.2.3 Dash & Streamlit

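Dash is Streamlit's main competitor. Compared with Dash, Streamlit is simpler to use and faster to deploy, which makes it easier to win users over.

Their popularity on GitHub gives a sense of this (note that Streamlit started later than Dash, yet gained stars much faster).

(An image from the original article shows the star trends of the two projects.)

Keep in mind, though, that a modular front-end library like this does not remove the need for front-end knowledge altogether. It lowers the barrier to entry and cuts out non-essential code, so that people can focus on their main work and spend less time outside their core job.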

1.2.4 Project outlook

In 2021 Streamlit closed a $35 million Series B round:

Today, we're excited to announce a $35 million Series B investment led by Sequoia and backed by our existing investors Gradient Ventures and GGV Capital.

Background on the two venture-capital firms:

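2. Basic usage

2.1 Before you start

  • A reasonably deep familiarity with Markdown (inserting HTML tags, charts, multimedia tags and so on) helps with Streamlit and with any other rapid front-end tool.
  • Some understanding of the front end, such as basic CSS.
  • Some understanding of basic browser mechanisms such as sessions and storage (sessionStorage and localStorage). This matters a great deal when you need to compute on data and keep the results.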

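2.2 Basic command line

For run-time parameters and configuration, see the configuration-file section at the end of the article.

# Show help
streamlit --help

# Run a script
python -m streamlit run your_script.py
# Equivalent to
streamlit run your_script.py

# Show the version
streamlit version

# Open the documentation
streamlit docs

# Show the configuration
streamlit config show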

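2.3 Ways to output content

Perhaps borrowing from Jupyter notebooks, Streamlit does not require st.write to output content (JavaScript has a function of the same name, document.write, which browsers have deprecated); content can also be output directly through "magic".

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def main():
    '''
    # This is the document title
    This is some _markdown_.
    '''

    df = pd.DataFrame({'col1': [1,2,3]})
    df  # equivalent to st.write / st.dataframe

    x = 10
    'x', x  # equivalent to st.write('x', x)

    arr = np.random.normal(1, 1, size=100)
    fig, ax = plt.subplots()
    ax.hist(arr, bins=20)

    fig  # equivalent to st.write / st.pyplot

main()

This style is convenient but not recommended: outside of notebook-like use, it makes troubleshooting in an IDE harder.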

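3. Containers

Put simply, containers divide related content into areas or blocks so that the page is tidier and easier to manage.

| Function | Meaning |
| --- | --- |
| st.sidebar | Sidebar |
| st.columns | Side-by-side columns |
| st.tabs | Tabs |
| st.expander | Collapsible container |
| st.container | (Large) container |
| st.empty | Empty placeholder container |

They are comparable to the MultiPage and Frame containers of a VBA user form, which are used to build relatively complex layouts.

Other components such as buttons, drop-down lists and text boxes can be placed inside a container, which also separates functional modules visually so that they are easier to manage and use.

Take st.columns as an example: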

import streamlit as st

# Show separate images side by side in one row
col1, col2, col3 = st.columns(3)

with col1:
    st.header("A cat")
    st.image("https://static.streamlit.io/examples/cat.jpg")

with col2:
    st.header("A dog")
    st.image("https://static.streamlit.io/examples/dog.jpg")

with col3:
    st.header("An owl")
    st.image("https://static.streamlit.io/examples/owl.jpg")

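4. Text

4.1 Input

| Function | Meaning |
| --- | --- |
| st.text_input | Single-line text input |
| st.text_area | Multi-line text input |
| st.number_input | Numeric input |
| st.time_input | Time input |

This part is straightforward and needs little explanation.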

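4.2 Display

If this part is unfamiliar, practise with a Markdown document first; it is essentially Markdown in another form.

| Function | Purpose | Scope |
| --- | --- | --- |
| st.markdown | Display content formatted as Markdown | Whole page |
| st.title | Page title | Whole page |
| st.header | Heading | Large block |
| st.subheader | Subheading | Large block |
| st.caption | Caption for a component | Small component |
| st.code | Code block | - |
| st.text | Plain text | - |
| st.latex | LaTeX typesetting (mathematical formulas) | - |

st.markdown deserves special attention: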

st.markdown(body, unsafe_allow_html=False)
Parameters
body (str) The string to display as Github-flavored Markdown. Syntax information can be found at: https://github.github.com/gfm. This also supports: Emoji shortcodes, such as :+1: and :sunglasses:. For a list of all supported codes, see https://share.streamlit.io/streamlit/emoji-shortcodes. LaTeX expressions, by wrapping them in "$" or "$$" (the "$$" must be on their own lines). Supported LaTeX functions are listed at https://katex.org/docs/supported.html. Colored text, using the syntax :color[text to be colored], where color needs to be replaced with any of the following supported colors: blue, green, orange, red, violet.
unsafe_allow_html (bool) By default, any HTML tags found in the body will be escaped and therefore treated as pure text. This behavior may be turned off by setting this argument to True. That said, we strongly advise against it. It is hard to write secure HTML, so by using this argument you may be compromising your users' security. For more information, see: https://github.com/streamlit/streamlit/issues/152

The unsafe_allow_html parameter makes more elaborate custom pages possible.

By default, the content is escaped first and treated as plain text.

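import streamlit as st

def _inject_css():
    # Class names such as .css-1kyxreq are generated by Streamlit and may differ between versions.
    css = '''
        h1{text-align: center;}
        .css-1kyxreq{justify-content: center;}
        .css-ffhzg2{background: gray !important}
        .row-widget.stTextInput{margin-left: 25% !important;}
        .st-bt{width: 50% !important;}
    '''
    st.markdown(f'<style>{css}</style>', unsafe_allow_html=True)

For example, CSS can be injected to change the page layout. Injecting JavaScript directly is not supported, however.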

Unfortunately, right now JavaScript integration is not supported in Streamlit, but we already have a feature request based on our users’ issues: https://github.com/streamlit/streamlit/issues/969

Please, feel free to follow that issue for up-to-date information and write any additional details to that request. And thanks for using Streamlit!

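For security reasons, injecting scripts this way is completely blocked.

To run a script in the page, see the section on components. Forbidding direct script injection does reduce Streamlit's flexibility to some degree.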

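Mathematical formulas:

import streamlit as st

st.latex(r'''
\text{Identity matrix (usually denoted } I \text{ or } E \text{):}\\
I = \begin{bmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix}\\\\
\text{Orthogonal matrix: if } A^TA = I \text{, then } A \text{ is called orthogonal}\\\\
\text{Transpose: } (A^T)^T = A\\
(A_1A_2 \cdots A_n)^T = A_n^T \cdots A_2^TA_1^T\\\\
\text{Diagonal matrix:}\\
A = \begin{bmatrix}
a_{11} & & & \\
& a_{22} & & \\
& & a_{33} & \\
& & & a_{44}
\end{bmatrix}\\
\text{The transpose of a diagonal matrix equals itself: } A^T = A\\\\
\text{Inverse matrix: } AB = BA = I\\
A = B^{-1}\\
B = A^{-1}\\
(AB)^{-1} = B^{-1}A^{-1} \text{, for invertible } A, B \text{ of the same order}\\
\text{The inverse of the identity matrix is itself.}\\
\text{For an orthogonal matrix } A \text{: } A^{-1} = A^T
''')

The page loads /vendor/bokeh/bokeh-mathjax-2.4.3.min.js (MathJax), which appears to be the rendering engine, although the st.markdown documentation quoted above refers to KaTeX for the list of supported functions.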

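5. Displaying data

| Function | Meaning |
| --- | --- |
| st.dataframe | Interactive table |
| st.json | JSON-structured content (a visual view of JSON data) |
| st.table | Static table |
| st.metric | A metric shown in large bold type |
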
st.dataframe(data=None, width=None, height=None, *, use_container_width=False)
Parameters
data (pandas.DataFrame, pandas.Styler, pyarrow.Table, numpy.ndarray, pyspark.sql.DataFrame, snowflake.snowpark.dataframe.DataFrame, snowflake.snowpark.table.Table, Iterable, dict, or None) The data to display. If 'data' is a pandas.Styler, it will be used to style its underlying DataFrame. Streamlit supports custom cell values and colors. (It does not support some of the more exotic pandas styling features, like bar charts, hovering, and captions.) Styler support is experimental! Pyarrow tables are not supported by Streamlit's legacy DataFrame serialization (i.e. with config.dataFrameSerialization = "legacy"). To use pyarrow tables, please enable pyarrow by changing the config setting, config.dataFrameSerialization = "arrow".
width (int or None) Desired width of the dataframe expressed in pixels. If None, the width will be automatically calculated based on the column content.
height (int or None) Desired height of the dataframe expressed in pixels. If None, a default height is used.
use_container_width (bool) If True, set the dataframe width to the width of the parent container. This takes precedence over the width argument. This argument can only be supplied by keyword.

A pandas DataFrame can also be output directly through magic.

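6. Data visualization

Streamlit ships with a few simple built-in charts and also supports a number of data-visualization libraries.

| Function | Meaning | Notes |
| --- | --- | --- |
| st.line_chart | Line chart | |
| st.area_chart | Area chart | |
| st.bar_chart | Bar chart | |
| st.pyplot | Renders matplotlib figures | |
| st.altair_chart | Renders Altair charts | |
| st.vega_lite_chart | Renders Vega-Lite charts | |
| st.plotly_chart | Renders Plotly figures | Charts are interactive |
| st.bokeh_chart | Renders Bokeh figures | |
| st.pydeck_chart | Renders PyDeck charts | |
| st.graphviz_chart | Graph diagrams (Graphviz) | |
| st.map | Maps | Needs data from an external API |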

Take st.plotly_chart and st.pyplot as examples.

st.plotly_chart(figure_or_data, use_container_width=False, sharing="streamlit", theme="streamlit", **kwargs)
Parameters
figure_or_data (plotly.graph_objs.Figure, plotly.graph_objs.Data, dict/list of plotly.graph_objs.Figure/Data) See https://plot.ly/python/ for examples of graph descriptions.
use_container_width (bool) If True, set the chart width to the column width. This takes precedence over the figure's native width value.
sharing ({'streamlit', 'private', 'secret', 'public'}) Use 'streamlit' to insert the plot and all its dependencies directly in the Streamlit app using plotly's offline mode (default). Use any other sharing mode to send the chart to Plotly chart studio, which requires an account. See https://plot.ly/python/chart-studio/ for more information.
theme ("streamlit" or None) The theme of the chart. Currently, we only support "streamlit" for the Streamlit defined design or None to fallback to the default behavior of the library.
**kwargs (null) Any argument accepted by Plotly's plot() function.
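
  • figure_or_data: a figure object or raw data
  • use_container_width: boolean; if True, it overrides the figure's native width
  • sharing: the sharing mode
  • theme: the theme
  • **kwargs: other plotting arguments
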
import streamlit as st
import plotly.express as px

data_wind = px.data.wind()

st.write('a_page')

fig = px.bar_polar(
    data_wind,
    r="strength",
    theta="direction",
    color="frequency",
    width=720, height=480
)

# Render the figure with st.plotly_chart.
# Calling fig.show() instead opens the chart in a new browser tab rather than in the app.
st.plotly_chart(fig)

import streamlit as st
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

st.write('b_page')

df = pd.DataFrame({"Price 1": [7, 1, 5, 6, 3, 10, 5, 8],
                   "Price 2": [1, 2, 8, 4, 3, 9, 5, 2],
                   "Day": [1, 2, 3, 4, 5, 6, 7, 8]})

# Create the figure and axes explicitly so that seaborn draws onto them
fig, ax = plt.subplots(figsize=(20, 6))

# sns.barplot returns the Axes it drew on, not a Figure
sns.barplot(x='Day', y='Price 1', data=df, color='red', ax=ax)

# st.pyplot expects the Figure
st.pyplot(fig)

import streamlit as st
import plotly.graph_objects as go
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

mesh_size = .02
margin = 0.25

X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y.astype(str), test_size=0.25, random_state=0)

# Create a mesh grid on which we will run our model
x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
xrange = np.arange(x_min, x_max, mesh_size)
yrange = np.arange(y_min, y_max, mesh_size)
xx, yy = np.meshgrid(xrange, yrange)

# Fit a KNN classifier on the make_moons data
clf = KNeighborsClassifier(15, weights='uniform')
clf.fit(X, y)
Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
Z = Z.reshape(xx.shape)

# Plot the figure
fig = go.Figure(data=[
    go.Contour(
        x=xrange,
        y=yrange,
        z=Z,
        colorscale='RdBu'
    )
])

st.plotly_chart(fig)
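
When a chart is complex, an interactive mode is clearly friendlier: it lets readers grasp the information and the details more easily.

import pandas as pd
import plotly.graph_objects as go
import streamlit as st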

# Load data
df = pd.read_csv(r"D:\workspace_python\plotly专题\apple.csv")

df.columns = [col.replace("AAPL.", "") for col in df.columns]

# Create figure
fig = go.Figure()

fig.add_trace(
    go.Scatter(x=list(df.Date), y=list(df.High)))

# Set title
fig.update_layout(
    title_text="Time series with range slider and selectors"
)

# Add range slider
fig.update_layout(
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=1,
                     label="1m",
                     step="month",
                     stepmode="backward"),
                dict(count=6,
                     label="6m",
                     step="month",
                     stepmode="backward"),
                dict(count=1,
                     label="YTD",
                     step="year",
                     stepmode="todate"),
                dict(count=1,
                     label="1y",
                     step="year",
                     stepmode="backward"),
                dict(step="all")
            ])
        ),
        rangeslider=dict(
            visible=True
        ),
        type="date"
    )
)

st.plotly_chart(fig)

Because Plotly ships with its own interactive controls, fairly complex interaction is not hard to achieve.

7. Files

7.1 Upload

st.file_uploader(label, type=None, accept_multiple_files=False, key=None, help=None, on_change=None, args=None, kwargs=None, *, disabled=False, label_visibility="visible")
Parameters
label (str) A short label explaining to the user what this file uploader is for. The label can optionally contain Markdown and supports the following elements: Bold, Italics, Strikethroughs, Inline Code, Emojis, and Links. This also supports: Emoji shortcodes, such as :+1: and :sunglasses:. For a list of all supported codes, see https://share.streamlit.io/streamlit/emoji-shortcodes. LaTeX expressions, by wrapping them in "$" or "$$" (the "$$" must be on their own lines). Supported LaTeX functions are listed at https://katex.org/docs/supported.html. Colored text, using the syntax :color[text to be colored], where color needs to be replaced with any of the following supported colors: blue, green, orange, red, violet. Unsupported elements are not displayed. Display unsupported elements as literal characters by backslash-escaping them. E.g. 1\. Not an ordered list. For accessibility reasons, you should never set an empty label (label="") but hide it with label_visibility if needed. In the future, we may disallow empty labels by raising an exception.
type (str or list of str or None) Array of allowed extensions. ['png', 'jpg'] The default is None, which means all extensions are allowed.
accept_multiple_files (bool) If True, allows the user to upload multiple files at the same time, in which case the return value will be a list of files. Default: False
key (str or int) An optional string or integer to use as the unique key for the widget. If this is omitted, a key will be generated for the widget based on its content. Multiple widgets of the same type may not share the same key.
help (str) A tooltip that gets displayed next to the file uploader.
on_change (callable) An optional callback invoked when this file_uploader's value changes.
args (tuple) An optional tuple of args to pass to the callback.
kwargs (dict) An optional dict of kwargs to pass to the callback.
disabled (bool) An optional boolean, which disables the file uploader if set to True. The default is False. This argument can only be supplied by keyword.
label_visibility ("visible" or "hidden" or "collapsed") The visibility of the label. If "hidden", the label doesn't show but there is still empty space for it above the widget (equivalent to label=""). If "collapsed", both the label and the space are removed. Default is "visible". This argument can only be supplied by keyword.
Returns
(None or UploadedFile or list of UploadedFile) If accept_multiple_files is False, returns either None or an UploadedFile object. If accept_multiple_files is True, returns a list with the uploaded files as UploadedFile objects. If no files were uploaded, returns an empty list. The UploadedFile class is a subclass of BytesIO, and therefore it is "file-like". This means you can pass them anywhere where a file is expected.

import streamlit as st
import pandas as pd
from io import StringIO

# Upload a single file
uploaded_file = st.file_uploader("Choose a file")
if uploaded_file is not None:
    # To read file as bytes:
    bytes_data = uploaded_file.getvalue()
    st.write(bytes_data)

    # To convert to a string based IO:
    stringio = StringIO(uploaded_file.getvalue().decode("utf-8"))
    st.write(stringio)

    # To read file as string:
    string_data = stringio.read()
    st.write(string_data)

    # Can be used wherever a "file-like" object is accepted:
    dataframe = pd.read_csv(uploaded_file)
    st.write(dataframe)

# Upload multiple files
uploaded_files = st.file_uploader("Choose a CSV file", accept_multiple_files=True)
for uploaded_file in uploaded_files:
    bytes_data = uploaded_file.read()
    st.write("filename:", uploaded_file.name)
    st.write(bytes_data)

By default, uploaded files are limited to 200MB. You can configure this using the server.maxUploadSize config option. For more info on how to set config options, see https://docs.streamlit.io/library/advanced-features/configuration#set-configuration-options

Note the default upload size limit of 200 MB, which can be changed through the server.maxUploadSize option.

7.2 Download

st.download_button(label, data, file_name=None, mime=None, key=None, help=None, on_click=None, args=None, kwargs=None, *, disabled=False, use_container_width=False)
Parameters
label (str) A short label explaining to the user what this button is for. The label can optionally contain Markdown and supports the following elements: Bold, Italics, Strikethroughs, and Emojis.Unsupported elements are not displayed. Display unsupported elements as literal characters by backslash-escaping them. E.g. 1\. Not an ordered list.
data (str or bytes or file) The contents of the file to be downloaded. See example below for caching techniques to avoid recomputing this data unnecessarily.
file_name (str) An optional string to use as the name of the file to be downloaded, such as 'my_file.csv'. If not specified, the name will be automatically generated.
mime (str or None) The MIME type of the data. If None, defaults to "text/plain" (if data is of type str or is a textual file) or "application/octet-stream" (if data is of type bytes or is a binary file).
key (str or int) An optional string or integer to use as the unique key for the widget. If this is omitted, a key will be generated for the widget based on its content. Multiple widgets of the same type may not share the same key.
help (str) An optional tooltip that gets displayed when the button is hovered over.
on_click (callable) An optional callback invoked when this button is clicked.
args (tuple) An optional tuple of args to pass to the callback.
kwargs (dict) An optional dict of kwargs to pass to the callback.
disabled (bool) An optional boolean, which disables the download button if set to True. The default is False. This argument can only be supplied by keyword.
use_container_width (bool) An optional boolean, which makes the button stretch its width to match the parent container.
Returns
(bool) True if the button was clicked on the last run of the app, False otherwise.
import streamlit as st

# my_large_df is assumed to be an existing pandas DataFrame.

@st.cache_data
def convert_df(df):
    # IMPORTANT: Cache the conversion to prevent computation on every rerun
    return df.to_csv().encode('utf-8')

csv = convert_df(my_large_df)

# Download text
st.download_button(
    label="Download data as CSV",
    data=csv,
    file_name='large_df.csv',
    mime='text/csv',
)
text_contents = '''This is some text'''
st.download_button('Download some text', text_contents)

# Download a binary file
binary_contents = b'example content'
# Defaults to 'application/octet-stream'
st.download_button('Download binary file', binary_contents)

# Download an image
with open("flower.png", "rb") as file:
    btn = st.download_button(
        label="Download image",
        data=file,
        file_name="flower.png",
        mime="image/png"
    )


Common MIME types that can be passed as mime:

| Extension | Kind of document | MIME Type |
| --- | --- | --- |
| .aac | AAC audio | audio/aac |
| .abw | AbiWord document | application/x-abiword |
| .arc | Archive document (multiple files embedded) | application/x-freearc |
| .avif | AVIF image | image/avif |
| .avi | AVI: Audio Video Interleave | video/x-msvideo |
| .azw | Amazon Kindle eBook format | application/vnd.amazon.ebook |
| .bin | Any kind of binary data | application/octet-stream |
| .bmp | Windows OS/2 Bitmap Graphics | image/bmp |
| .bz | BZip archive | application/x-bzip |
| .bz2 | BZip2 archive | application/x-bzip2 |
| .cda | CD audio | application/x-cdf |
| .csh | C-Shell script | application/x-csh |
| .css | Cascading Style Sheets (CSS) | text/css |
| .csv | Comma-separated values (CSV) | text/csv |
| .doc | Microsoft Word | application/msword |
| .docx | Microsoft Word (OpenXML) | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| .eot | MS Embedded OpenType fonts | application/vnd.ms-fontobject |
| .epub | Electronic publication (EPUB) | application/epub+zip |
| .gz | GZip Compressed Archive | application/gzip |
| .gif | Graphics Interchange Format (GIF) | image/gif |
| .htm, .html | HyperText Markup Language (HTML) | text/html |
| .ico | Icon format | image/vnd.microsoft.icon |
| .ics | iCalendar format | text/calendar |
| .jar | Java Archive (JAR) | application/java-archive |
| .jpeg, .jpg | JPEG images | image/jpeg |
| .js | JavaScript | text/javascript (Specifications: HTML and RFC 9239) |
| .json | JSON format | application/json |
| .jsonld | JSON-LD format | application/ld+json |
| .mid, .midi | Musical Instrument Digital Interface (MIDI) | audio/midi, audio/x-midi |
| .mjs | JavaScript module | text/javascript |
| .mp3 | MP3 audio | audio/mpeg |
| .mp4 | MP4 video | video/mp4 |
| .mpeg | MPEG Video | video/mpeg |
| .mpkg | Apple Installer Package | application/vnd.apple.installer+xml |
| .odp | OpenDocument presentation document | application/vnd.oasis.opendocument.presentation |
| .ods | OpenDocument spreadsheet document | application/vnd.oasis.opendocument.spreadsheet |
| .odt | OpenDocument text document | application/vnd.oasis.opendocument.text |
| .oga | OGG audio | audio/ogg |
| .ogv | OGG video | video/ogg |
| .ogx | OGG | application/ogg |
| .opus | Opus audio | audio/opus |
| .otf | OpenType font | font/otf |
| .png | Portable Network Graphics | image/png |
| .pdf | Adobe Portable Document Format (PDF) | application/pdf |
| .php | Hypertext Preprocessor (Personal Home Page) | application/x-httpd-php |
| .ppt | Microsoft PowerPoint | application/vnd.ms-powerpoint |
| .pptx | Microsoft PowerPoint (OpenXML) | application/vnd.openxmlformats-officedocument.presentationml.presentation |
| .rar | RAR archive | application/vnd.rar |
| .rtf | Rich Text Format (RTF) | application/rtf |
| .sh | Bourne shell script | application/x-sh |
| .svg | Scalable Vector Graphics (SVG) | image/svg+xml |
| .tar | Tape Archive (TAR) | application/x-tar |
| .tif, .tiff | Tagged Image File Format (TIFF) | image/tiff |
| .ts | MPEG transport stream | video/mp2t |
| .ttf | TrueType Font | font/ttf |
| .txt | Text, (generally ASCII or ISO 8859-n) | text/plain |
| .vsd | Microsoft Visio | application/vnd.visio |
| .wav | Waveform Audio Format | audio/wav |
| .weba | WEBM audio | audio/webm |
| .webm | WEBM video | video/webm |
| .webp | WEBP image | image/webp |
| .woff | Web Open Font Format (WOFF) | font/woff |
| .woff2 | Web Open Font Format (WOFF) | font/woff2 |
| .xhtml | XHTML | application/xhtml+xml |
| .xls | Microsoft Excel | application/vnd.ms-excel |
| .xlsx | Microsoft Excel (OpenXML) | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
| .xml | XML | application/xml is recommended as of RFC 7303 (section 4.1), but text/xml is still used sometimes. You can assign a specific MIME type to a file with .xml extension depending on how its contents are meant to be interpreted. For instance, an Atom feed is application/atom+xml, but application/xml serves as a valid default. |
| .xul | XUL | application/vnd.mozilla.xul+xml |
| .zip | ZIP archive | application/zip |
| .3gp | 3GPP audio/video container | video/3gpp; audio/3gpp if it doesn't contain video |
| .3g2 | 3GPP2 audio/video container | video/3gpp2; audio/3gpp2 if it doesn't contain video |
| .7z | 7-zip archive | application/x-7z-compressed |

8. Multimedia

These correspond to the HTML img, audio and video tags.

| Function | Meaning |
| --- | --- |
| st.image | Image |
| st.audio | Audio |
| st.video | Video |

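9. Status elements

Nothing here is complicated; the progress bar is the one to pay attention to.

| Function | Meaning |
| --- | --- |
| st.progress | Progress bar |
| st.spinner | Temporary message shown while a block of code runs |
| st.balloons | Celebratory balloons |
| st.snow | Snowfall effect |
| st.error | Error message |
| st.warning | Warning message |
| st.info | Informational message |
| st.success | Success message |
| st.exception | Exception display |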

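10. Control flow

| Function | Meaning |
| --- | --- |
| st.stop | Stop execution immediately |
| st.form | Create a form |
| st.form_submit_button | Display a form submit button |
| st.experimental_rerun | Rerun the script immediately |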

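11. Utilities

st.set_page_config deserves attention: it configures the page, for example the site's favicon and title.

| Function | Meaning |
| --- | --- |
| st.set_page_config | Page configuration |
| st.echo | Display the code in a block along with its output |
| st.help | Show help documentation |
| st.experimental_show | Write the arguments and their values (for debugging) |
| st.experimental_get_query_params | Read the query parameters from the URL |
| st.experimental_set_query_params | Set the query parameters in the URL |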

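12. Caching

For anyone who wants to build complex page content, this part is central.

[Figure: high-level caching diagram (streamlit.io)]

Streamlit has now consolidated its caching mechanism under these two decorators:

  • st.cache_data
  • st.cache_resource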

12.1 st.cache_data

st.cache_data(func=None, *, ttl, max_entries, show_spinner, persist, experimental_allow_widgets)
Parameters
func (callable) The function to cache. Streamlit hashes the function's source code.
ttl (float or timedelta or None) The maximum number of seconds to keep an entry in the cache, or None if cache entries should not expire. The default is None. Note that ttl is incompatible with persist="disk" - ttl will be ignored if persist is specified.
max_entries (int or None) The maximum number of entries to keep in the cache, or None for an unbounded cache. (When a new entry is added to a full cache, the oldest cached entry will be removed.) The default is None.
show_spinner (boolean) Enable the spinner. Default is True to show a spinner when there is a cache miss.
persist (str or boolean or None) Optional location to persist cached data to. Passing "disk" (or True) will persist the cached data to the local disk. None (or False) will disable persistence. The default is None.
experimental_allow_widgets (boolean) Allow widgets to be used in the cached function. Defaults to False. Support for widgets in cached functions is currently experimental. Setting this parameter to True may lead to excessive memory use since the widget value is treated as an additional input parameter to the cache. We may remove support for this option at any time without notice.
import streamlit as st

@st.cache_data
def fetch_and_clean_data(url):
    # Fetch data from URL here, and then clean it up.
    return data

d1 = fetch_and_clean_data(DATA_URL_1)
# Actually executes the function, since this is the first time it was
# encountered.

d2 = fetch_and_clean_data(DATA_URL_1)
# Does not execute the function. Instead, returns its previously computed
# value. This means that now the data in d1 is the same as in d2.

d3 = fetch_and_clean_data(DATA_URL_2)
# This is a different URL, so the function executes.

# With arguments: persist the cache to disk
@st.cache_data(persist="disk")
def fetch_and_clean_data(url):
    # Fetch data from URL here, and then clean it up.
    return data

By default, all parameters to a cached function must be hashable. Any parameter whose name begins with _ will not be hashed. You can use this as an "escape hatch" for parameters that are not hashable:

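In other words, every argument of a decorated function must be hashable by default. A parameter whose name starts with an underscore is not hashed, which is the way out for arguments that cannot be hashed.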

12.2 st.cache_resource

Decorator to cache functions that return global resources (e.g. database connections, ML models).

That is, it caches global (singleton) resources such as database connection objects and machine-learning models.

st.cache_resource(func, *, ttl, max_entries, show_spinner, validate, experimental_allow_widgets)

Cached objects are shared across all users, sessions, and reruns.

That is, a cached object is global: it is shared by all users and all sessions.

They must be thread-safe because they can be accessed from multiple threads concurrently. If thread safety is an issue, consider using st.session_state to store resources per session instead.

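That is, a cached object must be thread-safe; otherwise use st.session_state instead.

Take mysql.connector as an example. A threadsafety value of 1 means, per PEP 249, that threads may share the module but not connections: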

10.1.4 mysql.connector.threadsafety Property

This property is an integer that indicates the supported level of thread safety provided by Connector/Python.

>>> mysql.connector.threadsafety
1
Parameters
func (callable) The function that creates the cached resource. Streamlit hashes the function's source code.
ttl (float or timedelta or None) The maximum number of seconds to keep an entry in the cache, or None if cache entries should not expire. The default is None.
max_entries (int or None) The maximum number of entries to keep in the cache, or None for an unbounded cache. (When a new entry is added to a full cache, the oldest cached entry will be removed.) The default is None.
show_spinner (boolean or string) Enable the spinner. Default is True to show a spinner when there is a "cache miss" and the cached resource is being created. If string, value of show_spinner param will be used for spinner text.
validate (callable or None) An optional validation function for cached data. validate is called each time the cached value is accessed. It receives the cached value as its only parameter and it must return a boolean. If validate returns False, the current cached value is discarded, and the decorated function is called to compute a new value. This is useful e.g. to check the health of database connections.
experimental_allow_widgets (boolean) Allow widgets to be used in the cached function. Defaults to False. Support for widgets in cached functions is currently experimental. Setting this parameter to True may lead to excessive memory use since the widget value is treated as an additional input parameter to the cache. We may remove support for this option at any time without notice.

12.3 When to use which

Use case | Typical return types | Caching decorator
Reading a CSV file with pd.read_csv | pandas.DataFrame | st.cache_data
Reading a text file | str, list of str | st.cache_data
Transforming pandas dataframes | pandas.DataFrame, pandas.Series | st.cache_data
Computing with numpy arrays | numpy.ndarray | st.cache_data
Simple computations with basic types | str, int, float, … | st.cache_data
Querying a database | pandas.DataFrame | st.cache_data
Querying an API | pandas.DataFrame, str, dict | st.cache_data
Running an ML model (inference) | pandas.DataFrame, str, int, dict, list | st.cache_data
Creating or processing images | PIL.Image.Image, numpy.ndarray | st.cache_data
Creating charts | matplotlib.figure.Figure, plotly.graph_objects.Figure, altair.Chart | st.cache_data (but some libraries require st.cache_resource, since the chart object is not serializable – make sure not to mutate the chart after creation!)
Loading ML models | transformers.Pipeline, torch.nn.Module, tensorflow.keras.Model | st.cache_resource
Initializing database connections | pyodbc.Connection, sqlalchemy.engine.base.Engine, psycopg2.connection, mysql.connector.MySQLConnection, sqlite3.Connection | st.cache_resource
Opening persistent file handles | _io.TextIOWrapper | st.cache_resource
Opening persistent threads | threading.Thread | st.cache_resource

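In short

  • If the function returns plain data, use st.cache_data.
  • If the function returns an instance, a handle, a database connection or any other object that must stay unique globally, use st.cache_resource.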

Note: the official renaming is a little odd; the earlier name for the singleton cache (experimental_singleton) was much easier to tell apart.

12.4 Session State

ppiABG9.png

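In the browser, storage comes in two kinds:

  • localStorage: persistent storage that stays in the browser until it is explicitly cleared.

  • sessionStorage: session storage that is destroyed automatically when the session ends.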

We define access to a Streamlit app in a browser tab as a session. For each browser tab that connects to the Streamlit server, a new session is created. Streamlit reruns your script from top to bottom every time you interact with your app. Each rerun takes place in a blank slate: no variables are shared between runs.

Streamlit's definition of a session matches the browser's: every new browser tab that connects to the Streamlit server starts a new session.

Session State is a way to share variables between reruns, for each user session. In addition to the ability to store and persist state, Streamlit also exposes the ability to manipulate state using Callbacks. Session state also persists across apps inside a multipage app.

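Session State therefore solves two problems: sharing variables between reruns of the script within one session, and sharing variables across the pages of a multipage app.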

ppiAWIe.png

An object such as a = App() that is created under st.cache_resource is global, whereas Session State is scoped to each individual session.

import streamlit as st

# A key must be initialized before it can be read. This line raises
# a KeyError because 'value' has never been set in st.session_state.
st.write(st.session_state['value'])

12.5 Clearing the cache

Clear all in-memory and on-disk data caches.

In other words, st.cache_data.clear() removes cached data both from memory and from disk.

import streamlit as st

@st.cache_data
def square(x):
    return x**2

@st.cache_data
def cube(x):
    return x**3

# Clear the cache manually
if st.button("Clear All"):
    # Clear values from *all* in-memory and on-disk data caches,
    # i.e. clear values from both square and cube
    st.cache_data.clear()

13. Other widgets

This part covers widgets such as buttons and drop-down boxes.

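Function | Meaning
st.button | Button
st.experimental_data_editor | Experimental data editor (renamed st.data_editor in later releases)
st.checkbox | Checkbox
st.selectbox | Drop-down select box
st.multiselect | Multi-select box
st.slider | Slider
st.select_slider | Slider over a list of options
st.color_picker | Color picker

import streamlit as st
from datetime import datetime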

cols = st.columns(2)
with cols[0]:
    start_time = st.slider(
        "When do you start?",
        value=datetime(2020, 1, 1, 9, 30),
        format="MM/DD/YY - hh:mm")
    st.write("Start time:", start_time)

with cols[1]:
    start_color, end_color = st.select_slider(
        'Select a range of color wavelength',
        options=['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'],
        value=('red', 'blue'))
    st.write('You selected wavelengths between', start_color, 'and', end_color)

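14. Callbacks

Streamlit supports event callbacks on some widgets, but they are nowhere near as flexible as JavaScript event callbacks; only very simple callbacks are available.

// js
document.onkeydown = (event) => { /* do something... */ };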

Callbacks can be used with widgets using the parameters on_change (or on_click), args, and kwargs:

Parameters

  • on_change or on_click - The function name to be used as a callback
  • args (tuple) - List of arguments to be passed to the callback function
  • kwargs (dict) - Named arguments to be passed to the callback function

Two kinds of events are supported: a value change (on_change) and a click (on_click).

Widgets which support the on_change event:

  • st.checkbox, checkbox
  • st.color_picker, color picker
  • st.date_input, date input
  • st.multiselect, multi-select
  • st.number_input, number input
  • st.radio, radio buttons
  • st.select_slider, slider over a list of options
  • st.selectbox, drop-down box
  • st.slider, slider
  • st.text_area, multi-line text input
  • st.text_input, single-line text input
  • st.time_input, time input
  • st.file_uploader, file upload

Widgets which support the on_click event:

  • st.button, button
  • st.download_button, download button
  • st.form_submit_button, form submit button

To add a callback, define a callback function above the widget declaration and pass it to the widget via the on_change (or on_click ) parameter.

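# a.test is a callable defined earlier in the script; it is invoked as a.test(1, 1)
st.selectbox(
    'How would you like to be contacted?',
    ('Email', 'Home phone', 'Mobile phone'),
    on_change=a.test,
    args=(1, 1))

The mechanism resembles the custom event callbacks used in Tampermonkey scripts, but in Streamlit arguments cannot be passed by calling the callback directly; they have to be supplied through args and kwargs.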

15. Multipage apps

ppPMfA0.png

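Compared with an ordinary single-page app, the main difference lies in the URL: each page has its own URL and can be opened directly.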

Once you've created your entrypoint file, you can add pages by creating .py files in a pages/ directory relative to your entrypoint file. For example, if your entrypoint file is Home.py, then you can create a pages/About.py file to define the "About" page. Here's a valid directory structure for a multipage app:

Home.py # This is the file you run with "streamlit run"
└─── pages/
    └─── About.py # This is a page
    └─── 2_Page_two.py # This is another page
    └─── 3_😎_three.py # So is this

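All that is needed is an entrypoint file such as Home.py (the name itself is not mandatory) and a directory, which must be named pages, that contains the page files.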

Only .py files in the pages/ directory will be loaded as pages. Streamlit ignores all other files in the pages/ directory and subdirectories.

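Mind where the files are placed: page files must sit directly in the folder named pages, not in a subdirectory of it.

The sidebar navigation is generated automatically from the directory structure, and other widgets can still be placed in the sidebar as usual.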
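The placement rule can be made concrete with a small stand-alone script. This is only an illustration of the rule quoted above, not Streamlit's actual page-discovery code; discover_pages and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def discover_pages(entrypoint: Path) -> list[str]:
    """List the .py files that sit directly inside pages/ next to the entrypoint."""
    pages_dir = entrypoint.parent / "pages"
    if not pages_dir.is_dir():
        return []
    # iterdir() does not recurse, so files in subdirectories are never seen.
    return sorted(
        p.name for p in pages_dir.iterdir() if p.is_file() and p.suffix == ".py"
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pages" / "drafts").mkdir(parents=True)
    for rel in [
        "Home.py",
        "pages/About.py",
        "pages/2_Page_two.py",
        "pages/notes.txt",         # ignored: not a .py file
        "pages/drafts/Hidden.py",  # ignored: inside a subdirectory
    ]:
        (root / rel).touch()
    found = discover_pages(root / "Home.py")

print(found)
```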

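16. Components

"Component" here means an add-on module. In this guide the word "widget" is reserved for objects such as buttons and charts, so the two terms are kept apart.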

If your goal in creating a Streamlit Component is solely to display HTML code or render a chart from a Python visualization library, Streamlit provides two methods that greatly simplify the process: components.html() and components.iframe().

Render an HTML string

While st.text, st.markdown and st.write make it easy to write text to a Streamlit app, sometimes you'd rather implement a custom piece of HTML. Similarly, while Streamlit natively supports many charting libraries, you may want to implement a specific HTML/JavaScript template for a new charting library. components.html works by giving you the ability to embed an iframe inside of a Streamlit app that contains your desired output.

Render an iframe URL

components.iframe is similar in features to components.html, with the difference being that components.iframe takes a URL as its input. This is used for situations where you want to include an entire page within a Streamlit app.

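A component can be thought of as a self-contained, integrated module: a ready-made building block that provides one specific piece of Streamlit functionality (possibly groundwork for a future component store, see Components • Streamlit).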

16.1 st.components.v1.html

# Inject a JS script into the page
from streamlit.components.v1 import html

js = '''
    <script>
    console.log("Hello World!");
    </script>
    '''

html(js)

Display an HTML string in an iframe.

Note that the injected JS script still runs inside the iframe, not in the main page.

Function signature

st.components.v1.html(html, width=None, height=None, scrolling=False)

Parameters

  • html (str) - The HTML string to embed in the iframe.
  • width (int) - The width of the frame in CSS pixels. Defaults to the app's default element width.
  • height (int) - The height of the frame in CSS pixels. Defaults to 150.
  • scrolling (bool) - If True, show a scrollbar when the content is larger than the iframe. Otherwise, do not show a scrollbar. Defaults to False.

16.2 st.components.v1.iframe

Load a remote URL in an iframe.

Loads a page into a child frame (iframe).

Function signature

st.components.v1.iframe(src, width=None, height=None, scrolling=False)

Parameters

  • src (str) - The URL of the page to embed.
  • width (int) - The width of the frame in CSS pixels. Defaults to the app's default element width.
  • height (int) - The height of the frame in CSS pixels. Defaults to 150.
  • scrolling (bool) - If True, show a scrollbar when the content is larger than the iframe. Otherwise, do not show a scrollbar. Defaults to False.

import streamlit.components.v1 as components

# Embed a remote page as a child frame
components.iframe("https://docs.streamlit.io/en/latest", width=1000, height=900, scrolling=True)

ppiumSe.png

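17. Usage

  • Plain document content: markdown
  • Data processing: pandas, numpy
  • Charts: matplotlib/plotly
  • Databases: SQLite3/MySQL
  • Models/math: sklearn/scipy
  • .....

By combining these few pieces, everything from data storage and querying to display and plotting can be covered, which meets the needs of full-stack development from front end to back end.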

Streamlit (geo.streamlit.app)

p922RMt.png

18. Configuration file

In a global config file at ~/.streamlit/config.toml for macOS/Linux or %userprofile%/.streamlit/config.toml for Windows:

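Streamlit's configuration file uses the TOML format.

[global]
showWarningOnDirectExecution = false

[server]
port = 80  # port the app is served on
runOnSave = true  # rerun the script automatically when the source changes

# The port can equivalently be set on the command line:
#   streamlit run your_script.py --server.port 80

# Theme
[theme]
base = "light"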
Side menu settings: because the side menu includes a clear-cache action,

19. Deployment

  • Private sharing

    The app can be accessed directly over the local network.

    Sharing private apps

    By default all apps deployed from private source code are private to the developers in the workspace. Your apps will not be visible to anyone else unless you grant them explicit permission. You can grant permission either in your workspace or from the app itself.

  • Public sharing

    Streamlit provides Streamlit Community Cloud for hosting and publishing apps.

    Add your app to GitHub

    Streamlit Community Cloud launches apps directly from your GitHub repo, so your app code and dependencies need to be on GitHub before you try to deploy the app. See App dependencies for more information.

    A Streamlit project can be launched directly from a GitHub repository.

    Optionally, add a configuration file

    Streamlit allows you to optionally set configuration options via four different methods. Among other things, you can use custom configs to customize your app's theme, enable logging, or set the port on which your app runs. For more information, see Configuration and Theming. On Streamlit Community Cloud, however, you can only set configuration options via a configuration file in your GitHub repo.

    Specifically, you can add a configuration file to the root (top-level) directory of your repo: create a .streamlit folder, and then add a config.toml file to that folder. E.g., if your app is in a repo called my-app, you would add a file called my-app/.streamlit/config.toml. Say you want to set the theme of your app to "dark". You would add the following to your .streamlit/config.toml file:

    [theme]
    base="dark"

  • Deploy an app - Streamlit Docs

20. Resources

.... To be continued