Streamlit User Guide

The primary goal of pandas-profiling (the documentation still uses the old name) is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like the handy pandas df.describe() function, pandas-profiling delivers an extended analysis of a DataFrame while allowing the analysis to be exported in formats such as HTML and JSON.

The library aims to provide a one-stop data overview (EDA), similar to the descriptive statistics in pandas.


Common Series Summation Formulas

Published 2023-05-04 | Tags: math | 9 min | 1433 words

Preface

The factorial is closely related to the gamma function.

\Gamma(z) = \int_0^{\infty} x^{z-1} e^{-x}\,\mathrm{d}x \\ \text{or} \\ \Gamma(z + 1) = \int_0^{\infty} x^{z} e^{-x}\,\mathrm{d}x \\ \\ \Gamma(z) = (z - 1)!, \; z \text{ a positive integer}

# Use sympy in Python to evaluate this integral
>>> from sympy import symbols, integrate, exp, oo
>>> x = symbols('x')
>>> # integral of x**z * exp(-x) over [0, oo) equals Gamma(z + 1) = z!
>>> print(integrate(x**1 * exp(-x), (x, 0, oo)))
1
>>> print(integrate(x**2 * exp(-x), (x, 0, oo)))
2
>>> print(integrate(x**3 * exp(-x), (x, 0, oo)))
6
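
The same relation can also be checked numerically. This is a minimal sketch using only Python's standard library; the name `gamma_values` is introduced here for illustration and is not from the original post.

```python
import math

# Check Gamma(n) == (n - 1)! for the first few positive integers.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Gamma(2), Gamma(3), Gamma(4) correspond to the three integrals above.
gamma_values = [math.gamma(n) for n in range(2, 5)]
print(gamma_values)
```

The loop covers only small integers, so it illustrates the identity rather than proving it.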


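Overview of Common Statistics Concepts

Published 2023-05-04 | Tags: spss, statistics, priority | 109 min | 25279 words

Statistics already contains many fairly obscure concepts; inconsistent translations and interpretations (including notation) make it even more complicated.

In college I always found statistics hard and nearly failed the course.

Only after I started working did I realize that the hard part is not statistics itself but our poorly written textbooks. Compared with advanced mathematics, statistical concepts are actually much easier to understand.

The content below is compiled mainly from: various statistics textbooks (Chinese and foreign) and other statistics-related books, Wikipedia, stackexchange, Zhihu, Baidu Baike, and other relatively authoritative statistics sites and SPSS-related sites.

For uncertain information or hard-to-understand parts, the English-language version is generally used.

For descriptions, calculations, and so on, the SPSS approach is preferred.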

IBM SPSS Statistics 26 Documentation.


Data Visualization: Anscombe's Quartet

Published 2023-04-27 | Tags: plotly, data visualization | 13 min | 2201 words

1. Preface

Anscombe's quartet (安斯库姆四重奏 in Chinese).


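Fun Excel Functions Series: LOOKUP

Published 2023-04-24 | Tags: excel | 9 min | 2083 words

1. Preface

Note: this is LOOKUP, not VLOOKUP.

I ran into the following question: how does the function in the animation below work?

(Image source: "LOOKUP函数10种经典用法新手必看")

The function itself is not hard to understand. The main pitfall is that the range being searched must be sorted.

Note the sorting requirement. It is similar to how loc in pandas behaves when it cannot locate the data (a feature that pandas is gradually deprecating): if the data is not sorted, the returned result is wrong.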
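
To make the sorting pitfall concrete, here is a minimal Python sketch of an approximate-match lookup built on binary search. The helper `approximate_lookup` and the sample data are hypothetical, and the sketch only assumes that LOOKUP behaves like a binary search over ascending data; Excel's exact internal steps may differ.

```python
from bisect import bisect_right


def approximate_lookup(lookup_value, lookup_vector, result_vector):
    """Hypothetical helper: return the result paired with the largest key
    that is <= lookup_value, assuming lookup_vector is sorted ascending."""
    idx = bisect_right(lookup_vector, lookup_value) - 1
    if idx < 0:
        return None  # value is below the smallest key (Excel reports #N/A)
    return result_vector[idx]


# Sorted keys: the binary search lands on the expected bucket.
sorted_scores = [0, 60, 70, 85]
grades = ["F", "C", "B", "A"]
print(approximate_lookup(72, sorted_scores, grades))

# Same pairs, unsorted: the binary search silently returns a wrong bucket.
unsorted_scores = [70, 0, 85, 60]
unsorted_grades = ["B", "F", "A", "C"]
print(approximate_lookup(72, unsorted_scores, unsorted_grades))
```

With unsorted keys the search raises no error. It simply halves the range in the wrong direction and returns a mismatched result, which is why the problem is easy to miss.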


Machine Learning Series: Data Preprocessing

Published 2023-04-23 | Tags: data preprocessing, machine learning | 15 min | 2714 words

1. Preface

In statistics and applications of statistics, normalization can have a range of meanings.[1] In the simplest cases, normalization of ratings means adjusting values measured on different scales to a notionally common scale, often prior to averaging. In more complicated cases, normalization may refer to more sophisticated adjustments where the intention is to bring the entire probability distributions of adjusted values into alignment. In the case of normalization of scores in educational assessment, there may be an intention to align distributions to a normal distribution. A different approach to normalization of probability distributions is quantile normalization, where the quantiles of the different measures are brought into alignment.

On Wikipedia, the following operations all fall under normalization (in the statistical sense).
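
Two adjustments commonly grouped under this heading are the standard score (z-score) and min-max feature scaling. The sketch below uses only the standard library. The function names and sample data are illustrative, and the use of the population standard deviation is an assumption rather than something the post specifies.

```python
from statistics import mean, pstdev


def standard_score(values):
    """Z-score: subtract the mean, divide by the population std (assumed)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the values are not all identical
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]


data = [2.0, 4.0, 6.0, 8.0]
z_scores = standard_score(data)
scaled = min_max_scale(data)
print(z_scores)
print(scaled)
```

Both transforms are linear, so they change the location and scale of the data but not the shape of its distribution.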


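Fun Excel Functions Series 1: SUMPRODUCT()

Published 2023-04-20 | Tags: excel | 6 min | 1128 words

1. Preface

I was suddenly asked: how do you implement Chinese-style ranking in Excel?

Value	Rank
1	1
1	1
2	2 (continues the numbering above instead of jumping to 3; this is what is called Chinese-style ranking)
3	3
4	4

In other words, the question is how to number the following rows when the ranked values contain duplicates.

MySQL's window functions offer three modes for this ranking problem: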

rank()
dense_rank()
row_number()
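
The difference between the three modes can be sketched in plain Python on the same values as the table above. The helper `rank_modes` is hypothetical and ranks in ascending order; it imitates the window functions' semantics and is not MySQL code.

```python
def rank_modes(values):
    """Return (value, rank, dense_rank, row_number) tuples in ascending order."""
    rows = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(sorted(values), start=1):
        if value != previous:
            rank = row_number  # rank jumps to the current row position
            dense += 1         # dense_rank increases by exactly one
            previous = value
        rows.append((value, rank, dense, row_number))
    return rows


for row in rank_modes([1, 1, 2, 3, 4]):
    print(row)
```

In this sketch the dense_rank column reproduces the Chinese-style ranking from the table (1, 1, 2, 3, 4), the rank column skips to 3 after the tie, and row_number ignores ties altogether.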
