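The Birth of a Tampermonkey Script

1. Preface

These are some thoughts and a summary from writing the script Kyouichirou/BiliBili_Optimizer: enjoy and control bilibili (github.com).

The script draws on the lessons of an earlier project, Kyouichirou/Zhihu_Optimizer: 知乎优化器 (github.com). When that script was written, the features it needed had not been settled, so the early code planning was unclear. As a large number of features kept being added, the code became extremely bloated, hard to read (my understanding of metaprogramming was not deep enough at the time), and hard to maintain.

Tampermonkey does not allow code to be split directly into relatively independent modules, so to keep later growth from making the code unmaintainable, the code was kept under strict control from the start.

The script follows this basic logic throughout.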

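1.1 Goals of the script

Content blocking
Automation
Search
Extra helpers

1.2 Debugging

To make debugging and editing easier, Tampermonkey loads the script directly from a local file.

How to configure loading a local script is covered in Tampermonkey进阶指南 (Advanced Tampermonkey Guide) | Lian (kyouichirou.github.io).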

// ==UserScript==
// @name         bili_bili_optimizer_local_debug
// @namespace    https://github.com/Kyouichirou
// @version      1.0
// @description  control bilibili!
// @author       HLA
// @match        https://t.bilibili.com/
// @match        https://www.bilibili.com/*
// @match        https://space.bilibili.com/*
// @match        https://search.bilibili.com/*
// @grant        GM_addStyle
// @grant        GM_setValue
// @grant        GM_getValue
// @grant        GM_openInTab
// @grant        unsafeWindow
// @grant        GM_notification
// @grant        window.onurlchange
// @grant        GM_registerMenuCommand
// @grant        GM_unregisterMenuCommand
// @grant        GM_addValueChangeListener
// @noframes
// @require      file:///D:/common_code/javascript/bili/finish%20copy.js
// @run-at       document-start
// ==/UserScript==

Because of the new extension protocol, Manifest V3, Tampermonkey discourages further use of @include; @match should be used instead.

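2. Page Analysis

Intervention in a page can happen at several stages.

2.1 Page events

Events are a core part of a web page. Intervening in event handling gives control over the vast majority of page operations.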
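
As a minimal sketch of intervening at the event level: a capture-phase listener that is registered early (for example at document-start) can stop later listeners from ever seeing an event. The helper name `block_event` and its predicate are hypothetical and not taken from the script, and a bare EventTarget stands in for a page element so that the snippet also runs outside a browser.

```javascript
// Hypothetical helper: register a capture-phase listener that swallows
// events matching `should_block` before later listeners can see them.
function block_event(target, type, should_block) {
    target.addEventListener(type, (event) => {
        if (should_block(event)) {
            event.stopImmediatePropagation(); // skip the remaining listeners
            event.preventDefault();           // cancel the default action, if cancelable
        }
    }, true);
}

// Usage: a bare EventTarget stands in for the search input (assumption).
const search_input = new EventTarget();
const handled = [];
block_event(search_input, 'keydown', () => true);
search_input.addEventListener('keydown', () => handled.push('page handler'));
search_input.dispatchEvent(new Event('keydown', { cancelable: true }));
console.log(handled.length); // number of times the page handler ran
```

The order of registration matters here, which is one reason the script runs at document-start.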

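2.2 The window object

When an operation goes through something global, such as a global object or a global function, it can be intercepted easily.

2.2.1 Prototype chain

For operations that would otherwise have to wait for loading to finish, such as those involving document.body, listening for events is not a very good approach. The prototype can be hooked instead:

HTMLBodyElement.prototype

Some operations need to be watched globally, such as writes to the data of a tags:

HTMLAnchorElement.prototype

The prototype chain, Object, and Proxy/Reflect together form the core of metaprogramming.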
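
A minimal sketch of the prototype approach, assuming the goal is to rewrite every value written to a link's href. In a browser the target would be HTMLAnchorElement.prototype; here a stand-in class `Fake_Anchor` is used so that the snippet runs anywhere. The helper `hook_setter` and the sample URL are illustrative and not part of the script.

```javascript
// Stand-in for HTMLAnchorElement so the sketch runs outside a browser.
class Fake_Anchor {
    #href = '';
    get href() { return this.#href; }
    set href(value) { this.#href = String(value); }
}

// Hypothetical helper: wrap the setter of an accessor property defined on a prototype.
function hook_setter(proto, name, rewrite) {
    const descriptor = Object.getOwnPropertyDescriptor(proto, name);
    if (!descriptor || typeof descriptor.set !== 'function') return false;
    Object.defineProperty(proto, name, {
        ...descriptor,
        set(value) {
            // Reflect.apply keeps `this` bound to the element being written.
            Reflect.apply(descriptor.set, this, [rewrite(value)]);
        },
    });
    return true;
}

// Strip a tracking parameter from every link written through the setter.
hook_setter(Fake_Anchor.prototype, 'href', (url) => String(url).split('?spm_id_from')[0]);

const link = new Fake_Anchor();
link.href = 'https://www.bilibili.com/video/BV1xx?spm_id_from=333.1007';
console.log(link.href); // https://www.bilibili.com/video/BV1xx
```

Because the hook sits on the prototype, it applies to every instance, including ones created later, without waiting for any element to exist.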

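2.3 Network requests

In the Network panel of Chrome DevTools, the search sidebar can search the content of responses directly. This helps to:

Find the source of the script being executed
Intercept the returned content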
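
Intercepting the returned content could look roughly like the sketch below, which wraps a fetch-like function and rewrites matching JSON responses. This is a generic pattern under stated assumptions, not the script's actual code: the helper names, the URL fragment, and the payload shape (`items`, `up_id`) are all hypothetical.

```javascript
// Hypothetical helper: wrap a fetch-like function and rewrite matching JSON responses.
function wrap_fetch(original_fetch, should_rewrite, rewrite) {
    return async function (...args) {
        const response = await Reflect.apply(original_fetch, this, args);
        const input = args[0];
        const url = typeof input === 'string' ? input : (input?.url ?? String(input));
        if (!should_rewrite(url)) return response;
        try {
            // Read from a clone so the original body stays untouched.
            const data = await response.clone().json();
            return new Response(JSON.stringify(rewrite(data)), {
                status: response.status,
                statusText: response.statusText,
                headers: response.headers,
            });
        } catch (error) {
            return response; // not JSON, or the rewrite failed: fall back to the original
        }
    };
}

// Pure transformation, kept separate so it can be checked without a network.
function drop_blocked(data, blocked_ids) {
    return { ...data, items: (data.items || []).filter((item) => !blocked_ids.has(item.up_id)) };
}

// In a userscript this might be installed as (URL fragment and payload shape are assumptions):
// unsafeWindow.fetch = wrap_fetch(unsafeWindow.fetch,
//     (url) => url.includes('/hypothetical/feed'),
//     (data) => drop_blocked(data, new Set(['37974444'])));

const sample = { code: 0, items: [{ up_id: '1' }, { up_id: '2' }] };
console.log(drop_blocked(sample, new Set(['2'])).items.length); // 1
```

Filtering the data before the page renders it avoids touching the HTML afterwards.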

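2.3.1 Keywords

Highly distinctive keywords make it possible to locate the function in question.

3. Intervention

As mentioned in JavaScript进阶-拦截和修改fetch response内容-以B站为例 (Advanced JavaScript: intercepting and modifying fetch response content, with Bilibili as the example) | Lian (kyouichirou.github.io), Bilibili pages are basically built on an asynchronous framework, so much of the HTML manipulation that happens after the initial load can be intercepted.

The basic principles of intervention are:

Minimize direct manipulation of the HTML, such as hiding or deleting elements.
Minimize the use of CSS.
Put efficiency first.

Intercept precisely. Acting on observed node changes, for example, is sometimes a poor choice, because it may require waiting for the page to finish loading.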

_search_box_clear() {
    this.__proxy(unsafeWindow, 'open', {
        apply(...args) {
            const url = args[2]?.[0]?.split('&')[0] || '';
            if (url) {
                if (Dynamic_Variants_Manager.key_check(decodeURIComponent(url))) {
                    Colorful_Console.main('search content contain black key', 'warning', true);
                    return;
                } else args[2][0] = url;
            }
            // Return the result so callers still receive the opened window.
            return Reflect.apply(...args);
        }
    });
}

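Take intercepting searches from the search box as an example. Listening for the element's click or keydown events can do the job, but it is cumbersome and does not generalize well. Code analysis shows that the search box opens the search page through window.open().

Intercepting window.open makes the code simpler, and it removes the tedious checks on whether every element and event has been caught correctly, as well as the need to wait for elements to finish loading before acting.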
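
The implementation of the `__proxy` helper is not shown in this article. A minimal sketch of what such a helper might do is given here, under the assumption that it simply replaces a function with a Proxy around it. `hook_function`, `fake_window`, and `black_keys` are illustrative names, and a plain object stands in for unsafeWindow so that the snippet runs outside a browser.

```javascript
// Hypothetical stand-in for the script's `__proxy` helper.
function hook_function(owner, name, handler) {
    owner[name] = new Proxy(owner[name], handler);
}

// Fake window so the sketch runs outside a browser.
const fake_window = {
    opened: [],
    open(url) { this.opened.push(url); return { url }; },
};
const black_keys = ['spoiler'];

hook_function(fake_window, 'open', {
    // The apply trap receives the original function, `this`, and the argument list.
    apply(target, this_arg, args) {
        const url = String(args[0] ?? '');
        if (black_keys.some((key) => url.includes(key))) return null; // blocked: nothing opens
        return Reflect.apply(target, this_arg, args); // pass through, keep the return value
    },
});

fake_window.open('https://search.bilibili.com/all?keyword=cats');
fake_window.open('https://search.bilibili.com/all?keyword=spoiler');
console.log(fake_window.opened.length); // 1
```

Only the call itself is intercepted, so the hook does not depend on which element or event triggered the search.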

On the performance cost of Proxy, see Thoughts on ES6 Proxies Performance | www.thecodebarbarian.com.

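4. Data

Because indexedDB has many problems in practice, all storage uses Tampermonkey's built-in storage. The built-in storage has problems of its own, though, with querying, large-scale storage, and data synchronization/communication across tabs.

To manage data better:

Custom data structures
Bounded data storage

4.1 Custom data structures

Every data container in the script is array-based, so that data operations are handled in a uniform way.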

class Dic_Array extends Array {
    #id_name;
    /**
         *
         * @param {Array} data
         * @param {string} id_name
         */
    constructor(data, id_name) {
        // In a derived class, super() must be called before `this` can be used
        if (typeof data !== 'object') {
            super();
            return;
        }
        super(...data);
        this.#id_name = id_name;
    }
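    /**
     * @param {string} id
     * @param {boolean} mode when true, return the matched record instead of a boolean
     * @returns {boolean | object | null | undefined}
     */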
    includes_r(id, mode = false) {
        if (!id) return null;
        const id_name = this.#id_name;
        const target = super.find(e => e[id_name] === id);
        // Update the access statistics of the matched record
        let f = false;
        if (target) {
            f = true;
            const now = Date.now();
            target.last_active_date = now;
            target.visited_times += 1;
            Dynamic_Variants_Manager.rate_up_status_sync(id_name, id, now, target.visited_times);
            id_name === 'up_id' && Dynamic_Variants_Manager.accumulative_func();
        }
        return mode ? target : f;
    }
    /**
         *
         * @param {string} id
         * @returns {boolean}
         */
    remove(id) {
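        // The return value tells the caller whether a deletion happened, and so whether the data needs to be written back
        const index = super.findIndex(e => id === e[this.#id_name]);
        // Note on the deletion: splice returns the removed elements wrapped in an array of this same class, so the call goes through constructor() and super() is invoked again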
        return index > -1 && (super.splice(index, 1), true);
    }
    /**
         *
         * @param {object} info
         */
    update_active_status(info) {
        const id = info.id;
        if (!id) return;
        const target = super.find(e => e[this.#id_name] === id);
        if (target) target.last_active_date = info.date, target.visited_times = info.visited_times;
    }
}

class Visited_Array extends Array {
    #limit = 999;
    /**
         *
         * @param {Array} data
         * @param {number} limit
         */
    constructor(data, limit) {
        if (typeof data !== 'object') {
            super();
            return;
        }
        super(...data);
        limit > 999 ? this.#limit = limit : null;
    }
    /**
         *
         * @param {string} id
         */
    push(id) {
        // Store at most #limit entries (999 by default)
        // When the limit is exceeded, drop the oldest entry
        // If the id already exists, move it to the front
        if (!id) return;
        const index = super.indexOf(id);
        // Note: unshift returns the new length of the array
        (index < 0 ? super.unshift(id) : index > 0 ? super.unshift(super.splice(index, 1)[0]) : super.length) > this.#limit && super.pop();
    }
}

class Block_Video_Array extends Visited_Array {
    includes_r(id) { return (id && super.includes(id)) ? (Dynamic_Variants_Manager.accumulative_func(), true) : false; }
    remove(id) {
        const index = super.indexOf(id);
        return index > -1 && (super.splice(index, 1), true);
    }
}

These classes extend Array and customize it as needed.
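
The constructor guard on `typeof data` exists because built-in methods such as splice create their result through the subclass constructor, passing a length rather than a data array. The sketch below shows that behaviour and one standard alternative, overriding Symbol.species. The class names are illustrative and not part of the script.

```javascript
const constructor_args = [];

class Logged_Array extends Array {
    constructor(...args) {
        constructor_args.push(args); // record how the constructor is invoked
        super(...args);
    }
}

const list = new Logged_Array('a', 'b', 'c');
const removed = list.splice(1, 1);
// `removed` is a Logged_Array too, built by calling the constructor with a length.
console.log(removed instanceof Logged_Array, constructor_args.length); // true 2

// Alternative: make derived arrays plain Array instances, so the subclass
// constructor is never called by splice, map, filter, and similar methods.
class Plain_Result_Array extends Array {
    static get [Symbol.species]() { return Array; }
}
const plain = new Plain_Result_Array('a', 'b', 'c').splice(1, 1);
console.log(plain instanceof Plain_Result_Array); // false
```

With the species override in place, a constructor can assume it is only ever called by the script's own code.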

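4.2 Bounded data

As the name suggests, the goal is to keep stored data from growing without limit.

How blocked data is handled depends on three things: recency (the time of the last block), the degree of dislike (how much storage one is willing to spend on more blocked content), and the visit count (the number of times it was blocked; the more active an entry is, the longer it is kept).

Permanently stored data is a small set. Apart from configuration, the remaining blocking data, such as blocked uploaders (up), carries a timestamp and a visit count, so inactive entries can later be removed by a custom rule.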
```
{
    "up_id": "37974444",
    "up_name": "黑马程序员",
    "add_date": 1695713396982,
    "last_active_date": 1696906697736,
    "visited_times": 12,
    "block_reason": 0,
    "block_from": 0
}
```
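
A custom removal rule over records of this shape might look like the sketch below. The rule itself (drop a record only when it is both stale and rarely hit) and the thresholds are assumptions made for illustration; the article does not specify them.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical pruning rule: drop a record only if it is both stale and rarely hit.
function prune_inactive(records, now, max_idle_days, min_visits) {
    return records.filter((record) => {
        const idle_days = (now - record.last_active_date) / DAY_MS;
        return idle_days <= max_idle_days || record.visited_times >= min_visits;
    });
}

const now = 1700000000000;
const records = [
    { up_id: '1', last_active_date: now - 2 * DAY_MS, visited_times: 1 },    // recent: kept
    { up_id: '2', last_active_date: now - 400 * DAY_MS, visited_times: 50 }, // stale but often hit: kept
    { up_id: '3', last_active_date: now - 400 * DAY_MS, visited_times: 2 },  // stale and rare: dropped
];
console.log(prune_inactive(records, now, 180, 10).map((r) => r.up_id)); // [ '1', '2' ]
```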

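Dynamic data, such as content from uploaders who are receiving a traffic boost, is managed by order of arrival: once the storage limit is reached, the oldest entries are removed. Blocked videos, blocked keywords, and similar content all use this mode.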

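// Excerpt from Visited_Array
push(id) {
    // Store at most #limit entries (999 by default)
    // When the limit is exceeded, drop the oldest entry
    // If the id already exists, move it to the front
    if (!id) return;
    const index = super.indexOf(id);
    // Note: unshift returns the new length of the array
    (index < 0 ? super.unshift(id) : index > 0 ? super.unshift(super.splice(index, 1)[0]) : super.length) > this.#limit && super.pop();
}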
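
To make the eviction behaviour easier to follow, here is a simplified variant written as explicit statements, with a small limit so the effect is visible. `Recent_List` and the limit of 3 are illustrative; the script's own class is Visited_Array with a limit of 999 or more.

```javascript
// Simplified sketch of a bounded most-recent-first list.
class Recent_List extends Array {
    #limit;
    // Derived arrays (from splice and similar methods) are plain Arrays,
    // so this constructor is only called directly.
    static get [Symbol.species]() { return Array; }
    constructor(limit) {
        super();
        this.#limit = typeof limit === 'number' && limit > 0 ? limit : 3;
    }
    push(id) {
        if (!id) return;
        const index = this.indexOf(id);
        if (index === 0) return;                   // already the most recent entry
        if (index > 0) this.splice(index, 1);      // known id: remove it, then re-insert at the front
        this.unshift(id);
        if (this.length > this.#limit) this.pop(); // evict the oldest entry
    }
}

const recent = new Recent_List(3);
['a', 'b', 'c', 'a', 'd'].forEach((id) => recent.push(id));
console.log([...recent]); // [ 'd', 'a', 'c' ]
```

Entries that keep being hit stay near the front, while entries that are never hit again drift to the end and are evicted first.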

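5. Code Style

5.1 Simplicity

Keep it simple, as simple as possible. ES6+ offers many more ways to simplify code, for example:

Destructuring assignment, to read selected fields from an object.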

// ----------------- startup
{
    const { href, search, origin, pathname } = location;
    // Strip tracking parameters from directly visited links
    search.startsWith('?spm_id_from') ? (window.location.href = origin + pathname) : (new Bili_Optimizer(href)).start();
}
// startup -----------------

Optional chaining operator

This is a newer JavaScript feature and a very handy one: many incidental checks collapse into a single line of code.

document.getElementsByClassName(classname)[0]?.click();

// Equivalent to
const nodes = document.getElementsByClassName(classname);
nodes.length > 0 && nodes[0].click();

Ternary operator

this.#video.paused ? this.#video.play() : this.#video.pause();

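These are only examples. A simplified form is more than shorter code; it also reflects a clear understanding of the code's logic. It should not be overused, though, or the means start to overshadow the end.

5.2 Comments

In many high-level languages, not having to declare variable types is a double-edged sword.

Skipping type declarations makes code easier and simpler to write, but it becomes much harder for the editor to catch problems in advance, and later maintenance and team collaboration suffer as well.

At this point, commenting code effectively (with both documentation comments and type annotations) becomes a relatively simple shortcut around code that is hard to check and hard to read.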

/**
*
* @param {string} id
* @returns {boolean}
*/
remove(id) {
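    // The return value tells the caller whether a deletion happened, and so whether the data needs to be written back
    const index = super.findIndex(e => id === e[this.#id_name]);
    // Note on the deletion: splice returns the removed elements wrapped in an array of this same class, so the call goes through constructor() and super() is invoked again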
    return index > -1 && (super.splice(index, 1), true);
}

JavaScript offers no language-level solution for type annotations (one appears to be under discussion).

# Python's typing module

from typing import Union

def ret_multi(a: int, b: int) -> Union[str, int]:
    if a >= b:
        return a - b
    else:
        return 'No!'
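
For comparison, a JavaScript counterpart can carry the same information in a JSDoc comment. This is a sketch, assuming an editor built on the TypeScript language service (VS Code, for instance), where the `// @ts-check` directive at the top of a file turns on type checking against the JSDoc annotations.

```javascript
// @ts-check
// With this directive, supporting editors check the file against its JSDoc types.

/**
 * JavaScript counterpart of the Python example.
 * @param {number} a
 * @param {number} b
 * @returns {string | number}
 */
function ret_multi(a, b) {
    return a >= b ? a - b : 'No!';
}

console.log(ret_multi(5, 3)); // 2
console.log(ret_multi(1, 3)); // No!
```

The annotations have no effect at runtime; a call such as `ret_multi('5', 3)` still runs, but a checking editor would flag it.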

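Too many comments, however, add considerably to the work of writing code, while strict and clear annotations are indispensable if the editor is to catch problems in advance. Striking the balance is a problem in its own right.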