Posts

Python Source Code Study (5): Coroutines

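A coroutine is a lightweight, user-space thread: it can be suspended and resumed at specific points in a function, and the caller can both read state out of the coroutine and pass state into it. Python's generators are a typical application of coroutines, and this post briefly analyzes how generators are implemented in Python.

1 Generators

If a Python function contains the yield keyword, calling it does not run the body through to a return statement and hand back a value the way an ordinary function does. Instead, the call immediately returns a generator object. Take a Fibonacci sequence generator as an example:

    def FibonacciSequenceGenerator():
        a, b = 0, 1
        while True:
            # Suspend here and hand the next Fibonacci number to the caller.
            yield a + b
            a, b = b, a + b

    if __name__ == "__main__":
        fsg = FibonacciSequenceGenerator()
        print(fsg)
        print(type(fsg))

    $ python3 main.py
    <generator object FibonacciSequenceGenerator at 0x7fb4720b1ac0>
    <class 'generator'>

As the output shows, FibonacciSequenceGenerator returns a generator object, fsg, whose type is generator. A generator object cannot be called directly like an ordinary function. Instead, we use next() or fsg.send() to switch into it, so that the generator function starts or resumes running until it reaches a yield or the end of the function, at which point control returns to the caller:

    for i in range(100):
        print(next(fsg))

    $ python3 main.py
    1
    2
    3
    5
    # ...
    218922995834555169026
    354224848179261915075
    573147844013817084101

This behavior closely resembles a thread switch: it involves executing, saving, and restoring a context. Simulating threads with generators avoids the switch from user mode to kernel mode, which improves efficiency. ...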
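The excerpt above mentions send() alongside next() but only demonstrates next(). As a minimal sketch of the other direction, passing state into a suspended generator, consider the following. The running_average generator is a hypothetical example and does not come from the original post.

```python
def running_average():
    """Hypothetical generator: receives numbers via send() and yields their running mean."""
    total = 0.0
    count = 0
    average = None
    while True:
        # Execution pauses at this yield. The argument of the next send()
        # call becomes the value of the yield expression.
        value = yield average
        total += value
        count += 1
        average = total / count


avg = running_average()
first = next(avg)  # prime the generator: run it up to the first yield
results = [avg.send(x) for x in (10, 20, 60)]
print(first, results)
```

The initial next() call is needed because a freshly created generator has not reached a yield yet, so there is no suspended yield expression to receive a sent value.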

Python Source Code Study (4): The Compiler and the Virtual Machine

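Python is an interpreted language. Before using it, we usually download the CPython interpreter, which is written in C, from the official Python website. All source code quoted in this post comes from CPython.

The Python interpreter consists of two parts: the Python compiler and the Python virtual machine. When we run Python code with the python command, the compiler translates the source into Python bytecode, and the virtual machine then reads that bytecode and executes it step by step.

1 The Python Compiler

1.1 Code Objects

Python provides the built-in function compile, which compiles Python code and produces an object holding the bytecode information. For example:

    # test.py
    def Square(a):
        return a * a

    print(f"result:\t\t{Square(5)}")

    # main.py
    f = "test.py"
    # Read the source and compile it into a code object in 'exec' mode.
    with open(f) as src:
        code_obj = compile(src.read(), f, 'exec')
    exec(code_obj)
    print(f"code_obj:\t{code_obj}")
    print(f"type:\t\t{type(code_obj)}")

    $ python3 main.py
    result:         25
    code_obj:       <code object <module> at 0x7f052c156b30, file "test.py", line 1>
    type:           <class 'code'>

The resulting code_obj has the type class 'code', which corresponds to the PyCodeObject struct in the CPython source. The code object is the central structure the virtual machine works with in the later steps: it packs the bytecode-related information, such as the argument count, local variables, variable names, and instruction sequence, into a single struct: ...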
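The fields that the excerpt says are packed into the code object (argument count, local variable names, instruction sequence) are visible from Python as co_* attributes. The sketch below compiles the Square function from an in-memory string rather than from test.py, which is a simplification made here so that the example is self-contained.

```python
import dis
import types

source = "def Square(a):\n    return a * a\n"
module_code = compile(source, "<example>", "exec")

# The module-level code object keeps the function's own code object
# among its constants; pick it out by type.
square_code = next(
    c for c in module_code.co_consts if isinstance(c, types.CodeType)
)

print(square_code.co_name)        # function name
print(square_code.co_argcount)    # number of positional arguments
print(square_code.co_varnames)    # names of local variables
print(type(square_code.co_code))  # raw bytecode is a bytes object

# Disassemble the bytecode into a readable instruction listing.
# The exact opcodes differ between CPython versions.
dis.dis(square_code)
```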

Python Source Code Study (3): The list Type

In the CPython source, Python's list type is a struct named PyListObject, defined in listobject.h:

    // Include/cpython/listobject.h
    typedef struct {
        PyObject_VAR_HEAD
        /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
        PyObject **ob_item;

        /* ob_item contains space for 'allocated' elements.  The number
         * currently in use is ob_size.
         * Invariants:
         *     0 <= ob_size <= allocated
         *     len(list) == ob_size
         *     ob_item == NULL implies ob_size == allocated == 0
         * list.sort() temporarily sets allocated to -1 to detect mutations.
         *
         * Items must normally not be NULL, except during construction when
         * the list is not yet visible outside the function that builds it.
         */
        Py_ssize_t allocated;
    } PyListObject;

Its implementation is similar to std::vector in C++: both maintain a dynamic array and grow its capacity as elements are added. The PyListObject struct starts with the variable-size object header PyObject_VAR_HEAD, whose ob_size field holds the current length of the dynamic array. ob_item is a pointer to the dynamic array of element pointers, and allocated is the capacity of that array. The methods that operate on list objects can be found through its type object, PyTypeObject PyList_Type: ...
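The gap between ob_size and allocated can be observed indirectly from Python. The following is a rough sketch that assumes a CPython interpreter, where sys.getsizeof reports the list header plus the allocated pointer array. The exact sizes and growth points vary by version and platform, so none are listed here.

```python
import sys

items = []
sizes = []
for i in range(64):
    items.append(i)
    # Reported size covers the struct plus the allocated pointer array,
    # so it only changes when the array is actually reallocated.
    sizes.append(sys.getsizeof(items))

# Lengths at which the reported size jumped, i.e. where capacity grew.
growth_points = [n + 1 for n in range(1, len(sizes)) if sizes[n] != sizes[n - 1]]
print(growth_points)
```

Because the capacity grows in steps, most appends leave the reported size unchanged, which is the same amortized strategy std::vector uses.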

An Introduction to ProtoBuf Syntax and Encoding

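Serialization is the process of converting structured data into a format that is easy to store or transmit. Protocol Buffers, or ProtoBuf for short, is a language-neutral, platform-neutral serialization tool that Google open-sourced in 2008. Compared with common serialization formats such as XML, JSON, YAML, and CSV, ProtoBuf's main advantages are a small serialized size, fast serialization and deserialization, low maintenance cost (only a proto file needs to be defined), and backward compatibility. Its drawback is that the data is a binary stream and therefore not human-readable. This post covers how to use ProtoBuf, including the syntax of .proto files and how to generate code for different languages with the protoc tool, as well as how its encoding works.

1 Syntax

First, find the latest ProtoBuf release at https://github.com/protocolbuffers/protobuf, download the precompiled protoc binary, and extract it into a directory on your PATH. This post uses version 3.15.7:

    $ protoc --version
    libprotoc 3.15.7

Take a simple proto file as an example. Its syntax resembles C++:

    // msg.proto
    syntax = "proto3";

    package Message;

    message SearchRequest {
        reserved 6, 9 to 12;
        reserved "foo", "bar";
        string query = 1;
        int32 page_number = 2;
        int32 result_per_page = 3;
    }

    message ResultType {
        message Result {
            string url = 1;
            string title = 2;
            repeated string snippets = 3;
        }
    }

    message SearchResponse {
        repeated ResultType.Result results = 1;
    }

Use the protoc tool to generate code for a given language: ...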
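Since the post also covers encoding, here is a minimal sketch of how an integer field such as page_number = 2 could be laid out on the wire. It hand-rolls the base-128 varint scheme in plain Python instead of using the protobuf library, handles only non-negative values, and the helper names encode_varint and encode_varint_field are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow: set the continuation bit.
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: a tag (field number and wire type 0), then the value."""
    tag = (field_number << 3) | 0  # wire type 0 means varint
    return encode_varint(tag) + encode_varint(value)


# page_number (field 2) set to 300
encoded = encode_varint_field(2, 300)
print(encoded.hex())
```

Small field numbers and small values each fit in a single byte, which is one reason the serialized output stays compact.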

Python Source Code Study (2): The int Type

Python has six standard data types: number, string, list, tuple, set, and dictionary. As explained in the previous post, their type objects are all instances of PyType_Type, which inherits from PyBaseObject_Type. This post looks at how the int type is implemented in Python. Unlike int in C and C++, the most notable property of Python's int is that it generally does not overflow. Compare the result of multiplying one million by one million in Python and in C:

    >>> x = 1000000 * 1000000
    >>> print(x)
    1000000000000

In C, the multiplication overflows:

    printf("%d\n", 1000000 * 1000000);
    printf("%u\n", 1000000 * 1000000);

    -727379968
    3567587328

1 How int Is Stored in Memory

1.1 Memory Layout

Python's int type is actually a struct named PyLongObject, defined in longintrepr.h:

    // Include/object.h
    #define PyObject_VAR_HEAD      PyVarObject ob_base;

    // Objects/longobject.h
    #if PYLONG_BITS_IN_DIGIT == 30
    typedef uint32_t digit;
    // ...
    #elif PYLONG_BITS_IN_DIGIT == 15
    typedef unsigned short digit;
    // ...
    #endif
    typedef struct _longobject PyLongObject; /* Revealed in longintrepr.h */

    // Include/longintrepr.h
    struct _longobject {
        PyObject_VAR_HEAD
        digit ob_digit[1];
    };

It consists of two parts: ...
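The ob_digit array suggests why the value does not overflow: a large integer is split into fixed-width digits, and more digits are used as the value grows. The sketch below mimics that split in pure Python for non-negative values. The to_digits and from_digits helpers are hypothetical illustrations, not CPython functions, and the least-significant-first ordering is an assumption of this sketch.

```python
import sys


def to_digits(n: int, bits: int) -> list:
    """Split a non-negative int into base-2**bits digits, least significant first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    mask = (1 << bits) - 1
    digits = []
    while n:
        digits.append(n & mask)  # keep the low `bits` bits as one digit
        n >>= bits
    return digits


def from_digits(digits: list, bits: int) -> int:
    """Rebuild the integer from its digits."""
    return sum(d << (bits * i) for i, d in enumerate(digits))


# The digit width this interpreter was built with (30 or 15).
bits = sys.int_info.bits_per_digit
x = 1000000 * 1000000
digits = to_digits(x, bits)
print(bits, digits)
```

The product that wrapped around in the 32-bit C example simply occupies a second digit here.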