Taichi v1.1.0 was released!

Taichi Lang
Parallel-Programming-in-Python
Aug 23, 2022

Highlights

New features

Quantized data types

High-resolution simulations can deliver great visual quality, but are often limited by the capacity of the onboard GPU memory. This release adds quantized data types, which let you define your own integers, fixed-point numbers, or floating-point numbers with an arbitrary number of bits, so that you can strike a balance between your hardware limits and simulation quality. See Using quantized data types for a comprehensive introduction.
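To make the memory/precision trade-off concrete, here is a plain-Python sketch of how a signed fixed-point number with a custom bit width can be stored as a small integer. It does not use Taichi's quantized-type API. The helper names (quantize, dequantize) and the scaling rule are assumptions chosen for illustration and may differ from what Taichi does internally.

```python
def quantize(value: float, bits: int, max_value: float) -> int:
    """Encode `value` as a signed integer that fits in `bits` bits."""
    levels = 2 ** (bits - 1) - 1          # largest representable magnitude
    scale = max_value / levels            # real-world size of one step
    q = round(value / scale)
    return max(-levels, min(levels, q))   # clamp out-of-range inputs


def dequantize(q: int, bits: int, max_value: float) -> float:
    """Decode the stored integer back to an approximate real value."""
    levels = 2 ** (bits - 1) - 1
    return q * max_value / levels


# A 10-bit value covering [-2.0, 2.0] needs less than a third of the
# storage of a 32-bit float.
x = 1.2345
q = quantize(x, bits=10, max_value=2.0)
x_restored = dequantize(q, bits=10, max_value=2.0)
step = 2.0 / (2 ** 9 - 1)                 # spacing between adjacent values
```

In this sketch the round-trip error for in-range inputs is at most half a step, so each bit you remove roughly doubles the worst-case error while shrinking the storage.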

Offline cache

A Taichi kernel is implicitly compiled the first time it is called. The compilation results are kept in an online in-memory cache to reduce the overhead of subsequent calls: as long as the kernel function is unchanged, it can be loaded from the cache and launched directly. The cache, however, is lost when the program terminates. If you run the program again, Taichi has to recompile all kernel functions and rebuild the in-memory cache, so the first launch of a Taichi function is always slow due to the compilation overhead.

To address this problem, this release adds the offline cache feature, which dumps the compilation cache to the disk for future runs. The first launch overhead can be drastically reduced in subsequent runs. Taichi now constructs and maintains an offline cache by default.
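The idea of a cache that survives across runs can be sketched in plain Python. This is an illustration of the general technique only, not Taichi's implementation: load_or_compile, fake_compile, the JSON file format, and the choice of hashing the kernel source as the cache key are all assumptions made for this example.

```python
import hashlib
import json
import os
import tempfile


def load_or_compile(source: str, cache_dir: str, compile_fn):
    """Return (artifact, cache_hit) for `source`, using a disk cache."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".json")
    if os.path.exists(path):
        # Cache hit: skip compilation and load the stored result.
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f), True
    # Cache miss: compile, then persist the result for future runs.
    artifact = compile_fn(source)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(artifact, f)
    return artifact, False


def fake_compile(source: str) -> dict:
    """Stand-in for an expensive compiler; records the source length."""
    return {"num_chars": len(source)}


cache_dir = tempfile.mkdtemp()
kernel_src = "def add(a, b): return a + b"
_, hit_first = load_or_compile(kernel_src, cache_dir, fake_compile)
_, hit_second = load_or_compile(kernel_src, cache_dir, fake_compile)
_, hit_changed = load_or_compile(kernel_src + "  # edited", cache_dir, fake_compile)
```

The first call misses and writes to disk, the second call hits, and any change to the source produces a new key and therefore a fresh compilation.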

The following table shows the launch overhead of running cornell_box on the CUDA backend with and without offline cache:

Note that, for now, the offline cache feature works only on the CPU and CUDA backends. If your code behaves abnormally, disable the offline cache by setting the environment variable TI_OFFLINE_CACHE=0 or by calling ti.init(offline_cache=False), and file an issue on Taichi's GitHub repo. See Offline cache for more information.

Forward-mode automatic differentiation

This release adds forward-mode automatic differentiation via ti.ad.FwdMode. Unlike the existing reverse-mode automatic differentiation, which computes the vector-Jacobian product (VJP), forward mode computes the Jacobian-vector product (JVP) when evaluating derivatives. Forward-mode automatic differentiation is therefore much more efficient when a function has more outputs than inputs. Read this example, which demonstrates Jacobian matrix computation in forward mode and reverse mode.
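The Jacobian-vector product can be illustrated without Taichi by using dual numbers, a standard way to implement forward mode. The sketch below is plain Python and does not call ti.ad.FwdMode; the Dual class, the jvp helper, and the test function f are hypothetical names introduced for this example.

```python
class Dual:
    """A value paired with its directional derivative (tangent)."""

    def __init__(self, value, tangent=0.0):
        self.value = value
        self.tangent = tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)

    __rmul__ = __mul__


def f(x, y):
    """Two inputs, three outputs: more outputs than inputs."""
    return [x * y, x + y, x * x]


def jvp(func, point, direction):
    """Jacobian-vector product from a single forward pass."""
    args = [Dual(p, d) for p, d in zip(point, direction)]
    return [out.tangent for out in func(*args)]


# One forward pass per input yields the Jacobian column by column.
col_x = jvp(f, (2.0, 3.0), (1.0, 0.0))  # derivatives of all outputs w.r.t. x
col_y = jvp(f, (2.0, 3.0), (0.0, 1.0))  # derivatives of all outputs w.r.t. y
```

Here the full 3x2 Jacobian takes two forward passes (one per input), whereas reverse mode would need three passes (one per output), which is the efficiency argument made above.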

SharedArray (experimental)

A GPU’s shared memory is a small, fast memory that is visible within each thread block (or workgroup in Vulkan). It is widely used where performance is a crucial concern. To give you access to your GPU’s shared memory, this release adds the SharedArray API under the namespace ti.simt.block.

The following diagram illustrates the performance benefits of Taichi's SharedArray. With SharedArray, Taichi Lang is comparable to or even outperforms the equivalent CUDA code.

Texture (experimental)

Taichi now supports bilinear texture sampling and raw texel fetch on both the Vulkan and OpenGL backends. This feature leverages the hardware texture unit and reduces the need to hand-write bilinear interpolation code in image processing tasks. It also provides an easy way to do texture mapping in tasks such as rasterization or ray-tracing. On the Vulkan backend, Taichi additionally supports image load and store: you can directly manipulate the texels of an image and use that same image in subsequent texture mapping.
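For reference, this is roughly the kind of manual interpolation code that hardware bilinear sampling makes unnecessary. It is a plain-Python sketch, not Taichi's texture API: fetch and sample_bilinear are hypothetical helpers, and the conventions used (texel centers at integer coordinates, clamp-to-edge addressing) are simplifying assumptions.

```python
import math


def fetch(image, x, y):
    """Raw texel fetch with clamp-to-edge addressing."""
    height, width = len(image), len(image[0])
    x = max(0, min(width - 1, x))
    y = max(0, min(height - 1, y))
    return image[y][x]


def sample_bilinear(image, u, v):
    """Blend the four texels surrounding the continuous position (u, v)."""
    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0                # fractional offsets in [0, 1)
    top = (1 - fx) * fetch(image, x0, y0) + fx * fetch(image, x0 + 1, y0)
    bottom = (1 - fx) * fetch(image, x0, y0 + 1) + fx * fetch(image, x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


image = [[0.0, 10.0],
         [20.0, 30.0]]
center = sample_bilinear(image, 0.5, 0.5)  # midway between all four texels
corner = sample_bilinear(image, 1.0, 1.0)  # exactly on the last texel
```

Sampling midway between the four texels returns their average, and sampling exactly on a texel returns that texel unchanged.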

Note that the current texture and image APIs are in the early stages and subject to change. In the future we plan to support bindless textures to extend to tasks such as ray-tracing. We also plan to extend full texture support to all backends that support texture APIs.

Run ti example simple_texture to see an example of texture support!

Improvements

GGUI

  1. Supports fetching and storing the depth information of the current scene:
     1.1 In a Taichi field: ti.ui.Window.get_depth_buffer(field);
     1.2 In a NumPy array: ti.ui.Window.get_depth_buffer_as_numpy().
  2. Supports drawing 3D lines using Scene.lines(vertices, width).
  3. Supports drawing mesh instances. You can pass a list of transformation matrices (ti.Matrix.field(4, 4, ti.f32, shape=N)) and call ti.ui.Scene.mesh_instance(vertices, transforms=TransformMatrixField) to put various mesh instances at different places.
  4. Supports showing the wireframe of a mesh when calling Scene.mesh() or Scene.mesh_instance() by setting show_wireframe=True.

Syntax

  • Taichi dataclass: Taichi now recommends using the @ti.dataclass decorator to define struct types and, if needed, to attach functions to them. See Taichi dataclasses for more information.
  • vec2, vec3, and vec4 in the taichi.math module (and likewise the ivec and uvec variants) can be used directly as type hints, for example for the members of a dataclass. The numeric precision of these types is determined by default_ip or default_fp in ti.init().
  • More flexible instantiation for a struct or dataclass:
    In earlier releases, to instantiate a taichi.types.struct or a taichi.dataclass, you had to explicitly write out a complete list of member-value pairs.
    As of this release, you have more options. Positional arguments are passed to the struct members in the order they are defined; keyword arguments set the corresponding struct members. Unspecified struct members are automatically set to zero.
  • Supports calling fill() from both the Python scope and the Taichi scope:
    In earlier releases, fill(), a method of the ScalarField and MatrixField classes, could only be called from the Python scope. As of this release, you can call it from either the Python scope or the Taichi scope.
  • More flexible initialization for customized matrix types:
    Matrix types created using taichi.types.matrix() or taichi.types.vector() can be initialized more flexibly: Taichi automatically combines the inputs and converts them to a matrix whose shape matches the shape of the target matrix type.
  • Makes ti.f32(x) syntactic sugar for ti.cast(x, ti.f32) if x is neither a literal nor of a compound data type. The same applies to other primitive types such as ti.i32, ti.u8, or ti.f64.
  • More convenient axes order adjustment: A common way to improve the performance of a Taichi program is to adjust the order of axes when laying out field data in memory. In earlier releases, this required in-depth knowledge of the data definition language (the SNode system) and could become an extra burden in situations where sparse data structures are not required. As of this release, Taichi supports specifying the order of axes when defining a Taichi field.
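To see why the order of axes matters for performance, the following plain-Python sketch computes where each element of a small 2D field would sit in linear memory under two axis orders. It does not call Taichi: linear_offset is a hypothetical helper, and the reading of "ij" as row-major and "ji" as column-major is an assumption modelled on the order parameter mentioned in the changelog.

```python
def linear_offset(i, j, shape, order):
    """Memory offset of element (i, j) in an M x N field."""
    m, n = shape
    if order == "ij":      # i is the outer axis: row-major
        return i * n + j
    if order == "ji":      # j is the outer axis: column-major
        return j * m + i
    raise ValueError(f"unsupported order: {order!r}")


shape = (2, 3)
# Visit the field row by row and record where each element lives.
row_major = [linear_offset(i, j, shape, "ij") for i in range(2) for j in range(3)]
col_major = [linear_offset(i, j, shape, "ji") for i in range(2) for j in range(3)]
```

With the "ij" layout a row-by-row traversal touches consecutive offsets, while the same traversal over the "ji" layout jumps around in memory. Matching the layout to the access pattern is what typically improves cache and memory-coalescing behavior.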

Important bug fixes

  • Fixed infinite loop when an integer pow() has a negative exponent (#5275)
  • Fixed numerical issues with matrix slicing (#4677)
  • Improved data type checks for ti.ndrange (#4478)

API changes

Added

  • ti.BitpackedFields
  • ti.from_paddle
  • ti.to_paddle
  • ti.FieldsBuilder.lazy_dual
  • ti.math module
  • ti.Texture
  • ti.ref
  • ti.dataclass
  • ti.simt.block.SharedArray

Moved

Deprecated

  • ti.ui.make_camera: Please construct cameras with ti.ui.Camera instead.

Deprecation notice

Python 3.6

As announced in the v1.0.0 release, we no longer provide official Python 3.6 wheels through PyPI. Users who need Taichi with Python 3.6 can still build from source, but support is not guaranteed.

Taichi_GLSL

The taichi_glsl package on PyPI will no longer be maintained as of this release. GLSL-related features will be implemented in the official taichi.math module, which includes data types and handy functions for everyday math and shader development:

  • Vector types: vec2, vec3, and vec4.
  • Matrix types: mat2, mat3, and mat4.
  • GLSL functions such as step(), clamp(), and smoothstep().

macOS 10.14

Official support for macOS Mojave (10.14, released in 2018) will be dropped starting from v1.2.0. Please upgrade your macOS if possible, or let us know if you have any concerns.

Full changelog:

  • [misc] Update version to v1.1.0 (by Ailing Zhang)
  • [test] Fix autodiff test for unsupported shift ptr (#5723) (by Mingrui Zhang)
  • [Doc] [type] Add introduction to quantized types (#5705) (by Yi Xu)
  • [autodiff] Clear all dual fields when exiting context manager (#5716) (by Mingrui Zhang)
  • [bug] Support indexing via np.integer for field (#5712) (by Ailing)
  • [Doc] Add docs for GGUI’s new features (#5647) (by Mocki)
  • [Doc] Add introduction to forward mode autodiff (#5680) (by Mingrui Zhang)
  • [autodiff] Fix AdStackAllocaStmt not correctly backup (#5692) (by Mingrui Zhang)
  • Fix shared array for all Vulkan versions. (#5721) (by Haidong Lan)
  • [misc] Rc v1.1.0 patch3 (#5709) (by Ailing)
  • [bug] RC v1.1.0 patch2 (#5683) (by Ailing)
  • [ci] Temporarily disable a M1 vulkan test (#5703) (by Proton)
  • [Doc] Add doc about offline cache (#5646) (#5686) (by Mingming Zhang)
  • [bug] Fix bug that kernel names are not correctly captured by the profiler (#5651) (#5669) (by Mingming Zhang)
  • [gui] GGUI scene APIs are broken (#5658) (#5667) (by PENGUINLIONG)
  • [release] v1.1.0 patch1 (#5649) (by Ailing)
  • [llvm] Compile serially when num_thread=0 (#5631) (by Lin Jiang)
  • [cuda] Reduce kernel profiler memory usage (#5623) (by Bo Qiao)
  • [doc] Add docstrings for texture related apis (by Ailing Zhang)
  • [Lang] Support from/to_image for textures and add tests (by Ailing Zhang)
  • [gui] Add wireframe mode for mesh & mesh_instance, add slider_int for Window.GUI. (#5576) (by Mocki)
  • avoid redundant compilation (#5607) (by yixu)
  • [misc] Enable offline cache by default (#5613) (by Mingming Zhang)
  • [Lang] Add parameter ‘order’ to specify layout for scalar, vector, matrix fields (#5617) (by Yi Xu)
  • [autodiff] [example] Add an example for computing Jacobian matrix (#5609) (by Mingrui Zhang)
  • [ci] Add PR tag for dx12. (#5614) (by Xiang Li)
  • fix ti.ui.Space (#5606) (by yixu)
  • [ci] Build Android export core (#5409) (by Proton)
  • [type] Rename module quantized_types to quant (#5608) (by Yi Xu)
  • [llvm] [aot] Add unit tests for Dynamic SNodes with LLVM AOT (#5594) (by Zhanlue Yang)
  • [build] Forcing write file encoding in misc/make_changelog.py (#5604) (by Proton)
  • [llvm] [aot] Add unit tests for Bitmasked SNodes with LLVM AOT (#5593) (by Zhanlue Yang)
  • [GUI] Shifted to a more commonly supported type for set_image (#5514) (by PENGUINLIONG)
  • [gui] Fix snode offset (mesh disappearing bug) (#5579) (by Bob Cao)
  • [refactor] Redesign loading, dumping and cleaning of offline cache (#5578) (by Mingming Zhang)
  • [autodiff] [test] Add more complex for loop test cases for forward mode (#5592) (by Mingrui Zhang)
  • fix num_triangles (#5602) (by yixu)
  • [cuda] Decouple update from sync in kernel profiler (#5589) (by Bo Qiao)
  • Removed unnecessary tags to work around a crowdIn issue. (#5590) (by Vissidarte-Herman)
  • [Lang] Change vec2/3/4 from function calls to types (#5556) (by Zhao Liang)
  • [vulkan] Enable shared array support for vulkan backend (#5583) (by Haidong Lan)
  • [aot] Avoid reserved words when generate C# AOT bindings (#5586) (by Proton)
  • [ci] Update llvm15 prebuild binary. (#5581) (by Xiang Li)
  • [doc] Removed a redundant line break to see if it will fix a CrowdIn issue (#5584) (by Vissidarte-Herman)
  • [type] Refine SNode with quant 10/n: Add validity checks and simplify BitStructType (#5573) (by Yi Xu)
  • [autodiff] [refactor] Rename the parameters to param for forward mode (#5582) (by Mingrui Zhang)
  • [doc] Format fix to work around a crowdIn issue (#5580) (by Vissidarte-Herman)
  • Update syntax.md (#5575) (by Zhao Liang)
  • [doc] Added an mdx-code-block escape hatch syntax to workaround a CrowdIn … (#5574) (by Vissidarte-Herman)
  • [Doc] Update external.md (#5547) (by Zhao Liang)
  • [doc] Add introductions to ambient_elements in llvm_sparse_runtime.md (#5567) (by Zhanlue Yang)
  • [refactor] Unify ways to set external array args (#5565) (by Ailing)
  • [Lang] [type] Refine SNode with quant 9/n: Rename some parameters in quant APIs (#5566) (by Yi Xu)
  • [opt] Improved warning messages for statements (#5564) (by Zhanlue Yang)
  • [bug] Fix android build for taichi-aot-demo (#5560) (by Ailing)
  • [opt] Added llvm::SeparateConstOffsetFromGEPPass() for shared_memory optimizations (#5494) (by Zhanlue Yang)
  • [Lang] [type] Refine SNode with quant 8/n: Replace bit_struct with ti.BitpackedFields (#5532) (by Yi Xu)
  • [build] Enforce local-scoped symbols in static llvm libs (#5553) (by Bo Qiao)
  • [refactor] Unify ways to set ndarray args (#5559) (by Ailing)
  • [gui] [vulkan] Support for drawing mesh instances (#5546) (by Mocki)
  • [llvm] [aot] Added taichi_sparse unit test to C-API for CUDA backend (#5531) (by Zhanlue Yang)
  • Add glFinish to wait_idle (#5538) (by Bo Qiao)
  • [autodiff] Skip ConstStmt when generating alloca for dual (#5554) (by Mingrui Zhang)
  • [ci] Fix macOS nightly build (#5552) (by Proton)
  • Fix potential bug of lang::Program that could be double finalized (#5550) (by Mingming Zhang)
  • [Error] Raise error when using the struct for in python scope (#5536) (by Lin Jiang)
  • [bug] Fix calling make_aot_kernel failed when offline_cache=True (#5537) (by Mingming Zhang)
  • [ci] Move macOS 10.15 workloads to self-hosted runners (#5539) (by Proton)
  • [build] [refactor] Utilize find_cuda_toolkit and clean some target dependencies (#5526) (by Bo Qiao)
  • [autodiff] [test] Add more for-loop tests for forward mode (#5525) (by Mingrui Zhang)
  • [Lang] [bug] Ensure non-i32 compatibility in while statement conditions (#5521) (by daylily)
  • [Lang] Improve error message for ggui on opengl backend (#5509) (by Zhao Liang)
  • [aot] Support texture and rwtexture in cgraph (#5528) (by Ailing)
  • [llvm] Add parallel compilation to CUDA backend (#5519) (by Lin Jiang)
  • [type] [refactor] Decouple quant from SNode 9/n: Remove exponent handling from SNode (#5510) (by Yi Xu)
  • [Lang] Fix numpy and taichi operations problem (#5506) (by Zhao Liang)
  • [Vulkan] Added an interface to get accumulated on-device execution time (#5488) (by PENGUINLIONG)
  • [Async] [refactor] Remove AsyncTaichi (#5523) (by Lin Jiang)
  • [misc] Fix warning at GGUI canvas.circles (#5424) (#5518) (by Proton)
  • [gui] Support rendering lines from a part of VBO (#5495) (by Mocki)
  • [ir] Cast indices of ExternalPtrStmt to ti.i32 (#5516) (by Yi Xu)
  • [Lang] Support syntax sugar for ti.cast (#5515) (by Yi Xu)
  • [Lang] Better struct initialization (#5481) (by Zhao Liang)
  • [example] Make implicit_fem fallback to CPU when CUDA is not available (#5512) (by Yi Xu)
  • [Lang] Make MatrixType support more ways of initialization (#5479) (by Zhao Liang)
  • [Vulkan] Fixed depth texture validation error (#5507) (by PENGUINLIONG)
  • [bug] Fix vulkan source when build for android (#5508) (by Bo Qiao)
  • [refactor] [llvm] Rename CodeGenCPU/CUDA/WASM and CodeGenLLVMCPU/CUDA/WASM (#5500) (by Lin Jiang)
  • [bug] Let the arguments in ti.init override the environment variables (#5497) (by Lin Jiang)
  • [misc] Add debug logging and TI_AUTO_PROF for offline cache (#5503) (by Mingming Zhang)
  • [misc] ti.Tape -> ti.ad.Tape (#5501) (by Zihua Wu)
  • [misc] Support jit offline cache for kernels that call real functions (#5477) (by Mingming Zhang)
  • [doc] Update cpp tests build doc (#5493) (by Bo Qiao)
  • [Lang] Support call field.fill in kernel functions (#5486) (by Zhao Liang)
  • [Lang] [bug] Make comparisons always return i32 (#5487) (by Yi Xu)
  • [gui] [vulkan] Support 3d-lines rendering (#5492) (by Mocki)
  • [autodiff] Switch off parts of store forwarding optimization for autodiff (#5464) (by Mingrui Zhang)
  • [llvm] [aot] Add LLVM to CAPI part 9: Added AOT field tests for LLVM backend in C-API (#5461) (by Zhanlue Yang)
  • [bug] [llvm] Fix GEP when allocating TLS buffer in struct for (#5473) (by Lin Jiang)
  • [gui] [vulkan] Modify some internal APIs (#5484) (by Mocki)
  • [Build] Remove TI_EMSCRIPTENED related code (#5483) (by Bo Qiao)
  • [type] [refactor] Decouple quant from SNode 8/n: Remove redundant handling of llvm15 in codegen_llvm_quant (#5480) (by Yi Xu)
  • [CUDA] Enable shared memory for CUDA (#5429) (by Haidong Lan)
  • [gui] [vulkan] A faster version of depth copy through ti.field/ti.ndarray (copy directly from vulkan to cuda/gpu/cpu) (#5455) (by Mocki)
  • [misc] Add missing members of XXXExpression and FrontendXXXStmt to result of ASTSerializer (#5471) (by Mingming Zhang)
  • [llvm] [aot] Added field tests for LLVM backend in CGraph (#5458) (by Zhanlue Yang)
  • [type] [refactor] Decouple quant from SNode 7/n: Rewrite BitStructStoreStmt codegen without SNode (#5475) (by Yi Xu)
  • [llvm] [aot] Add LLVM to CAPI part 8: Added CGraph tests for LLVM backend in C-API (#5456) (by Zhanlue Yang)
  • [build] [refactor] Rename taichi core and taichi python targets (#5451) (by Bo Qiao)
  • [llvm] [aot] Add LLVM to CAPI part 6: Handle Field initialization in C-API (#5444) (by Zhanlue Yang)
  • [llvm] [aot] Add LLVM to CAPI part 7: Added AOT kernel tests for LLVM backend in C-API (#5447) (by Zhanlue Yang)
  • [error] Throw proper error message when an Ndarray is passed in via ti.template (#5457) (by Ailing)
  • [type] [refactor] Decouple quant from SNode 6/n: Rewrite extract_quant_float() without SNode (#5448) (by Yi Xu)
  • [bug] Set SNode tree id to all SNodes (#5454) (by Lin Jiang)
  • [AOT] Support on-device event (#5433) (by PENGUINLIONG)
  • [llvm] [aot] Add LLVM to CAPI part 5: Added C-API tests for Vulkan and Cuda backend (#5440) (by Zhanlue Yang)
  • [llvm] [bug] Fixing the crash in release tests introduced by a typo in #5381 where we need a deep copy of arglist. (#5441) (by Proton)
  • [llvm] [aot] Add LLVM to CAPI part 4: Enabled C-API tests on CI & Added C-API tests for CPU backend (#5435) (by Zhanlue Yang)
  • [misc] Bump version to v1.0.5 (#5437) (by Proton)
  • [aot] Support specifying vk_api_version in CompileConfig (#5419) (by Ailing)
  • [Lang] Add append attribute to dynamic fields (#5413) (by Zhao Liang)
  • [Lang] Add inf and nan (#5270) (by Zhao Liang)
  • [Doc] Updated docsite structure (#5416) (by Vissidarte-Herman)
  • [ci] Run release tests (#5327) (by Proton)
  • [type] [refactor] Decouple quant from SNode 5/n: Rewrite load_quant_float() without SNode (#5422) (by Yi Xu)
  • [llvm] Allow using clang 15 for COMPILE_LLVM_RUNTIME (#5381) (by Xiang Li)
  • [opengl] Speedup compilation for Nvidia cards (#5430) (by Bob Cao)
  • [Bug] Fix infinite loop when exponent of integer pow is negative (#5275) (by Mike He)
  • [build] [refactor] Move spirv codegen and common targets (#5415) (by Bo Qiao)
  • [autodiff] Check not placed field.dual and add needs_dual (#5412) (by Mingrui Zhang)
  • [bug] Simplify scalar handling in cgraph and relax field_dim check (#5411) (by Ailing)
  • [gui] [vulkan] Support for getting depth information for python users. (#5410) (by Mocki)
  • [AOT] Adjusted C-API for nd-array type conformance (#5417) (by PENGUINLIONG)
  • [type] Decouple quant from SNode 4/n: Add exponent info to BitStructType (#5407) (by Yi Xu)
  • [llvm] Avoid creating new LLVM contexts when updating struct module (#5397) (by Lin Jiang)
  • [build] Enable C-API compilation on CI (#5403) (by Zhanlue Yang)
  • [Lang] Implement assignment by slicing (#5369) (by Mike He)
  • [llvm] [aot] Add LLVM to CAPI part 3: Adapted AOT interfaces for LLVM backend (#5402) (by Zhanlue Yang)
  • [AOT] Fixed Vulkan device import capability settings (#5400) (by PENGUINLIONG)
  • [llvm] [aot] Add LLVM to CAPI part 2: Adapted memory allocation interfaces for LLVM backend (#5396) (by Zhanlue Yang)
  • [autodiff] Add ternary operators for forward mode (#5405) (by Mingrui Zhang)
  • [llvm] [aot] Add LLVM to CAPI part 1: Implemented capi::LlvmRuntime class (#5393) (by Zhanlue Yang)
  • [ci] Add per test hard timeout limit (#5384) (by Proton)
  • [ci] Properly detect $DISPLAY (#5398) (by Proton)
  • [ci] Llvm15 clang10 ci (#5368) (by Xiang Li)
  • [llvm] [bug] Add stop grad to ASTSerializer (#5401) (by Lin Jiang)
  • [autodiff] Add test for ternary operators in reverse mode autodiff (#5395) (by Mingrui Zhang)
  • [llvm] [aot] Add numerical unit tests for LLVM-CGraph (#5319) (by Zhanlue Yang)
