PyTorch 0.4.0 Release & 1.0 Preview
Weekly Reading List #3
Issue #3: 2018/04/30 to 2018/05/06
This is an experimental series in which I briefly introduce the interesting data science stuff I read, watched, or listened to during the week. Please give this post some claps if you’d like this series to be continued.
I’ve been busy with other stuff this week, so this issue will only cover the new PyTorch 0.4.0 release and the roadmap to the production-ready 1.0 version.
PyTorch 0.4.0
Version 0.4.0 was released in late April, along with a migration guide.
A (perhaps incomplete) list of the important changes, with a brief summary of each:
- Merging the `Tensor` and `Variable` classes. `torch.Tensor` and `torch.autograd.Variable` are now the same class, but old code will still work.
- Don’t use `type()` to query the underlying type of a `Tensor` object. Use `isinstance()` or `x.type()` instead.
- A new in-place method, `.requires_grad_()`, sets the `requires_grad` flag.
- The `.data` attribute now returns a `Tensor` with `requires_grad=False`, but changes to the returned `Tensor` won’t be tracked by `autograd`. Use the `.detach()` method instead if you need in-place changes to be reported to `autograd`.
- 0-dimensional (scalar) Tensors. This fixes the inconsistency between `tensor.sum()` and `variable.sum()` before 0.4.0.
- Use `.item()` to get the Python number from a scalar tensor instead of `.data[0]`.
- Use `torch.no_grad()` or `torch.set_grad_enabled(is_train)` to exclude computations from `autograd` instead of setting `volatile=True`.
- Use `torch.tensor` to create new `Tensor` objects. When calling the function, specify the dtype, device, and layout with the new `torch.dtype`, `torch.device`, and `torch.layout` classes.
- The new `torch.*_like` and `tensor.new_*` shortcuts. The former takes a `Tensor`; the latter takes a shape.
- Use the new `.to(device)` method to write device-agnostic code.
- A new `.device` attribute returns the `torch.device` of any `Tensor`.
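Several of these idioms can be seen together in a minimal sketch (assuming PyTorch ≥ 0.4.0):

```python
import torch

# Device-agnostic setup with the new torch.device class.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor with the torch.tensor factory, specifying dtype and device.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], dtype=torch.float32, device=device)

# Toggle requires_grad in place with the new method.
x.requires_grad_()

# .sum() now returns a 0-dimensional tensor; use .item() instead of .data[0].
total = x.sum()        # 0-dim tensor
value = total.item()   # plain Python float

# Exclude computations from autograd with a context manager
# instead of the removed volatile=True flag.
with torch.no_grad():
    y = x * 2          # y.requires_grad is False

# torch.*_like takes a tensor; .to(device) makes the code device-agnostic.
z = torch.zeros_like(x).to(device)
```

Note that `torch.no_grad()` replaces `volatile=True` as a scoped, explicit way to disable gradient tracking, which avoids the old flag’s surprising propagation behavior.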
The code samples at the end of the migration guide are a good way to check if you’ve understood the above changes correctly.
Similarly, a (possibly incomplete) list of new features:
- Windows support.
- `torch.where(condition, tensor1, tensor2)`.
- `torch.expm1`.
- `torch.utils.checkpoint.checkpoint` to trade compute for memory, and `torch.utils.checkpoint.checkpoint_sequential` for sequential models.
- `torch.utils.bottleneck` to identify hotspots.
- `reduce=False` support for all loss functions.
- `nn.LayerNorm` and `nn.GroupNorm`.
- `torch.nn.utils.clip_grad`.
- An `Embedding.from_pretrained` factory.
- 24 basic probability distributions, plus `TransformedDistribution` and `Constraint`.
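A few of these can be demonstrated in a short snippet (assuming PyTorch ≥ 0.4.0; the values are illustrative):

```python
import torch

# torch.where selects elementwise between two tensors based on a condition.
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
relu_like = torch.where(x > 0, x, torch.zeros_like(x))

# torch.expm1 computes exp(x) - 1 accurately for small x,
# where computing exp(x) - 1 directly would lose precision to rounding.
accurate = torch.expm1(torch.tensor([1e-10]))

# Embedding.from_pretrained builds an embedding layer (frozen by default)
# from an existing weight matrix.
weights = torch.eye(3)
emb = torch.nn.Embedding.from_pretrained(weights)
rows = emb(torch.tensor([0, 2]))
```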
PyTorch 1.0
The roadmap was published on May 2.
Probably one of the most important takeaways:
In 1.0, your code continues to work as-is, we’re not making any big changes to the existing API.
Basically, Facebook is merging Caffe2 and PyTorch to provide a single framework that works for both research and production settings, as hinted earlier in April.
So the gist of the solution is a just-in-time (JIT) compiler, `torch.jit`, that exports your model to run on a Caffe2-based, C++-only runtime. The compiler has two modes:
- Tracing Mode: records the operations executed while running native Python code. It can produce wrong results if your model contains data-dependent if statements or loops (for example, an RNN over variable-length sequences).
- Script Mode: compiles code into an intermediate representation. It only supports a subset of the Python language, so you’ll usually have to isolate the code you want compiled.
The naming is still subject to change. The 1.0 version is expected to be released this summer.
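As a rough sketch of how the two modes are used (based on the `torch.jit` API as it eventually shipped; names and signatures were still in flux when this was written):

```python
import torch

class Gate(torch.nn.Module):
    def forward(self, x):
        # Data-dependent control flow: tracing would bake in whichever
        # branch the example input happens to take.
        if x.sum() > 0:
            return x * 2
        return x - 1

# Tracing mode: run the model once with an example input and record the ops.
plain = torch.nn.Linear(4, 2)
traced = torch.jit.trace(plain, torch.randn(1, 4))

# Script mode: compile the Python source (a subset of the language),
# preserving if statements and loops.
scripted = torch.jit.script(Gate())

out = scripted(torch.tensor([-1.0, -2.0]))  # sum < 0, so the x - 1 branch runs
```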