Polars 0.19 upgrade guide

Thu, 31 Aug 2023

Yesterday, we published Polars 0.19 to PyPI. The new version includes quite a few breaking changes. Most of these changes have been announced through deprecation warnings, but we did include some surprises. This document should help you navigate the upgrade to Polars 0.19.

For the full list of changes, please see our changelog.

Breaking changes

While this section does not contain an exhaustive list of all breaking changes, we think these are most likely to impact your code.

Aggregation functions no longer support horizontal computation

This impacts aggregation functions like sum, min, and max. These functions were overloaded to support both vertical and horizontal computation. Recently, new dedicated functionality for horizontal computation was released, and horizontal computation through the aggregation functions was deprecated. As of 0.19, these functions only compute vertically.

Restore the old behavior by using the horizontal variant, e.g. sum_horizontal.

Example

Before:

df = pl.DataFrame({'a': [1, 2], 'b': [11, 12]})
df.select(pl.sum('a', 'b'))  # horizontal computation
shape: (2, 1)
┌─────┐
│ sum │
│ --- │
│ i64 │
╞═════╡
│ 12  │
│ 14  │
└─────┘

After:

df = pl.DataFrame({'a': [1, 2], 'b': [11, 12]})
df.select(pl.sum('a', 'b'))  # vertical computation
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 3   ┆ 23  │
└─────┴─────┘

Update to all / any

all will now ignore null values by default, rather than treat them as False.

For both any and all, the drop_nulls parameter has been renamed to ignore_nulls and is now keyword-only. We also fixed an issue where setting this parameter to False would erroneously result in None output in some cases.

To restore the old behavior, set ignore_nulls to False and check for None output.

Example

Before:

pl.Series([True, None]).all()
False

After:

pl.Series([True, None]).all()
True

Improved error types for many methods

Improving our error messages is an ongoing effort. We did a sweep of our Python code base and made many improvements to error messages and error types. Most notably, many ValueErrors were changed to TypeErrors.

If your code relies on handling Polars exceptions, you may have to make some adjustments.

Example

Before:

pl.Series(values=15)
ValueError: Series constructor called with unsupported type; got 'int'

After:

pl.Series(values=15)
TypeError: Series constructor called with unsupported type 'int' for the `values` parameter

Updates to expression input parsing

Methods like select and with_columns accept one or more expressions. But they also accept strings, integers, lists, and other inputs that we try to interpret as expressions. We updated our internal logic to parse inputs more consistently. For example, None is now parsed as a null literal instead of being ignored.

Example

Before:

pl.DataFrame({'a': [1, 2]}).with_columns(None)
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
└─────┘

After:

pl.DataFrame({'a': [1, 2]}).with_columns(None)
shape: (2, 2)
┌─────┬─────────┐
│ a   ┆ literal │
│ --- ┆ ---     │
│ i64 ┆ null    │
╞═════╪═════════╡
│ 1   ┆ null    │
│ 2   ┆ null    │
└─────┴─────────┘

shuffle / sample now use an internal Polars seed

If you used the built-in Python random.seed function to control the randomness of Polars expressions, this will no longer work. Instead, use the new set_random_seed function.

Example

Before:

import random
random.seed(1)

After:

import polars as pl
pl.set_random_seed(1)

Deprecations

Creating a consistent and intuitive API is hard, and finding the right name for each function, method, and parameter might be the hardest part. The new version comes with quite a few naming changes, and you will most likely run into deprecation warnings when upgrading to 0.19.

If you want to upgrade without worrying about deprecation warnings right now, you can add the following snippet to your code:

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
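
If silencing the warnings globally feels too broad, the standard library also allows limiting the filter to a block of code. The sketch below uses only the warnings module; deprecated_call is a hypothetical stand-in for any call that emits a DeprecationWarning.

```python
import warnings


def deprecated_call() -> int:
    """Hypothetical stand-in for a call that emits a DeprecationWarning."""
    warnings.warn("this name is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# Suppress deprecation warnings only inside this block;
# the previous filter state is restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=DeprecationWarning)
    value = deprecated_call()
```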

groupby renamed to group_by

This is not a change we make lightly, as it will impact almost all our users. But “group by” is really two separate words, and our naming strategy dictates that these should be separated by an underscore.

Most likely, a simple search and replace will be enough to take care of this update:

  • Search: .groupby(
  • Replace: .group_by(

apply renamed to map_*

apply is probably the most misused part of our API. Many Polars users come from pandas, where apply has a completely different meaning.

We now consolidate all our functionality for user-defined functions under the name map. This results in the following renaming:

| Before | After |
| --- | --- |
| Series/Expr.apply | map_elements |
| Series/Expr.rolling_apply | rolling_map |
| DataFrame.apply | map_rows |
| GroupBy.apply | map_groups |
| pl.apply | map_groups |
| map | map_batches |