A proposal for moving Python beyond async/await to virtual threads with structured concurrency, simplifying concurrent programming by eliminating colored functions.

The Current State

Async/await brought concurrent programming to more developers by introducing syntax-level support. However, it requires complex internal machinery that leaks into user code and creates "colored functions" - the distinction between async and sync functions that pervades codebases.
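The coloring problem can be seen in a minimal sketch: the same trivial helper has to exist in two flavors, and each flavor is only usable from matching callers. The names here are illustrative, not from any real library.

```python
import asyncio

def get_greeting():
    # Sync flavor: callable from anywhere, but blocks the whole thread.
    return "hello"

async def get_greeting_async():
    # Async flavor: non-blocking, but only awaitable from async functions.
    await asyncio.sleep(0)
    return "hello"

def sync_caller():
    # A sync function cannot await get_greeting_async(), so the sync
    # flavor must exist even though the logic is identical.
    return get_greeting()

async def async_caller():
    return await get_greeting_async()
```

The duplication is harmless here, but in real codebases it propagates: every caller of a colored function inherits its color.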

Threads offer conceptual simplicity, but traditional threading APIs have significant ergonomic problems. Python's recent free-threading work compounds this: developers now face the complexity of the async ecosystem and the complexity of the threading system simultaneously.

This creates an opportunity to reconsider whether fully embracing threads with better APIs might be a superior path.

Structured Concurrency: The Good Part

Async experimentation in Python yielded important innovations, particularly structured concurrency. This concept disallows tasks from outliving their parents, establishing clear parent-child relationships that make information flow (like context variables) much clearer than traditional thread-local variables.

The Challenge: Python's task groups (structured concurrency implementation) are recent and have strict cancellation requirements that many libraries haven't properly implemented. This creates real problems.

Example Problem: The popular aiofiles library uses thread pools for file I/O, since no platform offers consistent async file I/O. However, it does not support cancellation. If several tasks are spawned and some of them block on aiofiles reads that depend on other tasks completing, cancellation can deadlock: the caught exception remains invisible until the blocking read is interrupted by a signal, which makes for a poor developer experience.
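The root of the problem can be shown without aiofiles itself: cancelling a task that is awaiting a thread-pool-backed operation cancels the await, but cannot interrupt the underlying worker thread, which stays blocked on whatever it was doing. This sketch simulates the blocking read with a sleep.

```python
import asyncio
import threading
import time

finished = threading.Event()

def blocking_read():
    # Stands in for a blocking file read running in a pool thread.
    time.sleep(0.2)
    finished.set()

async def reader():
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, blocking_read)

async def main():
    task = asyncio.create_task(reader())
    await asyncio.sleep(0.05)   # let the worker thread start blocking
    task.cancel()               # cancels the await, not the thread
    try:
        await task
    except asyncio.CancelledError:
        pass
    # The task is gone, but the pool thread is still blocked: the work
    # is orphaned, and anything waiting on it can deadlock.
    still_running = not finished.is_set()
    finished.wait(1.0)          # it only completes on its own schedule
    return still_running
```

Running `asyncio.run(main())` returns True: the "cancelled" read was still executing after cancellation completed.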

Virtual Threads: The Solution

Virtual threads address the performance challenges that motivated asyncio initially. The key requirement is handling async I/O directly in the runtime - when blocking operations occur, the virtual thread returns to the scheduler, allowing others to run.

But this alone would feel regressive without preserving structured concurrency.

A Better API Design

Sequential Download (Baseline):

def download_all(urls):
    results = {}
    for url in urls:
        results[url] = fetch_url(url)
    return results

This uses simple blocking APIs - no async/await. If any download fails, execution stops and raises an exception, losing collected results.
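The fetch_url helper is assumed throughout; a plain blocking sketch using only the standard library might look like this.

```python
from urllib.request import urlopen

def fetch_url(url, timeout=10):
    # Blocks the calling thread until the full response body is read.
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Nothing about it is async-aware; in the virtual-thread world, the runtime underneath would suspend the virtual thread during the blocking call.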

Parallel Download (Imaginary Syntax):

def download_all(urls):
    results = {}
    await:
        for url in urls:
            async:
                results[url] = fetch_url(url)
    return results

Note the inverted usage compared to current async/await:

  • await: creates a structured thread group where spawned threads attach and are awaited. If any thread fails, future spawns block and existing threads receive cancellation
  • async: acts as a function declaration paired with spawn - the entire body runs in another task with inherited parent context

Behind the Scenes:

def download_all(urls):
    results = {}
    with ThreadGroup():
        def _thread(url):
            results[url] = fetch_url(url)
        for url in urls:
            ThreadGroup.current.spawn(partial(_thread, url))
    return results

All threads are virtual - they behave like threads but may be scheduled on different kernel threads. Failed threads fail the entire group and prevent further spawns.
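ThreadGroup is hypothetical; a minimal sketch of its semantics on top of today's OS threads (ignoring virtual-thread scheduling and real cancellation) could look like this.

```python
import threading

class ThreadGroup:
    current = None

    def __init__(self):
        self._threads = []
        self._errors = []
        self._closed = False

    def spawn(self, fn):
        # A failed or closed group refuses further spawns.
        if self._closed or self._errors:
            raise RuntimeError("thread group no longer accepts spawns")
        def runner():
            try:
                fn()
            except BaseException as exc:
                self._errors.append(exc)
        t = threading.Thread(target=runner)
        self._threads.append(t)
        t.start()

    def __enter__(self):
        ThreadGroup.current = self
        return self

    def __exit__(self, *exc_info):
        self._closed = True
        for t in self._threads:
            t.join()            # structured: no thread outlives the block
        ThreadGroup.current = None
        if self._errors and exc_info[0] is None:
            raise self._errors[0]   # surface the first failure
```

A real implementation would additionally deliver cancellation to sibling threads on failure; joining and error propagation are the parts shown here.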

Python Compatibility Issue: This syntax doesn't fit Python well. Python has no hidden function declarations, and because closures capture loop variables by reference rather than by value, an inline helper would not bind each iteration's url correctly.

Practical API Compromise

A more Python-compatible approach makes thread groups explicit:

def download_and_store(results, url):
    results[url] = fetch_url(url)

def download_all(urls):
    results = {}
    with ThreadGroup() as g:
        for url in urls:
            g.spawn(partial(download_and_store, results, url))
    return results

This maintains similar behavior with explicit operations and helper functions, while completely avoiding promises or futures.

Where Complexity Belongs

This approach moves concurrent programming complexity into the interpreter and internal APIs:

  • The results dictionary requires locking
  • APIs like fetch_url need cancellation support
  • The I/O layer must suspend virtual threads and return to the scheduler

For most programmers, this complexity is hidden.

Better Concurrency Primitives: Modern approaches like Rust's mutex-enclosed values and semaphores for limiting concurrency could become thread group features:

def download_and_store(results_mutex, url):
    result = fetch_url(url)
    with results_mutex.lock() as results:
        results.store(url, result)

def download_all(urls):
    results = Mutex(MyResultStore())
    with ThreadGroup(max_concurrency=8) as g:
        for url in urls:
            g.spawn(partial(download_and_store, results, url))
    return results
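The mutex-enclosed value is borrowed from Rust, where the lock owns the data it protects. Python cannot enforce that statically, but a minimal sketch of the Mutex used above, assuming the value is only touched while the lock is held, might be:

```python
import threading
from contextlib import contextmanager

class Mutex:
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    @contextmanager
    def lock(self):
        # Yields the protected value only while the lock is held.
        with self._lock:
            yield self._value
```

Unlike Rust, nothing stops a caller from keeping a reference to the value after the block ends; the pattern is a convention here, not a guarantee.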

The Role of Futures

Futures would still exist for valid use cases, obtained from spawn return values:

def download_all(urls):
    futures = []
    with ThreadGroup() as g:
        for url in urls:
            futures.append((url, g.spawn(partial(fetch_url, url))))
    return {url: future.result() for (url, future) in futures}
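This pattern can be sketched today on top of concurrent.futures, with FutureThreadGroup as a hypothetical stand-in for a thread group whose spawn returns a future:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

class FutureThreadGroup:
    # Hypothetical: a thread group whose spawn hands back a future.
    def __init__(self):
        self._pool = ThreadPoolExecutor()

    def spawn(self, fn):
        # Returns a concurrent.futures.Future for the spawned work.
        return self._pool.submit(fn)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Structured: all work is joined before the block exits.
        self._pool.shutdown(wait=True)

def download_all(urls, fetch):
    with FutureThreadGroup() as g:
        futures = [(url, g.spawn(partial(fetch, url))) for url in urls]
    # Every future is already resolved here, so result() never blocks.
    return {url: f.result() for url, f in futures}
```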

Open Questions

Spawn Without Thread Groups: Should spawn work outside thread groups? Trio requires nurseries (their thread group equivalent) for all spawns. Alternatives include default thread groups for background tasks that join at process shutdown.

Future of Async/Await: Existing async functionality could continue for legacy code, but might be unnecessary for future code.

The Vision

This proposal eliminates colored functions while preserving the ergonomic wins of async/await. It's intended as a conversation starter about virtual threads rather than a complete specification. Many questions remain, particularly for Python, but the potential to simplify concurrent programming significantly is compelling.