Evaluation of Python Sound Modules

Background

For a project, I needed to evaluate sound processing modules for use in software written in Python. After finishing the evaluation, I decided to publish the criteria and results. This way, other people can benefit from them and can help by sending me corrections and information about new Python sound modules.

2022 Oct 31 update

During the last 15 years, there have been big changes in the Python audio scene. If you need to select a Python sound module for your project, I recommend the Real Python tutorial about playing and recording sound in Python. It was last updated in January 2021.

Another source of information is the wiki page on audio in Python.

The following information is obsolete.

2007 Sep 24 update

Elijah Rutschman brought an additional Python sound processing package to my attention, and I decided to make this Web page more useful by adding information about it, even though that information is only partial.

Criteria

For my project, I needed the following evaluation criteria:

Work with my Python scripts (duh).
Multi-platform (at least Linux and MS-Windows XP).
Real-time sound acquisition from the sound card, with the data made available for subsequent processing in real time.
Support for a 16 kHz sampling rate with more than 8 bits per sample.
Process a sound file, not necessarily in real time.
Efficient.
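
As a rough illustration of the sampling rate and sample size criteria, the following sketch uses only the standard library wave module to check whether a WAV file is 16 kHz with more than 8 bits per sample. The helper names (make_test_wav, meets_format_criteria) and the generated test tone are assumptions made for this example; they do not come from any of the packages evaluated below.

```python
import io
import math
import struct
import wave

TARGET_RATE_HZ = 16000   # sampling rate named in the criteria
MIN_SAMPLE_BITS = 9      # ">8-bit" means at least 9 bits; in practice 16


def make_test_wav(rate_hz=TARGET_RATE_HZ, seconds=0.1, tone_hz=440.0):
    """Return the bytes of a mono 16-bit WAV file holding a sine tone."""
    n_samples = int(rate_hz * seconds)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * tone_hz * n / rate_hz)))
        for n in range(n_samples)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 2 bytes = 16 bits per sample
        wav.setframerate(rate_hz)
        wav.writeframes(frames)
    return buffer.getvalue()


def meets_format_criteria(wav_file):
    """Check a WAV file (path or file object) for rate and sample size."""
    with wave.open(wav_file, "rb") as wav:
        bits = 8 * wav.getsampwidth()
        return wav.getframerate() == TARGET_RATE_HZ and bits >= MIN_SAMPLE_BITS


wav_bytes = make_test_wav()
print(meets_format_criteria(io.BytesIO(wav_bytes)))
```

The same check works on a file on disk by passing its path instead of a BytesIO object.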

In addition, I needed the following speech sound processing capabilities:

Determine whether there is pitch and if yes, its frequency.
Locate the lowest three formants and their bandwidths.
Find the power of the sound.
Perform a 256-point FFT of the sound after applying standard pre-emphasis and a Hamming window.
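
To make the power and FFT criteria concrete, here is a minimal pure-Python sketch of one common way to compute them: first-order pre-emphasis, a Hamming window, and a radix-2 FFT over a 256-sample frame. The pre-emphasis coefficient of 0.97 is an assumption (this page does not name a value), and all function names are made up for the illustration. A real project would more likely rely on a numerical library than on this hand-written FFT.

```python
import cmath
import math

FFT_SIZE = 256
PRE_EMPHASIS = 0.97  # assumed coefficient; not specified on this page


def pre_emphasize(samples, coeff=PRE_EMPHASIS):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[n] - coeff * samples[n - 1] for n in range(1, len(samples))
    ]


def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def mean_power(samples):
    """Average of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def spectrum_magnitudes(frame):
    """Pre-emphasize, window, and transform one 256-sample frame."""
    if len(frame) != FFT_SIZE:
        raise ValueError("frame must hold exactly 256 samples")
    emphasized = pre_emphasize(frame)
    windowed = [s * w for s, w in zip(emphasized, hamming(FFT_SIZE))]
    # Keep bins 0..128; the remaining bins mirror them for real input.
    return [abs(c) for c in fft(windowed)[: FFT_SIZE // 2 + 1]]


# Example: a 1 kHz sine sampled at 16 kHz; bins are 16000 / 256 = 62.5 Hz apart.
RATE_HZ = 16000
frame = [math.sin(2 * math.pi * 1000 * n / RATE_HZ) for n in range(FFT_SIZE)]
magnitudes = spectrum_magnitudes(frame)
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
print(peak_bin * RATE_HZ / FFT_SIZE)  # frequency of the strongest bin, in Hz
print(mean_power(frame))
```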

Evaluated packages

Snack

Multi-platform: The same scripts are usable on Windows 95/98/NT/2K/XP, Linux, Macintosh, Sun Solaris, HP-UX, FreeBSD, NetBSD, and SGI IRIX.
Real-time sound acquisition: Yes.
Support for 16KHz sampling rate and >8-bit sound: Yes.
Sound file processing: Yes.
Efficiency: Inefficient. The Tcl part of the package converts the data into a string, and the Python part then converts it back into data.
Pitch existence and frequency: Yes. A value is produced every 10 ms, using the ESPS method (the AMDF method is available, too).
Formants and their bandwidths: http://www.speech.kth.se/snack/man/snack2.2/tcl-man.html#sound - see the formant subcommand.
FFT with pre-emphasis and Hamming window: See above link - the powerSpectrum subcommand.
Power: See above link - the power subcommand.
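
For readers unfamiliar with the AMDF (average magnitude difference function) method mentioned above, the following is a minimal pure-Python sketch of the general idea. It is not Snack's implementation. The search range of 60 to 400 Hz, the voicing threshold, and the function name amdf_pitch are all assumptions chosen for the illustration.

```python
import math


def amdf_pitch(frame, rate_hz, min_hz=60.0, max_hz=400.0, voicing_ratio=0.4):
    """Estimate pitch with the average magnitude difference function.

    Returns the pitch in Hz, or None when the frame looks unvoiced.
    min_hz, max_hz, and voicing_ratio are illustrative assumptions.
    """
    min_lag = int(rate_hz / max_hz)
    max_lag = int(rate_hz / min_hz)
    if len(frame) <= max_lag:
        raise ValueError("frame too short for the lowest pitch searched")

    def amdf(lag):
        count = len(frame) - lag
        return sum(abs(frame[n] - frame[n + lag]) for n in range(count)) / count

    lags = range(min_lag, max_lag + 1)
    values = [amdf(lag) for lag in lags]
    # A periodic signal produces a deep dip at its period; noise does not.
    threshold = voicing_ratio * sum(values) / len(values)
    for i, value in enumerate(values):
        if value < threshold:
            # Walk to the bottom of the first dip, so that a multiple of
            # the true period is not picked by accident (octave error).
            while i + 1 < len(values) and values[i + 1] < values[i]:
                i += 1
            return rate_hz / lags[i]
    return None


# Example: 32 ms of a 200 Hz sine sampled at 16 kHz (period = 80 samples).
RATE_HZ = 16000
voiced = [math.sin(2 * math.pi * 200 * n / RATE_HZ) for n in range(512)]
print(amdf_pitch(voiced, RATE_HZ))
print(amdf_pitch([0.0] * 512, RATE_HZ))  # silence has no dip below the threshold
```

Production pitch trackers add smoothing across frames and more careful voicing decisions; this sketch handles a single frame only.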

ossaudiodev

Multi-platform: Not enough. Implemented on Linux and FreeBSD, and available for a wide range of open-source and commercial Unices, but apparently not for MS-Windows.
Real-time sound acquisition: Reads are blocking by default. The device can probably be set to non-blocking mode.
Support for 16KHz sampling rate and >8-bit sound: Seems to depend upon the sound card.
Sound file processing: Use another package for this.
Efficiency: Direct I/O access.
Pitch existence and frequency: Use another package for this.
Formants and their bandwidths: Use another package for this.
FFT with pre-emphasis and Hamming window: Use another package for this.
Power: Use another package for this.

winsound

Not relevant to our needs. This module can only play existing sound files.

MCI.py (from Arik Baratz) together with ctypes.py

Multi-platform: ctypes.py is supported on all 32-bit MS Windows versions (95/98/NT/2000/XP), all BSD platforms (FreeBSD/NetBSD/OpenBSD/Apple Mac OS X), all POSIX systems (Linux/BSD/UNIX-like OSes), and WinCE.
MCI.py was designed to communicate with MS-Windows winmm.dll.
Real-time sound acquisition: Unknown.
Support for 16KHz sampling rate and >8-bit sound: Unknown.
Sound file processing: Seems to be able to record to a file.
Efficiency: Commands are sent as strings.
Pitch existence and frequency: Use another package for this.
Formants and their bandwidths: Use another package for this.
FFT with pre-emphasis and Hamming window: Use another package for this.
Power: Use another package for this.

PyMedia.py

The documentation is very sketchy.

Multi-platform: The package can be compiled for MS-Windows, Linux and Cygwin.
Real-time sound acquisition: Unknown.
Support for 16KHz sampling rate and >8-bit sound: Probably depends upon sound card.
Sound file processing: Yes.
Efficiency: Unknown.
Pitch existence and frequency: Use another package for this.
Formants and their bandwidths: Use another package for this.
FFT with pre-emphasis and Hamming window: Use another package for this.
Power: Use another package for this.

Additional Packages

PyAudio

The following only summarizes information from the PyAudio Web page.

PyAudio provides Python bindings for the PortAudio audio I/O library. At the time of the 2007 update, the current version of PyAudio was V0.1.0, which was alpha quality.

Multi-platform: The package can be compiled for MS-Windows, Apple Mac OS X, Linux and Cygwin.
Real-time sound acquisition: Unknown.
Support for 16KHz sampling rate and >8-bit sound: Unknown.
Sound file processing: Unknown.
Efficiency: Unknown.
Pitch existence and frequency: Unknown.
Formants and their bandwidths: Unknown.
FFT with pre-emphasis and Hamming window: Unknown.
Power: Unknown.

Omer Zak's Web Site
