Crosstabs in PostgreSQL: PivotMyTable
PivotMyTable is a PL/Python function for use in PostgreSQL servers. Its aim is to get crosstab tables in PostgreSQL in a more friendly way that PostgreSQL module tablefunc does with its crosstab series functions and in fact it behaves ,at last, as a proxy for tablefunc functions.
PivotMyTable makes possible to create crosstabs in PostgreSQL in the same way that other available solutions, automating the creation of the queries that the tablefunc crosstab functions need to work.
More precisely, two queries and one column specification are created:
- First query specifies the categorization/grouping of the original table
- Second query gets the list of summarized columns from pivot column.
- The column specification determines the type of the output columns.
The problem with another solutions is that they area, as far as I know, not enough dynamic to make crosstabs with ease. And also some of them just return you the query to be executed. PivotMyTable creates the table for you and makes a nice work regarding the type of output fields choosing its correct type, what makes unnecessary for the user to specify it.
Yet another nice feature is the ability of PivotMyTable to use more than one column to categorize the data. This is accomplished by creating a joined column prior to the crosstabulation, and subsequently re-creating the fields which origined the joined column.
Also, PivotMyTablee makes possible to get percentages in the pivoted tables, as well as get rid of null values in the oputput tables. This is a desirable feature, for instance to use the data in other application like R. Bear in mind that updating the pivoted columns one by one can be an ordeal if your table has more than maybe 20 or 30 columns… not to think about having 1K output columns!!!
PivotMyTable is released under GPL v3 license and can be downloaded from my GitHub repository. Feel free to contact me for suggestion or bug/error reporting.
PivotMyTable requires the PostgreSQL extension tablefunc and the language PL/Python installed in the database to work.
- Installation of tablefunc extension: Please check PostgreSQL documentation pages for extensions installation in http://www.postgresql.org/docs/9.3/static/contrib.html. Check for the correct manual page for your version of PostgreSQL.
- Installation of PL/Python language: Please check PostgreSQL documentation pages for the installation of PL/Python http://www.postgresql.org/docs/9.3/static/plpython.html. Check for the correct manual page for your version of PostgreSQL.
The usage of PivotMyTable is simple: Providing that you have a table myinfo like:
You can issue this query:
select * from pivotmytable('myinfo','pivotedinfo','player,tool','round','hits','sum',sort_order:='asc');
To to create a crosstab in PostgreSQL like this:
All the parameters of the type varchar must be assigned between single quotes for PivotMyTable to work.
- input_table (varchar): Name of the table to get data to pivot.
- output_table (varchar): Name for the output table.
- group_fields(varchar): Name of the column(s) for categorizing the data. Unlike the native tablefuncmodule functions crosstab, it accepts multiple columns, separating them with commas.
- pivot_field(varchar): Column to be pivoted, must be a single column.
- value_field(varchar): Data to aggregate for the pivot field. It is expected for value_field to be a numeric (int, float, etc) field.
- agg_func(varchar): Aggregate function to apply to the data from value_field. It must be specified without brackets. It’s user responsability to check for function/data type compatibility and resulting data. The available(tested) functions are: sum, count and avg.
- as_percentage(bool): This option enables the output as percentage of each output columns over its overall sum BY ROW. It defaults to FALSE, so to enable percentage calculation set it to TRUE. This option is only available with sum as the selected agg_func.
- sort_order(varchar): This option enables sorting the resulting pivoted columns. You can specify ‘asc’, ‘desc’. It defaults to ‘no_sorting’.
- drop_ex_tbl (boolean): pivotMyTablee checks if the output table specified with the parameter output_table exists before further processing. Setting this option to TRUE makes possible to automatically drop the existing table. By default, drop_ex_tbl is set to FALSE so the function exists with a warning if output_table already exists in the database.
- as_view (boolean): This option enables the output to a view instead of to a table. To enable view output, set it to TRUE. Defaults to FALSE.
I hope this work helps you!!! Please feel free to drop me a tweet for feedback!!!