Skip to content

Support customized alignment and storage for pg_tle create type API#266

Merged
adamguo0 merged 11 commits intoaws:mainfrom
leopan3:main
Feb 27, 2024
Merged

Support customized alignment and storage for pg_tle create type API#266
adamguo0 merged 11 commits intoaws:mainfrom
leopan3:main

Conversation

@leopan3
Copy link
Copy Markdown
Contributor

@leopan3 leopan3 commented Feb 7, 2024

Issue #263.

In the previous version, we used default int4 alignment and plain storage (non-TOASTable). The data will always be stored in-line and not compressed. An error will be reported if the value exceeds the limit:
ERROR: row is too big: size xxx, maximum size xxx

With this change, users can customize the alignment and storage strategies. Compression or out-of-line storage can be used depending on the storage strategy (https://www.postgresql.org/docs/current/storage-toast.html).

The default alignment is int4 while the default storage is plain, compatible with previous versions.

A new version 1.3.5 is created along with this change.

Lyu Pan added 5 commits February 7, 2024 19:25
In the previous version, we used default int4 alignment and plain storage (non-TOASTable). The data will always be stored in-line and not compressed. An error will be reported if the value exceeds the limit:
ERROR: row is too big: size xxx, maximum size xxx.

With this change, users can customize the alignment and storage strategies. Compression or out-of-line storage can be used depending on the storage strategy (https://www.postgresql.org/docs/current/storage-toast.html).

The default alignment is int4 while the default storage is plain, compatible with previous versions.

#263
@leopan3 leopan3 marked this pull request as draft February 8, 2024 00:58
@leopan3 leopan3 marked this pull request as ready for review February 9, 2024 00:26
Lyu Pan added 2 commits February 14, 2024 21:52
…rove readability.

When the dabase gets upgraded to a new version(solib is updated) but pg_tle version is not, the old 5-arg version SQL function is invoked by users, but the previous C function always 7 args.

In this chagne, a new pg_tle_create_base_type_7args C function is created and used to create the create_base_type SQL function.
src/datatype.c Outdated
*/
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args);
Datum
pg_tle_create_base_type_7args(PG_FUNCTION_ARGS)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is adding 7args to the name standard practice? Maybe pg_tle_create_base_type_with_storage would be more descriptive

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the standard practice, but I have seen this naming style in postgres source code, for example
gin_extract_tsquery_5args(PG_FUNCTION_ARGS) https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/tsginidx.c#L318

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this naming scheme. While there's precedent in upstream, we don't have to mimic it from a user-exposed interface.

+1 to @adamguo0 suggestion: pg_tle_create_base_type_with_storage, or maybe pg_tle_create_base_type_with_storage_type

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - to expose this, would we be able to rewrite the existing function to support 7 arguments, and have defaults fo rthe ones that are not necessarily set?

Copy link
Copy Markdown
Contributor Author

@leopan3 leopan3 Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated function name to pg_tle_create_base_type_with_storage.

The C function pg_tle_create_base_type_7args/with_storage is not exposed to the users, only pg_tle_create_base_type function (used in the previous 1.3.4 version) is exposed to the user.

v1.3.4: create function pg_tle.create_base_type() .... AS 'pg_tle_create_base_type'
v1.4.0+: create function pg_tle.create_base_type() .... AS 'pg_tle_create_base_type_with_storage'

Adding an internal pg_tle_create_base_type_with_storage C function (without modifying the exist one) makes it cleaner and easier to maintain.

Background: in my first commit, I modified the pg_tle_create_base_type C function to add these 2 arguments, however, it has a hidden bug for upgrade (See commit message in f56985b), this was discovered by Josh Tiefenbach and I made this update based on the suggestion, we think this approach creates less confusion and maintenance issues.

Makefile Outdated
@@ -1,13 +1,13 @@
EXTENSION = pg_tle
EXTVERSION = 1.3.4
EXTVERSION = 1.3.5
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get some opinions on whether to bump the version to 1.3.5 or 1.4.0. My initial thought was 1.3.5 since this is a small update to an existing feature, but it also introduces an API change which may warrant a minor version bump.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go strictly by semantic versioning, a new, backwards compatible change would require a minor version bump (1.4.0)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Updated the new version number to 1.4.0

* `infunc`: The name of a previously defined function to convert the type's external textual representation to the internal representation (`bytea`). The function must take one argument of type `text` and return `bytea`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter.
* `outfunc`: The name of a previously defined function to convert the type's internal binary representation (`bytea`) to the external textual representation. The function must take one argument of type `bytea` and return `text`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter.
* `internallength`: Total length of the base data type as bytes. Base data types can be fixed-length, in which case internallength is a positive integer, or variable-length, in which case internallength is -1.
* `alignment`: The optional alignment parameter specifies the storage alignment requirement of the data type. Allowed values are 'char', 'int2', 'int4', 'double'. The allowed values equate to alignment on 1, 2, 4, or 8 byte boundaries. The default is 'int4', alignment on 4 byte boundaries. Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.
Copy link
Copy Markdown
Contributor

@anth0nyleung anth0nyleung Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a sanity check, are the allowed values the same for all major version we support?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend giving more guidance on how this is used. Most builders are not going to be familiar with byte alignment.

Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.

I'd recommend rephrasing this:

A variable-length type (e.g., text) must have a byte-alignment of at least 4 due to the size of its header.

Copy link
Copy Markdown
Contributor Author

@leopan3 leopan3 Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a sanity check, are the allowed values the same for all major version we support?

Yes, this didn't change at least since PG11, https://www.postgresql.org/docs/11/sql-createtype.html

alignment
The storage alignment requirement of the data type. If specified, it must be char, int2, int4, or double; the default is int4.

storage
The storage strategy for the data type. If specified, must be plain, external, extended, or main; the default is plain.

Comment on lines +20 to +22
DROP FUNCTION pgtle.create_base_type;
DROP FUNCTION pgtle.create_base_type_if_not_exists;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need CASCADE to drop the GRANT we did here https://github.com/aws/pg_tle/blob/93e330cb146c4055a983b38a47d73df6ac924b33/pg_tle--1.1.1.sql#L555C1-L571 for the old versions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, updated!

@anth0nyleung
Copy link
Copy Markdown
Contributor

could you also test upgrading to this version works and share the results?

Makefile Outdated
@@ -1,13 +1,13 @@
EXTENSION = pg_tle
EXTVERSION = 1.3.4
EXTVERSION = 1.3.5
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go strictly by semantic versioning, a new, backwards compatible change would require a minor version bump (1.4.0)

* `infunc`: The name of a previously defined function to convert the type's external textual representation to the internal representation (`bytea`). The function must take one argument of type `text` and return `bytea`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter.
* `outfunc`: The name of a previously defined function to convert the type's internal binary representation (`bytea`) to the external textual representation. The function must take one argument of type `bytea` and return `text`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter.
* `internallength`: Total length of the base data type as bytes. Base data types can be fixed-length, in which case internallength is a positive integer, or variable-length, in which case internallength is -1.
* `alignment`: The optional alignment parameter specifies the storage alignment requirement of the data type. Allowed values are 'char', 'int2', 'int4', 'double'. The allowed values equate to alignment on 1, 2, 4, or 8 byte boundaries. The default is 'int4', alignment on 4 byte boundaries. Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend giving more guidance on how this is used. Most builders are not going to be familiar with byte alignment.

Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.

I'd recommend rephrasing this:

A variable-length type (e.g., text) must have a byte-alignment of at least 4 due to the size of its header.

src/datatype.c Outdated
*/
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args);
Datum
pg_tle_create_base_type_7args(PG_FUNCTION_ARGS)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this naming scheme. While there's precedent in upstream, we don't have to mimic it from a user-exposed interface.

+1 to @adamguo0 suggestion: pg_tle_create_base_type_with_storage, or maybe pg_tle_create_base_type_with_storage_type

src/datatype.c Outdated
*/
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args);
Datum
pg_tle_create_base_type_7args(PG_FUNCTION_ARGS)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - to expose this, would we be able to rewrite the existing function to support 7 arguments, and have defaults fo rthe ones that are not necessarily set?

src/datatype.c Outdated
Comment on lines +260 to +271
if (pg_strcasecmp(alignmentStr, "double") == 0)
alignment = TYPALIGN_DOUBLE;
else if (pg_strcasecmp(alignmentStr, "int4") == 0)
alignment = TYPALIGN_INT;
else if (pg_strcasecmp(alignmentStr, "int2") == 0)
alignment = TYPALIGN_SHORT;
else if (pg_strcasecmp(alignmentStr, "char") == 0)
alignment = TYPALIGN_CHAR;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("alignment \"%s\" not recognized", alignmentStr)));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: put this lookup into its own function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, updated!

src/datatype.c Outdated
Comment on lines +273 to +284
if (pg_strcasecmp(storageStr, "plain") == 0)
storage = TYPSTORAGE_PLAIN;
else if (pg_strcasecmp(storageStr, "external") == 0)
storage = TYPSTORAGE_EXTERNAL;
else if (pg_strcasecmp(storageStr, "extended") == 0)
storage = TYPSTORAGE_EXTENDED;
else if (pg_strcasecmp(storageStr, "main") == 0)
storage = TYPSTORAGE_MAIN;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("storage \"%s\" not recognized", storageStr)));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: put this lookup into its own function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@leopan3
Copy link
Copy Markdown
Contributor Author

leopan3 commented Feb 23, 2024

could you also test upgrading to this version works and share the results?

Yes, tested locally. See results below. (This PR only changes the API to allow extra type options, it doesn't change the type storage/alignment if already defined)

  1. Create pg_tle with old version library and version
postgres=# create extension pg_tle;
CREATE EXTENSION
postgres=# \dx
                        List of installed extensions
  Name   | Version |   Schema   |                Description                 
---------+---------+------------+--------------------------------------------
 pg_tle  | 1.3.4   | pgtle      | Trusted Language Extensions for PostgreSQL
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)
  1. Create custom types and data
...
-- Variable-length
postgres=# SELECT pgtle.create_base_type('public', 'test_citext', 'test_citext_in(text)'::regprocedure, 'test_citext_out(bytea)'::regprocedure, -1);
 create_base_type 
------------------
 
(1 row)

-- fixed length
postgres=# SELECT pgtle.create_base_type('public', 'test_int2', 'test_int2_in(text)'::regprocedure, 'test_int2_out(bytea)'::regprocedure, 2);
 create_base_type 
------------------
 
(1 row)
postgres=# INSERT INTO test_dt_int2 VALUES ('11,22');
INSERT 0 1
postgres=# SELECT * FROM test_dt_int2;
  c1   
-------
 11,22
(1 row)

postgres=# INSERT INTO test_dt VALUES ('SELECT'), ('INSERT'), ('UPDATE'), ('DELETE');
INSERT 0 4
postgres=# select * from test_dt;
   c1   
--------
 SELECT
 INSERT
 UPDATE
 DELETE
(4 rows)

  1. Install the new extension library (1.4.0), restart postgres and upgrade
postgres=# ALTER EXTENSION pg_tle UPDATE;
ALTER EXTENSION
postgres=# \dx
                        List of installed extensions
  Name   | Version |   Schema   |                Description                 
---------+---------+------------+--------------------------------------------
 pg_tle  | 1.4.0   | pgtle      | Trusted Language Extensions for PostgreSQL
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)
  1. Check the old types (tables) can still read/write
postgres=# select * from test_dt;
   c1   
--------
 SELECT
 INSERT
 UPDATE
 DELETE
(4 rows)

postgres=# select * from test_dt_int2;
  c1   
-------
 11,22
(1 row)

postgres=# delete from test_dt;
DELETE 4
postgres=# delete from test_dt_int2;
DELETE 1
postgres=# INSERT INTO test_dt VALUES ('SELECT'), ('INSERT'), ('UPDATE'), ('DELETE');
INSERT 0 4
postgres=# INSERT INTO test_dt_int2 VALUES ('11,22');
INSERT 0 1
postgres=# select * from test_dt;
   c1   
--------
 SELECT
 INSERT
 UPDATE
 DELETE
(4 rows)

postgres=# select * from test_dt_int2;
  c1   
-------
 11,22
(1 row)

Copy link
Copy Markdown
Contributor

@jkatz jkatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a couple of minor comments. Nice job!

Comment on lines +60 to +68
static Datum
pg_tle_create_base_type_internal(Oid typeNamespace,
char *typeName,
Oid inputFuncId,
Oid outputFuncId,
int16 internalLength,
char *alignmentStr,
char *storageStr,
char *funcProbin);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this the output from pgindent?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.

Co-authored-by: Jonathan S. Katz <jkatz@users.noreply.github.com>
@leopan3
Copy link
Copy Markdown
Contributor Author

leopan3 commented Feb 27, 2024

Thank you all for the review!

@adamguo0 adamguo0 merged commit 4e18a50 into aws:main Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants