Support customized alignment and storage for pg_tle create type API#266
Support customized alignment and storage for pg_tle create type API#266
Conversation
In the previous version, we used default int4 alignment and plain storage (non-TOASTable). The data will always be stored in-line and not compressed. An error will be reported if the value exceeds the limit: ERROR: row is too big: size xxx, maximum size xxx. With this change, users can customize the alignment and storage strategies. Compression or out-of-line storage can be used depending on the storage strategy (https://www.postgresql.org/docs/current/storage-toast.html). The default alignment is int4 while the default storage is plain, compatible with previous versions. #263
…rove readability. When the dabase gets upgraded to a new version(solib is updated) but pg_tle version is not, the old 5-arg version SQL function is invoked by users, but the previous C function always 7 args. In this chagne, a new pg_tle_create_base_type_7args C function is created and used to create the create_base_type SQL function.
src/datatype.c
Outdated
| */ | ||
| PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args); | ||
| Datum | ||
| pg_tle_create_base_type_7args(PG_FUNCTION_ARGS) |
There was a problem hiding this comment.
nit: is adding 7args to the name standard practice? Maybe pg_tle_create_base_type_with_storage would be more descriptive
There was a problem hiding this comment.
I'm not sure if this is the standard practice, but I have seen this naming style in postgres source code, for example
gin_extract_tsquery_5args(PG_FUNCTION_ARGS) https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/tsginidx.c#L318
There was a problem hiding this comment.
I'm not a fan of this naming scheme. While there's precedent in upstream, we don't have to mimic it from a user-exposed interface.
+1 to @adamguo0 suggestion: pg_tle_create_base_type_with_storage, or maybe pg_tle_create_base_type_with_storage_type
There was a problem hiding this comment.
Also - to expose this, would we be able to rewrite the existing function to support 7 arguments, and have defaults fo rthe ones that are not necessarily set?
There was a problem hiding this comment.
Updated function name to pg_tle_create_base_type_with_storage.
The C function pg_tle_create_base_type_7args/with_storage is not exposed to the users, only pg_tle_create_base_type function (used in the previous 1.3.4 version) is exposed to the user.
v1.3.4: create function pg_tle.create_base_type() .... AS 'pg_tle_create_base_type'
v1.4.0+: create function pg_tle.create_base_type() .... AS 'pg_tle_create_base_type_with_storage'
Adding an internal pg_tle_create_base_type_with_storage C function (without modifying the exist one) makes it cleaner and easier to maintain.
Background: in my first commit, I modified the pg_tle_create_base_type C function to add these 2 arguments, however, it has a hidden bug for upgrade (See commit message in f56985b), this was discovered by Josh Tiefenbach and I made this update based on the suggestion, we think this approach creates less confusion and maintenance issues.
Makefile
Outdated
| @@ -1,13 +1,13 @@ | |||
| EXTENSION = pg_tle | |||
| EXTVERSION = 1.3.4 | |||
| EXTVERSION = 1.3.5 | |||
There was a problem hiding this comment.
Let's get some opinions on whether to bump the version to 1.3.5 or 1.4.0. My initial thought was 1.3.5 since this is a small update to an existing feature, but it also introduces an API change which may warrant a minor version bump.
There was a problem hiding this comment.
If we go strictly by semantic versioning, a new, backwards compatible change would require a minor version bump (1.4.0)
There was a problem hiding this comment.
Thanks for the suggestion. Updated the new version number to 1.4.0
docs/09_datatypes.md
Outdated
| * `infunc`: The name of a previously defined function to convert the type's external textual representation to the internal representation (`bytea`). The function must take one argument of type `text` and return `bytea`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter. | ||
| * `outfunc`: The name of a previously defined function to convert the type's internal binary representation (`bytea`) to the external textual representation. The function must take one argument of type `bytea` and return `text`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter. | ||
| * `internallength`: Total length of the base data type as bytes. Base data types can be fixed-length, in which case internallength is a positive integer, or variable-length, in which case internallength is -1. | ||
| * `alignment`: The optional alignment parameter specifies the storage alignment requirement of the data type. Allowed values are 'char', 'int2', 'int4', 'double'. The allowed values equate to alignment on 1, 2, 4, or 8 byte boundaries. The default is 'int4', alignment on 4 byte boundaries. Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component. |
There was a problem hiding this comment.
Just a sanity check, are the allowed values the same for all major version we support?
There was a problem hiding this comment.
I'd recommend giving more guidance on how this is used. Most builders are not going to be familiar with byte alignment.
Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.
I'd recommend rephrasing this:
A variable-length type (e.g.,
text) must have a byte-alignment of at least4due to the size of its header.
There was a problem hiding this comment.
Just a sanity check, are the allowed values the same for all major version we support?
Yes, this didn't change at least since PG11, https://www.postgresql.org/docs/11/sql-createtype.html
alignment
The storage alignment requirement of the data type. If specified, it must be char, int2, int4, or double; the default is int4.
storage
The storage strategy for the data type. If specified, must be plain, external, extended, or main; the default is plain.
pg_tle--1.3.4--1.3.5.sql
Outdated
| DROP FUNCTION pgtle.create_base_type; | ||
| DROP FUNCTION pgtle.create_base_type_if_not_exists; | ||
|
|
There was a problem hiding this comment.
Do we need CASCADE to drop the GRANT we did here https://github.com/aws/pg_tle/blob/93e330cb146c4055a983b38a47d73df6ac924b33/pg_tle--1.1.1.sql#L555C1-L571 for the old versions?
There was a problem hiding this comment.
Good catch, updated!
|
could you also test upgrading to this version works and share the results? |
Makefile
Outdated
| @@ -1,13 +1,13 @@ | |||
| EXTENSION = pg_tle | |||
| EXTVERSION = 1.3.4 | |||
| EXTVERSION = 1.3.5 | |||
There was a problem hiding this comment.
If we go strictly by semantic versioning, a new, backwards compatible change would require a minor version bump (1.4.0)
docs/09_datatypes.md
Outdated
| * `infunc`: The name of a previously defined function to convert the type's external textual representation to the internal representation (`bytea`). The function must take one argument of type `text` and return `bytea`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter. | ||
| * `outfunc`: The name of a previously defined function to convert the type's internal binary representation (`bytea`) to the external textual representation. The function must take one argument of type `bytea` and return `text`. The function must also be declared as `IMMUTABLE` and `STRICT`. It will not be called with a NULL parameter. | ||
| * `internallength`: Total length of the base data type as bytes. Base data types can be fixed-length, in which case internallength is a positive integer, or variable-length, in which case internallength is -1. | ||
| * `alignment`: The optional alignment parameter specifies the storage alignment requirement of the data type. Allowed values are 'char', 'int2', 'int4', 'double'. The allowed values equate to alignment on 1, 2, 4, or 8 byte boundaries. The default is 'int4', alignment on 4 byte boundaries. Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component. |
There was a problem hiding this comment.
I'd recommend giving more guidance on how this is used. Most builders are not going to be familiar with byte alignment.
Note that variable-length types must have an alignment of at least 4, since they necessarily contain an int4 as their first component.
I'd recommend rephrasing this:
A variable-length type (e.g.,
text) must have a byte-alignment of at least4due to the size of its header.
src/datatype.c
Outdated
| */ | ||
| PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args); | ||
| Datum | ||
| pg_tle_create_base_type_7args(PG_FUNCTION_ARGS) |
There was a problem hiding this comment.
I'm not a fan of this naming scheme. While there's precedent in upstream, we don't have to mimic it from a user-exposed interface.
+1 to @adamguo0 suggestion: pg_tle_create_base_type_with_storage, or maybe pg_tle_create_base_type_with_storage_type
src/datatype.c
Outdated
| */ | ||
| PG_FUNCTION_INFO_V1(pg_tle_create_base_type_7args); | ||
| Datum | ||
| pg_tle_create_base_type_7args(PG_FUNCTION_ARGS) |
There was a problem hiding this comment.
Also - to expose this, would we be able to rewrite the existing function to support 7 arguments, and have defaults fo rthe ones that are not necessarily set?
src/datatype.c
Outdated
| if (pg_strcasecmp(alignmentStr, "double") == 0) | ||
| alignment = TYPALIGN_DOUBLE; | ||
| else if (pg_strcasecmp(alignmentStr, "int4") == 0) | ||
| alignment = TYPALIGN_INT; | ||
| else if (pg_strcasecmp(alignmentStr, "int2") == 0) | ||
| alignment = TYPALIGN_SHORT; | ||
| else if (pg_strcasecmp(alignmentStr, "char") == 0) | ||
| alignment = TYPALIGN_CHAR; | ||
| else | ||
| ereport(ERROR, | ||
| (errcode(ERRCODE_INVALID_PARAMETER_VALUE), | ||
| errmsg("alignment \"%s\" not recognized", alignmentStr))); |
There was a problem hiding this comment.
Suggestion: put this lookup into its own function.
There was a problem hiding this comment.
Thanks for the suggestion, updated!
src/datatype.c
Outdated
| if (pg_strcasecmp(storageStr, "plain") == 0) | ||
| storage = TYPSTORAGE_PLAIN; | ||
| else if (pg_strcasecmp(storageStr, "external") == 0) | ||
| storage = TYPSTORAGE_EXTERNAL; | ||
| else if (pg_strcasecmp(storageStr, "extended") == 0) | ||
| storage = TYPSTORAGE_EXTENDED; | ||
| else if (pg_strcasecmp(storageStr, "main") == 0) | ||
| storage = TYPSTORAGE_MAIN; | ||
| else | ||
| ereport(ERROR, | ||
| (errcode(ERRCODE_INVALID_PARAMETER_VALUE), | ||
| errmsg("storage \"%s\" not recognized", storageStr))); |
There was a problem hiding this comment.
Suggestion: put this lookup into its own function.
Yes, tested locally. See results below. (This PR only changes the API to allow extra type options, it doesn't change the type storage/alignment if already defined)
|
jkatz
left a comment
There was a problem hiding this comment.
LGTM, a couple of minor comments. Nice job!
| static Datum | ||
| pg_tle_create_base_type_internal(Oid typeNamespace, | ||
| char *typeName, | ||
| Oid inputFuncId, | ||
| Oid outputFuncId, | ||
| int16 internalLength, | ||
| char *alignmentStr, | ||
| char *storageStr, | ||
| char *funcProbin); |
There was a problem hiding this comment.
Was this the output from pgindent?
Co-authored-by: Jonathan S. Katz <jkatz@users.noreply.github.com>
|
Thank you all for the review! |
Issue #263.
In the previous version, we used default int4 alignment and plain storage (non-TOASTable). The data will always be stored in-line and not compressed. An error will be reported if the value exceeds the limit:
ERROR: row is too big: size xxx, maximum size xxxWith this change, users can customize the alignment and storage strategies. Compression or out-of-line storage can be used depending on the storage strategy (https://www.postgresql.org/docs/current/storage-toast.html).
The default alignment is int4 while the default storage is plain, compatible with previous versions.
A new version 1.3.5 is created along with this change.