Bugzilla – Bug 3664
ssl_crtd fails to build on OpenSolaris/OpenIndiana/Solaris 11
Last modified: 2017-04-18 01:23:56 UTC
Passing '--enable-ssl-crtd' to configure on OpenSolaris/OpenIndiana/Solaris 11 causes failed build due to absence of flock() in any Solaris version and unavailability of UCB compatibility software since OpenSolaris/OpenIndiana/Solaris 11 g++ -DHAVE_CONFIG_H -I../.. -I../../include -I../../lib -I../../src -I../../include -I/usr/include/gssapi -I/usr/include/kerberosv5 -I/opt/smbsys/include -I/usr/include/libxml2 -m64 -O3 -I/opt/smbsys/include -I/opt/smbbld/include -I/usr/include/gssapi -I/usr/include/kerberosv5 -I/usr/include/libxml2 -Wall -Wpointer-arith -Wwrite-strings -Wcomments -Werror -pipe -D_REENTRANT -pthreads -m64 -m64 -O3 -I/opt/smbsys/include -I/opt/smbbld/include -Wno-error=maybe-uninitialized -MT certificate_db.o -MD -MP -MF .deps/certificate_db.Tpo -c -o certificate_db.o certificate_db.cc certificate_db.cc: In member function 'void Ssl::Lock::lock()': certificate_db.cc:63:19: error: 'LOCK_EX' was not declared in this scope certificate_db.cc: In member function 'void Ssl::Lock::unlock()': certificate_db.cc:78:19: error: 'LOCK_UN' was not declared in this scope make[3]: *** [certificate_db.o] Error 1 make[3]: Leaving directory `/src/smbsys/squid-3.2.1/src/ssl' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/src/smbsys/squid-3.2.1/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/src/smbsys/squid-3.2.1/src' make: *** [all-recursive] Error 1
Already answered at http://www.squid-cache.org/mail-archive/squid-users/201203/0300.html But what about replacing flock/funlock with fcntl(fd, F_SETLK, F_WRLCK) and fcntl(fd, F_SETLK, F_UNLCK)? It at least works :)
Created attachment 2752 [details] changes to certificate_db.cc Or even simpler solution using lockf(fd, F_LOCK, 0) and lockf(fd, F_ULOCK, 0). See attached diff. Working fine for me.
Reproduces with 3.3.9
Using lockf() may be indeed the right approach because, according to manual pages, lockf() is "more standard" than flock() and, hence, it may be more portable. If lockf() is not supported on some platforms we care about, we would need to add one more #ifdef clause to Ssl::Lock::lock/unlock() methods to cover all three known Unix and Windows variants: flock(), lockf(), and LockFile(). As for the attachment 2752 [details] patch you have proposed, could you please comment on the change to the open() call? What does that change do? Is it required? Can you use symbolic flag constant(s) rather than the magical "2"?
"2" at open() stands for O_RDWR flag (while original "0" stands for O_RDONLY). I've just used the same style as in original code here, symbolic name will be definitely better. This is needed because lockf(3C) man at Solaris box says: #include <unistd.h> int lockf(int fildes, int function, off_t size); ... The fildes argument is an open file descriptor. The file descriptor must have O_WRONLY or O_RDWR permission in order to establish locks with this function call. ... So, I've used O_RDWR flag to allow reading from this file descriptor (I've not followed code path but it is logical that we'll be at least reading from it at some time).
Sounds good. The final version should use O_RDWR then.
Any update on this bug?
Comment #4 reflects the current status AFAICT. We now have an explanation regarding the magic constant "2", but somebody would still need to make the change to O_RDWR and verify lockf() portability to produce the final patch.
(In reply to comment #8) > Comment #4 reflects the current status AFAICT. We now have an explanation > regarding the magic constant "2", but somebody would still need to make the > change to O_RDWR and verify lockf() portability to produce the final patch. Alex, I confirmed this patch works perfectly. I've tested it with 3.4.10 on Solaris 10 u11 (kernel 150401-16) and all ok, works like charm. Build options are: root @ ktulhu / # /usr/local/squid/sbin/squid -v Squid Cache: Version 3.4.10 configure options: '--prefix=/usr/local/squid' '--enable-translation' '--enable-external-acl-helpers=file_userip' '--enable-icap-client' '--enable-ipf-transparent' '--enable-storeio=diskd' '--enable-removal-policies=lru,heap' '--enable-devpoll' '--disable-wccp' '--enable-wccpv2' '--enable-http-violations' '--enable-follow-x-forwarded-for' '--enable-arp-acl' '--enable-htcp' '--enable-cache-digests' '--with-dl' '--enable-auth-negotiate=none' '--disable-auth-digest' '--disable-auth-ntlm' '--disable-auth-basic' '--enable-storeid-rewrite-helpers=file' '--enable-log-daemon-helpers=file' '--enable-ssl' '--enable-ssl-crtd' '--with-filedescriptors=131072' '--with-build-environment=POSIX_V6_LP64_OFF64' 'CFLAGS=-O3 -m64 -fPIE -fstack-protector -mtune=core2 --param=ssp-buffer-size=4 -pipe' 'CXXFLAGS=-O3 -m64 -fPIE -fstack-protector -mtune=core2 --param=ssp-buffer-size=4 -pipe' 'CPPFLAGS=-I/opt/csw/include' 'LDFLAGS=-fPIE -pie -Wl,-z,now' --enable-ltdl-convenience
Patch applied with O_RDWR adjustment to Squid-3.
Applied to 3.5 and 3.4
FYI: The committed fix corrupts certificate databases on FreeBSD and possibly other POSIX-compliant platforms. I do not know whether it actually works correctly on Solaris (beyond compilation). See bug 4212 for a script you can use to test your environment.
I'm taking the #ifdef approach mentioned in comment 4 as the fix for bug 4212, making the use of lockf() only apply to Solaris and reverting all other OS to flock() for now. That way we should not have to re-open this bug.
I wonder whether bug 4212 problem exists on Solaris. This bug was about making Squid compile on Solaris, and we know it does now. However, it is possible that Solaris lockf() is as broken as FreeBSD and Linux lockf(). If that is the case, Squid on Solaris will sooner or later corrupt its certificate database. If you are running Squid on Solaris, I recommend that you test your setup for locking failures using Guy's script posted in bug 4212.
(In reply to Alex Rousskov from comment #14) > I wonder whether bug 4212 problem exists on Solaris. This bug was about > making Squid compile on Solaris, and we know it does now. However, it is > possible that Solaris lockf() is as broken as FreeBSD and Linux lockf(). If > that is the case, Squid on Solaris will sooner or later corrupt its > certificate database. > > If you are running Squid on Solaris, I recommend that you test your setup > for locking failures using Guy's script posted in bug 4212. Sure. wrong number of fields on line 6 (looking for field 6, got 1, '' left) /usr/local/squid/libexec/ssl_crtd: The SSL certificate database /var/lib/ssl_db is corrupted. Please rebuild wrong number of fields on line 6 (looking for field 6, got 1, '' left) Invalid response from ssl_crtd: '' length=0 wrong number of fields on line 6 (looking for field /usr/local/squid/libexec/ssl_crtd: 6The SSL certificate database /var/lib/ssl_db is corrupted. Please rebuild, got 1, '' left) Invalid response from ssl_crtd: '' length=0 /usr/local/squid/libexec/ssl_crtd: The SSL certificate database /var/lib/ssl_db is corrupted. Please rebuild Invalid response from ssl_crtd: '' length=0 wrong number of fields on line 6 (looking for field 6, got 1, '' left) /usr/local/squid/libexec/ssl_crtd: The SSL certificate database /var/lib/ssl_db is corrupted. Please rebuild Invalid response from ssl_crtd: '' length=0 So, seems broken. But what is alternative?
(In reply to Yuri Voinov from comment #15) > But what is alternative? Good question. Solaris is not my area of expertise, but after hitting this problem again in a different context, I can offer the following summary: Long term, your best option is to convince the Solaris/Oracle team to add proper support for flock() and/or "Open file description" locks (i.e., fcntl(F_OFD_SETLK...)). I am not saying this to shovel this issue under the carpet and/or upset you. I genuinely believe this is the best long-term solution for this growing problem. Meanwhile, as a temporary workaround, it is possible to substitute flock() with custom locking based on file creation. It would be an ugly (on several levels) Solaris-only solution, but a quality patch implementing such a workaround should be accepted IMO. I cannot volunteer such an implementation, unfortunately. FWIW, googling shows that other projects step on this Solaris landmine as well, iteratively developing an understanding that there is no good alternative for a proper flock()-like system call. For example: https://github.com/golang/go/issues/11113