This is something I have not been fortunate enough to find in one place, whether in a book or in the umpteen tutorials on the internet. There are very good installation documents, such as
http://www.puschitz.com/InstallingOracle9i.shtml and Metalink Note 184821.1, to name a few, which provide excellent step-by-step installation procedures, but they do not consolidate the common installation problems in a single location. Of course, one reason is that some problems are not generic and show up during one particular installation only. What follows is an attempt at such a list (with the highest regard for all the available notes and tutorials).
Generic installation problems on Linux:
1. The installation can fail during the linking phase with errors like "errors in invoking target Install_isqlplus of makefile /u01/app/oracle/product/220.127.116.11/sqlplus/lib/ins_sqlplus.mk"
Reason/Resolution: Linking problems are usually caused by a gcc version that does not match what the Oracle release expects on your OS version.
For a 9i installation the gcc version should be 3.2.3, and for a 10g installation it should be 3.4.6. You can check the version with the command "gcc -v". Usually, 3.4.6 is the default.
To activate the correct gcc version for a 9i installation on a 32-bit OS (i386), run the following as root:
$ mv /usr/bin/gcc /usr/bin/gcc.orig
$ mv /usr/bin/g++ /usr/bin/g++.orig
$ ln -s /usr/bin/i386-redhat-linux-gcc32 /usr/bin/gcc
$ ln -s /usr/bin/i386-redhat-linux-g++32 /usr/bin/g++
To activate the correct gcc version for a 9i installation on a 64-bit OS (x86_64), run the following as root:
$ mv /usr/bin/gcc /usr/bin/gcc.orig
$ mv /usr/bin/g++ /usr/bin/g++.orig
$ ln -s /usr/bin/x86_64-redhat-linux-gcc32 /usr/bin/gcc
$ ln -s /usr/bin/x86_64-redhat-linux-g++32 /usr/bin/g++
Refer to Metalink Notes 353529.1 and 169706.1 for the installation prerequisites.
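Before starting the installer it can help to confirm which gcc is actually active. The following is only a minimal sketch, assuming the versions stated above (3.2.3 for 9i, 3.4.6 for 10g); the function name gcc_matches is hypothetical and not part of any Oracle tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the active gcc version equals the expected one.
# $1 = expected version (for example 3.2.3)
# $2 = version to compare (optional; defaults to the output of "gcc -dumpversion")
gcc_matches() {
    expected=$1
    actual=${2:-$(gcc -dumpversion 2>/dev/null)}
    [ "$actual" = "$expected" ]
}

# Usage before a 9i installation:
# gcc_matches 3.2.3 || echo "gcc is not 3.2.3; switch compilers before installing 9i"
```

Passing the version as a second argument makes it possible to try the check on a machine where the compiler is not installed.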
2. "There is no non-empty value for variable s_jservPort under section Ports in file /u01/app/oracle/product/9.2.0/Apache/ports.ini"
Reason/Resolution: This problem is usually encountered on a second attempt to install the software after a previous attempt has failed. The error can be ignored. If you open the file /u01/app/oracle/product/9.2.0/Apache/ports.ini, you may see that "s_jservPort" is defined above the "Ports" section; the fix is simply to move this variable under the "Ports" section. If you are not using iAS or Grid Control, you can either safely ignore this error or make the change manually as described above. In either case there should be no operational impact on the database.
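To see whether the variable really sits outside the "Ports" section, a check along these lines may be useful. It is a sketch only: it assumes ports.ini uses the usual INI layout ("[Section]" headers followed by "key = value" lines), and the function name key_in_ports_section is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if key $2 has a non-empty value inside the
# [Ports] section of file $1. Assumes "[Section]" headers and "key = value" lines.
key_in_ports_section() {
    awk -v key="$2" '
        # A section header switches the in_ports flag on or off.
        /^[ \t]*\[/ { in_ports = ($0 ~ /^[ \t]*\[Ports\]/); next }
        in_ports {
            split($0, kv, "=")
            name = kv[1];  gsub(/[ \t]/, "", name)
            value = kv[2]; gsub(/[ \t]/, "", value)
            if (name == key && value != "") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage:
# key_in_ports_section /u01/app/oracle/product/9.2.0/Apache/ports.ini s_jservPort \
#     || echo "s_jservPort is missing from the Ports section"
```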
3. Errors writing to a few files, such as "error in writing to file /u01/app/oracle/product/9.2.0/Apache/Apache/conf/ssl.key/server.key"
Reason/Resolution: Again, this problem is usually encountered on a second attempt to install the software after a previous attempt has failed. The files named in these errors were created during the previous attempt and cannot be overwritten, because the installer creates them read-only. To proceed with the installation, change the permissions on these files (using chmod) to make them writable. An even better solution is to remove the Oracle_Home that was created and populated during the previous attempt before starting the installation again, and to create a fresh, empty directory for Oracle_Home.
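The chmod approach can be sketched as follows. This assumes that adding owner write permission is enough for the installer to overwrite the files; the function name make_owner_writable is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: add owner write permission to every regular file
# under directory $1 that currently lacks it (mode bit 200 not set).
make_owner_writable() {
    find "$1" -type f ! -perm -200 -exec chmod u+w {} +
}

# Usage (as the owner of the Oracle software):
# make_owner_writable "$ORACLE_HOME"
```

Only files without the owner write bit are touched, so permissions on everything else stay as the installer left them.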
4. "Error occurred during initialization of VM
Unable to load native library: /tmp/OraInstall2003-10-25_03-14-57PM/jre/lib/i386/libjava.so: symbol __libc_wait, version GLIBC_2.0 not defined in file libc.so.6 with link time reference"
Reason/Resolution: To resolve the __libc_wait symbol issue, download patch 3006854 (p3006854_9204_LINUX.zip) from http://metalink.oracle.com/. See bug 3006854 for more information. To apply the patch, run the following as root:
su - root
# unzip p3006854_9204_LINUX.zip
# cd 3006854
# sh rhel3_pre_install.sh
Patch successfully applied
5. OUI Hangs at 18% - "Copying naeet.o"
Reason/Resolution: The cause is that the environment variable LD_ASSUME_KERNEL has not been set. Check these Metalink notes:
Note: 360142.1: When Running OUI, OUI Hangs at 18% Copying naeet.o
Note: 377217.1: What should the value of LD_ASSUME_KERNEL be set to for Linux?
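A small pre-flight check can catch a missing LD_ASSUME_KERNEL before OUI is launched. This is a sketch under stated assumptions: the function name require_ld_assume_kernel is hypothetical, and the value 2.4.19 in the usage lines is only a placeholder, since the right value depends on the OS release (see Note 377217.1).

```shell
#!/bin/sh
# Hypothetical pre-flight check: fail if LD_ASSUME_KERNEL is unset or empty,
# otherwise print the value that child processes will inherit.
require_ld_assume_kernel() {
    if [ -z "${LD_ASSUME_KERNEL:-}" ]; then
        echo "LD_ASSUME_KERNEL is not set; see Note 377217.1 for the right value" >&2
        return 1
    fi
    echo "LD_ASSUME_KERNEL=$LD_ASSUME_KERNEL"
}

# Usage (2.4.19 is a placeholder assumption, not a recommendation):
# export LD_ASSUME_KERNEL=2.4.19
# require_ld_assume_kernel && ./runInstaller
```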
Problems specific to 9i RAC installation:
1. When starting the ORACM service, you may get the error:
ocmstart.sh: Error: Restart is too frequent
ocmstart.sh: Info: Check the system configuration and fix the problem.
ocmstart.sh: Info: After you fixed the problem, remove the timestamp file
ocmstart.sh: Info: "/u01/app/oracle/product/18.104.22.168/oracm/log/ocmstart.ts"
Reason/Resolution: To resolve this, fix the underlying problem, then remove the file $ORACLE_HOME/oracm/log/ocmstart.ts. After that you should be able to start the service.
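The cleanup is a one-liner, sketched here as a function so that the Oracle home is passed explicitly. The function name clear_ocmstart_timestamp is hypothetical; the file name is the one shown in the error message above.

```shell
#!/bin/sh
# Hypothetical helper: remove the restart timestamp file left by ocmstart.sh.
# $1 = Oracle home directory
clear_ocmstart_timestamp() {
    rm -f "$1/oracm/log/ocmstart.ts"
}

# Usage (as root, after fixing the underlying problem):
# clear_ocmstart_timestamp "$ORACLE_HOME" && "$ORACLE_HOME/oracm/bin/ocmstart.sh"
```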
2. During installation of a CM patch set (such as the 9207 or 9208 patch set), the following error appears: "error in writing to file '/u01/app/oracle/product/22.214.171.124/oracm/bin/oracm' (text file busy)"
Reason/Resolution: This error occurs if you try to install the CM patch set without stopping the ORACM service, because the oracm binary is still in use. The ORACM service should be stopped on all nodes before installing the CM patch set.
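A quick way to confirm that the service is down on a node is to look for the process by name. This sketch assumes the process is named oracm, like the binary in the error above, and that pgrep is installed; the function name service_is_stopped is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if no process with the exact name $1 is running.
# Defaults to "oracm". Assumes pgrep is available; without it the check
# cannot see any process and will wrongly report "stopped".
service_is_stopped() {
    ! pgrep -x "${1:-oracm}" >/dev/null 2>&1
}

# Usage on each node before launching the patch set installer:
# service_is_stopped || echo "oracm is still running on $(hostname); stop it first"
```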
3. After installing Cluster Manager, the ORACM service must be started on all nodes before proceeding with the RDBMS installation. I have personally faced a situation where the service would not run on both nodes at once. For example, in a 2-node RAC, the service could be started on one node only: starting it on one node killed it on the other.
Reason/Resolution: The service has to be started as root and requires LD_ASSUME_KERNEL to be set correctly. I had set LD_ASSUME_KERNEL properly as the "oracle" user, but when switching to "root" to start the service I was using "su -" instead of "su". "su -" starts a fresh login shell and does not carry over the caller's environment, so the value of LD_ASSUME_KERNEL never reached the "root" session. Either use "su", or set LD_ASSUME_KERNEL again after "su -".
4. When trying to apply the 9208 CM patch set, the installer did not pick up all the nodes. The following error was found in the installation logs:
"Cluster nodes cannot be retrieved from the vendor clusterware (/tmp/OraInstall2008-03-20_12-12-02AM/oui/bin/lsnodes.bin: error while loading shared libraries: libcmdll.so: cannot open shared object file: No such file or directory). This system will not be considered as a vendor clusterware"
Also, the "lsnodes" command, which can be used to verify all the nodes in the RAC, was failing.
Reason/Resolution: The correct order to follow when installing 9208 RAC is:
--> 9204 CM
--> 9204 RDBMS
--> 9208 CM Patchset
--> 9208 RDBMS patchset.
The above error and the "lsnodes" failure occur because the $ORACLE_HOME/lib32 directory does not exist. The file libcmdll.so named in the error lives inside lib32, and lib32 is created only by the RDBMS software installation, not by the CM installation. So if you do not follow the correct order and try to install the 9208 CM patch set straight after 9204 CM, you get this error, and "lsnodes" fails to list the nodes in the cluster (which you would expect to work once 9204 CM has been installed successfully). Instead, install the 9204 RDBMS software right after 9204 CM. If you then apply the 9208 patch set, the error does not appear and "lsnodes" works.
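Whether the library is in place can be checked before launching the 9208 CM patch set. This is a minimal sketch based only on the path named in the error above; the function name has_libcmdll is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the library needed by lsnodes exists
# under the Oracle home given as $1.
has_libcmdll() {
    [ -f "$1/lib32/libcmdll.so" ]
}

# Usage before applying the 9208 CM patch set:
# has_libcmdll "$ORACLE_HOME" || echo "lib32/libcmdll.so is missing; install 9204 RDBMS first"
```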
5. Always check the inventory on all nodes to verify that the correct versions of the CM and RDBMS patch sets have been applied everywhere. The inventory can be verified by launching OUI. Version mismatches are more common with CM patch sets: after applying the 9208 CM patch set on one node, the CM version on the other node can still be 9204. The same can happen with RDBMS patch sets, though. In such cases you need to apply the patch set on the other node separately to get the correct version (ideally, every installation in RAC is run from a single node and the other nodes are updated automatically). After starting the ORACM service, you can check the CM version on each node with the following command:
$ head -1 $ORACLE_HOME/oracm/log/cm.log
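To compare the nodes side by side, the same check can be wrapped in a function and run over ssh. This is a sketch only: the function name cm_version_line and the host names node1 and node2 are hypothetical, and the usage lines assume the Oracle home path is identical on every node.

```shell
#!/bin/sh
# Hypothetical helper: print the first line of cm.log under the Oracle home
# given as $1, which is the line checked above for the CM version.
cm_version_line() {
    head -n 1 "$1/oracm/log/cm.log"
}

# Usage across nodes (node1 and node2 are placeholders; $ORACLE_HOME is
# expanded locally, so the path must be the same on every node):
# for node in node1 node2; do
#     printf '%s: ' "$node"
#     ssh "$node" "head -n 1 $ORACLE_HOME/oracm/log/cm.log"
# done
```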
This was the first set of real-life installation problems I have faced. There are more to come, so stay tuned and keep checking this space for updates.
"Suggestions are more than welcome" :-)