numeric-linalg
Educational material on the SciPy implementation of numerical linear algebra algorithms
lapack/DOCS/lawn81.tex | 71100B | -rw-r--r-- |
\documentclass[11pt]{report}
\usepackage{indentfirst}
\usepackage[body={6in,8.5in}]{geometry}
\usepackage{hyperref}
\usepackage{graphicx}
\DeclareGraphicsRule{.ps}{eps}{}{}
\renewcommand{\thesection}{\arabic{section}}
\setcounter{tocdepth}{3}
\setcounter{secnumdepth}{3}
\begin{document}

\begin{center}
{\Large LAPACK Working Note 81\\
Quick Installation Guide for LAPACK on Unix Systems\footnote{This work was supported by NSF Grant No. ASC-8715728 and NSF Grant No. 0444486}}
\end{center}
\begin{center}
% Edward Anderson\footnote{Current address: Cray Research Inc.,
% 655F Lone Oak Drive, Eagan, MN 55121},
The LAPACK Authors\\
Department of Computer Science \\
University of Tennessee \\
Knoxville, Tennessee 37996-1301 \\
\end{center}
\begin{center}
REVISED: VERSION 3.1.1, February 2007 \\
REVISED: VERSION 3.2.0, November 2008
\end{center}
\begin{center}
Abstract
\end{center}
This working note describes how to install and test version 3.2.0 of LAPACK, a linear algebra package for high-performance computers, on a Unix System. The timing routines are not actually included in release 3.2.0, and that part of the LAWN refers to release 3.0. Also, version 3.2.0 contains many prototype routines needing user feedback. Non-Unix installation instructions and further details of the testing and timing suites are contained only in LAPACK Working Note 41, and not in this abbreviated version.
%Separate instructions are provided for the Unix and non-Unix
%versions of the test package.
%Further details are also given on the design of the test and timing
%programs.
\newpage
\tableofcontents
\newpage
% Introduction to Implementation Guide
\section{Introduction}
LAPACK is a linear algebra library for high-performance computers. The library includes Fortran subroutines for the analysis and solution of systems of simultaneous linear algebraic equations, linear least-squares problems, and matrix eigenvalue problems.
Our approach to achieving high efficiency is based on the use of a standard set of Basic Linear Algebra Subprograms (the BLAS), which can be optimized for each computing environment. By confining most of the computational work to the BLAS, the subroutines should be transportable and efficient across a wide range of computers.

This working note describes how to install, test, and time this release of LAPACK on a Unix System. The instructions for installing, testing, and timing\footnote{The timing programs are provided only in LAPACK 3.0 and earlier.} are designed for a person whose responsibility is the maintenance of a mathematical software library. We assume the installer has experience in compiling and running Fortran programs and in creating object libraries. The installation process involves untarring the file, creating a set of libraries, and compiling and running the test and timing programs\footnotemark[\value{footnote}].
%This guide combines the instructions for the Unix and non-Unix
%versions of the LAPACK test package (the non-Unix version is in Appendix
%~\ref{appendixe}).
%At this time, the non-Unix version of LAPACK can only be obtained
%after first untarring the Unix tar tape and then following the instructions in
%Appendix ~\ref{appendixe}.

Section~\ref{fileformat} describes how the files are organized in the tar file, and Section~\ref{overview} gives a general overview of the parts of the test package. Step-by-step instructions appear in Section~\ref{installation}.
%for the Unix version and in the appendix for the non-Unix version.

Users desiring additional information should refer to LAPACK Working Note 41.
% Sections~\ref{moretesting}
%and ~\ref{moretiming} give
%details of the test and timing programs and their input files.
%Appendices ~\ref{appendixa} and ~\ref{appendixb} briefly describe
%the LAPACK routines and auxiliary routines provided
%in this release.
%Appendix ~\ref{appendixc} lists the operation counts we have computed
%for the BLAS and for some of the LAPACK routines.
Appendix~\ref{appendixd}, entitled ``Caveats'', is a compendium of the known problems from our own experiences, with suggestions on how to overcome them.
\textbf{It is strongly advised that the user read Appendix A before proceeding with the installation process.}
%Appendix E contains the execution times of the different test
%and timing runs on two sample machines.
%Appendix ~\ref{appendixe} contains the instructions to install LAPACK on a non-Unix
%system.

\section{Revisions Since the First Public Release}
Since its first public release in February 1992, LAPACK has had several updates, which have encompassed the introduction of new routines as well as extending the functionality of existing routines. The first update, June 30, 1992, was version 1.0a; the second update, October 31, 1992, was version 1.0b; the third update, March 31, 1993, was version 1.1; version 2.0 on September 30, 1994, coincided with the release of the Second Edition of the LAPACK Users' Guide; version 3.0 on June 30, 1999, coincided with the release of the Third Edition of the LAPACK Users' Guide; version 3.1 was released in November 2006; version 3.1.1 was released in February 2007; and version 3.2.0 was released in November 2008.

All LAPACK routines reflect the current version number, with the date on the routine indicating when it was last modified. For more information on revisions in the latest release, please refer to the \texttt{revisions.info} file in the lapack directory on netlib.
\begin{quote}
\url{http://www.netlib.org/lapack/revisions.info}
\end{quote}
%The distribution \texttt{tar} file \texttt{lapack.tar.z} that is
%available on netlib is always the most up-to-date.
%
%On-line manpages (troff files) for LAPACK driver and computational
%routines, as well as most of the BLAS routines, are available via
%the \texttt{lapack} index on netlib.
\section{File Format}\label{fileformat}
The software for LAPACK is distributed in the form of a gzipped tar file (via anonymous ftp or the World Wide Web), which contains the Fortran source for LAPACK, the Basic Linear Algebra Subprograms (the Level 1, 2, and 3 BLAS) needed by LAPACK, the testing programs, and the timing programs\footnotemark[\value{footnote}]. Users who wish to have a non-Unix installation should refer to LAPACK Working Note 41, although the overview in Section~\ref{overview} applies to both the Unix and non-Unix versions.
%Users who wish to have a non-Unix installation should go to Appendix ~\ref{appendixe},
%although the overview in section ~\ref{overview} applies to both the Unix and non-Unix
%versions.

The package may be accessed via the World Wide Web through the URL:
\begin{quote}
\url{http://www.netlib.org/lapack/lapack.tgz}
\end{quote}
Or, you can retrieve the file via anonymous ftp at netlib:
\begin{verbatim}
    ftp ftp.netlib.org
    login: anonymous
    password: <your email address>
    cd lapack
    binary
    get lapack.tgz
    quit
\end{verbatim}

The software in the \texttt{tar} file is organized in a number of essential directories as shown in Figure 1. Please note that this figure does not reflect everything that is contained in the \texttt{LAPACK} directory. Input and instructional files are also located at various levels.
\begin{figure}
\vspace{11pt}
\centerline{\includegraphics[width=6.5in,height=3in]{org2.ps}}
\caption{Unix organization of LAPACK 3.0}
\vspace{11pt}
\end{figure}
Libraries are created in the LAPACK directory and executable files are created in one of the directories BLAS, TESTING, or TIMING\footnotemark[\value{footnote}]. Input files for the test and timing\footnotemark[\value{footnote}] programs are also found in these three directories so that testing may be carried out in the directories LAPACK/BLAS, LAPACK/TESTING, and LAPACK/TIMING\footnotemark[\value{footnote}].
A top-level makefile in the LAPACK directory is provided to perform the entire installation procedure.

\section{Overview of Tape Contents}\label{overview}
Most routines in LAPACK occur in four versions: REAL, DOUBLE PRECISION, COMPLEX, and COMPLEX*16. The first three versions (REAL, DOUBLE PRECISION, and COMPLEX) are written in standard Fortran and are completely portable; the COMPLEX*16 version is provided for those compilers which allow this data type. Some routines use features of Fortran 90. For convenience, we often refer to routines by their single precision names; the leading `S' can be replaced by a `D' for double precision, a `C' for complex, or a `Z' for complex*16. For LAPACK use and testing you must decide which version(s) of the package you intend to install at your site (for example, REAL and COMPLEX on a Cray computer or DOUBLE PRECISION and COMPLEX*16 on an IBM computer).

\subsection{LAPACK Routines}
There are three classes of LAPACK routines:
\begin{itemize}
\item \textbf{driver} routines solve a complete problem, such as solving a system of linear equations or computing the eigenvalues of a real symmetric matrix. Users are encouraged to use a driver routine if there is one that meets their requirements. The driver routines are listed in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
%in Appendix ~\ref{appendixa}.
\item \textbf{computational} routines, also called simply LAPACK routines, perform a distinct computational task, such as computing the $LU$ decomposition of an $m$-by-$n$ matrix or finding the eigenvalues and eigenvectors of a symmetric tridiagonal matrix using the $QR$ algorithm. The LAPACK routines are listed in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
%The LAPACK routines are listed in Appendix ~\ref{appendixa}; see also LAPACK
%Working Note \#5 \cite{WN5}.
\item \textbf{auxiliary} routines are all the other subroutines called by the driver routines and computational routines.
%Among them are subroutines to perform subtasks of block algorithms,
%in particular, the unblocked versions of the block algorithms;
%extensions to the BLAS, such as matrix-vector operations involving
%complex symmetric matrices;
%the special routines LSAME and XERBLA which first appeared with the
%BLAS;
%and a number of routines to perform common low-level computations,
%such as computing a matrix norm, generating an elementary Householder
%transformation, and applying a sequence of plane rotations.
%Many of the auxiliary routines may be of use to numerical analysts
%or software developers, so we have documented the Fortran source for
%these routines with the same level of detail used for the LAPACK
%routines and driver routines.
The auxiliary routines are listed in LAPACK Working Note 41~\cite{WN41} and the LAPACK Users' Guide~\cite{LUG}.
%The auxiliary routines are listed in Appendix ~\ref{appendixb}.
\end{itemize}

\subsection{Level 1, 2, and 3 BLAS}
The BLAS are a set of Basic Linear Algebra Subprograms that perform vector-vector, matrix-vector, and matrix-matrix operations. LAPACK is designed around the Level 1, 2, and 3 BLAS, and nearly all of the parallelism in the LAPACK routines is contained in the BLAS. Therefore, the key to getting good performance from LAPACK lies in having an efficient version of the BLAS optimized for your particular machine. Optimized BLAS libraries are available on a variety of architectures; refer to the BLAS FAQ on netlib for further information.
\begin{quote}
\url{http://www.netlib.org/blas/faq.html}
\end{quote}
There are also freely available BLAS generators that automatically tune a subset of the BLAS for a given architecture.
E.g., \begin{quote} \url{http://www.netlib.org/atlas/} \end{quote} And, if all else fails, there is the Fortran~77 reference implementation of the Level 1, 2, and 3 BLAS available on netlib (also included in the LAPACK distribution tar file). \begin{quote} \url{http://www.netlib.org/blas/blas.tgz} \end{quote} No matter which BLAS library is used, the BLAS test programs should always be run. Users should not expect too much from the Fortran~77 reference implementation BLAS; these versions were written to define the basic operations and do not employ the standard tricks for optimizing Fortran code. The formal definitions of the Level 1, 2, and 3 BLAS are in \cite{BLAS1}, \cite{BLAS2}, and \cite{BLAS3}. The BLAS Quick Reference card is available on netlib. \subsection{Mixed- and Extended-Precision BLAS: XBLAS} The XBLAS extend the BLAS to work with mixed input and output precisions as well as using extra precision internally. The XBLAS are used in the prototype extra-precise iterative refinement codes. The current release of the XBLAS is available through Netlib\footnote{Development versions may be available through \url{http://www.cs.berkeley.edu/~yozo/} or \url{http://www.nersc.gov/~xiaoye/XBLAS/}.} at \begin{quote} \url{http://www.netlib.org/xblas} \end{quote} Their formal definition is in \cite{XBLAS}. \subsection{LAPACK Test Routines} This release contains two distinct test programs for LAPACK routines in each data type. One test program tests the routines for solving linear equations and linear least squares problems, and the other tests routines for the matrix eigenvalue problem. The routines for generating test matrices are used by both test programs and are compiled into a library for use by both test programs. \subsection{LAPACK Timing Routines (for LAPACK 3.0 and before) } This release also contains two distinct timing programs for the LAPACK routines in each data type. 
The linear equation timing program gathers performance data in megaflops on the factor, solve, and inverse routines for solving linear systems, the routines to generate or apply an orthogonal matrix given as a sequence of elementary transformations, and the reductions to bidiagonal, tridiagonal, or Hessenberg form for eigenvalue computations. The operation counts used in computing the megaflop rates are computed from a formula; see LAPACK Working Note 41~\cite{WN41}.
% see Appendix ~\ref{appendixc}.
The eigenvalue timing program is used with the eigensystem routines and returns the execution time, number of floating point operations, and megaflop rate for each of the requested subroutines. In this program, the number of operations is computed while the code is executing using special instrumented versions of the LAPACK subroutines.

\section{Installing LAPACK on a Unix System}\label{installation}
Installing, testing, and timing\footnotemark[\value{footnote}] the Unix version of LAPACK involves the following steps:
\begin{enumerate}
\item Gunzip and untar the file.
\item Copy the file \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it.
\item Edit the file \texttt{LAPACK/Makefile} and type \texttt{make}.
%\item Test and Install the Machine-Dependent Routines \\
%\emph{(WARNING: You may need to supply a correct version of second.f and
%dsecnd.f for your machine)}
%{\tt
%\begin{list}{}{}
%\item cd LAPACK
%\item make install
%\end{list} }
%
%\item Create the BLAS Library, \emph{if necessary} \\
%\emph{(NOTE: For best performance, it is recommended you use the manufacturers' BLAS)}
%{\tt
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blaslib}
%\end{list} }
%
%\item Run the Level 1, 2, and 3 BLAS Test Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blas\_testing}
%\end{list}
%
%\item Create the LAPACK Library
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make lapacklib}
%\end{list}
%
%\item Create the Library of Test Matrix Generators
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make tmglib}
%\end{list}
%
%\item Run the LAPACK Test Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make testing}
%\end{list}
%
%\item Run the LAPACK Timing Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make timing}
%\end{list}
%
%\item Run the BLAS Timing Programs
%\begin{list}{}{}
%\item \texttt{cd LAPACK}
%\item \texttt{make blas\_timing}
%\end{list}
\end{enumerate}

\subsection{Untar the File}
If you received a tar file of LAPACK via the World Wide Web or anonymous ftp, enter the following command:
\begin{list}{}{}
\item \texttt{gunzip -c lapack.tgz | tar xvf -}
\end{list}
\noindent This will create a top-level directory called \texttt{LAPACK}, which requires approximately 34 Mbytes of disk space. The total space requirement, including the object files and executables, is approximately 100 Mbytes for all four data types.
\subsection{Copy the file \texttt{LAPACK/make.inc.example} to \texttt{LAPACK/make.inc} and edit it}
Before the libraries can be built, or the testing and timing\footnotemark[\value{footnote}] programs run, you must define all machine-specific parameters for the architecture on which you are installing LAPACK. All machine-specific parameters are contained in the file \texttt{LAPACK/make.inc}. An example of \texttt{LAPACK/make.inc} for a Linux machine with GNU compilers is given in \texttt{LAPACK/make.inc.example}; copy that file to \texttt{LAPACK/make.inc} by entering the following command:
\begin{list}{}{}
\item \texttt{cp LAPACK/make.inc.example LAPACK/make.inc}
\end{list}
\noindent Now modify your \texttt{LAPACK/make.inc} by applying the following recommendations. The first line of this \texttt{make.inc} file is:
\begin{quote}
SHELL = /bin/sh
\end{quote}
and it will need to be modified to \texttt{SHELL = /sbin/sh} if you are installing LAPACK on an SGI architecture.

Next, you will need to modify \texttt{FC}, \texttt{FFLAGS}, \texttt{FFLAGS\_DRV}, \texttt{FFLAGS\_NOOPT}, and \texttt{LDFLAGS} to specify the compiler, compiler options, compiler options for the testing and timing\footnotemark[\value{footnote}] main programs, compiler options for routines that must be compiled without optimization, and linker options.

Next, choose which timing function the \texttt{SECOND} and \texttt{DSECND} routines will call.
\begin{verbatim}
# Default:  SECOND and DSECND will use a call to the
# EXTERNAL FUNCTION ETIME
#TIMER = EXT_ETIME
# For RS6K:  SECOND and DSECND will use a call to the
# EXTERNAL FUNCTION ETIME_
#TIMER = EXT_ETIME_
# For gfortran compiler:  SECOND and DSECND will use a call to the
# INTERNAL FUNCTION ETIME
TIMER = INT_ETIME
# If your Fortran compiler does not provide etime (like Nag Fortran
# Compiler, etc...) SECOND and DSECND will use a call to the
# INTERNAL FUNCTION CPU_TIME
#TIMER = INT_CPU_TIME
# If none of these work, you can use the NONE value.
# In that case, SECOND and DSECND will always return 0.
#TIMER = NONE
\end{verbatim}
Refer to Section~\ref{second} for more information.

Next, you will need to modify \texttt{AR}, \texttt{ARFLAGS}, and \texttt{RANLIB} to specify the archiver, archiver options, and ranlib for your machine. If your architecture does not require \texttt{ranlib} to be run after each archive command (as is the case with CRAY computers running UNICOS, Hewlett Packard computers running HP-UX, or SUN SPARCstations running Solaris), set \texttt{RANLIB = echo}.

Finally, you must modify the \texttt{BLASLIB} definition to specify the BLAS library to which you will be linking. If an optimized version of the BLAS is available on your machine, we strongly recommend linking to that library. Otherwise, by default, \texttt{BLASLIB} is set to the Fortran~77 version.

If you want to enable the XBLAS, define the variable \texttt{USEXBLAS} to some value, for example \texttt{USEXBLAS = Yes}. Then set the variable \texttt{XBLASLIB} to point at the XBLAS library. Note that the prototype iterative refinement routines and their testers will not be built unless \texttt{USEXBLAS} is defined.

\textbf{NOTE:} Example \texttt{make.inc} include files are contained in the \texttt{LAPACK/INSTALL} directory. Please refer to Appendix~\ref{appendixd} for machine-specific installation hints, and/or the \texttt{release\_notes} file on \texttt{netlib}.
\begin{quote}
\url{http://www.netlib.org/lapack/release\_notes}
\end{quote}

\subsection{Edit the file \texttt{LAPACK/Makefile}}\label{toplevelmakefile}
This \texttt{Makefile} can be modified to perform as much of the installation process as the user desires. Ideally, this is the ONLY makefile the user must modify. However, modification of lower-level makefiles may be necessary if a specific routine needs to be compiled with a different level of optimization.
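
As a minimal sketch of such a lower-level change, a rule along the following lines could be added to the makefile of the directory that holds the routine, so that one file is compiled with \texttt{FFLAGS\_NOOPT} rather than \texttt{FFLAGS}. The file name \texttt{myroutine.f} is hypothetical, and the rule assumes that \texttt{FC} and \texttt{FFLAGS\_NOOPT} are defined in \texttt{make.inc}; compare it with the rules already present in your copy before using it.
\begin{verbatim}
# Hypothetical rule: compile a single routine without optimization.
# myroutine.f is a placeholder name, not a file in the distribution.
myroutine.o: myroutine.f
	$(FC) $(FFLAGS_NOOPT) -c -o myroutine.o myroutine.f
\end{verbatim}
An explicit rule of this kind takes precedence over the generic suffix or pattern rule for that one object file, so the remaining routines are still compiled with the usual options.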
First, edit the definitions of \texttt{blaslib}, \texttt{lapacklib}, \texttt{tmglib}, \texttt{lapack\_testing}, and \texttt{timing}\footnotemark[\value{footnote}] in the file \texttt{LAPACK/Makefile} to specify the data types desired. For example, if you only wish to compile the single precision real version of the LAPACK library, you would modify the \texttt{lapacklib} definition to be:
\begin{verbatim}
lapacklib:
	$(MAKE) -C SRC single
\end{verbatim}
Likewise, you could specify \texttt{double}, \texttt{complex}, or \texttt{complex16} to build the double precision real, single precision complex, or double precision complex libraries, respectively. By default, running \texttt{make} with no arguments builds all four data types. The make command can be run more than once to add another data type to the library if necessary.
%If you are installing LAPACK on a Silicon Graphics machine, you must
%modify the respective definitions of \texttt{testing} and \texttt{timing} to be
%\begin{verbatim}
%testing:
%	( cd TESTING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}
%and
%\begin{verbatim}
%timing:
%	( cd TIMING; $(MAKE) -f Makefile.sgi )
%\end{verbatim}

Next, if you will be using a locally available BLAS library, you will need to remove \texttt{blaslib} from the \texttt{lib} definition.

Finally, if you do not wish to build each of the libraries individually and run each of the testing and timing phases separately, you can modify the \texttt{all} definition to specify how much of the installation process you want performed. By default, the \texttt{all} definition is set to
\begin{verbatim}
all: lapack_install lib lapack_testing blas_testing
\end{verbatim}
which will perform all phases of the installation process: testing of machine-dependent routines, building the libraries, BLAS testing, and LAPACK testing. The entire installation process will then be performed by typing \texttt{make}.
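
For instance, a site that links against a locally available BLAS and needs only the double precision real library might end up with definitions along the following lines. This is a sketch rather than the shipped \texttt{Makefile}: it assumes that the distributed \texttt{lib} definition lists \texttt{blaslib}, \texttt{lapacklib}, and \texttt{tmglib}, and it omits any prerequisites that your copy attaches to \texttt{lapacklib}.
\begin{verbatim}
# Sketch only; compare with the definitions in your LAPACK/Makefile.
# blaslib is removed because BLASLIB in make.inc names the local BLAS.
lib: lapacklib tmglib

# Build only the double precision real LAPACK routines.
lapacklib:
	$(MAKE) -C SRC double
\end{verbatim}
The \texttt{tmglib} and \texttt{lapack\_testing} definitions would need the same data type restriction, since the test programs for the other data types cannot be linked against a library that does not contain them.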
Questions and/or comments can be directed to the authors as described in Section~\ref{sendresults}. If test failures occur, please refer to the appropriate subsection in Section~\ref{furtherdetails}. If disk space is limited, we suggest building each data type separately and/or deleting all object files after building the libraries. Likewise, all testing and timing executables can be deleted after the testing and timing process is completed. The removal of all object files and executables can be accomplished by the following: \begin{list}{}{} \item \texttt{cd LAPACK} \item \texttt{make cleanobj} \end{list} \section{Further Details of the Installation Process}\label{furtherdetails} Alternatively, you can choose to run each of the phases of the installation process separately. The following sections give details on how this may be achieved. \subsection{Test and Install the Machine-Dependent Routines.} There are six machine-dependent functions in the test and timing package, at least three of which must be installed. They are \begin{tabbing} MONOMO \= DOUBLE PRECISION \= \kill LSAME \> LOGICAL \> Test if two characters are the same regardless of case \\ SLAMCH \> REAL \> Determine machine-dependent parameters \\ DLAMCH \> DOUBLE PRECISION \> Determine machine-dependent parameters \\ SECOND \> REAL \> Return time in seconds from a fixed starting time \\ DSECND \> DOUBLE PRECISION \> Return time in seconds from a fixed starting time\\ ILAENV \> INTEGER \> Checks that NaN and infinity arithmetic are IEEE-754 compliant \end{tabbing} \noindent If you are working only in single precision, you do not need to install DLAMCH and DSECND, and if you are working only in double precision, you do not need to install SLAMCH and SECOND. These six subroutines are provided in \texttt{LAPACK/INSTALL}, along with six test programs. To compile the six test programs and run the tests, go to \texttt{LAPACK} and type \texttt{make lapack\_install}. 
The test programs are called \texttt{testlsame, testslamch, testdlamch, testsecond, testdsecnd} and \texttt{testieee}. If you do not wish to run all tests, you will need to modify the \texttt{lapack\_install} definition in the \texttt{LAPACK/Makefile} to only include the tests you wish to run. Otherwise, all tests will be performed. The expected results of each test program are described below. \subsubsection{Installing LSAME} LSAME is a logical function with two character parameters, A and B. It returns .TRUE. if A and B are the same regardless of case, or .FALSE. if they are different. For example, the expression \begin{list}{}{} \item \texttt{LSAME( UPLO, 'U' )} \end{list} \noindent is equivalent to \begin{list}{}{} \item \texttt{( UPLO.EQ.'U' ).OR.( UPLO.EQ.'u' )} \end{list} The test program in \texttt{lsametst.f} tests all combinations of the same character in upper and lower case for A and B, and two cases where A and B are different characters. Run the test program by typing \texttt{testlsame}. If LSAME works correctly, the only message you should see after the execution of \texttt{testlsame} is \begin{verbatim} ASCII character set Tests completed \end{verbatim} The file \texttt{lsame.f} is automatically copied to \texttt{LAPACK/BLAS/SRC/} and \texttt{LAPACK/SRC/}. The function LSAME is needed by both the BLAS and LAPACK, so it is safer to have it in both libraries as long as this does not cause trouble in the link phase when both libraries are used. \subsubsection{Installing SLAMCH and DLAMCH} SLAMCH and DLAMCH are real functions with a single character parameter that indicates the machine parameter to be returned. The test program in \texttt{slamchtst.f} simply prints out the different values computed by SLAMCH, so you need to know something about what the values should be. 
For example, the output of the test program executable \texttt{testslamch} for SLAMCH on a Sun SPARCstation is \begin{verbatim} Epsilon = 5.96046E-08 Safe minimum = 1.17549E-38 Base = 2.00000 Precision = 1.19209E-07 Number of digits in mantissa = 24.0000 Rounding mode = 1.00000 Minimum exponent = -125.000 Underflow threshold = 1.17549E-38 Largest exponent = 128.000 Overflow threshold = 3.40282E+38 Reciprocal of safe minimum = 8.50706E+37 \end{verbatim} On a Cray machine, the safe minimum underflows its output representation and the overflow threshold overflows its output representation, so the safe minimum is printed as 0.00000 and overflow is printed as R. This is normal. If you would prefer to print a representable number, you can modify the test program to print SFMIN*100. and RMAX/100. for the safe minimum and overflow thresholds. Likewise, the test executable \texttt{testdlamch} is run for DLAMCH. If both tests were successful, go to Section~\ref{second}. If SLAMCH (or DLAMCH) returns an invalid value, you will have to create your own version of this function. The following options are used in LAPACK and must be set: \begin{list}{}{} \item {`B': } Base of the machine \item {`E': } Epsilon (relative machine precision) \item {`O': } Overflow threshold \item {`P': } Precision = Epsilon*Base \item {`S': } Safe minimum (often same as underflow threshold) \item {`U': } Underflow threshold \end{list} Some people may be familiar with R1MACH (D1MACH), a primitive routine for setting machine parameters in which the user must comment out the appropriate assignment statements for the target machine. 
If a version of R1MACH is on hand, the assignments in SLAMCH can be
made to refer to R1MACH using the correspondence
\begin{list}{}{}
\item {SLAMCH( `U' )} $=$ R1MACH( 1 )
\item {SLAMCH( `O' )} $=$ R1MACH( 2 )
\item {SLAMCH( `E' )} $=$ R1MACH( 3 )
\item {SLAMCH( `B' )} $=$ R1MACH( 5 )
\end{list}
\noindent
The safe minimum returned by SLAMCH( `S' ) is initially set to the
underflow value, but if $1/(\mathrm{overflow}) \geq (\mathrm{underflow})$
it is recomputed as $(1/(\mathrm{overflow})) \times ( 1 + \varepsilon )$,
where $\varepsilon$ is the machine precision.
BE AWARE that the initial call to SLAMCH or DLAMCH is expensive.
We suggest that installers run it once, save the results, and hard-code
the constants in the version they put in their library.

\subsubsection{Installing SECOND and DSECND}\label{second}
Both the timing routines\footnotemark[\value{footnote}] and the test routines call SECOND
(DSECND), a real function with no arguments that returns the time
in seconds from some fixed starting time.
Our version of this routine
returns only ``user time'', and not ``user time $+$ system time''.
The versions of SECOND in \texttt{second\_EXT\_ETIME.f} and
\texttt{second\_INT\_ETIME.f} call ETIME, a Fortran library routine
available on some computer systems.
If ETIME is not available or a better local timing function exists,
you will have to provide the correct interface to SECOND and DSECND
on your machine.
Since LAPACK 3.1.1 we provide five different flavours of the SECOND and DSECND routines.
The version that will be used depends on the value of the TIMER variable in \texttt{make.inc}:
\begin{itemize}
\item If ETIME is available as an external function, set the value of the TIMER variable in your make.inc to \texttt{EXT\_ETIME}:
\texttt{second\_EXT\_ETIME.f} and \texttt{dsecnd\_EXT\_ETIME.f} will be used.
Usually on HPPA architectures,
the compiler and linker flag \texttt{+U77} should be included to access
the function \texttt{ETIME}.
\item If ETIME\_ is available as an external function, set the value of the TIMER variable in your make.inc to \texttt{EXT\_ETIME\_}:
\texttt{second\_EXT\_ETIME\_.f} and \texttt{dsecnd\_EXT\_ETIME\_.f} will be used.
This is the case on some IBM architectures, such as IBM RS/6000s.
\item If ETIME is available as an internal function, set the value of the TIMER variable in your make.inc to \texttt{INT\_ETIME}:
\texttt{second\_INT\_ETIME.f} and \texttt{dsecnd\_INT\_ETIME.f} will be used.
This is the case with gfortran.
\item If CPU\_TIME is available as an internal function, set the value of the TIMER variable in your make.inc to \texttt{INT\_CPU\_TIME}:
\texttt{second\_INT\_CPU\_TIME.f} and \texttt{dsecnd\_INT\_CPU\_TIME.f} will be used.
\item If none of these functions is available, set the value of the TIMER variable in your make.inc to \texttt{NONE}:
\texttt{second\_NONE.f} and \texttt{dsecnd\_NONE.f} will be used.
These routines will always return zero.
\end{itemize}
The test program in \texttt{secondtst.f}
performs a million floating-point operations (two per vector element) using 5000 iterations of
the SAXPY operation $y := y + \alpha x$ on a vector of length 100.
The total time and megaflops for this test are reported, then
the operation is repeated including a call to SECOND on each of
the 5000 iterations to determine the overhead due to calling SECOND.
The test program executable is called \texttt{testsecond} (or \texttt{testdsecnd}).
There is no single right answer, but the times
in seconds should be positive and the megaflop rates should be
appropriate for your machine.

\subsubsection{Testing IEEE arithmetic and ILAENV}\label{testieee}
%\textbf{If you are installing LAPACK on a non-IEEE machine, you MUST
%modify ILAENV! Otherwise, ILAENV will crash .
%By default, ILAENV
%assumes an IEEE machine, and does a test for IEEE-754 compliance.}
As some new routines in LAPACK rely on IEEE-754 compliance,
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
machine, and does a test for IEEE-754 compliance. \textbf{NOTE: If you
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
as this test inside ILAENV will crash!}
If \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is
called, it returns \texttt{ILAENV=1} to signal IEEE-754 compliance,
and \texttt{ILAENV=0} if the architecture is not IEEE-754 compliant.
Thus, for non-IEEE machines, the user must hard-code the setting of
(\texttt{ILAENV=0}) for (\texttt{ISPEC=10} and \texttt{ISPEC=11}) in the version
of \texttt{LAPACK/SRC/ilaenv.f} to be put in their library.
There are also specialized testing and timing\footnotemark[\value{footnote}] versions of ILAENV that
will need to be modified in the same way.
\begin{itemize}
\item Testing/timing version of \texttt{LAPACK/TESTING/LIN/ilaenv.f}
\item Testing/timing version of \texttt{LAPACK/TESTING/EIG/ilaenv.f}
\item Testing/timing version of \texttt{LAPACK/TIMING/LIN/ilaenv.f}
\item Testing/timing version of \texttt{LAPACK/TIMING/EIG/ilaenv.f}
\end{itemize}
%Some new routines in LAPACK rely on IEEE-754 compliance, and if non-compliance
%is detected (via a call to the function ILAENV), alternative (slower)
%algorithms will be chosen.
%For further details, refer to the leading comments of routines such
%as \texttt{LAPACK/SRC/sstevr.f}.
The test program in \texttt{LAPACK/INSTALL/tstiee.f} checks an installation
architecture
to see if infinity arithmetic and NaN arithmetic are IEEE-754 compliant.
A warning message to the user is printed if non-compliance is detected.
This same test is performed inside the function ILAENV.
If \texttt{ILAENV( 10, $\ldots$ )} or \texttt{ILAENV( 11, $\ldots$ )} is issued, then \texttt{ILAENV=1} is returned to signal IEEE-754 compliance, and \texttt{ILAENV=0} if the architecture is non-IEEE-754 compliant. To avoid this IEEE test being run every time you call \texttt{ILAENV( 10, $\ldots$)} or \texttt{ILAENV( 11, $\ldots$ )}, we suggest that the user hard-code the setting of \texttt{ILAENV=1} or \texttt{ILAENV=0} in the version of \texttt{LAPACK/SRC/ilaenv.f} to be put in his library. As aforementioned, there are also specialized testing and timing\footnotemark[\value{footnote}] versions of ILAENV that will also need to be modified. \subsection{Create the BLAS Library} Ideally, a highly optimized version of the BLAS library already exists on your machine. In this case you can go directly to Section~\ref{testblas} to make the BLAS test programs. \begin{itemize} \item[a)] Go to \texttt{LAPACK} and edit the definition of \texttt{blaslib} in the file \texttt{Makefile} to specify the data types desired, as in the example in Section~\ref{toplevelmakefile}. If you already have some of the BLAS, you will need to edit the file \texttt{LAPACK/BLAS/SRC/Makefile} to comment out the lines defining the BLAS you have. \item[b)] Type \texttt{make blaslib}. The make command can be run more than once to add another data type to the library if necessary. \end{itemize} \noindent The BLAS library is created in \texttt{LAPACK/librefblas.a}, or in the user-defined location specified by \texttt{BLASLIB} in the file \texttt{LAPACK/make.inc}. \subsection{Run the BLAS Test Programs}\label{testblas} Test programs for the Level 1, 2, and 3 BLAS are in the directory \texttt{LAPACK/BLAS/TESTING}. To compile and run the Level 1, 2, and 3 BLAS test programs, go to \texttt{LAPACK} and type \texttt{make blas\_testing}. 
The executable files are called \texttt{xblat\_s}, \texttt{xblat\_d}, \texttt{xblat\_c}, and \texttt{xblat\_z}, where the \_ (underscore) is replaced by 1, 2, or 3, depending upon the level of BLAS that it is testing. All executable and output files are created in \texttt{LAPACK/BLAS/}. For the Level 1 BLAS tests, the output file names are \texttt{sblat1.out}, \texttt{dblat1.out}, \texttt{cblat1.out}, and \texttt{zblat1.out}. For the Level 2 and 3 BLAS, the name of the output file is indicated on the first line of the input file and is currently defined to be \texttt{sblat2.out} for the Level 2 REAL version, and \texttt{sblat3.out} for the Level 3 REAL version, with similar names for the other data types. If the tests using the supplied data files were completed successfully, consider whether the tests were sufficiently thorough. For example, on a machine with vector registers, at least one value of $N$ greater than the length of the vector registers should be used; otherwise, important parts of the compiled code may not be exercised by the tests. If the tests were not successful, either because the program did not finish or the test ratios did not pass the threshold, you will probably have to find and correct the problem before continuing. If you have been testing a system-specific BLAS library, try using the Fortran BLAS for the routines that did not pass the tests. For more details on the BLAS test programs, see \cite{BLAS2-test} and \cite{BLAS3-test}. \subsection{Create the LAPACK Library} \begin{itemize} \item[a)] Go to the directory \texttt{LAPACK} and edit the definition of \texttt{lapacklib} in the file \texttt{Makefile} to specify the data types desired, as in the example in Section~\ref{toplevelmakefile}. \item[b)] Type \texttt{make lapacklib}. The make command can be run more than once to add another data type to the library if necessary. 
\end{itemize} \noindent The LAPACK library is created in \texttt{LAPACK/liblapack.a}, or in the user-defined location specified by \texttt{LAPACKLIB} in the file \texttt{LAPACK/make.inc}. \subsection{Create the Test Matrix Generator Library} \begin{itemize} \item[a)] Go to the directory \texttt{LAPACK} and edit the definition of \texttt{tmglib} in the file \texttt{Makefile} to specify the data types desired, as in the example in Section~\ref{toplevelmakefile}. \item[b)] Type \texttt{make tmglib}. The make command can be run more than once to add another data type to the library if necessary. \end{itemize} \noindent The test matrix generator library is created in \texttt{LAPACK/libtmglib.a}, or in the user-defined location specified by \texttt{TMGLIB} in the file \texttt{LAPACK/make.inc}. \subsection{Run the LAPACK Test Programs} There are two distinct test programs for LAPACK routines in each data type, one for the linear equation routines and one for the eigensystem routines. In each data type, there is one input file for testing the linear equation routines and eighteen input files for testing the eigenvalue routines. The input files reside in \texttt{LAPACK/TESTING}. For more information on the test programs and how to modify the input files, please refer to LAPACK Working Note 41~\cite{WN41}. % see Section~\ref{moretesting}. If you do not wish to run each of the tests individually, you can go to \texttt{LAPACK}, edit the definition \texttt{lapack\_testing} in the file \texttt{Makefile} to specify the data types desired, and type \texttt{make lapack\_testing}. This will compile and run the tests as described in sections~\ref{testlin} and ~\ref{testeig}. 
%If you are installing LAPACK on a Silicon Graphics machine, you must %modify the definition of \texttt{testing} to be %\begin{verbatim} %testing: % ( cd TESTING; $(MAKE) -f Makefile.sgi ) %\end{verbatim} \subsubsection{Testing the Linear Equations Routines}\label{testlin} \begin{itemize} \item[a)] Go to \texttt{LAPACK/TESTING/LIN} and type \texttt{make} followed by the data types desired. The executable files are called \texttt{xlintsts, xlintstc, xlintstd}, or \texttt{xlintstz} and are created in \texttt{LAPACK/TESTING}. \item[b)] Go to \texttt{LAPACK/TESTING} and run the tests for each data type. For the REAL version, the command is \begin{list}{}{} \item{} \texttt{xlintsts < stest.in > stest.out} \end{list} \noindent The tests using \texttt{xlintstd}, \texttt{xlintstc}, and \texttt{xlintstz} are similar with the leading `s' in the input and output file names replaced by `d', `c', or `z'. \end{itemize} If you encountered failures in this phase of the testing process, please refer to Section~\ref{sendresults}. \subsubsection{Testing the Eigensystem Routines}\label{testeig} \begin{itemize} \item[a)] Go to \texttt{LAPACK/TESTING/EIG} and type \texttt{make} followed by the data types desired. The executable files are called \texttt{xeigtsts, xeigtstc, xeigtstd}, and \texttt{xeigtstz} and are created in \texttt{LAPACK/TESTING}. \item[b)] Go to \texttt{LAPACK/TESTING} and run the tests for each data type. The tests for the eigensystem routines use eighteen separate input files for testing the nonsymmetric eigenvalue problem, the symmetric eigenvalue problem, the banded symmetric eigenvalue problem, the generalized symmetric eigenvalue problem, the generalized nonsymmetric eigenvalue problem, the singular value decomposition, the banded singular value decomposition, the generalized singular value decomposition, the generalized QR and RQ factorizations, the generalized linear regression model, and the constrained linear least squares problem. 
The tests for the REAL version are as follows: \begin{list}{}{} \item \texttt{xeigtsts < nep.in > snep.out} \item \texttt{xeigtsts < sep.in > ssep.out} \item \texttt{xeigtsts < svd.in > ssvd.out} \item \texttt{xeigtsts < sec.in > sec.out} \item \texttt{xeigtsts < sed.in > sed.out} \item \texttt{xeigtsts < sgg.in > sgg.out} \item \texttt{xeigtsts < sgd.in > sgd.out} \item \texttt{xeigtsts < ssg.in > ssg.out} \item \texttt{xeigtsts < ssb.in > ssb.out} \item \texttt{xeigtsts < sbb.in > sbb.out} \item \texttt{xeigtsts < sbal.in > sbal.out} \item \texttt{xeigtsts < sbak.in > sbak.out} \item \texttt{xeigtsts < sgbal.in > sgbal.out} \item \texttt{xeigtsts < sgbak.in > sgbak.out} \item \texttt{xeigtsts < glm.in > sglm.out} \item \texttt{xeigtsts < gqr.in > sgqr.out} \item \texttt{xeigtsts < gsv.in > sgsv.out} \item \texttt{xeigtsts < lse.in > slse.out} \end{list} The tests using \texttt{xeigtstc}, \texttt{xeigtstd}, and \texttt{xeigtstz} also use the input files \texttt{nep.in}, \texttt{sep.in}, \texttt{svd.in}, \texttt{glm.in}, \texttt{gqr.in}, \texttt{gsv.in}, and \texttt{lse.in}, but the leading `s' in the other input file names must be changed to `c', `d', or `z'. \end{itemize} If you encountered failures in this phase of the testing process, please refer to Section~\ref{sendresults}. \subsection{Run the LAPACK Timing Programs (For LAPACK 3.0 and before)} There are two distinct timing programs for LAPACK routines in each data type, one for the linear equation routines and one for the eigensystem routines. The timing program for the linear equation routines is also used to time the BLAS. We encourage you to conduct these timing experiments in REAL and COMPLEX or in DOUBLE PRECISION and COMPLEX*16; it is not necessary to send timing results in all four data types. Two sets of input files are provided, a small set and a large set. The small data sets are appropriate for a standard workstation or other non-vector machine. 
The large data sets are appropriate for supercomputers, vector
computers, and high-performance workstations.
We are mainly interested in results from the large data sets, and
it is not necessary to run both the large and small sets.
The values of N in the large data sets are about five times larger
than those in the small data set,
and the large data sets use additional values for parameters such as the
block size NB and the leading array dimension LDA.
Small data sets have \texttt{\_small} in their names, such as
\texttt{stime\_small.in}, and large data sets have \texttt{\_large} in their
names, such as \texttt{stime\_large.in}.
Except as noted, the leading `s' in the input file name must be
replaced by `d', `c', or `z' for the other data types.
We encourage you to obtain timing results with the large data sets,
as this allows us to compare different machines.
If this would take too much time, suggestions for paring back the large
data sets are given in the instructions below.
We also encourage you to experiment with these timing
programs and send us any interesting results, such as results for
larger problems or for a wider range of block sizes.
The main programs are dimensioned for the large data sets,
so the parameters in the main program may have to be reduced in order
to run the small data sets on a small machine, or increased to run
experiments with larger problems.
The minimum time each subroutine will be timed is set to 0.0 in
the large data files and to 0.05 in the small data files, and on
many machines this value should be increased.
If the timing interval is not long
enough, the time for the subroutine after subtracting the overhead
may be very small or zero, resulting in megaflop rates that are
very large or zero. (To avoid division by zero, the megaflop rate is
set to zero if the time is less than or equal to zero.)
The minimum time that should be used depends on the machine and the
resolution of the clock.
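The relationship between the timing interval and the reported rate can be
written out explicitly. The following is only a sketch of the computation
described above; the symbols $n_{\mathrm{ops}}$ (operation count), $t$ (measured
time in seconds), and $t_{\mathrm{ovh}}$ (measured overhead in seconds) are
introduced here for illustration and are not names used by the timing programs.
\[
\mathrm{megaflops} = \left\{ \begin{array}{ll}
\displaystyle\frac{n_{\mathrm{ops}}}{10^{6}\,(t - t_{\mathrm{ovh}})} & \mbox{if } t - t_{\mathrm{ovh}} > 0, \\[2ex]
0 & \mbox{otherwise.}
\end{array} \right.
\]
For example, suppose a routine performs $10^{6}$ operations in a true time of
$0.001$ seconds (1000 megaflops) and is timed once on a clock with a
hypothetical resolution of $0.01$ seconds. The measurement would then read
either $0$ seconds, giving a reported rate of zero, or $0.01$ seconds, giving
$10^{6}/(10^{6} \times 0.01) = 100$ megaflops. Choosing a minimum time much
longer than the clock resolution reduces both distortions.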
For more information on the timing programs and how to modify the input files, please refer to LAPACK Working Note 41~\cite{WN41}. % see Section~\ref{moretiming}. If you do not wish to run each of the timings individually, you can go to \texttt{LAPACK}, edit the definition \texttt{lapack\_timing} in the file \texttt{Makefile} to specify the data types desired, and type \texttt{make lapack\_timing}. This will compile and run the timings for the linear equation routines and the eigensystem routines (see Sections~\ref{timelin} and ~\ref{timeeig}). %If you are installing LAPACK on a Silicon Graphics machine, you must %modify the definition of \texttt{timing} to be %\begin{verbatim} %timing: % ( cd TIMING; $(MAKE) -f Makefile.sgi ) %\end{verbatim} If you encounter failures in any phase of the timing process, please feel free to contact the authors as directed in Section~\ref{sendresults}. Tell us the type of machine on which the tests were run, the version of the operating system, the compiler and compiler options that were used, and details of the BLAS library or libraries that you used. You should also include a copy of the output file in which the failure occurs. Please note that the BLAS timing runs will still need to be run as instructed in ~\ref{timeblas}. \subsubsection{Timing the Linear Equations Routines}\label{timelin} The linear equation timing program is found in \texttt{LAPACK/TIMING/LIN} and the input files are in \texttt{LAPACK/TIMING}. Three input files are provided in each data type for timing the linear equation routines, one for square matrices, one for band matrices, and one for rectangular matrices. The small data sets for the REAL version are \texttt{stime\_small.in}, \texttt{sband\_small.in}, and \texttt{stime2\_small.in}, respectively, and the large data sets are \texttt{stime\_large.in}, \texttt{sband\_large.in}, and \texttt{stime2\_large.in}. 
The timing program for the least squares routines uses special instrumented versions of the LAPACK routines to time individual sections of the code. The first step in compiling the timing program is therefore to make a library of the instrumented routines. \begin{itemize} \item[a)] \begin{sloppypar} To make a library of the instrumented LAPACK routines, first go to \texttt{LAPACK/TIMING/LIN/LINSRC} and type \texttt{make} followed by the data types desired, as in the examples of Section~\ref{toplevelmakefile}. The library of instrumented code is created in \texttt{LAPACK/TIMING/LIN/linsrc.a}. \end{sloppypar} \item[b)] To make the linear equation timing programs, go to \texttt{LAPACK/TIMING/LIN} and type \texttt{make} followed by the data types desired, as in the examples in Section~\ref{toplevelmakefile}. The executable files are called \texttt{xlintims}, \texttt{xlintimc}, \texttt{xlintimd}, and \texttt{xlintimz} and are created in \texttt{LAPACK/TIMING}. \item[c)] Go to \texttt{LAPACK/TIMING} and make any necessary modifications to the input files. You may need to set the minimum time a subroutine will be timed to a positive value, or to restrict the size of the tests if you are using a computer with performance in between that of a workstation and that of a supercomputer. The computational requirements can be cut in half by using only one value of LDA. If it is necessary to also reduce the matrix sizes or the values of the blocksize, corresponding changes should be made to the BLAS input files (see Section~\ref{timeblas}). \item[d)] Run the programs for each data type you are using. 
For the REAL version, the commands for the small data sets are \begin{list}{}{} \item{} \texttt{xlintims < stime\_small.in > stime\_small.out } \item{} \texttt{xlintims < sband\_small.in > sband\_small.out } \item{} \texttt{xlintims < stime2\_small.in > stime2\_small.out } \end{list} or the commands for the large data sets are \begin{list}{}{} \item{} \texttt{xlintims < stime\_large.in > stime\_large.out } \item{} \texttt{xlintims < sband\_large.in > sband\_large.out } \item{} \texttt{xlintims < stime2\_large.in > stime2\_large.out } \end{list} \noindent Similar commands should be used for the other data types. \end{itemize} \subsubsection{Timing the BLAS}\label{timeblas} The linear equation timing program is also used to time the BLAS. Three input files are provided in each data type for timing the Level 2 and 3 BLAS. These input files time the BLAS using the matrix shapes encountered in the LAPACK routines, and we will use the results to analyze the performance of the LAPACK routines. For the REAL version, the small data files are \texttt{sblasa\_small.in}, \texttt{sblasb\_small.in}, and \texttt{sblasc\_small.in} and the large data files are \texttt{sblasa\_large.in}, \texttt{sblasb\_large.in}, and \texttt{sblasc\_large.in}. There are three sets of inputs because there are three parameters in the Level 3 BLAS, M, N, and K, and in most applications one of these parameters is small (on the order of the blocksize) while the other two are large (on the order of the matrix size). In \texttt{sblasa\_small.in}, M and N are large but K is small, while in \texttt{sblasb\_small.in} the small parameter is M, and in \texttt{sblasc\_small.in} the small parameter is N. The Level 2 BLAS are timed only in the first data set, where K is also used as the bandwidth for the banded routines. \begin{itemize} \item[a)] Go to \texttt{LAPACK/TIMING} and make any necessary modifications to the input files. 
You may need to set the minimum time a subroutine will be timed to a positive value. If you modified the values of N or NB in Section~\ref{timelin}, set M, N, and K accordingly. The large parameters among M, N, and K should be the same as the matrix sizes used in timing the linear equation routines, and the small parameter should be the same as the blocksizes used in timing the linear equation routines. If necessary, the large data set can be simplified by using only one value of LDA. \item[b)] Run the programs for each data type you are using. For the REAL version, the commands for the small data sets are \begin{list}{}{} \item{} \texttt{xlintims < sblasa\_small.in > sblasa\_small.out } \item{} \texttt{xlintims < sblasb\_small.in > sblasb\_small.out } \item{} \texttt{xlintims < sblasc\_small.in > sblasc\_small.out } \end{list} or the commands for the large data sets are \begin{list}{}{} \item{} \texttt{xlintims < sblasa\_large.in > sblasa\_large.out } \item{} \texttt{xlintims < sblasb\_large.in > sblasb\_large.out } \item{} \texttt{xlintims < sblasc\_large.in > sblasc\_large.out } \end{list} \noindent Similar commands should be used for the other data types. \end{itemize} \subsubsection{Timing the Eigensystem Routines}\label{timeeig} The eigensystem timing program is found in \texttt{LAPACK/TIMING/EIG} and the input files are in \texttt{LAPACK/TIMING}. Four input files are provided in each data type for timing the eigensystem routines, one for the generalized nonsymmetric eigenvalue problem, one for the nonsymmetric eigenvalue problem, one for the symmetric and generalized symmetric eigenvalue problem, and one for the singular value decomposition. For the REAL version, the small data sets are called \texttt{sgeptim\_small.in}, \texttt{sneptim\_small.in}, \texttt{sseptim\_small.in}, and \texttt{ssvdtim\_small.in}, respectively. 
The large data sets are called \texttt{sgeptim\_large.in},
\texttt{sneptim\_large.in}, \texttt{sseptim\_large.in}, and
\texttt{ssvdtim\_large.in}.
Each of the four input files supplies a different set of parameters,
and the format of the input is indicated by a 3-character code
on the first line.
The timing program for eigenvalue/singular value routines accumulates
the operation count as the routines are executing using special
instrumented versions of the LAPACK routines. The first step in
compiling the timing program is therefore to make a library of the
instrumented routines.
\begin{itemize}
\item[a)]
\begin{sloppypar}
To make a library of the instrumented LAPACK routines, first
go to \texttt{LAPACK/TIMING/EIG/EIGSRC} and type \texttt{make} followed
by the data types desired, as in the examples of Section~\ref{toplevelmakefile}.
The library of instrumented code is created in
\texttt{LAPACK/TIMING/EIG/eigsrc.a}.
\end{sloppypar}
\item[b)]
To make the eigensystem timing programs,
go to \texttt{LAPACK/TIMING/EIG} and type \texttt{make} followed by the data types
desired, as in the examples of Section~\ref{toplevelmakefile}.
The executable files are called
\texttt{xeigtims}, \texttt{xeigtimc}, \texttt{xeigtimd}, and \texttt{xeigtimz}
and are created in \texttt{LAPACK/TIMING}.
\item[c)]
Go to \texttt{LAPACK/TIMING} and
make any necessary modifications to the input files.
You may need to set the minimum time a subroutine will
be timed to a positive value, or to restrict the number of tests
if you are using a computer with performance in between that of a
workstation and that of a supercomputer.
Instead of decreasing the matrix dimensions to reduce the time,
it would be better to reduce the number of matrix types to be timed,
since the performance varies more with the matrix size than with the
type. For example, for the nonsymmetric eigenvalue routines,
you could use only one matrix of type 4 instead of four matrices of
types 1, 3, 4, and 6.
Refer to LAPACK Working Note 41~\cite{WN41} for further details. % See Section~\ref{moretiming} for further details. \item[d)] Run the programs for each data type you are using. For the REAL version, the commands for the small data sets are \begin{list}{}{} \item{} \texttt{xeigtims < sgeptim\_small.in > sgeptim\_small.out } \item{} \texttt{xeigtims < sneptim\_small.in > sneptim\_small.out } \item{} \texttt{xeigtims < sseptim\_small.in > sseptim\_small.out } \item{} \texttt{xeigtims < ssvdtim\_small.in > ssvdtim\_small.out } \end{list} or the commands for the large data sets are \begin{list}{}{} \item{} \texttt{xeigtims < sgeptim\_large.in > sgeptim\_large.out } \item{} \texttt{xeigtims < sneptim\_large.in > sneptim\_large.out } \item{} \texttt{xeigtims < sseptim\_large.in > sseptim\_large.out } \item{} \texttt{xeigtims < ssvdtim\_large.in > ssvdtim\_large.out } \end{list} \noindent Similar commands should be used for the other data types. \end{itemize} \subsection{Send the Results to Tennessee}\label{sendresults} Congratulations! You have now finished installing, testing, and timing LAPACK. If you encountered failures in any phase of the testing or timing process, please consult our \texttt{release\_notes} file on netlib. \begin{quote} \url{http://www.netlib.org/lapack/release\_notes} \end{quote} This file contains machine-dependent installation clues which hopefully will alleviate your difficulties or at least let you know that other users have had similar difficulties on that machine. If there is not an entry for your machine or the suggestions do not fix your problem, please feel free to contact the authors at \begin{list}{}{} \item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}. \end{list} Tell us the type of machine on which the tests were run, the version of the operating system, the compiler and compiler options that were used, and details of the BLAS library or libraries that you used. 
You should also include a copy of the output file in which the failure occurs.
We would like to keep our \texttt{release\_notes} file as up-to-date as possible.
Therefore, if you do not see an entry for your machine, please contact us
with your testing results.
Comments and suggestions are also welcome.
We encourage you to make the LAPACK library available to your
users and provide us with feedback from their experiences.
%This release of LAPACK is not guaranteed to be compatible
%with any previous test release.

\subsection{Get support}\label{getsupport}
First, take a look at the complete installation manual in the LAPACK Working Note 41~\cite{WN41}.
If you still cannot solve your problem, you have two options:
\begin{itemize}
\item post to the LAPACK forum
\begin{quote}
\url{http://icl.cs.utk.edu/lapack-forum}
\end{quote}
\item or send an email to the LAPACK mailing list:
\begin{list}{}{}
\item \href{mailto:lapack@cs.utk.edu}{\texttt{lapack@cs.utk.edu}}.
\end{list}
\end{itemize}

\section*{Acknowledgments}
Ed Anderson and Susan Blackford contributed to previous versions of this report.

\appendix
\chapter{Caveats}\label{appendixd}
In this appendix we list a few of the machine-specific difficulties we
have
encountered in our own experience with LAPACK. A more detailed list
of machine-dependent problems, bugs, and compiler errors encountered
in the LAPACK installation process is maintained
on \emph{netlib}.
\begin{quote}
\url{http://www.netlib.org/lapack/release\_notes}
\end{quote}
We assume the user has installed the machine-specific routines
correctly and that the Level 1, 2 and 3 BLAS test programs have run
successfully, so we do not list any warnings associated with those
routines.

\section{\texttt{LAPACK/make.inc}}
All machine-specific
parameters are specified in the file \texttt{LAPACK/make.inc}.
The first line of this \texttt{make.inc} file is:
\begin{quote}
SHELL = /bin/sh
\end{quote}
and will need to be modified to \texttt{SHELL = /sbin/sh} if you are
installing LAPACK on an SGI architecture.

\section{ETIME}

On HPPA architectures,
the compiler and linker flag \texttt{+U77} should be included to access
the function \texttt{ETIME}.

\section{ILAENV and IEEE-754 compliance}

%By default, ILAENV (\texttt{LAPACK/SRC/ilaenv.f}) assumes an IEEE and IEEE-754
%compliant architecture, and thus sets (\texttt{ILAENV=1}) for (\texttt{ISPEC=10})
%and (\texttt{ISPEC=11}) settings in ILAENV.
%
%If you are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
%as this test inside ILAENV will crash!

As some new routines in LAPACK rely on IEEE-754 compliance,
two settings (\texttt{ISPEC=10} and \texttt{ISPEC=11}) have been added to ILAENV
(\texttt{LAPACK/SRC/ilaenv.f}) to denote IEEE-754 compliance for NaN and
infinity arithmetic, respectively. By default, ILAENV assumes an IEEE
machine, and does a test for IEEE-754 compliance. \textbf{NOTE: If you
are installing LAPACK on a non-IEEE machine, you MUST modify ILAENV,
as this test inside ILAENV will crash!}

Thus, for non-IEEE machines, the user must hard-code
\texttt{ILAENV=0} for \texttt{ISPEC=10} and \texttt{ISPEC=11}
in the version of \texttt{LAPACK/SRC/ilaenv.f} that will be placed in
the library. For further details, refer to section~\ref{testieee}.

Be aware
that some IEEE compilers by default do not enforce IEEE-754 compliance, and
a compiler flag must be explicitly set by the user.
On SGIs, for example, you must set the \texttt{-OPT:IEEE\_NaN\_inf=ON} compiler
flag to enable IEEE-754 compliance.

Lastly, the test inside ILAENV that detects IEEE-754 compliance
will raise IEEE exceptions for ``Divide by Zero'' and ``Invalid
Operation''. Thus, if the user is installing on a machine that
issues IEEE exception warning messages (like a Sun SPARCstation),
the user can disregard these messages.
To avoid these messages, the user
can hard-code the values inside ILAENV as explained in section~\ref{testieee}.

\section{Lack of \texttt{/tmp} space}

If \texttt{/tmp} space is small (i.e., less than approximately 16 MB) on your
architecture, you may run out of space
when compiling. There are a few possible solutions to this problem.
\begin{enumerate}
\item You can ask your system administrator to increase the size of the
\texttt{/tmp} partition.
\item You can change the environment variable \texttt{TMPDIR} to point to
your home directory for temporary space. For example,
\begin{quote}
\texttt{setenv TMPDIR /home/userid/}
\end{quote}
where \texttt{/home/userid/} is the user's home directory.
\item If your archive command has an \texttt{l} option, you can change the
archive command to \texttt{ar crl} so that the
archive command will only place temporary files in the current working
directory rather than in the default temporary directory \texttt{/tmp}.
\end{enumerate}

\section{BLAS}

If you suspect a BLAS-related problem and you are linking
with an optimized version of the BLAS, we strongly suggest
that, as a first step, you link to the Fortran~77 version of
the suspected BLAS routine and see whether the error disappears.

We have included test programs for the Level 1 BLAS.
Users should therefore beware of a common problem in machine-specific
implementations of xNRM2,
the function to compute the 2-norm of a vector.
The Fortran version of xNRM2 avoids underflow or overflow
by scaling intermediate results, but some library versions of xNRM2
are not so careful about scaling.
If xNRM2 is implemented without scaling intermediate results, some of
the LAPACK test ratios may be unusually high, or
a floating point exception may occur in the problems scaled near
underflow or overflow.
The solution to these problems is to link the Fortran version of
xNRM2 with the test program.
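
The effect of scaling intermediate results can be illustrated with a
small sketch. The Python code below is only an illustration under
stated assumptions: the function names \texttt{naive\_nrm2} and
\texttt{scaled\_nrm2} are hypothetical, and the two-pass scaling by the
largest magnitude is a simplification, not the algorithm used by the
Fortran version of xNRM2.
\begin{verbatim}
import math


def naive_nrm2(x):
    """Unscaled 2-norm: the sum of squares may overflow or underflow."""
    total = 0.0
    for v in x:
        total += v * v
    return math.sqrt(total)


def scaled_nrm2(x):
    """2-norm computed after dividing every entry by the largest magnitude.

    Illustrative two-pass variant (an assumption of this sketch), not
    the reference xNRM2 algorithm.
    """
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0 or math.isinf(scale):
        return scale
    total = 0.0
    for v in x:
        r = v / scale          # |r| <= 1, so r * r cannot overflow
        total += r * r
    return scale * math.sqrt(total)


big = [1.0e200, 1.0e200]       # squares overflow in double precision
tiny = [1.0e-200, 1.0e-200]    # squares underflow to zero

print(naive_nrm2(big), scaled_nrm2(big))
print(naive_nrm2(tiny), scaled_nrm2(tiny))
\end{verbatim}
In IEEE double precision the unscaled version returns infinity for the
first vector and zero for the second, whereas the scaled version
returns approximately $1.41 \times 10^{200}$ and
$1.41 \times 10^{-200}$.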
\emph{On some CRAY architectures, the Fortran~77 version of xNRM2 should
be used.}

\section{Optimization}

If a large number of test failures occur for a specific matrix type
or operation, there may be an optimization problem with your
compiler. In that case, try reducing the level of optimization or
eliminating optimization entirely for those routines, and then rerun
the tests to see whether the failures disappear.

%LAPACK is written in Fortran 77. Prospective users with only a
%Fortran 66 compiler will not be able to use this package.

\section{Compiling testing/timing drivers}

The testing and timing main programs (xCHKAA, xCHKEE, xTIMAA, and
xTIMEE) allocate a large amount of storage for local variables.
Therefore, it is vitally important that the user know whether the
compiler by default allocates local variables statically or on the
stack. With compilers that place local variables on the stack, it is
not uncommon for the testing or timing programs to cause a stack
overflow at runtime. The user then has two options: increase the
stack size, or force all local variables to be allocated statically.

On HPPA architectures,
the compiler and linker flag \texttt{-K} should be used when compiling
these testing and timing main programs to avoid such a stack overflow.
That is, set \texttt{FFLAGS\_DRV = -K} in the \texttt{LAPACK/make.inc} file.

For similar reasons,
on SGI architectures, the compiler and linker flag \texttt{-static}
should be used. That is, set \texttt{FFLAGS\_DRV = -static} in the
\texttt{LAPACK/make.inc} file.

\section{IEEE arithmetic}

Some of our test matrices are scaled near overflow or underflow,
but on the Crays, problems with the arithmetic near overflow and
underflow forced us to scale by only the square root of overflow
and underflow.
The LAPACK auxiliary routine SLABAD (or DLABAD) is called to
take the square root of underflow and overflow in cases where it
could cause difficulties.
We assume we are on a Cray if $ \log_{10} (\mathrm{overflow})$
is greater than 2000
and take the square root of underflow and overflow in this case.
The test in SLABAD is as follows:
\begin{verbatim}
      IF( LOG10( LARGE ).GT.2000. ) THEN
         SMALL = SQRT( SMALL )
         LARGE = SQRT( LARGE )
      END IF
\end{verbatim}
Users of other machines with similar restrictions on the effective
range of usable numbers may have to modify this test so that the
square roots are done on their machine as well.
\emph{Usually on HPPA architectures, a similar restriction in SLABAD
should be enforced for all testing involving complex arithmetic.}
SLABAD is located in \texttt{LAPACK/SRC}.

For machines which have a narrow exponent range or lack gradual
underflow (DEC VAXes for example), it is not uncommon to experience
failures in sec.out and/or dec.out with SLAQTR/DLAQTR or DTRSYL.
The failures in SLAQTR/DLAQTR and DTRSYL
occur with test problems which are very badly scaled when the norm of
the solution is very close to the underflow
threshold (or even underflows to zero). We believe that these failures
could probably be avoided by an even greater degree of care in scaling,
but we did not want to delay the release of LAPACK any further. These
tests pass successfully on most other machines.
An example failure in dec.out
on a MicroVAX II looks like the following:
\begin{verbatim}
 Tests of the Nonsymmetric eigenproblem condition estimation routines
 DLALN2, DLASY2, DLANV2, DLAEXC, DTRSYL, DTREXC, DTRSNA, DTRSEN, DLAQTR

 Relative machine precision (EPS) = 0.277556D-16
 Safe minimum (SFMIN) = 0.587747D-38

 Routines pass computational tests if test ratio is less than 20.00

 DEC routines passed the tests of the error exits ( 35 tests done)
 Error in DTRSYL: RMAX = 0.155D+07
 LMAX = 5323 NINFO= 1600 KNT= 27648
 Error in DLAQTR: RMAX = 0.344D+04
 LMAX = 15792 NINFO= 26720 KNT= 45000
\end{verbatim}

\section{Timing programs}

In the eigensystem timing program, calls are made to the LINPACK
and EISPACK equivalents of the LAPACK routines to allow a direct
comparison of performance measures.
In some cases we have increased the minimum number of
iterations in the LINPACK and EISPACK routines to allow
them to converge for our test problems, but
even this may not be enough.
One goal of the LAPACK project is to improve the convergence
properties of these routines, so error messages in the output
file indicating that a LINPACK or EISPACK routine did not
converge should not be regarded with alarm.

In the eigensystem timing program, we have equivalenced some work
arrays and then passed them to a subroutine, where both arrays are
modified. This is a violation of the Fortran~77 standard, which
says ``if a subprogram reference causes a dummy argument in the
referenced subprogram to become associated with another dummy
argument in the referenced subprogram, neither dummy argument may
become defined during execution of the subprogram.''\footnote{ANSI X3.9-1978, sec. 15.9.3.6}
If this causes any difficulties, the equivalence
can be commented out as explained in the comments for the main
eigensystem timing programs.

%\section*{MACHINE-SPECIFIC DIFFICULTIES}
%Some IBM compilers do not recognize DBLE as a generic function as used
%in LAPACK.
%The software tools we use to convert from single precision
%to double precision convert REAL(C) and AIMAG(C), where C is COMPLEX,
%to DBLE(Z) and DIMAG(Z), where Z is COMPLEX*16, but
%IBM compilers use DREAL(Z) and DIMAG(Z) to take the real and
%imaginary parts of a double complex number.
%IBM users can fix this problem by changing DBLE to DREAL when the
%argument of DBLE is COMPLEX*16.
%
%IBM compilers do not permit the data type COMPLEX*16 in a FUNCTION
%subprogram definition. The data type on the first line of the
%function subprogram must be changed from COMPLEX*16 to DOUBLE COMPLEX
%for the following functions:
%
%\begin{tabbing}
%\dent ZLATMOO \= from the test matrix generator library \kill
%\dent ZBEG \> from the Level 2 BLAS test program \\
%\dent ZBEG \> from the Level 3 BLAS test program \\
%\dent ZLADIV \> from the LAPACK library \\
%\dent ZLARND \> from the test matrix generator library \\
%\dent ZLATM2 \> from the test matrix generator library \\
%\dent ZLATM3 \> from the test matrix generator library
%\end{tabbing}
%The functions ZDOTC and ZDOTU from the Level 1 BLAS are already
%declared DOUBLE COMPLEX. If that doesn't work, try the declaration
%COMPLEX FUNCTION*16.

\newpage
\addcontentsline{toc}{section}{Bibliography}

\begin{thebibliography}{9}

\bibitem{LUG}
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra,
J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
S. Ostrouchov, and D. Sorensen,
\textit{LAPACK Users' Guide}, Second Edition,
{SIAM}, Philadelphia, PA, 1995.

\bibitem{WN16}
E. Anderson and J. Dongarra,
\textit{LAPACK Working Note 16:
Results from the Initial Release of LAPACK},
University of Tennessee, CS-89-89, November 1989.

\bibitem{WN41}
E. Anderson, J. Dongarra, and S. Ostrouchov,
\textit{LAPACK Working Note 41:
Installation Guide for LAPACK},
University of Tennessee, CS-92-151, February 1992 (revised June 1999).

\bibitem{WN5}
C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum,
S. Hammarling, and D.
Sorensen,
\textit{LAPACK Working Note \#5: Provisional Contents},
Argonne National Laboratory, ANL-88-38, September 1988.

\bibitem{WN13}
Z. Bai, J. Demmel, and A. McKenney,
\textit{LAPACK Working Note \#13: On the Conditioning of the Nonsymmetric
Eigenvalue Problem: Theory and Software},
University of Tennessee, CS-89-86, October 1989.

\bibitem{XBLAS}
X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar,
W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung,
and D. J. Yoo,
``Design, implementation and testing of extended and mixed precision BLAS,''
\textit{ACM Trans. Math. Soft.}, 28, 2:152--205, June 2002.

\bibitem{BLAS3}
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
``A Set of Level 3 Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 16, 1:1--17, March 1990.
%Argonne National Laboratory, ANL-MCS-P88-1, August 1988.

\bibitem{BLAS3-test}
J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling,
``A Set of Level 3 Basic Linear Algebra Subprograms:
Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 16, 1:18--28, March 1990.
%Argonne National Laboratory, ANL-MCS-TM-119, June 1988.

\bibitem{BLAS2}
J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
``An Extended Set of Fortran Basic Linear Algebra Subprograms,''
\textit{ACM Trans. Math. Soft.}, 14, 1:1--17, March 1988.

\bibitem{BLAS2-test}
J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson,
``An Extended Set of Fortran Basic Linear Algebra Subprograms:
Model Implementation and Test Programs,''
\textit{ACM Trans. Math. Soft.}, 14, 1:18--32, March 1988.

\bibitem{BLAS1}
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh,
``Basic Linear Algebra Subprograms for Fortran Usage,''
\textit{ACM Trans. Math. Soft.}, 5, 3:308--323, September 1979.

\end{thebibliography}

\end{document}