Wanted: Fast FEA Solvers...

Submitted by Ajit R. Jadhav on Fri, 05/14/2010 - 14:18

Summary:

I am thinking of informally conducting a specific case-study concerning the FEA solvers. The reference problem is a very simple but typical problem from stress analysis, leading of course to the linear systems: Ax = b and Ax = Lx (with L denoting the eigenvalue, lambda).

I seek advice as to what software libraries currently available in the public domain would be best to use---the ones that would be fastest in terms of execution time for the reference problem.

I have a personal and longer-term research interest in certain issues related to solver technologies.

Suggestions and comments are welcome!

(1.) The Reference Problem:

(1.1) Consider a homogeneous thin rectangular plate made of mild steel (MS), say of the size 200 mm X 100 mm, with a thickness of, say, 1 mm.

For the initial requirement, the plate carries no hole, though a small 60 mm dia. hole at the center might be introduced later on, during a separate phase of this study.

(1.2) For static analysis, the plate is loaded with a uniform traction acting on the two shorter sides of the plate, whereas the longer sides are kept free. For modal frequency analysis, the plate is considered clamped on all the four sides.

(1.3) Simple, standard finite elements are to be used: (a) CST and LST for the static analysis, and (b) DKT flat-shell element for the modal analysis.

(1.4) The domain is to be meshed using high-quality irregular triangles, the smallest allowed angle being ~34 degrees as in Shewchuk's Triangle library or Niceno's EasyMesh.

To obtain a medium-fine mesh, the triangle side may be restricted to < 5 mm. This choice leads to about 2,500 triangles, 1,200 corner nodes, and 4,000 edges---i.e. about 1,200 nodes for CST analysis and 5,200 nodes for LST analysis.

However, if the upper bound on the triangle side is halved (< 2.5 mm), then we obtain a very fine mesh of about 10,000 triangles, 5,000 corner nodes, and 15,000 edges---i.e. about 5,000 nodes for CST and 20,000 nodes for LST.

Note that these numbers refer to the geometry nodes. In the FE model, each such node would carry several DOFs.
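The node counts above can be turned into rough system sizes. The following is a minimal sketch, assuming 2 translational DOFs per node for the plane-stress CST/LST models (that figure is an assumption, not stated in the post). It also shows what a full dense matrix of doubles would cost, for comparison with the 2 GB of RAM assumed in (2.4).

```python
# Rough system sizes for the two meshes described above.
# Assumption (not from the post): 2 translational DOFs per node for CST/LST.
DOFS_PER_NODE = 2
BYTES_PER_DOUBLE = 8

meshes = {
    "medium (< 5 mm)": {"corner_nodes": 1200, "edges": 4000},
    "fine (< 2.5 mm)": {"corner_nodes": 5000, "edges": 15000},
}

def system_size(corner_nodes, edges, element):
    """Number of equations before boundary conditions are applied."""
    # CST has nodes at the corners only; LST adds one mid-side node per edge.
    nodes = corner_nodes if element == "CST" else corner_nodes + edges
    return nodes * DOFS_PER_NODE

def dense_bytes(n):
    """Memory needed for a full n-by-n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE

for name, mesh in meshes.items():
    for element in ("CST", "LST"):
        n = system_size(mesh["corner_nodes"], mesh["edges"], element)
        gib = dense_bytes(n) / 2**30
        print(f"{name:16s} {element}: n = {n:6d}, dense K = {gib:6.2f} GiB")
```

Under these assumptions the fine LST model has 40,000 equations, and a dense K would need about 12 GiB, which is why sparse or banded storage is unavoidable on the target machine.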

(1.5) The linear systems resulting after the FE-discretization are to be solved for both static and modal analyses.

(2.) The Software/Hardware to be Used:

(2.1) The linear system is to be solved using C/C++ callable and fairly well-tested open-source libraries (libraries of the kind: LAPACK, ARPACK, Taucs, etc.).

(2.2) The library itself might have been written in FORTRAN; the only requirement is that compiled binaries and C/C++ wrappers should be readily available.

(2.3) Dependencies on open-source libraries/platforms such as GotoBLAS, Boost, MTL, etc. are OK.

(2.4) Assume this (lower-end) software-hardware platform: A single 32-bit desktop PC, Intel Core2 Duo @ ~3 GHz main clock, 1 MB L2 cache, 2 GB of RAM. Assume the OS to be Windows 2K/XP.

(2.5) The compiler of preference is VC++ 6. However, other free compilers like VC++ Express Edition 2008 can be considered. Also, I am open to using GCC or other compilers, with or without their CMake, MinGW requirements etc.

(2.6) Sequential execution is assumed. No parallel processing, whether using shared memory, clusters (MPI), or GPUs. For the same reason, it's OK if the solver library is not parallel processing-enabled, and does not take advantage of an additional core. Thus, for this study, it is OK even if the total CPU usage on a dual-core machine doesn't exceed 50%.

(2.7) All the solver operations are expected to occur in-core (not out-of-core).

(2.8) Assume that all mathematical operations would be performed in double precision (8 bytes).

(3.) What Is Being Sought:

(3.1) Considering the above requirements, please suggest the libraries and methods that might provide the highest performance (the least execution time) for the following categories of solvers:

-- direct solver for static analysis (Ax = b)

-- iterative solver for static analysis (Ax = b)

-- direct solver for eigenvalue computations (Ax = Lx)

-- iterative solver for eigenvalue computations (Ax = Lx)

For iterative solvers, assume the usual kind of convergence requirements (error norms).
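To make the "iterative solver with the usual convergence requirement" category concrete, here is a minimal pure-Python sketch of conjugate gradients with a diagonal (Jacobi) preconditioner, stopping on the relative residual norm. The row-dictionary storage, the function names, and the 1-D Laplacian test matrix are all illustrative assumptions; none of the libraries named in this post is being called.

```python
import math

def matvec(rows, x):
    """y = A x, with A stored as a list of {column: value} dicts, one per row."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg_jacobi(rows, b, rtol=1e-10, max_iter=1000):
    """Conjugate gradients with a Jacobi preconditioner, for SPD A."""
    n = len(b)
    inv_diag = [1.0 / rows[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Ap = matvec(rows, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        # Convergence test on the relative residual norm ||r|| / ||b||.
        if math.sqrt(dot(r, r)) <= rtol * b_norm:
            return x, k
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    raise RuntimeError("PCG did not converge within max_iter iterations")

# Illustrative SPD test matrix: tridiagonal (-1, 2, -1), not an FE stiffness matrix.
n = 50
rows = [{} for _ in range(n)]
for i in range(n):
    rows[i][i] = 2.0
    if i > 0:
        rows[i][i - 1] = -1.0
    if i < n - 1:
        rows[i][i + 1] = -1.0

x_true = [1.0] * n
b = matvec(rows, x_true)
x, iterations = pcg_jacobi(rows, b)
print(iterations, max(abs(xi - 1.0) for xi in x))
```

The stopping rule here is one common choice; a comparison between libraries would need the same norm and tolerance applied to each of them.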

(3.2) The total execution time is to be measured (a) from the tick that the reading of all the disk files containing all the input matrices to RAM is complete, (b) to the tick that the solution is first fully ready in RAM, waiting to be written to the output disk files.
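The timing rule in (3.2) can be sketched as follows: the clock starts only once the inputs are in RAM and stops as soon as the solution is in RAM, so no disk I/O is included. The helper names and the trivial diagonal "solver" are hypothetical stand-ins, used only so that the sketch runs on its own.

```python
import time

def timed_solve(solve, *inputs):
    """Run solve(*inputs) on data already in RAM; return (solution, seconds)."""
    start = time.perf_counter()              # tick (a): inputs already loaded
    solution = solve(*inputs)
    elapsed = time.perf_counter() - start    # tick (b): solution ready in RAM
    return solution, elapsed

def best_of(solve, inputs, repeats=5):
    """Repeat the measurement and keep the shortest time, to damp OS noise."""
    best_time = float("inf")
    solution = None
    for _ in range(repeats):
        solution, elapsed = timed_solve(solve, *inputs)
        best_time = min(best_time, elapsed)
    return solution, best_time

def diagonal_solve(diag, rhs):
    """Hypothetical stand-in solver: solves D x = rhs for a diagonal D."""
    return [r / d for d, r in zip(diag, rhs)]

solution, seconds = best_of(diagonal_solve, ([2.0, 4.0], [2.0, 8.0]))
print(solution, f"{seconds:.6f} s")
```

Taking the best of several runs is a design choice, not something the post asks for; reporting the mean or median would be equally defensible as long as it is applied to every solver alike.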

(3.3) Please provide any additional information like the assumption of a specific pre-conditioner, the reason why you recommend a particular algorithm for this type of problem, etc.

(3.4) Not very important right now, but any side suggestions you might have for nonsymmetric A matrices would also be welcome.

(3.5) A general point of reference for this query is this URL:

http://www.netlib.org/utk/people/JackDongarra/la-sw.html

(4.) Why This Study:

The purpose is something like this. I have some preliminary ideas concerning solvers.

I would like to test my ideas against the available state of the art/cutting-edge solver implementations, in the context of the above kind of applications---viz. that the K matrix wouldn't be tridiagonal but would be banded SPD, having a topology implied by the above category of problems.

It's easily possible that my ideas may not work out. I wish to put them to the testing ground anyway. (I really am just at a very preliminary stage.)
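For readers unfamiliar with why the banded SPD structure matters, here is a minimal sketch of a Cholesky factorization that only touches entries inside the band. The matrix is kept in a full list-of-lists for readability (a real banded code would store only the band), and the pentadiagonal test matrix is an illustrative assumption, not an FE stiffness matrix.

```python
import math

def banded_cholesky(A, bw):
    """Lower-triangular L with A = L L^T, for SPD A with half-bandwidth bw."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        lo = max(0, j - bw)                     # first non-zero column in row j
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(lo, j))
        L[j][j] = math.sqrt(s)
        # Only rows within the band below the diagonal can be non-zero.
        for i in range(j + 1, min(n, j + bw + 1)):
            lo_i = max(0, i - bw)
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(lo_i, j))
            L[i][j] = s / L[j][j]
    return L

def banded_solve(L, bw, b):
    """Solve L L^T x = b by forward and back substitution within the band."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        lo = max(0, i - bw)
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(lo, i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        hi = min(n, i + bw + 1)
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, hi))) / L[i][i]
    return x

# Illustrative test: strictly diagonally dominant pentadiagonal matrix (bw = 2).
n, bw = 8, 2
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 5.0
    for off in (1, 2):
        if i + off < n:
            A[i][i + off] = A[i + off][i] = -1.0

x_true = [1.0] * n
b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
x = banded_solve(banded_cholesky(A, bw), bw, b)
print(max(abs(xi - 1.0) for xi in x))
```

The factor stays inside the band, so the work grows roughly as n times the square of the bandwidth instead of as n cubed for a dense factorization.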

(5.) Your Suggestions/References:

Well thought-out comments/suggestions w.r.t the point (3.1) are sought.

Since I am not affiliated with any institution having e-Journals access, in case you provide links to research papers, I would greatly appreciate it if you could also send e-copies to me by email: aj175tp[ at ]yahoo[ dot ]co[ dot ]in.

Thanks in advance!

--Ajit

PS: Also posted at my blog.

[E&OE]

Mario Juha

Wanted: Fast FEA Solvers...

I do not know what you mean by FEA solvers, but anyway, I can recommend the SuperLU library for the direct solution of large sparse systems. Also, you could try iterative solvers; there are many available, especially ones based on the Preconditioned Conjugate Gradient method.

I have written a library in C++ (compiled using GNU compilers for GNU/Linux) that uses LAPACK, ATLAS and SuperLU to manipulate general matrices, square matrices and sparse matrices. Unfortunately, the version on my homepage is out of date, and I strongly recommend against using it for your applications.

cordially,

Mario

Sat, 05/15/2010 - 14:49 Permalink

Teng zhang

Another useful sparse web site

Hi Ajit,

Here is another website with a collection of sparse solvers, mainly developed by Prof. Davis at UFL. The code is written in C.

http://www.cise.ufl.edu/research/sparse/

The algorithms have been implemented in MATLAB as a default sparse solver.

I am currently seeking a fast sparse solver too; thanks for posting this discussion.

Teng Zhang

Sat, 05/15/2010 - 15:17 Permalink

Alejandro Orti…

Re: Wanted: Fast FEA solvers

Dear Ajit,

I think (taking a global view) MUMPS is the best sparse solver available. Compared with TAUCS, MUMPS can handle more types of matrices: symmetric positive definite, symmetric indefinite and unsymmetric. The only problem is that it is written in Fortran. However, you can make it work from C. Just look at this site. The advantage with respect to UMFPACK is that MUMPS supports METIS. If you want to pursue this research I recommend this site also. It has many interesting things that can help you.

Good luck!

Sat, 05/15/2010 - 16:52 Permalink

Ajit R. Jadhav

Replies to comments on Fast FEA Solvers

Mario, Teng, Alejandro,

0. Thank you all for replying and providing useful pointers.

1. Mario, I know the term "FEA solver" is awkward. But if one mentions only Ax = b and Ax = Lx, then it becomes too general a statement.

What I wanted to highlight was that the A matrix here (i.e. the K and/or M matrices) would be square, sparse, generally banded, and also symmetric positive definite. Also important is the kind of sparsity pattern, or the topological interrelation existing between the non-zero elements. The method of generating matrices directly determines matrix characteristics, including topology.

The matrix characteristics, in turn, decide the speed of solution. There are solvers that could conceivably be faster for other types of matrices such as the dense, indefinite, or even non-square ones---the matrices arising in domains such as financial applications, graph theory, etc. However, it is easily possible that these solvers would turn out to be slower for FEA-produced matrices. Just as an example, consider not only the memory cost but also the increased execution cost merely for memory access (repeated cache misses), if a dense solver is used as is for a large sparse system. Similar considerations apply to other matrix characteristics (including suitability to a particular type of preconditioner).

In a small way, I also wanted to distinguish the FEM matrices from the FVM-generated CFD matrices. Most characteristics are similar, but not quite all.

These things matter when you are trying to extract that final bit of performance from the machine.
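One small illustration of how the way the matrix is generated decides its topology: for the same mesh, the node numbering alone changes the bandwidth of K. The sketch below uses a structured 40 x 20 grid of 5 mm cells split into triangles, which is an assumption made for simplicity (the post specifies irregular triangles), and the function names are hypothetical.

```python
def half_bandwidth(elements, numbering=None):
    """Largest node-number difference within any element (nodal half-bandwidth)."""
    numbering = numbering or {}
    bw = 0
    for element in elements:
        ids = [numbering.get(node, node) for node in element]
        bw = max(bw, max(ids) - min(ids))
    return bw

def strip_mesh(nx, ny):
    """Split an nx-by-ny grid of cells into triangles; nodes numbered row by row."""
    def node(i, j):
        return j * (nx + 1) + i
    triangles = []
    for j in range(ny):
        for i in range(nx):
            a, b = node(i, j), node(i + 1, j)
            c, d = node(i + 1, j + 1), node(i, j + 1)
            triangles += [(a, b, c), (a, c, d)]
    return triangles

nx, ny = 40, 20                       # 200 mm x 100 mm plate, 5 mm cells (assumed)
triangles = strip_mesh(nx, ny)

# Alternative numbering that runs along the short side of the plate first.
column_first = {j * (nx + 1) + i: i * (ny + 1) + j
                for i in range(nx + 1) for j in range(ny + 1)}

print(half_bandwidth(triangles))                 # numbering along the long side
print(half_bandwidth(triangles, column_first))   # numbering along the short side
```

Numbering along the short side roughly halves the bandwidth here (22 against 42), and a banded direct solver's cost depends strongly on that number. Fill-reducing orderings such as those provided by METIS address the same issue for general sparse solvers.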

2. Since posting my query, I have located a good survey article by Gould, Hu and Scott:

ftp://ftp.numerical.rl.ac.uk/pub/reports/ghsRAL200505.pdf

However, this article does not include SuperLU in its comparisons. It also doesn't separately give any data specific to FEA applications. The authors have clubbed all the Matrix Market samples together. So, as far as FEA solvers go, the article is useful only in an indicative or general sense.

3. About SuperLU. I suppose FEAP uses it (though I have not tried either, so far). I would like to know if anyone has some comparative data w.r.t. SuperLU vs. other solvers used in FEAP: SGI, UMFPACK and esp., WSMP.

4. If anyone has direct experience using Pardiso for FEA applications, I would appreciate knowing more about it. Gould et al.'s study indicates that Pardiso is an unusually strong contender.

5. Teng, I guess I will give CSparse a try.

6. Alejandro, thanks for pointing out those useful sites. And yes, the METIS advantage would be important, as you point out.

7. One reminder: eigenvalue computations are also important to me. Has anyone here tried Anasazi? I mean, I know that Rich himself blogs here at iMechanica, but I really wanted to see if there was any real alternative to what he/his group has written, viz., (P)ARPACK and Anasazi.

Thinking on these lines, I have no idea about SLEPc at all. Has anyone tried it? Also PETSc?
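As a point of reference for the eigenvalue side of the query, here is a minimal sketch of inverse iteration for the smallest eigenvalue of an SPD matrix. It is deliberately much simpler than what ARPACK does (ARPACK implements an implicitly restarted Arnoldi/Lanczos method), and it assumes the standard form Ax = Lx with a tridiagonal test matrix whose eigenvalues are known in closed form. A modal analysis would generally need the generalized form with a mass matrix, which this sketch does not cover.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (no pivoting; fine for SPD)."""
    n = len(diag)
    c = [0.0] * n                      # modified super-diagonal
    d = [0.0] * n                      # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def inverse_iteration(lower, diag, upper, tol=1e-12, max_iter=500):
    """Smallest eigenvalue (and eigenvector) of an SPD tridiagonal matrix."""
    n = len(diag)
    x = [1.0 / math.sqrt(n)] * n       # unit-length starting vector
    lam = 0.0
    for _ in range(max_iter):
        y = solve_tridiagonal(lower, diag, upper, x)     # y = A^{-1} x
        # x has unit length, so x.y estimates the largest eigenvalue of A^{-1}.
        lam_new = 1.0 / sum(xi * yi for xi, yi in zip(x, y))
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
        if abs(lam_new - lam) <= tol * abs(lam_new):
            return lam_new, x
        lam = lam_new
    raise RuntimeError("inverse iteration did not converge")

# Test matrix tridiag(-1, 2, -1): smallest eigenvalue is 2 - 2 cos(pi / (n + 1)).
n = 20
lam, vec = inverse_iteration([-1.0] * (n - 1), [2.0] * n, [-1.0] * (n - 1))
exact = 2.0 - 2.0 * math.cos(math.pi / (n + 1))
print(lam, exact)
```

Each step costs one linear solve, which is why the choice of the Ax = b solver and the choice of the eigensolver are closely tied together.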

8. To recap now:

-- Is there any data comparing PETSc with MUMPS/Taucs/SuperLU/CSparse, and SLEPc with ARPACK/Anasazi? Esp. for FEM produced data?

-- Is there any more recent survey on the lines of Gould et al.'s abovementioned article?

Thanks in advance!

--Ajit

- - - - -

[E&OE]

Sun, 05/16/2010 - 07:45 Permalink

Teng zhang

Dear Ajit, CSparse is not for very large problems

Dear Ajit,

CSparse is not meant for very large problems; the code is used to illustrate the ideas behind direct solvers. CHOLMOD is faster and also tends to use less memory.

I also came across the PETSc platform recently. Here is an application of PETSc to a fuse model for fracture:

P. V. V. Nukala et al. Fracture in three-dimensional random fuse model: recent advances through high-performance computing. Journal of Computer-Aided Materials Design. 2008

Hope this is helpful. I also look forward to hearing about experience with using PETSc.

Teng

Sun, 05/16/2010 - 13:16 Permalink

Ajit R. Jadhav

Reply to Teng

Hi Teng,

Thanks for alerting me to the limitation of CSparse. Also see my general comment below.

About Nukala's paper. Do they mention any performance data? If yes, and if you have an eprint, could you please email me: aj175tp [at] yahoo [dot] co [dot] in ? Thanks in advance.

--Ajit

- - - - -

[E&OE]

Mon, 05/17/2010 - 15:38 Permalink

Jafar

CVF Fast FEA Solvers

Dear Ajit,

I recommend the Compaq Visual Fortran (CVF) numerical library (Start-> programs-> CVF->IMSL fortran 90 MP library help.pdf). Chapter 1 explains various FEA solvers. You can use this library with the command "use numerical_libraries" at the beginning of your Fortran code.

J. Amani

Sun, 05/16/2010 - 14:15 Permalink

Ajit R. Jadhav

Reply to Jafar

Dear Jafar,

[Controls laughter] I can follow your advice only up to Start -> programs.

--Ajit

- - - - -

[E&OE]

Mon, 05/17/2010 - 15:40 Permalink

Ajit R. Jadhav

Conclusions for the time being (re. fast FEA solvers)

I think I will first try PETSc, primarily because it also gives a route to eigenvalue computations via SLEPc... Otherwise, if it were only Ax = b, Taucs would have been a comfortable choice because I have already succeeded in compiling and running it using VC6.

For the same reason, apart from PETSc, there seems to be no alternative to Trilinos-Anasazi. I have downloaded the bits (48 MB, expanding to ~200 MB), and I am drowning in the build instructions.

Once these are done, sometime in the future, I would look at other alternatives.

But, as I said, if eigenvalues also are to be computed, then I think the above two choices (PETSc and Trilinos) seem in order, even though I think I am going to have a hard time getting them all to compile and run on a Windows machine.

One reminder: Do drop a note if there is any specific performance data available for the above (PETSc and Trilinos). Also, if you spot or care to share any tips for building/running them on the Windows platform. Thanks in advance.

Bye for now; guess I would be offline for a few days, and maybe will check back on the next weekend.

--Ajit

- - - - -

[E&OE]

Mon, 05/17/2010 - 15:50 Permalink

Ajit R. Jadhav

FEA Solvers: Update (May 22, 2010)

Just an update.

1. Teng and Jafar sent me some relevant documents/papers by email. Thanks!!

2. As to libraries like Trilinos-Anasazi and PETSc, I am about to give up the idea of using them because of my inability to deal with every required nuance of command-line and Unix-only environments so brazenly and arrogantly presumed in the documentation, the unnecessary interference arising out of the Unix-sympathetic languages like Python, and, overall, the distinctly emanating stink of government orgs. These could fit very nicely in the former USSR.

3. I have located one more useful survey article by Claire Mouton here: http://verdandi.gforge.inria.fr/doc/linear_algebra_libraries.pdf. It doesn't carry any performance data, but just look at the quality of those summaries---they tell precisely what someone like me would be on the lookout for, e.g., about the existence of eigencomputations in the limitations section.

4. After going through Claire's article, I find that the Eigen library seems pretty close to what I had in mind.

Also, I wish to mention TNT (v1.26) and JAMA (v 1.25).

Any reports about the reliability and performance of either? I mean JAMA would be expected to be slow, but how slow, as compared to, say, ARPACK for eigenvalue computations?

5. Any word about Seldon? Another library seemingly close...

6. Any word about MTL2? Another library seemingly close...

7. I am sure no American worth his salt (esp. a government-employed American) is going to address any of these questions simply because I raised them, but that doesn't mean that I am going to help them carry on their games. I distinctly remember all the follow-up I have ever suffered---whether scientific or other kind, also including the times I was writing my PhD papers.... Sometimes---though not always---the mere act of exposing is enough.

- - - - -

[E&OE]

Sat, 05/22/2010 - 15:58 Permalink