But if you use SOA, understand the pitfalls.
I learned parallel computing the "academic" way. We had a task to speed up. We created mathematical proofs. We ran experiments. We backed our theories with the results. Then the results were reviewed by professors and peers.
We knew that distributed computing was going to be the next big thing. And it was! But it was taught to the masses in magazines and internet articles. Developers got all the coolness of SOA without seeing the hard parts, or having to work through the theory.
So here is my list of three major things I’ve seen people get bitten by. I came up with this list during my last project. That project handled 2.5 million page views per day, many of which were processed by using SOA. One page required 43 service calls. SOA was our lifeblood and our curse.
Top Three SOA Pitfalls
Distributed Computing is Slower: Conventional wisdom claims the opposite: lots of computers working on a problem should beat just one. And sometimes they do. So the real question is: why did our app get slower?
Let’s answer that question with another question: How long does it take to make a function call on your local computer? Don’t know? Then run a quick experiment and come back. I’ll wait.
Ok, you’re back. I got about 20 nanoseconds with Java. What did you get?
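If you skipped the experiment, here's a rough sketch of it in Java. It's a naive micro-benchmark (the JIT may inline or partially optimize the call, so a tool like JMH would give more trustworthy numbers), but it's enough to land in the right order of magnitude:

```java
public class LocalCallTimer {
    // A volatile sink keeps the JIT from eliminating the "unused" calls.
    static volatile long sink;

    static long add(long a, long b) { return a + b; }

    public static void main(String[] args) {
        int iterations = 10_000_000;

        // Warm up so the JIT compiles the method before we measure.
        for (int i = 0; i < iterations; i++) sink = add(i, i);

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink = add(i, i);
        long elapsed = System.nanoTime() - start;

        System.out.printf("~%d ns per call%n", elapsed / iterations);
    }
}
```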
What about making a distributed call? To do that I have to:
- Convert the data to XML
- Pipe it over the wire
- Parse the XML on the server
- Convert the response to XML
- Pipe it back over the wire
- Parse the XML into the data I want
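To make the marshalling steps concrete, here's a toy sketch in Java of the bookends of that list: hand-rolling a scrap of XML and parsing it back with the JDK's DOM parser. Real SOA stacks use SOAP libraries or JAXB rather than string concatenation, but even this toy shows that parsing alone costs real time:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class XmlRoundTrip {
    public static void main(String[] args) throws Exception {
        int price = 42;
        // Convert data to XML (hand-rolled here for brevity).
        String xml = "<quote><price>" + price + "</price></quote>";

        long start = System.nanoTime();
        // Parse the XML back into the data we want.
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        int parsed = Integer.parseInt(
                doc.getElementsByTagName("price").item(0).getTextContent());
        long elapsed = System.nanoTime() - start;

        System.out.printf("parsed price=%d in %d µs (network time not included)%n",
                parsed, elapsed / 1_000);
    }
}
```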
Just one network hop usually changes us from measuring time in nanoseconds to measuring time in milliseconds. Quick, go run your own experiment. (Making a network call to localhost doesn’t count!)
I got 20 milliseconds. That makes a local call 1,000,000 times faster than a network call. What did you get? I’ll bet it was a lot slower than your local call!
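Here's a self-contained sketch of that timing experiment in Java. It spins up a tiny in-process HTTP server only so the code runs anywhere as written; for a real measurement, point the client at a genuinely remote host, because, as noted, localhost doesn't count:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RemoteCallTimer {
    public static void main(String[] args) throws Exception {
        // Toy server on an ephemeral port. Replace with a real remote URL
        // to see true network latency.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/quote", exchange -> {
            byte[] body = "<quote><price>42</price></quote>".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(
                "http://localhost:" + server.getAddress().getPort() + "/quote")).build();

        long start = System.nanoTime();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMicros = (System.nanoTime() - start) / 1_000;

        System.out.printf("round trip: %d µs, body: %s%n",
                elapsedMicros, response.body());
        server.stop(0);
    }
}
```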
But wait! That’s not slow enough for some people. What if we add another layer of middleware or a message queue? Heck, at my last job we added both. Architects love layers. The best architects are educated at Hogwarts. They draw diagrams and wave their hands at meetings. Then management gets a glazed look in their eyes and magically starts writing checks. Why didn’t I go to that school? (*)
Before SOA was big, I worked at a small company that built a user-customized portal called Agdayta. It showed futures quotes, news tickers, and other investment-type things. We pulled data using all sorts of services. By keeping in mind the expense of these calls, we managed to produce custom portal pages in less than 10 ms. Using Pentium II computers. And full session fail-over. Before any of this was built into Java.
Later I worked at a big company. We had clusters of five machines, each machine with eight processor cores. Even though our pages were less complicated, they took 50 times longer to render. Why? Too many services, too many calls, and too many leaky abstractions.
Distributed computing CAN be faster if the advantage of distribution makes up for the overhead. But you have to do the math and understand the theory.
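Here's the back-of-the-envelope version of that math, with made-up numbers: a 400 ms job split across n workers, paying a 20 ms network round trip. Distribution wins only when the per-worker savings exceed the overhead; a 20-nanosecond function call never will:

```java
public class DistributionBreakEven {
    public static void main(String[] args) {
        // Hypothetical numbers for illustration only.
        double localMs = 400.0;    // time to do the whole job on one machine
        double overheadMs = 20.0;  // cost of one remote round trip

        for (int n = 1; n <= 32; n *= 2) {
            // Ideal even split across n workers, plus the fixed network cost.
            double distributedMs = localMs / n + overheadMs;
            System.out.printf("n=%2d  distributed=%6.1f ms  %s%n",
                    n, distributedMs, distributedMs < localMs ? "wins" : "loses");
        }
    }
}
```

Note the model is generous: it assumes a perfect split and a single round trip. A page that fans out into 43 service calls pays the overhead many times over.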
Distributed computing is less reliable: Wait, isn’t it more reliable?
This is where some background in safety engineering can be good. I recommend learning to draw fault trees. Again, a skill often learned in graduate school. But you can pick it up in no time.
Distributed computing can increase reliability if you know how to assemble the systems correctly. Otherwise you end up with a system with lots of parts, any one of which can bring the whole thing down. If a system fails once per year and you chain two such systems together, expect a failure twice per year. Plus some extra down time for the equipment that hooks them together.
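You can see that series effect with a quick calculation. Assuming a hypothetical 99.9% availability per component, chaining components in series multiplies their availabilities together, and the downtime piles up fast:

```java
public class SeriesAvailability {
    public static void main(String[] args) {
        // Hypothetical: each component is independently up 99.9% of the time.
        double perComponent = 0.999;

        for (int parts : new int[]{1, 2, 5, 10, 43}) {
            // In series, total availability is the product of the parts'.
            double availability = Math.pow(perComponent, parts);
            double downHoursPerYear = (1 - availability) * 365 * 24;
            System.out.printf("%2d components in series: %.3f%% up, ~%.0f hours down/year%n",
                    parts, availability * 100, downHoursPerYear);
        }
    }
}
```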
What about fail-over? One server goes down and the other picks up the slack? Great, unless the database dies. Or the service you are calling. Or the network line. Or you deploy the same bug to both servers. Common-mode faults can bring down your backup system just as easily as your primary. Draw your systems with fault-tree diagrams.
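The naive redundancy math says a hot-standby pair is almost never down, but a common-mode fault term puts a hard ceiling on that. These numbers are made up; the point is the shape of the formula:

```java
public class FailoverWithCommonMode {
    public static void main(String[] args) {
        double a = 0.999;          // each server independently up 99.9% (hypothetical)
        // Naive redundancy: the pair is down only when both are down.
        double independentPair = 1 - Math.pow(1 - a, 2);
        // Common-mode faults (same bug on both boxes, shared DB, shared
        // network line) take out both servers at once.
        double commonMode = 0.001;
        double realistic = independentPair * (1 - commonMode);
        System.out.printf("naive pair: %.6f, with common-mode faults: %.6f%n",
                independentPair, realistic);
    }
}
```

However many backups you add, availability never rises above 1 minus the common-mode term, which is exactly what a fault-tree diagram makes visible.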
Versioning is hard: Often, quite the opposite is what's sold. For example, a selling point for SOA is that you can make an update in one place, and every system calling that service is updated automatically. Perfectly logical.
But wait, you wouldn’t deploy that change to a production system without testing it, would you? So if you update a service, you have to test that update with each and every system that calls it. If a dozen programs use your service, you have to pay a dozen different teams to test your SOA update. And hope that none of them has more important things to do, forcing your update to get put on hold.
It gets worse! Applications using your service are updating as well. So you can’t test their updated code against your updated code unless you deploy your code at the same time. Instead, you must ask them to test with their current production code, which usually means putting their new code on hold.
At the large bank I last worked with, we had quarterly releases. This helped with versioning issues between front-end applications and services because everyone released at once. But there was a big disadvantage: if an important service did not make a quarterly release date, no one made the quarterly release date. If a service was deployed and did not work, everyone rolled back (or limped along with a broken system). And with small but critical bugs, you still had to test against old codebases.
Conclusion: I don't dislike SOA. In fact, I think it is a wonderful and critical tool in a developer's arsenal. But there is some kind of perverse thrill developers get from making remote procedure calls. Some create remote calls without proving they will improve speed or reliability. Some don't know how to prove it. Some go nuts and use it everywhere. And as for manageability, if anyone knows a solution, let me know.
(*) I liked our architects. They were smart. And if I was one, I would have loved making presentations with boxes and arrows.