Node.js From the Enterprise Java Perspective

Node.js is currently getting a lot of attention because its concurrency model shows great promise for scalability: event-driven asynchronous input/output. This model can handle thousands of concurrent user-requests with a tiny memory footprint, something that is hard to achieve with the traditional multi-threaded concurrency model of Enterprise Java. This article explains this approach from the viewpoint of an Enterprise Java developer.

Node.js Basics

Node.js is a JavaScript server-side framework that sits on top of Google’s V8 JavaScript engine. Because Node.js embeds V8, JavaScript can be executed from the console instead of in the browser. Node.js provides the operations that are needed on a server: protocol support such as HTTP, DNS and TCP, as well as disk, socket, and database operations (some through 3rd party modules).

The HelloWorld example on the Node.js website is a complete HTTP server in six lines of code. It answers an HTTP request with “HelloWorld”. This example tells a lot about Node.js: It works at a much lower level than an enterprise developer normally does. Enterprise application developers write applications on top of an application server; they don’t write them on top of an HTTP server, and they certainly don’t write an HTTP server. Even though frameworks on top of Node.js are being written at breakneck speed, they are many layers away from the things we call frameworks in the JEE world.
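For orientation, here is a minimal sketch in the spirit of that example. It is not the website’s exact listing: the function names handleRequest and createHelloServer and the port 8000 are choices made for this illustration.

```javascript
const http = require('http');

// Request handler: answers every request with a plain-text greeting.
function handleRequest(request, response) {
  response.writeHead(200, { 'Content-Type': 'text/plain' });
  response.end('Hello World\n');
}

// Build the server without starting it, so the caller picks the port.
function createHelloServer() {
  return http.createServer(handleRequest);
}

// To run it (port 8000 is an arbitrary choice):
// createHelloServer().listen(8000);
```

Note that there is no application server, no deployment descriptor and no servlet here; the program itself is the HTTP server.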

In that respect one could easily dismiss Node.js, but there is another thing that is very intriguing: It scales to proportions unheard of by enterprise developers. It scales so well because it doesn’t use multi-threading for concurrency, but uses event-driven asynchronous Input/Output. Let’s compare the Enterprise Java way with the Node.js way to see where that leaves the enterprise developer.

The Enterprise Java Way of Concurrency

Support for multiple users in Enterprise Java applications is provided by the application server in the form of multi-threading. Each user-request is mapped to a thread, which takes care of responding to the user-request. The application developer writes the application as if only one user were using it (with a sprinkle of thread safety). The server takes care of the rest by running the code in multiple threads and mapping user-requests to threads. What works for one user suddenly also works for thousands of users. This is a great advantage: The developer has very little to do to accommodate a large number of users.

There are also disadvantages to this multi-threading concurrency model:

  • There is a computing overhead in constantly switching threads.
  • There is a linear relationship between the number of threads and the memory used, because at minimum each thread gets its own stack allocated, and a stack uses memory.

The disadvantages only really show at a very large number of threads. Traditional JEE applications before Ajax and Comet (server-side push events) needed to manage maybe 100 concurrent threads (not concurrent sessions, which are often far more numerous). Modern applications that rely heavily on Ajax (which increases the number of user-requests) and Comet (which drastically increases the time a user-request blocks a single thread) need a much higher number of concurrent threads. At numbers like 10,000 concurrent threads the disadvantages of multi-threading start to show.
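To put a rough number on the linear relationship between threads and memory, the following back-of-the-envelope sketch may help. The figure of 1024 KiB of stack per thread is an assumption (a common JVM default that can be tuned with -Xss), the function name is made up for this illustration, and the result is reserved stack space, not necessarily memory that is actually touched.

```javascript
// Rough estimate of the stack memory reserved for a given number of threads.
// ASSUMPTION: 1024 KiB of stack per thread (a common default, tunable with -Xss).
function reservedStackMiB(threadCount, stackKiBPerThread = 1024) {
  return (threadCount * stackKiBPerThread) / 1024;
}

console.log(reservedStackMiB(100));   // 100
console.log(reservedStackMiB(10000)); // 10000
```

Under that assumption, 100 threads reserve about 100 MiB of stack, while 10,000 threads reserve close to 10 GiB before the application has stored a single object.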

The Node.js Way of Concurrency

The premise of Node.js is that a new way of building web applications is necessary, one that accommodates more user-requests and longer-running user-requests by providing better CPU and memory utilisation. Node.js addresses these requirements with asynchronous event-driven non-blocking I/O: it uses a single thread that handles all user-requests, but hands off all I/O operations so they do not block the thread.

I/O operations are all those operations that access resources outside the CPU and the memory: disk operations, database operations, communication via the internet, etc. I/O operations are by far the slowest operations the computer can do, and that is the reason they are handled outside of the main thread. They are started from the main thread, run asynchronously, and send an event to the main thread when they complete. With this model Node.js can handle thousands of concurrent user-requests easily with very little memory.
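The start-now, get-notified-later cycle can be sketched with a directory listing, which is a disk operation. The function name listWithoutBlocking and the labels pushed into the array are made up for this illustration; fs.readdir is the standard Node.js call.

```javascript
const fs = require('fs');

// Starts a directory listing (a disk operation) and returns immediately.
// `done` is called later with the order in which things happened.
function listWithoutBlocking(dir, done) {
  const order = [];
  fs.readdir(dir, (err, entries) => {
    // Completion event: runs on the same single thread, after the I/O finished.
    order.push('listing completed');
    done(err, order, entries);
  });
  // Reached right away: the thread did not wait for the disk.
  order.push('listing started');
}

listWithoutBlocking('.', (err, order) => {
  if (err) throw err;
  console.log(order.join(' -> ')); // listing started -> listing completed
});
```

While the disk is busy, the single thread is free to pick up the next user-request; the callback only runs once the completion event arrives.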

Do We Need a New Concurrency Model in Enterprise Java?

Node.js is in a totally different space than Enterprise Java, so in general it is comparing apples to oranges. Aside from that you could raise the question whether JEE should consider the concurrency model of Node.js if it scales better than multi-threading.

To answer that question you first have to ask yourself: Do I have a problem with my enterprise applications in terms of concurrency? If you are writing anything but the most modern Ajax/Comet applications and do not have a very large user base, the answer will most likely be no. (If it is yes but you don’t have a large user base, you should first check your application for common anti-patterns such as the I-keep-half-of-the-database-in-the-user-session pattern.) Only a few applications need this high concurrency at the moment. This changes a little as applications use Ajax and Comet together extensively, but on top of these two technologies you also need a very large user base to get to 10,000 concurrent user-requests. Ant Kutschera drew a similar conclusion from his experiments.

Second, asynchronous event-driven frameworks do exist in the Java world. Examples are the jboss.org project Netty and Apache MINA, which are both based on the Java NIO (New I/O) API. According to a test done by Bruno Fernandez-Ruiz, the NIO API performs very well against Node.js. The problem is that these frameworks are outside of the JEE specifications. In the Enterprise Java world it is the JEE specification that lays out the way applications deal with concurrency, and so far it is sticking to the multi-threaded model. If you want to be JEE-compliant you have to use multi-threading for concurrency.

Third, the current JEE 6 specification addresses the need to break the one-user-request-one-thread paradigm. Servlet 3.0 introduces asynchronous processing of requests, so that a thread that is just waiting can be returned to the server. In that respect the JEE 6 specification is in step with current trends, so if the need arises for high concurrency the Enterprise Java developer can use a JEE 6-compliant application server to do the heavy lifting.

My answer to this question therefore is that chances are you don’t have a concurrency problem in the first place. If you do, you can look at Java options around the NIO API, but be aware that you are then no longer writing JEE applications. If you want to stay within the JEE space you can use the asynchronous possibilities of the current JEE 6 specification. Since the JEE 6 specification is relatively new it remains to be seen whether this will be fast enough, but I suspect it will.

About Marc Fasel

Marc is a Senior Consultant with Shine Technologies. He has written code in 19 programming languages, but can only speak two natural languages. He enjoys referring to himself in the third person.

27 Responses to Node.js From the Enterprise Java Perspective

  1. Matt Sergeant says:

    Note that almost nobody building web apps on node is writing their own HTTP server. They are using a framework like express.

    The problem with other languages’ implementations of NIO is that those languages’ libraries all tend to assume blocking I/O, typically these are database access libraries, but also things like memcached and so on. What node brings to the table is that *everything* assumes non-blocking I/O, so you don’t have to have a hybrid model to cope with using those libraries. That’s the big advantage.

  2. henk says:

    First of all, it’s Java EE, not JEE ;)

    Secondly, I don’t really understand the point of your article. First you go out of your way to explain how Java EE is strongly tied to the 1 request 1 thread model, and ask if it needs to adapt to the “new” concurrency model as implemented by Node.js. But then you say, oh, but actually Java EE already has it.

    To me it reads like you first wrote a sort of rant against Java, and then with your article nearly finished you discovered Java EE already has asynchronous request handling, and sort of tagged that on to the end. Maybe that wasn’t your intention, but that’s a little how it comes across now. Otherwise it’s a nice article ;)

  3. Marc Fasel says:

    Henk,

    I structured the article chronologically according to my personal development. I have been an Enterprise Java developer for the past 10 years, so I am deeply ingrained in the Java EE world. When I got wind of node.js and this “new” concurrency model I thought “if this scales so much better, why isn’t the Java community on top of this?” 
    I did research to understand what the Java EE concurrency model really entails, and what the node.js concurrency model entails, only to realise that it is not just something under the hood that you can change to suddenly make your application work with asynchronous, event-driven I/O.
    Your application has to be built from the ground up for this new concurrency model (I will compare synchronous and asynchronous programming patterns I found on the way in an upcoming article). Switching the concurrency model is little short of abandoning Java EE altogether.
    I put the pieces back together to conclude that right now you probably don’t have a concurrency problem anyway, but if you do you have to abandon JEE. Short of that you can try going with Java EE 6, which sticks to multi-threading, but allows the server to recycle threads. Whether this is enough remains to be seen.
    Shine Technologies is currently working on a large Node.js project, and soon we will be able to draw some conclusions from working with this new technology. I am looking forward to that!

  4. Nathan Appere says:

    I am surprised to hear that Java EE still uses a multithreaded approach since it has been known for a decade that it can not scale well IO wise (or at least not on its own). The way Nodejs handles IO concurrency internally is nothing new, great implementations of reactor / proactor patterns have been out there for a long time.
    The real interest is the low level approach (you design web services and not only http applications) coupled with a high level language which also runs client-side!

  5. Marc Fasel says:

    @Matt I pondered your comment a little longer, because I needed to do some research to answer it.
    1) We currently use Express on top of Node.js to develop a web-application just like you were saying, so I understand your argument. I wanted to describe Node.js by itself, so other Enterprise Java developers can understand at what level Node.js is located in comparison to the level a Java EE developer actually writes software.
    2) NIO (“New I/O”) is a Java-specific API, at least that’s what I found in Wikipedia (http://en.wikipedia.org/wiki/New_I/O). I cannot comment on other languages, because I don’t know enough about them. Java NIO provides, among other things, “A multiplexed, non-blocking I/O facility for writing scalable servers.”
    If I look at Netty and Mina – which are built on top of NIO – they provide similar functionality to Node.js: Asynchronous I/O for protocols such as TCP, HTTP, HTTPS/TLS, DNS, and file I/O. In that regard they go the same route as Node.js. 
    My understanding is, though, that Java people use these frameworks only to write applications that need very high concurrency, and those are not web applications. For web applications Java people stick to application servers using multi-threading, because it works very well and is easy to use.
    I do recognise the effort made by the community around Node.js to provide non-blocking I/O around Node.js, like database drivers. In that sense the Node.js base model can be used to create more high-level frameworks to build web applications, which are layers above what Node.js provides.
    It is my guess that the reason Netty and Mina have not gotten that much attention for writing web applications, and the reason why Java people don’t try to write web applications with those frameworks, is the fact that multi-threading works well for almost all web-applications. I have not worked for huge sites like Google or Facebook, but I have seen some pretty big web-sites running perfectly well on standard Java EE multi-threading.
    It may now be the time that this changes, but this is partially due to the fact that the HTTP protocol is used for things it wasn’t designed to do: Long-running request-response cycles introduced by Comet. At that point the weaknesses of multi-threading become apparent. Java EE 6 is trying to address that by changing things under the hood, but it remains to be seen whether this will be enough in the long run.

  6. Koen says:

    @Nathan
    >I am surprised to hear that Java EE still uses a multithreaded approach

    But this isn’t true! Java EE gives you the choice: multi-threaded or asynchronous.

  7. Marc Fasel says:

    @Nathan multi-threading scales well up to hundreds of concurrent requests, which is a lot. This covers 95% of all web applications out there right now. Multi-threading works well on current applications, is well understood and easy to use.
    If your application needs more concurrency than that, asynchronous I/O may be a better choice. This comes at a price, though: An increase in code complexity. At this stage I cannot say whether this is just a perceived complexity of old-style programmers that need to learn a new way of thinking, or whether it is really more difficult to write things asynchronously. Keep in mind that asynchronous programming was there long before multi-threading, and the reason multi-threading was developed is to make life easier for the programmer. One sign that asynchronous programming with call-backs like Node.js may not be the answer to everything is the fact that many people struggle with this, and write frameworks that mimic synchronous programming on top of that (Tim Caswell himself admitted to being the author of several of them).

  8. Nathan Appere says:

    @Koen: glad to know!
    @Marc: the “increase in code complexity” is quite exaggerated, the only difference being that instead of doing read() you have to do something like read(&callback) and split the code to move the completion logic in callback(). So I really think it is perceived complexity.
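As an illustration of the difference described in this comment, here is a small sketch that does the same job in both styles. The function names countEntriesSync and countEntries are made up for this example; fs.readdirSync and fs.readdir are the standard Node.js calls.

```javascript
const fs = require('fs');

// Synchronous style: the call blocks until the data is there.
function countEntriesSync(dir) {
  const entries = fs.readdirSync(dir);
  return entries.length;
}

// Callback style: the completion logic moves into the callback.
function countEntries(dir, callback) {
  fs.readdir(dir, (err, entries) => {
    if (err) return callback(err);
    callback(null, entries.length);
  });
}
```

The callback version has no return value to work with; the caller has to pass in whatever should happen next, for example countEntries('.', (err, n) => console.log(n)).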
    And I never felt like multithreading was introduced to simplify the life of programmers but rather to increase performance, and au contraire, it often becomes a mess when you use a language where you have to manage memory yourself.
    To choose a solution that has been known not to be scalable for a long time (Dan Kegel’s C10K was written almost 15 years ago) instead of using AIO patterns seems foolish to me, except perhaps when you have the absolute certainty that scalability will never be an issue (which I believe to be the exception when developing network oriented programs).

  9. Marc Fasel says:

    @Nathan: On Friday I spoke to a developer who recently started with Node.js and came from a Java background, and he also said that you can get used to asynchronous programming quite quickly. As I just started on a Node.js project I am looking forward to validating that it is just “perceived complexity” myself.
    The C10K in my opinion is a bit like asking “Don’t you think it’s time that cars go 500km/h?” Right now we cannot even utilise a car that does 250km/h, so the answer is: No, it is not time. Same with the C10K question “It’s time for web servers to handle ten thousand clients simultaneously, don’t you think?” No, it is not time yet, not even 15 years after the question was stated.
    Java EE is complex and tedious, so it is only good for certain customers: Those who have complex business needs, those who have enough money to pay Java developers, and those for which the development is a long-term investment.
    Typically any customer that deals extensively with money will be a candidate. All banks and insurance companies I know of run all their stuff including their consumer web applications in Java EE. These are some pretty big sites. According to this post (http://highscalability.com/blog/2009/3/31/ebay-history-and-architecture.html) EBay runs their site in Java EE. That to me is an indicator that Java EE scales pretty well, even though there is a natural limit to how many concurrent user-requests a multi-threaded application server can handle.

  10. Marc Fasel says:

    BTW: If you are building the next generation web application using Comet like Plurk have a look at this: http://amix.dk/blog/post/19490. There are good reasons to use something like Node.js for these types of applications.

  11. lep says:

    @Marc Fasel: “Java EE is complex and tedious, so it is only good for certain customers: Those who have complex business needs, those who have enough money to pay Java developers, and those for which the development is a long-term investment.”

    Weird reasoning. Isn’t it rather because, as the creator of websphere confessed it: “I call it the endgame fallacy. It was too complex for people to master. I overdesigned it.

    Because we were IBM, we survived it, but if we’d been a start-up, we’d have gone to the wall.”

  12. Marc Fasel says:

    @lep: Very interesting quote, I would never have thought that there was one creator of Websphere, but of course there had to be somebody who committed that crime (http://www.bbc.co.uk/news/business-11944966).

    Java EE in my opinion is very suitable for special types of enterprise applications. I do agree, though, that Java EE today is used in many applications that don’t really need it. Those applications could be written in PHP or Rails or anything less tedious than Java EE and would do perfectly fine, be much cheaper to build and to run.

    Websphere is from my experience the pinnacle of the complex and tedious within the Java EE world. After working with Websphere for years going to something more lightweight like JBoss or even better Tomcat felt like programming heaven. Websphere and Java EE have much improved in the past years in terms of usability and the turn-around time to get your code running on your application server, but they are light years away from something like Node.js.

  13. Ryan Sharp says:

    When I hear people saying “but most people don’t require this level of concurrency”, I can’t help but think they are utterly backwards. Node scales down just as well as it scales up. Why waste heaps of memory and use mutex locks when it’s already proven to be a poor way of implementing concurrency?

  14. Marc Fasel says:

    @Ryan I think you are on to something there: Big companies think high concurrency means server farms. Nobody expects Facebook or ebay to run on a single machine. A company that has their own power plant to run their server farm is not unusual. Looking into the future one may ask “wouldn’t it be cool to be able to run Facebook on one (or a few) machines?” Yes, that would be way cool!

  15. Correct me if I’m wrong, but asynchronous programming shouldn’t be too terrible for a JEE guy.

    I mean, writing some stuff in Swing (releasing a GUI thread while doing some DB and rules processing in the background) or writing a bit of AJAX into a webapp is basically asynchronous programming, right?

    • Marc Fasel says:

      Hi,
      in general I would agree with you that asynchronous programming shouldn’t be too terrible, for a Java EE guy or otherwise. The main difference I see is that while in Swing and with AJAX most of the processing is done synchronously, with a few exceptions, in Node.js everything is done asynchronously, with a few exceptions.
      Asynchronous code is more difficult than synchronous code because of the management of different life cycles. With each new life cycle to manage your code complexity increases. Because of the event-loop model you are forced to go asynchronous on everything, and with each asynchronous call you add one new life cycle to manage. In Swing you may have one main thread and one database thread that fetches a number of records and possibly processes them before displaying them: two life cycles that have to be coordinated.
      In Node.js you would have one main event loop that starts fetching each of the database records asynchronously. For each one returning you would then process it asynchronously. You could even go as far as displaying them asynchronously with AJAX. That is quite a number of life cycles that have to be managed. That to me is more complex than doing things synchronously.
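A sketch of what coordinating those life cycles can look like with plain callbacks follows. fetchRecord is a hypothetical stand-in for an asynchronous database driver call (here it just answers on the next turn of the event loop), and fetchAll and its pending counter are made up for this illustration.

```javascript
// Hypothetical stand-in for an asynchronous database driver call.
function fetchRecord(id, callback) {
  setImmediate(() => callback(null, { id, name: 'record ' + id }));
}

// Fetch every record without blocking, process each one as it arrives,
// and call `done` once all of them have come back.
function fetchAll(ids, done) {
  const results = [];
  let pending = ids.length;
  let failed = false;
  if (pending === 0) return done(null, results);

  ids.forEach((id, index) => {
    fetchRecord(id, (err, record) => {
      if (failed) return; // an earlier error was already reported
      if (err) { failed = true; return done(err); }
      results[index] = record.name.toUpperCase(); // the "processing" step
      pending -= 1;
      if (pending === 0) done(null, results);
    });
  });
}
```

Every record has its own small life cycle, and the counter and the error flag exist only to tie them back together, bookkeeping that a synchronous loop would not need.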

  16. agentgt says:

    I wrote a similar post on my blog: http://adamgent.com/post/10440924094/does-java-have-an-answer-to-node-js . I think one of the biggest challenges that Java has is not its async HTTP handling but that its persistence drivers are not asynchronous. Purportedly there is unbelievable performance to be had if you are using asynchronous database connections for Postgresql.

  17. Matías Mirabelli says:

    Another important point that I didn’t read yet is how transactions are handled in an asynchronous environment. So far Node doesn’t support transactions, even out of the box using 3rd party drivers or frameworks. And my answer to “why” is because it is very complex.

    In a one-request-per-thread environment it’s very easy: you start a transaction when the request comes and then you commit or rollback that just before sending the response. Keeping track of each transaction is also easy because it’s bound to the thread.

    In Node every time you perform an IO operation the request lifecycle is forked again and again, so the transaction must be alive until the last forked lifecycle finishes and then we need to read again our books about synchronization patterns. Of course, we don’t have a thread to bind to, so we need a transparent way to tie the connection to the request and put it back to the pool (if any) once finished. I’m thinking of a completely stateless application, I cannot imagine how it could work in a system that uses sessions to store information.

    • Marc Fasel says:

      Hi Matias,

      very interesting point! I haven’t worked with SQL database transactions with Node.js, but I never thought it would be a problem. I will certainly keep an eye out for this, as transactions are crucial to most applications.

      • Federico says:

        sequelize.js is an ORM written for node and it has “chain queries” which mimic database transactions with javascript. Do you think that’s enough to fill the need for transactions?

  18. This is a nice article. Thanks!

  19. Danny Baggs says:

    Just wanted to say thanks for the article. I’ve a background in Enterprise Java but am more and more picking up skills in bleeding edge technologies so I found this to be a very useful and balanced article with good discussion in the comments.

    Thanks,

    Dan

  20. Pingback: Thread – Concurrence – Event driven « Alessioma’s Weblog

  21. Juan says:

    Maybe it’s a little late, but I think you should give Play Framework a try. http://www.playframework.org

  22. Pingback: Conheça o Tyrus – sua implementação para WebSocket com backend em Node e Java | blog.caelum.com.br

  23. Good article. I will be going through some of these issues as well.
