Module-Level Hot Code Deployment in Erlang

Original author: AvinDev. Last updated: 2011-06-01

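In Java, code replacement (so-called hot deployment) is usually implemented with the ClassLoader mechanism. Not long ago I read a paper from Google describing hot deployment of C++ code, which is even more complicated.
In Erlang, code replacement is actually simple. The most convenient approach is described in section 12.3 of the Erlang Reference Manual: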

-module(m).
-export([loop/0]).

loop() ->
    receive
        code_switch ->
            m:loop();
        Msg ->
            ...
            loop()
    end.

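A HelloWorld-style example like this does not answer my question. In a more complex application with several processes, what happens to the processes that have not been updated after some of the others switch to new code? Let's run an experiment.
Example code: codereload.erl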

-module(codereload).
-export([main/0, master_loop/2, worker_loop/0]).
-define(VERSION, "0.1").

main() ->
    process_flag(trap_exit, true),
    Pid1 = spawn(?MODULE, worker_loop, []),
    Pid2 = spawn(?MODULE, worker_loop, []),
    spawn(fun() -> register(master, self()), master_loop(Pid1, Pid2) end).


master_loop(Pid1, Pid2) ->
    io:format("Pid1, Pid2 is alive ~p ~p~n",[is_process_alive(Pid1), is_process_alive(Pid2)]),
    receive
	refresh ->
	    io:format("Master code reload~n"),
	    Pid1 ! refresh,
	    Pid2 ! refresh,
	    codereload:master_loop(Pid1, Pid2);
	Any ->
	    io:format("Master ~p receive message: ~p~n", [?VERSION, Any]),
	    Pid1 ! Any,
	    Pid2 ! Any,
	    master_loop(Pid1, Pid2)
    end.

worker_loop() ->
    receive
	refresh ->
	    io:format("Worker code ~p reload~n", [self()]),
	    codereload:worker_loop();
	Any ->
	    io:format("Worker ~p at ~p also receive message: ~p~n", [?VERSION, self(), Any]),
	    worker_loop()
    end.

The program has three processes: one master and two workers. Below, messages sent between two distributed nodes trigger the code replacement.

Start two Erlang shells:

erl -sname foo@localhost
erl -sname bar@localhost

Start codereload on foo:

(foo@localhost)1> codereload:main().
Pid1, Pid2 is alive true true
<0.39.0>

Send "HI" from bar:

(bar@localhost)1> {master, 'foo@localhost'} ! "HI".
"HI"

Yes, it arrives on foo:

(foo@localhost)2> Master "0.1" receive message: "HI"
(foo@localhost)2> Pid1, Pid2 is alive true true
(foo@localhost)2> Worker "0.1" at <0.37.0> also receive message: "HI"
(foo@localhost)2> Worker "0.1" at <0.38.0> also receive message: "HI"

Now change the source:

-define(VERSION, "0.1"). --> -define(VERSION, "0.2").

Pid2 ! refresh, --> %Pid2 ! refresh,

This bumps the version number and comments out the update notification sent to Pid2.

Compile the new version first:

(foo@localhost)2> c(codereload).
{ok,codereload}

Send the update notification:

(bar@localhost)2> {master, 'foo@localhost'} ! refresh.
refresh

It is received:

(foo@localhost)3> Master code reload
(foo@localhost)3> Worker code <0.37.0> reload
(foo@localhost)3> Worker code <0.38.0> reload
(foo@localhost)3> Pid1, Pid2 is alive true true

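Both workers were notified even though v0.2 comments out Pid2 ! refresh, because the master handled this message while still running v0.1 and switched to v0.2 only at the fully qualified call codereload:master_loop(Pid1, Pid2). At this point master, Pid1 and Pid2 should all be on v0.2. Let's check: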

(bar@localhost)3> {master, 'foo@localhost'} ! "HI".
"HI"

(foo@localhost)3> Master "0.2" receive message: "HI"
(foo@localhost)3> Pid1, Pid2 is alive true true
(foo@localhost)3> Worker "0.2" at <0.37.0> also receive message: "HI"
(foo@localhost)3> Worker "0.2" at <0.38.0> also receive message: "HI"

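OK, the version changed to "0.2", so the code replacement succeeded. But we are not done. Remember that Pid2 ! refresh is now commented out. What happens on the next update? Modify the program first:

-define(VERSION, "0.2"). --> -define(VERSION, "0.3").

%Pid2 ! refresh, --> Pid2 ! refresh,

Compile again: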

(foo@localhost)3> c(codereload).
{ok,codereload}

Send the update notification:

(bar@localhost)7> {master, 'foo@localhost'} ! refresh.
refresh

(foo@localhost)4> Master code reload
(foo@localhost)4> Worker code <0.37.0> reload
(foo@localhost)4> Pid1, Pid2 is alive true true

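As shown, only Pid1 received the update notification and switched this time, because the master handled the refresh in v0.2 code, where Pid2 ! refresh is commented out. What is the result?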

(bar@localhost)8> {master, 'foo@localhost'} ! "HI".
"HI"

(foo@localhost)4> Master "0.3" receive message: "HI"
(foo@localhost)4> Pid1, Pid2 is alive true true
(foo@localhost)4> Worker "0.3" at <0.37.0> also receive message: "HI"
(foo@localhost)4> Worker "0.2" at <0.38.0> also receive message: "HI"

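So Pid2 is still on the v0.2 code. Each process keeps running the version it is already executing until it makes a fully qualified call of its own, so processes switch versions independently of one another.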
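One way to observe this from the shell, rather than from the printed version strings, is to ask the runtime which processes are still executing old code. The following is a minimal sketch; the module name code_probe and the function old_code_users/2 are hypothetical names introduced for illustration, while erlang:check_process_code/2 is the standard BIF. It assumes all pids are local to the node.

```erlang
-module(code_probe).
-export([old_code_users/2]).

%% Return the local pids in Pids that are still executing old code of Mod.
%% erlang:check_process_code/2 returns true when the process runs old code
%% of the module and false otherwise.
old_code_users(Mod, Pids) when is_atom(Mod), is_list(Pids) ->
    [Pid || Pid <- Pids, erlang:check_process_code(Pid, Mod)].
```

In the session above, calling code_probe:old_code_users(codereload, [pid(0,37,0), pid(0,38,0)]) on foo at this point would be expected to report only the second worker, since v0.2 became old code when v0.3 was loaded.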

Send the update notification again so that Pid2 switches to the v0.3 code:

(bar@localhost)8> {master, 'foo@localhost'} ! refresh.
refresh

(foo@localhost)4> Master code reload
(foo@localhost)4> Worker code <0.37.0> reload
(foo@localhost)4> Worker code <0.38.0> reload
(foo@localhost)4> Pid1, Pid2 is alive true true

This time Pid2 is updated to the new version:

(bar@localhost)9> {master, 'foo@localhost'} ! "HI".
"HI"

Dizzy yet? So am I, but there is one more case. Go back to the step above where Pid2 was still running v0.2. If, instead of entering

{master, 'foo@localhost'} ! refresh.

we compile again:

(foo@localhost)4> c(codereload).
{ok,codereload}

Send a message and see what happens:

(bar@localhost)9> {master, 'foo@localhost'} ! "HI".
"HI"

(foo@localhost)5> Master "0.3" receive message: "HI"
(foo@localhost)5> Pid1, Pid2 is alive true false
(foo@localhost)5> Worker "0.3" at <0.37.0> also receive message: "HI"

What happened? is_process_alive(Pid2) now returns false. Why did the process die?

Section 12.3 of the Erlang Reference Manual explains:

The code of a module can exist in two variants in a system: current and old. When a module is loaded into the system for the first time, the code becomes 'current'. If then a new instance of the module is loaded, the code of the previous instance becomes 'old' and the new instance becomes 'current'.

Both old and current code is valid, and may be evaluated concurrently. Fully qualified function calls always refer to current code. Old code may still be evaluated because of processes lingering in the old code.

If a third instance of the module is loaded, the code server will remove (purge) the old code and any processes lingering in it will be terminated. Then the third instance becomes 'current' and the previously current code becomes 'old'.

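So in Erlang a module's code exists in at most two versions, current and old. At that moment Pid1 was on v0.3 (current) and Pid2 on v0.2 (old). Compiling again loaded a third instance: v0.2 was purged and Pid2, still lingering in it, was forcibly terminated, while Pid1's v0.3 code became old.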
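If killing lingering processes is not acceptable, the load can be made conditional. The sketch below is not from the original experiment: safe_reload and reload/1 are hypothetical names, and it relies on the standard code:soft_purge/1, which removes old code only when no process is executing it, and code:load_file/1. It assumes the newly compiled .beam file is already on the node's code path.

```erlang
-module(safe_reload).
-export([reload/1]).

%% Load a new version of Mod only if doing so kills no process.
%% code:soft_purge/1 returns false when some process still runs old code,
%% in which case nothing is purged and nothing is loaded.
reload(Mod) when is_atom(Mod) ->
    case code:soft_purge(Mod) of
        true ->
            %% No old code is in use: loading makes the current code old
            %% and the new code current.
            code:load_file(Mod);
        false ->
            {error, old_code_in_use}
    end.
```

With this approach, the second compile in the scenario above would have been refused with {error, old_code_in_use} until Pid2 had been sent refresh, instead of terminating it.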

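An Erlang node started as a daemon has no shell in which to compile directly, and if the module is compiled externally the running node does not pick up the new code on its own. One approach I came up with is to invoke the compilation through rpc: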
rpc:call('foo@localhost', shell_default, c, [codereload]).
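As a variation on the rpc idea, the module can be compiled externally (for example with erlc) and the running node asked only to load it. This is a rough sketch under assumptions: remote_reload and reload/2 are hypothetical names, the new .beam file is assumed to be on the target node's code path, and the calling node is assumed to be able to connect to it (same cookie). Only the standard rpc:call/4, code:soft_purge/1 and code:load_file/1 are used.

```erlang
-module(remote_reload).
-export([reload/2]).

%% Ask Node to load the current .beam of Mod from its own code path,
%% refusing if that would kill processes still running old code.
reload(Node, Mod) when is_atom(Node), is_atom(Mod) ->
    case rpc:call(Node, code, soft_purge, [Mod]) of
        true ->
            rpc:call(Node, code, load_file, [Mod]);
        false ->
            {error, old_code_in_use};
        {badrpc, Reason} ->
            %% For example nodedown when the node cannot be reached.
            {error, Reason}
    end.
```

For instance, remote_reload:reload('foo@localhost', codereload) from the bar shell would load the new version on foo; the processes there still have to receive refresh before they actually run it.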

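For hot deployment, Erlang also offers a more powerful mechanism, OTP Release Handling. I don't know it yet and will study it later.

p.s.: if you disagree with anything in this article, criticism is welcome :)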

Article information

Knowledge base: erlang

Created by AvinDev on 2009-02-03
Updated by AvinDev on 2011-06-01