How can I fit a curve to a histogram distribution?


The other day someone e-mailed me a question about integer partitions (I had released a Perl module, Integer::Partition, that generates them), and I was unable to answer it.

Background: here are all the integer partitions of 7 (the sum of each row equals 7).

7
6 1
5 2
5 1 1
4 3
4 2 1
4 1 1 1
3 3 1
3 2 2
3 2 1 1
3 1 1 1 1
2 2 2 1
2 2 1 1 1
2 1 1 1 1 1
1 1 1 1 1 1 1

Now if we look at the lengths of each partition and count how many there are of each length:

1 1
2 3
3 4
4 3
5 2
6 1
7 1

... we see one partition has a length of 1 (7), one has a length of 7 (1 1 1 1 1 1 1). There are 4 partitions that have a length of 3: (5 1 1), (4 2 1), (3 3 1), (3 2 2).

For larger values of N, if you graph the distribution of partition lengths, an asymmetric curve emerges, skewed towards the origin. If you're curious, graph the following partition length counts for N=40.

1 20 133 478 1115 1945 2738 3319 3589 3590 3370 3036 2637 2241 1861 1530 1236 995 790 627 490 385 297 231 176 135 101 77 56 42 30 22 15 11 7 5 3 2 1 1
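
To see the shape without a plotting package, here is a minimal Python sketch that prints the N=40 counts as a text histogram. The function name `ascii_histogram` and the bar width of 50 are my own choices for illustration, not part of the original code.

```python
# Partition-length counts for N = 40, copied from the list above
# (index 0 holds the count for length 1).
COUNTS_40 = [
    1, 20, 133, 478, 1115, 1945, 2738, 3319, 3589, 3590,
    3370, 3036, 2637, 2241, 1861, 1530, 1236, 995, 790, 627,
    490, 385, 297, 231, 176, 135, 101, 77, 56, 42,
    30, 22, 15, 11, 7, 5, 3, 2, 1, 1,
]


def ascii_histogram(counts, width=50):
    """Return one text line per length, with a bar scaled to the largest count."""
    peak = max(counts)
    lines = []
    for length, count in enumerate(counts, start=1):
        bar = "#" * round(width * count / peak)
        lines.append(f"{length:3d} {count:6d} {bar}")
    return lines


if __name__ == "__main__":
    print("\n".join(ascii_histogram(COUNTS_40)))
```

The bars rise steeply up to length 10 and then fall away in a long tail, which is the asymmetry described above.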

If you're interested in generating these distribution counts, here's the code I used:

#! /usr/local/bin/perl

use strict;
use warnings;

use Integer::Partition;

my $n = shift || 1;

while (1) {
    my $start = time;
    my $i = Integer::Partition->new($n);
    my %size;
    while (my $p = $i->next) {
        $size{scalar @$p}++;
    }

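    # Append to the results file; stop with the system error if it cannot be opened.
    open my $out, '>>', "bucket-count.out" or die "Cannot open bucket-count.out: $!";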
    for my $s (sort {$a <=> $b} keys %size) {
        print $out "$n\t$s\t$size{$s}\n";
    }
    close $out;
    my $delta = time - $start;
    print "$n\t$delta secs\n";
    ++$n;
}

(note: on my computer, N=90 takes about 10 minutes to generate).
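
If only the counts are needed, and not the partitions themselves, the enumeration can probably be skipped. The sketch below is not the Perl script above. It swaps in the standard recurrence p(n, k) = p(n-1, k-1) + p(n-k, k) for the number of partitions of n into exactly k parts, written in Python. The function name `partition_length_counts` is my own.

```python
def partition_length_counts(n):
    """Return [p(n, 1), ..., p(n, n)], where p(n, k) is the number of
    partitions of n into exactly k parts."""
    # table[m][k] holds p(m, k); entries with k > m stay 0.
    table = [[0] * (n + 1) for _ in range(n + 1)]
    table[0][0] = 1  # the empty partition of 0
    for m in range(1, n + 1):
        for k in range(1, m + 1):
            # Either the smallest part is 1 (drop it: p(m-1, k-1)),
            # or every part is >= 2 (subtract 1 from each part: p(m-k, k)).
            table[m][k] = table[m - 1][k - 1] + table[m - k][k]
    return table[n][1:]


if __name__ == "__main__":
    print(partition_length_counts(7))
```

The table has about N² entries, so N=90 should take a fraction of a second rather than minutes. For N=7 it gives the same counts as the small table earlier in the question.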

So my question is: what equation can be used to match the observed distribution curve? Is it a Gaussian (can a Gaussian distribution be asymmetric?), a Poisson distribution, or something else?

How do I solve it for N? If I remember my maths from high-school, I can determine the peak by solving when the derivative intersects 0. How do I produce the derivative? I've searched the web but all I get back are abstruse mathematical papers. I just need some code :)
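
For the measured counts themselves, a discrete stand-in for "where the derivative crosses 0" is the point where the differences between neighbouring counts stop being positive. Here is a minimal Python sketch of that idea. The helper names are hypothetical, and it assumes the counts rise to a single peak and then fall, which holds for the N=7 and N=40 data shown here but is not proven in general.

```python
# Partition-length counts for N = 40 (index 0 holds the count for length 1).
COUNTS_40 = [
    1, 20, 133, 478, 1115, 1945, 2738, 3319, 3589, 3590,
    3370, 3036, 2637, 2241, 1861, 1530, 1236, 995, 790, 627,
    490, 385, 297, 231, 176, 135, 101, 77, 56, 42,
    30, 22, 15, 11, 7, 5, 3, 2, 1, 1,
]


def first_differences(counts):
    """Discrete analogue of the derivative: counts[i + 1] - counts[i]."""
    return [b - a for a, b in zip(counts, counts[1:])]


def peak_length(counts):
    """Return the 1-based length at which the differences first stop
    being positive (assumes a single rising-then-falling peak)."""
    for i, diff in enumerate(first_differences(counts)):
        if diff <= 0:
            return i + 1
    return len(counts)  # counts never decreased


if __name__ == "__main__":
    print(peak_length(COUNTS_40))
```

This only locates the peak of data you already have. It does not give a formula for the peak as a function of N.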

1 Answer

I think a Poisson distribution is a reasonable estimate. Given that presumption, your problem becomes one of finding the peak, k (the length with the maximum frequency), for a given N. I think you have two approaches:

  1. figure it out from a mathematical standpoint (I would start by looking at combinatorics, but that may not be a particularly good steer)
  2. presume it is Poisson and measure the peak for any given N, as you have above.

Once you have the peak (k), estimating lambda should be straightforward (try a few out) and you have your curve.
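
As a rough sketch of the "try a few out" step in Python: a Poisson distribution with a non-integer lambda has its mode at floor(lambda), so values near the measured peak are natural first guesses. The code below scores each candidate by squared error against the N=40 counts. Treating the partition length directly as the Poisson variable, the squared-error score, and the grid of candidates are all assumptions of mine, not something stated in the question.

```python
import math

# Partition-length counts for N = 40 (index 0 holds the count for length 1).
COUNTS_40 = [
    1, 20, 133, 478, 1115, 1945, 2738, 3319, 3589, 3590,
    3370, 3036, 2637, 2241, 1861, 1530, 1236, 995, 790, 627,
    490, 385, 297, 231, 176, 135, 101, 77, 56, 42,
    30, 22, 15, 11, 7, 5, 3, 2, 1, 1,
]


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space so that
    lam**k and k! cannot overflow for large k."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def squared_error(counts, lam):
    """Sum of squared gaps between the observed counts and a Poisson
    curve scaled to the same total (assumption: length plays the role of k)."""
    total = sum(counts)
    return sum(
        (total * poisson_pmf(length, lam) - count) ** 2
        for length, count in enumerate(counts, start=1)
    )


def best_lambda(counts, candidates):
    """Return the candidate lambda with the smallest squared error."""
    return min(candidates, key=lambda lam: squared_error(counts, lam))


if __name__ == "__main__":
    peak = COUNTS_40.index(max(COUNTS_40)) + 1
    # Try lambda from peak - 2 to peak + 2 in steps of 0.1.
    candidates = [peak + step / 10 for step in range(-20, 21)]
    lam = best_lambda(COUNTS_40, candidates)
    print(f"peak at length {peak}, best lambda on the grid: {lam:.1f}")
    print(f"squared error: {squared_error(COUNTS_40, lam):.0f}")
```

If the best candidate lands on the edge of the grid, widen the grid. The size of the remaining error is a quick check on whether Poisson really is a reasonable shape for this data.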

Another approach is to work the whole thing up in Python and ask on the NumPy or SciPy boards :-)

HTH
