Functional Programming on .NET part 2

7 12 2007

So most who read this aught to know what a prime number is, a number whose only factors are one and itself. This post is not on primes but does use them in a simple program which uses a simple algorithm to factor any given number. Now, many of you should know the difficulty in factoring numbers and how much of current cryptographic techniques rely on how hard the problem is. The program I wrote was written as a benchmark between the languages F# and Nemerle. Primarily as an “expressioning” benchmark and then as an aside a note on speed.

The task was to write a program which takes an unbounded positive integer and outputs a list that is the decomposition of the number into its prime factors. I decided to leverage laziness to allow me to write a flexible and short implementation. As I will get to later neither of these languages support laziness in the true sense and hence corecursion (in the sense of the dual of recursion) was unavailable, disallowing techniques such as a lazy sieve. Firstly the theory.

Theory

Given some number n, there exists primes (p_1)(p_2)...(p_k) such that n = (p_1)(p_2)...(p_k) and the product (p_1)(p_2)...(p_k) is unique to n. This is known as the Fundamental Theorem of Arithmetic or the unique factorization theorem. I will leave the proof as an excercise to the reader. A way to get to this is to divide n by its smallest factor > 1. Store the factor and repeat on the quotient until we get to 1. Why the smallest factor? Because the smallest factor of n > 1 must be prime. Why?

Suppose that this was false and smallest factor or least divisor of n, say a, was composite. Then we would have that there exists some b and some c such that a = b * c and that b > 1 and b < a (we are working with positive integers so if b = 1 then c = a, remember that b and c < a). So b also divides n (consider n = a * d = b * c * d). But we stated earlier that a was the smallest divisor of n so by contradiction a must be prime.

So as an example we take n = 18. we have a = 9 and d = 2. 3 divides 9 so it must divide 18 and 18 / 3 = 3 * 2. Now to continue.

Suppose we keep n = 18. We know 3 is its least divisor and its prime. so p1 = 3. We do 18/3 = 6.
Next we find least divisor of 6 which is 2. so p2 = 2. Now we do 6/2 = 3. 3 is prime so 18 = 3 * 2 * 3.

So it seems to do this we need to know how to find the least divisor for a given number. Well we know that the least divisor of a number must always be prime. So if given n, to find ld of n we start from 2, check if n mod 2 = 0 and work up till we find a prime that divides n. So for 35 we’d go through 2, 3, and hit 5 as its least divisor. A way to stop unnecessary checks is to leverage the fact that if an integer n > 1 is not divisible by any primes < = \sqrt{n} then n is prime.

Let n = (p)(a). Let us suppose that p is the least divisor of n, we know it must be prime. Now suppose p > \sqrt{n} then a > \sqrt{n} since p <= a (remember p is the smallest divisor). Then we have that n = p \cdot a > (\sqrt{n})(\sqrt{n}) = n which is a contradiction since it says n > n. So at least one of p or a must be less than \sqrt{n}. We know p <= a so we know that at least p < \sqrt{n}. This proves our proposition. Note also that since p \le a, p^2 \le p \cdot a = n. We use that fact to lessen our checks.

The basic idea then is to define an infinite list of all natural numbers and the primes as those numbers which fail some primality test filtered out. We define that tests as any number which is its own least divisor. I will not explain the code in depth – there is plenty of documentation on both languages – this is mostly to show how easy it is to go from reasoning to implementation.

Implementation.

Below is the F# code which implements this functionality:

open System

module Primes_f = 
    let (+:) a alList = LazyList.cons a alList

    let Z3 : bigint LazyList = LazyList.unfold (fun x -> Some(x , x + 1I)) 3I 

    let rec least_denominator_byfactor l n = 
        match l with 
            LazyList.Cons(p,ps) -> if n % p = 0I then 
                                        p
                                   elif p * p > n then
                                        n
                                   else
                                        least_denominator_byfactor ps n  
                       
    let rec prime = function
        n when n  failwith "Not positive Int"
        | n when n = 1I -> false
        | n when n > 1I -> least_denominator n = n
        | _ -> failwith "?"
        
    and primes =  2I +: (LazyList.filter prime Z3)

    and least_denominator n = least_denominator_byfactor primes n
    
    let rec factors = function
        n when n  failwith "Not positive Int"
        | n when n = 1I -> []
        | n when n > 1I -> let p = least_denominator n  
                           p :: (factors (n / p)) 
                           
        | _ -> failwith "?"


To display:

    let dtm = DateTime.Now
    print_any ("Time is: " ^ dtm.ToString() )
    
    //99999988888776655423101   //let z = factors 999988710I//999998888877665542310I    
    let rnd = Random()                                        
    
    let rec Generate tx l = 
        match tx with
            0 -> l
            | _ -> let z = bigint.FromInt32 (rnd.Next(1000000,int.MaxValue))    
                   Generate (tx - 1) ( ( factors z )::l )
                                        
    let list_of_primefactors =  Generate 10000 []//factors 9876543210I  //3 mins 100,000 -  20sec 10,000 .05
    let dtm2 = DateTime.Now 
    let d = dtm2 - dtm
  
    let primes1 = List.filter (fun l -> if List.length l = 1 then true else false) (list_of_primefactors)
    
    let probprimes = System.Convert.ToDouble( (List.length primes1)) / 1000.0
    
    
    List.map (fun c -> print_any c ; System.Console.WriteLine ("") ) primes1   
    print_any probprimes
    //print_any list_of_primefactors      
       
    System.Console.Write ("\n\nTime is: " ^ dtm2.ToString() ^ " Took: " ^ d.ToString() ^ "\n" )  
    System.Console.ReadLine ()


The F# code was pretty straightforward to implement. The language remains quite functional and I had a lazy list data structure which allowed semi-corecursion via unfold. Now the Nemerle code. Because of its more object-centricism (not a critique) I had to approach how I arranged the code a bit differently.

  
module Number
    private least_denominator_byfactor(l : Primes,  n : decimal) : Decimal 
        def p =  l.prime
        
        if((p==3) && (n%2 == 0))            
            2m
        else    
            if(n % p == 0)  
                p
            else if(p * p > n)
                n
            else
                least_denominator_byfactor(l.next,n)          
            
    public least_denominator(n : Decimal) : Decimal 
        least_denominator_byfactor(Primes(3), n ) 
        
    public IsPrime(k: Decimal) : bool
        | n when n  throw ex("Not positive")
        | n when n == 1 => false
        | n when n == 2 => true
        | n when n % 2.0m == 0 => false
        | n when n > 2 => (n == least_denominator(n))
   
    public next_prime (v : Decimal) : Decimal
        def ip = IsPrime(v + 1)
        match(ip) 
            | true => v + 1
            | false => next_prime(v + 1)
        
    public factors(number : Decimal) : list[Decimal]
            | n when n  throw ex("Not Positive")
            | n when n == 1 => []
            | n when n > 1 => def p = least_denominator(n) ; p :: factors(n/p)          

class Primes
    public prime : Decimal
    public next : LazyValue [Primes]
            
    public this (p : Decimal) 
        prime = p
        next = lazy (Primes (Number.next_prime(p)))


To display:

  
module Program
    Main() : void
    
        def dtm = DateTime.Now 
        WriteLine($"Time now is: $dtm") 
                  // 9998876655423101
        //Test Nums: 99998877665542310
        def rnd = Random()
        
        def Generate(tx, l = [])
            match(tx)
                | 0 => l 
                | _ => Generate(tx - 1, Number.factors( rnd.Next(1000000, int.MaxValue) )::l) 
            
        def list_of_primefactors = Generate(152)//Number.factors(9766427) 
       
        def dtm2 = DateTime.Now       
        
        foreach(pfactors in list_of_primefactors)
            foreach(num in pfactors)
               Write($"$num | ") 
            WriteLine("")       */
        
        WriteLine($"Time Now is: $dtm2, Took $(dtm2 - dtm)")
        ReadLine()

While for example in the F# code I had only functions and general functionality I had to package the code into appropriate classes in nemerle and to have behaviour sort of baked into those data types (for example I could not unfold into a generic list of primes, I had to build a next mechanism into the lazy prime class). The nemerle implementation (algorithim not language) is quite a bit less inefficient. But this is because I did not really redress the design to go with objects. For example in F# I could afford mutual recursion since F# has light weight <fast> functions and has the caveat that the lazy list caches its data.

In Nemerle the Primality check in IsPrime requires more and more objects to be created until the test is passed (note the circularity wher ld looks for a next prime which in turn calls ld to find if its prime) so it bogs down quite quickly. Indeed for small numbers (less than 10 digits) the performance is quite close – implying that the actual languages are not far apart in speed. Nonetheless the rate of increase in time usage for Nemerle is much steeper (by a factor of ~14 before change but dropping to near F#’s after for single iterations) than for F# due to the object thrashing I was doing. For example by merely injecting the following code

private ldf(k : decimal,n : decimal) : decimal
        match(k) 
            | m when n % m == 0 => m
            | m when m * m > n => n
            | _ => ldf(k + 1, n) 

private ld(n : decimal) : decimal
        ldf(2,n)  

and replacing the appropriate function in IsPrime, factoring 99998877665542310 went from taking 25 seconds to 3 seconds. A similar injection to F# increased the speed from 2 seconds to 8. F# certainly benefits from the caching in its Lazy list implementation. Both languages are eagerly evaluated except in small localized places. In effect the evaluation is more delayed than it is lazy in the true sense that one would find in Haskell. Thus for example this:

let rec sieve function = LazyList.Cons(z, zs) -> z +: sieve (LazyList.filter (fun n -> (n % z) <> 0I) zs

would fail to terminate when given sieve Z3, while similar code in Haskell knows where to stop evaluation.

Speed

As Noted earlier the F# code was fairly fast (helped, I suspect because of it caching, thus larger numbers benefited from previous computations and in having to do less work. One can really tell over multiple iterations where past iterations aid future ones. For example it took 3 minutes to factor 100,000 numbers between, 1,000,000 and 2^32. While it takes 15 secs (down from ~ 60 secs) in Nemerle to factor 120 nums, it takes F# 3 secs. In the next post I will post on Visual Studio integration, some graphs and data and improvements in the arrangement of code and primality testing for Nemerle.

One note is that F# sometimes generates confusing error messages. This error was due to using an int in place of a bigint i.e. n % 0 vs n % 0I. Error 1 Type error in the use of the overloaded operator ‘( % )’. No operator ‘%’ (full name ‘op_Modulus’) supported by the type ‘bigint’ matches the argument types. See also C:\projects\prime\prime\prime.fs(12,23)-(12,24). See also . Note the nice type system at work =)

As well, sometimes when you dont define your pattern matching cases properly it fails to build yet gives no error. Still some ways to go before being commercial. But hope it doesnt sell out when it does.

Interestingly .NET had no built in arbitrary precision integer data type (had because they just put it in for 3.5). F# does but Nemerle had to use decimal. Note though that to change I but have to rename type annotations. The code is general. Excluding the modules and classes Nemerle code and F# are very much on par.