Functional Programming on .NET part 2

7 12 2007

So most who read this aught to know what a prime number is, a number whose only factors are one and itself. This post is not on primes but does use them in a simple program which uses a simple algorithm to factor any given number. Now, many of you should know the difficulty in factoring numbers and how much of current cryptographic techniques rely on how hard the problem is. The program I wrote was written as a benchmark between the languages F# and Nemerle. Primarily as an “expressioning” benchmark and then as an aside a note on speed.

The task was to write a program which takes an unbounded positive integer and outputs a list that is the decomposition of the number into its prime factors. I decided to leverage laziness to allow me to write a flexible and short implementation. As I will get to later neither of these languages support laziness in the true sense and hence corecursion (in the sense of the dual of recursion) was unavailable, disallowing techniques such as a lazy sieve. Firstly the theory.

Theory

Given some number n, there exists primes (p_1)(p_2)...(p_k) such that n = (p_1)(p_2)...(p_k) and the product (p_1)(p_2)...(p_k) is unique to n. This is known as the Fundamental Theorem of Arithmetic or the unique factorization theorem. I will leave the proof as an excercise to the reader. A way to get to this is to divide n by its smallest factor > 1. Store the factor and repeat on the quotient until we get to 1. Why the smallest factor? Because the smallest factor of n > 1 must be prime. Why?

Suppose that this was false and smallest factor or least divisor of n, say a, was composite. Then we would have that there exists some b and some c such that a = b * c and that b > 1 and b < a (we are working with positive integers so if b = 1 then c = a, remember that b and c < a). So b also divides n (consider n = a * d = b * c * d). But we stated earlier that a was the smallest divisor of n so by contradiction a must be prime.

So as an example we take n = 18. we have a = 9 and d = 2. 3 divides 9 so it must divide 18 and 18 / 3 = 3 * 2. Now to continue.

Suppose we keep n = 18. We know 3 is its least divisor and its prime. so p1 = 3. We do 18/3 = 6.
Next we find least divisor of 6 which is 2. so p2 = 2. Now we do 6/2 = 3. 3 is prime so 18 = 3 * 2 * 3.

So it seems to do this we need to know how to find the least divisor for a given number. Well we know that the least divisor of a number must always be prime. So if given n, to find ld of n we start from 2, check if n mod 2 = 0 and work up till we find a prime that divides n. So for 35 we’d go through 2, 3, and hit 5 as its least divisor. A way to stop unnecessary checks is to leverage the fact that if an integer n > 1 is not divisible by any primes < = \sqrt{n} then n is prime.

Let n = (p)(a). Let us suppose that p is the least divisor of n, we know it must be prime. Now suppose p > \sqrt{n} then a > \sqrt{n} since p <= a (remember p is the smallest divisor). Then we have that n = p \cdot a > (\sqrt{n})(\sqrt{n}) = n which is a contradiction since it says n > n. So at least one of p or a must be less than \sqrt{n}. We know p <= a so we know that at least p < \sqrt{n}. This proves our proposition. Note also that since p \le a, p^2 \le p \cdot a = n. We use that fact to lessen our checks.

The basic idea then is to define an infinite list of all natural numbers and the primes as those numbers which fail some primality test filtered out. We define that tests as any number which is its own least divisor. I will not explain the code in depth – there is plenty of documentation on both languages – this is mostly to show how easy it is to go from reasoning to implementation.

Implementation.

Below is the F# code which implements this functionality:

open System

module Primes_f = 
    let (+:) a alList = LazyList.cons a alList

    let Z3 : bigint LazyList = LazyList.unfold (fun x -> Some(x , x + 1I)) 3I 

    let rec least_denominator_byfactor l n = 
        match l with 
            LazyList.Cons(p,ps) -> if n % p = 0I then 
                                        p
                                   elif p * p > n then
                                        n
                                   else
                                        least_denominator_byfactor ps n  
                       
    let rec prime = function
        n when n  failwith "Not positive Int"
        | n when n = 1I -> false
        | n when n > 1I -> least_denominator n = n
        | _ -> failwith "?"
        
    and primes =  2I +: (LazyList.filter prime Z3)

    and least_denominator n = least_denominator_byfactor primes n
    
    let rec factors = function
        n when n  failwith "Not positive Int"
        | n when n = 1I -> []
        | n when n > 1I -> let p = least_denominator n  
                           p :: (factors (n / p)) 
                           
        | _ -> failwith "?"


To display:

    let dtm = DateTime.Now
    print_any ("Time is: " ^ dtm.ToString() )
    
    //99999988888776655423101   //let z = factors 999988710I//999998888877665542310I    
    let rnd = Random()                                        
    
    let rec Generate tx l = 
        match tx with
            0 -> l
            | _ -> let z = bigint.FromInt32 (rnd.Next(1000000,int.MaxValue))    
                   Generate (tx - 1) ( ( factors z )::l )
                                        
    let list_of_primefactors =  Generate 10000 []//factors 9876543210I  //3 mins 100,000 -  20sec 10,000 .05
    let dtm2 = DateTime.Now 
    let d = dtm2 - dtm
  
    let primes1 = List.filter (fun l -> if List.length l = 1 then true else false) (list_of_primefactors)
    
    let probprimes = System.Convert.ToDouble( (List.length primes1)) / 1000.0
    
    
    List.map (fun c -> print_any c ; System.Console.WriteLine ("") ) primes1   
    print_any probprimes
    //print_any list_of_primefactors      
       
    System.Console.Write ("\n\nTime is: " ^ dtm2.ToString() ^ " Took: " ^ d.ToString() ^ "\n" )  
    System.Console.ReadLine ()


The F# code was pretty straightforward to implement. The language remains quite functional and I had a lazy list data structure which allowed semi-corecursion via unfold. Now the Nemerle code. Because of its more object-centricism (not a critique) I had to approach how I arranged the code a bit differently.

  
module Number
    private least_denominator_byfactor(l : Primes,  n : decimal) : Decimal 
        def p =  l.prime
        
        if((p==3) && (n%2 == 0))            
            2m
        else    
            if(n % p == 0)  
                p
            else if(p * p > n)
                n
            else
                least_denominator_byfactor(l.next,n)          
            
    public least_denominator(n : Decimal) : Decimal 
        least_denominator_byfactor(Primes(3), n ) 
        
    public IsPrime(k: Decimal) : bool
        | n when n  throw ex("Not positive")
        | n when n == 1 => false
        | n when n == 2 => true
        | n when n % 2.0m == 0 => false
        | n when n > 2 => (n == least_denominator(n))
   
    public next_prime (v : Decimal) : Decimal
        def ip = IsPrime(v + 1)
        match(ip) 
            | true => v + 1
            | false => next_prime(v + 1)
        
    public factors(number : Decimal) : list[Decimal]
            | n when n  throw ex("Not Positive")
            | n when n == 1 => []
            | n when n > 1 => def p = least_denominator(n) ; p :: factors(n/p)          

class Primes
    public prime : Decimal
    public next : LazyValue [Primes]
            
    public this (p : Decimal) 
        prime = p
        next = lazy (Primes (Number.next_prime(p)))


To display:

  
module Program
    Main() : void
    
        def dtm = DateTime.Now 
        WriteLine($"Time now is: $dtm") 
                  // 9998876655423101
        //Test Nums: 99998877665542310
        def rnd = Random()
        
        def Generate(tx, l = [])
            match(tx)
                | 0 => l 
                | _ => Generate(tx - 1, Number.factors( rnd.Next(1000000, int.MaxValue) )::l) 
            
        def list_of_primefactors = Generate(152)//Number.factors(9766427) 
       
        def dtm2 = DateTime.Now       
        
        foreach(pfactors in list_of_primefactors)
            foreach(num in pfactors)
               Write($"$num | ") 
            WriteLine("")       */
        
        WriteLine($"Time Now is: $dtm2, Took $(dtm2 - dtm)")
        ReadLine()

While for example in the F# code I had only functions and general functionality I had to package the code into appropriate classes in nemerle and to have behaviour sort of baked into those data types (for example I could not unfold into a generic list of primes, I had to build a next mechanism into the lazy prime class). The nemerle implementation (algorithim not language) is quite a bit less inefficient. But this is because I did not really redress the design to go with objects. For example in F# I could afford mutual recursion since F# has light weight <fast> functions and has the caveat that the lazy list caches its data.

In Nemerle the Primality check in IsPrime requires more and more objects to be created until the test is passed (note the circularity wher ld looks for a next prime which in turn calls ld to find if its prime) so it bogs down quite quickly. Indeed for small numbers (less than 10 digits) the performance is quite close – implying that the actual languages are not far apart in speed. Nonetheless the rate of increase in time usage for Nemerle is much steeper (by a factor of ~14 before change but dropping to near F#’s after for single iterations) than for F# due to the object thrashing I was doing. For example by merely injecting the following code

private ldf(k : decimal,n : decimal) : decimal
        match(k) 
            | m when n % m == 0 => m
            | m when m * m > n => n
            | _ => ldf(k + 1, n) 

private ld(n : decimal) : decimal
        ldf(2,n)  

and replacing the appropriate function in IsPrime, factoring 99998877665542310 went from taking 25 seconds to 3 seconds. A similar injection to F# increased the speed from 2 seconds to 8. F# certainly benefits from the caching in its Lazy list implementation. Both languages are eagerly evaluated except in small localized places. In effect the evaluation is more delayed than it is lazy in the true sense that one would find in Haskell. Thus for example this:

let rec sieve function = LazyList.Cons(z, zs) -> z +: sieve (LazyList.filter (fun n -> (n % z) <> 0I) zs

would fail to terminate when given sieve Z3, while similar code in Haskell knows where to stop evaluation.

Speed

As Noted earlier the F# code was fairly fast (helped, I suspect because of it caching, thus larger numbers benefited from previous computations and in having to do less work. One can really tell over multiple iterations where past iterations aid future ones. For example it took 3 minutes to factor 100,000 numbers between, 1,000,000 and 2^32. While it takes 15 secs (down from ~ 60 secs) in Nemerle to factor 120 nums, it takes F# 3 secs. In the next post I will post on Visual Studio integration, some graphs and data and improvements in the arrangement of code and primality testing for Nemerle.

One note is that F# sometimes generates confusing error messages. This error was due to using an int in place of a bigint i.e. n % 0 vs n % 0I. Error 1 Type error in the use of the overloaded operator ‘( % )’. No operator ‘%’ (full name ‘op_Modulus’) supported by the type ‘bigint’ matches the argument types. See also C:\projects\prime\prime\prime.fs(12,23)-(12,24). See also . Note the nice type system at work =)

As well, sometimes when you dont define your pattern matching cases properly it fails to build yet gives no error. Still some ways to go before being commercial. But hope it doesnt sell out when it does.

Interestingly .NET had no built in arbitrary precision integer data type (had because they just put it in for 3.5). F# does but Nemerle had to use decimal. Note though that to change I but have to rename type annotations. The code is general. Excluding the modules and classes Nemerle code and F# are very much on par.





Functional Programming on .NET – Part 1

5 12 2007

So it seems that functional programming is the new hype. This could be a good thing but it could also be a sad/bad thing. Looking at the last major hype – object oriented programming and seeing what was lost from Smalltalk to C++ one can only hope that although concepts are brought in section by section they are done faithfully rather than as washed down ugly caricatures.

At the moment there are only 2 real offerings for functional programming on .NET and those are F# and Nemerle. There are other languages:

Scala is capable of targeting IL but I do not know its interop capabilities since it is primarily a consumer of the JVM. There is the new initiative with IronScheme but it is too early yet. As well there is the language L# but it is meant as a lightweight embeddable lisp like scripts eater. Finally there is Hugs98.NET which does not target IL but can still consume .NET code and present Haskell code for consumption. Nonetheless the lack of built in interop and the requirement to (semi)manually wrap all .NET classes to a format understood by Haskell is cumbersome. As well due to the vastly different mindsets of both platforms trying to get them to work along side each other is un-natural – like trying to get Orcs and Elves to work together.

So for now the only real contenders are F# and Nemerle. So what is it like to use each? I will not compare since that would be futile but I will note differences and such.

Firstly I will note that these languages are constrained by the platform under the hood, they are not really functional but merely compiling to .NET objects, arrays and similar. But that is not the problem (i.e. at the end of the day haskell still compiles to a bunch of gotos and alters CPU state), functional programming is more a mentality than anything. However since the functional methodologies are at the moment still not first class paradigms, to interop properly with the platform certain departures must be made from pure functional concepts in favour of a more object friendly representation. Detracting from the advantages offered by functional languages like Haskell and ML. I also suspect that the underlying “dynamic” objects (e.g. availability of reflection <- good thing?, late binding, type casting >_<) nature of the platform makes implementing a proper [static] type system a bit more of an art than necessary. Nonetheless these are all solvable issues to be addressed by Microsoft and the language designers. But at the moment there is still a fair jump down in expressive power from abstraction when going from Haskell to either of these languages.

If you go to the main Nemerle site you it will seem to be dead. One needs to look at the mailing lists to realize that the language is still being actively developed. The only detracting issue is that the language is not the focus of the developers even though it is still moving forward (amusingly, the project seems to also be very eastern with mostly eastern europe and Japan being active users). The language is very meta-powerful, allowing one to extend is capabilities using macros. Some of the things that have been done include a join calculus for concurrency, design by contract type verification support and multi-stage programming support. As a language it differs from F# as it is a bit more object oriented in the traditional sense. This I suspect may be related to why its type inference is inferior to F#’s, requiring quite a bit more type annotations. Both F# and Nemerle are on the same level of abstraction (however that would be judged) with support for succinct lightweight syntax and whitespace based formatting. Nemerle also has support for traditional ugly curly brace syntax to help bring over C# programmers. Indeed Nemerle is capable of looking very much like C# or very much like ML to suite the programmer’s tastes, making it a good candidate to act as a bridging language to functional programming.

The languages are fairly equi-capable, with the main differences between the languages being the more object oriented mentality of Nemerle (thus distracting from the functional paradigms and making its type system arguably less elegant than F#’s) and the stronger meta system of Nemerle. Finally F# is also more mature, with it being a research focus of a group of people, and production ready – it has recently been branded as an official product. Hence you are more likely to hit bugs and stumble on language inconsistencies in Nemerle than in F#. Nonetheless Nemerle is not an inferior language. It has a slightly different mentality and is an interesting combination of ideas that whose full implementation will make it a very interesting language. I feel it to be a mix of scheme, haskell and err C#.

I have split this post to reduce its size, the next post will be on a silly little lazily evaluated program in both languages I wrote which gives the prime factorization of a number. I will give my very informal speed comparisons and how each language’s mentality affected implementation.